Addressing data privacy in matched studies via virtual pooling.
Saha-Chaudhuri, P; Weinberg, C R
2017-09-07
Data confidentiality and shared use of research data are two desirable but sometimes conflicting goals in research with multi-center studies and distributed data. While a single dataset that includes covariate information of all participants is ideal for straightforward analysis, confidentiality restrictions forbid its creation. Current approaches such as aggregate data sharing, distributed regression, meta-analysis and score-based methods can have important limitations. We propose a novel application of an existing epidemiologic tool, specimen pooling, to enable confidentiality-preserving analysis of data arising from a matched case-control, multi-center design. Instead of pooling specimens prior to assay, we apply the methodology to virtually pool (aggregate) covariates within nodes. Such virtual pooling retains most of the information used in an analysis with individual data and, since individual participant data is not shared externally, within-node virtual pooling preserves data confidentiality. We show that aggregated covariate levels can be used in a conditional logistic regression model to estimate individual-level odds ratios of interest. The parameter estimates from the standard conditional logistic regression are compared to the estimates based on a conditional logistic regression model with aggregated data. The parameter estimates are shown to be similar to those without pooling and to have comparable standard errors and confidence interval coverage. Virtual data pooling can be used to maintain confidentiality of data from a multi-center study and can be particularly useful in research with large-scale distributed data.
Lyles, Robert H.; Mitchell, Emily M.; Weinberg, Clarice R.; Umbach, David M.; Schisterman, Enrique F.
2016-01-01
Summary: Potential reductions in laboratory assay costs afforded by pooling equal aliquots of biospecimens have long been recognized in disease surveillance and epidemiological research and, more recently, have motivated design and analytic developments in regression settings. For example, Weinberg and Umbach (1999, Biometrics 55, 718–726) provided methods for fitting set-based logistic regression models to case-control data when a continuous exposure variable (e.g., a biomarker) is assayed on pooled specimens. We focus on improving estimation efficiency by utilizing available subject-specific information at the pool allocation stage. We find that a strategy that we call “(y,c)-pooling,” which forms pooling sets of individuals within strata defined jointly by the outcome and other covariates, provides more precise estimation of the risk parameters associated with those covariates than does pooling within strata defined only by the outcome. We review the approach to set-based analysis through offsets developed by Weinberg and Umbach in a recent correction to their original paper. We propose a method for variance estimation under this design and use simulations and a real-data example to illustrate the precision benefits of (y,c)-pooling relative to y-pooling. We also note and illustrate that set-based models permit estimation of covariate interactions with exposure. PMID:26964741
Logistic regression trees for initial selection of interesting loci in case-control studies
Nickolov, Radoslav Z; Milanov, Valentin B
2007-01-01
Modern genetic epidemiology faces the challenge of dealing with hundreds of thousands of genetic markers. The selection of a small initial subset of interesting markers for further investigation can greatly facilitate genetic studies. In this contribution we suggest the use of a logistic regression tree algorithm known as logistic tree with unbiased selection. Using the simulated data provided for Genetic Analysis Workshop 15, we show how this algorithm, with incorporation of the multifactor dimensionality reduction method, can reduce an initial large pool of markers to a small set that includes the interesting markers with high probability. PMID:18466557
Hierarchical Bayesian Logistic Regression to forecast metabolic control in type 2 DM patients.
Dagliati, Arianna; Malovini, Alberto; Decata, Pasquale; Cogni, Giulia; Teliti, Marsida; Sacchi, Lucia; Cerra, Carlo; Chiovato, Luca; Bellazzi, Riccardo
2016-01-01
In this work we present our efforts in building a model able to forecast patients' changes in clinical conditions when repeated measurements are available. In this case the available risk calculators are typically not applicable. We propose a Hierarchical Bayesian Logistic Regression model, which takes into account individual and population variability in the estimation of model parameters. The model is used to predict metabolic control and its variation in type 2 diabetes mellitus. In particular we have analyzed a population of more than 1000 Italian type 2 diabetic patients, collected within the European project Mosaic. The results obtained in terms of Matthews Correlation Coefficient are significantly better than the ones obtained with a standard logistic regression model, based on data pooling.
A secure distributed logistic regression protocol for the detection of rare adverse drug events
El Emam, Khaled; Samet, Saeed; Arbuckle, Luk; Tamblyn, Robyn; Earle, Craig; Kantarcioglu, Murat
2013-01-01
Background: There is limited capacity to assess the comparative risks of medications after they enter the market. For rare adverse events, the pooling of data from multiple sources is necessary to have the power and sufficient population heterogeneity to detect differences in safety and effectiveness in genetic, ethnic and clinically defined subpopulations. However, combining datasets from different data custodians or jurisdictions to perform an analysis on the pooled data creates significant privacy concerns that would need to be addressed. Existing protocols for addressing these concerns can result in reduced analysis accuracy and can allow sensitive information to leak. Objective: To develop a secure distributed multi-party computation protocol for logistic regression that provides strong privacy guarantees. Methods: We developed a secure distributed logistic regression protocol using a single analysis center with multiple sites providing data. A theoretical security analysis demonstrates that the protocol is robust to plausible collusion attacks and does not allow the parties to gain new information from the data that are exchanged among them. The computational performance and accuracy of the protocol were evaluated on simulated datasets. Results: The computational performance scales linearly as the dataset sizes increase. The addition of sites results in an exponential growth in computation time. However, for up to five sites, the time is still short and would not affect practical applications. The model parameters are the same as the results on pooled raw data analyzed in SAS, demonstrating high model accuracy. Conclusion: The proposed protocol and prototype system would allow the development of logistic regression models in a secure manner without requiring the sharing of personal health information. This can alleviate one of the key barriers to the establishment of large-scale post-marketing surveillance programs.
We extended the secure protocol to account for correlations among patients within sites through generalized estimating equations, and to accommodate other link functions by extending it to generalized linear models. PMID:22871397
A secure distributed logistic regression protocol for the detection of rare adverse drug events.
El Emam, Khaled; Samet, Saeed; Arbuckle, Luk; Tamblyn, Robyn; Earle, Craig; Kantarcioglu, Murat
2013-05-01
There is limited capacity to assess the comparative risks of medications after they enter the market. For rare adverse events, the pooling of data from multiple sources is necessary to have the power and sufficient population heterogeneity to detect differences in safety and effectiveness in genetic, ethnic and clinically defined subpopulations. However, combining datasets from different data custodians or jurisdictions to perform an analysis on the pooled data creates significant privacy concerns that would need to be addressed. Existing protocols for addressing these concerns can result in reduced analysis accuracy and can allow sensitive information to leak. To develop a secure distributed multi-party computation protocol for logistic regression that provides strong privacy guarantees. We developed a secure distributed logistic regression protocol using a single analysis center with multiple sites providing data. A theoretical security analysis demonstrates that the protocol is robust to plausible collusion attacks and does not allow the parties to gain new information from the data that are exchanged among them. The computational performance and accuracy of the protocol were evaluated on simulated datasets. The computational performance scales linearly as the dataset sizes increase. The addition of sites results in an exponential growth in computation time. However, for up to five sites, the time is still short and would not affect practical applications. The model parameters are the same as the results on pooled raw data analyzed in SAS, demonstrating high model accuracy. The proposed protocol and prototype system would allow the development of logistic regression models in a secure manner without requiring the sharing of personal health information. This can alleviate one of the key barriers to the establishment of large-scale post-marketing surveillance programs. 
We extended the secure protocol to account for correlations among patients within sites through generalized estimating equations, and to accommodate other link functions by extending it to generalized linear models.
Sakamoto, Torao; Horiuchi, Akira; Nakayama, Yoshiko
2013-08-01
Endoscopic evaluation of swallowing (EES) is not commonly used by gastroenterologists to evaluate swallowing in patients with dysphagia. To use transnasal endoscopy to identify factors predicting successful or failed swallowing of pureed foods in elderly patients with dysphagia. EES of pureed foods was performed by a gastroenterologist using a small-calibre transnasal endoscope. Factors related to successful versus unsuccessful swallowing of pureed foods were analyzed with regard to age, comorbid diseases, swallowing activity, saliva pooling, vallecular residues, pharyngeal residues and airway penetration/aspiration. Unsuccessful swallowing was defined in patients who could not eat pureed foods at bedside during hospitalization. Logistic regression analysis was used to identify independent predictors of swallowing of pureed foods. During a six-year period, 458 consecutive patients (mean age 80 years [range 39 to 97 years]) were considered for the study, including 285 (62%) men. Saliva pooling, vallecular residues, pharyngeal residues and penetration/aspiration were found in 240 (52%), 73 (16%), 226 (49%) and 232 patients (51%), respectively. Overall, 247 patients (54%) failed to swallow pureed foods. Multivariate logistic regression analysis demonstrated that the presence of pharyngeal residues (OR 6.0) and saliva pooling (OR 4.6) occurred significantly more frequently in patients who failed to swallow pureed foods. Pharyngeal residues and saliva pooling predicted impaired swallowing of pureed foods. Transnasal EES performed by a gastroenterologist provided a unique bedside method of assessing the ability to swallow pureed foods in elderly patients with dysphagia.
Ngwa, Julius S; Cabral, Howard J; Cheng, Debbie M; Pencina, Michael J; Gagnon, David R; LaValley, Michael P; Cupples, L Adrienne
2016-11-03
Typical survival studies follow individuals to an event and measure explanatory variables for that event, sometimes repeatedly over the course of follow up. The Cox regression model has been used widely in the analyses of time to diagnosis or death from disease. The associations between the survival outcome and time dependent measures may be biased unless they are modeled appropriately. In this paper we explore the Time Dependent Cox Regression Model (TDCM), which quantifies the effect of repeated measures of covariates in the analysis of time to event data. This model is commonly used in biomedical research but sometimes does not explicitly adjust for the times at which time dependent explanatory variables are measured. This approach can yield different estimates of association compared to a model that adjusts for these times. In order to address the question of how different these estimates are from a statistical perspective, we compare the TDCM to Pooled Logistic Regression (PLR) and Cross Sectional Pooling (CSP), considering models that adjust and do not adjust for time in PLR and CSP. In a series of simulations we found that time adjusted CSP provided identical results to the TDCM while the PLR showed larger parameter estimates compared to the time adjusted CSP and the TDCM in scenarios with high event rates. We also observed upwardly biased estimates in the unadjusted CSP and unadjusted PLR methods. The time adjusted PLR had a positive bias in the time dependent Age effect with reduced bias when the event rate is low. The PLR methods showed a negative bias in the Sex effect, a subject level covariate, when compared to the other methods. The Cox models yielded reliable estimates for the Sex effect in all scenarios considered. We conclude that survival analyses that explicitly account in the statistical model for the times at which time dependent covariates are measured provide more reliable estimates compared to unadjusted analyses. 
We present results from the Framingham Heart Study in which lipid measurements and myocardial infarction data events were collected over a period of 26 years.
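Pooled logistic regression of the kind compared in the abstract above is usually fit to a person-interval dataset, in which each subject contributes one record per examination interval while still at risk. The following is a minimal Python sketch of that expansion step only. The function name, argument names and record fields are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Optional


def expand_person_intervals(
    subject_id: str,
    exam_times: List[float],
    covariates: List[float],
    event_time: Optional[float],
    end_of_follow_up: float,
) -> List[Dict[str, object]]:
    """Return one record per exam interval during which the subject is at risk.

    Each record carries the covariate measured at the start of the interval
    and a 0/1 flag for whether the event occurred within that interval.
    All names here are illustrative assumptions.
    """
    records: List[Dict[str, object]] = []
    for k, start in enumerate(exam_times):
        # The interval ends at the next exam, or at end of follow-up for the last exam.
        stop = exam_times[k + 1] if k + 1 < len(exam_times) else end_of_follow_up
        stop = min(stop, end_of_follow_up)
        if start >= stop:
            break  # no follow-up time left in this interval
        if event_time is not None and event_time <= start:
            break  # event occurred before this interval; no longer at risk
        event = int(event_time is not None and start < event_time <= stop)
        records.append(
            {
                "id": subject_id,
                "interval": k,
                "start": start,
                "stop": stop,
                "x": covariates[k],
                "event": event,
            }
        )
        if event:
            break  # stop contributing records once the event is observed
    return records
```

An ordinary logistic model can then be fit to the stacked records from all subjects. Including `interval` (or the interval start time) as a term is one plausible way to obtain a time-adjusted variant, although the exact specification used in the paper is not given in the abstract.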
Who Chooses Non-Public Schools for Their Children?
ERIC Educational Resources Information Center
Yang, Philip Q.; Kayaardi, Nihan
2004-01-01
Using the pooled 1998-2000 GSS data, this study examines what kinds of parents tend to select non-public schools for their children, a question that is fundamental but lacks direct, adequate answers in the literature. The results of logistic regression analysis show that religion, socio-economic status, age, nativity, number of children and region…
Is economic inequality in infant mortality higher in urban than in rural India?
Kumar, Abhishek; Singh, Abhishek
2014-11-01
This paper examines the trends in economic inequality in infant mortality across urban-rural residence in India over the last 14 years. We analysed data from the three successive rounds of the National Family Health Survey conducted in India during 1992-1993, 1998-1999, and 2005-2006. Asset-based household wealth index was used as the economic indicator for the study. Concentration index and pooled logistic regression analysis were applied to measure the extent of economic inequality in infant mortality in urban and rural India. Infant mortality rate differs considerably by urban-rural residence: infant mortality in rural India being substantially higher than that in urban India. The findings suggest that economic inequalities are higher in urban than in rural India in each of the three survey rounds. Pooled logistic regression results suggest that, in urban areas, infant mortality has declined by 22% in the poorest and 43% in the richest. In comparison, the decline is 29% and 32% respectively in rural India. Economic inequality in infant mortality has widened more in urban than in rural India in the last two decades.
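The concentration index mentioned above is often computed in its covariance form, C = 2·cov(h, r)/mean(h), where h is the health outcome and r is the fractional rank in the wealth distribution. The Python sketch below implements that common form on individual-level data. It is an assumption that this matches the authors' computation, and the function and variable names are hypothetical.

```python
from typing import Sequence


def concentration_index(health: Sequence[float], wealth: Sequence[float]) -> float:
    """Covariance-form concentration index on unweighted individual data.

    Negative values mean the outcome (e.g. infant death) is concentrated
    among the poor; positive values mean it is concentrated among the rich.
    Ties in wealth are broken by input order, a simplification.
    """
    n = len(health)
    if n == 0 or n != len(wealth):
        raise ValueError("health and wealth must be non-empty and of equal length")
    mean_h = sum(health) / n
    if mean_h == 0:
        raise ValueError("concentration index is undefined when the mean outcome is zero")
    # Sort individuals from poorest to richest.
    order = sorted(range(n), key=lambda i: wealth[i])
    # Fractional rank of the individual at sorted position pos is (pos + 0.5) / n,
    # and these ranks have mean 0.5.
    cov = sum(
        (health[i] - mean_h) * ((pos + 0.5) / n - 0.5)
        for pos, i in enumerate(order)
    ) / n
    return 2.0 * cov / mean_h
```

Survey analyses normally apply sampling weights as well, which this sketch leaves out.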
Sakamoto, Torao; Horiuchi, Akira; Nakayama, Yoshiko
2013-01-01
BACKGROUND: Endoscopic evaluation of swallowing (EES) is not commonly used by gastroenterologists to evaluate swallowing in patients with dysphagia. OBJECTIVE: To use transnasal endoscopy to identify factors predicting successful or failed swallowing of pureed foods in elderly patients with dysphagia. METHODS: EES of pureed foods was performed by a gastroenterologist using a small-calibre transnasal endoscope. Factors related to successful versus unsuccessful swallowing of pureed foods were analyzed with regard to age, comorbid diseases, swallowing activity, saliva pooling, vallecular residues, pharyngeal residues and airway penetration/aspiration. Unsuccessful swallowing was defined in patients who could not eat pureed foods at bedside during hospitalization. Logistic regression analysis was used to identify independent predictors of swallowing of pureed foods. RESULTS: During a six-year period, 458 consecutive patients (mean age 80 years [range 39 to 97 years]) were considered for the study, including 285 (62%) men. Saliva pooling, vallecular residues, pharyngeal residues and penetration/aspiration were found in 240 (52%), 73 (16%), 226 (49%) and 232 patients (51%), respectively. Overall, 247 patients (54%) failed to swallow pureed foods. Multivariate logistic regression analysis demonstrated that the presence of pharyngeal residues (OR 6.0) and saliva pooling (OR 4.6) occurred significantly more frequently in patients who failed to swallow pureed foods. CONCLUSIONS: Pharyngeal residues and saliva pooling predicted impaired swallowing of pureed foods. Transnasal EES performed by a gastroenterologist provided a unique bedside method of assessing the ability to swallow pureed foods in elderly patients with dysphagia. PMID:23936875
Milne, Elizabeth; Greenop, Kathryn R.; Metayer, Catherine; Schüz, Joachim; Petridou, Eleni; Pombo-de-Oliveira, Maria S.; Infante-Rivard, Claire; Roman, Eve; Dockerty, John D.; Spector, Logan G.; Koifman, Sérgio; Orsi, Laurent; Rudant, Jérémie; Dessypris, Nick; Simpson, Jill; Lightfoot, Tracy; Kaatsch, Peter; Baka, Margarita; Faro, Alessandra; Armstrong, Bruce K.; Clavel, Jacqueline; Buffler, Patricia A.
2013-01-01
Positive associations have been reported between measures of accelerated fetal growth and risk of childhood acute lymphoblastic leukemia (ALL). We investigated this association by pooling individual-level data from 12 case-control studies participating in the Childhood Leukemia International Consortium. Two measures of fetal growth – weight-for-gestational-age and proportion of optimal birth weight (POBW) – were analysed. Study-specific odds ratios (ORs) and 95% confidence intervals (CIs) were estimated using multivariable logistic regression, and combined in fixed effects meta-analyses. Pooled analyses of all data were also undertaken using multivariable logistic regression. Subgroup analyses were undertaken when possible. Data on weight for gestational age were available for 7,348 cases and 12,489 controls from all 12 studies and POBW data were available for 1,680 cases and 3,139 controls from three studies. The summary ORs from the meta-analyses were 1.24 (95% CI 1.13, 1.36) for children who were large for gestational age relative to appropriate for gestational age, and 1.16 (95% CI: 1.09, 1.24) for a one standard deviation increase in POBW. The pooled analyses produced similar results. The summary and pooled ORs for small-for-gestational-age children were 0.83 (95% CI: 0.75, 0.92) and 0.86 (95% CI 0.77, 0.95) respectively. Results were consistent across subgroups defined by sex, ethnicity and immunophenotype, and when the analysis was restricted to children who did not have high birth weight. The evidence that accelerated fetal growth is associated with a modest increased risk of childhood ALL is strong and consistent with known biological mechanisms involving insulin like growth factors. PMID:23754574
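The fixed effects meta-analysis step described above, in which study-specific odds ratios are combined into a summary OR, can be illustrated with inverse-variance weighting on the log scale. This is a minimal Python sketch under the assumption that each study reports an OR with a symmetric (on the log scale) 95% confidence interval. The function name is hypothetical and any inputs are the caller's own, not values from the consortium.

```python
import math
from typing import List, Tuple

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def fixed_effect_pooled_or(
    estimates: List[Tuple[float, float, float]]
) -> Tuple[float, float, float]:
    """Combine (OR, lower, upper) triples by inverse-variance weighting.

    Returns the summary OR with its 95% confidence limits.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, lower, upper in estimates:
        # Recover the standard error of log(OR) from the width of the 95% CI.
        se = (math.log(upper) - math.log(lower)) / (2.0 * Z_95)
        weight = 1.0 / se ** 2
        weighted_sum += weight * math.log(odds_ratio)
        total_weight += weight
    pooled_log_or = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log_or),
        math.exp(pooled_log_or - Z_95 * pooled_se),
        math.exp(pooled_log_or + Z_95 * pooled_se),
    )
```

A fixed-effect summary assumes a single common effect across studies. When between-study heterogeneity is a concern, a random-effects model is the usual alternative.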
Kumar, Abhishek; Kumari, Divya; Singh, Aditya
2015-10-01
This article examines the trends and patterns in socioeconomic inequality in stunting, underweight and wasting among children aged <3 years in urban India over a 14-year period. We use three successive rounds of the National Family Health Survey data conducted during 1992-93, 1998-99 and 2005-06. The selected socioeconomic predictors are household wealth and mother's education level. We use principal component analysis to compute a separate wealth index for urban India for all three rounds of the survey. We have used descriptive statistics, concentration index and pooled logistic regression to analyse the data. The results show that between 1992-93 and 2005-06, the prevalence of childhood undernutrition has declined across household wealth quintiles and educational level of mothers. However, the pace of decline is much higher among the better-off socioeconomic groups than among the least-affluent groups. The result of pooled logistic regression analysis shows that the socioeconomic inequality in childhood undernutrition in urban India has increased over the study period. The salient findings of this study call for separate programmes targeting the children of lower socioeconomic groups in the urban population of India. Published by Oxford University Press in association with The London School of Hygiene and Tropical Medicine © The Author 2014; all rights reserved.
Broderick, Joseph P.; Berkhemer, Olvert A.; Palesch, Yuko Y.; Dippel, Diederik W.J.; Foster, Lydia D.; Roos, Yvo B.W.E.M.; van der Lugt, Aad; Tomsick, Thomas A.; Majoie, Charles B.L.M.; van Zwam, Wim H.; Demchuk, Andrew M.; van Oostenbrugge, Robert J.; Khatri, Pooja; Lingsma, Hester F.; Hill, Michael D.; Roozenbeek, Bob; Jauch, Edward C.; Jovin, Tudor G.; Yan, Bernard; von Kummer, Rüdiger; Molina, Carlos A.; Goyal, Mayank; Schonewille, Wouter J.; Mazighi, Mikael; Engelter, Stefan T.; Anderson, Craig S.; Spilker, Judith; Carrozzella, Janice; Ryckborst, Karla J.; Janis, L. Scott; Simpson, Kit
2015-01-01
Background and Purpose: We assessed the effect of endovascular treatment in acute ischemic stroke patients with severe neurological deficit (NIHSS ≥20) following a pre-specified analysis plan. Methods: The pooled analysis of the IMS III and MR CLEAN trial included participants with an NIHSS ≥20 prior to intravenous (IV) t-PA treatment (IMS III) or randomization (MR CLEAN) who were treated with IV t-PA ≤ 3 hours of stroke onset. Our hypothesis was that participants with severe stroke randomized to endovascular therapy following IV t-PA would have improved 90-day outcome (distribution of modified Rankin scale [mRS] scores), as compared to those who received IV t-PA alone. Results: Among 342 participants in the pooled analysis (194 from IMS III, 148 from MR CLEAN), an ordinal logistic regression model showed that the endovascular group had superior 90-day outcome compared to the IV t-PA group (adjusted odds ratio [aOR] 1.78; 95% confidence interval [CI] 1.20-2.66). In the logistic regression model of the dichotomous outcome (mRS 0-2, or ‘functional independence’), the endovascular group had superior outcomes (aOR 1.97; 95% CI 1.09-3.56). Functional independence (mRS ≤2) at 90 days was 25% in the endovascular group as compared to 14% in the IV t-PA group. Conclusions: Endovascular therapy following IV t-PA within 3 hours of symptom onset improves functional outcome at 90 days after severe ischemic stroke. PMID:26486865
Assessing the potential for improving S2S forecast skill through multimodel ensembling
NASA Astrophysics Data System (ADS)
Vigaud, N.; Robertson, A. W.; Tippett, M. K.; Wang, L.; Bell, M. J.
2016-12-01
Non-linear logistic regression is well suited to probability forecasting and has been successfully applied in the past to ensemble weather and climate predictions, providing access to the full probability distribution without any Gaussian assumption. However, little work has been done at sub-monthly lead times where relatively small re-forecast ensembles and lengths represent new challenges for which post-processing avenues have yet to be investigated. A promising approach consists in extending the definition of non-linear logistic regression by including the quantile of the forecast distribution as one of the predictors. So-called Extended Logistic Regression (ELR), which enables mutually consistent individual threshold probabilities, is here applied to ECMWF, CFSv2 and CMA re-forecasts from the S2S database in order to produce rainfall probabilities at weekly resolution. The ELR model is trained on seasonally-varying tercile categories computed for lead times of 1 to 4 weeks. It is then tested in a cross-validated manner, i.e. allowing real-time predictability applications, to produce rainfall tercile probabilities from individual weekly hindcasts that are finally combined by equal pooling. Results will be discussed over a broader North American region, where individual and MME forecasts generated out to 4 weeks lead are characterized by good probabilistic reliability but low sharpness, exhibiting systematically more skill in winter than summer.
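The extended logistic regression idea described above can be written compactly. The form below is a generic sketch of ELR as it commonly appears in the forecasting literature, not necessarily the exact specification used in this work. The symbols a, b, x, q and g are illustrative assumptions.

```latex
% Illustrative symbols (assumptions): x is an ensemble-based predictor,
% q is a threshold (quantile) of the predictand, and g is a nondecreasing
% function of q. The coefficients a and b are shared across all thresholds.
\[
  \Pr(Y \le q \mid x) \;=\;
  \frac{\exp\{a + b\,x + g(q)\}}{1 + \exp\{a + b\,x + g(q)\}}
\]
```

Because a and b are shared across thresholds and g is nondecreasing, the fitted probabilities satisfy Pr(Y ≤ q1 | x) ≤ Pr(Y ≤ q2 | x) whenever q1 < q2, which is the sense in which the threshold probabilities are mutually consistent.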
Sensory impairments of the lower limb after stroke: a pooled analysis of individual patient data.
Tyson, Sarah F; Crow, J Lesley; Connell, Louise; Winward, Charlotte; Hillier, Susan
2013-01-01
To obtain more generalizable information on the frequency and factors influencing sensory impairment after stroke and their relationship to mobility and function. A pooled analysis of individual data of stroke survivors (N = 459); mean (SD) age = 67.2 (14.8) years, 54% male, mean (SD) time since stroke = 22.33 (63.1) days, 50% left-sided weakness. Where different measurement tools were used, data were recorded. Descriptive statistics described frequency of sensory impairments, kappa coefficients investigated relationships between sensory modalities, binary logistic regression explored the factors influencing sensory impairments, and linear regression assessed the impact of sensory impairments on activity limitations. Most patients' sensation was intact (55%), and individual sensory modalities were highly associated (κ = 0.60, P < .001). Weakness and neglect influenced sensory impairment (P < .001), but demographics, stroke pathology, and spasticity did not. Sensation influenced independence in activities of daily living, mobility, and balance but less strongly than weakness. Pooled individual data analysis showed sensation of the lower limb is grossly preserved in most stroke survivors but, when present, it affects function. Sensory modalities are highly interrelated; interventions that treat the motor system during functional tasks may be as effective at treating the sensory system as sensory retraining alone.
Ren, Y Y; Zhou, L C; Yang, L; Liu, P Y; Zhao, B W; Liu, H X
2016-09-01
The paper highlights the use of the logistic regression (LR) method in the construction of acceptable statistically significant, robust and predictive models for the classification of chemicals according to their aquatic toxic modes of action. Essentials accounting for a reliable model were all considered carefully. The model predictors were selected by stepwise forward discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated by the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The obtained models are simple and easy to interpret. In general, LR performs much better than LDA and seems to be more attractive for the prediction of the more toxic compounds, i.e. compounds that exhibit excess toxicity versus non-polar narcotic compounds and more reactive compounds versus less reactive compounds. In addition, model fit and regression diagnostics were assessed through the influence plot, which reflects the hat-values, studentized residuals, and Cook's distance statistics of each sample. Overdispersion was also checked for the LR model. The relationships between the descriptors and the aquatic toxic behaviour of compounds are also discussed.
2012-01-01
Background: For complex diseases like cancer, pooled-analysis of individual data represents a powerful tool to investigate the joint contribution of genetic, phenotypic and environmental factors to the development of a disease. Pooled-analysis of epidemiological studies has many advantages over meta-analysis, and preliminary results may be obtained faster and with lower costs than with prospective consortia. Design and methods: Based on our experience with the study design of the Melanocortin-1 receptor (MC1R) gene, SKin cancer and Phenotypic characteristics (M-SKIP) project, we describe the most important steps in planning and conducting a pooled-analysis of genetic epidemiological studies. We then present the statistical analysis plan that we are going to apply, giving particular attention to methods of analysis recently proposed to account for between-study heterogeneity and to explore the joint contribution of genetic, phenotypic and environmental factors in the development of a disease. Within the M-SKIP project, data on 10,959 skin cancer cases and 14,785 controls from 31 international investigators were checked for quality and recoded for standardization. We first proposed to fit the aggregated data with random-effects logistic regression models. However, for the M-SKIP project, a two-stage analysis will be preferred to overcome the problem regarding the availability of different study covariates. The joint contribution of MC1R variants and phenotypic characteristics to skin cancer development will be studied via logic regression modeling. Discussion: Methodological guidelines to correctly design and conduct pooled-analyses are needed to facilitate application of such methods, thus providing a better summary of the actual findings on specific fields. PMID:22862891
Cannabis smoking and lung cancer risk: Pooled analysis in the International Lung Cancer Consortium
Zhang, Li Rita; Morgenstern, Hal; Greenland, Sander; Chang, Shen-Chih; Lazarus, Philip; Teare, M. Dawn; Woll, Penella J.; Orlow, Irene; Cox, Brian; Brhane, Yonathan; Liu, Geoffrey; Hung, Rayjean J.
2014-01-01
To investigate the association between cannabis smoking and lung cancer risk, data on 2,159 lung cancer cases and 2,985 controls were pooled from 6 case-control studies in the US, Canada, UK, and New Zealand within the International Lung Cancer Consortium. Study-specific associations between cannabis smoking and lung cancer were estimated using unconditional logistic regression adjusting for sociodemographic factors, tobacco smoking status and pack-years; odds-ratio estimates were pooled using random effects models. Subgroup analyses were done for sex, histology and tobacco smoking status. The shapes of dose-response associations were examined using restricted cubic spline regression. The overall pooled OR for habitual versus nonhabitual or never users was 0.96 (95% CI: 0.66–1.38). Compared to nonhabitual or never users, the summary OR was 0.88 (95% CI: 0.63–1.24) for individuals who smoked 1 or more joint-equivalents of cannabis per day and 0.94 (95% CI: 0.67–1.32) for those who consumed at least 10 joint-years. For adenocarcinoma cases the ORs were 1.73 (95% CI: 0.75–4.00) and 1.74 (95% CI: 0.85–3.55), respectively. However, no association was found for the squamous cell carcinoma based on small numbers. Weak associations between cannabis smoking and lung cancer were observed in never tobacco smokers. Spline modeling indicated a weak positive monotonic association between cumulative cannabis use and lung cancer, but precision was low at high exposure levels. Results from our pooled analyses provide little evidence for an increased risk of lung cancer among habitual or long-term cannabis smokers, although the possibility of potential adverse effect for heavy consumption cannot be excluded. PMID:24947688
Broderick, Joseph P; Berkhemer, Olvert A; Palesch, Yuko Y; Dippel, Diederik W J; Foster, Lydia D; Roos, Yvo B W E M; van der Lugt, Aad; Tomsick, Thomas A; Majoie, Charles B L M; van Zwam, Wim H; Demchuk, Andrew M; van Oostenbrugge, Robert J; Khatri, Pooja; Lingsma, Hester F; Hill, Michael D; Roozenbeek, Bob; Jauch, Edward C; Jovin, Tudor G; Yan, Bernard; von Kummer, Rüdiger; Molina, Carlos A; Goyal, Mayank; Schonewille, Wouter J; Mazighi, Mikael; Engelter, Stefan T; Anderson, Craig S; Spilker, Judith; Carrozzella, Janice; Ryckborst, Karla J; Janis, L Scott; Simpson, Kit N
2015-12-01
We assessed the effect of endovascular treatment in acute ischemic stroke patients with severe neurological deficit (National Institutes of Health Stroke Scale score, ≥20) after a prespecified analysis plan. The pooled analysis of the Interventional Management of Stroke III (IMS III) and Multicenter Randomized Clinical Trial of Endovascular Therapy for Acute Ischemic Stroke in the Netherlands (MR CLEAN) trials included participants with a National Institutes of Health Stroke Scale score of ≥20 before intravenous tissue-type plasminogen activator (tPA) treatment (IMS III) or randomization (MR CLEAN) who were treated with intravenous tPA ≤3 hours of stroke onset. Our hypothesis was that participants with severe stroke randomized to endovascular therapy after intravenous tPA would have improved 90-day outcome (distribution of modified Rankin Scale scores), when compared with those who received intravenous tPA alone. Among 342 participants in the pooled analysis (194 from IMS III and 148 from MR CLEAN), an ordinal logistic regression model showed that the endovascular group had superior 90-day outcome compared with the intravenous tPA group (adjusted odds ratio, 1.78; 95% confidence interval, 1.20-2.66). In the logistic regression model of the dichotomous outcome (modified Rankin Scale score, 0-2, or functional independence), the endovascular group had superior outcomes (adjusted odds ratio, 1.97; 95% confidence interval, 1.09-3.56). Functional independence (modified Rankin Scale score, ≤2) at 90 days was 25% in the endovascular group when compared with 14% in the intravenous tPA group. Endovascular therapy after intravenous tPA within 3 hours of symptom onset improves functional outcome at 90 days after severe ischemic stroke. URL: http://www.clinicaltrials.gov. Unique identifier: NCT00359424 (IMS III) and ISRCTN10888758 (MR CLEAN). © 2015 American Heart Association, Inc.
Wassenaar, Catherine A.; Ye, Yuanqing; Cai, Qiuyin; Aldrich, Melinda C.; Knight, Joanne; Spitz, Margaret R.; Wu, Xifeng; Blot, William J.; Tyndale, Rachel F.
2015-01-01
We investigated genetic variation in CYP2A6 in relation to lung cancer risk among African American smokers, a high-risk population. Previously, we found that CYP2A6, a nicotine/nitrosamine metabolism gene, was associated with lung cancer risk in European Americans, but smoking habits, lung cancer risk and CYP2A6 gene variants differ significantly between European and African ancestry populations. Herein, African American ever-smokers, drawn from two independent lung cancer case–control studies, were genotyped for reduced activity CYP2A6 alleles and grouped by predicted metabolic activity. Lung cancer risk in the Southern Community Cohort Study (n = 494) was lower among CYP2A6 reduced versus normal metabolizers, as estimated by multivariate conditional logistic regression [odds ratio (OR) = 0.44; 95% confidence interval (CI) = 0.26–0.73] and by unconditional logistic regression (OR = 0.62; 95% CI = 0.41–0.94). The association was replicated in an independent study from MD Anderson Cancer Center (n = 407) (OR = 0.64; 95% CI = 0.42–0.98), and pooling the studies yielded an OR of 0.64 (95% CI = 0.48–0.86). Exploratory analyses revealed a significant interaction between CYP2A6 genotype and sex on the risk for lung cancer (Southern Community Cohort Study: P = 0.04; MD Anderson: P = 0.03; Pooled studies: P = 0.002) with a CYP2A6 effect in men only. These findings support a contribution of genetic variation in CYP2A6 to lung cancer risk among African American smokers, particularly men, whereby CYP2A6 genotypes associated with reduced metabolic activity confer a lower risk of developing lung cancer. PMID:25416559
Morcos, Peter N; Nueesch, Eveline; Jaminion, Felix; Guerini, Elena; Hsu, Joy C; Bordogna, Walter; Balas, Bogdana; Mercier, Francois
2018-05-10
Alectinib is a selective and potent anaplastic lymphoma kinase (ALK) inhibitor that is active in the central nervous system (CNS). Alectinib demonstrated robust efficacy in a pooled analysis of two single-arm, open-label phase II studies (NP28673, NCT01801111; NP28761, NCT01871805) in crizotinib-resistant ALK-positive non-small-cell lung cancer (NSCLC): median overall survival (OS) 29.1 months (95% confidence interval [CI]: 21.3-39.0) for alectinib 600 mg twice daily (BID). We investigated exposure-response relationships from final pooled phase II OS and safety data to assess alectinib dose selection. A semi-parametric Cox proportional hazards model analyzed relationships between individual median observed steady-state trough concentrations (Ctrough,ss) for combined exposure of alectinib and its major metabolite (M4), baseline covariates (demographics and disease characteristics) and OS. Univariate logistic regression analysis analyzed relationships between Ctrough,ss and incidence of adverse events (AEs: serious and Grade ≥ 3). Overall, 92% of patients (n = 207/225) had Ctrough,ss data and were included in the analysis. No statistically significant relationship was found between Ctrough,ss and OS following alectinib treatment. The only baseline covariates that statistically influenced OS were baseline tumor size and prior crizotinib treatment duration. Larger baseline tumor size and shorter prior crizotinib treatment were both associated with shorter OS. Logistic regression confirmed no significant relationship between Ctrough,ss and AEs. Alectinib 600 mg BID provides systemic exposures at plateau of response for OS while maintaining a well-tolerated safety profile. This analysis confirms alectinib 600 mg BID as the recommended global dose for patients with crizotinib-resistant ALK-positive NSCLC.
Martin, Barbara A.; Saiki, Michael K.; Fong, Darren
2009-01-01
This study was conducted to better understand the habitat requirements and environmental limiting factors of Syncaris pacifica, the California freshwater shrimp. This federally listed endangered species is native to perennial lowland streams in a few watersheds in northern California. Field sampling occurred in Lagunitas and Olema creeks at seasonal intervals from February 2003 to November 2004. Ten glides, five pools, and five riffles served as fixed sampling reaches, with eight glides, four pools, and four riffles located in Lagunitas Creek and the remainder in Olema Creek. A total of 1773 S. pacifica were counted during this study, all of which were captured along vegetated banks in Lagunitas Creek. Syncaris pacifica was most numerous in glides (64%), then in pools (31%), and lastly in riffles (5%). According to logistic regression analysis, S. pacifica was mostly associated with submerged portions of streambank vegetation (especially overhanging vegetation such as ferns and blackberries, emergent vegetation such as sedge and brooklime, and fine roots associated with water hemlock, willow, sedge, and blackberries) along with low water current velocity and a sandy substrate. These seemingly favorable habitat conditions for S. pacifica were present in glides and pools in Lagunitas Creek, but not in Olema Creek. © 2009 The Crustacean Society.
Meng, Ge; Feng, Yan; Nie, Zhiqing; Wu, Xiaomeng; Wei, Hongying; Wu, Shaowei; Yin, Yong; Wang, Yan
2016-04-01
Polybrominated diphenyl ethers (PBDEs), polychlorinated biphenyls (PCBs) and organochlorine pesticides (OCPs) are common persistent organic pollutants (POPs) that may be associated with childhood asthma. The concentrations of PBDEs, PCBs and OCPs were analyzed in pooled serum samples from both asthmatic and non-asthmatic children. The differences in the internal exposure levels between the case and control groups were tested (p value <0.0012). The associations between the internal exposure concentrations of the POPs and childhood asthma were estimated based on the odds ratios (ORs) calculated using logistic regression models. There were significant differences in three PBDEs, 26 PCBs and seven OCPs between the two groups, with significantly higher levels in the cases. The multiple logistic regression models demonstrated that the internal exposure concentrations of a number of the POPs (23 PCBs, p,p'-DDE and α-HCH) were positively associated with childhood asthma. Some synergistic effects were observed when the children were co-exposed to the chemicals. BDE-209 was positively associated with asthma aggravation. This study indicates the potential relationships between the internal exposure concentrations of particular POPs and the development of childhood asthma. Copyright © 2015 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Ariffin, Syaiba Balqish; Midi, Habshah
2014-06-01
This article is concerned with the performance of the logistic ridge regression estimation technique in the presence of multicollinearity and high leverage points. In logistic regression, multicollinearity exists among predictors and in the information matrix. The maximum likelihood estimator suffers a huge setback in the presence of multicollinearity, which causes regression estimates to have unduly large standard errors. To remedy this problem, a logistic ridge regression estimator is put forward. It is evident that the logistic ridge regression estimator outperforms the maximum likelihood approach for handling multicollinearity. The effect of high leverage points on the performance of the logistic ridge regression estimator is then investigated through a real data set and a simulation study. The findings signify that the logistic ridge regression estimator fails to provide better parameter estimates in the presence of both high leverage points and multicollinearity.
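The idea of stabilizing logistic regression under multicollinearity with a ridge penalty can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the exact form of the authors' estimator, so the code uses a generic L2-penalized log-likelihood (intercept unpenalized) fitted by plain gradient descent, and the function name, the toy data with two nearly collinear predictors, and the penalty values are all hypothetical.

```python
import math


def fit_ridge_logistic(rows, labels, lam, n_iter=2000):
    """Minimise mean log-loss + (lam / 2) * sum(slope^2) by gradient descent.

    rows   : list of predictor vectors (intercept column is added internally)
    labels : list of 0/1 outcomes
    lam    : ridge penalty applied to the slopes only; lam = 0 is ordinary ML
    """
    n, p = len(rows), len(rows[0])
    # Step size from an upper bound on the curvature of the objective:
    # 0.25 * mean squared row norm (including the intercept) plus lam.
    mean_sq_norm = sum(1.0 + sum(v * v for v in r) for r in rows) / n
    lr = 1.0 / (0.25 * mean_sq_norm + lam)

    intercept, slopes = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for r, y in zip(rows, labels):
            eta = intercept + sum(b * v for b, v in zip(slopes, r))
            resid = 1.0 / (1.0 + math.exp(-eta)) - y  # fitted prob minus outcome
            g0 += resid / n
            for j in range(p):
                g[j] += resid * r[j] / n
        intercept -= lr * g0
        # The ridge term adds lam * slope to each slope's gradient.
        slopes = [b - lr * (gj + lam * b) for b, gj in zip(slopes, g)]
    return intercept, slopes


# Hypothetical toy data: X2 is X1 plus a tiny perturbation (near collinearity).
X1 = [-2.0, -1.5, -1.0, -0.5, -0.2, 0.2, 0.5, 1.0, 1.5, 2.0]
X2 = [x + (0.05 if i % 2 == 0 else -0.05) for i, x in enumerate(X1)]
ROWS = [[a, b] for a, b in zip(X1, X2)]
LABELS = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

ml_fit = fit_ridge_logistic(ROWS, LABELS, lam=0.0)     # unpenalized
ridge_fit = fit_ridge_logistic(ROWS, LABELS, lam=0.5)  # penalized
```

Comparing `ml_fit` and `ridge_fit` shows how the penalty shrinks the slopes; the sketch does not address high leverage points, which the abstract identifies as the case where the ridge estimator breaks down.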
Vilar-Compte, Mireya; Sandoval-Olascoaga, Sebastian; Bernal-Stuart, Ana; Shimoga, Sandhya; Vargas-Bustamante, Arturo
2015-11-01
The present paper investigated the impact of the 2008 financial crisis on food security in Mexico and how it disproportionally affected vulnerable households. A generalized ordered logistic regression was estimated to assess the impact of the crisis on households' food security status. An ordinary least squares and a quantile regression were estimated to evaluate the effect of the financial crisis on a continuous proxy measure of food security defined as the share of a household's current income devoted to food expenditures. Both analyses were performed using pooled cross-sectional data from the Mexican National Household Income and Expenditure Survey 2008 and 2010. The analytical sample included 29,468 households in 2008 and 27,654 in 2010. The generalized ordered logistic model showed that the financial crisis significantly (P<0·05) decreased the probability of being food secure, mildly or moderately food insecure, compared with being severely food insecure (OR=0·74). A similar but smaller effect was found when comparing severely and moderately food-insecure households with mildly food-insecure and food-secure households (OR=0·81). The ordinary least squares model showed that the crisis significantly (P<0·05) increased the share of total income spent on food (β coefficient of 0·02). The quantile regression confirmed the findings suggested by the generalized ordered logistic model, showing that the effects of the crisis were more profound among poorer households. The results suggest that households that were more vulnerable before the financial crisis saw a worsened effect in terms of food insecurity with the crisis. Findings were consistent with both measures of food security: one based on self-reported experience and the other based on food spending.
Sample size determination for logistic regression on a logit-normal distribution.
Kim, Seongho; Heath, Elisabeth; Heilbrun, Lance
2017-06-01
Although the sample size for simple logistic regression can be readily determined using currently available methods, the sample size calculation for multiple logistic regression requires some additional information, such as the coefficient of determination (R²) of a covariate of interest with other covariates, which is often unavailable in practice. The response variable of logistic regression follows a logit-normal distribution which can be generated from a logistic transformation of a normal distribution. Using this property of logistic regression, we propose new methods of determining the sample size for simple and multiple logistic regressions using a normal transformation of outcome measures. Simulation studies and a motivating example show several advantages of the proposed methods over the existing methods: (i) no need for R² for multiple logistic regression, (ii) available interim or group-sequential designs, and (iii) much smaller required sample size.
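The property this abstract relies on, that a logit-normal variable is the logistic transformation of a normal variable, can be shown with a short sketch. The function names, parameter values, and seed are illustrative assumptions; the sketch does not reproduce the authors' sample size method.

```python
import math
import random


def expit(x):
    """Logistic (inverse-logit) transformation."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Logit transformation, the inverse of expit."""
    return math.log(p / (1.0 - p))


def sample_logit_normal(mu, sigma, n, seed=0):
    """Draw n logit-normal values by transforming N(mu, sigma^2) draws."""
    rng = random.Random(seed)
    return [expit(rng.gauss(mu, sigma)) for _ in range(n)]


# Hypothetical parameters: the draws lie in (0, 1), and applying the logit
# maps them back to the normal scale, where standard tools apply.
draws = sample_logit_normal(mu=-0.5, sigma=1.0, n=5000)
back_on_normal_scale = [logit(p) for p in draws]
mean_on_normal_scale = sum(back_on_normal_scale) / len(back_on_normal_scale)
```

Working on the transformed scale is what lets normal-theory calculations be applied to outcomes that live on the probability scale.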
Dimitrov, Borislav D; Motterlini, Nicola; Fahey, Tom
2015-01-01
Objective: Estimating calibration performance of clinical prediction rules (CPRs) in systematic reviews of validation studies is not possible when predicted values are neither published nor accessible or sufficient or no individual participant or patient data are available. Our aims were to describe a simplified approach for outcomes prediction and calibration assessment and evaluate its functionality and validity. Study design and methods: Methodological study of systematic reviews of validation studies of CPRs: a) ABCD2 rule for prediction of 7 day stroke; and b) CRB-65 rule for prediction of 30 day mortality. Predicted outcomes in a sample validation study were computed by CPR distribution patterns (“derivation model”). As confirmation, a logistic regression model (with derivation study coefficients) was applied to CPR-based dummy variables in the validation study. Meta-analysis of validation studies provided pooled estimates of “predicted:observed” risk ratios (RRs), 95% confidence intervals (CIs), and indexes of heterogeneity (I²) on forest plots (fixed and random effects models), with and without adjustment of intercepts. The above approach was also applied to the CRB-65 rule. Results: Our simplified method, applied to the ABCD2 rule in three risk strata (low, 0–3; intermediate, 4–5; high, 6–7 points), indicated that predictions are identical to those computed by a univariate, CPR-based logistic regression model. Discrimination was good (c-statistics =0.61–0.82); however, calibration in some studies was low. In such cases with miscalibration, the under-prediction (RRs =0.73–0.91, 95% CIs 0.41–1.48) could be further corrected by intercept adjustment to account for incidence differences. An improvement of both heterogeneities and P-values (Hosmer-Lemeshow goodness-of-fit test) was observed. Better calibration and improved pooled RRs (0.90–1.06), with narrower 95% CIs (0.57–1.41) were achieved.
Conclusion: Our results have an immediate clinical implication in situations when predicted outcomes in CPR validation studies are lacking or deficient by describing how such predictions can be obtained by everyone using the derivation study alone, without any need for highly specialized knowledge or sophisticated statistics. PMID:25931829
Staley, James R; Jones, Edmund; Kaptoge, Stephen; Butterworth, Adam S; Sweeting, Michael J; Wood, Angela M; Howson, Joanna M M
2017-06-01
Logistic regression is often used instead of Cox regression to analyse genome-wide association studies (GWAS) of single-nucleotide polymorphisms (SNPs) and disease outcomes with cohort and case-cohort designs, as it is less computationally expensive. Although Cox and logistic regression models have been compared previously in cohort studies, this work does not completely cover the GWAS setting nor extend to the case-cohort study design. Here, we evaluated Cox and logistic regression applied to cohort and case-cohort genetic association studies using simulated data and genetic data from the EPIC-CVD study. In the cohort setting, there was a modest improvement in power to detect SNP-disease associations using Cox regression compared with logistic regression, which increased as the disease incidence increased. In contrast, logistic regression had more power than (Prentice weighted) Cox regression in the case-cohort setting. Logistic regression yielded inflated effect estimates (assuming the hazard ratio is the underlying measure of association) for both study designs, especially for SNPs with greater effect on disease. Given logistic regression is substantially more computationally efficient than Cox regression in both settings, we propose a two-step approach to GWAS in cohort and case-cohort studies. First to analyse all SNPs with logistic regression to identify associated variants below a pre-defined P-value threshold, and second to fit Cox regression (appropriately weighted in case-cohort studies) to those identified SNPs to ensure accurate estimation of association with disease.
Gender equality as a means to improve maternal and child health in Africa.
Singh, Kavita; Bloom, Shelah; Brodish, Paul
2015-01-01
In this article we examine whether measures of gender equality, household decision making, and attitudes toward gender-based violence are associated with maternal and child health outcomes in Africa. We pooled Demographic and Health Surveys data from eight African countries and used multilevel logistic regression on two maternal health outcomes (low body mass index and facility delivery) and two child health outcomes (immunization status and treatment for an acute respiratory infection). We found protective associations between the gender equality measures and the outcomes studied, indicating that gender equality is a potential strategy to improve maternal and child health in Africa.
Gender Equality as a Means to Improve Maternal and Child Health in Africa
Singh, Kavita; Bloom, Shelah; Brodish, Paul
2015-01-01
In this paper we examine whether measures of gender equality, household decision-making and attitudes toward gender-based violence are associated with maternal and child health outcomes in Africa. We pooled Demographic and Health Surveys (DHS) data from eight African countries and used multilevel logistic regression on two maternal health outcomes (low body mass index and facility delivery) and two child health outcomes (immunization status and treatment for an acute respiratory infection). We found protective associations between the gender equality measures and the outcomes studied, indicating that gender equality is a potential strategy to improve maternal and child health in Africa. PMID:24028632
The crux of the method: assumptions in ordinary least squares and logistic regression.
Long, Rebecca G
2008-10-01
Logistic regression has increasingly become the tool of choice when analyzing data with a binary dependent variable. While resources relating to the technique are widely available, clear discussions of why logistic regression should be used in place of ordinary least squares regression are difficult to find. The current paper compares and contrasts the assumptions of ordinary least squares with those of logistic regression and explains why logistic regression's looser assumptions make it adept at handling violations of the more important assumptions in ordinary least squares.
Using Dominance Analysis to Determine Predictor Importance in Logistic Regression
ERIC Educational Resources Information Center
Azen, Razia; Traxel, Nicole
2009-01-01
This article proposes an extension of dominance analysis that allows researchers to determine the relative importance of predictors in logistic regression models. Criteria for choosing logistic regression R[superscript 2] analogues were determined and measures were selected that can be used to perform dominance analysis in logistic regression. A…
Steffensen, Charlotte; Pereira, Alberto M; Dekkers, Olaf M; Jørgensen, Jens Otto L
2016-12-01
Type 2 diabetes (T2D) and Cushing's syndrome (CS) share clinical characteristics, and several small studies have recorded a high prevalence of hypercortisolism in T2D, which could have therapeutic implications. We aimed to assess the prevalence of endogenous hypercortisolism in T2D patients. Systematic review and meta-analysis of the literature. A search was performed in SCOPUS, MEDLINE, and EMBASE for original articles assessing the prevalence of endogenous hypercortisolism and CS in T2D. Data were pooled in a random-effect logistic regression model and reported with 95% confidence intervals (95% CI). Fourteen articles were included, with a total of 2827 T2D patients. The pooled prevalence of hypercortisolism and CS was 3.4% (95% CI: 1.5-5.9) and 1.4% (95% CI: 0.4-2.9) respectively. The prevalence did not differ between studies of unselected patients and patients selected based on the presence of metabolic features such as obesity or poor glycemic control (P = 0.41 from meta-regression). Imaging in patients with hypercortisolism (n = 102) revealed adrenal tumors and pituitary tumors in 52 and 14% respectively. Endogenous hypercortisolism is a relatively frequent finding in T2D, which may have therapeutic implications. © 2016 European Society of Endocrinology.
Association between adult height, genetic susceptibility and risk of glioma
Kitahara, Cari M; Wang, Sophia S; Melin, Beatrice S; Wang, Zhaoming; Braganza, Melissa; Inskip, Peter D; Albanes, Demetrius; Andersson, Ulrika; Beane Freeman, Laura E; Buring, Julie E; Carreón, Tania; Feychting, Maria; Gapstur, Susan M; Gaziano, J Michael; Giles, Graham G; Hallmans, Goran; Hankinson, Susan E; Henriksson, Roger; Hsing, Ann W; Johansen, Christoffer; Linet, Martha S; McKean-Cowdin, Roberta; Michaud, Dominique S; Peters, Ulrike; Purdue, Mark P; Rothman, Nathaniel; Ruder, Avima M; Sesso, Howard D; Severi, Gianluca; Shu, Xiao-Ou; Stevens, Victoria L; Visvanathan, Kala; Waters, Martha A; White, Emily; Wolk, Alicja; Zeleniuch-Jacquotte, Anne; Zheng, Wei; Hoover, Robert; Fraumeni, Joseph F; Chatterjee, Nilanjan; Yeager, Meredith; Chanock, Stephen J; Hartge, Patricia; Rajaraman, Preetha
2012-01-01
Background: Some, but not all, observational studies have suggested that taller stature is associated with a significant increased risk of glioma. In a pooled analysis of observational studies, we investigated the strength and consistency of this association, overall and for major sub-types, and investigated effect modification by genetic susceptibility to the disease. Methods: We standardized and combined individual-level data on 1354 cases and 4734 control subjects from 13 prospective and 2 case–control studies. Pooled odds ratios (ORs) and 95% confidence intervals (CIs) for glioma and glioma sub-types were estimated using logistic regression models stratified by sex and adjusted for birth cohort and study. Pooled ORs were additionally estimated after stratifying the models according to seven recently identified glioma-related genetic variants. Results: Among men, we found a positive association between height and glioma risk (≥190 vs 170–174 cm, pooled OR = 1.70, 95% CI: 1.11–2.61; P-trend = 0.01), which was slightly stronger after restricting to cases with glioblastoma (pooled OR = 1.99, 95% CI: 1.17–3.38; P-trend = 0.02). Among women, these associations were less clear (≥175 vs 160–164 cm, pooled OR for glioma = 1.06, 95% CI: 0.70–1.62; P-trend = 0.22; pooled OR for glioblastoma = 1.36, 95% CI: 0.77–2.39; P-trend = 0.04). In general, we did not observe evidence of effect modification by glioma-related genotypes on the association between height and glioma risk. Conclusion: An association of taller adult stature with glioma, particularly for men and stronger for glioblastoma, should be investigated further to clarify the role of environmental and genetic determinants of height in the etiology of this disease. PMID:22933650
Association between adult height, genetic susceptibility and risk of glioma.
Kitahara, Cari M; Wang, Sophia S; Melin, Beatrice S; Wang, Zhaoming; Braganza, Melissa; Inskip, Peter D; Albanes, Demetrius; Andersson, Ulrika; Beane Freeman, Laura E; Buring, Julie E; Carreón, Tania; Feychting, Maria; Gapstur, Susan M; Gaziano, J Michael; Giles, Graham G; Hallmans, Goran; Hankinson, Susan E; Henriksson, Roger; Hsing, Ann W; Johansen, Christoffer; Linet, Martha S; McKean-Cowdin, Roberta; Michaud, Dominique S; Peters, Ulrike; Purdue, Mark P; Rothman, Nathaniel; Ruder, Avima M; Sesso, Howard D; Severi, Gianluca; Shu, Xiao-Ou; Stevens, Victoria L; Visvanathan, Kala; Waters, Martha A; White, Emily; Wolk, Alicja; Zeleniuch-Jacquotte, Anne; Zheng, Wei; Hoover, Robert; Fraumeni, Joseph F; Chatterjee, Nilanjan; Yeager, Meredith; Chanock, Stephen J; Hartge, Patricia; Rajaraman, Preetha
2012-08-01
Some, but not all, observational studies have suggested that taller stature is associated with a significant increased risk of glioma. In a pooled analysis of observational studies, we investigated the strength and consistency of this association, overall and for major sub-types, and investigated effect modification by genetic susceptibility to the disease. We standardized and combined individual-level data on 1354 cases and 4734 control subjects from 13 prospective and 2 case-control studies. Pooled odds ratios (ORs) and 95% confidence intervals (CIs) for glioma and glioma sub-types were estimated using logistic regression models stratified by sex and adjusted for birth cohort and study. Pooled ORs were additionally estimated after stratifying the models according to seven recently identified glioma-related genetic variants. Among men, we found a positive association between height and glioma risk (≥ 190 vs 170-174 cm, pooled OR = 1.70, 95% CI: 1.11-2.61; P-trend = 0.01), which was slightly stronger after restricting to cases with glioblastoma (pooled OR = 1.99, 95% CI: 1.17-3.38; P-trend = 0.02). Among women, these associations were less clear (≥ 175 vs 160-164 cm, pooled OR for glioma = 1.06, 95% CI: 0.70-1.62; P-trend = 0.22; pooled OR for glioblastoma = 1.36, 95% CI: 0.77-2.39; P-trend = 0.04). In general, we did not observe evidence of effect modification by glioma-related genotypes on the association between height and glioma risk. An association of taller adult stature with glioma, particularly for men and stronger for glioblastoma, should be investigated further to clarify the role of environmental and genetic determinants of height in the etiology of this disease.
Çelik, Serdar; Turgut, Niyazi Emre; Cengiz Çelik, Dilek; Boynukalın, Kübra; Abalı, Remzi; Purisa, Sevim; Yağmur, Erbil; Bahçeci, Mustafa
2018-03-01
Pooling is an alternative method to achieve in vitro fertilization outcomes. This study aimed to investigate the effect of the pooling method on pregnancy outcomes in poor responder patients according to Bologna criteria. Two hundred fifty-five poor responder patients were enrolled in this study. The pooling embryo transfer (ET) group had 110 and the fresh ET group had 145 patients. Although age was similar between both treatment groups (p=0.31), antral follicle count (p<0.001), total number of retrieved oocytes (p<0.001), total metaphase II oocyte count (p<0.001), and number of stimulation cycles (p<0.001) were significantly different between the groups. The day of ET was similar between the two groups (p=0.72) but the number of ET procedures was significantly higher in the pooling ET group compared to fresh ET (p<0.001). Positive pregnancy test [35/110 (32%) vs 53/145 (37%)] (p=0.43) and clinical pregnancy rates [31/110 (28%) vs 49/145 (34%)] (p=0.33) were similar between groups, whereas implantation [31/191 (16%) vs 49/198 (25%)] (p=0.03) and live birth rates [15/110 (14%) vs 36/145 (25%)] (p=0.04) were significantly higher in the fresh ET group. Despite that, abortion rates were significantly higher in the pooling ET group [16/31 (52%) vs 13/49 (27%)] (p=0.04). Binary logistic regression analysis revealed no effect of variables on live birth rates. Even though the pooling strategy seems to have a slight positive effect on pregnancy outcomes, there is no beneficial effect on live birth rates. Furthermore, this strategy is increasing the abortion rates in parallel with clinical pregnancy rates.
Çelik, Serdar; Turgut, Niyazi Emre; Cengiz Çelik, Dilek; Boynukalın, Kübra; Abalı, Remzi; Purisa, Sevim; Yağmur, Erbil; Bahçeci, Mustafa
2018-01-01
Objective: Pooling is an alternative method to achieve in vitro fertilization outcomes. This study aimed to investigate the effect of the pooling method on pregnancy outcomes in poor responder patients according to Bologna criteria. Materials and Methods: Two hundred fifty-five poor responder patients were enrolled in this study. The pooling embryo transfer (ET) group had 110 and the fresh ET group had 145 patients. Results: Although age was similar between both treatment groups (p=0.31), antral follicle count (p<0.001), total number of retrieved oocytes (p<0.001), total metaphase II oocyte count (p<0.001), and number of stimulation cycles (p<0.001) were significantly different between the groups. The day of ET was similar between the two groups (p=0.72) but the number of ET procedures was significantly higher in the pooling ET group compared to fresh ET (p<0.001). Positive pregnancy test [35/110 (32%) vs 53/145 (37%)] (p=0.43) and clinical pregnancy rates [31/110 (28%) vs 49/145 (34%)] (p=0.33) were similar between groups, whereas implantation [31/191 (16%) vs 49/198 (25%)] (p=0.03) and live birth rates [15/110 (14%) vs 36/145 (25%)] (p=0.04) were significantly higher in the fresh ET group. Despite that, abortion rates were significantly higher in the pooling ET group [16/31 (52%) vs 13/49 (27%)] (p=0.04). Binary logistic regression analysis revealed no effect of variables on live birth rates. Conclusion: Even though the pooling strategy seems to have a slight positive effect on pregnancy outcomes, there is no beneficial effect on live birth rates. Furthermore, this strategy is increasing the abortion rates in parallel with clinical pregnancy rates. PMID:29662715
Health risks of early swimming pool attendance.
Schoefer, Yvonne; Zutavern, Anne; Brockow, Inken; Schäfer, Torsten; Krämer, Ursula; Schaaf, Beate; Herbarth, Olf; von Berg, Andrea; Wichmann, H-Erich; Heinrich, Joachim
2008-07-01
Swimming pool attendance and exposure to chlorination by-products showed adverse health effects on children. We assessed whether early swimming pool attendance, especially baby swimming, is related to higher rates of early infections and to the development of allergic diseases. In 2003-2005, 2192 children were analysed for the 6-year follow-up of a prospective birth cohort study. Data on early swimming pool attendance, other lifestyle factors and medical history were collected by parental-administered questionnaire. Bivariate and multivariate logistic regression analyses were used to evaluate associations. Babies who did not participate in baby swimming had lower rates of infection in the 1st year of life (i) diarrhoea: OR 0.68 CI 95% 0.54-0.85; (ii) otitis media: OR 0.81 CI 95% 0.62-1.05; (iii) airway infections: OR 0.85 CI 95% 0.67-1.09. No clear association could be found between late or non-swimmers and atopic dermatitis or hay fever until the age of 6 years, while higher rates of asthma were found (OR 2.15 95% CI 1.16-3.99), however, potentially due to reverse causation. The study indicates that, in terms of infections, baby swimming might not be as harmless as commonly thought. Further evidence is needed to make conclusions if the current regulations on chlorine in Germany might not protect swimming pool attendees from an increased risk of gastrointestinal infections. In terms of developing atopic diseases there is no verifiable detrimental effect of early swimming.
Applying Kaplan-Meier to Item Response Data
ERIC Educational Resources Information Center
McNeish, Daniel
2018-01-01
Some IRT models can be equivalently modeled in alternative frameworks such as logistic regression. Logistic regression can also model time-to-event data, which concerns the probability of an event occurring over time. Using the relation between time-to-event models and logistic regression and the relation between logistic regression and IRT, this…
Mulcahey, M J; Merenda, Lisa; Tian, Feng; Kozin, Scott; James, Michelle; Gogola, Gloria; Ni, Pengsheng
2013-01-01
This study examined the psychometric properties of item pools relevant to upper-extremity function and activity performance and evaluated simulated 5-, 10-, and 15-item computer adaptive tests (CATs). In a multicenter, cross-sectional study of 200 children and youth with brachial plexus birth palsy (BPBP), parents responded to upper-extremity (n = 52) and activity (n = 34) items using a 5-point response scale. We used confirmatory and exploratory factor analysis, ordinal logistic regression, item maps, and standard errors to evaluate the psychometric properties of the item banks. Validity was evaluated using analysis of variance and Pearson correlation coefficients. Results show that the two item pools have acceptable model fit, scaled well for children and youth with BPBP, and had good validity, content range, and precision. Simulated CATs performed comparably to the full item banks, suggesting that a reduced number of items provide similar information to the entire set of items. Copyright © 2013 by the American Occupational Therapy Association, Inc.
NASA Astrophysics Data System (ADS)
Lin, Yingzhi; Deng, Xiangzheng; Li, Xing; Ma, Enjun
2014-12-01
Spatially explicit simulation of land use change is the basis for estimating the effects of land use and cover change on energy fluxes, ecology and the environment. At the pixel level, logistic regression is one of the most common approaches used in spatially explicit land use allocation models to determine the relationship between land use and its causal factors in driving land use change, and thereby to evaluate land use suitability. However, these models have a drawback in that they do not determine/allocate land use based on the direct relationship between land use change and its driving factors. Consequently, a multinomial logistic regression method was introduced to address this flaw, and thereby, judge the suitability of a type of land use in any given pixel in a case study area of the Jiangxi Province, China. A comparison of the two regression methods indicated that the proportion of correctly allocated pixels using multinomial logistic regression was 92.98%, which was 8.47% higher than that obtained using logistic regression. Paired t-test results also showed that pixels were more clearly distinguished by multinomial logistic regression than by logistic regression. In conclusion, multinomial logistic regression is a more efficient and accurate method for the spatial allocation of land use changes. The application of this method in future land use change studies may improve the accuracy of predicting the effects of land use and cover change on energy fluxes, ecology, and environment.
Harford, Thomas C.; Yi, Hsiao-ye; Freeman, Robert C.
2015-01-01
This study examined associations between binge drinking and other substance use and perpetration of violence against self and others. Data were pooled from the 2003, 2005, and was constructed to reflect four categories of behaviors: other-directed violence only, self-directed violence only, combined other- and self-directed violence, and no violence. Results from multinomial logistic regressions show that the frequency of binge drinking and other substance use were significant risk factors for each of the violence categories relative to no-violence. However, the strengths of these associations varied across the violence categories. PMID:26478688
Boyle, Terry; Fritschi, Lin; Kobayashi, Lindsay C; Heyworth, Jane S; Lee, Derrick G; Si, Si; Aronson, Kristan J; Spinelli, John J
2016-11-01
There is limited research on the association between sedentary behaviour and breast cancer risk, particularly whether sedentary behaviour is differentially associated with premenopausal and postmenopausal breast cancer. We pooled data from 2 case-control studies from Australia and Canada to investigate this association. This pooled analysis included 1762 incident breast cancer cases and 2532 controls. Participants in both studies completed a lifetime occupational history and self-rated occupational physical activity level. A job-exposure matrix (JEM) was also applied to job titles to assess sedentary work. Logistic regression analyses (6 pooled and 12 study-specific) were conducted to estimate associations between both self-reported and JEM-assessed sedentary work and breast cancer risk among premenopausal and postmenopausal women. No association was observed in the 6 pooled analyses, and 10 of the study-specific analyses also showed null results. 2 study-specific analyses provided inconsistent and contradictory results, with 1 showing statistically significant increased risk of breast cancer for self-reported sedentary work among premenopausal women in the Canadian study, and the other a non-significant inverse association between JEM-assessed sedentary work and breast cancer risk among postmenopausal women in the Australian study. While a suggestion of increased risk was seen for premenopausal women in the Canadian study when using the self-reported measure, overall this pooled study does not provide evidence that sedentary work is associated with breast cancer risk. Published by the BMJ Publishing Group Limited.
Standards for Standardized Logistic Regression Coefficients
ERIC Educational Resources Information Center
Menard, Scott
2011-01-01
Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…
An Analysis of the Number of Medical Malpractice Claims and Their Amounts
Bonetti, Marco; Cirillo, Pasquale; Musile Tanzi, Paola; Trinchero, Elisabetta
2016-01-01
Starting from an extensive database, pooling 9 years of data from the top three insurance brokers in Italy, and containing 38125 reported claims due to alleged cases of medical malpractice, we use an inhomogeneous Poisson process to model the number of medical malpractice claims in Italy. The intensity of the process is allowed to vary over time, and it depends on a set of covariates, like the size of the hospital, the medical department and the complexity of the medical operations performed. We choose the combination medical department by hospital as the unit of analysis. Together with the number of claims, we also model the associated amounts paid by insurance companies, using a two-stage regression model. In particular, we use logistic regression for the probability that a claim is closed with a zero payment, whereas, conditionally on the fact that an amount is strictly positive, we make use of lognormal regression to model it as a function of several covariates. The model produces estimates and forecasts that are relevant to both insurance companies and hospitals, for quality assurance, service improvement and cost reduction. PMID:27077661
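The two-stage structure described in this abstract (a logistic model for the probability of a zero payment, then a lognormal model for strictly positive amounts) can be illustrated with a minimal sketch. The abstract does not say how the two stages are combined, so the expected-payment formula below is one standard way to do it and is an assumption here; the function names and all numeric inputs are hypothetical and are not taken from the paper.

```python
import math


def logistic(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def expected_payment(eta_zero, mu, sigma):
    """Expected claim amount under a two-stage model (illustrative only).

    eta_zero: linear predictor of the logistic stage, P(payment == 0)
    mu, sigma: mean and standard deviation of log(amount) for positive payments
    """
    p_zero = logistic(eta_zero)
    # Mean of a lognormal distribution is exp(mu + sigma^2 / 2).
    mean_positive = math.exp(mu + sigma ** 2 / 2.0)
    return (1.0 - p_zero) * mean_positive


# Hypothetical inputs: 50% chance of a zero payment, log-amount ~ N(10, 1).
print(expected_payment(eta_zero=0.0, mu=10.0, sigma=1.0))
```

In a real analysis `eta_zero` and `mu` would each be a function of the covariates (hospital size, department, complexity), fitted separately for the two stages.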
Schörgendorfer, Angela; Branscum, Adam J; Hanson, Timothy E
2013-06-01
Logistic regression is a popular tool for risk analysis in medical and population health science. With continuous response data, it is common to create a dichotomous outcome for logistic regression analysis by specifying a threshold for positivity. Fitting a linear regression to the nondichotomized response variable assuming a logistic sampling model for the data has been empirically shown to yield more efficient estimates of odds ratios than ordinary logistic regression of the dichotomized endpoint. We illustrate that risk inference is not robust to departures from the parametric logistic distribution. Moreover, the model assumption of proportional odds is generally not satisfied when the condition of a logistic distribution for the data is violated, leading to biased inference from a parametric logistic analysis. We develop novel Bayesian semiparametric methodology for testing goodness of fit of parametric logistic regression with continuous measurement data. The testing procedures hold for any cutoff threshold and our approach simultaneously provides the ability to perform semiparametric risk estimation. Bayes factors are calculated using the Savage-Dickey ratio for testing the null hypothesis of logistic regression versus a semiparametric generalization. We propose a fully Bayesian and a computationally efficient empirical Bayesian approach to testing, and we present methods for semiparametric estimation of risks, relative risks, and odds ratios when parametric logistic regression fails. Theoretical results establish the consistency of the empirical Bayes test. Results from simulated data show that the proposed approach provides accurate inference irrespective of whether parametric assumptions hold or not. Evaluation of risk factors for obesity shows that different inferences are derived from an analysis of a real data set when deviations from a logistic distribution are permissible in a flexible semiparametric framework. 
© 2013, The International Biometric Society.
Gemzell-Danielsson, K; Kardos, L; von Hertzen, H
2015-12-01
A pooled analysis of two randomized controlled trials (RCTs) suggested that increased bodyweight and body mass index (BMI) may be associated with a greater probability of pregnancy. To address this issue we investigated whether higher bodyweight and/or BMI negatively impacted the risk of pregnancy in women receiving LNG-EC (levonorgestrel - emergency contraception) after unprotected sexual intercourse in a pooled analysis of three large multinational RCTs conducted by the World Health Organization (WHO). A pooled analysis of three double-blind, multinational RCTs conducted by the WHO to investigate the efficacy of LNG-EC in the general population. All analyses were done on the per-protocol set (PPS) which included 5812 women who received LNG-EC within 72 hours following unprotected sexual intercourse. The analysis was based on logistic regression, with pregnancy as the outcome. BMI and weight were represented in the same model. A total of 56 pregnancies were available for analysis in the PPS. Increasing bodyweight and BMI were not correlated with an increased risk of pregnancy in the studied population. A limitation of this study is that despite the large study population in the pooled analysis there were relatively small numbers of women in the high-BMI and high-bodyweight subgroups. LNG-EC is effective for preventing pregnancy after unprotected intercourse or contraceptive failure and no evidence was found to support the hypothesis of a loss of EC efficacy in subjects with high BMI or bodyweight. Therefore, access to LNG-EC should not be limited only to women of lower bodyweight or BMI.
Clayborne, Zahra M; Colman, Ian
2018-01-01
The primary objective of this study was to examine associations between depression and several measures of health behaviour change across 8 cycles of a population-based, cross-sectional survey of Canadians. The secondary objective of this study was to describe the prevalence of the types of health behaviour changes undergone/sought and types of barriers to change reported, comparing those with and without depression. The sample comprised 65,801 respondents to the Canadian Community Health Survey between 2007 and 2014. Past-year depression was assessed via structured interview (CIDI-SF). Measures of health behaviour change included recent changes made, desire to make changes, and barriers towards making changes. Analyses involved logistic regression, with estimates across cycles pooled using fixed-effects meta-analyses. Pooled prevalences of types of health behaviour changes undergone/sought and types of barriers to change experienced were reported, and associations with depression were examined. Depression was associated with higher odds of reporting a recent health behaviour change (pooled odds ratio [OR] = 1.39; 95% confidence interval [CI], 1.30 to 1.48), desire to make health behaviour changes (pooled OR = 1.61; 95% CI, 1.49 to 1.74), and barriers towards change (pooled OR = 1.54; 95% CI, 1.44 to 1.65). The most common change undergone and sought was increased exercise; the most common barrier reported was a lack of willpower. Individuals dealing with depression are more likely to report recent health behaviour changes and the desire to make changes but are also more likely to report barriers towards change.
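As a rough illustration of the fixed-effects pooling step mentioned in this abstract, the sketch below combines odds ratios on the log scale with inverse-variance weights, recovering each standard error from a reported 95% CI. It is a minimal sketch of the generic method, not the authors' code; the function names and the per-cycle inputs are made up for illustration.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR), recovered from a confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def fixed_effect_pool(estimates, z=1.96):
    """Inverse-variance fixed-effect pooling of odds ratios.

    estimates: iterable of (odds_ratio, ci_lower, ci_upper) tuples.
    Returns (pooled_or, pooled_ci_lower, pooled_ci_upper).
    """
    weights, log_ors = [], []
    for odds_ratio, lower, upper in estimates:
        se = se_from_ci(lower, upper, z)
        weights.append(1.0 / se ** 2)
        log_ors.append(math.log(odds_ratio))
    total = sum(weights)
    pooled_log = sum(w * b for w, b in zip(weights, log_ors)) / total
    pooled_se = math.sqrt(1.0 / total)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )


# Hypothetical cycle-specific odds ratios (not values from the study).
cycles = [(1.30, 1.05, 1.61), (1.45, 1.20, 1.75), (1.40, 1.10, 1.78)]
print(fixed_effect_pool(cycles))
```

Pooling on the log scale is what keeps the resulting interval symmetric around log(OR); more precise cycles get proportionally more weight.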
Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson
2010-01-01
Summary Objective Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this Review was to assess machine learning alternatives to logistic regression which may accomplish the same goals but with fewer assumptions or greater accuracy. Study Design and Setting We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. Results We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (CART), and meta-classifiers (in particular, boosting). Conclusion While the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and to a lesser extent decision trees (particularly CART) appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. PMID:20630332
Robust mislabel logistic regression without modeling mislabel probabilities.
Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun
2018-03-01
Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses. Fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into consideration. Another common method is to adopt a robust M-estimation by down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) It does not need to model the mislabel probabilities. (2) The minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.
Fungible weights in logistic regression.
Jones, Jeff A; Waller, Niels G
2016-06-01
In this article we develop methods for assessing parameter sensitivity in logistic regression models. To set the stage for this work, we first review Waller's (2008) equations for computing fungible weights in linear regression. Next, we describe 2 methods for computing fungible weights in logistic regression. To demonstrate the utility of these methods, we compute fungible logistic regression weights using data from the Centers for Disease Control and Prevention's (2010) Youth Risk Behavior Surveillance Survey, and we illustrate how these alternate weights can be used to evaluate parameter sensitivity. To make our work accessible to the research community, we provide R code (R Core Team, 2015) that will generate both kinds of fungible logistic regression weights. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson
2010-08-01
Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Should metacognition be measured by logistic regression?
Rausch, Manuel; Zehetleitner, Michael
2017-03-01
Are logistic regression slopes suitable to quantify metacognitive sensitivity, i.e. the efficiency with which subjective reports differentiate between correct and incorrect task responses? We analytically show that logistic regression slopes are independent from rating criteria in one specific model of metacognition, which assumes (i) that rating decisions are based on sensory evidence generated independently of the sensory evidence used for primary task responses and (ii) that the distributions of evidence are logistic. Given a hierarchical model of metacognition, logistic regression slopes depend on rating criteria. According to all considered models, regression slopes depend on the primary task criterion. A reanalysis of previous data revealed that massive numbers of trials are required to distinguish between hierarchical and independent models with tolerable accuracy. It is argued that researchers who wish to use logistic regression as measure of metacognitive sensitivity need to control the primary task criterion and rating criteria. Copyright © 2017 Elsevier Inc. All rights reserved.
McCullough, Marjorie L; Weinstein, Stephanie J; Freedman, D Michal; Helzlsouer, Kathy; Flanders, W Dana; Koenig, Karen; Kolonel, Laurence; Laden, Francine; Le Marchand, Loic; Purdue, Mark; Snyder, Kirk; Stevens, Victoria L; Stolzenberg-Solomon, Rachael; Virtamo, Jarmo; Yang, Gong; Yu, Kai; Zheng, Wei; Albanes, Demetrius; Ashby, Jason; Bertrand, Kimberly; Cai, Hui; Chen, Yu; Gallicchio, Lisa; Giovannucci, Edward; Jacobs, Eric J; Hankinson, Susan E; Hartge, Patricia; Hartmuller, Virginia; Harvey, Chinonye; Hayes, Richard B; Horst, Ronald L; Shu, Xiao-Ou
2010-07-01
Low vitamin D status is common globally and is associated with multiple disease outcomes. Understanding the correlates of vitamin D status will help guide clinical practice, research, and interpretation of studies. Correlates of circulating 25-hydroxyvitamin D (25(OH)D) concentrations measured in a single laboratory were examined in 4,723 cancer-free men and women from 10 cohorts participating in the Cohort Consortium Vitamin D Pooling Project of Rarer Cancers, which covers a worldwide geographic area. Demographic and lifestyle characteristics were examined in relation to 25(OH)D using stepwise linear regression and polytomous logistic regression. The prevalence of 25(OH)D concentrations less than 25 nmol/L ranged from 3% to 36% across cohorts, and the prevalence of 25(OH)D concentrations less than 50 nmol/L ranged from 29% to 82%. Seasonal differences in circulating 25(OH)D were most marked among whites from northern latitudes. Statistically significant positive correlates of 25(OH)D included male sex, summer blood draw, vigorous physical activity, vitamin D intake, fish intake, multivitamin use, and calcium supplement use. Significant inverse correlates were body mass index, winter and spring blood draw, history of diabetes, sedentary behavior, smoking, and black race/ethnicity. Correlates varied somewhat within season, race/ethnicity, and sex. These findings help identify persons at risk for low vitamin D status for both clinical and research purposes.
Pooling score: an endoscopic model for evaluating severity of dysphagia
Farneti, D
2008-01-01
Summary The finding of secretions and bolus pooling is of great diagnostic interest in the evaluation of subjects with swallowing disorders. Bedside evaluation alone, in subjects at risk for aspiration, can underestimate this parameter. The usefulness of endoscopic investigation for the evaluation of subjects with swallowing disorders is stressed, in order to plan treatment and follow-up. Based on endoscopic evaluation of material pooling we devised a score expressing the severity of dysphagia. This value takes into account endoscopic landmarks and other parameters of bedside evaluation. Endoscopic and bedside data were collected from a heterogeneous population of 520 consecutive patients seen in our Service over a 6-year period. By means of the test of equality of group means and logistic regression, parameters able to significantly predict aspiration in the series were identified. An ordinal number was attributed to each parameter in order to obtain scores expressing three degrees of severity of dysphagia: mild, moderate, severe. The scores can be used to guide the management of patients in a simple way, providing indications for targeted referral to the speech pathologist and for tracking the disorder over time. This investigation represents the basis for future research aimed at validating the scores in a larger case series. PMID:18646575
London Measure of Unplanned Pregnancy: guidance for its use as an outcome measure
Hall, Jennifer A; Barrett, Geraldine; Copas, Andrew; Stephenson, Judith
2017-01-01
Background The London Measure of Unplanned Pregnancy (LMUP) is a psychometrically validated measure of the degree of intention of a current or recent pregnancy. The LMUP is increasingly being used worldwide, and can be used to evaluate family planning or preconception care programs. However, beyond recommending the use of the full LMUP scale, there is no published guidance on how to use the LMUP as an outcome measure. Ordinal logistic regression has been recommended informally, but studies published to date have all used binary logistic regression and dichotomized the scale at different cut points. There is thus a need for evidence-based guidance to provide a standardized methodology for multivariate analysis and to enable comparison of results. This paper makes recommendations for the regression method for analysis of the LMUP as an outcome measure. Materials and methods Data collected from 4,244 pregnant women in Malawi were used to compare five regression methods: linear, logistic with two cut points, and ordinal logistic with either the full or grouped LMUP score. The recommendations were then tested on the original UK LMUP data. Results There were small but no important differences in the findings across the regression models. Logistic regression resulted in the largest loss of information, and assumptions were violated for the linear and ordinal logistic regression. Consequently, robust standard errors were used for linear regression and a partial proportional odds ordinal logistic regression model attempted. The latter could only be fitted for grouped LMUP score. Conclusion We recommend the linear regression model with robust standard errors to make full use of the LMUP score when analyzed as an outcome measure. Ordinal logistic regression could be considered, but a partial proportional odds model with grouped LMUP score may be required. Logistic regression is the least-favored option, due to the loss of information. For logistic regression, the cut point for un/planned pregnancy should be between nine and ten. These recommendations will standardize the analysis of LMUP data and enhance comparability of results across studies. PMID:28435343
Logistic models--an odd(s) kind of regression.
Jupiter, Daniel C
2013-01-01
The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression. Copyright © 2013 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
Wassenaar, Catherine A; Ye, Yuanqing; Cai, Qiuyin; Aldrich, Melinda C; Knight, Joanne; Spitz, Margaret R; Wu, Xifeng; Blot, William J; Tyndale, Rachel F
2015-01-01
We investigated genetic variation in CYP2A6 in relation to lung cancer risk among African American smokers, a high-risk population. Previously, we found that CYP2A6, a nicotine/nitrosamine metabolism gene, was associated with lung cancer risk in European Americans, but smoking habits, lung cancer risk and CYP2A6 gene variants differ significantly between European and African ancestry populations. Herein, African American ever-smokers, drawn from two independent lung cancer case-control studies, were genotyped for reduced activity CYP2A6 alleles and grouped by predicted metabolic activity. Lung cancer risk in the Southern Community Cohort Study (n = 494) was lower among CYP2A6 reduced versus normal metabolizers, as estimated by multivariate conditional logistic regression [odds ratio (OR) = 0.44; 95% confidence interval (CI) = 0.26-0.73] and by unconditional logistic regression (OR = 0.62; 95% CI = 0.41-0.94). The association was replicated in an independent study from MD Anderson Cancer Center (n = 407) (OR = 0.64; 95% CI = 0.42-0.98), and pooling the studies yielded an OR of 0.64 (95% CI = 0.48-0.86). Exploratory analyses revealed a significant interaction between CYP2A6 genotype and sex on the risk for lung cancer (Southern Community Cohort Study: P = 0.04; MD Anderson: P = 0.03; Pooled studies: P = 0.002) with a CYP2A6 effect in men only. These findings support a contribution of genetic variation in CYP2A6 to lung cancer risk among African American smokers, particularly men, whereby CYP2A6 genotypes associated with reduced metabolic activity confer a lower risk of developing lung cancer. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Association Between Distance From Home to Tobacco Outlet and Smoking Cessation and Relapse.
Pulakka, Anna; Halonen, Jaana I; Kawachi, Ichiro; Pentti, Jaana; Stenholm, Sari; Jokela, Markus; Kaate, Ilkka; Koskenvuo, Markku; Vahtera, Jussi; Kivimäki, Mika
2016-10-01
Reduced availability of tobacco outlets is hypothesized to reduce smoking, but longitudinal evidence on this issue is scarce. To examine whether changes in distance from home to tobacco outlet are associated with changes in smoking behaviors. The data from 2 prospective cohort studies included geocoded residential addresses, addresses of tobacco outlets, and responses to smoking surveys in 2008 and 2012 (the Finnish Public Sector [FPS] study, n = 53 755) or 2003 and 2012 (the Health and Social Support [HeSSup] study, n = 11 924). All participants were smokers or ex-smokers at baseline. We used logistic regression in between-individual analyses and conditional logistic regression in case-crossover design analyses to examine change in walking distance from home to the nearest tobacco outlet as a predictor of quitting smoking in smokers and smoking relapse in ex-smokers. Study-specific estimates were pooled using fixed-effect meta-analysis. Walking distance from home to the nearest tobacco outlet. Quitting smoking and smoking relapse as indicated by self-reported current and previous smoking at baseline and follow-up. Overall, 20 729 men and women (age range 18-75 years) were recruited. Of the 6259 and 2090 baseline current smokers, 1744 (28%) and 818 (39%) quit, and of the 8959 and 3421 baseline ex-smokers, 617 (7%) and 205 (6%) relapsed in the FPS and HeSSup studies, respectively. Among the baseline smokers, a 500-m increase in distance from home to the nearest tobacco outlet was associated with a 16% increase in odds of quitting smoking in the between-individual analysis (pooled odds ratio, 1.16; 95% CI, 1.05-1.28) and 57% increase in within-individual analysis (pooled odds ratio, 1.57; 95% CI, 1.32-1.86), after adjusting for changes in self-reported marital and working status, substantial worsening of financial situation, illness in the family, and own health status. Increase in distance to the nearest tobacco outlet was not associated with smoking relapse among the ex-smokers. These data suggest that increase in distance from home to the nearest tobacco outlet may increase quitting among smokers. No effect of change in distance on relapse in ex-smokers was observed.
Bressler, Neil M; Boyer, David S; Williams, David F; Butler, Steven; Francom, Steven F; Brown, Benton; Di Nucci, Flavia; Cramm, Timothy; Tuomi, Lisa L; Ianchulev, Tsontcho; Rubio, Roman G
2012-10-01
To analyze cerebrovascular accidents (CVAs) pooled from large, randomized, controlled clinical trials of ranibizumab treatment for neovascular age-related macular degeneration. Events in five trials (FOCUS, MARINA, ANCHOR, PIER, and SAILOR) were analyzed using a standard safety monitoring process. Exact methods, stratified by study, were used to test for treatment differences based on odds ratios. A stepwise logistic regression model was fit to classify subjects' risk for CVA based on medical history. Treatment differences in CVA rates at 1 year or 2 years were evaluated within risk groups using stratified exact methods. Pooled 2-year CVA rates were <3%; odds ratios (95% confidence intervals) for CVA risk were 1.2 (0.4-4.4) for ranibizumab 0.3-mg versus control, 2.2 (0.8-7.1) for 0.5 mg versus control, and 1.5 (0.8-3.0) for 0.5-mg versus 0.3-mg ranibizumab. No substantial increased risk of CVA for 0.5 mg versus 0.3 mg was identified in pooled analyses or any of the individual trials. In pooled analyses, the difference between 0.5-mg ranibizumab and control was larger (7.7 [1.2-177]) among high-risk CVA patients. This analysis provided some evidence, although not definitive, of a potential increased risk of CVA with ranibizumab versus control or with 0.5-mg versus 0.3-mg ranibizumab. Continued monitoring for CVA within clinical trials seems warranted.
Olson, Sara H; Hsu, Meier; Satagopan, Jaya M; Maisonneuve, Patrick; Silverman, Debra T; Lucenteforte, Ersilia; Anderson, Kristin E; Borgida, Ayelet; Bracci, Paige M; Bueno-de-Mesquita, H Bas; Cotterchio, Michelle; Dai, Qi; Duell, Eric J; Fontham, Elizabeth H; Gallinger, Steven; Holly, Elizabeth A; Ji, Bu-Tian; Kurtz, Robert C; La Vecchia, Carlo; Lowenfels, Albert B; Luckett, Brian; Ludwig, Emmy; Petersen, Gloria M; Polesel, Jerry; Seminara, Daniela; Strayer, Lori; Talamini, Renato
2013-09-01
In order to quantify the risk of pancreatic cancer associated with history of any allergy and specific allergies, to investigate differences in the association with risk according to age, gender, smoking status, or body mass index, and to study the influence of age at onset, we pooled data from 10 case-control studies. In total, there were 3,567 cases and 9,145 controls. Study-specific odds ratios and 95% confidence intervals were calculated by using unconditional logistic regression adjusted for age, gender, smoking status, and body mass index. Between-study heterogeneity was assessed by using the Cochran Q statistic. Study-specific odds ratios were pooled by using a random-effects model. The odds ratio for any allergy was 0.79 (95% confidence interval (CI): 0.62, 1.00) with heterogeneity among studies (P < 0.001). Heterogeneity was attributable to one study; with that study excluded, the pooled odds ratio was 0.73 (95% CI: 0.64, 0.84) (Pheterogeneity = 0.23). Hay fever (odds ratio = 0.74, 95% CI: 0.56, 0.96) and allergy to animals (odds ratio = 0.62, 95% CI: 0.41, 0.94) were related to lower risk, while there was no statistically significant association with other allergies or asthma. There were no major differences among subgroups defined by age, gender, smoking status, or body mass index. Older age at onset of allergies was slightly more protective than earlier age.
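The heterogeneity test and random-effects pooling mentioned in this abstract can be sketched as follows. The abstract names the Cochran Q statistic and a random-effects model but not the estimator for the between-study variance, so the DerSimonian-Laird estimator used here is an assumption; the function name and the input values are hypothetical and do not come from the ten studies.

```python
import math


def random_effects_pool(log_ors, ses, z=1.96):
    """Cochran's Q plus DerSimonian-Laird random-effects pooling (sketch).

    log_ors: study-specific log odds ratios
    ses: their standard errors
    Returns a dict with Q, tau^2, the pooled OR and its confidence interval.
    """
    w = [1.0 / s ** 2 for s in ses]
    sum_w = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, log_ors)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1.0 / (s ** 2 + tau2) for s in ses]
    sum_ws = sum(w_star)
    pooled = sum(wi * b for wi, b in zip(w_star, log_ors)) / sum_ws
    se = math.sqrt(1.0 / sum_ws)
    return {
        "Q": q,
        "tau2": tau2,
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - z * se), math.exp(pooled + z * se)),
    }


# Hypothetical study-level inputs for illustration only.
result = random_effects_pool(
    [math.log(0.70), math.log(0.85), math.log(0.60)], [0.10, 0.15, 0.20]
)
print(result)
```

When Q does not exceed its degrees of freedom, tau^2 is set to zero and the result coincides with the fixed-effect estimate; otherwise the weights are flattened and the interval widens.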
Olson, Sara H.; Hsu, Meier; Satagopan, Jaya M.; Maisonneuve, Patrick; Silverman, Debra T.; Lucenteforte, Ersilia; Anderson, Kristin E.; Borgida, Ayelet; Bracci, Paige M.; Bueno-de-Mesquita, H. Bas; Cotterchio, Michelle; Dai, Qi; Duell, Eric J.; Fontham, Elizabeth H.; Gallinger, Steven; Holly, Elizabeth A.; Ji, Bu-Tian; Kurtz, Robert C.; La Vecchia, Carlo; Lowenfels, Albert B.; Luckett, Brian; Ludwig, Emmy; Petersen, Gloria M.; Polesel, Jerry; Seminara, Daniela; Strayer, Lori; Talamini, Renato
2013-01-01
In order to quantify the risk of pancreatic cancer associated with history of any allergy and specific allergies, to investigate differences in the association with risk according to age, gender, smoking status, or body mass index, and to study the influence of age at onset, we pooled data from 10 case-control studies. In total, there were 3,567 cases and 9,145 controls. Study-specific odds ratios and 95% confidence intervals were calculated by using unconditional logistic regression adjusted for age, gender, smoking status, and body mass index. Between-study heterogeneity was assessed by using the Cochran Q statistic. Study-specific odds ratios were pooled by using a random-effects model. The odds ratio for any allergy was 0.79 (95% confidence interval (CI): 0.62, 1.00) with heterogeneity among studies (P < 0.001). Heterogeneity was attributable to one study; with that study excluded, the pooled odds ratio was 0.73 (95% CI: 0.64, 0.84) (Pheterogeneity = 0.23). Hay fever (odds ratio = 0.74, 95% CI: 0.56, 0.96) and allergy to animals (odds ratio = 0.62, 95% CI: 0.41, 0.94) were related to lower risk, while there was no statistically significant association with other allergies or asthma. There were no major differences among subgroups defined by age, gender, smoking status, or body mass index. Older age at onset of allergies was slightly more protective than earlier age. PMID:23820785
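The pooling steps described in this abstract (study-specific odds ratios, Cochran Q for heterogeneity, then a random-effects pooled estimate) can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not name the between-study variance estimator, so the DerSimonian-Laird estimator is an assumption here, and the function name `pool_random_effects` and the three example studies are hypothetical.

```python
import math

def pool_random_effects(log_ors, ses):
    """Pool study-specific log odds ratios.

    Assumes the DerSimonian-Laird estimator for the between-study
    variance (tau^2); the abstract does not say which estimator was used.
    """
    k = len(log_ors)
    w = [1.0 / se ** 2 for se in ses]                  # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    # Cochran Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                 # truncated at zero
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]    # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_ors)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    return {
        "q": q,
        "tau2": tau2,
        "pooled_or": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se_pooled),
        "ci_high": math.exp(pooled + 1.96 * se_pooled),
    }

# Hypothetical odds ratios and standard errors of the log odds ratios
example = pool_random_effects(
    [math.log(o) for o in (0.70, 0.75, 1.10)],
    [0.10, 0.12, 0.15],
)
print(example)
```

When Q is no larger than its degrees of freedom, the estimated between-study variance is zero and the result coincides with the fixed-effect (inverse-variance) estimate.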
Li, J C; Silverberg, J I
2015-11-01
Chickenpox infection early in childhood has previously been shown to protect against the development of childhood eczema in line with the hygiene hypothesis. In 1995, the American Academy of Pediatrics recommended routine vaccination against varicella zoster virus in the United States. Subsequently, rates of chickenpox infection have dramatically decreased in childhood. We sought to understand the impact of declining rates of chickenpox infection on the prevalence of eczema. We analysed data from 207 007 children in the 1997-2013 National Health Interview Survey. One-year prevalence of eczema and 'ever had' history of chickenpox were analysed. Associations between chickenpox infection and eczema were tested using survey-weighted logistic regression. The impact of chickenpox on trends of eczema prevalence was tested using survey logistic regression and generalized linear models. Children with a history of chickenpox compared with those without chickenpox had a lower prevalence [survey-weighted logistic regression (95% confidence interval, CI)] of eczema [8·8% (8·5-9·0%) vs. 10·6% (10·4-10·8%)]. In pooled multivariate models controlling for age, sex, race/ethnicity, household income, highest level of household education, insurance coverage, U.S. birthplace and family size, eczema was inversely associated with chickenpox [adjusted odds ratio (95% CI), 0·90 (0·86-0·94), P < 0·001]. The prevalence of eczema significantly increased over time (Tukey post-hoc test, P < 0·001 for comparisons of survey years 2001-13 vs. 1997-2000, 2008-13 vs. 2001-04 and 2008-13 vs. 2005-07). In multivariate generalized linear models, the odds of eczema was not associated with chickenpox in 2001-13 (P ≥ 0·06). These findings suggest that lower rates of chickenpox infection secondary to widespread vaccination against varicella zoster virus are not contributing to higher rates of childhood eczema in the U.S. © 2015 British Association of Dermatologists.
Vilar-Compte, Mireya; Sandoval-Olascoaga, Sebastian; Bernal-Stuart, Ana; Shimoga, Sandhya; Vargas-Bustamante, Arturo
2015-01-01
Objective: The present paper investigated the impact of the 2008 financial crisis on food security in Mexico and how it disproportionally affected vulnerable households. Design: A generalized ordered logistic regression was estimated to assess the impact of the crisis on households’ food security status. An ordinary least squares and a quantile regression were estimated to evaluate the effect of the financial crisis on a continuous proxy measure of food security defined as the share of a household’s current income devoted to food expenditures. Setting: Both analyses were performed using pooled cross-sectional data from the Mexican National Household Income and Expenditure Survey 2008 and 2010. Subjects: The analytical sample included 29 468 households in 2008 and 27 654 in 2010. Results: The generalized ordered logistic model showed that the financial crisis significantly (P < 0·05) decreased the probability of being food secure, mildly or moderately food insecure, compared with being severely food insecure (OR = 0·74). A similar but smaller effect was found when comparing severely and moderately food-insecure households with mildly food-insecure and food-secure households (OR = 0·81). The ordinary least squares model showed that the crisis significantly (P < 0·05) increased the share of total income spent on food (β coefficient of 0·02). The quantile regression confirmed the findings suggested by the generalized ordered logistic model, showing that the effects of the crisis were more profound among poorer households. Conclusions: The results suggest that households that were more vulnerable before the financial crisis saw a worsened effect in terms of food insecurity with the crisis. Findings were consistent with both measures of food security – one based on self-reported experience and the other based on food spending. PMID:25428800
Parameters Estimation of Geographically Weighted Ordinal Logistic Regression (GWOLR) Model
NASA Astrophysics Data System (ADS)
Zuhdi, Shaifudin; Retno Sari Saputro, Dewi; Widyaningsih, Purnami
2017-06-01
A regression model represents the relationship between independent variables and a dependent variable. When the dependent variable is categorical, a logistic regression model is used to calculate odds. When the categories of the dependent variable are ordered levels, the logistic regression model is ordinal. The GWOLR model is an ordinal logistic regression model influenced by the geographical location of the observation site. Parameter estimation in the model is needed to infer population values from a sample. The purpose of this research is to estimate the parameters of the GWOLR model using R software. Parameter estimation uses data on the number of dengue fever patients in Semarang City. The observation units are 144 villages in Semarang City. The research yields a local GWOLR model for each village and the probabilities of the categories of the number of dengue fever patients.
The purpose of this report is to provide a reference manual that could be used by investigators for making informed use of logistic regression using two methods (standard logistic regression and MARS). The details for analyses of relationships between a dependent binary response ...
Predicting U.S. Army Reserve Unit Manning Using Market Demographics
2015-06-01
develops linear regression, classification tree, and logistic regression models to determine the ability of the location to support manning requirements... logistic regression model delivers predictive results that allow decision-makers to identify locations with a high probability of meeting unit...manning requirements. The recommendation of this thesis is that the USAR implement the logistic regression model.
ERIC Educational Resources Information Center
Chen, Chau-Kuang
2005-01-01
Logistic and Cox regression methods are practical tools used to model the relationships between certain student learning outcomes and their relevant explanatory variables. The logistic regression model fits an S-shaped curve into a binary outcome with data points of zero and one. The Cox regression model allows investigators to study the duration…
Pre-diagnostic plasma urate and the risk of amyotrophic lateral sclerosis.
O'Reilly, Éilis J; Bjornevik, Kjetil; Schwarzschild, Michael A; McCullough, Marjorie L; Kolonel, Laurence N; Le Marchand, Loic; Manson, Joann E; Ascherio, Alberto
2018-05-01
To prospectively examine for the first time the association between plasma urate levels measured in healthy participants and future amyotrophic lateral sclerosis (ALS) risk. A pooled case-control study nested in five US prospective cohorts comprising 319,617 participants who provided blood, of whom 275 had ALS during follow-up. Pre-diagnostic plasma urate was determined for all participants using a clinical colorimetric enzyme assay. Gender-specific multivariable-adjusted rate ratios (RR) of ALS incidence or death were estimated by conditional logistic regression and pooled using inverse-variance weighting. In age- and matching factor-adjusted analyses, a 1 mg/dL increase in urate concentration was associated with RR = 0.88 (95% CI: [0.78, 0.997]; p = 0.044). After adjustment for BMI, a strong predictor of ALS and urate levels, and other potential covariates, the RR was 0.89 (95% CI: [0.78, 1.02]; p = 0.08) for a 1 mg/dL increase in urate. Elevation of plasma urate was modestly inversely associated with the risk of ALS and warrants further study for a potential role in this disease.
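As a rough consistency check on the first reported estimate, the standard error of the log rate ratio can be recovered from the 95% confidence interval and turned into a Wald z statistic and two-sided p-value. This sketch assumes a symmetric normal-theory interval on the log scale and works from the rounded figures in the abstract, so it should land near, not exactly on, the reported p = 0.044.

```python
import math

# Reported values for a 1 mg/dL increase in urate (age- and matching
# factor-adjusted analysis)
rr, ci_low, ci_high = 0.88, 0.78, 0.997

# Assumption: the interval is log(RR) +/- 1.96 * SE
se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
z = math.log(rr) / se

# Two-sided p-value from the standard normal distribution
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"SE(log RR) = {se:.4f}, z = {z:.2f}, p = {p_value:.3f}")
```

Small differences from the published p-value are expected because the rate ratio and interval limits are rounded to two or three digits.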
Szyda, Joanna; Liu, Zengting; Zatoń-Dobrowolska, Magdalena; Wierzbicki, Heliodor; Rzasa, Anna
2008-01-01
We analysed data from a selective DNA pooling experiment with 130 individuals of the arctic fox (Alopex lagopus), which originated from 2 different types regarding body size. The association between alleles of 6 selected unlinked molecular markers and body size was tested by using univariate and multinomial logistic regression models, applying odds ratio and test statistics from the power divergence family. Due to the small sample size and the resulting sparseness of the data table, in hypothesis testing we could not rely on the asymptotic distributions of the tests. Instead, we tried to account for data sparseness by (i) modifying confidence intervals of odds ratio; (ii) using a normal approximation of the asymptotic distribution of the power divergence tests with different approaches for calculating moments of the statistics; and (iii) assessing P values empirically, based on bootstrap samples. As a result, a significant association was observed for 3 markers. Furthermore, we used simulations to assess the validity of the normal approximation of the asymptotic distribution of the test statistics under the conditions of small and sparse samples.
Yusuf, O B; Bamgboye, E A; Afolabi, R F; Shodimu, M A
2014-09-01
The logistic regression model is widely used in health research for descriptive and predictive purposes. Unfortunately, researchers are often not aware that the underlying principles of the technique have failed when the maximum likelihood algorithm does not converge. Young researchers, particularly postgraduate students, may not know why the separation problem, whether quasi or complete, occurs, how to identify it and how to fix it. This study was designed to critically evaluate convergence issues in articles that employed logistic regression analysis published in the African Journal of Medicine and Medical Sciences between 2004 and 2013. Problems of quasi or complete separation were described and were illustrated with the National Demographic and Health Survey dataset. A critical evaluation of articles that employed logistic regression was conducted. A total of 581 articles were reviewed, of which 40 (6.9%) used binary logistic regression. Twenty-four (60.0%) stated the use of the logistic regression model in the methodology, while none of the articles assessed model fit. Only 3 (12.5%) properly described the procedures. Of the 40 that used the logistic regression model, the problem of convergence occurred in 6 (15.0%) of the articles. Logistic regression tends to be poorly reported in studies published between 2004 and 2013. Our findings showed that the procedure may not be well understood by researchers, since very few described the process in their reports, and that they may be totally unaware of the problem of convergence or how to deal with it.
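The separation problem mentioned in this abstract can be illustrated with a toy example. The sketch below is not taken from the article: it fits a one-predictor logistic model by plain gradient ascent (the function `fit_logistic` and both small datasets are hypothetical) and shows that the slope keeps growing with more iterations when the outcome is completely separated, while it settles when the classes overlap.

```python
import math

def fit_logistic(xs, ys, steps, lr=0.5):
    """Fit intercept b0 and slope b1 by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p            # score for the intercept
            g1 += (y - p) * x      # score for the slope
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
separated = [0, 0, 0, 1, 1, 1]     # x > 0 predicts y perfectly
overlapping = [0, 0, 1, 0, 1, 1]   # classes overlap around x = 0

slope_sep_short = fit_logistic(xs, separated, 2000)[1]
slope_sep_long = fit_logistic(xs, separated, 20000)[1]
slope_ovl_short = fit_logistic(xs, overlapping, 2000)[1]
slope_ovl_long = fit_logistic(xs, overlapping, 20000)[1]

print("separated:  ", slope_sep_short, slope_sep_long)
print("overlapping:", slope_ovl_short, slope_ovl_long)
```

Under complete separation the likelihood has no finite maximum, so any reported coefficient depends on when the optimizer stopped; very large estimates and standard errors are the usual warning sign.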
Logistic Regression: Concept and Application
ERIC Educational Resources Information Center
Cokluk, Omay
2010-01-01
The main focus of logistic regression analysis is classification of individuals in different groups. The aim of the present study is to explain basic concepts and processes of binary logistic regression analysis intended to determine the combination of independent variables which best explain the membership in certain groups called dichotomous…
Proximity to sports facilities and sports participation for adolescents in Germany.
Reimers, Anne K; Wagner, Matthias; Alvanides, Seraphim; Steinmayr, Andreas; Reiner, Miriam; Schmidt, Steffen; Woll, Alexander
2014-01-01
To assess the relationship between proximity to specific sports facilities and participation in the corresponding sports activities for adolescents in Germany. A sample of 1,768 adolescents aged 11-17 years old and living in 161 German communities was examined. Distances to the nearest sports facilities were calculated as an indicator of proximity to sports facilities using Geographic Information Systems (GIS). Participation in specific leisure-time sports activities in sports clubs was assessed using a self-report questionnaire and individual-level socio-demographic variables were derived from a parent questionnaire. Community-level socio-demographics as covariates were selected from the INKAR database, in particular from indicators and maps on land development. Logistic regression analyses were conducted to examine associations between proximity to the nearest sports facilities and participation in the corresponding sports activities. The logistic regression analyses showed that girls residing longer distances from the nearest gym were less likely to engage in indoor sports activities; a significant interaction between distances to gyms and level of urbanization was identified. Decomposition of the interaction term showed that for adolescent girls living in rural areas participation in indoor sports activities was positively associated with gym proximity. Proximity to tennis courts and indoor pools was not associated with participation in tennis or water sports, respectively. Improved proximity to gyms is likely to be more important for female adolescents living in rural areas.
Suzuki, Seitaro; Yoshino, Koichi; Takayanagi, Atsushi; Ishizuka, Yoichi; Satou, Ryouichi; Kamijo, Hideyuki; Sugihara, Naoki
2016-06-10
This cross-sectional study was conducted to examine tooth loss and associated factors among professional drivers and white-collar workers. The participants were recruited by applying screening procedures to a pool of Japanese registrants in an online database. The participants were asked to complete a self-reported questionnaire. A total of 592 professional drivers and 328 white-collar workers (male, aged 30 to 69 years) were analyzed. A multiple logistic regression analysis was performed to identify differences between professional drivers and white-collar workers. The results showed that professional drivers had fewer teeth than white-collar workers (odds ratio [OR], 1.74; 95% confidence interval [95% CI], 1.150-2.625). Moreover, a second multiple logistic regression analysis revealed that several factors were associated with the number of teeth among professional drivers: diabetes mellitus (OR, 2.68; 95% CI, 1.388-5.173), duration of brushing teeth (OR, 1.66; 95% CI, 1.066-2.572), frequency of eating breakfast (OR, 2.23; 95% CI, 1.416-3.513), frequency of eating out (OR, 1.70; 95% CI, 1.086-2.671) and smoking status (OR, 2.88; 95% CI, 1.388-5.964). These findings suggest that the lifestyles of professional drivers could be related to not only their general health status, but also tooth loss.
NASA Astrophysics Data System (ADS)
Pradhan, Biswajeet
2010-05-01
This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis on the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (NDVI), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases of the application of logistic regression coefficients in the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), whereas Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases from the cross application of logistic regression coefficients in the other two areas, the case of Selangor based on the logistic coefficients of Cameron showed the highest (90%) prediction accuracy, whereas the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%). Qualitatively, the cross application model yields reasonable results which can be used for preliminary landslide hazard mapping.
An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression
Weiss, Brandi A.; Dardick, William
2015-01-01
This article introduces an entropy-based measure of data–model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify the quality of classification and separation of group membership. Entropy complements preexisting measures of data–model fit and provides unique information not contained in other measures. Hypothetical data scenarios, an applied example, and Monte Carlo simulation results are used to demonstrate the application of entropy in logistic regression. Entropy should be used in conjunction with other measures of data–model fit to assess how well logistic regression models classify cases into observed categories. PMID:29795897
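To make the idea of an entropy-based classification measure concrete, here is a minimal sketch. It borrows the normalized (relative) entropy convention used in mixture modeling, which the abstract cites as the starting point; the exact statistic proposed in the article may be defined differently, and the function name `normalized_entropy` is hypothetical.

```python
import math

def normalized_entropy(probs):
    """Return 1 minus the mean binary entropy of predicted probabilities,
    scaled by ln(2).

    A value of 1 means every case is classified with certainty (p = 0 or 1);
    a value of 0 means every prediction is maximally fuzzy (p = 0.5).
    Assumed convention, not necessarily the article's definition.
    """
    total = 0.0
    for p in probs:
        for q in (p, 1.0 - p):
            if q > 0.0:                 # treat 0 * ln(0) as 0
                total -= q * math.log(q)
    return 1.0 - total / (len(probs) * math.log(2))

# Hypothetical predicted probabilities from two fitted models
crisp_model = [0.02, 0.05, 0.97, 0.99]
fuzzy_model = [0.45, 0.55, 0.50, 0.60]

print(normalized_entropy(crisp_model))
print(normalized_entropy(fuzzy_model))
```

Because the measure depends only on the predicted probabilities, it says how sharply the model separates the groups, not whether the assignments are correct, which is why the abstract recommends using it alongside other fit measures.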
Logistic regression applied to natural hazards: rare event logistic regression with replications
NASA Astrophysics Data System (ADS)
Guns, M.; Vanacker, V.
2012-06-01
Statistical analysis of natural hazards needs particular attention, as most of these phenomena are rare events. This study shows that the ordinary rare event logistic regression, as it is now commonly used in geomorphologic studies, does not always lead to a robust detection of controlling factors, as the results can be strongly sample-dependent. In this paper, we introduce some concepts of Monte Carlo simulations in rare event logistic regression. This technique, so-called rare event logistic regression with replications, combines the strength of probabilistic and statistical methods, and allows overcoming some of the limitations of previous developments through robust variable selection. This technique was here developed for the analyses of landslide controlling factors, but the concept is widely applicable for statistical analyses of natural hazards.
Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning
ERIC Educational Resources Information Center
Li, Zhushan
2014-01-01
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
A Methodology for Generating Placement Rules that Utilizes Logistic Regression
ERIC Educational Resources Information Center
Wurtz, Keith
2008-01-01
The purpose of this article is to provide the necessary tools for institutional researchers to conduct a logistic regression analysis and interpret the results. Aspects of the logistic regression procedure that are necessary to evaluate models are presented and discussed with an emphasis on cutoff values and choosing the appropriate number of…
John Hogland; Nedret Billor; Nathaniel Anderson
2013-01-01
Discriminant analysis, referred to as maximum likelihood classification within popular remote sensing software packages, is a common supervised technique used by analysts. Polytomous logistic regression (PLR), also referred to as multinomial logistic regression, is an alternative classification approach that is less restrictive, more flexible, and easy to interpret. To...
Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble
Wang, Hong; Xu, Qingsong; Zhou, Lifeng
2015-01-01
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988
Rossi, Carmine; Shrier, Ian; Marshall, Lee; Cnossen, Sonya; Schwartzman, Kevin; Klein, Marina B; Schwarzer, Guido; Greenaway, Chris
2012-01-01
International migrants experience increased mortality from hepatocellular carcinoma compared to host populations, largely due to undetected chronic hepatitis B infection (HBV). We conducted a systematic review of the seroprevalence of chronic HBV and prior immunity in migrants arriving in low HBV prevalence countries to identify those at highest risk in order to guide disease prevention and control strategies. Medline, Medline In-Process, EMBASE and the Cochrane Database of Systematic Reviews were searched. Studies that reported HBV surface antigen or surface antibodies in migrants were included. The seroprevalence of chronic HBV and prior immunity were pooled by region of origin and immigrant class, using a random-effects model. A random-effects logistic regression was performed to explore heterogeneity. The number of chronically infected migrants in each immigrant-receiving country was estimated using the pooled HBV seroprevalences and country-specific census data. A total of 110 studies, representing 209,822 immigrants and refugees were included. The overall pooled seroprevalence of infection was 7.2% (95% CI: 6.3%-8.2%) and the seroprevalence of prior immunity was 39.7% (95% CI: 35.7%-43.9%). HBV seroprevalence differed significantly by region of origin. Migrants from East Asia and Sub-Saharan Africa were at highest risk and migrants from Eastern Europe were at an intermediate risk of infection. Region of origin, refugee status and decade of study were independently associated with infection in the adjusted random-effects logistic model. Almost 3.5 million migrants (95% CI: 2.8-4.5 million) are estimated to be chronically infected with HBV. The seroprevalence of chronic HBV infection is high in migrants from most world regions, particularly among those from East Asia, Sub-Saharan Africa and Eastern Europe, and more than 50% were found to be susceptible to HBV. Targeted screening and vaccination of international migrants can become an important component of HBV disease control efforts in immigrant-receiving countries.
An Entropy-Based Measure for Assessing Fuzziness in Logistic Regression
ERIC Educational Resources Information Center
Weiss, Brandi A.; Dardick, William
2016-01-01
This article introduces an entropy-based measure of data-model fit that can be used to assess the quality of logistic regression models. Entropy has previously been used in mixture-modeling to quantify how well individuals are classified into latent classes. The current study proposes the use of entropy for logistic regression models to quantify…
What Are the Odds of that? A Primer on Understanding Logistic Regression
ERIC Educational Resources Information Center
Huang, Francis L.; Moon, Tonya R.
2013-01-01
The purpose of this Methodological Brief is to present a brief primer on logistic regression, a commonly used technique when modeling dichotomous outcomes. Using data from the National Education Longitudinal Study of 1988 (NELS:88), logistic regression techniques were used to investigate student-level variables in eighth grade (i.e., enrolled in a…
Niche-induced cell death and epithelial phagocytosis regulate hair follicle stem cell pool.
Mesa, Kailin R; Rompolas, Panteleimon; Zito, Giovanni; Myung, Peggy; Sun, Thomas Y; Brown, Samara; Gonzalez, David G; Blagoev, Krastan B; Haberman, Ann M; Greco, Valentina
2015-06-04
Tissue homeostasis is achieved through a balance of cell production (growth) and elimination (regression). In contrast to tissue growth, the cells and molecular signals required for tissue regression remain unknown. To investigate physiological tissue regression, we use the mouse hair follicle, which cycles stereotypically between phases of growth and regression while maintaining a pool of stem cells to perpetuate tissue regeneration. Here we show by intravital microscopy in live mice that the regression phase eliminates the majority of the epithelial cells by two distinct mechanisms: terminal differentiation of suprabasal cells and a spatial gradient of apoptosis of basal cells. Furthermore, we demonstrate that basal epithelial cells collectively act as phagocytes to clear dying epithelial neighbours. Through cellular and genetic ablation we show that epithelial cell death is extrinsically induced through transforming growth factor (TGF)-β activation and mesenchymal crosstalk. Strikingly, our data show that regression acts to reduce the stem cell pool, as inhibition of regression results in excess basal epithelial cells with regenerative abilities. This study identifies the cellular behaviours and molecular mechanisms of regression that counterbalance growth to maintain tissue homeostasis.
On the Usefulness of a Multilevel Logistic Regression Approach to Person-Fit Analysis
ERIC Educational Resources Information Center
Conijn, Judith M.; Emons, Wilco H. M.; van Assen, Marcel A. L. M.; Sijtsma, Klaas
2011-01-01
The logistic person response function (PRF) models the probability of a correct response as a function of the item locations. Reise (2000) proposed to use the slope parameter of the logistic PRF as a person-fit measure. He reformulated the logistic PRF model as a multilevel logistic regression model and estimated the PRF parameters from this…
Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W
2015-08-01
Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.
Valle, Denis; Lima, Joanna M Tucker; Millar, Justin; Amratia, Punam; Haque, Ubydul
2015-11-04
Logistic regression is a statistical model widely used in cross-sectional and cohort studies to identify and quantify the effects of potential disease risk factors. However, the impact of imperfect tests on adjusted odds ratios (and thus on the identification of risk factors) is under-appreciated. The purpose of this article is to draw attention to the problem associated with modelling imperfect diagnostic tests, and propose simple Bayesian models to adequately address this issue. A systematic literature review was conducted to determine the proportion of malaria studies that appropriately accounted for false-negatives/false-positives in a logistic regression setting. Inference from the standard logistic regression was also compared with that from three proposed Bayesian models using simulations and malaria data from the western Brazilian Amazon. A systematic literature review suggests that malaria epidemiologists are largely unaware of the problem of using logistic regression to model imperfect diagnostic test results. Simulation results reveal that statistical inference can be substantially improved when using the proposed Bayesian models versus the standard logistic regression. Finally, analysis of original malaria data with one of the proposed Bayesian models reveals that microscopy sensitivity is strongly influenced by how long people have lived in the study region, and an important risk factor (i.e., participation in forest extractivism) is identified that would have been missed by standard logistic regression. Given the numerous diagnostic methods employed by malaria researchers and the ubiquitous use of logistic regression to model the results of these diagnostic tests, this paper provides critical guidelines to improve data analysis practice in the presence of misclassification error. Easy-to-use code that can be readily adapted to WinBUGS is provided, enabling straightforward implementation of the proposed Bayesian models.
Tanyingoh, Divine; Dixon, Elijah; Johnson, Markey; Wheeler, Amanda J.; Myers, Robert P.; Bertazzon, Stefania; Saini, Vineet; Madsen, Karen; Ghosh, Subrata; Villeneuve, Paul J.
2013-01-01
Background: Environmental determinants of appendicitis are poorly understood. Past work suggests that air pollution may increase the risk of appendicitis. Objectives: We investigated whether ambient ground-level ozone (O3) concentrations were associated with appendicitis and whether these associations varied between perforated and nonperforated appendicitis. Methods: We based this time-stratified case-crossover study on 35,811 patients hospitalized with appendicitis from 2004 to 2008 in 12 Canadian cities. Data from a national network of fixed-site monitors were used to calculate daily maximum O3 concentrations for each city. Conditional logistic regression was used to estimate city-specific odds ratios (ORs) relative to an interquartile range (IQR) increase in O3 adjusted for temperature and relative humidity. A random-effects meta-analysis was used to derive a pooled risk estimate. Stratified analyses were used to estimate associations separately for perforated and nonperforated appendicitis. Results: Overall, a 16-ppb increase in the 7-day cumulative average daily maximum O3 concentration was associated with all appendicitis cases across the 12 cities (pooled OR = 1.07; 95% CI: 1.02, 1.13). The association was stronger among patients presenting with perforated appendicitis for the 7-day average (pooled OR = 1.22; 95% CI: 1.09, 1.36) when compared with the corresponding estimate for nonperforated appendicitis [7-day average (pooled OR = 1.02, 95% CI: 0.95, 1.09)]. Heterogeneity was not statistically significant across cities for either perforated or nonperforated appendicitis (p > 0.20). Conclusions: Higher levels of ambient O3 exposure may increase the risk of perforated appendicitis. PMID:23842601
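The abstract reports that city-specific odds ratios were combined by a random-effects meta-analysis but does not name the estimator. As a sketch only, the code below assumes the DerSimonian-Laird method and recovers each standard error from a 95% confidence interval; the function name and the three city-level inputs are hypothetical, not the study's values.

```python
import math


def pool_odds_ratios(odds_ratios, ci_lowers, ci_uppers, z=1.96):
    """Pool odds ratios with a DerSimonian-Laird random-effects model.

    Returns (pooled OR, lower 95% limit, upper 95% limit, tau^2).
    """
    # Work on the log scale; derive each standard error from the CI width.
    y = [math.log(o) for o in odds_ratios]
    v = [((math.log(u) - math.log(l)) / (2 * z)) ** 2
         for l, u in zip(ci_lowers, ci_uppers)]

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))

    # Between-study variance, truncated at zero.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)

    # Random-effects weights and pooled estimate.
    w_star = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return math.exp(mu), math.exp(mu - z * se), math.exp(mu + z * se), tau2


# Hypothetical city-level estimates, for illustration only.
pooled, lower, upper, tau2 = pool_odds_ratios(
    odds_ratios=[1.05, 1.12, 0.98],
    ci_lowers=[0.95, 1.01, 0.88],
    ci_uppers=[1.16, 1.24, 1.09],
)
print(f"pooled OR = {pooled:.2f} (95% CI: {lower:.2f}, {upper:.2f}), tau^2 = {tau2:.4f}")
```

When the input estimates are identical, the between-city variance is estimated as zero and the result reduces to the fixed-effect inverse-variance estimate.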
Logistic regression for risk factor modelling in stuttering research.
Reed, Phil; Wu, Yaqionq
2013-06-01
To outline the uses of logistic regression and other statistical methods for risk factor analysis in the context of research on stuttering. The principles underlying the application of a logistic regression are illustrated, and the types of questions to which such a technique has been applied in the stuttering field are outlined. The assumptions and limitations of the technique are discussed with respect to existing stuttering research, and with respect to formulating appropriate research strategies to accommodate these considerations. Finally, some alternatives to the approach are briefly discussed. The way the statistical procedures are employed is demonstrated with some hypothetical data. Research into several practical issues concerning stuttering could benefit if risk factor modelling were used. Important examples are early diagnosis, prognosis (whether a child will recover or persist) and assessment of treatment outcome. After reading this article you will: (a) Summarize the situations in which logistic regression can be applied to a range of issues about stuttering; (b) Follow the steps in performing a logistic regression analysis; (c) Describe the assumptions of the logistic regression technique and the precautions that need to be checked when it is employed; (d) Be able to summarize its advantages over other techniques like estimation of group differences and simple regression. Copyright © 2012 Elsevier Inc. All rights reserved.
Predicting stress in pre-registration nursing students.
Pryjmachuk, Steven; Richards, David A
2007-02-01
To determine which variables from a pool of potential predictors predict General Health Questionnaire 'caseness' in pre-registration nursing students. Cross-sectional survey, utilizing self-report measures of sources of stress, stress (psychological distress) and coping, together with pertinent demographic measures such as sex, ethnicity, educational programme and nursing specialty being pursued, and age, social class and highest qualifications on entry to the programme. Questionnaire packs were distributed to all pre-registration nursing students (N=1,362) in a large English university. Completed packs were coded, entered into statistical software and subjected to a series of logistic regression analyses. Of the questionnaire packs 1,005 (74%) were returned, of which up to 973 were available for the regression analyses undertaken. Four logistic regression models were considered and, on the principle of parsimony, a single model was chosen for discussion. This model suggested that the key predictors of caseness in the population studied were self-report of pressure, whether or not respondents had children (specifically, whether these children were pre-school or school-age), scores on a 'personal problems' scale and the type of coping employed. The overall caseness rate among the population was around one-third. Since self-report and personal, rather than academic, concerns predict stress, personal teachers need to play a key role in supporting students through 'active listening', especially when students self-report high levels of stress and where personal/social problems are evident. The work-life balance of students, especially those with child-care responsibilities, should be a central tenet in curriculum design in nurse education (and, indeed, the education of other professional and occupational groups). There may be some benefit in offering stress management (coping skills) training to nursing students and, indeed, students of other disciplines.
Dynamic Dimensionality Selection for Bayesian Classifier Ensembles
2015-03-19
learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very competitive with Logistic Regression but much more...classifier, Generative learning, Discriminative learning, Naïve Bayes, Feature selection, Logistic regression, higher order attribute independence 16...discriminative learning of weights in an otherwise generatively learned naive Bayes classifier. WANBIA-C is very competitive with Logistic Regression but
Travis Woolley; David C. Shaw; Lisa M. Ganio; Stephen Fitzgerald
2012-01-01
Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed burns and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate...
Preserving Institutional Privacy in Distributed binary Logistic Regression.
Wu, Yuan; Jiang, Xiaoqian; Ohno-Machado, Lucila
2012-01-01
Privacy is becoming a major concern when sharing biomedical data across institutions. Although methods for protecting privacy of individual patients have been proposed, it is not clear how to protect the institutional privacy, which is often a critical concern of data custodians. Built upon our previous work, Grid Binary LOgistic REgression (GLORE), we developed an Institutional Privacy-preserving Distributed binary Logistic Regression model (IPDLR) that considers both individual and institutional privacy for building a logistic regression model in a distributed manner. We tested our method using both simulated and clinical data, showing how it is possible to protect the privacy of individuals and of institutions using a distributed strategy.
Covariate Imbalance and Adjustment for Logistic Regression Analysis of Clinical Trial Data
Ciolino, Jody D.; Martin, Reneé H.; Zhao, Wenle; Jauch, Edward C.; Hill, Michael D.; Palesch, Yuko Y.
2014-01-01
In logistic regression analysis for binary clinical trial data, adjusted treatment effect estimates are often not equivalent to unadjusted estimates in the presence of influential covariates. This paper uses simulation to quantify the benefit of covariate adjustment in logistic regression. However, International Conference on Harmonization guidelines suggest that covariate adjustment be pre-specified. Unplanned adjusted analyses should be considered secondary. Results suggest that if adjustment is not possible or unplanned in a logistic setting, balance in continuous covariates can alleviate some (but never all) of the shortcomings of unadjusted analyses. The case of log binomial regression is also explored. PMID:24138438
Differentially private distributed logistic regression using private and public data.
Ji, Zhanglong; Jiang, Xiaoqian; Wang, Shuang; Xiong, Li; Ohno-Machado, Lucila
2014-01-01
Privacy protection is an important issue in medical informatics and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise and make outputs less useful. Given available public data in medical research (e.g. from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. In this paper, we modify the update step in the Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. We try our algorithm on three different data sets, and show its advantage over: (1) a logistic regression model based solely on public data, and (2) a differentially private distributed logistic regression model based on private data under various scenarios. Logistic regression models built with our new algorithm based on both private and public datasets demonstrate better utility than models trained on private or public datasets alone without sacrificing the rigorous privacy guarantee.
Pang, Tiantian; Huang, Leidan; Deng, Yingyuan; Wang, Tianfu; Chen, Siping; Gong, Xuehao; Liu, Weixiang
2017-01-01
The aim of the study is to screen the significant sonographic features by logistic regression analysis and fit a model to diagnose thyroid nodules. A total of 525 pathological thyroid nodules were retrospectively analyzed. All the nodules underwent conventional ultrasonography (US), strain elastosonography (SE), and contrast-enhanced ultrasound (CEUS). Those nodules' 12 suspicious sonographic features were used to assess thyroid nodules. The significant features of diagnosing thyroid nodules were picked out by logistic regression analysis. All variables that were statistically related to diagnosis of thyroid nodules, at a level of p < 0.05 were embodied in a logistic regression analysis model. The significant features in the logistic regression model of diagnosing thyroid nodules were calcification, suspected cervical lymph node metastasis, hypoenhancement pattern, margin, shape, vascularity, posterior acoustic, echogenicity, and elastography score. According to the results of logistic regression analysis, the formula that could predict whether or not thyroid nodules are malignant was established. The area under the receiver operating curve (ROC) was 0.930 and the sensitivity, specificity, accuracy, positive predictive value, and negative predictive value were 83.77%, 89.56%, 87.05%, 86.04%, and 87.79% respectively. PMID:29228030
Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi
2017-06-01
Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB (p < 0.05). Identifying and training mothers at risk as well as improving prenatal care may reduce the PTB rate. We also recommend that statisticians utilize the logistic regression model for the classification of risk groups for PTB.
A Highly Efficient Design Strategy for Regression with Outcome Pooling
Mitchell, Emily M.; Lyles, Robert H.; Manatunga, Amita K.; Perkins, Neil J.; Schisterman, Enrique F.
2014-01-01
The potential for research involving biospecimens can be hindered by the prohibitive cost of performing laboratory assays on individual samples. To mitigate this cost, strategies such as randomly selecting a portion of specimens for analysis or randomly pooling specimens prior to performing laboratory assays may be employed. These techniques, while effective in reducing cost, are often accompanied by a considerable loss of statistical efficiency. We propose a novel pooling strategy based on the k-means clustering algorithm to reduce laboratory costs while maintaining a high level of statistical efficiency when predictor variables are measured on all subjects, but the outcome of interest is assessed in pools. We perform simulations motivated by the BioCycle study to compare this k-means pooling strategy with current pooling and selection techniques under simple and multiple linear regression models. While all of the methods considered produce unbiased estimates and confidence intervals with appropriate coverage, pooling under k-means clustering provides the most precise estimates, closely approximating results from the full data and losing minimal precision as the total number of pools decreases. The benefits of k-means clustering evident in the simulation study are then applied to an analysis of the BioCycle dataset. In conclusion, when the number of lab tests is limited by budget, pooling specimens based on k-means clustering prior to performing lab assays can be an effective way to save money with minimal information loss in a regression setting. PMID:25220822
Kaddoura, Mahmoud A; Flint, Elizabeth P; Van Dyke, Olga; Yang, Qing; Chiang, Li-Chi
Relatively few studies have addressed predictors of first-attempt outcomes (pass-fail) on the National Council Licensure Examination-Registered Nurses (NCLEX-RN) for accelerated BSN programs. The purpose of this study was to compare potential predictors of NCLEX outcomes in graduates of first-degree accelerated (FDA; n=62) and second-degree accelerated (SDA; n=173) BSN programs sharing a common nursing curriculum. In this retrospective study, bivariate analyses and multiple logistic regression assessed significance of selected demographic and academic characteristics as predictors of NCLEX-RN outcomes. FDA graduates were more likely than SDA graduates to fail the NCLEX-RN (P=.0013). FDA graduates were more likely to speak English as a second or additional language (P<.0001), have lower end-of-program GPA and HESI Exit Exam scores (both P<.0001), and have a higher proportion of grades ≤ C (P=.0023). All four variables were significant predictors of NCLEX-RN outcomes within both FDA and SDA programs. The only significant predictors in adjusted logistic regression of NCLEX-RN outcome for the pooled FDA+SDA graduate sample were proportion of grades ≤ C (a predictor of NCLEX-RN failure) and HESI Exit Exam score (a predictor of passing NCLEX-RN). Grades of C or lower on any course may indicate inadequate mastery of critical NCLEX-RN content and increased risk of NCLEX-RN failure. Copyright © 2016 Elsevier Inc. All rights reserved.
Quist, M.C.; Rahel, F.J.; Hubert, W.A.
2005-01-01
Understanding factors related to the occurrence of species across multiple spatial and temporal scales is critical to the conservation and management of native fishes, especially for those species at the edge of their natural distribution. We used the concept of hierarchical faunal filters to provide a framework for investigating the influence of habitat characteristics and nonnative piscivores on the occurrence of 10 native fishes in streams of the North Platte River watershed in Wyoming. Three faunal filters were developed for each species: (i) large-scale biogeographic, (ii) local abiotic, and (iii) biotic. The large-scale biogeographic filter, composed of elevation and stream-size thresholds, was used to determine the boundaries within which each species might be expected to occur. Then, a local abiotic filter (i.e., habitat associations), developed using binary logistic-regression analysis, estimated the probability of occurrence of each species from features such as maximum depth, substrate composition, submergent aquatic vegetation, woody debris, and channel morphology (e.g., amount of pool habitat). Lastly, a biotic faunal filter was developed using binary logistic regression to estimate the probability of occurrence of each species relative to the abundance of nonnative piscivores in a reach. Conceptualising fish assemblages within a framework of hierarchical faunal filters is simple and logical, helps direct conservation and management activities, and provides important information on the ecology of fishes in the western Great Plains of North America. © Blackwell Munksgaard, 2004.
Doering, Stefan; Bose-O'Reilly, Stephan; Berger, Ursula
2016-01-01
The continuous exposure to inorganic mercury vapour in artisanal small-scale gold mining (ASGM) areas leads to chronic health problems. It is therefore essential to have a quick, but reliable risk assessing tool to diagnose chronic inorganic mercury intoxication. This study re-evaluates the state-of-the-art toolkit to diagnose chronic inorganic mercury intoxication by analysing data from multiple pooled cross-sectional studies. The primary research question aims to reduce the currently used set of indicators without essentially affecting the capability to diagnose chronic inorganic mercury intoxication. In addition, a sensitivity analysis is performed on established biomonitoring exposure limits for mercury in blood, hair, urine and urine adjusted by creatinine, where the biomonitoring exposure limits are compared to thresholds most associated with chronic inorganic mercury intoxication in artisanal small-scale gold mining. Health data from miners and community members in Indonesia, Tanzania and Zimbabwe were obtained as part of the Global Mercury Project and pooled into one dataset together with their biomarkers mercury in urine, blood and hair. The individual prognostic impact of the indicators on the diagnosis of mercury intoxication is quantified using logistic regression models. The selection is performed by a stepwise forward/backward selection. Different models are compared based on the Bayesian information criterion (BIC) and Cohen's kappa is used to evaluate the level of agreement between the diagnosis of mercury intoxication based on the currently used set of indicators and the result based on our reduced set of indicators. The sensitivity analysis of biomarker exposure limits of mercury is based on a sequence of chi square tests. The variable selection in logistic regression reduced the number of medical indicators from thirteen to ten in addition to the biomarkers. 
The estimated level of agreement using ten of thirteen medical indicators and all four biomarkers to diagnose chronic inorganic mercury intoxication yields a Cohen's kappa of 0.87. While in an additional stepwise selection the biomarker blood was not selected, the level of agreement based on ten medical indicators and only the three biomarkers urine, urine/creatinine and hair reduced Cohen's kappa to 0.46. The optimal cut-points for the biomarkers blood, hair, urine and urine/creatinine were estimated at 11.6 μg/l, 3.84 μg/g, 24.4 μg/l and 4.26 μg/g, respectively. The results show that a reduction down to only ten indicators still allows a reliable diagnosis of chronic inorganic mercury intoxication. This reduction of indicators will simplify health assessments in artisanal small-scale gold mining areas.
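Cohen's kappa, used above to compare diagnoses from the full and reduced indicator sets, measures agreement between two binary classifications after correcting for chance. The following is a minimal sketch for the 2x2 case; the function name and the cell counts are hypothetical and are not taken from the study.

```python
def cohens_kappa(both_pos, first_only, second_only, both_neg):
    """Cohen's kappa for two binary ratings of the same subjects.

    both_pos:    both methods diagnose intoxication
    first_only:  only the first method diagnoses intoxication
    second_only: only the second method diagnoses intoxication
    both_neg:    neither method diagnoses intoxication
    """
    n = both_pos + first_only + second_only + both_neg
    observed = (both_pos + both_neg) / n
    # Chance agreement from the marginal totals of each method.
    first_pos = both_pos + first_only
    second_pos = both_pos + second_only
    expected = (first_pos * second_pos
                + (n - first_pos) * (n - second_pos)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical agreement table, for illustration only.
kappa = cohens_kappa(both_pos=20, first_only=5, second_only=10, both_neg=15)
print(f"kappa = {kappa:.2f}")
```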
Logistic regression for dichotomized counts.
Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W
2016-12-01
Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren. © The Author(s) 2014.
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. Explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. 
Furthermore, the developed CLR models tend to achieve better sensitivity of more than 10% over the standard classification models, which can be translated to correct labeling of an additional 400 - 500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictors identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise the awareness of collecting data on additional markers and developing necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
Interpretation of commonly used statistical regression models.
Kasza, Jessica; Wolfe, Rory
2014-01-01
A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
Niche induced cell death and epithelial phagocytosis regulate hair follicle stem cell pool
Mesa, Kailin R.; Rompolas, Panteleimon; Zito, Giovanni; Myung, Peggy; Sun, Thomas Yang; Brown, Samara; Gonzalez, David; Blagoev, Krastan B.; Haberman, Ann M.; Greco, Valentina
2015-01-01
Summary Tissue homeostasis is achieved through a balance of cell production (growth) and elimination (regression)1,2. Contrary to tissue growth, the cells and molecular signals required for tissue regression remain unknown. To investigate physiological tissue regression, we use the mouse hair follicle, which cycles stereotypically between phases of growth and regression while maintaining a pool of stem cells to perpetuate tissue regeneration3. Here we show by intravital microscopy in live mice4–6 that the regression phase eliminates the majority of the epithelial cells by two distinct mechanisms: terminal differentiation of suprabasal cells and a spatial gradient of apoptosis of basal cells. Furthermore, we demonstrate that basal epithelial cells collectively act as phagocytes to clear dying epithelial neighbors. Through cellular and genetic ablation we show that epithelial cell death is extrinsically induced through TGFβ activation and mesenchymal crosstalk. Strikingly, our data show that regression acts to reduce the stem cell pool as inhibition of regression results in excess basal epithelial cells with regenerative abilities. This study identifies the cellular behaviors and molecular mechanisms of regression that counterbalance growth to maintain tissue homeostasis. PMID:25849774
ERIC Educational Resources Information Center
McKinley, Robert L.; Reckase, Mark D.
A two-stage study was conducted to compare the ability estimates yielded by tailored testing procedures based on the one-parameter logistic (1PL) and three-parameter logistic (3PL) models. The first stage of the study employed real data, while the second stage employed simulated data. In the first stage, response data for 3,000 examinees were…
Pförtner, Timo-Kolja; De Clercq, Bart; Lenzi, Michela; Vieno, Alessio; Rathmann, Katharina; Moor, Irene; Hublet, Anne; Molcho, Michal; Kunst, Anton E; Richter, Matthias
2015-12-01
To analyze how dimensions of social capital at the individual level are associated with adolescent smoking and whether associations differ by socioeconomic status. Data were from the 'Health Behaviour in School-aged Children' study 2005/2006 including 6511 15-year-old adolescents from Flemish Belgium, Canada, Romania and England. Socioeconomic status was measured using the Family Affluence Scale (FAS). Social capital was indicated by friend-related social capital, participation in school and voluntary organizations, trust and reciprocity in family, neighborhood and school. We conducted pooled logistic regression models with interaction terms and tested for cross-national differences. Almost all dimensions of social capital were associated with a lower likelihood of smoking, except for friend-related social capital and school participation. The association of family-related social capital with smoking was significantly stronger for low FAS adolescents, whereas the association of vertical trust and reciprocity in school with smoking was significantly stronger for high FAS adolescents. Social capital may act both as a protective and a risk factor for adolescent smoking. Achieving higher levels of family-related social capital might reduce socioeconomic inequalities in adolescent smoking.
Breen, Nancy; Liu, Benmei; Lee, Richard; Kagawa-Singer, Marjorie
2015-01-01
Objectives. We examined patterns of cervical and breast cancer screening among Asian American women in California and assessed their screening trends over time. Methods. We pooled weighted data from 5 cycles of the California Health Interview Survey (2001, 2003, 2005, 2007, 2009) to examine breast and cervical cancer screening trends and predictors among 6 Asian nationalities. We calculated descriptive statistics, bivariate associations, multivariate logistic regressions, predictive margins, and 95% confidence intervals. Results. Multivariate analyses indicated that Papanicolaou test rates did not significantly change over time (77.9% in 2001 vs 81.2% in 2007), but mammography receipt increased among Asian American women overall (75.6% in 2001 vs 81.8% in 2009). Length of time in the United States was associated with increased breast and cervical cancer screening among all nationalities. Sociodemographic and health care access factors had varied effects, with education and insurance coverage significantly predicting screening for certain groups. Overall, we observed striking variation by nationality. Conclusions. Our results underscore the need for intervention and policy efforts that are targeted to specific Asian nationalities, recent immigrants, and individuals without health care access to increase screening rates among Asian women in California. PMID:25521898
Sputum colour and bacteria in chronic bronchitis exacerbations: a pooled analysis.
Miravitlles, Marc; Kruesmann, Frank; Haverstock, Daniel; Perroncel, Renee; Choudhri, Shurjeel H; Arvis, Pierre
2012-06-01
We examined the correlation between sputum colour and the presence of potentially pathogenic bacteria in acute exacerbations of chronic bronchitis (AECBs). Data were pooled from six multicentre studies comparing moxifloxacin with other antimicrobials in patients with an AECB. Sputum was collected before antimicrobial therapy, and bacteria were identified by culture and Gram staining. Association between sputum colour and bacteria was determined using logistic regression. Of 4,089 sputum samples, a colour was reported in 4,003; 1,898 (46.4%) were culture-positive. Green or yellow sputum samples were most likely to yield bacteria (58.9% and 45.5% of samples, respectively), compared with 18% of clear and 39% of rust-coloured samples positive for potentially pathogenic microorganisms. Factors predicting a positive culture were sputum colour (the strongest predictor), sputum purulence, increased dyspnoea, male sex and absence of fever. Green or yellow versus white sputum colour was associated with a sensitivity of 94.7% and a specificity of 15% for the presence of bacteria. Sputum colour, particularly green and yellow, was a stronger predictor of potentially pathogenic bacteria than sputum purulence and increased dyspnoea in AECB patients. However, it does not necessarily predict the need for antibiotic treatment in all patients with AECB.
Choi, Seung Hoan; Labadorf, Adam T; Myers, Richard H; Lunetta, Kathryn L; Dupuis, Josée; DeStefano, Anita L
2017-02-06
Next generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate the data adaptive method in order to compare false positive rates. When the sample size is small or the expression levels of a gene are highly dispersed, the NB regression shows inflated Type-I error rates but the Classical logistic and Bayes logistic (BL) regressions are conservative. Firth's logistic (FL) regression performs well or is slightly conservative. Large sample size and low dispersion generally make Type-I error rates of all methods close to nominal alpha levels of 0.05 and 0.01. However, Type-I error rates are controlled after applying the data adaptive method. The NB, BL, and FL regressions gain increased power with large sample size, large log2 fold-change, and low dispersion. The FL regression has comparable power to NB regression. We conclude that implementing the data adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth's logistic regression provides a concise statistical inference process and reduces spurious associations from inaccurately estimated dispersion parameters in the negative binomial framework.
Differentially private distributed logistic regression using private and public data
2014-01-01
Background Privacy protection is an important issue in medical informatics, and differential privacy is a state-of-the-art framework for data privacy research. Differential privacy offers provable privacy against attackers who have auxiliary information, and can be applied to data mining models (for example, logistic regression). However, differentially private methods sometimes introduce too much noise and make outputs less useful. Given available public data in medical research (e.g. from patients who sign open-consent agreements), we can design algorithms that use both public and private data sets to decrease the amount of noise that is introduced. Methodology In this paper, we modify the update step in the Newton-Raphson method to propose a differentially private distributed logistic regression model based on both public and private data. Experiments and results We try our algorithm on three different data sets, and show its advantage over: (1) a logistic regression model based solely on public data, and (2) a differentially private distributed logistic regression model based on private data under various scenarios. Conclusion Logistic regression models built with our new algorithm based on both private and public datasets demonstrate better utility than models trained on private or public datasets alone, without sacrificing the rigorous privacy guarantee. PMID:25079786
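For orientation, the unmodified Newton-Raphson update that the abstract refers to can be sketched as below. This is a minimal, non-private illustration for a model with an intercept and a single covariate; it does not reproduce the authors' differentially private or distributed modification, and the function names and toy data are assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def newton_raphson_logistic(xs, ys, n_iter=25, tol=1e-10):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by plain Newton-Raphson.

    Hypothetical helper for illustration only; no privacy noise is added.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of X^T W X (the negative Hessian)
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve (X^T W X) * delta = gradient for the 2x2 case
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (-h01 * g0 + h00 * g1) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Toy, non-separable data (made up for the example)
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = newton_raphson_logistic(xs, ys)
```

At convergence the score equations hold, so the fitted probabilities sum to the number of observed events; that is a convenient sanity check on any variant of the update step.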
Park, Ji Hyun; Kim, Hyeon-Young; Lee, Hanna; Yun, Eun Kyoung
2015-12-01
This study compares the performance of the logistic regression and decision tree analysis methods for assessing the risk factors for infection in cancer patients undergoing chemotherapy. The subjects were 732 cancer patients who were receiving chemotherapy at K university hospital in Seoul, Korea. The data were collected between March 2011 and February 2013 and were processed for descriptive analysis, logistic regression and decision tree analysis using the IBM SPSS Statistics 19 and Modeler 15.1 programs. The most common risk factors for infection in cancer patients receiving chemotherapy were identified as alkylating agents, vinca alkaloid and underlying diabetes mellitus. The logistic regression explained 66.7% of the variation in the data in terms of sensitivity and 88.9% in terms of specificity. The decision tree analysis accounted for 55.0% of the variation in the data in terms of sensitivity and 89.0% in terms of specificity. As for the overall classification accuracy, the logistic regression explained 88.0% and the decision tree analysis explained 87.2%. The logistic regression analysis showed a higher degree of sensitivity and classification accuracy. Therefore, logistic regression analysis is concluded to be the more effective and useful method for establishing an infection prediction model for patients undergoing chemotherapy. Copyright © 2015 Elsevier Ltd. All rights reserved.
Yang, Lixue; Chen, Kean
2015-11-01
To improve the design of underwater target recognition systems based on auditory perception, this study compared human listeners with automatic classifiers. Performance measures and strategies in three discrimination experiments, including discriminations between man-made and natural targets, between ships and submarines, and among three types of ships, were used. In the experiments, the subjects were asked to assign a score to each sound based on how confident they were about the category to which it belonged, and logistic regression, which represents linear discriminative models, also completed three similar tasks by utilizing many auditory features. The results indicated that the performance of logistic regression improved as the ratio between inter- and intra-class differences became larger, whereas the performance of the human subjects was limited by their unfamiliarity with the targets. Logistic regression performed better than the human subjects in all tasks but the discrimination between man-made and natural targets, and the strategies employed by excellent human subjects were similar to those of logistic regression. Logistic regression and several human subjects demonstrated similar performance when discriminating man-made and natural targets, but in this case, their strategies were not similar. An appropriate fusion of their strategies led to further improvement in recognition accuracy.
NASA Astrophysics Data System (ADS)
Mei, Zhixiong; Wu, Hao; Li, Shiyun
2018-06-01
The Conversion of Land Use and its Effects at Small regional extent (CLUE-S), which is a widely used model for land-use simulation, utilizes logistic regression to estimate the relationships between land use and its drivers, and thus, predict land-use change probabilities. However, logistic regression disregards possible spatial autocorrelation and self-organization in land-use data. Autologistic regression can depict spatial autocorrelation but cannot address self-organization, while logistic regression by considering only self-organization (NE-logistic regression) fails to capture spatial autocorrelation. Therefore, this study developed a regression (NE-autologistic regression) method, which incorporated both spatial autocorrelation and self-organization, to improve CLUE-S. The Zengcheng District of Guangzhou, China was selected as the study area. The land-use data of 2001, 2005, and 2009, as well as 10 typical driving factors, were used to validate the proposed regression method and the improved CLUE-S model. Then, three future land-use scenarios in 2020: the natural growth scenario, ecological protection scenario, and economic development scenario, were simulated using the improved model. Validation results showed that NE-autologistic regression performed better than logistic regression, autologistic regression, and NE-logistic regression in predicting land-use change probabilities. The spatial allocation accuracy and kappa values of NE-autologistic-CLUE-S were higher than those of logistic-CLUE-S, autologistic-CLUE-S, and NE-logistic-CLUE-S for the simulations of two periods, 2001-2009 and 2005-2009, which proved that the improved CLUE-S model achieved the best simulation and was thereby effective to a certain extent. The scenario simulation results indicated that under all three scenarios, traffic land and residential/industrial land would increase, whereas arable land and unused land would decrease during 2009-2020. Apparent differences also existed in the simulated change sizes and locations of each land-use type under different scenarios. The results not only demonstrate the validity of the improved model but also provide a valuable reference for relevant policy-makers.
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
Binary logistic regression-Instrument for assessing museum indoor air impact on exhibits.
Bucur, Elena; Danet, Andrei Florin; Lehr, Carol Blaziu; Lehr, Elena; Nita-Lazar, Mihai
2017-04-01
This paper presents a new way to assess the environmental impact on historical artifacts using binary logistic regression. The prediction of the impact on the exhibits during certain pollution scenarios (environmental impact) was calculated by a mathematical model based on the binary logistic regression; it allows the identification of those environmental parameters from a multitude of possible parameters with a significant impact on exhibitions and ranks them according to the severity of their effect. Air quality (NO2, SO2, O3 and PM2.5) and microclimate parameters (temperature, humidity) monitoring data from a case study conducted within exhibition and storage spaces of the Romanian National Aviation Museum Bucharest have been used for developing and validating the binary logistic regression method and the mathematical model. The logistic regression analysis was used on 794 data combinations (715 to develop the model and 79 to validate it) using the Statistical Package for Social Sciences (SPSS 20.0). The results from the binary logistic regression analysis demonstrated that from six parameters taken into consideration, four of them present a significant effect upon exhibits in the following order: O3 > PM2.5 > NO2 > humidity, followed at a significant distance by the effects of SO2 and temperature. The mathematical model developed in this study correctly predicted 95.1% of the cumulated effect of the environmental parameters upon the exhibits. Moreover, this model could also be used in the decisional process regarding the preventive preservation measures that should be implemented within the exhibition space. The mathematical model developed on the environmental parameters analyzed by the binary logistic regression method could be useful in a decision-making process establishing the best measures for pollution reduction and preventive preservation of exhibits.
Determining factors influencing survival of breast cancer by fuzzy logistic regression model.
Nikbakht, Roya; Bahrampour, Abbas
2017-01-01
The fuzzy logistic regression model can be used for determining influential factors of disease. This study explores the important predictive factors for survival in patients with breast cancer. We used breast cancer data collected by the cancer registry of Kerman University of Medical Sciences during the period 2000-2007. Variables such as morphology, grade, age, and treatments (surgery, radiotherapy, and chemotherapy) were applied in the fuzzy logistic regression model. Performance of the model was determined in terms of mean degree of membership (MDM). The study results showed that almost 41% of patients were in the neoplasm and malignant group and more than two-thirds of them were still alive after 5-year follow-up. Based on the fuzzy logistic model, the most important factors influencing survival were chemotherapy, morphology, and radiotherapy, respectively. Furthermore, the MDM criterion shows that the fuzzy logistic regression has a good fit to the data (MDM = 0.86). The fuzzy logistic regression model showed that chemotherapy is more important than radiotherapy in survival of patients with breast cancer. In addition, this model can calculate possibilistic odds of survival in cancer patients. The results of this study can be applied in clinical research. There are few studies that have applied fuzzy logistic models, and we recommend using this model in various research areas.
Mixed conditional logistic regression for habitat selection studies.
Duchesne, Thierry; Fortin, Daniel; Courbin, Nicolas
2010-05-01
1. Resource selection functions (RSFs) are becoming a dominant tool in habitat selection studies. RSF coefficients can be estimated with unconditional (standard) and conditional logistic regressions. While the advantage of mixed-effects models is recognized for standard logistic regression, mixed conditional logistic regression remains largely overlooked in ecological studies. 2. We demonstrate the significance of mixed conditional logistic regression for habitat selection studies. First, we use spatially explicit models to illustrate how mixed-effects RSFs can be useful in the presence of inter-individual heterogeneity in selection and when the assumption of independence from irrelevant alternatives (IIA) is violated. The IIA hypothesis states that the strength of preference for habitat type A over habitat type B does not depend on the other habitat types also available. Secondly, we demonstrate the significance of mixed-effects models to evaluate habitat selection of free-ranging bison Bison bison. 3. When movement rules were homogeneous among individuals and the IIA assumption was respected, fixed-effects RSFs adequately described habitat selection by simulated animals. In situations violating the inter-individual homogeneity and IIA assumptions, however, RSFs were best estimated with mixed-effects regressions, and fixed-effects models could even provide faulty conclusions. 4. Mixed-effects models indicate that bison did not select farmlands, but exhibited strong inter-individual variations in their response to farmlands. Less than half of the bison preferred farmlands over forests. Conversely, the fixed-effect model simply suggested an overall selection for farmlands. 5. Conditional logistic regression is recognized as a powerful approach to evaluate habitat selection when resource availability changes. This regression is increasingly used in ecological studies, but almost exclusively in the context of fixed-effects models. 
Fitness maximization can imply differences in trade-offs among individuals, which can yield inter-individual differences in selection and lead to departure from IIA. These situations are best modelled with mixed-effects models. Mixed-effects conditional logistic regression should become a valuable tool for ecological research.
Advanced colorectal neoplasia risk stratification by penalized logistic regression.
Lin, Yunzhi; Yu, Menggang; Wang, Sijian; Chappell, Richard; Imperiale, Thomas F
2016-08-01
Colorectal cancer is the second leading cause of death from cancer in the United States. To facilitate the efficiency of colorectal cancer screening, there is a need to stratify risk for colorectal cancer among the 90% of US residents who are considered "average risk." In this article, we investigate such risk stratification rules for advanced colorectal neoplasia (colorectal cancer and advanced, precancerous polyps). We use a recently completed large cohort study of subjects who underwent a first screening colonoscopy. Logistic regression models have been used in the literature to estimate the risk of advanced colorectal neoplasia based on quantifiable risk factors. However, logistic regression may be prone to overfitting and instability in variable selection. Since most of the risk factors in our study have several categories, it was tempting to collapse these categories into fewer risk groups. We propose a penalized logistic regression method that automatically and simultaneously selects variables, groups categories, and estimates their coefficients by penalizing the [Formula: see text]-norm of both the coefficients and their differences. Hence, it encourages sparsity in the categories, i.e. grouping of the categories, and sparsity in the variables, i.e. variable selection. We apply the penalized logistic regression method to our data. The important variables are selected, with close categories simultaneously grouped, by penalized regression models with and without the interaction terms. The models are validated with 10-fold cross-validation. The receiver operating characteristic curves of the penalized regression models dominate the receiver operating characteristic curve of naive logistic regressions, indicating a superior discriminative performance. © The Author(s) 2013.
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.
2003-01-01
Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. 
The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity in each basin, particle size sorting, average storm intensity (millimeters per hour), soil organic matter content, soil permeability, and soil drainage. The results of this study demonstrate that logistic regression is a valuable tool for predicting the probability of debris flows occurring in recently-burned landscapes.
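The model form described in this abstract, a binary outcome whose probability is modeled from several independent variables, is the standard multiple logistic regression equation. It is written out below for reference; the symbols (x_i for the basin-level predictors, β_i for their fitted coefficients, k for the number of predictors) are generic notation assumed here, not values or names taken from the report.

```latex
P(\text{debris flow}) \;=\; \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
\qquad\Longleftrightarrow\qquad
\ln\!\frac{P}{1-P} \;=\; \beta_0 + \sum_{i=1}^{k} \beta_i x_i
```

Each coefficient β_i is therefore the change in the log-odds of a debris flow per unit change in x_i, with the other predictors held fixed.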
Cook, Michael B.; Guénel, Pascal; Gapstur, Susan M.; van den Brandt, Piet A.; Michels, Karin B.; Casagrande, John T.; Cooke, Rosie; Van Den Eeden, Stephen K.; Ewertz, Marianne; Falk, Roni T.; Gaudet, Mia M.; Gkiokas, George; Habel, Laurel A.; Hsing, Ann W.; Johnson, Kenneth; Kolonel, Laurence N.; La Vecchia, Carlo; Lynge, Elsebeth; Lubin, Jay H.; McCormack, Valerie A.; Negri, Eva; Olsson, Håkan; Parisi, Dominick; Petridou, Eleni Th.; Riboli, Elio; Sesso, Howard D.; Swerdlow, Anthony; Thomas, David B.; Willett, Walter C.; Brinton, Louise A.
2015-01-01
Background The etiology of male breast cancer is poorly understood, partly due to its relative rarity. Although tobacco and alcohol exposures are known carcinogens, their association with male breast cancer risk remains ill-defined. Methods The Male Breast Cancer Pooling Project consortium provided 2,378 cases and 51,959 controls for analysis from 10 case-control and 10 cohort studies. Individual participant data were harmonized and pooled. Unconditional logistic regression was used to estimate study design-specific (case-control/cohort) odds ratios (OR) and 95% confidence intervals (CI), which were then combined using fixed effects meta-analysis. Results Cigarette smoking status, smoking pack-years, duration, intensity, and age at initiation were not associated with male breast cancer risk. Relations with cigar and pipe smoking, tobacco chewing, and snuff use were also null. Recent alcohol consumption and average grams of alcohol consumed per day were also not associated with risk; only one sub-analysis of very high recent alcohol consumption (>60 grams/day) was tentatively associated with male breast cancer (ORunexposed referent=1.29, 95%CI:0.97–1.71; OR>0–<7 g/day referent=1.36, 95%CI:1.04–1.77). Specific alcoholic beverage types were not associated with male breast cancer. Relations were not altered when stratified by age or body mass index. Conclusions In this analysis of the Male Breast Cancer Pooling Project we found little evidence that tobacco and alcohol exposures were associated with risk of male breast cancer. Impact Tobacco and alcohol do not appear to be carcinogenic for male breast cancer. Future studies should aim to assess these exposures in relation to subtypes of male breast cancer. PMID:25515550
Cook, Michael B; Guénel, Pascal; Gapstur, Susan M; van den Brandt, Piet A; Michels, Karin B; Casagrande, John T; Cooke, Rosie; Van Den Eeden, Stephen K; Ewertz, Marianne; Falk, Roni T; Gaudet, Mia M; Gkiokas, George; Habel, Laurel A; Hsing, Ann W; Johnson, Kenneth; Kolonel, Laurence N; La Vecchia, Carlo; Lynge, Elsebeth; Lubin, Jay H; McCormack, Valerie A; Negri, Eva; Olsson, Håkan; Parisi, Dominick; Petridou, Eleni Th; Riboli, Elio; Sesso, Howard D; Swerdlow, Anthony; Thomas, David B; Willett, Walter C; Brinton, Louise A
2015-03-01
The etiology of male breast cancer is poorly understood, partly due to its relative rarity. Although tobacco and alcohol exposures are known carcinogens, their association with male breast cancer risk remains ill-defined. The Male Breast Cancer Pooling Project consortium provided 2,378 cases and 51,959 controls for analysis from 10 case-control and 10 cohort studies. Individual participant data were harmonized and pooled. Unconditional logistic regression was used to estimate study design-specific (case-control/cohort) ORs and 95% confidence intervals (CI), which were then combined using fixed-effects meta-analysis. Cigarette smoking status, smoking pack-years, duration, intensity, and age at initiation were not associated with male breast cancer risk. Relations with cigar and pipe smoking, tobacco chewing, and snuff use were also null. Recent alcohol consumption and average grams of alcohol consumed per day were also not associated with risk; only one subanalysis of very high recent alcohol consumption (>60 g/day) was tentatively associated with male breast cancer (ORunexposed referent = 1.29; 95% CI, 0.97-1.71; OR>0-<7 g/day referent = 1.36; 95% CI, 1.04-1.77). Specific alcoholic beverage types were not associated with male breast cancer. Relations were not altered when stratified by age or body mass index. In this analysis of the Male Breast Cancer Pooling Project, we found little evidence that tobacco and alcohol exposures were associated with risk of male breast cancer. Tobacco and alcohol do not appear to be carcinogenic for male breast cancer. Future studies should aim to assess these exposures in relation to subtypes of male breast cancer. ©2014 American Association for Cancer Research.
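The pooling step mentioned in the two records above (design-specific odds ratios combined by fixed-effects meta-analysis) is commonly done by inverse-variance weighting on the log-odds scale. The sketch below shows that generic calculation only; it is not the consortium's code, the two input estimates are hypothetical, and the standard errors are recovered from the 95% confidence limits under the usual assumption that the interval is symmetric on the log scale.

```python
import math

Z95 = 1.959963984540054  # 97.5th percentile of the standard normal

def se_from_ci(lower, upper):
    """Standard error of log(OR) implied by a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * Z95)

def fixed_effects_pool(estimates):
    """Inverse-variance fixed-effects pooling.

    estimates: list of (odds_ratio, ci_lower, ci_upper) tuples.
    Returns the pooled (odds_ratio, ci_lower, ci_upper).
    """
    num = 0.0
    den = 0.0
    for odds_ratio, lower, upper in estimates:
        weight = 1.0 / se_from_ci(lower, upper) ** 2
        num += weight * math.log(odds_ratio)
        den += weight
    pooled_log = num / den
    pooled_se = math.sqrt(1.0 / den)
    return (math.exp(pooled_log),
            math.exp(pooled_log - Z95 * pooled_se),
            math.exp(pooled_log + Z95 * pooled_se))

# Hypothetical design-specific estimates (not taken from the study)
case_control = (1.10, 0.85, 1.42)
cohort = (0.95, 0.70, 1.29)
pooled = fixed_effects_pool([case_control, cohort])
```

The pooled estimate always lies between the inputs and has a narrower interval than either one, because the weights (inverse variances) add.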
Bailey, Helen D; Fritschi, Lin; Metayer, Catherine; Infante-Rivard, Claire; Magnani, Corrado; Petridou, Eleni; Roman, Eve; Spector, Logan G; Kaatsch, Peter; Clavel, Jacqueline; Milne, Elizabeth; Dockerty, John D; Glass, Deborah C; Lightfoot, Tracy; Miligi, Lucia; Rudant, Jérémie; Baka, Margarita; Rondelli, Roberto; Amigou, Alicia; Simpson, Jill; Kang, Alice; Moschovi, Maria; Schüz, Joachim
2014-01-01
Purpose It has been suggested that parental occupational paint exposure around the time of conception or pregnancy increases the risk of childhood leukemia in the offspring. Methods We obtained individual level data from 13 case-control studies participating in the Childhood Leukemia International Consortium (CLIC). Occupational data were harmonized to a compatible format. Meta-analyses of study-specific odds ratios (ORs) were undertaken, as well as pooled analyses of individual data using unconditional logistic regression. Results Using individual data from fathers of 8,185 cases and 14,210 controls, the pooled OR for paternal exposure around conception and risk of acute lymphoblastic leukaemia (ALL) was 0.93 (95% confidence interval (CI) 0.76, 1.14). Analysis of data from 8,156 ALL case mothers and 14,568 control mothers produced a pooled OR of 0.81 (95% CI 0.39, 1.68) for exposure during pregnancy. For acute myeloid leukaemia (AML), the pooled ORs for paternal and maternal exposure were 0.96 (95% CI 0.65, 1.41) and 1.31 (95% CI 0.38, 4.47) respectively, based on data from 1,231 case and 11,392 control fathers and 1,329 case and 12,141 control mothers. Heterogeneity among the individual studies ranged from low to modest. Conclusions Null findings for paternal exposure for both ALL and AML are consistent with previous reports. Despite the large sample size, results for maternal exposure to paints in pregnancy were based on small numbers of exposed. Overall, we found no evidence that parental occupational exposure to paints increases the risk of leukemia in the offspring, but further data on home exposure are needed. PMID:25088805
Milne, Elizabeth; Greenop, Kathryn R; Petridou, Eleni; Bailey, Helen D; Orsi, Laurent; Kang, Alice Y; Baka, Margarita; Bonaventure, Audrey; Kourti, Maria; Metayer, Catherine; Clavel, Jacqueline
2018-06-01
The early onset of childhood acute lymphoblastic leukemia (ALL) suggests that critical exposures occurring during pregnancy may increase risk. We investigated the effects of maternal coffee and tea consumption during pregnancy on ALL risk by pooling data from eight case-control studies participating in the Childhood Leukemia International Consortium. Data on maternal coffee intake were available for 2,552 cases and 4,876 controls, and data on tea intake were available for 2,982 cases and 5,367 controls. Coffee and tea intake was categorized into 0, > 0-1, > 1-2, and > 2 cups/day, and covariates were combined and harmonized. Data on genetic variants in NAT2, CYP1A1, and NQO1 were also available in a subset. Pooled odds ratios (ORs) and 95% confidence intervals (CIs) were estimated using unconditional logistic regression, and linear trends across categories were assessed. No association was seen with 'any' maternal coffee consumption during pregnancy, but there was evidence of a positive exposure-response; the pooled OR for > 2 cups/day versus none was 1.27 (95% CI 1.09-1.43), p trend = 0.005. No associations were observed with tea consumption. No interactions were seen between coffee or tea intake and age, maternal smoking or genotype, and there was little or no evidence that associations with coffee or tea differed among cases with and without chromosomal translocations. Despite some limitations, our findings suggest that high coffee intake during pregnancy may increase risk of childhood ALL. Thus, current advice to limit caffeine intake during pregnancy to reduce risk of preterm birth may have additional benefits.
Zhang, Chao; Jia, Pengli; Yu, Liu; Xu, Chang
2018-05-01
Dose-response meta-analysis (DRMA) is widely applied to investigate the dose-specific relationship between independent and dependent variables. Such methods have been in use for over 30 years and are increasingly employed in healthcare and clinical decision-making. In this article, we give an overview of the methodology used in DRMA. We summarize the commonly used regression models and pooling methods in DRMA. We also use an example to illustrate how to conduct a DRMA with these methods. Five regression models (linear regression, piecewise regression, natural polynomial regression, fractional polynomial regression, and restricted cubic spline regression) are illustrated in this article to fit the dose-response relationship. Two types of pooling approaches, the one-stage approach and the two-stage approach, are illustrated to pool the dose-response relationship across studies. The example showed similar results among these models. Several dose-response meta-analysis methods can be used for investigating the relationship between exposure level and the risk of an outcome. However, the methodology of DRMA still needs to be improved. © 2018 Chinese Cochrane Center, West China Hospital of Sichuan University and John Wiley & Sons Australia, Ltd.
Ebrahimzadeh, Farzad; Hajizadeh, Ebrahim; Vahabi, Nasim; Almasian, Mohammad; Bakhteyar, Katayoon
2015-01-01
Background: Unwanted pregnancy not intended by at least one of the parents has undesirable consequences for the family and the society. In the present study, three classification models were used and compared to predict unwanted pregnancies in an urban population. Methods: In this cross-sectional study, 887 pregnant mothers referring to health centers in Khorramabad, Iran, in 2012 were selected by the stratified and cluster sampling; relevant variables were measured and for prediction of unwanted pregnancy, logistic regression, discriminant analysis, and probit regression models and SPSS software version 21 were used. To compare these models, indicators such as sensitivity, specificity, the area under the ROC curve, and the percentage of correct predictions were used. Results: The prevalence of unwanted pregnancies was 25.3%. The logistic and probit regression models indicated that parity and pregnancy spacing, contraceptive methods, household income and number of living male children were related to unwanted pregnancy. The performance of the models based on the area under the ROC curve was 0.735, 0.733, and 0.680 for logistic regression, probit regression, and linear discriminant analysis, respectively. Conclusion: Given the relatively high prevalence of unwanted pregnancies in Khorramabad, it seems necessary to revise family planning programs. Despite the similar accuracy of the models, if the researcher is interested in the interpretability of the results, the use of the logistic regression model is recommended. PMID:26793655
Soil Bulk Density by Soil Type, Land Use and Data Source: Putting the Error in SOC Estimates
NASA Astrophysics Data System (ADS)
Wills, S. A.; Rossi, A.; Loecke, T.; Ramcharan, A. M.; Roecker, S.; Mishra, U.; Waltman, S.; Nave, L. E.; Williams, C. O.; Beaudette, D.; Libohova, Z.; Vasilas, L.
2017-12-01
An important part of SOC stock and pool assessment is the assessment, estimation, and application of bulk density estimates. The concept of bulk density is relatively simple (the mass of soil in a given volume), but bulk density can be difficult to measure in soils due to logistical and methodological constraints. While many estimates of SOC pools use legacy data in their estimates, few concerted efforts have been made to assess the process used to convert laboratory carbon concentration measurements and bulk density collection into volumetrically based SOC estimates. The methodologies used are particularly sensitive in wetlands and organic soils with high amounts of carbon and very low bulk densities. We will present an analysis across four database measurements: NCSS - the National Cooperative Soil Survey Characterization dataset, RaCA - the Rapid Carbon Assessment sample dataset, NWCA - the National Wetland Condition Assessment, and ISCN - the International Soil Carbon Network. The relationship between bulk density and soil organic carbon will be evaluated by dataset and land use/land cover information. Prediction methods (both regression and machine learning) will be compared and contrasted across datasets and available input information. The assessment and application of bulk density, including modeling, aggregation and error propagation, will be evaluated. Finally, recommendations will be made about both the use of new data in soil survey products (such as SSURGO) and the use of that information as legacy data in SOC pool estimates.
Stolzenberg-Solomon, Rachael Z; Jacobs, Eric J; Arslan, Alan A; Qi, Dai; Patel, Alpa V; Helzlsouer, Kathy J; Weinstein, Stephanie J; McCullough, Marjorie L; Purdue, Mark P; Shu, Xiao-Ou; Snyder, Kirk; Virtamo, Jarmo; Wilkins, Lynn R; Yu, Kai; Zeleniuch-Jacquotte, Anne; Zheng, Wei; Albanes, Demetrius; Cai, Qiuyin; Harvey, Chinonye; Hayes, Richard; Clipp, Sandra; Horst, Ronald L; Irish, Lonn; Koenig, Karen; Le Marchand, Loic; Kolonel, Laurence N
2010-07-01
Results from epidemiologic studies examining pancreatic cancer risk and vitamin D intake or 25-hydroxyvitamin D (25(OH)D) concentrations (the best indicator of vitamin D derived from diet and sun) have been inconsistent. Therefore, the authors conducted a pooled nested case-control study of participants from 8 cohorts within the Cohort Consortium Vitamin D Pooling Project of Rarer Cancers (VDPP) (1974-2006) to evaluate whether prediagnostic circulating 25(OH)D concentrations were associated with the development of pancreatic cancer. In total, 952 incident pancreatic adenocarcinoma cases occurred among participants (median follow-up, 6.5 years). Controls (n = 1,333) were matched to each case by cohort, age, sex, race/ethnicity, date of blood draw, and follow-up time. Conditional logistic regression analysis was used to calculate smoking-, body mass index-, and diabetes-adjusted odds ratios and 95% confidence intervals for pancreatic cancer. Clinically relevant 25(OH)D cutpoints were compared with a referent category of 50-<75 nmol/L. No significant associations were observed for participants with lower 25(OH)D status. However, a high 25(OH)D concentration (≥100 nmol/L) was associated with a statistically significant 2-fold increase in pancreatic cancer risk overall (odds ratio = 2.12, 95% confidence interval: 1.23, 3.64). Given this result, recommendations to increase vitamin D concentrations in healthy persons for the prevention of cancer should be carefully considered.
Kawakita, Daisuke; Lee, Yuan-Chin Amy; Turati, Federica; Parpinel, Maria; Decarli, Adriano; Serraino, Diego; Matsuo, Keitaro; Olshan, Andrew F; Zevallos, Jose P; Winn, Deborah M; Moysich, Kirsten; Zhang, Zuo-Feng; Morgenstern, Hal; Levi, Fabio; Kelsey, Karl; McClean, Michael; Bosetti, Cristina; Garavello, Werner; Schantz, Stimson; Yu, Guo-Pei; Boffetta, Paolo; Chuang, Shu-Chun; Hashibe, Mia; Ferraroni, Monica; La Vecchia, Carlo; Edefonti, Valeria
2017-11-01
The possible role of dietary fiber in the etiology of head and neck cancers (HNCs) is unclear. We used individual-level pooled data from ten case-control studies (5959 cases and 12,248 controls) participating in the International Head and Neck Cancer Epidemiology (INHANCE) consortium to examine the association between fiber intake and cancer of the oral cavity/pharynx and larynx. Odds ratios (ORs) and their 95% confidence intervals (CIs) were estimated using unconditional multiple logistic regression applied to quintile categories of non-alcohol energy-adjusted fiber intake and adjusted for tobacco and alcohol use and other known or putative confounders. Fiber intake was inversely associated with oral and pharyngeal cancer combined (OR for 5th vs. 1st quintile category = 0.49, 95% CI: 0.40-0.59; p for trend <0.001) and with laryngeal cancer (OR = 0.66, 95% CI: 0.54-0.82, p for trend <0.001). There was, however, appreciable heterogeneity of the estimated effect across studies for oral and pharyngeal cancer combined. Nonetheless, inverse associations were consistently observed for the subsites of oral and pharyngeal cancers and within most strata of the considered covariates, for both cancer sites. Our findings from a large-scale multicenter pooled analysis suggest that, despite between-study heterogeneity, a greater intake of fiber may lower HNC risk. © 2017 UICC.
Orsi, Laurent; Magnani, Corrado; Petridou, Eleni T; Dockerty, John D; Metayer, Catherine; Milne, Elizabeth; Bailey, Helen D; Dessypris, Nick; Kang, Alice Y; Wesseling, Catharina; Infante-Rivard, Claire; Wünsch-Filho, Victor; Mora, Ana M; Spector, Logan G; Clavel, Jacqueline
2018-06-01
The associations between childhood acute lymphoblastic leukemia (ALL) and several factors related to early stimulation of the immune system, that is, farm residence and regular contacts with farm animals (livestock, poultry) or pets in early childhood, were investigated using data from 13 case-control studies participating in the Childhood Leukemia International Consortium. The sample included 7847 ALL cases and 11,667 controls aged 1-14 years. In all studies, the data were obtained from case and control parents using standardized questionnaires. Pooled odds ratios (ORs) and 95% confidence intervals (CIs) were estimated by unconditional logistic regression adjusted for age, sex, study, maternal education, and maternal age. Contact with livestock in the first year of life was inversely associated with ALL (OR = 0.65, 95% CI: 0.50, 0.85). Inverse associations were also observed for contact with dogs (OR = 0.92, 95% CI: 0.86, 0.99) and cats (OR = 0.87, 95% CI: 0.80, 0.94) in the first year of life. There was no evidence of a significant association with farm residence in the first year of life. The findings of these large pooled and meta-analyses add additional evidence to the hypothesis that regular contact with animals in early childhood is inversely associated with childhood ALL occurrence which is consistent with Greaves' delayed infection hypothesis. © 2018 The Authors. Cancer Medicine published by John Wiley & Sons Ltd.
Kempe, P T; van Oppen, P; de Haan, E; Twisk, J W R; Sluis, A; Smit, J H; van Dyck, R; van Balkom, A J L M
2007-09-01
Two methods for predicting remissions in obsessive-compulsive disorder (OCD) treatment are evaluated. Y-BOCS measurements of 88 patients with a primary OCD (DSM-III-R) diagnosis were performed over a 16-week treatment period, and during three follow-ups. Remission at any measurement was defined as a Y-BOCS score lower than thirteen combined with a reduction of seven points when compared with baseline. Logistic regression models were compared with a Cox regression for recurrent events model. Logistic regression yielded different models at different evaluation times. The recurrent events model remained stable when fewer measurements were used. Higher baseline levels of neuroticism and more severe OCD symptoms were associated with a lower chance of remission, early age of onset and more depressive symptoms with a higher chance. Choice of outcome time affects logistic regression prediction models. Recurrent events analysis uses all information on remissions and relapses. Short- and long-term predictors for OCD remission show overlap.
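The remission definition above (a Y-BOCS score lower than thirteen combined with a reduction of seven points from baseline) can be written as a small predicate. This is a sketch only: the function name `is_remission` is hypothetical, and reading "a reduction of seven points" as "at least seven points" is an assumption, since the abstract does not say whether larger reductions also count.

```python
def is_remission(baseline_score, current_score, cutoff=13, min_reduction=7):
    """Return True if a Y-BOCS measurement counts as remission.

    Assumes remission requires BOTH conditions from the abstract:
    a current score strictly below the cutoff, and a drop from baseline
    of at least `min_reduction` points (the "at least" is an assumption).
    """
    below_cutoff = current_score < cutoff
    enough_reduction = (baseline_score - current_score) >= min_reduction
    return below_cutoff and enough_reduction


# Hypothetical patient trajectories, for illustration only.
remitted = is_remission(baseline_score=25, current_score=12)
small_drop = is_remission(baseline_score=18, current_score=12)
at_cutoff = is_remission(baseline_score=30, current_score=13)
```

Because the rule is evaluated at every measurement, a patient can enter and leave remission repeatedly, which is the situation a recurrent-events model is meant to handle.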
Hilton, N Zoe; Harris, Grant T; Rice, Marnie E; Lang, Carol; Cormier, Catherine A; Lines, Kathryn J
2004-09-01
An actuarial assessment to predict male-to-female marital violence was constructed from a pool of potential predictors in a sample of 589 offenders identified in police records and followed up for an average of almost 5 years. Archival information in several domains (offender characteristics, domestic violence history, nondomestic criminal history, relationship characteristics, victim characteristics, index offense) and recidivism were subjected to setwise and stepwise logistic regression. The resulting 13-item scale, the Ontario Domestic Assault Risk Assessment (ODARA), showed a large effect size in predicting new assaults against legal or common-law wives or ex-wives (Cohen's d = 1.1, relative operating characteristic area =.77) and was associated with number and severity of new assaults and time until recidivism. Cross-validation and comparisons with other instruments are also reported.
Estimating the exceedance probability of rain rate by logistic regression
NASA Technical Reports Server (NTRS)
Chiu, Long S.; Kedem, Benjamin
1990-01-01
Recent studies have shown that the fraction of an area with rain intensity above a fixed threshold is highly correlated with the area-averaged rain rate. To estimate the fractional rainy area, a logistic regression model, which estimates the conditional probability that rain rate over an area exceeds a fixed threshold given the values of related covariates, is developed. The problem of dependency in the data in the estimation procedure is bypassed by the method of partial likelihood. Analyses of simulated scanning multichannel microwave radiometer and observed electrically scanning microwave radiometer data during the Global Atlantic Tropical Experiment period show that the use of logistic regression in pixel classification is superior to multiple regression in predicting whether rain rate at each pixel exceeds a given threshold, even in the presence of noisy data. The potential of the logistic regression technique in satellite rain rate estimation is discussed.
NASA Astrophysics Data System (ADS)
Cary, Theodore W.; Cwanger, Alyssa; Venkatesh, Santosh S.; Conant, Emily F.; Sehgal, Chandra M.
2012-03-01
This study compares the performance of two proven but very different machine learners, Naïve Bayes and logistic regression, for differentiating malignant and benign breast masses using ultrasound imaging. Ultrasound images of 266 masses were analyzed quantitatively for shape, echogenicity, margin characteristics, and texture features. These features along with patient age, race, and mammographic BI-RADS category were used to train Naïve Bayes and logistic regression classifiers to diagnose lesions as malignant or benign. ROC analysis was performed using all of the features and using only a subset that maximized information gain. Performance was determined by the area under the ROC curve, Az, obtained from leave-one-out cross validation. Naïve Bayes showed significant variation (Az 0.733 +/- 0.035 to 0.840 +/- 0.029, P < 0.002) with the choice of features, but the performance of logistic regression was relatively unchanged under feature selection (Az 0.839 +/- 0.029 to 0.859 +/- 0.028, P = 0.605). Out of 34 features, a subset of 6 gave the highest information gain: brightness difference, margin sharpness, depth-to-width, mammographic BI-RADs, age, and race. The probabilities of malignancy determined by Naïve Bayes and logistic regression after feature selection showed significant correlation (R2= 0.87, P < 0.0001). The diagnostic performance of Naïve Bayes and logistic regression can be comparable, but logistic regression is more robust. Since probability of malignancy cannot be measured directly, high correlation between the probabilities derived from two basic but dissimilar models increases confidence in the predictive power of machine learning models for characterizing solid breast masses on ultrasound.
Wang, Qingliang; Li, Xiaojie; Hu, Kunpeng; Zhao, Kun; Yang, Peisheng; Liu, Bo
2015-05-12
To explore the risk factors of portal hypertensive gastropathy (PHG) in patients with hepatitis B associated cirrhosis and establish a Logistic regression model of noninvasive prediction. The clinical data of 234 hospitalized patients with hepatitis B associated cirrhosis from March 2012 to March 2014 were analyzed retrospectively. The dependent variable was the occurrence of PHG while the independent variables were screened by binary Logistic analysis. Multivariate Logistic regression was used for further analysis of significant noninvasive independent variables. Logistic regression model was established and odds ratio was calculated for each factor. The accuracy, sensitivity and specificity of model were evaluated by the curve of receiver operating characteristic (ROC). According to univariate Logistic regression, the risk factors included hepatic dysfunction, albumin (ALB), bilirubin (TB), prothrombin time (PT), platelet (PLT), white blood cell (WBC), portal vein diameter, spleen index, splenic vein diameter, diameter ratio, PLT to spleen volume ratio, esophageal varices (EV) and gastric varices (GV). Multivariate analysis showed that hepatic dysfunction (X1), TB (X2), PLT (X3) and splenic vein diameter (X4) were the major occurring factors for PHG. The established regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4. The accuracy of model for PHG was 79.1% with a sensitivity of 77.2% and a specificity of 80.8%. Hepatic dysfunction, TB, PLT and splenic vein diameter are risk factors for PHG and the noninvasive predicted Logistic regression model was Logit P=-2.667+2.186X1-2.167X2+0.725X3+0.976X4.
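The reported model, Logit P = -2.667 + 2.186X1 - 2.167X2 + 0.725X3 + 0.976X4, can be turned into a predicted probability by applying the inverse logit. The sketch below uses the coefficients exactly as printed in the abstract; the function name `phg_probability` is hypothetical, and because the abstract does not say how X1 to X4 are coded or scaled, the input values shown are placeholders rather than clinically meaningful examples.

```python
import math

# Coefficients as reported in the abstract.
INTERCEPT = -2.667
B_X1 = 2.186   # hepatic dysfunction (X1)
B_X2 = -2.167  # TB (X2)
B_X3 = 0.725   # PLT (X3)
B_X4 = 0.976   # splenic vein diameter (X4)


def phg_probability(x1, x2, x3, x4):
    """Predicted probability of PHG from the reported logit model.

    The coding of x1..x4 is not given in the abstract, so callers must
    supply values on whatever scale the original model used (assumption).
    """
    logit = INTERCEPT + B_X1 * x1 + B_X2 * x2 + B_X3 * x3 + B_X4 * x4
    return 1.0 / (1.0 + math.exp(-logit))


# Placeholder inputs: all covariates at zero, then X1 switched to one.
p_reference = phg_probability(0, 0, 0, 0)
p_x1_only = phg_probability(1, 0, 0, 0)
```

Under this model each coefficient is a log odds ratio, so exponentiating a coefficient gives the odds ratio for a one-unit increase in that covariate with the others held fixed.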
Variable Selection in Logistic Regression.
1987-06-01
Z. D. Bai, P. R. Krishnaiah and L. C. Zhao. Center for Multivariate Analysis, University of Pittsburgh. Contract or grant number: F49620-85-C-0008.
NASA Astrophysics Data System (ADS)
Madhu, B.; Ashok, N. C.; Balasubramanian, S.
2014-11-01
Multinomial logistic regression analysis was used to develop a statistical model that can predict the probability of breast cancer in Southern Karnataka using the breast cancer occurrence data during 2007-2011. Independent socio-economic variables describing the breast cancer occurrence, such as age, education, occupation, parity, type of family, health insurance coverage, residential locality and socioeconomic status of each case, were obtained. The models were developed as follows: i) Spatial visualization of the urban-rural distribution of breast cancer cases that were obtained from the Bharat Hospital and Institute of Oncology. ii) Socio-economic risk factors describing the breast cancer occurrences were compiled for each case. These data were then analysed using multinomial logistic regression analysis in SPSS statistical software, relations between the occurrence of breast cancer across the socio-economic status and the influence of other socio-economic variables were evaluated, and multinomial logistic regression models were constructed. iii) The model that best predicted the occurrence of breast cancer was identified. This multivariate logistic regression model was entered into a geographic information system, and maps showing the predicted probability of breast cancer occurrence in Southern Karnataka were created. This study demonstrates that multinomial logistic regression is a valuable tool for developing models that predict the probability of breast cancer occurrence in Southern Karnataka.
Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H
2012-01-01
Background: The purpose of this investigation was to compare empirically predictive ability of an artificial neural network with a logistic regression in prediction of low back pain. Methods: Data from the second national health survey were considered in this investigation. This data includes the information of low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a set of 17294 data and they were validated in a test set of 17295 data. Hosmer and Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 inputs, 3 hidden and 1 output neurons was employed. The efficiency of two models was compared by receiver operating characteristic analysis, root mean square and -2 Loglikelihood criteria. Results: The area under the ROC curve (SE), root mean square and -2Loglikelihood of the logistic regression was 0.752 (0.004), 0.3832 and 14769.2, respectively. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the artificial neural network was 0.754 (0.004), 0.3770 and 14757.6, respectively. Conclusions: Based on these three criteria, artificial neural network would give better performance than logistic regression. Although, the difference is statistically significant, it does not seem to be clinically significant. PMID:23113198
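The three comparison criteria named above (area under the ROC curve, root mean square, and -2 log-likelihood) can each be computed from a vector of observed outcomes and predicted probabilities. The sketch below is an illustration under assumptions: "root mean square" is read as the root mean squared error of the predicted probabilities, the helper names are hypothetical, and the four data points are made up rather than drawn from the survey.

```python
import math


def roc_auc(outcomes, probabilities):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a correct pair.
    """
    positives = [p for y, p in zip(outcomes, probabilities) if y == 1]
    negatives = [p for y, p in zip(outcomes, probabilities) if y == 0]
    score = 0.0
    for p_pos in positives:
        for p_neg in negatives:
            if p_pos > p_neg:
                score += 1.0
            elif p_pos == p_neg:
                score += 0.5
    return score / (len(positives) * len(negatives))


def root_mean_square_error(outcomes, probabilities):
    """Square root of the mean squared difference between y and p."""
    squared = [(y - p) ** 2 for y, p in zip(outcomes, probabilities)]
    return math.sqrt(sum(squared) / len(squared))


def minus_two_log_likelihood(outcomes, probabilities):
    """-2 times the Bernoulli log-likelihood of the predictions."""
    log_lik = sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(outcomes, probabilities)
    )
    return -2.0 * log_lik


# Made-up outcomes and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 1]
p_hat = [0.1, 0.4, 0.35, 0.8]
auc_value = roc_auc(y_true, p_hat)
rmse_value = root_mean_square_error(y_true, p_hat)
deviance_value = minus_two_log_likelihood(y_true, p_hat)
```

Higher AUC and lower values of the other two criteria indicate better performance, which is the direction of comparison used in the abstract.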
NASA Astrophysics Data System (ADS)
Kamaruddin, Ainur Amira; Ali, Zalila; Noor, Norlida Mohd.; Baharum, Adam; Ahmad, Wan Muhamad Amir W.
2014-07-01
Logistic regression analysis examines the influence of various factors on a dichotomous outcome by estimating the probability of the event's occurrence. Logistic regression, also called a logit model, is a statistical procedure used to model dichotomous outcomes. In the logit model the log odds of the dichotomous outcome is modeled as a linear combination of the predictor variables. The log odds ratio in logistic regression provides a description of the probabilistic relationship of the variables and the outcome. In conducting logistic regression, selection procedures are used in selecting important predictor variables, diagnostics are used to check that assumptions are valid which include independence of errors, linearity in the logit for continuous variables, absence of multicollinearity, and lack of strongly influential outliers and a test statistic is calculated to determine the aptness of the model. This study used the binary logistic regression model to investigate overweight and obesity among rural secondary school students on the basis of their demographics profile, medical history, diet and lifestyle. The results indicate that overweight and obesity of students are influenced by obesity in family and the interaction between a student's ethnicity and routine meals intake. The odds of a student being overweight and obese are higher for a student having a family history of obesity and for a non-Malay student who frequently takes routine meals as compared to a Malay student.
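The logit model described above can be written out explicitly. The notation below is illustrative and not taken from the paper: p is the probability of the dichotomous outcome, x_1 to x_k are the predictors, and the last line shows how a two-way interaction of the kind reported (for example, ethnicity by routine meal intake) would enter the linear predictor.

```latex
\begin{align*}
\operatorname{logit}(p) &= \ln\frac{p}{1-p}
  = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k, \\
p &= \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)\bigr)}, \\
\mathrm{OR}_j &= e^{\beta_j}
  \quad \text{(odds ratio for a one-unit increase in } x_j \text{, other predictors fixed)}, \\
\operatorname{logit}(p) &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12}\, x_1 x_2
  \quad \text{(with an interaction between } x_1 \text{ and } x_2 \text{)}.
\end{align*}
```

With an interaction term, the odds ratio for x_1 is no longer a single number: it equals exp(β_1 + β_12 x_2) and so depends on the level of x_2.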
Understanding logistic regression analysis.
Sperandei, Sandro
2014-01-01
Logistic regression is used to obtain odds ratio in the presence of more than one explanatory variable. The procedure is quite similar to multiple linear regression, with the exception that the response variable is binomial. The result is the impact of each variable on the odds ratio of the observed event of interest. The main advantage is to avoid confounding effects by analyzing the association of all variables together. In this article, we explain the logistic regression procedure using examples to make it as simple as possible. After definition of the technique, the basic interpretation of the results is highlighted and then some special issues are discussed.
Quantifying Antimicrobial Resistance at Veal Calf Farms
Bosman, Angela B.; Wagenaar, Jaap; Stegeman, Arjan; Vernooij, Hans; Mevius, Dik
2012-01-01
This study was performed to determine a sampling strategy to quantify the prevalence of antimicrobial resistance on veal calf farms, based on the variation in antimicrobial resistance within and between calves on five farms. Faecal samples from 50 healthy calves (10 calves/farm) were collected. From each individual sample and one pooled faecal sample per farm, 90 selected Escherichia coli isolates were tested for their resistance against 25 mg/L amoxicillin, 25 mg/L tetracycline, 0.5 mg/L cefotaxime, 0.125 mg/L ciprofloxacin and 8/152 mg/L trimethoprim/sulfamethoxazole (tmp/s) by replica plating. From each faecal sample another 10 selected E. coli isolates were tested for their resistance by broth microdilution as a reference. Logistic regression analysis was performed to compare the odds of testing an isolate resistant between both test methods (replica plating vs. broth microdilution) and to evaluate the effect of pooling faecal samples. Bootstrap analysis was used to investigate the precision of the estimated prevalence of resistance to each antimicrobial obtained by several simulated sampling strategies. Replica plating showed similar odds of E. coli isolates tested resistant compared to broth microdilution, except for ciprofloxacin (OR 0.29, p≤0.05). Pooled samples showed in general lower odds of an isolate being resistant compared to individual samples, although these differences were not significant. Bootstrap analysis showed that within each antimicrobial the various compositions of a pooled sample provided consistent estimates for the mean proportion of resistant isolates. Sampling strategies should be based on the variation in resistance among isolates within faecal samples and between faecal samples, which may vary by antimicrobial. 
In our study, the optimal sampling strategy from the perspective of precision of the estimated levels of resistance and practicality consists of a pooled faecal sample from 20 individual animals, of which 90 isolates are tested for their susceptibility by replica plating. PMID:22970313
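The bootstrap step mentioned above, used to judge the precision of an estimated prevalence of resistance, can be sketched with a simple percentile bootstrap. This is an assumption-laden illustration: the study resampled under several simulated sampling strategies, whereas the code below resamples a single set of isolates. The function name `bootstrap_prevalence_ci` and the count of 27 resistant isolates out of 90 are hypothetical.

```python
import random


def bootstrap_prevalence_ci(isolates, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the proportion of resistant isolates.

    isolates: list of 0/1 flags (1 = resistant).
    Returns (point_estimate, lower, upper).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(isolates)
    estimates = []
    for _ in range(n_boot):
        # Resample n isolates with replacement and record the proportion.
        resample = [isolates[rng.randrange(n)] for _ in range(n)]
        estimates.append(sum(resample) / n)
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return sum(isolates) / n, lower, upper


# Hypothetical plate: 90 isolates, 27 of them resistant.
isolate_flags = [1] * 27 + [0] * 63
prevalence, ci_lower, ci_upper = bootstrap_prevalence_ci(isolate_flags)
```

Running the same function with different numbers of isolates shows how interval width shrinks as more isolates are tested, which is the kind of trade-off a sampling strategy has to weigh against practicality.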
ERIC Educational Resources Information Center
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
2017-03-23
PUBLIC RELEASE; DISTRIBUTION UNLIMITED Using Multiple and Logistic Regression to Estimate the Median Will- Cost and Probability of Cost and... Cost and Probability of Cost and Schedule Overrun for Program Managers Ryan C. Trudelle Follow this and additional works at: https://scholar.afit.edu...afit.edu. Recommended Citation Trudelle, Ryan C., "Using Multiple and Logistic Regression to Estimate the Median Will- Cost and Probability of Cost and
2013-11-01
Ptrend 0.78 0.62 0.75 Unconditional logistic regression was used to estimate odds ratios (OR) and 95 % confidence intervals (CI) for risk of node...Ptrend 0.71 0.67 Unconditional logistic regression was used to estimate odds ratios (OR) and 95 % confidence intervals (CI) for risk of high-grade tumors... logistic regression was used to estimate odds ratios (OR) and 95 % confidence intervals (CI) for the associations between each of the seven SNPs and
Kim, Sun Mi; Kim, Yongdai; Jeong, Kuhwan; Jeong, Heeyeong; Kim, Jiyoung
2018-01-01
The aim of this study was to compare the performance of image analysis for predicting breast cancer using two distinct regression models and to evaluate the usefulness of incorporating clinical and demographic data (CDD) into the image analysis in order to improve the diagnosis of breast cancer. This study included 139 solid masses from 139 patients who underwent an ultrasonography-guided core biopsy and had available CDD between June 2009 and April 2010. Three breast radiologists retrospectively reviewed 139 breast masses and described each lesion using the Breast Imaging Reporting and Data System (BI-RADS) lexicon. We applied and compared two regression methods, stepwise logistic (SL) regression and logistic least absolute shrinkage and selection operator (LASSO) regression, in which the BI-RADS descriptors and CDD were used as covariates. We investigated the performances of these regression methods and the agreement of radiologists in terms of test misclassification error and the area under the curve (AUC) of the tests. Logistic LASSO regression was superior (P<0.05) to SL regression, regardless of whether CDD was included in the covariates, in terms of test misclassification errors (0.234 vs. 0.253, without CDD; 0.196 vs. 0.258, with CDD) and AUC (0.785 vs. 0.759, without CDD; 0.873 vs. 0.735, with CDD). However, it was inferior (P<0.05) to the agreement of three radiologists in terms of test misclassification errors (0.234 vs. 0.168, without CDD; 0.196 vs. 0.088, with CDD) and the AUC without CDD (0.785 vs. 0.844, P<0.001), but was comparable to the AUC with CDD (0.873 vs. 0.880, P=0.141). Logistic LASSO regression based on BI-RADS descriptors and CDD showed better performance than SL in predicting the presence of breast cancer. The use of CDD as a supplement to the BI-RADS descriptors significantly improved the prediction of breast cancer using logistic LASSO regression.
Yu, Yuanyuan; Li, Hongkai; Sun, Xiaoru; Su, Ping; Wang, Tingting; Liu, Yi; Yuan, Zhongshang; Liu, Yanxun; Xue, Fuzhong
2017-12-28
Confounders can produce spurious associations between exposure and outcome in observational studies. For the majority of epidemiologists, adjusting for confounders using a logistic regression model is the habitual method, though it has some problems in accuracy and precision. It is, therefore, important to highlight the problems of logistic regression and to search for an alternative method. Four causal diagram models were defined to summarize confounding equivalence. Both theoretical proofs and simulation studies were performed to verify whether conditioning on different confounding equivalence sets had the same bias-reducing potential and then to select the optimum adjusting strategy, in which the logistic regression model and the inverse probability weighting based marginal structural model (IPW-based-MSM) were compared. The "do-calculus" was used to calculate the true causal effect of exposure on outcome, then the bias and standard error were used to evaluate the performances of different strategies. Adjusting for different sets of confounding equivalence, as judged by identical Markov boundaries, produced different bias-reducing potential in the logistic regression model. For the sets satisfying G-admissibility, adjusting for the set including all the confounders reduced the bias to the same extent as adjusting for the set containing the parent nodes of the outcome, while the bias after adjusting for the parent nodes of exposure was not equivalent to them. In addition, all causal effect estimations through logistic regression were biased, although the estimation after adjusting for the parent nodes of exposure was nearest to the true causal effect. However, conditioning on different confounding equivalence sets had the same bias-reducing potential under IPW-based-MSM.
Compared with logistic regression, the IPW-based-MSM could obtain unbiased causal effect estimation when the adjusted confounders satisfied G-admissibility and the optimal strategy was to adjust for the parent nodes of outcome, which obtained the highest precision. All adjustment strategies through logistic regression were biased for causal effect estimation, while IPW-based-MSM could always obtain unbiased estimation when the adjusted set satisfied G-admissibility. Thus, IPW-based-MSM was recommended to adjust for confounders set.
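As a rough illustration of the inverse probability weighting idea this abstract relies on, the sketch below estimates a marginal risk difference for a binary exposure by reweighting each subject by the inverse of the probability of the exposure actually received. It is a minimal sketch, not the authors' implementation: the function name `ipw_risk_difference` is hypothetical, and the propensity scores are assumed to have been estimated elsewhere from an admissible confounder set.

```python
def ipw_risk_difference(treated, outcome, propensity):
    """Normalized IPW estimate of E[Y(1)] - E[Y(0)] for a binary exposure.

    treated    : iterable of 0/1 exposure indicators
    outcome    : iterable of outcomes (0/1 or numeric)
    propensity : iterable of P(exposed | confounders), assumed strictly in (0, 1)
    """
    num1 = den1 = num0 = den0 = 0.0
    for a, y, ps in zip(treated, outcome, propensity):
        if a == 1:
            w = 1.0 / ps            # weight for exposed subjects
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - ps)    # weight for unexposed subjects
            num0 += w * y
            den0 += w
    # Weighted outcome means in the two pseudo-populations
    return num1 / den1 - num0 / den0
```

With a saturated marginal structural model for a binary exposure, the weighted difference in means above is the quantity the model targets; richer models would instead fit a weighted regression.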
Use and interpretation of logistic regression in habitat-selection studies
Keating, Kim A.; Cherry, Steve
2004-01-01
Logistic regression is an important tool for wildlife habitat-selection studies, but the method frequently has been misapplied due to an inadequate understanding of the logistic model, its interpretation, and the influence of sampling design. To promote better use of this method, we review its application and interpretation under 3 sampling designs: random, case-control, and use-availability. Logistic regression is appropriate for habitat use-nonuse studies employing random sampling and can be used to directly model the conditional probability of use in such cases. Logistic regression also is appropriate for studies employing case-control sampling designs, but careful attention is required to interpret results correctly. Unless bias can be estimated or probability of use is small for all habitats, results of case-control studies should be interpreted as odds ratios, rather than probability of use or relative probability of use. When data are gathered under a use-availability design, logistic regression can be used to estimate approximate odds ratios if probability of use is small, at least on average. More generally, however, logistic regression is inappropriate for modeling habitat selection in use-availability studies. In particular, using logistic regression to fit the exponential model of Manly et al. (2002:100) does not guarantee maximum-likelihood estimates, valid probabilities, or valid likelihoods. We show that the resource selection function (RSF) commonly used for the exponential model is proportional to a logistic discriminant function. Thus, it may be used to rank habitats with respect to probability of use and to identify important habitat characteristics or their surrogates, but it is not guaranteed to be proportional to probability of use. Other problems associated with the exponential model also are discussed. 
We describe an alternative model based on Lancaster and Imbens (1996) that offers a method for estimating conditional probability of use in use-availability studies. Although promising, this model fails to converge to a unique solution in some important situations. Further work is needed to obtain a robust method that is broadly applicable to use-availability studies.
Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression
NASA Astrophysics Data System (ADS)
Khikmah, L.; Wijayanto, H.; Syafitri, U. D.
2017-04-01
Multicollinearity is a problem often encountered in logistic regression modeling. Multicollinearity between explanatory variables results in biased parameter estimates and, in addition, in classification errors. In general, stepwise regression is used to overcome multicollinearity in regression. Another method to overcome multicollinearity, which involves all variables for prediction, is Principal Component Analysis (PCA). However, classical PCA is only for numeric data. When the data are categorical, one method to solve the problem is Categorical Principal Component Analysis (CATPCA). The data used in this research were a part of the Demographic and Population Survey Indonesia (IDHS) 2012 data. This research focuses on the characteristics of women using contraceptive methods. Classification results were evaluated using Area Under Curve (AUC) values. The higher the AUC value, the better. Based on AUC values, the classification of the contraceptive method using the stepwise method (58.66%) is better than the logistic regression model (57.39%) and CATPCA (57.39%). Evaluation of the results using sensitivity shows the opposite: the CATPCA method (99.79%) is better than the logistic regression method (92.43%) and stepwise (92.05%). Because this study focuses on classification of the major class (using a contraceptive method), the selected model is CATPCA, as it raises the accuracy for the major class.
Logistic regression models of factors influencing the location of bioenergy and biofuels plants
T.M. Young; R.L. Zaretzki; J.H. Perdue; F.M. Guess; X. Liu
2011-01-01
Logistic regression models were developed to identify significant factors that influence the location of existing wood-using bioenergy/biofuels plants and traditional wood-using facilities. Logistic models provided quantitative insight for variables influencing the location of woody biomass-using facilities. Availability of "thinnings to a basal area of 31.7m2/ha...
Babic, Ana; Harris, Holly R; Vitonis, Allison F; Titus, Linda J; Jordan, Susan J; Webb, Penelope M; Risch, Harvey A; Rossing, Mary Anne; Doherty, Jennifer A; Wicklund, Kristine; Goodman, Marc T; Modugno, Francesmary; Moysich, Kirsten B; Ness, Roberta B; Kjaer, Susanne K; Schildkraut, Joellen; Berchuck, Andrew; Pearce, Celeste L; Wu, Anna H; Cramer, Daniel W; Terry, Kathryn L
2018-02-01
Menstrual pain, a common gynecological condition, has been associated with increased risk of ovarian cancer in some, but not all studies. Furthermore, potential variations in the association between menstrual pain and ovarian cancer by histologic subtype have not been adequately evaluated due to lack of power. We assessed menstrual pain using either direct questions about having experienced menstrual pain, or indirect questions about menstrual pain as indication for use of hormones or medications. We used multivariate logistic regression to calculate the odds ratio (OR) for the association between severe menstrual pain and ovarian cancer, adjusting for potential confounders, and multinomial logistic regression to calculate ORs for specific histologic subtypes. We observed no association between ovarian cancer and menstrual pain assessed by indirect questions. Among studies using direct questions, severe pain was associated with a small but significant increase in overall risk of ovarian cancer (OR = 1.07, 95% CI: 1.01-1.13), after adjusting for endometriosis and other potential confounders. The association appeared to be more relevant for clear cell (OR = 1.48, 95% CI: 1.10-1.99) and serous borderline (OR = 1.31, 95% CI: 1.05-1.63) subtypes. In this large international pooled analysis of case-control studies, we observed a small increase in risk of ovarian cancer for women reporting severe menstrual pain. While we observed an increased ovarian cancer risk with severe menstrual pain, the possibility of recall bias and undiagnosed endometriosis cannot be excluded. Future validation in prospective studies with detailed information on endometriosis is needed. © 2017 UICC.
Litzenburger, Friederike; Heck, Katrin; Pitchika, Vinay; Neuhaus, Klaus W; Jost, Fabian N; Hickel, Reinhard; Jablonski-Momeni, Anahita; Welk, Alexander; Lederer, Alexander; Kühnisch, Jan
2018-02-01
The purpose of this in vitro study was to evaluate the inter- and intraexaminer reliability of digital bitewing (DBW) radiography and near-infrared light transillumination (NIRT) for proximal caries detection and assessment in posterior teeth. From a pool of 85 patients, 100 corresponding pairs of DBW and NIRT images (~1/3 healthy, ~1/3 with enamel caries and ~1/3 with dentin caries) were chosen. 12 dentists with different professional status and clinical experience repeated the evaluation in two blinded cycles. Two experienced dentists provided a reference diagnosis after analysing all images independently. Statistical analysis included the calculation of simple (κ) and weighted Kappa (wκ) values as a measure of reliability. Logistic regression with a backward elimination model was used to investigate the influence of the diagnostic method, evaluation cycle, type of tooth, and clinical experience on reliability. Altogether, inter- and intraexaminer reliability exhibited good to excellent κ and wκ values for DBW radiography (Inter: κ = 0.60/ 0.63; wκ = 0.74/0.76; Intra: κ = 0.64; wκ = 0.77) and NIRT (Inter: κ = 0.74/0.64; wκ = 0.86/0.82; Intra: κ = 0.68; wκ = 0.84). The backward elimination model revealed NIRT to be significantly more reliable than DBW radiography. This study revealed a good to excellent inter- and intraexaminer reliability for proximal caries detection using DBW and NIRT images. The logistic regression analysis revealed significantly better reliability for NIRT. Additionally, the first evaluation cycle was more reliable according to the reference diagnoses.
Alghnam, Suliman; Castillo, Renan
2017-04-01
Although opioid abuse is a rising epidemic in the USA, there are no studies to date on the incidence of persistent opioid use following injuries. Therefore, the aims of this study are: (1) to examine the incidence of persistent opioid use among a nationally representative sample of injured and non-injured populations; (2) to evaluate whether an injury is an independent predictor of persistent opioid use. Data from the Medical Expenditure Panel Survey were pooled (years 2009-2012). Adults were followed for about 2 years, during which they were surveyed about injury status and opioid use every 4-5 months. To determine whether injuries are associated with persistent opioid use, weighted multiple logistic regressions were constructed. While 2.3 million injured individuals received any opioid during the follow-up, 371 170 (15.6%) individuals became persistent opioid users (defined as opioid use across multiple time points). In a multiple logistic regression analysis adjusting for sociodemographic characteristics and self-reported health, those who sustained injuries were 1.4 times (95% CI 1.1 to 1.9) more likely to report persistent opioid use than those without injuries. We found injuries to be significantly associated with persistent opioid use in a nationally representative sample. Further investment in injury prevention may facilitate reduction of persistent opioid use and, thus, improve population health and reduce health expenditures. Published by the BMJ Publishing Group Limited.
Discrete post-processing of total cloud cover ensemble forecasts
NASA Astrophysics Data System (ADS)
Hemri, Stephan; Haiden, Thomas; Pappenberger, Florian
2017-04-01
This contribution presents an approach to post-process ensemble forecasts for the discrete and bounded weather variable of total cloud cover. Two methods for discrete statistical post-processing of ensemble predictions are tested. The first approach is based on multinomial logistic regression, the second involves a proportional odds logistic regression model. Applying them to total cloud cover raw ensemble forecasts from the European Centre for Medium-Range Weather Forecasts improves forecast skill significantly. Based on station-wise post-processing of raw ensemble total cloud cover forecasts for a global set of 3330 stations over the period from 2007 to early 2014, the more parsimonious proportional odds logistic regression model proved to slightly outperform the multinomial logistic regression model. Reference Hemri, S., Haiden, T., & Pappenberger, F. (2016). Discrete post-processing of total cloud cover ensemble forecasts. Monthly Weather Review 144, 2565-2577.
Fuzzy multinomial logistic regression analysis: A multi-objective programming approach
NASA Astrophysics Data System (ADS)
Abdalla, Hesham A.; El-Sayed, Amany A.; Hamed, Ramadan
2017-05-01
Parameter estimation for multinomial logistic regression is usually based on maximizing the likelihood function. For large well-balanced datasets, Maximum Likelihood (ML) estimation is a satisfactory approach. Unfortunately, ML can fail completely or at least produce poor results in terms of estimated probabilities and confidence intervals of parameters, especially for small datasets. In this study, a new approach based on fuzzy concepts is proposed to estimate parameters of the multinomial logistic regression. The study assumes that the parameters of multinomial logistic regression are fuzzy. Based on the extension principle stated by Zadeh and Bárdossy's proposition, a multi-objective programming approach is suggested to estimate these fuzzy parameters. A simulation study is used to evaluate the performance of the new approach versus the ML approach. Results show that the new proposed model outperforms ML in cases of small datasets.
A Primer on Logistic Regression.
ERIC Educational Resources Information Center
Woldbeck, Tanya
This paper introduces logistic regression as a viable alternative when the researcher is faced with variables that are not continuous. If one is to use simple regression, the dependent variable must be measured on a continuous scale. In the behavioral sciences, it may not always be appropriate or possible to have a measured dependent variable on a…
A Solution to Separation and Multicollinearity in Multiple Logistic Regression
Shen, Jianzhao; Gao, Sujuan
2010-01-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27–38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. Ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither approach solves the problem addressed by the other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth’s penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using current screening data from a community-based dementia study. PMID:20376286
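One plausible way to write a criterion that combines the two penalties named in this abstract is sketched below. It is an assumption, not a formula taken from the paper: the abstract does not give the exact expression, and the symbols $\lambda$, $X$, $W$ and the scaling of the ridge term are chosen here for illustration only.

```latex
% Hedged sketch: Firth-type penalty plus a ridge penalty (notation assumed)
\ell^{*}(\beta)
  = \underbrace{\sum_{i=1}^{n}\bigl[y_i \log \pi_i + (1-y_i)\log(1-\pi_i)\bigr]}_{\text{log-likelihood } \ell(\beta)}
  + \underbrace{\tfrac{1}{2}\log\bigl|X^{\top} W X\bigr|}_{\text{Firth (1993) penalty}}
  - \underbrace{\lambda \sum_{j=1}^{p} \beta_j^{2}}_{\text{ridge penalty}},
\qquad
\pi_i = \frac{1}{1+e^{-x_i^{\top}\beta}},\quad
W = \operatorname{diag}\bigl(\pi_i(1-\pi_i)\bigr).
```

In this form the Firth term keeps the estimate finite under separation, while $\lambda > 0$ shrinks coefficients of highly correlated items; setting $\lambda = 0$ recovers Firth's penalized likelihood.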
A Solution to Separation and Multicollinearity in Multiple Logistic Regression.
Shen, Jianzhao; Gao, Sujuan
2008-10-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. Ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither approach solves the problem addressed by the other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using current screening data from a community-based dementia study.
Ye, Dong-qing; Hu, Yi-song; Li, Xiang-pei; Huang, Fen; Yang, Shi-gui; Hao, Jia-hu; Yin, Jing; Zhang, Guo-qing; Liu, Hui-hui
2004-11-01
To explore the impact of environmental factors, daily lifestyle, psycho-social factors and the interactions between environmental factors and chemokine genes on systemic lupus erythematosus (SLE). A case-control study was carried out and environmental factors for SLE were analyzed by univariate and multivariate unconditional logistic regression. Interactions between environmental factors and chemokine polymorphisms contributing to SLE were also analyzed by a logistic regression model. Nineteen factors were associated with SLE when univariate unconditional logistic regression was used. However, when multivariate unconditional logistic regression was used, only five factors showed an impact on the disease: drinking well water (OR=0.099) was a protective factor for SLE, and multiple drug allergy (OR=8.174), over-exposure to sunshine (OR=18.339), taking antibiotics (OR=9.630) and oral contraceptives were risk factors for SLE. The unconditional logistic regression model showed an interaction between eating irritable food and the -2518MCP-1G/G genotype (OR=4.387). No interaction between environmental factors contributing to SLE was found in this study. Many environmental factors were related to SLE, and there was an interaction between the -2518MCP-1G/G genotype and eating irritable food.
Mielniczuk, Jan; Teisseyre, Paweł
2018-03-01
Detection of gene-gene interactions is one of the most important challenges in genome-wide case-control studies. Besides traditional logistic regression analysis, entropy-based methods have recently attracted significant attention. Among entropy-based methods, interaction information is one of the most promising measures, having many desirable properties. Although both logistic regression and interaction information have been used in several genome-wide association studies, the relationship between them has not been thoroughly investigated theoretically. The present paper attempts to fill this gap. We show that although certain connections between the two methods exist, in general they refer to two different concepts of dependence, and looking for interactions in those two senses leads to different approaches to interaction detection. We introduce an ordering between interaction measures and specify conditions for independent and dependent genes under which interaction information is a more discriminative measure than logistic regression. Moreover, we show that for so-called perfect distributions those measures are equivalent. The numerical experiments illustrate the theoretical findings, indicating that interaction information and its modified version are more universal tools for detecting various types of interaction than logistic regression and linkage disequilibrium measures. © 2017 WILEY PERIODICALS, INC.
ERIC Educational Resources Information Center
Shih, Ching-Lin; Liu, Tien-Hsiang; Wang, Wen-Chung
2014-01-01
The simultaneous item bias test (SIBTEST) method regression procedure and the differential item functioning (DIF)-free-then-DIF strategy are applied to the logistic regression (LR) method simultaneously in this study. These procedures are used to adjust the effects of matching true score on observed score and to better control the Type I error…
Access disparities to Magnet hospitals for patients undergoing neurosurgical operations
Missios, Symeon; Bekelis, Kimon
2017-01-01
Background Centers of excellence focusing on quality improvement have demonstrated superior outcomes for a variety of surgical interventions. We investigated the presence of access disparities to hospitals recognized by the Magnet Recognition Program of the American Nurses Credentialing Center (ANCC) for patients undergoing neurosurgical operations. Methods We performed a cohort study of all neurosurgery patients who were registered in the New York Statewide Planning and Research Cooperative System (SPARCS) database from 2009–2013. We examined the association of African-American race and lack of insurance with Magnet status hospitalization for neurosurgical procedures. A mixed effects propensity adjusted multivariable regression analysis was used to control for confounding. Results During the study period, 190,535 neurosurgical patients met the inclusion criteria. Using a multivariable logistic regression, we demonstrate that African-Americans had lower admission rates to Magnet institutions (OR 0.62; 95% CI, 0.58–0.67). This persisted in a mixed effects logistic regression model (OR 0.77; 95% CI, 0.70–0.83) to adjust for clustering at the patient county level, and a propensity score adjusted logistic regression model (OR 0.75; 95% CI, 0.69–0.82). Additionally, lack of insurance was associated with lower admission rates to Magnet institutions (OR 0.71; 95% CI, 0.68–0.73), in a multivariable logistic regression model. This persisted in a mixed effects logistic regression model (OR 0.72; 95% CI, 0.69–0.74), and a propensity score adjusted logistic regression model (OR 0.72; 95% CI, 0.69–0.75). Conclusions Using a comprehensive all-payer cohort of neurosurgery patients in New York State we identified an association of African-American race and lack of insurance with lower rates of admission to Magnet hospitals. PMID:28684152
Adjusting for Confounding in Early Postlaunch Settings: Going Beyond Logistic Regression Models.
Schmidt, Amand F; Klungel, Olaf H; Groenwold, Rolf H H
2016-01-01
Postlaunch data on medical treatments can be analyzed to explore adverse events or relative effectiveness in real-life settings. These analyses are often complicated by the number of potential confounders and the possibility of model misspecification. We conducted a simulation study to compare the performance of logistic regression, propensity score, disease risk score, and stabilized inverse probability weighting methods to adjust for confounding. Model misspecification was induced in the independent derivation dataset. We evaluated performance using relative bias and confidence interval coverage of the true effect, among other metrics. At low events per coefficient (1.0 and 0.5), the logistic regression estimates had a large relative bias (greater than -100%). Bias of the disease risk score estimates was at most 13.48% and 18.83%. For the propensity score model, this was 8.74% and >100%, respectively. At events per coefficient of 1.0 and 0.5, inverse probability weighting frequently failed or reduced to a crude regression, resulting in biases of -8.49% and 24.55%. Coverage of logistic regression estimates became less than the nominal level at events per coefficient ≤5. For the disease risk score, inverse probability weighting, and propensity score, coverage became less than nominal at events per coefficient ≤2.5, ≤1.0, and ≤1.0, respectively. Bias of misspecified disease risk score models was 16.55%. In settings with low events/exposed subjects per coefficient, disease risk score methods can be useful alternatives to logistic regression models, especially when propensity score models cannot be used. Despite better performance of disease risk score methods than logistic regression and propensity score models in small events per coefficient settings, bias and coverage still deviated from nominal.
Yang, H P; Cook, L S; Weiderpass, E; Adami, H-O; Anderson, K E; Cai, H; Cerhan, J R; Clendenen, T V; Felix, A S; Friedenreich, C M; Garcia-Closas, M; Goodman, M T; Liang, X; Lissowska, J; Lu, L; Magliocco, A M; McCann, S E; Moysich, K B; Olson, S H; Petruzella, S; Pike, M C; Polidoro, S; Ricceri, F; Risch, H A; Sacerdote, C; Setiawan, V W; Shu, X O; Spurdle, A B; Trabert, B; Webb, P M; Wentzensen, N; Xiang, Y-B; Xu, Y; Yu, H; Zeleniuch-Jacquotte, A; Brinton, L A
2015-03-03
Nulliparity is an endometrial cancer risk factor, but whether or not this association is due to infertility is unclear. Although there are many underlying infertility causes, few studies have assessed risk relations by specific causes. We conducted a pooled analysis of 8153 cases and 11 713 controls from 2 cohort and 12 case-control studies. All studies provided self-reported infertility and its causes, except for one study that relied on data from national registries. Logistic regression was used to estimate adjusted odds ratios (OR) and 95% confidence intervals (CI). Nulliparous women had an elevated endometrial cancer risk compared with parous women, even after adjusting for infertility (OR=1.76; 95% CI: 1.59-1.94). Women who reported infertility had an increased risk compared with those without infertility concerns, even after adjusting for nulliparity (OR=1.22; 95% CI: 1.13-1.33). Among women who reported infertility, none of the individual infertility causes were substantially related to endometrial cancer. Based on mainly self-reported infertility data that used study-specific definitions of infertility, nulliparity and infertility appeared to independently contribute to endometrial cancer risk. Understanding residual endometrial cancer risk related to infertility, its causes and its treatments may benefit from large studies involving detailed data on various infertility parameters.
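The odds ratios and 95% confidence intervals reported in abstracts like this one are conventionally obtained by exponentiating a fitted logistic regression coefficient and its Wald limits. The sketch below shows that arithmetic only; the function name `odds_ratio_ci` is hypothetical, and the coefficient and standard error are assumed to come from a model fitted elsewhere.

```python
import math

def odds_ratio_ci(beta, se, z=1.959964):
    """Odds ratio and Wald confidence limits from a log-odds coefficient.

    beta : estimated logistic regression coefficient (log odds ratio)
    se   : its standard error
    z    : normal quantile; the default corresponds to a 95% interval
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)   # lower Wald limit on the OR scale
    upper = math.exp(beta + z * se)   # upper Wald limit on the OR scale
    return odds_ratio, lower, upper
```

Because the interval is symmetric on the log scale, the product of the two limits equals the squared odds ratio, which is a quick consistency check when reading published estimates.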
McElroy, Jane A; Trentham-Dietz, Amy; Gangnon, Ronald E; Hampton, John M; Bersch, Andrew J; Kanarek, Marty S; Newcomb, Polly A
2008-09-01
One unintentional result of the widespread adoption of nitrogen application to croplands over the past 50 years has been nitrate contamination of drinking water, with few studies evaluating the risk of colorectal cancer. In our population-based case-control study of 475 women age 20-74 years with colorectal cancer and 1447 community controls living in rural Wisconsin, drinking water nitrate exposures were interpolated to subjects' residences based on measurements which had been taken as part of a separate water quality survey in 1994. Individual-level risk factor data were gathered in 1990-1992 and 1999-2001. Logistic regression models estimated the risk of colorectal cancer for the study periods, separately and pooled. In the pooled analyses, an overall colorectal cancer risk was not observed for exposure to nitrate-nitrogen in the highest category (≥10 ppm) compared to the lowest category (<0.5 ppm). However, a 2.9-fold increased risk was observed for proximal colon cancer cases in the highest compared to the lowest category. A statistically significant increased distal colon or rectal cancer risk was not observed. These results suggest that if an association exists with nitrate-nitrogen exposure from residential drinking water consumption, it may be limited to proximal colon cancer.
Sedentary behavior is associated with colorectal adenoma recurrence in men
Molmenti, Christine L. Sardo; Hibler, Elizabeth A.; Ashbeck, Erin L.; Thomson, Cynthia A.; Garcia, David O.; Roe, Denise; Harris, Robin B.; Lance, Peter; Cisneroz, Martin; Martinez, Maria Elena; Thompson, Patricia A.; Jacobs, Elizabeth T.
2014-01-01
Purpose The association between physical activity and colorectal adenoma is equivocal. This study was designed to assess the relationship between physical activity and colorectal adenoma recurrence. Methods Pooled analyses from two randomized, controlled trials included 1,730 participants who completed the Arizona Activity Frequency Questionnaire at baseline, had a colorectal adenoma removed within 6 months of study registration, and had a follow-up colonoscopy during the trial. Logistic regression modeling was employed to estimate the effect of sedentary behavior, light-intensity physical activity, and moderate-vigorous physical activity on colorectal adenoma recurrence. Results No statistically significant trends were found for any activity type and odds of colorectal adenoma recurrence in the pooled population. However, males with the highest levels of sedentary time experienced 47% higher odds of adenoma recurrence. Compared to the lowest quartile of sedentary time, the ORs (95% CIs) for the second, third, and fourth quartiles among men were 1.23 (0.88, 1.74), 1.41 (0.99, 2.01), and 1.47 (1.03, 2.11) respectively (P trend=0.03). No similar association was observed for women. Conclusions This study suggests that sedentary behavior is associated with a higher risk of colorectal adenoma recurrence among men, providing evidence of detrimental effects of a sedentary lifestyle early in the carcinogenesis pathway. PMID:25060482
Pfeiffer, R M; Riedl, R
2015-08-15
We assess the asymptotic bias of estimates of exposure effects conditional on covariates when summary scores of confounders, instead of the confounders themselves, are used to analyze observational data. First, we study regression models for cohort data that are adjusted for summary scores. Second, we derive the asymptotic bias for case-control studies when cases and controls are matched on a summary score, and then analyzed either using conditional logistic regression or by unconditional logistic regression adjusted for the summary score. Two scores, the propensity score (PS) and the disease risk score (DRS) are studied in detail. For cohort analysis, when regression models are adjusted for the PS, the estimated conditional treatment effect is unbiased only for linear models, or at the null for non-linear models. Adjustment of cohort data for DRS yields unbiased estimates only for linear regression; all other estimates of exposure effects are biased. Matching cases and controls on DRS and analyzing them using conditional logistic regression yields unbiased estimates of exposure effect, whereas adjusting for the DRS in unconditional logistic regression yields biased estimates, even under the null hypothesis of no association. Matching cases and controls on the PS yield unbiased estimates only under the null for both conditional and unconditional logistic regression, adjusted for the PS. We study the bias for various confounding scenarios and compare our asymptotic results with those from simulations with limited sample sizes. To create realistic correlations among multiple confounders, we also based simulations on a real dataset. Copyright © 2015 John Wiley & Sons, Ltd.
Casini, Annalisa; Clays, Els; Godin, Isabelle; De Backer, Guy; Kornitzer, Marcel; Kittel, France
2010-12-01
To evaluate (1) whether the physical and mental health of male workers differs from that of female workers, and, if so, whether (2) this is affected by the interplay between work and nonwork burden. We pooled two large Belgian databases (BELSTRESS III, SOMSTRESS) comprising data on 4810 (2847 women). Gender-specific logistic regressions were performed using a four-level variable as predictor. This combined two predictors: isolated job strain (isostrain) and home-work interference (HWI). Male workers are at greater risk of chronic fatigue when they experience high isostrain but not high HWI. Although accumulated high isostrain and high HWI affect women mainly via chronic fatigue, the same pattern has a greater impact on men's perceived health. There was no difference for the other patterns. To improve workers' well-being, organizations should develop work and nonwork balance policies specific for men and women.
Nie, Z Q; Ou, Y Q; Zhuang, J; Qu, Y J; Mai, J Z; Chen, J M; Liu, X Q
2016-05-01
Conditional and unconditional logistic regression are commonly used in case-control studies, whereas the Cox proportional hazards model is often used in survival analysis. Most of the literature refers only to main-effect models; however, generalized linear models differ from general linear models, and interaction comprises both multiplicative and additive interaction. The former has only statistical significance, whereas the latter has biological significance. In this paper, macros were written in SAS 9.4 to calculate the contrast ratio, the attributable proportion due to interaction and the synergy index alongside the interaction terms of logistic and Cox regression, and Wald, delta and profile likelihood confidence intervals were used to evaluate additive interaction, as a reference for big data analysis in clinical epidemiology and for the analysis of genetic multiplicative and additive interactions.
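The three additive-interaction measures named in this abstract can be illustrated with a minimal sketch. It is written in Python rather than the SAS macros the authors describe, and it assumes that the "contrast ratio" corresponds to the quantity usually called the relative excess risk due to interaction (RERI). The function name and the example odds ratios are hypothetical, and confidence intervals are not covered.

```python
def additive_interaction(or10, or01, or11):
    """Return (RERI, AP, S) from three odds ratios.

    or10: exposed to factor A only, or01: exposed to factor B only,
    or11: exposed to both; all relative to the doubly unexposed group.
    Assumption: the abstract's "contrast ratio" is taken to be RERI.
    """
    reri = or11 - or10 - or01 + 1.0          # relative excess risk due to interaction
    ap = reri / or11                         # attributable proportion due to interaction
    denom = (or10 - 1.0) + (or01 - 1.0)
    s = (or11 - 1.0) / denom if denom != 0 else float("nan")  # synergy index
    return reri, ap, s


# Hypothetical odds ratios, for illustration only.
reri, ap, s = additive_interaction(2.0, 3.0, 7.0)
print(reri, ap, s)
```

Under this convention, RERI = 0, AP = 0 and S = 1 each indicate no departure from additivity.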
Doering, Stefan
2016-01-01
Background The continuous exposure to inorganic mercury vapour in artisanal small-scale gold mining (ASGM) areas leads to chronic health problems. It is therefore essential to have a quick, but reliable risk assessing tool to diagnose chronic inorganic mercury intoxication. This study re-evaluates the state-of-the-art toolkit to diagnose chronic inorganic mercury intoxication by analysing data from multiple pooled cross-sectional studies. The primary research question aims to reduce the currently used set of indicators without essentially affecting the capability to diagnose chronic inorganic mercury intoxication. In addition, a sensitivity analysis is performed on established biomonitoring exposure limits for mercury in blood, hair, urine and urine adjusted by creatinine, where the biomonitoring exposure limits are compared to thresholds most associated with chronic inorganic mercury intoxication in artisanal small-scale gold mining. Methods Health data from miners and community members in Indonesia, Tanzania and Zimbabwe were obtained as part of the Global Mercury Project and pooled into one dataset together with their biomarkers mercury in urine, blood and hair. The individual prognostic impact of the indicators on the diagnosis of mercury intoxication is quantified using logistic regression models. The selection is performed by a stepwise forward/backward selection. Different models are compared based on the Bayesian information criterion (BIC) and Cohen's kappa is used to evaluate the level of agreement between the diagnosis of mercury intoxication based on the currently used set of indicators and the result based on our reduced set of indicators. The sensitivity analysis of biomarker exposure limits of mercury is based on a sequence of chi square tests. Results The variable selection in logistic regression reduced the number of medical indicators from thirteen to ten in addition to the biomarkers.
The estimated level of agreement using ten of thirteen medical indicators and all four biomarkers to diagnose chronic inorganic mercury intoxication yields a Cohen's kappa of 0.87. While in an additional stepwise selection the biomarker blood was not selected, the level of agreement based on ten medical indicators and only the three biomarkers urine, urine/creatinine and hair reduced Cohen's kappa to 0.46. The optimal cut-points for the biomarkers blood, hair, urine and urine/creatinine were estimated at 11.6 μg/l, 3.84 μg/g, 24.4 μg/l and 4.26 μg/g, respectively. Conclusion The results show that a reduction down to only ten indicators still allows a reliable diagnosis of chronic inorganic mercury intoxication. This reduction of indicators will simplify health assessments in artisanal small-scale gold mining areas. PMID:27575533
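The agreement statistic used in this abstract, Cohen's kappa, compares observed agreement between two classifications with the agreement expected by chance. The following is a minimal sketch of that calculation from a square cross-tabulation; the function name and the counts are made up for illustration and are not taken from the study.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of subjects placed in category i by the first
    diagnosis and in category j by the second diagnosis.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical counts: rows = full indicator set, columns = reduced set.
example = [[40, 10],
           [5, 45]]
print(cohens_kappa(example))  # approximately 0.70
```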
No rationale for 1 variable per 10 events criterion for binary logistic regression analysis.
van Smeden, Maarten; de Groot, Joris A H; Moons, Karel G M; Collins, Gary S; Altman, Douglas G; Eijkemans, Marinus J C; Reitsma, Johannes B
2016-11-24
Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies. The current study uses Monte Carlo simulations to evaluate small sample bias, coverage of confidence intervals and mean square error of logit coefficients. Logistic regression models fitted by maximum likelihood and a modified estimation procedure, known as Firth's correction, are compared. The results show that besides EPV, the problems associated with low EPV depend on other factors such as the total sample size. It is also demonstrated that simulation results can be dominated by even a few simulated data sets for which the prediction of the outcome by the covariates is perfect ('separation'). We reveal that different approaches for identifying and handling separation lead to substantially different simulation results. We further show that Firth's correction can be used to improve the accuracy of regression coefficients and alleviate the problems associated with separation. The current evidence supporting EPV rules for binary logistic regression is weak. Given our findings, there is an urgent need for new research to provide guidance for supporting sample size considerations for binary logistic regression analysis.
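Two quantities from this abstract lend themselves to a short sketch: the EPV ratio and a check for separation. The sketch assumes the common convention that "events" means the smaller of the two outcome groups, and it checks separation for one covariate at a time only, which is narrower than the multivariable separation the simulations concern. Both function names are hypothetical.

```python
def events_per_variable(outcomes, n_parameters):
    """EPV, assuming 'events' is the size of the smaller outcome group."""
    events = sum(outcomes)
    non_events = len(outcomes) - events
    return min(events, non_events) / n_parameters


def single_covariate_separates(x, y):
    """True if one covariate alone perfectly splits the binary outcome.

    Simplification: only complete separation by a single variable is
    checked; separation by a linear combination is not detected.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    return max(x0) < min(x1) or max(x1) < min(x0)


# Hypothetical data: 30 events among 100 subjects, 5 candidate parameters.
print(events_per_variable([1] * 30 + [0] * 70, 5))
print(single_covariate_separates([1, 2, 3, 4], [0, 0, 1, 1]))
```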
Li, Yi; Tseng, Yufeng J.; Pan, Dahua; Liu, Jianzhong; Kern, Petra S.; Gerberick, G. Frank; Hopfinger, Anton J.
2008-01-01
Currently, the only validated methods to identify skin sensitization effects are in vivo models, such as the Local Lymph Node Assay (LLNA) and guinea pig studies. There is a tremendous need, in particular due to novel legislation, to develop animal alternatives, e.g., Quantitative Structure-Activity Relationship (QSAR) models. Here, QSAR models for skin sensitization using LLNA data have been constructed. The descriptors used to generate these models are derived from the 4D-molecular similarity paradigm and are referred to as universal 4D-fingerprints. A training set of 132 structurally diverse compounds and a test set of 15 structurally diverse compounds were used in this study. The statistical methodologies used to build the models are logistic regression (LR), and partial least square coupled logistic regression (PLS-LR), which prove to be effective tools for studying skin sensitization measures expressed in the two categorical terms of sensitizer and non-sensitizer. QSAR models with low values of the Hosmer-Lemeshow goodness-of-fit statistic, χ²_HL, are significant and predictive. For the training set, the cross-validated prediction accuracy of the logistic regression models ranges from 77.3% to 78.0%, while that of PLS-logistic regression models ranges from 87.1% to 89.4%. For the test set, the prediction accuracy of logistic regression models ranges from 80.0% to 86.7%, while that of PLS-logistic regression models ranges from 73.3% to 80.0%. The QSAR models are made up of 4D-fingerprints related to aromatic atoms, hydrogen bond acceptors and negatively partially charged atoms. PMID:17226934
The relationship between patient data and pooled clinical management decisions.
Ludbrook, G I; O'Loughlin, E J; Corcoran, T B; Grant, C
2013-01-01
A strong relationship between patient data and preoperative clinical decisions could potentially be used to support clinical decisions in preoperative management. The aim of this exploratory study was to determine the relationship between key patient data and pooled clinical opinions on management. In a previous study, panels of anaesthetists compared the quality of computer-assisted patient health assessments with outpatient consultations and made decisions on the need for preoperative tests, no preoperative outpatient assessment, possible postoperative intensive care unit/high dependency unit requirements and aspiration prophylaxis. In the current study, the relationship between patient data and these decisions was examined using binomial logistic regression analysis. Backward stepwise regression was used to identify independent predictors of each decision (at P >0.15), which were then incorporated into a predictive model. The number of factors related to each decision varied: blood picture (four factors), biochemistry (six factors), coagulation studies (three factors), electrocardiography (eight factors), chest X-ray (seven factors), preoperative outpatient assessment (17 factors), intensive care unit requirement (eight factors) and aspiration prophylaxis (one factor). The factor types also varied, but included surgical complexity, age, gender, number of medications or comorbidities, body mass index, hypertension, central nervous system condition, heart disease, sleep apnoea, smoking, persistent pain and stroke. Models based on these relationships usually demonstrated good sensitivity and specificity, with receiver operating characteristics in the following areas under curve: blood picture (0.75), biochemistry (0.86), coagulation studies (0.71), electrocardiography (0.90), chest X-ray (0.85), outpatient assessment (0.85), postoperative intensive care unit requirement (0.88) and aspiration prophylaxis (0.85). 
These initial results suggest modelling of patient data may have utility supporting clinicians' preoperative decisions.
Aerts, Marc; Minalu, Girma; Bösner, Stefan; Buntinx, Frank; Burnand, Bernard; Haasenritter, Jörg; Herzig, Lilli; Knottnerus, J André; Nilsson, Staffan; Renier, Walter; Sox, Carol; Sox, Harold; Donner-Banzhoff, Norbert
2017-01-01
To construct a clinical prediction rule for coronary artery disease (CAD) presenting with chest pain in primary care. Meta-Analysis using 3,099 patients from five studies. To identify candidate predictors, we used random forest trees, multiple imputation of missing values, and logistic regression within individual studies. To generate a prediction rule on the pooled data, we applied a regression model that took account of the differing standard data sets collected by the five studies. The most parsimonious rule included six equally weighted predictors: age ≥55 (males) or ≥65 (females) (+1); attending physician suspected a serious diagnosis (+1); history of CAD (+1); pain brought on by exertion (+1); pain feels like "pressure" (+1); pain reproducible by palpation (-1). CAD was considered absent if the prediction score is <2. The area under the ROC curve was 0.84. We applied this rule to a study setting with a CAD prevalence of 13.2% using a prediction score cutoff of <2 (i.e., -1, 0, or +1). When the score was <2, the probability of CAD was 2.1% (95% CI: 1.1-3.9%); when the score was ≥ 2, it was 43.0% (95% CI: 35.8-50.4%). Clinical prediction rules are a key strategy for individualizing care. Large data sets based on electronic health records from diverse sites create opportunities for improving their internal and external validity. Our patient-level meta-analysis from five primary care sites should improve external validity. Our strategy for addressing site-to-site systematic variation in missing data should improve internal validity. Using principles derived from decision theory, we also discuss the problem of setting the cutoff prediction score for taking action. Copyright © 2016 Elsevier Inc. All rights reserved.
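The six-item rule stated in this abstract can be written out as a short scoring function. This is a minimal sketch for illustration: the function and argument names are hypothetical, the weights and the cutoff of 2 are those given in the abstract, and it is not a clinical tool.

```python
def cad_score(age, sex, physician_suspects_serious, history_of_cad,
              pain_on_exertion, pain_feels_like_pressure,
              pain_reproducible_by_palpation):
    """Sum the six equally weighted predictors listed in the abstract."""
    age_threshold = 55 if sex == "male" else 65
    score = 0
    score += int(age >= age_threshold)           # +1
    score += int(physician_suspects_serious)     # +1
    score += int(history_of_cad)                 # +1
    score += int(pain_on_exertion)               # +1
    score += int(pain_feels_like_pressure)       # +1
    score -= int(pain_reproducible_by_palpation) # -1
    return score


def cad_considered_absent(score):
    """The abstract treats CAD as absent when the score is below 2."""
    return score < 2


# Hypothetical patient: 60-year-old man, exertional pressure-like pain.
s = cad_score(60, "male", False, False, True, True, False)
print(s, cad_considered_absent(s))
```

The possible scores run from -1 to 5, so the "absent" range below the cutoff is -1, 0 or +1, matching the values listed in the abstract.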
Smith, Lindsey P; Ng, Shu Wen; Popkin, Barry M
2014-05-01
We examined the effects of state-level unemployment rates during the recession of 2008 on patterns of home food preparation and away-from-home (AFH) eating among low-income and minority populations. We analyzed pooled cross-sectional data on 118 635 adults aged 18 years or older who took part in the American Time Use Study. Multinomial logistic regression models stratified by gender were used to evaluate the associations between state-level unemployment, poverty, race/ethnicity, and time spent cooking, and log binomial regression was used to assess respondents' AFH consumption patterns. High state-level unemployment was associated with only trivial increases in respondents' cooking patterns and virtually no change in their AFH eating patterns. Low-income and racial/ethnic minority groups were not disproportionately affected by the recession. Even during a major economic downturn, US adults are resistant to food-related behavior change. More work is needed to understand whether this reluctance to change is attributable to time limits, lack of knowledge or skill related to food preparation, or lack of access to fresh produce and raw ingredients.
Smith, Lindsey P.; Ng, Shu Wen
2014-01-01
Objectives. We examined the effects of state-level unemployment rates during the recession of 2008 on patterns of home food preparation and away-from-home (AFH) eating among low-income and minority populations. Methods. We analyzed pooled cross-sectional data on 118 635 adults aged 18 years or older who took part in the American Time Use Study. Multinomial logistic regression models stratified by gender were used to evaluate the associations between state-level unemployment, poverty, race/ethnicity, and time spent cooking, and log binomial regression was used to assess respondents’ AFH consumption patterns. Results. High state-level unemployment was associated with only trivial increases in respondents’ cooking patterns and virtually no change in their AFH eating patterns. Low-income and racial/ethnic minority groups were not disproportionately affected by the recession. Conclusions. Even during a major economic downturn, US adults are resistant to food-related behavior change. More work is needed to understand whether this reluctance to change is attributable to time limits, lack of knowledge or skill related to food preparation, or lack of access to fresh produce and raw ingredients. PMID:24625145
MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION
Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...
Cigarette smoking and risk of ovarian cancer: a pooled analysis of 21 case–control studies
Faber, Mette T.; Kjær, Susanne K.; Dehlendorff, Christian; Chang-Claude, Jenny; Andersen, Klaus K.; Høgdall, Estrid; Webb, Penelope M.; Jordan, Susan J.; Rossing, Mary Anne; Doherty, Jennifer A.; Lurie, Galina; Thompson, Pamela J.; Carney, Michael E.; Goodman, Marc T.; Ness, Roberta B.; Modugnos, Francesmary; Edwards, Robert P.; Bunker, Clareann H.; Goode, Ellen L.; Fridley, Brooke L.; Vierkant, Robert A.; Larson, Melissa C.; Schildkraut, Joellen; Cramer, Daniel W.; Terry, Kathryn L.; Vitonis, Allison F.; Bandera, Elisa V.; Olson, Sara H.; King, Melony; Chandran, Urmila; Kiemeney, Lambertus A.; Massuger, Leon F. A. G.; van Altena, Anne M.; Vermeulen, Sita H.; Brinton, Louise; Wentzensen, Nicolas; Lissowska, Jolanta; Yang, Hannah P.; Moysich, Kirsten B.; Odunsi, Kunle; Kasza, Karin; Odunsi-Akanji, Oluwatosin; Song, Honglin; Pharaoh, Paul; Shah, Mitul; Whittemore, Alice S.; McGuire, Valerie; Sieh, Weiva; Sutphen, Rebecca; Menon, Usha; Gayther, Simon A.; Ramus, Susan J.; Gentry-Maharaj, Aleksandra; Pearce, Celeste Leigh; Wu, Anna H.; Pike, Malcolm C.; Risch, Harvey A.
2013-01-01
Purpose The majority of previous studies have observed an increased risk of mucinous ovarian tumors associated with cigarette smoking, but the association with other histological types is unclear. In a large pooled analysis, we examined the risk of epithelial ovarian cancer associated with multiple measures of cigarette smoking with a focus on characterizing risks according to tumor behavior and histology. Methods We used data from 21 case–control studies of ovarian cancer (19,066 controls, 11,972 invasive and 2,752 borderline cases). Study-specific odds ratios (OR) and 95 % confidence intervals (CI) were obtained from logistic regression models and combined into a pooled odds ratio using a random effects model. Results Current cigarette smoking increased the risk of invasive mucinous (OR = 1.31; 95 % CI: 1.03–1.65) and borderline mucinous ovarian tumors (OR = 1.83; 95 % CI: 1.39–2.41), while former smoking increased the risk of borderline serous ovarian tumors (OR = 1.30; 95 % CI: 1.12–1.50). For these histological types, consistent dose– response associations were observed. No convincing associations between smoking and risk of invasive serous and endometrioid ovarian cancer were observed, while our results provided some evidence of a decreased risk of invasive clear cell ovarian cancer. Conclusions Our results revealed marked differences in the risk profiles of histological types of ovarian cancer with regard to cigarette smoking, although the magnitude of the observed associations was modest. Our findings, which may reflect different etiologies of the histological types, add to the fact that ovarian cancer is a heterogeneous disease. PMID:23456270
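The two-stage pooling described in the Methods (study-specific odds ratios combined with a random effects model) can be sketched as follows. The abstract does not name the estimator, so this sketch assumes the DerSimonian-Laird method, and it recovers standard errors from 95% confidence limits on the log scale. The function name and all input numbers are hypothetical.

```python
import math

Z95 = 1.959963984540054  # standard normal 97.5th percentile


def pool_random_effects(odds_ratios, ci_lows, ci_highs):
    """DerSimonian-Laird pooling (an assumed choice of estimator).

    Returns (pooled OR, lower 95% limit, upper 95% limit, tau squared).
    """
    y = [math.log(o) for o in odds_ratios]
    se = [(math.log(h) - math.log(l)) / (2 * Z95)
          for l, h in zip(ci_lows, ci_highs)]
    w = [1.0 / s ** 2 for s in se]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c)  # between-study variance
    w_star = [1.0 / (s ** 2 + tau2) for s in se]
    y_re = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se_re = math.sqrt(1.0 / sum(w_star))
    return (math.exp(y_re), math.exp(y_re - Z95 * se_re),
            math.exp(y_re + Z95 * se_re), tau2)


# Hypothetical study-specific estimates, not taken from the pooled analysis.
print(pool_random_effects([1.2, 1.5, 1.1], [0.9, 1.0, 0.7], [1.6, 2.25, 1.73]))
```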
A chronic fatigue syndrome – related proteome in human cerebrospinal fluid
Baraniuk, James N; Casado, Begona; Maibach, Hilda; Clauw, Daniel J; Pannell, Lewis K; Hess S, Sonja
2005-01-01
Background Chronic Fatigue Syndrome (CFS), Persian Gulf War Illness (PGI), and fibromyalgia are overlapping symptom complexes without objective markers or known pathophysiology. Neurological dysfunction is common. We assessed cerebrospinal fluid to find proteins that were differentially expressed in this CFS-spectrum of illnesses compared to control subjects. Methods Cerebrospinal fluid specimens from 10 CFS, 10 PGI, and 10 control subjects (50 μl/subject) were pooled into one sample per group (cohort 1). Cohort 2 of 12 control and 9 CFS subjects had their fluids (200 μl/subject) assessed individually. After trypsin digestion, peptides were analyzed by capillary chromatography, quadrupole-time-of-flight mass spectrometry, peptide sequencing, bioinformatic protein identification, and statistical analysis. Results Pooled CFS and PGI samples shared 20 proteins that were not detectable in the pooled control sample (cohort 1 CFS-related proteome). Multilogistic regression analysis (GLM) of cohort 2 detected 10 proteins that were shared by CFS individuals and the cohort 1 CFS-related proteome, but were not detected in control samples. Detection of ≥1 of a select set of 5 CFS-related proteins predicted CFS status with 80% concordance (logistic model). The proteins were α-1-macroglobulin, amyloid precursor-like protein 1, keratin 16, orosomucoid 2 and pigment epithelium-derived factor. Overall, 62 of 115 proteins were newly described. Conclusion This pilot study detected an identical set of central nervous system, innate immune and amyloidogenic proteins in cerebrospinal fluids from two independent cohorts of subjects with overlapping CFS, PGI and fibromyalgia. Although syndrome names and definitions were different, the proteome and presumed pathological mechanism(s) may be shared. PMID:16321154
Loyen, Anne; Clarke-Cornwell, Alexandra M; Anderssen, Sigmund A; Hagströmer, Maria; Sardinha, Luís B; Sundquist, Kristina; Ekelund, Ulf; Steene-Johannessen, Jostein; Baptista, Fátima; Hansen, Bjørge H; Wijndaele, Katrien; Brage, Søren; Lakerveld, Jeroen; Brug, Johannes; van der Ploeg, Hidde P
2017-07-01
The objective of this study was to pool, harmonise and re-analyse national accelerometer data from adults in four European countries in order to describe population levels of sedentary time and physical inactivity. Five cross-sectional studies were included from England, Portugal, Norway and Sweden. ActiGraph accelerometer count data were centrally processed using the same algorithms. Multivariable logistic regression analyses were conducted to study the associations of sedentary time and physical inactivity with sex, age, weight status and educational level, in both the pooled sample and the separate study samples. Data from 9509 participants were used. On average, participants were sedentary for 530 min/day, and accumulated 36 min/day of moderate to vigorous intensity physical activity. Twenty-three percent accumulated more than 10 h of sedentary time/day, and 72% did not meet the physical activity recommendations. Nine percent of all participants were classified as high sedentary and low active. Participants from Norway showed the highest levels of sedentary time, while participants from England were the least physically active. Age and weight status were positively associated with sedentary time and not meeting the physical activity recommendations. Men and higher-educated people were more likely to be highly sedentary, while women and lower-educated people were more likely to be inactive. We found high levels of sedentary time and physical inactivity in four European countries. Older people and obese people were most likely to display these behaviours and thus deserve special attention in interventions and policy planning. In order to monitor these behaviours, accelerometer-based cross-European surveillance is recommended.
Rios, Paula; Bailey, Helen D; Orsi, Laurent; Lacour, Brigitte; Valteau-Couanet, Dominique; Levy, Dominique; Corradini, Nadège; Leverger, Guy; Defachelles, Anne-Sophie; Gambart, Marion; Sirvent, Nicolas; Thebaud, Estelle; Ducassou, Stéphane; Clavel, Jacqueline
2016-11-01
Neuroblastoma (NB), an embryonic tumour arising from neural crest cells, is the most common malignancy among infants. The aetiology of NB is largely unknown. We conducted a pooled analysis to explore whether there is an association between NB and preconception and perinatal factors using data from two French national population-based case-control studies. The mothers of 357 NB cases and 1783 controls younger than 6 years, frequency-matched by age and gender, responded to a telephone interview that focused on demographic, socioeconomic and perinatal characteristics, childhood environment, life-style and maternal reproductive history. Unconditional logistic regression was used to estimate pooled odds ratios and 95% confidence intervals. After controlling for matching variables, study of origin and potential confounders, being born either small (OR 1.4 95% CI 1.0-2.0) or large (OR 1.5 95% CI 1.1-2.2) for gestational age and, among children younger than 18 months, having congenital malformations (OR 3.6 95% CI 1.3-8.9), were significantly associated with NB. Inverse associations were observed with breastfeeding (OR 0.7 95% CI 0.5-1.0) and maternal use of any supplements containing folic acid, vitamins or minerals (OR 0.5 95% CI 0.3-0.9) during the preconception period. Our findings reinforce the hypothesis that fetal growth anomalies and congenital malformations may be associated with an increased risk of NB. Further investigations are needed in order to clarify the role of folic acid supplementation and breastfeeding, given their potential importance in NB prevention. © 2016 UICC.
Brenn, T; Arnesen, E
1985-01-01
For comparative evaluation, discriminant analysis, logistic regression and Cox's model were used to select risk factors for total and coronary deaths among 6595 men aged 20-49 followed for 9 years. Groups with mortality between 5 and 93 per 1000 were considered. Discriminant analysis selected variable sets only marginally different from the logistic and Cox methods which always selected the same sets. A time-saving option, offered for both the logistic and Cox selection, showed no advantage compared with discriminant analysis. Analysing more than 3800 subjects, the logistic and Cox methods consumed, respectively, 80 and 10 times more computer time than discriminant analysis. When including the same set of variables in non-stepwise analyses, all methods estimated coefficients that in most cases were almost identical. In conclusion, discriminant analysis is advocated for preliminary or stepwise analysis, otherwise Cox's method should be used.
ERIC Educational Resources Information Center
DeMars, Christine E.
2009-01-01
The Mantel-Haenszel (MH) and logistic regression (LR) differential item functioning (DIF) procedures have inflated Type I error rates when there are large mean group differences, short tests, and large sample sizes. When there are large group differences in mean score, groups matched on the observed number-correct score differ on true score,…
Satellite rainfall retrieval by logistic regression
NASA Technical Reports Server (NTRS)
Chiu, Long S.
1986-01-01
The potential use of logistic regression in rainfall estimation from satellite measurements is investigated. Satellite measurements provide covariate information in terms of radiances from different remote sensors. The logistic regression technique can effectively accommodate many covariates and test their significance in the estimation. The outcome from the logistical model is the probability that the rainrate of a satellite pixel is above a certain threshold. By varying the thresholds, a rainrate histogram can be obtained, from which the mean and the variance can be estimated. A logistical model is developed and applied to rainfall data collected during GATE, using as covariates the fractional rain area and a radiance measurement which is deduced from a microwave temperature-rainrate relation. It is demonstrated that the fractional rain area is an important covariate in the model, consistent with the use of the so-called Area Time Integral in estimating total rain volume in other studies. To calibrate the logistical model, simulated rain fields generated by rainfield models with prescribed parameters are needed. A stringent test of the logistical model is its ability to recover the prescribed parameters of simulated rain fields. A rain field simulation model which preserves the fractional rain area and lognormality of rainrates as found in GATE is developed. A stochastic regression model of branching and immigration whose solutions are lognormally distributed in some asymptotic limits has also been developed.
ERIC Educational Resources Information Center
Jones, Douglas H.
The progress of modern mental test theory depends very much on the techniques of maximum likelihood estimation, and many popular applications make use of likelihoods induced by logistic item response models. While, in reality, item responses are nonreplicate within a single examinee and the logistic models are only ideal, practitioners make…
Practical Session: Logistic Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
An exercise is proposed to illustrate logistic regression. It investigates the different risk factors for the occurrence of coronary heart disease. The exercise was proposed in Chapter 5 of the book by D.G. Kleinbaum and M. Klein, "Logistic Regression", Statistics for Biology and Health, Springer Science Business Media, LLC (2010) and also by D. Chessel and A.B. Dufour in Lyon 1 (see Sect. 6 of http://pbil.univ-lyon1.fr/R/pdf/tdr341.pdf). This example is based on data given in the file evans.txt coming from http://www.sph.emory.edu/dkleinb/logreg3.htm#data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.
NASA Astrophysics Data System (ADS)
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam
2015-10-01
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.
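The statement that q response categories give q-1 logit equations can be made concrete with a small sketch that turns q-1 coefficient vectors into q category probabilities, treating the remaining category as the reference with linear predictor zero. The function name and the coefficients are hypothetical, and no model fitting is shown.

```python
import math


def multinomial_probabilities(x, coefs):
    """Category probabilities from q-1 logit equations.

    coefs: list of q-1 vectors [intercept, b1, b2, ...], one per
    non-reference category. Index 0 of the result is the reference.
    """
    etas = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)) for b in coefs]
    shift = max(0.0, *etas)  # subtract the maximum for numerical stability
    exps = [math.exp(-shift)] + [math.exp(e - shift) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical coefficients for q = 3 categories and one predictor.
probs = multinomial_probabilities([2.0], [[0.5, 1.0], [-1.0, 0.2]])
print(probs, sum(probs))
```

Each equation is a log odds against the reference category, so log(probs[k] / probs[0]) reproduces the k-th linear predictor.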
The cross-validated AUC for MCP-logistic regression with high-dimensional data.
Jiang, Dingfeng; Huang, Jian; Zhang, Ying
2013-10-01
We propose a cross-validated area under the receiver operating characteristic (ROC) curve (CV-AUC) criterion for tuning parameter selection for penalized methods in sparse, high-dimensional logistic regression models. We use this criterion in combination with the minimax concave penalty (MCP) method for variable selection. The CV-AUC criterion is specifically designed for optimizing the classification performance for binary outcome data. To implement the proposed approach, we derive an efficient coordinate descent algorithm to compute the MCP-logistic regression solution surface. Simulation studies are conducted to evaluate the finite sample performance of the proposed method and its comparison with the existing methods including the Akaike information criterion (AIC), Bayesian information criterion (BIC) or Extended BIC (EBIC). The model selected based on the CV-AUC criterion tends to have a larger predictive AUC and smaller classification error than those with tuning parameters selected using the AIC, BIC or EBIC. We illustrate the application of the MCP-logistic regression with the CV-AUC criterion on three microarray datasets from the studies that attempt to identify genes related to cancers. Our simulation studies and data examples demonstrate that the CV-AUC is an attractive method for tuning parameter selection for penalized methods in high-dimensional logistic regression models.
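Two building blocks of the approach in this abstract can be sketched independently of the authors' coordinate descent algorithm, which is not reproduced here: the MCP penalty for a single coefficient, in its standard form, and the empirical AUC that would be computed on held-out folds. Function names and example values are hypothetical.

```python
def mcp_penalty(beta, lam, gamma):
    """Minimax concave penalty for one coefficient (lam > 0, gamma > 1)."""
    a = abs(beta)
    if a <= gamma * lam:
        return lam * a - a * a / (2.0 * gamma)
    return 0.5 * gamma * lam * lam  # constant beyond gamma * lam


def empirical_auc(scores, labels):
    """AUC as the share of (case, control) pairs ranked correctly.

    Ties count one half; labels are 1 for cases and 0 for controls.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out predictions for one cross-validation fold.
print(empirical_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
print(mcp_penalty(0.5, 1.0, 3.0))
```

A CV-AUC criterion in this spirit would average `empirical_auc` over folds for each candidate tuning parameter and keep the value with the largest average.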
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination. Copyright 2004 John Wiley & Sons, Ltd.
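The equivalent two-sample construction for logistic regression can be sketched numerically. The sketch assumes that "the parameters" are the log odds of the two groups, that "overall expected number of events unchanged" means the average of the two group probabilities equals the overall response probability, and that the usual normal-approximation formula for comparing two proportions is then applied. Function names and defaults are hypothetical.

```python
import math
from statistics import NormalDist

def logit(p):
    return math.log(p / (1.0 - p))

def equivalent_two_sample(beta, sd_x, p_overall, tol=1e-12):
    """Find p1, p2 with logit(p2) - logit(p1) = 2*beta*sd_x and mean p_overall."""
    d = 2.0 * beta * sd_x
    lo = max(0.0, 2.0 * p_overall - 1.0)
    hi = min(1.0, 2.0 * p_overall)
    # f(p1) = logit(2*p_overall - p1) - logit(p1) - d is decreasing in p1,
    # so bisection on (lo, hi) locates its single root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if logit(2.0 * p_overall - mid) - logit(mid) > d:
            lo = mid
        else:
            hi = mid
    p1 = 0.5 * (lo + hi)
    return p1, 2.0 * p_overall - p1

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Standard sample size per group for a two-sided test of two proportions."""
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    p_bar = 0.5 * (p1 + p2)
    num = (z_a * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
           + z_b * math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

# Hypothetical inputs: slope 0.5 per unit, covariate SD 1, overall response probability 0.3.
p1, p2 = equivalent_two_sample(0.5, 1.0, 0.3)
n = n_per_group(p1, p2)
```

Under these assumptions the total sample size for the regression problem would be taken as twice the per-group figure.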
Kesselmeier, Miriam; Lorenzo Bermejo, Justo
2017-11-01
Logistic regression is the most common technique used for genetic case-control association studies. A disadvantage of standard maximum likelihood estimators of the genotype relative risk (GRR) is their strong dependence on outlier subjects, for example, patients diagnosed at unusually young age. Robust methods are available to constrain outlier influence, but they are scarcely used in genetic studies. This article provides a non-intimidating introduction to robust logistic regression, and investigates its benefits and limitations in genetic association studies. We applied the bounded Huber and extended the R package 'robustbase' with the re-descending Hampel functions to down-weight outlier influence. Computer simulations were carried out to assess the type I error rate, mean squared error (MSE) and statistical power according to major characteristics of the genetic study and investigated markers. Simulations were complemented with the analysis of real data. Both standard and robust estimation controlled type I error rates. Standard logistic regression showed the highest power but standard GRR estimates also showed the largest bias and MSE, in particular for associated rare and recessive variants. For illustration, a recessive variant with a true GRR=6.32 and a minor allele frequency=0.05 investigated in a 1000 case/1000 control study by standard logistic regression resulted in power=0.60 and MSE=16.5. The corresponding figures for Huber-based estimation were power=0.51 and MSE=0.53. Overall, Hampel- and Huber-based GRR estimates did not differ much. Robust logistic regression may represent a valuable alternative to standard maximum likelihood estimation when the focus lies on risk prediction rather than identification of susceptibility variants. © The Author 2016. Published by Oxford University Press.
Sampson, Maureen L; Gounden, Verena; van Deventer, Hendrik E; Remaley, Alan T
2016-02-01
The main drawback of the periodic analysis of quality control (QC) material is that test performance is not monitored in time periods between QC analyses, potentially leading to the reporting of faulty test results. The objective of this study was to develop a patient based QC procedure for the more timely detection of test errors. Results from a Chem-14 panel measured on the Beckman LX20 analyzer were used to develop the model. Each test result was predicted from the other 13 members of the panel by multiple regression, which resulted in correlation coefficients between the predicted and measured result of >0.7 for 8 of the 14 tests. A logistic regression model, which utilized the measured test result, the predicted test result, the day of the week and time of day, was then developed for predicting test errors. The output of the logistic regression was tallied by a daily CUSUM approach and used to predict test errors, with a fixed specificity of 90%. The mean average run length (ARL) before error detection by CUSUM-Logistic Regression (CSLR) was 20 with a mean sensitivity of 97%, which was considerably shorter than the mean ARL of 53 (sensitivity 87.5%) for a simple prediction model that only used the measured result for error detection. A CUSUM-Logistic Regression analysis of patient laboratory data can be an effective approach for the rapid and sensitive detection of clinical laboratory errors. Published by Elsevier Inc.
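Tallying logistic regression output with a CUSUM can be sketched as a one-sided cumulative sum of predicted error probabilities. This is a generic CUSUM recursion, not the authors' exact procedure; the reference value and decision threshold below are hypothetical and would in practice be tuned to reach a target specificity.

```python
def cusum_alarm(error_probs, reference=0.5, threshold=2.0):
    """Return the index at which the one-sided CUSUM first exceeds the threshold.

    error_probs -- predicted error probabilities, in time order
    reference   -- allowance subtracted from each probability (hypothetical value)
    threshold   -- decision limit for raising an alarm (hypothetical value)
    Returns None when no alarm is raised.
    """
    s = 0.0
    for i, p in enumerate(error_probs):
        # Accumulate evidence above the reference; never drop below zero.
        s = max(0.0, s + (p - reference))
        if s > threshold:
            return i
    return None
```

The run length before detection is then the returned index plus one, counted from the start of the monitored period.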
Nonconvex Sparse Logistic Regression With Weakly Convex Regularization
NASA Astrophysics Data System (ADS)
Shen, Xinyue; Gu, Yuantao
2018-06-01
In this work we propose to fit a sparse logistic regression model by a weakly convex regularized nonconvex optimization problem. The idea is based on the finding that a weakly convex function as an approximation of the $\ell_0$ pseudo norm is able to better induce sparsity than the commonly used $\ell_1$ norm. For a class of weakly convex sparsity inducing functions, we prove the nonconvexity of the corresponding sparse logistic regression problem, and study its local optimality conditions and the choice of the regularization parameter to exclude trivial solutions. Despite the nonconvexity, a method based on proximal gradient descent is used to solve the general weakly convex sparse logistic regression, and its convergence behavior is studied theoretically. Then the general framework is applied to a specific weakly convex function, and a necessary and sufficient local optimality condition is provided. The solution method is instantiated in this case as an iterative firm-shrinkage algorithm, and its effectiveness is demonstrated in numerical experiments by both randomly generated and real datasets.
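For orientation, the firm-shrinkage step can be sketched with the commonly cited firm thresholding function, which sets small values to zero, leaves large values unchanged, and interpolates linearly in between. The parameter names `lam` and `mu` are assumptions, and the paper's own parameterization may differ.

```python
import math

def firm_shrink(x, lam, mu):
    """Firm thresholding with lower threshold lam and upper threshold mu (mu > lam > 0)."""
    a = abs(x)
    if a <= lam:
        # Small coefficients are set exactly to zero, as in soft thresholding.
        return 0.0
    if a <= mu:
        # Linear interpolation between 0 at |x| = lam and |x| at |x| = mu.
        return math.copysign(mu * (a - lam) / (mu - lam), x)
    # Large coefficients pass through unshrunk, unlike soft thresholding.
    return float(x)
```

In a proximal gradient iteration this operator would be applied componentwise to the coefficient vector after each gradient step on the logistic loss.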
A comparative study on entrepreneurial attitudes modeled with logistic regression and Bayes nets.
López Puga, Jorge; García García, Juan
2012-11-01
Entrepreneurship research is receiving increasing attention in our context, as entrepreneurs are key social agents involved in economic development. We compare the success of the dichotomic logistic regression model and the Bayes simple classifier to predict entrepreneurship, after manipulating the percentage of missing data and the level of categorization in predictors. A sample of undergraduate university students (N = 1230) completed five scales (motivation, attitude towards business creation, obstacles, deficiencies, and training needs) and we found that each of them predicted different aspects of the tendency to business creation. Additionally, our results show that the receiver operating characteristic (ROC) curve is affected by the rate of missing data in both techniques, but logistic regression seems to be more vulnerable when faced with missing data, whereas Bayes nets underperform slightly when categorization has been manipulated. Our study sheds light on the potential entrepreneur profile and we propose to use Bayesian networks as an additional alternative to overcome the weaknesses of logistic regression when missing data are present in applied research.
Campos-Filho, N; Franco, E L
1989-02-01
A frequent procedure in matched case-control studies is to report results from the multivariate unmatched analyses if they do not differ substantially from the ones obtained after conditioning on the matching variables. Although conceptually simple, this rule requires that an extensive series of logistic regression models be evaluated by both the conditional and unconditional maximum likelihood methods. Most computer programs for logistic regression employ only one maximum likelihood method, which requires that the analyses be performed in separate steps. This paper describes a Pascal microcomputer (IBM PC) program that performs multiple logistic regression by both maximum likelihood estimation methods, which obviates the need for switching between programs to obtain relative risk estimates from both matched and unmatched analyses. The program calculates most standard statistics and allows factoring of categorical or continuous variables by two distinct methods of contrast. A built-in, descriptive statistics option allows the user to inspect the distribution of cases and controls across categories of any given variable.
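The conditional maximum likelihood method mentioned above can be illustrated for the simplest case of 1:1 matched pairs with a single covariate. For such pairs, a standard result is that the conditional likelihood depends only on the case-minus-control covariate differences. The sketch below is not a reconstruction of the Pascal program; the function name and the Newton-Raphson settings are hypothetical.

```python
import math

def conditional_logit_pairs(diffs, iterations=25):
    """Conditional MLE of the log odds ratio for 1:1 matched pairs.

    diffs -- covariate value of the case minus that of its matched control
    The conditional log-likelihood is sum(log(1 / (1 + exp(-beta * d)))).
    """
    beta = 0.0
    for _ in range(iterations):
        probs = [1.0 / (1.0 + math.exp(-beta * d)) for d in diffs]
        score = sum(d * (1.0 - p) for d, p in zip(diffs, probs))
        info = sum(d * d * p * (1.0 - p) for d, p in zip(diffs, probs))
        if info == 0.0:
            break
        # Newton-Raphson update on a concave log-likelihood.
        beta += score / info
    return beta
```

For a binary exposure, the differences are +1 or -1 for discordant pairs, and the estimate reduces to the log of the ratio of the two discordant-pair counts.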
Comparison of cranial sex determination by discriminant analysis and logistic regression.
Amores-Ampuero, Anabel; Alemán, Inmaculada
2016-04-05
Various methods have been proposed for estimating dimorphism. The objective of this study was to compare sex determination results from cranial measurements using discriminant analysis or logistic regression. The study sample comprised 130 individuals (70 males) of known sex, age, and cause of death from San José cemetery in Granada (Spain). Measurements of 19 neurocranial dimensions and 11 splanchnocranial dimensions were subjected to discriminant analysis and logistic regression, and the percentages of correct classification were compared between the sex functions obtained with each method. The discriminant capacity of the selected variables was evaluated with a cross-validation procedure. The percentage accuracy with discriminant analysis was 78.2% for the neurocranium (82.4% in females and 74.6% in males) and 73.7% for the splanchnocranium (79.6% in females and 68.8% in males). These percentages were higher with logistic regression analysis: 85.7% for the neurocranium (in both sexes) and 94.1% for the splanchnocranium (100% in females and 91.7% in males).
Kim, Sun Mi; Han, Heon; Park, Jeong Mi; Choi, Yoon Jung; Yoon, Hoi Soo; Sohn, Jung Hee; Baek, Moon Hee; Kim, Yoon Nam; Chae, Young Moon; June, Jeon Jong; Lee, Jiwon; Jeon, Yong Hwan
2012-10-01
To determine which Breast Imaging Reporting and Data System (BI-RADS) descriptors for ultrasound are predictors for breast cancer using logistic regression (LR) analysis in conjunction with interobserver variability between breast radiologists, and to compare the performance of artificial neural network (ANN) and LR models in differentiation of benign and malignant breast masses. Five breast radiologists retrospectively reviewed 140 breast masses and described each lesion using the BI-RADS lexicon and categorized final assessments. Interobserver agreements between the observers were measured by kappa statistics. The radiologists' responses for BI-RADS were pooled. The data were divided randomly into training (n = 70) and test (n = 70) sets. Using the training set, optimal independent variables were determined by LR analysis with forward stepwise selection. The LR and ANN models were constructed with the optimal independent variables and the biopsy results as the dependent variable. Performances of the models and radiologists were evaluated on the test set using receiver-operating characteristic (ROC) analysis. Among BI-RADS descriptors, margin and boundary were determined as the predictors according to stepwise LR showing moderate interobserver agreement. Area under the ROC curves (AUC) for both LR and ANN were 0.87 (95% CI, 0.77-0.94). AUCs for the five radiologists ranged from 0.79 to 0.91. There was no significant difference in AUC values among the LR, ANN, and radiologists (p > 0.05). Margin and boundary were found as statistically significant predictors with good interobserver agreement. Use of the LR and ANN showed similar performance to that of the radiologists for differentiation of benign and malignant breast masses.
Kim, Sung-Wan; Kang, Hee-Ju; Bae, Kyung-Yeol; Shin, Il-Seon; Hong, Young Joon; Ahn, Young-Keun; Jeong, Myung Ho; Berk, Michael; Yoon, Jin-Sang; Kim, Jae-Min
2018-01-03
Pro-inflammatory cytokines are associated with the development of depression and statins exert anti-inflammatory and antidepressant effects. The present study aimed to investigate associations between interleukin (IL)-6 and IL-18 and depression in patients with acute coronary syndrome (ACS) and potential interactions between statin use and pro-inflammatory cytokines on depression in this population. We used pooled datasets from 1-year follow-up data from a 24-week randomized double-blind placebo-controlled trial (RCT) of escitalopram for treatment of depressive disorder and data from a naturalistic, prospective, observational cohort study in patients with ACS. IL-6 and IL-18 levels were measured at baseline. Logistic regression models were used to investigate independent associations of IL-6/IL-18 levels with depressive disorder at baseline and at 1 year. We repeated all analyses by reference to statin use to determine whether any significant association emerged. Of the 969 participants, 378 (39.0%) had major or minor depression at baseline. Of 711 patients followed up at 1 year, 183 (25.7%) had depression. Logistic regression analysis showed that higher IL-6 and IL-18 levels at baseline were significantly associated with baseline depression after adjusting for other variables (adjusted p-values=0.005 and 0.001, respectively). IL-6 and IL-18 levels were also significantly higher in patients with depression at the 1-year follow-up after adjusting for other variables amongst those not taking statins (adjusted p-values=0.040 and 0.004, respectively); but this was not the case in patients taking statins. Levels of pro-inflammatory cytokines appear to predict development of depression after ACS and statins attenuate the effects of cytokines on depression. Copyright © 2017 Elsevier Inc. All rights reserved.
Heikkilä, Katriina; Nyberg, Solja T.; Fransson, Eleonor I.; Alfredsson, Lars; De Bacquer, Dirk; Bjorner, Jakob B.; Bonenfant, Sébastien; Borritz, Marianne; Burr, Hermann; Clays, Els; Casini, Annalisa; Dragano, Nico; Erbel, Raimund; Geuskens, Goedele A.; Goldberg, Marcel; Hooftman, Wendela E.; Houtman, Irene L.; Joensuu, Matti; Jöckel, Karl-Heinz; Kittel, France; Knutsson, Anders; Koskenvuo, Markku; Koskinen, Aki; Kouvonen, Anne; Leineweber, Constanze; Lunau, Thorsten; Madsen, Ida E. H.; Hanson, Linda L. Magnusson; Marmot, Michael G.; Nielsen, Martin L.; Nordin, Maria; Pentti, Jaana; Salo, Paula; Rugulies, Reiner; Steptoe, Andrew; Siegrist, Johannes; Suominen, Sakari; Vahtera, Jussi; Virtanen, Marianna; Väänänen, Ari; Westerholm, Peter; Westerlund, Hugo; Zins, Marie; Theorell, Töres; Hamer, Mark; Ferrie, Jane E.; Singh-Manoux, Archana; Batty, G. David; Kivimäki, Mika
2012-01-01
Background: Tobacco smoking is a major contributor to the public health burden and healthcare costs worldwide, but the determinants of smoking behaviours are poorly understood. We conducted a large individual-participant meta-analysis to examine the extent to which work-related stress, operationalised as job strain, is associated with tobacco smoking in working adults. Methodology and Principal Findings: We analysed cross-sectional data from 15 European studies comprising 166 130 participants. Longitudinal data from six studies were used. Job strain and smoking were self-reported. Smoking was harmonised into three categories: never, ex- and current smokers. We modelled the cross-sectional associations using logistic regression and pooled the results in random-effects meta-analyses. Mixed effects logistic regression was used to examine longitudinal associations. Of the 166 130 participants, 17% reported job strain, 42% were never smokers, 33% ex-smokers and 25% current smokers. In the analyses of the cross-sectional data, current smokers had higher odds of job strain than never-smokers (age, sex and socioeconomic position-adjusted odds ratio: 1.11, 95% confidence interval: 1.03, 1.18). Current smokers with job strain smoked, on average, three cigarettes per week more than current smokers without job strain. In the analyses of longitudinal data (1 to 9 years of follow-up), there was no clear evidence for longitudinal associations between job strain and taking up or quitting smoking. Conclusions: Our findings show that smokers are slightly more likely than non-smokers to report work-related stress. In addition, smokers who reported work stress smoked, on average, slightly more cigarettes than stress-free smokers. PMID:22792154
Predictors of Donor Heart Utilization for Transplantation in United States.
Trivedi, Jaimin R; Cheng, Allen; Gallo, Michele; Schumer, Erin M; Massey, H Todd; Slaughter, Mark S
2017-06-01
Optimum use of donor organs can extend the reach of transplantation therapy to more patients on the waiting list. Heart transplantation (HTx) has remained stagnant in the United States over the past decade at approximately 2,500 HTx annually. With the use of the United Network of Organ Sharing (UNOS) deceased donor database (DCD) we aimed to evaluate donor factors predicting donor heart utilization. UNOS DCD was queried from 2005 to 2014 to identify the total number of donors who had at least one of their organs donated. We then generated a multivariate logistic regression model using various demographic and clinical donor factors to predict donor heart use for HTx. Donor hearts not recovered due to consent or family issues or recovered for nontransplantation reasons were excluded from the analysis. During the study period there were 80,782 donors of which 23,606 (29%) were used for HTx, and 38,877 transplants (48%) were not used after obtaining consent because of poor organ function (37%), donor medical history (13%), and organ refused by all programs (5%). Of all, 22,791 donors with complete data were used for logistic regression (13,389 HTx, 9,402 no-HTx) which showed significant predictors of donor heart use for HTx. From this model we assigned probability of donor heart use and identified 3,070 donors with HTx-eligible unused hearts for reasons of poor organ function (28%), organ refused by all programs (15%), and recipient not located (9%). An objective system based on donor factors can predict donor heart use for HTx and may help increase availability of hearts for transplantation from the existing donor pool. Copyright © 2017 The Society of Thoracic Surgeons. Published by Elsevier Inc. All rights reserved.
Hill, Andrew; Loh, Po-Ru; Bharadwaj, Ragu B.; Pons, Pascal; Shang, Jingbo; Guinan, Eva; Lakhani, Karim; Kilty, Iain
2017-01-01
Abstract Background: The association of differing genotypes with disease-related phenotypic traits offers great potential to both help identify new therapeutic targets and support stratification of patients who would gain the greatest benefit from specific drug classes. Development of low-cost genotyping and sequencing has made collecting large-scale genotyping data routine in population and therapeutic intervention studies. In addition, a range of new technologies is being used to capture numerous new and complex phenotypic descriptors. As a result, genotype and phenotype datasets have grown exponentially. Genome-wide association studies associate genotypes and phenotypes using methods such as logistic regression. As existing tools for association analysis limit the efficiency by which value can be extracted from increasing volumes of data, there is a pressing need for new software tools that can accelerate association analyses on large genotype-phenotype datasets. Results: Using open innovation (OI) and contest-based crowdsourcing, the logistic regression analysis in a leading, community-standard genetics software package (PLINK 1.07) was substantially accelerated. OI allowed us to do this in <6 months by providing rapid access to highly skilled programmers with specialized, difficult-to-find skill sets. Through a crowd-based contest a combination of computational, numeric, and algorithmic approaches was identified that accelerated the logistic regression in PLINK 1.07 by 18- to 45-fold. Combining contest-derived logistic regression code with coarse-grained parallelization, multithreading, and associated changes to data initialization code further developed through distributed innovation, we achieved an end-to-end speedup of 591-fold for a data set size of 6678 subjects by 645 863 variants, compared to PLINK 1.07's logistic regression. This represents a reduction in run time from 4.8 hours to 29 seconds. 
Accelerated logistic regression code developed in this project has been incorporated into the PLINK2 project. Conclusions: Using iterative competition-based OI, we have developed a new, faster implementation of logistic regression for genome-wide association studies analysis. We present lessons learned and recommendations on running a successful OI process for bioinformatics. PMID:28327993
Hill, Andrew; Loh, Po-Ru; Bharadwaj, Ragu B; Pons, Pascal; Shang, Jingbo; Guinan, Eva; Lakhani, Karim; Kilty, Iain; Jelinsky, Scott A
2017-05-01
The association of differing genotypes with disease-related phenotypic traits offers great potential to both help identify new therapeutic targets and support stratification of patients who would gain the greatest benefit from specific drug classes. Development of low-cost genotyping and sequencing has made collecting large-scale genotyping data routine in population and therapeutic intervention studies. In addition, a range of new technologies is being used to capture numerous new and complex phenotypic descriptors. As a result, genotype and phenotype datasets have grown exponentially. Genome-wide association studies associate genotypes and phenotypes using methods such as logistic regression. As existing tools for association analysis limit the efficiency by which value can be extracted from increasing volumes of data, there is a pressing need for new software tools that can accelerate association analyses on large genotype-phenotype datasets. Using open innovation (OI) and contest-based crowdsourcing, the logistic regression analysis in a leading, community-standard genetics software package (PLINK 1.07) was substantially accelerated. OI allowed us to do this in <6 months by providing rapid access to highly skilled programmers with specialized, difficult-to-find skill sets. Through a crowd-based contest a combination of computational, numeric, and algorithmic approaches was identified that accelerated the logistic regression in PLINK 1.07 by 18- to 45-fold. Combining contest-derived logistic regression code with coarse-grained parallelization, multithreading, and associated changes to data initialization code further developed through distributed innovation, we achieved an end-to-end speedup of 591-fold for a data set size of 6678 subjects by 645 863 variants, compared to PLINK 1.07's logistic regression. This represents a reduction in run time from 4.8 hours to 29 seconds. 
Accelerated logistic regression code developed in this project has been incorporated into the PLINK2 project. Using iterative competition-based OI, we have developed a new, faster implementation of logistic regression for genome-wide association studies analysis. We present lessons learned and recommendations on running a successful OI process for bioinformatics. © The Author 2017. Published by Oxford University Press.
Lin, Chao-Cheng; Bai, Ya-Mei; Chen, Jen-Yeu; Hwang, Tzung-Jeng; Chen, Tzu-Ting; Chiu, Hung-Wen; Li, Yu-Chuan
2010-03-01
Metabolic syndrome (MetS) is an important side effect of second-generation antipsychotics (SGAs). However, many SGA-treated patients with MetS remain undetected. In this study, we trained and validated artificial neural network (ANN) and multiple logistic regression models without biochemical parameters to rapidly identify MetS in patients with SGA treatment. A total of 383 patients with a diagnosis of schizophrenia or schizoaffective disorder (DSM-IV criteria) with SGA treatment for more than 6 months were investigated to determine whether they met the MetS criteria according to the International Diabetes Federation. The data for these patients were collected between March 2005 and September 2005. The input variables of ANN and logistic regression were limited to demographic and anthropometric data only. All models were trained by randomly selecting two-thirds of the patient data and were internally validated with the remaining one-third of the data. The models were then externally validated with data from 69 patients from another hospital, collected between March 2008 and June 2008. The area under the receiver operating characteristic curve (AUC) was used to measure the performance of all models. Both the final ANN and logistic regression models had high accuracy (88.3% vs 83.6%), sensitivity (93.1% vs 86.2%), and specificity (86.9% vs 83.8%) to identify MetS in the internal validation set. The mean +/- SD AUC was high for both the ANN and logistic regression models (0.934 +/- 0.033 vs 0.922 +/- 0.035, P = .63). During external validation, high AUC was still obtained for both models. Waist circumference and diastolic blood pressure were the common variables that were left in the final ANN and logistic regression models. Our study developed accurate ANN and logistic regression models to detect MetS in patients with SGA treatment. The models are likely to provide a noninvasive tool for large-scale screening of MetS in this group of patients. 
(c) 2010 Physicians Postgraduate Press, Inc.
Bayesian logistic regression in detection of gene-steroid interaction for cancer at PDLIM5 locus.
Wang, Ke-Sheng; Owusu, Daniel; Pan, Yue; Xie, Changchun
2016-06-01
The PDZ and LIM domain 5 (PDLIM5) gene may play a role in cancer, bipolar disorder, major depression, alcohol dependence and schizophrenia; however, little is known about the interaction effect of steroid and PDLIM5 gene on cancer. This study examined 47 single-nucleotide polymorphisms (SNPs) within the PDLIM5 gene in the Marshfield sample with 716 cancer patients (any diagnosed cancer, excluding minor skin cancer) and 2848 noncancer controls. Multiple logistic regression model in PLINK software was used to examine the association of each SNP with cancer. Bayesian logistic regression in PROC GENMOD in SAS statistical software, ver. 9.4 was used to detect gene-steroid interactions influencing cancer. Single marker analysis using PLINK identified 12 SNPs associated with cancer (P < 0.05); especially, SNP rs6532496 revealed the strongest association with cancer (P = 6.84 × 10⁻³); while the next best signal was rs951613 (P = 7.46 × 10⁻³). Classic logistic regression in PROC GENMOD showed that both rs6532496 and rs951613 revealed strong gene-steroid interaction effects (OR=2.18, 95% CI=1.31-3.63 with P = 2.9 × 10⁻³ for rs6532496 and OR=2.07, 95% CI=1.24-3.45 with P = 5.43 × 10⁻³ for rs951613, respectively). Results from Bayesian logistic regression showed stronger interaction effects (OR=2.26, 95% CI=1.2-3.38 for rs6532496 and OR=2.14, 95% CI=1.14-3.2 for rs951613, respectively). All the 12 SNPs associated with cancer revealed significant gene-steroid interaction effects (P < 0.05); whereas 13 SNPs showed gene-steroid interaction effects without main effect on cancer. SNP rs4634230 revealed the strongest gene-steroid interaction effect (OR=2.49, 95% CI=1.5-4.13 with P = 4.0 × 10⁻⁴ based on the classic logistic regression and OR=2.59, 95% CI=1.4-3.97 from Bayesian logistic regression; respectively). This study provides evidence of common genetic variants within the PDLIM5 gene and interactions between PDLIM5 gene polymorphisms and steroid use influencing cancer.
Deletion Diagnostics for Alternating Logistic Regressions
Preisser, John S.; By, Kunthel; Perin, Jamie; Qaqish, Bahjat F.
2013-01-01
Deletion diagnostics are introduced for the regression analysis of clustered binary outcomes estimated with alternating logistic regressions, an implementation of generalized estimating equations (GEE) that estimates regression coefficients in a marginal mean model and in a model for the intracluster association given by the log odds ratio. The diagnostics are developed within an estimating equations framework that recasts the estimating functions for association parameters based upon conditional residuals into equivalent functions based upon marginal residuals. Extensions of earlier work on GEE diagnostics follow directly, including computational formulae for one-step deletion diagnostics that measure the influence of a cluster of observations on the estimated regression parameters and on the overall marginal mean or association model fit. The diagnostic formulae are evaluated with simulation studies and with an application concerning an assessment of factors associated with health maintenance visits in primary care medical practices. The application and the simulations demonstrate that the proposed cluster-deletion diagnostics for alternating logistic regressions are good approximations of their exact fully iterated counterparts. PMID:22777960
Knol, Mirjam J; van der Tweel, Ingeborg; Grobbee, Diederick E; Numans, Mattijs E; Geerlings, Mirjam I
2007-10-01
To determine the presence of interaction in epidemiologic research, typically a product term is added to the regression model. In linear regression, the regression coefficient of the product term reflects interaction as departure from additivity. However, in logistic regression it refers to interaction as departure from multiplicativity. Rothman has argued that interaction estimated as departure from additivity better reflects biologic interaction. So far, literature on estimating interaction on an additive scale using logistic regression has focused only on dichotomous determinants. The objective of the present study was to provide the methods to estimate interaction between continuous determinants and to illustrate these methods with a clinical example. From the existing literature we derived the formulas to quantify interaction as departure from additivity between one continuous and one dichotomous determinant and between two continuous determinants using logistic regression. Bootstrapping was used to calculate the corresponding confidence intervals. To illustrate the theory with an empirical example, data from the Utrecht Health Project were used, with age and body mass index as risk factors for elevated diastolic blood pressure. The methods and formulas presented in this article are intended to assist epidemiologists to calculate interaction on an additive scale between two variables on a certain outcome. The proposed methods are included in a spreadsheet which is freely available at: http://www.juliuscenter.nl/additive-interaction.xls.
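Interaction as departure from additivity is commonly summarized by the relative excess risk due to interaction (RERI), computed from odds ratios relative to a joint reference level. The sketch below applies that general definition to a logistic model with a product term. The reference values, increments, and function name are hypothetical, and it is not a reproduction of the formulas or the spreadsheet described above.

```python
import math

def reri(b1, b2, b3, ref1=0.0, ref2=0.0, delta1=1.0, delta2=1.0):
    """RERI from logistic coefficients of x1, x2 and their product x1*x2.

    Odds ratios are taken relative to the joint reference (ref1, ref2);
    delta1 and delta2 are the increments of interest for each determinant.
    """
    def linear_predictor(x1, x2):
        # The intercept cancels in every odds ratio, so it is omitted.
        return b1 * x1 + b2 * x2 + b3 * x1 * x2

    base = linear_predictor(ref1, ref2)
    or_11 = math.exp(linear_predictor(ref1 + delta1, ref2 + delta2) - base)
    or_10 = math.exp(linear_predictor(ref1 + delta1, ref2) - base)
    or_01 = math.exp(linear_predictor(ref1, ref2 + delta2) - base)
    return or_11 - or_10 - or_01 + 1.0
```

With continuous determinants the result depends on the chosen reference values and increments, so both should be reported with the estimate. A confidence interval could be obtained by recomputing the quantity on bootstrap resamples.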
ERIC Educational Resources Information Center
Osborne, Jason W.
2012-01-01
Logistic regression is slowly gaining acceptance in the social sciences, and fills an important niche in the researcher's toolkit: being able to predict important outcomes that are not continuous in nature. While OLS regression is a valuable tool, it cannot routinely be used to predict outcomes that are binary or categorical in nature. These…
Lee, Hui; Wang, Qiong; Yang, Fei; Tao, Ping; Li, Hui; Huang, Yuan; Li, Jia-Yuan
2012-05-01
SULT1A1 is involved in both detoxification of estrogens and bioactivation of carcinogens in smoked meat. SULT1A1 Arg213His polymorphism's effect on breast cancer risk is still unclear. We recruited 400 case-control pairs to investigate the association between SULT1A1 genotypes and breast cancer risk, and the combined effect of SULT1A1 polymorphism and daily intake of smoked meat. Participants were questioned about their dietary habits and other risk factors, and their SULT1A1 genotypes were determined. Adjusted odds ratios (aORs) and 95% confidence intervals (CIs) were estimated by multivariable unconditional logistic regression. We also performed a meta-analysis of relevant published studies to test these associations. In the case-control study, no significant associations were observed between SULT1A1 polymorphism and breast cancer risk. In the meta-analysis, SULT1A1 His/His genotype slightly increased risk among both overall and postmenopausal women (OR(pooled-overall)=1.12, 95% CI: 1.02-1.24; OR(pooled-post)=1.17, 95% CI: 1.03-1.32). A larger positive association was observed in Asian populations (OR(pooled-Asian)=2.01, 95% CI: 1.24-3.26). In our case-control study, high energy-adjusted daily intake of smoked meat was significantly associated with breast cancer risk in overall, pre- and postmenopausal women (aORs: 2.31-3.13, OR 95% CIs exclude 1). High smoked meat intake interacted positively with the His variant allele (all γ>1). These results correlated with those of the meta-analysis (γ(pooled-overall)=1.27). The SULT1A1 His/His genotype may increase the risk of breast cancer among Asian women, and dietary exposure to heterocyclic amines and polycyclic aromatic hydrocarbons, along with the SULT1A1 His/His variant genotype, may synergistically increase the risk of breast cancer.
Saber, Hamidreza; Yakoob, Mohammad Yawar; Shi, Peilin; Longstreth, W T; Lemaitre, Rozenn N; Siscovick, David; Rexrode, Kathryn M; Willett, Walter C; Mozaffarian, Dariush
2017-10-01
The associations of individual long-chain n-3 polyunsaturated fatty acids with incident ischemic stroke and its main subtypes are not well established. We aimed to investigate prospectively the relationship of circulating eicosapentaenoic acid, docosapentaenoic acid (DPA), and docosahexaenoic acid (DHA) with risk of total ischemic, atherothrombotic, and cardioembolic stroke. We measured circulating phospholipid fatty acids at baseline in 3 separate US cohorts: CHS (Cardiovascular Health Study), NHS (Nurses' Health Study), and HPFS (Health Professionals Follow-Up Study). Ischemic strokes were prospectively adjudicated and classified into atherothrombotic (large- and small-vessel infarctions) or cardioembolic by imaging studies and medical records. Risk according to fatty acid levels was assessed using Cox proportional hazards (CHS) or conditional logistic regression (NHS, HPFS) according to study design. Cohort findings were pooled using fixed-effects meta-analysis. A total of 953 incident ischemic strokes were identified (408 atherothrombotic, 256 cardioembolic, and 289 undetermined subtypes) during median follow-up of 11.2 years (CHS) and 8.3 years (pooled, NHS and HPFS). After multivariable adjustment, lower risk of total ischemic stroke was seen with higher DPA (highest versus lowest quartiles; pooled hazard ratio [HR], 0.74; 95% confidence interval [CI], 0.58-0.92) and DHA (HR, 0.80; 95% CI, 0.64-1.00) but not eicosapentaenoic acid (HR, 0.94; 95% CI, 0.77-1.19). DHA was associated with lower risk of atherothrombotic stroke (HR, 0.53; 95% CI, 0.34-0.83) and DPA with lower risk of cardioembolic stroke (HR, 0.58; 95% CI, 0.37-0.92). Findings in each individual cohort were consistent with pooled results. In 3 large US cohorts, higher circulating levels of DHA were inversely associated with incident atherothrombotic stroke and DPA with cardioembolic stroke. These novel findings suggest differential pathways of benefit for DHA, DPA, and eicosapentaenoic acid. 
© 2017 American Heart Association, Inc.
C-Reactive Protein and the Incidence of Macular Degeneration – Pooled Analysis of 5 Cohorts
Mitta, Vinod P.; Christen, William G.; Glynn, Robert J.; Semba, Richard D.; Ridker, Paul M.; Rimm, Eric B.; Hankinson, Susan E.; Schaumberg, Debra A.
2013-01-01
Objectives: To investigate the relationship between high-sensitivity C-reactive protein (hsCRP) and future risk of age-related macular degeneration (AMD) in US men and women. Methods: We measured hsCRP in baseline blood samples from participants in five ongoing cohort studies. Patients were initially free of AMD. We prospectively identified 647 incident cases of AMD and selected age- and sex-matched controls for each AMD case (2 controls for each case with dry AMD, or 3 controls for each case of neovascular AMD). We used conditional logistic regression models to examine the relationship between hsCRP and AMD, and pooled findings using meta-analytic techniques. Results: After adjusting for cigarette smoking status, participants with high (> 3 mg/L) compared with low (< 1 mg/L) hsCRP levels had cohort-specific odds ratios (OR) for incident AMD ranging from 0.94 (95% CI 0.58-1.51) in the Physicians’ Health Study to 2.59 (95% CI 0.58-11.67) in the Women’s Antioxidant and Folic Acid Cardiovascular Study. After testing for heterogeneity between studies (Q=5.61, p=0.23), we pooled findings across cohorts, and observed a significantly increased risk of incident AMD for high versus low hsCRP levels (OR=1.49, 95% CI 1.06-2.08). Risk of neovascular AMD was also increased among those with high hsCRP levels (OR=1.84, 95% CI 1.14-2.98). Conclusion: Overall these pooled findings from 5 prospective cohorts add further evidence that elevated levels of hsCRP predict greater future risk of AMD. This information might shed light on underlying mechanisms, and could be of clinical utility in the identification of persons at high risk of AMD who may benefit from increased adherence to lifestyle recommendations, eye examination schedules, and therapeutic protocols. PMID:23392454
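Illustrative sketch: the abstract above reports a heterogeneity statistic (Q) and a pooled odds ratio but does not say which pooling estimator was used. The Python below assumes inverse-variance fixed-effect pooling on the log-odds scale, one common choice, and recovers standard errors from reported 95% confidence intervals. The function names and the cohort inputs are hypothetical and are not taken from the study.

```python
import math

def log_or_and_se(or_value, ci_low, ci_high, z=1.96):
    """Recover log(OR) and its standard error from a reported 95% CI."""
    log_or = math.log(or_value)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return log_or, se

def fixed_effect_pool(estimates, z=1.96):
    """Inverse-variance fixed-effect pooling of (log_or, se) pairs.

    Returns the pooled OR, its 95% CI bounds, and Cochran's Q.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / total
    se_pooled = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, estimates))
    low = math.exp(pooled - z * se_pooled)
    high = math.exp(pooled + z * se_pooled)
    return math.exp(pooled), low, high, q

# Hypothetical cohort-specific results as (OR, lower, upper); not study data.
cohorts = [(1.2, 0.8, 1.8), (1.6, 0.9, 2.84), (1.4, 0.7, 2.8)]
pooled_or, low, high, q = fixed_effect_pool([log_or_and_se(*c) for c in cohorts])
print(f"pooled OR = {pooled_or:.2f} (95% CI {low:.2f}-{high:.2f}), Q = {q:.2f}")
```

Under the null hypothesis of homogeneity, Q is usually compared with a chi-square distribution with (number of studies - 1) degrees of freedom; a random-effects model would be the usual alternative when heterogeneity is substantial.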
Intermediate and advanced topics in multilevel logistic regression analysis
Austin, Peter C.; Merlo, Juan
2017-01-01
Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher‐level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within‐cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population‐average effect of covariates measured at the subject and cluster level, in contrast to the within‐cluster or cluster‐specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster‐level covariates. We describe the variance partition coefficient and the median odds ratio which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28543517
Intermediate and advanced topics in multilevel logistic regression analysis.
Austin, Peter C; Merlo, Juan
2017-09-10
Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher-level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within-cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population-average effect of covariates measured at the subject and cluster level, in contrast to the within-cluster or cluster-specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster-level covariates. We describe the variance partition coefficient and the median odds ratio which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
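Illustrative sketch: the abstract names the variance partition coefficient (VPC) and the median odds ratio (MOR) without giving formulas. The Python below uses the widely cited latent-variable formulas for a two-level random-intercept logistic model, which is an assumption here; the article may define or estimate these quantities differently (for example by simulation). The function names and the cluster variance value are hypothetical.

```python
import math
from statistics import NormalDist

def variance_partition_coefficient(sigma2_cluster):
    """Latent-variable VPC: cluster-level variance divided by the sum of the
    cluster-level variance and the standard logistic variance pi^2 / 3."""
    return sigma2_cluster / (sigma2_cluster + math.pi ** 2 / 3)

def median_odds_ratio(sigma2_cluster):
    """MOR = exp(sqrt(2 * sigma^2) * Phi^{-1}(0.75)), where Phi^{-1} is the
    standard normal quantile function."""
    z75 = NormalDist().inv_cdf(0.75)
    return math.exp(math.sqrt(2 * sigma2_cluster) * z75)

# Hypothetical random-intercept variance on the log-odds scale.
sigma2 = 0.5
print(f"VPC = {variance_partition_coefficient(sigma2):.3f}")
print(f"MOR = {median_odds_ratio(sigma2):.3f}")
```

With no between-cluster variance the VPC is 0 and the MOR is 1; both grow as the cluster-level variance increases, which is why they are read as measures of the general contextual effect.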
Predicting Social Trust with Binary Logistic Regression
ERIC Educational Resources Information Center
Adwere-Boamah, Joseph; Hufstedler, Shirley
2015-01-01
This study used binary logistic regression to predict social trust with five demographic variables from a national sample of adult individuals who participated in The General Social Survey (GSS) in 2012. The five predictor variables were respondents' highest degree earned, race, sex, general happiness and the importance of personally assisting…
Effect of folic acid on appetite in children: ordinal logistic and fuzzy logistic regressions.
Namdari, Mahshid; Abadi, Alireza; Taheri, S Mahmoud; Rezaei, Mansour; Kalantari, Naser; Omidvar, Nasrin
2014-03-01
Reduced appetite and low food intake are often a concern in preschool children, since they can lead to malnutrition, a leading cause of impaired growth and mortality in childhood. It is occasionally considered that folic acid has a positive effect on appetite enhancement and consequently growth in children. The aim of this study was to assess the effect of folic acid on the appetite of preschool children 3 to 6 y old. The study sample included 127 children ages 3 to 6 who were randomly selected from 20 preschools in the city of Tehran in 2011. Since appetite was measured by linguistic terms, a fuzzy logistic regression was applied for modeling. The obtained results were compared with a statistical ordinal logistic model. After controlling for the potential confounders, in a statistical ordinal logistic model, serum folate showed a significantly positive effect on appetite. A small but positive effect of folate was detected by fuzzy logistic regression. Based on fuzzy regression, the risk for poor appetite in preschool children was related to the employment status of their mothers. In this study, a positive association was detected between the levels of serum folate and improved appetite. For further investigation, a randomized controlled, double-blind clinical trial could be helpful to address causality. Copyright © 2014 Elsevier Inc.
Rios, Paula; Bailey, Helen D; Lacour, Brigitte; Valteau-Couanet, Dominique; Michon, Jean; Bergeron, Christophe; Boutroux, Hélène; Defachelles, Anne-Sophie; Gambart, Marion; Sirvent, Nicolas; Thebaud, Estelle; Ducassou, Stéphane; Orsi, Laurent; Clavel, Jacqueline
2017-10-01
Neuroblastoma (NB) is an embryonic tumor that occurs almost exclusively in infancy and early childhood. While considerable evidence suggests that it may be initiated during embryonic development, the etiology of NB is still unknown. The aim of this study was to explore whether there is an association between maternal use of household pesticides during pregnancy and the risk of NB in the offspring. We conducted a pooled analysis of two French national-based case-control studies. The mothers of 357 NB cases and 1,783 controls younger than 6 years, frequency-matched by age and gender, responded to a telephone interview that focused on sociodemographic and perinatal characteristics, childhood environment, and life-style. Unconditional logistic regression was used to estimate pooled odds ratios and 95% confidence intervals. After controlling for matching variables, study of origin, and potential confounders, the maternal use of any type of pesticide during pregnancy was associated with NB (OR 1.5 [95% CI 1.2-1.9]). Insecticides were the most commonly used type of pesticide, and there was a positive association with their use alone (OR 1.4 [95% CI 1.1-1.9]) or with other pesticides (OR 2.0 [95% CI 1.1-3.4]). Although there is the potential for recall bias due to the study design, our findings add to the evidence of an association between the household use of pesticides and NB. Until a better study design can be found, our findings add yet another reason to advise pregnant women to limit pesticide exposure during the periconceptional period.
Reduced Risk of Barrett's Esophagus in Statin Users: Case-Control Study and Meta-Analysis.
Beales, Ian L P; Dearman, Leanne; Vardi, Inna; Loke, Yoon
2016-01-01
Use of statins has been associated with a reduced incidence of esophageal adenocarcinoma in population-based studies. However, there are few studies examining statin use and the development of Barrett's esophagus. The purpose of this study was to examine the association between statin use and the presence of Barrett's esophagus in patients having their first gastroscopy. We have performed a case-control study comparing statin use between patients with, and without, an incident diagnosis of non-dysplastic Barrett's esophagus. Male Barrett's cases (134) were compared to 268 male age-matched controls in each of two control groups (erosive gastro-esophageal reflux and dyspepsia without significant upper gastrointestinal disease). Risk factor and drug exposure were established using standardised interviews. Logistic regression was used to compare statin exposure and correct for confounding factors. We performed a meta-analysis pooling our results with three other case-control studies. Regular statin use was associated with a significantly lower incidence of Barrett's esophagus compared to the combined control groups [adjusted OR 0.62 (95% confidence interval 0.37-0.93)]. This effect was more marked in combined statin plus aspirin users [adjusted OR 0.43 (95% CI 0.21-0.89)]. The inverse association between statin or statin plus aspirin use and risk of Barrett's was significantly greater with longer duration of use. Meta-analysis of pooled data (1098 Barrett's, 2085 controls) showed that statin use was significantly associated with a reduced risk of Barrett's esophagus [pooled adjusted OR 0.63 (95% CI 0.51-0.77)]. Statin use is associated with a reduced incidence of a new diagnosis of Barrett's esophagus.
Perfluoroalkyl substances and time to pregnancy in couples from Greenland, Poland and Ukraine.
Jørgensen, Kristian T; Specht, Ina O; Lenters, Virissa; Bach, Cathrine C; Rylander, Lars; Jönsson, Bo A G; Lindh, Christian H; Giwercman, Aleksander; Heederik, Dick; Toft, Gunnar; Bonde, Jens Peter
2014-12-22
Perfluoroalkyl substances (PFAS) are suggested to affect human fecundity through longer time to pregnancy (TTP). We studied the relationship between four abundant PFAS and TTP in pregnant women from Greenland, Poland and Ukraine representing varying PFAS exposures and pregnancy planning behaviors. We measured serum levels of perfluorooctanoic acid (PFOA), perfluorooctane sulfonate (PFOS), perfluorohexane sulfonic acid (PFHxS) and perfluorononanoic acid (PFNA) in 938 women from Greenland (448 women), Poland (203 women) and Ukraine (287 women). PFAS exposure was assessed on a continuous logarithm transformed scale and in country-specific tertiles. We used Cox discrete-time models and logistic regression to estimate fecundability ratios (FRs) and infertility (TTP >13 months) odds ratios (ORs), respectively, and 95% confidence intervals (CI) according to PFAS levels. Adjusted analyses of the association between PFAS and TTP were done for each study population and in a pooled sample. Higher PFNA levels were associated with longer TTP in the pooled sample (log-scale FR = 0.80; 95% CI 0.69-0.94) and specifically in women from Greenland (log-scale FR = 0.72; 95% CI 0.58-0.89). ORs for infertility were also increased in the pooled sample (log-scale OR = 1.53; 95% CI 1.08-2.15) and in women from Greenland (log-scale OR = 1.97; 95% CI 1.22-3.19). However, in a sensitivity analysis of primiparous women these associations could not be replicated. Associations with PFNA were weaker for women from Poland and Ukraine. PFOS, PFOA and PFHxS were not consistently associated with TTP. Findings do not provide consistent evidence that environmental exposure to PFAS is impairing female fecundity by delaying time taken to conceive.
Clustering performance comparison using K-means and expectation maximization algorithms.
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-11-14
Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K-means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K-means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
Suh, Sunghwan; Song, Sun Ok; Kim, Jae Hyeon; Cho, Hyungjin; Lee, Woo Je; Lee, Byung-Wan
2017-01-01
The present observational study aimed to evaluate the clinical effectiveness of vildagliptin with metformin in Korean patients with type 2 diabetes mellitus (T2DM). Data were pooled from the vildagliptin postmarketing survey (PMS), the vildagliptin/metformin fixed drug combination (DC) PMS, and a retrospective observational study of vildagliptin/metformin (fixed DC or free DC). The effectiveness endpoint was the proportion of patients who achieved a glycemic target (HbA1c) of ≤7.0% at 24 weeks. In total, 4303 patients were included in the analysis; of these, 2087 patients were eligible. The mean patient age was 56.99 ± 11.25 years. Overall, 58.94% patients achieved an HbA1c target of ≤7.0% at 24 weeks. The glycemic target achievement rate was significantly greater in patients with baseline HbA1c < 7.5% versus ≥7.5% (84.64% versus 43.97%), receiving care at the hospital versus clinic (67.95% versus 52.33%), and receiving vildagliptin/metformin fixed DC versus free DC (70.69% versus 55.42%). Multivariate logistic regression analysis indicated that disease duration (P < 0.0001), baseline HbA1c (P < 0.0001), and DC type (P = 0.0103) had significant effects on drug effectiveness. Vildagliptin plus metformin appeared as an effective treatment option for patients with T2DM in clinical practice settings in Korea.
The influence of weather on migraine – are migraine attacks predictable?
Hoffmann, Jan; Schirra, Tonio; Lo, Hendra; Neeb, Lars; Reuter, Uwe; Martus, Peter
2015-01-01
Objective: The study aimed at elucidating a potential correlation between specific meteorological variables and the prevalence and intensity of migraine attacks as well as exploring a potential individual predictability of a migraine attack based on meteorological variables and their changes. Methods: Attack prevalence and intensity of 100 migraineurs were correlated with atmospheric pressure, relative air humidity, and ambient temperature in 4-h intervals over 12 consecutive months. For each correlation, meteorological parameters at the time of the migraine attack as well as their variation within the preceding 24 h were analyzed. For migraineurs showing a positive correlation, logistic regression analysis was used to assess the predictability of a migraine attack based on meteorological information. Results: In a subgroup of migraineurs, a significant weather sensitivity could be observed. In contrast, pooled analysis of all patients did not reveal a significant association. An individual prediction of a migraine attack based on meteorological data was not possible, mainly as a result of the small prevalence of attacks. Interpretation: The results suggest that only a subgroup of migraineurs is sensitive to specific weather conditions. Our findings may provide an explanation as to why previous studies, which commonly rely on a pooled analysis, show inconclusive results. The lack of individual attack predictability indicates that the use of preventive measures based on meteorological conditions is not feasible. PMID:25642431
Tunstall, H; Pearce, J R; Shortt, N K; Mitchell, R J
2015-12-01
Selective migration may influence the association between physical environments and health. This analysis assessed whether residential mobility concentrates people with poor health in neighbourhoods of the UK with disadvantaged physical environments. Data were from the British Household Panel Survey. Moves were over 1 year between adjacent survey waves, pooled over 10 pairs of waves, 1996-2006. Health outcomes were self-reported poor general health and mental health problems. Neighbourhood physical environment was defined using the Multiple Environmental Deprivation Index (MEDIx) for wards. Logistic regression analysis compared risk of poor health in MEDIx categories before and after moves. Analyses were stratified by age groups 18-29, 30-44, 45-59 and 60+ years and adjusted for age, sex, marital status, household type, housing tenure, education and social class. The pooled data contained 122 570 observations. 8.5% moved between survey waves but just 3.0% changed their MEDIx category. In all age groups odds ratios for poor general and mental health were not significantly increased in the most environmentally deprived neighbourhoods following moves. Over a 1-year time period residential moves between environments with different levels of multiple physical deprivation were rare and did not significantly raise rates of poor health in the most deprived areas. © The Author 2014. Published by Oxford University Press on behalf of Faculty of Public Health.
Circulating 25-Hydroxyvitamin D and Risk of Pancreatic Cancer
Stolzenberg-Solomon, Rachael Z.; Jacobs, Eric J.; Arslan, Alan A.; Qi, Dai; Patel, Alpa V.; Helzlsouer, Kathy J.; Weinstein, Stephanie J.; McCullough, Marjorie L.; Purdue, Mark P.; Shu, Xiao-Ou; Snyder, Kirk; Virtamo, Jarmo; Wilkins, Lynn R.; Yu, Kai; Zeleniuch-Jacquotte, Anne; Zheng, Wei; Albanes, Demetrius; Cai, Qiuyin; Harvey, Chinonye; Hayes, Richard; Clipp, Sandra; Horst, Ronald L.; Irish, Lonn; Koenig, Karen; Le Marchand, Loic; Kolonel, Laurence N.
2010-01-01
Results from epidemiologic studies examining pancreatic cancer risk and vitamin D intake or 25-hydroxyvitamin D (25(OH)D) concentrations (the best indicator of vitamin D derived from diet and sun) have been inconsistent. Therefore, the authors conducted a pooled nested case-control study of participants from 8 cohorts within the Cohort Consortium Vitamin D Pooling Project of Rarer Cancers (VDPP) (1974–2006) to evaluate whether prediagnostic circulating 25(OH)D concentrations were associated with the development of pancreatic cancer. In total, 952 incident pancreatic adenocarcinoma cases occurred among participants (median follow-up, 6.5 years). Controls (n = 1,333) were matched to each case by cohort, age, sex, race/ethnicity, date of blood draw, and follow-up time. Conditional logistic regression analysis was used to calculate smoking-, body mass index-, and diabetes-adjusted odds ratios and 95% confidence intervals for pancreatic cancer. Clinically relevant 25(OH)D cutpoints were compared with a referent category of 50–<75 nmol/L. No significant associations were observed for participants with lower 25(OH)D status. However, a high 25(OH)D concentration (≥100 nmol/L) was associated with a statistically significant 2-fold increase in pancreatic cancer risk overall (odds ratio = 2.12, 95% confidence interval: 1.23, 3.64). Given this result, recommendations to increase vitamin D concentrations in healthy persons for the prevention of cancer should be carefully considered. PMID:20562185
Rudant, Jérémie; Lightfoot, Tracy; Urayama, Kevin Y.; Petridou, Eleni; Dockerty, John D.; Magnani, Corrado; Milne, Elizabeth; Spector, Logan G.; Ashton, Lesley J.; Dessypris, Nikolaos; Kang, Alice Y.; Miller, Margaret; Rondelli, Roberto; Simpson, Jill; Stiakaki, Eftichia; Orsi, Laurent; Roman, Eve; Metayer, Catherine; Infante-Rivard, Claire; Clavel, Jacqueline
2015-01-01
The associations between childhood acute lymphoblastic leukemia (ALL) and several proxies of early stimulation of the immune system, that is, day-care center attendance, birth order, maternally reported common infections in infancy, and breastfeeding, were investigated by using data from 11 case-control studies participating in the Childhood Leukemia International Consortium (enrollment period: 1980–2010). The sample included 7,399 ALL cases and 11,181 controls aged 2–14 years. The data were collected by questionnaires administered to the parents. Pooled odds ratios and 95% confidence intervals were estimated by unconditional logistic regression adjusted for age, sex, study, maternal education, and maternal age. Day-care center attendance in the first year of life was associated with a reduced risk of ALL (odds ratio = 0.77, 95% confidence interval: 0.71, 0.84), with a marked inverse trend with earlier age at start (P < 0.0001). An inverse association was also observed with breastfeeding duration of 6 months or more (odds ratio = 0.86, 95% confidence interval: 0.79, 0.94). No significant relationship with a history of common infections in infancy was observed even though the odds ratio was less than 1 for more than 3 infections. The findings of this large pooled analysis reinforce the hypothesis that day-care center attendance in infancy and prolonged breastfeeding are associated with a decreased risk of ALL. PMID:25731888
Apfel, Christian C; Souza, Kimberly; Portillo, Juan; Dalal, Poorvi; Bergese, Sergio D
2015-01-01
Intravenous (IV) acetaminophen has been shown to reduce postoperative pain and opioid consumption, which may lead to increased patient satisfaction. To determine the effect IV acetaminophen has on patient satisfaction, a pooled analysis from methodologically homogenous studies was conducted. We obtained patient-level data from five randomized, placebo-controlled studies in adults undergoing elective surgery in which patient satisfaction was measured using a 4-point categorical rating scale. The primary endpoint was "excellent" satisfaction and the secondary endpoint was "good" or "excellent" satisfaction at 24 hr after first study drug administration. Bivariate analyses were conducted using the chi-square test and Student's t-test and multivariable analyses were conducted using logistic regression analysis. Patients receiving IV acetaminophen were more than twice as likely as those who received placebo to report "excellent" patient satisfaction ratings (32.3% vs. 15.9%, respectively). Of all variables that remained statistically significant in the multivariable analysis (i.e., type of surgery, duration of anesthesia, last pain rating, and opioid consumption), IV acetaminophen had the strongest positive effect on "excellent" patient satisfaction with an odds ratio of 2.76 (95% CI 1.81-4.23). Results for "excellent" or "good" satisfaction were similar. When given as part of a perioperative analgesic regimen, IV acetaminophen was associated with significantly improved patient satisfaction.
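Illustrative arithmetic: the abstract gives two proportions (32.3% vs. 15.9%) and an adjusted odds ratio of 2.76. The short Python sketch below shows how a risk ratio and a crude (unadjusted) odds ratio follow from the two proportions alone. The variable and function names are hypothetical, and the crude figure is not expected to match the reported odds ratio, which comes from the authors' multivariable model.

```python
def odds(p):
    """Convert a proportion to odds."""
    return p / (1.0 - p)

# Proportions reporting "excellent" satisfaction, as given in the abstract.
p_acetaminophen = 0.323
p_placebo = 0.159

# Ratio of proportions ("more than twice as likely").
risk_ratio = p_acetaminophen / p_placebo

# Ratio of odds, with no covariate adjustment.
crude_odds_ratio = odds(p_acetaminophen) / odds(p_placebo)

print(f"risk ratio = {risk_ratio:.2f}")
print(f"crude odds ratio = {crude_odds_ratio:.2f}")
```

The odds ratio exceeds the risk ratio here because the outcome is not rare; the two measures only approximate each other when both proportions are small.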
Voltzke, Kristin J; Lee, Yuan-Chin Amy; Zhang, Zuo-Feng; Zevallos, Jose P; Yu, Guo-Pei; Winn, Deborah M; Vaughan, Thomas L; Sturgis, Erich M; Smith, Elaine; Schwartz, Stephen M; Schantz, Stimson; Muscat, Joshua; Morgenstern, Hal; McClean, Michael; Li, Guojun; Lazarus, Philip; Kelsey, Karl; Gillison, Maura; Chen, Chu; Boffetta, Paolo; Hashibe, Mia; Olshan, Andrew F
2018-05-14
There have been few published studies on differences between Blacks and Whites in the estimated effects of alcohol and tobacco use on the incidence of head and neck cancer (HNC) in the United States. Previous studies have been limited by small numbers of Blacks. Using pooled data from 13 US case-control studies of oral, pharyngeal, and laryngeal cancers in the International Head and Neck Cancer Epidemiology Consortium, this study comprised a large number of Black HNC cases (n = 975). Logistic regression was used to estimate adjusted odds ratios (OR) and 95% confidence intervals (CI) for several tobacco and alcohol consumption characteristics. Blacks were found to have consistently stronger associations than Whites for the majority of tobacco consumption variables. For example, compared to never smokers, Blacks who smoked cigarettes for > 30 years had an OR 4.53 (95% CI 3.22-6.39), which was larger than that observed in Whites (OR 3.01, 95% CI 2.73-3.33; p interaction < 0.0001). The ORs for alcohol use were also larger among Blacks compared to Whites. Exclusion of oropharyngeal cases attenuated the racial differences in tobacco use associations but not alcohol use associations. These findings suggest modest racial differences exist in the association of HNC risk with tobacco and alcohol consumption.
Birth characteristics and childhood carcinomas.
Johnson, K J; Carozza, S E; Chow, E J; Fox, E E; Horel, S; McLaughlin, C C; Mueller, B A; Puumala, S E; Reynolds, P; Von Behren, J; Spector, L G
2011-10-25
Carcinomas in children are rare and have not been well studied. We conducted a population-based case-control study and examined associations between birth characteristics and childhood carcinomas diagnosed from 28 days to 14 years during 1980-2004 using pooled data from five states (NY, WA, MN, TX, and CA) that linked their birth and cancer registries. The pooled data set contained 57,966 controls and 475 carcinoma cases, including 159 thyroid and 126 malignant melanoma cases. We used unconditional logistic regression to calculate odds ratios (ORs) and 95% confidence intervals (CIs). White compared with 'other' race was positively associated with melanoma (OR=3.22, 95% CI 1.33-8.33). Older maternal age increased the risk for melanoma (OR(per 5-year age increase)=1.20, 95% CI 1.00-1.44), whereas paternal age increased the risk for any carcinoma (OR(per 5-year age increase)=1.10, 95% CI 1.01-1.20) and thyroid carcinoma (OR(per 5-year age increase)=1.16, 95% CI 1.01-1.33). Gestational age < 37 vs 37-42 weeks increased the risk for thyroid carcinoma (OR=1.87, 95% CI 1.07-3.27). Plurality, birth weight, and birth order were not significantly associated with childhood carcinomas. This exploratory study indicates that some birth characteristics including older parental age and low gestational age may be related to childhood carcinoma aetiology.
Rodgers, Stephanie; Ajdacic-Gross, Vladeta; Kawohl, Wolfram; Müller, Mario; Rössler, Wulf; Hengartner, Michael P; Castelao, Enrique; Vandeleur, Caroline; Angst, Jules; Preisig, Martin
2015-12-01
Due to its heterogeneous phenomenology, obsessive-compulsive disorder (OCD) has been subtyped. However, these subtypes are not mutually exclusive. This study presents an alternative subtyping approach by deriving non-overlapping OCD subtypes. A pure compulsive and a mixed obsessive-compulsive subtype (including subjects manifesting obsessions with/without compulsions) were analyzed with respect to a broad pattern of psychosocial risk factors and comorbid syndromes/diagnoses in three representative Swiss community samples: the Zurich Study (n = 591), the ZInEP sample (n = 1500), and the PsyCoLaus sample (n = 3720). A selection of comorbidities was examined in a pooled database. Odds ratios were derived from logistic regressions and, in the analysis of pooled data, multilevel models. The pure compulsive subtype showed a lower age of onset and was characterized by few associations with psychosocial risk factors. The higher social popularity of the pure compulsive subjects and their families was remarkable. Comorbidities within the pure compulsive subtype were mainly restricted to phobias. In contrast, the mixed obsessive-compulsive subtype had a higher prevalence and was associated with various childhood adversities, more familial burden, and numerous comorbid disorders, including disorders characterized by high impulsivity. The current comparison study across three representative community surveys presented two basic, distinct OCD subtypes associated with differing psychosocial impairment. Such highly specific subtypes offer the opportunity to learn about pathophysiological mechanisms specifically involved in OCD.
Yang, H P; Cook, L S; Weiderpass, E; Adami, H-O; Anderson, K E; Cai, H; Cerhan, J R; Clendenen, T V; Felix, A S; Friedenreich, C M; Garcia-Closas, M; Goodman, M T; Liang, X; Lissowska, J; Lu, L; Magliocco, A M; McCann, S E; Moysich, K B; Olson, S H; Petruzella, S; Pike, M C; Polidoro, S; Ricceri, F; Risch, H A; Sacerdote, C; Setiawan, V W; Shu, X O; Spurdle, A B; Trabert, B; Webb, P M; Wentzensen, N; Xiang, Y-B; Xu, Y; Yu, H; Zeleniuch-Jacquotte, A; Brinton, L A
2015-01-01
Background: Nulliparity is an endometrial cancer risk factor, but whether or not this association is due to infertility is unclear. Although there are many underlying infertility causes, few studies have assessed risk relations by specific causes. Methods: We conducted a pooled analysis of 8153 cases and 11 713 controls from 2 cohort and 12 case-control studies. All studies provided self-reported infertility and its causes, except for one study that relied on data from national registries. Logistic regression was used to estimate adjusted odds ratios (OR) and 95% confidence intervals (CI). Results: Nulliparous women had an elevated endometrial cancer risk compared with parous women, even after adjusting for infertility (OR=1.76; 95% CI: 1.59–1.94). Women who reported infertility had an increased risk compared with those without infertility concerns, even after adjusting for nulliparity (OR=1.22; 95% CI: 1.13–1.33). Among women who reported infertility, none of the individual infertility causes were substantially related to endometrial cancer. Conclusions: Based on mainly self-reported infertility data that used study-specific definitions of infertility, nulliparity and infertility appeared to independently contribute to endometrial cancer risk. Understanding residual endometrial cancer risk related to infertility, its causes and its treatments may benefit from large studies involving detailed data on various infertility parameters. PMID:25688738
Delva, J; Spencer, M S; Lin, J K
2000-01-01
This article compares estimates of the relative odds of nitrite use obtained from weighted unconditional logistic regression with estimates obtained from conditional logistic regression after post-stratification and matching of cases with controls by neighborhood of residence. We illustrate these methods by comparing the odds associated with nitrite use among adults of four racial/ethnic groups, with and without a high school education. We used aggregated data from the 1994-B through 1996 National Household Survey on Drug Abuse (NHSDA). Differences between the methods and implications for analysis and inference are discussed.
Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V
2012-01-01
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
ERIC Educational Resources Information Center
Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza
2014-01-01
This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
ERIC Educational Resources Information Center
French, Brian F.; Maller, Susan J.
2007-01-01
Two unresolved implementation issues with logistic regression (LR) for differential item functioning (DIF) detection include ability purification and effect size use. Purification is suggested to control inaccuracies in DIF detection as a result of DIF items in the ability estimate. Additionally, effect size use may be beneficial in controlling…
A Note on Three Statistical Tests in the Logistic Regression DIF Procedure
ERIC Educational Resources Information Center
Paek, Insu
2012-01-01
Although logistic regression became one of the well-known methods in detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under the maximum likelihood, do not seem to be consistently distinguished in DIF literature. This paper provides a clarifying…
ERIC Educational Resources Information Center
West, Lindsey M.; Davis, Telsie A.; Thompson, Martie P.; Kaslow, Nadine J.
2011-01-01
Protective factors for fostering reasons for living were examined among low-income, suicidal, African American women. Bivariate logistic regressions revealed that higher levels of optimism, spiritual well-being, and family social support predicted reasons for living. Multivariate logistic regressions indicated that spiritual well-being showed…
Comparison of Two Approaches for Handling Missing Covariates in Logistic Regression
ERIC Educational Resources Information Center
Peng, Chao-Ying Joanne; Zhu, Jin
2008-01-01
For the past 25 years, methodological advances have been made in missing data treatment. Most published work has focused on missing data in dependent variables under various conditions. The present study seeks to fill the void by comparing two approaches for handling missing data in categorical covariates in logistic regression: the…
Comparison of IRT Likelihood Ratio Test and Logistic Regression DIF Detection Procedures
ERIC Educational Resources Information Center
Atar, Burcu; Kamata, Akihito
2011-01-01
The Type I error rates and the power of IRT likelihood ratio test and cumulative logit ordinal logistic regression procedures in detecting differential item functioning (DIF) for polytomously scored items were investigated in this Monte Carlo simulation study. For this purpose, 54 simulation conditions (combinations of 3 sample sizes, 2 sample…
Multiple Logistic Regression Analysis of Cigarette Use among High School Students
ERIC Educational Resources Information Center
Adwere-Boamah, Joseph
2011-01-01
A binary logistic regression analysis was performed to predict high school students' cigarette smoking behavior from selected predictors from 2009 CDC Youth Risk Behavior Surveillance Survey. The specific target student behavior of interest was frequent cigarette use. Five predictor variables included in the model were: a) race, b) frequency of…
ERIC Educational Resources Information Center
Anderson, Carolyn J.; Verkuilen, Jay; Peyton, Buddy L.
2010-01-01
Survey items with multiple response categories and multiple-choice test questions are ubiquitous in psychological and educational research. We illustrate the use of log-multiplicative association (LMA) models that are extensions of the well-known multinomial logistic regression model for multiple dependent outcome variables to reanalyze a set of…
Propensity Score Estimation with Data Mining Techniques: Alternatives to Logistic Regression
ERIC Educational Resources Information Center
Keller, Bryan S. B.; Kim, Jee-Seon; Steiner, Peter M.
2013-01-01
Propensity score analysis (PSA) is a methodological technique which may correct for selection bias in a quasi-experiment by modeling the selection process using observed covariates. Because logistic regression is well understood by researchers in a variety of fields and easy to implement in a number of popular software packages, it has…
Two-factor logistic regression in pediatric liver transplantation
NASA Astrophysics Data System (ADS)
Uzunova, Yordanka; Prodanova, Krasimira; Spasov, Lyubomir
2017-12-01
Using a two-factor logistic regression analysis an estimate is derived for the probability of absence of infections in the early postoperative period after pediatric liver transplantation. The influence of both the bilirubin level and the international normalized ratio of prothrombin time of blood coagulation at the 5th postoperative day is studied.
ERIC Educational Resources Information Center
Courtney, Jon R.; Prophet, Retta
2011-01-01
Placement instability is often associated with a number of negative outcomes for children. To gain state level contextual knowledge of factors associated with placement stability/instability, logistic regression was applied to selected variables from the New Mexico Adoption and Foster Care Administrative Reporting System dataset. Predictors…
Classifying machinery condition using oil samples and binary logistic regression
NASA Astrophysics Data System (ADS)
Phillips, J.; Cripps, E.; Lau, John W.; Hodkiewicz, M. R.
2015-08-01
The era of big data has resulted in an explosion of condition monitoring information. The result is an increasing motivation to automate the costly and time-consuming human elements involved in the classification of machine health. When working with industry it is important to build an understanding of, and hence some trust in, the classification scheme for those who use the analysis to initiate maintenance tasks. Typically, "black box" approaches such as artificial neural networks (ANN) and support vector machines (SVM) are difficult to interpret. In contrast, this paper argues that logistic regression offers easy interpretability to industry experts, providing insight into the drivers of the human classification process and into the ramifications of potential misclassification. Of course, accuracy is of foremost importance in any automated classification scheme, so we also provide a comparative study based on predictive performance of logistic regression, ANN and SVM. A real world oil analysis data set from engines on mining trucks is presented and using cross-validation we demonstrate that logistic regression out-performs the ANN and SVM approaches in terms of prediction for healthy/not healthy engines.
Length bias correction in gene ontology enrichment analysis using logistic regression.
Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S; Chang, Jeff H
2012-01-01
When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
Hansson, Lisbeth; Khamis, Harry J
2008-12-01
Simulated data sets are used to evaluate conditional and unconditional maximum likelihood estimation in an individual case-control design with continuous covariates when there are different rates of excluded cases and different levels of other design parameters. The effectiveness of the estimation procedures is measured by method bias, variance of the estimators, root mean square error (RMSE) for logistic regression and the percentage of explained variation. Conditional estimation leads to higher RMSE than unconditional estimation in the presence of missing observations, especially for 1:1 matching. The RMSE is higher for the smaller stratum size, especially for the 1:1 matching. The percentage of explained variation appears to be insensitive to missing data, but is generally higher for the conditional estimation than for the unconditional estimation. It is particularly good for the 1:2 matching design. For minimizing RMSE, a high matching ratio is recommended; in this case, conditional and unconditional logistic regression models yield comparable levels of effectiveness. For maximizing the percentage of explained variation, the 1:2 matching design with the conditional logistic regression model is recommended.
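To make the conditional approach concrete, the following is a minimal sketch of conditional maximum likelihood for the simplest case discussed above, 1:1 matching with one continuous covariate. It is not the authors' simulation code; the function names and the toy differences are hypothetical. For each matched pair only the case-minus-control covariate difference d enters the conditional likelihood, whose contribution is 1 / (1 + exp(-beta * d)).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def conditional_score(beta, diffs):
    """Derivative of the 1:1 conditional log-likelihood with respect to beta."""
    return sum(d * (1.0 - sigmoid(beta * d)) for d in diffs)

def fit_matched_pairs(diffs, iterations=25):
    """Newton-Raphson for the log odds ratio per unit of the covariate."""
    beta = 0.0
    for _ in range(iterations):
        # Observed information: sum of d^2 * p * (1 - p)
        info = sum(d * d * sigmoid(beta * d) * (1.0 - sigmoid(beta * d)) for d in diffs)
        beta += conditional_score(beta, diffs) / info
    return beta

# Hypothetical case-minus-control differences (mixed signs, so the MLE is finite)
diffs = [1.2, -0.4, 0.8, 0.3, -0.9, 1.5, 0.6, -0.2]
beta_hat = fit_matched_pairs(diffs)
print("log odds ratio:", beta_hat, "odds ratio:", math.exp(beta_hat))
```

No stratum intercepts are estimated here, which is the point of conditioning: the pair-specific nuisance parameters cancel out of the likelihood.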
Lee, Seokho; Shin, Hyejin; Lee, Sang Han
2016-12-01
Alzheimer's disease (AD) is usually diagnosed by clinicians through cognitive and functional performance tests, with a potential risk of misdiagnosis. Since the progression of AD is known to cause structural changes in the corpus callosum (CC), the CC thickness can be used as a functional covariate in the AD classification problem for diagnosis. However, misclassified class labels negatively impact classification performance. Motivated by AD-CC association studies, we propose a logistic regression for functional data classification that is robust to misdiagnosis or label noise. Specifically, our logistic regression model is constructed by adopting individual intercepts in the functional logistic regression model. This approach makes it possible to indicate which observations are possibly mislabeled and also leads to a robust and efficient classifier. An effective algorithm based on the MM algorithm provides simple closed-form update formulas. We test our method using synthetic datasets to demonstrate its superiority over an existing method, and apply it to differentiating patients with AD from healthy normals based on CC from MRI. © 2016, The International Biometric Society.
Szekér, Szabolcs; Vathy-Fogarassy, Ágnes
2018-01-01
Logistic regression-based propensity score matching is a widely used method in case-control studies to select the individuals of the control group. This method creates a suitable control group if all factors affecting the output variable are known. However, if relevant latent variables exist as well, which are not taken into account during the calculations, the quality of the control group is uncertain. In this paper, we present a statistical study in which we try to determine the relationship between the accuracy of the logistic regression model and the uncertainty of the dependent variable of the control group defined by propensity score matching. Our analyses show that there is a linear correlation between the fit of the logistic regression model and the uncertainty of the output variable. In certain cases, a latent binary explanatory variable can result in a relative error of up to 70% in the prediction of the outcome variable. The observed phenomenon draws analysts' attention to an important point, which must be taken into account when drawing conclusions.
Logistic regression for circular data
NASA Astrophysics Data System (ADS)
Al-Daffaie, Kadhem; Khan, Shahjahan
2017-05-01
This paper considers the relationship between a binary response and a circular predictor. It develops the logistic regression model by employing the linear-circular regression approach. The maximum likelihood method is used to estimate the parameters. The Newton-Raphson numerical method is used to find the estimated values of the parameters. A data set from weather records of Toowoomba city is analysed by the proposed methods. Moreover, a simulation study is considered. The R software is used for all computations and simulations.
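As an illustration of the fitting procedure described above, here is a minimal sketch of Newton-Raphson maximum likelihood for a binary response with a circular predictor. It assumes the common linear-circular embedding logit(p) = b0 + b1 cos(theta) + b2 sin(theta), which the abstract does not spell out, and it uses Python rather than the R code the authors mention. All function names and the toy data are hypothetical.

```python
import math

def design_row(theta):
    # Embed the angle as (1, cos, sin) so the linear predictor is periodic.
    return [1.0, math.cos(theta), math.sin(theta)]

def prob(beta, theta):
    eta = sum(b * x for b, x in zip(beta, design_row(theta)))
    return 1.0 / (1.0 + math.exp(-eta))

def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def score(beta, thetas, ys):
    """Gradient of the Bernoulli log-likelihood."""
    g = [0.0, 0.0, 0.0]
    for theta, y in zip(thetas, ys):
        x, p = design_row(theta), prob(beta, theta)
        for j in range(3):
            g[j] += (y - p) * x[j]
    return g

def fit(thetas, ys, iterations=25):
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        info = [[0.0] * 3 for _ in range(3)]  # observed information matrix
        for theta in thetas:
            x, p = design_row(theta), prob(beta, theta)
            for j in range(3):
                for k in range(3):
                    info[j][k] += p * (1.0 - p) * x[j] * x[k]
        step = solve(info, score(beta, thetas, ys))
        beta = [b + s for b, s in zip(beta, step)]
    return beta

def toy_data():
    # Hypothetical data: 4 observations at each of 24 directions,
    # with more successes where cos(theta) is positive.
    thetas, ys = [], []
    for k in range(24):
        theta = 2.0 * math.pi * k / 24
        successes = 3 if math.cos(theta) > 0.01 else 2
        thetas += [theta] * 4
        ys += [1] * successes + [0] * (4 - successes)
    return thetas, ys

thetas, ys = toy_data()
beta_hat = fit(thetas, ys)
print("b0, b1, b2 =", beta_hat)
```

Using both cos(theta) and sin(theta) keeps the fitted probability the same at 0 and 2*pi, which a plain linear term in theta would not.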
Naval Research Logistics Quarterly. Volume 28. Number 3,
1981-09-01
denotes component-wise maximum. f has antitone (isotone) differences on C x D if for c1 < c2 and d1 < d2, ...or negative correlations and linear or nonlinear regressions. Given are the moments to order two and, for special cases, the regression function and...data sets. We designate this bnb distribution as G - B - N(a, 0, v). The distribution admits only of positive correlation and linear regressions
Bond, H S; Sullivan, S G; Cowling, B J
2016-06-01
Influenza vaccination is the most practical means available for preventing influenza virus infection and is widely used in many countries. Because vaccine components and circulating strains frequently change, it is important to continually monitor vaccine effectiveness (VE). The test-negative design is frequently used to estimate VE. In this design, patients meeting the same clinical case definition are recruited and tested for influenza; those who test positive are the cases and those who test negative form the comparison group. When determining VE in these studies, the typical approach has been to use logistic regression, adjusting for potential confounders. Because vaccine coverage and influenza incidence change throughout the season, time is included among these confounders. While most studies use unconditional logistic regression, adjusting for time, an alternative approach is to use conditional logistic regression, matching on time. Here, we used simulation data to examine the potential for both regression approaches to permit accurate and robust estimates of VE. In situations where vaccine coverage changed during the influenza season, the conditional model and unconditional models adjusting for categorical week and using a spline function for week provided more accurate estimates. We illustrated the two approaches on data from a test-negative study of influenza VE against hospitalization in children in Hong Kong which resulted in the conditional logistic regression model providing the best fit to the data.
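For orientation, the quantity being estimated in such studies is usually reported as VE = (1 - OR) x 100%, where OR is the odds ratio for vaccination among test-positive versus test-negative patients. The sketch below computes a crude (unadjusted) VE with a Woolf-type 95% confidence interval from a single 2x2 table. It deliberately omits the adjustment for time that the abstract is about, and the function name and counts are hypothetical.

```python
import math

def vaccine_effectiveness(vacc_cases, unvacc_cases, vacc_controls, unvacc_controls):
    """Return (VE, lower, upper) in percent from a 2x2 test-negative table."""
    odds_ratio = (vacc_cases * unvacc_controls) / (unvacc_cases * vacc_controls)
    # Standard error of the log odds ratio (Woolf method)
    se = math.sqrt(1.0 / vacc_cases + 1.0 / unvacc_cases
                   + 1.0 / vacc_controls + 1.0 / unvacc_controls)
    or_low = math.exp(math.log(odds_ratio) - 1.96 * se)
    or_high = math.exp(math.log(odds_ratio) + 1.96 * se)
    # The upper OR bound maps to the lower VE bound, and vice versa.
    return (1.0 - odds_ratio) * 100.0, (1.0 - or_high) * 100.0, (1.0 - or_low) * 100.0

# Hypothetical counts: 20 vaccinated and 80 unvaccinated test-positives,
# 50 vaccinated and 50 unvaccinated test-negatives.
ve, ve_low, ve_high = vaccine_effectiveness(20, 80, 50, 50)
print(f"VE = {ve:.1f}% (95% CI {ve_low:.1f}% to {ve_high:.1f}%)")
```

In a real analysis the odds ratio would come from a fitted unconditional or conditional logistic model rather than from a single crude table.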
A general framework for the regression analysis of pooled biomarker assessments.
Liu, Yan; McMahan, Christopher; Gallagher, Colin
2017-07-10
As a cost-efficient data collection mechanism, the process of assaying pooled biospecimens is becoming increasingly common in epidemiological research; for example, pooling has been proposed for the purpose of evaluating the diagnostic efficacy of biological markers (biomarkers). To this end, several authors have proposed techniques that allow for the analysis of continuous pooled biomarker assessments. Regretfully, most of these techniques proceed under restrictive assumptions, are unable to account for the effects of measurement error, and fail to control for confounding variables. These limitations are understandably attributable to the complex structure that is inherent to measurements taken on pooled specimens. Consequently, in order to provide practitioners with the tools necessary to accurately and efficiently analyze pooled biomarker assessments, herein, a general Monte Carlo maximum likelihood-based procedure is presented. The proposed approach allows for the regression analysis of pooled data under practically all parametric models and can be used to directly account for the effects of measurement error. Through simulation, it is shown that the proposed approach can accurately and efficiently estimate all unknown parameters and is more computationally efficient than existing techniques. This new methodology is further illustrated using monocyte chemotactic protein-1 data collected by the Collaborative Perinatal Project in an effort to assess the relationship between this chemokine and the risk of miscarriage. Copyright © 2017 John Wiley & Sons, Ltd.
Asghari, Mehdi Poursheikhali; Hayatshahi, Sayyed Hamed Sadat; Abdolmaleki, Parviz
2012-01-01
From both the structural and functional points of view, β-turns play important biological roles in proteins. In the present study, a novel two-stage hybrid procedure has been developed to identify β-turns in proteins. Binary logistic regression was initially used for the first time to select significant sequence parameters in identification of β-turns due to a re-substitution test procedure. Sequence parameters consisted of 80 amino acid positional occurrences and 20 amino acid percentages in sequence. Among these parameters, the most significant ones, which were selected by the binary logistic regression model, were the percentages of Gly and Ser and the occurrence of Asn in position i+2, respectively, in sequence. These significant parameters have the highest effect on the constitution of a β-turn sequence. A neural network model was then constructed and fed by the parameters selected by binary logistic regression to build a hybrid predictor. The networks have been trained and tested on a non-homologous dataset of 565 protein chains. By applying a nine-fold cross-validation test on the dataset, the network reached an overall accuracy (Qtotal) of 74, which is comparable with results of the other β-turn prediction methods. In conclusion, this study proves that the parameter selection ability of binary logistic regression together with the prediction capability of neural networks leads to the development of more precise models for identifying β-turns in proteins. PMID:27418910
Crane, Paul K; Gibbons, Laura E; Jolley, Lance; van Belle, Gerald
2006-11-01
We present an ordinal logistic regression model for identification of items with differential item functioning (DIF) and apply this model to a Mini-Mental State Examination (MMSE) dataset. We employ item response theory ability estimation in our models. Three nested ordinal logistic regression models are applied to each item. Model testing begins with examination of the statistical significance of the interaction term between ability and the group indicator, consistent with nonuniform DIF. Then we turn our attention to the coefficient of the ability term in models with and without the group term. If including the group term has a marked effect on that coefficient, we declare that it has uniform DIF. We examined DIF related to language of test administration in addition to self-reported race, Hispanic ethnicity, age, years of education, and sex. We used PARSCALE for IRT analyses and STATA for ordinal logistic regression approaches. We used an iterative technique for adjusting IRT ability estimates on the basis of DIF findings. Five items were found to have DIF related to language. These same items also had DIF related to other covariates. The ordinal logistic regression approach to DIF detection, when combined with IRT ability estimates, provides a reasonable alternative for DIF detection. There appear to be several items with significant DIF related to language of test administration in the MMSE. More attention needs to be paid to the specific criteria used to determine whether an item has DIF, not just the technique used to identify DIF.
Conditional Poisson models: a flexible alternative to conditional logistic case cross-over analysis.
Armstrong, Ben G; Gasparrini, Antonio; Tobias, Aurelio
2014-11-24
The time stratified case cross-over approach is a popular alternative to conventional time series regression for analysing associations between time series of environmental exposures (air pollution, weather) and counts of health outcomes. These are almost always analyzed using conditional logistic regression on data expanded to case-control (case crossover) format, but this has some limitations. In particular adjusting for overdispersion and auto-correlation in the counts is not possible. It has been established that a Poisson model for counts with stratum indicators gives identical estimates to those from conditional logistic regression and does not have these limitations, but it is little used, probably because of the overheads in estimating many stratum parameters. The conditional Poisson model avoids estimating stratum parameters by conditioning on the total event count in each stratum, thus simplifying the computing and increasing the number of strata for which fitting is feasible compared with the standard unconditional Poisson model. Unlike the conditional logistic model, the conditional Poisson model does not require expanding the data, and can adjust for overdispersion and auto-correlation. It is available in Stata, R, and other packages. By applying to some real data and using simulations, we demonstrate that conditional Poisson models were simpler to code and shorter to run than are conditional logistic analyses and can be fitted to larger data sets than possible with standard Poisson models. Allowing for overdispersion or autocorrelation was possible with the conditional Poisson model but when not required this model gave identical estimates to those from conditional logistic regression. Conditional Poisson regression models provide an alternative to case crossover analysis of stratified time series data with some advantages. 
The conditional Poisson model can also be used in other contexts in which primary control for confounding is by fine stratification.
Use of generalized ordered logistic regression for the analysis of multidrug resistance data.
Agga, Getahun E; Scott, H Morgan
2015-10-01
Statistical analysis of antimicrobial resistance data largely focuses on individual antimicrobial's binary outcome (susceptible or resistant). However, bacteria are becoming increasingly multidrug resistant (MDR). Statistical analysis of MDR data is mostly descriptive often with tabular or graphical presentations. Here we report the applicability of generalized ordinal logistic regression model for the analysis of MDR data. A total of 1,152 Escherichia coli, isolated from the feces of weaned pigs experimentally supplemented with chlortetracycline (CTC) and copper, were tested for susceptibilities against 15 antimicrobials and were binary classified into resistant or susceptible. The 15 antimicrobial agents tested were grouped into eight different antimicrobial classes. We defined MDR as the number of antimicrobial classes to which E. coli isolates were resistant ranging from 0 to 8. Proportionality of the odds assumption of the ordinal logistic regression model was violated only for the effect of treatment period (pre-treatment, during-treatment and post-treatment); but not for the effect of CTC or copper supplementation. Subsequently, a partially constrained generalized ordinal logistic model was built that allows for the effect of treatment period to vary while constraining the effects of treatment (CTC and copper supplementation) to be constant across the levels of MDR classes. Copper (Proportional Odds Ratio [Prop OR]=1.03; 95% CI=0.73-1.47) and CTC (Prop OR=1.1; 95% CI=0.78-1.56) supplementation were not significantly associated with the level of MDR adjusted for the effect of treatment period. MDR generally declined over the trial period. In conclusion, generalized ordered logistic regression can be used for the analysis of ordinal data such as MDR data when the proportionality assumptions for ordered logistic regression are violated. Published by Elsevier B.V.
Fei, Y; Hu, J; Li, W-Q; Wang, W; Zong, G-Q
2017-03-01
Essentials Predicting the occurrence of portosplenomesenteric vein thrombosis (PSMVT) is difficult. We studied 72 patients with acute pancreatitis. Artificial neural networks modeling was more accurate than logistic regression in predicting PSMVT. Additional predictive factors may be incorporated into artificial neural networks. Objective To construct and validate artificial neural networks (ANNs) for predicting the occurrence of portosplenomesenteric venous thrombosis (PSMVT) and compare the predictive ability of the ANNs with that of logistic regression. Methods The ANNs and logistic regression modeling were constructed using simple clinical and laboratory data of 72 acute pancreatitis (AP) patients. The ANNs and logistic modeling were first trained on 48 randomly chosen patients and validated on the remaining 24 patients. The accuracy and the performance characteristics were compared between these two approaches by SPSS17.0 software. Results The training set and validation set did not differ on any of the 11 variables. After training, the back propagation network training error converged to 1 × 10^-20, and it retained excellent pattern recognition ability. When the ANNs model was applied to the validation set, it revealed a sensitivity of 80%, specificity of 85.7%, a positive predictive value of 77.6% and negative predictive value of 90.7%. The accuracy was 83.3%. Differences could be found between ANNs modeling and logistic regression modeling in these parameters (10.0% [95% CI, -14.3 to 34.3%], 14.3% [95% CI, -8.6 to 37.2%], 15.7% [95% CI, -9.9 to 41.3%], 11.8% [95% CI, -8.2 to 31.8%], 22.6% [95% CI, -1.9 to 47.1%], respectively). When ANNs modeling was used to identify PSMVT, the area under receiver operating characteristic curve was 0.849 (95% CI, 0.807-0.901), which demonstrated better overall properties than logistic regression modeling (AUC = 0.716) (95% CI, 0.679-0.761).
Conclusions ANNs modeling was a more accurate tool than logistic regression in predicting the occurrence of PSMVT following AP. More clinical factors or biomarkers may be incorporated into ANNs modeling to improve its predictive ability. © 2016 International Society on Thrombosis and Haemostasis.
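The performance measures quoted above all derive from a 2x2 confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical values for a 24-patient validation set and are not taken from the study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positives among all with the condition
        "specificity": tn / (tn + fp),   # true negatives among all without it
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Hypothetical counts: 10 patients with the condition, 14 without.
metrics = diagnostic_metrics(tp=8, fn=2, tn=12, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity do not depend on how common the condition is in the sample, whereas the predictive values do, so the two pairs should not be expected to move together across datasets.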
McLaren, Christine E.; Chen, Wen-Pin; Nie, Ke; Su, Min-Ying
2009-01-01
Rationale and Objectives Dynamic contrast enhanced MRI (DCE-MRI) is a clinical imaging modality for detection and diagnosis of breast lesions. Analytical methods were compared for diagnostic feature selection and performance of lesion classification to differentiate between malignant and benign lesions in patients. Materials and Methods The study included 43 malignant and 28 benign histologically-proven lesions. Eight morphological parameters, ten gray level co-occurrence matrices (GLCM) texture features, and fourteen Laws’ texture features were obtained using automated lesion segmentation and quantitative feature extraction. Artificial neural network (ANN) and logistic regression analysis were compared for selection of the best predictors of malignant lesions among the normalized features. Results Using ANN, the final four selected features were compactness, energy, homogeneity, and Law_LS, with area under the receiver operating characteristic curve (AUC) = 0.82, and accuracy = 0.76. The diagnostic performance of these 4-features computed on the basis of logistic regression yielded AUC = 0.80 (95% CI, 0.688 to 0.905), similar to that of ANN. The analysis also shows that the odds of a malignant lesion decreased by 48% (95% CI, 25% to 92%) for every increase of 1 SD in the Law_LS feature, adjusted for differences in compactness, energy, and homogeneity. Using logistic regression with z-score transformation, a model comprised of compactness, NRL entropy, and gray level sum average was selected, and it had the highest overall accuracy of 0.75 among all models, with AUC = 0.77 (95% CI, 0.660 to 0.880). When logistic modeling of transformations using the Box-Cox method was performed, the most parsimonious model with predictors, compactness and Law_LS, had an AUC of 0.79 (95% CI, 0.672 to 0.898). Conclusion The diagnostic performance of models selected by ANN and logistic regression was similar. 
The analytic methods were found to be roughly equivalent in terms of predictive ability when a small number of variables were chosen. The robust ANN methodology utilizes a sophisticated non-linear model, while logistic regression analysis provides insightful information to enhance interpretation of the model features. PMID:19409817
Ai, Zi-Sheng; Gao, You-Shui; Sun, Yuan; Liu, Yue; Zhang, Chang-Qing; Jiang, Cheng-Hua
2013-03-01
Risk factors for femoral neck fracture-induced avascular necrosis of the femoral head have not been elucidated clearly in middle-aged and elderly patients. Moreover, the high incidence of screw removal in China and its effect on the fate of the involved femoral head require statistical methods to reflect their intrinsic relationship. Ninety-nine patients older than 45 years with femoral neck fracture were treated by internal fixation between May 1999 and April 2004. Descriptive analysis, interaction analysis between associated factors, single factor logistic regression, multivariate logistic regression, and detailed interaction analysis were employed to explore potential relationships among associated factors. Avascular necrosis of the femoral head was found in 15 cases (15.2 %). Age × the status of implants (removal vs. maintenance) and gender × the timing of reduction were interactive according to two-factor interactive analysis. Age, the displacement of fractures, the quality of reduction, and the status of implants were found to be significant factors in single factor logistic regression analysis. Age, age × the status of implants, and the quality of reduction were found to be significant factors in multivariate logistic regression analysis. In fine interaction analysis after multivariate logistic regression analysis, implant removal was the most important risk factor for avascular necrosis in 56-to-85-year-old patients, with a risk ratio of 26.00 (95 % CI = 3.076-219.747). The middle-aged and elderly have less incidence of avascular necrosis of the femoral head following femoral neck fractures treated by cannulated screws. The removal of cannulated screws can induce a significantly high incidence of avascular necrosis of the femoral head in elderly patients, while a high-quality reduction is helpful to reduce avascular necrosis.
Zhou, Jinzhe; Zhou, Yanbing; Cao, Shougen; Li, Shikuan; Wang, Hao; Niu, Zhaojian; Chen, Dong; Wang, Dongsheng; Lv, Liang; Zhang, Jian; Li, Yu; Jiao, Xuelong; Tan, Xiaojie; Zhang, Jianli; Wang, Haibo; Zhang, Bingyuan; Lu, Yun; Sun, Zhenqing
2016-01-01
Reports of surgical complications are common, but few provide information about severity or estimate risk factors for complications, and those that do lack specificity. We retrospectively analyzed data on 2795 gastric cancer patients who underwent surgery at the Affiliated Hospital of Qingdao University between June 2007 and June 2012, and established a multivariate logistic regression model to predict risk factors related to postoperative complications according to the Clavien-Dindo classification system. Twenty-four of 86 variables were statistically significant in univariate logistic regression analysis; the 11 significant variables that entered multivariate analysis were used to produce the risk model. Liver cirrhosis, diabetes mellitus, Child classification, invasion of neighboring organs, combined resection, intraoperative transfusion, Billroth II anastomosis of reconstruction, malnutrition, surgical volume of surgeons, operating time and age were independent risk factors for postoperative complications after gastrectomy. Based on the logistic regression equation, p = exp(∑BiXi) / (1 + exp(∑BiXi)), a multivariate logistic regression predictive model that calculates the risk of postoperative morbidity was developed: p = 1/(1 + e^(4.810 - 1.287X1 - 0.504X2 - 0.500X3 - 0.474X4 - 0.405X5 - 0.318X6 - 0.316X7 - 0.305X8 - 0.278X9 - 0.255X10 - 0.138X11)). The accuracy, sensitivity and specificity of the model to predict the postoperative complications were 86.7%, 76.2% and 88.6%, respectively. This risk model, based on the Clavien-Dindo severity grading system and logistic regression analysis, can predict severe morbidity specific to an individual patient's risk factors, estimate patients' risks and benefits of gastric surgery as an accurate decision-making tool, and may serve as a template for the development of risk models for other surgical groups.
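The risk equation in the abstract above can be evaluated directly. The following is a minimal sketch, not the authors' software: the function name is hypothetical, and the abstract does not say how X1 to X11 are coded or which risk factor each index denotes, so the inputs here are purely illustrative.

```python
import math

# Constant and coefficients copied from the equation quoted in the abstract.
CONSTANT = 4.810
COEFFS = [1.287, 0.504, 0.500, 0.474, 0.405, 0.318,
          0.316, 0.305, 0.278, 0.255, 0.138]

def complication_risk(x):
    """Return p = 1 / (1 + e^(4.810 - sum(B_i * X_i))).

    `x` holds the 11 covariate values X1..X11; their coding is an
    assumption, since the abstract does not specify it.
    """
    if len(x) != len(COEFFS):
        raise ValueError("expected 11 covariate values")
    exponent = CONSTANT - sum(b * xi for b, xi in zip(COEFFS, x))
    return 1.0 / (1.0 + math.exp(exponent))

baseline = complication_risk([0] * 11)   # all covariates at zero
all_present = complication_risk([1] * 11)  # all covariates at one
```

With every covariate at zero the predicted probability is below 1%, and it rises as covariates are switched on, which matches the positive direction of all eleven coefficients.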
Rank-Optimized Logistic Matrix Regression toward Improved Matrix Data Classification.
Zhang, Jianguang; Jiang, Jianmin
2018-02-01
While existing logistic regression suffers from overfitting and often fails in considering structural information, we propose a novel matrix-based logistic regression to overcome the weakness. In the proposed method, 2D matrices are directly used to learn two groups of parameter vectors along each dimension without vectorization, which allows the proposed method to fully exploit the underlying structural information embedded inside the 2D matrices. Further, we add a joint [Formula: see text]-norm on two parameter matrices, which are organized by aligning each group of parameter vectors in columns. This added co-regularization term has two roles-enhancing the effect of regularization and optimizing the rank during the learning process. With our proposed fast iterative solution, we carried out extensive experiments. The results show that in comparison to both the traditional tensor-based methods and the vector-based regression methods, our proposed solution achieves better performance for matrix data classifications.
Fan, Z Joyce; Harris-Adamson, Carisa; Gerr, Fred; Eisen, Ellen A; Hegmann, Kurt T; Bao, Stephen; Silverstein, Barbara; Evanoff, Bradley; Dale, Ann Marie; Thiese, Matthew S; Garg, Arun; Kapellusch, Jay; Burt, Susan; Merlino, Linda; Rempel, David
2015-05-01
Few large epidemiologic studies have used rigorous case criteria, individual-level exposure measurements, and appropriate control for confounders to examine associations between workplace psychosocial and biomechanical factors and carpal tunnel syndrome (CTS). Pooling data from five independent research studies, we assessed associations between prevalent CTS and personal, work psychosocial, and biomechanical factors while adjusting for confounders using multivariable logistic regression. Prevalent CTS was associated with personal factors of older age, obesity, female sex, medical conditions, previous distal upper extremity disorders, workplace measures of peak forceful hand activity, a composite measure of force and repetition (ACGIH Threshold Limit Value for Hand Activity Level), and hand vibration. In this cross-sectional analysis of production and service workers, CTS prevalence was associated with workplace and biomechanical factors. The findings were similar to those from a prospective analysis of the same cohort with differences that may be due to recall bias and other factors. © 2015 Wiley Periodicals, Inc.
Doamekpor, Lauren A; Dinwiddie, Gniesha Y
2015-03-01
We tested whether the immigrant health advantage applies to non-Hispanic Black immigrants and examined whether nativity-based differences in allostatic load exist among non-Hispanic Blacks. We used pooled data from the 2001-2010 National Health and Nutrition Examination Survey to compare allostatic load scores for US-born (n = 2745) and foreign-born (n = 152) Black adults. We used multivariate logistic regression techniques to assess the association between nativity and high allostatic load scores, controlling for gender, age, health behaviors, and socioeconomic status. For foreign-born Blacks, length of stay and age were powerful predictors of allostatic load scores. For older US-born Blacks and those who were widowed, divorced, or separated, the risk of high allostatic load was greater. Foreign-born Blacks have a health advantage in allostatic load. Further research is needed that underscores a deeper understanding of the mechanisms driving this health differential to create programs that target these populations differently.
Farkas, K; Plutzer, J; Moltchanova, E; Török, A; Varró, M J; Domokos, K; Frost, F; Hunter, P R
2015-10-01
In this study the putative protective seroprevalence (PPS) of IgG antibodies to the 27-kDa and 15/17-kDa Cryptosporidium antigens in sera of healthy participants who were and were not exposed to Cryptosporidium oocysts via surface water-derived drinking water was compared. The participants completed a questionnaire regarding risk factors that have been shown to be associated with infection. The PPS was significantly greater (49-61%) in settlements where the drinking water originated from surface water, than in the control city where riverbank filtration was used (21% and 23%). Logistic regression analysis on the risk factors showed an association between bathing/swimming in outdoor pools and antibody responses to the 15/17-kDa antigen complex. Hence the elevated responses were most likely due to the use of contaminated water. Results indicate that waterborne Cryptosporidium infections occur more frequently than reported but may derive from multiple sources.
Rehkopf, David H; Eisen, Ellen A; Modrek, Sepideh; Mokyr Horner, Elizabeth; Goldstein, Benjamin; Costello, Sadie; Cantley, Linda F; Slade, Martin D; Cullen, Mark R
2015-08-01
We examined how state characteristics in early life are associated with individual chronic disease later in life. We assessed early-life state of residence using the first 3 digits of social security numbers from blue- and white-collar workers from a US manufacturing company. Longitudinal data were available from 1997 to 2012, with 305 936 person-years of observation. Disease was assessed using medical claims. We modeled associations using pooled logistic regression with inverse probability of censoring weights. We found small but statistically significant associations between early-state-of-residence characteristics and later life hypertension, diabetes, and ischemic heart disease. The most consistent associations were with income inequality, percentage non-White, and education. These associations were similar after statistically controlling for individual socioeconomic and demographic characteristics and current state characteristics. Characteristics of the state in which an individual lives early in life are associated with prevalence of chronic disease later in life, with a strength of association equivalent to genetic associations found for these same health outcomes.
Interdependence in Health and Functioning Among Older Spousal Caregivers and Care Recipients.
Hoffman, Geoffrey J; Burgard, Sarah; Mendez-Luck, Carolyn A; Gaugler, Joseph E
2018-06-01
Older spousal caregiving relationships involve support that may be affected by the health of either the caregiver or care recipient. We conducted a longitudinal analysis using pooled data from 4,632 community-dwelling spousal care recipients and caregivers aged ⩾50 from the 2002 to 2014 waves of the Health and Retirement Study. We specified logistic and negative binomial regression models using lagged predictor variables to assess the role of partner health status on spousal caregiver and care recipient health care utilization and physical functioning outcomes. Care recipients' odds of hospitalization, odds ratio (OR): 0.83, p<.001, decreased when caregivers had more ADL difficulties. When spouses were in poorer versus better health, care recipients' bed days decreased (4.69 vs. 2.54) while caregivers' bed days increased (0.20 vs. 0.96). Providers should consider the dual needs of caregivers caring for care recipients and their own health care needs, in adopting a family-centered approach to management of older adult long-term care needs.
Same-sex cohabitors and health: the role of race-ethnicity, gender, and socioeconomic status.
Liu, Hui; Reczek, Corinne; Brown, Dustin
2013-03-01
A legacy of research finds that marriage is associated with good health. Yet same-sex cohabitors cannot marry in most states in the United States and therefore may not receive the health benefits associated with marriage. We use pooled data from the 1997 to 2009 National Health Interview Surveys to compare the self-rated health of same-sex cohabiting men (n = 1,659) and same-sex cohabiting women (n = 1,634) with that of their different-sex married, different-sex cohabiting, and unpartnered divorced, widowed, and never-married counterparts. Results from logistic regression models show that same-sex cohabitors report poorer health than their different-sex married counterparts at the same levels of socioeconomic status. Additionally, same-sex cohabitors report better health than their different-sex cohabiting and single counterparts, but these differences are fully explained by socioeconomic status. Without their socioeconomic advantages, same-sex cohabitors would report similar health to nonmarried groups. Analyses further reveal important racial-ethnic and gender variations.
Cook, Won Kim; Tseng, Winston; Ko Chin, Kathy; John, Iyanrick; Chung, Corina
2014-11-01
Working in small businesses has been identified as a key factor for low coverage rates in immigrant communities. In this study, we identify specific cultural and socioeconomic predictors of Asian Americans who work in small businesses to identify subgroups at a greater disadvantage than others in obtaining health insurance. Logistic regression models were fitted using a sample of 3,819 Asian American small business owners and employees extracted from pooled 2005–2012 California Health Interview Survey data. We found that individuals with low income levels, Korean Americans, U.S.-born South Asian and Southeast Asian (other than Vietnamese) Americans, immigrants without citizenship (particularly those lacking a green card), and individuals with limited English proficiency had higher odds of lacking coverage. The odds of being uninsured did not differ between small business owners and employees. Based upon these key findings, we propose several strategies to expand coverage for Asian Americans working in small businesses and their most vulnerable subgroups.
Detecting DIF in Polytomous Items Using MACS, IRT and Ordinal Logistic Regression
ERIC Educational Resources Information Center
Elosua, Paula; Wells, Craig
2013-01-01
The purpose of the present study was to compare the Type I error rate and power of two model-based procedures, the mean and covariance structure model (MACS) and the item response theory (IRT), and an observed-score based procedure, ordinal logistic regression, for detecting differential item functioning (DIF) in polytomous items. A simulation…
ERIC Educational Resources Information Center
Rudner, Lawrence
2016-01-01
In the machine learning literature, it is commonly accepted as fact that as calibration sample sizes increase, Naïve Bayes classifiers initially outperform Logistic Regression classifiers in terms of classification accuracy. Applied to subtests from an on-line final examination and from a highly regarded certification examination, this study shows…
ERIC Educational Resources Information Center
Fan, Xitao; Wang, Lin
The Monte Carlo study compared the performance of predictive discriminant analysis (PDA) and that of logistic regression (LR) for the two-group classification problem. Prior probabilities were used for classification, but the cost of misclassification was assumed to be equal. The study used a fully crossed three-factor experimental design (with…
ERIC Educational Resources Information Center
Nguyen, Phuong L.
2006-01-01
This study examines the effects of parental SES, school quality, and community factors on children's enrollment and achievement in rural areas in Viet Nam, using logistic regression and ordered logistic regression. Multivariate analysis reveals significant differences in educational enrollment and outcomes by level of household expenditures and…
School Exits in the Milwaukee Parental Choice Program: Evidence of a Marketplace?
ERIC Educational Resources Information Center
Ford, Michael
2011-01-01
This article examines whether the large number of school exits from the Milwaukee school voucher program is evidence of a marketplace. Two logistic regression and multinomial logistic regression models tested the relation between the inability to draw large numbers of voucher students and the ability for a private school to remain viable. Data on…
Li, Ji; Gray, B.R.; Bates, D.M.
2008-01-01
Partitioning the variance of a response by design levels is challenging for binomial and other discrete outcomes. Goldstein (2003) proposed four definitions for variance partitioning coefficients (VPC) under a two-level logistic regression model. In this study, we explicitly derived formulae for the multi-level logistic regression model and subsequently studied the distributional properties of the calculated VPCs. Using simulations and a vegetation dataset, we demonstrated associations between different VPC definitions, the importance of methods for estimating VPCs (by comparing VPCs obtained using Laplace and penalized quasi-likelihood methods), and bivariate dependence between VPCs calculated at different levels. Such an empirical study lends immediate support to wider applications of VPC in scientific data analysis.
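As an illustration of one commonly used VPC definition, the latent-variable form for a two-level random-intercept logistic model treats the level-1 residual as standard logistic with variance π²/3. The sketch below assumes that definition; the abstract does not state which of the four definitions corresponds to which formula, and the function name is hypothetical.

```python
import math

def vpc_latent(sigma2_u):
    """Latent-variable VPC: sigma2_u / (sigma2_u + pi^2 / 3).

    `sigma2_u` is the level-2 (random intercept) variance on the logit
    scale; pi^2 / 3 is the variance of the standard logistic distribution.
    """
    if sigma2_u < 0:
        raise ValueError("variance must be non-negative")
    return sigma2_u / (sigma2_u + math.pi ** 2 / 3)

# Hypothetical level-2 variance of 1.0 on the logit scale.
example_vpc = vpc_latent(1.0)
```

Under this definition the VPC does not depend on the covariate values, unlike definitions based on linearisation or simulation.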
Model building strategy for logistic regression: purposeful selection.
Zhang, Zhongheng
2016-03-01
Logistic regression is one of the most commonly used models to account for confounders in the medical literature. The article introduces how to perform the purposeful selection model building strategy with R. I stress the use of the likelihood ratio test to see whether deleting a variable will have a significant impact on model fit. A deleted variable should also be checked for whether it is an important adjustment of the remaining covariates. Interactions should be checked to disentangle complex relationships between covariates and their synergistic effect on the response variable. The model should be checked for goodness-of-fit (GOF), that is, how well the fitted model reflects the real data. The Hosmer-Lemeshow GOF test is the most widely used for logistic regression models.
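The likelihood ratio test mentioned above compares the log-likelihoods of a full and a reduced model. The article itself works in R; the following Python sketch only illustrates the arithmetic for the case of deleting a single variable (one degree of freedom). The function name and the log-likelihood values are hypothetical.

```python
import math

def lrt_one_df(loglik_full, loglik_reduced):
    """Likelihood ratio test for dropping one parameter.

    Returns (statistic, p_value), where statistic = 2 * (ll_full - ll_reduced)
    and the p-value is the chi-square (1 df) upper tail, computed as
    erfc(sqrt(statistic / 2)).
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    if statistic < 0:
        raise ValueError("full model should not fit worse than reduced model")
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical log-likelihoods from two nested logistic models.
stat, p = lrt_one_df(-120.4, -123.1)
```

A small p-value suggests the deleted variable contributes to model fit and should be retained; for tests with more than one degree of freedom a general chi-square tail function would be needed instead.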
NASA Astrophysics Data System (ADS)
Ceppi, C.; Mancini, F.; Ritrovato, G.
2009-04-01
This study aims at landslide susceptibility mapping within an area of the Daunia (Apulian Apennines, Italy) by a multivariate statistical method and data manipulation in a Geographical Information System (GIS) environment. Among the variety of existing statistical data analysis techniques, logistic regression was chosen to produce a susceptibility map over an area where small settlements are historically threatened by landslide phenomena. Logistic regression finds the best fit between the presence or absence of landslides (dependent variable) and the set of independent variables on the basis of a maximum likelihood criterion, leading to the estimation of regression coefficients. The reliability of such analysis is therefore due to the ability to quantify the proneness to landslide occurrence by the probability level produced by the analysis. The inventory of dependent and independent variables was managed in a GIS, where geometric properties and attributes were translated into raster cells in order to proceed with the logistic regression by means of the SPSS (Statistical Package for the Social Sciences) package. A landslide inventory was used to produce the binary dependent variable, whereas the set of independent variables comprised slope, aspect, elevation, curvature, drained area, lithology and land use after their reduction to dummy variables. The effect of independent parameters on landslide occurrence was assessed by the corresponding coefficients in the logistic regression function, highlighting a major role played by the land use variable in determining the occurrence and distribution of phenomena. Once the outcomes of the logistic regression are determined, data are re-introduced in the GIS to produce a map reporting the proneness to landslide as a predicted level of probability.
As validation of the results and the regression model, a cell-by-cell comparison between the susceptibility map and the initial inventory of landslide events was performed, and agreement at the 75% level was achieved.
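A cell-by-cell comparison of the kind described above can be sketched as follows. This is an illustrative assumption about the procedure: the abstract does not state how predicted probabilities were converted to presence or absence, so the 0.5 cut-off and the function name are hypothetical.

```python
def cellwise_agreement(predicted_prob, observed, threshold=0.5):
    """Fraction of raster cells where the thresholded prediction matches
    the inventory (1 = landslide present, 0 = absent).

    `threshold` is an assumed cut-off, not a value from the study.
    """
    if len(predicted_prob) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum((p >= threshold) == bool(o)
                  for p, o in zip(predicted_prob, observed))
    return matches / len(observed)

# Four hypothetical cells: three agree with the inventory, one does not.
agreement = cellwise_agreement([0.9, 0.2, 0.6, 0.1], [1, 0, 0, 0])
```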
Determination of riverbank erosion probability using Locally Weighted Logistic Regression
NASA Astrophysics Data System (ADS)
Ioannidou, Elena; Flori, Aikaterini; Varouchakis, Emmanouil A.; Giannakis, Georgios; Vozinaki, Anthi Eirini K.; Karatzas, George P.; Nikolaidis, Nikolaos
2015-04-01
Riverbank erosion is a natural geomorphologic process that affects the fluvial environment. The most important issue concerning riverbank erosion is the identification of the vulnerable locations. An alternative to the usual hydrodynamic models to predict vulnerable locations is to quantify the probability of erosion occurrence. This can be achieved by identifying the underlying relations between riverbank erosion and the geomorphological or hydrological variables that prevent or stimulate erosion. Thus, riverbank erosion can be determined by a regression model using independent variables that are considered to affect the erosion process. The impact of such variables may vary spatially, therefore, a non-stationary regression model is preferred instead of a stationary equivalent. Locally Weighted Regression (LWR) is proposed as a suitable choice. This method can be extended to predict the binary presence or absence of erosion based on a series of independent local variables by using the logistic regression model. It is referred to as Locally Weighted Logistic Regression (LWLR). Logistic regression is a type of regression analysis used for predicting the outcome of a categorical dependent variable (e.g. binary response) based on one or more predictor variables. The method can be combined with LWR to assign weights to local independent variables of the dependent one. LWR allows model parameters to vary over space in order to reflect spatial heterogeneity. The probabilities of the possible outcomes are modelled as a function of the independent variables using a logistic function. Logistic regression measures the relationship between a categorical dependent variable and, usually, one or several continuous independent variables by converting the dependent variable to probability scores. Then, a logistic regression is formed, which predicts success or failure of a given binary variable (e.g. erosion presence or absence) for any value of the independent variables. 
The erosion occurrence probability can be calculated in conjunction with the model deviance regarding the independent variables tested. The most straightforward measure for goodness of fit is the G statistic. It is a simple and effective way to study and evaluate the Logistic Regression model efficiency and the reliability of each independent variable. The developed statistical model is applied to the Koiliaris River Basin on the island of Crete, Greece. Two datasets of river bank slope, river cross-section width and indications of erosion were available for the analysis (12 and 8 locations). Two different types of spatial dependence functions, exponential and tricubic, were examined to determine the local spatial dependence of the independent variables at the measurement locations. The results show a significant improvement when the tricubic function is applied as the erosion probability is accurately predicted at all eight validation locations. Results for the model deviance show that cross-section width is more important than bank slope in the estimation of erosion probability along the Koiliaris riverbanks. The proposed statistical model is a useful tool that quantifies the erosion probability along the riverbanks and can be used to assist managing erosion and flooding events. Acknowledgements This work is part of an on-going THALES project (CYBERSENSORS - High Frequency Monitoring System for Integrated Water Resources Management of Rivers). The project has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES. Investing in knowledge society through the European Social Fund.
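The two spatial dependence functions named in the abstract can be illustrated with common kernel forms. This sketch assumes the tricube kernel used in locally weighted regression and a simple exponential decay; the abstract does not give the exact expressions or bandwidths the authors used, so the forms, the bandwidth parameter `h`, and the function names are assumptions.

```python
import math

def tricube_weight(distance, h):
    """Tricube kernel: (1 - (d/h)^3)^3 inside bandwidth h, 0 outside."""
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    u = abs(distance) / h
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def exponential_weight(distance, h):
    """Exponential kernel: exp(-d/h), never exactly zero."""
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    return math.exp(-abs(distance) / h)

def logistic(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Weights for an observation 1 distance unit away, hypothetical bandwidth 2.
w_tri = tricube_weight(1.0, 2.0)
w_exp = exponential_weight(1.0, 2.0)
```

The tricube kernel gives zero weight to observations beyond the bandwidth, whereas the exponential kernel lets distant observations retain some influence, which is one plausible reason the two choices can lead to different local fits.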
Pannus, Pieter; Fajardo, Emmanuel; Metcalf, Carol; Coulborn, Rebecca M; Durán, Laura T; Bygrave, Helen; Ellman, Tom; Garone, Daniela; Murowa, Michael; Mwenda, Reuben; Reid, Tony; Preiser, Wolfgang
2013-10-01
Rollout of routine HIV-1 viral load monitoring is hampered by high costs and logistical difficulties associated with sample collection and transport. New strategies are needed to overcome these constraints. Dried blood spots from finger pricks have been shown to be more practical than the use of plasma specimens, and pooling strategies using plasma specimens have been demonstrated to be an efficient method to reduce costs. This study found that combination of finger-prick dried blood spots and a pooling strategy is a feasible and efficient option to reduce costs, while maintaining accuracy in the context of a district hospital in Malawi.
NASA Astrophysics Data System (ADS)
Yilmaz, Işık
2009-06-01
The purpose of this study is to compare the landslide susceptibility mapping methods of frequency ratio (FR), logistic regression and artificial neural networks (ANN) applied in the Kat County (Tokat, Turkey). A digital elevation model (DEM) was first constructed using GIS software. Landslide-related factors such as geology, faults, drainage system, topographical elevation, slope angle, slope aspect, topographic wetness index (TWI) and stream power index (SPI) were used in the landslide susceptibility analyses. Landslide susceptibility maps were produced from the frequency ratio, logistic regression and neural network models, and they were then compared by means of their validations. High accuracies of the susceptibility maps for all three models were obtained from the comparison of the landslide susceptibility maps with the known landslide locations. Although the respective area under curve (AUC) values of 0.826, 0.842 and 0.852 for frequency ratio, logistic regression and artificial neural networks showed that the map obtained from the ANN model is more accurate than those from the other models, the accuracies of all models can be considered relatively similar. The results obtained in this study also showed that the frequency ratio model can be used as a simple tool in the assessment of landslide susceptibility when a sufficient amount of data is available. The input process, calculations and output process are very simple and can be readily understood in the frequency ratio model; however, logistic regression and neural networks require the conversion of data to ASCII or other formats. Moreover, it is also very hard to process the large amount of data in the statistical package.
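The frequency ratio for a factor class is usually defined as the share of landslide cells falling in that class divided by the share of all cells falling in that class. The sketch below assumes this standard definition; the abstract does not spell out the formula, and the function name and counts are hypothetical.

```python
def frequency_ratio(landslide_cells_in_class, cells_in_class,
                    total_landslide_cells, total_cells):
    """FR = (landslide cells in class / all landslide cells)
          / (cells in class / all cells).

    FR > 1 suggests the class is more landslide-prone than average.
    """
    if min(cells_in_class, total_landslide_cells, total_cells) <= 0:
        raise ValueError("cell counts must be positive")
    landslide_share = landslide_cells_in_class / total_landslide_cells
    area_share = cells_in_class / total_cells
    return landslide_share / area_share

# Hypothetical slope class holding 30 of 100 landslide cells
# and 200 of 1000 cells overall.
fr = frequency_ratio(30, 200, 100, 1000)
```

Summing the FR values of the classes a cell belongs to, across all factors, gives a simple susceptibility index, which is why the abstract describes the method as easy to compute and interpret.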
Applying machine-learning techniques to Twitter data for automatic hazard-event classification.
NASA Astrophysics Data System (ADS)
Filgueira, R.; Bee, E. J.; Diaz-Doce, D.; Poole, J., Sr.; Singh, A.
2017-12-01
The constant flow of information offered by tweets provides valuable information about all sorts of events at a high temporal and spatial resolution. Over the past year we have been analyzing in real time geological hazards and phenomena, such as earthquakes, volcanic eruptions, landslides, floods or the aurora, as part of the GeoSocial project, by geo-locating tweets filtered by keywords in a web-map. However, not all the filtered tweets are related to hazard/phenomenon events. This work explores two classification techniques for automatic hazard-event categorization based on tweets about the "Aurora". First, tweets were filtered using aurora-related keywords, removing stop words and selecting the ones written in English. For classifying the remaining tweets into "aurora-event" or "no-aurora-event" categories, we compared two state-of-the-art techniques: Support Vector Machine (SVM) and Deep Convolutional Neural Networks (CNN) algorithms. Both approaches belong to the family of supervised learning algorithms, which make predictions based on a labelled training dataset. Therefore, we created a training dataset by tagging 1200 tweets between both categories. The general form of SVM is used to separate two classes by a function (kernel). We compared the performance of four different kernels (Linear Regression, Logistic Regression, Multinomial Naïve Bayesian and Stochastic Gradient Descent) provided by the Scikit-Learn library using our training dataset to build the SVM classifier. The results showed that the Logistic Regression (LR) gets the best accuracy (87%). So, we selected the SVM-LR classifier to categorise a large collection of tweets using the "dispel4py" framework. Later, we developed a CNN classifier, where the first layer embeds words into low-dimensional vectors. The next layer performs convolutions over the embedded word vectors. Results from the convolutional layer are max-pooled into a long feature vector, which is classified using a softmax layer.
The CNN's accuracy is lower (83%) than the SVM-LR, since the algorithm needs a bigger training dataset to increase its accuracy. We used the TensorFlow framework for applying the CNN classifier to the same collection of tweets. In future we will modify both classifiers to work with other geo-hazards, use larger training datasets and apply them in real time.
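The last two stages of the CNN described above, max-pooling followed by a softmax layer, can be illustrated in plain Python. This is a toy sketch of the two operations only, not the authors' TensorFlow model; the function names and the numbers are hypothetical.

```python
import math

def max_over_time(feature_maps):
    """Max-pool each convolutional feature map to a single value,
    producing one feature per filter."""
    return [max(feature_map) for feature_map in feature_maps]

def softmax(scores):
    """Convert class scores to probabilities (numerically stabilised
    by subtracting the maximum score)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two hypothetical filter outputs over a short tweet.
pooled = max_over_time([[0.1, 0.7, 0.3], [-1.0, -0.2]])
# Hypothetical scores for the two classes named in the abstract.
probs = softmax([1.0, 2.0])
```

In a real classifier a learned linear layer would map the pooled feature vector to the class scores before the softmax; that step is omitted here.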
ERIC Educational Resources Information Center
Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard
2010-01-01
The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…
Carolyn B. Meyer; Sherri L. Miller; C. John Ralph
2004-01-01
The scale at which habitat variables are measured affects the accuracy of resource selection functions in predicting animal use of sites. We used logistic regression models for a wide-ranging species, the marbled murrelet, (Brachyramphus marmoratus) in a large region in California to address how much changing the spatial or temporal scale of...
ERIC Educational Resources Information Center
Monahan, Patrick O.; McHorney, Colleen A.; Stump, Timothy E.; Perkins, Anthony J.
2007-01-01
Previous methodological and applied studies that used binary logistic regression (LR) for detection of differential item functioning (DIF) in dichotomously scored items either did not report an effect size or did not employ several useful measures of DIF magnitude derived from the LR model. Equations are provided for these effect size indices.…
ERIC Educational Resources Information Center
Magis, David; Raiche, Gilles; Beland, Sebastien; Gerard, Paul
2011-01-01
We present an extension of the logistic regression procedure to identify dichotomous differential item functioning (DIF) in the presence of more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence…
Risk Factors of Falls in Community-Dwelling Older Adults: Logistic Regression Tree Analysis
ERIC Educational Resources Information Center
Yamashita, Takashi; Noe, Douglas A.; Bailer, A. John
2012-01-01
Purpose of the Study: A novel logistic regression tree-based method was applied to identify fall risk factors and possible interaction effects of those risk factors. Design and Methods: A nationally representative sample of American older adults aged 65 years and older (N = 9,592) in the Health and Retirement Study 2004 and 2006 modules was used.…
ERIC Educational Resources Information Center
Gordovil-Merino, Amalia; Guardia-Olmos, Joan; Pero-Cebollero, Maribel
2012-01-01
In this paper, we used simulations to compare the performance of classical and Bayesian estimations in logistic regression models using small samples. In the performed simulations, conditions were varied, including the type of relationship between independent and dependent variable values (i.e., unrelated and related values), the type of variable…
Ohlmacher, G.C.; Davis, J.C.
2003-01-01
Landslides in the hilly terrain along the Kansas and Missouri rivers in northeastern Kansas have caused millions of dollars in property damage during the last decade. To address this problem, a statistical method called multiple logistic regression has been used to create a landslide-hazard map for Atchison, Kansas, and surrounding areas. Data included digitized geology, slopes, and landslides, manipulated using ArcView GIS. Logistic regression relates predictor variables to the occurrence or nonoccurrence of landslides within geographic cells and uses the relationship to produce a map showing the probability of future landslides, given local slopes and geologic units. Results indicated that slope is the most important variable for estimating landslide hazard in the study area. Geologic units consisting mostly of shale, siltstone, and sandstone were most susceptible to landslides. Soil type and aspect ratio were considered but excluded from the final analysis because these variables did not significantly add to the predictive power of the logistic regression. Soil types were highly correlated with the geologic units, and no significant relationships existed between landslides and slope aspect. © 2003 Elsevier Science B.V. All rights reserved.
A Method for Calculating the Probability of Successfully Completing a Rocket Propulsion Ground Test
NASA Technical Reports Server (NTRS)
Messer, Bradley
2007-01-01
Propulsion ground test facilities face the daily challenge of scheduling multiple customers into limited facility space and successfully completing their propulsion test projects. Over the last decade NASA's propulsion test facilities have performed hundreds of tests, collected thousands of seconds of test data, and exceeded the capabilities of numerous test facility and test article components. A logistic regression mathematical modeling technique has been developed to predict the probability of successfully completing a rocket propulsion test. A logistic regression model is a mathematical modeling approach that can be used to describe the relationship of several independent predictor variables X_1, X_2, ..., X_k to a binary or dichotomous dependent variable Y, where Y can only be one of two possible outcomes, in this case Success or Failure of accomplishing a full duration test. The use of logistic regression modeling is not new; however, modeling propulsion ground test facilities using logistic regression is both a new and unique application of the statistical technique. Results from this type of model provide project managers with insight and confidence into the effectiveness of rocket propulsion ground testing.
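For reference, the standard form of the logistic regression model described in this abstract can be written as below. The coefficients \beta_0, ..., \beta_k are generic notation introduced here for illustration; the abstract does not report fitted values or name the predictors.

```latex
\[
  P(Y = \text{Success} \mid X_1, \dots, X_k)
  = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)\bigr)}
\]
```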
Fei, Yang; Hu, Jian; Gao, Kun; Tu, Jianfeng; Li, Wei-Qin; Wang, Wei
2017-06-01
To construct a radial basis function (RBF) artificial neural networks (ANNs) model to predict the incidence of acute pancreatitis (AP)-induced portal vein thrombosis (PVT). The analysis included 353 patients with AP who were admitted between January 2011 and December 2015. An RBF ANNs model and a logistic regression model were each constructed based on eleven factors relevant to AP. Statistical indexes were used to evaluate the predictive value of the two models. The sensitivity, specificity, positive predictive value, negative predictive value and accuracy of the RBF ANNs model for predicting PVT were 73.3%, 91.4%, 68.8%, 93.0% and 87.7%, respectively. There were significant differences between the RBF ANNs and logistic regression models in these parameters (P<0.05). In addition, a comparison of the area under receiver operating characteristic curves of the two models showed a statistically significant difference (P<0.05). The RBF ANNs model is more likely to predict the occurrence of PVT induced by AP than the logistic regression model. D-dimer, AMY, Hct and PT were important prediction factors for AP-induced PVT. Copyright © 2017 Elsevier Inc. All rights reserved.
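The five measures reported in this abstract are all simple ratios of confusion-matrix counts. A minimal sketch follows; the function name and the counts are hypothetical and chosen only for illustration, since the abstract does not report the underlying 2x2 table.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return standard diagnostic accuracy measures from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives of a binary classifier against the reference diagnosis.
    """
    return {
        "sensitivity": tp / (tp + fn),   # share of true cases detected
        "specificity": tn / (tn + fp),   # share of non-cases correctly ruled out
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts for illustration only; these are NOT the study's data.
example = diagnostic_metrics(tp=30, fp=10, fn=10, tn=150)
```

With these made-up counts, sensitivity and PPV are both 0.75 and accuracy is 0.90.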
Effect of obstructive sleep apnea hypopnea syndrome on lipid profile: a meta-regression analysis.
Nadeem, Rashid; Singh, Mukesh; Nida, Mahwish; Waheed, Irfan; Khan, Adnan; Ahmed, Saeed; Naseem, Jawed; Champeau, Daniel
2014-05-15
Obstructive sleep apnea (OSA) is associated with obesity, metabolic syndrome, and dyslipidemia, which may be related to decreased androgen levels found in OSA patients. Dyslipidemia may contribute to atherosclerosis leading to increasing risk of heart disease. Systematic review was conducted using PubMed and Cochrane library by utilizing different combinations of key words: sleep apnea, obstructive sleep apnea, serum lipids, dyslipidemia, cholesterol, total cholesterol, low density lipoprotein (LDL), high density lipoprotein (HDL), and triglyceride (TG). Inclusion criteria were: English articles, and studies with adult population in 2 groups of patients (patients with OSA and without OSA). A total of 96 studies were reviewed for inclusion, with 25 studies pooled for analysis. Sixty-four studies were pooled for analysis; since some studies have more than one dataset, there were 107 datasets with 18,116 patients pooled for meta-analysis. All studies measured serum lipids. Total cholesterol pooled standardized difference in means was 0.267 (p = 0.001). LDL cholesterol pooled standardized difference in means was 0.296 (p = 0.001). HDL cholesterol pooled standardized difference in means was -0.433 (p = 0.001). Triglyceride pooled standardized difference in means was 0.603 (p = 0.001). Meta-regression for age, BMI, and AHI showed that age has significant effect for TC, LDL, and HDL. BMI had significant effect for LDL and HDL, while AHI had significant effect for LDL and TG. Patients with OSA appear to have increased dyslipidemia (high total cholesterol, LDL, TG, and low HDL).
Laverty, Anthony A; Palladino, Raffaele; Lee, John Tayu; Millett, Christopher
2015-05-20
There is little published data on the potential health benefits of active travel in low- and middle-income countries. This is despite increasing levels of adiposity being linked to increases in physical inactivity and non-communicable diseases. This study will examine: (1) socio-demographic correlates of using active travel (walking or cycling for transport) among older adults in six populous middle-income countries; and (2) whether use of active travel is associated with adiposity, systolic blood pressure and self-reported diabetes in these countries. Data are from the WHO Study on Global Ageing and Adult Health (SAGE) of China, India, Mexico, Ghana, Russia and South Africa with a total sample size of 40,477. Correlates of active travel (≥150 min/week) were examined using logistic regression. Logistic and linear regression analyses were used to examine health related outcomes according to three groups of active travel use per week. 46.4% of the sample undertook ≥150 min of active travel per week (range South Africa: 21.9%; Ghana: 57.8%). In pooled analyses those in wealthier households were less likely to meet this level of active travel (Adjusted Risk Ratio (ARR) 0.77, 95% Confidence Intervals 0.67; 0.88 wealthiest fifth vs. poorest). Older people and women were also less likely to use active travel for ≥150 min per week (ARR 0.71, 0.62; 0.80 those aged 70+ years vs. 18-29 years old, ARR 0.82, 0.74; 0.91 women vs. men). In pooled fully adjusted analyses, high use of active travel was associated with lower risk of overweight (ARR 0.71, 0.59; 0.86), high waist-to-hip ratio (ARR 0.71, 0.61; 0.84) and lower BMI (-0.54 kg/m², -0.98; -0.11). Moderate (31-209 min/week) and high use (≥210 min/week) of active travel was associated with lower waist circumference (-1.52 cm (-2.40; -0.65) and -2.16 cm (-3.07; -1.26)), and lower systolic blood pressure (-1.63 mmHg (-3.19; -0.06) and -2.33 mmHg (-3.98; -0.69)).
In middle-income countries use of active travel for ≥150 min per week is more common in lower socio-economic groups and appears to confer similar health benefits to those identified in high-income settings. Efforts to increase active travel levels should be integral to strategies to maintain healthy weight and reduce disease burden in these settings.
Wang, Shuang; Jiang, Xiaoqian; Wu, Yuan; Cui, Lijuan; Cheng, Samuel; Ohno-Machado, Lucila
2013-01-01
We developed an EXpectation Propagation LOgistic REgRession (EXPLORER) model for distributed privacy-preserving online learning. The proposed framework provides a high level guarantee for protecting sensitive information, since the information exchanged between the server and the client is the encrypted posterior distribution of coefficients. Through experimental results, EXPLORER shows the same performance (e.g., discrimination, calibration, feature selection etc.) as the traditional frequentist Logistic Regression model, but provides more flexibility in model updating. That is, EXPLORER can be updated one point at a time rather than having to retrain the entire data set when new observations are recorded. The proposed EXPLORER supports asynchronized communication, which relieves the participants from coordinating with one another, and prevents service breakdown from the absence of participants or interrupted communications. PMID:23562651
Zhang, Zhongheng; Ni, Hongying; Xu, Xiao
2014-08-01
Propensity score (PS) analysis has been increasingly used in critical care medicine; however, its validation has not been systematically investigated. The present study aimed to compare effect sizes in PS-based observational studies vs. randomized controlled trials (RCTs) (or meta-analysis of RCTs). Critical care observational studies using PS were systematically searched in PubMed from inception to April 2013. Identified PS-based studies were matched to one or more RCTs in terms of population, intervention, comparison, and outcome. The effect sizes of experimental treatments were compared for PS-based studies vs. RCTs (or meta-analysis of RCTs) with sign test. Furthermore, ratio of odds ratio (ROR) was calculated from the interaction term of treatment × study type in a logistic regression model. A ROR < 1 indicates greater benefit for experimental treatment in RCTs compared with PS-based studies. RORs of each comparison were pooled by using meta-analytic approach with random-effects model. A total of 20 PS-based studies were identified and matched to RCTs. Twelve of the 20 comparisons showed greater beneficial effect for experimental treatment in RCTs than that in PS-based studies (sign test P = 0.503). The difference was statistically significant in four comparisons. ROR can be calculated from 13 comparisons, of which four showed significantly greater beneficial effect for experimental treatment in RCTs. The pooled ROR was 0.71 (95% CI: 0.63, 0.79; P = 0.002), suggesting that RCTs (or meta-analysis of RCTs) were more likely to report beneficial effect for the experimental treatment than PS-based studies. The result remained unchanged in sensitivity analysis and meta-regression. In critical care literature, PS-based observational study is likely to report less beneficial effect of experimental treatment compared with RCTs (or meta-analysis of RCTs). Copyright © 2014 Elsevier Inc. All rights reserved.
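The sign-test result quoted in this abstract (12 of 20 comparisons favoring RCTs, P = 0.503) can be reproduced with an exact two-sided binomial calculation. This is a minimal sketch assuming the exact binomial form of the sign test with no ties; the abstract does not state which variant the authors used, and the function name is introduced here for illustration.

```python
from math import comb


def sign_test_two_sided(successes, n):
    """Exact two-sided sign test p-value under H0: P(success) = 0.5."""
    # Use the larger of the two counts so the tail is always the upper one.
    k = max(successes, n - successes)
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double the one-sided tail, capping at 1 for the symmetric null.
    return min(1.0, 2 * upper_tail)


p_value = sign_test_two_sided(12, 20)
print(round(p_value, 3))  # 0.503
```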
The impact of moderate wine consumption on the risk of developing prostate cancer.
Vartolomei, Mihai Dorin; Kimura, Shoji; Ferro, Matteo; Foerster, Beat; Abufaraj, Mohammad; Briganti, Alberto; Karakiewicz, Pierre I; Shariat, Shahrokh F
2018-01-01
To investigate the impact of moderate wine consumption on the risk of prostate cancer (PCa). We focused on the differential effect of moderate consumption of red versus white wine. This study was a meta-analysis that includes data from case-control and cohort studies. A systematic search of Web of Science, Medline/PubMed, and Cochrane library was performed on December 1, 2017. Studies were deemed eligible if they assessed the risk of PCa due to red, white, or any wine using multivariable logistic regression analysis. We performed a formal meta-analysis for the risk of PCa according to moderate wine and wine type consumption (white or red). Heterogeneity between studies was assessed using Cochran's Q test and I² statistics. Publication bias was assessed using Egger's regression test. A total of 930 abstracts and titles were initially identified. After removal of duplicates, reviews, and conference abstracts, 83 full-text original articles were screened. Seventeen studies (611,169 subjects) were included for final evaluation and fulfilled the inclusion criteria. In the case of moderate wine consumption: the pooled risk ratio (RR) for the risk of PCa was 0.98 (95% CI 0.92-1.05, p = 0.57) in the multivariable analysis. Moderate white wine consumption increased the risk of PCa with a pooled RR of 1.26 (95% CI 1.10-1.43, p = 0.001) in the multivariable analysis. Meanwhile, moderate red wine consumption had a protective role reducing the risk by 12% (RR 0.88, 95% CI 0.78-0.999, p = 0.047) in the multivariable analysis that comprised 222,447 subjects. In this meta-analysis, moderate wine consumption did not impact the risk of PCa. Interestingly, regarding the type of wine, moderate consumption of white wine increased the risk of PCa, whereas moderate consumption of red wine had a protective effect. Further analyses are needed to assess the differential molecular effect of white and red wine conferring their impact on PCa risk.
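The heterogeneity statistics named in this abstract have simple closed forms. The sketch below computes the Q statistic and I² from per-study log risk ratios and their variances, using inverse-variance weights. The function name and the three input studies are hypothetical; the abstract does not list per-study estimates.

```python
from math import log


def cochran_q_and_i2(log_effects, variances):
    """Return (Q, I2 in percent) for study estimates on the log scale."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)
    # Q: weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    # I2: share of total variation attributable to heterogeneity, floored at 0.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Hypothetical studies for illustration only; NOT data from the meta-analysis.
log_rr = [log(0.8), log(1.0), log(1.25)]
var = [0.01, 0.01, 0.01]
q, i2 = cochran_q_and_i2(log_rr, var)
```

With these made-up inputs Q is roughly 10 on 2 degrees of freedom and I² is about 80%, which would conventionally be read as substantial heterogeneity.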
Dietary consumption patterns and laryngeal cancer risk.
Vlastarakos, Petros V; Vassileiou, Andrianna; Delicha, Evie; Kikidis, Dimitrios; Protopapas, Dimosthenis; Nikolopoulos, Thomas P
2016-06-01
We conducted a case-control study to investigate the effect of diet on laryngeal carcinogenesis. Our study population was made up of 140 participants: 70 patients with laryngeal cancer (LC) and 70 controls with a non-neoplastic condition that was unrelated to diet, smoking, or alcohol. A food-frequency questionnaire determined the mean consumption of 113 different items during the 3 years prior to symptom onset. Total energy intake and cooking mode were also noted. The relative risk, odds ratio (OR), and 95% confidence interval (CI) were estimated by multiple logistic regression analysis. We found that the total energy intake was significantly higher in the LC group (p < 0.001), and that the difference remained statistically significant after logistic regression analysis (p < 0.001; OR: 118.70). Notably, meat consumption was higher in the LC group (p < 0.001), and the difference remained significant after logistic regression analysis (p = 0.029; OR: 1.16). LC patients also consumed significantly more fried food (p = 0.036); this difference also remained significant in the logistic regression model (p = 0.026; OR: 5.45). The LC group also consumed significantly more seafood (p = 0.012); the difference persisted after logistic regression analysis (p = 0.009; OR: 2.48), with the consumption of shrimp proving detrimental (p = 0.049; OR: 2.18). Finally, the intake of zinc was significantly higher in the LC group before and after logistic regression analysis (p = 0.034 and p = 0.011; OR: 30.15, respectively). Cereal consumption (including pastas) was also higher among the LC patients (p = 0.043), with logistic regression analysis showing that their negative effect was possibly associated with the sauces and dressings that traditionally accompany pasta dishes (p = 0.006; OR: 4.78).
Conversely, a higher consumption of dairy products was found in controls (p < 0.05); logistic regression analysis showed that calcium appeared to be protective at the micronutrient level (p < 0.001; OR: 0.27). We found no difference in the overall consumption of fruits and vegetables between the LC patients and controls; however, the LC patients did have a greater consumption of cooked tomatoes and cooked root vegetables (p = 0.039 for both), and the controls had more consumption of leeks (p = 0.042) and, among controls younger than 65 years, cooked beans (p = 0.037). Lemon (p = 0.037), squeezed fruit juice (p = 0.032), and watermelon (p = 0.018) were also more frequently consumed by the controls. Other differences at the micronutrient level included greater consumption by the LC patients of retinol (p = 0.044), polyunsaturated fats (p = 0.041), and linoleic acid (p = 0.008); LC patients younger than 65 years also had greater intake of riboflavin (p = 0.045). We conclude that the differences in dietary consumption patterns between LC patients and controls indicate a possible role for lifestyle modifications involving nutritional factors as a means of decreasing the risk of laryngeal cancer.
ERIC Educational Resources Information Center
Guler, Nese; Penfield, Randall D.
2009-01-01
In this study, we investigate the logistic regression (LR), Mantel-Haenszel (MH), and Breslow-Day (BD) procedures for the simultaneous detection of both uniform and nonuniform differential item functioning (DIF). A simulation study was used to assess and compare the Type I error rate and power of a combined decision rule (CDR), which assesses DIF…
ERIC Educational Resources Information Center
Le, Huy; Marcus, Justin
2012-01-01
This study used Monte Carlo simulation to examine the properties of the overall odds ratio (OOR), which was recently introduced as an index for overall effect size in multiple logistic regression. It was found that the OOR was relatively independent of study base rate and performed better than most commonly used R-square analogs in indexing model…
Predicting Student Success on the Texas Chemistry STAAR Test: A Logistic Regression Analysis
ERIC Educational Resources Information Center
Johnson, William L.; Johnson, Annabel M.; Johnson, Jared
2012-01-01
Background: The context is the new Texas STAAR end-of-course testing program. Purpose: The authors developed a logistic regression model to predict who would pass-or-fail the new Texas chemistry STAAR end-of-course exam. Setting: Robert E. Lee High School (5A) with an enrollment of 2700 students, Tyler, Texas. Date of the study was the 2011-2012…
Susan L. King
2003-01-01
The performance of two classifiers, logistic regression and neural networks, is compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.; Michael, John A.; Helsel, Dennis R.
2008-01-01
Logistic regression was used to develop statistical models that can be used to predict the probability of debris flows in areas recently burned by wildfires by using data from 14 wildfires that burned in southern California during 2003-2006. Twenty-eight independent variables describing the basin morphology, burn severity, rainfall, and soil properties of 306 drainage basins located within those burned areas were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows soon after the 2003 to 2006 fires were delineated from data in the National Elevation Dataset using a geographic information system; (2) Data describing the basin morphology, burn severity, rainfall, and soil properties were compiled for each basin. These data were then input to a statistics software package for analysis using logistic regression; and (3) Relations between the occurrence or absence of debris flows and the basin morphology, burn severity, rainfall, and soil properties were evaluated, and five multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combinations produced the most effective models, and the multivariate models that best predicted the occurrence of debris flows were identified. Percentage of high burn severity and 3-hour peak rainfall intensity were significant variables in all models. Soil organic matter content and soil clay content were significant variables in all models except Model 5. Soil slope was a significant variable in all models except Model 4. The most suitable model can be selected from these five models on the basis of the availability of independent variables in the particular area of interest and field checking of probability maps. 
The multivariate logistic regression models can be entered into a geographic information system, and maps showing the probability of debris flows can be constructed in recently burned areas of southern California. This study demonstrates that logistic regression is a valuable tool for developing models that predict the probability of debris flows occurring in recently burned landscapes.
Hein, R; Abbas, S; Seibold, P; Salazar, R; Flesch-Janys, D; Chang-Claude, J
2012-01-01
Menopausal hormone therapy (MHT) is associated with an increased breast cancer risk in postmenopausal women, with combined estrogen-progestagen therapy posing a greater risk than estrogen monotherapy. However, few studies focused on potential effect modification of MHT-associated breast cancer risk by genetic polymorphisms in the progesterone metabolism. We assessed effect modification of MHT use by five coding single nucleotide polymorphisms (SNPs) in the progesterone metabolizing enzymes AKR1C3 (rs7741), AKR1C4 (rs3829125, rs17134592), and SRD5A1 (rs248793, rs3736316) using a two-center population-based case-control study from Germany with 2,502 postmenopausal breast cancer patients and 4,833 matched controls. An empirical-Bayes procedure that tests for interaction using a weighted combination of the prospective and the retrospective case-control estimators as well as standard prospective logistic regression were applied to assess multiplicative statistical interaction between polymorphisms and duration of MHT use with regard to breast cancer risk assuming a log-additive mode of inheritance. No genetic marginal effects were observed. Breast cancer risk associated with duration of combined therapy was significantly modified by SRD5A1_rs3736316, showing a reduced risk elevation in carriers of the minor allele (p (interaction,empirical-Bayes) = 0.006 using the empirical-Bayes method, p (interaction,logistic regression) = 0.013 using logistic regression). The risk associated with duration of use of monotherapy was increased by AKR1C3_rs7741 in minor allele carriers (p (interaction,empirical-Bayes) = 0.083, p (interaction,logistic regression) = 0.029) and decreased in minor allele carriers of two SNPs in AKR1C4 (rs3829125: p (interaction,empirical-Bayes) = 0.07, p (interaction,logistic regression) = 0.021; rs17134592: p (interaction,empirical-Bayes) = 0.101, p (interaction,logistic regression) = 0.038). 
After Bonferroni correction for multiple testing only SRD5A1_rs3736316 assessed using the empirical-Bayes method remained significant. Postmenopausal breast cancer risk associated with combined therapy may be modified by genetic variation in SRD5A1. Further well-powered studies are, however, required to replicate our finding.
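The Bonferroni step mentioned in this abstract can be illustrated with the interaction p-values it reports. This sketch assumes a family of 5 tests (one per SNP) at a family-wise alpha of 0.05; the abstract does not state the number of tests the authors corrected for, so `N_TESTS` is an assumption, chosen because it is consistent with the stated conclusion.

```python
# Interaction p-values as reported in the abstract, keyed by (SNP, method).
reported = {
    ("SRD5A1_rs3736316", "empirical-Bayes"): 0.006,
    ("SRD5A1_rs3736316", "logistic regression"): 0.013,
    ("AKR1C3_rs7741", "empirical-Bayes"): 0.083,
    ("AKR1C3_rs7741", "logistic regression"): 0.029,
    ("AKR1C4_rs3829125", "empirical-Bayes"): 0.07,
    ("AKR1C4_rs3829125", "logistic regression"): 0.021,
    ("AKR1C4_rs17134592", "empirical-Bayes"): 0.101,
    ("AKR1C4_rs17134592", "logistic regression"): 0.038,
}

ALPHA = 0.05
N_TESTS = 5  # ASSUMPTION: one test per SNP; not stated in the abstract

# Bonferroni: compare each p-value with alpha divided by the number of tests.
threshold = ALPHA / N_TESTS
significant = [key for key, p in reported.items() if p < threshold]
print(significant)
```

Under this assumption the per-test threshold is 0.01, and only the empirical-Bayes result for SRD5A1_rs3736316 falls below it.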
Avionics Reliability, Its Techniques and Related Disciplines.
1979-10-01
USAF F-16s. C.J.P.Haynes, UK You said that if one of the 5 nations consumes more than its fair share of the combined spares pool then the item manager ... MANAGEMENT OF THE AVIONIC SYSTEM OF A MILITARY STRIKE AIRCRAFT by A.P.White and J.D.Pavier 29 SESSION IV - SOFTWARE RELIABILITY’ INTRODUCTION TO...ASPECT by D.J.Harris 37 SESSION V - AVIONICS LOGISTICS SUPPORT ASPECTS INTEGRATED LOGISTICS SUPPORT ADDS ANOTHER DIMENSION TO MATRIX MANAGEMENT by
Applications of statistics to medical science, III. Correlation and regression.
Watanabe, Hiroshi
2012-01-01
In this third part of a series surveying medical statistics, the concepts of correlation and regression are reviewed. In particular, methods of linear regression and logistic regression are discussed. Arguments related to survival analysis will be made in a subsequent paper.
Maternal employment and childhood overweight in low- and middle-income countries.
Oddo, Vanessa M; Mueller, Noel T; Pollack, Keshia M; Surkan, Pamela J; Bleich, Sara N; Jones-Smith, Jessica C
2017-10-01
To investigate the association between maternal employment and childhood overweight in low- and middle-income countries (LMIC). Design/Setting: We utilized cross-sectional data from forty-five Demographic and Health Surveys from 2010 to 2016 (n 268 763). Mothers were categorized as formally employed, informally employed or non-employed. We used country-specific logistic regression models to investigate the association between maternal employment and childhood overweight (BMI Z-score>2) and assessed heterogeneity in the association by maternal education with the inclusion of an interaction term. We used meta-analysis to pool the associations across countries. Sensitivity analyses included modelling BMI Z-score and normal weight (weight-for-age Z-score≥-2 to <2) as outcomes. Participants included children 0-5 years old and their mothers (aged 18-49 years). In most countries, neither formal nor informal employment was associated with childhood overweight. However, children of employed mothers, compared with children of non-employed mothers, had higher BMI Z-score and higher odds of normal weight. In countries where the association varied by education, children of formally employed women with high education, compared with children of non-employed women with high education, had higher odds of overweight (pooled OR=1·2; 95 % CI 1·0, 1·4). We find no clear association between employment and child overweight. However, maternal employment is associated with a modestly higher BMI Z-score and normal weight, suggesting that employment is currently associated with beneficial effects on children's weight status in most LMIC.
Gallicchio, Lisa; Helzlsouer, Kathy J; Chow, Wong-Ho; Freedman, D Michal; Hankinson, Susan E; Hartge, Patricia; Hartmuller, Virginia; Harvey, Chinonye; Hayes, Richard B; Horst, Ronald L; Koenig, Karen L; Kolonel, Laurence N; Laden, Francine; McCullough, Marjorie L; Parisi, Dominick; Purdue, Mark P; Shu, Xiao-Ou; Snyder, Kirk; Stolzenberg-Solomon, Rachael Z; Tworoger, Shelley S; Varanasi, Arti; Virtamo, Jarmo; Wilkens, Lynne R; Xiang, Yong-Bing; Yu, Kai; Zeleniuch-Jacquotte, Anne; Zheng, Wei; Abnet, Christian C; Albanes, Demetrius; Bertrand, Kimberly; Weinstein, Stephanie J
2010-07-01
The Cohort Consortium Vitamin D Pooling Project of Rarer Cancers (VDPP), a consortium of 10 prospective cohort studies from the United States, Finland, and China, was formed to examine the associations between circulating 25-hydroxyvitamin D (25(OH)D) concentrations and the risk of rarer cancers. Cases (total n = 5,491) included incident primary endometrial (n = 830), kidney (n = 775), ovarian (n = 516), pancreatic (n = 952), and upper gastrointestinal tract (n = 1,065) cancers and non-Hodgkin lymphoma (n = 1,353) diagnosed in the participating cohorts. At least 1 control was matched to each case on age, date of blood collection (1974-2006), sex, and race/ethnicity (n = 6,714). Covariate data were obtained from each cohort in a standardized manner. The majority of the serum or plasma samples were assayed in a central laboratory using a direct, competitive chemiluminescence immunoassay on the DiaSorin LIAISON platform (DiaSorin, Inc., Stillwater, Minnesota). Masked quality control samples included serum standards from the US National Institute of Standards and Technology. Conditional logistic regression analyses were conducted using clinically defined cutpoints, with 50-<75 nmol/L as the reference category. Meta-analyses were also conducted using inverse-variance weights in random-effects models. This consortium approach permits estimation of the association between 25(OH)D and several rarer cancers with high accuracy and precision across a wide range of 25(OH)D concentrations.
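Random-effects pooling with inverse-variance weights, as mentioned in this abstract, is often done with the DerSimonian-Laird estimator of the between-study variance. The sketch below uses that estimator as an assumption; the abstract does not say which estimator the consortium used, and the function name and input values are hypothetical.

```python
def dersimonian_laird(effects, variances):
    """Pool study estimates (e.g. log odds ratios) under a random-effects model.

    Returns (pooled estimate, its standard error, between-study variance tau2).
    """
    w = [1.0 / v for v in variances]  # fixed-effect inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Method-of-moments estimate of tau2, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = (1.0 / sum(w_star)) ** 0.5
    return pooled, se, tau2


# Hypothetical cohort-specific log odds ratios and variances; NOT VDPP data.
pooled, se, tau2 = dersimonian_laird([0.10, 0.30, -0.05], [0.04, 0.09, 0.05])
```

When the study estimates agree closely, tau2 is truncated to zero and the result coincides with the fixed-effect inverse-variance estimate.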
Zheng, Wei; Danforth, Kim N; Tworoger, Shelley S; Goodman, Marc T; Arslan, Alan A; Patel, Alpa V; McCullough, Marjorie L; Weinstein, Stephanie J; Kolonel, Laurence N; Purdue, Mark P; Shu, Xiao-Ou; Snyder, Kirk; Steplowski, Emily; Visvanathan, Kala; Yu, Kai; Zeleniuch-Jacquotte, Anne; Gao, Yu-Tang; Hankinson, Susan E; Harvey, Chinonye; Hayes, Richard B; Henderson, Brian E; Horst, Ronald L; Helzlsouer, Kathy J
2010-07-01
A role for vitamin D in ovarian cancer etiology is supported by ecologic studies of sunlight exposure, experimental mechanism studies, and some studies of dietary vitamin D intake and genetic polymorphisms in the vitamin D receptor. However, few studies have examined the association of circulating 25-hydroxyvitamin D (25(OH)D), an integrated measure of vitamin D status, with ovarian cancer risk. A nested case-control study was conducted among 7 prospective studies to evaluate the circulating 25(OH)D concentration in relation to epithelial ovarian cancer risk. Logistic regression models were used to estimate odds ratios and 95% confidence intervals among 516 cases and 770 matched controls. Compared with 25(OH)D concentrations of 50-<75 nmol/L, no statistically significant associations were observed for <37.5 (odds ratio (OR) = 1.21, 95% confidence interval (CI): 0.87, 1.70), 37.5-<50 (OR = 1.03, 95% CI: 0.75, 1.41), or ≥75 (OR = 1.11, 95% CI: 0.79, 1.55) nmol/L. Analyses stratified by tumor subtype, age, body mass index, and other variables were generally null but suggested an inverse association between 25(OH)D and ovarian cancer risk among women with a body mass index of ≥25 kg/m² (P(interaction) < 0.01). In conclusion, this large pooled analysis did not support an overall association between circulating 25(OH)D and ovarian cancer risk, except possibly among overweight women.
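Odds ratios and confidence intervals from logistic regression are usually obtained by exponentiating the coefficient and its Wald limits. The sketch below shows that relationship and back-calculates an approximate standard error from the first interval reported in this abstract (OR = 1.21, 95% CI: 0.87, 1.70). It assumes a Wald interval that is symmetric on the log scale, which the abstract does not state; the function names are introduced here for illustration.

```python
from math import exp, log

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def or_confidence_interval(beta, se, z=Z_95):
    """Return (OR, lower, upper) from a log-odds coefficient and its SE."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


def se_from_ci(lower, upper, z=Z_95):
    """Recover the SE of the log odds ratio from reported CI limits."""
    return (log(upper) - log(lower)) / (2 * z)


# Reported values from the abstract; the SE itself is inferred, not reported.
se = se_from_ci(0.87, 1.70)
odds_ratio, lower, upper = or_confidence_interval(log(1.21), se)
```

The reconstructed limits agree with the published ones up to rounding, which is what one would expect if the published interval was computed this way.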
Vrieling, Alina; Lubin, Jay H.; Kraft, Peter; Mendelsohn, Julie B.; Hartge, Patricia; Canzian, Federico; Steplowski, Emily; Arslan, Alan A.; Gross, Myron; Helzlsouer, Kathy; Jacobs, Eric J.; LaCroix, Andrea; Petersen, Gloria; Zheng, Wei; Albanes, Demetrius; Amundadottir, Laufey; Bingham, Sheila A.; Boffetta, Paolo; Boutron-Ruault, Marie-Christine; Chanock, Stephen J.; Clipp, Sandra; Hoover, Robert N.; Jacobs, Kevin; Johnson, Karen C.; Kooperberg, Charles; Luo, Juhua; Messina, Catherine; Palli, Domenico; Patel, Alpa V.; Riboli, Elio; Shu, Xiao-Ou; Rodriguez Suarez, Laudina; Thomas, Gilles; Tjønneland, Anne; Tobias, Geoffrey S.; Tong, Elissa; Trichopoulos, Dimitrios; Virtamo, Jarmo; Ye, Weimin; Yu, Kai; Zeleniuch-Jacquotte, Anne; Bueno-de-Mesquita, H. Bas; Stolzenberg-Solomon, Rachael Z.
2009-01-01
Smoking is an established risk factor for pancreatic cancer; however, detailed examination of the association of smoking intensity, smoking duration, and cumulative smoking dose with pancreatic cancer is limited. The authors analyzed pooled data from the international Pancreatic Cancer Cohort Consortium nested case-control study (1,481 cases, 1,539 controls). Odds ratios and 95% confidence intervals were calculated by using unconditional logistic regression. Smoking intensity effects were examined with an excess odds ratio model that was linear in pack-years and exponential in cigarettes smoked per day and its square. When compared with never smokers, current smokers had a significantly elevated risk (odds ratio (OR) = 1.77, 95% confidence interval (CI): 1.38, 2.26). Risk increased significantly with greater intensity (≥30 cigarettes/day: OR = 1.75, 95% CI: 1.27, 2.42), duration (≥50 years: OR = 2.13, 95% CI: 1.25, 3.62), and cumulative smoking dose (≥40 pack-years: OR = 1.78, 95% CI: 1.35, 2.34). Risk more than 15 years after smoking cessation was similar to that for never smokers. Estimates of excess odds ratio per pack-year declined with increasing intensity, suggesting greater risk for total exposure delivered at lower intensity for longer duration than for higher intensity for shorter duration. This finding and the decline in risk after smoking cessation suggest that smoking has a late-stage effect on pancreatic carcinogenesis. PMID:19561064
Suh, Sunghwan; Song, Sun Ok; Kim, Jae Hyeon; Cho, Hyungjin
2017-01-01
The present observational study aimed to evaluate the clinical effectiveness of vildagliptin with metformin in Korean patients with type 2 diabetes mellitus (T2DM). Data were pooled from the vildagliptin postmarketing survey (PMS), the vildagliptin/metformin fixed drug combination (DC) PMS, and a retrospective observational study of vildagliptin/metformin (fixed DC or free DC). The effectiveness endpoint was the proportion of patients who achieved a glycemic target (HbA1c) of ≤7.0% at 24 weeks. In total, 4303 patients were included in the analysis; of these, 2087 patients were eligible. The mean patient age was 56.99 ± 11.25 years. Overall, 58.94% of patients achieved an HbA1c target of ≤7.0% at 24 weeks. The glycemic target achievement rate was significantly greater in patients with baseline HbA1c < 7.5% versus ≥7.5% (84.64% versus 43.97%), receiving care at the hospital versus clinic (67.95% versus 52.33%), and receiving vildagliptin/metformin fixed DC versus free DC (70.69% versus 55.42%). Multivariate logistic regression analysis indicated that disease duration (P < 0.0001), baseline HbA1c (P < 0.0001), and DC type (P = 0.0103) had significant effects on drug effectiveness. Vildagliptin plus metformin appeared to be an effective treatment option for patients with T2DM in clinical practice settings in Korea. PMID:29057274
Amo-Adjei, Joshua; Anamaale Tuoyire, Derek
2016-12-01
We analysed the extent of planned, mistimed and unwanted pregnancies and how they predict optimal use of prenatal (timing and number of antenatal) care services in 30 African countries. We pooled data from Demographic and Health Surveys conducted in 30 African countries between 2006 and 2015. We described the extent of mistimed and unwanted pregnancies and further used mixed effects logistic and Poisson regression estimation techniques to examine the impacts of planned, mistimed and unwanted pregnancies on the use of prenatal health services. In total, 73.65% of pregnancies in all countries were planned. Mistimed pregnancy ranged from 7.43% in Burkina Faso to 41.33% in Namibia. Unwanted pregnancies were most common in Swaziland (39.54%) and least common in Niger (0.74%). Timely (first trimester) initiation of ANC was 37% overall in all countries; the multicountry average number of ANC visits was optimal [4.1; 95% CI: 4.1-4.2] but with notable disparities between countries. In the pooled analysis, mistimed and unwanted pregnancies were strongly associated with late ANC attendance and with fewer ANC visits. Unintended pregnancies are critical risks to achieving improved maternal health in respect of early and optimal ANC coverage for women in Africa. Programmes targeted at advancing coverage of ANC in Africa need to deploy contextually appropriate mechanisms to prevent unintended pregnancies. © 2016 John Wiley & Sons Ltd.
The Link between Mastery and Depression among Black Adolescents; Ethnic and Gender Differences
Assari, Shervin; Caldwell, Cleopatra Howard
2017-01-01
Purpose: Although the link between depression and lower levels of mastery is well established, limited information exists on ethnic and gender differences in the association between the two. The current study investigated ethnic, gender, and ethnic by gender differences in the link between major depressive disorder (MDD) and low mastery in the United States. Methods: We used data from the National Survey of American Life-Adolescent supplement (NSAL-A), 2003–2004. In total, 1170 Black adolescents entered the study. This number was composed of 810 African-American and 360 Caribbean Black youth (age 13 to 17). Demographic factors, socioeconomic status (family income), mastery (sense of control over life), and MDD (Composite International Diagnostic Interview, CIDI) were measured. Logistic regressions were used to test the association between mastery and MDD in the pooled sample, as well as based on ethnicity and gender. Results: In the pooled sample, a higher sense of mastery was associated with a lower risk of MDD. This association, however, was significant for African Americans but not Caribbean Blacks. Similarly, among African American males and females, higher mastery was associated with lower risk of MDD. Such association could not be found for Caribbean Black males or females. Conclusion: Findings indicate ethnic rather than gender differences in the association between depression and mastery among Black youth. Further research is needed to understand how cultural values and life experiences may alter the link between depression and mastery among ethnically diverse Black youth. PMID:28498355
Gallicchio, Lisa; Helzlsouer, Kathy J.; Chow, Wong-Ho; Freedman, D. Michal; Hankinson, Susan E.; Hartge, Patricia; Hartmuller, Virginia; Harvey, Chinonye; Hayes, Richard B.; Horst, Ronald L.; Koenig, Karen L.; Kolonel, Laurence N.; Laden, Francine; McCullough, Marjorie L.; Parisi, Dominick; Purdue, Mark P.; Shu, Xiao-Ou; Snyder, Kirk; Stolzenberg-Solomon, Rachael Z.; Tworoger, Shelley S.; Varanasi, Arti; Virtamo, Jarmo; Wilkens, Lynne R.; Xiang, Yong-Bing; Yu, Kai; Zeleniuch-Jacquotte, Anne; Zheng, Wei; Abnet, Christian C.; Albanes, Demetrius; Bertrand, Kimberly; Weinstein, Stephanie J.
2010-01-01
The Cohort Consortium Vitamin D Pooling Project of Rarer Cancers (VDPP), a consortium of 10 prospective cohort studies from the United States, Finland, and China, was formed to examine the associations between circulating 25-hydroxyvitamin D (25(OH)D) concentrations and the risk of rarer cancers. Cases (total n = 5,491) included incident primary endometrial (n = 830), kidney (n = 775), ovarian (n = 516), pancreatic (n = 952), and upper gastrointestinal tract (n = 1,065) cancers and non-Hodgkin lymphoma (n = 1,353) diagnosed in the participating cohorts. At least 1 control was matched to each case on age, date of blood collection (1974–2006), sex, and race/ethnicity (n = 6,714). Covariate data were obtained from each cohort in a standardized manner. The majority of the serum or plasma samples were assayed in a central laboratory using a direct, competitive chemiluminescence immunoassay on the DiaSorin LIAISON platform (DiaSorin, Inc., Stillwater, Minnesota). Masked quality control samples included serum standards from the US National Institute of Standards and Technology. Conditional logistic regression analyses were conducted using clinically defined cutpoints, with 50–<75 nmol/L as the reference category. Meta-analyses were also conducted using inverse-variance weights in random-effects models. This consortium approach permits estimation of the association between 25(OH)D and several rarer cancers with high accuracy and precision across a wide range of 25(OH)D concentrations. PMID:20562188
Su, Wei-Ju; Chan, Ta-Chien; Chuang, Pei-Hung; Liu, Yu-Lun; Lee, Ping-Ing; Liu, Ming-Tsan; Chuang, Jen-Hsiang
2015-01-01
We aimed to estimate the pooled vaccine effectiveness (VE) in children over five winters through data linkage of two existing surveillance systems. Five test-negative case-control studies were conducted from November to February during the 2004/2005 to 2008/2009 seasons. Sentinel physicians from the Viral Surveillance Network enrolled children aged 6-59 months with influenza-like illness to collect throat swabs. Through linking with a nationwide vaccination registry, we measured the VE with a logistic regression model adjusting for age, gender, and week of symptom onset. Both fixed-effects and random-effects models were used in the meta-analysis. Four thousand four hundred and ninety-four subjects were included. The proportion of influenza test-positive subjects across the five seasons was 11.5% (132/1151), 7.2% (41/572), 23.9% (189/791), 6.6% (75/1135), and 11.2% (95/845), respectively. The pooled VE was 62% (95% confidence interval (CI) 48-83%) in both meta-analysis models. By age category, VE was 51% (95% CI 23-68%) for those aged 6-23 months and 75% (95% CI 60-84%) for those aged 24-59 months. Influenza vaccination provided measurable protection against laboratory-confirmed influenza among children aged 6-59 months despite variations in the vaccine match during the 2004/2005 to 2008/2009 influenza seasons in Taiwan. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.
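To make the pooling step in the abstract above concrete, the following is a minimal sketch of fixed-effects (inverse-variance) pooling of per-season log odds ratios, followed by conversion to vaccine effectiveness as VE = (1 − OR) × 100%. The per-season odds ratios and standard errors are hypothetical placeholders, not values from the study, and the function names are illustrative only.

```python
import math


def pool_fixed_effect(log_ors, std_errs):
    """Inverse-variance (fixed-effect) pooling of log odds ratios."""
    weights = [1.0 / se ** 2 for se in std_errs]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


def ve_percent(log_or):
    """Vaccine effectiveness in percent, VE = (1 - OR) * 100."""
    return (1.0 - math.exp(log_or)) * 100.0


# Hypothetical per-season adjusted odds ratios and standard errors of log(OR).
season_ors = [0.40, 0.35, 0.45]
season_ses = [0.30, 0.40, 0.25]

pooled, pooled_se = pool_fixed_effect([math.log(o) for o in season_ors], season_ses)
ve = ve_percent(pooled)
# A higher OR means a lower VE, so the interval bounds swap on conversion.
ve_low = ve_percent(pooled + 1.96 * pooled_se)
ve_high = ve_percent(pooled - 1.96 * pooled_se)
print(f"pooled VE = {ve:.0f}% (95% CI {ve_low:.0f}-{ve_high:.0f}%)")
```

A random-effects model would add a between-season variance term to each weight; that step is omitted here to keep the sketch short.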
Schell, Greggory J; Lavieri, Mariel S; Stein, Joshua D; Musch, David C
2013-12-21
Open-angle glaucoma (OAG) is a prevalent, degenerative ocular disease which can lead to blindness without proper clinical management. The tests used to assess disease progression are susceptible to process and measurement noise. The aim of this study was to develop a methodology which accounts for the inherent noise in the data and improves identification of significant disease progression. Longitudinal observations from the Collaborative Initial Glaucoma Treatment Study (CIGTS) were used to parameterize and validate a Kalman filter model and logistic regression function. The Kalman filter estimates the true value of biomarkers associated with OAG and forecasts future values of these variables. We develop two logistic regression models via generalized estimating equations (GEE) for calculating the probability of experiencing significant OAG progression: one model based on the raw measurements from CIGTS and another model based on the Kalman filter estimates of the CIGTS data. Receiver operating characteristic (ROC) curves and associated area under the ROC curve (AUC) estimates are calculated using cross-fold validation. The logistic regression model developed using Kalman filter estimates as data input achieves higher sensitivity and specificity than the model developed using raw measurements. The mean AUC for the Kalman filter-based model is 0.961 while the mean AUC for the raw measurements model is 0.889. Hence, using the probability function generated via Kalman filter estimates and GEE for logistic regression, we are able to more accurately classify patients and instances as experiencing significant OAG progression. A Kalman filter approach for estimating the true value of OAG biomarkers resulted in data input which improved the accuracy of a logistic regression classification model compared to a model using raw measurements as input.
This methodology accounts for process and measurement noise to enable improved discrimination between progression and nonprogression in chronic diseases.
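The noise-filtering idea in the abstract above can be illustrated with a scalar Kalman filter. This is only a sketch under simplifying assumptions: a single biomarker following a random walk, with hand-picked noise variances and hypothetical readings. The study's actual model is parameterized from CIGTS data and is not reproduced here.

```python
def kalman_1d(measurements, process_var, meas_var, x0, p0):
    """Scalar Kalman filter for a random-walk state observed with noise."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: a random walk keeps the state, but uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + meas_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


# Hypothetical noisy readings of one biomarker over successive visits.
readings = [10.2, 9.7, 10.4, 9.9, 10.1]
filtered = kalman_1d(readings, process_var=0.01, meas_var=0.5,
                     x0=readings[0], p0=1.0)
print(filtered)
```

Under the random-walk assumption, the one-step forecast is simply the latest filtered estimate; the filtered values (rather than the raw readings) would then serve as inputs to a downstream classifier.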
Computing group cardinality constraint solutions for logistic regression problems.
Zhang, Yong; Kwon, Dongjin; Pohl, Kilian M
2017-01-01
We derive an algorithm to directly solve logistic regression under a group cardinality (group sparsity) constraint and use it to distinguish intra-subject MRI sequences (e.g. cine MRIs) of healthy subjects from those of diseased subjects. Group cardinality constraint models are often applied to medical images in order to avoid overfitting of the classifier to the training data. Solutions within these models are generally determined by relaxing the cardinality constraint to a weighted feature selection scheme. However, these solutions relate to the original sparse problem only under specific assumptions, which generally do not hold for medical image applications. In addition, inferring clinical meaning from features weighted by a classifier is an ongoing topic of discussion. To avoid weighting features, we propose to directly solve the group cardinality constraint logistic regression problem by generalizing the Penalty Decomposition method. To do so, we assume that an intra-subject series of images represents repeated samples of the same disease patterns. We model this assumption by combining series of measurements created by a feature across time into a single group. Our algorithm then derives a solution within that model by decoupling the minimization of the logistic regression function from enforcing the group sparsity constraint. The minimum of the smooth and convex logistic regression problem is determined via gradient descent, while we derive a closed-form solution for finding a sparse approximation of that minimum. We apply our method to cine MRI of 38 healthy controls and 44 adult patients that received reconstructive surgery of Tetralogy of Fallot (TOF) during infancy. Our method correctly identifies regions impacted by TOF and generally obtains statistically significantly higher classification accuracy than alternative solutions to this model, i.e., ones relaxing group cardinality constraints. Copyright © 2016 Elsevier B.V. All rights reserved.
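One way to picture the sparse-approximation step mentioned above is as a projection onto the set of weight vectors with at most k non-zero groups. The sketch below assumes this step amounts to keeping the k groups with the largest Euclidean norm and zeroing the rest; the abstract does not spell out the authors' closed-form solution, so this is an illustrative stand-in. The function name, weights, and grouping are hypothetical.

```python
import math


def project_group_cardinality(weights, groups, k):
    """Keep the k groups with the largest L2 norm; zero every other group."""
    norms = [math.sqrt(sum(weights[i] ** 2 for i in g)) for g in groups]
    keep = sorted(range(len(groups)), key=lambda j: norms[j], reverse=True)[:k]
    projected = [0.0] * len(weights)
    for j in keep:
        for i in groups[j]:
            projected[i] = weights[i]
    return projected


# Hypothetical weights: three features, each measured at two time points,
# so each group collects one feature's values across time.
w = [0.1, -0.2, 3.0, 1.0, 0.0, 0.5]
feature_groups = [[0, 1], [2, 3], [4, 5]]
print(project_group_cardinality(w, feature_groups, k=1))
```

In a penalty-decomposition style loop, a projection like this would alternate with gradient steps on the smooth logistic loss.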
Ren, Yilong; Wang, Yunpeng; Wu, Xinkai; Yu, Guizhen; Ding, Chuan
2016-10-01
Red light running (RLR) has become a major safety concern at signalized intersections. To prevent RLR-related crashes, it is critical to identify the factors that significantly affect drivers' RLR behavior and to predict potential RLR in real time. In this research, nine months of RLR events extracted from high-resolution traffic data collected by loop detectors at three signalized intersections were used to identify the factors that significantly affect RLR behavior. The data analysis indicated that occupancy time, time gap, used yellow time, time left to yellow start, whether the preceding vehicle runs through the intersection during yellow, and whether there is a vehicle passing through the intersection on the adjacent lane were significant factors for RLR behavior. Furthermore, because RLR events are rare, a modified rare events logistic regression model was developed for RLR prediction. The rare events logistic regression method has been applied in many fields for rare events studies and shows impressive performance, but no previous research has applied this method to study RLR. The results showed that the rare events logistic regression model performed significantly better than the standard logistic regression model. More importantly, the proposed RLR prediction method is based purely on data collected from a single advance loop detector located 400 feet from the stop bar. This brings great potential for future field applications of the proposed method, since loops have been widely installed at many intersections and can collect data in real time. This research is expected to contribute significantly to the improvement of intersection safety. Copyright © 2016 Elsevier Ltd. All rights reserved.
Engoren, Milo; Habib, Robert H; Dooner, John J; Schwann, Thomas A
2013-08-01
As many as 14% of patients undergoing coronary artery bypass surgery are readmitted within 30 days. Readmission is usually the result of morbidity and may lead to death. The purpose of this study is to develop and compare statistical and genetic programming models to predict readmission. Patients were divided into separate Construction and Validation populations. Using 88 variables, logistic regression, genetic programs, and artificial neural nets were used to develop predictive models. Models were first constructed and tested on the Construction population, then validated on the Validation population. Areas under the receiver operator characteristic curves (AU ROC) were used to compare the models. Two hundred and two patients (7.6%) in the 2,644-patient Construction group and 216 (8.0%) of the 2,711-patient Validation group were readmitted within 30 days of CABG surgery. Logistic regression predicted readmission with AU ROC = .675 ± .021 in the Construction group. Genetic programs significantly improved the accuracy (AU ROC = .767 ± .001, p < .001). Artificial neural nets were less accurate, with AU ROC = 0.597 ± .001 in the Construction group. Predictive accuracy of all three techniques fell in the Validation group. However, the accuracy of genetic programming (AU ROC = .654 ± .001) was still slightly, though not statistically significantly, better than that of logistic regression (AU ROC = .644 ± .020, p = .61). Genetic programming and logistic regression provide alternative methods to predict readmission that are similarly accurate.
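The AU ROC values compared above can be read as the probability that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen non-readmitted patient. The sketch below computes that quantity directly by pairwise comparison; the scores and labels are hypothetical and the function name is illustrative.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks; label 1 = readmitted, 0 = not readmitted.
risk = [0.8, 0.4, 0.6, 0.2]
readmitted = [1, 1, 0, 0]
print(auc(risk, readmitted))
```

The pairwise loop is quadratic in sample size; for cohorts of several thousand patients a rank-based implementation from a statistics library would be the practical choice.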
Eken, Cenker; Bilge, Ugur; Kartal, Mutlu; Eray, Oktay
2009-06-03
Logistic regression is the most common statistical model for processing multivariate data in the medical literature. Artificial intelligence models like an artificial neural network (ANN) and genetic algorithm (GA) may also be useful to interpret medical data. The purpose of this study was to apply artificial intelligence models to a medical data sheet and compare them with logistic regression. ANN, GA, and logistic regression analysis were carried out on a data sheet of a previously published article regarding patients presenting to an emergency department with flank pain suspicious for renal colic. The study population was composed of 227 patients: 176 patients had a diagnosis of urinary stone, while 51 ultimately had no calculus. The GA found two decision rules in predicting urinary stones. Rule 1 consisted of being male, pain not spreading to back, and no fever. In rule 2, pelvicaliceal dilatation on bedside ultrasonography replaced no fever. ANN, GA rule 1, GA rule 2, and logistic regression had a sensitivity of 94.9, 67.6, 56.8, and 95.5%, a specificity of 78.4, 76.47, 86.3, and 47.1%, a positive likelihood ratio of 4.4, 2.9, 4.1, and 1.8, and a negative likelihood ratio of 0.06, 0.42, 0.5, and 0.09, respectively. The area under the curve was found to be 0.867, 0.720, 0.715, and 0.713 for all applications, respectively. Data mining techniques such as ANN and GA can be used for predicting renal colic in emergency settings and to constitute clinical decision rules. They may be an alternative to conventional multivariate analysis applications used in biostatistics.
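The likelihood ratios quoted above follow from the reported sensitivities and specificities through the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The short sketch below recomputes them from the rounded percentages in the abstract; small discrepancies in the last digit are expected because the inputs are themselves rounded.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Sensitivity and specificity as reported in the abstract (as fractions).
reported = {
    "ANN": (0.949, 0.784),
    "GA rule 1": (0.676, 0.7647),
    "GA rule 2": (0.568, 0.863),
    "Logistic regression": (0.955, 0.471),
}

for model, (sens, spec) in reported.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{model}: LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```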
NASA Astrophysics Data System (ADS)
Duman, T. Y.; Can, T.; Gokceoglu, C.; Nefeslioglu, H. A.; Sonmez, H.
2006-11-01
As a result of industrialization, throughout the world, cities have been growing rapidly for the last century. One typical example of these growing cities is Istanbul, the population of which is over 10 million. Due to rapid urbanization, new areas suitable for settlement and engineering structures are necessary. The Cekmece area located west of the Istanbul metropolitan area is studied, because the landslide activity is extensive in this area. The purpose of this study is to develop a model that can be used to characterize landslide susceptibility in map form using logistic regression analysis of an extensive landslide database. A database of landslide activity was constructed using both aerial-photography and field studies. About 19.2% of the selected study area is covered by deep-seated landslides. The landslides that occur in the area are primarily located in sandstones with interbedded permeable and impermeable layers such as claystone, siltstone and mudstone. About 31.95% of the total landslide area is located in this unit. To apply logistic regression analyses, a data matrix including 37 variables was constructed. The variables used in the forward stepwise analyses are different measures of slope, aspect, elevation, stream power index (SPI), plan curvature, profile curvature, geology, geomorphology and relative permeability of lithological units. A total of 25 variables were identified as exerting strong influence on landslide occurrence and were included in the logistic regression equation. Wald statistics indicate that lithology, SPI and slope are more important than the other parameters in the equation. Beta coefficients of the 25 variables included in the logistic regression equation provide a model for landslide susceptibility in the Cekmece area. This model is used to generate a landslide susceptibility map that correctly classified 83.8% of the landslide-prone areas.
Poulton, B.C.; Allert, A.L.
2012-01-01
A habitat-based aquatic macroinvertebrate study was initiated in the Lower Missouri River to evaluate relative quality and biological condition of dike pool habitats. Water-quality and sediment-quality parameters and macroinvertebrate assemblage structure were measured from depositional substrates at 18 sites. Sediment porewater was analysed for ammonia, sulphide, pH and oxidation-reduction potential. Whole sediments were analysed for particle-size distribution, organic carbon and contaminants. Field water-quality parameters were measured at subsurface and at the sediment-water interface. Pool area adjacent and downstream from each dike was estimated from aerial photography. Macroinvertebrate biotic condition scores were determined by integrating the following indicator response metrics: % of Ephemeroptera (mayflies), % of Oligochaeta worms, Shannon Diversity Index and total taxa richness. Regression models were developed for predicting macroinvertebrate scores based on individual water-quality and sediment-quality variables and a water/sediment-quality score that integrated all variables. Macroinvertebrate scores generated significant determination coefficients with dike pool area (R2=0.56), oxidation–reduction potential (R2=0.81) and water/sediment-quality score (R2=0.71). Dissolved oxygen saturation, oxidation-reduction potential and total ammonia in sediment porewater were most important in explaining variation in macroinvertebrate scores. The best two-variable regression models included dike pool size + the water/sediment-quality score (R2=0.84) and dike pool size + oxidation-reduction potential (R2=0.93). Results indicate that dike pool size and chemistry of sediments and overlying water can be used to evaluate dike pool quality and identify environmental conditions necessary for optimizing diversity and productivity of important aquatic macroinvertebrates. 
A combination of these variables could be utilized for measuring the success of habitat enhancement activities currently being implemented in this system.
New robust statistical procedures for the polytomous logistic regression models.
Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro
2018-05-17
This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.
Staley, Dennis M.; Negri, Jacquelyn A.; Kean, Jason W.; Laber, Jayme L.; Tillery, Anne C.; Youberg, Ann M.
2016-06-30
Wildfire can significantly alter the hydrologic response of a watershed to the extent that even modest rainstorms can generate dangerous flash floods and debris flows. To reduce public exposure to hazard, the U.S. Geological Survey produces post-fire debris-flow hazard assessments for select fires in the western United States. We use publicly available geospatial data describing basin morphology, burn severity, soil properties, and rainfall characteristics to estimate the statistical likelihood that debris flows will occur in response to a storm of a given rainfall intensity. Using an empirical database and refined geospatial analysis methods, we defined new equations for the prediction of debris-flow likelihood using logistic regression methods. We showed that the new logistic regression model outperformed previous models used to predict debris-flow likelihood.
NASA Astrophysics Data System (ADS)
Kneringer, Philipp; Dietz, Sebastian; Mayr, Georg J.; Zeileis, Achim
2017-04-01
Low-visibility conditions have a large impact on aviation safety and economic efficiency of airports and airlines. To support decision makers, we develop a statistical probabilistic nowcasting tool for the occurrence of capacity-reducing operations related to low visibility. The probabilities of four different low visibility classes are predicted with an ordered logistic regression model based on time series of meteorological point measurements. Potential predictor variables for the statistical models are visibility, humidity, temperature and wind measurements at several measurement sites. A stepwise variable selection method indicates that visibility and humidity measurements are the most important model inputs. The forecasts are tested with a 30 minute forecast interval up to two hours, which is a sufficient time span for tactical planning at Vienna Airport. The ordered logistic regression models outperform persistence and are competitive with human forecasters.
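For readers unfamiliar with ordered logistic regression, the sketch below shows how such a model turns one linear predictor into probabilities for four ordered classes using three increasing thresholds. It assumes the common parameterization P(Y ≤ j) = logistic(θ_j − η); the threshold values and the linear predictor are hypothetical and are not taken from the study.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def ordered_logit_probs(eta, thresholds):
    """Class probabilities for an ordered logit with increasing thresholds.

    Assumes P(Y <= j) = logistic(threshold_j - eta); the last class takes
    the remaining probability mass.
    """
    cumulative = [logistic(t - eta) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical linear predictor (e.g. from visibility and humidity terms)
# and three thresholds separating four low-visibility classes.
probs = ordered_logit_probs(eta=0.0, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

A larger linear predictor shifts probability mass toward the higher classes, which is what makes the model suitable for ordered categories.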
Wang, Shuang; Jiang, Xiaoqian; Wu, Yuan; Cui, Lijuan; Cheng, Samuel; Ohno-Machado, Lucila
2013-06-01
We developed an EXpectation Propagation LOgistic REgRession (EXPLORER) model for distributed privacy-preserving online learning. The proposed framework provides a high level guarantee for protecting sensitive information, since the information exchanged between the server and the client is the encrypted posterior distribution of coefficients. Through experimental results, EXPLORER shows the same performance (e.g., discrimination, calibration, feature selection, etc.) as the traditional frequentist logistic regression model, but provides more flexibility in model updating. That is, EXPLORER can be updated one point at a time rather than having to retrain the entire data set when new observations are recorded. The proposed EXPLORER supports asynchronized communication, which relieves the participants from coordinating with one another, and prevents service breakdown from the absence of participants or interrupted communications. Copyright © 2013 Elsevier Inc. All rights reserved.
A computational approach to compare regression modelling strategies in prediction research.
Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H
2016-08-25
It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9 to 94.9 %, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.
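The Brier score used above to assess model performance is the mean squared difference between predicted probabilities and observed binary outcomes, with lower values indicating better predictions. A minimal sketch with hypothetical predictions and outcomes:

```python
def brier_score(predicted_probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(predicted_probs) != len(outcomes):
        raise ValueError("inputs must have the same length")
    squared = [(p - y) ** 2 for p, y in zip(predicted_probs, outcomes)]
    return sum(squared) / len(squared)


# Hypothetical predictions from two competing modelling strategies.
observed = [1, 0, 0, 1]
strategy_a = [0.9, 0.2, 0.1, 0.7]
strategy_b = [0.6, 0.5, 0.4, 0.5]
print(brier_score(strategy_a, observed), brier_score(strategy_b, observed))
```

A model that always predicts 0.5 scores 0.25, which is a useful reference point when comparing strategies.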
Cakir, Ebru; Kucuk, Ulku; Pala, Emel Ebru; Sezer, Ozlem; Ekin, Rahmi Gokhan; Cakmak, Ozgur
2017-05-01
Conventional cytomorphologic assessment is the first step to establish an accurate diagnosis in urinary cytology. In cytologic preparations, the separation of low-grade urothelial carcinoma (LGUC) from reactive urothelial proliferation (RUP) can be exceedingly difficult. The bladder washing cytologies of 32 LGUC and 29 RUP were reviewed. The cytologic slides were examined for the presence or absence of the 28 cytologic features. The cytologic criteria showing statistical significance in LGUC were increased numbers of monotonous single (non-umbrella) cells, three-dimensional cellular papillary clusters without fibrovascular cores, irregular bordered clusters, atypical single cells, irregular nuclear overlap, cytoplasmic homogeneity, increased N/C ratio, pleomorphism, nuclear border irregularity, nuclear eccentricity, elongated nuclei, and hyperchromasia (p < 0.05), and the cytologic criteria showing statistical significance in RUP were inflammatory background, mixture of small and large urothelial cells, loose monolayer aggregates, and vacuolated cytoplasm (p < 0.05). When these variables were subjected to a stepwise logistic regression analysis, four features were selected to distinguish LGUC from RUP: increased numbers of monotonous single (non-umbrella) cells, increased nuclear cytoplasmic ratio, hyperchromasia, and presence of small and large urothelial cells (p = 0.0001). Of the 32 cases with proven LGUC, the logistic model correctly predicted 31 (96.9%), and of the 29 patients with RUP, it correctly predicted 26 (89.7%). There are several cytologic features to separate LGUC from RUP. Stepwise logistic regression analysis is a valuable tool for determining the most useful cytologic criteria to distinguish these entities. © 2017 APMIS. Published by John Wiley & Sons Ltd.
Obesity and the Risk of Papillary Thyroid Cancer: A Pooled Analysis of Three Case–Control Studies
Xu, Li; Port, Matthias; Landi, Stefano; Gemignani, Federica; Cipollini, Monica; Elisei, Rossella; Goudeva, Lilia; Müller, Jörg Andreas; Nerlich, Kai; Pellegrini, Giovanni; Reiners, Christoph; Romei, Cristina; Schwab, Robert; Abend, Michael
2014-01-01
Background: There is a correlation between temporal trends of obesity prevalence and papillary thyroid cancer (PTC) incidence in the United States. Obesity is a well-recognized risk factor for many cancers, but there are few studies on the association between obesity and PTC risk. We investigated the association between anthropometric measurements and PTC risk using pooled individual data from three case–control populations. Methods: Height and weight information were obtained from three independent case–control studies, including 1917 patients with PTC (1360 women and 557 men) and 2127 cancer-free controls from the United States, Italy, and Germany. Body mass index (BMI), body fat percentage, and body surface area (BSA) were calculated. An unconditional logistic regression model was used to calculate odds ratios (ORs) and confidence intervals (CIs) with respect to risk of PTC, adjusted by age, sex, race/ethnicity, and study site. Results: In the pooled population, for both men and women, an increased risk of PTC was associated with greater weight, BMI, body fat percentage, and BSA, whereas a reduced risk of PTC was associated with greater height. Compared with normal-weight subjects (BMI 18.5–24.9 kg/m2), the ORs for overweight (BMI 25–29.9 kg/m2) and obese (BMI≥30 kg/m2) subjects were 1.72 [CI 1.48–2.00] and 4.17 [CI 3.41–5.10] respectively. Compared with the lowest quartile of body fat percentage, the ORs for the highest quartile were 3.83 [CI 2.85–5.15] in women and 4.05 [CI 2.67–6.15] in men. Conclusion: Anthropometric factors, especially BMI and body fat percentage, were significantly associated with increased risk of PTC. Future studies of anthropometric factors and PTC that incorporate intermediate factors, including adiposity and hormone biomarkers, are essential to help clarify potential mechanisms of the relationship. PMID:24555500
Brinton, Louise A; Cook, Michael B; McCormack, Valerie; Johnson, Kenneth C; Olsson, Håkan; Casagrande, John T; Cooke, Rosie; Falk, Roni T; Gapstur, Susan M; Gaudet, Mia M; Gaziano, J Michael; Gkiokas, Georgios; Guénel, Pascal; Henderson, Brian E; Hollenbeck, Albert; Hsing, Ann W; Kolonel, Laurence N; Isaacs, Claudine; Lubin, Jay H; Michels, Karin B; Negri, Eva; Parisi, Dominick; Petridou, Eleni Th; Pike, Malcolm C; Riboli, Elio; Sesso, Howard D; Snyder, Kirk; Swerdlow, Anthony J; Trichopoulos, Dimitrios; Ursin, Giske; van den Brandt, Piet A; Van Den Eeden, Stephen K; Weiderpass, Elisabete; Willett, Walter C; Ewertz, Marianne; Thomas, David B
2014-03-01
The etiology of male breast cancer is poorly understood, partly because of its relative rarity. Although genetic factors are involved, less is known regarding the role of anthropometric and hormonally related risk factors. In the Male Breast Cancer Pooling Project, a consortium of 11 case-control and 10 cohort investigations involving 2405 case patients (n = 1190 from case-control and n = 1215 from cohort studies) and 52013 control subjects, individual participant data were harmonized and pooled. Unconditional logistic regression generated study design-specific (case-control/cohort) odds ratios (ORs) and 95% confidence intervals (CIs), with exposure estimates combined using fixed effects meta-analysis. All statistical tests were two-sided. Risk was statistically significantly associated with weight (highest/lowest tertile: OR = 1.36; 95% CI = 1.18 to 1.57), height (OR = 1.18; 95% CI = 1.01 to 1.38), and body mass index (BMI; OR = 1.30; 95% CI = 1.12 to 1.51), with evidence that recent rather than distant BMI was the strongest predictor. Klinefelter syndrome (OR = 24.7; 95% CI = 8.94 to 68.4) and gynecomastia (OR = 9.78; 95% CI = 7.52 to 12.7) were also statistically significantly associated with risk, relations that were independent of BMI. Diabetes also emerged as an independent risk factor (OR = 1.19; 95% CI = 1.04 to 1.37). There were also suggestive relations with cryptorchidism (OR = 2.18; 95% CI = 0.96 to 4.94) and orchitis (OR = 1.43; 95% CI = 1.02 to 1.99). Although age at onset of puberty and histories of infertility were unrelated to risk, never having had children was statistically significantly related (OR = 1.29; 95% CI = 1.01 to 1.66). Among individuals diagnosed at older ages, a history of fractures was statistically significantly related (OR = 1.41; 95% CI = 1.07 to 1.86). 
Consistent findings across case-control and cohort investigations, complemented by pooled analyses, indicated important roles for anthropometric and hormonal risk factors in the etiology of male breast cancer. Further investigation should focus on potential roles of endogenous hormones.
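The fixed effects combination of design-specific odds ratios described above can be sketched as inverse-variance weighting on the log scale. This is a minimal illustration, not the consortium's code: the function name `fixed_effect_pool` and the two input estimates are hypothetical, and the standard errors are assumed to be recoverable from symmetric 95% CIs on the log odds ratio.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def fixed_effect_pool(estimates):
    """Pool (OR, lower, upper) triples by inverse-variance weighting of log ORs."""
    weighted_sum = 0.0
    weight_total = 0.0
    for odds_ratio, lower, upper in estimates:
        # Recover the standard error of log(OR) from the width of the 95% CI.
        se = (math.log(upper) - math.log(lower)) / (2 * Z95)
        weight = 1.0 / se ** 2
        weighted_sum += weight * math.log(odds_ratio)
        weight_total += weight
    pooled_log_or = weighted_sum / weight_total
    pooled_se = math.sqrt(1.0 / weight_total)
    return (
        math.exp(pooled_log_or),
        math.exp(pooled_log_or - Z95 * pooled_se),
        math.exp(pooled_log_or + Z95 * pooled_se),
    )


# Hypothetical case-control and cohort estimates (not values from the study).
design_specific = [(1.40, 1.10, 1.78), (1.25, 1.00, 1.56)]
pooled_or, pooled_lo, pooled_hi = fixed_effect_pool(design_specific)
print(f"pooled OR = {pooled_or:.2f} ({pooled_lo:.2f} to {pooled_hi:.2f})")
```

The pooled estimate lies between the inputs and has a narrower interval than either, which is the expected behavior of a fixed effects combination.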
Science of Test Research Consortium: Year Two Final Report
2012-10-02
July 2012. Analysis of an Intervention for Small Unmanned Aerial System (SUAS) Accidents, submitted to Quality Engineering, LQEN-2012-0056. Stone... Systems Engineering. Wolf, S. E., R. R. Hill, and J. J. Pignatiello. June 2012. Using Neural Networks and Logistic Regression to Model Small Unmanned ...Human Retina. 6. Wolf, S. E. March 2012. Modeling Small Unmanned Aerial System Mishaps using Logistic Regression and Artificial Neural Networks. 7
ERIC Educational Resources Information Center
Hidalgo, Mª Dolores; Gómez-Benito, Juana; Zumbo, Bruno D.
2014-01-01
The authors analyze the effectiveness of the R² and delta log odds ratio effect size measures when using logistic regression analysis to detect differential item functioning (DIF) in dichotomous items. A simulation study was carried out, and the Type I error rate and power estimates under conditions in which only statistical testing…
Brian S. Cade; Barry R. Noon; Rick D. Scherer; John J. Keane
2017-01-01
Counts of avian fledglings, nestlings, or clutch size that are bounded below by zero and above by some small integer form a discrete random variable distribution that is not approximated well by conventional parametric count distributions such as the Poisson or negative binomial. We developed a logistic quantile regression model to provide estimates of the empirical...
Mohammed, Mohammed A; Manktelow, Bradley N; Hofer, Timothy P
2016-04-01
There is interest in deriving case-mix adjusted standardised mortality ratios so that comparisons between healthcare providers, such as hospitals, can be undertaken in the controversial belief that variability in standardised mortality ratios reflects quality of care. Typically standardised mortality ratios are derived using a fixed effects logistic regression model, without a hospital term in the model. This fails to account for the hierarchical structure of the data - patients nested within hospitals - and so a hierarchical logistic regression model is more appropriate. However, four methods have been advocated for deriving standardised mortality ratios from a hierarchical logistic regression model, but their agreement is not known and neither do we know which is to be preferred. We found significant differences between the four types of standardised mortality ratios because they reflect a range of underlying conceptual issues. The most subtle issue is the distinction between asking how an average patient fares in different hospitals versus how patients at a given hospital fare at an average hospital. Since the answers to these questions are not the same and since the choice between these two approaches is not obvious, the extent to which profiling hospitals on mortality can be undertaken safely and reliably, without resolving these methodological issues, remains questionable. © The Author(s) 2012.
Chan, Siew Foong; Deeks, Jonathan J; Macaskill, Petra; Irwig, Les
2008-01-01
To compare three predictive models based on logistic regression to estimate adjusted likelihood ratios allowing for interdependency between diagnostic variables (tests). This study was a review of the theoretical basis, assumptions, and limitations of published models; and a statistical extension of methods and application to a case study of the diagnosis of obstructive airways disease based on history and clinical examination. Albert's method includes an offset term to estimate an adjusted likelihood ratio for combinations of tests. Spiegelhalter and Knill-Jones method uses the unadjusted likelihood ratio for each test as a predictor and computes shrinkage factors to allow for interdependence. Knottnerus' method differs from the other methods because it requires sequencing of tests, which limits its application to situations where there are few tests and substantial data. Although parameter estimates differed between the models, predicted "posttest" probabilities were generally similar. Construction of predictive models using logistic regression is preferred to the independence Bayes' approach when it is important to adjust for dependency of test errors. Methods to estimate adjusted likelihood ratios from predictive models should be considered in preference to a standard logistic regression model to facilitate ease of interpretation and application. Albert's method provides the most straightforward approach.
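For orientation, the step from likelihood ratios to a posttest probability can be sketched as below. This shows only the odds-form Bayes update that all three methods feed into; the function name and the numeric inputs are hypothetical. Multiplying unadjusted ratios, as in the second call, corresponds to the independence Bayes approach, whereas the models reviewed above would supply adjusted ratios in its place.

```python
def posttest_probability(pretest_prob, likelihood_ratios):
    """Convert a pretest probability to a posttest probability.

    Works on the odds scale: posttest odds = pretest odds * product of LRs.
    Passing several unadjusted LRs assumes the tests are independent.
    """
    if not 0.0 < pretest_prob < 1.0:
        raise ValueError("pretest_prob must lie strictly between 0 and 1")
    odds = pretest_prob / (1.0 - pretest_prob)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)


# Hypothetical values: 20% pretest probability, one positive finding with LR 4.
single_test = posttest_probability(0.20, [4.0])
# Two findings combined naively (independence assumed), LR 4 and LR 2.
two_tests = posttest_probability(0.20, [4.0, 2.0])
print(round(single_test, 3), round(two_tests, 3))
```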
Cameron, Isobel M; Scott, Neil W; Adler, Mats; Reid, Ian C
2014-12-01
It is important for clinical practice and research that measurement scales of well-being and quality of life exhibit only minimal differential item functioning (DIF). DIF occurs where different groups of people endorse items in a scale to different extents after being matched by the intended scale attribute. We investigate the equivalence or otherwise of common methods of assessing DIF. Three methods of measuring age- and sex-related DIF (ordinal logistic regression, Rasch analysis and Mantel χ(2) procedure) were applied to Hospital Anxiety Depression Scale (HADS) data pertaining to a sample of 1,068 patients consulting primary care practitioners. Three items were flagged by all three approaches as having either age- or sex-related DIF with a consistent direction of effect; a further three items identified did not meet stricter criteria for important DIF using at least one method. When applying strict criteria for significant DIF, ordinal logistic regression was slightly less sensitive. Ordinal logistic regression, Rasch analysis and contingency table methods yielded consistent results when identifying DIF in the HADS depression and HADS anxiety scales. Regardless of methods applied, investigators should use a combination of statistical significance, magnitude of the DIF effect and investigator judgement when interpreting the results.
NASA Astrophysics Data System (ADS)
Cao, Faxian; Yang, Zhijing; Ren, Jinchang; Ling, Wing-Kuen; Zhao, Huimin; Marshall, Stephen
2017-12-01
Although the sparse multinomial logistic regression (SMLR) has provided a useful tool for sparse classification, it suffers from inefficacy in dealing with high dimensional features and manually set initial regressor values. This has significantly constrained its applications for hyperspectral image (HSI) classification. In order to tackle these two drawbacks, an extreme sparse multinomial logistic regression (ESMLR) is proposed for effective classification of HSI. First, the HSI dataset is projected to a new feature space with randomly generated weight and bias. Second, an optimization model is established by the Lagrange multiplier method and the dual principle to automatically determine a good initial regressor for SMLR via minimizing the training error and the regressor value. Furthermore, the extended multi-attribute profiles (EMAPs) are utilized for extracting both the spectral and spatial features. A combinational linear multiple features learning (MFL) method is proposed to further enhance the features extracted by ESMLR and EMAPs. Finally, the logistic regression via the variable splitting and the augmented Lagrangian (LORSAL) is adopted in the proposed framework for reducing the computational time. Experiments are conducted on two well-known HSI datasets, namely the Indian Pines dataset and the Pavia University dataset, which have shown the fast and robust performance of the proposed ESMLR framework.
Latin hypercube approach to estimate uncertainty in ground water vulnerability
Gurdak, J.J.; McCray, J.E.; Thyne, G.; Qi, S.L.
2007-01-01
A methodology is proposed to quantify prediction uncertainty associated with ground water vulnerability models that were developed through an approach that coupled multivariate logistic regression with a geographic information system (GIS). This method uses Latin hypercube sampling (LHS) to illustrate the propagation of input error and estimate uncertainty associated with the logistic regression predictions of ground water vulnerability. Central to the proposed method is the assumption that prediction uncertainty in ground water vulnerability models is a function of input error propagation from uncertainty in the estimated logistic regression model coefficients (model error) and the values of explanatory variables represented in the GIS (data error). Input probability distributions that represent both model and data error sources of uncertainty were simultaneously sampled using a Latin hypercube approach with logistic regression calculations of probability of elevated nonpoint source contaminants in ground water. The resulting probability distribution represents the prediction intervals and associated uncertainty of the ground water vulnerability predictions. The method is illustrated through a ground water vulnerability assessment of the High Plains regional aquifer. Results of the LHS simulations reveal significant prediction uncertainties that vary spatially across the regional aquifer. Additionally, the proposed method enables a spatial deconstruction of the prediction uncertainty that can lead to improved prediction of ground water vulnerability. © 2007 National Ground Water Association.
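The propagation idea described above can be sketched in a few lines. This is an illustrative toy, not the authors' GIS workflow: it assumes a single explanatory variable, independent normal distributions for the intercept, slope, and covariate value, and made-up means and standard deviations. The names `latin_hypercube` and `lhs_logistic_interval` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def latin_hypercube(n, d, rng):
    """Return n points in the unit hypercube with one point per stratum in each dimension."""
    columns = []
    for _ in range(d):
        strata = list(range(n))
        rng.shuffle(strata)  # pair strata randomly across dimensions
        columns.append([(k + rng.random()) / n for k in strata])
    return [[columns[j][i] for j in range(d)] for i in range(n)]


def lhs_logistic_interval(n=1000, seed=0):
    """Propagate model error and data error through a one-variable logistic model."""
    rng = random.Random(seed)
    # Assumed input distributions: intercept, slope (model error), covariate (data error).
    inputs = [NormalDist(-2.0, 0.3), NormalDist(0.8, 0.1), NormalDist(1.5, 0.4)]
    probs = []
    for point in latin_hypercube(n, len(inputs), rng):
        # Map each uniform coordinate to its input distribution; clamp away from 0 and 1.
        b0, b1, x = (
            dist.inv_cdf(min(max(u, 1e-12), 1 - 1e-12))
            for dist, u in zip(inputs, point)
        )
        probs.append(1.0 / (1.0 + math.exp(-(b0 + b1 * x))))
    probs.sort()
    # Approximate 2.5th percentile, median, and 97.5th percentile of the predictions.
    return probs[int(0.025 * n)], probs[n // 2], probs[int(0.975 * n) - 1]


lower, median, upper = lhs_logistic_interval()
print(f"predicted probability: {median:.2f} ({lower:.2f} to {upper:.2f})")
```

Repeating this calculation for every grid cell would give a spatially varying interval, which is the kind of output the abstract describes for the High Plains aquifer.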
Kupek, Emil
2006-03-15
Structural equation modelling (SEM) has been increasingly used in medical statistics for solving a system of related regression equations. However, a great obstacle for its wider use has been its difficulty in handling categorical variables within the framework of generalised linear models. A large data set with a known structure among two related outcomes and three independent variables was generated to investigate the use of Yule's transformation of odds ratio (OR) into Q-metric by (OR-1)/(OR+1) to approximate Pearson's correlation coefficients between binary variables whose covariance structure can be further analysed by SEM. Percent of correctly classified events and non-events was compared with the classification obtained by logistic regression. The performance of SEM based on Q-metric was also checked on a small (N = 100) random sample of the data generated and on a real data set. SEM successfully recovered the generated model structure. SEM of real data suggested a significant influence of a latent confounding variable which would have not been detectable by standard logistic regression. SEM classification performance was broadly similar to that of the logistic regression. The analysis of binary data can be greatly enhanced by Yule's transformation of odds ratios into estimated correlation matrix that can be further analysed by SEM. The interpretation of results is aided by expressing them as odds ratios which are the most frequently used measure of effect in medical statistics.
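The transformation named in the abstract, Q = (OR-1)/(OR+1), is simple enough to show directly. The sketch below is illustrative only; the function names are hypothetical, and the inverse is included because the abstract notes that results are reported back as odds ratios.

```python
def yule_q(odds_ratio):
    """Map an odds ratio in [0, inf) to Yule's Q in [-1, 1)."""
    if odds_ratio < 0:
        raise ValueError("an odds ratio cannot be negative")
    return (odds_ratio - 1.0) / (odds_ratio + 1.0)


def odds_ratio_from_q(q):
    """Invert Yule's Q back to an odds ratio; requires -1 <= q < 1."""
    if not -1.0 <= q < 1.0:
        raise ValueError("q must satisfy -1 <= q < 1")
    return (1.0 + q) / (1.0 - q)


# OR = 1 (no association) maps to Q = 0; OR = 3 maps to Q = 0.5.
print(yule_q(1.0), yule_q(3.0), odds_ratio_from_q(0.5))
```

Reciprocal odds ratios map to Q values of equal size and opposite sign, which is what lets Q stand in for a correlation coefficient in the covariance structure passed to SEM.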
Nakajima, Kenichi; Nakata, Tomoaki; Matsuo, Shinro; Jacobson, Arnold F
2016-10-01
(123)I meta-iodobenzylguanidine (MIBG) imaging has been extensively used for prognostication in patients with chronic heart failure (CHF). The purpose of this study was to create mortality risk charts for short-term (2 years) and long-term (5 years) prediction of cardiac mortality. Using a pooled database of 1322 CHF patients, multivariate analysis, including (123)I-MIBG late heart-to-mediastinum ratio (HMR), left ventricular ejection fraction (LVEF), and clinical factors, was performed to determine optimal variables for the prediction of 2- and 5-year mortality risk using subsets of the patients (n = 1280 and 933, respectively). Multivariate logistic regression analysis was performed to create risk charts. Cardiac mortality was 10 and 22% for the sub-population of 2- and 5-year analyses. A four-parameter multivariate logistic regression model including age, New York Heart Association (NYHA) functional class, LVEF, and HMR was used. Annualized mortality rate was <1% in patients with NYHA Class I-II and HMR ≥ 2.0, irrespective of age and LVEF. In patients with NYHA Class III-IV, mortality rate was 4-6 times higher for HMR < 1.40 compared with HMR ≥ 2.0 in all LVEF classes. Among the subset of patients with b-type natriuretic peptide (BNP) results (n = 491 and 359 for 2- and 5-year models, respectively), the 5-year model showed incremental value of HMR in addition to BNP. Both 2- and 5-year risk prediction models with (123)I-MIBG HMR can be used to identify low-risk as well as high-risk patients, which can be effective for further risk stratification of CHF patients even when BNP is available. © The Author 2015. Published by Oxford University Press on behalf of the European Society of Cardiology.
Als-Nielsen, Bodil; Chen, Wendong; Gluud, Christian; Kjaergard, Lise L
2003-08-20
Previous studies indicate that industry-sponsored trials tend to draw proindustry conclusions. To explore whether the association between funding and conclusions in randomized drug trials reflects treatment effects or adverse events. Observational study of 370 randomized drug trials included in meta-analyses from Cochrane reviews selected from the Cochrane Library, May 2001. From a random sample of 167 Cochrane reviews, 25 contained eligible meta-analyses (assessed a binary outcome; pooled at least 5 full-paper trials of which at least 1 reported adequate and 1 reported inadequate allocation concealment). The primary binary outcome from each meta-analysis was considered the primary outcome for all trials included in each meta-analysis. The association between funding and conclusions was analyzed by logistic regression with adjustment for treatment effect, adverse events, and additional confounding factors (methodological quality, control intervention, sample size, publication year, and place of publication). Conclusions in trials, classified into whether the experimental drug was recommended as the treatment of choice or not. The experimental drug was recommended as treatment of choice in 16% of trials funded by nonprofit organizations, 30% of trials not reporting funding, 35% of trials funded by both nonprofit and for-profit organizations, and 51% of trials funded by for-profit organizations (P<.001; chi2 test). Logistic regression analyses indicated that funding, treatment effect, and double blinding were the only significant predictors of conclusions. Adjusted analyses showed that trials funded by for-profit organizations were significantly more likely to recommend the experimental drug as treatment of choice (odds ratio, 5.3; 95% confidence interval, 2.0-14.4) compared with trials funded by nonprofit organizations. This association did not appear to reflect treatment effect or adverse events. 
Conclusions in trials funded by for-profit organizations may be more positive due to biased interpretation of trial results. Readers should carefully evaluate whether conclusions in randomized trials are supported by data.
Vanacker, Peter; Heldner, Mirjam R; Amiguet, Michael; Faouzi, Mohamed; Cras, Patrick; Ntaios, George; Arnold, Marcel; Mattle, Heinrich P; Gralla, Jan; Fischer, Urs; Michel, Patrik
2016-06-01
Endovascular treatment for acute ischemic stroke with a large vessel occlusion was recently shown to be effective. We aimed to develop a score capable of predicting large vessel occlusion eligible for endovascular treatment in the early hospital management. Retrospective, cohort study. Two tertiary, Swiss stroke centers. Consecutive acute ischemic stroke patients (1,645 patients; Acute STroke Registry and Analysis of Lausanne registry), who had CT angiography within 6 and 12 hours of symptom onset, were categorized according to the occlusion site. Demographic and clinical information was used in logistic regression analysis to derive predictors of large vessel occlusion (defined as intracranial carotid, basilar, and M1 segment of middle cerebral artery occlusions). Based on logistic regression coefficients, an integer score was created and validated internally and externally (848 patients; Bernese Stroke Registry). None. Large vessel occlusions were present in 316 patients (21%) in the derivation and 566 (28%) in the external validation cohort. Five predictors added significantly to the score: National Institute of Health Stroke Scale at admission, hemineglect, female sex, atrial fibrillation, and no history of stroke and prestroke handicap (modified Rankin Scale score, < 2). Diagnostic accuracy in internal and external validation cohorts was excellent (area under the receiver operating characteristic curve, 0.84 both). The score performed slightly better than National Institute of Health Stroke Scale alone regarding prediction error (Wilcoxon signed rank test, p < 0.001) and regarding discriminatory power in derivation and pooled cohorts (area under the receiver operating characteristic curve, 0.81 vs 0.80; DeLong test, p = 0.02). Our score accurately predicts the presence of emergent large vessel occlusions, which are eligible for endovascular treatment. 
However, incorporation of additional demographic and historical information available on hospital arrival provides minimal incremental predictive value compared with the National Institute of Health Stroke Scale alone.
Harris, Holly R; Babic, Ana; Webb, Penelope M; Nagle, Christina M; Jordan, Susan J; Risch, Harvey A; Rossing, Mary Anne; Doherty, Jennifer A; Goodman, Marc T; Modugno, Francesmary; Ness, Roberta B; Moysich, Kirsten B; Kjær, Susanne K; Høgdall, Estrid; Jensen, Allan; Schildkraut, Joellen M; Berchuck, Andrew; Cramer, Daniel W; Bandera, Elisa V; Wentzensen, Nicolas; Kotsopoulos, Joanne; Narod, Steven A; Phelan, Catherine M; McLaughlin, John R; Anton-Culver, Hoda; Ziogas, Argyrios; Pearce, Celeste L; Wu, Anna H; Terry, Kathryn L
2018-02-01
Background: Polycystic ovary syndrome (PCOS), and one of its distinguishing characteristics, oligomenorrhea, have both been associated with ovarian cancer risk in some but not all studies. However, these associations have been rarely examined by ovarian cancer histotypes, which may explain the lack of clear associations reported in previous studies. Methods: We analyzed data from 14 case-control studies including 16,594 women with invasive ovarian cancer (n = 13,719) or borderline ovarian disease (n = 2,875) and 17,718 controls. Adjusted study-specific ORs were calculated using logistic regression and combined using random-effects meta-analysis. Pooled histotype-specific ORs were calculated using polytomous logistic regression. Results: Women reporting menstrual cycle length >35 days had decreased risk of invasive ovarian cancer compared with women reporting cycle length ≤35 days [OR = 0.70; 95% confidence interval (CI) = 0.58-0.84]. Decreased risk of invasive ovarian cancer was also observed among women who reported irregular menstrual cycles compared with women with regular cycles (OR = 0.83; 95% CI = 0.76-0.89). No significant association was observed between self-reported PCOS and invasive ovarian cancer risk (OR = 0.87; 95% CI = 0.65-1.15). There was a decreased risk of all individual invasive histotypes for women with menstrual cycle length >35 days, but no association with serous borderline tumors (P heterogeneity = 0.006). Similarly, we observed decreased risks of most invasive histotypes among women with irregular cycles, but an increased risk of borderline serous and mucinous tumors (P heterogeneity < 0.0001). Conclusions: Our results suggest that menstrual cycle characteristics influence ovarian cancer risk differentially based on histotype. Impact: These results highlight the importance of examining ovarian cancer risk factor associations by histologic subtype. Cancer Epidemiol Biomarkers Prev; 27(2); 174-82. ©2017 AACR.
Correlates of Circulating 25-Hydroxyvitamin D
McCullough, Marjorie L.; Weinstein, Stephanie J.; Freedman, D. Michal; Helzlsouer, Kathy; Flanders, W. Dana; Koenig, Karen; Kolonel, Laurence; Laden, Francine; Le Marchand, Loic; Purdue, Mark; Snyder, Kirk; Stevens, Victoria L.; Stolzenberg-Solomon, Rachael; Virtamo, Jarmo; Yang, Gong; Yu, Kai; Zheng, Wei; Albanes, Demetrius; Ashby, Jason; Bertrand, Kimberly; Cai, Hui; Chen, Yu; Gallicchio, Lisa; Giovannucci, Edward; Jacobs, Eric J.; Hankinson, Susan E.; Hartge, Patricia; Hartmuller, Virginia; Harvey, Chinonye; Hayes, Richard B.; Horst, Ronald L.; Shu, Xiao-Ou
2010-01-01
Low vitamin D status is common globally and is associated with multiple disease outcomes. Understanding the correlates of vitamin D status will help guide clinical practice, research, and interpretation of studies. Correlates of circulating 25-hydroxyvitamin D (25(OH)D) concentrations measured in a single laboratory were examined in 4,723 cancer-free men and women from 10 cohorts participating in the Cohort Consortium Vitamin D Pooling Project of Rarer Cancers, which covers a worldwide geographic area. Demographic and lifestyle characteristics were examined in relation to 25(OH)D using stepwise linear regression and polytomous logistic regression. The prevalence of 25(OH)D concentrations less than 25 nmol/L ranged from 3% to 36% across cohorts, and the prevalence of 25(OH)D concentrations less than 50 nmol/L ranged from 29% to 82%. Seasonal differences in circulating 25(OH)D were most marked among whites from northern latitudes. Statistically significant positive correlates of 25(OH)D included male sex, summer blood draw, vigorous physical activity, vitamin D intake, fish intake, multivitamin use, and calcium supplement use. Significant inverse correlates were body mass index, winter and spring blood draw, history of diabetes, sedentary behavior, smoking, and black race/ethnicity. Correlates varied somewhat within season, race/ethnicity, and sex. These findings help identify persons at risk for low vitamin D status for both clinical and research purposes. PMID:20562191
Proximity to Sports Facilities and Sports Participation for Adolescents in Germany
Reimers, Anne K.; Wagner, Matthias; Alvanides, Seraphim; Steinmayr, Andreas; Reiner, Miriam; Schmidt, Steffen; Woll, Alexander
2014-01-01
Objectives: To assess the relationship between proximity to specific sports facilities and participation in the corresponding sports activities for adolescents in Germany. Methods: A sample of 1,768 adolescents aged 11–17 years old and living in 161 German communities was examined. Distances to the nearest sports facilities were calculated as an indicator of proximity to sports facilities using Geographic Information Systems (GIS). Participation in specific leisure-time sports activities in sports clubs was assessed using a self-report questionnaire and individual-level socio-demographic variables were derived from a parent questionnaire. Community-level socio-demographics as covariates were selected from the INKAR database, in particular from indicators and maps on land development. Logistic regression analyses were conducted to examine associations between proximity to the nearest sports facilities and participation in the corresponding sports activities. Results: The logistic regression analyses showed that girls residing at longer distances from the nearest gym were less likely to engage in indoor sports activities; a significant interaction between distances to gyms and level of urbanization was identified. Decomposition of the interaction term showed that for adolescent girls living in rural areas participation in indoor sports activities was positively associated with gym proximity. Proximity to tennis courts and indoor pools was not associated with participation in tennis or water sports, respectively. Conclusions: Improved proximity to gyms is likely to be more important for female adolescents living in rural areas. PMID:24675689
Trends and Disparities in Disordered Eating among Heterosexual and Sexual Minority Adolescents
Watson, Ryan J.; Adjei, Jones; Saewyc, Elizabeth; Homma, Yuko; Goodenow, Carol
2018-01-01
Objective: Disordered eating has decreased for all youth over time, but studies have not focused specifically on lesbian, gay, and bisexual (LGB) youth. Research has found that LGB youth report disordered eating behaviors more often compared to their heterosexual counterparts, but no studies have documented trends over time for LGB youth and considered whether these disparities are narrowing or widening across sexual orientation groups. Method: We use pooled data from the 1999–2013 Massachusetts Youth Risk Behavior Surveys (N = 26,002) to investigate trends in purging, fasting, and using diet pills to lose or control weight for heterosexual and sexual minority youth. We used cross tabs, logistic regression, and interactions in regression models, stratified by sex. Results: The prevalence of disordered eating has decreased on all three measures across nearly all groups of heterosexual and sexual minority youth. However, we found disparities in reported disordered eating behaviors for LGB youth persisted across all survey years, with LGB students reporting significantly higher prevalence of disordered eating than heterosexuals. The disparities in fasting to control weight widened between the first and last survey waves between lesbian and heterosexual females. Discussion: The significant reductions over time in prevalence of disordered eating among some youth are encouraging, but the disparities remain. Indeed, the increasing prevalence of fasting, diet pill use, and purging to control weight among lesbians may warrant targeted prevention and intervention programs. PMID:27425253
Suzuki, Taku; Iwamoto, Takuji; Shizu, Kanae; Suzuki, Katsuji; Yamada, Harumoto; Sato, Kazuki
2017-05-01
This retrospective study was designed to investigate prognostic factors for postoperative outcomes for cubital tunnel syndrome (CubTS) using multiple logistic regression analysis with a large number of patients. Eighty-three patients with CubTS who underwent surgeries were enrolled. The following potential prognostic factors for disease severity were selected according to previous reports: sex, age, type of surgery, disease duration, body mass index, cervical lesion, presence of diabetes mellitus, Workers' Compensation status, preoperative severity, and preoperative electrodiagnostic testing. Postoperative severity of disease was assessed 2 years after surgery by Messina's criteria which is an outcome measure specifically for CubTS. Bivariate analysis was performed to select candidate prognostic factors for multiple linear regression analyses. Multiple logistic regression analysis was conducted to identify the association between postoperative severity and selected prognostic factors. Both bivariate and multiple linear regression analysis revealed only preoperative severity as an independent risk factor for poor prognosis, while other factors did not show any significant association. Although conflicting results exist regarding prognosis of CubTS, this study supports evidence from previous studies and concludes early surgical intervention portends the most favorable prognosis. Copyright © 2017 The Japanese Orthopaedic Association. Published by Elsevier B.V. All rights reserved.
Inverse sampling regression for pooled data.
Montesinos-López, Osval A; Montesinos-López, Abelardo; Eskridge, Kent; Crossa, José
2017-06-01
Because pools are tested instead of individuals in group testing, this technique is helpful for estimating prevalence in a population or for classifying a large number of individuals into two groups at a low cost. For this reason, group testing is a well-known means of saving costs and producing precise estimates. In this paper, we developed a mixed-effect group testing regression that is useful when the data-collecting process is performed using inverse sampling. This model allows including covariate information at the individual level to incorporate heterogeneity among individuals and identify which covariates are associated with positive individuals. We present an approach to fit this model using maximum likelihood and we performed a simulation study to evaluate the quality of the estimates. Based on the simulation study, we found that the proposed regression method for inverse sampling with group testing produces parameter estimates with low bias when the pre-specified number of positive pools (r) to stop the sampling process is at least 10 and the number of clusters in the sample is also at least 10. We performed an application with real data and we provide an NLMIXED code that researchers can use to implement this method.
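As background for the estimation problem, the link between individual prevalence and pool positivity can be sketched as follows. This is a deliberately simplified illustration and not the mixed-effect regression of the paper: it assumes a perfect assay, equal pool sizes, no covariates, and no clustering, and all function names and numbers are hypothetical. Under inverse sampling with testing stopped at the r-th positive pool, the maximum likelihood estimate of the pool-positive probability is r divided by the total number of pools tested, and the individual-level estimate follows by inverting the pooling relation.

```python
def pool_positive_prob(prevalence, pool_size):
    """Probability that a pool of pool_size individuals tests positive (perfect assay assumed)."""
    return 1.0 - (1.0 - prevalence) ** pool_size


def individual_prevalence(pool_rate, pool_size):
    """Invert the pooling relation to recover individual-level prevalence."""
    return 1.0 - (1.0 - pool_rate) ** (1.0 / pool_size)


def prevalence_inverse_sampling(r, total_pools, pool_size):
    """Prevalence estimate when pools are tested until r positives are observed."""
    if not 0 < r <= total_pools:
        raise ValueError("need 0 < r <= total_pools")
    return individual_prevalence(r / total_pools, pool_size)


# Hypothetical design: stop at r = 10 positive pools, reached after 85 pools of size 8.
estimate = prevalence_inverse_sampling(r=10, total_pools=85, pool_size=8)
print(f"estimated individual prevalence: {estimate:.4f}")
```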
Umegaki, Hiroyuki; Iimuro, Satoshi; Shinozaki, Tomohiro; Araki, Atsushi; Sakurai, Takashi; Iijima, Katsuya; Ohashi, Yasuo; Ito, Hideki
2012-04-01
Considerable attention has been paid to the association between type 2 diabetes mellitus (T2DM) and cognitive dysfunction in the elderly. T2DM is often comorbid with several other metabolic disturbances, including hypertension and dyslipidemia. These comorbid diseases might be associated with cognitive impairment. Many clinical indices should be included as variables for the association with cognitive decline. In the current study, we tried to identify the associated factors with cognitive decline during a 6-year period in elderly T2DM considering the changes in the clinical indices during the follow-up period. The subjects in the present study were 63 Japanese Elderly Interventional Trial participants who were administered the Mini-Mental State Examination at baseline, at the third year, and at the end of the 6-year follow-up period. We applied the pooled logistic analysis method to consider the changes in clinical indices during the observation period and tried to identify the factors associated with cognitive decline during the 6 years in elderly type 2 diabetics using repeated measured data for glycated hemoglobin A1c, blood pressure and serum lipids. In the current study, low high-density lipoprotein-cholesterol and higher diastolic blood pressure were significantly associated with cognitive decline by pooled logistic analysis in the 6-year observation of older diabetic subjects. Higher glycated hemoglobin A1c had a tendency toward association with cognitive decline. The results suggest that comprehensive management of diabetes, including dyslipidemia and hypertension, might contribute to the prevention of declines in cognitive function in older diabetic patients. © 2012 Japan Geriatrics Society.
ERIC Educational Resources Information Center
Kasapoglu, Koray
2014-01-01
This study aims to investigate which factors are associated with Turkey's 15-year-olds' scoring above the OECD average (493) on the PISA'09 reading assessment. Collected from a total of 4,996 15-year-old students from Turkey, data were analyzed by logistic regression analysis in order to model the data of students who were split into two: (1)…
Upgrade Summer Severe Weather Tool
NASA Technical Reports Server (NTRS)
Watson, Leela
2011-01-01
The goal of this task was to upgrade the existing severe weather database by adding observations from the 2010 warm season, update the verification dataset with results from the 2010 warm season, use statistical logistic regression analysis on the database and develop a new forecast tool. The AMU analyzed 7 stability parameters that showed the possibility of providing guidance in forecasting severe weather, calculated verification statistics for the Total Threat Score (TTS), and calculated warm season verification statistics for the 2010 season. The AMU also performed statistical logistic regression analysis on the 22-year severe weather database. The results indicated that the logistic regression equation did not show an increase in skill over the previously developed TTS. The equation showed less accuracy than TTS at predicting severe weather, little ability to distinguish between severe and non-severe weather days, and worse standard categorical accuracy measures and skill scores over TTS.
Estimating the Probability of Rare Events Occurring Using a Local Model Averaging.
Chen, Jin-Hua; Chen, Chun-Shu; Huang, Meng-Fan; Lin, Hung-Chih
2016-10-01
In statistical applications, logistic regression is a popular method for analyzing binary data accompanied by explanatory variables. But when one of the two outcomes is rare, the estimation of model parameters has been shown to be severely biased and hence estimating the probability of rare events occurring based on a logistic regression model would be inaccurate. In this article, we focus on estimating the probability of rare events occurring based on logistic regression models. Instead of selecting a best model, we propose a local model averaging procedure based on a data perturbation technique applied to different information criteria to obtain different probability estimates of rare events occurring. Then an approximately unbiased estimator of Kullback-Leibler loss is used to choose the best one among them. We design complete simulations to show the effectiveness of our approach. For illustration, a necrotizing enterocolitis (NEC) data set is analyzed. © 2016 Society for Risk Analysis.
Evaluating the perennial stream using logistic regression in central Taiwan
NASA Astrophysics Data System (ADS)
Ruljigaljig, T.; Cheng, Y. S.; Lin, H. I.; Lee, C. H.; Yu, T. T.
2014-12-01
This study produces a perennial stream head potential map based on a logistic regression method with a Geographic Information System (GIS). Perennial stream initiation locations, which indicate where groundwater meets the surface, were identified in the study area by field survey. The perennial stream potential map in central Taiwan was constructed using the relationship between perennial streams and their causative factors, such as catchment area, slope gradient, aspect, elevation, groundwater recharge and precipitation. Field surveys covered 272 streams in the study area. The area under the curve for the logistic regression method was calculated as 0.87. The results illustrate the importance of catchment area and groundwater recharge as key factors within the model. The results obtained from the model within the GIS were then used to produce a map of perennial streams and estimate the location of perennial stream heads.
Menditto, Anthony A; Linhorst, Donald M; Coleman, James C; Beck, Niels C
2006-04-01
Development of policies and procedures to contend with the risks presented by elopement, aggression, and suicidal behaviors are long-standing challenges for mental health administrators. Guidance in making such judgments can be obtained through the use of a multivariate statistical technique known as logistic regression. This procedure can be used to develop a predictive equation that is mathematically formulated to use the best combination of predictors, rather than considering just one factor at a time. This paper presents an overview of logistic regression and its utility in mental health administrative decision making. A case example of its application is presented using data on elopements from Missouri's long-term state psychiatric hospitals. Ultimately, the use of statistical prediction analyses tempered with differential qualitative weighting of classification errors can augment decision-making processes in a manner that provides guidance and flexibility while wrestling with the complex problem of risk assessment and decision making.
Lei, Yang; Nollen, Nikki; Ahluwahlia, Jasjit S; Yu, Qing; Mayo, Matthew S
2015-04-09
Other forms of tobacco use are increasing in prevalence, yet most tobacco control efforts are aimed at cigarettes. In light of this, it is important to identify individuals who are using both cigarettes and alternative tobacco products (ATPs). Most previous studies have used regression models. We conducted a traditional logistic regression model and a classification and regression tree (CART) model to illustrate and discuss the added advantages of using CART in the setting of identifying high-risk subgroups of ATP users among cigarette smokers. The data were collected from an online cross-sectional survey administered by Survey Sampling International between July 5, 2012 and August 15, 2012. Eligible participants self-identified as current smokers, African American, White, or Latino (of any race), were English-speaking, and were at least 25 years old. The study sample included 2,376 participants and was divided into independent training and validation samples for a hold out validation. Logistic regression and CART models were used to examine the important predictors of cigarettes + ATP users. The logistic regression model identified nine important factors: gender, age, race, nicotine dependence, buying cigarettes or borrowing, whether the price of cigarettes influences the brand purchased, whether the participants set limits on cigarettes per day, alcohol use scores, and discrimination frequencies. The C-index of the logistic regression model was 0.74, indicating good discriminatory capability. The model performed well in the validation cohort also with good discrimination (c-index = 0.73) and excellent calibration (R-square = 0.96 in the calibration regression). The parsimonious CART model identified gender, age, alcohol use score, race, and discrimination frequencies to be the most important factors. It also revealed interesting partial interactions. The c-index is 0.70 for the training sample and 0.69 for the validation sample.
The misclassification rate was 0.342 for the training sample and 0.346 for the validation sample. The CART model was easier to interpret and discovered target populations that possess clinical significance. This study suggests that the non-parametric CART model is parsimonious, potentially easier to interpret, and provides additional information in identifying the subgroups at high risk of ATP use among cigarette smokers.
Akkus, Zeki; Camdeviren, Handan; Celik, Fatma; Gur, Ali; Nas, Kemal
2005-09-01
To determine the risk factors of osteoporosis using a multiple binary logistic regression method and to assess the risk variables for osteoporosis, which is a major and growing health problem in many countries. We presented a case-control study, consisting of 126 postmenopausal healthy women as control group and 225 postmenopausal osteoporotic women as the case group. The study was carried out in the Department of Physical Medicine and Rehabilitation, Dicle University, Diyarbakir, Turkey between 1999-2002. The data from the 351 participants were collected using a standard questionnaire that contains 43 variables. A multiple logistic regression model was then used to evaluate the data and to find the best regression model. We classified 80.1% (281/351) of the participants using the regression model. Furthermore, the specificity value of the model was 67% (84/126) of the control group while the sensitivity value was 88% (197/225) of the case group. We found the distribution of residual values standardized for final model to be exponential using the Kolmogorov-Smirnov test (p=0.193). The receiver operating characteristic curve was found successful to predict patients with risk for osteoporosis. This study suggests that low levels of dietary calcium intake, physical activity, education, and longer duration of menopause are independent predictors of the risk of low bone density in our population. Adequate dietary calcium intake in combination with maintaining a daily physical activity, increasing educational level, decreasing birth rate, and duration of breast-feeding may contribute to healthy bones and play a role in practical prevention of osteoporosis in Southeast Anatolia. In addition, the findings of the present study indicate that the use of multivariate statistical method as a multiple logistic regression in osteoporosis, which may be influenced by many variables, is better than univariate statistical evaluation.
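The classification figures quoted in this abstract can be rechecked from the counts it reports. The short sketch below does only that arithmetic; the variable names are illustrative, and reading 84 and 197 as correctly classified controls and cases is inferred from the fractions given in the abstract.

```python
# Counts as reported in the abstract.
controls, cases = 126, 225
correct_controls = 84   # controls classified as controls (84/126)
correct_cases = 197     # cases classified as cases (197/225)

specificity = correct_controls / controls
sensitivity = correct_cases / cases
accuracy = (correct_controls + correct_cases) / (controls + cases)

print(f"specificity={specificity:.3f} sensitivity={sensitivity:.3f} accuracy={accuracy:.3f}")
```

The rounded values agree with the 67%, 88% and 80.1% stated in the abstract.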
Shi, K-Q; Zhou, Y-Y; Yan, H-D; Li, H; Wu, F-L; Xie, Y-Y; Braddock, M; Lin, X-Y; Zheng, M-H
2017-02-01
At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model by using the classification and regression tree (CART) analysis. A total of 1047 patients from two separate medical centres with suspected ACHBLF were screened in the study, which were recognized as derivation cohort and validation cohort, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was tested using the area under the receiver operating characteristic curve, which was compared with the model for end-stage liver disease (MELD) score and a new logistic regression model. CART analysis identified four variables as prognostic factors of ACHBLF: total bilirubin, age, serum sodium and INR, and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed with four independent factors, including age, total bilirubin, serum sodium and prothrombin activity by multivariate logistic regression analysis. The performances of the CART model (0.896), similar to the logistic regression model (0.914, P=.382), exceeded that of MELD score (0.667, P<.001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model superior to MELD for predicting three-month mortality of patients with ACHBLF. Thus, the CART model could facilitate medical decision-making and provide clinicians with a validated practical bedside tool for ACHBLF risk stratification. © 2016 John Wiley & Sons Ltd.
Arevalillo, Jorge M; Sztein, Marcelo B; Kotloff, Karen L; Levine, Myron M; Simon, Jakub K
2017-10-01
Immunologic correlates of protection are important in vaccine development because they give insight into mechanisms of protection, assist in the identification of promising vaccine candidates, and serve as endpoints in bridging clinical vaccine studies. Our goal is the development of a methodology to identify immunologic correlates of protection using the Shigella challenge as a model. The proposed methodology utilizes the Random Forests (RF) machine learning algorithm as well as Classification and Regression Trees (CART) to detect immune markers that predict protection, identify interactions between variables, and define optimal cutoffs. Logistic regression modeling is applied to estimate the probability of protection and the confidence interval (CI) for such a probability is computed by bootstrapping the logistic regression models. The results demonstrate that the combination of Classification and Regression Trees and Random Forests complements the standard logistic regression and uncovers subtle immune interactions. Specific levels of immunoglobulin IgG antibody in blood on the day of challenge predicted protection in 75% (95% CI 67-86). Of those subjects that did not have blood IgG at or above a defined threshold, 100% were protected if they had IgA antibody secreting cells above a defined threshold. Comparison with the results obtained by applying only logistic regression modeling with standard Akaike Information Criterion for model selection shows the usefulness of the proposed method. Given the complexity of the immune system, the use of machine learning methods may enhance traditional statistical approaches. When applied together, they offer a novel way to quantify important immune correlates of protection that may help the development of vaccines. Copyright © 2017 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Schaeben, Helmut; Semmler, Georg
2016-09-01
The objective of prospectivity modeling is prediction of the conditional probability of the presence T = 1 or absence T = 0 of a target T given favorable or prohibitive predictors B, or construction of a two classes 0,1 classification of T. A special case of logistic regression called weights-of-evidence (WofE) is geologists' favorite method of prospectivity modeling due to its apparent simplicity. However, the numerical simplicity is deceiving as it is implied by the severe mathematical modeling assumption of joint conditional independence of all predictors given the target. General weights of evidence are explicitly introduced which are as simple to estimate as conventional weights, i.e., by counting, but do not require conditional independence. Complementary to the regression view is the classification view on prospectivity modeling. Boosting is the construction of a strong classifier from a set of weak classifiers. From the regression point of view it is closely related to logistic regression. Boost weights-of-evidence (BoostWofE) was introduced into prospectivity modeling to counterbalance violations of the assumption of conditional independence even though relaxation of modeling assumptions with respect to weak classifiers was not the (initial) purpose of boosting. In the original publication of BoostWofE a fabricated dataset was used to "validate" this approach. Using the same fabricated dataset it is shown that BoostWofE cannot generally compensate lacking conditional independence whatever the consecutively processing order of predictors. Thus the alleged features of BoostWofE are disproved by way of counterexamples, while theoretical findings are confirmed that logistic regression including interaction terms can exactly compensate violations of joint conditional independence if the predictors are indicators.
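To make "estimating weights by counting" concrete, the sketch below computes the conventional weights of evidence for a single binary predictor B from a 2x2 table of counts, using the standard definitions W+ = ln[P(B=1|T=1)/P(B=1|T=0)] and W- = ln[P(B=0|T=1)/P(B=0|T=0)]. It covers only the conventional weights, not the general weights or BoostWofE discussed in the abstract, and the counts are hypothetical.

```python
import math


def weights_of_evidence(n_b1_t1: int, n_b1_t0: int, n_b0_t1: int, n_b0_t0: int):
    """Return (W_plus, W_minus) for one binary predictor B and binary target T.

    Arguments are cell counts of the 2x2 table, e.g. n_b1_t0 is the number
    of units with B = 1 and T = 0.
    """
    p_b1_given_t1 = n_b1_t1 / (n_b1_t1 + n_b0_t1)
    p_b1_given_t0 = n_b1_t0 / (n_b1_t0 + n_b0_t0)
    w_plus = math.log(p_b1_given_t1 / p_b1_given_t0)
    w_minus = math.log((1.0 - p_b1_given_t1) / (1.0 - p_b1_given_t0))
    return w_plus, w_minus


# Hypothetical counts: the predictor is present at 30 of 40 target sites
# and at 20 of 160 non-target sites.
w_plus, w_minus = weights_of_evidence(30, 20, 10, 140)
```

Summing such weights across several predictors is only valid under the joint conditional independence assumption that the abstract criticizes.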
Separation in Logistic Regression: Causes, Consequences, and Control.
Mansournia, Mohammad Ali; Geroldinger, Angelika; Greenland, Sander; Heinze, Georg
2018-04-01
Separation is encountered in regression models with a discrete outcome (such as logistic regression) where the covariates perfectly predict the outcome. It is most frequent under the same conditions that lead to small-sample and sparse-data bias, such as presence of a rare outcome, rare exposures, highly correlated covariates, or covariates with strong effects. In theory, separation will produce infinite estimates for some coefficients. In practice, however, separation may be unnoticed or mishandled because of software limits in recognizing and handling the problem and in notifying the user. We discuss causes of separation in logistic regression and describe how common software packages deal with it. We then describe methods that remove separation, focusing on the same penalized-likelihood techniques used to address more general sparse-data problems. These methods improve accuracy, avoid software problems, and allow interpretation as Bayesian analyses with weakly informative priors. We discuss likelihood penalties, including some that can be implemented easily with any software package, and their relative advantages and disadvantages. We provide an illustration of ideas and methods using data from a case-control study of contraceptive practices and urinary tract infection.
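The behaviour described here can be reproduced on a toy dataset. The sketch below fits a one-parameter logistic model by plain gradient ascent on four perfectly separated points: without a penalty the slope keeps growing as iterations are added, while a likelihood penalty yields a finite estimate. The ridge (quadratic) penalty is used purely as a simple stand-in and is not claimed to be one of the penalties the authors recommend; the data and function names are hypothetical.

```python
import math

# Perfectly separated toy data: y = 1 exactly when x > 0.
X = [-2.0, -1.0, 1.0, 2.0]
Y = [0, 0, 1, 1]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_slope(steps: int, ridge: float = 0.0, lr: float = 0.5) -> float:
    """Gradient ascent on the (optionally ridge-penalized) log-likelihood, no intercept."""
    b = 0.0
    for _ in range(steps):
        score = sum((y - sigmoid(b * x)) * x for x, y in zip(X, Y))
        b += lr * (score - ridge * b)
    return b


unpenalized_short = fit_slope(100)
unpenalized_long = fit_slope(10_000)       # still increasing: no finite maximum
penalized = fit_slope(10_000, ridge=1.0)   # settles at a finite value
```

In this setup the unpenalized slope never converges because the score stays positive for every finite value of the slope, which is the numerical face of an infinite maximum likelihood estimate.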
NASA Astrophysics Data System (ADS)
Nong, Yu; Du, Qingyun; Wang, Kun; Miao, Lei; Zhang, Weiwei
2008-10-01
Urban growth modeling, one of the most important aspects of land use and land cover change study, has attracted substantial attention because it helps to explain the mechanisms of land use change and thus supports relevant policy making. This study applied multinomial logistic regression to model urban growth in the Jiayu county of Hubei province, China to discover the relationship between urban growth and its driving forces, with biophysical and socio-economic factors selected as independent variables. This type of regression is similar to binary logistic regression, but it is more general because the dependent variable is not restricted to two categories, as it was in previous studies. The multinomial one can simulate the process of multiple land use competition between urban land, bare land, cultivated land and orchard land. Taking the land use type of Urban as reference category, parameters could be estimated with odds ratios. A probability map is generated from the model to predict where urban growth will occur as a result of the computation.
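The role of the reference category can be sketched in a few lines. In a multinomial logit with urban land as the reference, each other land use type has its own linear predictor, the reference predictor is fixed at zero, and exponentiating a coefficient gives an odds ratio relative to urban. The linear predictor values below are hypothetical and are not taken from the study.

```python
import math

REFERENCE = "urban"


def multinomial_probs(linear_predictors: dict) -> dict:
    """Category probabilities given linear predictors for the non-reference categories."""
    denom = 1.0 + sum(math.exp(eta) for eta in linear_predictors.values())
    probs = {cat: math.exp(eta) / denom for cat, eta in linear_predictors.items()}
    probs[REFERENCE] = 1.0 / denom  # reference category has linear predictor 0
    return probs


# Hypothetical linear predictors for one map cell.
cell = multinomial_probs({"bare": -0.4, "cultivated": 1.2, "orchard": -1.0})

# Odds of "cultivated" versus the reference for this cell equal exp(1.2).
odds_cultivated_vs_urban = cell["cultivated"] / cell[REFERENCE]
```

Evaluating such probabilities cell by cell is one way a probability map like the one mentioned in the abstract could be assembled.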
Birmann, Brenda M; Andreotti, Gabriella; De Roos, Anneclaire J; Camp, Nicola J; Chiu, Brian C H; Spinelli, John J; Becker, Nikolaus; Benhaim-Luzon, Véronique; Bhatti, Parveen; Boffetta, Paolo; Brennan, Paul; Brown, Elizabeth E; Cocco, Pierluigi; Costas, Laura; Cozen, Wendy; de Sanjosé, Silvia; Foretová, Lenka; Giles, Graham G; Maynadié, Marc; Moysich, Kirsten; Nieters, Alexandra; Staines, Anthony; Tricot, Guido; Weisenburger, Dennis; Zhang, Yawei; Baris, Dalsu; Purdue, Mark P
2017-06-01
Background: Multiple myeloma risk increases with higher adult body mass index (BMI). Emerging evidence also supports an association of young adult BMI with multiple myeloma. We undertook a pooled analysis of eight case-control studies to further evaluate anthropometric multiple myeloma risk factors, including young adult BMI. Methods: We conducted multivariable logistic regression analysis of usual adult anthropometric measures of 2,318 multiple myeloma cases and 9,609 controls, and of young adult BMI (age 25 or 30 years) for 1,164 cases and 3,629 controls. Results: In the pooled sample, multiple myeloma risk was positively associated with usual adult BMI; risk increased 9% per 5-kg/m² increase in BMI [OR, 1.09; 95% confidence interval (CI), 1.04-1.14; P = 0.007]. We observed significant heterogeneity by study design (P = 0.04), noting the BMI-multiple myeloma association only for population-based studies (P trend = 0.0003). Young adult BMI was also positively associated with multiple myeloma (per 5-kg/m²; OR, 1.2; 95% CI, 1.1-1.3; P = 0.0002). Furthermore, we observed strong evidence of interaction between younger and usual adult BMI (P interaction <0.0001); we noted statistically significant associations with multiple myeloma for persons overweight (25-<30 kg/m²) or obese (30+ kg/m²) in both younger and usual adulthood (vs. individuals consistently <25 kg/m²), but not for those overweight or obese at only one time period. Conclusions: BMI-associated increases in multiple myeloma risk were highest for individuals who were overweight or obese throughout adulthood. Impact: These findings provide the strongest evidence to date that earlier and later adult BMI may increase multiple myeloma risk and suggest that healthy BMI maintenance throughout life may confer an added benefit of multiple myeloma prevention. Cancer Epidemiol Biomarkers Prev; 26(6); 876-85. ©2017 AACR.
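An odds ratio reported "per 5 kg/m²" can be re-expressed for another increment if BMI is assumed to enter the logistic model linearly on the log-odds scale, which is what a per-increment odds ratio implies. The helper below is a hypothetical convenience function that shows the arithmetic for the reported OR of 1.09; it does not reproduce the authors' analysis.

```python
import math


def rescale_odds_ratio(odds_ratio: float, from_units: float, to_units: float) -> float:
    """Convert an odds ratio per `from_units` into one per `to_units` (log-linear assumption)."""
    return math.exp(math.log(odds_ratio) * to_units / from_units)


per_1 = rescale_odds_ratio(1.09, 5, 1)    # OR per 1 kg/m^2
per_10 = rescale_odds_ratio(1.09, 5, 10)  # OR per 10 kg/m^2, equal to 1.09 squared
```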
Uterine insemination with a standard AI dose in a sow pool system.
Peltoniemi, O A T; Alm, K; Andersson, M
2009-06-01
The effect of uterine AI with a standard dose of spermatozoa on fertility of the sow was studied in a field trial. The trial involved a sow pool system with 440 sows using AI as the primary method of breeding. Sows were twice a day checked for oestrus symptoms by back pressure test in front of a boar on days 3-6 after weaning. When in standing heat, sows were randomly allocated into either a uterine insemination group (UTER, n = 157) or standard AI group (CONT, n = 169) and bred accordingly using 3 billion spermatozoa in 80 ml of extender. In both treatment groups, insemination was repeated once if the sow was still receptive 24 h later. Using pregnancy (farrowed or not) and live-born litter size as the outcome variables, a logistic and linear regression approach, respectively, was taken to study the effect of the following factors: treatment (UTER vs CONT), AI operator, breed, satellite herd preceding weaning, parity, weaning-to-oestrus interval and length of lactation. Overall, live-born litter size was 11.3 +/- 2.9, repeat breeding rate 4.2% and farrowing rate 91.2%. In the UTER group, 93.6% of inseminated sows farrowed, whereas farrowing rate for the CONT group was 88.8% (p = 0.13). Intrauterine insemination with a standard AI dose did not result in a significant improvement in the live-born litter size (11.5 +/- 2.8 for the UTER and 11.1 +/- 3.0 for the CONT sows, respectively, p = 0.13). However, the preceding satellite herd had a highly significant effect on the live-born litter size (12.4 +/- 2.6; 11.1 +/- 2.9; 10.8 +/- 2.9 and 10.9 +/- 2.9 for the four satellite herds, p < 0.01). We conclude that uterine insemination did not have a significant effect on live-born litter size and farrowing rate and we also conclude that satellite herd appears to have a major effect on fertility in a sow pool system.
Spatiotemporal variability of stream habitat and movement of three species of fish
Roberts, J.H.; Angermeier, P.L.
2007-01-01
Relationships between environmental variability and movement are poorly understood, due to both their complexity and the limited ecological scope of most movement studies. We studied movements of fantail (Etheostoma flabellare), riverweed (E. podostemone), and Roanoke darters (Percina roanoka) through two stream systems during two summers. We then related movement to variability in measured habitat attributes using logistic regression and exploratory data plots. We indexed habitat conditions at both microhabitat (i.e., patches of uniform depth, velocity, and substrate) and mesohabitat (i.e., riffle and pool channel units) spatial scales, and determined how local habitat conditions were affected by landscape spatial (i.e., longitudinal position, land use) and temporal contexts. Most spatial variability in habitat conditions and fish movement was unexplained by a site's location on the landscape. Exceptions were microhabitat diversity, which was greater in the less-disturbed watershed, and riffle isolation and predator density in pools, which were greater at more-downstream sites. Habitat conditions and movement also exhibited only minor temporal variability, but the relative influences of habitat attributes on movement were quite variable over time. During the first year, movements of fantail and riverweed darters were triggered predominantly by loss of shallow microhabitats; whereas, during the second year, microhabitat diversity was more strongly related (though in opposite directions) to movement of these two species. Roanoke darters did not move in response to microhabitat-scale variables, presumably because of the species' preference for deeper microhabitats that changed little over time. Conversely, movement of all species appeared to be constrained by riffle isolation and predator density in pools, two mesohabitat-scale attributes. 
Relationships between environmental variability and movement depended on both the spatiotemporal scale of consideration and the ecology of the species. Future studies that integrate across scales, taxa, and life-histories are likely to provide greater insight into movement ecology than will traditional, single-season, single-species approaches. © 2006 Springer-Verlag.
Aaron, Grant J; Huang, Jin; Varadhan, Ravi; Temple, Victor; Rayco-Solon, Pura; Macdonald, Barbara
2017-01-01
Background: A lack of information on the etiology of anemia has hampered the design and monitoring of anemia-control efforts. Objective: We aimed to evaluate predictors of anemia in preschool children (PSC) (age range: 6–59 mo) by country and infection-burden category. Design: Cross-sectional data from 16 surveys (n = 29,293) from the Biomarkers Reflecting Inflammation and Nutritional Determinants of Anemia (BRINDA) project were analyzed separately and pooled by category of infection burden. We assessed relations between anemia (hemoglobin concentration <110 g/L) and severe anemia (hemoglobin concentration <70 g/L) and individual-level (age, anthropometric measures, micronutrient deficiencies, malaria, and inflammation) and household-level predictors; we also examined the proportion of anemia with concomitant iron deficiency (defined as an inflammation-adjusted ferritin concentration <12 μg/L). Countries were grouped into 4 categories on the basis of risk and burden of infectious disease, and a pooled multivariable logistic regression analysis was conducted for each group. Results: Iron deficiency, malaria, breastfeeding, stunting, underweight, inflammation, low socioeconomic status, and poor sanitation were each associated with anemia in >50% of surveys. Associations between breastfeeding and anemia were attenuated by controlling for child age, which was negatively associated with anemia. The most consistent predictors of severe anemia were malaria, poor sanitation, and underweight. In multivariable pooled models, child age, iron deficiency, and stunting independently predicted anemia and severe anemia. Inflammation was generally associated with anemia in the high- and very high–infection groups but not in the low- and medium-infection groups. In PSC with anemia, 50%, 30%, 55%, and 58% of children had concomitant iron deficiency in low-, medium-, high-, and very high–infection categories, respectively. 
Conclusions: Although causal inference is limited by cross-sectional survey data, results suggest anemia-control programs should address both iron deficiency and infections. The relative importance of factors that are associated with anemia varies by setting, and thus, country-specific data are needed to guide programs. PMID:28615260
van Hulsteijn, L T; Niemeijer, N D; Dekkers, O M; Corssmit, E P M
2014-04-01
(131)I-MIBG therapy can be used for palliative treatment of malignant paraganglioma and phaeochromocytoma. The main objective of this study was to perform a systematic review and meta-analysis assessing the effect of (131)I-MIBG therapy on tumour volume in patients with malignant paraganglioma/phaeochromocytoma. A literature search was performed in December 2012 to identify potentially relevant studies. Main outcomes were the pooled proportions of complete response, partial response and stable disease after radionuclide therapy. A meta-analysis was performed with an exact likelihood approach using a logistic regression with a random effect at the study level. Pooled proportions with 95% confidence intervals (CI) were reported. Seventeen studies concerning a total of 243 patients with malignant paraganglioma/phaeochromocytoma were treated with (131)I-MIBG therapy. The mean follow-up ranged from 24 to 62 months. A meta-analysis of the effect of (131)I-MIBG therapy on tumour volume showed pooled proportions of complete response, partial response and stable disease of, respectively, 0·03 (95% CI: 0·06-0·15), 0·27 (95% CI: 0·19-0·37) and 0·52 (95% CI: 0·41-0·62) and for hormonal response 0·11 (95% CI: 0·05-0·22), 0·40 (95% CI: 0·28-0·53) and 0·21 (95% CI: 0·10-0·40), respectively. Separate analyses resulted in better results in hormonal response for patients with paraganglioma than for patients with phaeochromocytoma. Data on the effects of (131)I-MIBG therapy on malignant paraganglioma/phaeochromocytoma suggest that stable disease concerning tumour volume and a partial hormonal response can be achieved in over 50% and 40% of patients, respectively, treated with (131)I-MIBG therapy. It cannot be ruled out that stable disease reflects not only the effect of MIBG therapy, but also (partly) the natural course of the disease. © 2013 John Wiley & Sons Ltd.
Feng, Rui Mei; Hu, Shang Ying; Zhao, Fang Hui; Zhang, Rong; Zhang, Xun; Wallach, Asya Izraelit; Qiao, You Lin
2017-09-01
We performed a pooled analysis to examine cigarette smoking and household passive smoke exposure in relation to the risk of human papillomavirus (HPV) infection and cervical intraepithelial neoplasia grade 2+ (CIN2+). Data were pooled from 12 cross-sectional studies for cervical cancer screenings from 10 provinces of China in 1999-2007. A total of 16,422 women were analyzed, along with 2,392 high-risk-HPV (hr-HPV) positive women and 381 CIN2+ cases. Pooled odds ratios (ORs) and 95% confidence intervals (CIs) were estimated using logistic regression models controlling for sexual and non-sexual confounding factors. There was an excess risk between active smoking and hr-HPV infection and CIN2+. Adjusted OR for ever smokers vs. never smokers was 1.45 (95% CI=1.10-1.91), for hr-HPV infection and 1.89 (95% CI=1.03-3.44), for CIN2+. Passive smoking had a slightly increased risk on the hr-HPV infection with adjusted OR 1.11 (1.00-1.24), but no statistical association was observed between passive smoke exposure and CIN2+. Compared with the neither active nor passive smokers, both active and passive smokers had a 1.57-fold (95% CI=1.14-2.15) increased risk of HPV infection and a 1.99-fold (95% CI=1.02-3.88) risk of CIN2+. Our large multi-center cross-sectional study found active smoking could increase the risk of overall hr-HPV infection and CIN2+ adjusted by passive smoking and other factors. Passive smoking mildly increased the risk of HPV infection but not the CIN2+. An interaction existed between passive tobacco exposure and active smoking for hr-HPV infection and the CIN2+. Copyright © 2017. Asian Society of Gynecologic Oncology, Korean Society of Gynecologic Oncology
Pitesky, Maurice; Charlton, Bruce; Bland, Mark; Rolfe, Dan
2013-03-01
Between July 2007 and December 2011, 2660 environmental drag swab samples were collected in total from California layer flocks on behalf of the California Egg Quality Assurance Program (CEQAP), the egg safety rule (21 CFR Parts 16 and 118) of the Food and Drug Administration (FDA), or both. The samples were processed by the California Animal Health and Food Safety Lab, and positive or negative results for Salmonella enterica serovar Enteritidis (SE) were recorded. This study retrospectively compares the differences between the FDA and CEQAP programs with respect to their SE environmental sampling surveillance results. To accomplish this comparison, two different CEQAP (new and old) data sets representing different SE environmental surveillance approaches in the life of the flock were compared against each other and against the FDA's SE environmental testing plan. Significant differences were noted between the CEQAP and FDA programs with respect to the prevalence of SE in the farm environment. Analyses of the prevalence of SE at different stages in the flock's life cycle (chick papers, preproduction, midproduction, postmolt, and premarket) found the highest prevalence of SE in premarket (11.9%), followed by postmolt (3.5%) and midproduction (3.4%), and there was a tie between chick papers and preproduction (2.1%). To assess the main effects of the presence of SE in the farm environment, backwards binary logistic regression was used. Of six independent variables examined (age of flock, year, season, owner, CEQAP membership, and analysis of pooled samples vs. individual swabs), only age of flock, owner, and year were determined to be significant factors in the final model. Although CEQAP membership and pooling vs. individuals swabs were not included in the final model, Pearson chi-square tests did show significantly higher odds of SE for non-CEQAP member farms and higher odds of SE in pooled samples vs. individual swabs.
Morrow, Gary R; Schwartzberg, Lee; Barbour, Sally Y; Ballinari, Gianluca; Thorn, Michael D; Cox, David
2015-01-01
Background No clinical standard currently exists for the optimal management of nausea induced by emetogenic chemotherapy, particularly delayed nausea. Objective To compare the efficacy and safety of palonosetron with older 5-HT3 receptor antagonists (RAs) in preventing chemotherapy-induced nausea. Methods Data were pooled from 4 similarly designed multicenter, randomized, double-blind, clinical trials that compared single intravenous doses of palonosetron 0.25 mg or 0.75 mg with ondansetron 32 mg, dolasetron 100 mg, or granisetron 40 μg/kg, administered 30 minutes before moderately emetogenic chemotherapy (MEC) or highly emetogenic chemotherapy (HEC). Pooled data within each chemotherapy category (MEC: n = 1,132; HEC: n = 1,781) were analyzed by a logistic regression model. Nausea endpoints were complete control rates (ie, no more than mild nausea, no vomiting, and no rescue medication), nausea-free rates, nausea severity, and requirement for rescue antiemetic/antinausea medication over 5 days following chemotherapy. Pooled safety data were summarized descriptively. Results Numerically more palonosetron-treated patients were nausea-free on each day, and fewer had moderate-severe nausea. Similarly, usage of rescue medication was less frequent among palonosetron-treated patients. Complete control rates for palonosetron and older 5-HT3 RAs in the acute phase were 66% vs 63%, 52% vs 42% in the delayed phase (24-120 hours), and 46% vs 37% in the overall phase. The incidence of adverse events was similar for palonosetron and older 5-HT3 RAs. Limitations This post hoc analysis summarized data for palonosetron and several other 5-HT3 RAs but was not powered for statistical comparisons between individual agents. Because nausea is inherently subjective, the reliability of assessments of some aspects (eg, severity) may be influenced by interindividual variability. Conclusion Palonosetron may be more effective than older 5-HT3 RAs in preventing nausea, with comparable tolerability. PMID:25830233
Dietary sugar/starches intake and Barrett's esophagus: a pooled analysis.
Li, Nan; Petrick, Jessica Leigh; Steck, Susan Elizabeth; Bradshaw, Patrick Terrence; McClain, Kathleen Michele; Niehoff, Nicole Michelle; Engel, Lawrence Stuart; Shaheen, Nicholas James; Corley, Douglas Allen; Vaughan, Thomas Leonard; Gammon, Marilie Denise
2017-11-01
Barrett's esophagus (BE) is the key precursor lesion of esophageal adenocarcinoma, a lethal cancer that has increased rapidly in westernized countries over the past four decades. Dietary sugar intake has also been increasing over time, and may be associated with these tumors by promoting hyperinsulinemia. The study goal was to examine multiple measures of sugar/starches intake in association with BE. This pooled analysis included 472 BE cases and 492 controls from two similarly conducted case-control studies in the United States. Dietary intake data, collected by study-specific food frequency questionnaires, were harmonized across studies by linking with the University of Minnesota Nutrient Database, and pooled based on study-specific quartiles. Logistic regression was used to calculate odds ratios (ORs) and 95% confidence intervals (CIs), adjusting for age, sex, race, total energy intake, study indicator, body mass index, frequency of gastro-esophageal reflux, and fruit/vegetable intake. In both studies, intake of sucrose (cases vs. controls, g/day: 36.07 vs. 33.51; 36.80 vs. 35.06, respectively) and added sugar (46.15 vs. 41.01; 44.18 vs. 40.68, respectively) were higher in cases than controls. BE risk was increased 79% and 71%, respectively, for associations comparing the fourth to the first quartile of intake of sucrose (OR Q4vs.Q1 = 1.79, 95% CI = 1.07-3.02, P trend = 0.01) and added sugar (OR Q4vs.Q1 = 1.71, 95% CI = 1.05-2.80, P trend = 0.15). Intake of sweetened desserts/beverages was associated with 71% increase in BE risk (OR Q4vs.Q1 = 1.71, 95% CI = 1.07-2.73, P trend = 0.04). Limiting dietary intake of foods and beverages that are high in added sugar, especially refined table sugar, may reduce the risk of developing BE.
Performance of fish passage structures at upstream barriers to migration
Bunt, C.M.; Castro-Santos, T.; Haro, A.
2012-01-01
Attraction and passage efficiency were reviewed and compared from 19 monitoring studies that produced data for evaluations of pool-and-weir, Denil, vertical-slot and nature-like fishways. Data from 26 species of anadromous and potamodromous fishes from six countries were separated by year and taxonomic family into a matrix with 101 records. Attraction performance was highly variable for the following fishway structures: pool-and-weir (attraction range = 29–100%, mean = 77%, median = 81%), vertical-slot (attraction range = 0–100%, mean = 63%, median = 80%), Denil (attraction range = 21–100%, mean = 61%, median = 57%) and nature-like (attraction range = 0–100%, mean = 48%, median = 50%). Mean passage efficiency was inversely related to mean attraction efficiency by fishway structure type, with the highest passage for nature-like fishways (range = 0–100%, mean = 70%, median = 86%), followed by Denil (range = 0–97%, mean = 51%, median = 38%), vertical-slot (range = 0–100%, mean = 45%, median = 43%) and pool-and-weir (range = 0–100%, mean = 40%, median = 34%). Principal components analysis and logistic regression modelling indicated that variation in fish attraction was driven by biological characteristics of the fish that were studied, whereas variation in fish passage was related to fishway type, slope and elevation change. This meta-analysis revealed that the species of fish monitored and structural design of the fishways have strong implications for both attraction and passage performance, and in most cases, existing data are not sufficient to support design recommendations. Many more fishway evaluations are needed over a range of species, fishway types and configurations to characterize, to optimize and to design new fishways. Furthermore, these studies must be performed in a consistent manner to identify the relative contributions of fish attraction and passage to overall fishway performance at each site.
Multiple Chronic Conditions and Disparities in 30-Day Hospital Readmissions Among Nonelderly Adults.
Basu, Jayasree; Hanchate, Amresh; Koroukian, Siran
2018-05-15
This study examines the patterns of 30-day hospital readmissions by race/ethnicity and multiple chronic conditions (MCC) burden among nonelderly adult patients. We used hospital discharge data of patients in the 18- to 64-year age group in 5 US states, California, Florida, Missouri, New York, and Tennessee, for 2009 from the Healthcare Cost and Utilization Project State Inpatient Database (HCUP-SID) of the Agency for Healthcare Research and Quality, linked to contextual and provider data from the Health Resources and Services Administration. A multilevel logistic regression model was used for data pooled over 5 states, adjusting for patient, hospital, and community characteristics. Controlling for other covariates, the study found that a higher MCC burden was associated with a higher all-cause 30-day readmission risk. We found considerable heterogeneity in levels of readmission risk among racial/ethnic subgroups stratified by chronic conditions. Among patients with the lowest MCC burden, African Americans had the highest risk of readmission, but with a higher MCC burden, the risk of readmission increased most for Hispanics.
Mustanski, Brian; Birkett, Michelle; Greene, George J.; Rosario, Margaret; Bostwick, Wendy; Everett, Bethany G.
2014-01-01
Objectives. We examined the prevalence and associations between behavioral and identity dimensions of sexual orientation among adolescents in the United States, with consideration of differences associated with race/ethnicity, sex, and age. Methods. We used pooled data from 2005 and 2007 Youth Risk Behavior Surveys to estimate prevalence of sexual orientation variables within demographic sub-groups. We used multilevel logistic regression models to test differences in the association between sexual orientation identity and sexual behavior across groups. Results. There was substantial incongruence between behavioral and identity dimensions of sexual orientation, which varied across sex and race/ethnicity. Whereas girls were more likely to identify as bisexual, boys showed a stronger association between same-sex behavior and a bisexual identity. The pattern of association of age with sexual orientation differed between boys and girls. Conclusions. Our results highlight demographic differences between 2 sexual orientation dimensions, and their congruence, among 13- to 18-year-old adolescents. Future research is needed to better understand the implications of such differences, particularly in the realm of health and health disparities. PMID:24328662
Van Pelt, Amelia E.; Quiñones, Beatriz; Lofgren, Hannah L.; Bartz, Faith E.; Newman, Kira L.; Leon, Juan S.
2018-01-01
Foodborne illness burdens individuals around the world and may be caused by consuming fresh produce contaminated with bacterial, parasite, and viral pathogens. Pathogen contamination on produce may originate at the farm and packing facility. This research aimed to determine the prevalence of human pathogens (bacteria, parasites, and viruses) on fresh produce (fruits, herbs, and vegetables) on farms and in packing facilities worldwide through a systematic review of 38 peer-reviewed articles. The median and range of the prevalence was calculated, and Kruskal–Wallis tests and logistic regression were performed to compare prevalence among pooled samples of produce groups, pathogen types, and sampling locations. Results indicated a low median percentage of fresh produce contaminated with pathogens (0%). Both viruses (p-value = 0.017) and parasites (p-value = 0.033), on fresh produce, exhibited higher prevalence than bacteria. No significant differences between fresh produce types or between farm and packing facility were observed. These results may help to better quantify produce contamination in the production environment and inform strategies to prevent future foodborne illness. PMID:29527522
Tucker, Jalie A; Roth, David L; Vignolo, Mary J; Westfall, Andrew O
2009-04-01
Data were pooled from 3 studies of recently resolved community-dwelling problem drinkers to determine whether a behavioral economic index of the value of rewards available over different time horizons distinguished among moderation (n = 30), abstinent (n = 95), and unresolved (n = 77) outcomes. Moderation over 1- to 2-year prospective follow-up intervals was hypothesized to involve longer term behavior regulation processes than abstinence or relapse and to be predicted by more balanced preresolution monetary allocations between short-term and longer term objectives (i.e., drinking and saving for the future). Standardized odds ratios (ORs) based on changes in standard deviation units from a multinomial logistic regression indicated that increases on this "Alcohol-Savings Discretionary Expenditure" index predicted higher rates of abstinence (OR = 1.93, p = .004) and relapse (OR = 2.89, p < .0001) compared with moderation outcomes. The index had incremental utility in predicting moderation in complex models that included other established predictors. The study adds to evidence supporting a behavioral economic analysis of drinking resolutions and shows that a systematic analysis of preresolution spending patterns aids in predicting moderation.
Disanto, Giulio; Hall, Carolina; Lucas, Robyn; Ponsonby, Anne-Louise; Berlanga-Taylor, Antonio J; Giovannoni, Gavin; Ramagopalan, Sreeram V
2013-09-01
Gene-environment interactions may shed light on the mechanisms underlying multiple sclerosis (MS). We pooled data from two case-control studies on incident demyelination and used different methods to assess interaction between HLA-DRB1*15 (DRB1-15) and history of infectious mononucleosis (IM). Individuals exposed to both factors were at substantially increased risk of disease (OR=7.32, 95% CI=4.92-10.90). In logistic regression models, DRB1-15 and IM status were independent predictors of disease while their interaction term was not (DRB1-15*IM: OR=1.35, 95% CI=0.79-2.23). However, interaction on an additive scale was evident (Synergy index=2.09, 95% CI=1.59-2.59; excess risk due to interaction=3.30, 95% CI=0.47-6.12; attributable proportion due to interaction=45%, 95% CI=22-68%). This suggests that, if the additive model is appropriate, DRB1-15 and IM may be involved in the same causal process leading to MS, and highlights the benefit of reporting gene-environment interactions on both a multiplicative and an additive scale.
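The additive-scale measures named in this abstract can be sketched as follows. This is a minimal illustration using the standard definitions of the relative excess risk due to interaction (RERI), the attributable proportion (AP), and the synergy index (S), with odds ratios standing in for relative risks. The function name and the single-exposure odds ratios in the example are hypothetical; the abstract reports only the joint odds ratio and the resulting indices.

```python
def additive_interaction(or_both, or_a_only, or_b_only):
    """Return (RERI, AP, S) from odds ratios taken relative to the
    group exposed to neither factor (hypothetical helper)."""
    reri = or_both - or_a_only - or_b_only + 1.0          # excess risk due to interaction
    ap = reri / or_both                                   # share of joint effect due to interaction
    s = (or_both - 1.0) / ((or_a_only - 1.0) + (or_b_only - 1.0))  # synergy index
    return reri, ap, s

# Consistency check that uses only figures reported in the abstract:
# AP = RERI / OR(both) = 3.30 / 7.32, which is about 0.45.
ap_from_abstract = 3.30 / 7.32

# Hypothetical single-exposure odds ratios, chosen for illustration only.
reri, ap, s = additive_interaction(or_both=6.0, or_a_only=2.0, or_b_only=3.0)
print(f"RERI={reri:.2f}, AP={ap:.2f}, S={s:.2f}")
```

In the hypothetical example the joint odds ratio equals the product of the single-exposure odds ratios (2 × 3 = 6), so there is no interaction on the multiplicative scale, yet RERI is positive. That is the same qualitative pattern the abstract describes.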
Maty, Siobhan C; Leung, Holden; Lau, Christine; Kim, Gemma
2011-06-01
Little is known about the determinants of self-reported general health status among different Asian ethnic subgroups. Using a community-based participatory research approach, we designed, administered, and analyzed a cross-sectional survey of 705 Asians (292 Chinese, 226 Korean, 187 Vietnamese) in the Portland, Oregon region to describe associations between general health status and several sociodemographic and health-related factors in pooled and ethnic-group-stratified samples. Ethnic variation existed in all covariate distributions, except employment, public-service use, language use, health status, visiting healthcare providers, sleep habits, and use of prayer, meditation, yoga or acupuncture. Acculturation measures were strong predictors of poor/fair health in logistic regression models regardless of ethnicity. Ethnic variation in outcome status existed for all remaining covariates. Most health-related research overlooks the heterogeneity within the Asian population. These findings highlight substantial variability in the associations between self-reported general health status and sociodemographic and health-related measures between Asian ethnic groups.
Mustanski, Brian; Birkett, Michelle; Greene, George J; Rosario, Margaret; Bostwick, Wendy; Everett, Bethany G
2014-02-01
We examined the prevalence and associations between behavioral and identity dimensions of sexual orientation among adolescents in the United States, with consideration of differences associated with race/ethnicity, sex, and age. We used pooled data from 2005 and 2007 Youth Risk Behavior Surveys to estimate prevalence of sexual orientation variables within demographic sub-groups. We used multilevel logistic regression models to test differences in the association between sexual orientation identity and sexual behavior across groups. There was substantial incongruence between behavioral and identity dimensions of sexual orientation, which varied across sex and race/ethnicity. Whereas girls were more likely to identify as bisexual, boys showed a stronger association between same-sex behavior and a bisexual identity. The pattern of association of age with sexual orientation differed between boys and girls. Our results highlight demographic differences between 2 sexual orientation dimensions, and their congruence, among 13- to 18-year-old adolescents. Future research is needed to better understand the implications of such differences, particularly in the realm of health and health disparities.
Physical, Mental, and Financial Impacts From Drought in Two California Counties, 2015.
Barreau, Tracy; Conway, David; Haught, Karen; Jackson, Rebecca; Kreutzer, Richard; Lockman, Andrew; Minnick, Sharon; Roisman, Rachel; Rozell, David; Smorodinsky, Svetlana; Tafoya, Dana; Wilken, Jason A
2017-05-01
To evaluate health impacts of drought during the most severe drought in California's recorded history with a rapid assessment method. We conducted Community Assessments for Public Health Emergency Response during October through November 2015 in Tulare County and Mariposa County to evaluate household water access, acute stressors, exacerbations of chronic diseases and behavioral health issues, and financial impacts. We evaluated pairwise associations by logistic regression with pooled data. By assessment area, households reported not having running water (3%-12%); impacts on finances (25%-39%), property (39%-54%), health (10%-20%), and peace of mind (33%-61%); worsening of a chronic disease (16%-46%); acute stress (8%-26%); and considering moving (14%-34%). Impacts on finances or property were each associated with impacts on health and peace of mind, and acute stress. Drought-impacted households might perceive physical and mental health effects and might experience financial or property impacts related to the drought. Public Health Implications. Local jurisdictions should consider implementing drought assistance programs, including behavioral health, and consider rapid assessments to inform public health action.
Bharmal, Nazleen; Kaplan, Robert M; Shapiro, Martin F; Mangione, Carol M; Kagawa-Singer, Marjorie; Wong, Mitchell D; McCarthy, William J
2015-06-01
South Asians are disproportionately impacted by cardiovascular disease (CVD). Our objective was to examine the association between duration of residence in the US and CVD risk factors among South Asian adult immigrants. Multivariate logistic regression analyses using pooled data from the 2005, 2007, 2009 California Health Interview Surveys. Duration of residence in the US < 15 years was significantly associated with overweight/obese BMI (OR 0.59; 95% CI 0.35, 0.98 for 5 to < 10 years), daily consumption of 5+ servings of fruits/vegetables (OR 0.37; 95% CI 0.15, 0.94 for 10 to < 15 years), and sedentary lifestyle (OR 2.11; 95% CI 1.17, 3.81 for 10 to < 15 years) compared with duration of residence ≥ 15 years after adjusting for illness burden, healthcare access, and socio-demographic characteristics. Duration of residence was not significantly associated with other CVD risk factors. Duration of residence is an important correlate of overweight/obesity and other risk factors among South Asian immigrants.
Risk of work injury among adolescent students from single and partnered parent families.
Wong, Imelda S; Breslin, F Curtis
2017-03-01
Parental involvement in keeping their children safe at work has been examined in a handful of studies, with mixed results. Evidence has suggested that non-work injury risk is higher among children from single-parent families, but little is known about their risk for work-related injuries. Five survey cycles of the Canadian Community Health Survey were pooled to create a nationally representative sample of employed 15-19-year old students (N = 16,620). Multivariable logistic regression estimated the association between family status and work injury. Risk of work-related repetitive strains (OR:1.24, 95%CI: 0.69-2.22) did not differ by family type. However, children of single parents were less likely to sustain a work injury receiving immediate medical care (OR:0.43, 95%CI: 0.19-0.96). Despite advantages and disadvantages related to family types, there is no evidence that work-related injury risk among adolescents from single parent families is greater than that of partnered-parent families. Am. J. Ind. Med. 60:285-294, 2017. © 2017 Wiley Periodicals, Inc.
Logistic Mixed Models to Investigate Implicit and Explicit Belief Tracking.
Lages, Martin; Scheel, Anne
2016-01-01
We investigated the proposition of a two-systems Theory of Mind in adults' belief tracking. A sample of N = 45 participants predicted the choice of one of two opponent players after observing several rounds in an animated card game. Three matches of this card game were played and initial gaze direction on target and subsequent choice predictions were recorded for each belief task and participant. We conducted logistic regressions with mixed effects on the binary data and developed Bayesian logistic mixed models to infer implicit and explicit mentalizing in true belief and false belief tasks. Although logistic regressions with mixed effects predicted the data well, a Bayesian logistic mixed model with latent task- and subject-specific parameters gave a better account of the data. As expected, explicit choice predictions suggested a clear understanding of true and false beliefs (TB/FB). Surprisingly, however, model parameters for initial gaze direction also indicated belief tracking. We discuss why task-specific parameters for initial gaze directions are different from choice predictions yet reflect second-order perspective taking.
Model selection for logistic regression models
NASA Astrophysics Data System (ADS)
Duller, Christine
2012-09-01
Model selection for logistic regression models decides which of some given potential regressors have an effect and hence should be included in the final model. The second interesting question is whether a certain factor is heterogeneous among some subsets, i.e. whether the model should include a random intercept or not. In this paper these questions will be answered with classical as well as with Bayesian methods. The applications show some results of recent research projects in medicine and business administration.
Radiomorphometric analysis of frontal sinus for sex determination.
Verma, Saumya; Mahima, V G; Patil, Karthikeya
2014-09-01
Sex determination of unknown individuals carries crucial significance in forensic research, in cases where fragments of skull persist with no likelihood of identification based on dental arch. In these instances sex determination becomes important to rule out a certain number of possibilities instantly and helps in establishing a biological profile of human remains. The aim of the study is to evaluate a mathematical method based on logistic regression analysis capable of ascertaining the sex of individuals in the South Indian population. The study was conducted in the department of Oral Medicine and Radiology. The right and left areas, maximum height, and width of the frontal sinus were determined in 100 Caldwell views of 50 women and 50 men aged 20 years and above, with the help of Vernier callipers and a square grid with 1 square measuring 1 mm² in area. Student's t-test and logistic regression analysis were used. The mean values of variables were greater in men, based on Student's t-test at 5% level of significance. The mathematical model based on logistic regression analysis gave percentage agreement of total area to correctly predict the female gender as 55.2%, of right area as 60.9% and of left area as 55.2%. The areas of the frontal sinus and the logistic regression proved to be unreliable in sex determination. (Logit = 0.924 - 0.00217 × right area).
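The reported equation can be turned into a predicted probability with the inverse logit. The sketch below assumes the right area is expressed in mm² (the abstract mentions a 1 mm² grid) and leaves open which sex is coded as the modelled outcome, since the abstract does not say; the function names are illustrative only.

```python
import math

INTERCEPT = 0.924     # reported in the abstract
SLOPE = -0.00217      # reported coefficient for right area (units assumed to be mm^2)

def frontal_sinus_logit(right_area_mm2):
    """Linear predictor from the abstract: 0.924 - 0.00217 * right area."""
    return INTERCEPT + SLOPE * right_area_mm2

def logit_to_probability(logit):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-logit))

# Area at which the predicted probability is exactly 0.5 (logit = 0).
cutoff_area = -INTERCEPT / SLOPE

for area in (200.0, cutoff_area, 600.0):
    p = logit_to_probability(frontal_sinus_logit(area))
    print(f"right area {area:7.1f} -> predicted probability {p:.3f}")
```

Because the slope is small, the predicted probability changes slowly with area, which is in line with the modest agreement percentages the abstract reports.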
Genetic prediction of type 2 diabetes using deep neural network.
Kim, J; Kim, J; Kwak, M J; Bajaj, M
2018-04-01
Type 2 diabetes (T2DM) has strong heritability but genetic models to explain heritability have been challenging. We tested a deep neural network (DNN) to predict T2DM using the nested case-control study of Nurses' Health Study (3326 females, 45.6% T2DM) and Health Professionals Follow-up Study (2502 males, 46.5% T2DM). We selected 96, 214, 399, and 678 single-nucleotide polymorphisms (SNPs) through Fisher's exact test and L1-penalized logistic regression. We split each dataset randomly in 4:1 to train prediction models and test their performance. DNN and logistic regressions showed better area under the curve (AUC) of ROC curves than the clinical model when 399 or more SNPs were included. DNN was superior to logistic regressions in AUC with 399 or more SNPs in males and 678 SNPs in females. Addition of clinical factors consistently increased AUC of DNN but failed to improve logistic regressions with 214 or more SNPs. In conclusion, we show that DNN can be a versatile tool to predict T2DM incorporating large numbers of SNPs and clinical information. Limitations include a relatively small number of the subjects mostly of European ethnicity. Further studies are warranted to confirm and improve performance of genetic prediction models using DNN in different ethnic groups. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
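Several abstracts in this list compare models by the area under the ROC curve. As a minimal sketch of what that number measures, the function below computes AUC as the probability that a randomly chosen case receives a higher score than a randomly chosen control, counting ties as one half. The labels and scores are made-up toy values, not data from any study above.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative), ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy data for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in sample size and is meant for clarity; a rank-based implementation would be preferable on large datasets.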
Unconditional or Conditional Logistic Regression Model for Age-Matched Case-Control Data?
Kuo, Chia-Ling; Duan, Yinghui; Grady, James
2018-01-01
Matching on demographic variables is commonly used in case-control studies to adjust for confounding at the design stage. There is a presumption that matched data need to be analyzed by matched methods. Conditional logistic regression has become a standard for matched case-control data to tackle the sparse data problem. The sparse data problem, however, may not be a concern for loose-matching data when the matching between cases and controls is not unique, and one case can be matched to other controls without substantially changing the association. Data matched on a few demographic variables are clearly loose-matching data, and we hypothesize that unconditional logistic regression is a proper method to perform. To address the hypothesis, we compare unconditional and conditional logistic regression models by precision in estimates and hypothesis testing using simulated matched case-control data. Our results support our hypothesis; however, the unconditional model is not as robust as the conditional model to the matching distortion that the matching process not only makes cases and controls similar for matching variables but also for the exposure status. When the study design involves other complex features or the computational burden is high, matching in loose-matching data can be ignored for negligible loss in testing and estimation if the distributions of matching variables are not extremely different between cases and controls.
Unconditional or Conditional Logistic Regression Model for Age-Matched Case–Control Data?
Kuo, Chia-Ling; Duan, Yinghui; Grady, James
2018-01-01
Matching on demographic variables is commonly used in case–control studies to adjust for confounding at the design stage. There is a presumption that matched data need to be analyzed by matched methods. Conditional logistic regression has become a standard for matched case–control data to tackle the sparse data problem. The sparse data problem, however, may not be a concern for loose-matching data when the matching between cases and controls is not unique, and one case can be matched to other controls without substantially changing the association. Data matched on a few demographic variables are clearly loose-matching data, and we hypothesize that unconditional logistic regression is a proper method to perform. To address the hypothesis, we compare unconditional and conditional logistic regression models by precision in estimates and hypothesis testing using simulated matched case–control data. Our results support our hypothesis; however, the unconditional model is not as robust as the conditional model to the matching distortion that the matching process not only makes cases and controls similar for matching variables but also for the exposure status. When the study design involves other complex features or the computational burden is high, matching in loose-matching data can be ignored for negligible loss in testing and estimation if the distributions of matching variables are not extremely different between cases and controls. PMID:29552553
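For readers unfamiliar with the conditional model discussed in this abstract, here is a minimal sketch for the simplest case: 1:1 matched pairs with a single binary exposure. Each pair contributes exp(βx_case) / (exp(βx_case) + exp(βx_control)) to the conditional likelihood, so only discordant pairs carry information and the estimate equals the log of their ratio. The pair counts are invented for illustration and the function name is hypothetical.

```python
import math

def fit_conditional_logit_pairs(pairs, iterations=25):
    """Newton-Raphson fit of beta for 1:1 matched pairs.

    pairs: list of (x_case, x_control) exposure values.
    The conditional log-likelihood is sum(log sigmoid(beta * d)),
    with d = x_case - x_control.
    """
    diffs = [xc - xk for xc, xk in pairs]
    beta = 0.0
    for _ in range(iterations):
        grad = 0.0
        info = 0.0
        for d in diffs:
            s = 1.0 / (1.0 + math.exp(-beta * d))
            grad += d * (1.0 - s)          # score contribution
            info += d * d * s * (1.0 - s)  # observed information
        if info == 0.0:
            raise ValueError("no discordant pairs: beta is not estimable")
        beta += grad / info
    return beta

# Invented counts: 30 pairs with only the case exposed, 10 with only the
# control exposed, plus concordant pairs that contribute nothing.
pairs = [(1, 0)] * 30 + [(0, 1)] * 10 + [(1, 1)] * 20 + [(0, 0)] * 40
beta_hat = fit_conditional_logit_pairs(pairs)
closed_form = math.log(30 / 10)  # log ratio of discordant pairs
print(f"beta = {beta_hat:.4f}, odds ratio = {math.exp(beta_hat):.2f}")
```

An unconditional model fitted to the same data would instead estimate an intercept (or matching-variable terms) alongside β, which is the comparison the abstract's simulations address.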
Austin, Peter C
2010-04-22
Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.
Zlotnik, Alexander; Alfaro, Miguel Cuchí; Pérez, María Carmen Pérez; Gallardo-Antolín, Ascensión; Martínez, Juan Manuel Montero
2016-05-01
The usage of decision support tools in emergency departments, based on predictive models, capable of estimating the probability of admission for patients in the emergency department may give nursing staff the possibility of allocating resources in advance. We present a methodology for developing and building one such system for a large specialized care hospital using a logistic regression and an artificial neural network model using nine routinely collected variables available right at the end of the triage process. A database of 255,668 triaged nonobstetric emergency department presentations from the Ramon y Cajal University Hospital of Madrid, from January 2011 to December 2012, was used to develop and test the models, with 66% of the data used for derivation and 34% for validation, with an ordered nonrandom partition. On the validation dataset areas under the receiver operating characteristic curve were 0.8568 (95% confidence interval, 0.8508-0.8583) for the logistic regression model and 0.8575 (95% confidence interval, 0.8540-0.8610) for the artificial neural network model. χ² values for Hosmer-Lemeshow fixed "deciles of risk" were 65.32 for the logistic regression model and 17.28 for the artificial neural network model. A nomogram was generated upon the logistic regression model and an automated software decision support system with a Web interface was built based on the artificial neural network model.
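The Hosmer-Lemeshow statistic quoted in this abstract summarizes calibration by comparing observed and expected event counts within groups of predicted risk. The sketch below is one common variant, assuming equal-size groups formed after sorting by predicted probability; the abstract does not say exactly how its "fixed deciles of risk" were constructed, so this is not necessarily the authors' procedure. The toy inputs are invented.

```python
def hosmer_lemeshow(outcomes, probabilities, groups=10):
    """Hosmer-Lemeshow chi-square using equal-size groups of sorted risk.

    outcomes: 0/1 observed events; probabilities: predicted risks.
    Each group adds (O - E)^2 / (n * pbar * (1 - pbar)).
    """
    order = sorted(range(len(probabilities)), key=lambda i: probabilities[i])
    n = len(order)
    statistic = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        observed = sum(outcomes[i] for i in idx)
        expected = sum(probabilities[i] for i in idx)
        mean_p = expected / len(idx)
        if 0.0 < mean_p < 1.0:  # skip degenerate groups to avoid division by zero
            statistic += (observed - expected) ** 2 / (len(idx) * mean_p * (1.0 - mean_p))
    return statistic

# Invented toy data: predictions of 0.5 with half the outcomes positive
# are perfectly calibrated within each group, so the statistic is zero.
calibrated = hosmer_lemeshow([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5], groups=2)
miscalibrated = hosmer_lemeshow([1, 1, 1, 1], [0.5, 0.5, 0.5, 0.5], groups=2)
print(calibrated, miscalibrated)
```

Larger values indicate worse agreement between predicted and observed risk, which is how the abstract's 65.32 versus 17.28 comparison should be read.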
Product unit neural network models for predicting the growth limits of Listeria monocytogenes.
Valero, A; Hervás, C; García-Gimeno, R M; Zurera, G
2007-08-01
A new approach to predict the growth/no growth interface of Listeria monocytogenes as a function of storage temperature, pH, citric acid (CA) and ascorbic acid (AA) is presented. A linear logistic regression procedure was performed and a non-linear model was obtained by adding new variables by means of a Neural Network model based on Product Units (PUNN). The classification efficiency of the training data set and the generalization data of the new Logistic Regression PUNN model (LRPU) were compared with Linear Logistic Regression (LLR) and Polynomial Logistic Regression (PLR) models. 92% of the total cases from the LRPU model were correctly classified, an improvement on the percentage obtained using the PLR model (90%) and significantly higher than the results obtained with the LLR model, 80%. Moreover, predictions of the LRPU model were closer to the observed data, which permits the design of proper formulations for minimally processed foods. This novel methodology can be applied to predictive microbiology for describing the growth/no growth interface of food-borne microorganisms such as L. monocytogenes. The optimal balance lies in finding models with an acceptable interpretation capacity and a good ability to fit the data at the boundaries of the variable range. The results suggest that these kinds of models might well be a very valuable tool for mathematical modeling.
Lacagnina, Valerio; Leto-Barone, Maria S; La Piana, Simona; Seidita, Aurelio; Pingitore, Giuseppe; Di Lorenzo, Gabriele
2014-01-01
This article uses the logistic regression model for diagnostic decision making in patients with chronic nasal symptoms. We studied the ability of the logistic regression model, obtained by the evaluation of a database, to detect patients with positive allergy skin-prick test (SPT) and patients with negative SPT. The model developed was validated using the data set obtained from another medical institution. The analysis was performed using a database obtained from a questionnaire administered to the patients with nasal symptoms containing personal data, clinical data, and results of allergy testing (SPT). All variables found to be significantly different between patients with positive and negative SPT (p < 0.05) were selected for the logistic regression models and were analyzed with backward stepwise logistic regression, evaluated with area under the curve of the receiver operating characteristic curve. A second set of patients from another institution was used to prove the model. The accuracy of the model in identifying, over the second set, both patients whose SPT will be positive and negative was high. The model detected 96% of patients with nasal symptoms and positive SPT and classified 94% of those with negative SPT. This study is preliminary to the creation of a software that could help the primary care doctors in a diagnostic decision making process (need of allergy testing) in patients complaining of chronic nasal symptoms.
Held, Elizabeth; Cape, Joshua; Tintle, Nathan
2016-01-01
Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
Real, J; Cleries, R; Forné, C; Roso-Llorach, A; Martínez-Sánchez, J M
In medicine and biomedical research, statistical techniques like logistic, linear, Cox and Poisson regression are widely known. The main objective is to describe the evolution of multivariate techniques used in observational studies indexed in PubMed (1970-2013), and to check the requirements of the STROBE guidelines in the author guidelines in Spanish journals indexed in PubMed. A targeted PubMed search was performed to identify papers that used logistic, linear, Cox and Poisson models. Furthermore, a review was also made of the author guidelines of journals published in Spain and indexed in PubMed and Web of Science. Only 6.1% of the indexed manuscripts included a term related to multivariate analysis, increasing from 0.14% in 1980 to 12.3% in 2013. In 2013, 6.7, 2.5, 3.5, and 0.31% of the manuscripts contained terms related to logistic, linear, Cox and Poisson regression, respectively. On the other hand, 12.8% of journals' author guidelines explicitly recommend following the STROBE guidelines, and 35.9% recommend the CONSORT guideline. A low percentage of Spanish scientific journals indexed in PubMed include the STROBE statement requirement in the author guidelines. Multivariate regression models such as logistic, linear, Cox and Poisson regression are increasingly used in published observational studies, both at the international level and in journals published in Spanish. Copyright © 2015 Sociedad Española de Médicos de Atención Primaria (SEMERGEN). Publicado por Elsevier España, S.L.U. All rights reserved.
2011-01-01
Introduction: Necrotizing fasciitis (NF) is a life threatening infectious disease with a high mortality rate. We carried out a microbiological characterization of the causative pathogens. We investigated the correlation of mortality in NF with bloodstream infection and with the presence of co-morbidities. Methods: In this retrospective study, we analyzed 323 patients who presented with necrotizing fasciitis at two different institutions. Bloodstream infection (BSI) was defined as a positive blood culture result. The patients were categorized as survivors and non-survivors. Eleven clinically important variables which were statistically significant by univariate analysis were selected for multivariate regression analysis and a stepwise logistic regression model was developed to determine the association between BSI and mortality. Results: Univariate logistic regression analysis showed that patients with hypotension, heart disease, liver disease, presence of Vibrio spp. in wound cultures, presence of fungus in wound cultures, and presence of Streptococcus group A, Aeromonas spp. or Vibrio spp. in blood cultures, had a significantly higher risk of in-hospital mortality. Our multivariate logistic regression analysis showed a higher risk of mortality in patients with pre-existing conditions like hypotension, heart disease, and liver disease. Multivariate logistic regression analysis also showed that presence of Vibrio spp. in wound cultures, and presence of Streptococcus group A in blood cultures were associated with a high risk of mortality while debridement ≥ 3 was associated with improved survival. Conclusions: Mortality in patients with necrotizing fasciitis was significantly associated with the presence of Vibrio in wound cultures and Streptococcus group A in blood cultures. PMID:21693053
Prediction of siRNA potency using sparse logistic regression.
Hu, Wei; Hu, John
2014-06-01
RNA interference (RNAi) can modulate gene expression at post-transcriptional as well as transcriptional levels. Short interfering RNA (siRNA) serves as a trigger for the RNAi gene inhibition mechanism, and therefore is a crucial intermediate step in RNAi. There have been extensive studies to identify the sequence characteristics of potent siRNAs. One such study built a linear model using LASSO (Least Absolute Shrinkage and Selection Operator) to measure the contribution of each siRNA sequence feature. This model is simple and interpretable, but it requires a large number of nonzero weights. We have introduced a novel technique, sparse logistic regression, to build a linear model using single-position specific nucleotide compositions which has the same prediction accuracy of the linear model based on LASSO. The weights in our new model share the same general trend as those in the previous model, but have only 25 nonzero weights out of a total 84 weights, a 54% reduction compared to the previous model. Contrary to the linear model based on LASSO, our model suggests that only a few positions are influential on the efficacy of the siRNA, which are the 5' and 3' ends and the seed region of siRNA sequences. We also employed sparse logistic regression to build a linear model using dual-position specific nucleotide compositions, a task LASSO is not able to accomplish well due to its high dimensional nature. Our results demonstrate the superiority of sparse logistic regression as a technique for both feature selection and regression over LASSO in the context of siRNA design.
Guo, Huey-Ming; Shyu, Yea-Ing Lotus; Chang, Her-Kun
2006-01-01
In this article, the authors provide an overview of a research method for predicting quality of care in a home health nursing data set. The results of this study can be visualized through classification and regression tree (CART) graphs. The analysis was more effective, and the results more informative, because the home health nursing data set was analyzed with a combination of logistic regression and CART; the two techniques complement each other. The results were also more informative in that more patient characteristics were found to be related to quality of care in home care. The results help home health nurses predict patient outcomes in case management. Improved prediction is needed for interventions to be appropriately targeted for improved patient outcome and quality of care.
A general framework for the use of logistic regression models in meta-analysis.
Simmonds, Mark C; Higgins, Julian Pt
2016-12-01
Where individual participant data are available for every randomised trial in a meta-analysis of dichotomous event outcomes, "one-stage" random-effects logistic regression models have been proposed as a way to analyse these data. Such models can also be used even when individual participant data are not available and we have only summary contingency table data. One benefit of this one-stage regression model over conventional meta-analysis methods is that it maximises the correct binomial likelihood for the data and so does not require the common assumption that effect estimates are normally distributed. A second benefit of using this model is that it may be applied, with only minor modification, in a range of meta-analytic scenarios, including meta-regression, network meta-analyses and meta-analyses of diagnostic test accuracy. This single model can potentially replace the variety of often complex methods used in these areas. This paper considers, with a range of meta-analysis examples, how random-effects logistic regression models may be used in a number of different types of meta-analyses. This one-stage approach is compared with widely used meta-analysis methods including Bayesian network meta-analysis and the bivariate and hierarchical summary receiver operating characteristic (ROC) models for meta-analyses of diagnostic test accuracy. © The Author(s) 2014.
2011-01-01
Background The relationship between asthma and traffic-related pollutants has received considerable attention. The use of individual-level exposure measures, such as residence location or proximity to emission sources, may avoid ecological biases. Method This study focused on the pediatric Medicaid population in Detroit, MI, a high-risk population for asthma-related events. A population-based matched case-control analysis was used to investigate associations between acute asthma outcomes and proximity of residence to major roads, including freeways. Asthma cases were identified as all children who made at least one asthma claim, including inpatient and emergency department visits, during the three-year study period, 2004-06. Individually matched controls were randomly selected from the rest of the Medicaid population on the basis of non-respiratory related illness. We used conditional logistic regression with distance as both categorical and continuous variables, and examined non-linear relationships with distance using polynomial splines. The conditional logistic regression models were then extended by considering multiple asthma states (based on the frequency of acute asthma outcomes) using polychotomous conditional logistic regression. Results Asthma events were associated with proximity to primary roads with an odds ratio of 0.97 (95% CI: 0.94, 0.99) for a 1 km increase in distance using conditional logistic regression, implying that asthma events are less likely as the distance between the residence and a primary road increases. Similar relationships and effect sizes were found using polychotomous conditional logistic regression. Another plausible exposure metric, a reduced form response surface model that represents atmospheric dispersion of pollutants from roads, was not associated under that exposure model. 
Conclusions There is moderately strong evidence of elevated risk of asthma close to major roads based on the results obtained in this population-based matched case-control study. PMID:21513554
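Conditional logistic regression for a matched case-control design, as used in this study, can be sketched for the simplest setting. The sketch below assumes 1:1 matching and a single covariate (the abstract does not state the matching ratio, and the study's models were richer), and the function name `fit_matched_pairs` is hypothetical. With one case and one control per pair, each pair contributes 1 / (1 + exp(-β·d)) to the conditional likelihood, where d is the case value minus the control value; β is found here by Newton's method.

```python
import math
from typing import Sequence, Tuple


def fit_matched_pairs(case_x: Sequence[float], control_x: Sequence[float],
                      iterations: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Return (beta, standard_error) for 1:1 matched conditional logistic regression.

    Assumes one covariate and that the within-pair differences do not all share
    the same sign (otherwise the maximum likelihood estimate is not finite).
    """
    diffs = [c - k for c, k in zip(case_x, control_x)]
    beta = 0.0
    info = 0.0
    for _ in range(iterations):
        probs = [1.0 / (1.0 + math.exp(-beta * d)) for d in diffs]
        # Score (first derivative) and information (negative second derivative)
        score = sum(d * (1.0 - p) for d, p in zip(diffs, probs))
        info = sum(d * d * p * (1.0 - p) for d, p in zip(diffs, probs))
        if info == 0.0:
            raise ValueError("no within-pair variation in the covariate")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)
```

The odds ratio per unit of the covariate is `math.exp(beta)`. For a binary exposure this reduces to the ratio of discordant pairs, so pairs in which case and control share the same exposure contribute nothing to the estimate.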
Neural network modeling for surgical decisions on traumatic brain injury patients.
Li, Y C; Liu, L; Chiu, W T; Jian, W S
2000-01-01
Computerized medical decision support systems have been a major research topic in recent years. Intelligent computer programs were implemented to aid physicians and other medical professionals in making difficult medical decisions. This report compares three different mathematical models for building a traumatic brain injury (TBI) medical decision support system (MDSS). These models were developed based on a large TBI patient database. This MDSS accepts a set of patient data such as the types of skull fracture, Glasgow Coma Scale (GCS), episode of convulsion and returns the chance that a neurosurgeon would recommend an open-skull surgery for this patient. The three mathematical models described in this report are a logistic regression model, a multi-layer perceptron (MLP) neural network and a radial-basis-function (RBF) neural network. Of the 12,640 patients selected from the database, a randomly drawn 9480 cases were used as the training group to develop/train our models. The other 3160 cases were in the validation group which we used to evaluate the performance of these models. We used sensitivity, specificity, areas under receiver-operating characteristics (ROC) curve and calibration curves as the indicators of how accurate these models are in predicting a neurosurgeon's decision on open-skull surgery. The results showed that, assuming equal importance of sensitivity and specificity, the logistic regression model had a (sensitivity, specificity) of (73%, 68%), compared to (80%, 80%) from the RBF model and (88%, 80%) from the MLP model. The resultant areas under ROC curve for logistic regression, RBF and MLP neural networks are 0.761, 0.880 and 0.897, respectively (P < 0.05). Among these models, the logistic regression has noticeably poorer calibration. This study demonstrated the feasibility of applying neural networks as the mechanism for TBI decision support systems based on clinical databases. 
The results also suggest that neural networks may be a better solution for complex, non-linear medical decision support systems than conventional statistical techniques such as logistic regression.
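The (sensitivity, specificity) pairs reported for the three models depend on the probability cut-off used to turn a predicted chance of surgery into a yes/no decision. As a minimal sketch, with a hypothetical function name and an assumed default threshold of 0.5 that is not taken from the report, the two quantities can be computed as follows.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(labels: Sequence[int], probabilities: Sequence[float],
                            threshold: float = 0.5) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when predicting 1 for probability >= threshold."""
    tp = fn = tn = fp = 0
    for y, p in zip(labels, probabilities):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)
```

Lowering the threshold raises sensitivity at the cost of specificity, which is why comparing models at a single operating point requires an assumption such as the equal importance of the two measures.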
Viswanathan, M; Pearl, D L; Taboada, E N; Parmley, E J; Mutschall, S K; Jardine, C M
2017-05-01
Using data collected from a cross-sectional study of 25 farms (eight beef, eight swine and nine dairy) in 2010, we assessed clustering of molecular subtypes of C. jejuni based on a Campylobacter-specific 40 gene comparative genomic fingerprinting assay (CGF40) subtypes, using unweighted pair-group method with arithmetic mean (UPGMA) analysis, and multiple correspondence analysis. Exact logistic regression was used to determine which genes differentiate wildlife and livestock subtypes in our study population. A total of 33 bovine livestock (17 beef and 16 dairy), 26 wildlife (20 raccoon (Procyon lotor), five skunk (Mephitis mephitis) and one mouse (Peromyscus spp.)) C. jejuni isolates were subtyped using CGF40. Dendrogram analysis, based on UPGMA, showed distinct branches separating bovine livestock and mammalian wildlife isolates. Furthermore, two-dimensional multiple correspondence analysis was highly concordant with dendrogram analysis showing clear differentiation between livestock and wildlife CGF40 subtypes. Based on multilevel logistic regression models with a random intercept for farm of origin, we found that isolates in general, and raccoons more specifically, were significantly more likely to be part of the wildlife branch. Exact logistic regression conducted gene by gene revealed 15 genes that were predictive of whether an isolate was of wildlife or bovine livestock isolate origin. Both multiple correspondence analysis and exact logistic regression revealed that in most cases, the presence of a particular gene (13 of 15) was associated with an isolate being of livestock rather than wildlife origin. In conclusion, the evidence gained from dendrogram analysis, multiple correspondence analysis and exact logistic regression indicates that mammalian wildlife carry CGF40 subtypes of C. jejuni distinct from those carried by bovine livestock. Future studies focused on source attribution of C. jejuni in human infections will help determine whether wildlife transmit Campylobacter jejuni directly to humans. © 2016 Blackwell Verlag GmbH.
2012-09-01
3,435 10,461 9.1 3.1 63 Unmarried with Children+ Unmarried without Children 439,495 0.01 10,350 43,870 10.1 2.2 64 Married with Children+ Married ... logistic regression model was used to predict the probability of eligibility for the survey (known eligibility vs. unknown eligibility). A second logistic ... regression model was used to predict the probability of response among eligible sample members (complete response vs. non-response). CHAID (Chi
Habitat features and predictive habitat modeling for the Colorado chipmunk in southern New Mexico
Rivieccio, M.; Thompson, B.C.; Gould, W.R.; Boykin, K.G.
2003-01-01
Two subspecies of Colorado chipmunk (state threatened and federal species of concern) occur in southern New Mexico: Tamias quadrivittatus australis in the Organ Mountains and T. q. oscuraensis in the Oscura Mountains. We developed a GIS model of potentially suitable habitat based on vegetation and elevation features, evaluated site classifications of the GIS model, and determined vegetation and terrain features associated with chipmunk occurrence. We compared GIS model classifications with actual vegetation and elevation features measured at 37 sites. At 60 sites we measured 18 habitat variables regarding slope, aspect, tree species, shrub species, and ground cover. We used logistic regression to analyze habitat variables associated with chipmunk presence/absence. All (100%) 37 sample sites (28 predicted suitable, 9 predicted unsuitable) were classified correctly by the GIS model regarding elevation and vegetation. For 28 sites predicted suitable by the GIS model, 18 sites (64%) appeared visually suitable based on habitat variables selected from logistic regression analyses, of which 10 sites (36%) were specifically predicted as suitable habitat via logistic regression. We detected chipmunks at 70% of sites deemed suitable via the logistic regression models. Shrub cover, tree density, plant proximity, presence of logs, and presence of rock outcrop were retained in the logistic model for the Oscura Mountains; litter, shrub cover, and grass cover were retained in the logistic model for the Organ Mountains. Evaluation of predictive models illustrates the need for multi-stage analyses to best judge performance. Microhabitat analyses indicate prospective needs for different management strategies between the subspecies. Sensitivities of each population of the Colorado chipmunk to natural and prescribed fire suggest that partial burnings of areas inhabited by Colorado chipmunks in southern New Mexico may be beneficial. 
These partial burnings may later help avoid a fire that could substantially reduce habitat of chipmunks over a mountain range.
New England salt marsh pools: A quantitative analysis of geomorphic and geographic features
Adamowicz, S.C.; Roman, C.T.
2005-01-01
New England salt marsh pools provide important wildlife habitat and are the object of on-going salt marsh restoration projects; however, they have not been quantified in terms of their basic geomorphic and geographic traits. An examination of 32 ditched and unditched salt marshes from the Connecticut shore of Long Island Sound to southern Maine, USA, revealed that pools from ditched and unditched marshes had similar average sizes of about 200 m², averaged 29 cm in depth, and were located about 11 m from the nearest tidal flow. Unditched marshes had 3 times the density (13 pools/ha), 2.5 times the pool coverage (83 m pool/km transect), and 4 times the total pool surface area per hectare (913 m² pool/ha salt marsh) of ditched sites. Linear regression analysis demonstrated that an increasing density of ditches (m ditch/ha salt marsh) was negatively correlated with pool density and total pool surface area per hectare. Creek density was positively correlated with these variables. Thus, it was not the mere presence of drainage channels that was associated with low numbers of pools, but their type (ditch versus creek) and abundance. Tidal range was not correlated with pool density or total pool surface area, while marsh latitude had only a weak relationship to total pool surface area per hectare. Pools should be incorporated into salt marsh restoration planning, and the parameters quantified here may be used as initial design targets.
García Lavandeira, José A; Ruano-Ravina, Alberto; Kelsey, Karl T; Torres-Durán, María; Parente-Lamelas, Isaura; Leiro-Fernández, Virginia; Zapata, Maruxa; Abal-Arca, José; Vidal-García, Iria; Montero-Martínez, Carmen; Amenedo, Margarita; Castro-Añón, Olalla; Golpe-Gómez, Antonio; Guzmán-Taveras, Rosirys; Martínez, Cristina; Provencio, Mariano; Mejuto-Martí, María J; García-García, Silvia; Fernández-Villar, Alberto; Piñeiro, María; Barros-Dios, Juan M
2018-06-01
Lung cancer is the deadliest cancer in developed countries but the etiology of lung cancer risk in never smokers (LCRINS) is largely unknown. We aim to assess the effects of alcohol consumption, in its different forms, on LCRINS. We pooled six multi-center case-control studies developed in the northwest of Spain. Case and control groups were composed of never smokers. We selected incident cases with anatomopathologically confirmed lung cancer diagnoses. All participants were personally interviewed. We performed two groups of statistical models, applying unconditional logistic regression with generalized additive models. One considered the effect of alcohol type consumption and the other considered the quantity of each alcoholic beverage consumed. A total of 438 cases and 863 controls were included. Median age was 71 and 66 years, respectively. Adenocarcinoma was the predominant histological type, comprising 66% of all cases. We found that any type of wine consumption posed an OR of 2.20 (95%CI 1.12-4.35), and spirits consumption had an OR of 1.90 (95%CI 1.13-3.23). Beer consumption had an OR of 1.33 (95%CI 0.82-2.14). These results were similar when women were analyzed separately, but for men there was no apparent risk for any alcoholic beverage. The dose-response analysis for each alcoholic beverage revealed no clear pattern. Wine and spirits consumption might increase the risk of LCRINS, particularly in females. These results have to be taken with caution given the limitations of the present study.
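An odds ratio with a 95% confidence interval, the form in which the results above are reported, can be sketched for the simplest unadjusted case of a 2x2 table. This is not the study's method, which used adjusted unconditional logistic regression with generalized additive models; the function name and the example counts below are hypothetical, and the interval is the standard Wald (Woolf) interval on the log odds ratio.

```python
import math
from typing import Tuple


def odds_ratio_ci(exposed_cases: int, unexposed_cases: int,
                  exposed_controls: int, unexposed_controls: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Return (odds_ratio, lower, upper) using a Wald interval on the log scale."""
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study
print(odds_ratio_ci(20, 80, 10, 90))
```

An interval that includes 1, as reported for beer consumption, means the data are compatible with no association at the chosen confidence level.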
Novel genes identified in a high-density genome wide association study for nicotine dependence.
Bierut, Laura Jean; Madden, Pamela A F; Breslau, Naomi; Johnson, Eric O; Hatsukami, Dorothy; Pomerleau, Ovide F; Swan, Gary E; Rutter, Joni; Bertelsen, Sarah; Fox, Louis; Fugman, Douglas; Goate, Alison M; Hinrichs, Anthony L; Konvicka, Karel; Martin, Nicholas G; Montgomery, Grant W; Saccone, Nancy L; Saccone, Scott F; Wang, Jen C; Chase, Gary A; Rice, John P; Ballinger, Dennis G
2007-01-01
Tobacco use is a leading contributor to disability and death worldwide, and genetic factors contribute in part to the development of nicotine dependence. To identify novel genes for which natural variation contributes to the development of nicotine dependence, we performed a comprehensive genome wide association study using nicotine dependent smokers as cases and non-dependent smokers as controls. To allow the efficient, rapid, and cost effective screen of the genome, the study was carried out using a two-stage design. In the first stage, genotyping of over 2.4 million single nucleotide polymorphisms (SNPs) was completed in case and control pools. In the second stage, we selected SNPs for individual genotyping based on the most significant allele frequency differences between cases and controls from the pooled results. Individual genotyping was performed in 1050 cases and 879 controls using 31 960 selected SNPs. The primary analysis, a logistic regression model with covariates of age, gender, genotype and gender by genotype interaction, identified 35 SNPs with P-values less than 10^-4 (minimum P-value 1.53 × 10^-6). Although none of the individual findings is statistically significant after correcting for multiple tests, additional statistical analyses support the existence of true findings in this group. Our study nominates several novel genes, such as Neurexin 1 (NRXN1), in the development of nicotine dependence while also identifying a known candidate gene, the beta3 nicotinic cholinergic receptor. This work anticipates the future directions of large-scale genome wide association studies with state-of-the-art methodological approaches and sharing of data with the scientific community.
Assari, Shervin; Caldwell, Cleopatra Howard
2017-01-01
Adolescence is a developmental period marked by increased stress, especially among Black youth. In addition to stress related to their developmental transition, social factors such as a perceived unsafe neighborhood impose additional risks. We examined gender and ethnic differences in the association between perceived neighborhood safety and major depressive disorder (MDD) among a national sample of Black youth. We used data from the National Survey of American Life - Adolescents (NSAL-A), 2003–2004. In total, 1170 Black adolescents entered the study. This number was composed of 810 African American and 360 Caribbean Black youth (age 13 to 17). Demographic factors, perceived neighborhood safety, and MDD (Composite International Diagnostic Interview, CIDI) were measured. Logistic regressions were used to test the association between neighborhood safety and MDD in the pooled sample, as well as based on ethnicity by gender groups. In the pooled sample of Black youth, those who perceived their neighborhoods to be unsafe were at higher risk of MDD (Odds Ratio [OR] = 1.25; 95% Confidence Interval [CI] = 1.02-1.51). The perception that one’s neighborhood is unsafe was associated with a higher risk of MDD among African American males (OR=1.41; 95% CI = 1.03–1.93) but not African American females or Caribbean Black males and females. In conclusion, perceived neighborhood safety is not a universal psychological determinant of MDD across ethnic by gender groups of Black youth; however, policies and programs that enhance the sense of neighborhood safety may prevent MDD in male African American youth. PMID:28241490
Birth order and Risk of Childhood Cancer: A Pooled Analysis from Five U.S. States
Von Behren, Julie; Spector, Logan G.; Mueller, Beth A.; Carozza, Susan E.; Chow, Eric J.; Fox, Erin E.; Horel, Scott; Johnson, Kimberly J.; McLaughlin, Colleen; Puumala, Susan E.; Ross, Julie A.; Reynolds, Peggy
2010-01-01
The causes of childhood cancers are largely unknown. Birth order has been used as a proxy for prenatal and postnatal exposures, such as frequency of infections and in utero hormone exposures. We investigated the association between birth order and childhood cancers in a pooled case-control dataset. The subjects were drawn from population-based registries of cancers and births in California, Minnesota, New York, Texas, and Washington. We included 17,672 cases less than 15 years of age who were diagnosed from 1980-2004 and 57,966 randomly selected controls born 1970-2004, excluding children with Down syndrome. We calculated odds ratios and 95% confidence intervals using logistic regression, adjusted for sex, birth year, maternal race, maternal age, multiple birth, gestational age, and birth weight. Overall, we found an inverse relationship between childhood cancer risk and birth order. For children in the fourth or higher birth order category compared to first-born children, the adjusted OR was 0.87 (95% CI: 0.81, 0.93) for all cancers combined. When we examined risks by cancer type, a decreasing risk with increasing birth order was seen in the central nervous system (CNS) tumors, neuroblastoma, bilateral retinoblastoma, Wilms tumor, and rhabdomyosarcoma. We observed increased risks with increasing birth order for acute myeloid leukemia but a slight decrease in risk for acute lymphoid leukemia. These risk estimates were based on a very large sample size, which allowed us to examine rare cancer types with greater statistical power than in most previous studies; however, the biologic mechanisms remain to be elucidated. PMID:20715170
Molecular proxies for climate maladaptation in a long-lived tree (Pinus pinaster Aiton, Pinaceae).
Jaramillo-Correa, Juan-Pablo; Rodríguez-Quilón, Isabel; Grivet, Delphine; Lepoittevin, Camille; Sebastiani, Federico; Heuertz, Myriam; Garnier-Géré, Pauline H; Alía, Ricardo; Plomion, Christophe; Vendramin, Giovanni G; González-Martínez, Santiago C
2015-03-01
Understanding adaptive genetic responses to climate change is a main challenge for preserving biological diversity. Successful predictive models for climate-driven range shifts of species depend on the integration of information on adaptation, including that derived from genomic studies. Long-lived forest trees can experience substantial environmental change across generations, which results in a much more prominent adaptation lag than in annual species. Here, we show that candidate-gene SNPs (single nucleotide polymorphisms) can be used as predictors of maladaptation to climate in maritime pine (Pinus pinaster Aiton), an outcrossing long-lived keystone tree. A set of 18 SNPs potentially associated with climate, 5 of them involving amino acid-changing variants, were retained after performing logistic regression, latent factor mixed models, and Bayesian analyses of SNP-climate correlations. These relationships identified temperature as an important adaptive driver in maritime pine and highlighted that selective forces are operating differentially in geographically discrete gene pools. The frequency of the locally advantageous alleles at these selected loci was strongly correlated with survival in a common garden under extreme (hot and dry) climate conditions, which suggests that candidate-gene SNPs can be used to forecast the likely destiny of natural forest ecosystems under climate change scenarios. Differential levels of forest decline are anticipated for distinct maritime pine gene pools. Geographically defined molecular proxies for climate adaptation will thus critically enhance the predictive power of range-shift models and help establish mitigation measures for long-lived keystone forest trees in the face of impending climate change. Copyright © 2015 by the Genetics Society of America.
Ose, Jennifer; Poole, Elizabeth M.; Schock, Helena; Lehtinen, Matti; Arslan, Alan A.; Zeleniuch-Jacquotte, Anne; Visvanathan, Kala; Helzlsouer, Kathy; Buring, Julie E.; Lee, I-Min; Tjønneland, Anne; Dossus, Laure; Trichopoulou, Antonia; Masala, Giovanna; Onland-Moret, N. Charlotte; Weiderpass, Elisabete; Duell, Eric J.; Idahl, Annika; Travis, Ruth C.; Rinaldi, Sabina; Merritt, Melissa A.; Trabert, Britton; Wentzensen, Nicolas; Tworoger, Shelley S.; Kaaks, Rudolf; Fortner, Renée T.
2017-01-01
Invasive epithelial ovarian cancer (EOC) is the most lethal gynecologic malignancy. The etiology of EOC remains elusive; however, experimental and epidemiologic data suggest a role for hormone-related exposures in ovarian carcinogenesis and risk factor differences by histologic phenotypes and developmental pathways. Research on pre-diagnosis androgen concentrations and EOC risk has yielded inconclusive results, and analyses incorporating EOC subtypes are sparse. We conducted a pooled analysis of 7 nested case-control studies in the Ovarian Cancer Cohort Consortium to investigate the association between pre-diagnosis circulating androgens (testosterone, free testosterone, androstenedione, dehydroepiandrosterone sulfate (DHEAS)), sex hormone binding globulin (SHBG), and EOC risk by tumor characteristics (i.e. histology, grade, and stage). The final study population included 1,331 EOC cases and 3,017 matched controls. Multivariable conditional logistic regression was used to assess risk associations in pooled individual data. Testosterone was positively associated with EOC risk (all subtypes combined, Odds Ratio (OR)log2=1.12 [95% Confidence Interval (CI) 1.02–1.24]); other endogenous androgens and SHBG were not associated with overall risk. Higher concentrations of testosterone and androstenedione associated with an increased risk in endometrioid and mucinous tumors (e.g., testosterone, endometrioid tumors, ORlog2=1.40 [1.03–1.91]), but not serous or clear cell. An inverse association was observed between androstenedione and high grade serous tumors (ORlog2=0.76 [0.60–0.96]). Our analyses provide further evidence for a role of hormone-related pathways in EOC risk, with differences in associations between androgens and histologic subtypes of EOC. PMID:28381542
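The odds ratios above carry a log2 label, which is read here as an odds ratio per one-unit increase in log2-transformed hormone concentration, that is, per doubling. That reading is an assumption, as is the log-linear relationship the rescaling below relies on; the function name is hypothetical. Under those assumptions, an odds ratio per doubling can be converted to any other fold change.

```python
import math


def odds_ratio_for_fold_change(or_per_doubling: float, fold_change: float) -> float:
    """Rescale an odds ratio per doubling to the odds ratio for another fold change.

    Assumes the log odds are linear in log2(concentration).
    """
    if or_per_doubling <= 0 or fold_change <= 0:
        raise ValueError("odds ratio and fold change must be positive")
    return or_per_doubling ** math.log2(fold_change)


# Using the reported 1.12 per doubling: a fourfold difference is two doublings
print(round(odds_ratio_for_fold_change(1.12, 4), 4))
```

The confidence limits can be rescaled in the same way, since the transformation is monotone on the log scale.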
Basinas, Ioannis; Schlünssen, Vivi; Heederik, Dick; Sigsgaard, Torben; Smit, Lidwien A M; Samadi, Sadegh; Omland, Oyvind; Hjort, Charlotte; Madsen, Anne Mette; Skov, Simon; Wouters, Inge M
2012-02-01
To test the hypotheses that current endotoxin exposure is inversely associated with allergic sensitisation and positively associated with non-allergic respiratory diseases in four occupationally exposed populations using a standardised analytical approach. Data were pooled from four epidemiological studies including 3883 Dutch and Danish employees in veterinary medicine, agriculture and power plants using biofuel. Endotoxin exposure was estimated by quantitative job-exposure matrices specific for the study populations. Dose-response relationships between exposure, IgE-mediated sensitisation to common allergens and self-reported health symptoms were assessed using logistic regression and generalised additive modelling. Adjustments were made for study, age, sex, atopic predisposition, smoking habit and farm childhood. Heterogeneity was assessed by analysis stratified by study. Current endotoxin exposure was dose-dependently associated with a reduced prevalence of allergic sensitisation (ORs of 0.92, 0.81 and 0.66 for low mediate, high mediate and high exposure) and hay fever (ORs of 1.16, 0.81 and 0.58). Endotoxin exposure was a risk factor for organic dust toxic syndrome, and levels above 100 EU/m(3) significantly increased the risk of chronic bronchitis (p<0.0001). Stratification by farm childhood showed no effect modification except for allergic sensitisation. Only among workers without a farm childhood, endotoxin exposure was inversely associated with allergic sensitisation. Heterogeneity was primarily present for biofuel workers. Occupational endotoxin exposure has a protective effect on allergic sensitisation and hay fever but increases the risk for organic dust toxic syndrome and chronic bronchitis. Endotoxin's protective effects are most clearly observed among agricultural workers.
Bailey, Helen D; Fritschi, Lin; Infante-Rivard, Claire; Glass, Deborah C; Miligi, Lucia; Dockerty, John D; Lightfoot, Tracy; Clavel, Jacqueline; Roman, Eve; Spector, Logan G; Kaatsch, Peter; Metayer, Catherine; Magnani, Corrado; Milne, Elizabeth; Polychronopoulou, Sophia; Simpson, Jill; Rudant, Jérémie; Sidi, Vasiliki; Rondelli, Roberto; Orsi, Laurent; Kang, Alice; Petridou, Eleni; Schüz, Joachim
2014-01-01
Maternal occupational pesticide exposure during pregnancy and/or paternal occupational pesticide exposure around conception have been suggested to increase risk of leukemia in the offspring. With a view to providing insight in this area, we pooled individual level data from 13 case-control studies participating in the Childhood Leukemia International Consortium (CLIC). Occupational data were harmonized to a compatible format. Pooled individual analyses were undertaken using unconditional logistic regression. Using exposure data from mothers of 8,236 cases and 14,850 controls, and from fathers of 8,169 cases and 14,201 controls, the odds ratio (OR) for maternal exposure during pregnancy and the risk of acute lymphoblastic leukemia (ALL) was 1.01 (95% confidence interval (CI) 0.78, 1.30) and for paternal exposure around conception 1.20 (95% CI 1.06, 1.38). For acute myeloid leukemia (AML), the OR for maternal exposure during pregnancy was 1.94 (CI 1.19, 3.18) and for paternal exposure around conception 0.91 (CI 0.66, 1.24), based on data from 1,329 case and 12,141 control mothers, and 1,231 case and 11,383 control fathers. Our finding of a significantly increased risk of AML in the offspring with maternal exposure to pesticides during pregnancy is consistent with previous reports. We also found a slight increase in risk of ALL with paternal exposure around conception, which appeared to be more evident in children diagnosed at the age of five years or more and those with T cell ALL, which raises interesting questions on possible mechanisms. PMID:24700406
Cannioto, Rikki; LaMonte, Michael J.; Risch, Harvey A.; Hong, Chi-Chen; Sucheston-Campbell, Lara E.; Eng, Kevin H.; Szender, J. Brian; Chang-Claude, Jenny; Schmalfeldt, Barbara; Klapdor, Ruediger; Gower, Emily; Minlikeeva, Albina N.; Zirpoli, Gary; Bandera, Elisa V.; Berchuck, Andrew; Cramer, Daniel; Doherty, Jennifer A.; Edwards, Robert P.; Fridley, Brooke L.; Goode, Ellen L.; Goodman, Marc T.; Hogdall, Estrid; Hosono, Satoyo; Jensen, Allan; Jordan, Susan; Kjaer, Susanne K.; Matsuo, Keitaro; Ness, Roberta B.; Olsen, Catherine M.; Olson, Sara H.; Pearce, Celeste Leigh; Pike, Malcolm C.; Rossing, Mary Anne; Szamreta, Elizabeth A.; Thompson, Pamela J.; Tseng, Chiu-Chen; Vierkant, Robert A.; Webb, Penelope M.; Wentzensen, Nicolas; Wicklund, Kristine G.; Winham, Stacey J.; Wu, Anna H.; Modugno, Francesmary; Schildkraut, Joellen M.; Terry, Kathryn L.; Kelemen, Linda E.; Moysich, Kirsten B.
2016-01-01
Background Despite a large body of literature evaluating the association between recreational physical activity and epithelial ovarian cancer (EOC) risk, the extant evidence is inconclusive and little is known about the independent association between recreational physical inactivity and EOC risk. We conducted a pooled analysis of nine studies from the Ovarian Cancer Association Consortium (OCAC) to investigate the association between chronic recreational physical inactivity and EOC risk. Methods In accordance with the 2008 Physical Activity Guidelines for Americans, women reporting no regular, weekly recreational physical activity were classified as inactive. Multivariable logistic regression was utilized to estimate the odds ratios (OR) and 95% confidence intervals (CI) for the association between inactivity and EOC risk overall and by subgroups based upon histotype, menopausal status, race and body mass index (BMI). Results The current analysis included data from 8,309 EOC patients and 12,612 controls. We observed a significant positive association between inactivity and EOC risk (OR=1.34, 95% CI: 1.14-1.57) and similar associations were observed for each histotype. Conclusions In this large pooled analysis examining the association between recreational physical inactivity and EOC risk, we observed consistent evidence of an association between chronic inactivity and all EOC histotypes. Impact These data add to the growing body of evidence suggesting that inactivity is an independent risk factor for cancer. If the apparent association between inactivity and EOC risk is substantiated, additional work via targeted interventions should be pursued to characterize the dose of activity required to mitigate the risk of this highly fatal disease. PMID:27197285
Mortality at older ages and moves in residential and sheltered housing: evidence from the UK.
Robards, James; Evandrou, Maria; Falkingham, Jane; Vlachantoni, Athina
2014-06-01
The study examines the relationship between transitions to residential and sheltered housing and mortality. Past research has focused on housing moves over extended time periods and subsequent mortality. In this paper, annual housing transitions allow the identification of the patterning of housing moves, the duration of stay in each sector and the assessment of the relationship of preceding moves to a heightened risk of dying. The study uses longitudinal data constructed from pooled observations from the British Household Panel Survey (waves 1993-2008). Records were pooled for all cases where the survey member is 65 years or over and living in private housing at baseline and observed at three consecutive time points, including baseline (N=23 727). Binary logistic regression (death as outcome three waves after baseline) explored the relative strength of different housing transitions, controlling for sociodemographic predictors. (1) Transition to residential housing within the previous 12 months was associated with the highest mortality risk. (2) Results support existing findings showing an interaction between marital status and mortality, whereby unmarried persons were more likely to die. (3) Higher male mortality was observed across all housing transitions. An older person's move to residential housing is associated with a higher risk of mortality within 12 months of the move. Survivors living in residential housing for more than a year, show a similar probability of dying to those living in sheltered housing. Results highlight that it is the type of accommodation that affects an older person's mortality risk, and the length of time they spend there.
Molecular Proxies for Climate Maladaptation in a Long-Lived Tree (Pinus pinaster Aiton, Pinaceae)
Jaramillo-Correa, Juan-Pablo; Rodríguez-Quilón, Isabel; Grivet, Delphine; Lepoittevin, Camille; Sebastiani, Federico; Heuertz, Myriam; Garnier-Géré, Pauline H.; Alía, Ricardo; Plomion, Christophe; Vendramin, Giovanni G.; González-Martínez, Santiago C.
2015-01-01
Understanding adaptive genetic responses to climate change is a main challenge for preserving biological diversity. Successful predictive models for climate-driven range shifts of species depend on the integration of information on adaptation, including that derived from genomic studies. Long-lived forest trees can experience substantial environmental change across generations, which results in a much more prominent adaptation lag than in annual species. Here, we show that candidate-gene SNPs (single nucleotide polymorphisms) can be used as predictors of maladaptation to climate in maritime pine (Pinus pinaster Aiton), an outcrossing long-lived keystone tree. A set of 18 SNPs potentially associated with climate, 5 of them involving amino acid-changing variants, were retained after performing logistic regression, latent factor mixed models, and Bayesian analyses of SNP–climate correlations. These relationships identified temperature as an important adaptive driver in maritime pine and highlighted that selective forces are operating differentially in geographically discrete gene pools. The frequency of the locally advantageous alleles at these selected loci was strongly correlated with survival in a common garden under extreme (hot and dry) climate conditions, which suggests that candidate-gene SNPs can be used to forecast the likely destiny of natural forest ecosystems under climate change scenarios. Differential levels of forest decline are anticipated for distinct maritime pine gene pools. Geographically defined molecular proxies for climate adaptation will thus critically enhance the predictive power of range-shift models and help establish mitigation measures for long-lived keystone forest trees in the face of impending climate change. PMID:25549630
Hatt, Laurel E; Waters, Hugh R
2006-01-01
Diarrhea and respiratory infections account for more than two-fifths of all deaths among children under five. Parental education and economic status are well-known risk factors for child morbidity, but little is known about whether education and economic status operate synergistically or independently to influence children's health. Confirming the presence and direction of such interactions is important to better target education and development policies. Our objective is to test for interactions between parental education and economic status in predicting the risk of diarrhea and respiratory illness among children under five, before and after adjusting for key proximate risk factors. We pool 12 Demographic and Health Surveys (DHS) and nine Living Standards Measurement Surveys (LSMS) from Latin America, creating two large databases. Quintiles of economic status are constructed from principal components asset indices. We use logistic regression to analyze episodes of diarrhea and respiratory illness, and interactions between economic quintile and maternal and paternal education are evaluated via likelihood ratio tests. We find that mother's education and quintile interact synergistically in the DHS data, while results are inconclusive in the LSMS data. The effect of increasing maternal education appears to be more protective for children in wealthy families than for children in poor families. Conversely, improvements in economic status reduce health risks more for children whose mothers are better educated. Father's education is protective and operates independently of economic status. Our findings imply that poverty alleviation efforts occurring in concert with programs to educate women and girls will be more effective for improving children's health than either approach alone.
Development and evaluation of a reservoir model for the Chain of Lakes in Illinois
Domanski, Marian M.
2017-01-27
Forecasts of flows entering and leaving the Chain of Lakes reservoir on the Fox River in northeastern Illinois are critical information to water-resource managers who determine the optimal operation of the dam at McHenry, Illinois, to help minimize damages to property and loss of life because of flooding on the Fox River. In 2014, the U.S. Geological Survey; the Illinois Department of Natural Resources, Office of Water Resources; and National Weather Service, North Central River Forecast Center began a cooperative study to develop a system to enable engineers and planners to simulate and communicate flows and to prepare proactively for precipitation events in near real time in the upper Fox River watershed. The purpose of this report is to document the development and evaluation of the Chain of Lakes reservoir model developed in this study. The reservoir model for the Chain of Lakes was developed using the Hydrologic Engineering Center–Reservoir System Simulation program. Because of the complex relation between the dam headwater and reservoir pool elevations, the reservoir model uses a linear regression model that relates dam headwater elevation to reservoir pool elevation. The linear regression model was developed using 17 U.S. Geological Survey streamflow measurements, along with the gage height in the reservoir pool and the gage height at the dam headwater. The Nash-Sutcliffe model efficiency coefficients for all three linear regression model variables ranged from 0.90 to 0.98. The reservoir model performance was evaluated by graphically comparing simulated and observed reservoir pool elevation time series during nine periods of high pool elevation. In addition, the peak elevations during these time periods were graphically compared to the closest-in-time observed pool elevation peak. The mean difference in the simulated and observed peak elevations was -0.03 feet, with a standard deviation of 0.19 feet. The Nash-Sutcliffe coefficient for peak prediction was calculated as 0.94. Evaluation of the model based on accuracy of peak prediction and the ability to simulate an elevation time series showed the performance of the model was satisfactory.
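The Nash-Sutcliffe efficiency cited in this abstract compares the squared simulation error with the variance of the observations around their mean. The following is a minimal sketch of the standard formula; the function name and the elevation values are hypothetical placeholders, not data from the report:

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    1.0 is a perfect match; 0.0 means the simulation is no better than
    predicting the mean of the observations.
    """
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("observed and simulated must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    error_ss = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    total_ss = sum((o - mean_obs) ** 2 for o in observed)
    if total_ss == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - error_ss / total_ss

# Hypothetical pool elevations in feet (illustrative values only).
observed_peaks = [1.0, 2.0, 3.0, 4.0]
perfect_fit = nash_sutcliffe(observed_peaks, [1.0, 2.0, 3.0, 4.0])
mean_only_fit = nash_sutcliffe(observed_peaks, [2.5, 2.5, 2.5, 2.5])
```

Read this way, the reported value of 0.94 for peak prediction means the squared errors in simulated peaks were about 6 percent of the variance of the observed peaks.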
The logistic model for predicting the non-gonoactive Aedes aegypti females.
Reyes-Villanueva, Filiberto; Rodríguez-Pérez, Mario A
2004-01-01
To estimate, using logistic regression, the likelihood of occurrence of a non-gonoactive Aedes aegypti female, previously fed human blood, with relation to body size and collection method. This study was conducted in Monterrey, Mexico, between 1994 and 1996. Ten samplings of 60 mosquitoes of Ae. aegypti females were carried out in three dengue endemic areas: six of biting females, two of emerging mosquitoes, and two of indoor resting females. Gravid females, as well as those with blood in the gut were removed. Mosquitoes were taken to the laboratory and engorged on human blood. After 48 hours, ovaries were dissected to register whether they were gonoactive or non-gonoactive. Wing-length in mm was an indicator for body size. The logistic regression model was used to assess the likelihood of non-gonoactivity, as a binary variable, in relation to wing-length and collection method. Of the 600 females, 164 (27%) remained non-gonoactive, with a wing-length range of 1.9-3.2 mm, almost equal to that of all females (1.8-3.3 mm). The logistic regression model showed a significant likelihood of a female remaining non-gonoactive (Y=1). The collection method did not influence the binary response, but there was an inverse relationship between non-gonoactivity and wing-length. Dengue vector populations from Monterrey, Mexico display a wide-range body size. Logistic regression was a useful tool to estimate the likelihood for an engorged female to remain non-gonoactive. The necessity for a second blood meal is present in any female, but small mosquitoes are more likely to bite again within a 2-day interval, in order to attain egg maturation. The English version of this paper is available too at: http://www.insp.mx/salud/index.html.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kwon, Deukwoo; Little, Mark P.; Miller, Donald L.
Purpose: To determine more accurate regression formulas for estimating peak skin dose (PSD) from reference air kerma (RAK) or kerma-area product (KAP). Methods: After grouping of the data from 21 procedures into 13 clinically similar groups, assessments were made of optimal clustering using the Bayesian information criterion to obtain the optimal linear regressions of (log-transformed) PSD vs RAK, PSD vs KAP, and PSD vs RAK and KAP. Results: Three clusters of clinical groups were optimal in regression of PSD vs RAK, seven clusters of clinical groups were optimal in regression of PSD vs KAP, and six clusters of clinical groups were optimal in regression of PSD vs RAK and KAP. Prediction of PSD using both RAK and KAP is significantly better than prediction of PSD with either RAK or KAP alone. The regression of PSD vs RAK provided better predictions of PSD than the regression of PSD vs KAP. The partial-pooling (clustered) method yields smaller mean squared errors compared with the complete-pooling method. Conclusion: PSD distributions for interventional radiology procedures are log-normal. Estimates of PSD derived from RAK and KAP jointly are most accurate, followed closely by estimates derived from RAK alone. Estimates of PSD derived from KAP alone are the least accurate. Using a stochastic search approach, it is possible to cluster together certain dissimilar types of procedures to minimize the total error sum of squares.
Thorisdottir, Ingibjorg E; Asgeirsdottir, Bryndis B; Sigurvinsdottir, Rannveig; Allegrante, John P; Sigfusdottir, Inga D
2017-10-01
Both research and popular media reports suggest that adolescent mental health has been deteriorating across societies with advanced economies. This study sought to describe the trends in self-reported symptoms of depressed mood and anxiety among Icelandic adolescents. Data for this study come from repeated, cross-sectional, population-based school surveys of 43 482 Icelandic adolescents in 9th and 10th grade, with six waves of pooled data from 2006 to 2016. We used analysis of variance, linear regression and binomial logistic regression to examine trends in symptom scores of anxiety and depressed mood over time. Gender differences in trends of high symptoms were also tested for interactions. Linear regression analysis showed a significant linear increase over the course of the study period in mean symptoms of anxiety and depressed mood for girls only; however, symptoms of anxiety among boys decreased. The proportion of adolescents reporting high depressive symptoms increased by 1.6% for boys and 6.8% for girls; the proportion of those reporting high anxiety symptoms increased by 1.3% for boys and 8.6% for girls. Over the study period, the odds for reporting high depressive symptoms and high anxiety symptoms were significantly higher for both genders. Girls were more likely to report high symptoms of anxiety and depressed mood than boys. Self-reported symptoms of anxiety and depressed mood have increased over time among Icelandic adolescents. Our findings suggest that future research needs to look beyond mean changes and examine the trends among those adolescents who report high symptoms of emotional distress. © The Author 2017. Published by Oxford University Press on behalf of the European Public Health Association. All rights reserved.
Yoon, Hee Mang; Suh, Chong Hyun; Cho, Young Ah; Kim, Jeong Rye; Lee, Jin Seong; Jung, Ah Young; Kim, Jung Heon; Lee, Jeong-Yong; Kim, So Yeon
2018-06-01
To evaluate the diagnostic performance of reduced-dose CT for suspected appendicitis. A systematic search of the MEDLINE and EMBASE databases was carried out through to 10 January 2017. Studies evaluating the diagnostic performance of reduced-dose CT for suspected appendicitis in paediatric and adult patients were selected. Pooled summary estimates of sensitivity and specificity were calculated using hierarchical logistic regression modelling. Meta-regression was performed. Fourteen original articles with a total of 3,262 patients were included. For all studies using reduced-dose CT, the summary sensitivity was 96 % (95 % CI 93-98) with a summary specificity of 94 % (95 % CI 92-95). For the 11 studies providing a head-to-head comparison between reduced-dose CT and standard-dose CT, reduced-dose CT demonstrated a comparable summary sensitivity of 96 % (95 % CI 91-98) and specificity of 94 % (95 % CI 93-96) without any significant differences (p=.41). In meta-regression, there were no significant factors affecting the heterogeneity. The median effective radiation dose of the reduced-dose CT was 1.8 mSv (1.46-4.16 mSv), which was a 78 % reduction in effective radiation dose compared to the standard-dose CT. Reduced-dose CT shows excellent diagnostic performance for suspected appendicitis. • Reduced-dose CT shows excellent diagnostic performance for evaluating suspected appendicitis. • Reduced-dose CT has a comparable diagnostic performance to standard-dose CT. • Median effective radiation dose of reduced-dose CT was 1.8 mSv (1.46-4.16). • Reduced-dose CT achieved a 78 % dose reduction compared to standard-dose CT.
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring
ERIC Educational Resources Information Center
Haberman, Shelby J.; Sinharay, Sandip
2010-01-01
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
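For orientation, a cumulative logit (proportional odds) model assigns a probability to each ordered score category through a set of increasing thresholds, rather than predicting a single numeric score as linear regression does. The sketch below uses one common parameterization, P(Y <= k) = sigmoid(theta_k - eta); the function name, thresholds, and linear predictor are hypothetical and are not taken from the article:

```python
import math

def cumulative_logit_probs(thresholds, eta):
    """Category probabilities under P(Y <= k) = sigmoid(thresholds[k] - eta).

    thresholds must be strictly increasing; eta is the linear predictor
    built from the essay features. Returns len(thresholds) + 1 probabilities.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds] + [1.0]
    # Differences of successive cumulative probabilities give P(Y = k).
    return [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]

# Hypothetical four-category score scale with symmetric thresholds.
score_probs = cumulative_logit_probs([-1.0, 0.0, 1.0], eta=0.0)
predicted_score = max(range(len(score_probs)), key=score_probs.__getitem__)
```

A larger eta shifts probability toward the higher categories, which is how essay features would raise the predicted score in this formulation.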
Cho, Yoo Jin; Thrasher, James F; Swayampakala, Kamala; Yong, Hua-Hie; McKeever, Robert; Hammond, David; Anshari, Dien; Cummings, K Michael; Borland, Ron
2016-01-01
Some researchers have raised concerns that pictorial health warning labels (HWLs) on cigarette packages may lead to message rejection and reduced effectiveness of HWL messages. This study aimed to determine how state reactance (i.e., negative affect due to perceived manipulation) in response to both pictorial and text-only HWLs is associated with other types of HWL responses and with subsequent cessation attempts. Survey data collected every 4 months between September 2013 and 2014 from online panels of adult smokers in Australia, Canada, Mexico, and the US were analyzed. Participants with at least one wave of follow-up were included in the analysis (n = 4,072 smokers; 7,459 observations). Surveys assessed psychological and behavioral responses to HWLs (i.e., attention to HWLs, cognitive elaboration of risks due to HWLs, avoiding HWLs, and forgoing cigarettes because of HWLs) and cessation attempts. Participants then viewed specific HWLs from their countries and were queried about affective state reactance. Logistic and linear Generalized Estimating Equation (GEE) models regressed each of the psychological and behavioral HWL responses on reactance, while controlling for socio-demographic and smoking-related variables. Logistic GEE models also regressed having attempted to quit by the subsequent survey on reactance, each of the psychological and behavioral HWL responses (analyzed separately), and adjustment variables. Data from all countries were initially pooled, with interactions between country and reactance assessed; when interactions were statistically significant, country-stratified models were estimated. Interactions between country and reactance were found in all models that regressed psychological and behavioral HWL responses on study variables. In the US, stronger reactance was associated with more frequent reading of HWLs and thinking about health risks. Smokers from all four countries with stronger reactance reported greater likelihood of avoiding warnings and forgoing cigarettes due to warnings, although the association appeared stronger in the US. Both stronger HWL responses and reactance were positively associated with subsequent cessation attempts, with no significant interaction between country and reactance. Reactance towards HWLs does not appear to interfere with quitting, which is consistent with its being an indicator of concern, not a systematic effort to avoid HWL message engagement.
Generalized and synthetic regression estimators for randomized branch sampling
David L. R. Affleck; Timothy G. Gregoire
2015-01-01
In felled-tree studies, ratio and regression estimators are commonly used to convert more readily measured branch characteristics to dry crown mass estimates. In some cases, data from multiple trees are pooled to form these estimates. This research evaluates the utility of both tactics in the estimation of crown biomass following randomized branch sampling (...
Ardoino, Ilaria; Lanzoni, Monica; Marano, Giuseppe; Boracchi, Patrizia; Sagrini, Elisabetta; Gianstefani, Alice; Piscaglia, Fabio; Biganzoli, Elia M
2017-04-01
The interpretation of regression models results can often benefit from the generation of nomograms, 'user friendly' graphical devices especially useful for assisting the decision-making processes. However, in the case of multinomial regression models, whenever categorical responses with more than two classes are involved, nomograms cannot be drawn in the conventional way. Such a difficulty in managing and interpreting the outcome could often result in a limitation of the use of multinomial regression in decision-making support. In the present paper, we illustrate the derivation of a non-conventional nomogram for multinomial regression models, intended to overcome this issue. Although it may appear less straightforward at first sight, the proposed methodology allows an easy interpretation of the results of multinomial regression models and makes them more accessible for clinicians and general practitioners too. Development of prediction model based on multinomial logistic regression and of the pertinent graphical tool is illustrated by means of an example involving the prediction of the extent of liver fibrosis in hepatitis C patients by routinely available markers.
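One reason a conventional nomogram does not carry over directly is that a multinomial logit model has a separate linear predictor for each non-reference class, and the class probabilities depend on all of them jointly. A minimal sketch of that mapping follows, assuming the usual baseline-category parameterization with the reference class fixed at zero; the function name and predictor values are hypothetical and unrelated to the liver fibrosis example:

```python
import math

def multinomial_probs(linear_predictors):
    """Class probabilities for a baseline-category multinomial logit model.

    linear_predictors holds one value per non-reference class; the reference
    class has linear predictor 0. Returns probabilities with the reference
    class first.
    """
    etas = [0.0] + list(linear_predictors)
    largest = max(etas)                      # subtract the max for numerical stability
    weights = [math.exp(e - largest) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical three-class outcome: all linear predictors equal, so classes are equiprobable.
equal_probs = multinomial_probs([0.0, 0.0])
# Raising one predictor increases that class's probability at the expense of the others.
tilted_probs = multinomial_probs([1.5, 0.0])
```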
Regularization Paths for Conditional Logistic Regression: The clogitL1 Package.
Reid, Stephen; Tibshirani, Rob
2014-07-01
We apply the cyclic coordinate descent algorithm of Friedman, Hastie, and Tibshirani (2010) to the fitting of a conditional logistic regression model with lasso (ℓ1) and elastic net penalties. The sequential strong rules of Tibshirani, Bien, Hastie, Friedman, Taylor, Simon, and Tibshirani (2012) are also used in the algorithm and it is shown that these offer a considerable speed up over the standard coordinate descent algorithm with warm starts. Once implemented, the algorithm is used in simulation studies to compare the variable selection and prediction performance of the conditional logistic regression model against that of its unconditional (standard) counterpart. We find that the conditional model performs admirably on datasets drawn from a suitable conditional distribution, outperforming its unconditional counterpart at variable selection. The conditional model is also fit to a small real world dataset, demonstrating how we obtain regularization paths for the parameters of the model and how we apply cross validation for this method where natural unconditional prediction rules are hard to come by.
Computational tools for exact conditional logistic regression.
Corcoran, C; Mehta, C; Patel, N; Senchaudhuri, P
Logistic regression analyses are often challenged by the inability of unconditional likelihood-based approximations to yield consistent, valid estimates and p-values for model parameters. This can be due to sparseness or separability in the data. Conditional logistic regression, though useful in such situations, can also be computationally unfeasible when the sample size or number of explanatory covariates is large. We review recent developments that allow efficient approximate conditional inference, including Monte Carlo sampling and saddlepoint approximations. We demonstrate through real examples that these methods enable the analysis of significantly larger and more complex data sets. We find in this investigation that for these moderately large data sets Monte Carlo seems a better alternative, as it provides unbiased estimates of the exact results and can be executed in less CPU time than can the single saddlepoint approximation. Moreover, the double saddlepoint approximation, while computationally the easiest to obtain, offers little practical advantage. It produces unreliable results and cannot be computed when a maximum likelihood solution does not exist. Copyright 2001 John Wiley & Sons, Ltd.
Regularization Paths for Conditional Logistic Regression: The clogitL1 Package
Reid, Stephen; Tibshirani, Rob
2014-01-01
We apply the cyclic coordinate descent algorithm of Friedman, Hastie, and Tibshirani (2010) to the fitting of a conditional logistic regression model with lasso (ℓ1) and elastic net penalties. The sequential strong rules of Tibshirani, Bien, Hastie, Friedman, Taylor, Simon, and Tibshirani (2012) are also used in the algorithm and it is shown that these offer a considerable speed up over the standard coordinate descent algorithm with warm starts. Once implemented, the algorithm is used in simulation studies to compare the variable selection and prediction performance of the conditional logistic regression model against that of its unconditional (standard) counterpart. We find that the conditional model performs admirably on datasets drawn from a suitable conditional distribution, outperforming its unconditional counterpart at variable selection. The conditional model is also fit to a small real world dataset, demonstrating how we obtain regularization paths for the parameters of the model and how we apply cross validation for this method where natural unconditional prediction rules are hard to come by. PMID:26257587
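To make the objective concrete, the sketch below evaluates a lasso-penalized negative conditional log-likelihood for matched sets. It is a simplified illustration under stated assumptions: exactly one case per stratum, a plain ℓ1 penalty with weight `lam`, and hypothetical function and variable names. It is not the clogitL1 package interface, it does not implement the coordinate descent or strong rules described above, and the package's penalty scaling may differ.

```python
import math

def penalized_neg_cond_loglik(strata, beta, lam):
    """Lasso-penalized negative conditional log-likelihood, one case per stratum.

    strata: list of (case_covariates, list_of_control_covariates) pairs.
    Each stratum contributes x_case.beta - log(sum_j exp(x_j.beta)),
    where j runs over the case and all controls in that stratum.
    """
    def dot(x, b):
        return sum(xi * bi for xi, bi in zip(x, b))

    loglik = 0.0
    for case_x, control_xs in strata:
        scores = [dot(x, beta) for x in [case_x] + list(control_xs)]
        largest = max(scores)                # log-sum-exp with max subtraction for stability
        log_denominator = largest + math.log(sum(math.exp(s - largest) for s in scores))
        loglik += scores[0] - log_denominator
    return -loglik + lam * sum(abs(b) for b in beta)

# Hypothetical 1:1 matched pairs with a single covariate.
example_strata = [([1.0], [[0.0]]), ([2.0], [[1.5]])]
objective_at_zero = penalized_neg_cond_loglik(example_strata, [0.0], lam=0.1)
objective_at_one = penalized_neg_cond_loglik(example_strata, [1.0], lam=0.1)
```

Because the stratum-specific intercepts cancel in each term, the objective depends only on within-stratum covariate contrasts, which is why matched designs are fit this way rather than with the unconditional likelihood.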
Ordinal logistic regression analysis on the nutritional status of children in KarangKitri village
NASA Astrophysics Data System (ADS)
Ohyver, Margaretha; Yongharto, Kimmy Octavian
2015-09-01
Ordinal logistic regression is a statistical technique that can be used to describe the relationship between an ordinal response variable and one or more independent variables. This method has been used in various fields, including health. In this research, ordinal logistic regression is used to describe the relationship between the nutritional status of children and age, gender, height, and family status. Nutritional status is divided into four categories: over nutrition, well nutrition, less nutrition, and malnutrition. The purpose of this research is to describe the characteristics of children in KarangKitri village and to determine the factors that influence their nutritional status. The research yields three findings. First, some children are still not categorized as having well nutritional status. Second, some children from families with a sufficient economic level do not have normal nutritional status. Third, the factors that affect the nutritional level of children are age, family status, and height.
Chen, Chau-Kuang; Bruce, Michelle; Tyler, Lauren; Brown, Claudine; Garrett, Angelica; Goggins, Susan; Lewis-Polite, Brandy; Weriwoh, Mirabel L; Juarez, Paul D.; Hood, Darryl B.; Skelton, Tyler
2014-01-01
The goal of this study was to analyze a 54-item instrument for assessment of perception of exposure to environmental contaminants within the context of the built environment, or exposome. This exposome was defined in five domains to include 1) home and hobby, 2) school, 3) community, 4) occupation, and 5) exposure history. Interviews were conducted with child-bearing-age minority women at Metro Nashville General Hospital at Meharry Medical College. Data were analyzed utilizing DTReg software for Support Vector Machine (SVM) modeling followed by an SPSS package for a logistic regression model. The target (outcome) variable of interest was respondent's residence by ZIP code. The results demonstrate that the rank order of important variables with respect to SVM modeling versus traditional logistic regression models is almost identical. This is the first study documenting that SVM analysis has discriminate power for determination of higher-ordered spatial relationships on an environmental exposure history questionnaire. PMID:23395953
An ultra low power feature extraction and classification system for wearable seizure detection.
Page, Adam; Pramod Tim Oates, Siddharth; Mohsenin, Tinoosh
2015-01-01
In this paper we explore the use of a variety of machine learning algorithms for designing a reliable and low-power, multi-channel EEG feature extractor and classifier for predicting seizures from electroencephalographic data (scalp EEG). Different machine learning classifiers including k-nearest neighbor, support vector machines, naïve Bayes, logistic regression, and neural networks are explored with the goal of maximizing detection accuracy while minimizing power, area, and latency. The input to each machine learning classifier is a 198-feature vector containing 9 features for each of the 22 EEG channels, obtained over 1-second windows. All classifiers were able to obtain F1 scores over 80% and onset sensitivity of 100% when tested on 10 patients. Among the five classifiers explored, logistic regression (LR) proved to have minimum hardware complexity while providing an average F1 score of 91%. Both ASIC and FPGA implementations of logistic regression are presented and show the smallest area, power consumption, and latency when compared to previous work.
The arcsine is asinine: the analysis of proportions in ecology.
Warton, David I; Hui, Francis K C
2011-01-01
The arcsine square root transformation has long been standard procedure when analyzing proportional data in ecology, with applications in data sets containing binomial and non-binomial response variables. Here, we argue that the arcsine transform should not be used in either circumstance. For binomial data, logistic regression has greater interpretability and higher power than analyses of transformed data. However, it is important to check the data for additional unexplained variation, i.e., overdispersion, and to account for it via the inclusion of random effects in the model if found. For non-binomial data, the arcsine transform is undesirable on the grounds of interpretability, and because it can produce nonsensical predictions. The logit transformation is proposed as an alternative approach to address these issues. Examples are presented in both cases to illustrate these advantages, comparing various methods of analyzing proportions including untransformed, arcsine- and logit-transformed linear models and logistic regression (with or without random effects). Simulations demonstrate that logistic regression usually provides a gain in power over other methods.
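To make the two transformations concrete, here is a minimal sketch of the arcsine square-root and logit transforms of a proportion. The function names are hypothetical, and the small offset `eps` for proportions of exactly 0 or 1 is an assumption for illustration rather than a value taken from the abstract.

```python
import math


def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))


def logit(p, eps=0.0):
    """Log-odds transform; eps is an assumed small offset for p = 0 or p = 1."""
    return math.log((p + eps) / (1.0 - p + eps))


def inv_logit(x):
    """Back-transform from the logit scale; always lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


# Round trip on an illustrative proportion.
p_example = 0.2
p_roundtrip = inv_logit(logit(p_example))
```

Any finite value on the logit scale back-transforms to a proportion strictly between 0 and 1, which is one way to read the interpretability argument made above. With `eps=0.0`, `logit` raises an error at exactly 0 or 1, so boundary proportions need a non-zero offset.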
Chen, Chau-Kuang; Bruce, Michelle; Tyler, Lauren; Brown, Claudine; Garrett, Angelica; Goggins, Susan; Lewis-Polite, Brandy; Weriwoh, Mirabel L; Juarez, Paul D; Hood, Darryl B; Skelton, Tyler
2013-02-01
The goal of this study was to analyze a 54-item instrument for assessment of perception of exposure to environmental contaminants within the context of the built environment, or exposome. This exposome was defined in five domains to include 1) home and hobby, 2) school, 3) community, 4) occupation, and 5) exposure history. Interviews were conducted with child-bearing-age minority women at Metro Nashville General Hospital at Meharry Medical College. Data were analyzed utilizing DTReg software for Support Vector Machine (SVM) modeling followed by an SPSS package for a logistic regression model. The target (outcome) variable of interest was respondent's residence by ZIP code. The results demonstrate that the rank order of important variables with respect to SVM modeling versus traditional logistic regression models is almost identical. This is the first study documenting that SVM analysis has discriminate power for determination of higher-ordered spatial relationships on an environmental exposure history questionnaire.
Avalos, Marta; Adroher, Nuria Duran; Lagarde, Emmanuel; Thiessard, Frantz; Grandvalet, Yves; Contrand, Benjamin; Orriols, Ludivine
2012-09-01
Large data sets with many variables provide particular challenges when constructing analytic models. Lasso-related methods provide a useful tool, although one that remains unfamiliar to most epidemiologists. We illustrate the application of lasso methods in an analysis of the impact of prescribed drugs on the risk of a road traffic crash, using a large French nationwide database (PLoS Med 2010;7:e1000366). In the original case-control study, the authors analyzed each exposure separately. We use the lasso method, which can simultaneously perform estimation and variable selection in a single model. We compare point estimates and confidence intervals using (1) a separate logistic regression model for each drug with a Bonferroni correction and (2) lasso shrinkage logistic regression analysis. Shrinkage regression had little effect on (bias corrected) point estimates, but led to less conservative results, noticeably for drugs with moderate levels of exposure. Carbamates, carboxamide derivative and fatty acid derivative antiepileptics, drugs used in opioid dependence, and mineral supplements of potassium showed stronger associations. Lasso is a relevant method in the analysis of databases with large number of exposures and can be recommended as an alternative to conventional strategies.
NASA Astrophysics Data System (ADS)
Shafizadeh-Moghadam, Hossein; Helbich, Marco
2015-03-01
The rapid growth of megacities requires special attention among urban planners worldwide, and particularly in Mumbai, India, where growth is very pronounced. To cope with the planning challenges this will bring, developing a retrospective understanding of urban land-use dynamics and the underlying driving-forces behind urban growth is a key prerequisite. This research uses regression-based land-use change models - and in particular non-spatial logistic regression models (LR) and auto-logistic regression models (ALR) - for the Mumbai region over the period 1973-2010, in order to determine the drivers behind spatiotemporal urban expansion. Both global models are complemented by a local, spatial model, the so-called geographically weighted logistic regression (GWLR) model, one that explicitly permits variations in driving-forces across space. The study comes to two main conclusions. First, both global models suggest similar driving-forces behind urban growth over time, revealing that LRs and ALRs result in estimated coefficients with comparable magnitudes. Second, all the local coefficients show distinctive temporal and spatial variations. It is therefore concluded that GWLR aids our understanding of urban growth processes, and so can assist context-related planning and policymaking activities when seeking to secure a sustainable urban future.
Can Predictive Modeling Identify Head and Neck Oncology Patients at Risk for Readmission?
Manning, Amy M; Casper, Keith A; Peter, Kay St; Wilson, Keith M; Mark, Jonathan R; Collar, Ryan M
2018-05-01
Objective: Unplanned readmission within 30 days is a contributor to health care costs in the United States. The use of predictive modeling during hospitalization to identify patients at risk for readmission offers a novel approach to quality improvement and cost reduction. Study Design: Two-phase study including retrospective analysis of prospectively collected data followed by prospective longitudinal study. Setting: Tertiary academic medical center. Subjects and Methods: Prospectively collected data for patients undergoing surgical treatment for head and neck cancer from January 2013 to January 2015 were used to build predictive models for readmission within 30 days of discharge using logistic regression, classification and regression tree (CART) analysis, and random forests. One model (logistic regression) was then placed prospectively into the discharge workflow from March 2016 to May 2016 to determine the model's ability to predict which patients would be readmitted within 30 days. Results: In total, 174 admissions had descriptive data. Thirty-two were excluded due to incomplete data. Logistic regression, CART, and random forest predictive models were constructed using the remaining 142 admissions. When applied to 106 consecutive prospective head and neck oncology patients at the time of discharge, the logistic regression model predicted readmissions with a specificity of 94%, a sensitivity of 47%, a negative predictive value of 90%, and a positive predictive value of 62% (odds ratio, 14.9; 95% confidence interval, 4.02-55.45). Conclusion: Prospectively collected head and neck cancer databases can be used to develop predictive models that can accurately predict which patients will be readmitted. This offers valuable support for quality improvement initiatives and readmission-related cost reduction in head and neck cancer care.
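The reported performance measures all derive from a 2x2 table of predicted versus observed readmissions. The sketch below shows the arithmetic. The cell counts are an assumption: the abstract does not report them, and they were chosen only because they sum to 106 patients and are consistent with the rounded percentages and odds ratio quoted above. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard 2x2 diagnostic summaries from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),        # readmitted patients correctly flagged
        "specificity": tn / (tn + fp),        # non-readmitted patients correctly cleared
        "ppv": tp / (tp + fp),                # flagged patients who were readmitted
        "npv": tn / (tn + fn),                # cleared patients who were not readmitted
        "odds_ratio": (tp * tn) / (fn * fp),  # cross-product ratio of the table
    }


# Assumed counts (not from the source): 8 + 9 + 5 + 84 = 106 patients.
metrics = diagnostic_metrics(tp=8, fn=9, fp=5, tn=84)
```

Under these assumed counts, high specificity combined with modest sensitivity means the model rarely raises a false alarm but misses about half of the patients who are readmitted.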
Roland, Lauren T.; Kallogjeri, Dorina; Sinks, Belinda C.; Rauch, Steven D.; Shepard, Neil T.; White, Judith A.; Goebel, Joel A.
2015-01-01
Objective: Test performance of a focused dizziness questionnaire’s ability to discriminate between peripheral and non-peripheral causes of vertigo. Study Design: Prospective multi-center. Setting: Four academic centers with experienced balance specialists. Patients: New dizzy patients. Interventions: A 32-question survey was given to participants. Balance specialists were blinded and a diagnosis was established for all participating patients within 6 months. Main outcomes: Multinomial logistic regression was used to evaluate questionnaire performance in predicting final diagnosis and differentiating between peripheral and non-peripheral vertigo. Univariate and multivariable stepwise logistic regression were used to identify questions as significant predictors of the ultimate diagnosis. C-index was used to evaluate performance and discriminative power of the multivariable models. Results: 437 patients participated in the study. Eight participants without confirmed diagnoses were excluded and 429 were included in the analysis. Multinomial regression revealed that the model had good overall predictive accuracy of 78.5% for the final diagnosis and 75.5% for differentiating between peripheral and non-peripheral vertigo. Univariate logistic regression identified significant predictors of three main categories of vertigo: peripheral, central and other. Predictors were entered into forward stepwise multivariable logistic regression. The discriminative power of the final models for peripheral, central and other causes was considered good as measured by c-indices of 0.75, 0.7 and 0.78, respectively. Conclusions: This multicenter study demonstrates a focused dizziness questionnaire can accurately predict diagnosis for patients with chronic/relapsing dizziness referred to outpatient clinics. Additionally, this survey has significant capability to differentiate peripheral from non-peripheral causes of vertigo and may, in the future, serve as a screening tool for specialty referral. Clinical utility of this questionnaire to guide specialty referral is discussed. PMID:26485598
Roland, Lauren T; Kallogjeri, Dorina; Sinks, Belinda C; Rauch, Steven D; Shepard, Neil T; White, Judith A; Goebel, Joel A
2015-12-01
Test performance of a focused dizziness questionnaire's ability to discriminate between peripheral and nonperipheral causes of vertigo. Prospective multicenter. Four academic centers with experienced balance specialists. New dizzy patients. A 32-question survey was given to participants. Balance specialists were blinded and a diagnosis was established for all participating patients within 6 months. Multinomial logistic regression was used to evaluate questionnaire performance in predicting final diagnosis and differentiating between peripheral and nonperipheral vertigo. Univariate and multivariable stepwise logistic regression were used to identify questions as significant predictors of the ultimate diagnosis. C-index was used to evaluate performance and discriminative power of the multivariable models. In total, 437 patients participated in the study. Eight participants without confirmed diagnoses were excluded and 429 were included in the analysis. Multinomial regression revealed that the model had good overall predictive accuracy of 78.5% for the final diagnosis and 75.5% for differentiating between peripheral and nonperipheral vertigo. Univariate logistic regression identified significant predictors of three main categories of vertigo: peripheral, central, and other. Predictors were entered into forward stepwise multivariable logistic regression. The discriminative power of the final models for peripheral, central, and other causes was considered good as measured by c-indices of 0.75, 0.7, and 0.78, respectively. This multicenter study demonstrates a focused dizziness questionnaire can accurately predict diagnosis for patients with chronic/relapsing dizziness referred to outpatient clinics. Additionally, this survey has significant capability to differentiate peripheral from nonperipheral causes of vertigo and may, in the future, serve as a screening tool for specialty referral. Clinical utility of this questionnaire to guide specialty referral is discussed.
Prediction of cold and heat patterns using anthropometric measures based on machine learning.
Lee, Bum Ju; Lee, Jae Chul; Nam, Jiho; Kim, Jong Yeol
2018-01-01
To examine the association of body shape with cold and heat patterns, to determine which anthropometric measure is the best indicator for discriminating between the two patterns, and to investigate whether using a combination of measures can improve the predictive power to diagnose these patterns. Based on a total of 4,859 subjects (3,000 women and 1,859 men), statistical analyses using binary logistic regression were performed to assess the significance of the difference and the predictive power of each anthropometric measure, and binary logistic regression and Naive Bayes with the variable selection technique were used to assess the improvement in the predictive power of the patterns using the combined measures. In women, the strongest indicators for determining the cold and heat patterns among anthropometric measures were body mass index (BMI) and rib circumference; in men, the best indicator was BMI. In experiments using a combination of measures, the values of the area under the receiver operating characteristic curve in women were 0.776 by Naive Bayes and 0.772 by logistic regression, and the values in men were 0.788 by Naive Bayes and 0.779 by logistic regression. Individuals with a higher BMI have a tendency toward a heat pattern in both women and men. The use of a combination of anthropometric measures can slightly improve the diagnostic accuracy. Our findings can provide fundamental information for the diagnosis of cold and heat patterns based on body shape for personalized medicine.
Teng, Ju-Hsi; Lin, Kuan-Chia; Ho, Bin-Shenq
2007-10-01
A community-based aboriginal study was conducted and analysed to explore the application of classification tree and logistic regression. A total of 1066 aboriginal residents in Yilan County were screened during 2003-2004. The independent variables include demographic characteristics, physical examinations, geographic location, health behaviours, dietary habits and family hereditary diseases history. Risk factors of cardiovascular diseases were selected as the dependent variables in further analysis. The completion rate for the health interview was 88.9%. The classification tree results show that if body mass index is higher than 25.72 kg/m² and the age is above 51 years, the predicted probability for number of cardiovascular risk factors ≥3 is 73.6% and the population is 322. If body mass index is higher than 26.35 kg/m² and geographical latitude of the village is lower than 24°22.8', the predicted probability for number of cardiovascular risk factors ≥4 is 60.8% and the population is 74. The logistic regression results indicate that body mass index, drinking habit and menopause are the top three significant independent variables. The classification tree model specifically shows the discrimination paths and interactions between the risk groups. The logistic regression model presents and analyses the statistically independent factors of cardiovascular risks. Applying both models to specific situations will provide a different angle for the design and management of future health intervention plans after community-based study.
Gong, Xu; Cui, Jianli; Jiang, Ziping; Lu, Laijin; Li, Xiucun
2018-03-01
Few clinical retrospective studies have reported the risk factors of pedicled flap necrosis in hand soft tissue reconstruction. The aim of this study was to identify non-technical risk factors associated with pedicled flap perioperative necrosis in hand soft tissue reconstruction via a multivariate logistic regression analysis. For patients with hand soft tissue reconstruction, we carefully reviewed hospital records and identified 163 patients who met the inclusion criteria. The characteristics of these patients, flap transfer procedures and postoperative complications were recorded. Eleven predictors were identified. The correlations between pedicled flap necrosis and risk factors were analysed using a logistic regression model. Of 163 skin flaps, 125 flaps survived completely without any complications. The pedicled flap necrosis rate in hands was 11.04%, which included partial flap necrosis (7.36%) and total flap necrosis (3.68%). Soft tissue defects in fingers were noted in 68.10% of all cases. The logistic regression analysis indicated that the soft tissue defect site (P = 0.046, odds ratio (OR) = 0.079, confidence interval (CI) (0.006, 0.959)), flap size (P = 0.020, OR = 1.024, CI (1.004, 1.045)) and postoperative wound infection (P < 0.001, OR = 17.407, CI (3.821, 79.303)) were statistically significant risk factors for pedicled flap necrosis of the hand. Soft tissue defect site, flap size and postoperative wound infection were risk factors associated with pedicled flap necrosis in hand soft tissue defect reconstruction. © 2017 Royal Australasian College of Surgeons.
Determination and Dependencies of Melt Pool Dimensions in Laser Micro Welding
NASA Astrophysics Data System (ADS)
Patschger, Andreas; Bliedtner, Jens
Melt pool dimensions such as width and length influence the properties of the resulting weld joint and should be considered when designing the laser welding process. The melt pool width, and as a consequence the weld seam width, determines the strength of the joint. The melt pool length is directly linked to the solidification time, which affects the resulting metallurgical microstructure. The melt pool dimensions can be estimated by given analytical solutions based on the capillary diameter. In order to test the given estimations, melt pool dimensions of bead-on-plate welds in stainless steel foils were measured by means of high speed imaging and microscopy. The welds were obtained by applying different focal diameters between 25 μm and 204 μm to foil thicknesses of 50 μm and 100 μm. As a result, simplified correlations are derived based on the focal diameter, which is less complex to determine in practice. Regression analyses ensure a statistical comparability.
A regularization corrected score method for nonlinear regression models with covariate error.
Zucker, David M; Gorfine, Malka; Li, Yi; Tadesse, Mahlet G; Spiegelman, Donna
2013-03-01
Many regression analyses involve explanatory variables that are measured with error, and failing to account for this error is well known to lead to biased point and interval estimates of the regression coefficients. We present here a new general method for adjusting for covariate error. Our method consists of an approximate version of the Stefanski-Nakamura corrected score approach, using the method of regularization to obtain an approximate solution of the relevant integral equation. We develop the theory in the setting of classical likelihood models; this setting covers, for example, linear regression, nonlinear regression, logistic regression, and Poisson regression. The method is extremely general in terms of the types of measurement error models covered, and is a functional method in the sense of not involving assumptions on the distribution of the true covariate. We discuss the theoretical properties of the method and present simulation results in the logistic regression setting (univariate and multivariate). For illustration, we apply the method to data from the Harvard Nurses' Health Study concerning the relationship between physical activity and breast cancer mortality in the period following a diagnosis of breast cancer. Copyright © 2013, The International Biometric Society.
Logistic Mixed Models to Investigate Implicit and Explicit Belief Tracking
Lages, Martin; Scheel, Anne
2016-01-01
We investigated the proposition of a two-systems Theory of Mind in adults’ belief tracking. A sample of N = 45 participants predicted the choice of one of two opponent players after observing several rounds in an animated card game. Three matches of this card game were played and initial gaze direction on target and subsequent choice predictions were recorded for each belief task and participant. We conducted logistic regressions with mixed effects on the binary data and developed Bayesian logistic mixed models to infer implicit and explicit mentalizing in true belief and false belief tasks. Although logistic regressions with mixed effects predicted the data well a Bayesian logistic mixed model with latent task- and subject-specific parameters gave a better account of the data. As expected explicit choice predictions suggested a clear understanding of true and false beliefs (TB/FB). Surprisingly, however, model parameters for initial gaze direction also indicated belief tracking. We discuss why task-specific parameters for initial gaze directions are different from choice predictions yet reflect second-order perspective taking. PMID:27853440
Horne, Hisani N; Oh, Hannah; Sherman, Mark E; Palakal, Maya; Hewitt, Stephen M; Schmidt, Marjanka K; Milne, Roger L; Hardisson, David; Benitez, Javier; Blomqvist, Carl; Bolla, Manjeet K; Brenner, Hermann; Chang-Claude, Jenny; Cora, Renata; Couch, Fergus J; Cuk, Katarina; Devilee, Peter; Easton, Douglas F; Eccles, Diana M; Eilber, Ursula; Hartikainen, Jaana M; Heikkilä, Päivi; Holleczek, Bernd; Hooning, Maartje J; Jones, Michael; Keeman, Renske; Mannermaa, Arto; Martens, John W M; Muranen, Taru A; Nevanlinna, Heli; Olson, Janet E; Orr, Nick; Perez, Jose I A; Pharoah, Paul D P; Ruddy, Kathryn J; Saum, Kai-Uwe; Schoemaker, Minouk J; Seynaeve, Caroline; Sironen, Reijo; Smit, Vincent T H B M; Swerdlow, Anthony J; Tengström, Maria; Thomas, Abigail S; Timmermans, A Mieke; Tollenaar, Rob A E M; Troester, Melissa A; van Asperen, Christi J; van Deurzen, Carolien H M; Van Leeuwen, Flora F; Van't Veer, Laura J; García-Closas, Montserrat; Figueroa, Jonine D
2018-04-26
E-cadherin (CDH1) is a putative tumor suppressor gene implicated in breast carcinogenesis. Yet, whether risk factors or survival differ by E-cadherin tumor expression is unclear. We evaluated E-cadherin tumor immunohistochemistry expression using tissue microarrays of 5,933 female invasive breast cancers from 12 studies from the Breast Cancer Consortium. H-scores were calculated and case-case odds ratios (OR) and 95% confidence intervals (CIs) were estimated using logistic regression. Survival analyses were performed using Cox regression models. All analyses were stratified by estrogen receptor (ER) status and histologic subtype. E-cadherin low cases (N = 1191, 20%) were more frequently of lobular histology, low grade, >2 cm, and HER2-negative. Loss of E-cadherin expression (score < 100) was associated with menopausal hormone use among ER-positive tumors (ever compared to never users, OR = 1.24, 95% CI = 0.97-1.59), which was stronger when we evaluated complete loss of E-cadherin (i.e. H-score = 0), OR = 1.57, 95% CI = 1.06-2.33. Breast cancer specific mortality was unrelated to E-cadherin expression in multivariable models. E-cadherin low expression is associated with lobular histology, tumor characteristics and menopausal hormone use, with no evidence of an association with breast cancer specific survival. These data support loss of E-cadherin expression as an important marker of tumor subtypes.
2013-01-01
Introduction: In countries such as Bangladesh many women may only seek skilled care at birth when complications become evident. This often results in higher neonatal mortality for women who give birth in institutions than for those that give birth at home. However, we hypothesise that this apparent excess mortality is concentrated among less advantaged women. The aim of this paper is to examine the association between place of birth and neonatal mortality in Bangladesh, and how this varies by socio-economic status. Methodology: The study is based on pooled data from four Bangladesh Demographic and Household Surveys, and uses descriptive analysis and binomial multivariate logistic regression. It uses regression models stratified for place of delivery to examine the impact of socio-economic status and place of residence on neonatal mortality. Results: Poor women from rural areas and those with no education who gave birth in institutions had much worse outcomes than those who gave birth at home. There is no difference for more wealthy women. There is a much stronger socio-economic gradient in neonatal mortality for women who gave birth in institutions than those who delivered at home. Conclusion: In Bangladesh babies from lower socio-economic groups and particularly those in rural areas have very poor outcomes if born in a facility. This suggests poorer, rural and less educated women are failing to obtain the timely access to quality maternal health care services needed to improve newborn outcomes. PMID:23496964
Ambient air pollution, traffic noise and adult asthma prevalence: a BioSHaRE approach.
Cai, Yutong; Zijlema, Wilma L; Doiron, Dany; Blangiardo, Marta; Burton, Paul R; Fortier, Isabel; Gaye, Amadou; Gulliver, John; de Hoogh, Kees; Hveem, Kristian; Mbatchou, Stéphane; Morley, David W; Stolk, Ronald P; Elliott, Paul; Hansell, Anna L; Hodgson, Susan
2017-01-01
We investigated the effects of both ambient air pollution and traffic noise on adult asthma prevalence, using harmonised data from three European cohort studies established in 2006-2013 (HUNT3, Lifelines and UK Biobank). Residential exposures to ambient air pollution (particulate matter with aerodynamic diameter ≤10 µm (PM10) and nitrogen dioxide (NO2)) were estimated by a pan-European Land Use Regression model for 2007. Traffic noise for 2009 was modelled at home addresses by adapting a standardised noise assessment framework (CNOSSOS-EU). A cross-sectional analysis of 646 731 participants aged ≥20 years was undertaken using DataSHIELD to pool data for individual-level analysis via a "compute to the data" approach. Multivariate logistic regression models were fitted to assess the effects of each exposure on lifetime and current asthma prevalence. PM10 or NO2 higher by 10 µg·m⁻³ was associated with 12.8% (95% CI 9.5-16.3%) and 1.9% (95% CI 1.1-2.8%) higher lifetime asthma prevalence, respectively, independent of confounders. Effects were larger in those aged ≥50 years, ever-smokers and less educated. Noise exposure was not significantly associated with asthma prevalence. This study suggests that long-term ambient PM10 exposure is associated with asthma prevalence in western European adults. Traffic noise is not associated with asthma prevalence, but its potential to impact on asthma exacerbations needs further investigation. Copyright ©ERS 2017.
Distiller, Larry A; Joffe, Barry I; Melville, Vanessa; Welman, Tania; Distiller, Greg B
2006-01-01
The factors responsible for premature coronary atherosclerosis in patients with type 1 diabetes are ill defined. We therefore assessed carotid intima-media complex thickness (IMT) in relatively long-surviving patients with type 1 diabetes as a marker of atherosclerosis and correlated this with traditional risk factors. Cross-sectional study of 148 patients with relatively long-surviving (>18 years) type 1 diabetes (76 men and 72 women) attending the Centre for Diabetes and Endocrinology, Johannesburg. The mean common carotid artery IMT and presence or absence of plaque was evaluated by high-resolution B-mode ultrasound. Their median age was 48 years and duration of diabetes 26 years (range 18-59 years). Traditional risk factors (age, duration of diabetes, glycemic control, hypertension, smoking and lipoprotein concentrations) were recorded. Three response variables were defined and modeled. Standard multiple regression was used for a continuous IMT variable, logistic regression for the presence/absence of plaque and ordinal logistic regression to model three categories of "risk." The median common carotid IMT was 0.62 mm (range 0.44-1.23 mm) with plaque detected in 28 cases. The multiple regression model found significant associations between IMT and current age (P=.001), duration of diabetes (P=.033), BMI (P=.008) and diagnosed hypertension (P=.046) with HDL showing a protective effect (P=.022). Current age (P=.001) and diagnosed hypertension (P=.004), smoking (P=.008) and retinopathy (P=.033) were significant in the logistic regression model. Current age was also significant in the ordinal logistic regression model (P<.001), as was total cholesterol/HDL ratio (P<.001) and mean HbA1c concentration (P=.073). The major factors influencing common carotid IMT in patients with relatively long-surviving type 1 diabetes are age, duration of diabetes, existing hypertension and HDL (protective) with a relatively minor role ascribed to relatively long-standing glycemic control.
Oddo, Vanessa M; Bleich, Sara N; Pollack, Keshia M; Surkan, Pamela J; Mueller, Noel T; Jones-Smith, Jessica C
2017-10-18
Maternal employment has increased in low- and middle-income countries (LMIC) and is a hypothesized risk factor for maternal overweight due to increased income and behavioral changes related to time allocation. However, few studies have investigated this relationship in LMIC. Using cross-sectional samples from Demographic and Health Surveys, we investigated the association between maternal employment and overweight (body mass index [BMI] ≥ 25 kg/m²) among women in 38 LMIC (N = 162,768). We categorized mothers as formally employed, informally employed, or non-employed based on 4 indicators: employment status in the last 12 months; aggregate occupation category (skilled, unskilled); type of earnings (cash only, cash and in-kind, in-kind only, unpaid); and seasonality of employment (all year, seasonal/occasional employment). Formally employed women were largely employed year-round in skilled occupations and earned a wage (e.g. professional), whereas informally employed women were often irregularly employed in unskilled occupations and in some cases, were paid in-kind (e.g. domestic work). For within-country analyses, we used adjusted logistic regression models and included an interaction term to assess heterogeneity in the association by maternal education level. We then used meta-analysis and meta-regression to explore differences in the associations pooled across countries. Compared to non-employed mothers, formally employed mothers had higher odds of overweight (pooled odds ratio [POR] = 1.3; 95% Confidence Interval [CI] 1.2, 1.4) whereas informally employed mothers, compared to non-employed mothers, had lower odds of overweight (POR = 0.72; 95% CI: 0.64, 0.81). In 14 LMIC, the association varied by education. In these countries, the magnitude of the formal employment-overweight association was larger for women with low education (POR = 1.5; 95% CI: 1.1, 1.9) compared to those with high education (POR = 1.2; 95% CI: 1.0, 1.3). Formally employed mothers in LMIC have higher odds of overweight and the association varies by educational attainment in 14 countries. This knowledge highlights the importance of workplace initiatives to reduce the risk of overweight among working women in LMIC.
The impact of moderate wine consumption on the risk of developing prostate cancer
Ferro, Matteo; Foerster, Beat; Abufaraj, Mohammad; Briganti, Alberto; Karakiewicz, Pierre I; Shariat, Shahrokh F
2018-01-01
Objective To investigate the impact of moderate wine consumption on the risk of prostate cancer (PCa). We focused on the differential effect of moderate consumption of red versus white wine. Design This study was a meta-analysis that includes data from case–control and cohort studies. Materials and methods A systematic search of Web of Science, Medline/PubMed, and Cochrane Library was performed on December 1, 2017. Studies were deemed eligible if they assessed the risk of PCa due to red, white, or any wine using multivariable logistic regression analysis. We performed a formal meta-analysis for the risk of PCa according to moderate wine and wine type consumption (white or red). Heterogeneity between studies was assessed using Cochran's Q test and I² statistics. Publication bias was assessed using Egger's regression test. Results A total of 930 abstracts and titles were initially identified. After removal of duplicates, reviews, and conference abstracts, 83 full-text original articles were screened. Seventeen studies (611,169 subjects) fulfilled the inclusion criteria and were included for final evaluation. In the case of moderate wine consumption, the pooled risk ratio (RR) for the risk of PCa was 0.98 (95% CI 0.92–1.05, p=0.57) in the multivariable analysis. Moderate white wine consumption increased the risk of PCa with a pooled RR of 1.26 (95% CI 1.10–1.43, p=0.001) in the multivariable analysis. Meanwhile, moderate red wine consumption had a protective role, reducing the risk by 12% (RR 0.88, 95% CI 0.78–0.999, p=0.047) in the multivariable analysis that comprised 222,447 subjects. Conclusions In this meta-analysis, moderate wine consumption did not impact the risk of PCa. Interestingly, regarding the type of wine, moderate consumption of white wine increased the risk of PCa, whereas moderate consumption of red wine had a protective effect. 
Further analyses are needed to assess the differential molecular effect of white and red wine conferring their impact on PCa risk. PMID:29713200
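The heterogeneity statistics named in this abstract (Cochran's Q and I²) can be sketched as follows. This is a minimal illustration that assumes inverse-variance fixed-effect pooling on the log risk-ratio scale; the abstract does not state which pooling model was used, and the study values below are hypothetical, not taken from the paper.

```python
import math


def pooled_fixed_effect(log_rr, se):
    """Inverse-variance pooled log risk ratio with Cochran's Q and I^2 (in %)."""
    weights = [1.0 / s ** 2 for s in se]
    pooled = sum(w * y for w, y in zip(weights, log_rr)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    # Cochran's Q: weighted squared deviations of study effects from the pooled effect
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rr))
    df = len(log_rr) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at zero
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, pooled_se, q, i2


# Hypothetical study-level risk ratios and standard errors of log(RR)
rr = [0.90, 1.10, 0.95]
se = [0.10, 0.12, 0.08]
pooled, pooled_se, q, i2 = pooled_fixed_effect([math.log(r) for r in rr], se)
low = math.exp(pooled - 1.96 * pooled_se)
high = math.exp(pooled + 1.96 * pooled_se)
print(f"pooled RR = {math.exp(pooled):.2f} (95% CI {low:.2f}-{high:.2f}), "
      f"Q = {q:.2f}, I2 = {i2:.0f}%")
```

Pooling is done on the log scale because the sampling distribution of log(RR) is closer to normal than that of RR itself; the result is exponentiated only for reporting.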
Correlation and simple linear regression.
Eberly, Lynn E
2007-01-01
This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.
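As a small illustration of the estimation step this chapter describes, the sketch below computes the Pearson correlation and the least-squares line for two continuous variables using only the standard library. The function name and the data are hypothetical and are not drawn from the chapter's microbiology examples.

```python
import math


def simple_linear_regression(x, y):
    """Return (slope, intercept, r) for the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


# Hypothetical paired measurements
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, r = simple_linear_regression(x, y)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, r = {r:.3f}, R^2 = {r * r:.3f}")
```

In simple linear regression the coefficient of determination R² equals the square of the Pearson correlation, which is why the two analyses answer closely related questions.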
Lin, Dongxin; Ou, Qianting; Lin, Jialing; Peng, Yang; Yao, Zhenjiang
2017-04-01
Health care workers may potentially spread Staphylococcus aureus and methicillin-resistant S aureus (MRSA) to patients by contaminated high-touch items. We aimed to determine the pooled rates of S aureus and MRSA contamination and influencing factors. A literature search of the PubMed, ScienceDirect, Embase, Ovid, and Scopus databases was performed. Pooled contamination rates were determined using random effect models. Subgroup and meta-regression analyses were conducted to identify factors potentially influencing the rates of S aureus and MRSA contamination. Sensitivity and publication bias analyses were performed. Thirty-eight studies were included in the meta-analysis. The pooled contamination rates were 15.0% (95% confidence interval [CI], 9.8%-21.1%) for S aureus and 5.0% (95% CI, 2.7%-7.7%) for MRSA. The subgroup analyses indicated that the pooled rate of S aureus contamination was significantly higher for studies conducted in South America, in developing countries, and during 2010-2015. The pooled rate of MRSA contamination was significantly higher for studies conducted in Africa. The meta-regression analysis suggested that the pooled rate of S aureus contamination was lower for studies conducted in developed countries (odds ratio, 0.664; 95% CI, 0.509-0.867; P = .004). No bias was found in the publication of the rates of S aureus and MRSA contamination. S aureus and MRSA contamination statuses of high-touch items are worrisome and should be paid greater attention. Developing country status was a risk factor for S aureus contamination. Copyright © 2017 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Elsevier Inc. All rights reserved.
Variability of collagen crosslinks: impact of sample collection period
NASA Technical Reports Server (NTRS)
Smith, S. M.; Dillon, E. L.; DeKerlegand, D. E.; Davis-Street, J. E.
2004-01-01
Because of the variability of collagen crosslinks, their use as markers for bone resorption is often criticized. We hypothesized that the variability could be reduced by collecting urine for 24 hours (or longer) instead of using single voids, and by not normalizing to creatinine. Urine samples were collected from 22 healthy subjects during two or more 24-hour periods. Each 24-hour pool and each 2nd void of the day were analyzed for N-telopeptide (NTX), pyridinium (PYD), and deoxypyridinoline (DPD) crosslinks. Data were analyzed by using linear regression. For NTX, R2 for the two, 2nd-void samples (n = 38) was 0.55, whereas R2 for the two 24-hour pools was 0.51 or 0.52, expressed per day or per creatinine. For PYD and DPD, R2 for the 2nd-void samples was 0.26 and 0.18, R2 for the 24-hour pools expressed per day was 0.58 and 0.74, and R2 for the 24-hour pools expressed per creatinine was 0.65 and 0.76, respectively. Regression of the 2nd void and the corresponding 24-hour pool, expressed per day, yielded R2 = 0.19, 0.19, and 0.08, for NTX, PYD, and DPD, respectively (n = 76 each). For the 2nd-void sample and its corresponding 24-hour pool, expressed per creatinine, R2 = 0.24, 0.33, and 0.08, respectively. In a separate study, the coefficient of variation for NTX was reduced (P < 0.05) when data from more than one 24-hour collection were combined. Thus, the variability inherent in crosslink determinations can be reduced by collecting urine for longer periods. In research studies, the high variability of single-void collections, compounded by creatinine normalization, may alter or obscure findings.
Revealing the association between cerebrovascular accidents and ambient temperature: a meta-analysis
NASA Astrophysics Data System (ADS)
Zorrilla-Vaca, Andrés; Healy, Ryan Jacob; Silva-Medina, Melissa M.
2017-05-01
The association between cerebrovascular accidents (CVA) and weather has been described across several studies showing multiple conflicting results. In this paper, we aim to conduct a meta-analysis to further clarify this association, as well as to find the potential sources of heterogeneity. PubMed, EMBASE, and Google Scholar were searched from inception through 2015, for articles analyzing the correlation between the incidence of CVA and temperature. A pooled effect size (ES) was estimated using random effects model and expressed as absolute values. Subgroup analyses by type of CVA were also performed. Heterogeneity and influence of covariates—including geographic latitude of the study site, male percentage, average temperature, and time interval—were assessed by meta-regression analysis. Twenty-six articles underwent full data extraction and scoring. A total of 19,736 subjects with CVA from 12 different countries were included and grouped as ischemic strokes (IS; n = 14,199), intracerebral hemorrhages (ICH; n = 3798), and subarachnoid hemorrhages (SAH; n = 1739). Lower ambient temperature was significantly associated with increase in incidence of overall CVA when using unadjusted (pooled ES = 0.23, P < 0.001) and adjusted data (pooled ES = 0.03, P = 0.003). Subgroup analyses showed that lower temperature has higher impact on the incidence of ICH (pooled ES = 0.34, P < 0.001), than that of IS (pooled ES = 0.22, P < 0.001) and SAH (pooled ES = 0.11, P = 0.012). In meta-regression analysis, the geographic latitude of the study site was the most influencing factor on this association ( Z-score = 8.68). Synthesis of the existing data provides evidence supporting that a lower ambient temperature increases the incidence of CVA. Further population-based studies conducted at negative latitudes are needed to clarify the influence of this factor.
Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.
Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A
2016-01-01
Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.
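Once a model has been fitted to each imputed data set, the per-imputation estimates have to be combined. The sketch below shows the standard combining rules (Rubin's rules) for a single coefficient; the abstract does not spell out its combining step, so this is an assumption about the usual workflow, and the numbers are hypothetical.

```python
import math
import statistics


def rubin_pool(estimates, variances):
    """Combine m per-imputation estimates and their squared standard errors.

    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)     # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical log odds ratios and squared standard errors from m = 5 imputed data sets
log_or = [0.42, 0.47, 0.39, 0.45, 0.44]
se_sq = [0.012, 0.011, 0.013, 0.012, 0.012]
estimate, se = rubin_pool(log_or, se_sq)
print(f"pooled log OR = {estimate:.3f}, SE = {se:.3f}")
```

The between-imputation term is what distinguishes multiple imputation from single imputation: it carries the extra uncertainty due to the censored values not being observed.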
Effects of commercial harvest on shovelnose sturgeon populations in the Upper Mississippi River
Koch, Jeff D.; Quist, Michael C.; Pierce, Clay L.; Hansen, Kirk A.; Steuck, Michael J.
2009-01-01
Shovelnose sturgeon Scaphirhynchus platorynchus have become an increasingly important commercial species in the upper Mississippi River (UMR) because of the collapse of foreign sturgeon (family Acipenseridae) populations and bans on imported caviar. In response to concerns about the sustainability of the commercial shovelnose sturgeon fishery in the UMR, we undertook this study to describe the demographics of the shovelnose sturgeon population and evaluate the influence of commercial harvest on shovelnose sturgeon populations in the UMR. A total of 1,682 shovelnose sturgeon were collected from eight study pools in 2006 and 2007 (Pools 4, 7, 9, 11, 13, 14, 16, and 18). Shovelnose sturgeon from upstream pools generally had greater lengths, weights, and ages than those from downstream pools. Additionally, mortality estimates were lower in upstream pools (Pools 4, 7, 9, and 11) than in downstream pools (Pools 13, 14, 16, and 18). Linear regression suggested that the slower growth of shovelnose sturgeon is a consequence of commercial harvest in the UMR. Modeling of potential management scenarios suggested that a 685-mm minimum length limit is necessary to prevent growth and recruitment overfishing of shovelnose sturgeon in the UMR.
Multinomial logistic regression in workers' health
NASA Astrophysics Data System (ADS)
Grilo, Luís M.; Grilo, Helena L.; Gonçalves, Sónia P.; Junça, Ana
2017-11-01
In European countries, particularly in Portugal, it is common to hear people say that they are exposed to excessive and continuous psychosocial stressors at work. This is increasing in diverse activity sectors, such as the Services sector. A representative sample was collected from a Portuguese Services organization by applying an internationally validated survey whose variables were measured in five ordered categories on a Likert-type scale. A multinomial logistic regression model is used to estimate the probability of each category of the dependent variable, general health perception, where, among other independent variables, burnout appears as statistically significant.
Du, Qing-Yun; Wang, En-Yin; Huang, Yan; Guo, Xiao-Yi; Xiong, Yu-Jing; Yu, Yi-Ping; Yao, Gui-Dong; Shi, Sen-Lin; Sun, Ying-Pu
2016-04-01
Objective: To evaluate the independent effects of the degree of blastocoele expansion and re-expansion and the inner cell mass (ICM) and trophectoderm (TE) grades on predicting live birth after fresh and vitrified/warmed single blastocyst transfer. Design: Retrospective study. Setting: Reproductive medical center. Patients: Women undergoing 844 fresh and 370 vitrified/warmed single blastocyst transfer cycles. Interventions: None. Main outcome measures: Live-birth rate correlated with blastocyst morphology parameters by logistic regression analysis and Spearman correlation analysis. Results: The degree of blastocoele expansion and re-expansion was the only blastocyst morphology parameter that exhibited a significant ability to predict live birth in both fresh and vitrified/warmed single blastocyst transfer cycles, by multivariate logistic regression and Spearman correlation analysis. Although the ICM grade was significantly related to live birth in fresh cycles according to the univariate model, its effect was not maintained in the multivariate logistic analysis. In vitrified/warmed cycles, neither ICM nor TE grade was correlated with live birth by logistic regression analysis. Conclusions: This study is the first to confirm that the degree of blastocoele expansion and re-expansion is a better predictor of live birth after both fresh and vitrified/warmed single blastocyst transfer cycles than ICM or TE grade. Copyright © 2016. Published by Elsevier Inc.
Kassa, Getachew Mullu; Muche, Achenef Asmamaw; Berhe, Abadi Kidanemariam; Fekadu, Gedefaw Abeje
2017-01-01
Anemia during pregnancy is one of the most common indirect obstetric causes of maternal mortality in developing countries. It is responsible for poor maternal and fetal outcomes. A limited number of studies have been conducted on anemia during pregnancy in Ethiopia, and they present inconsistent findings. Therefore, this review was undertaken to summarize the findings of studies conducted in several parts of the country and to present the national level of anemia among pregnant women in Ethiopia. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline was followed for this systematic review and meta-analysis. The databases used were PubMed, Cochrane Library, Google Scholar, CINAHL, and African Journals Online. The search terms used were anemia, pregnancy related anemia, and Ethiopia. The Joanna Briggs Institute Meta-Analysis of Statistics Assessment and Review Instrument (JBI-MAStARI) was used for critical appraisal of studies. The meta-analysis was conducted using STATA 14 software. The pooled meta logistic regression was computed to present the pooled prevalence and relative risks (RRs) of the determinant factors with 95% confidence intervals (CIs). Twenty studies were included in the meta-analysis with a total of 10,281 pregnant women. The pooled prevalence of anemia among pregnant women in Ethiopia was 31.66% (95% CI (26.20, 37.11)). Based on the subgroup analysis, the lowest prevalence of anemia among pregnant women was observed in Amhara region, 15.89% (95% CI (8.82, 22.96)), and the highest prevalence was in Somali region, 56.80% (95% CI (52.76, 60.84)). Primigravid (RR: 0.61 (95% CI: 0.53, 0.71)) and urban women (RR: 0.73 (95% CI: 0.60, 0.88)) were less likely to develop anemia. On the other hand, mothers with a short pregnancy interval (RR: 2.14 (95% CI: 1.67, 2.74)) and malaria infection during pregnancy (RR: 1.94 (95% CI: 1.33, 2.82)) had a higher risk of developing anemia. 
Almost one-third of pregnant women in Ethiopia were anemic. Statistically significant association was observed between anemia during pregnancy and residence, gravidity, pregnancy interval, and malaria infection during pregnancy. Regions with higher anemia prevalence among pregnant women should be given due emphasis. The concerned body should intervene on the identified factors to reduce the high prevalence of anemia among pregnant women.
Factor complexity of crash occurrence: An empirical demonstration using boosted regression trees.
Chung, Yi-Shih
2013-12-01
Factor complexity is a characteristic of traffic crashes. This paper proposes a novel method, namely boosted regression trees (BRT), to investigate the complex and nonlinear relationships in high-variance traffic crash data. The Taiwanese 2004-2005 single-vehicle motorcycle crash data are used to demonstrate the utility of BRT. Traditional logistic regression and classification and regression tree (CART) models are also used to compare their estimation results and external validities. Both the in-sample cross-validation and out-of-sample validation results show that an increase in tree complexity provides improved, although declining, classification performance, indicating a limited factor complexity of single-vehicle motorcycle crashes. The effects of crucial variables including geographical, time, and sociodemographic factors explain some fatal crashes. Relatively unique fatal crashes are better approximated by interactive terms, especially combinations of behavioral factors. BRT models generally provide better transferability than conventional logistic regression and CART models. This study also discusses the implications of the results for devising safety policies. Copyright © 2012 Elsevier Ltd. All rights reserved.
Keogh, Ruth H; Mangtani, Punam; Rodrigues, Laura; Nguipdop Djomo, Patrick
2016-01-05
Traditional analyses of standard case-control studies using logistic regression do not allow estimation of time-varying associations between exposures and the outcome. We present two approaches which allow this. The motivation is a study of vaccine efficacy as a function of time since vaccination. Our first approach is to estimate time-varying exposure-outcome associations by fitting a series of logistic regressions within successive time periods, reusing controls across periods. Our second approach treats the case-control sample as a case-cohort study, with the controls forming the subcohort. In the case-cohort analysis, controls contribute information at all times they are at risk. Extensions allow left truncation, frequency matching and, using the case-cohort analysis, time-varying exposures. Simulations are used to investigate the methods. The simulation results show that both methods give correct estimates of time-varying effects of exposures using standard case-control data. Using the logistic approach there are efficiency gains by reusing controls over time and care should be taken over the definition of controls within time periods. However, using the case-cohort analysis there is no ambiguity over the definition of controls. The performance of the two analyses is very similar when controls are used most efficiently under the logistic approach. Using our methods, case-control studies can be used to estimate time-varying exposure-outcome associations where they may not previously have been considered. The case-cohort analysis has several advantages, including that it allows estimation of time-varying associations as a continuous function of time, while the logistic regression approach is restricted to assuming a step function form for the time-varying association.
Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula
2011-01-01
Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets come the inherent challenges of new methods of statistical analysis and modeling. Considering that a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.
Regression analysis for solving diagnosis problem of children's health
NASA Astrophysics Data System (ADS)
Cherkashina, Yu A.; Gerget, O. M.
2016-04-01
This paper reports research on the application of statistical techniques, namely regression analysis, to assess the health status of children in the neonatal period based on medical data (hemostatic parameters, parameters of blood tests, the gestational age, vascular-endothelial growth factor) measured at 3-5 days of life. A detailed description of the studied medical data is given, and a binary logistic regression procedure is discussed. Basic results of the research are presented. A classification table of predicted and observed values is shown, and the overall percentage of correct classification is determined. Regression coefficients are calculated, and the general regression equation is written from them. Based on the results of the logistic regression, ROC analysis was performed: sensitivity and specificity of the model are calculated and ROC curves are constructed. These mathematical techniques allow children's health to be diagnosed with a high quality of recognition. The results make a significant contribution to the development of evidence-based medicine and are of high practical importance in the professional activity of the author.
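The evaluation steps listed in this abstract (classification table, sensitivity, specificity, ROC analysis) can be sketched in a few lines. This is an illustrative sketch only: the 0.5 cut-off, the function names, and the labels and predicted probabilities are assumptions, not values from the paper.

```python
def classification_table(y_true, prob, threshold=0.5):
    """Return (tp, fp, tn, fn) for predicted probabilities cut at `threshold`."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, prob):
    """Area under the ROC curve as the share of (positive, negative) pairs ranked correctly."""
    positives = [p for y, p in zip(y_true, prob) if y == 1]
    negatives = [p for y, p in zip(y_true, prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in positives for b in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical observed outcomes and model-predicted probabilities
y_true = [1, 1, 0, 0, 1, 0]
prob = [0.9, 0.4, 0.6, 0.1, 0.8, 0.3]
tp, fp, tn, fn = classification_table(y_true, prob)
sens, spec = sensitivity_specificity(tp, fp, tn, fn)
accuracy = (tp + tn) / len(y_true)
print(f"accuracy = {accuracy:.2f}, sensitivity = {sens:.2f}, "
      f"specificity = {spec:.2f}, AUC = {roc_auc(y_true, prob):.2f}")
```

Sensitivity and specificity depend on the chosen cut-off, whereas the area under the ROC curve summarizes performance across all cut-offs.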
[Liver transplantation - current aspects of allocation, indication and donor pool expansion].
Settmacher, U; Scheuerlein, H; Dittmar, Y; Rauchfuss, F
2013-12-01
Liver transplantation is now an established treatment option for end-stage liver disease and its associated complications. In this article, we summarise the current aspects of allocation, indication for transplantation, and approaches for donor pool expansion in the field of liver transplantation in Germany. Besides the maintenance of long-term survival and quality of life, the current donor organ shortage is the most important issue worldwide. In the effort to control this shortage, there is much discussion about transplantation for malignant liver disease. In our opinion, the focus should be the utilisation and expansion of the donor pool. There are many logistic and medical aspects which could be optimised. Furthermore, there are open questions in public and political discussions (up to the revision of the transplantation law) which should be resolved for the benefit of patients on the waiting list. Georg Thieme Verlag KG Stuttgart · New York.
NASA Technical Reports Server (NTRS)
Chesters, Dennis; Keyser, Dennis A.; Larko, David E.; Uccellini, Louis W.
1987-01-01
An Atmospheric Variability Experiment (AVE) was conducted over the central U.S. in the spring of 1982, collecting radiosonde data to verify mesoscale soundings from the VISSR Atmospheric Sounder (VAS) on the GOES satellite. Previously published VAS/AVE comparisons for the 6 March 1982 case found that the satellite retrievals scarcely detected a low level temperature inversion or a mid-tropospheric cold pool over a special mesoscale radiosonde verification network in north central Texas. The previously published regression and physical retrieval algorithms did not fully utilize the sensitivity of VAS to important subsynoptic thermal features. Therefore, the 6 March 1982 case was reprocessed, adding two enhancements to the VAS regression retrieval algorithm: (1) the regression matrix was determined using AVE profile data obtained in the region at asynoptic times, and (2) more optimistic signal-to-noise statistical conditioning factors were applied to the VAS temperature sounding channels. The new VAS soundings resolve more of the low level temperature inversion and mid-level cold pool. Most of the improvement stems from the utilization of asynoptic radiosonde observations at NWS sites. This case suggests that VAS regression soundings may require a ground-based asynoptic profiler network to bridge the gap between the synoptic radiosonde network and the high resolution geosynchronous satellite observations during the day.
[Calculating Pearson residual in logistic regressions: a comparison between SPSS and SAS].
Xu, Hao; Zhang, Tao; Li, Xiao-song; Liu, Yuan-yuan
2015-01-01
To compare the results of Pearson residual calculations in logistic regression models using SPSS and SAS. We reviewed Pearson residual calculation methods and used two sets of data to test logistic models constructed by SPSS and SAS. One model contained a small number of covariates compared to the number of observations. The other contained a number of covariates similar to the number of observations. The two software packages produced similar Pearson residual estimates when the number of covariates was similar to the number of observations, but the results differed when the number of observations was much greater than the number of covariates. The two software packages produce different Pearson residuals, especially when the models contain a small number of covariates. Further studies are warranted.
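For reference, the sketch below computes a Pearson residual under two common textbook definitions: one per individual observation and one per group of observations that share the same covariate values. Whether this distinction explains the differences between the packages is not stated in the abstract, so it should be read only as an illustration of the definitions; the function names and numbers are hypothetical.

```python
import math


def pearson_residual(y, p):
    """Pearson residual for one binary observation y (0 or 1) with fitted probability p."""
    return (y - p) / math.sqrt(p * (1.0 - p))


def grouped_pearson_residual(events, n, p):
    """Pearson residual for a group of n observations sharing fitted probability p."""
    return (events - n * p) / math.sqrt(n * p * (1.0 - p))


# Hypothetical group: 4 subjects with the same covariates, 3 events, fitted probability 0.5
individual = [pearson_residual(y, 0.5) for y in (1, 1, 1, 0)]
grouped = grouped_pearson_residual(3, 4, 0.5)
print("individual residuals:", individual)
print("grouped residual:", grouped)
```

With a group of size one the two definitions coincide; with larger groups they give different values for the same fitted model.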
Greeven, Anja; van Balkom, Anton J L M; Spinhoven, Philip
2014-05-01
We aimed to investigate whether personality characteristics predict time to remission and psychiatric status. The follow-up was at most 6 years and was performed within the scope of a randomized controlled trial that investigated the efficacy of cognitive behavioral therapy, paroxetine, and placebo in hypochondriasis. The Life Chart Interview was administered to investigate for each year if remission had occurred. Personality was assessed at pretest by the Abbreviated Dutch Temperament and Character Inventory. Cox's regression models for recurrent events were compared with logistic regression models. Sixteen (36.4%) of 44 patients achieved remission during the follow-up period. Cox's regression yielded approximately the same results as the logistic regression. Being less harm avoidant and more cooperative were associated with a shorter time to remission and a remitted state after the follow-up period. Personality variables seem to be relevant for describing patients with a more chronic course of hypochondriacal complaints.
Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
As fraudulent financial statements by enterprises become an increasingly serious problem, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models and compares them. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies with fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to identify fraudulent financial statements, and that the C5.0 decision tree has the best classification performance (85.71%). PMID:25302338
Steen, Paul J.; Passino-Reader, Dora R.; Wiley, Michael J.
2006-01-01
As a part of the Great Lakes Regional Aquatic Gap Analysis Project, we evaluated methodologies for modeling associations between fish species and habitat characteristics at a landscape scale. To do this, we created brook trout Salvelinus fontinalis presence and absence models based on four different techniques: multiple linear regression, logistic regression, neural networks, and classification trees. The models were tested in two ways: by application to an independent validation database and cross-validation using the training data, and by visual comparison of statewide distribution maps with historically recorded occurrences from the Michigan Fish Atlas. Although differences in the accuracy of our models were slight, the logistic regression model predicted with the least error, followed by multiple regression, then classification trees, then the neural networks. These models will provide natural resource managers a way to identify habitats requiring protection for the conservation of fish species.
Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
As fraudulent financial statements by enterprises become an increasingly serious problem, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models and compares them. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies with fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to identify fraudulent financial statements, and that the C5.0 decision tree has the best classification performance (85.71%).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dean, Jamie A., E-mail: jamie.dean@icr.ac.uk; Wong, Kee H.; Gay, Hiram
Purpose: Current normal tissue complication probability modeling using logistic regression suffers from bias and high uncertainty in the presence of highly correlated radiation therapy (RT) dose data. This hinders robust estimates of dose-response associations and, hence, optimal normal tissue–sparing strategies from being elucidated. Using functional data analysis (FDA) to reduce the dimensionality of the dose data could overcome this limitation. Methods and Materials: FDA was applied to modeling of severe acute mucositis and dysphagia resulting from head and neck RT. Functional partial least squares regression (FPLS) and functional principal component analysis were used for dimensionality reduction of the dose-volume histogram data. The reduced dose data were input into functional logistic regression models (functional partial least squares–logistic regression [FPLS-LR] and functional principal component–logistic regression [FPC-LR]) along with clinical data. This approach was compared with penalized logistic regression (PLR) in terms of predictive performance and the significance of treatment covariate–response associations, assessed using bootstrapping. Results: The area under the receiver operating characteristic curve for the PLR, FPC-LR, and FPLS-LR models was 0.65, 0.69, and 0.67, respectively, for mucositis (internal validation) and 0.81, 0.83, and 0.83, respectively, for dysphagia (external validation). The calibration slopes/intercepts for the PLR, FPC-LR, and FPLS-LR models were 1.6/−0.67, 0.45/0.47, and 0.40/0.49, respectively, for mucositis (internal validation) and 2.5/−0.96, 0.79/−0.04, and 0.79/0.00, respectively, for dysphagia (external validation). The bootstrapped odds ratios indicated significant associations between RT dose and severe toxicity in the mucositis and dysphagia FDA models. Cisplatin was significantly associated with severe dysphagia in the FDA models. 
None of the covariates was significantly associated with severe toxicity in the PLR models. Dose levels greater than approximately 1.0 Gy/fraction were most strongly associated with severe acute mucositis and dysphagia in the FDA models. Conclusions: FPLS and functional principal component analysis marginally improved predictive performance compared with PLR and provided robust dose-response associations. FDA is recommended for use in normal tissue complication probability modeling.
Dean, Jamie A; Wong, Kee H; Gay, Hiram; Welsh, Liam C; Jones, Ann-Britt; Schick, Ulrike; Oh, Jung Hun; Apte, Aditya; Newbold, Kate L; Bhide, Shreerang A; Harrington, Kevin J; Deasy, Joseph O; Nutting, Christopher M; Gulliford, Sarah L
2016-11-15
Current normal tissue complication probability modeling using logistic regression suffers from bias and high uncertainty in the presence of highly correlated radiation therapy (RT) dose data. This hinders robust estimates of dose-response associations and, hence, optimal normal tissue-sparing strategies from being elucidated. Using functional data analysis (FDA) to reduce the dimensionality of the dose data could overcome this limitation. FDA was applied to modeling of severe acute mucositis and dysphagia resulting from head and neck RT. Functional partial least squares regression (FPLS) and functional principal component analysis were used for dimensionality reduction of the dose-volume histogram data. The reduced dose data were input into functional logistic regression models (functional partial least squares-logistic regression [FPLS-LR] and functional principal component-logistic regression [FPC-LR]) along with clinical data. This approach was compared with penalized logistic regression (PLR) in terms of predictive performance and the significance of treatment covariate-response associations, assessed using bootstrapping. The area under the receiver operating characteristic curve for the PLR, FPC-LR, and FPLS-LR models was 0.65, 0.69, and 0.67, respectively, for mucositis (internal validation) and 0.81, 0.83, and 0.83, respectively, for dysphagia (external validation). The calibration slopes/intercepts for the PLR, FPC-LR, and FPLS-LR models were 1.6/-0.67, 0.45/0.47, and 0.40/0.49, respectively, for mucositis (internal validation) and 2.5/-0.96, 0.79/-0.04, and 0.79/0.00, respectively, for dysphagia (external validation). The bootstrapped odds ratios indicated significant associations between RT dose and severe toxicity in the mucositis and dysphagia FDA models. Cisplatin was significantly associated with severe dysphagia in the FDA models. None of the covariates was significantly associated with severe toxicity in the PLR models. 
Dose levels greater than approximately 1.0 Gy/fraction were most strongly associated with severe acute mucositis and dysphagia in the FDA models. FPLS and functional principal component analysis marginally improved predictive performance compared with PLR and provided robust dose-response associations. FDA is recommended for use in normal tissue complication probability modeling. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.
Ngo, Long H; Inouye, Sharon K; Jones, Richard N; Travison, Thomas G; Libermann, Towia A; Dillon, Simon T; Kuchel, George A; Vasunilashorn, Sarinnapha M; Alsop, David C; Marcantonio, Edward R
2017-06-06
The nested case-control study (NCC) design within a prospective cohort study is used when outcome data are available for all subjects, but the exposure of interest has not been collected, and is difficult or prohibitively expensive to obtain for all subjects. A NCC analysis with good matching procedures yields estimates that are as efficient and unbiased as estimates from the full cohort study. We present methodological considerations in a matched NCC design and analysis, which include the choice of match algorithms, analysis methods to evaluate the association of exposures of interest with outcomes, and consideration of overmatching. Matched NCC design within a longitudinal observational prospective cohort study in the setting of two academic hospitals. Study participants are patients aged over 70 years who underwent scheduled major non-cardiac surgery. The primary outcome was postoperative delirium from in-hospital interviews and medical record review. The main exposure was IL-6 concentration (pg/ml) from blood sampled at three time points before delirium occurred. We used the nonparametric signed rank test to test the median of the paired differences. We used conditional logistic regression to model the association of IL-6 with delirium incidence. Simulation was used to generate a sample of cohort data on which unconditional multivariable logistic regression was used, and the results were compared to those of the conditional logistic regression. Partial R-square was used to assess the level of overmatching. We found that the optimal match algorithm yielded more matched pairs than the greedy algorithm. The choice of analytic strategy (whether to consider measured cytokine levels as the predictor or the outcome) yielded inferences that have different clinical interpretations but similar levels of statistical significance.
Estimation results from NCC design using conditional logistic regression, and from simulated cohort design using unconditional logistic regression, were similar. We found minimal evidence for overmatching. Using a matched NCC approach introduces methodological challenges into the study design and data analysis. Nonetheless, with careful selection of the match algorithm, match factors, and analysis methods, this design is cost effective and, for our study, yields estimates that are similar to those from a prospective cohort study design.
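As a rough illustration of the conditional logistic regression step described above, the sketch below fits a one-covariate model to 1:1 matched pairs. It is not the authors' code: the function name `fit_paired_clogit` and the example differences are hypothetical, and it assumes exactly one case and one control per matched set. Under that assumption each pair contributes 1 / (1 + exp(-beta * d)) to the conditional likelihood, where d is the case-minus-control exposure difference.

```python
import math

def fit_paired_clogit(diffs, iterations=50, tol=1e-10):
    """Newton-Raphson fit of a one-covariate conditional logistic model
    for 1:1 matched pairs (hypothetical helper, illustration only).

    diffs: case-minus-control exposure difference within each pair.
    Returns (beta, standard_error); exp(beta) is the estimated odds ratio
    per unit of exposure. The fit diverges if every difference has the
    same sign (complete separation).
    """
    beta = 0.0
    info = 0.0
    for _ in range(iterations):
        grad = 0.0   # first derivative of the conditional log-likelihood
        info = 0.0   # observed information (negative second derivative)
        for d in diffs:
            p = 1.0 / (1.0 + math.exp(-beta * d))
            grad += d * (1.0 - p)
            info += d * d * p * (1.0 - p)
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)

# Made-up data: 6 pairs where the case is exposed and the control is not,
# and 3 pairs with the reverse pattern.
beta, se = fit_paired_clogit([1] * 6 + [-1] * 3)
odds_ratio = math.exp(beta)
```

With a binary exposure, as in this made-up example, the estimate reduces to the ratio of the two kinds of discordant pairs, which gives a quick sanity check on the fit.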
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Shujie; Kawamoto, Taisuke; Morita, Osamu
Chemical exposure often results in liver hypertrophy in animal tests, characterized by increased liver weight, hepatocellular hypertrophy, and/or cell proliferation. While most of these changes are considered adaptive responses, there is concern that they may be associated with carcinogenesis. In this study, we have employed a toxicogenomic approach using a logistic ridge regression model to identify genes responsible for liver hypertrophy and hypertrophic hepatocarcinogenesis and to develop a predictive model for assessing hypertrophy-inducing compounds. Logistic regression models have previously been used in the quantification of epidemiological risk factors. DNA microarray data from the Toxicogenomics Project-Genomics Assisted Toxicity Evaluation System were used to identify hypertrophy-related genes that are expressed differently in hypertrophy induced by carcinogens and non-carcinogens. Data were collected for 134 chemicals (72 non-hypertrophy-inducing chemicals, 27 hypertrophy-inducing non-carcinogenic chemicals, and 15 hypertrophy-inducing carcinogenic compounds). After applying logistic ridge regression analysis, 35 genes for liver hypertrophy (e.g., Acot1 and Abcc3) and 13 genes for hypertrophic hepatocarcinogenesis (e.g., Asns and Gpx2) were selected. The predictive models built using these genes were 94.8% and 82.7% accurate, respectively. Pathway analysis of the genes indicates that, aside from a xenobiotic metabolism-related pathway as an adaptive response for liver hypertrophy, amino acid biosynthesis and oxidative responses appear to be involved in hypertrophic hepatocarcinogenesis. Early detection and toxicogenomic characterization of liver hypertrophy using our models may be useful for predicting carcinogenesis. In addition, the identified genes provide novel insight into discrimination between adverse hypertrophy associated with carcinogenesis and adaptive hypertrophy in risk assessment.
- Highlights: • Hypertrophy (H) and hypertrophic carcinogenesis (C) were studied by toxicogenomics. • Important genes for H and C were selected by logistic ridge regression analysis. • Amino acid biosynthesis and oxidative responses may be involved in C. • Predictive models for H and C provided 94.8% and 82.7% accuracy, respectively. • The identified genes could be useful for assessment of liver hypertrophy.
Zhang, Xingyu; Kim, Joyce; Patzer, Rachel E; Pitts, Stephen R; Patzer, Aaron; Schrager, Justin D
2017-10-26
To describe and compare logistic regression and neural network modeling strategies to predict hospital admission or transfer following initial presentation to Emergency Department (ED) triage with and without the addition of natural language processing elements. Using data from the National Hospital Ambulatory Medical Care Survey (NHAMCS), a cross-sectional probability sample of United States EDs from 2012 and 2013 survey years, we developed several predictive models with the outcome being admission to the hospital or transfer vs. discharge home. We included patient characteristics immediately available after the patient has presented to the ED and undergone a triage process. We used this information to construct logistic regression (LR) and multilayer neural network models (MLNN) which included natural language processing (NLP) and principal component analysis from the patient's reason for visit. Ten-fold cross validation was used to test the predictive capacity of each model, and the area under the receiver operating characteristic curve (AUC) was then calculated for each model. Of the 47,200 ED visits from 642 hospitals, 6,335 (13.42%) resulted in hospital admission (or transfer). A total of 48 principal components were extracted by NLP from the reason for visit fields, which explained 75% of the overall variance for hospitalization. In the model including only structured variables, the AUC was 0.824 (95% CI 0.818-0.830) for logistic regression and 0.823 (95% CI 0.817-0.829) for MLNN. Models including only free-text information generated AUC of 0.742 (95% CI 0.731-0.753) for logistic regression and 0.753 (95% CI 0.742-0.764) for MLNN. When both structured variables and free text variables were included, the AUC reached 0.846 (95% CI 0.839-0.853) for logistic regression and 0.844 (95% CI 0.836-0.852) for MLNN.
The predictive accuracy of hospital admission or transfer for patients who presented to ED triage overall was good, and was improved with the inclusion of free text data from a patient's reason for visit regardless of modeling approach. Natural language processing and neural networks that incorporate patient-reported outcome free text may increase predictive accuracy for hospital admission.
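The AUC values reported above can be read as the probability that a randomly chosen admitted visit receives a higher predicted score than a randomly chosen discharged visit. The sketch below computes that quantity directly by pairwise comparison. It is an illustration only: the function name `auc` and the scores are made up, and it is not the cross-validated procedure used in the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted probabilities or any ranking score.
    labels: 1 for the event (e.g. admission), 0 otherwise.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up scores for four visits, two of which ended in admission.
example_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

The pairwise form is quadratic in the number of visits, so for a sample the size of NHAMCS a rank-based implementation would be the more practical choice.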
Cevenini, Gabriele; Barbini, Emanuela; Scolletta, Sabino; Biagioli, Bonizella; Giomarelli, Pierpaolo; Barbini, Paolo
2007-11-22
Popular predictive models for estimating morbidity probability after heart surgery are compared critically in a unitary framework. The study is divided into two parts. In the first part modelling techniques and intrinsic strengths and weaknesses of different approaches were discussed from a theoretical point of view. In this second part the performances of the same models are evaluated in an illustrative example. Eight models were developed: Bayes linear and quadratic models, k-nearest neighbour model, logistic regression model, Higgins and direct scoring systems and two feed-forward artificial neural networks with one and two layers. Cardiovascular, respiratory, neurological, renal, infectious and hemorrhagic complications were defined as morbidity. Training and testing sets each of 545 cases were used. The optimal set of predictors was chosen among a collection of 78 preoperative, intraoperative and postoperative variables by a stepwise procedure. Discrimination and calibration were evaluated by the area under the receiver operating characteristic curve and Hosmer-Lemeshow goodness-of-fit test, respectively. Scoring systems and the logistic regression model required the largest set of predictors, while Bayesian and k-nearest neighbour models were much more parsimonious. In testing data, all models showed acceptable discrimination capacities, however the Bayes quadratic model, using only three predictors, provided the best performance. All models showed satisfactory generalization ability: again the Bayes quadratic model exhibited the best generalization, while artificial neural networks and scoring systems gave the worst results. Finally, poor calibration was obtained when using scoring systems, k-nearest neighbour model and artificial neural networks, while Bayes (after recalibration) and logistic regression models gave adequate results. 
Although all the predictive models showed acceptable discrimination performance in the example considered, the Bayes and logistic regression models seemed better than the others, because they also had good generalization and calibration. The Bayes quadratic model seemed to be a convincing alternative to the much more usual Bayes linear and logistic regression models. It showed its capacity to identify a minimum core of predictors generally recognized as essential to pragmatically evaluate the risk of developing morbidity after heart surgery.
NASA Astrophysics Data System (ADS)
Ozdemir, Adnan
2011-07-01
Summary: The purpose of this study is to produce a groundwater spring potential map of the Sultan Mountains in central Turkey, based on a logistic regression method within a Geographic Information System (GIS) environment. Using field surveys, the locations of the springs (440 springs) were determined in the study area. In this study, 17 spring-related factors were used in the analysis: geology, relative permeability, land use/land cover, precipitation, elevation, slope, aspect, total curvature, plan curvature, profile curvature, wetness index, stream power index, sediment transport capacity index, distance to drainage, distance to fault, drainage density, and fault density map. The coefficients of the predictor variables were estimated using binary logistic regression analysis and were used to calculate the groundwater spring potential for the entire study area. The accuracy of the final spring potential map was evaluated based on the observed springs. The accuracy of the model was evaluated by calculating the relative operating characteristics. The area value of the relative operating characteristic curve model was found to be 0.82. These results indicate that the model is a good estimator of the spring potential in the study area. The spring potential map shows that the areas of very low, low, moderate and high groundwater spring potential classes are 105.586 km² (28.99%), 74.271 km² (19.906%), 101.203 km² (27.14%), and 90.05 km² (24.671%), respectively. The interpretations of the potential map showed that stream power index, relative permeability of lithologies, geology, elevation, aspect, wetness index, plan curvature, and drainage density play major roles in spring occurrence and distribution in the Sultan Mountains. The logistic regression approach has not yet been used to delineate groundwater potential zones. In this study, the logistic regression method was used to locate potential zones for groundwater springs in the Sultan Mountains.
The evolved model was found to be in strong agreement with the available groundwater spring test data. Hence, this method can be used routinely in groundwater exploration under favourable conditions.
Independent Prognostic Factors for Acute Organophosphorus Pesticide Poisoning.
Tang, Weidong; Ruan, Feng; Chen, Qi; Chen, Suping; Shao, Xuebo; Gao, Jianbo; Zhang, Mao
2016-07-01
Acute organophosphorus pesticide poisoning (AOPP) is becoming a significant problem and a potential cause of human mortality because of the abuse of organophosphate compounds. This study aims to determine the independent prognostic factors of AOPP by using multivariate logistic regression analysis. The clinical data for 71 subjects with AOPP admitted to our hospital were retrospectively analyzed. This information included the Acute Physiology and Chronic Health Evaluation II (APACHE II) scores, 6-h post-admission blood lactate levels, post-admission 6-h lactate clearance rates, admission blood cholinesterase levels, 6-h post-admission blood cholinesterase levels, cholinesterase activity, blood pH, and other factors. Univariate analysis and multivariate logistic regression analyses were conducted to identify all prognostic factors and independent prognostic factors, respectively. A receiver operating characteristic curve was plotted to analyze the testing power of independent prognostic factors. Twelve of 71 subjects died. Admission blood lactate levels, 6-h post-admission blood lactate levels, post-admission 6-h lactate clearance rates, blood pH, and APACHE II scores were identified as prognostic factors for AOPP according to the univariate analysis, whereas only 6-h post-admission blood lactate levels, post-admission 6-h lactate clearance rates, and blood pH were independent prognostic factors identified by multivariate logistic regression analysis. The receiver operating characteristic analysis suggested that post-admission 6-h lactate clearance rates were of moderate diagnostic value. High 6-h post-admission blood lactate levels, low blood pH, and low post-admission 6-h lactate clearance rates were independent prognostic factors identified by multivariate logistic regression analysis. Copyright © 2016 by Daedalus Enterprises.
Sargolzaie, Narjes; Miri-Moghaddam, Ebrahim
2014-01-01
The most common differential diagnosis of β-thalassemia (β-thal) trait is iron deficiency anemia. Several red blood cell equations were introduced during different studies for differential diagnosis between β-thal trait and iron deficiency anemia. Due to genetic variations in different regions, these equations cannot be useful in all populations. The aim of this study was to determine a native equation with high accuracy for differential diagnosis of β-thal trait and iron deficiency anemia for the Sistan and Baluchestan population by logistic regression analysis. We selected 77 iron deficiency anemia and 100 β-thal trait cases. We used binary logistic regression analysis and determined the best equations for probability prediction of β-thal trait against iron deficiency anemia in our population. We compared diagnostic values and receiver operating characteristic (ROC) curves related to this equation and another 10 published equations in discriminating β-thal trait and iron deficiency anemia. The binary logistic regression analysis determined the best equation for probability prediction of β-thal trait against iron deficiency anemia, with an area under the curve (AUC) of 0.998. Based on ROC curves and AUC, the Green & King, England & Frazer, and then Sirdah indices, respectively, had the highest accuracy after our equation. We suggest that to get the best equation and cut-off in each region, one needs to evaluate specific information of each region, specifically in areas where populations are homogeneous, to provide a specific formula for differentiating between β-thal trait and iron deficiency anemia.
Selenium in irrigated agricultural areas of the western United States
Nolan, B.T.; Clark, M.L.
1997-01-01
A logistic regression model was developed to predict the likelihood that Se exceeds the USEPA chronic criterion for aquatic life (5 µg/L) in irrigated agricultural areas of the western USA. Preliminary analysis of explanatory variables used in the model indicated that surface-water Se concentration increased with increasing dissolved solids (DS) concentration and with the presence of Upper Cretaceous, mainly marine sediment. The presence or absence of Cretaceous sediment was the major variable affecting Se concentration in surface-water samples from the National Irrigation Water Quality Program. Median Se concentration was 14 µg/L in samples from areas underlain by Cretaceous sediments and < 1 µg/L in samples from areas underlain by non-Cretaceous sediments. Wilcoxon rank sum tests indicated that elevated Se concentrations in samples from areas with Cretaceous sediments, irrigated areas, and from closed lakes and ponds were statistically significant. Spearman correlations indicated that Se was positively correlated with a binary geology variable (0.64) and DS (0.45). Logistic regression models indicated that the concentration of Se in surface water was almost certain to exceed the Environmental Protection Agency aquatic-life chronic criterion of 5 µg/L when DS was greater than 3000 mg/L in areas with Cretaceous sediments. The 'best' logistic regression model correctly predicted Se exceedances and nonexceedances 84.4% of the time, and model sensitivity was 80.7%. A regional map of Cretaceous sediment showed the location of potential problem areas. The map and logistic regression model are tools that can be used to determine the potential for Se contamination of irrigated agricultural areas in the western USA.
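The two performance figures quoted above, the share of exceedances and nonexceedances predicted correctly and the model sensitivity, both come from a 2x2 table of predicted versus observed outcomes. The sketch below shows how they relate. The function name `classification_summary` and the six sample values are hypothetical and are not data from the study.

```python
def classification_summary(predicted, observed):
    """Overall accuracy and sensitivity from paired binary outcomes.

    predicted, observed: sequences of 0/1 flags, where 1 means the
    Se criterion is (predicted or observed to be) exceeded.
    """
    pairs = list(zip(predicted, observed))
    true_pos = sum(1 for p, o in pairs if p == 1 and o == 1)
    true_neg = sum(1 for p, o in pairs if p == 0 and o == 0)
    false_neg = sum(1 for p, o in pairs if p == 0 and o == 1)
    accuracy = (true_pos + true_neg) / len(pairs)
    # Sensitivity: fraction of observed exceedances that were predicted.
    sensitivity = true_pos / (true_pos + false_neg)
    return accuracy, sensitivity

# Made-up predictions for six water samples.
accuracy, sensitivity = classification_summary(
    predicted=[1, 1, 0, 0, 1, 0],
    observed=[1, 0, 0, 1, 1, 0],
)
```

Reporting sensitivity alongside overall accuracy matters here because a model can score well overall while still missing many true exceedances when exceedances are the minority class.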
Fang, Xingang; Bagui, Sikha; Bagui, Subhash
2017-08-01
The readily available high throughput screening (HTS) data from the PubChem database provides an opportunity for mining of small molecules in a variety of biological systems using machine learning techniques. From the thousands of available molecular descriptors developed to encode useful chemical information representing the characteristics of molecules, descriptor selection is an essential step in building an optimal quantitative structural-activity relationship (QSAR) model. For the development of a systematic descriptor selection strategy, we need the understanding of the relationship between: (i) the descriptor selection; (ii) the choice of the machine learning model; and (iii) the characteristics of the target bio-molecule. In this work, we employed the Signature descriptor to generate a dataset on the Human kallikrein 5 (hK 5) inhibition confirmatory assay data and compared multiple classification models including logistic regression, support vector machine, random forest and k-nearest neighbor. Under optimal conditions, the logistic regression model provided extremely high overall accuracy (98%) and precision (90%), with good sensitivity (65%) in the cross validation test. In testing the primary HTS screening data with more than 200K molecular structures, the logistic regression model exhibited the capability of eliminating more than 99.9% of the inactive structures. As part of our exploration of the descriptor-model-target relationship, the excellent predictive performance of the combination of the Signature descriptor and the logistic regression model on the assay data of the Human kallikrein 5 (hK 5) target suggested a feasible descriptor/model selection strategy on similar targets. Copyright © 2017 Elsevier Ltd. All rights reserved.
Non-ignorable missingness in logistic regression.
Wang, Joanna J J; Bartlett, Mark; Ryan, Louise
2017-08-30
Nonresponses and missing data are common in observational studies. Ignoring or inadequately handling missing data may lead to biased parameter estimation, incorrect standard errors and, as a consequence, incorrect statistical inference and conclusions. We present a strategy for modelling non-ignorable missingness where the probability of nonresponse depends on the outcome. Using a simple case of logistic regression, we quantify the bias in regression estimates and show the observed likelihood is non-identifiable under non-ignorable missing data mechanism. We then adopt a selection model factorisation of the joint distribution as the basis for a sensitivity analysis to study changes in estimated parameters and the robustness of study conclusions against different assumptions. A Bayesian framework for model estimation is used as it provides a flexible approach for incorporating different missing data assumptions and conducting sensitivity analysis. Using simulated data, we explore the performance of the Bayesian selection model in correcting for bias in a logistic regression. We then implement our strategy using survey data from the 45 and Up Study to investigate factors associated with worsening health from the baseline to follow-up survey. Our findings have practical implications for the use of the 45 and Up Study data to answer important research questions relating to health and quality-of-life. Copyright © 2017 John Wiley & Sons, Ltd.
Prediction model for the return to work of workers with injuries in Hong Kong.
Xu, Yanwen; Chan, Chetwyn C H; Lo, Karen Hui Yu-Ling; Tang, Dan
2008-01-01
This study attempts to formulate a prediction model of return to work for a group of workers who have been suffering from chronic pain and physical injury while also being out of work in Hong Kong. The study used Case-based Reasoning (CBR) method, and compared the result with the statistical method of logistic regression model. The database of the algorithm of CBR was composed of 67 cases who were also used in the logistic regression model. The testing cases were 32 participants who had a similar background and characteristics to those in the database. The methods of setting constraints and Euclidean distance metric were used in CBR to search the closest cases to the trial case based on the matrix. The usefulness of the algorithm was tested on 32 new participants, and the accuracy of predicting return to work outcomes was 62.5%, which was no better than the 71.2% accuracy derived from the logistic regression model. The results of the study would enable us to have a better understanding of the CBR applied in the field of occupational rehabilitation by comparing with the conventional regression analysis. The findings would also shed light on the development of relevant interventions for the return-to-work process of these workers.
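The retrieval step of case-based reasoning described above, finding the stored cases closest to a new case under a Euclidean distance metric, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `nearest_cases`, the two-feature encoding, and the outcome labels are all made up, and the constraint-setting step mentioned in the abstract is omitted.

```python
import math

def nearest_cases(case_base, query, k=1):
    """Return the k stored cases closest to the query.

    case_base: list of (features, outcome) tuples, where features is a
               tuple of numbers on comparable scales.
    query:     feature tuple for the new case.
    """
    ranked = sorted(case_base, key=lambda case: math.dist(case[0], query))
    return ranked[:k]

# Hypothetical case base with two scaled features per worker.
case_base = [
    ((1.0, 2.0), "returned to work"),
    ((5.0, 5.0), "did not return"),
    ((1.5, 1.8), "returned to work"),
]
closest = nearest_cases(case_base, query=(1.4, 2.0), k=1)
predicted_outcome = closest[0][1]
```

Euclidean distance is sensitive to feature scale, so in practice the features would need to be standardized or weighted before retrieval.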
Ensemble of trees approaches to risk adjustment for evaluating a hospital's performance.
Liu, Yang; Traskin, Mikhail; Lorch, Scott A; George, Edward I; Small, Dylan
2015-03-01
A commonly used method for evaluating a hospital's performance on an outcome is to compare the hospital's observed outcome rate to the hospital's expected outcome rate given its patient (case) mix and service. The process of calculating the hospital's expected outcome rate given its patient mix and service is called risk adjustment (Iezzoni 1997). Risk adjustment is critical for accurately evaluating and comparing hospitals' performances since we would not want to unfairly penalize a hospital just because it treats sicker patients. The key to risk adjustment is accurately estimating the probability of an outcome given patient characteristics. For cases with binary outcomes, the method that is commonly used in risk adjustment is logistic regression. In this paper, we consider ensemble of trees methods as alternatives for risk adjustment, including random forests and Bayesian additive regression trees (BART). Both random forests and BART are modern machine learning methods that have been shown recently to have excellent performance for prediction of outcomes in many settings. We apply these methods to carry out risk adjustment for the performance of neonatal intensive care units (NICU). We show that these ensemble of trees methods outperform logistic regression in predicting mortality among babies treated in NICU, and provide a superior method of risk adjustment compared to logistic regression.
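The comparison of observed and expected outcome rates described above can be sketched in a few lines once each patient has a predicted outcome probability from some risk model (logistic regression, random forest, or BART). The function name `observed_vs_expected` and the numbers below are hypothetical. The sketch only shows the arithmetic of the comparison, not how the probabilities are estimated.

```python
def observed_vs_expected(outcomes, predicted_probs):
    """Observed rate, expected rate, and their ratio for one hospital.

    outcomes:        0/1 outcome for each patient treated at the hospital.
    predicted_probs: risk-model probability of the outcome for each
                     patient, given that patient's characteristics.
    """
    n = len(outcomes)
    observed_rate = sum(outcomes) / n
    expected_rate = sum(predicted_probs) / n
    return observed_rate, expected_rate, observed_rate / expected_rate

# Made-up hospital with four patients.
obs_rate, exp_rate, oe_ratio = observed_vs_expected(
    outcomes=[1, 0, 0, 1],
    predicted_probs=[0.5, 0.25, 0.25, 0.25],
)
```

A ratio above 1 suggests more events than the case mix would predict, and a ratio below 1 suggests fewer. The quality of that judgment depends entirely on how well the risk model estimates the individual probabilities.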
ERIC Educational Resources Information Center
Boyd, Aimee M.; Dodd, Barbara; Fitzpatrick, Steven
2013-01-01
This study compared several exposure control procedures for CAT systems based on the three-parameter logistic testlet response theory model (Wang, Bradlow, & Wainer, 2002) and Masters' (1982) partial credit model when applied to a pool consisting entirely of testlets. The exposure control procedures studied were the modified within 0.10 logits…
NASA Astrophysics Data System (ADS)
Jokar Arsanjani, Jamal; Helbich, Marco; Kainz, Wolfgang; Darvishi Boloorani, Ali
2013-04-01
This research analyses the suburban expansion in the metropolitan area of Tehran, Iran. A hybrid model consisting of logistic regression model, Markov chain (MC), and cellular automata (CA) was designed to improve the performance of the standard logistic regression model. Environmental and socio-economic variables dealing with urban sprawl were operationalised to create a probability surface of spatiotemporal states of built-up land use for the years 2006, 2016, and 2026. For validation, the model was evaluated by means of relative operating characteristic values for different sets of variables. The approach was calibrated for 2006 by cross comparing of actual and simulated land use maps. The achieved outcomes represent a match of 89% between simulated and actual maps of 2006, which was satisfactory to approve the calibration process. Thereafter, the calibrated hybrid approach was implemented for forthcoming years. Finally, future land use maps for 2016 and 2026 were predicted by means of this hybrid approach. The simulated maps illustrate a new wave of suburban development in the vicinity of Tehran at the western border of the metropolis during the next decades.
A statistical method for predicting seizure onset zones from human single-neuron recordings
NASA Astrophysics Data System (ADS)
Valdez, André B.; Hickman, Erin N.; Treiman, David M.; Smith, Kris A.; Steinmetz, Peter N.
2013-02-01
Objective. Clinicians often use depth-electrode recordings to localize human epileptogenic foci. To advance the diagnostic value of these recordings, we applied logistic regression models to single-neuron recordings from depth-electrode microwires to predict seizure onset zones (SOZs). Approach. We collected data from 17 epilepsy patients at the Barrow Neurological Institute and developed logistic regression models to calculate the odds of observing SOZs in the hippocampus, amygdala and ventromedial prefrontal cortex, based on statistics such as the burst interspike interval (ISI). Main results. Analysis of these models showed that, for a single-unit increase in burst ISI ratio, the left hippocampus was approximately 12 times more likely to contain a SOZ; and the right amygdala, 14.5 times more likely. Our models were most accurate for the hippocampus bilaterally (at 85% average sensitivity), and performance was comparable with current diagnostics such as electroencephalography. Significance. Logistic regression models can be combined with single-neuron recording to predict likely SOZs in epilepsy patients being evaluated for resective surgery, providing an automated source of clinically useful information.
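The "12 times" figure above is an odds ratio for a one-unit increase in the burst ISI ratio, which in a logistic regression equals the exponential of the fitted coefficient. The sketch below shows that conversion and how the odds ratio scales for a smaller change in the predictor. The function name `odds_ratio_for_change` is hypothetical, and the coefficient is back-calculated from the reported odds ratio rather than taken from the paper.

```python
import math

def odds_ratio_for_change(beta, delta):
    """Odds ratio implied by a logistic coefficient for a change of
    `delta` units in the predictor: exp(beta * delta)."""
    return math.exp(beta * delta)

# Coefficient implied by an odds ratio of 12 per unit increase
# (an assumption derived from the abstract, not a reported estimate).
beta_left_hippocampus = math.log(12.0)

per_unit = odds_ratio_for_change(beta_left_hippocampus, 1.0)
per_tenth = odds_ratio_for_change(beta_left_hippocampus, 0.1)
```

Because the model is linear on the log-odds scale, a change of 0.1 units multiplies the odds by 12 raised to the power 0.1, not by 1.2.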
Gazolla, Fernanda Mussi; Neves Bordallo, Maria Alice; Madeira, Isabel Rey; de Miranda Carvalho, Cecilia Noronha; Vieira Monteiro, Alexandra Maria; Pinheiro Rodrigues, Nádia Cristina; Borges, Marcos Antonio; Collett-Solberg, Paulo Ferrez; Muniz, Bruna Moreira; de Oliveira, Cecilia Lacroix; Pinheiro, Suellen Martins; de Queiroz Ribeiro, Rebeca Mathias
2015-05-01
Early exposure to cardiovascular risk factors creates a chronic inflammatory state that could damage the endothelium followed by thickening of the carotid intima-media. To investigate the association of cardiovascular risk factors and thickening of the carotid intima-media in prepubertal children. In this cross-sectional study, carotid intima-media thickness (cIMT) and cardiovascular risk factors were assessed in 129 prepubertal children aged from 5 to 10 years. Association was assessed by simple and multivariate logistic regression analyses. In simple logistic regression analyses, body mass index (BMI) z-score, waist circumference, and systolic blood pressure (SBP) were positively associated with increased left, right, and average cIMT, whereas diastolic blood pressure was positively associated only with increased left and average cIMT (p<0.05). In multivariate logistic regression analyses, increased left cIMT was positively associated with BMI z-score and SBP, and increased average cIMT was positively associated only with SBP (p<0.05). BMI z-score and SBP were the strongest risk factors for increased cIMT.
New machine-learning algorithms for prediction of Parkinson's disease
NASA Astrophysics Data System (ADS)
Mandal, Indrajit; Sairam, N.
2014-03-01
This article presents enhanced prediction accuracy for the diagnosis of Parkinson's disease (PD), using the proposed robust inference system to prevent delay and misdiagnosis of patients. New machine-learning methods are proposed, and performance comparisons are based on specificity, sensitivity, accuracy and other measurable parameters. The robust methods applied to Parkinson's disease (PD) include sparse multinomial logistic regression, rotation forest ensemble with support vector machines and principal components analysis, artificial neural networks, and boosting methods. A new ensemble method comprising the Bayesian network optimised by Tabu search algorithm as classifier and Haar wavelets as projection filter is used for relevant feature selection and ranking. The highest accuracy obtained by linear logistic regression and sparse multinomial logistic regression is 100%, with sensitivity and specificity of 0.983 and 0.996, respectively. All the experiments are conducted at 95% and 99% confidence levels, and the results are established with corrected t-tests. This work shows a high degree of advancement in software reliability and quality of the computer-aided diagnosis system and experimentally shows best results with supportive statistical inference.
Landslide Hazard Mapping in Rwanda Using Logistic Regression
NASA Astrophysics Data System (ADS)
Piller, A.; Anderson, E.; Ballard, H.
2015-12-01
Landslides in the United States cause more than $1 billion in damages and 50 deaths per year (USGS 2014). Globally, figures are much more grave, yet monitoring, mapping and forecasting of these hazards are less than adequate. Seventy-five percent of the population of Rwanda earns a living from farming, mostly subsistence. Loss of farmland, housing, or life to landslides is a very real hazard. Landslides in Rwanda have an impact at the economic, social, and environmental level. In a developing nation that faces challenges in tracking, cataloging, and predicting the numerous landslides that occur each year, satellite imagery and spatial analysis allow for remote study. We have focused on the development of a landslide inventory and a statistical methodology for assessing landslide hazards. Using logistic regression on approximately 30 test variables (e.g., slope, soil type, and land cover) and a sample of over 200 landslides, we determine which variables are statistically most relevant to landslide occurrence in Rwanda. A preliminary predictive hazard map for Rwanda has been produced, using the variables selected from the logistic regression analysis.
Evaluating penalized logistic regression models to predict Heat-Related Electric grid stress days
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bramer, L. M.; Rounds, J.; Burleyson, C. D.
Understanding the conditions associated with stress on the electricity grid is important in the development of contingency plans for maintaining reliability during periods when the grid is stressed. In this paper, heat-related grid stress and the relationship with weather conditions is examined using data from the eastern United States. Penalized logistic regression models were developed and applied to predict stress on the electric grid using weather data. The inclusion of other weather variables, such as precipitation, in addition to temperature improved model performance. Several candidate models and datasets were examined. A penalized logistic regression model fit at the operation-zone level was found to provide predictive value and interpretability. Additionally, the importance of different weather variables observed at different time scales was examined. Maximum temperature and precipitation were identified as important across all zones while the importance of other weather variables was zone specific. The methods presented in this work are extensible to other regions and can be used to aid in planning and development of the electrical grid.
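The idea of a penalized logistic regression can be illustrated with a small, self-contained sketch. The abstract does not state which penalty was used, so a ridge (L2) penalty is assumed here; the two predictors (standardised maximum temperature and precipitation) and all data values are invented for illustration and are not from the study.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_ridge_logistic(X, y, lam=0.1, lr=0.1, n_iter=5000):
    """Minimise mean log-loss + (lam / 2) * ||w||^2 by gradient descent.
    The intercept b is left unpenalised. Ridge is an assumed penalty."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
    return b, w

def predict_proba(b, w, x):
    """Predicted probability of a grid-stress day for feature vector x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical standardised features: [max temperature, precipitation].
X = [[-1.5, 0.5], [-1.0, -0.2], [-0.5, 1.0], [0.0, -0.5],
     [0.2, 0.3], [0.5, -1.0], [1.0, 0.0], [1.5, -0.8]]
y = [0, 0, 0, 1, 0, 1, 1, 1]  # 1 = grid-stress day (made-up labels)

b_weak, w_weak = fit_ridge_logistic(X, y, lam=0.01)
b_strong, w_strong = fit_ridge_logistic(X, y, lam=1.0)
```

Comparing `w_weak` with `w_strong` shows the effect of the penalty: a larger `lam` shrinks the coefficients toward zero, trading some fit for stability, which is one reason penalized models tend to remain interpretable when many correlated weather variables are candidates.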
Evaluating penalized logistic regression models to predict Heat-Related Electric grid stress days
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bramer, Lisa M.; Rounds, J.; Burleyson, C. D.
Understanding the conditions associated with stress on the electricity grid is important in the development of contingency plans for maintaining reliability during periods when the grid is stressed. In this paper, heat-related grid stress and the relationship with weather conditions were examined using data from the eastern United States. Penalized logistic regression models were developed and applied to predict stress on the electric grid using weather data. The inclusion of other weather variables, such as precipitation, in addition to temperature improved model performance. Several candidate models and combinations of predictive variables were examined. A penalized logistic regression model which was fit at the operation-zone level was found to provide predictive value and interpretability. Additionally, the importance of different weather variables observed at various time scales was examined. Maximum temperature and precipitation were identified as important across all zones while the importance of other weather variables was zone specific. In conclusion, the methods presented in this work are extensible to other regions and can be used to aid in planning and development of the electrical grid.
GIS-based rare events logistic regression for mineral prospectivity mapping
NASA Astrophysics Data System (ADS)
Xiong, Yihui; Zuo, Renguang
2018-02-01
Mineralization is a special type of singularity event, and can be considered as a rare event, because within a specific study area the number of prospective locations (1s) is considerably smaller than the number of non-prospective locations (0s). In this study, GIS-based rare events logistic regression (RELR) was used to map the mineral prospectivity in the southwestern Fujian Province, China. An odds ratio was used to measure the relative importance of the evidence variables with respect to mineralization. The results suggest that formations, granites, and skarn alterations, followed by faults and aeromagnetic anomalies, are the most important indicators for the formation of Fe-related mineralization in the study area. The prediction rate and the area under the curve (AUC) values show that areas with higher probability have a strong spatial relationship with the known mineral deposits. Comparing the results with original logistic regression (OLR) demonstrates that the GIS-based RELR performs better than OLR. The prospectivity map obtained in this study benefits the search for skarn Fe-related mineralization in the study area.
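One ingredient commonly associated with rare events logistic regression is the prior correction of the intercept described by King and Zeng, which adjusts for a sample in which events (1s) are over-represented relative to the population. The abstract does not say which corrections the authors applied, so the following is only a sketch of that one adjustment; the function name and the example rates are hypothetical.

```python
import math

def prior_corrected_intercept(beta0_hat, sample_rate, population_rate):
    """Prior correction of a logistic-regression intercept:
    beta0 = beta0_hat - ln[((1 - tau) / tau) * (ybar / (1 - ybar))],
    where tau is the population event rate and ybar the sample event rate."""
    if not (0.0 < sample_rate < 1.0 and 0.0 < population_rate < 1.0):
        raise ValueError("rates must lie strictly between 0 and 1")
    correction = math.log(
        ((1.0 - population_rate) / population_rate)
        * (sample_rate / (1.0 - sample_rate))
    )
    return beta0_hat - correction

# Hypothetical case: a balanced sample (50% events) drawn from a region
# where only 1% of locations are prospective.
corrected = prior_corrected_intercept(0.0, sample_rate=0.5, population_rate=0.01)
```

Only the intercept changes under this correction; the slope coefficients, and therefore the odds ratios used to rank evidence variables, are left as estimated.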
Sun, Shi-Guang; Li, Zi-Feng; Xie, Yan-Ming; Liu, Jian; Lu, Yan; Song, Yi-Fei; Han, Ying-Hua; Liu, Li-Da; Peng, Ting-Ting
2013-09-01
Rationalizing clinical use and ensuring safety are key issues in the surveillance of traditional Chinese medicine injections (TCMIs). In this 2011 study, 240 medical records of patients who had been discharged following treatment with TCMIs between 1 and 12 months previously were randomly selected from hospital records. Consistency between clinical use and the description of TCMIs was evaluated. Drug use and adverse drug reactions/events were studied using logistic regression analysis. There was poor consistency between clinical use and the best practice advised in manuals on TCMIs. Over-dosage and overly concentrated administration of TCMIs occurred, with the outcome of modifying properties of the blood. Logistic regression analysis showed that drug concentration was a valid predictor for both adverse drug reactions/events and benefits associated with TCMIs. Surveillance of the rational clinical use and safety of TCMIs indicates that clinical use should be consistent with technical drug manual specifications, and that drug use should draw on multi-layered logistic regression analysis research to help avoid adverse drug reactions/events.
NASA Astrophysics Data System (ADS)
Oh, Hyun-Joo; Lee, Saro; Chotikasathien, Wisut; Kim, Chang Hwan; Kwon, Ju Hyoung
2009-04-01
For predictive landslide susceptibility mapping, this study applied and verified a probability model (the frequency ratio) and a statistical model (logistic regression) at Pechabun, Thailand, using a geographic information system (GIS) and remote sensing. Landslide locations were identified in the study area from interpretation of aerial photographs and field surveys, and maps of the topography, geology and land cover were compiled into a spatial database. The factors that influence landslide occurrence, such as slope gradient, slope aspect, curvature of topography and distance from drainage, were calculated from the topographic database. Lithology and distance from fault were extracted and calculated from the geology database. Land cover was classified from a Landsat TM satellite image. The frequency ratio and logistic regression coefficients were overlaid as each factor’s ratings for landslide susceptibility mapping. The landslide susceptibility map was then verified and compared using the existing landslide locations. In the verification, the frequency ratio model showed 76.39% and the logistic regression model 70.42% prediction accuracy. The method can be used to reduce hazards associated with landslides and to plan land cover.
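The frequency ratio mentioned in the abstract above is usually defined, for each class of a factor, as the share of landslide cells falling in that class divided by the share of the study area the class occupies. The sketch below assumes that definition; the slope classes and cell counts are invented for illustration and are not from the Pechabun study.

```python
def frequency_ratios(class_cells, landslide_cells):
    """Frequency ratio per class:
    (landslide cells in class / all landslide cells)
    divided by (cells in class / all cells)."""
    total_cells = sum(class_cells.values())
    total_slides = sum(landslide_cells.values())
    ratios = {}
    for name, n_cells in class_cells.items():
        slide_share = landslide_cells.get(name, 0) / total_slides
        area_share = n_cells / total_cells
        ratios[name] = slide_share / area_share
    return ratios

# Hypothetical slope-gradient classes (degrees) and raster cell counts.
slope_cells = {"0-15": 6000, "15-30": 3000, ">30": 1000}
slope_landslides = {"0-15": 20, "15-30": 50, ">30": 30}

fr = frequency_ratios(slope_cells, slope_landslides)
```

A ratio above 1 marks a class where landslides are over-represented relative to its area, and a ratio below 1 marks the opposite. Summing the per-factor ratios cell by cell is one common way to build a susceptibility index from such ratings.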
Wang, Shuang; Zhang, Yuchen; Dai, Wenrui; Lauter, Kristin; Kim, Miran; Tang, Yuzhe; Xiong, Hongkai; Jiang, Xiaoqian
2016-01-01
Motivation: Genome-wide association studies (GWAS) have been widely used in discovering the association between genotypes and phenotypes. Human genome data contain valuable but highly sensitive information. Unprotected disclosure of such information might put an individual’s privacy at risk. It is important to protect human genome data. Exact logistic regression is a bias-reduction method based on a penalized likelihood to discover rare variants that are associated with disease susceptibility. We propose the HEALER framework to facilitate secure rare variants analysis with a small sample size. Results: Our algorithm design aims to reduce the computational and storage costs of learning a homomorphic exact logistic regression model (i.e. evaluating P-values of coefficients), where the circuit depth is proportional to the logarithmic scale of data size. We evaluate the algorithm performance using rare Kawasaki Disease datasets. Availability and implementation: Download HEALER at http://research.ucsd-dbmi.org/HEALER/ Contact: shw070@ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26446135
Testing Gene-Gene Interactions in the Case-Parents Design
Yu, Zhaoxia
2011-01-01
The case-parents design has been widely used to detect genetic associations as it can prevent spurious association that could occur in population-based designs. When examining the effect of an individual genetic locus on a disease, logistic regressions developed by conditioning on parental genotypes provide complete protection from spurious association caused by population stratification. However, when testing gene-gene interactions, it is unknown whether conditional logistic regressions are still robust. Here we evaluate the robustness and efficiency of several gene-gene interaction tests that are derived from conditional logistic regressions. We found that in the presence of SNP genotype correlation due to population stratification or linkage disequilibrium, tests with incorrectly specified main-genetic-effect models can lead to inflated type I error rates. We also found that a test with fully flexible main genetic effects always maintains correct test size and its robustness can be achieved with negligible sacrifice of its power. When testing gene-gene interactions is the focus, the test allowing fully flexible main effects is recommended to be used. PMID:21778736
Li, Saijiao; He, Aiyan; Yang, Jing; Yin, TaiLang; Xu, Wangming
2011-01-01
To investigate factors that can affect compliance with treatment of polycystic ovary syndrome (PCOS) in infertile patients and to provide a basis for clinical treatment, specialist consultation and health education. Patient compliance was assessed via a questionnaire based on the Morisky-Green test and the treatment principles of PCOS. Then interviews were conducted with 99 infertile patients diagnosed with PCOS at Renmin Hospital of Wuhan University in China, from March to September 2009. Finally, these data were analyzed using logistic regression analysis. Logistic regression analysis revealed that a total of 23 (25.6%) of the participants showed good compliance. Factors that significantly (p < 0.05) affected compliance with treatment were the patient's body mass index, convenience of medical treatment and concerns about adverse drug reactions. Patients who are obese, experience inconvenient medical treatment or are concerned about adverse drug reactions are more likely to exhibit noncompliance. Treatment education and intervention aimed at these patients should be strengthened in the clinic to improve treatment compliance. Further research is needed to better elucidate the compliance behavior of patients with PCOS.