Sample records for regression tree classification

  1. The process and utility of classification and regression tree methodology in nursing research

    PubMed Central

    Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda

    2014-01-01

    Aim This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Background Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Design Discussion paper. Data sources English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984–2013. Discussion Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Implications for Nursing Research Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Conclusion Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. PMID:24237048

  2. The process and utility of classification and regression tree methodology in nursing research.

    PubMed

    Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda

    2014-06-01

    This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Discussion paper. English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984-2013. Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. © 2013 The Authors. Journal of Advanced Nursing Published by John Wiley & Sons Ltd.

  3. Comparing Methodologies for Developing an Early Warning System: Classification and Regression Tree Model versus Logistic Regression. REL 2015-077

    ERIC Educational Resources Information Center

    Koon, Sharon; Petscher, Yaacov

    2015-01-01

    The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…

  4. Using methods from the data mining and machine learning literature for disease classification and prediction: A case study examining classification of heart failure sub-types

    PubMed Central

    Austin, Peter C.; Tu, Jack V.; Ho, Jennifer E.; Levy, Daniel; Lee, Douglas S.

    2014-01-01

    Objective Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study design and Setting We compared the performance of these classification methods with those of conventional classification trees to classify patients with heart failure according to the following sub-types: heart failure with preserved ejection fraction (HFPEF) vs. heart failure with reduced ejection fraction (HFREF). We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results We found that modern, flexible tree-based methods from the data mining literature offer substantial improvement in prediction and classification of heart failure sub-type compared to conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared to the methods proposed in the data mining literature. Conclusion The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying heart failure subtypes in a population-based sample of patients from Ontario. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF. PMID:23384592
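
    Bagging, one of the alternatives compared in this study, fits many trees to bootstrap resamples of the data and classifies by majority vote. The sketch below illustrates only that idea, under simplifying assumptions: a single numeric predictor, one-split "stumps" instead of full trees, and invented data. The names `fit_stump` and `bagging` are hypothetical and do not come from the study.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a one-split classifier on one numeric predictor.

    Every midpoint between adjacent distinct x values is tried; the split
    with the fewest training errors wins, and each side predicts its
    majority class.
    """
    majority = Counter(ys).most_common(1)[0][0]
    best = (sum(y != majority for y in ys), None, majority, majority)  # no-split fallback
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    if t is None:
        return lambda x: left_label
    return lambda x: left_label if x <= t else right_label

def bagging(xs, ys, n_models=25, seed=0):
    """Bootstrap aggregation: fit one stump per bootstrap resample, predict by vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict

# Invented data: age (years) and sub-type (1 = HFPEF, 0 = HFREF).
age = [52, 55, 58, 61, 64, 74, 77, 80, 83, 86]
hfpef = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
model = bagging(age, hfpef)
print(model(55), model(83))
```

    Random forests, also compared in this study, add one step to bagging: each split considers only a random subset of the predictors. That step is omitted here because the toy example has a single predictor.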

  5. Aneurysmal subarachnoid hemorrhage prognostic decision-making algorithm using classification and regression tree analysis.

    PubMed

    Lo, Benjamin W Y; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A H

    2016-01-01

    Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56-2.45, P < 0.01). A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.
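
    Recursive partitioning, as described in this abstract, can be sketched in a few lines. The example below is a minimal illustration that assumes Gini impurity as the splitting criterion, binary splits on numeric predictors, and a fixed maximum depth in place of the pruning a full CART analysis would use. The predictors (a neurological grade and age) and all data values are invented for illustration and do not come from the Tirilazad database.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Search all predictors and midpoints; return (feature, threshold, weighted Gini) or None."""
    best = None
    n = len(labels)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

def build_tree(rows, labels, depth=0, max_depth=2, min_size=2):
    """Recursively partition rows into more homogeneous subgroups."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(labels) < min_size or gini(labels) == 0:
        return {"leaf": majority}
    split = best_split(rows, labels)
    if split is None or split[2] >= gini(labels):  # no split improves purity
        return {"leaf": majority}
    f, t, _ = split
    go_left = [r[f] <= t for r in rows]

    def subset(side):
        pairs = [(r, y) for r, y, g in zip(rows, labels, go_left) if g == side]
        return [r for r, _ in pairs], [y for _, y in pairs]

    return {
        "feature": f,
        "threshold": t,
        "left": build_tree(*subset(True), depth + 1, max_depth, min_size),
        "right": build_tree(*subset(False), depth + 1, max_depth, min_size),
    }

def predict(node, row):
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]

# Invented data: (neurological grade, age) and outcome (1 = unfavourable).
rows = [(1, 40), (1, 55), (2, 45), (2, 70), (4, 50), (4, 75), (5, 60), (5, 80)]
outcome = [0, 0, 0, 1, 1, 1, 1, 1]
tree = build_tree(rows, outcome)
print(tree["feature"], tree["threshold"], predict(tree, (2, 72)))
```

    With these invented data the root split is on grade at 3.0, and the lower-grade branch is split again on age at 62.5, giving three leaves. Each leaf plays the role of one subgroup.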

  6. DIF Trees: Using Classification Trees to Detect Differential Item Functioning

    ERIC Educational Resources Information Center

    Vaughn, Brandon K.; Wang, Qiu

    2010-01-01

    A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…

  7. Newer classification and regression tree techniques: Bagging and Random Forests for ecological prediction

    Treesearch

    Anantha M. Prasad; Louis R. Iverson; Andy Liaw

    2006-01-01

    We evaluated four statistical models - Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS) - for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.

  8. Identification of Sexually Abused Female Adolescents at Risk for Suicidal Ideations: A Classification and Regression Tree Analysis

    ERIC Educational Resources Information Center

    Brabant, Marie-Eve; Hebert, Martine; Chagnon, Francois

    2013-01-01

    This study explored the clinical profiles of 77 female teenage survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression,…

  9. Using the PDD Behavior Inventory as a Level 2 Screener: A Classification and Regression Trees Analysis

    ERIC Educational Resources Information Center

    Cohen, Ira L.; Liu, Xudong; Hudson, Melissa; Gillis, Jennifer; Cavalari, Rachel N. S.; Romanczyk, Raymond G.; Karmel, Bernard Z.; Gardner, Judith M.

    2016-01-01

    In order to improve discrimination accuracy between Autism Spectrum Disorder (ASD) and similar neurodevelopmental disorders, a data mining procedure, Classification and Regression Trees (CART), was used on a large multi-site sample of PDD Behavior Inventory (PDDBI) forms on children with and without ASD. Discrimination accuracy exceeded 80%,…

  10. Indicators of Terrorism Vulnerability in Africa

    DTIC Science & Technology

    2015-03-26

    …the terror threat and vulnerabilities across Africa. Key words: Terrorism, Africa, Negative Binomial Regression, Classification Tree

  11. Prevalence and Determinants of Preterm Birth in Tehran, Iran: A Comparison between Logistic Regression and Decision Tree Methods.

    PubMed

    Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi

    2017-06-01

    Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. Logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB (p < 0.05). Identifying and training mothers at risk as well as improving prenatal care may reduce the PTB rate. We also recommend that statisticians utilize the logistic regression model for the classification of risk groups for PTB.
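
    The evaluation indices named here can be computed directly from predictions. A minimal sketch, assuming binary outcomes coded 0/1, class predictions for sensitivity and specificity, and continuous risk scores for the area under the ROC curve. AUC is computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The data and the 0.5 cutoff are invented.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative cases."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Invented example: 1 = preterm birth, risk = predicted probability from some model.
y = [1, 1, 0, 0, 0, 1]
risk = [0.9, 0.4, 0.3, 0.2, 0.6, 0.8]
pred = [1 if r >= 0.5 else 0 for r in risk]
print(sensitivity_specificity(y, pred), auc(y, risk))
```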

  12. Modeling time-to-event (survival) data using classification tree analysis.

    PubMed

    Linden, Ariel; Yarnold, Paul R

    2017-12-01

    Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. © 2017 John Wiley & Sons, Ltd.
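
    The permutation idea behind the P values mentioned here can be illustrated for a single binary outcome. The sketch below is not the authors' CTA software and ignores censoring. It finds the cutpoint on one predictor with the highest classification accuracy, then shuffles the outcome labels many times and repeats the cutpoint search, so the P value accounts for the search itself. It uses random (Monte Carlo) permutations, which approximate rather than reproduce an exact permutation P value. The function names and data are hypothetical.

```python
import random

def best_cutpoint_accuracy(xs, ys):
    """Highest accuracy of any rule 'predict 1 if x > t' or its reverse 'predict 1 if x <= t'."""
    n = len(xs)
    best = max(sum(ys), n - sum(ys)) / n  # no-split rule: predict the majority class
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        correct = sum((x > t) == bool(y) for x, y in zip(xs, ys))
        best = max(best, correct / n, (n - correct) / n)  # second term = reversed rule
    return best

def permutation_p_value(xs, ys, n_perm=1000, seed=1):
    """Monte Carlo permutation P value for the optimized cutpoint accuracy."""
    rng = random.Random(seed)
    observed = best_cutpoint_accuracy(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_cutpoint_accuracy(xs, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Invented data: a predictor that separates the two outcome classes perfectly.
predictor = list(range(20))
event = [0] * 10 + [1] * 10
observed, p_value = permutation_p_value(predictor, event)
print(observed, p_value)
```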

  13. Assessing College Student Interest in Math and/or Computer Science in a Cross-National Sample Using Classification and Regression Trees

    ERIC Educational Resources Information Center

    Kitsantas, Anastasia; Kitsantas, Panagiota; Kitsantas, Thomas

    2012-01-01

    The purpose of this exploratory study was to assess the relative importance of a number of variables in predicting students' interest in math and/or computer science. Classification and regression trees (CART) were employed in the analysis of survey data collected from 276 college students enrolled in two U.S. and Greek universities. The results…

  14. Differences in Risk Factors for Rotator Cuff Tears between Elderly Patients and Young Patients.

    PubMed

    Watanabe, Akihisa; Ono, Qana; Nishigami, Tomohiko; Hirooka, Takahiko; Machida, Hirohisa

    2018-02-01

    It has been unclear whether the risk factors for rotator cuff tears are the same at all ages or differ between young and older populations. In this study, we examined the risk factors for rotator cuff tears using classification and regression tree analysis as a method of nonlinear regression analysis. There were 65 patients in the rotator cuff tears group and 45 patients in the intact rotator cuff group. Classification and regression tree analysis was performed to predict rotator cuff tears. The target factor was rotator cuff tears; explanatory variables were age, sex, trauma, and critical shoulder angle ≥35°. In the classification and regression tree analysis, the tree was first divided at age 64. For patients aged ≥64, the tree was divided at trauma. For patients aged <64, the tree was divided at critical shoulder angle ≥35°. The odds ratio for critical shoulder angle ≥35° was significant for all ages (5.89) and for patients aged <64 (10.3), while trauma was a significant factor only for patients aged ≥64 (5.13). Age, trauma, and critical shoulder angle ≥35° were related to rotator cuff tears in this study. However, these risk factors showed different trends according to age group rather than a linear relationship.
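
    Odds ratios like those reported here come from a 2×2 table of exposure (e.g. critical shoulder angle ≥35°) by outcome (tear vs. intact cuff). A minimal sketch using the common Woolf (log-scale) confidence interval. The cell counts are invented, chosen only so that the column totals match the 65 tear and 45 intact patients, and the abstract does not state which interval method the authors used.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval for the 2x2 table

                     outcome yes   outcome no
        exposed           a             b
        unexposed         c             d
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Invented counts: exposure = critical shoulder angle >= 35 degrees.
# a + c = 65 tears, b + d = 45 intact cuffs.
or_, lo, hi = odds_ratio_ci(a=30, b=10, c=35, d=35)
print(or_, lo, hi)
```

    The Woolf formula fails when any cell is zero; a common workaround is to add 0.5 to every cell before computing.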

  15. Factor complexity of crash occurrence: An empirical demonstration using boosted regression trees.

    PubMed

    Chung, Yi-Shih

    2013-12-01

    Factor complexity is a characteristic of traffic crashes. This paper proposes a novel method, namely boosted regression trees (BRT), to investigate the complex and nonlinear relationships in high-variance traffic crash data. The Taiwanese 2004-2005 single-vehicle motorcycle crash data are used to demonstrate the utility of BRT. Traditional logistic regression and classification and regression tree (CART) models are also used to compare their estimation results and external validities. Both the in-sample cross-validation and out-of-sample validation results show that an increase in tree complexity provides improved, although declining, classification performance, indicating a limited factor complexity of single-vehicle motorcycle crashes. The effects of crucial variables including geographical, time, and sociodemographic factors explain some fatal crashes. Relatively unique fatal crashes are better approximated by interactive terms, especially combinations of behavioral factors. BRT models generally provide better transferability than conventional logistic regression and CART models. This study also discusses the implications of the results for devising safety policies. Copyright © 2012 Elsevier Ltd. All rights reserved.
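
    Boosted regression trees build an ensemble sequentially: each new small tree is fitted to the residuals of the current model and added with a shrinkage factor (learning rate). A minimal sketch, assuming squared-error loss, one numeric predictor, and depth-one trees (stumps). A study with a categorical crash outcome would typically use a classification loss and deeper trees instead. All names and data here are hypothetical.

```python
def fit_regression_stump(xs, residuals):
    """Best single split on one numeric predictor, minimizing squared error."""
    mean_all = sum(residuals) / len(residuals)
    best = (sum((r - mean_all) ** 2 for r in residuals), None, mean_all, mean_all)
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    if t is None:
        return lambda x: ml
    return lambda x: ml if x <= t else mr

def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with squared-error loss: repeatedly fit stumps to residuals."""
    base = sum(ys) / len(ys)  # start from the mean response
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_regression_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

# Invented data with a single step in the response.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = boost(x, y)
print(model(2), model(7))
```

    The learning rate trades speed for stability: smaller values need more rounds but shrink each tree's contribution, which is the usual guard against overfitting in boosting.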

  16. Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression.

    PubMed

    Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson

    2010-08-01

    Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. Copyright (c) 2010 Elsevier Inc. All rights reserved.
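
    When a classification tree is used for propensity score estimation, a unit's estimated propensity is the share of treated units in the leaf it falls into. A minimal sketch, assuming the tree has already been grown (here reduced to a single hypothetical split at age 50) and that the propensities feed inverse probability of treatment weighting. The data and function names are invented.

```python
from collections import defaultdict

def leaf_propensity(leaf_ids, treated):
    """Estimated propensity = share of treated units in each unit's leaf."""
    counts = defaultdict(lambda: [0, 0])  # leaf -> [n_treated, n_total]
    for leaf, t in zip(leaf_ids, treated):
        counts[leaf][0] += t
        counts[leaf][1] += 1
    return [counts[leaf][0] / counts[leaf][1] for leaf in leaf_ids]

def iptw_weights(treated, propensity):
    """Inverse probability of treatment weights: 1/e for treated, 1/(1-e) for controls."""
    return [1 / e if t else 1 / (1 - e) for t, e in zip(treated, propensity)]

# Invented data; the "tree" is a single hypothetical split at age 50.
ages = [30, 40, 45, 60, 65, 70]
treated = [1, 0, 0, 1, 1, 0]
leaves = ["age<=50" if a <= 50 else "age>50" for a in ages]
scores = leaf_propensity(leaves, treated)
weights = iptw_weights(treated, scores)
print(scores, weights)
```

    A leaf containing only treated or only untreated units gives a propensity of exactly 1 or 0 and an undefined weight, so in practice such leaves need special handling.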

  17. Identification of extremely premature infants at high risk of rehospitalization.

    PubMed

    Ambalavanan, Namasivayam; Carlo, Waldemar A; McDonald, Scott A; Yao, Qing; Das, Abhik; Higgins, Rosemary D

    2011-11-01

    Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002-2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%-42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge.

  18. Identification of Extremely Premature Infants at High Risk of Rehospitalization

    PubMed Central

    Carlo, Waldemar A.; McDonald, Scott A.; Yao, Qing; Das, Abhik; Higgins, Rosemary D.

    2011-01-01

    OBJECTIVE: Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. METHODS: Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002–2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. RESULTS: A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%–42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. CONCLUSIONS: The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge. PMID:22007016

  19. Individualized Prediction of Heat Stress in Firefighters: A Data-Driven Approach Using Classification and Regression Trees.

    PubMed

    Mani, Ashutosh; Rao, Marepalli; James, Kelley; Bhattacharya, Amit

    2015-01-01

    The purpose of this study was to explore data-driven models, based on decision trees, to develop practical and easy-to-use predictive models for early identification of firefighters who are likely to cross the threshold of hyperthermia during live-fire training. Predictive models were created for three consecutive live-fire training scenarios. The final predicted outcome was a categorical variable: whether a firefighter will cross the upper threshold of hyperthermia (yes/no). Two tiers of models were built, one with and one without taking into account the outcome (whether a firefighter crossed the hyperthermia threshold or not) from the previous training scenario. The first tier of models included age, baseline heart rate and core body temperature, body mass index, and duration of training scenario as predictors. The second tier of models included the outcome of the previous scenario in the prediction space, in addition to all the predictors from the first tier of models. Classification and regression trees were used independently for prediction. The response variable for the regression tree was the quantitative variable: core body temperature at the end of each scenario. The predicted quantitative variable from regression trees was compared to the upper threshold of hyperthermia (38°C) to predict whether a firefighter would enter hyperthermia. The performance of classification and regression tree models was satisfactory for the second (success rate = 79%) and third (success rate = 89%) training scenarios but not for the first (success rate = 43%). Data-driven models based on decision trees can be a useful tool for predicting physiological response without modeling the underlying physiological systems. Early prediction of heat stress coupled with proactive interventions, such as pre-cooling, can help reduce heat stress in firefighters.
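
    The regression-tree route described here predicts a continuous core temperature and then dichotomizes it at 38°C. A minimal sketch, assuming a single predictor (baseline core temperature), a one-split tree, and that a prediction of 38.0°C or more counts as crossing the threshold. The data and function names are hypothetical.

```python
HYPERTHERMIA_C = 38.0  # upper threshold of hyperthermia stated in the abstract

def fit_regression_stump(xs, ys):
    """One-split regression tree: choose the cutpoint minimizing the sum of squared errors."""
    values = sorted(set(xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - mean_l) ** 2 for y in left) + sum((y - mean_r) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, mean_l, mean_r)
    if best is None:
        raise ValueError("need at least two distinct predictor values")
    _, t, mean_l, mean_r = best
    return t, mean_l, mean_r

def predicts_hyperthermia(x, stump):
    """Predict end-of-scenario core temperature, then compare it with the threshold."""
    t, mean_l, mean_r = stump
    predicted_temp = mean_l if x <= t else mean_r
    return predicted_temp >= HYPERTHERMIA_C  # assumption: reaching 38.0 counts as crossing

# Invented data: baseline core temperature vs. core temperature at scenario end (deg C).
baseline = [36.8, 36.9, 37.0, 37.1, 37.4, 37.5, 37.6, 37.7]
end_temp = [37.6, 37.7, 37.8, 37.7, 38.3, 38.4, 38.2, 38.5]
stump = fit_regression_stump(baseline, end_temp)
print(stump, predicts_hyperthermia(36.9, stump), predicts_hyperthermia(37.6, stump))
```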

  20. Estimating probabilities of infestation and extent of damage by the roundheaded pine beetle in ponderosa pine in the Sacramento Mountains, New Mexico

    Treesearch

    Jose Negron

    1997-01-01

    Classification trees and linear regression analysis were used to build models to predict probabilities of infestation and amount of tree mortality in terms of basal area resulting from roundheaded pine beetle, Dendroctonus adjunctus Blandford, activity in ponderosa pine, Pinus ponderosa Laws., in the Sacramento Mountains, New Mexico. Classification trees were built for...

  1. A self-trained classification technique for producing 30 m percent-water maps from Landsat data

    USGS Publications Warehouse

    Rover, Jennifer R.; Wylie, Bruce K.; Ji, Lei

    2010-01-01

    Small bodies of water can be mapped with moderate-resolution satellite data using methods where water is mapped as subpixel fractions using field measurements or high-resolution images as training datasets. A new method, developed from a regression-tree technique, uses a 30 m Landsat image for training the regression tree that, in turn, is applied to the same image to map subpixel water. The self-trained method was evaluated by comparing the percent-water map with three other maps generated from established percent-water mapping methods: (1) a regression-tree model trained with a 5 m SPOT 5 image, (2) a regression-tree model based on endmembers and (3) a linear unmixing classification technique. The results suggest that subpixel water fractions can be accurately estimated when high-resolution satellite data or intensively interpreted training datasets are not available, which increases our ability to map small water bodies or small changes in lake size at a regional scale.

  2. Probability of infestation and extent of mortality associated with the Douglas-fir beetle in the Colorado Front Range

    Treesearch

    Jose F. Negron

    1998-01-01

    Infested and uninfested areas within Douglas-fir, Pseudotsuga menziesii (Mirb.) Franco, stands affected by the Douglas-fir beetle, Dendroctonus pseudotsugae Hopk., were sampled in the Colorado Front Range, CO. Classification tree models were built to predict probabilities of infestation. Regression trees and linear regression analysis were used to model amount of tree...

  3. Using Classification Trees to Predict Alumni Giving for Higher Education

    ERIC Educational Resources Information Center

    Weerts, David J.; Ronca, Justin M.

    2009-01-01

    As the relative level of public support for higher education declines, colleges and universities aim to maximize alumni giving to keep their programs competitive. Anchored in a utility maximization framework, this study employs the classification and regression tree methodology to examine characteristics of alumni donors and non-donors at a…

  4. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    PubMed Central

    Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As fraudulent financial reporting by enterprises grows more serious, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machines, and decision trees to construct classification models for comparison. The study uses financial and nonfinancial variables to help establish the forecasting model. The research objects are companies that issued fraudulent or nonfraudulent financial statements between 1998 and 2012. The findings show that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and the C5.0 decision tree achieved the best classification performance (85.71%). PMID:25302338
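
    C5.0 is the successor to Quinlan's C4.5 algorithm, which ranks candidate splits by gain ratio: the information gain of a split divided by the entropy of the branch proportions (the split information), a correction that penalizes splits into many small branches. A minimal sketch of that criterion, assuming class labels are given as lists grouped by branch. The abstract does not report the C5.0 settings used, and the labels below are invented.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of the class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(parent, branches):
    """Information gain of splitting `parent` into `branches`, divided by the split information."""
    n = len(parent)
    remainder = sum(len(b) / n * entropy(b) for b in branches if b)
    gain = entropy(parent) - remainder
    split_info = -sum(len(b) / n * math.log2(len(b) / n) for b in branches if b)
    return gain / split_info if split_info > 0 else 0.0

parent = ["fraud"] * 4 + ["clean"] * 4
informative = [["fraud"] * 4, ["clean"] * 4]
uninformative = [["fraud", "fraud", "clean", "clean"]] * 2
print(gain_ratio(parent, informative), gain_ratio(parent, uninformative))
```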

  5. A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements.

    PubMed

    Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As fraudulent financial reporting by enterprises grows more serious, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machines, and decision trees to construct classification models for comparison. The study uses financial and nonfinancial variables to help establish the forecasting model. The research objects are companies that issued fraudulent or nonfraudulent financial statements between 1998 and 2012. The findings show that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and the C5.0 decision tree achieved the best classification performance (85.71%).

  6. Application of classification tree and logistic regression for the management and health intervention plans in a community-based study.

    PubMed

    Teng, Ju-Hsi; Lin, Kuan-Chia; Ho, Bin-Shenq

    2007-10-01

    A community-based aboriginal study was conducted and analysed to explore the application of classification tree and logistic regression. A total of 1066 aboriginal residents in Yilan County were screened during 2003-2004. The independent variables included demographic characteristics, physical examinations, geographic location, health behaviours, dietary habits and family history of hereditary diseases. Risk factors for cardiovascular diseases were selected as the dependent variables in further analysis. The completion rate for the health interview was 88.9%. The classification tree results show that if body mass index is higher than 25.72 kg/m² and age is above 51 years, the predicted probability of having ≥3 cardiovascular risk factors is 73.6%, in a group of 322 people. If body mass index is higher than 26.35 kg/m² and the geographical latitude of the village is lower than 24°22.8′, the predicted probability of having ≥4 cardiovascular risk factors is 60.8%, in a group of 74 people. The logistic regression results indicate that body mass index, drinking habit and menopause are the top three significant independent variables. The classification tree model specifically shows the discrimination paths and interactions between the risk groups, whereas the logistic regression model presents and analyses the statistically independent factors of cardiovascular risk. Applying both models to specific situations will provide a different angle for the design and management of future health intervention plans after community-based studies.
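
The two decision paths reported in this abstract can be read as if-then rules. The Python sketch below encodes only those two reported leaves; the function name is hypothetical, "higher than", "above" and "lower than" are assumed to be strict inequalities, and because the abstract does not report the tree's other branches, any case outside the reported paths returns `None` rather than a guessed probability.

```python
from typing import Optional

# 24 degrees 22.8 minutes expressed in decimal degrees (about 24.38).
LATITUDE_THRESHOLD = 24 + 22.8 / 60

def reported_leaf(bmi: float, age: float, latitude: float, outcome: str) -> Optional[float]:
    """Predicted probability from the reported tree path for `outcome`.

    outcome is ">=3" or ">=4" (number of cardiovascular risk factors).
    Returns None when the subject is not on the reported path, because the
    abstract does not give the probabilities for the other branches.
    """
    if outcome == ">=3":
        if bmi > 25.72 and age > 51:
            return 0.736  # reported leaf containing 322 people
    elif outcome == ">=4":
        if bmi > 26.35 and latitude < LATITUDE_THRESHOLD:
            return 0.608  # reported leaf containing 74 people
    else:
        raise ValueError(f"unknown outcome: {outcome!r}")
    return None

print(reported_leaf(bmi=27.0, age=60, latitude=24.5, outcome=">=3"))  # 0.736
print(reported_leaf(bmi=27.0, age=60, latitude=24.5, outcome=">=4"))  # None
```

Reading a tree as nested conditions like this is what makes its output easy to hand to clinicians, but it only reproduces leaves that were actually published.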

  7. Classification and regression tree analysis vs. multivariable linear and logistic regression methods as statistical tools for studying haemophilia.

    PubMed

    Henrard, S; Speybroeck, N; Hermans, C

    2015-11-01

    Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII (haemophilia A) or factor IX (haemophilia B). As in any other medical research domain, haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regression for binary outcomes and multiple linear regression for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to explain didactically how, why, and when to use classification and regression tree (CART) analysis in haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups according to a given criterion. Breiman and colleagues developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few studies using this methodology specifically in haemophilia have been published to date. Two previously published examples of CART analysis in this field are explained didactically in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, which facilitates medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
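
The repeated partitioning described above can be illustrated with Gini impurity, the split criterion usually associated with CART classification trees. This is a toy, stdlib-only sketch for a single numeric predictor, not Breiman's full algorithm: it omits surrogate splits and cost-complexity pruning, and the function names, depth limit and data are assumptions made for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Return (threshold, weighted impurity) of the best binary split x <= t."""
    best = (None, gini(ys))
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # midpoint between consecutive distinct values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

def grow(xs, ys, depth=0, max_depth=2, min_size=2):
    """Recursively partition until nodes are pure, too small, or max_depth is hit."""
    majority = Counter(ys).most_common(1)[0][0]
    if depth == max_depth or len(ys) < min_size or gini(ys) == 0.0:
        return majority
    t, _ = best_split(xs, ys)
    if t is None:  # no split reduces impurity
        return majority
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return {
        "threshold": t,
        "left": grow([x for x, _ in left], [y for _, y in left], depth + 1, max_depth, min_size),
        "right": grow([x for x, _ in right], [y for _, y in right], depth + 1, max_depth, min_size),
    }

def predict(tree, x):
    """Follow the splits down to a leaf label."""
    while isinstance(tree, dict):
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree

tree = grow([1, 2, 3, 10, 11, 12], list("aaabbb"))
print(tree)  # {'threshold': 6.5, 'left': 'a', 'right': 'b'}
```

For a regression tree the same search would typically minimise the within-node sum of squared errors instead of Gini impurity.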

  8. Industrial and occupational ergonomics in the petrochemical process industry: a regression trees approach.

    PubMed

    Bevilacqua, M; Ciarapica, F E; Giacchetta, G

    2008-07-01

    This work is an attempt to apply classification tree methods to data regarding accidents in a medium-sized refinery, so as to identify the important relationships between the variables, which can be considered as decision-making rules when adopting any measures for improvement. The results obtained using the CART (Classification And Regression Trees) method proved to be the most precise and, in general, they are encouraging concerning the use of tree diagrams as preliminary explorative techniques for the assessment of the ergonomic, management and operational parameters which influence high accident risk situations. The Occupational Injury analysis carried out in this paper was planned as a dynamic process and can be repeated systematically. The CART technique, which considers a very wide set of objective and predictive variables, shows new cause-effect correlations in occupational safety which had never been previously described, highlighting possible injury risk groups and supporting decision-making in these areas. The use of classification trees must not, however, be seen as an attempt to supplant other techniques, but as a complementary method which can be integrated into traditional types of analysis.

  9. Simulation of land use change in the Three Gorges Reservoir Area based on CART-CA

    NASA Astrophysics Data System (ADS)

    Yuan, Min

    2018-05-01

    This study proposes a new method for simulating complex spatiotemporal changes among multiple land uses, using a cellular automata (CA) model based on the classification and regression tree (CART) algorithm. In this model, the CART algorithm calculates land-class conversion probabilities, which are combined with a neighborhood factor and a random factor to extract the cellular transformation rules. In the land-use dynamic simulation of the Three Gorges Reservoir Area from 2000 to 2010, the overall Kappa coefficient is 0.8014 and the overall accuracy is 0.8821, and the simulation results are satisfactory.
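
Overall accuracy and the Kappa coefficient are both derived from a confusion matrix of observed versus simulated land classes. Below is a minimal sketch of the standard Cohen's kappa calculation, assuming rows hold observed classes and columns hold simulated classes; the 2x2 counts are invented for illustration and are not the study's data.

```python
def accuracy_and_kappa(matrix):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] = number of cells observed as class i and simulated as class j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    p_observed = sum(matrix[i][i] for i in range(k)) / n  # overall accuracy
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance if observed and simulated maps were independent.
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa

acc, kappa = accuracy_and_kappa([[40, 10], [5, 45]])
print(round(acc, 4), round(kappa, 4))  # 0.85 0.7
```

Kappa discounts the agreement a random map with the same class proportions would achieve, which is why it is lower than overall accuracy.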

  10. Improving medical diagnosis reliability using Boosted C5.0 decision tree empowered by Particle Swarm Optimization.

    PubMed

    Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin

    2015-08-01

    Improving the accuracy of supervised classification algorithms in biomedical applications is an active area of research. In this study, we improve the performance of a Particle Swarm Optimization (PSO) classifier combined with a C4.5 decision tree (PSO+C4.5) by using a Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of the proposed method, it is applied to one microarray dataset and five medical datasets obtained from the UCI machine learning repository. The results of the PSO + Boosted C5.0 implementation are compared with eight well-known benchmark classification methods (PSO+C4.5, support vector machine with a Radial Basis Function kernel, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest Neighbor). Repeated five-fold cross-validation was used to assess classifier performance. Experimental results show that the proposed method not only improves on PSO+C4.5 but also achieves higher classification accuracy than the other classification methods.
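
Repeated k-fold cross-validation reshuffles the sample before each repetition and uses every fold once as the test set. Below is a stdlib-only sketch of the index bookkeeping, assuming a user-supplied (hypothetical) `fit_and_score(train_idx, test_idx)` callback; it is not the authors' implementation, and it does not stratify folds by class.

```python
import random

def repeated_kfold_indices(n, k=5, repeats=3, seed=0):
    """Yield (repeat, train_idx, test_idx) for repeated k-fold CV over n samples."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random partition for every repetition
        folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
        for i, test in enumerate(folds):
            train = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield r, train, sorted(test)

def repeated_cv_score(fit_and_score, n, k=5, repeats=3, seed=0):
    """Average the score returned by fit_and_score(train_idx, test_idx) over all folds."""
    scores = [fit_and_score(tr, te) for _, tr, te in repeated_kfold_indices(n, k, repeats, seed)]
    return sum(scores) / len(scores)
```

Averaging over several reshuffles reduces the dependence of the reported accuracy on one particular fold assignment.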

  11. A retrospective analysis to identify the factors affecting infection in patients undergoing chemotherapy.

    PubMed

    Park, Ji Hyun; Kim, Hyeon-Young; Lee, Hanna; Yun, Eun Kyoung

    2015-12-01

    This study compares the performance of logistic regression and decision tree analysis for assessing the risk factors for infection in cancer patients undergoing chemotherapy. The subjects were 732 cancer patients who were receiving chemotherapy at K university hospital in Seoul, Korea. The data were collected between March 2011 and February 2013 and were processed for descriptive analysis, logistic regression and decision tree analysis using the IBM SPSS Statistics 19 and Modeler 15.1 programs. The most common risk factors for infection in cancer patients receiving chemotherapy were identified as alkylating agents, vinca alkaloids and underlying diabetes mellitus. The logistic regression had a sensitivity of 66.7% and a specificity of 88.9%, while the decision tree analysis had a sensitivity of 55.0% and a specificity of 89.0%. Overall classification accuracy was 88.0% for the logistic regression and 87.2% for the decision tree analysis. The logistic regression analysis showed higher sensitivity and classification accuracy. Therefore, logistic regression analysis is concluded to be the more effective and useful method for establishing an infection prediction model for patients undergoing chemotherapy. Copyright © 2015 Elsevier Ltd. All rights reserved.
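
Sensitivity and specificity are the proportions of actual positive and actual negative cases that a model classifies correctly; neither is a share of explained variation. Below is a minimal sketch computing them, together with overall accuracy, from a 2x2 confusion table; the counts in the usage line are hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and overall accuracy from a 2x2 confusion table."""
    sensitivity = tp / (tp + fn)  # share of actual infections that were flagged
    specificity = tn / (tn + fp)  # share of actual non-infections correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

print(diagnostic_metrics(tp=2, fn=1, tn=8, fp=1))
```

When infections are rare, overall accuracy is dominated by specificity, which is why two models can have similar accuracy but clearly different sensitivity.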

  12. Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036

    ERIC Educational Resources Information Center

    Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.

    2014-01-01

    This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the logistic regression model, which is more difficult to interpret. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…

  13. Combining logistic regression with classification and regression tree to predict quality of care in a home health nursing data set.

    PubMed

    Guo, Huey-Ming; Shyu, Yea-Ing Lotus; Chang, Her-Kun

    2006-01-01

    In this article, the authors provide an overview of a research method for predicting quality of care in a home health nursing data set. The results of this study can be visualized through classification and regression tree (CART) graphs. Analysing the home health nursing data set with a combination of logistic regression and CART was more effective and more informative, because the two techniques complement each other; the combined analysis showed that more patient characteristics were related to quality of care in home care. The results help home health nurses predict patient outcomes in case management. Improved prediction is needed so that interventions can be appropriately targeted to improve patient outcomes and quality of care.

  14. An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests

    ERIC Educational Resources Information Center

    Strobl, Carolin; Malley, James; Tutz, Gerhard

    2009-01-01

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…
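
Bagging fits each model to a bootstrap resample of the data and combines the models by majority vote. To keep the sketch short it uses one-split decision stumps on a single numeric predictor in place of full trees; the names, ensemble size and data are assumptions. Random forests add a further step, sampling a subset of predictors at each split, which a one-predictor toy cannot show.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Choose the threshold t (rule: x <= t -> left label) with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        left = Counter(y for x, y in zip(xs, ys) if x <= t)
        right = Counter(y for x, y in zip(xs, ys) if x > t)
        left_label = left.most_common(1)[0][0]
        right_label = right.most_common(1)[0][0] if right else left_label
        errors = sum(1 for x, y in zip(xs, ys)
                     if y != (left_label if x <= t else right_label))
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label

def bagging(xs, ys, n_models=25, seed=0):
    """Fit n_models stumps on bootstrap resamples and predict by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict

predict = bagging([1, 2, 3, 4, 5, 6, 7, 8], list("aaaabbbb"))
print(predict(2), predict(7))
```

Averaging over resamples is what smooths out the instability of single trees that later entries in this list also mention.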

  15. Distribution of cavity trees in midwestern old-growth and second-growth forests

    Treesearch

    Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R. Thompson; David R. Larsen

    2003-01-01

    We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...

  16. Distribution of cavity trees in midwestern old-growth and second-growth forests

    Treesearch

    Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R. Thompson III; David R. Larsen

    2003-01-01

    We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...

  17. The wisdom of the commons: ensemble tree classifiers for prostate cancer prognosis.

    PubMed

    Koziol, James A; Feng, Anne C; Jia, Zhenyu; Wang, Yipeng; Goodison, Seven; McClelland, Michael; Mercola, Dan

    2009-01-01

    Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors.

  18. Systematization method for distinguishing plastic groups by using NIR spectroscopy.

    PubMed

    Kaihara, Mikio; Satoh, Minami; Satoh, Minoru

    2007-07-01

    No systematic method for classifying polymers using near-infrared (NIR) spectra is yet available, which is why we have been searching for one. Because raw NIR spectra usually have few obvious peaks, the spectra were pretreated by taking the second derivative to obtain well-modulated spectra. After this pretreatment, we applied classification and regression trees (CART) to discriminate polymer species from their spectra. As a result, we obtained a relatively simple classification tree. Judging from the resulting splitting conditions and the classified polymers, we concluded that knowledge of chemical functional groups, estimated from the important wavelength regions, is not always applicable to this classification tree. However, we clarified the splitting rules for polymer species from the NIR spectral point of view.
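
The abstract does not say how the second derivative was computed. Below is a minimal sketch assuming the simplest option, a central finite difference on evenly spaced wavelengths; Savitzky-Golay derivative filters are a common alternative in NIR work. Absorption peaks become sharp minima in the second-derivative spectrum, which can make overlapping bands easier for a tree to split on.

```python
def second_derivative(absorbance, step):
    """Central-difference second derivative for evenly spaced samples.

    Returns len(absorbance) - 2 values, one per interior wavelength,
    where step is the (assumed constant) wavelength spacing.
    """
    return [
        (absorbance[i - 1] - 2 * absorbance[i] + absorbance[i + 1]) / step ** 2
        for i in range(1, len(absorbance) - 1)
    ]

print(second_derivative([0, 1, 4, 9, 16], step=1))  # [2.0, 2.0, 2.0]
```

Each second-derivative value at a chosen wavelength can then serve as one candidate splitting variable for the tree.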

  19. Assessing the Effectiveness of Statistical Classification Techniques in Predicting Future Employment of Participants in the Temporary Assistance for Needy Families Program

    ERIC Educational Resources Information Center

    Montoya, Isaac D.

    2008-01-01

    Three classification techniques (Chi-square Automatic Interaction Detection [CHAID], Classification and Regression Tree [CART], and discriminant analysis) were tested to determine their accuracy in predicting Temporary Assistance for Needy Families program recipients' future employment. Technique evaluation was based on proportion of correctly…
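
CHAID, one of the three techniques compared, selects splits using chi-square tests of independence between each categorical predictor and the outcome. Below is a stdlib sketch of Pearson's chi-square statistic for one contingency table; the counts are hypothetical, and CHAID's category-merging step and Bonferroni-adjusted p-values are omitted.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows = predictor categories, columns = employed / not employed.
print(pearson_chi_square([[10, 20], [20, 10]]))
```

A larger statistic for the same degrees of freedom indicates a stronger association, so the corresponding predictor is a better split candidate.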

  20. Modeling brook trout presence and absence from landscape variables using four different analytical methods

    USGS Publications Warehouse

    Steen, Paul J.; Passino-Reader, Dora R.; Wiley, Michael J.

    2006-01-01

    As a part of the Great Lakes Regional Aquatic Gap Analysis Project, we evaluated methodologies for modeling associations between fish species and habitat characteristics at a landscape scale. To do this, we created brook trout Salvelinus fontinalis presence and absence models based on four different techniques: multiple linear regression, logistic regression, neural networks, and classification trees. The models were tested in two ways: by application to an independent validation database and cross-validation using the training data, and by visual comparison of statewide distribution maps with historically recorded occurrences from the Michigan Fish Atlas. Although differences in the accuracy of our models were slight, the logistic regression model predicted with the least error, followed by multiple regression, then classification trees, then the neural networks. These models will provide natural resource managers a way to identify habitats requiring protection for the conservation of fish species.

  1. Presence of indicator plant species as a predictor of wetland vegetation integrity

    USGS Publications Warehouse

    Stapanian, Martin A.; Adams, Jean V.; Gara, Brian

    2013-01-01

    We fit regression and classification tree models to vegetation data collected from Ohio (USA) wetlands to determine (1) which species best predict Ohio vegetation index of biotic integrity (OVIBI) score and (2) which species best predict high-quality wetlands (OVIBI score >75). The simplest regression tree model predicted OVIBI score based on the occurrence of three plant species: skunk-cabbage (Symplocarpus foetidus), cinnamon fern (Osmunda cinnamomea), and swamp rose (Rosa palustris). The lowest OVIBI scores were best predicted by the absence of the selected plant species rather than by the presence of other species. The simplest classification tree model predicted high-quality wetlands based on the occurrence of two plant species: skunk-cabbage and marsh-fern (Thelypteris palustris). The overall misclassification rate from this tree was 13 %. Again, low-quality wetlands were better predicted than high-quality wetlands by the absence of selected species rather than the presence of other species using the classification tree model. Our results suggest that a species’ wetland status classification and coefficient of conservatism are of little use in predicting wetland quality. A simple, statistically derived species checklist such as the one created in this study could be used by field biologists to quickly and efficiently identify wetland sites likely to be regulated as high-quality, and requiring more intensive field assessments. Alternatively, it can be used for advanced determinations of low-quality wetlands. Agencies can save considerable money by screening wetlands for the presence/absence of such “indicator” species before issuing permits.

  2. The wisdom of the commons: ensemble tree classifiers for prostate cancer prognosis

    PubMed Central

    Koziol, James A.; Feng, Anne C.; Jia, Zhenyu; Wang, Yipeng; Goodison, Seven; McClelland, Michael; Mercola, Dan

    2009-01-01

    Motivation: Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. Results: Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors. Contact: dmercola@uci.edu PMID:18628288

  3. Which sociodemographic factors are important on smoking behaviour of high school students? The contribution of classification and regression tree methodology in a broad epidemiological survey.

    PubMed

    Ozge, C; Toros, F; Bayramkaya, E; Camdeviren, H; Sasmaz, T

    2006-08-01

    The purpose of this study is to evaluate the most important sociodemographic factors affecting the smoking status of high school students, using a broad randomised epidemiological survey. Using an in-class, self-administered questionnaire on sociodemographic variables and smoking behaviour, a representative sample of 3304 students in the preparatory, 9th, 10th, and 11th grades from 22 randomly selected schools in Mersin was evaluated, and discriminative factors were determined using appropriate statistics. In addition to binary logistic regression analysis, the study evaluated the combined effects of these factors using classification and regression tree methodology, as a new statistical method. The data showed that 38% of the students reported lifetime smoking and 16.9% reported current smoking, with a male predominance and prevalence increasing with age. Exposure to second-hand smoke was reported by 74.3% of students, with the father as the predominant source (56.6%). Factors significantly associated with current smoking in these age groups were larger household size, later birth rank, certain school types, low academic performance, greater second-hand smoke exposure, and stress (especially separation from a close friend or violence at home). Classification and regression tree methodology showed the importance of some neglected sociodemographic factors, with good classification capacity. It was concluded that smoking, closely related to sociocultural factors, was a common problem in this young population, generating an important academic and social burden in youth life, and that with more data on this behaviour and new statistical methods, effective coping strategies could be developed.

  4. Propensity score estimation: machine learning and classification methods as alternatives to logistic regression

    PubMed Central

    Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson

    2010-01-01

    Summary Objective Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this Review was to assess machine learning alternatives to logistic regression which may accomplish the same goals but with fewer assumptions or greater accuracy. Study Design and Setting We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. Results We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (CART), and meta-classifiers (in particular, boosting). Conclusion While the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and to a lesser extent decision trees (particularly CART) appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. PMID:20630332
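
With a classification tree, a subject's propensity score is commonly taken as the share of treated subjects in the terminal node (leaf) the subject falls into; those scores can then feed inverse-probability-of-treatment weights. Below is a minimal sketch assuming leaf assignments already come from some fitted tree; the leaf labels and data are hypothetical, and the review itself does not prescribe this code.

```python
from collections import defaultdict

def leaf_propensity_scores(leaves, treated):
    """Propensity score = proportion treated within each subject's leaf."""
    counts = defaultdict(lambda: [0, 0])  # leaf -> [n_treated, n_total]
    for leaf, t in zip(leaves, treated):
        counts[leaf][0] += t
        counts[leaf][1] += 1
    return [counts[leaf][0] / counts[leaf][1] for leaf in leaves]

def iptw_weights(scores, treated):
    """Inverse probability of treatment weights: 1/e for treated, 1/(1 - e) for controls."""
    return [1 / e if t else 1 / (1 - e) for e, t in zip(scores, treated)]

leaves = ["A", "A", "A", "A", "B", "B"]
treated = [1, 1, 1, 0, 0, 1]
scores = leaf_propensity_scores(leaves, treated)
print(scores)  # [0.75, 0.75, 0.75, 0.75, 0.5, 0.5]
print(iptw_weights(scores, treated))
```

Because a treated subject's leaf always contains at least one treated subject (and likewise for controls), the weights never divide by zero; a leaf containing only treated or only control subjects, however, yields scores of 1 or 0 and signals a lack of overlap between the groups.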

  5. A cross-sectional study for predicting tail biting risk in pig farms using classification and regression tree analysis.

    PubMed

    Scollo, Annalisa; Gottardo, Flaviana; Contiero, Barbara; Edwards, Sandra A

    2017-10-01

    Tail biting in pigs has been recognised as a behavioural, welfare and economic problem for decades, and requires appropriate but sometimes difficult on-farm interventions. The aim of the paper is to introduce Classification and Regression Tree (CRT) methodologies to develop a tool for preventing acute tail biting lesions in pigs on-farm. A sample of 60 commercial farms rearing heavy pigs was involved; an on-farm visit and an interview with the farmer collected data on general management, herd health, disease prevention, climate control, feeding and production traits. Results suggest a value for CRT analysis in managing the risk factors behind tail biting at a farm-specific level, showing 86.7% sensitivity for the Classification Tree and a correlation of 0.7 between observed and predicted prevalence of tail biting for the Regression Tree. CRT analysis identified five main variables (stocking density, ammonia levels, number of pigs per stockman, type of floor and timeliness of feed supply) as critical predictors of acute tail biting lesions, and their importance differed among farm subgroups. The model might have reliable and practical applications for supporting and implementing tail biting prevention interventions, especially for subgroups of pigs at higher risk, helping farmers and veterinarians to assess the risk on their own farm and to manage the predisposing variables in order to reduce acute tail biting lesions. Copyright © 2017 Elsevier B.V. All rights reserved.
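
A regression tree predicts, for each farm, the mean response of the training farms in the same leaf, and agreement can then be summarised with a Pearson correlation between observed and predicted values. Below is a minimal sketch assuming leaf memberships are already known from a fitted tree; the data are hypothetical, and an in-sample correlation like this one will usually be more optimistic than one computed on held-out farms.

```python
import math
from collections import defaultdict

def leaf_mean_predictions(leaves, y):
    """Predict each observation as the mean response of its leaf."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for leaf, value in zip(leaves, y):
        sums[leaf] += value
        counts[leaf] += 1
    return [sums[leaf] / counts[leaf] for leaf in leaves]

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

observed = [0.1, 0.3, 0.5, 0.7]  # hypothetical tail-biting prevalence per farm
predicted = leaf_mean_predictions(["L", "L", "R", "R"], observed)
print(round(pearson_r(observed, predicted), 3))
```

Because every farm in a leaf receives the same prediction, a shallow tree caps how high this correlation can go.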

  6. Mastectomy or breast conserving surgery? Factors affecting type of surgical treatment for breast cancer--a classification tree approach.

    PubMed

    Martin, Michael A; Meyricke, Ramona; O'Neill, Terry; Roberts, Steven

    2006-04-20

    A critical choice facing breast cancer patients is which surgical treatment--mastectomy or breast conserving surgery (BCS)--is most appropriate. Several studies have investigated factors that impact the type of surgery chosen, identifying features such as place of residence, age at diagnosis, tumor size, socio-economic and racial/ethnic elements as relevant. Such assessment of "propensity" is important in understanding issues such as a reported under-utilisation of BCS among women for whom such treatment was not contraindicated. Using Western Australian (WA) data, we further examine the factors associated with the type of surgical treatment for breast cancer using a classification tree approach. This approach deals naturally with complicated interactions between factors, and so allows flexible and interpretable models for treatment choice to be built that add to the current understanding of this complex decision process. Data was extracted from the WA Cancer Registry on women diagnosed with breast cancer in WA from 1990 to 2000. Subjects' treatment preferences were predicted from covariates using both classification trees and logistic regression. Tumor size was the primary determinant of patient choice, subjects with tumors smaller than 20 mm in diameter preferring BCS. For subjects with tumors greater than 20 mm in diameter factors such as patient age, nodal status, and tumor histology become relevant as predictors of patient choice. Classification trees perform as well as logistic regression for predicting patient choice, but are much easier to interpret for clinical use. The selected tree can inform clinicians' advice to patients.

  7. Online adaptive decision trees: pattern classification and function approximation.

    PubMed

    Basak, Jayanta

    2006-09-01

    Recently we have shown that decision trees can be trained in the online adaptive (OADT) mode (Basak, 2004), leading to better generalization. OADTs were limited in that they could handle only two-class classification tasks with a given structure. In this article, we provide an architecture based on OADT, ExOADT, which can handle multiclass classification tasks and can perform function approximation. ExOADT is structurally similar to OADT extended with a regression layer. We also show that ExOADT is capable not only of adapting the local decision hyperplanes in the nonterminal nodes but also has the potential to smoothly change the structure of the tree depending on the data samples. We provide learning rules for ExOADT based on steepest gradient descent. Experimentally, we demonstrate the effectiveness of ExOADT in pattern classification and function approximation tasks. Finally, we briefly discuss the relationship of ExOADT with other classification models.

  8. Which sociodemographic factors are important on smoking behaviour of high school students? The contribution of classification and regression tree methodology in a broad epidemiological survey

    PubMed Central

    Özge, C; Toros, F; Bayramkaya, E; Çamdeviren, H; Şaşmaz, T

    2006-01-01

    Background The purpose of this study is to evaluate the most important sociodemographic factors affecting the smoking status of high school students, using a broad randomised epidemiological survey. Methods Using an in-class, self-administered questionnaire on sociodemographic variables and smoking behaviour, a representative sample of 3304 students in the preparatory, 9th, 10th, and 11th grades from 22 randomly selected schools in Mersin was evaluated, and discriminative factors were determined using appropriate statistics. In addition to binary logistic regression analysis, the study evaluated the combined effects of these factors using classification and regression tree methodology, as a new statistical method. Results The data showed that 38% of the students reported lifetime smoking and 16.9% reported current smoking, with a male predominance and prevalence increasing with age. Exposure to second-hand smoke was reported by 74.3% of students, with the father as the predominant source (56.6%). Factors significantly associated with current smoking in these age groups were larger household size, later birth rank, certain school types, low academic performance, greater second-hand smoke exposure, and stress (especially separation from a close friend or violence at home). Classification and regression tree methodology showed the importance of some neglected sociodemographic factors, with good classification capacity. Conclusions It was concluded that smoking, closely related to sociocultural factors, was a common problem in this young population, generating an important academic and social burden in youth life, and that with more data on this behaviour and new statistical methods, effective coping strategies could be developed. PMID:16891446

  9. Predicting U.S. Army Reserve Unit Manning Using Market Demographics

    DTIC Science & Technology

    2015-06-01

    ...develops linear regression, classification tree, and logistic regression models to determine the ability of the location to support manning requirements... logistic regression model delivers predictive results that allow decision-makers to identify locations with a high probability of meeting unit... manning requirements. The recommendation of this thesis is that the USAR implement the logistic regression model.

  10. Classification and regression tree (CART) analyses of genomic signatures reveal sets of tetramers that discriminate temperature optima of archaea and bacteria

    PubMed Central

    Dyer, Betsey D.; Kahn, Michael J.; LeBlanc, Mark D.

    2008-01-01

    Classification and regression tree (CART) analysis was applied to genome-wide tetranucleotide frequencies (genomic signatures) of 195 archaea and bacteria. Although genomic signatures have typically been used to classify evolutionary divergence, in this study, convergent evolution was the focus. Temperature optima for most of the organisms examined could be distinguished by CART analyses of tetranucleotide frequencies. This suggests that pervasive (nonlinear) qualities of genomes may reflect certain environmental conditions (such as temperature) in which those genomes evolved. The predominant use of GAGA and AGGA as the discriminating tetramers in CART models suggests that purine-loading and codon biases of thermophiles may explain some of the results. PMID:19054742
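
A genomic signature here is the vector of frequencies of all 4^4 = 256 tetranucleotides in a genome; each frequency, such as those of GAGA and AGGA, can then be a candidate splitting variable for CART. Below is a minimal sketch assuming overlapping windows on a single strand and skipping windows that contain non-ACGT characters; the study may have counted or normalised differently (for example, by combining reverse complements).

```python
from itertools import product

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]  # all 256 words

def tetranucleotide_frequencies(sequence):
    """Relative frequency of each tetramer over overlapping windows of length 4."""
    seq = sequence.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:  # skips windows containing N or other symbols
            counts[word] += 1
            total += 1
    return {w: (c / total if total else 0.0) for w, c in counts.items()}

freqs = tetranucleotide_frequencies("GAGAGA")
print(len(freqs), freqs["GAGA"], freqs["AGAG"])
```

Each genome then becomes one row of 256 numeric predictors, with its temperature class as the outcome the tree tries to separate.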

  11. Identification of sexually abused female adolescents at risk for suicidal ideations: a classification and regression tree analysis.

    PubMed

    Brabant, Marie-Eve; Hébert, Martine; Chagnon, François

    2013-01-01

    This study explored the clinical profiles of 77 female adolescent survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideation. Analyses revealed that 64% of participants experienced suicidal ideation. Findings from classification and regression tree analysis indicated that depression, posttraumatic stress symptoms, and hopelessness discriminated the profiles of suicidal and nonsuicidal survivors. The elevated prevalence of suicidal ideation among adolescent survivors of sexual abuse underscores the importance of investigating its presence in sexual abuse survivors. However, suicidal ideation is not the sole variable that needs to be investigated; depression, hopelessness and posttraumatic stress symptoms are also related to suicidal ideation in survivors and could therefore guide interventions.

  12. Predicting tree species presence and basal area in Utah: A comparison of stochastic gradient boosting, generalized additive models, and tree-based methods

    Treesearch

    Gretchen G. Moisen; Elizabeth A. Freeman; Jock A. Blackard; Tracey S. Frescino; Niklaus E. Zimmermann; Thomas C. Edwards

    2006-01-01

    Many efforts are underway to produce broad-scale forest attribute maps by modelling forest class and structure variables collected in forest inventories as functions of satellite-based and biophysical information. Typically, variants of classification and regression trees implemented in Rulequest's© See5 and Cubist (for binary and continuous responses,...

  13. Random forests and stochastic gradient boosting for predicting tree canopy cover: Comparing tuning processes and model performance

    Treesearch

    E. Freeman; G. Moisen; J. Coulston; B. Wilson

    2014-01-01

    Random forests (RF) and stochastic gradient boosting (SGB), both involving an ensemble of classification and regression trees, are compared for modeling tree canopy cover for the 2011 National Land Cover Database (NLCD). The objectives of this study were twofold. First, sensitivity of RF and SGB to choices in tuning parameters was explored. Second, performance of the...

  14. Portable Language-Independent Adaptive Translation from OCR. Phase 1

    DTIC Science & Technology

    2009-04-01

    including brute-force k-Nearest Neighbors (kNN), fast approximate kNN using hashed k-d trees, classification and regression trees, and locality...achieved by refinements in ground-truthing protocols. Recent algorithmic improvements to our approximate kNN classifier using hashed k-d trees allow...recent years discriminative training has been shown to outperform phonetic HMMs estimated using ML for speech recognition. Standard ML estimation

  15. Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data

    PubMed Central

    Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.

    2016-01-01

    We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with other ensemble methods, BET requires far fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides a variable selection criterion and an interpretation for each subset. We developed an efficient estimation procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872

  16. Comparison of Sub-Pixel Classification Approaches for Crop-Specific Mapping

    EPA Science Inventory

    This paper examined two non-linear models, Multilayer Perceptron (MLP) regression and Regression Tree (RT), for estimating sub-pixel crop proportions using time-series MODIS-NDVI data. The sub-pixel proportions were estimated for three major crop types including corn, soybean, a...

  17. Analysis of occlusal variables, dental attrition, and age for distinguishing healthy controls from female patients with intracapsular temporomandibular disorders.

    PubMed

    Seligman, D A; Pullinger, A G

    2000-01-01

    Confusion about the relationship of occlusion to temporomandibular disorders (TMD) persists. This study attempted to identify occlusal and attrition factors plus age that would characterize asymptomatic normal female subjects. A total of 124 female patients with intracapsular TMD were compared with 47 asymptomatic female controls for associations to 9 occlusal factors, 3 attrition severity measures, and age using classification tree, multiple stepwise logistic regression, and univariate analyses. Models were tested for accuracy (sensitivity and specificity) and total contribution to the variance. The classification tree model had 4 terminal nodes that used only anterior attrition and age. "Normals" were mainly characterized by low attrition levels, whereas patients had higher attrition and tended to be younger. The tree model was only moderately useful (sensitivity 63%, specificity 94%) in predicting normals. The logistic regression model incorporated unilateral posterior crossbite and mediotrusive attrition severity in addition to the 2 factors in the tree, but was slightly less accurate than the tree (sensitivity 51%, specificity 90%). When only occlusal factors were considered in the analysis, normals were additionally characterized by a lack of anterior open bite, smaller overjet, and smaller RCP-ICP slides. The log likelihood accounted for was similar for both the tree (pseudo R² = 29.38%; mean deviance = 0.95) and the multiple logistic regression (Cox-Snell R² = 30.3%; mean deviance = 0.84) models. The occlusal and attrition factors studied were only moderately useful in differentiating normals from TMD patients.

  18. Beating the Odds: Trees to Success in Different Countries

    ERIC Educational Resources Information Center

    Finch, W. Holmes; Marchant, Gregory J.

    2017-01-01

    A recursive partitioning model approach in the form of classification and regression trees (CART) was used with 2012 PISA data for five countries (Canada, Finland, Germany, Singapore-China, and the United States). The objective of the study was to determine demographic and educational variables that differentiated between low-SES students that were…

  19. Geospatial relationships of tree species damage caused by Hurricane Katrina in south Mississippi

    Treesearch

    Mark W. Garrigues; Zhaofei Fan; David L. Evans; Scott D. Roberts; William H. Cooke III

    2012-01-01

    Hurricane Katrina generated substantial impacts on the forests and biological resources of the affected area in Mississippi. This study seeks to use classification tree analysis (CTA) to determine which variables are significant in predicting hurricane damage (shear or windthrow) in the Southeast Mississippi Institute for Forest Inventory District. Logistic regressions...

  20. A combined M5P tree and hazard-based duration model for predicting urban freeway traffic accident durations.

    PubMed

    Lin, Lei; Wang, Qian; Sadek, Adel W

    2016-06-01

    The duration of freeway traffic accidents is an important factor that affects traffic congestion, environmental pollution, and secondary accidents. Among previous studies, the M5P algorithm has been shown to be an effective tool for predicting incident duration. M5P builds a tree-based model, like the traditional classification and regression tree (CART) method, but with multiple linear regression models as its leaves. The problem with M5P for accident duration prediction, however, is that whereas linear regression assumes that accident durations are conditionally normally distributed, the distribution of a "time-to-an-event" is almost certainly asymmetric. A hazard-based duration model (HBDM) is a better choice for this kind of "time-to-event" modeling scenario, and HBDMs have accordingly been applied before to analyze and predict traffic accident duration. Previous research, however, has not yet applied HBDMs for accident duration prediction in association with clustering or classification of the dataset to minimize data heterogeneity. The current paper proposes a novel approach for accident duration prediction that improves on the original M5P tree algorithm by constructing an M5P-HBDM model, in which the leaves of the M5P tree are HBDMs instead of linear regression models. Such a model offers the advantage of minimizing data heterogeneity through dataset classification and avoids the incorrect assumption of normality for traffic accident durations. The proposed model was then tested on two freeway accident datasets. For each dataset, the first 500 records were used to train three models, (1) an M5P tree, (2) an HBDM, and (3) the proposed M5P-HBDM, and the remaining data were used for testing. The results show that the proposed M5P-HBDM identified more significant and meaningful variables than either M5P or HBDMs.
Moreover, the M5P-HBDM had the lowest overall mean absolute percentage error (MAPE). Copyright © 2016 Elsevier Ltd. All rights reserved.
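
    The difference between a CART leaf and an M5P-style leaf can be sketched in a few lines of Python: below, a single fixed split sends each observation to a leaf that holds its own least-squares line rather than a constant. This is a simplified illustration under stated assumptions: the split threshold is supplied by hand rather than searched, there is only one predictor, and M5P's pruning and smoothing steps, as well as the hazard-based leaves of the proposed M5P-HBDM, are not shown. The `mape` helper computes the error measure named in the abstract; all function names and data are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # flat line if x is constant within the leaf
    return my - b * mx, b

def fit_model_tree_stump(xs, ys, threshold):
    """One split at `threshold`; each leaf stores its own (intercept, slope)."""
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {
        "threshold": threshold,
        "left": fit_line(*zip(*left)),
        "right": fit_line(*zip(*right)),
    }

def predict(tree, x):
    """Route x to a leaf, then evaluate that leaf's regression line."""
    a, b = tree["left"] if x <= tree["threshold"] else tree["right"]
    return a + b * x

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * mean(abs(y - p) / abs(y) for y, p in zip(actual, predicted))

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2, 4, 6, 8, 15, 14, 13, 12]  # rising in one regime, falling in the other
tree = fit_model_tree_stump(xs, ys, threshold=4)
print(predict(tree, 2.5), predict(tree, 7.5))  # 5.0 12.5
```

    A constant-leaf regression tree with the same split would predict 5 and 13.5 (the leaf means) for every point in each region; the linear leaves let the prediction follow the trend inside each region.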

  1. Ensemble classification of individual Pinus crowns from multispectral satellite imagery and airborne LiDAR

    NASA Astrophysics Data System (ADS)

    Kukunda, Collins B.; Duque-Lazo, Joaquín; González-Ferreiro, Eduardo; Thaden, Hauke; Kleinn, Christoph

    2018-03-01

    Distinguishing tree species is relevant in many contexts of remote-sensing-assisted forest inventory. Accurate tree species maps support management and conservation planning, pest and disease control, and biomass estimation. This study evaluated the performance of ensemble techniques for automatically distinguishing Pinus sylvestris L. and Pinus uncinata Mill. ex Mirb. within a 1.3 km² mountainous area in Barcelonnette (France). Three modelling schemes were examined, based on (1) high-density LiDAR data (160 returns m⁻²), (2) Worldview-2 multispectral imagery, and (3) Worldview-2 and LiDAR in combination. Variables related to the crown structure and height of individual trees were extracted from the normalized LiDAR point cloud at the individual-tree level, after performing individual tree crown (ITC) delineation. Vegetation indices and Haralick texture indices were derived from the Worldview-2 images and served as independent spectral variables. The best predictor subset was selected after comparing three variable selection procedures: (1) Random Forests with cross validation (AUCRFcv), (2) the Akaike Information Criterion (AIC), and (3) the Bayesian Information Criterion (BIC). To classify the species, 9 regression techniques were combined using ensemble models. Predictions were evaluated using cross validation and an independent dataset. Integrating datasets and models improved individual tree species classification (True Skill Statistic, TSS, from 0.67 to 0.81) over individual techniques and maintained strong predictive power (Relative Operating Characteristic, ROC = 0.91). Combining regression models and integrating the datasets provided more reliable species distribution maps and associated tree-scale mapping uncertainties. Our study highlights the potential of model and data assemblage for improving the species classifications needed in present-day forest planning and management.
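
    The True Skill Statistic reported above combines the two error rates of a binary classifier as TSS = sensitivity + specificity - 1, so it ranges from -1 to 1, with 0 for a classifier no better than chance. A minimal Python sketch computing it from confusion-matrix counts follows; the counts in the example are invented for illustration and are not from the study.

```python
def true_skill_statistic(tp, fn, tn, fp):
    """TSS = sensitivity + specificity - 1, from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity + specificity - 1

# Hypothetical counts: 40 of 50 P. sylvestris and 45 of 50 P. uncinata crowns correct.
print(round(true_skill_statistic(tp=40, fn=10, tn=45, fp=5), 3))  # 0.7
```

    Unlike overall accuracy, TSS does not reward a model for simply predicting the more common species, which is why it is often preferred for species classification.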

  2. Using Classification and Regression Trees (CART) and random forests to analyze attrition: Results from two simulations.

    PubMed

    Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J

    2015-12-01

    In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. (c) 2015 APA, all rights reserved.

  3. Using Classification and Regression Trees (CART) and Random Forests to Analyze Attrition: Results From Two Simulations

    PubMed Central

    Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J.

    2016-01-01

    In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. PMID:26389526
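
    The weighting idea in the two Hayes et al. records above can be sketched as follows, assuming a tree has already assigned each participant to a terminal node: the retention rate within each node estimates the probability of remaining in the study, and each retained participant is weighted by the inverse of that rate, while dropouts receive weight 0. The function name and the toy data are hypothetical, and the authors' exact tree fitting, pruning, and weight construction are not reproduced here.

```python
from collections import defaultdict

def inverse_retention_weights(leaves, retained):
    """Weight retained cases by 1 / (retention rate in their terminal node); dropouts get 0."""
    total = defaultdict(int)
    kept = defaultdict(int)
    for leaf, stayed in zip(leaves, retained):
        total[leaf] += 1
        kept[leaf] += int(stayed)
    # Division only happens for retained cases, so kept[leaf] is at least 1.
    return [total[leaf] / kept[leaf] if stayed else 0.0
            for leaf, stayed in zip(leaves, retained)]

# Hypothetical node assignments from an already-fitted, pruned tree.
leaves = ["A", "A", "A", "A", "B", "B"]
retained = [True, True, False, False, True, True]
print(inverse_retention_weights(leaves, retained))  # [2.0, 2.0, 0.0, 0.0, 1.0, 1.0]
```

    The weights of the retained cases sum to the original sample size (6 here), which is what lets a weighted analysis of completers stand in for the full sample under the tree's selection model.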

  4. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran.

    PubMed

    Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali

    2016-01-01

    Groundwater is considered one of the most valuable freshwater resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence spring locations were considered: slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Groundwater spring potential was then modeled and mapped using the CART, RF, and BRT algorithms, and the predictions were validated using the receiver operating characteristic (ROC) curve. Of the 864 springs identified, 605 (≈70%) were used for spring potential mapping and the remaining 259 (≈30%) for model validation. The area under the curve (AUC) was 0.8103 for BRT, 0.7870 for CART, and 0.7119 for RF. BRT therefore gave the best predictions of spring locations, followed by CART and then RF. Geospatially integrated BRT, CART, and RF methods proved useful for generating a spring potential map (SPM) with reasonable accuracy.
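
    The AUC values compared above have a direct interpretation: the probability that a randomly chosen positive location (here, a spring) receives a higher model score than a randomly chosen negative location, with ties counted as one half. A minimal Python sketch of that pairwise (Mann-Whitney) definition follows; it runs in O(n·m) time and is meant for illustration only, and the scores are invented.

```python
def auc_pairwise(pos_scores, neg_scores):
    """Area under the ROC curve via Mann-Whitney pairwise comparisons."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical model scores at spring and non-spring locations.
print(auc_pairwise([0.9, 0.8], [0.7, 0.85]))  # 0.75
```

    An AUC of 0.5 corresponds to random ordering, so the reported 0.8103 for BRT means a spring outranks a non-spring location roughly 81% of the time.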

  5. Computer aided diagnosis system for the Alzheimer's disease based on partial least squares and random forest SPECT image classification.

    PubMed

    Ramírez, J; Górriz, J M; Segovia, F; Chaves, R; Salas-Gonzalez, D; López, M; Alvarez, I; Padilla, P

    2010-03-19

    This letter presents a computer-aided diagnosis (CAD) technique for the early detection of Alzheimer's disease (AD) by means of single photon emission computed tomography (SPECT) image classification. The proposed method is based on a partial least squares (PLS) regression model and a random forest (RF) predictor. The curse of dimensionality is addressed by downscaling the SPECT images and extracting score features using PLS. The RF predictor then forms an ensemble of classification and regression tree (CART)-like classifiers, whose output is determined by a majority vote of the trees in the forest. A baseline principal component analysis (PCA) system is also developed for reference. The experimental results show that the combined PLS-RF system yields a generalization error that converges to a limit as the number of trees in the forest increases. The generalization error is reduced when using PLS and depends on the strength of the individual trees in the forest and the correlation between them. Moreover, PLS feature extraction is found to be more effective than PCA at extracting discriminative information from the data, yielding peak sensitivity, specificity, and accuracy values of 100%, 92.7%, and 96.9%, respectively. The proposed CAD system also outperformed several other recently developed AD CAD systems. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
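
    The majority-vote step described above can be illustrated with a deliberately simplified ensemble: bagging (bootstrap resampling) of one-split trees, or stumps, on a single feature, combined by majority vote. This is a sketch under stated assumptions, not the authors' system: a full random forest also grows deeper trees and samples a random subset of features at each split, and the PLS feature extraction is omitted entirely. All names and data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """One-split tree on a single feature: (threshold, left_label, right_label)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        # If no sampled points lie above t, both sides predict the same label.
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    return best[1:]

def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Fit each stump on a bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps

def predict_majority(stumps, x):
    """Each stump casts one vote; the most common label wins."""
    votes = Counter(left if x <= t else right for t, left, right in stumps)
    return votes.most_common(1)[0][0]

xs = list(range(10))
ys = [0] * 5 + [1] * 5
stumps = fit_bagged_stumps(xs, ys)
print(predict_majority(stumps, -5), predict_majority(stumps, 15))
```

    Because each stump sees a different bootstrap sample, their thresholds vary; the vote smooths over individual stumps that happen to draw an unrepresentative sample.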

  6. Post-fire tree establishment patterns at the alpine treeline ecotone: Mount Rainier National Park, Washington, USA

    Treesearch

    Kirk M. Stueve; Dawna L. Cerney; Regina M. Rochefort; Laurie L. Kurth

    2009-01-01

    We performed classification analysis of 1970 satellite imagery and 2003 aerial photography to delineate establishment. Local site conditions were calculated from a LIDAR-based DEM, ancillary climate data, and 1970 tree locations in a GIS. We used logistic regression on a spatially weighted landscape matrix to rank variables.

  7. On the detection of pornographic digital images

    NASA Astrophysics Data System (ADS)

    Schettini, Raimondo; Brambilla, Carla; Cusano, Claudio; Ciocca, Gianluigi

    2003-06-01

    The paper addresses the problem of distinguishing between pornographic and non-pornographic photographs, for the design of semantic filters for the web. Both decision forests of trees built according to the CART (Classification And Regression Trees) methodology and Support Vector Machines (SVM) have been used to perform the classification. The photographs are described by a set of low-level features that can be computed automatically from the gray-level and color representations of the image. The database used in our experiments contained 1500 photographs, 750 of which were labeled as pornographic on the basis of the independent judgement of several viewers.

  8. Explaining Match Outcome During The Men’s Basketball Tournament at The Olympic Games

    PubMed Central

    Leicht, Anthony S.; Gómez, Miguel A.; Woods, Carl T.

    2017-01-01

    In preparation for the Olympics, coaches and athletes have limited opportunity to interact regularly, so team performance indicators can provide important guidance for enhancing match success at the elite level. This study examined the relationship between match outcome and team performance indicators during men’s basketball tournaments at the Olympic Games. Twelve team performance indicators were collated from all men’s teams and matches during the basketball tournaments of the 2004-2016 Olympic Games (n = 156). Linear and non-linear analyses, namely binary logistic regression and a conditional inference (CI) classification tree, examined the relationship between match outcome and team performance indicator characteristics. The most parsimonious logistic regression model retained ‘assists’, ‘defensive rebounds’, ‘field-goal percentage’, ‘fouls’, ‘fouls against’, ‘steals’ and ‘turnovers’ (delta AIC <0.01; Akaike weight = 0.28) with a classification accuracy of 85.5%. Conversely, the CI classification tree retained four performance indicators, with an average classification accuracy of 81.4%. However, it was the combination of ‘field-goal percentage’ and ‘defensive rebounds’ that provided the greatest probability of winning (93.2%). Match outcome during the men’s basketball tournaments at the Olympic Games was identified by a unique combination of performance indicators. Although average model accuracy was marginally higher for the logistic regression analysis, the CI classification tree offered greater practical utility for coaches through its resolution of non-linear phenomena to guide team success. Key points: (1) A unique combination of team performance indicators explained 93.2% of winning observations in men’s basketball at the Olympics. (2) Monitoring of these team performance indicators may provide coaches with the capability to devise multiple game plans or strategies to enhance their likelihood of winning. (3) Incorporation of machine learning techniques with team performance indicators may provide a valuable and strategic approach to explain patterns within multivariate datasets in sport science. PMID:29238245

  9. An evaluation of supervised classifiers for indirectly detecting salt-affected areas at irrigation scheme level

    NASA Astrophysics Data System (ADS)

    Muller, Sybrand Jacobus; van Niekerk, Adriaan

    2016-07-01

    Soil salinity often leads to reduced crop yield and quality and can render soils barren. Irrigated areas are particularly at risk due to intensive cultivation and secondary salinization caused by waterlogging. Regular monitoring of salt accumulation in irrigation schemes is needed to keep its negative effects under control. The dynamic spatial and temporal characteristics of remote sensing can provide a cost-effective solution for monitoring salt accumulation at irrigation scheme level. This study evaluated a range of pan-fused SPOT-5 derived features (spectral bands, vegetation indices, image textures, and image transformations) for classifying salt-affected areas in two distinctly different irrigation schemes in South Africa, namely Vaalharts and Breede River. The relationship between the input features and electrical conductivity measurements was investigated using regression modelling (stepwise linear regression, partial least squares regression, curve fit regression modelling) and supervised classification (maximum likelihood, nearest neighbour, decision tree analysis, support vector machine, and random forests). Classification and regression trees and random forest were used to select the most important features for differentiating salt-affected and unaffected areas. The results showed that the regression analyses produced weak models (R² < 0.4). Better results were achieved using the supervised classifiers, but the algorithms tended to overestimate salt-affected areas. A key finding was that none of the feature sets or classification algorithms stood out as superior for monitoring salt accumulation at irrigation scheme level. This was attributed to the large variations in the spectral responses of different crop types at different growing stages, coupled with their individual tolerances to saline conditions.

  10. A Pilot Test of Indicator Species to Assess Uniqueness of Oak-Dominated Ecoregions in Central Tennessee

    Treesearch

    W. Henry McNab; David L. Loftis; Callie J. Schweitzer; Raymond Sheffield

    2004-01-01

    We used tree indicator species occurring on 438 plots in the Plateau counties of Tennessee to test the uniqueness of four conterminous ecoregions. Multinomial logistic regression indicated that the presence of 14 tree species allowed classification of sample plots according to ecoregion with an average overall accuracy of 75 percent (range 45 to 94 percent). Additional...

  11. Classification and regression trees

    Treesearch

    G. G. Moisen

    2008-01-01

    Frequently, ecologists are interested in exploring ecological relationships, describing patterns and processes, or making spatial or temporal predictions. These purposes often can be addressed by modeling the relationship between some outcome or response and a set of features or explanatory variables.

  12. Spatial prediction of landslides using a hybrid machine learning approach based on Random Subspace and Classification and Regression Trees

    NASA Astrophysics Data System (ADS)

    Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu

    2018-02-01

    A hybrid machine learning approach combining Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model, named RSSCART, for spatial prediction of landslides. The model combines the RSS method, known as an efficient ensemble technique, with CART, a state-of-the-art classifier. The Luc Yen district of Yen Bai province, a prominent landslide-prone area of Viet Nam, was selected for model development. Performance of the RSSCART model was evaluated using the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models, namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In developing the model, ten important landslide-affecting factors related to geomorphology, geology, and geo-environment were considered, namely slope angles, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. The RSSCART model performed best (AUC = 0.841) compared with the other popular landslide models: SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that RSSCART is a promising method for spatial landslide prediction.
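
    Random Subspace differs from bagging in what gets resampled: each base learner sees all training rows but only a random subset of the features. A minimal Python sketch follows, using one-split trees in place of full CART models and majority voting to combine them (both simplifications relative to RSSCART); the subspace size, number of learners, and data are illustrative assumptions.

```python
import random
from collections import Counter

def fit_stump(rows, labels, features):
    """Best single split (feature, threshold, left_label, right_label) over `features`."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]

def fit_random_subspace(rows, labels, n_learners=15, subspace_size=2, seed=0):
    """Each learner uses all rows but a random feature subset (drawn without replacement)."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    learners = []
    for _ in range(n_learners):
        features = rng.sample(range(n_features), subspace_size)
        learners.append(fit_stump(rows, labels, features))
    return learners

def predict(learners, row):
    """Majority vote over the learners' predictions."""
    votes = Counter(left if row[f] <= t else right for f, t, left, right in learners)
    return votes.most_common(1)[0][0]

# Features 0 and 1 are informative; feature 2 is a constant, uninformative column.
rows = [[i, 2 * i, 7] for i in range(10)]
labels = [0] * 5 + [1] * 5
learners = fit_random_subspace(rows, labels)
print(predict(learners, [-5, -10, 7]), predict(learners, [15, 30, 7]))  # 0 1
```

    With three features and subspaces of size two, every learner sees at least one informative feature here; in real data, the subspace size trades off diversity among learners against the chance that a learner misses the useful predictors.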

  13. CART (Classification and Regression Trees) Program: The Implementation of the CART Program and Its Application to Estimating Attrition Rates.

    DTIC Science & Technology

    1985-12-01

    consists of the node t and all descendants of t in T. (3) Definition 3. Pruning a branch Tt from a tree T consists of deleting from T all...The default is 1.0 so that actually, this keyword did not need to appear in the above file. (5) DELETE. This keyword does not appear in our example, but...when it is used associated with some variable names, it indicates that we want to delete these variables from the regression. If this keyword is

  14. Stratification of the severity of critically ill patients with classification trees

    PubMed Central

    2009-01-01

    Background Development of three classification trees (CTs) based on the CART (Classification and Regression Trees), CHAID (Chi-Square Automatic Interaction Detection), and C4.5 methodologies for calculating the probability of hospital mortality; comparison of the results with the APACHE II, SAPS II, and MPM II-24 scores, and with a model based on multiple logistic regression (LR). Methods Retrospective study of 2864 patients. Random partition (70:30) into a Development Set (DS), n = 1808, and a Validation Set (VS), n = 808. Discrimination is compared using the area under the ROC curve (AUC, 95% CI) and the percent of correct classification (PCC, 95% CI); calibration is compared using the calibration curve and the Standardized Mortality Ratio (SMR, 95% CI). Results The CTs are produced with different selections of variables and decision rules: CART (5 variables and 8 decision rules), CHAID (7 variables and 15 rules), and C4.5 (6 variables and 10 rules). The common variables were inotropic therapy, Glasgow, age, (A-a)O2 gradient, and antecedent of chronic illness. In the VS, all the models achieved acceptable discrimination, with AUC above 0.7. CT AUC: CART 0.75 (0.71-0.81), CHAID 0.76 (0.72-0.79), and C4.5 0.76 (0.73-0.80). PCC: CART 72 (69-75), CHAID 72 (69-75), and C4.5 76 (73-79). Calibration (SMR) was better in the CTs: CART 1.04 (0.95-1.31), CHAID 1.06 (0.97-1.15), and C4.5 1.08 (0.98-1.16). Conclusion Different CT methodologies generate trees with different selections of variables and decision rules. The CTs are easy to interpret, and they stratify the risk of hospital mortality. CTs should be taken into account for classifying the prognosis of critically ill patients. PMID:20003229
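
    The standardized mortality ratio used for calibration above is observed deaths divided by the deaths a model expects, where the expected count is the sum of the predicted probabilities of death; an SMR near 1 indicates good overall calibration. The PCC is the percentage of patients whose predicted class matches the outcome. A minimal Python sketch of both follows; the 0.5 cutoff used for PCC is an assumption (the abstract does not state one), and the outcomes and probabilities are invented.

```python
def standardized_mortality_ratio(died, predicted_prob):
    """Observed deaths / expected deaths (sum of predicted probabilities)."""
    return sum(died) / sum(predicted_prob)

def percent_correct(died, predicted_prob, cutoff=0.5):
    """Percent of patients whose thresholded prediction matches the outcome."""
    hits = sum((p >= cutoff) == bool(d) for d, p in zip(died, predicted_prob))
    return 100 * hits / len(died)

died = [1, 0, 0, 1, 0]                # hypothetical outcomes (1 = hospital death)
prob = [0.75, 0.25, 0.5, 0.25, 0.25]  # hypothetical predicted probabilities
print(standardized_mortality_ratio(died, prob))  # 1.0
print(percent_correct(died, prob))  # 60.0
```

    The example shows why the two measures are reported separately: the model is perfectly calibrated in aggregate (SMR = 1.0) while still misclassifying two of five patients.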

  15. Foot and hip contributions to high frontal plane knee projection angle in athletes: a classification and regression tree approach.

    PubMed

    Bittencourt, Natalia F N; Ocarino, Juliana M; Mendonça, Luciana D M; Hewett, Timothy E; Fonseca, Sergio T

    2012-12-01

    Cross-sectional. To investigate predictors of increased frontal plane knee projection angle (FPKPA) in athletes. The underlying mechanisms that lead to increased FPKPA are likely multifactorial and depend on how the musculoskeletal system adapts to the possible interactions between its distal and proximal segments. Bivariate and linear analyses traditionally employed to analyze the occurrence of increased FPKPA are not sufficiently robust to capture complex relationships among predictors. The investigation of nonlinear interactions among biomechanical factors is necessary to further our understanding of the interdependence of lower-limb segments and resultant dynamic knee alignment. The FPKPA was assessed in 101 athletes during a single-leg squat and in 72 athletes at the moment of landing from a jump. The investigated predictors were sex, hip abductor isometric torque, passive range of motion (ROM) of hip internal rotation (IR), and shank-forefoot alignment. Classification and regression trees were used to investigate nonlinear interactions among predictors and their influence on the occurrence of increased FPKPA. During single-leg squatting, the occurrence of high FPKPA was predicted by the interaction between hip abductor isometric torque and passive hip IR ROM. At the moment of landing, the shank-forefoot alignment, abductor isometric torque, and passive hip IR ROM were predictors of high FPKPA. In addition, the classification and regression trees established cutoff points that could be used in clinical practice to identify athletes who are at potential risk for excessive FPKPA. The models captured nonlinear interactions between hip abductor isometric torque, passive hip IR ROM, and shank-forefoot alignment.
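
    The cutoff points mentioned above come from CART's split search: for a continuous predictor, candidate thresholds are tried (here, midpoints between adjacent observed values) and the one giving the lowest size-weighted Gini impurity in the two child nodes is kept. A minimal Python sketch for a binary outcome and one predictor follows; the torque values and labels are invented, and real CART software adds stopping rules, surrogate splits, and cost-complexity pruning that are not shown.

```python
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_cutoff(xs, ys):
    """Threshold on xs minimizing the size-weighted Gini impurity of the two children."""
    values = sorted(set(xs))
    n = len(xs)
    best_t, best_impurity = None, float("inf")
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # midpoint between adjacent observed values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_t, best_impurity = t, impurity
    return best_t, best_impurity

# Hypothetical hip abductor torque values (arbitrary units) and high-FPKPA labels.
torque = [10, 12, 14, 18, 20, 22]
high_fpkpa = [1, 1, 1, 0, 0, 0]
print(best_cutoff(torque, high_fpkpa))  # (16.0, 0.0)
```

    A clinically usable rule then reads directly off the split: in this toy data, torque at or below 16 places an athlete in the high-FPKPA node.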

  16. A Comparison of Logistic Regression, Neural Networks, and Classification Trees Predicting Success of Actuarial Students

    ERIC Educational Resources Information Center

    Schumacher, Phyllis; Olinsky, Alan; Quinn, John; Smith, Richard

    2010-01-01

    The authors extended previous research by 2 of the authors who conducted a study designed to predict the successful completion of students enrolled in an actuarial program. They used logistic regression to determine the probability of an actuarial student graduating in the major or dropping out. They compared the results of this study with those…

  17. Predicting surface fuel models and fuel metrics using lidar and CIR imagery in a dense mixed conifer forest

    Treesearch

    Marek K. Jakubowski; Qinghua Guo; Brandon Collins; Scott Stephens; Maggi Kelly

    2013-01-01

    We compared the ability of several classification and regression algorithms to predict forest stand structure metrics and standard surface fuel models. Our study area spans a dense, topographically complex Sierra Nevada mixed-conifer forest. We used clustering, regression trees, and support vector machine algorithms to analyze high density (average 9 pulses/m

  18. Predicting 30-day Hospital Readmission with Publicly Available Administrative Database. A Conditional Logistic Regression Modeling Approach.

    PubMed

    Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P

    2015-01-01

    This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of readmission. However, current logistic-regression-based risk prediction models have limited predictive power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. We explore the use of conditional logistic regression to increase prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset from California, which includes patient demographics and clinical and care utilization data. We extracted records of heart failure Medicare beneficiaries who had an inpatient stay during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and a decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved classification accuracy by nearly 20% over the straightforward logistic regression model.
Furthermore, the developed CLR models tend to achieve better sensitivity of more than 10% over the standard classification models, which can be translated to correct labeling of additional 400 - 500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictor identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise the awareness of collecting data on additional markers and developing necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
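    The abstract says only that the data imbalance was corrected with under-sampling. A minimal sketch of one common variant, simple random under-sampling of the majority class down to the size of the minority class, might look like this; the record layout, the hypothetical `readmitted` field and the toy cohort are assumptions, and the authors' exact procedure may differ.

```python
import random
from collections import defaultdict


def undersample(records, label_key, seed=0):
    """Simple random under-sampling: keep every class at the size of the
    smallest class by randomly discarding records from larger classes."""
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[label_key]].append(rec)
    n_min = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy cohort: 2 readmissions versus 8 non-readmissions.
cohort = [{"id": i, "readmitted": i < 2} for i in range(10)]
balanced = undersample(cohort, "readmitted")
print(len(balanced), sum(r["readmitted"] for r in balanced))  # 4 2
```

    Because under-sampling discards data, it is usually applied only to the training portion of each cross-validation split, so that performance is still measured on data with the original class balance.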

  19. Extensions and applications of ensemble-of-trees methods in machine learning

    NASA Astrophysics Data System (ADS)

    Bleich, Justin

    Ensemble-of-trees algorithms have come to the forefront of machine learning because of their ability to achieve high forecasting accuracy across a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) rely on an underlying Bayesian probability model to generate the fits. These new probability-model-based approaches show much promise relative to their algorithmic counterparts, but also leave substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques, with an emphasis on the more recent Bayesian approaches. In particular, we extend BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART, as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First, we consider the performance of RF and SGB more broadly and demonstrate their superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge, taking asymmetric costs into account. Finally, we explore the construction of stable decision trees for forecasts of violence during probation hearings in court systems.
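    The abstract does not say how asymmetric error costs were built into RF and SGB. One standard way to do this with any classifier that outputs probabilities, not necessarily the thesis's method, is to move the decision threshold to the point where both decisions have equal expected cost. A minimal sketch with hypothetical cost values:

```python
def cost_threshold(cost_fp, cost_fn):
    """Probability above which predicting 'positive' has the lower expected cost.

    Predicting positive costs (1 - p) * cost_fp in expectation, predicting
    negative costs p * cost_fn; the two are equal at p = cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def classify(probabilities, cost_fp, cost_fn):
    """Label each case positive when its predicted probability exceeds the cost threshold."""
    t = cost_threshold(cost_fp, cost_fn)
    return [p > t for p in probabilities]


# Hypothetical 10:1 cost ratio: a missed positive (false negative) costs ten
# times as much as a false alarm, so the threshold drops from 0.5 to 1/11.
print(classify([0.05, 0.12, 0.60], cost_fp=1.0, cost_fn=10.0))  # [False, True, True]
```

    Lowering the threshold trades more false positives for fewer false negatives, which is the intended behavior when false negatives are the more expensive error.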

  20. Assessment of wastewater treatment facility compliance with decreasing ammonia discharge limits using a regression tree model.

    PubMed

    Suchetana, Bihu; Rajagopalan, Balaji; Silverstein, JoAnn

    2017-11-15

    A regression tree-based diagnostic approach is developed to evaluate factors affecting US wastewater treatment plant compliance with ammonia discharge permit limits using Discharge Monthly Report (DMR) data from a sample of 106 municipal treatment plants for the period 2004-2008. Predictor variables used to fit the regression tree are selected using random forests and consist of the previous month's effluent ammonia, influent flow rates and plant capacity utilization. The tree models are first used to evaluate compliance with existing ammonia discharge standards at each facility and are then applied assuming more stringent discharge limits, which are under consideration in many states. The model predicts that the ability to meet both current and future limits depends primarily on the previous month's treatment performance. Under more stringent discharge limits, the predicted ammonia concentration relative to the discharge limit increases. In-sample validation shows that the regression trees can provide a median classification accuracy of >70%. The regression tree model is validated using ammonia discharge data from an operating wastewater treatment plant and accurately predicts the observed ammonia discharge category approximately 80% of the time, indicating that the regression tree model can be applied to predict compliance for individual treatment plants, providing practical guidance for utilities and regulators with an interest in controlling ammonia discharges. The proposed methodology is also used to demonstrate how to delineate reliable sources of demand and supply in a point source-to-point source nutrient credit trading scheme, as well as how planners and decision makers can set reasonable discharge limits in the future. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. Classification and regression tree analysis of acute-on-chronic hepatitis B liver failure: Seeing the forest for the trees.

    PubMed

    Shi, K-Q; Zhou, Y-Y; Yan, H-D; Li, H; Wu, F-L; Xie, Y-Y; Braddock, M; Lin, X-Y; Zheng, M-H

    2017-02-01

    At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model using classification and regression tree (CART) analysis. A total of 1047 patients with suspected ACHBLF from two separate medical centres were screened; the patients from the two centres formed the derivation cohort and the validation cohort, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was assessed using the area under the receiver operating characteristic curve (AUC), which was compared with that of the model for end-stage liver disease (MELD) score and of a new logistic regression model. CART analysis identified four prognostic factors for ACHBLF (total bilirubin, age, serum sodium and INR) and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed by multivariate logistic regression analysis with four independent factors: age, total bilirubin, serum sodium and prothrombin activity. The performance of the CART model (AUC 0.896) was similar to that of the logistic regression model (0.914, P=.382) and exceeded that of the MELD score (0.667, P<.001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model that is superior to MELD for predicting three-month mortality in patients with ACHBLF. The CART model could thus facilitate medical decision-making and provide clinicians with a validated, practical bedside tool for ACHBLF risk stratification. © 2016 John Wiley & Sons Ltd.
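    The model comparison above rests on the area under the ROC curve. A minimal sketch of how that area can be computed directly from predicted risks, using its rank interpretation (the probability that a randomly chosen patient who died was assigned a higher risk than a randomly chosen survivor), is shown below. Because a CART model assigns only a few distinct risk values (one per terminal node), ties are common, so the sketch counts them as one half. The scores and outcomes are made up.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case.
    Ties count as one half. O(n_pos * n_neg), adequate for small cohorts."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks and observed 3-month deaths (1 = died).
risk = [0.90, 0.80, 0.35, 0.30, 0.10]
died = [1, 0, 1, 0, 0]
print(auc(risk, died))  # 5/6, about 0.833
```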

  2. Harmonic regression of Landsat time series for modeling attributes from national forest inventory data

    NASA Astrophysics Data System (ADS)

    Wilson, Barry T.; Knight, Joseph F.; McRoberts, Ronald E.

    2018-03-01

    Imagery from the Landsat Program has frequently been used as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 under the USGS Landsat Data Policy, new approaches are required for deriving such auxiliary data from dense Landsat time series. Several methods have previously been developed for use with imagery of finer temporal resolution (e.g. AVHRR and MODIS), including image compositing and harmonic regression using Fourier series. The manuscript presents a study that used Minnesota, USA, as the study area and the years 2009-2013 as the timeframe. The study examined the relative predictive power of land cover models, in particular those related to tree cover, using predictor variables based solely on composite imagery versus those using estimated harmonic regression coefficients. The study used two common non-parametric modeling approaches (i.e. k-nearest neighbors and random forests) to fit classification and regression models of multiple attributes measured on USFS Forest Inventory and Analysis plots, using all available Landsat imagery for the study area and timeframe. The estimated Fourier coefficients developed by harmonic regression of tasseled cap transformation time series data were shown to be correlated with land cover, including tree cover. Regression models using estimated Fourier coefficients as predictor variables showed a two- to threefold increase in explained variance for a small set of continuous response variables, relative to comparable models using monthly image composites. Similarly, the overall accuracies of classification models using the estimated Fourier coefficients were approximately 10-20 percentage points higher than those of the models using the image composites, with corresponding individual class accuracies between six and 45 percentage points higher.
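    Harmonic regression fits a short Fourier series to each pixel's time series, and the fitted coefficients then serve as predictor variables. The abstract does not give the number of harmonics or the fitting details, so the minimal sketch below assumes a single annual harmonic (period 365.25 days), ordinary least squares via the normal equations, and synthetic, irregularly spaced observations of one index band.

```python
import math


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def harmonic_fit(t, y, period=365.25, n_harmonics=1):
    """Ordinary least-squares fit of
        y(t) = c0 + sum_k [a_k cos(2*pi*k*t/period) + b_k sin(2*pi*k*t/period)]
    via the normal equations. Returns [c0, a_1, b_1, ..., a_K, b_K]."""
    rows = []
    for ti in t:
        row = [1.0]
        for k in range(1, n_harmonics + 1):
            w = 2.0 * math.pi * k * ti / period
            row += [math.cos(w), math.sin(w)]
        rows.append(row)
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Synthetic, irregularly spaced observations (day of year) of one index band.
days = [5, 40, 90, 130, 170, 200, 240, 280, 320, 360]
obs = [0.3 + 0.2 * math.cos(2 * math.pi * d / 365.25)
       - 0.1 * math.sin(2 * math.pi * d / 365.25) for d in days]
coef = harmonic_fit(days, obs)
print([round(c, 6) for c in coef])  # [0.3, 0.2, -0.1]
```

    Irregular sampling (for example, cloud-masked scenes) poses no problem for least squares. For more harmonics or many pixels, a QR-based least-squares solver would be numerically safer than the normal equations used here for brevity.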

  3. BAYESIAN METHODS FOR REGIONAL-SCALE EUTROPHICATION MODELS. (R830887)

    EPA Science Inventory

    We demonstrate a Bayesian classification and regression tree (CART) approach to link multiple environmental stressors to biological responses and quantify uncertainty in model predictions. Such an approach can: (1) report prediction uncertainty, (2) be consistent with the amou...

  4. Evaluating multimedia chemical persistence: Classification and regression tree analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bennett, D.H.; McKone, T.E.; Kastenberg, W.E.

    2000-04-01

    For the thousands of chemicals continuously released into the environment, it is desirable to make prospective assessments of those likely to be persistent. Widely distributed persistent chemicals are impossible to remove from the environment and remediation by natural processes may take decades, which is problematic if adverse health or ecological effects are discovered after prolonged release into the environment. A tiered approach using a classification scheme and a multimedia model for determining persistence is presented. Using specific criteria for persistence, a classification tree is developed to classify a chemical as persistent or nonpersistent based on the chemical properties. In this approach, the classification is derived from the results of a standardized unit world multimedia model. Thus, the classifications are more robust for multimedia pollutants than classifications using a single medium half-life. The method can be readily implemented and provides insight without requiring extensive and often unavailable data. This method can be used to classify chemicals when only a few properties are known and can be used to direct further data collection. Case studies are presented to demonstrate the advantages of the approach.
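    The abstract does not list the splitting rule or the specific chemical properties used. As a minimal illustration of how a classification tree picks a split for a persistent/nonpersistent label, the sketch below uses the Gini impurity criterion from CART on a single hypothetical property; the property values and labels are invented.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k**2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_gini_split(x, y):
    """Return (threshold, impurity) for the split x <= threshold that minimizes
    the size-weighted Gini impurity of the two child nodes."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal values cannot be separated by a threshold
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2.0, score)
    return best


# Hypothetical chemicals: one property value each and a persistence label
# (in the tiered approach, labels would come from multimedia model results).
prop = [0.5, 1.0, 1.5, 2.5, 3.0, 3.5]
label = ["non", "non", "non", "persistent", "persistent", "persistent"]
print(best_gini_split(prop, label))  # (2.0, 0.0)
```

    A full tree repeats this search over every property and recurses into each child node until a stopping rule (minimum node size, depth, or pruning) applies.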

  5. Modelling spruce bark beetle infestation probability

    Treesearch

    Paulius Zolubas; Jose Negron; A. Steven Munson

    2009-01-01

    A spruce bark beetle (Ips typographus L.) risk model, based on the characteristics of pure Norway spruce (Picea abies Karst.) stands in experimental and control plots, was developed using the classification and regression tree statistical technique under endemic pest population density. The most significant variable in spruce bark beetle...

  6. CART DIAGNOSIS OF WATERSHED IMPAIRMENT IN THE MID-ATLANTIC REGION

    EPA Science Inventory

    Many factors (stressors) can lead to increased concentrations of nutrients and sediments, and these factors change across watersheds. Classification and Regression Tree (CART) is a statistical approach that can be used to "diagnose" which factors are important stressors on a pe...

  7. Perceived Organizational Support for Enhancing Welfare at Work: A Regression Tree Model

    PubMed Central

    Giorgi, Gabriele; Dubin, David; Perez, Javier Fiz

    2016-01-01

    When examining outcomes such as welfare and well-being, research tends to focus on main effects and to consider only a limited number of variables at a time. A number of techniques may help address this problem. For example, many statistical packages available in R provide easy-to-use methods for complex analyses such as classification and regression trees (i.e., recursive partitioning). The present research illustrates the value of recursive partitioning in the prediction of perceived organizational support in a sample of more than 6000 Italian bankers. Using the tree function of the party package in R, we estimated a regression tree model predicting perceived organizational support from a multitude of job characteristics, including job demand, lack of job control, lack of supervisor support, training, etc. The resulting model appears particularly helpful in revealing several interactions in the prediction of perceived organizational support. In particular, training is the dominant factor. Another dimension that seems to influence organizational support is reporting (perceived communication about safety and stress concerns). Results are discussed from theoretical and methodological points of view. PMID:28082924
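    The party package fits trees by conditional inference, selecting splits with permutation tests. The minimal sketch below instead illustrates the simpler classic CART regression criterion (choose the split that minimizes the within-node sum of squared errors), which conveys the same recursive-partitioning idea. The predictor, response and data values are hypothetical and not from the study.

```python
def sse(values):
    """Sum of squared deviations from the mean (node impurity for regression)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y, min_leaf=1):
    """Threshold on x that minimizes the summed SSE of y in the two children.
    Returns None when no admissible split lowers the parent's SSE."""
    pairs = sorted(zip(x, y))
    best_thr, best_sse = None, sse(y)
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated by a threshold
        total = sse([v for _, v in pairs[:i]]) + sse([v for _, v in pairs[i:]])
        if total < best_sse:
            best_thr, best_sse = (pairs[i - 1][0] + pairs[i][0]) / 2.0, total
    return best_thr


def grow(x, y, depth=0, max_depth=2, min_leaf=2):
    """Recursive binary partitioning; every leaf predicts the mean of its y values."""
    leaf = {"leaf": sum(y) / len(y)}
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return leaf
    thr = best_split(x, y, min_leaf)
    if thr is None:
        return leaf
    left = [(a, b) for a, b in zip(x, y) if a <= thr]
    right = [(a, b) for a, b in zip(x, y) if a > thr]
    return {
        "thr": thr,
        "left": grow([a for a, _ in left], [b for _, b in left], depth + 1, max_depth, min_leaf),
        "right": grow([a for a, _ in right], [b for _, b in right], depth + 1, max_depth, min_leaf),
    }


def predict(node, value):
    """Follow the splits down to a leaf and return its mean."""
    while "leaf" not in node:
        node = node["left"] if value <= node["thr"] else node["right"]
    return node["leaf"]


# Hypothetical data: training hours per year versus a perceived-support score.
hours = [0, 1, 2, 3, 10, 11, 12, 13]
support = [2.0, 2.0, 2.2, 2.2, 4.0, 4.0, 4.4, 4.4]
tree = grow(hours, support)
print(tree["thr"], predict(tree, 0.5), predict(tree, 12))  # 6.5 2.0 4.4
```

    With several predictors, each node would search all of them and keep the best split; interactions appear naturally when a variable matters only inside one branch.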

  8. Prediction of strontium bromide laser efficiency using cluster and decision tree analysis

    NASA Astrophysics Data System (ADS)

    Iliev, Iliycho; Gocheva-Ilieva, Snezhana; Kulin, Chavdar

    2018-01-01

    The subject of investigation is a new high-power strontium bromide (SrBr2) vapor laser emitting in a multiline region of wavelengths. The laser is an alternative to atomic strontium lasers and free-electron lasers, especially at the 6.45 μm line, which is used in surgery for processing biological tissues and bones with minimal damage. In this paper, experimental data from measurements of the operational and output characteristics of the laser are statistically processed by means of cluster analysis and tree-based regression techniques. The aim is to extract the most important relationships and dependences in the available data that influence the increase of the overall laser efficiency. A set of cluster models is constructed and analyzed. Using different cluster methods, it is shown that the seven investigated operational characteristics (laser tube diameter, length, supplied electrical power, and others) and laser efficiency are combined in 2 clusters. The regression tree models built with the Classification and Regression Trees (CART) technique yield dependences that predict the values of efficiency, and especially the maximum efficiency, with over 95% accuracy.

  9. Using Baidu Search Index to Predict Dengue Outbreak in China

    NASA Astrophysics Data System (ADS)

    Liu, Kangkang; Wang, Tao; Yang, Zhicong; Huang, Xiaodong; Milinovich, Gabriel J.; Lu, Yi; Jing, Qinlong; Xia, Yao; Zhao, Zhengyang; Yang, Yang; Tong, Shilu; Hu, Wenbiao; Lu, Jiahai

    2016-12-01

    This study identified possible thresholds for predicting dengue fever (DF) outbreaks using the Baidu Search Index (BSI). Time-series classification and regression tree models based on BSI were used to develop a predictive model for DF outbreaks in Guangzhou and Zhongshan, China. In the regression tree models, the mean autochthonous DF incidence rate increased approximately 30-fold in Guangzhou when the weekly BSI for DF at the lagged moving average of 1-3 weeks was more than 382. When the weekly BSI for DF at the lagged moving average of 1-5 weeks was more than 91.8, the mean autochthonous DF incidence rate in Zhongshan increased approximately 9-fold. In the classification tree models, when the weekly BSI for DF at the lagged moving average of 1-3 weeks was more than 99.3, there was an 89.28% chance of a DF outbreak in Guangzhou, while in Zhongshan, when the weekly BSI for DF at the lagged moving average of 1-5 weeks was more than 68.1, the chance of a DF outbreak rose to 100%. The study indicated that low-cost internet-based surveillance systems can be a valuable complement to traditional DF surveillance in China.
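    Each reported threshold is a single tree split applied to a lagged moving average of weekly BSI. The abstract does not define that average precisely, so the minimal sketch below assumes that the "1-3 week" average for week t is the mean of weeks t-1, t-2 and t-3 (excluding the current week). The cut-off 99.3 is the Guangzhou classification-tree value from the abstract; the BSI series is made up.

```python
def lagged_moving_average(series, max_lag):
    """For week t, the mean of the max_lag preceding weeks (t-1, ..., t-max_lag).
    Weeks without enough history get None."""
    out = []
    for t in range(len(series)):
        if t < max_lag:
            out.append(None)
        else:
            out.append(sum(series[t - max_lag:t]) / max_lag)
    return out


def outbreak_flags(series, max_lag, threshold):
    """Apply a single tree split as an alert rule: flag weeks whose lagged average exceeds threshold."""
    return [v is not None and v > threshold
            for v in lagged_moving_average(series, max_lag)]


# Made-up weekly BSI values for DF-related searches.
weekly_bsi = [50, 60, 80, 120, 150, 200, 90]
print(outbreak_flags(weekly_bsi, max_lag=3, threshold=99.3))
# [False, False, False, False, False, True, True]
```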

  10. Using cluster analysis and a classification and regression tree model to develop cover types in the Sky Islands of southeastern Arizona

    Treesearch

    Jose M. Iniguez; Joseph L. Ganey; Peter J. Daughtery; John D. Bailey

    2005-01-01

    The objective of this study was to develop a rule based cover type classification system for the forest and woodland vegetation in the Sky Islands of southeastern Arizona. In order to develop such a system we qualitatively and quantitatively compared a hierarchical (Ward’s) and a non-hierarchical (k-means) clustering method. Ecologically, unique groups represented by...
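    The study compared Ward's hierarchical clustering with k-means. As a minimal illustration of the non-hierarchical side only, Lloyd's k-means algorithm can be sketched as follows. The plot attributes and values are hypothetical; in practice the variables would be standardized and several random starts compared.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update
    until the assignments stop changing. Returns (assignments, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random distinct plots as seeds
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_assign == assign:
            break  # converged: no plot changed cluster
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


# Hypothetical plots described by two standardized stand attributes.
plots = [(1.0, 9.0), (2.0, 8.0), (1.5, 8.5), (9.0, 1.0), (8.0, 2.0), (8.5, 1.5)]
groups, centers = kmeans(plots, k=2)
print(groups)
```

    Cluster labels are arbitrary, so only the grouping matters: here the first three plots end up together and the last three together. In a rule-based scheme, a classification tree could then be fitted to the cluster labels to obtain explicit assignment rules.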

  11. Using cluster analysis and a classification and regression tree model to develop cover types in the Sky Islands of southeastern Arizona [Abstract]

    Treesearch

    Jose M. Iniguez; Joseph L. Ganey; Peter J. Daugherty; John D. Bailey

    2005-01-01

    The objective of this study was to develop a rule based cover type classification system for the forest and woodland vegetation in the Sky Islands of southeastern Arizona. In order to develop such a system we qualitatively and quantitatively compared a hierarchical (Ward’s) and a non-hierarchical (k-means) clustering method. Ecologically, unique groups and plots...

  12. Classification of sodium MRI data of cartilage using machine learning.

    PubMed

    Madelin, Guillaume; Poidevin, Frederick; Makrymallis, Antonios; Regatte, Ravinder R

    2015-11-01

    To assess the possible utility of machine learning for classifying subjects with and without osteoarthritis using sodium magnetic resonance imaging data. Theory: Support vector machine, k-nearest neighbors, naïve Bayes, discriminant analysis, linear regression, logistic regression, neural networks, decision tree, and tree bagging were tested. Sodium magnetic resonance imaging with and without fluid suppression by inversion recovery was acquired on the knee cartilage of 19 controls and 28 osteoarthritis patients. Sodium concentrations were measured in regions of interest in the knee for both acquisitions. The mean (MEAN) and standard deviation (STD) of these concentrations were measured in each region of interest, and the minimum, maximum, and mean of these two measurements were calculated over all regions of interest for each subject. The resulting 12 variables per subject were used as predictors for classification. Either Min [STD] alone, or in combination with Mean [MEAN] or Min [MEAN], all from fluid-suppressed data, were the best predictors, with an accuracy >74%, mainly with linear logistic regression and linear support vector machine. Other good classifiers include discriminant analysis, linear regression, and naïve Bayes. Machine learning is a promising technique for classifying osteoarthritis patients and controls from sodium magnetic resonance imaging data. © 2014 Wiley Periodicals, Inc.
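    The 12 predictors arise from 2 acquisitions × 2 per-ROI statistics (MEAN, STD) × 3 summaries across ROIs (minimum, maximum, mean). A minimal sketch of that bookkeeping follows. The acquisition labels "FS" and "noFS", the ROI names and the voxel values are hypothetical, and the use of the population standard deviation is an assumption, since the abstract does not specify it.

```python
from statistics import mean, pstdev


def subject_features(acquisitions):
    """Summarize per-ROI sodium statistics into subject-level predictors.

    acquisitions: {acquisition_name: {roi_name: [voxel sodium concentrations]}}.
    For each acquisition, compute MEAN and STD within every ROI, then the
    minimum, maximum and mean of each across ROIs: 2 x 3 = 6 features per
    acquisition, i.e. 12 features for two acquisitions.
    """
    features = {}
    for acq, rois in acquisitions.items():
        per_roi = {
            "MEAN": [mean(v) for v in rois.values()],
            "STD": [pstdev(v) for v in rois.values()],  # population SD: an assumption
        }
        for stat, values in per_roi.items():
            features[f"{acq} Min [{stat}]"] = min(values)
            features[f"{acq} Max [{stat}]"] = max(values)
            features[f"{acq} Mean [{stat}]"] = mean(values)
    return features


# Hypothetical voxel concentrations for two ROIs and two acquisitions.
subject = {
    "FS": {"femoral": [200.0, 220.0], "tibial": [180.0, 180.0, 180.0]},
    "noFS": {"femoral": [250.0, 270.0], "tibial": [240.0, 260.0]},
}
feats = subject_features(subject)
print(len(feats), feats["FS Min [STD]"], feats["FS Mean [MEAN]"])  # 12 0.0 195.0
```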

  13. Data mining of tree-based models to analyze freeway accident frequency.

    PubMed

    Chang, Li-Yen; Chen, Wen-Chieh

    2005-01-01

    Statistical models, such as Poisson or negative binomial regression models, have been employed to analyze vehicle accident frequency for many years. However, these models have their own model assumptions and pre-defined underlying relationship between dependent and independent variables. If these assumptions are violated, the model could lead to erroneous estimation of accident likelihood. Classification and Regression Tree (CART), one of the most widely applied data mining techniques, has been commonly employed in business administration, industry, and engineering. CART does not require any pre-defined underlying relationship between target (dependent) variable and predictors (independent variables) and has been shown to be a powerful tool, particularly for dealing with prediction and classification problems. This study collected the 2001-2002 accident data of National Freeway 1 in Taiwan. A CART model and a negative binomial regression model were developed to establish the empirical relationship between traffic accidents and highway geometric variables, traffic characteristics, and environmental factors. The CART findings indicated that the average daily traffic volume and precipitation variables were the key determinants for freeway accident frequencies. By comparing the prediction performance between the CART and the negative binomial regression models, this study demonstrates that CART is a good alternative method for analyzing freeway accident frequencies.

  14. Disentangling Environmental and Anthropogenic Impacts on the Distribution of Unintentionally Introduced Invasive Alien Insects in Mainland China

    PubMed Central

    Zhao, Cai-Yun; Xu, Jing; Liu, Xiao-Yan

    2017-01-01

    Globalization increases the opportunities for unintentional introductions of invasive alien species, especially insects, and most of these species could damage ecosystems and cause economic losses in China. In this study, we analyzed the drivers of the distribution of unintentionally introduced invasive alien insects. Based on the number of unintentionally introduced invasive alien insects and their presence/absence records in each province of mainland China, regression trees were built to elucidate the roles of environmental and anthropogenic factors in the distribution of species numbers and in the similarity of species composition of these insects. Classification and regression trees indicated that climatic suitability (the mean temperature in January) and human economic activity (sum of total freight) are the primary drivers of the distribution pattern of species numbers of unintentionally introduced invasive alien insects at the provincial scale, while multivariate regression trees showed that only environmental factors (the mean January temperature, the annual precipitation and the areas of provinces) significantly affect the similarity of their species composition. PMID:28973576

  15. Hyperspectral Analysis of Soil Nitrogen, Carbon, Carbonate, and Organic Matter Using Regression Trees

    PubMed Central

    Gmur, Stephan; Vogt, Daniel; Zabowski, Darlene; Moskal, L. Monika

    2012-01-01

    The characterization of soil attributes using hyperspectral sensors has revealed patterns in soil spectra that are known to respond to mineral composition, organic matter, soil moisture and particle size distribution. Soil samples from different soil horizons of replicated soil series from sites located within Washington and Oregon were analyzed with the FieldSpec Spectroradiometer to measure their spectral signatures across the electromagnetic range of 400 to 1,000 nm. Similarity rankings of individual soil samples reveal differences between replicate series as well as between samples within the same replicate series. Using classification and regression tree statistical methods, regression trees were fitted to each spectral response using concentrations of nitrogen, carbon, carbonate and organic matter as the response variables. Statistics resulting from the fitted trees were: nitrogen R² 0.91 (p < 0.01) at 403, 470, 687, and 846 nm spectral band widths; carbonate R² 0.95 (p < 0.01) at 531 and 898 nm band widths; total carbon R² 0.93 (p < 0.01) at 400, 409, 441 and 907 nm band widths; and organic matter R² 0.98 (p < 0.01) at 300, 400, 441, 832 and 907 nm band widths. Use of the 400 to 1,000 nm electromagnetic range with regression trees provided a powerful, rapid and inexpensive method for nondestructively assessing nitrogen, carbon, carbonate and organic matter in upper soil horizons. PMID:23112620

  16. Neuropsychological Test Selection for Cognitive Impairment Classification: A Machine Learning Approach

    PubMed Central

    Williams, Jennifer A.; Schmitter-Edgecombe, Maureen; Cook, Diane J.

    2016-01-01

    Introduction Reducing the amount of testing required to accurately detect cognitive impairment is clinically relevant. The aim of this research was to determine the smallest number of clinical measures required to accurately classify participants as healthy older adults or as having mild cognitive impairment (MCI) or dementia, using a suite of classification techniques. Methods Two variable selection machine learning models (i.e., naive Bayes, decision tree), a logistic regression model, and two participant datasets (i.e., clinical diagnosis, clinical dementia rating; CDR) were explored. Participants classified using clinical diagnosis criteria included 52 individuals with dementia, 97 with MCI, and 161 cognitively healthy older adults. Participants classified using CDR included 154 individuals with CDR = 0, 93 individuals with CDR = 0.5, and 25 individuals with CDR = 1.0+. Twenty-seven demographic, psychological, and neuropsychological variables were available for variable selection. Results No significant difference was observed between naive Bayes, decision tree, and logistic regression models for classification of both the clinical diagnosis and CDR datasets. Participant classification (70.0 – 99.1%), geometric mean (60.9 – 98.1%), sensitivity (44.2 – 100%), and specificity (52.7 – 100%) were generally satisfactory. Unsurprisingly, the MCI/CDR = 0.5 participant group was the most challenging to classify. Through variable selection, only 2 – 9 variables were required for classification, and these varied between datasets in a clinically meaningful way. Conclusions The current study results reveal that machine learning techniques can accurately classify cognitive impairment and reduce the number of measures required for diagnosis. PMID:26332171
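    The abstract reports a "geometric mean" alongside sensitivity and specificity without defining it. A common definition in imbalanced classification, assumed here, is the geometric mean of the per-class recalls, which reduces to the square root of sensitivity times specificity for two classes and penalizes a model that neglects a small group such as MCI. A minimal sketch with hypothetical labels:

```python
import math


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly (per-class sensitivity)."""
    recall = {}
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recall[c] = sum(y_pred[i] == c for i in idx) / len(idx)
    return recall


def geometric_mean_score(y_true, y_pred):
    """Geometric mean of the per-class recalls; for two classes this equals
    sqrt(sensitivity * specificity)."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1.0 / len(recalls))


# Hypothetical labels: H = healthy, M = MCI, D = dementia.
truth = ["H", "H", "H", "H", "M", "M", "D", "D"]
pred = ["H", "H", "H", "M", "M", "H", "D", "D"]
print(per_class_recall(truth, pred))  # {'D': 1.0, 'H': 0.75, 'M': 0.5}
print(round(geometric_mean_score(truth, pred), 3))  # 0.721
```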

  17. Quad-polarized synthetic aperture radar and multispectral data classification using classification and regression tree and support vector machine-based data fusion system

    NASA Astrophysics Data System (ADS)

    Bigdeli, Behnaz; Pahlavani, Parham

    2017-01-01

    Interpreting synthetic aperture radar (SAR) data is difficult because the geometry and spectral range of SAR differ from those of optical imagery. Consequently, SAR imagery can complement multispectral (MS) optical remote sensing, particularly because it does not depend on solar illumination or weather conditions. This study presents a multisensor fusion of SAR and MS data based on classification and regression tree (CART) and support vector machine (SVM) classifiers combined through a decision fusion system. First, different feature extraction strategies were applied to the SAR and MS data to produce more spectral and textural information. To reduce the redundancy and correlation between features, an intrinsic dimension estimation method based on the noise-whitened Harsanyi-Farrand-Chang (NWHFC) method determines the proper dimension of the features. Then, principal component analysis and independent component analysis were applied to the stacked feature space of the two datasets. Afterward, SVM and CART classified each reduced feature space. Finally, a fusion strategy was used to fuse the classification results. To show the effectiveness of the proposed methodology, its results were compared with those of single classifications of each dataset. A coregistered Radarsat-2 and WorldView-2 dataset from San Francisco, USA, was used to examine the effectiveness of the proposed method. The results show that combining SAR data with optical data using the proposed methodology improves the classification results for most of the classes. The proposed fusion method achieved accuracies of approximately 93.24% and 95.44% for two different areas of the data.
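    The abstract does not specify the fusion rule. A minimal sketch of one common decision-level rule, a per-pixel weighted majority vote in which each classifier's vote is weighted (for example, by its validation accuracy), follows. The class names, label maps and weights are hypothetical.

```python
from collections import defaultdict


def fuse(label_maps, weights):
    """Weighted majority vote, pixel by pixel.

    label_maps: one list of class labels per classifier, all the same length.
    weights: one non-negative weight per classifier.
    Ties go to the label that appears first in classifier order.
    """
    fused = []
    for votes in zip(*label_maps):
        score = defaultdict(float)
        for label, weight in zip(votes, weights):
            score[label] += weight
        fused.append(max(score, key=score.get))
    return fused


# Hypothetical outputs of SVM and CART on SAR and MS features for 4 pixels.
svm_sar = ["water", "urban", "veg", "urban"]
cart_sar = ["water", "veg", "veg", "urban"]
svm_ms = ["water", "urban", "urban", "veg"]
cart_ms = ["urban", "urban", "veg", "veg"]
weights = [0.90, 0.80, 0.85, 0.70]  # e.g. validation accuracies (made up)
print(fuse([svm_sar, cart_sar, svm_ms, cart_ms], weights))
# ['water', 'urban', 'veg', 'urban']
```

    Weighting by accuracy lets a stronger classifier win a split vote, as in the last pixel, where the two SAR-based votes (1.70) outweigh the two MS-based votes (1.55).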

  18. Boosted classification trees result in minor to modest improvement in the accuracy in classifying cardiovascular outcomes compared to conventional classification trees

    PubMed Central

    Austin, Peter C; Lee, Douglas S

    2011-01-01

    Purpose: Classification trees are increasingly being used to classify patients according to the presence or absence of a disease or health outcome. A limitation of classification trees is their limited predictive accuracy. In the data-mining and machine learning literature, boosting has been developed to improve classification. Boosting with classification trees iteratively grows classification trees in a sequence of reweighted datasets. In a given iteration, subjects that were misclassified in the previous iteration are weighted more highly than subjects that were correctly classified. Classifications from each of the classification trees in the sequence are combined through a weighted majority vote to produce a final classification. The authors' objective was to examine whether boosting improved the accuracy of classification trees for predicting outcomes in cardiovascular patients. Methods: We examined the utility of boosting classification trees for classifying 30-day mortality outcomes in patients hospitalized with either acute myocardial infarction or congestive heart failure. Results: Improvements in the misclassification rate using boosted classification trees were at best minor compared with conventional classification trees. Minor to modest improvements to sensitivity were observed, with only a negligible reduction in specificity. For predicting cardiovascular mortality, boosted classification trees had high specificity but low sensitivity. Conclusions: Gains in predictive accuracy for predicting cardiovascular outcomes were less impressive than gains in performance observed in the data mining literature. PMID:22254181
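    The reweight-then-vote scheme described in this abstract is the one popularized by AdaBoost. The abstract does not name the boosting variant, so the minimal sketch below implements discrete AdaBoost for labels in {-1, +1} and, for brevity, uses one-split trees (decision stumps) rather than full classification trees as the base learners. The toy data are hypothetical.

```python
import math


def fit_stump(x, y, w):
    """Weighted-error-minimizing one-split tree (decision stump).

    x: list of feature vectors, y: labels in {-1, +1}, w: sample weights.
    Returns (weighted_error, feature_index, threshold, polarity); the stump
    predicts `polarity` when x[feature] <= threshold and -polarity otherwise.
    """
    best = None
    for j in range(len(x[0])):
        for thr in sorted({row[j] for row in x}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(x, y, w)
                          if (pol if row[j] <= thr else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best


def stump_predict(stump, row):
    _, j, thr, pol = stump
    return pol if row[j] <= thr else -pol


def adaboost(x, y, n_rounds=10):
    """Discrete AdaBoost: refit a stump to reweighted data each round."""
    n = len(y)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y, w)
        err = max(stump[0], 1e-10)  # guard against log(1/0) for a perfect stump
        if err >= 0.5:
            break  # no better than chance on the current weights
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified subjects (y * h = -1) gain weight; correct ones lose weight.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, x)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def boosted_predict(ensemble, row):
    """Weighted majority vote of the stumps (ties go to +1)."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump can classify perfectly (+ + - - + +).
x = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(x, y, n_rounds=3)
print([boosted_predict(ensemble, row) for row in x])  # [1, 1, -1, -1, 1, 1]
```

    After three rounds, the weighted vote of three stumps fits the toy data exactly, even though the best single stump misclassifies two of the six cases.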

  19. Models of Marine Fish Biodiversity: Assessing Predictors from Three Habitat Classification Schemes.

    PubMed

    Yates, Katherine L; Mellin, Camille; Caley, M Julian; Radford, Ben T; Meeuwig, Jessica J

    2016-01-01

    Prioritising biodiversity conservation requires knowledge of where biodiversity occurs. Such knowledge, however, is often lacking. New technologies for collecting biological and physical data coupled with advances in modelling techniques could help address these gaps and facilitate improved management outcomes. Here we examined the utility of environmental data, obtained using different methods, for developing models of both uni- and multivariate biodiversity metrics. We tested which biodiversity metrics could be predicted best and evaluated the performance of predictor variables generated from three types of habitat data: acoustic multibeam sonar imagery, predicted habitat classification, and direct observer habitat classification. We used boosted regression trees (BRT) to model metrics of fish species richness, abundance and biomass, and multivariate regression trees (MRT) to model biomass and abundance of fish functional groups. We compared model performance using different sets of predictors and estimated the relative influence of individual predictors. Models of total species richness and total abundance performed best; those developed for endemic species performed worst. Abundance models performed substantially better than corresponding biomass models. In general, BRTs and MRTs developed using predicted habitat classifications performed less well than those using multibeam data. The most influential individual predictor was the abiotic categorical variable from direct observer habitat classification, and models that incorporated predictors from direct observer habitat classification consistently outperformed those that did not. Our results show that while remotely sensed data can offer considerable utility for predictive modelling, the addition of direct observer habitat classification data can substantially improve model performance. Thus it appears that there are aspects of marine habitats that are important for modelling metrics of fish biodiversity that are not fully captured by remotely sensed data. As such, the use of remotely sensed data to model biodiversity represents a compromise between model performance and data availability.

  20. Models of Marine Fish Biodiversity: Assessing Predictors from Three Habitat Classification Schemes

    PubMed Central

    Yates, Katherine L.; Mellin, Camille; Caley, M. Julian; Radford, Ben T.; Meeuwig, Jessica J.

    2016-01-01

    Prioritising biodiversity conservation requires knowledge of where biodiversity occurs. Such knowledge, however, is often lacking. New technologies for collecting biological and physical data coupled with advances in modelling techniques could help address these gaps and facilitate improved management outcomes. Here we examined the utility of environmental data, obtained using different methods, for developing models of both uni- and multivariate biodiversity metrics. We tested which biodiversity metrics could be predicted best and evaluated the performance of predictor variables generated from three types of habitat data: acoustic multibeam sonar imagery, predicted habitat classification, and direct observer habitat classification. We used boosted regression trees (BRT) to model metrics of fish species richness, abundance and biomass, and multivariate regression trees (MRT) to model biomass and abundance of fish functional groups. We compared model performance using different sets of predictors and estimated the relative influence of individual predictors. Models of total species richness and total abundance performed best; those developed for endemic species performed worst. Abundance models performed substantially better than corresponding biomass models. In general, BRT and MRTs developed using predicted habitat classifications performed less well than those using multibeam data. The most influential individual predictor was the abiotic categorical variable from direct observer habitat classification and models that incorporated predictors from direct observer habitat classification consistently outperformed those that did not. Our results show that while remotely sensed data can offer considerable utility for predictive modelling, the addition of direct observer habitat classification data can substantially improve model performance. Thus it appears that there are aspects of marine habitats that are important for modelling metrics of fish biodiversity that are not fully captured by remotely sensed data. As such, the use of remotely sensed data to model biodiversity represents a compromise between model performance and data availability. PMID:27333202
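    Boosted regression trees build an additive model from many small trees, each fitted to the residuals left by the trees before it. As a rough illustration only, the plain-Python sketch below implements least-squares boosting with depth-1 trees (stumps) on a single numeric predictor. The function names, the learning rate of 0.1 and the 50 rounds are assumptions for the example, not the settings used by Yates et al.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left value, right value)


def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Least-squares stump: the threshold on x that best separates residuals r."""
    best = None
    for t in sorted(set(x))[:-1]:  # every candidate leaves points on both sides
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        raise ValueError("x needs at least two distinct values")
    return best[1], best[2], best[3]


def boost(x: List[float], y: List[float], n_rounds: int = 50, learning_rate: float = 0.1):
    """Fit each new stump to the current residuals and add a shrunken copy of it."""
    base = sum(y) / len(y)  # start from the mean response
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        t, lv, rv = fit_stump(x, resid)
        stumps.append((t, lv, rv))
        pred = [pi + learning_rate * (lv if xi <= t else rv) for xi, pi in zip(x, pred)]
    return base, learning_rate, stumps


def predict(model, xi: float) -> float:
    """Sum the base value and every shrunken stump contribution for one observation."""
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lv if xi <= t else rv) for t, lv, rv in stumps)
```

    With model = boost([1, 2, 3, 4, 5, 6], [1, 1, 1, 5, 5, 5]), predict(model, 2) approaches 1 and predict(model, 5) approaches 5 as rounds are added; a smaller learning rate needs more rounds but gives a smoother fit.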

  1. Interactions between factors related to the decision of sex offenders to confess during police interrogation: a classification-tree approach.

    PubMed

    Beauregard, Eric; Deslauriers-Varin, Nadine; St-Yves, Michel

    2010-09-01

    Most studies of confessions have looked at the influence of individual factors, neglecting the potential interactions between these factors and their impact on the decision to confess or not during an interrogation. Classification and regression tree analyses conducted on a sample of 624 convicted sex offenders showed that certain factors related to the offenders (e.g., personality, criminal career), victims (e.g., sex, relationship to offender), and case (e.g., time of day of the crime) were related to the decision to confess or not during the police interrogation. Several interactions were also observed between these factors. Results will be discussed in light of previous findings and interrogation strategies for sex offenders.

  2. Feature Relevance Assessment of Multispectral Airborne LIDAR Data for Tree Species Classification

    NASA Astrophysics Data System (ADS)

    Amiri, N.; Heurich, M.; Krzystek, P.; Skidmore, A. K.

    2018-04-01

    The presented experiment investigates the potential of Multispectral Laser Scanning (MLS) point clouds for single tree species classification. The basic idea is to simulate an MLS sensor by combining two different Lidar sensors providing three different wavelengths. The available data were acquired in summer 2016 on the same date under leaf-on conditions, with an average point density of 37 points/m2. For classification, we segmented the combined 3D point clouds, consisting of three different spectral channels, into 3D clusters using the Normalized Cut segmentation approach. We then extracted four groups of features from the 3D point cloud space. Once a variety of features had been extracted, we applied forward stepwise feature selection to reduce the number of irrelevant or redundant features. For the classification, we used multinomial logistic regression with L1 regularization. Our study was conducted using 586 ground-measured single trees from 20 sample plots in the Bavarian Forest National Park, Germany. Due to a lack of reference data for some rare species, we focused on four species classes. The results show an improvement of 4-10 pp for tree species classification using MLS data compared with a single-wavelength approach. A cross-validated (15-fold) accuracy of 0.75 can be achieved when all feature sets from the three spectral channels are used. Our results clearly indicate that MLS point clouds have great potential to improve detailed forest species mapping.
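    The abstract reports a 15-fold cross-validated accuracy. A minimal plain-Python sketch of how k-fold cross-validation produces such a figure is shown below. It is an illustration under stated assumptions: folds come from one seeded shuffle, and a simple nearest-centroid classifier stands in for the L1-regularized multinomial logistic regression the authors actually used; fit_centroids and predict_centroid are hypothetical helper names.

```python
import random
from typing import Callable, Dict, List, Sequence


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 once, then cut it into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(idx[start:start + size])
        start += size
    return folds


def cross_val_accuracy(X: Sequence[Sequence[float]], y: Sequence[str], k: int,
                       fit: Callable, predict: Callable, seed: int = 0) -> float:
    """Mean held-out accuracy: each fold is scored once by a model trained on the other k-1."""
    scores = []
    for test_idx in kfold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def fit_centroids(X, y) -> Dict[str, List[float]]:
    """Stand-in classifier (hypothetical): the mean feature vector of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_centroid(model: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Assign x to the class with the nearest centroid (squared Euclidean distance)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))
```

    For 586 trees and k = 15, kfold_indices yields one fold of 40 trees and fourteen of 39; any classifier exposing the same fit/predict pair could be dropped in for the stand-in.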

  3. Risk profiles for weight gain among postmenopausal women: A classification and regression tree analysis approach

    USDA-ARS's Scientific Manuscript database

    Risk factors for obesity and weight gain are typically evaluated individually while "adjusting for" the influence of other confounding factors, and few studies, if any, have created risk profiles by clustering risk factors. We identified subgroups of postmenopausal women homogeneous in their cluster...

  4. Profiling Student Use of Calculators in the Learning of High School Mathematics

    ERIC Educational Resources Information Center

    Crowe, Cheryll E.; Ma, Xin

    2010-01-01

    Using data from the 2005 National Assessment of Educational Progress, students' use of calculators in the learning of high school mathematics was profiled based on their family background, curriculum background, and advanced mathematics coursework. A statistical method new to educational research--classification and regression trees--was applied…

  5. Pesticides in Urban Multiunit Dwellings: Hazard Identification Using Classification and Regression Tree (CART) Analysis

    EPA Science Inventory

    Many units in public housing or other low-income urban dwellings may have elevated pesticide residues, given recurring infestation, but it would be logistically and economically infeasible to sample a large number of units to identify highly exposed households to design interven...

  6. [RS estimation of inventory parameters and carbon storage of moso bamboo forest based on synergistic use of object-based image analysis and decision tree].

    PubMed

    Du, Hua Qiang; Sun, Xiao Yan; Han, Ning; Mao, Fang Jie

    2017-10-01

    By synergistically using object-based image analysis (OBIA) and the classification and regression tree (CART) method, the distribution, the inventory indexes (including diameter at breast height, tree height, and crown closure), and the aboveground carbon storage (AGC) of moso bamboo forest in Shanchuan Town, Anji County, Zhejiang Province were investigated. The results showed that moso bamboo forest could be accurately delineated by integrating multi-scale image segmentation in OBIA with CART, which connected the image objects at various scales, with a good producer's accuracy of 89.1%. The indexes estimated by the regression tree model, which was constructed from features extracted from the image objects, reached normal or better accuracy; the crown closure model achieved the best estimation accuracy, 67.9%. The estimation accuracy of diameter at breast height and tree height was relatively low, consistent with the conclusion that optical remote sensing cannot estimate diameter at breast height and tree height satisfactorily. Estimation of AGC reached relatively high accuracy, and accuracy in high-value regions exceeded 80%.

  7. Analysis of Chi-square Automatic Interaction Detection (CHAID) and Classification and Regression Tree (CRT) for Classification of Corn Production

    NASA Astrophysics Data System (ADS)

    Susanti, Yuliana; Zukhronah, Etik; Pratiwi, Hasih; Respatiwulan; Sri Sulistijowati, H.

    2017-11-01

    To achieve food resilience in Indonesia, food diversification by exploring the potential of local foods is required. Corn is one of the alternative staple foods of Javanese society. For that reason, corn production needs to be improved by considering the factors that influence it. CHAID and CRT are data mining methods that can be used to classify the influencing variables. The present study seeks to obtain information on the potential local availability of corn in the regencies and cities of Java Island. CHAID analysis yields four classifications with an accuracy of 78.8%, while CRT analysis yields seven classifications with an accuracy of 79.6%.
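    CHAID chooses splits with chi-square tests of independence between the (possibly merged) categories of each predictor and the target, whereas CRT uses an impurity measure. The plain-Python sketch below computes only the Pearson chi-square statistic and its degrees of freedom for one predictor; merging categories and converting the statistic to a Bonferroni-adjusted p-value, both part of CHAID, are omitted. The function names and example data are hypothetical.

```python
from typing import List, Sequence, Tuple


def crosstab(predictor: Sequence, target: Sequence) -> List[List[int]]:
    """Counts of each (predictor category, target class) pair; rows and columns sorted."""
    rows, cols = sorted(set(predictor)), sorted(set(target))
    return [[sum(1 for p, t in zip(predictor, target) if p == r and t == c) for c in cols]
            for r in rows]


def pearson_chi_square(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df
```

    A predictor whose categories cleanly separate high- and low-production regencies gives a large statistic. Because predictors with more categories have more degrees of freedom, CHAID compares predictors on adjusted p-values rather than on the raw statistic.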

  8. Comparing statistical and machine learning classifiers: alternatives for predictive modeling in human factors research.

    PubMed

    Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann

    2003-01-01

    Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.

  9. Applying Data Mining Techniques to Extract Hidden Patterns about Breast Cancer Survival in an Iranian Cohort Study.

    PubMed

    Khalkhali, Hamid Reza; Lotfnezhad Afshar, Hadi; Esnaashari, Omid; Jabbari, Nasrollah

    2016-01-01

    Breast cancer survival has been analyzed with many standard data mining algorithms, a group of which belong to the decision tree category. The ability of decision tree algorithms to visualize and formulate hidden patterns among study variables was the main reason for applying, in the current study, an algorithm from this category that had not been studied before. The classification and regression tree (CART) algorithm was applied to a breast cancer database containing information on 569 patients from 2007-2010. Gini impurity, the splitting measure used for categorical target variables, was used. The classification error, which is a function of tree size, was measured by 10-fold cross-validation experiments. The performance of the resulting model was evaluated by accuracy, sensitivity and specificity. The CART model produced a decision tree with 17 nodes, 9 of which were associated with a set of rules. The rules were clinically meaningful. Expressed in if-then format, they showed that Stage was the most important variable for predicting breast cancer survival. The accuracy, sensitivity and specificity were 80.3%, 93.5% and 53%, respectively. The current study's model, the first created with CART, was able to extract useful hidden rules from a relatively small dataset.
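    The abstract names Gini impurity as the splitting measure. As a hedged illustration, not the authors' software, the sketch below computes Gini impurity (1 minus the sum of the squared class proportions in a node) and searches one numeric predictor for the threshold that minimizes the size-weighted impurity of the two child nodes. The names gini and best_gini_split and the toy labels are assumptions.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: chance that two labels drawn with replacement from the node differ."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_gini_split(x: Sequence[float], y: Sequence[Hashable]) -> Tuple[float, Optional[float]]:
    """Return (weighted child impurity, threshold) for the best split x <= threshold."""
    n = len(y)
    best: Tuple[float, Optional[float]] = (float("inf"), None)
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right child empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[0]:
            best = (score, t)
    return best
```

    Applied recursively to each child, and repeated over every candidate predictor (for example an ordinal coding of Stage), this greedy search is the core of growing a CART classification tree; pruning by cross-validated error is a separate step.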

  10. First analysis of risk factors associated with bee colony collapse disorder by classification and regression trees

    USDA-ARS's Scientific Manuscript database

    Sudden losses of managed honey bee (Apis mellifera L.) colonies are considered an important problem worldwide but the underlying cause or causes of these losses are currently unknown. In the United States, this syndrome was termed Colony Collapse Disorder (CCD), since the defining trait was a rapid ...

  11. A Comparison of Two Scoring Methods for an Automated Speech Scoring System

    ERIC Educational Resources Information Center

    Xi, Xiaoming; Higgins, Derrick; Zechner, Klaus; Williamson, David

    2012-01-01

    This paper compares two alternative scoring methods--multiple regression and classification trees--for an automated speech scoring system used in a practice environment. The two methods were evaluated on two criteria: construct representation and empirical performance in predicting human scores. The empirical performance of the two scoring models…

  12. Student Self-Reported Learning Outcomes of Field Trips: The Pedagogical Impact

    ERIC Educational Resources Information Center

    Alon, Nirit Lavie; Tal, Tali

    2015-01-01

    In this study, we used the classification and regression trees (CART) method to draw relationships between student self-reported learning outcomes in 26 field trips to natural environments and various characteristics of the field trip that include variables associated with preparation and pedagogy. We wished to examine the extent to which the…

  13. Assessing wildfire risks at multiple spatial scales

    Treesearch

    Justin Fitch

    2008-01-01

    In continuation of the efforts to advance wildfire science and develop tools for wildland fire managers, a spatial wildfire risk assessment was carried out using Classification and Regression Tree analysis (CART) and Geographic Information Systems (GIS). The analysis was performed at two scales. The small-scale assessment covered the entire state of New Mexico, while...

  14. Differential Diagnosis of Erythmato-Squamous Diseases Using Classification and Regression Tree.

    PubMed

    Maghooli, Keivan; Langarizadeh, Mostafa; Shahmoradi, Leila; Habibi-Koolaee, Mahdi; Jebraeily, Mohamad; Bouraghi, Hamid

    2016-10-01

    Differential diagnosis of Erythmato-Squamous Diseases (ESD) is a major challenge in the field of dermatology. ESD are divided into six classes. Data mining is the process of detecting hidden patterns; in the case of ESD, data mining helps us predict the diseases, and different algorithms have been developed for this purpose. We aimed to use the Classification and Regression Tree (CART) to predict the differential diagnosis of ESD. We followed the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology. For this purpose, the dermatology data set was obtained from the UCI Machine Learning Repository. The Clementine 12.0 software from IBM was used for modelling. To evaluate the model, we calculated its accuracy, sensitivity and specificity. The proposed model had an accuracy of 94.84% (. 24.42) in correctly predicting ESD. Results indicated that this classifier could be useful. However, combining machine learning methods is strongly recommended, as it could be more useful for predicting ESD.
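    The abstract does not say how sensitivity and specificity were computed for six classes. One common convention, assumed here rather than taken from the paper, is one-vs-rest: each class in turn is treated as positive and all others as negative, and the per-class values may then be averaged. A minimal plain-Python sketch with hypothetical function names:

```python
from typing import Dict, Hashable, Sequence, Tuple


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Share of cases whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def one_vs_rest(y_true, y_pred, positive) -> Tuple[float, float]:
    """Sensitivity and specificity when `positive` is the class of interest."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)


def macro_average(y_true, y_pred) -> Dict[str, float]:
    """Unweighted mean of the per-class one-vs-rest sensitivity and specificity."""
    classes = sorted(set(y_true))  # assumes at least two classes are present
    pairs = [one_vs_rest(y_true, y_pred, c) for c in classes]
    return {
        "sensitivity": sum(s for s, _ in pairs) / len(pairs),
        "specificity": sum(sp for _, sp in pairs) / len(pairs),
    }
```

    Macro-averaging weights every class equally; a support-weighted average would instead let common classes dominate, so the convention should be stated when such figures are reported.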

  15. Disentangling Environmental and Anthropogenic Impacts on the Distribution of Unintentionally Introduced Invasive Alien Insects in Mainland China.

    PubMed

    Zhao, Cai-Yun; Li, Jun-Sheng; Xu, Jing; Liu, Xiao-Yan

    2017-05-01

    Globalization increases the opportunities for unintentionally introduced invasive alien species, especially insects, and most of these species could damage ecosystems and cause economic loss in China. In this study, we analyzed drivers of the distribution of unintentionally introduced invasive alien insects. Based on the number of unintentionally introduced invasive alien insects and their presence/absence records in each province of mainland China, regression trees were built to elucidate the roles of environmental and anthropogenic factors in the number distribution and the similarity of species composition of these insects. Classification and regression trees indicated that climatic suitability (the mean temperature in January) and human economic activity (sum of total freight) are the primary drivers of the number distribution pattern of unintentionally introduced invasive alien insects at the provincial scale, while multivariate regression trees showed that only environmental factors (the mean January temperature, the annual precipitation and the areas of provinces) significantly affect the similarity of their species composition. © The Authors 2017. Published by Oxford University Press on behalf of Entomological Society of America.
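    In a multivariate regression tree, each response is a vector (for instance, the presence/absence of each insect species in a province), and a split is chosen to minimize the total within-node sum of squared Euclidean distances to the node centroids. The sketch below shows that criterion for one numeric predictor; the function names and the toy provinces are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def within_ss(rows: List[Vector]) -> float:
    """Sum of squared Euclidean distances from each response vector to the node centroid."""
    if not rows:
        return 0.0
    m = len(rows[0])
    centroid = [sum(r[j] for r in rows) / len(rows) for j in range(m)]
    return sum(sum((r[j] - centroid[j]) ** 2 for j in range(m)) for r in rows)


def best_mrt_split(x: Sequence[float], Y: Sequence[Vector]) -> Tuple[float, Optional[float]]:
    """Threshold on x that minimizes the summed within-node sum of squares of the children."""
    best: Tuple[float, Optional[float]] = (float("inf"), None)
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, Y) if xi <= t]
        right = [yi for xi, yi in zip(x, Y) if xi > t]
        score = within_ss(left) + within_ss(right)
        if score < best[0]:
            best = (score, t)
    return best
```

    With one response column this reduces to the ordinary regression-tree criterion; with many species columns it groups provinces whose whole species lists are similar.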

  16. Decision trees in epidemiological research.

    PubMed

    Venkatasubramaniam, Ashwini; Wolfson, Julian; Mitchell, Nathan; Barnes, Timothy; JaKa, Meghan; French, Simone

    2017-01-01

    In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
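    CTree separates variable selection from split-point search and stops growing when no covariate is significantly associated with the outcome. The sketch below imitates that logic in a deliberately simplified way: it uses a permutation test on the absolute Pearson correlation with a Bonferroni adjustment, whereas the actual CTree algorithm uses a more general conditional-inference test statistic. The function names, the 999 permutations and alpha = 0.05 are assumptions.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def abs_corr(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation; 0.0 when either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_pvalue(x, y, n_perm: int, rng: random.Random) -> float:
    """Share of outcome permutations whose association is at least the observed one."""
    observed = abs_corr(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs_corr(x, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # count the observed labelling as one permutation


def select_split_variable(covariates: Dict[str, List[float]], y: List[float],
                          alpha: float = 0.05, n_perm: int = 999,
                          seed: int = 0) -> Tuple[Optional[str], float]:
    """Pick the covariate with the smallest Bonferroni-adjusted p-value,
    or return None (stop splitting) if even that one is not significant."""
    rng = random.Random(seed)
    k = len(covariates)
    adjusted = {name: min(1.0, k * permutation_pvalue(x, y, n_perm, rng))
                for name, x in covariates.items()}
    best = min(adjusted, key=adjusted.get)
    if adjusted[best] > alpha:
        return None, adjusted[best]
    return best, adjusted[best]
```

    The stopping rule is what removes the need for a separate pruning step: a node is left unsplit when select_split_variable returns None, which is one reason CTree output is often described as easier to interpret.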

  17. A soil map of a large watershed in China: applying digital soil mapping in a data sparse region

    NASA Astrophysics Data System (ADS)

    Barthold, F.; Blank, B.; Wiesmeier, M.; Breuer, L.; Frede, H.-G.

    2009-04-01

    Prediction of soil classes in data-sparse regions is a major research challenge. With the advent of machine learning, the possibilities to spatially predict soil classes have increased tremendously and opened new possibilities in soil mapping. Digital soil mapping is a research field that has been established during the last decades and has been widely accepted. We now need to develop tools to reduce the uncertainty in soil predictions. This is especially challenging in data-sparse regions. One approach is to implement soil taxonomic distance as a classification error criterion in classification and regression trees (CART), as suggested by Minasny et al. (Geoderma 142 (2007) 285-293). This approach assumes that the classification error should be larger between soils that are more dissimilar, i.e. that differ in a larger number of soil properties, and smaller between more similar soils. Our study area is the Xilin River Basin, located in central Inner Mongolia, China. It is characterized by semi-arid climate conditions and is representative of the naturally occurring steppe ecosystem. The study area covers 3600 km2. We applied a random, stratified sampling design after McKenzie and Ryan (Geoderma 89 (1999) 67-94), with land use and topography as stratifying variables. We defined 10 sampling classes; from each class, 14 replicates were randomly drawn and sampled. The dataset was split into 100 soil profiles for training and 40 soil profiles for validation. We then applied CART to quantify the relationships between soil classes and environmental covariates. The classification tree explained 75.5% of the variance, with land use and geology as the most important predictor variables. Among the 8 soil classes that we predicted, Kastanozems cover most of the area. They are predominantly found in steppe areas. However, even some of the soils at sand dune sites, which were thought to show only little soil formation, can be classified as Kastanozems. Besides Kastanozems, Regosols are most common at the sand dune sites as well as at sites defined as bare soil, which are characterized by little or no vegetation. Gleysols are mostly found at sites in the vicinity of the Xilin River that are connected to the groundwater. They can also be found in small valleys or depressions where sub-surface water from neighboring areas collects. The richest soils are found in mountain meadow areas, where pedogenetic conditions are most favorable and lead to the formation of Chernozems with deep humic Ah horizons. Other soil types in the study area are Arenosols, Calcisols, Cambisols and Phaeozems. In addition, soil taxonomic distance is implemented in the decision tree procedure as a measure of classification error. The results of incorporating taxonomic distance as a loss function in the decision tree will be compared with the standard application of the decision tree.
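    Using taxonomic distance as the classification error can be illustrated at the level of a single node: instead of predicting the majority class, the node predicts the class that minimizes the summed taxonomic distance to the soils it contains, and that minimum serves as the node's error. The sketch below assumes a precomputed, symmetric distance table with zeros on the diagonal; the function names, distance values and class counts are invented for illustration and are not from Minasny et al. or this study.

```python
from typing import Dict, Tuple

Distance = Dict[str, Dict[str, float]]  # distance[true_class][predicted_class]


def node_prediction(class_counts: Dict[str, int], distance: Distance) -> Tuple[str, float]:
    """Class minimizing total taxonomic distance to the samples in a node, with that cost."""
    best = None
    for candidate in distance:
        # Cost of labelling every sample in the node as `candidate`.
        cost = sum(n * distance[true][candidate] for true, n in class_counts.items())
        if best is None or cost < best[1]:
            best = (candidate, cost)
    return best


def split_cost(left: Dict[str, int], right: Dict[str, int], distance: Distance) -> float:
    """Loss of a candidate split: the taxonomic-distance cost of both child nodes."""
    return node_prediction(left, distance)[1] + node_prediction(right, distance)[1]
```

    With counts of 4 Regosols, 3 Kastanozems and 3 Chernozems and invented distances (Kastanozems-Chernozems 1, Kastanozems-Regosols 3, Chernozems-Regosols 4), plain 0/1 loss would label the node Regosols, the majority class, while the distance-weighted rule labels it Kastanozems at a cost of 15, because confusing Chernozems with Kastanozems is cheap.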

  18. Applying an Ensemble Classification Tree Approach to the Prediction of Completion of a 12-Step Facilitation Intervention with Stimulant Abusers

    PubMed Central

    Doyle, Suzanne R.; Donovan, Dennis M.

    2014-01-01

    Aims The purpose of this study was to explore the selection of predictor variables in the evaluation of drug treatment completion using an ensemble approach with classification trees. The basic methodology is reviewed and the subagging procedure of random subsampling is applied. Methods Among 234 individuals with stimulant use disorders randomized to a 12-Step facilitative intervention shown to increase stimulant use abstinence, 67.52% were classified as treatment completers. A total of 122 baseline variables were used to identify factors associated with completion. Findings The number of types of self-help activity involvement prior to treatment was the predominant predictor. Other effective predictors included better coping self-efficacy for substance use in high-risk situations, more days of prior meeting attendance, greater acceptance of the Disease model, higher confidence for not resuming use following discharge, lower ASI Drug and Alcohol composite scores, negative urine screens for cocaine or marijuana, and fewer employment problems. Conclusions The application of an ensemble subsampling regression tree method utilizes the fact that classification trees are unstable but, on average, produce an improved prediction of the completion of drug abuse treatment. The results support the notion that there are early indicators of treatment completion that may allow for modification of approaches more tailored to fitting the needs of individuals and potentially provide more successful treatment engagement and improved outcomes. PMID:25134038
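    Subagging (subsample aggregating) differs from bagging in that each tree is grown on a random subsample drawn without replacement, often half the data, and the trees' predictions are averaged. To keep it short, the sketch below uses depth-1 classification trees on one numeric predictor; the subsample fraction of 0.5, the 50 trees and the function names are assumptions, not the settings Doyle and Donovan reported.

```python
import random
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]  # (threshold, P(y=1) if x <= threshold, P(y=1) otherwise)


def fit_stump(x: Sequence[float], y: Sequence[int]) -> Stump:
    """Depth-1 classification tree for 0/1 labels, split chosen by Gini impurity."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        pl, pr = sum(left) / len(left), sum(right) / len(right)
        # For two classes, n * p * (1 - p) is half the node's size-weighted Gini impurity.
        cost = len(left) * pl * (1 - pl) + len(right) * pr * (1 - pr)
        if best is None or cost < best[0]:
            best = (cost, t, pl, pr)
    if best is None:  # the subsample has a single x value: no split possible
        p = sum(y) / len(y)
        return float("inf"), p, p
    return best[1], best[2], best[3]


def subagging(x, y, n_trees: int = 50, fraction: float = 0.5, seed: int = 0) -> List[Stump]:
    """Grow one stump per random subsample drawn WITHOUT replacement."""
    rng = random.Random(seed)
    n = len(y)
    m = max(2, int(fraction * n))
    trees = []
    for _ in range(n_trees):
        idx = rng.sample(range(n), m)
        trees.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return trees


def predict_proba(trees: List[Stump], xi: float) -> float:
    """Average the trees' estimated probabilities of completion (label 1)."""
    return sum(pl if xi <= t else pr for t, pl, pr in trees) / len(trees)
```

    Each stump sees a different half of the data, so individual thresholds vary from tree to tree; averaging them is what offsets the instability of single trees that the abstract mentions.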

  19. A regional classification scheme for estimating reference water quality in streams using land-use-adjusted spatial regression-tree analysis

    USGS Publications Warehouse

    Robertson, Dale M.; Saad, D.A.; Heisey, D.M.

    2006-01-01

    Various approaches are used to subdivide large areas into regions containing streams that have similar reference or background water quality and that respond similarly to different factors. For many applications, such as establishing reference conditions, it is preferable to use physical characteristics that are not affected by human activities to delineate these regions. However, most approaches, such as ecoregion classifications, rely on land use to delineate regions or have difficulties compensating for the effects of land use. Land use not only directly affects water quality, but it is often correlated with the factors used to define the regions. In this article, we describe modifications to SPARTA (spatial regression-tree analysis), a relatively new approach applied to water-quality and environmental characteristic data to delineate zones with similar factors affecting water quality. In this modified approach, land-use-adjusted (residualized) water quality and environmental characteristics are computed for each site. Regression-tree analysis is applied to the residualized data to determine the most statistically important environmental characteristics describing the distribution of a specific water-quality constituent. Geographic information for small basins throughout the study area is then used to subdivide the area into relatively homogeneous environmental water-quality zones. For each zone, commonly used approaches are subsequently used to define its reference water quality and how its water quality responds to changes in land use. SPARTA is used to delineate zones of similar reference concentrations of total phosphorus and suspended sediment throughout the upper Midwestern part of the United States. © 2006 Springer Science+Business Media, Inc.
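    The land-use adjustment step can be pictured as follows: regress water quality on a land-use variable, keep the residuals, then let a regression tree split those residuals on an environmental characteristic. The plain-Python sketch below does this for one land-use variable and a single split. It is a simplification (the article also residualizes the environmental characteristics and builds full trees), and the names and data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def land_use_residuals(y: Sequence[float], land_use: Sequence[float]) -> List[float]:
    """Residuals of y after ordinary least-squares regression on one land-use variable."""
    n = len(y)
    mz, my = sum(land_use) / n, sum(y) / n
    szz = sum((z - mz) ** 2 for z in land_use)
    slope = sum((z - mz) * (v - my) for z, v in zip(land_use, y)) / szz
    intercept = my - slope * mz
    return [v - (intercept + slope * z) for z, v in zip(land_use, y)]


def best_regression_split(x: Sequence[float], r: Sequence[float]) -> Tuple[float, Optional[float]]:
    """Threshold on x minimizing the within-group sum of squares of r (variance reduction)."""
    def ss(vals: List[float]) -> float:
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best: Tuple[float, Optional[float]] = (float("inf"), None)
    for t in sorted(set(x))[:-1]:
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        score = ss(left) + ss(right)
        if score < best[0]:
            best = (score, t)
    return best
```

    Splitting the residuals rather than the raw concentrations means a zone boundary reflects an environmental characteristic after the land-use effect has been removed, which is the purpose of the modification described above.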

  20. Using Bayesian neural networks to classify forest scenes

    NASA Astrophysics Data System (ADS)

    Vehtari, Aki; Heikkonen, Jukka; Lampinen, Jouko; Juujarvi, Jouni

    1998-10-01

    We present results that compare the performance of Bayesian learning methods for neural networks on the task of classifying forest scenes into trees and background. The classification task is demanding due to the texture richness of the trees, occlusions of forest scene objects and diverse lighting conditions during operation. This makes it difficult to determine the optimal image features for classification. A natural way to proceed is to extract many different types of potentially suitable features and to evaluate their usefulness in later processing stages. One approach to coping with a large number of features is to use Bayesian methods to control model complexity. Bayesian learning uses a prior on model parameters, combines it with evidence from the training data, and then integrates over the resulting posterior to make predictions. With this method, we can use large networks and many features without fear of overfitting. For this classification task we compare two Bayesian learning methods for multi-layer perceptron (MLP) neural networks: (1) the evidence framework of MacKay, which uses a Gaussian approximation to the posterior weight distribution and maximizes with respect to hyperparameters; and (2) a Markov Chain Monte Carlo (MCMC) method due to Neal, in which the posterior distribution of the network parameters is numerically integrated using MCMC. As baseline classifiers for comparison we use (3) an MLP early-stop committee, (4) K-nearest-neighbor and (5) Classification And Regression Tree.

  1. Tree Classification with Fused Mobile Laser Scanning and Hyperspectral Data

    PubMed Central

    Puttonen, Eetu; Jaakkola, Anttoni; Litkey, Paula; Hyyppä, Juha

    2011-01-01

    Mobile Laser Scanning data were collected simultaneously with hyperspectral data using the Finnish Geodetic Institute Sensei system. The data were tested for tree species classification. The test area was an urban garden in the City of Espoo, Finland. Point clouds representing 168 individual tree specimens of 23 tree species were determined manually. The classification of the trees was done using first only the spatial data from point clouds, then with only the spectral data obtained with a spectrometer, and finally with the combined spatial and hyperspectral data from both sensors. Two classification tests were performed: the separation of coniferous and deciduous trees, and the identification of individual tree species. All determined tree specimens were used in distinguishing coniferous and deciduous trees. A subset of 133 trees and 10 tree species was used in the tree species classification. The best classification results for the fused data were 95.8% for the separation of the coniferous and deciduous classes. The best overall tree species classification succeeded with 83.5% accuracy for the best tested fused data feature combination. The respective results for paired structural features derived from the laser point cloud were 90.5% for the separation of the coniferous and deciduous classes and 65.4% for the species classification. Classification accuracies with paired hyperspectral reflectance value data were 90.5% for the separation of coniferous and deciduous classes and 62.4% for different species. The results are among the first of their kind, and they show that mobile-collected fused data outperformed single-sensor data in both classification tests and by a significant margin. PMID:22163894

  2. Tree classification with fused mobile laser scanning and hyperspectral data.

    PubMed

    Puttonen, Eetu; Jaakkola, Anttoni; Litkey, Paula; Hyyppä, Juha

    2011-01-01

    Mobile Laser Scanning data were collected simultaneously with hyperspectral data using the Finnish Geodetic Institute Sensei system. The data were tested for tree species classification. The test area was an urban garden in the City of Espoo, Finland. Point clouds representing 168 individual tree specimens of 23 tree species were determined manually. The classification of the trees was done using first only the spatial data from point clouds, then with only the spectral data obtained with a spectrometer, and finally with the combined spatial and hyperspectral data from both sensors. Two classification tests were performed: the separation of coniferous and deciduous trees, and the identification of individual tree species. All determined tree specimens were used in distinguishing coniferous and deciduous trees. A subset of 133 trees and 10 tree species was used in the tree species classification. The best classification results for the fused data were 95.8% for the separation of the coniferous and deciduous classes. The best overall tree species classification succeeded with 83.5% accuracy for the best tested fused data feature combination. The respective results for paired structural features derived from the laser point cloud were 90.5% for the separation of the coniferous and deciduous classes and 65.4% for the species classification. Classification accuracies with paired hyperspectral reflectance value data were 90.5% for the separation of coniferous and deciduous classes and 62.4% for different species. The results are among the first of their kind, and they show that mobile-collected fused data outperformed single-sensor data in both classification tests and by a significant margin.

  3. Level 2 Screening with the PDD Behavior Inventory: Subgroup Profiles and Implications for Differential Diagnosis

    ERIC Educational Resources Information Center

    Cohen, Ira L.; Liu, Xudong; Hudson, Melissa; Gillis, Jennifer; Cavalari, Rachel N. S.; Romanczyk, Raymond G.; Karmel, Bernard Z.; Gardner, Judith M.

    2017-01-01

    The PDD Behavior Inventory (PDDBI) has recently been shown, in a large multisite study, to discriminate well between autism spectrum disorder (ASD) and other groups when its scores were examined using a machine learning tool, Classification and Regression Trees (CART). Discrimination was good for toddlers, preschoolers, and school-age children;…

  4. New Directions in Education Research: Using Data Mining Techniques to Explore Predictors of Grade Retention

    ERIC Educational Resources Information Center

    Kelley-Winstead, Deanna

    2010-01-01

    The purpose of this study was to use classification trees and logistic regression to identify subgroups of students more likely to be retained. The National Educational Longitudinal Study of 1988 (NELS:88) was used to identify the sociodemographic, family background and school related factors associated with grade retention. The sample size for…

  5. Knowledge and Community: The Effect of a First-Year Seminar on Student Persistence

    ERIC Educational Resources Information Center

    Pittendrigh, Adele; Borkowski, John; Swinford, Steven; Plumb, Carolyn

    2016-01-01

    This study explores the effects of an academic seminar on the persistence of first-year college students, including effects on students most at risk of dropping out. A secondary interest was demonstrating the utility of using classification and regression tree analysis to identify relevant predictors of student persistence. The results of the…

  6. Evaluation of open source data mining software packages

    Treesearch

    Bonnie Ruefenacht; Greg Liknes; Andrew J. Lister; Haans Fisk; Dan Wendt

    2009-01-01

    Since 2001, the USDA Forest Service (USFS) has used classification and regression-tree technology to map USFS Forest Inventory and Analysis (FIA) biomass, forest type, forest type groups, and National Forest vegetation. This prior work used Cubist/See5 software for the analyses. The objective of this project, sponsored by the Remote Sensing Steering Committee (RSSC),...

  7. Improving ensemble decision tree performance using Adaboost and Bagging

    NASA Astrophysics Data System (ADS)

    Hasan, Md. Rajib; Siraj, Fadzilah; Sainin, Mohd Shamrie

    2015-12-01

    Ensemble classifier systems are considered among the most promising for medical data classification, and the performance of a decision tree classifier can be increased by ensemble methods, which have proven better than single classifiers. However, in an ensemble setting the performance depends on the selection of a suitable base classifier. This research employed two prominent ensemble methods, namely AdaBoost and Bagging, with base classifiers such as Random Forest, Random Tree, J48, J48graft and Logistic Model Tree (LMT) that were selected independently. The empirical study shows that performance varies when different base classifiers are selected, and in some cases overfitting was also noted. The evidence shows that ensemble decision tree classifiers using AdaBoost and Bagging improve performance on the selected medical data sets.
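
    The abstract names Bagging and AdaBoost without algorithmic detail. As a rough, stdlib-only illustration of the bagging idea only (not the Weka configuration the authors used), the sketch below fits a decision stump to each bootstrap resample and predicts by majority vote. `fit_stump` and `bagging` are hypothetical names, and the one-split stump is an assumed stand-in for the J48 or Random Tree base learners listed above.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-split decision stump: the (feature, threshold) pair with the fewest training errors."""
    overall = Counter(y).most_common(1)[0][0]  # fallback label for an empty side
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            # Each side predicts its own majority label.
            l_lab = Counter(left).most_common(1)[0][0] if left else overall
            r_lab = Counter(right).most_common(1)[0][0] if right else overall
            errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    _, j, t, l_lab, r_lab = best
    return lambda row: l_lab if row[j] <= t else r_lab


def bagging(X, y, n_models=25, seed=0):
    """Bootstrap aggregating: fit one stump per bootstrap resample, predict by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        votes = Counter(model(row) for model in models)
        return votes.most_common(1)[0][0]

    return predict
```

    Trained on x = 1 to 8 with label 0 for x <= 4 and 1 otherwise, the returned `predict` labels `[0]` as 0 and `[9]` as 1. AdaBoost differs mainly in that it typically reweights training instances between rounds instead of drawing independent resamples.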

  8. Developing a case mix classification for child and adolescent mental health services: the influence of presenting problems, complexity factors and service providers on number of appointments.

    PubMed

    Martin, Peter; Davies, Roger; Macdougall, Amy; Ritchie, Benjamin; Vostanis, Panos; Whale, Andy; Wolpert, Miranda

    2017-09-01

    Case-mix classification is a focus of international attention in considering how best to manage and fund services, by providing a basis for fairer comparison of resource utilization. Yet there is little evidence of the best ways to establish case mix for child and adolescent mental health services (CAMHS). To develop a case mix classification for CAMHS that is clinically meaningful and predictive of number of appointments attended and to investigate the influence of presenting problems, context and complexity factors and provider variation. We analysed 4573 completed episodes of outpatient care from 11 English CAMHS. Cluster analysis, regression trees and a conceptual classification based on clinical best practice guidelines were compared regarding their ability to predict number of appointments, using mixed effects negative binomial regression. The conceptual classification is clinically meaningful and did as well as data-driven classifications in accounting for number of appointments. There was little evidence for effects of complexity or context factors, with the possible exception of school attendance problems. Substantial variation in resource provision between providers was not explained well by case mix. The conceptually-derived classification merits further testing and development in the context of collaborative decision making.
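
    A regression tree for a count such as number of appointments grows by repeatedly choosing the split that most reduces the within-node sum of squared errors (SSE). The minimal sketch below shows that single step for one numeric predictor; the function names are hypothetical, and the authors' software, predictors and stopping rules are not given in the abstract.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_regression_split(x, y):
    """Threshold t on x minimizing SSE(y | x <= t) + SSE(y | x > t); t is None if no split helps."""
    best = (None, sse(y))
    for t in sorted(set(x))[:-1]:  # splitting at max(x) would leave the right node empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (t, total)
    return best
```

    For example, `best_regression_split([1, 2, 3, 4], [2, 2, 10, 10])` returns `(2, 0.0)`: splitting at x <= 2 separates the two groups of counts perfectly.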

  9. Classification and regression tree (CART) analysis of endometrial carcinoma: Seeing the forest for the trees.

    PubMed

    Barlin, Joyce N; Zhou, Qin; St Clair, Caryn M; Iasonos, Alexia; Soslow, Robert A; Alektiar, Kaled M; Hensley, Martee L; Leitao, Mario M; Barakat, Richard R; Abu-Rustum, Nadeem R

    2013-09-01

    The objectives of the study are to evaluate which clinicopathologic factors influenced overall survival (OS) in endometrial carcinoma and to determine if the surgical effort to assess para-aortic (PA) lymph nodes (LNs) at initial staging surgery impacts OS. All patients diagnosed with endometrial cancer from 1/1993-12/2011 who had LNs excised were included. PALN assessment was defined by the identification of one or more PALNs on final pathology. A multivariate analysis was performed to assess the effect of PALNs on OS. A form of recursive partitioning called classification and regression tree (CART) analysis was implemented. Variables included: age, stage, tumor subtype, grade, myometrial invasion, total LNs removed, evaluation of PALNs, and adjuvant chemotherapy. The cohort included 1920 patients, with a median age of 62 years. The median number of LNs removed was 16 (range, 1-99). The removal of PALNs was not associated with OS (P=0.450). In the CART hierarchy, stage I vs. stages II-IV and grades 1-2 vs. grade 3 emerged as predictors of OS. When the tree was allowed to grow further, additional branching was based on age and myometrial invasion. Total number of LNs removed and assessment of PALNs as defined in this study were not predictive of OS. This innovative CART analysis emphasized the importance of proper stage assignment and a binary grading system as determinants of OS. Notably, the total number of LNs removed and specific evaluation of PALNs as defined in this study were not important predictors of OS. Copyright © 2013 Elsevier Inc. All rights reserved.
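
    CART commonly chooses classification splits by Gini impurity. The sketch below assumes a binary outcome and a single ordinal predictor such as stage coded 1 to 4 (a hypothetical coding, not the study's data) and finds the threshold that minimizes the size-weighted Gini impurity of the two child nodes. It illustrates one split, not the full grow-and-prune procedure.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels (0 for an empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(x, y):
    """Return (threshold, weighted Gini) for the split x <= threshold; threshold is None if no split helps."""
    n = len(y)
    best = (None, gini(y))
    for t in sorted(set(x))[:-1]:  # splitting at max(x) would leave the right node empty
        left = [lab for xi, lab in zip(x, y) if xi <= t]
        right = [lab for xi, lab in zip(x, y) if xi > t]
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        if weighted < best[1]:
            best = (t, weighted)
    return best
```

    With stages `[1, 1, 1, 2, 3, 4]` and outcomes `[0, 0, 0, 1, 1, 1]`, `best_split` returns `(1, 0.0)`, i.e. a stage I vs. stages II-IV split that leaves both child nodes pure.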

  10. Comparative study of classification algorithms for damage classification in smart composite laminates

    NASA Astrophysics Data System (ADS)

    Khan, Asif; Ryoo, Chang-Kyung; Kim, Heung Soo

    2017-04-01

    This paper presents a comparative study of different classification algorithms for the classification of various types of inter-ply delaminations in smart composite laminates. Improved layerwise theory is used to model delamination at different interfaces along the thickness and longitudinal directions of the smart composite laminate. The input-output data obtained through surface bonded piezoelectric sensor and actuator is analyzed by the system identification algorithm to get the system parameters. The identified parameters for the healthy and delaminated structure are supplied as input data to the classification algorithms. The classification algorithms considered in this study are ZeroR, Classification via regression, Naïve Bayes, Multilayer Perceptron, Sequential Minimal Optimization, Multiclass-Classifier, and Decision tree (J48). The open source software of Waikato Environment for Knowledge Analysis (WEKA) is used to evaluate the classification performance of the classifiers mentioned above via 75-25 holdout and leave-one-sample-out cross-validation regarding classification accuracy, precision, recall, kappa statistic and ROC Area.
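
    WEKA performs these evaluations internally. Purely to make the two protocols concrete, the stdlib sketch below produces a shuffled 75-25 holdout split and leave-one-out folds, and computes leave-one-out accuracy for any `fit(X, y)` function that returns a predictor. All function names are hypothetical and the shuffle seed is an arbitrary assumption.

```python
import random


def holdout_split(n, train_frac=0.75, seed=0):
    """Shuffle indices 0..n-1 and cut them into train/test parts (75-25 by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]


def leave_one_out(n):
    """Yield (train_indices, [test_index]) once per sample."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def loo_accuracy(X, y, fit):
    """Fraction of samples predicted correctly when each one is held out in turn.

    `fit(X_train, y_train)` must return a function mapping one row to a label.
    """
    correct = 0
    for train, (i,) in leave_one_out(len(X)):
        model = fit([X[j] for j in train], [y[j] for j in train])
        correct += model(X[i]) == y[i]
    return correct / len(X)
```

    Leave-one-out refits the model n times, so it is affordable for small data sets such as sensor-derived parameter tables, whereas a single holdout split is cheaper but noisier.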

  11. Identification of immune correlates of protection in Shigella infection by application of machine learning.

    PubMed

    Arevalillo, Jorge M; Sztein, Marcelo B; Kotloff, Karen L; Levine, Myron M; Simon, Jakub K

    2017-10-01

    Immunologic correlates of protection are important in vaccine development because they give insight into mechanisms of protection, assist in the identification of promising vaccine candidates, and serve as endpoints in bridging clinical vaccine studies. Our goal is the development of a methodology to identify immunologic correlates of protection using the Shigella challenge as a model. The proposed methodology utilizes the Random Forests (RF) machine learning algorithm as well as Classification and Regression Trees (CART) to detect immune markers that predict protection, identify interactions between variables, and define optimal cutoffs. Logistic regression modeling is applied to estimate the probability of protection and the confidence interval (CI) for such a probability is computed by bootstrapping the logistic regression models. The results demonstrate that the combination of Classification and Regression Trees and Random Forests complements the standard logistic regression and uncovers subtle immune interactions. Specific levels of immunoglobulin IgG antibody in blood on the day of challenge predicted protection in 75% (95% CI 67-86). Of those subjects that did not have blood IgG at or above a defined threshold, 100% were protected if they had IgA antibody secreting cells above a defined threshold. Comparison with the results obtained by applying only logistic regression modeling with standard Akaike Information Criterion for model selection shows the usefulness of the proposed method. Given the complexity of the immune system, the use of machine learning methods may enhance traditional statistical approaches. When applied together, they offer a novel way to quantify important immune correlates of protection that may help the development of vaccines. Copyright © 2017 Elsevier Inc. All rights reserved.
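
    The authors bootstrap logistic regression models to obtain the confidence interval. The sketch below substitutes a simpler statistic, the observed protection proportion within a subgroup, to show the percentile-bootstrap idea; it is not the authors' procedure, and the function name, replicate count and seed are assumptions.

```python
import random


def bootstrap_proportion_ci(outcomes, n_boot=2000, alpha=0.05, seed=0):
    """Observed proportion of 1s plus a percentile-bootstrap (1 - alpha) confidence interval."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Each replicate resamples n outcomes with replacement and records the proportion of 1s.
    stats = sorted(sum(rng.choice(outcomes) for _ in range(n)) / n for _ in range(n_boot))
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return sum(outcomes) / n, lower, upper
```

    Applied to 15 protected and 5 unprotected subjects, it returns the point estimate 0.75 with an interval that brackets it; replacing the proportion with a refitted model's predicted probability would move the sketch closer to the bootstrapped logistic regression described above.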

  12. Can SLE classification rules be effectively applied to diagnose unclear SLE cases?

    PubMed Central

    Mesa, Annia; Fernandez, Mitch; Wu, Wensong; Narasimhan, Giri; Greidinger, Eric L.; Mills, DeEtta K.

    2016-01-01

    Summary Objective: Develop novel classification criteria to distinguish between unclear SLE and MCTD cases. Methods: A total of 205 variables from 111 SLE and 55 MCTD patients were evaluated to uncover unique molecular and clinical markers for each disease. Binomial logistic regressions (BLR) were performed on currently used SLE and MCTD classification criteria sets to obtain six reduced models with power to discriminate between unclear SLE and MCTD patients, which were confirmed by Receiver Operating Characteristic (ROC) curve analysis. Decision trees were employed to delineate novel classification rules to discriminate between unclear SLE and MCTD patients. Results: SLE and MCTD patients exhibited contrasting molecular markers and clinical manifestations. Furthermore, the reduced models highlighted that SLE patients exhibit a higher prevalence of skin rashes and renal disease, while MCTD cases show a predominance of myositis and muscle weakness. Additionally, decision tree analyses revealed a novel classification rule tailored to differentiate unclear SLE and MCTD patients (Lu-vs-M) with an overall accuracy of 88%. Conclusions: Validation of our proposed classification rule (Lu-vs-M) includes novel contrasting characteristics (calcinosis, elevated CPK and anti-IgM reactivity for U1-70K, U1A and U1C) between SLE and MCTD patients and showed a 33% improvement in distinguishing these disorders when compared with currently used classification criteria sets. Pending additional validation, our novel classification rule is a promising method to distinguish between patients with unclear SLE and MCTD diagnoses. PMID:27353506

  13. Spectral analysis of white ash response to emerald ash borer infestations

    NASA Astrophysics Data System (ADS)

    Calandra, Laura

    The emerald ash borer (EAB) (Agrilus planipennis Fairmaire) is an invasive insect that has killed over 50 million ash trees in the US. The goal of this research was to establish a method to identify ash trees infested with EAB using remote sensing techniques at the leaf-level and tree crown level. First, a field-based study at the leaf-level used the range of spectral bands from the WorldView-2 sensor to determine if there was a significant difference between EAB-infested white ash (Fraxinus americana) and healthy leaves. Binary logistic regression models were developed using individual and combinations of wavelengths; the most successful model included 545 and 950 nm bands. The second half of this research employed imagery to identify healthy and EAB-infested trees, comparing pixel- and object-based methods by applying an unsupervised classification approach and a tree crown delineation algorithm, respectively. The pixel-based models attained the highest overall accuracies.

  14. Differential Diagnosis of Erythmato-Squamous Diseases Using Classification and Regression Tree

    PubMed Central

    Maghooli, Keivan; Langarizadeh, Mostafa; Shahmoradi, Leila; Habibi-koolaee, Mahdi; Jebraeily, Mohamad; Bouraghi, Hamid

    2016-01-01

    Introduction: Differential diagnosis of Erythmato-Squamous Diseases (ESD) is a major challenge in the field of dermatology. ESDs are divided into six classes. Data mining is the process of detecting hidden patterns in data; in the case of ESD, data mining can help predict the disease, and different algorithms have been developed for this purpose. Objective: We aimed to use the Classification and Regression Tree (CART) to predict the differential diagnosis of ESD. Methods: We used the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology. For this purpose, the dermatology data set was obtained from the UCI Machine Learning Repository. The Clementine 12.0 software from IBM was used for modelling. To evaluate the model, we calculated its accuracy, sensitivity and specificity. Results: The proposed model had an accuracy of 94.84% (standard deviation: 24.42) in correctly predicting ESD. Conclusions: The results indicate that this classifier could be useful. However, combining machine learning methods is strongly recommended, as it may be more useful for predicting ESD. PMID:28077889
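
    Because the dermatology data have six classes, accuracy, sensitivity and specificity are usually computed per class in a one-vs-rest fashion. The abstract does not say how the authors aggregated them, so the sketch below simply computes the three measures for one chosen positive class. The function name and the example labels are illustrative assumptions; note that one-vs-rest accuracy is not the same as overall multiclass accuracy.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity and specificity treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else float("nan")  # undefined when a denominator is zero

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }
```

    With true labels `psoriasis, psoriasis, lichen planus, lichen planus` and predictions `psoriasis, lichen planus, lichen planus, lichen planus`, taking psoriasis as positive gives accuracy 0.75, sensitivity 0.5 and specificity 1.0.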

  15. Fault detection and diagnosis of induction motors using motor current signature analysis and a hybrid FMM-CART model.

    PubMed

    Seera, Manjeevan; Lim, Chee Peng; Ishak, Dahaman; Singh, Harapajan

    2012-01-01

    In this paper, a novel approach to detect and classify comprehensive fault conditions of induction motors using a hybrid fuzzy min-max (FMM) neural network and classification and regression tree (CART) is proposed. The hybrid model, known as FMM-CART, exploits the advantages of both FMM and CART for undertaking data classification and rule extraction problems. A series of real experiments is conducted, whereby the motor current signature analysis method is applied to form a database comprising stator current signatures under different motor conditions. The signal harmonics from the power spectral density are extracted as discriminative input features for fault detection and classification with FMM-CART. A comprehensive list of induction motor fault conditions, viz., broken rotor bars, unbalanced voltages, stator winding faults, and eccentricity problems, has been successfully classified using FMM-CART with good accuracy rates. The results are comparable, if not better, than those reported in the literature. Useful explanatory rules in the form of a decision tree are also elicited from FMM-CART to analyze and understand different fault conditions of induction motors.

  16. Seasonal trends in separability of leaf reflectance spectra for Ailanthus altissima and four other tree species

    NASA Astrophysics Data System (ADS)

    Burkholder, Aaron

    This project investigated the spectral separability of the invasive species Ailanthus altissima, commonly called tree of heaven, and four other native species. Leaves were collected from Ailanthus and four native tree species from May 13 through August 24, 2008, and spectral reflectance factor measurements were gathered for each tree using an ASD (Boulder, Colorado) FieldSpec Pro full-range spectroradiometer. The original data covered the range from 350-2500 nm, with one reflectance measurement collected per one nm wavelength. To reduce dimensionality, the measurements were resampled to the actual resolution of the spectrometer's sensors, and regions of atmospheric absorption were removed. Continuum removal was performed on the reflectance data, resulting in a second dataset. For both the reflectance and continuum removed datasets, least angle regression (LARS) and random forest classification were used to identify a single set of optimal wavelengths across all sampled dates, a set of optimal wavelengths for each date, and the dates for which Ailanthus is most separable from other species. It was found that classification accuracy varies both with dates and bands used. Contrary to expectations that early spring would provide the best separability, the lowest classification error was observed on July 22 for the reflectance data, and on May 13, July 11 and August 1 for the continuum removed data. This suggests that July and August are also potentially good months for species differentiation. Applying continuum removal in many cases reduced classification error, although not consistently. Band selection seems to be more important for reflectance data in that it results in greater improvement in classification accuracy, and LARS appears to be an effective band selection tool. 
The optimal spectral bands were selected from across the spectrum, often with bands from the blue (401-431 nm), NIR (1115 nm) and SWIR (1985-1995 nm), suggesting that hyperspectral sensors with broad wavelength sensitivity are important for mapping and identification of Ailanthus.
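
    Continuum removal is conventionally done by fitting an upper convex hull over the reflectance spectrum and dividing each reflectance value by the hull value at that wavelength, so bands on the hull map to 1 and absorption features fall below 1. The abstract does not describe the authors' implementation; the following is a minimal sketch of that standard hull-quotient approach, assuming strictly increasing wavelengths and at least two bands.

```python
def upper_hull(points):
    """Upper convex hull of (x, y) points already sorted by x (monotone-chain method)."""
    hull = []
    for p in points:
        # Drop the last hull point while it lies on or below the chord from hull[-2] to p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def continuum_removed(wavelengths, reflectance):
    """Divide each reflectance value by the upper-hull continuum interpolated at its wavelength."""
    points = list(zip(wavelengths, reflectance))
    hull = upper_hull(points)
    result = []
    k = 0  # index of the current hull segment
    for x, y in points:
        while hull[k + 1][0] < x:
            k += 1
        (x0, y0), (x1, y1) = hull[k], hull[k + 1]
        continuum = y0 + (y1 - y0) * (x - x0) / (x1 - x0)  # linear interpolation on the hull
        result.append(y / continuum)
    return result
```

    For a toy spectrum with reflectances 0.2, 0.3, 0.1, 0.3, 0.4 at 400 to 800 nm, the bands at 400, 500 and 800 nm lie on the hull and map to 1.0, while the 600 nm absorption band maps to about 0.3.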

  17. Geospatial analysis of lake and landscape interactions within the Toolik Lake region, North Slope of Alaska

    NASA Astrophysics Data System (ADS)

    Pathak, Prasad A.

    The Arctic region of Alaska is experiencing severe impacts of climate change. Arctic lake ecosystems are bound to undergo alterations in their trophic structure and other chemical properties. However, the landscape factors controlling lake influxes have not been studied to date. This research examined current lake-landscape interactions using remote sensing and GIS technology. Statistical modeling was carried out using regression and CART methods. Remote sensing data were applied to derive the required landscape indices. Remote sensing in Arctic Alaska faces many challenges, including persistent cloud cover, low sun angles and a limited snow-free period. Unlike managed forest stands, tundra vegetation types are interspersed and intricate to classify. Therefore, historical studies have achieved limited thematic accuracy. However, examining vegetation communities at the watershed level and implementing an expert classification system achieved accuracies of up to 90%. The research has highlighted the probable role of interactions between vegetation root zones and nutrient availability within the active zone, as well as the importance of permafrost thawing. Multiple regression analyses and classification trees were developed to understand the relationships between landscape factors and various chemical parameters as well as chlorophyll readings. Spatial properties of shrub and riparian complexes, such as the complexity of individual patches at the watershed level and within proximity of water channels, influenced chlorophyll production in lakes. Till age had a significant impact on total nitrogen content. Moreover, relatively young tills exhibited a significantly positive correlation with the concentrations of various ions and the conductivity of lakes. Similarly, the density of heath complex patches was found to be important with respect to total phosphorus content in lakes.
    All the regression models developed in this study were significant at the 95% confidence level. However, the classification trees could not achieve high predictive accuracy because of the limited number of lakes sampled. Keywords: Landscape factors, Lake primary productivity, Arctic, Climate change, Regression, CART

  18. Mapping and detecting bark beetle-caused tree mortality in the western United States

    NASA Astrophysics Data System (ADS)

    Meddens, Arjan J. H.

    Recently, insect outbreaks across North America have dramatically increased, and the forest area affected by bark beetles is similar to that affected by fire. Remote sensing offers the potential to detect insect outbreaks with high accuracy. Chapter 1 involved detection of insect-caused tree mortality at the tree level for a 90-km2 area in northcentral Colorado. Classes of interest included green trees, multiple stages of post-insect-attack tree mortality including dead trees with red needles ("red-attack") and dead trees without needles ("gray-attack"), and non-forest. The results illustrated that classification of an image with a spatial resolution similar to the area of a tree crown outperformed that from finer and coarser resolution imagery for mapping tree mortality and non-forest classes. I also demonstrated that multispectral imagery could be used to separate multiple post-outbreak attack stages (i.e., red-attack and gray-attack) from other classes in the image. In Chapter 2, I compared and improved methods for detecting bark beetle-caused tree mortality using medium-resolution satellite data. I found that overall classification accuracy was similar between single-date and multi-date classification methods. I developed regression models to predict percent red attack within a 30-m grid cell, and these models explained >75% of the variance using three Landsat spectral explanatory variables. Results of the final product showed that approximately 24% of the forest within the Landsat scene was comprised of tree mortality caused by bark beetles. In Chapter 3, I developed a gridded data set with 1-km2 resolution using aerial survey data and improved estimates of tree mortality across the western US and British Columbia. In the US, I also produced an upper estimate by forcing the mortality area to match that from high-resolution imagery in Idaho, Colorado, and New Mexico.
Cumulative mortality area from all bark beetles was 5.46 Mha in British Columbia in 2001-2010 and 0.47-5.37 Mha (lower and upper estimate) in the western conterminous US during 1997-2010. Improved methods for detection and mapping of insect outbreak areas will lead to improved assessments of the effects of these forest disturbances on the economy, carbon cycle (and feedback to climate change), fuel loads, hydrology and forest ecology.

  19. A new approach to enhance the performance of decision tree for classifying gene expression data.

    PubMed

    Hassan, Md; Kotagiri, Ramamohanarao

    2013-12-20

    Gene expression data classification is a challenging task due to the high dimensionality and very small number of samples. The decision tree is one of the popular machine learning approaches to such classification problems. However, existing decision tree algorithms use a single gene feature at each node to split the data into its child nodes and hence may perform poorly, especially when classifying gene expression datasets. We enhance the classification performance of traditional decision tree classifiers with a new decision tree algorithm in which each node of the tree consists of more than one gene. Our method selects suitable genes that are combined using a linear function to form a derived composite feature. To determine the structure of the tree, we use the area under the Receiver Operating Characteristic curve (AUC). Experimental analysis demonstrates higher classification accuracy for the new decision tree than for other existing decision trees in the literature. We experimentally compare the effect of our scheme against other well-known decision tree techniques. Experiments show that our algorithm can substantially boost the classification performance of the decision tree.
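
    To show why a composite feature scored by AUC can help, the sketch below computes AUC as the probability that a random positive sample outscores a random negative one (ties count one half) and applies it to a linear combination of genes. How the authors select genes and weights is not stated in the abstract, so `composite_feature` and its fixed weights are hypothetical.

```python
def auc(scores, labels):
    """Probability that a random positive (label 1) outscores a random negative (label 0); ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def composite_feature(X, weights):
    """Hypothetical linear combination of selected genes: one composite score per sample."""
    return [sum(w * x for w, x in zip(weights, row)) for row in X]
```

    On the toy expression matrix `[[1, 0], [0, 1], [2, 1], [1, 2]]` with labels `[0, 0, 1, 1]`, each gene alone reaches an AUC of 0.875, whereas their unweighted sum separates the classes perfectly (AUC 1.0), which is the kind of gain a multi-gene node is meant to capture.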

  20. Modeling ready biodegradability of fragrance materials.

    PubMed

    Ceriani, Lidia; Papa, Ester; Kovarich, Simona; Boethling, Robert; Gramatica, Paola

    2015-06-01

    In the present study, quantitative structure-activity relationships (QSARs) were developed for predicting ready biodegradability of approximately 200 heterogeneous fragrance materials. Two classification methods, classification and regression tree (CART) and k-nearest neighbors (kNN), were applied to perform the modeling. The models were validated with multiple external prediction sets, and the structural applicability domain was verified by the leverage approach. The best models had good sensitivity (internal ≥80%; external ≥68%), specificity (internal ≥80%; external 73%), and overall accuracy (≥75%). Results from the comparison with BIOWIN global models, based on the group contribution method, show that the specific models developed in the present study predict better than BIOWIN6, in particular for the correct classification of not readily biodegradable fragrance materials. © 2015 SETAC.
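
    For readers unfamiliar with kNN, the sketch below classifies a compound by majority vote among its k nearest training compounds in Euclidean descriptor space. The descriptors, the labels ("RB" for readily biodegradable, "NRB" for not), the distance metric and k = 3 are assumptions for illustration; the abstract does not report the authors' settings. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label `x` by majority vote among its k nearest training rows (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

    Unlike CART, kNN builds no explicit rules, which is one reason the two methods are often compared side by side: the tree offers interpretable splits, while kNN relies only on similarity to known compounds.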

  1. A new tree classification system for southern hardwoods

    Treesearch

    James S. Meadows; Daniel A. Skojac Jr.

    2008-01-01

    A new tree classification system for southern hardwoods is described. The new system is based on the Putnam tree classification system, originally developed by Putnam et al., 1960, Management and inventory of southern hardwoods, Agriculture Handbook 181, US For. Serv., Washington, DC, which consists of four tree classes: (1) preferred growing stock, (2) reserve growing...

  2. Using decision tree analysis to identify risk factors for relapse to smoking

    PubMed Central

    Piper, Megan E.; Loh, Wei-Yin; Smith, Stevens S.; Japuntich, Sandra J.; Baker, Timothy B.

    2010-01-01

    This research used classification tree analysis and logistic regression models to identify risk factors related to short- and long-term abstinence. Baseline and cessation outcome data from two smoking cessation trials, conducted from 2001 to 2002, in two Midwestern urban areas, were analyzed. There were 928 participants (53.1% women, 81.8% white) with complete data. Both analyses suggest that relapse risk is produced by interactions of risk factors and that early and late cessation outcomes reflect different vulnerability factors. The results illustrate the dynamic nature of relapse risk and suggest the importance of efficient modeling of interactions in relapse prediction. PMID:20397871

  3. Subpopulations of Older Foster Youths With Differential Risk of Diagnosis for Alcohol Abuse or Dependence*

    PubMed Central

    Keller, Thomas E.; Blakeslee, Jennifer E.; Lemon, Stephenie C.; Courtney, Mark E.

    2010-01-01

    Objective: Distinctive combinations of factors are likely to be associated with serious alcohol problems among adolescents about to emancipate from the foster care system and face the difficult transition to independent adulthood. This study identifies particular subpopulations of older foster youths that differ markedly in the probability of a lifetime diagnosis for alcohol abuse or dependence. Method: Classification and regression tree (CART) analysis was applied to a large, representative sample (N = 732) of individuals, 17 years of age or older, placed in the child welfare system for more than 1 year. CART evaluated two exploratory sets of variables for optimal splits into groups distinguished from each other on the criterion of lifetime alcohol-use disorder diagnosis. Results: Each classification tree yielded four terminal groups with different rates of lifetime alcohol-use disorder diagnosis. Notable groups in the first tree included one characterized by high levels of both delinquency and violence exposure (53% diagnosed) and another that featured lower delinquency but an independent-living placement (21% diagnosed). Notable groups in the second tree included African American adolescents (only 8% diagnosed), White adolescents not close to caregivers (40% diagnosed), and White adolescents closer to caregivers but with a history of psychological abuse (36% diagnosed). Conclusions: Analyses incorporating variables that could be comorbid with or symptomatic of alcohol problems, such as delinquency, yielded classifications potentially useful for assessment and service planning. Analyses without such variables identified other factors, such as quality of caregiving relationships and maltreatment, associated with serious alcohol problems, suggesting opportunities for prevention or intervention. PMID:20946738

  4. Ecological Factors of Being Bullied Among Adolescents: a Classification and Regression Tree Approach

    PubMed Central

    Moon, Sung Seek; Kim, Heeyoung; Seay, Kristen; Small, Eusebius; Kim, Youn Kyoung

    2015-01-01

    Being bullied is a well-recognized trauma for adolescents. Bullying can best be understood through an ecological framework, since bullying or being bullied involves risk factors at multiple contextual levels. The purpose of the study was to identify the risk and protective factors that best differentiate groups on the outcome variable of interest (being bullied) using Classification and Regression Tree (CART) analysis. The study used the Health Behavior in School-Aged Children (HBSC) data collected from a nationally representative sample of students in grades six through ten during the 2005–2006 school year. This study found that for adolescents aged 12 and younger, lower parental support is a critical risk factor associated with being bullied, and among those aged 13 to 14 with lower parental support, adolescents with higher academic pressure reported experiencing more bullying. For the older group of adolescents (aged 15 and older), school-related factors were identified as increasing the risk of being bullied. There was a critical age (15 years) for implementing victimization interventions to reduce the damage from being bullied. Service providers working with adolescents aged 14 and younger should focus more on family-oriented interventions, and those working with adolescents aged 15 and older should offer peer- or school-related interventions. PMID:27617043

  5. City housing atmospheric pollutant impact on emergency visit for asthma: A classification and regression tree approach.

    PubMed

    Mazenq, Julie; Dubus, Jean-Christophe; Gaudart, Jean; Charpin, Denis; Viudes, Gilles; Noel, Guilhem

    2017-11-01

    Particulate matter, nitrogen dioxide (NO2) and ozone are recognized as the three pollutants that most significantly affect human health. Asthma is a multifactorial disease; however, the role of the place of residence has rarely been investigated. We compared the impact of air pollution, measured near patients' homes, on emergency department (ED) visits for asthma or trauma (controls) within the Provence-Alpes-Côte-d'Azur region. Variables were selected using classification and regression trees on the asthmatic and control populations, aged 3-99 years, visiting the ED from January 1 to December 31, 2013. Then, in a nested case-control study, randomization was based on the day of the ED visit and on defined age groups. Pollution, meteorological, pollen and viral data measured that day were linked to the patient's ZIP code. A total of 794,884 visits were reported, including 6250 for asthma and 278,192 for trauma. Factors associated with an excess risk of emergency visits for asthma included short-term exposure to NO2, female gender, high viral load and a combination of low temperature and high humidity. Short-term exposures to high NO2 concentrations, as assessed close to the homes of the patients, were significantly associated with asthma-related ED visits in children and adults. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Prediction of cadmium enrichment in reclaimed coastal soils by classification and regression tree

    NASA Astrophysics Data System (ADS)

    Ru, Feng; Yin, Aijing; Jin, Jiaxin; Zhang, Xiuying; Yang, Xiaohui; Zhang, Ming; Gao, Chao

    2016-08-01

    Reclamation of coastal land is one of the most common ways to obtain land resources in China. However, it has long been acknowledged that artificial interference with coastal land has adverse effects, such as heavy metal contamination. This study aimed to develop a prediction model for cadmium enrichment levels and to assess the importance of the affecting factors in typical reclaimed land in Eastern China (DFCL: Dafeng Coastal Land). Two hundred and twenty-seven surficial soil/sediment samples were collected and analyzed to identify the enrichment levels of cadmium and the possible affecting factors in soils and sediments. The classification and regression tree (CART) model was applied to predict cadmium enrichment levels, and its predictions had an accuracy of 78.0%. The CART model extracted more information on the factors affecting the environmental behavior of cadmium than correlation analysis did. The integration of correlation analysis and the CART model showed that fertilizer application and organic carbon accumulation were the most important factors affecting soil/sediment cadmium enrichment levels, followed by particle size effects (Al2O3, TFe2O3 and SiO2), contents of Cl and S, surrounding construction areas and reclamation history.
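    The abstract does not describe how the CART model chooses its splits. As a minimal sketch, assuming the Gini impurity criterion that CART commonly uses for classification, the core step (finding the best threshold on one numeric predictor) can be written in plain Python. The predictor values and the "low"/"high" enrichment labels below are hypothetical, not data from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold t for the rule "value <= t" that minimizes weighted Gini impurity.

    Returns (threshold, impurity); threshold is None if no split improves on the parent node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # impurity of the unsplit parent node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical predictor (e.g. organic carbon) and enrichment classes.
carbon = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
level = ["low", "low", "low", "high", "high", "high"]
print(best_split(carbon, level))  # (6.5, 0.0)
```

A full tree repeats this search over every predictor and recurses into the two child nodes; pruning is not shown.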

  7. Habitat selection models for Pacific sand lance (Ammodytes hexapterus) in Prince William Sound, Alaska

    USGS Publications Warehouse

    Ostrand, William D.; Gotthardt, Tracey A.; Howlin, Shay; Robards, Martin D.

    2005-01-01

    We modeled habitat selection by Pacific sand lance (Ammodytes hexapterus) by examining their distribution in relation to water depth, distance to shore, bottom slope, bottom type, distance from sand bottom, and shoreline type. Through both logistic regression and classification tree models, we compared the characteristics of 29 known sand lance locations to 58 randomly selected sites. The best models indicated a strong selection of shallow water by sand lance, with weaker association between sand lance distribution and beach shorelines, sand bottoms, distance to shore, bottom slope, and distance to the nearest sand bottom. We applied an information-theoretic approach to the interpretation of the logistic regression analysis and determined importance values of 0.99, 0.54, 0.52, 0.44, 0.39, and 0.25 for depth, beach shorelines, sand bottom, distance to shore, gradual bottom slope, and distance to the nearest sand bottom, respectively. The classification tree model indicated that sand lance selected shallow-water habitats and remained near sand bottoms when located in habitats with depths between 40 and 60 m. All sand lance locations were at depths <60 m and 93% occurred at depths <40 m. Probable reasons for the modeled relationships between the distribution of sand lance and the independent variables are discussed.
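    The importance values above come from an information-theoretic analysis of the logistic regression models. One standard way to obtain such values, sketched here as an assumption rather than the authors' exact procedure, is to convert each candidate model's AIC into an Akaike weight, w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2) with delta_i = AIC_i - min(AIC), and sum the weights of all models containing a given variable. The model set and AIC values below are hypothetical.

```python
import math


def akaike_weights(aics):
    """Akaike weights: exp(-delta/2) normalized to sum to 1, with delta = AIC - min(AIC)."""
    best = min(aics)
    relative = [math.exp(-(aic - best) / 2) for aic in aics]
    total = sum(relative)
    return [r / total for r in relative]


def variable_importance(models, aics):
    """Sum the Akaike weights of all candidate models that include each variable.

    models: list of sets of variable names, in the same order as aics.
    """
    scores = {}
    for variables, weight in zip(models, akaike_weights(aics)):
        for name in variables:
            scores[name] = scores.get(name, 0.0) + weight
    return scores


# Hypothetical candidate models and AIC values.
models = [{"depth"}, {"depth", "sand_bottom"}, {"sand_bottom"}]
aics = [100.0, 100.0, 110.0]
print(variable_importance(models, aics))
```

A variable that appears in every well-supported model accumulates an importance close to 1, as depth does in the study.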

  8. The decision tree approach to classification

    NASA Technical Reports Server (NTRS)

    Wu, C.; Landgrebe, D. A.; Swain, P. H.

    1975-01-01

    A class of multistage decision tree classifiers is proposed and studied for the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential to improve both classification accuracy and computational efficiency. Dimensionality in pattern recognition is discussed, and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.

  9. Determination of the chemical parameters and manufacturer of divins from their broadband transmission spectra

    NASA Astrophysics Data System (ADS)

    Khodasevich, M. A.; Sinitsyn, G. V.; Skorbanova, E. A.; Rogovaya, M. V.; Kambur, E. I.; Aseev, V. A.

    2016-06-01

    Analysis of multiparametric data on the transmission spectra of 24 divins (Moldovan cognacs) in the 190-2600 nm range allows outliers to be identified and removed from the sample before further analysis. Principal component analysis and a classification tree with a single-rank predictor, constructed in the 2D space of principal components, allow divin manufacturers to be classified. It is shown that the accuracy of the syringaldehyde, ethyl acetate, vanillin, and gallic acid concentrations in divins calculated by regression to latent structures depends on the sample volume and is 3, 6, 16, and 20%, respectively, which is acceptable for the application.

  10. Weighing risk factors associated with bee colony collapse disorder by classification and regression tree analysis.

    PubMed

    VanEngelsdorp, Dennis; Speybroeck, Niko; Evans, Jay D; Nguyen, Bach Kim; Mullin, Chris; Frazier, Maryann; Frazier, Jim; Cox-Foster, Diana; Chen, Yanping; Tarpy, David R; Haubruge, Eric; Pettis, Jeffrey S; Saegerman, Claude

    2010-10-01

    Colony collapse disorder (CCD), a syndrome whose defining trait is the rapid loss of adult worker honey bees, Apis mellifera L., is thought to be responsible for a minority of the large overwintering losses experienced by U.S. beekeepers since the winter of 2006-2007. Using the same data set developed to perform a monofactorial analysis (PloS ONE 4: e6481, 2009), we conducted a classification and regression tree (CART) analysis in an attempt to better understand the relative importance of, and interrelations among, different risk variables in explaining CCD. Fifty-five explanatory variables were used to construct two CART models: one with and one without a cost for misclassifying a CCD-diagnosed colony as a non-CCD colony. The resulting tree model that allowed for this misclassification cost had a sensitivity and specificity of 85 and 74%, respectively. Although factors measuring colony stress (e.g., adult bee physiological measures, such as fluctuating asymmetry or mass of head) were important discriminating variables, six of the 19 variables with the greatest discriminatory value were pesticide levels in different hive matrices. Notably, coumaphos levels in brood (a miticide commonly used by beekeepers) had the highest discriminatory value and were highest in control (healthy) colonies. Our CART analysis provides evidence that CCD is probably the result of several factors acting in concert, making afflicted colonies more susceptible to disease. This analysis highlights several areas that warrant further attention, including the effect of sublethal pesticide exposure on pathogen prevalence and the role of variability in bee tolerance to pesticides on colony survivorship.
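    A cost for misclassifying CCD colonies as non-CCD can enter a classification tree in several ways. One simple mechanism, sketched below as an assumption, is to label each leaf with the class that minimizes expected misclassification cost instead of the majority class. The leaf counts and the cost of 5 are hypothetical; the abstract does not report the cost actually used.

```python
def leaf_label(counts, cost):
    """Return the class whose prediction has the lowest expected misclassification cost.

    counts: dict mapping true class -> number of training cases in the leaf
    cost:   dict mapping (true_class, predicted_class) -> cost; missing pairs cost 0
    """
    total = sum(counts.values())

    def expected_cost(predicted):
        return sum(n / total * cost.get((true, predicted), 0.0) for true, n in counts.items())

    return min(counts, key=expected_cost)


leaf = {"CCD": 3, "control": 7}  # hypothetical leaf contents
equal_costs = {("CCD", "control"): 1.0, ("control", "CCD"): 1.0}
ccd_costly = {("CCD", "control"): 5.0, ("control", "CCD"): 1.0}  # missing a CCD colony costs 5x
print(leaf_label(leaf, equal_costs))  # control
print(leaf_label(leaf, ccd_costly))   # CCD
```

Raising the cost of missed CCD colonies shifts borderline leaves toward the CCD label, trading specificity for sensitivity.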

  11. Support-vector-machine tree-based domain knowledge learning toward automated sports video classification

    NASA Astrophysics Data System (ADS)

    Xiao, Guoqiang; Jiang, Yang; Song, Gang; Jiang, Jianmin

    2010-12-01

    We propose a support-vector-machine (SVM) tree that hierarchically learns from domain knowledge represented by low-level features, toward automatic classification of sports videos. The proposed SVM tree adopts a binary tree structure to exploit the binary nature of SVM classification: each internal node is a single SVM learning unit, and each external node represents a classified output type. Such an SVM tree offers several advantages: 1. low computing cost; 2. integrated learning and classification while preserving each individual SVM's learning strength; and 3. flexibility in both structure and learning modules, where different numbers of nodes and features can be added to address specific learning requirements, and various learning models, such as neural networks, AdaBoost, hidden Markov models, and dynamic Bayesian networks, can be added as individual nodes. Experiments indicate that the proposed SVM tree performs well in sports video classification.
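    The SVM tree described above routes a sample through binary decisions until it reaches an output class. The structure can be sketched with any binary decision function at the internal nodes; here, simple threshold lambdas on two hypothetical low-level features stand in for trained SVMs, and the genre labels are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union


@dataclass
class Leaf:
    label: str  # output class (an external node)


@dataclass
class Node:
    # Internal node: any binary decision function. In the paper's design this
    # role is played by a trained SVM; here it is a plain callable.
    decide: Callable[[Sequence[float]], bool]
    if_true: "Union[Leaf, Node]"
    if_false: "Union[Leaf, Node]"


def classify(tree, features):
    """Walk from the root, letting each binary learner pick a branch, until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.if_true if tree.decide(features) else tree.if_false
    return tree.label


# Hypothetical two-feature example; thresholds stand in for trained SVMs.
tree = Node(
    decide=lambda x: x[0] > 0.5,
    if_true=Node(decide=lambda x: x[1] > 0.3, if_true=Leaf("soccer"), if_false=Leaf("golf")),
    if_false=Leaf("basketball"),
)
print(classify(tree, [0.8, 0.6]))  # soccer
```

Because a node only needs a callable returning a boolean, other learners (as the abstract notes) could be swapped in at individual nodes without changing the traversal.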

  12. Identification of patients with gout: elaboration of a questionnaire for epidemiological studies.

    PubMed

    Richette, P; Clerson, P; Bouée, S; Chalès, G; Doherty, M; Flipo, R M; Lambert, C; Lioté, F; Poiraud, T; Schaeverbeke, T; Bardin, T

    2015-09-01

    In France, the prevalence of gout is currently unknown. We aimed to design a questionnaire to detect gout that would be suitable for use in a telephone survey by non-physicians, and to assess its performance. We designed a 62-item questionnaire covering comorbidities, clinical features and treatment of gout. In a case-control study, we enrolled patients with a history of arthritis who had undergone arthrocentesis for synovial fluid analysis and crystal detection. Cases were patients with crystal-proven gout, and controls were patients who had arthritis and effusion with no monosodium urate crystals in synovial fluid. The questionnaire was administered by phone to cases and controls by non-physicians who were unaware of the patient's diagnosis. Logistic regression analysis and classification and regression trees were used to select items discriminating cases from controls. We interviewed 246 patients (102 cases and 142 controls). Two logistic regression models (sensitivity 88.0% and 87.5%; specificity 93.0% and 89.8%, respectively) and one classification and regression tree model (sensitivity 81.4%, specificity 93.7%) revealed 11 informative items that allowed 90.0%, 88.8% and 88.5% of patients, respectively, to be classified correctly. We developed an 11-item questionnaire to detect gout that is fast and suitable for use in a telephone survey by non-physicians. The questionnaire demonstrated good properties for discriminating patients with and without gout. It will be administered to a large sample of the general population to estimate the prevalence of gout in France. Published by the BMJ Publishing Group Limited.
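    Sensitivity, specificity, and overall correct classification, as reported for the questionnaire models, follow directly from the confusion matrix. A minimal sketch, treating "gout" as the positive class; the four-patient example is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive="gout"):
    """Sensitivity, specificity and accuracy, treating `positive` as the target class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of controls correctly ruled out
        "accuracy": (tp + tn) / len(y_true),
    }


truth = ["gout", "gout", "gout", "control"]  # hypothetical
pred = ["gout", "gout", "control", "control"]
print(binary_metrics(truth, pred))
```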

  13. Guide to the measurement of tree characteristics important to the quality classification for young hardwood trees

    Treesearch

    David L. Sonderman

    1979-01-01

    A procedure is shown for measuring external tree characteristics that are important in determining the current and future quality of young hardwood trees. This guide supplements a previous study that describes the quality classification system for young hardwood trees.

  14. Mathematical models application for mapping soils spatial distribution on the example of the farm from the North of Udmurt Republic of Russia

    NASA Astrophysics Data System (ADS)

    Dokuchaev, P. M.; Meshalkina, J. L.; Yaroslavtsev, A. M.

    2018-01-01

    A comparative analysis of geospatial soil modeling using multinomial logistic regression, decision trees, random forest, regression trees and support vector machine algorithms was conducted. Visual interpretation of the digital maps obtained and their comparison with the existing map, together with a quantitative assessment of the overall accuracy of individual soil group detection and of model kappa, showed that multinomial logistic regression, support vector machine, and random forest models with spatial prediction of the conditional soil group distribution can be reliably used to map the study area. The most accurate detection was obtained for lightly and moderately eroded sod-podzolic soils (Phaeozems Albic). In second place, by mean overall prediction accuracy, were non-eroded and warp sod-podzolic soils, as well as sod-gley soils (Umbrisols Gleyic) and alluvial soils (Fluvisols Dystric, Umbric). Heavily eroded sod-podzolic soils and gray forest soils (Phaeozems Albic) were detected least accurately by the automatic classification methods.
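    The kappa statistic used to assess the soil maps corrects observed agreement for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch of Cohen's kappa between a reference map and a predicted map, with hypothetical soil-group labels for four map cells:

```python
from collections import Counter


def cohens_kappa(reference, predicted):
    """Cohen's kappa: agreement beyond chance between two labelings of the same items."""
    n = len(reference)
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    # chance agreement: product of the marginal proportions, summed over classes
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    return (observed - expected) / (1 - expected)  # undefined if expected == 1


reference = ["sod-podzolic", "sod-podzolic", "alluvial", "alluvial"]  # hypothetical cells
predicted = ["sod-podzolic", "alluvial", "alluvial", "alluvial"]
print(cohens_kappa(reference, predicted))  # 0.5
```

Here 75% of cells agree, but half of that agreement would be expected by chance, so kappa is 0.5.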

  15. Automated Decision Tree Classification of Corneal Shape

    PubMed Central

    Twa, Michael D.; Parthasarathy, Srinivasan; Roberts, Cynthia; Mahmoud, Ashraf M.; Raasch, Thomas W.; Bullimore, Mark A.

    2011-01-01

    Purpose The volume and complexity of data produced during videokeratography examinations present a challenge of interpretation. As a consequence, results are often analyzed qualitatively by subjective pattern recognition or reduced to comparisons of summary indices. We describe the application of decision tree induction, an automated machine learning classification method, to discriminate between normal and keratoconic corneal shapes in an objective and quantitative way. We then compared this method with other known classification methods. Methods The corneal surface was modeled with a seventh-order Zernike polynomial for 132 normal eyes of 92 subjects and 112 eyes of 71 subjects diagnosed with keratoconus. A decision tree classifier was induced using the C4.5 algorithm, and its classification performance was compared with the modified Rabinowitz–McDonnell index, Schwiegerling’s Z3 index (Z3), Keratoconus Prediction Index (KPI), KISA%, and Cone Location and Magnitude Index, using the recommended classification thresholds for each method. We also evaluated the area under the receiver operating characteristic (ROC) curve for each classification method. Results Our decision tree classifier performed as well as or better than the other classifiers tested: accuracy was 92% and the area under the ROC curve was 0.97. Our decision tree classifier reduced the information needed to distinguish between normal and keratoconus eyes, using four of 36 Zernike polynomial coefficients. The four surface features selected as classification attributes by the decision tree method were inferior elevation, greater sagittal depth, oblique toricity, and trefoil. Conclusions Automated decision tree classification of corneal shape through Zernike polynomials is an accurate, quantitative, and interpretable classification method that can be applied to data from any instrument platform capable of raw elevation data output. This method of pattern classification is extendable to other classification problems. PMID:16357645
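    C4.5, the algorithm used to induce the corneal classifier, scores candidate splits by gain ratio: information gain divided by the split information of the partition. A minimal sketch of that score in plain Python; the labels are hypothetical, and full C4.5 adds further details (for example, handling of continuous attributes and pruning) that are omitted here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(labels, branches):
    """C4.5 gain ratio for splitting `labels` into `branches` (lists of labels per branch)."""
    n = len(labels)
    sizes = [len(b) for b in branches if b]
    remainder = sum(len(b) / n * entropy(b) for b in branches if b)
    gain = entropy(labels) - remainder
    # split information penalizes splits into many small branches
    split_info = -sum(s / n * math.log2(s / n) for s in sizes)
    return gain / split_info if split_info > 0 else 0.0


eyes = ["normal", "normal", "kc", "kc"]  # hypothetical
print(gain_ratio(eyes, [["normal", "normal"], ["kc", "kc"]]))  # 1.0
print(gain_ratio(eyes, [["normal", "kc"], ["normal", "kc"]]))  # 0.0
```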

  16. Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping.

    PubMed

    Shafizadeh-Moghadam, Hossein; Valavi, Roozbeh; Shahabi, Himan; Chapi, Kamran; Shirzadi, Ataollah

    2018-07-01

    In this research, eight individual machine learning and statistical models are implemented and compared, and, based on their results, seven ensemble models for flood susceptibility assessment are introduced. The individual models were artificial neural networks, classification and regression trees, flexible discriminant analysis, generalized linear model, generalized additive model, boosted regression trees, multivariate adaptive regression splines, and maximum entropy; the ensemble models were Ensemble Model committee averaging (EMca), Ensemble Model confidence interval Inferior (EMciInf), Ensemble Model confidence interval Superior (EMciSup), Ensemble Model to estimate the coefficient of variation (EMcv), Ensemble Model to estimate the mean (EMmean), Ensemble Model to estimate the median (EMmedian), and Ensemble Model based on weighted mean (EMwmean). The data set covered 201 flood events in the Haraz watershed (Mazandaran province, Iran) and 10,000 randomly selected non-occurrence points. Among the individual models, boosted regression trees had the highest area under the receiver operating characteristic curve (AUROC, 0.975) and the generalized linear model the lowest (0.642). The proposed EMmedian achieved the highest accuracy (0.976) of all models. Despite the outstanding performance of some models, variability among the predictions of individual models was considerable. Therefore, to reduce uncertainty and create more generalizable, more stable, and less sensitive models, ensemble forecasting approaches, in particular EMmedian, are recommended for flood susceptibility assessment. Copyright © 2018 Elsevier Ltd. All rights reserved.
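    The median, mean, weighted-mean, and coefficient-of-variation ensembles named above combine the probabilities that the individual models assign to each location. A minimal sketch for a single location, assuming the weights are validation scores such as AUROC (an assumption; the abstract does not say how EMwmean is weighted). Committee averaging and the confidence-interval variants are not shown.

```python
import statistics


def combine(predictions, weights=None):
    """Combine per-model susceptibility probabilities for one location.

    predictions: one probability per individual model
    weights:     optional per-model weights (assumed here to be validation AUROC)
    """
    mean = statistics.fmean(predictions)
    combined = {
        "median": statistics.median(predictions),
        "mean": mean,
        # coefficient of variation: disagreement among models relative to their mean
        "cv": statistics.stdev(predictions) / mean,
    }
    if weights is not None:
        combined["wmean"] = sum(w * p for w, p in zip(weights, predictions)) / sum(weights)
    return combined


# Hypothetical outputs of three individual models at one location.
print(combine([0.2, 0.4, 0.9], weights=[0.7, 0.8, 0.95]))
```

The median is insensitive to a single outlying model, which is one plausible reason a median ensemble can be more stable than the mean.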

  17. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

    PubMed Central

    2011-01-01

    Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but currently has limited value in predicting progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, such as Neural Networks, Support Vector Machines and Random Forests, can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) were compared with three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press's Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from 5-fold cross-validation were compared using Friedman's nonparametric test. Results Press's Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (Median (Me) = 0.76) and area under the ROC curve (Me = 0.90). However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73), with high area under the ROC curve (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC curve (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most of them sensitivity was around or even below a median value of 0.5. Conclusions When sensitivity, specificity and overall classification accuracy are taken into account, Random Forests and Linear Discriminant Analysis rank first among all the classifiers tested in predicting dementia from several neuropsychological tests. These methods may be used to improve the accuracy, sensitivity and specificity of dementia predictions from neuropsychological testing. PMID:21849043
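    Friedman's test compares classifiers by ranking them within each cross-validation fold. With n folds, k classifiers, and rank sums R_j, the statistic is chi2_F = 12 / (n k (k + 1)) * sum_j R_j^2 - 3 n (k + 1), compared against a chi-square distribution with k - 1 degrees of freedom. A minimal sketch without the tie correction; the fold accuracies are hypothetical. For a p-value with tie correction, `scipy.stats.friedmanchisquare` implements the full test.

```python
def average_ranks(scores):
    """Rank scores within one fold (1 = highest), giving tied scores their mean rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1  # mean of 1-based positions i..j
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square for table[fold][classifier] (no tie correction)."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


# Hypothetical accuracies of three classifiers over five folds.
folds = [
    [0.76, 0.73, 0.66],
    [0.74, 0.75, 0.64],
    [0.78, 0.70, 0.67],
    [0.75, 0.72, 0.65],
    [0.77, 0.74, 0.66],
]
print(friedman_statistic(folds))
```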

  18. Activity classification using realistic data from wearable sensors.

    PubMed

    Pärkkä, Juha; Ermes, Miikka; Korpipää, Panu; Mäntyjärvi, Jani; Peltola, Johannes; Korhonen, Ilkka

    2006-01-01

    Automatic classification of everyday activities can be used to promote health-enhancing physical activity and a healthier lifestyle. In this paper, methods used for classifying everyday activities such as walking, running, and cycling are described. The aim of the study was to find out how to recognize activities, which sensors are useful, and what kind of signal processing and classification are required. A large, realistic library of sensor data was collected. Sixteen test persons took part in the data collection, resulting in approximately 31 h of annotated, 35-channel data recorded in an everyday environment. The test persons carried a set of wearable sensors while performing several activities during a 2-h measurement session. Classification results are shown for three classifiers: a custom decision tree, an automatically generated decision tree, and an artificial neural network. Classification accuracies using leave-one-subject-out cross-validation range from 58 to 97% for the custom decision tree classifier, from 56 to 97% for the automatically generated decision tree, and from 22 to 96% for the artificial neural network. Total classification accuracy is 82% for the custom decision tree classifier, 86% for the automatically generated decision tree, and 82% for the artificial neural network.
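    Leave-one-subject-out cross-validation holds out all recordings from one test person at a time, so a classifier is never evaluated on a subject it was trained on. A minimal sketch that generates the index splits; the subject IDs are hypothetical. scikit-learn's `LeaveOneGroupOut` provides the same splitting.

```python
def leave_one_subject_out(subject_ids):
    """Yield (subject, train_indices, test_indices), holding out one subject per fold."""
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


# Hypothetical: each sensor window is tagged with the test person who produced it.
windows_subject = ["p1", "p1", "p2", "p3", "p2"]
for subject, train, test in leave_one_subject_out(windows_subject):
    print(subject, train, test)
```

Per-subject accuracies from these folds give a range like the 58 to 97% reported above, rather than a single pooled figure.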

  19. Improved predictive mapping of indoor radon concentrations using ensemble regression trees based on automatic clustering of geological units.

    PubMed

    Kropat, Georg; Bochud, Francois; Jaboyedoff, Michel; Laedermann, Jean-Pascal; Murith, Christophe; Palacios Gruson, Martha; Baechler, Sébastien

    2015-09-01

    According to estimates, around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools to improve local radon prediction. About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pairwise Kolmogorov distances between the IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). The automated classification groups lithological units well in terms of their IRC characteristics; the method reveals IRC differences particularly well in metamorphic rocks such as gneiss. The maps produced by random forests soundly represent the regional differences in IRC in Switzerland and improve the spatial detail compared with existing approaches. Random forests explained 33% of the variation in the IRC data. Additionally, the variable importance evaluated by random forests shows that building characteristics are less important predictors of IRC than spatial/geological influences. BART explained 29% of IRC variability and produced maps that indicate the prediction uncertainty. Ensemble regression trees are a powerful tool for modeling and understanding the multidimensional influences on IRC. Automatic clustering of lithological units complements this method by facilitating the interpretation of the radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider additional variables, such as soil gas radon measurements, as well as more detailed geological information. Copyright © 2015 Elsevier Ltd. All rights reserved.
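    The clustering step above groups lithological units by the pairwise Kolmogorov (two-sample Kolmogorov-Smirnov) distance between their IRC distributions. A minimal sketch, assuming a simple alternating k-medoids variant (the abstract does not specify which k-medoids algorithm was used) and hypothetical IRC samples for four units. The distance equals the statistic returned by `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right


def ks_distance(a, b):
    """Two-sample Kolmogorov distance: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in set(a) | set(b)
    )


def k_medoids(dist, k, max_iter=100):
    """Naive alternating k-medoids on a precomputed distance matrix.

    Uses the first k items as initial medoids (a simple deterministic choice).
    Returns (medoids, labels), where labels[i] is the medoid index of item i.
    """
    n = len(dist)
    medoids = list(range(k))
    for _ in range(max_iter):
        # assignment step: every item joins its nearest medoid
        clusters = {m: [] for m in medoids}
        for i in range(n):
            clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
        # update step: the new medoid minimizes total distance within its cluster
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
            if members
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return medoids, labels


# Hypothetical IRC samples (Bq/m3) for four lithological units.
units = [[10, 12, 14], [11, 13, 15], [100, 110, 120], [105, 115, 125]]
dist = [[ks_distance(u, v) for v in units] for u in units]
print(k_medoids(dist, k=2))
```

Because the distance compares whole distributions rather than means, units with similar medians but different spreads can still end up in different clusters.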

  20. Predicting outcome on admission and post-admission for acetaminophen-induced acute liver failure using classification and regression tree models.

    PubMed

    Speiser, Jaime Lynn; Lee, William M; Karvellas, Constantine J

    2015-01-01

    Assessing prognosis for acetaminophen-induced acute liver failure (APAP-ALF) patients often presents significant challenges. The King's College Criteria (KCC) have been validated on hospital admission, but little has been published on later phases of illness. We aimed to improve determinations of prognosis both at the time of and following admission for APAP-ALF using Classification and Regression Tree (CART) models. CART models were applied to US ALFSG registry data to predict 21-day death or liver transplant early (on admission) and post-admission (days 3-7) for 803 APAP-ALF patients enrolled 01/1998-09/2013. Accuracy in prediction of outcome (AC), sensitivity (SN), specificity (SP), and area under the receiver-operating curve (AUROC) were compared between 3 models: KCC (INR, creatinine, coma grade, pH), CART analysis using only KCC variables (KCC-CART), and a CART model using new variables (NEW-CART). Traditional KCC yielded 69% AC, 90% SP, 27% SN, and 0.58 AUROC on admission, with similar performance post-admission. KCC-CART at admission yielded 66% AC, 65% SP, 67% SN, and 0.74 AUROC. Post-admission, KCC-CART had 82% AC, 86% SP, 46% SN and 0.81 AUROC. NEW-CART models using MELD (Model for End-Stage Liver Disease), lactate and mechanical ventilation on admission yielded 72% AC, 71% SP, 77% SN and 0.79 AUROC. For later stages, NEW-CART (MELD, lactate, coma grade) offered 86% AC, 91% SP, 46% SN and 0.73 AUROC. CARTs offer simple prognostic models for APAP-ALF patients, with higher AUROC and SN than KCC, similar AC and negligibly worse SP. Admission and post-admission predictions were developed.
    • Prognostication in acetaminophen-induced acute liver failure (APAP-ALF) is challenging beyond admission.
    • Little has been published regarding the use of King's College Criteria (KCC) beyond admission, and KCC has shown limited sensitivity in subsequent studies.
    • Classification and Regression Tree (CART) methodology allows the development of predictive models using binary splits and offers an intuitive method for predicting outcome, using processes familiar to clinicians.
    • Data from the ALFSG registry suggested that CART prognosis models for the APAP population offer improved sensitivity and model performance over traditional regression-based KCC, while maintaining similar accuracy and negligibly worse specificity.
    • KCC-CART models offered modest improvement over traditional KCC, with NEW-CART models performing better than KCC-CART, particularly at late time points.
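    The AUROC values compared above can be computed as the probability that a randomly chosen patient with the outcome receives a higher score than one without it, counting ties as one half. Ties matter for trees, because a CART model outputs only as many distinct scores as it has leaves. A minimal sketch with hypothetical scores:

```python
def auroc(scores, labels):
    """AUROC as P(score of a positive > score of a negative), counting ties as 1/2.

    labels: 1 = outcome (e.g. death or transplant), 0 = otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # tied scores are common when a tree has few leaves
    return wins / (len(positives) * len(negatives))


print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

For a rule that outputs only 0 or 1, this quantity equals (SN + SP) / 2; for KCC on admission, (0.27 + 0.90) / 2 ≈ 0.585, consistent with the reported AUROC of 0.58.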

  1. A novel dendrochronological approach reveals drivers of carbon sequestration in tree species of riparian forests across spatiotemporal scales.

    PubMed

    Rieger, Isaak; Kowarik, Ingo; Cherubini, Paolo; Cierjacks, Arne

    2017-01-01

    Aboveground carbon (C) sequestration in trees is important in global C dynamics, but reliable techniques for modeling it in highly productive and heterogeneous ecosystems are limited. We applied an extended dendrochronological approach to disentangle the functioning of drivers from the atmosphere (temperature, precipitation), the lithosphere (sedimentation rate), the hydrosphere (groundwater table, river water level fluctuation), the biosphere (tree characteristics), and the anthroposphere (dike construction). Carbon sequestration in the aboveground biomass of riparian Quercus robur L. and Fraxinus excelsior L. was modeled (1) over time using boosted regression tree analysis (BRT) on cross-datable trees characterized by equal annual growth ring patterns and (2) across space using a subsequent classification and regression tree analysis (CART) on cross-datable and not cross-datable trees. While C sequestration of cross-datable Q. robur responded to precipitation and temperature, cross-datable F. excelsior also responded to a low Danube river water level. However, CART revealed that C sequestration over time is governed by tree height and by parameters that vary over space (magnitude of fluctuation in the groundwater table, vertical distance to mean river water level, and longitudinal distance to the upstream end of the study area). Thus, a uniform response of aboveground C sequestration in Q. robur to climatic drivers was only detectable in trees of an intermediate height class and in taller trees (>21.8 m) on sites where the groundwater table fluctuated little (≤0.9 m). The detection of climatic drivers and the river water level in F. excelsior depended on sites at lower altitudes above the mean river water level (≤2.7 m) and along a less dynamic downstream section of the study area. Our approach indicates unexploited opportunities for understanding the interplay of different environmental drivers in aboveground C sequestration. Results may support species-specific and locally adapted forest management plans to increase carbon dioxide sequestration from the atmosphere in trees. Copyright © 2016 Elsevier B.V. All rights reserved.
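    For a continuous response such as C sequestration, a regression tree typically chooses each split to minimize the summed squared error of the two child nodes. A minimal sketch of that search for one predictor; the tree heights and sequestration values are hypothetical and are not meant to reproduce the thresholds reported above.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_regression_split(x, y):
    """Threshold t for "x <= t" minimizing the summed squared error of the two child nodes."""
    pairs = sorted(zip(x, y))
    best = (None, sse(y))  # error of the unsplit parent node
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical predictor values cannot be separated
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        error = sse(left) + sse(right)
        if error < best[1]:
            best = (t, error)
    return best


# Hypothetical tree heights (m) and annual C sequestration (arbitrary units).
heights = [10.0, 15.0, 20.0, 25.0, 30.0]
carbon = [1.0, 1.0, 1.0, 5.0, 5.0]
print(best_regression_split(heights, carbon))  # (22.5, 0.0)
```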

  2. Learning accurate very fast decision trees from uncertain data streams

    NASA Astrophysics Data System (ADS)

    Liang, Chunquan; Zhang, Yang; Shi, Peng; Hu, Zhengguo

    2015-12-01

    Most existing work on data stream classification assumes that streaming data are precise and definite. This assumption, however, does not always hold in practice, since data uncertainty is ubiquitous in data stream applications owing to imprecise measurement, missing values, privacy protection, etc. The goal of this paper is to learn accurate decision tree models from uncertain data streams for classification analysis. Building on very fast decision tree (VFDT) algorithms, we propose an algorithm for constructing an uncertain VFDT tree with classifiers at the tree leaves (uVFDTc). The uVFDTc algorithm exploits uncertain information effectively and efficiently in both the learning and the classification phases. In the learning phase, it uses Hoeffding bound theory to learn from uncertain data streams and yield fast and reasonable decision trees. In the classification phase, it uses uncertain naive Bayes (UNB) classifiers at the tree leaves to improve classification performance. Experimental results on both synthetic and real-life datasets demonstrate the strong ability of uVFDTc to classify uncertain data streams. Using UNB at the tree leaves improved the performance of uVFDTc, especially its any-time property, its benefit from exploiting uncertain information, and its robustness against uncertainty.
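    In the learning phase, VFDT-style algorithms decide when enough stream examples have arrived to choose a split by using the Hoeffding bound epsilon = sqrt(R^2 ln(1/delta) / (2n)), where R is the range of the split criterion, delta the allowed error probability, and n the number of examples seen at the leaf. A minimal sketch of the standard VFDT split test; it does not model the uncertainty handling that distinguishes uVFDTc, and the tie threshold of 0.05 is an assumed default.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 ln(1/delta) / (2n)): with probability 1 - delta, the true mean
    of a variable with range R lies within epsilon of the mean of n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n, tie_threshold=0.05):
    """VFDT-style test: split once the best attribute beats the runner-up by more than
    epsilon, or when epsilon is so small that the two are effectively tied."""
    epsilon = hoeffding_bound(value_range, delta, n)
    return best_gain - second_gain > epsilon or epsilon < tie_threshold


# Information gain for two classes ranges over [0, 1] bit, so R = 1.
print(hoeffding_bound(1.0, 1e-7, 1000))
print(should_split(0.30, 0.10, 1.0, 1e-7, 1000))  # True
print(should_split(0.30, 0.28, 1.0, 1e-7, 1000))  # False
```

As n grows, epsilon shrinks, so a close race between two attributes is eventually resolved either by a clear winner or by the tie threshold.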

  3. Automated system for characterization and classification of malaria-infected stages using light microscopic images of thin blood smears.

    PubMed

    Das, D K; Maiti, A K; Chakraborty, C

    2015-03-01

    In this paper, we propose a comprehensive image characterization and classification framework for detecting malaria-infected stages using microscopic images of thin blood smears. The methodology mainly comprises microscopic imaging of Leishman-stained blood slides, noise reduction and illumination correction, erythrocyte segmentation, and feature selection followed by machine classification. Among three image segmentation algorithms (rule-based, Chan-Vese-based and marker-controlled watershed methods), the marker-controlled watershed technique provides better boundary detection of erythrocytes, especially where cells overlap. Microscopic features at the intensity, texture and morphology levels are extracted to discriminate infected from noninfected erythrocytes. To obtain a subset of potential features, feature selection techniques, namely the F-statistic and information gain criteria, are used for ranking. Finally, five different classifiers, namely Naive Bayes, multilayer perceptron neural network, logistic regression, classification and regression tree (CART), and RBF neural network, were trained and tested on 888 erythrocytes (infected and noninfected) for each feature subset. Performance evaluation of the proposed methodology shows that the multilayer perceptron network provides higher accuracy for recognizing malaria-infected erythrocytes and classifying the infected stage. Results show that the top 90 features ranked by F-statistic (specificity: 98.64%, sensitivity: 100%, PPV: 99.73% and overall accuracy: 96.84%) and the top 60 features ranked by information gain (specificity: 97.29%, sensitivity: 100%, PPV: 99.46% and overall accuracy: 96.73%) provide better results for malaria-infected stage classification. © 2014 The Authors Journal of Microscopy © 2014 Royal Microscopical Society.
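    Ranking features by F-statistic, as done above, scores each feature with a one-way ANOVA F ratio of between-class to within-class variance. A minimal sketch with hypothetical feature names and values; scikit-learn's `f_classif` computes the same ANOVA F scores.

```python
def f_statistic(values, labels):
    """One-way ANOVA F ratio of one feature across classes (between / within variance)."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {lab: sum(g) / len(g) for lab, g in groups.items()}
    between = sum(len(g) * (means[lab] - grand_mean) ** 2 for lab, g in groups.items())
    within = sum((v - means[lab]) ** 2 for lab, g in groups.items() for v in g)
    return (between / (k - 1)) / (within / (n - k))  # fails if within == 0


def top_features(columns, labels, top):
    """Names of the `top` features with the largest F-statistic."""
    scores = {name: f_statistic(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Hypothetical erythrocyte features; label 1 = infected, 0 = noninfected.
labels = [0, 0, 0, 1, 1, 1]
columns = {"mean_intensity": [1, 2, 3, 7, 8, 9], "hue_variance": [1, 9, 5, 2, 8, 5]}
print(top_features(columns, labels, top=1))  # ['mean_intensity']
```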

  4. Sentinel node status prediction by four statistical models: results from a large bi-institutional series (n = 1132).

    PubMed

    Mocellin, Simone; Thompson, John F; Pasquali, Sandro; Montesco, Maria C; Pilati, Pierluigi; Nitti, Donato; Saw, Robyn P; Scolyer, Richard A; Stretch, Jonathan R; Rossi, Carlo R

    2009-12-01

    Our aim was to improve selection for sentinel node (SN) biopsy (SNB) in patients with cutaneous melanoma using statistical models that predict SN status. About 80% of patients currently undergoing SNB are node negative. In the absence of conclusive evidence of an SNB-associated survival benefit, these patients may be over-treated. Here, we tested the efficiency of 4 different models in predicting SN status. The clinicopathologic data (age, gender, tumor thickness, Clark level, regression, ulceration, histologic subtype, and mitotic index) of 1132 melanoma patients who had undergone SNB at institutions in Italy and Australia were analyzed. Logistic regression, classification tree, random forest, and support vector machine models were fitted to the data. The predictive models were built with the aim of maximizing the negative predictive value (NPV) and reducing the rate of SNB procedures while minimizing the error rate. After cross-validation, the logistic regression, classification tree, random forest, and support vector machine predictive models obtained clinically relevant NPV (93.6%, 94.0%, 97.1%, and 93.0%, respectively), SNB reduction (27.5%, 29.8%, 18.2%, and 30.1%, respectively), and error rates (1.8%, 1.8%, 0.5%, and 2.1%, respectively). Using commonly available clinicopathologic variables, predictive models can preoperatively identify a proportion of patients (approximately 25%) who might be spared SNB, with an acceptable (1%-2%) error. If validated in large prospective series, these models might be implemented in the clinical setting for improved patient selection, which ultimately would lead to better quality of life for patients and optimization of resource allocation for the health care system.
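    All three reported quantities can be derived from a confusion matrix. The sketch below assumes that SNB reduction is the share of all patients predicted SN-negative and that the error rate is the share of all patients who are predicted negative but are actually SN-positive. These definitions are an assumption, but they are consistent with the reported figures, since under them error rate = SNB reduction × (1 − NPV): for example, 27.5% × (1 − 0.936) ≈ 1.8% for logistic regression and 18.2% × (1 − 0.971) ≈ 0.5% for random forest. The counts in the usage line are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # predicted SN-positive, truly SN-positive
    fp: int  # predicted SN-positive, truly SN-negative
    tn: int  # predicted SN-negative (biopsy spared), truly SN-negative
    fn: int  # predicted SN-negative (biopsy spared), truly SN-positive: a missed node

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    def npv(self) -> float:
        # Negative predictive value: share of predicted negatives that are truly negative
        return self.tn / (self.tn + self.fn)

    def snb_reduction(self) -> float:
        # Assumed definition: share of all patients who would be spared SNB
        return (self.tn + self.fn) / self.total

    def error_rate(self) -> float:
        # Assumed definition: missed SN-positive patients as a share of all patients
        return self.fn / self.total


# Hypothetical cohort of 1000 patients, 20% of them SN-positive
example = Confusion(tp=182, fp=543, tn=257, fn=18)
print(f"NPV={example.npv():.3f}, SNB reduction={example.snb_reduction():.3f}, "
      f"error rate={example.error_rate():.3f}")
# NPV=0.935, SNB reduction=0.275, error rate=0.018
```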

  5. Toward extending terrestrial laser scanning applications in forestry: a case study of broad- and needle-leaf tree classification

    NASA Astrophysics Data System (ADS)

    Lin, Yi; Jiang, Miao

    2017-01-01

    Tree species information is essential for forest research and management purposes, which in turn require approaches for accurate and precise classification of tree species. Terrestrial laser scanning (TLS), a remote sensing technology, has proved capable of characterizing detailed tree structures, such as stem geometry. Can TLS further differentiate between broad and needle leaves? If so, TLS data can be used to classify taxonomic tree groups by directly examining their differences in leaf morphology. An analysis was proposed to assess TLS-represented broad- and needle-leaf structures, followed by a Bayes classifier to perform the classification. Tests indicated that the proposed method can largely accomplish the task, with an overall accuracy of 77.78%. This study indicates a way of classifying the two major taxonomic groups, broad-leaf and needle-leaf trees, from TLS measurements in accordance with their literal definitions, and demonstrates the potential of extending TLS applications in forestry.

  6. Data Mining Methods Applied to Flight Operations Quality Assurance Data: A Comparison to Standard Statistical Methods

    NASA Technical Reports Server (NTRS)

    Stolzer, Alan J.; Halford, Carl

    2007-01-01

    In a previous study, multiple regression techniques were applied to Flight Operations Quality Assurance-derived data to develop parsimonious model(s) for fuel consumption on the Boeing 757 airplane. The present study examined several data mining algorithms, including neural networks, on the fuel consumption problem and compared them to the multiple regression results obtained earlier. Using regression methods, parsimonious models were obtained that explained approximately 85% of the variation in fuel flow. In general, data mining methods were more effective in predicting fuel consumption. Classification and Regression Tree methods reported correlation coefficients of .91 to .92, and General Linear Models and Multilayer Perceptron neural networks reported correlation coefficients of about .99. These data mining models show great promise for use in further examining large FOQA databases for operational and safety improvements.

  7. Voice based gender classification using machine learning

    NASA Astrophysics Data System (ADS)

    Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.

    2017-11-01

    Gender identification is one of the major problems in speech analysis today. It involves inferring gender from acoustic data, such as pitch, median and frequency. Machine learning gives promising results for classification problems across research domains, and several performance metrics exist for evaluating algorithms in a given area. Our comparative model evaluates five machine learning algorithms, Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF), and Support Vector Machine (SVM), on eight different metrics for gender classification from acoustic data. The main criterion in evaluating any algorithm is its performance: in classification problems, the misclassification rate must be low, that is, the accuracy must be high. A person's location and gender have become crucial in economic markets, for example in AdSense. With this comparative model, we assess the different ML algorithms and find the best fit for gender classification from acoustic data.

  8. Blood oxygen level dependent magnetic resonance imaging for detecting pathological patterns in lupus nephritis patients: a preliminary study using a decision tree model.

    PubMed

    Shi, Huilan; Jia, Junya; Li, Dong; Wei, Li; Shang, Wenya; Zheng, Zhenfeng

    2018-02-09

    Precise renal histopathological diagnosis will guide therapy strategy in patients with lupus nephritis. Blood oxygen level dependent (BOLD) magnetic resonance imaging (MRI) has been an applicable noninvasive technique in renal disease. This study was performed to explore whether BOLD MRI could contribute to diagnosing the renal pathological pattern. Adult patients with a renal pathological diagnosis of lupus nephritis were recruited for this study. Renal biopsy tissues were assessed based on the ISN/RPS 2003 classification of lupus nephritis. BOLD MRI was used to obtain a functional magnetic resonance parameter, the R2* value. Several functions of R2* values were calculated and used to construct algorithmic models for renal pathological patterns, and the models were compared as to their diagnostic capability. A total of twelve patients were examined with both histopathology and BOLD MRI. Renal pathological patterns included five class III cases (including three class III + V) and seven class IV cases (including four class IV + V). Three algorithmic models, decision tree, linear discriminant, and logistic regression, were constructed to distinguish class III from class IV renal pathological patterns. The sensitivity of the decision tree model was better than that of the linear discriminant model (71.87% vs 59.48%, P < 0.001) and inferior to that of the logistic regression model (71.87% vs 78.71%, P < 0.001). The specificity of the decision tree model was equivalent to that of the linear discriminant model (63.87% vs 63.73%, P = 0.939) and higher than that of the logistic regression model (63.87% vs 38.0%, P < 0.001). The area under the ROC curve (AUROCC) of the decision tree model was greater than that of the linear discriminant model (0.765 vs 0.629, P < 0.001) and the logistic regression model (0.765 vs 0.662, P < 0.001).
BOLD MRI is a useful non-invasive imaging technique for the evaluation of lupus nephritis. Decision tree models constructed using functions of R2* values may facilitate the prediction of renal pathological patterns.
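    The AUROCC reported above can be computed without plotting the ROC curve, as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (the Mann-Whitney formulation). The following is a minimal sketch, assuming class IV is coded as 1, class III as 0, and that each model outputs a numeric score; the example scores are hypothetical, and the sketch does not reproduce the significance tests used in the study.

```python
def auc(scores, labels):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties count as half a win, following the Mann-Whitney U convention.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores: 1 = class IV, 0 = class III
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0]
print(round(auc(scores, labels), 3))  # 0.833
```

    The pairwise loop is quadratic in the number of cases, which is harmless for a cohort of twelve patients; larger data sets would usually use a rank-based computation instead.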

  9. The application of data mining techniques to oral cancer prognosis.

    PubMed

    Tseng, Wan-Ting; Chiang, Wei-Fan; Liu, Shyun-Yeu; Roan, Jinsheng; Lin, Chun-Nan

    2015-05-01

    This study adopted an integrated procedure that combines the clustering and classification features of data mining technology to determine the differences between the symptoms shown in past cases where patients died from or survived oral cancer. Two data mining tools, namely decision tree and artificial neural network, were used to analyze the historical cases of oral cancer, and their performance was compared with that of logistic regression, a popular statistical analysis tool. Both decision tree and artificial neural network models showed superiority to the traditional statistical model. However, for clinicians, the trees created by the decision tree models are easier to interpret than the artificial neural network models. Cluster analysis also revealed that stage 4 patients who also possess the following four characteristics have an extremely low survival rate: pN is N2b, level of RLNM is level I-III, AJCC-T is T4, and the cell mutation situation (G) is moderate.

  10. PCA-based feature reduction to improve the accuracy of decision tree C4.5 classification

    NASA Astrophysics Data System (ADS)

    Nasution, M. Z. F.; Sitompul, O. S.; Ramli, M.

    2018-03-01

    Attribute splitting is a major step in Decision Tree C4.5 classification. However, this step does little to remove irrelevant features when the decision tree is built. This leads to a major problem in decision tree classification called over-fitting, which results from noisy data and irrelevant features. In turn, over-fitting causes misclassification and data imbalance. Many algorithms have been proposed to overcome misclassification and over-fitting in Decision Tree C4.5 classification. Feature reduction is an important issue in classification modelling; it is intended to remove irrelevant data in order to improve accuracy. The feature reduction framework simplifies high-dimensional data to low-dimensional data with non-correlated attributes. In this research, we propose a framework for selecting relevant and non-correlated feature subsets. We use principal component analysis (PCA) for feature reduction to perform non-correlated feature selection and the Decision Tree C4.5 algorithm for classification. In experiments on the Cervical Cancer data set from the UCI repository, with 858 instances and 36 attributes, we evaluated the performance of our framework based on accuracy, specificity and precision. Experimental results show that the proposed framework enhances classification accuracy, reaching an accuracy rate of 90.70%.
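    C4.5 chooses splitting attributes using the gain ratio, that is, information gain divided by split information. The sketch below assumes a categorical attribute and a list of class labels; it omits C4.5's handling of continuous attributes (such as PCA components, which would need threshold splits), missing values and pruning, and the example attributes are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(attribute, labels):
    """C4.5-style gain ratio of a categorical attribute with respect to class labels."""
    n = len(labels)
    partitions = {}
    for a, y in zip(attribute, labels):
        partitions.setdefault(a, []).append(y)
    # Information gain: entropy before the split minus weighted entropy after it
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes that fragment the data into many branches
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in partitions.values())
    return info_gain / split_info if split_info > 0 else 0.0


labels = [1, 1, 0, 0]
print(gain_ratio(["y", "y", "n", "n"], labels))  # 1.0
print(gain_ratio(["a", "b", "c", "d"], labels))  # 0.5
```

    The identifier-like attribute has the same information gain as the binary one, but its larger split information halves its gain ratio, which is how the criterion discourages many-way splits on irrelevant attributes.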

  11. Predicting future depression in adolescents using the Short Mood and Feelings Questionnaire: a two-nation study

    PubMed Central

    McKenzie, Dean P.; Toumbourou, John W.; Forbes, Andrew B.; Mackinnon, Andrew J.; McMorris, Barbara J.; Catalano, Richard F.; Patton, George C.

    2011-01-01

    Background Adolescence is a key life period for the development of depression. Predicting the development of depression in adolescence by detecting specific early symptoms may aid in the development of timely screening and intervention programs. Methods We administered the Short Mood and Feelings Questionnaire (SMFQ) to 5,769 American and Australian students aged 10 to 15 years, at two time points, separated by 12 months. We attempted to predict high levels of depression symptoms at 12 months from symptoms at baseline, using statistical approaches based upon the quality, as well as the quantity, of depression symptoms present. These approaches included classification and regression trees (CART) and logistic regression. Results A classification tree employing four SMFQ items, such as feelings of self-hatred and of being unloved, performed almost as well as all 13 SMFQ items at predicting subsequent depression symptomatology. Limitations Depression was measured using a self-report instrument, rather than a criterion standard diagnostic interview. Conclusion Further validation on other populations of adolescents is required; however, the results suggest that several symptoms of depression, especially feelings of self-hatred and of being unloved, are associated with increased levels of self-reported depression at 12 months. Although screening for depression can be problematic, symptoms such as the ones above should be considered for inclusion in screening tests for adolescents. PMID:21669461

  12. Determination of colonoscopy indication from administrative claims data.

    PubMed

    Ko, Cynthia W; Dominitz, Jason A; Neradilek, Moni; Polissar, Nayak; Green, Pam; Kreuter, William; Baldwin, Laura-Mae

    2014-04-01

    Colonoscopy outcomes, such as polyp detection or complication rates, may differ by procedure indication. The aim was to develop methods to classify colonoscopy indications from administrative data, facilitating study of colonoscopy quality and outcomes. We linked 14,844 colonoscopy reports from the Clinical Outcomes Research Initiative, a national repository of endoscopic reports, to the corresponding Medicare Carrier and Outpatient File claims. Colonoscopy indication was determined from the procedure reports. We developed algorithms using classification and regression trees and linear discriminant analysis (LDA) to classify colonoscopy indication. Predictor variables included ICD-9-CM and CPT/HCPCS codes present on the colonoscopy claim or in the 12 months prior, patient demographics, and site of colonoscopy service. Algorithms were developed on a training set of 7515 procedures, then validated using a test set of 7329 procedures. Sensitivity was lowest for identifying average-risk screening colonoscopies, varying between 55% and 86% for the different algorithms, but specificity for this indication was consistently over 95%. Sensitivity for diagnostic colonoscopy varied between 77% and 89%, with specificity between 55% and 87%. Algorithms using classification and regression trees with 7 variables or LDA with 10 variables had similar overall accuracy, and generally lower accuracy than the algorithm using LDA with 30 variables. Algorithms using Medicare claims data have moderate sensitivity and specificity for colonoscopy indication, and will be useful for studying colonoscopy quality in this population. Further validation may be needed before use in alternative populations.
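    With several indications, sensitivity and specificity for a single indication are computed one-versus-rest: every other indication counts as a negative. The minimal sketch below shows how such per-indication figures are obtained; the indication labels and predictions are hypothetical and are not drawn from the study.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Sensitivity and specificity for one class in a multi-class problem."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1  # correctly identified as this indication
            else:
                fn += 1  # this indication, but labelled as something else
        else:
            if p == positive:
                fp += 1  # another indication wrongly labelled as this one
            else:
                tn += 1  # another indication correctly kept out of this class
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical reference (report-based) and predicted (claims-based) indications
y_true = ["screening", "screening", "diagnostic", "surveillance", "diagnostic"]
y_pred = ["screening", "diagnostic", "diagnostic", "surveillance", "screening"]
sens, spec = one_vs_rest_metrics(y_true, y_pred, positive="screening")
```

    Because the negatives for one indication include every other indication, a classifier that rarely assigns "screening" can keep specificity high while its sensitivity for screening stays low.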

  13. "Take the Volume Pledge" may result in disparity in access to care.

    PubMed

    Blanco, Barbara A; Kothari, Anai N; Blackwell, Robert H; Brownlee, Sarah A; Yau, Ryan M; Attisha, John P; Ezure, Yoshiki; Pappas, Sam; Kuo, Paul C; Abood, Gerard J

    2017-03-01

    "Take the Volume Pledge" proposes restricting pancreatectomies to hospitals that perform ≥20 per year. Our purpose was to identify those factors that characterize patients at risk for loss of access to pancreatic cancer care with enforcement of volume standards. Using the Healthcare Cost and Utilization Project State Inpatient Database from Florida, we identified patients who underwent pancreatectomy for pancreatic malignancy from 2007-2011. American Hospital Association and United States Census Bureau data were linked to patient-level data. High-volume hospitals were defined as performing ≥20 pancreatic resections per year. Univariable and multivariable statistics compared patient characteristics and utilization of high-volume hospitals. Classification and Regression Tree modeling was used to predict patients at risk for losing access to care. Our study included 1,663 patients. Five high-volume hospitals were identified, and they treated 1,056 (63.5%) patients. Patients residing far from high-volume hospitals or in areas with the highest population density, as well as those of non-Caucasian ethnicity or with greater income, had decreased odds of obtaining care at high-volume hospitals. Using these factors, we developed a Classification and Regression Tree-based predictive tool to identify these patients. Implementation of "Take the Volume Pledge" is an important step toward improving pancreatectomy outcomes; however, policymakers must consider the potential impact on limiting access and possible health disparities that may arise. Copyright © 2016 Elsevier Inc. All rights reserved.

  14. Personalized Modeling for Prediction with Decision-Path Models

    PubMed Central

    Visweswaran, Shyam; Ferreira, Antonio; Ribeiro, Guilherme A.; Oliveira, Alexandre C.; Cooper, Gregory F.

    2015-01-01

    Deriving predictive models in medicine typically relies on a population approach where a single model is developed from a dataset of individuals. In this paper, we describe and evaluate a personalized approach in which we construct a new type of decision tree model, called a decision-path model, that takes advantage of the particular features of a given person of interest. We introduce three personalized methods that derive personalized decision-path models. We compared the performance of these methods with that of Classification And Regression Tree (CART), a population decision tree, in predicting seven different outcomes in five medical datasets. Two of the three personalized methods performed statistically significantly better than CART on area under the ROC curve (AUC) and Brier skill score. The personalized approach of learning decision-path models is a new approach to predictive modeling that can perform better than a population approach. PMID:26098570
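    The Brier skill score compares a model's Brier score (the mean squared error of its predicted probabilities) with that of a reference forecast. The abstract does not state the reference, so the sketch below assumes the common choice of always predicting the outcome prevalence; the probabilities in the usage lines are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(outcomes)


def brier_skill_score(probs, outcomes):
    """1 - BS / BS_ref, using the outcome prevalence as the reference forecast (assumed)."""
    base_rate = sum(outcomes) / len(outcomes)
    bs_ref = brier_score([base_rate] * len(outcomes), outcomes)
    if bs_ref == 0:
        raise ValueError("reference Brier score is zero (all outcomes identical)")
    return 1.0 - brier_score(probs, outcomes) / bs_ref


outcomes = [1, 0, 1, 0]
model_probs = [0.8, 0.2, 0.6, 0.4]  # hypothetical predicted risks
print(round(brier_skill_score(model_probs, outcomes), 3))  # 0.6
```

    A score of 0 means the model is no better than the reference forecast, and positive values indicate improvement over it.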

  15. From learning taxonomies to phylogenetic learning: integration of 16S rRNA gene data into FAME-based bacterial classification.

    PubMed

    Slabbinck, Bram; Waegeman, Willem; Dawyndt, Peter; De Vos, Paul; De Baets, Bernard

    2010-01-30

    Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has limited resolution for discriminating bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms to FAME data or to 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data are used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails.
Second, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. In summary, through phylogenetic learning we are able to situate and evaluate FAME-based bacterial species classification in a more informative context.

  16. From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

    PubMed Central

    2010-01-01

    Background Machine learning techniques have been shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has limited resolution for discriminating bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms to FAME data or to 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data are used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails.
Second, the hierarchical classification structure makes it easy to evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. In summary, through phylogenetic learning we are able to situate and evaluate FAME-based bacterial species classification in a more informative context. PMID:20113515

  17. Predicting Increased Blood Pressure Using Machine Learning

    PubMed Central

    Golino, Hudson Fernandes; Amaral, Liliany Souza de Brito; Duarte, Stenio Fernando Pimentel; Soares, Telma de Jesus; dos Reis, Luciana Araujo

    2014-01-01

    The present study investigates the prediction of increased blood pressure by body mass index (BMI), waist (WC) and hip circumference (HC), and waist hip ratio (WHR) using a machine learning technique named classification tree. Data were collected from 400 college students (56.3% women) aged 16 to 63 years. Fifteen trees were calculated in the training group for each sex, using different numbers and combinations of predictors. The results show that for women, the combination of BMI, WC, and WHR produces the best prediction, since it has the lowest deviance (87.42) and misclassification rate (.19) and the highest pseudo R² (.43). This model presented a sensitivity of 80.86% and specificity of 81.22% in the training set and, respectively, 45.65% and 65.15% in the test sample. For men, BMI, WC, HC, and WHR showed the best prediction, with the lowest deviance (57.25) and misclassification rate (.16) and the highest pseudo R² (.46). This model had a sensitivity of 72% and specificity of 86.25% in the training set and, respectively, 58.38% and 69.70% in the test set. Finally, the result from the classification tree analysis was compared with traditional logistic regression, indicating that the former outperformed the latter in terms of predictive power. PMID:24669313

  18. Predicting increased blood pressure using machine learning.

    PubMed

    Golino, Hudson Fernandes; Amaral, Liliany Souza de Brito; Duarte, Stenio Fernando Pimentel; Gomes, Cristiano Mauro Assis; Soares, Telma de Jesus; Dos Reis, Luciana Araujo; Santos, Joselito

    2014-01-01

    The present study investigates the prediction of increased blood pressure by body mass index (BMI), waist (WC) and hip circumference (HC), and waist hip ratio (WHR) using a machine learning technique named classification tree. Data were collected from 400 college students (56.3% women) aged 16 to 63 years. Fifteen trees were calculated in the training group for each sex, using different numbers and combinations of predictors. The results show that for women, the combination of BMI, WC, and WHR produces the best prediction, since it has the lowest deviance (87.42) and misclassification rate (.19) and the highest pseudo R² (.43). This model presented a sensitivity of 80.86% and specificity of 81.22% in the training set and, respectively, 45.65% and 65.15% in the test sample. For men, BMI, WC, HC, and WHR showed the best prediction, with the lowest deviance (57.25) and misclassification rate (.16) and the highest pseudo R² (.46). This model had a sensitivity of 72% and specificity of 86.25% in the training set and, respectively, 58.38% and 69.70% in the test set. Finally, the result from the classification tree analysis was compared with traditional logistic regression, indicating that the former outperformed the latter in terms of predictive power.

  19. Landslide susceptibility mapping using decision-tree based CHi-squared automatic interaction detection (CHAID) and Logistic regression (LR) integration

    NASA Astrophysics Data System (ADS)

    Althuwaynee, Omar F.; Pradhan, Biswajeet; Ahmad, Noordin

    2014-06-01

    This article uses a methodology based on chi-squared automatic interaction detection (CHAID), a multivariate method with an automatic classification capacity, to analyse large numbers of landslide conditioning factors. This new algorithm was developed to overcome the subjectivity of manually categorizing scale data of landslide conditioning factors, and to predict a rainfall-induced susceptibility map for Kuala Lumpur city and surrounding areas using a geographic information system (GIS). The main objective of this article is to use the CHAID method to find the best classification fit for each conditioning factor and then combine it with logistic regression (LR). The LR model was used to find the coefficients of the best-fitting function that assesses the optimal terminal nodes. A cluster pattern of landslide locations was extracted in a previous study using the nearest neighbor index (NNI) and was then used to identify the range of clustered landslide locations. The clustered locations were used as model training data with 14 landslide conditioning factors, such as topographically derived parameters, lithology, NDVI, and land use and land cover maps. The Pearson chi-squared value was used to find the best classification fit between the dependent variable and the conditioning factors. Finally, the relationships between conditioning factors were assessed and the landslide susceptibility map (LSM) was produced. The area under the curve (AUC) was used to test the model's reliability and prediction capability with the training and validation landslide locations, respectively. This study demonstrated the efficiency and reliability of the decision tree (DT) model in landslide susceptibility mapping. It also provides a valuable scientific basis for spatial decision making in planning and urban management studies.
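    The Pearson chi-squared value mentioned above measures how far the observed counts in a factor-by-outcome contingency table depart from the counts expected under independence. The sketch below computes only this statistic; a full CHAID implementation additionally merges categories and adjusts p-values, which is not shown. The slope classes and counts are hypothetical.

```python
def pearson_chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows.

    Rows: categories of a conditioning factor; columns: outcome classes
    (for example, landslide / no landslide).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the factor and the outcome were independent
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2


# Hypothetical counts: rows = two slope classes, columns = (landslide, no landslide)
observed = [[30, 10], [10, 30]]
print(pearson_chi_squared(observed))  # 20.0
```

    A larger statistic indicates a stronger association, which is why a chi-squared-driven tree prefers category groupings that maximize it.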

  20. Comprehensive decision tree models in bioinformatics.

    PubMed

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach, where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during tuning; the model is constrained exclusively by the dimensions of the produced decision tree. The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase in accuracy for less complex, visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm.
In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics.

  1. Comprehensive Decision Tree Models in Bioinformatics

    PubMed Central

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Purpose Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. Methods This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach, where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during tuning; the model is constrained exclusively by the dimensions of the produced decision tree. Results The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase in accuracy for less complex, visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm.
In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics. PMID:22479449
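The abstract describes trees constrained only by their dimensions, with no performance measure used during tuning. A minimal sketch of that idea (not the authors' extension) is a toy CART-style learner using Gini impurity and binary splits on numeric features, whose only knobs are a maximum depth and a minimum leaf size. The function names `grow`, `count_leaves` and `predict` and the parameters `max_depth` and `min_leaf` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, min_leaf):
    """Return the (feature, threshold) pair with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # enforce the minimum leaf size (min_leaf >= 1 also rejects empty sides)
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow(rows, labels, max_depth, min_leaf=1, depth=0):
    """Grow a tree whose size is limited only by max_depth and min_leaf.

    Internal nodes are (feature, threshold, left, right) tuples; leaves are
    majority class labels (assumed not to be tuples themselves).
    """
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels, min_leaf)
    if split is None:
        return majority
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (
        f,
        t,
        grow([rows[i] for i in left], [labels[i] for i in left], max_depth, min_leaf, depth + 1),
        grow([rows[i] for i in right], [labels[i] for i in right], max_depth, min_leaf, depth + 1),
    )


def count_leaves(node):
    """Number of leaves, a simple proxy for the visual size of the tree."""
    if not isinstance(node, tuple):
        return 1
    return count_leaves(node[2]) + count_leaves(node[3])


def predict(node, row):
    """Follow the splits from the root down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node
```

For instance, on six one-feature samples labelled a, a, b, b, a, a, `max_depth=1` yields a two-leaf tree and `max_depth=3` a three-leaf tree that fits the data exactly. In the visual-tuning setting, such limits would come from the available display area rather than from accuracy.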

  2. Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

    PubMed

    Sankari, E Siva; Manimegalai, D

    2017-12-21

    Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Given the large and growing number of uncharacterized protein sequences in databases, these traditional methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Decision tree classifiers often handle imbalanced and large datasets well. Because the datasets used are imbalanced, the performance of various decision tree classifiers, namely Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree and REP (Reduced Error Pruning) tree, and of ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest, is analysed. Among these classifiers, Random forest performs well in less time, with a good accuracy of 96.35%. A further finding is that the RUS boost decision tree classifier is able to classify one or two samples in the classes with very few samples, whereas other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive to the classes with fewer samples. The performance of the decision tree classifiers is also compared with that of SVM (Support Vector Machine) and Naive Bayes classifiers. Copyright © 2017 Elsevier Ltd. All rights reserved.
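RUS boost, which the abstract singles out for the minority classes, combines boosting with random undersampling (RUS) of the larger classes. The sketch below shows only the undersampling step in plain Python. It balances every class down to the size of the smallest one, which is one common choice of target ratio; the abstract does not state the ratio used, so that choice is an assumption, and `random_undersample` is a hypothetical helper.

```python
import random
from collections import defaultdict


def random_undersample(rows, labels, seed=0):
    """Randomly keep, for every class, as many samples as the smallest class has.

    This is only the resampling step; the boosting loop of RUS boost is omitted.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # sample() without replacement; the smallest class is kept whole
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels
```

Because the smallest class is never reduced, its rare samples always survive, while each boosting round would see a different random subset of the majority classes.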

  3. Tree Classification Software

    NASA Technical Reports Server (NTRS)

    Buntine, Wray

    1993-01-01

    This paper introduces the IND Tree Package to prospective users. IND does supervised learning using classification trees. This learning task is a basic tool used in the development of diagnosis, monitoring and expert systems. The IND Tree Package was developed as part of a NASA project to semi-automate the development of data analysis and modelling algorithms using artificial intelligence techniques. The IND Tree Package integrates features from CART and C4 with newer Bayesian and minimum encoding methods for growing classification trees and graphs. The IND Tree Package also provides an experimental control suite on top. The newer features give improved probability estimates often required in diagnostic and screening tasks. The package comes with a manual, Unix 'man' entries, and a guide to tree methods and research. The IND Tree Package is implemented in C under Unix and was beta-tested at university and commercial research laboratories in the United States.

  4. [Analysis of the characteristics of the older adults with depression using data mining decision tree analysis].

    PubMed

    Park, Myonghwa; Choi, Sora; Shin, A Mi; Koo, Chul Hoi

    2013-02-01

    The purpose of this study was to develop a prediction model for the characteristics of older adults with depression using the decision tree method. A large dataset from the 2008 Korean Elderly Survey was used, and data on 14,970 elderly people were analyzed. The target variable was depression, and the 53 input variables covered general characteristics, family & social relationships, economic status, health status, health behavior, functional status, leisure & social activity, quality of life, and living environment. Data were analyzed by decision tree analysis, a data mining technique, using the SPSS for Windows 19.0 and Clementine 12.0 programs. The decision trees yielded five distinct rules defining the characteristics of older adults with depression. Among the data mining models, Classification & Regression Tree (C&RT) gave the best prediction, with an accuracy of 80.81%. Factors in the rules were life satisfaction, nutritional status, daily activity difficulty due to pain, functional limitation in basic or instrumental activities of daily living, number of chronic diseases, and daily activity difficulty due to disease. The rules derived from the decision tree model in this study should serve as baseline data for discovering informative knowledge and developing interventions tailored to these individual characteristics.

  5. A binary genetic programing model for teleconnection identification between global sea surface temperature and local maximum monthly rainfall events

    NASA Astrophysics Data System (ADS)

    Danandeh Mehr, Ali; Nourani, Vahid; Hrnjica, Bahrudin; Molajou, Amir

    2017-12-01

    The effectiveness of genetic programming (GP) for solving regression problems in hydrology has been recognized in recent studies. However, its capability to solve classification problems has not been sufficiently explored so far. This study develops and applies a novel classification-forecasting model, namely Binary GP (BGP), for teleconnection studies between sea surface temperature (SST) variations and maximum monthly rainfall (MMR) events. The BGP integrates certain types of data pre-processing and post-processing methods with a conventional GP engine to enhance its ability to solve both regression and classification problems simultaneously. The model was trained and tested using SST series of the Black Sea, Mediterranean Sea, and Red Sea as potential predictors and classified MMR events at two locations in Iran as the predictand. The skill of the model was measured for different rainfall thresholds and SST lags and compared with that of the hybrid decision tree-association rule (DTAR) model available in the literature. The results indicated that the proposed model can identify potential teleconnection signals of surrounding seas beneficial to long-term forecasting of the occurrence of the classified MMR events.

  6. C-fuzzy variable-branch decision tree with storage and classification error rate constraints

    NASA Astrophysics Data System (ADS)

    Yang, Shiueng-Bien

    2009-10-01

    The C-fuzzy decision tree (CFDT), which is based on the fuzzy C-means algorithm, has recently been proposed. The CFDT is grown by selecting the nodes to be split according to their classification error rate. However, the CFDT design does not consider the time taken to classify an input vector, so the CFDT can be improved. We propose a new C-fuzzy variable-branch decision tree (CFVBDT) with storage and classification error rate constraints. The design of the CFVBDT consists of two phases: growing and pruning. The CFVBDT is grown by selecting the nodes to be split according to the classification error rate and the classification time in the decision tree. Additionally, the pruning method selects the nodes to prune based on the storage requirement and the classification time of the CFVBDT. Furthermore, the number of branches of each internal node is variable in the CFVBDT. Experimental results indicate that the proposed CFVBDT outperforms the CFDT and other methods.
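Fuzzy C-means, on which the CFDT is based, assigns each input vector a graded membership in every cluster rather than a single hard label. The sketch computes the standard FCM membership of one point for fixed cluster centres, u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)), where d_i is the distance to centre i. The fuzzifier m = 2 is a conventional default chosen here as an assumption, and the function is illustrative rather than part of the CFVBDT design.

```python
import math


def fcm_memberships(x, centers, m=2.0):
    """Fuzzy C-means membership of point x in each cluster centre.

    u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)); the memberships sum to 1.
    """
    dists = [math.dist(x, c) for c in centers]
    # A point lying exactly on a centre belongs to that cluster completely.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in dists) for di in dists]
```

For a point at distance 1 from one centre and 2 from another, the memberships are 0.8 and 0.2. `math.dist` requires Python 3.8 or later.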

  7. Classification and Sequential Pattern Analysis for Improving Managerial Efficiency and Providing Better Medical Service in Public Healthcare Centers

    PubMed Central

    Chung, Sukhoon; Rhee, Hyunsill; Suh, Yongmoo

    2010-01-01

    Objectives This study sought to find answers to the following questions: 1) Can we predict whether a patient will revisit a healthcare center? 2) Can we anticipate diseases of patients who revisit the center? Methods For the first question, we applied 5 classification algorithms (decision tree, artificial neural network, logistic regression, Bayesian networks, and Naïve Bayes) and the stacking-bagging method for building classification models. To answer the second question, we performed sequential pattern analysis. Results We determined: 1) In general, the most influential variables determining whether a patient of a public healthcare center will revisit it are personal burden, insurance bill, period of prescription, age, systolic pressure, name of disease, and postal code. 2) The best plain classification model is dependent on the dataset. 3) Based on average classification accuracy, the proposed stacking-bagging method outperformed all traditional classification models, and our sequential pattern analysis revealed 16 sequential patterns. Conclusions Classification models and sequential patterns can help public healthcare centers plan and implement healthcare service programs and businesses that are more appropriate to local residents, encouraging them to revisit public health centers. PMID:21818426

  8. Crown-level tree species classification from AISA hyperspectral imagery using an innovative pixel-weighting approach

    NASA Astrophysics Data System (ADS)

    Liu, Haijian; Wu, Changshan

    2018-06-01

    Crown-level tree species classification is a challenging task due to the spectral similarity among different tree species. Shadow, underlying objects, and other materials within a crown may decrease the purity of extracted crown spectra and further reduce classification accuracy. To address this problem, an innovative pixel-weighting approach was developed for tree species classification at the crown level. The method used high-density discrete LiDAR data for individual tree delineation and Airborne Imaging Spectrometer for Applications (AISA) hyperspectral imagery for pure crown-scale spectra extraction. Specifically, three steps were included: 1) individual tree identification using LiDAR data, 2) calculation of pixel-weighted representative crown spectra from the hyperspectral imagery, with pixel-based illuminated-leaf fractions estimated by linear spectral mixture analysis (LSMA) used as weighting factors, and 3) tree species classification based on the representative spectra using a support vector machine (SVM). The results suggest that the developed pixel-weighting approach (OA = 82.12%, Kc = 0.74) achieved higher classification accuracy than the treetop-based (OA = 70.86%, Kc = 0.58) and pixel-majority methods (OA = 72.26%, Kc = 0.62). McNemar tests indicated that the differences in accuracy between the pixel-weighting and treetop-based approaches, and between the pixel-weighting and pixel-majority approaches, were statistically significant.
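Step 2 can be illustrated with a weighted average. This sketch assumes that the representative crown spectrum is the mean of the crown's pixel spectra weighted by each pixel's illuminated-leaf fraction; the abstract names the weights but not the exact formula. The fractions, which the study estimates with LSMA, are simply passed in, and `weighted_crown_spectrum` is a hypothetical helper.

```python
def weighted_crown_spectrum(pixel_spectra, leaf_fractions):
    """Illuminated-leaf-fraction-weighted mean spectrum of one crown.

    pixel_spectra: per-pixel reflectance lists, all with the same band count.
    leaf_fractions: one weight per pixel (e.g. estimated by LSMA), so shaded
    or mixed pixels contribute less to the representative spectrum.
    """
    total = sum(leaf_fractions)
    if total <= 0:
        raise ValueError("at least one pixel needs a positive weight")
    n_bands = len(pixel_spectra[0])
    return [
        sum(w * s[b] for w, s in zip(leaf_fractions, pixel_spectra)) / total
        for b in range(n_bands)
    ]
```

A pixel with a fraction of 0 (for example, pure shadow) drops out entirely, which is the intended effect of weighting rather than taking a plain mean or the treetop pixel.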

  9. The Pediatric Home Care/Expenditure Classification Model (P/ECM): A Home Care Case-Mix Model for Children Facing Special Health Care Challenges.

    PubMed

    Phillips, Charles D

    2015-01-01

    Case-mix classification and payment systems help assure that persons with similar needs receive similar amounts of care resources, which is a major equity concern for consumers, providers, and programs. Although health service programs for adults regularly use case-mix payment systems, programs providing health services to children and youth rarely use such models. This research utilized Medicaid home care expenditures and assessment data on 2,578 children receiving home care in one large state in the USA. Using classification and regression tree analyses, a case-mix model for long-term pediatric home care was developed. The Pediatric Home Care/Expenditure Classification Model (P/ECM) grouped children and youth in the study sample into 24 groups, explaining 41% of the variance in annual home care expenditures. The P/ECM creates the possibility of a more equitable, and potentially more effective, allocation of home care resources among children and youth facing serious health care challenges.

  10. The Pediatric Home Care/Expenditure Classification Model (P/ECM): A Home Care Case-Mix Model for Children Facing Special Health Care Challenges

    PubMed Central

    Phillips, Charles D.

    2015-01-01

    Case-mix classification and payment systems help assure that persons with similar needs receive similar amounts of care resources, which is a major equity concern for consumers, providers, and programs. Although health service programs for adults regularly use case-mix payment systems, programs providing health services to children and youth rarely use such models. This research utilized Medicaid home care expenditures and assessment data on 2,578 children receiving home care in one large state in the USA. Using classification and regression tree analyses, a case-mix model for long-term pediatric home care was developed. The Pediatric Home Care/Expenditure Classification Model (P/ECM) grouped children and youth in the study sample into 24 groups, explaining 41% of the variance in annual home care expenditures. The P/ECM creates the possibility of a more equitable, and potentially more effective, allocation of home care resources among children and youth facing serious health care challenges. PMID:26740744
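The figure of 41% of variance explained is the usual R² of a grouping: each child's expenditure is predicted by the mean expenditure of the child's group, and R² = 1 - SS_within / SS_total. A minimal sketch of that calculation with toy numbers (not study data); `variance_explained` is a hypothetical helper.

```python
from collections import defaultdict


def variance_explained(groups, values):
    """R^2 when every observation is predicted by the mean of its group."""
    n = len(values)
    grand = sum(values) / n
    members = defaultdict(list)
    for g, v in zip(groups, values):
        members[g].append(v)
    means = {g: sum(vs) / len(vs) for g, vs in members.items()}
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_within = sum((v - means[g]) ** 2 for g, v in zip(groups, values))
    return 1.0 - ss_within / ss_total
```

With two groups, values 1 and 3 in group A and 5 and 7 in group B, the group means remove 16 of the 20 units of total sum of squares, giving R² = 0.8.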

  11. Patient casemix classification for medicare psychiatric prospective payment.

    PubMed

    Drozd, Edward M; Cromwell, Jerry; Gage, Barbara; Maier, Jan; Greenwald, Leslie M; Goldman, Howard H

    2006-04-01

    For a proposed Medicare prospective payment system for inpatient psychiatric facility treatment, the authors developed a casemix classification to capture differences in patients' real daily resource use. Primary data on patient characteristics and daily time spent in various activities were collected in a survey of 696 patients from 40 inpatient psychiatric facilities. Survey data were combined with Medicare claims data to estimate intensity-adjusted daily cost. Classification and Regression Trees (CART) analysis of average daily routine and ancillary costs yielded several hierarchical classification groupings. Regression analysis was used to control for facility and day-of-stay effects in order to compare hierarchical models with models based on the recently proposed payment system of the Centers for Medicare & Medicaid Services. CART analysis identified a small set of patient characteristics strongly associated with higher daily costs, including age, psychiatric diagnosis, deficits in daily living activities, and detox or ECT use. A parsimonious, 16-group, fully interactive model that used five major DSM-IV categories and stratified by age, illness severity, deficits in daily living activities, dangerousness, and use of ECT explained 40% (out of a possible 76%) of daily cost variation not attributable to idiosyncratic daily changes within patients. A noninteractive model based on diagnosis-related groups, age, and medical comorbidity had explanatory power of only 32%. A regression model with 16 casemix groups restricted to using "appropriate" payment variables (i.e., those with clinical face validity and low administrative burden that are easily validated and provide proper care incentives) produced more efficient and equitable payments than did a noninteractive system based on diagnosis-related groups.
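For a continuous outcome such as average daily cost, CART chooses the split that most reduces the within-node sum of squared errors (SSE). A sketch of that search for a single numeric predictor follows. The data are invented, and full CART implementations also handle categorical predictors, surrogate splits and pruning, which are omitted here.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_regression_split(x, y):
    """Threshold on x minimising SSE(left) + SSE(right), CART-style.

    Returns (threshold, total_sse); observations with x <= threshold go left.
    If no split improves on the unsplit node, the threshold is None.
    """
    best = (None, sse(y))
    # The largest value is skipped because it would leave the right node empty.
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (t, total)
    return best
```

Applied recursively to each child, this greedy search produces the hierarchical groupings the abstract describes; each terminal node's mean cost then serves as the group's predicted cost.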

  12. A Mixtures-of-Trees Framework for Multi-Label Classification

    PubMed Central

    Hong, Charmgil; Batal, Iyad; Hauskrecht, Milos

    2015-01-01

    We propose a new probabilistic approach for multi-label classification that aims to represent the class posterior distribution P(Y|X). Our approach uses a mixture of tree-structured Bayesian networks, which can leverage the computational advantages of conditional tree-structured models and the abilities of mixtures to compensate for tree-structured restrictions. We develop algorithms for learning the model from data and for performing multi-label predictions using the learned model. Experiments on multiple datasets demonstrate that our approach outperforms several state-of-the-art multi-label classification methods. PMID:25927011

  13. Cervical cancer survival prediction using hybrid of SMOTE, CART and smooth support vector machine

    NASA Astrophysics Data System (ADS)

    Purnami, S. W.; Khasanah, P. M.; Sumartini, S. H.; Chosuvivatwong, V.; Sriplung, H.

    2016-04-01

    According to the WHO, one patient dies of cervical cancer every two minutes. The high mortality rate is due to women's lack of awareness of early detection. Several factors are thought to influence the survival of cervical cancer patients, including age, anemia status, stage, type of treatment, complications and secondary disease. This study aims to classify (predict) cervical cancer survival based on those factors. Several classification methods were used: classification and regression tree (CART), smooth support vector machine (SSVM) and third-order spline SSVM (TSSVM). Because the cervical cancer data are imbalanced, the synthetic minority oversampling technique (SMOTE) was used to handle the imbalanced dataset. The performance of these methods was evaluated using accuracy, sensitivity and specificity. The results show that balancing the data with SMOTE as a preprocessing step can improve classification performance. The SMOTE-SSVM method gave better results than SMOTE-TSSVM and SMOTE-CART.
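SMOTE balances the data by synthesising new minority-class samples on the line segments between a minority sample and one of its k nearest minority-class neighbours. A minimal sketch of that interpolation step in plain Python follows; `k=2` and the fixed seed are arbitrary choices for the toy example, and this is not the study's preprocessing code.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic samples from a list of minority-class vectors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample, excluding itself
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # random position in [0, 1) along base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic
```

Because every synthetic point lies between two real minority samples, the new data stay inside the minority region instead of duplicating existing cases, which is what distinguishes SMOTE from simple random oversampling.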

  14. Comparative analysis of tree classification models for detecting fusarium oxysporum f. sp cubense (TR4) based on multi soil sensor parameters

    NASA Astrophysics Data System (ADS)

    Estuar, Maria Regina Justina; Victorino, John Noel; Coronel, Andrei; Co, Jerelyn; Tiausas, Francis; Señires, Chiara Veronica

    2017-09-01

    Readily available and affordable sensors make it possible to use wireless sensor networks integrated with smartphones to monitor environmental parameters around plantations. Low-cost monitoring devices would be especially beneficial to small farm owners in a developing country like the Philippines, where agriculture accounts for a significant share of the labor market. This study discusses the integration of wireless soil sensor devices and smartphones to create an application that uses multidimensional analysis to detect the presence or absence of plant disease. Specifically, soil sensors are designed to collect soil quality parameters in a sink node, from which the smartphone collects the data via Bluetooth. Given this design, a classification model is needed on the mobile phone to report the infection status of a soil. Although tree classification is the most appropriate approach for continuous parameter-based datasets, it remains to be determined whether tree models yield coherent results. Soil sensor data residing on the phone are modeled using several variations of decision tree, namely: decision tree (DT), best-first (BF) decision tree, functional tree (FT), Naive Bayes (NB) decision tree, J48, J48graft and LAD tree, where the decision tree approaches the problem by considering all sensor nodes as one. Results show significant differences in soil sensor parameters, indicating that scores vary between the infected and uninfected sites. Furthermore, analysis of variance of the accuracy, recall, precision and F1 scores of the tree classification models showed homogeneity among the NBTree, J48graft and J48 models.

  15. Real-Time Speech/Music Classification With a Hierarchical Oblique Decision Tree

    DTIC Science & Technology

    2008-04-01

    Jun Wang, Qiong Wu, Haojiang Deng, Qin Yan, Institute of Acoustics ... time speech/music classification with a hierarchical oblique decision tree. A set of discrimination features in the frequency domain is selected ... handle signals without discrimination and cannot work properly in the presence of multimedia signals. This paper proposes a real-time speech/music ...

  16. A classification tree approach for improving the utilization of flow cytometry testing of blood specimens for B-cell non-Hodgkin lymphoproliferative disorders.

    PubMed

    Healey, Ryan; Naugler, Christopher; de Koning, Lawrence; Patel, Jay L

    2015-01-01

    We sought to improve the diagnostic efficiency of flow cytometry investigation on blood by developing data-driven ordering guidelines. Our goal was to improve flow cytometry utilization by decreasing negative testing, thereby reducing healthcare costs. We investigated several laboratory tests performed alongside flow cytometry to identify biomarkers useful in excluding non-leukemic bloods. Test results and patient demographic features were subjected to receiver operating characteristic (ROC) curve, logistic regression and classification tree analyses to find significant predictors and develop decision rules. Our data show that, in the absence of a compelling clinical indication, flow cytometry testing is largely non-informative on bloods from patients less than 50 years of age having an absolute lymphocyte count (ALC) below 5.0 × 10(9)/L. For patients over age 50 having an ALC below this value, a ferritin value above 450 μg/L argues against B-cell clonality. Using these guidelines, 26% of cases were correctly predicted as negative with greater than 97% accuracy.
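The guidelines reduce to a short decision rule. The function below encodes them as a sketch. The thresholds come from the abstract, but the handling of patients aged exactly 50 (the abstract says "less than 50" and "over age 50") and the result for ALC at or above 5.0 × 10^9/L (for which the abstract gives no rule) are assumptions, as are the function name and the returned strings.

```python
def flow_cytometry_screen(age_years, alc_1e9_per_l, ferritin_ug_per_l=None):
    """Hypothetical encoding of the abstract's two exclusion rules.

    Assumes no compelling clinical indication. ALC is in 10^9/L and ferritin
    in ug/L. Returns "exclusion rule met" when the guidelines predict a
    negative result, otherwise "no exclusion rule met".
    """
    if alc_1e9_per_l >= 5.0:
        # The abstract only gives rules for ALC below 5.0 x 10^9/L.
        return "no exclusion rule met"
    if age_years < 50:
        return "exclusion rule met"
    # Aged 50 or over: treating exactly 50 like "over 50" is an assumption.
    if ferritin_ug_per_l is not None and ferritin_ug_per_l > 450:
        return "exclusion rule met"
    return "no exclusion rule met"
```

Writing the tree as explicit branches makes it easy to audit against the published thresholds before embedding it in a laboratory ordering system.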

  17. Exploiting machine learning algorithms for tree species classification in a semiarid woodland using RapidEye image

    NASA Astrophysics Data System (ADS)

    Adelabu, Samuel; Mutanga, Onisimo; Adam, Elhadi; Cho, Moses Azong

    2013-01-01

    Classification of different tree species in semiarid areas can be challenging because soil moisture constraints change leaf structure and orientation. Tree species mapping is, however, a key parameter for forest management in semiarid environments. In this study, we examined the suitability of 5-band RapidEye satellite data for the classification of five tree species in mopane woodland of Botswana using machine learning algorithms with limited training samples. We performed classification using random forest (RF) and support vector machines (SVM) based on EnMap box. The overall accuracies for classifying the five tree species were 88.75% and 85% for SVM and RF, respectively. We also demonstrated that the new red-edge band in the RapidEye sensor has the potential for classifying tree species in semiarid environments when integrated with other standard bands. Similarly, we observed that where training samples are limited, SVM is preferred over RF. Finally, we demonstrated that the two accuracy measures of quantity and allocation disagreement are simpler and more helpful than the kappa coefficient for the vast majority of remote sensing classification processes. Overall, high species classification accuracy can be achieved using strategically located RapidEye bands integrated with advanced processing algorithms.
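The two measures recommended over kappa, quantity disagreement and allocation disagreement, can be computed directly from a confusion matrix. The sketch follows the definitions proposed by Pontius and Millones. With p_ij the proportion of samples in map class i and reference class j, quantity disagreement is half the summed absolute differences between row and column totals, and allocation disagreement accounts for the remaining off-diagonal mass, so the two add up to 1 - overall accuracy. The orientation (rows = classified map, columns = reference) is an assumption; swapping it leaves both values unchanged.

```python
def quantity_allocation_disagreement(confusion):
    """Quantity (Q) and allocation (A) disagreement from a square confusion
    matrix of counts (rows: classified map, columns: reference)."""
    total = sum(sum(row) for row in confusion)
    p = [[c / total for c in row] for row in confusion]
    n = len(p)
    row_sums = [sum(p[g]) for g in range(n)]
    col_sums = [sum(p[i][g] for i in range(n)) for g in range(n)]
    # Q: mismatch in the amount of each class, regardless of location
    q = sum(abs(row_sums[g] - col_sums[g]) for g in range(n)) / 2
    # A: mismatch in the spatial allocation of the classes
    a = sum(2 * min(row_sums[g] - p[g][g], col_sums[g] - p[g][g]) for g in range(n)) / 2
    return q, a
```

For the matrix [[40, 10], [0, 50]], all errors come from over-mapping one class, so Q = 0.1 and A = 0; for [[40, 10], [10, 40]], the class totals match, so Q = 0 and A = 0.2. In both cases Q + A equals 1 minus the overall accuracy.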

  18. Lidar-based individual tree species classification using convolutional neural network

    NASA Astrophysics Data System (ADS)

    Mizoguchi, Tomohiro; Ishii, Akira; Nakamura, Hiroyuki; Inoue, Tsuyoshi; Takamatsu, Hisashi

    2017-06-01

    Terrestrial lidar is commonly used for detailed documentation in forest inventory investigation. Recent improvements in point cloud processing techniques have enabled efficient and precise computation of individual tree shape parameters, such as breast-height diameter, height, and volume. However, tree species are still specified manually by skilled workers. Previous work on automatic tree species classification has mainly focused on aerial or satellite images, and few classification techniques using ground-based sensor data have been reported. Several candidate sensors can be considered for classification, such as RGB or multi/hyperspectral cameras. Among these candidates, we use terrestrial lidar because it can obtain high-resolution point clouds in a dark forest. We selected bark texture as the classification criterion, since it clearly represents the unique characteristics of each tree and its appearance does not change with seasonal variation or aging. In this paper, we propose a new method for automatic individual tree species classification based on terrestrial lidar using a Convolutional Neural Network (CNN). The key component is the creation, from a point cloud, of a depth image that describes the characteristics of each species well. We focus on Japanese cedar and cypress, which cover a large part of the domestic forest. Our experimental results demonstrate the effectiveness of the proposed method.

  19. Discriminant forest classification method and system

    DOEpatents

    Chen, Barry Y.; Hanley, William G.; Lemmond, Tracy D.; Hiller, Lawrence J.; Knapp, David A.; Mugge, Marshall J.

    2012-11-06

    A hybrid machine learning methodology and system for classification that combines classical random forest (RF) methodology with discriminant analysis (DA) techniques to provide enhanced classification capability. A DA technique which uses feature measurements of an object to predict its class membership, such as linear discriminant analysis (LDA) or Andersen-Bahadur linear discriminant technique (AB), is used to split the data at each node in each of its classification trees to train and grow the trees and the forest. When training is finished, a set of n DA-based decision trees of a discriminant forest is produced for use in predicting the classification of new samples of unknown class.
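In a discriminant forest, a node is split along a linear discriminant direction instead of by a threshold on a single feature. The sketch below shows one such split for two classes in two dimensions using Fisher's linear discriminant, w = S_W^-1 (m1 - m0), where S_W is the pooled within-class scatter and m0, m1 are the class means, with the threshold placed midway between the projected class means. The 2-D restriction, the midpoint threshold and the small ridge term are simplifying assumptions; the AB variant and the forest-level randomisation are not shown.

```python
def _mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def _scatter(points, mu):
    """2x2 within-class scatter matrix."""
    sxx = sum((p[0] - mu[0]) ** 2 for p in points)
    syy = sum((p[1] - mu[1]) ** 2 for p in points)
    sxy = sum((p[0] - mu[0]) * (p[1] - mu[1]) for p in points)
    return [[sxx, sxy], [sxy, syy]]


def _dot(w, v):
    return w[0] * v[0] + w[1] * v[1]


def lda_split(class0, class1, ridge=1e-9):
    """Fisher discriminant node split: returns (w, threshold).

    A sample x is sent to the class-1 side when w . x > threshold.
    """
    m0, m1 = _mean(class0), _mean(class1)
    s0, s1 = _scatter(class0, m0), _scatter(class1, m1)
    # Pooled within-class scatter plus a tiny ridge so it is always invertible.
    a = s0[0][0] + s1[0][0] + ridge
    b = s0[0][1] + s1[0][1]
    d = s0[1][1] + s1[1][1] + ridge
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    threshold = (_dot(w, m0) + _dot(w, m1)) / 2
    return w, threshold
```

Each tree in the forest would apply such a split recursively at every node, typically on a random subset of features, so that the oblique boundaries differ from tree to tree.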

  20. A regression tree for identifying combinations of fall risk factors associated to recurrent falling: a cross-sectional elderly population-based study.

    PubMed

    Kabeshova, A; Annweiler, C; Fantino, B; Philip, T; Gromov, V A; Launay, C P; Beauchet, O

    2014-06-01

    Regression tree (RT) analyses are particularly suited to exploring the risk of recurrent falling according to various combinations of fall risk factors, compared with logistic regression models. The aims of this study were (1) to determine which combinations of fall risk factors were associated with the occurrence of recurrent falls in older community-dwellers, and (2) to compare the efficacy of RT and multiple logistic regression model for the identification of recurrent falls. A total of 1,760 community-dwelling volunteers (mean age ± standard deviation, 71.0 ± 5.1 years; 49.4 % female) were recruited prospectively in this cross-sectional study. Age, gender, polypharmacy, use of psychoactive drugs, fear of falling (FOF), cognitive disorders and sad mood were recorded. In addition, the history of falls within the past year was recorded using a standardized questionnaire. Among 1,760 participants, 19.7 % (n = 346) were recurrent fallers. The RT identified 14 node groups and 8 end nodes, with FOF as the first major split. Among participants with FOF, those who had sad mood and polypharmacy formed the end node with the greatest OR for recurrent falls (OR = 6.06 with p < 0.001). Among participants without FOF, those who were male and not sad had the lowest OR for recurrent falls (OR = 0.25 with p < 0.001). The RT correctly classified 1,356 of 1,414 non-recurrent fallers (specificity = 95.6 %), and 65 of 346 recurrent fallers (sensitivity = 18.8 %). The overall classification accuracy was 81.0 %. The multiple logistic regression correctly classified 1,372 of 1,414 non-recurrent fallers (specificity = 97.0 %), and 61 of 346 recurrent fallers (sensitivity = 17.6 %). The overall classification accuracy was 81.4 %. Our results show that RT may identify specific combinations of risk factors for recurrent falls, the combination most associated with recurrent falls involving FOF, sad mood and polypharmacy. 
FOF emerged as the risk factor strongly associated with recurrent falls. In addition, RT and multiple logistic regression were not sensitive enough to identify the majority of recurrent fallers but appeared efficient in detecting individuals not at risk of recurrent falls.
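The reported rates follow directly from the counts. As a worked check, the sketch below recomputes specificity, sensitivity and overall accuracy from the multiple logistic regression counts given in the abstract, treating non-recurrent fallers as negatives and recurrent fallers as positives. `classification_summary` is a hypothetical helper.

```python
def classification_summary(tn, fp, fn, tp):
    """Specificity, sensitivity and overall accuracy from a 2x2 table."""
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return specificity, sensitivity, accuracy


# Logistic regression counts from the abstract: 1,372 of 1,414
# non-recurrent fallers and 61 of 346 recurrent fallers classified correctly.
spec, sens, acc = classification_summary(tn=1372, fp=1414 - 1372, fn=346 - 61, tp=61)
print(f"specificity={spec:.1%} sensitivity={sens:.1%} accuracy={acc:.1%}")
# prints: specificity=97.0% sensitivity=17.6% accuracy=81.4%
```

The high accuracy is driven almost entirely by the large negative class, which is why the abstract concludes that both models detect non-fallers well but miss most recurrent fallers.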

  1. Predictors of success of external cephalic version and cephalic presentation at birth among 1253 women with non-cephalic presentation using logistic regression and classification tree analyses.

    PubMed

    Hutton, Eileen K; Simioni, Julia C; Thabane, Lehana

    2017-08-01

    Among women with a fetus with a non-cephalic presentation, external cephalic version (ECV) has been shown to reduce the rate of breech presentation at birth and cesarean birth. Compared with ECV at term, beginning ECV prior to 37 weeks' gestation decreases the number of infants in a non-cephalic presentation at birth. The purpose of this secondary analysis was to investigate factors associated with a successful ECV procedure and to present this in a clinically useful format. Data were collected as part of the Early ECV Pilot and Early ECV2 Trials, which randomized 1776 women with a fetus in breech presentation to either early ECV (34-36 weeks' gestation) or delayed ECV (at or after 37 weeks). The outcome of interest was successful ECV, defined as the fetus being in a cephalic presentation immediately following the procedure, as well as at the time of birth. The importance of several factors in predicting successful ECV was investigated using two statistical methods: logistic regression and classification and regression tree (CART) analyses. Among nulliparas, non-engagement of the presenting part and an easily palpable fetal head were independently associated with success. Among multiparas, non-engagement of the presenting part, gestation less than 37 weeks and an easily palpable fetal head were found to be independent predictors of success. These findings were consistent with results of the CART analyses. Regardless of parity, descent of the presenting part was the most discriminating factor in predicting successful ECV and cephalic presentation at birth. © 2017 Nordic Federation of Societies of Obstetrics and Gynecology.

  2. Socio-economic and lifestyle parameters associated with diet quality of children and adolescents using classification and regression tree analysis: the DIATROFI study.

    PubMed

    Yannakoulia, Mary; Lykou, Anastasia; Kastorini, Christina Maria; Saranti Papasaranti, Eirini; Petralias, Athanassios; Veloudaki, Afroditi; Linos, Athena

    2016-02-01

    To explore factors affecting children's and adolescents' diet quality, in the framework of a food aid and promotion of healthy nutrition programme implemented in areas of low socio-economic status of Greece, during the current financial crisis. From a total of 162 schools participating in the programme during 2012-2013, we gathered 15 897 questionnaires recording sociodemographic characteristics, lifestyle parameters and dietary habits of children and their families. As a measure of socio-economic status, the Family Affluence Scale (FAS) was used, whereas for the assessment of diet quality, the KIDMED score was computed. Associations between KIDMED and FAS, physical activity and socio-economic parameters were examined using regression and classification-regression tree analysis (CART). The higher the FAS score, the greater the percentage of children and adolescents who reported consuming, on a daily basis, fruits and vegetables, dairy products and breakfast (P<0·001). Results from CART showed that children and adolescents in the medium or high FAS groups had higher KIDMED scores than those in the low FAS group. For those in the low FAS group, the KIDMED score is expected to increase by 12·4 % when they spend more than 0·25 h/week in sports activities. The respective threshold for the medium and high FAS groups is 1·75 h/week, while education of the mother and father also affected the KIDMED score significantly. Diet quality is strongly influenced by socio-economic parameters in children and adolescents living in economically disadvantaged areas of Greece, so that lower family affluence is associated with worse diet quality.

  3. Modern modeling techniques had limited external validity in predicting mortality from traumatic brain injury.

    PubMed

    van der Ploeg, Tjeerd; Nieboer, Daan; Steyerberg, Ewout W

    2016-10-01

    Prediction of medical outcomes may potentially benefit from using modern statistical modeling techniques. We aimed to externally validate modeling strategies for prediction of 6-month mortality of patients suffering from traumatic brain injury (TBI) with predictor sets of increasing complexity. We analyzed individual patient data from 15 different studies including 11,026 TBI patients. We consecutively considered a core set of predictors (age, motor score, and pupillary reactivity), an extended set with computed tomography scan characteristics, and a further extension with two laboratory measurements (glucose and hemoglobin). With each of these sets, we predicted 6-month mortality using default settings with five statistical modeling techniques: logistic regression (LR), classification and regression trees, random forests (RFs), support vector machines (SVM) and neural nets. For external validation, a model developed on one of the 15 data sets was applied to each of the 14 remaining sets. This process was repeated 15 times for a total of 630 validations. The area under the receiver operating characteristic curve (AUC) was used to assess the discriminative ability of the models. For the most complex predictor set, the LR models performed best (median validated AUC value, 0.757), followed by RF and support vector machine models (median validated AUC value, 0.735 and 0.732, respectively). With each predictor set, the classification and regression trees models showed poor performance (median validated AUC value, <0.7). The variability in performance across the studies was smallest for the RF- and LR-based models (interquartile range for validated AUC values from 0.07 to 0.10). In the area of predicting mortality from TBI, nonlinear and nonadditive effects are not pronounced enough to make modern prediction methods beneficial. Copyright © 2016 Elsevier Inc. All rights reserved.
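
    The validation design above can be made concrete. Developing a model on each of 15 studies and validating it on each of the other 14 gives 15 × 14 = 210 development/validation pairs; this sketch assumes (the abstract does not say so explicitly) that the total of 630 corresponds to these 210 pairs times the three predictor sets. It also computes the AUC in its concordance form: the probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. The function names are illustrative, not the authors' code.

```python
from itertools import permutations


def auc(scores, outcomes):
    """Concordance-probability AUC: chance that a randomly chosen case with the
    outcome (1) scores higher than one without it (0); ties count one half."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each outcome")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def external_validation_pairs(n_studies):
    """Develop on one study, validate on each of the others: ordered (dev, val) pairs."""
    return list(permutations(range(n_studies), 2))


pairs = external_validation_pairs(15)
print(len(pairs), 3 * len(pairs))              # 210 630
print(auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 0.75
```

    The pairwise AUC is quadratic in sample size. It is used here for transparency; rank-based formulas give the same value faster on cohorts the size of those pooled in this study.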

  4. Crown-Level Tree Species Classification Using Integrated Airborne Hyperspectral and LIDAR Remote Sensing Data

    NASA Astrophysics Data System (ADS)

    Wang, Z.; Wu, J.; Wang, Y.; Kong, X.; Bao, H.; Ni, Y.; Ma, L.; Jin, J.

    2018-05-01

    Mapping tree species is essential for sustainable planning and for improving our understanding of the ecological services provided by different trees. However, automatic crown-level tree species classification is challenging because of the spectral similarity among diverse tree species, fine-scale spatial variation, shadow, and underlying objects within a crown. Advanced remote sensing data such as airborne Light Detection and Ranging (LiDAR) and hyperspectral imagery offer great potential to derive crown spectral, structural and canopy physiological information at the individual crown scale, which can be useful for mapping tree species. In this paper, an innovative approach was developed for tree species classification at the crown level. The method used LiDAR data for individual tree crown delineation and morphological structure extraction, and Compact Airborne Spectrographic Imager (CASI) hyperspectral imagery for pure crown-scale spectral extraction. Specifically, four steps were included: 1) a weighted mean filtering method was developed to improve the accuracy of the smoothed Canopy Height Model (CHM) derived from LiDAR data; 2) the marker-controlled watershed segmentation algorithm was employed to delineate the tree-level canopy from the CHM image, and individual tree height and crown were then calculated from the delineated crowns; 3) spectral features within 3 × 3 neighborhood regions centered on the treetops detected by the treetop detection algorithm were derived from the spectrally normalized CASI imagery; 4) shape characteristics related to crown diameter and height were established, and crown-level tree species were classified using the combination of spectral and shape characteristics. The results suggest that the classification strategy developed in this paper (OA = 85.12 %, Kc = 0.90) achieved higher classification accuracy than the LiDAR-metrics method (OA = 79.86 %, Kc = 0.81) and the spectral-metrics method (OA = 71.26 %, Kc = 0.69), indicating that advanced data processing and sensitive feature selection are critical for improving the accuracy of crown-level tree species classification.
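
    The two accuracy measures quoted above, overall accuracy (OA) and the kappa coefficient (Kc), are both computed from a confusion matrix. The minimal sketch below uses Cohen's kappa, the usual definition in remote-sensing accuracy assessment. The matrix is a hypothetical two-species example, not the paper's results.

```python
def overall_accuracy_and_kappa(cm):
    """cm[i][j] = number of reference-class-i crowns assigned to class j.
    Returns (overall accuracy, Cohen's kappa)."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    p_observed = sum(cm[i][i] for i in range(k)) / n  # OA: diagonal share
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # agreement expected by chance from the marginal totals
    p_chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return p_observed, (p_observed - p_chance) / (1 - p_chance)


cm = [[50, 10], [5, 35]]  # hypothetical two-species confusion matrix
oa, kappa = overall_accuracy_and_kappa(cm)
print(round(oa, 4), round(kappa, 4))  # 0.85 0.6939
```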

  5. Analyses of rear-end crashes based on classification tree models.

    PubMed

    Yan, Xuedong; Radwan, Essam

    2006-09-01

    Signalized intersections are accident-prone areas, especially for rear-end crashes, because the diversity of drivers' braking behaviors increases during the signal change. The objective of this article is to improve knowledge of the relationship between rear-end crashes occurring at signalized intersections and a series of potential traffic risk factors classified by driver characteristics, environments, and vehicle types. Based on the 2001 Florida crash database, the classification tree method and the quasi-induced exposure concept were used to perform the statistical analysis. Two binary classification tree models were developed in this study. One was used to compare rear-end and non-rear-end crashes in order to identify the specific trends of rear-end crashes. The other was constructed to compare striking vehicles/drivers (at-fault) with struck vehicles/drivers (not-at-fault) to find more complex crash patterns associated with the driver, vehicle, and environment attributes. The modeling results showed that rear-end crashes are over-represented at higher speed limits (45-55 mph); the rear-end crash propensity is noticeably larger in daytime than at nighttime; and the reduced braking capacity on wet and slippery road surfaces clearly contributes to rear-end crashes, especially at intersections with higher speed limits. The tree model segmented drivers into four homogeneous age groups: < 21 years, 21-31 years, 32-75 years, and > 75 years. The youngest driver group shows the largest crash propensity; in the 21-31 age group, male drivers are over-involved in rear-end crashes under adverse weather conditions; and drivers aged 32-75 years driving large vehicles have a larger crash propensity than those driving passenger vehicles. Combined with the quasi-induced exposure concept, the classification tree method is a proper statistical tool for traffic-safety analysis to investigate crash propensity. Compared with logistic regression models, tree models have advantages in handling continuous independent variables and in explaining complex interaction effects among more than two independent variables. This research recommends that, at signalized intersections with higher speed limits, reducing the speed limit to 40 mph could efficiently contribute to a lower accident rate. Alcohol use may increase not only drivers' rear-end crash risk but also their injury severity. Education and enforcement countermeasures should focus on the driver group younger than 21 years. Further studies are suggested to compare crash risk distributions by driver age for other main crash types in order to seek corresponding traffic countermeasures.
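
    Under the quasi-induced exposure concept, not-at-fault (struck) drivers act as a proxy for how much each driver group is exposed to traffic. A common way to express crash propensity under this concept is the relative involvement ratio: a group's share of at-fault drivers divided by its share of not-at-fault drivers, with values above 1 indicating over-involvement. The abstract does not give its exact propensity formula, so treat this ratio as an assumption. The counts below are hypothetical, not the 2001 Florida data.

```python
def relative_involvement_ratios(at_fault, not_at_fault):
    """Quasi-induced exposure: each group's share of at-fault drivers divided by
    its share of not-at-fault drivers (> 1 means over-involvement)."""
    total_af = sum(at_fault.values())
    total_naf = sum(not_at_fault.values())
    ratios = {}
    for group, af in at_fault.items():
        naf = not_at_fault.get(group, 0)
        if naf == 0:
            raise ValueError(f"no not-at-fault drivers recorded for group {group!r}")
        ratios[group] = (af / total_af) / (naf / total_naf)
    return ratios


# Hypothetical striking (at-fault) and struck (not-at-fault) driver counts by age
at_fault = {"<21": 30, "21-31": 40, "32-75": 110, ">75": 20}
not_at_fault = {"<21": 15, "21-31": 40, "32-75": 130, ">75": 15}
for group, ratio in relative_involvement_ratios(at_fault, not_at_fault).items():
    print(group, round(ratio, 2))
```

    In the study, a classification tree is grown on the at-fault/not-at-fault label itself. Terminal nodes with a high share of at-fault drivers then play the role of groups with ratios above 1.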

  6. Comparison of Single and Multi-Scale Method for Leaf and Wood Points Classification from Terrestrial Laser Scanning Data

    NASA Astrophysics Data System (ADS)

    Wei, Hongqiang; Zhou, Guiyun; Zhou, Junjie

    2018-04-01

    The classification of leaf and wood points is an essential preprocessing step for extracting inventory measurements and characterizing tree canopies from terrestrial laser scanning (TLS) data. The geometry-based approach is one of the most widely used classification methods. In the geometry-based method, it is common practice to extract salient features at a single scale before the features are used for classification. It remains unclear how the choice of scale(s) affects classification accuracy and efficiency. To assess this scale effect, we extracted single-scale and multi-scale salient features from the point clouds of two oak trees of different sizes and classified their points as leaf or wood. Our experimental results show that the balanced accuracy of the multi-scale method is about 10 % higher than the average balanced accuracy of the single-scale method for both trees. For each tree, the single-scale classifiers are on average more than 30 times faster than the multi-scale classifier.
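
    The abstract does not list which salient features were extracted. The sketch below therefore assumes the widely used covariance-eigenvalue descriptors (linearity, planarity, scattering) computed from all neighbours within a radius. A single-scale classifier would use one radius; a multi-scale one concatenates the features from several radii, which also shows why it costs more to compute. It uses NumPy with a brute-force neighbour search, which is fine for illustration but slow on real TLS point clouds. The function names and radii are hypothetical.

```python
import numpy as np


def eigen_features(points, center, radius):
    """Linearity, planarity and scattering of the neighbourhood of `center`.

    Eigenvalues l1 >= l2 >= l3 of the neighbourhood covariance are normalised
    by l1. Returns zeros if the neighbourhood is too small or degenerate."""
    dist = np.linalg.norm(points - center, axis=1)
    nbrs = points[dist <= radius]
    if len(nbrs) < 3:
        return np.zeros(3)
    cov = np.cov(nbrs, rowvar=False)  # 3 x 3 covariance of x, y, z
    l1, l2, l3 = np.sort(np.linalg.eigvalsh(cov))[::-1]
    if l1 <= 0:
        return np.zeros(3)
    return np.array([(l1 - l2) / l1, (l2 - l3) / l1, l3 / l1])


def multi_scale_features(points, radii):
    """One row per point: the eigen-features at every radius, concatenated."""
    return np.vstack([
        np.concatenate([eigen_features(points, p, r) for r in radii])
        for p in points
    ])


# Hypothetical "branch": 11 points on a straight line along x
line = np.column_stack([np.linspace(0.0, 1.0, 11), np.zeros(11), np.zeros(11)])
feats = multi_scale_features(line, radii=[0.15, 0.3])
print(feats.shape)  # (11, 6): 3 features x 2 scales
print(feats[5, 0])  # linearity of the middle point at the smallest scale, close to 1
```

    Wood points on branches and stems tend to have high linearity, leaves higher planarity or scattering. A downstream classifier learns these patterns from the feature rows.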

  7. Automated rule-base creation via CLIPS-Induce

    NASA Technical Reports Server (NTRS)

    Murphy, Patrick M.

    1994-01-01

    Many CLIPS rule-bases contain one or more rule groups that perform classification. In this paper we describe CLIPS-Induce, an automated system for the creation of a CLIPS classification rule-base from a set of test cases. CLIPS-Induce consists of two components, a decision tree induction component and a CLIPS production extraction component. ID3, a popular decision tree induction algorithm, is used to induce a decision tree from the test cases. CLIPS production extraction is accomplished through a top-down traversal of the decision tree. Nodes of the tree are used to construct query rules, and branches of the tree are used to construct classification rules. The learned CLIPS productions may easily be incorporated into a large CLIPS system that performs tasks such as accessing a database or displaying information.
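
    The two components described above can be sketched generically. ID3 picks, at each node, the attribute with the largest information gain. A top-down traversal then turns each root-to-leaf path into an IF-THEN rule. This minimal Python sketch is not CLIPS-Induce's code, and it prints plain-text rules instead of CLIPS `defrule` constructs. The attributes and test cases are hypothetical.

```python
import math
from collections import Counter


def entropy(rows, target):
    """Shannon entropy (bits) of the class attribute over a list of dict rows."""
    n = len(rows)
    counts = Counter(r[target] for r in rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def info_gain(rows, attr, target):
    """Entropy reduction obtained by splitting rows on every value of attr."""
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset, target)
    return entropy(rows, target) - remainder


def id3(rows, attrs, target):
    """Return a class label (leaf) or an (attribute, {value: subtree}) node."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]
    if not attrs:
        return Counter(labels).most_common(1)[0][0]  # majority class
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return (best, {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in sorted(set(r[best] for r in rows))})


def extract_rules(tree, conditions=()):
    """Top-down traversal: each root-to-leaf path becomes (conditions, class)."""
    if not isinstance(tree, tuple):
        return [(list(conditions), tree)]
    attr, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(extract_rules(subtree, conditions + ((attr, value),)))
    return rules


# Hypothetical test cases
cases = [
    {"sky": "sunny", "wind": "weak", "class": "go"},
    {"sky": "sunny", "wind": "strong", "class": "stay"},
    {"sky": "rainy", "wind": "weak", "class": "stay"},
    {"sky": "rainy", "wind": "strong", "class": "stay"},
    {"sky": "overcast", "wind": "weak", "class": "go"},
    {"sky": "overcast", "wind": "strong", "class": "go"},
]
tree = id3(cases, ["sky", "wind"], "class")
for conds, label in extract_rules(tree):
    print("IF " + " AND ".join(f"{a} = {v}" for a, v in conds) + f" THEN class = {label}")
```

    On these cases the traversal yields four rules, for example "IF sky = sunny AND wind = weak THEN class = go".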

  8. Fire frequency in the Interior Columbia River Basin: Building regional models from fire history data

    USGS Publications Warehouse

    McKenzie, D.; Peterson, D.L.; Agee, James K.

    2000-01-01

    Fire frequency affects vegetation composition and successional pathways; thus it is essential to understand fire regimes in order to manage natural resources at broad spatial scales. Fire history data are lacking for many regions for which fire management decisions are being made, so models are needed to estimate past fire frequency where local data are not yet available. We developed multiple regression models and tree-based (classification and regression tree, or CART) models to predict fire return intervals across the interior Columbia River basin at 1-km resolution, using georeferenced fire history, potential vegetation, cover type, and precipitation databases. The models combined semiqualitative methods and rigorous statistics. The fire history data are of uneven quality; some estimates are based on only one tree, and many are not cross-dated. Therefore, we weighted the models based on data quality and performed a sensitivity analysis of the effects on the models of estimation errors that are due to lack of cross-dating. The regression models predict fire return intervals from 1 to 375 yr for forested areas, whereas the tree-based models predict a range of 8 to 150 yr. Both types of models predict latitudinal and elevational gradients of increasing fire return intervals. Examination of regional-scale output suggests that, although the tree-based models explain more of the variation in the original data, the regression models are less likely to produce extrapolation errors. Thus, the models serve complementary purposes in elucidating the relationships among fire frequency, the predictor variables, and spatial scale. The models can provide local managers with quantitative information and provide data to initialize coarse-scale fire-effects models, although predictions for individual sites should be treated with caution because of the varying quality and uneven spatial coverage of the fire history database. The models also demonstrate the integration of qualitative and quantitative methods when requisite data for fully quantitative models are unavailable. They can be tested by comparing new, independent fire history reconstructions against their predictions and can be continually updated as better fire history data become available.
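
    The narrower prediction range of the tree-based models reflects a general property: a regression tree predicts the (weighted) mean of a training leaf, so its output stays within the range of the observed values. A linear model, in contrast, continues its trend beyond the data. The minimal sketch below contrasts a weighted least-squares line with a one-split weighted regression tree. It assumes that weighting by data quality means per-site weights in the loss, as the abstract does not specify the scheme. All values are hypothetical.

```python
def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def weighted_linear_fit(x, y, w):
    """Weighted least-squares line; returns a predictor function."""
    mx, my = weighted_mean(x, w), weighted_mean(y, w)
    sxx = sum(wi * (xi - mx) ** 2 for xi, wi in zip(x, w))
    sxy = sum(wi * (xi - mx) * (yi - my) for xi, yi, wi in zip(x, y, w))
    slope = sxy / sxx
    return lambda v: my + slope * (v - mx)


def weighted_stump(x, y, w):
    """One-split regression tree minimising weighted squared error (weights > 0).
    Leaves predict weighted means, so predictions never leave the range of y."""
    pts = sorted(zip(x, y, w))
    best = None
    for i in range(1, len(pts)):
        if pts[i - 1][0] == pts[i][0]:
            continue  # cannot split between equal x values
        left, right = pts[:i], pts[i:]
        ml = weighted_mean([p[1] for p in left], [p[2] for p in left])
        mr = weighted_mean([p[1] for p in right], [p[2] for p in right])
        err = (sum(p[2] * (p[1] - ml) ** 2 for p in left)
               + sum(p[2] * (p[1] - mr) ** 2 for p in right))
        if best is None or err < best[0]:
            best = (err, (pts[i - 1][0] + pts[i][0]) / 2, ml, mr)
    _, threshold, ml, mr = best
    return lambda v: ml if v <= threshold else mr


# Hypothetical gradient vs. fire return interval (yr); higher weights stand in
# for better-quality (e.g. cross-dated) sites.
x = [1, 2, 3, 4, 5, 6]
y = [10, 20, 30, 40, 50, 60]
w = [2, 2, 1, 1, 1, 1]
line = weighted_linear_fit(x, y, w)
stump = weighted_stump(x, y, w)
print(line(20), stump(20))  # 200.0 50.0: the line extrapolates, the tree plateaus
```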

  9. Habitat features and predictive habitat modeling for the Colorado chipmunk in southern New Mexico

    USGS Publications Warehouse

    Rivieccio, M.; Thompson, B.C.; Gould, W.R.; Boykin, K.G.

    2003-01-01

    Two subspecies of Colorado chipmunk (state threatened and federal species of concern) occur in southern New Mexico: Tamias quadrivittatus australis in the Organ Mountains and T. q. oscuraensis in the Oscura Mountains. We developed a GIS model of potentially suitable habitat based on vegetation and elevation features, evaluated site classifications of the GIS model, and determined vegetation and terrain features associated with chipmunk occurrence. We compared GIS model classifications with actual vegetation and elevation features measured at 37 sites. At 60 sites we measured 18 habitat variables regarding slope, aspect, tree species, shrub species, and ground cover. We used logistic regression to analyze habitat variables associated with chipmunk presence/absence. All 37 sample sites (100%; 28 predicted suitable, 9 predicted unsuitable) were classified correctly by the GIS model regarding elevation and vegetation. Of the 28 sites predicted suitable by the GIS model, 18 sites (64%) appeared visually suitable based on habitat variables selected from logistic regression analyses, of which 10 sites (36%) were specifically predicted as suitable habitat via logistic regression. We detected chipmunks at 70% of sites deemed suitable via the logistic regression models. Shrub cover, tree density, plant proximity, presence of logs, and presence of rock outcrop were retained in the logistic model for the Oscura Mountains; litter, shrub cover, and grass cover were retained in the logistic model for the Organ Mountains. Evaluation of predictive models illustrates the need for multi-stage analyses to best judge performance. Microhabitat analyses indicate prospective needs for different management strategies between the subspecies. Sensitivities of each population of the Colorado chipmunk to natural and prescribed fire suggest that partial burnings of areas inhabited by Colorado chipmunks in southern New Mexico may be beneficial. These partial burnings may later help avoid a fire that could substantially reduce chipmunk habitat over a mountain range.

  10. Predicting the risk of patients with biopsy Gleason score 6 to harbor a higher grade cancer.

    PubMed

    Gofrit, Ofer N; Zorn, Kevin C; Taxy, Jerome B; Lin, Shang; Zagaja, Gregory P; Steinberg, Gary D; Shalhav, Arieh L

    2007-11-01

    Prostate cancer Gleason score 3 + 3 = 6 is currently the most common score assigned on prostatic biopsies. We analyzed the clinical variables that predict the likelihood that a patient with biopsy Gleason score 6 harbors a higher grade tumor. The study population consisted of 448 patients with a mean age of 59.1 years who underwent radical prostatectomy between February 2003 and October 2006 for Gleason score 6 adenocarcinoma. The effect of preoperative variables on the probability of a Gleason score upgrade on final pathological evaluation was evaluated using logistic regression and classification and regression tree analysis. Gleason score upgrade was found in 91 of 448 patients (20.3%). Logistic regression showed that only serum prostate specific antigen and the greatest percent of cancer in a core were significantly associated with a score upgrade (p = 0.0014 and 0.023, respectively). Classification and regression tree analysis showed that the risk of a Gleason score upgrade was 62% when serum prostate specific antigen was higher than 12 ng/ml and 18% when serum prostate specific antigen was 12 ng/ml or less. In patients with serum prostate specific antigen of 12 ng/ml or less, the risk of a score upgrade could be dichotomized at a greatest percent of cancer in a core of 5%. The risk was 22.6% and 10.5% when the greatest percent of cancer in a core was higher than 5% and 5% or lower, respectively. The probability that a patient with a prostate biopsy Gleason score of 6 conceals a Gleason score of 7 or higher can be predicted using serum prostate specific antigen and the greatest percent of cancer in a core. With these parameters it is possible to predict upgrade rates as high as 62% and as low as 10.5%.
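
    A fitted classification tree is just nested threshold tests, so the tree reported above can be written down directly. This sketch encodes only the leaves and cut points stated in the abstract (PSA above 12 ng/ml; greatest percent of cancer in a core above 5%). The function name is hypothetical. The returned values are the upgrade rates observed in this cohort, not a validated individual risk calculator.

```python
def gleason_upgrade_risk(psa_ng_ml, max_core_cancer_pct):
    """Reported CART leaves: observed rate of upgrade from biopsy Gleason 6."""
    if psa_ng_ml > 12:                # PSA higher than 12 ng/ml
        return 0.62
    if max_core_cancer_pct > 5:       # PSA <= 12 and greatest % cancer in a core > 5%
        return 0.226
    return 0.105                      # PSA <= 12 and greatest % cancer in a core <= 5%


print(gleason_upgrade_risk(15.0, 3))   # 0.62
print(gleason_upgrade_risk(6.5, 20))   # 0.226
print(gleason_upgrade_risk(6.5, 5))    # 0.105
```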

  11. Diagnostic and Prognostic Value of Long-Axis Strain and Myocardial Contraction Fraction Using Standard Cardiovascular MR Imaging in Patients with Nonischemic Dilated Cardiomyopathies.

    PubMed

    Arenja, Nisha; Riffel, Johannes H; Fritz, Thomas; André, Florian; Aus dem Siepen, Fabian; Mueller-Hennessen, Matthias; Giannitsis, Evangelos; Katus, Hugo A; Friedrich, Matthias G; Buss, Sebastian J

    2017-06-01

    Purpose To assess the utility of established functional markers versus two additional functional markers derived from standard cardiovascular magnetic resonance (MR) images for their incremental diagnostic and prognostic information in patients with nonischemic dilated cardiomyopathy (NIDCM). Materials and Methods Approval was obtained from the local ethics committee. MR images from 453 patients with NIDCM and 150 healthy control subjects were included between 2005 and 2013 and were analyzed retrospectively. Myocardial contraction fraction (MCF) was calculated by dividing left ventricular (LV) stroke volume by LV myocardial volume, and long-axis strain (LAS) was calculated from the distances between the epicardial border of the LV apex and the midpoint of a line connecting the origins of the mitral valve leaflets at end systole and end diastole. Receiver operating characteristic curve, Kaplan-Meier method, Cox regression, and classification and regression tree (CART) analyses were performed for diagnostic and prognostic performances. Results LAS (area under the receiver operating characteristic curve [AUC] = 0.93, P < .001) and MCF (AUC = 0.92, P < .001) can be used to discriminate patients with NIDCM from age- and sex-matched control subjects. A total of 97 patients reached the combined end point during a median follow-up of 4.8 years. In multivariate Cox regression analysis, only LV ejection fraction (EF) and LAS independently indicated the combined end point (hazard ratio = 2.8 and 1.9, respectively; P < .001 for both). In a risk stratification approach with classification and regression tree analysis, combined LV EF and LAS cutoff values were used to stratify patients into three risk groups (log-rank test, P < .001). Conclusion Cardiovascular MR-derived MCF and LAS serve as reliable diagnostic and prognostic markers in patients with NIDCM. LAS, as a marker for longitudinal contractile function, is an independent parameter for outcome and offers incremental information beyond LV EF and the presence of myocardial fibrosis. © RSNA, 2017 Online supplemental material is available for this article.
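
    The two markers defined above are simple to compute once the volumes and distances have been measured. MCF is stroke volume (end-diastolic minus end-systolic volume) divided by myocardial volume. The exact LAS formula is not given in the abstract, so this sketch assumes the usual percent-change convention: end-systolic distance minus end-diastolic distance, divided by the end-diastolic distance. That convention makes LAS negative when the ventricle shortens. Function names and input values are hypothetical.

```python
def stroke_volume(edv_ml, esv_ml):
    """LV stroke volume = end-diastolic volume - end-systolic volume (mL)."""
    return edv_ml - esv_ml


def myocardial_contraction_fraction(sv_ml, myocardial_volume_ml):
    """MCF as described above: LV stroke volume divided by LV myocardial volume."""
    return sv_ml / myocardial_volume_ml


def long_axis_strain(dist_ed_mm, dist_es_mm):
    """Assumed convention: percent change of the apex-to-mitral-midpoint distance
    from end diastole to end systole (negative when the ventricle shortens)."""
    return (dist_es_mm - dist_ed_mm) / dist_ed_mm * 100.0


sv = stroke_volume(150.0, 80.0)                   # hypothetical volumes, mL
print(myocardial_contraction_fraction(sv, 140.0))  # 0.5
print(long_axis_strain(100.0, 88.0))               # about -12 (%)
```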

  12. VLSI implementation of flexible architecture for decision tree classification in data mining

    NASA Astrophysics Data System (ADS)

    Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak

    2017-07-01

    Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a tremendous rise in the size of the data being collected and analyzed. Classification is a central problem in data mining. Among the solutions developed for this problem, one of the most widely accepted is Decision Tree Classification (DTC), which gives high precision while handling very large amounts of data. This paper presents a VLSI implementation of a flexible architecture for decision tree classification in data mining using the C4.5 algorithm.
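
    The hardware architecture cannot be reconstructed from the abstract. The C4.5 algorithm it implements, however, has a well-known split criterion: the gain ratio, which is information gain divided by the split information (the entropy of the attribute's own value distribution). The minimal sketch below computes it for a categorical attribute and shows why C4.5 prefers it to plain information gain: an attribute with a unique value per record gets the same gain but is penalised. C4.5's handling of continuous attributes, missing values and pruning is not shown. The data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 split criterion for a categorical attribute: information gain
    divided by the split information (entropy of the attribute's own values)."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += len(subset) / n * entropy(subset)
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


labels = ["yes", "yes", "no", "no"]
print(gain_ratio(["a", "a", "b", "b"], labels))  # 1.0
print(gain_ratio([1, 2, 3, 4], labels))          # 0.5: a unique-ID attribute is penalised
```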

  13. Validation of statistical predictive models meant to select melanoma patients for sentinel lymph node biopsy.

    PubMed

    Sabel, Michael S; Rice, John D; Griffith, Kent A; Lowe, Lori; Wong, Sandra L; Chang, Alfred E; Johnson, Timothy M; Taylor, Jeremy M G

    2012-01-01

    To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid sentinel lymph node biopsy (SLNB), several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests, and support vector machines. We sought to validate recently published models meant to predict sentinel node status. We queried our comprehensive, prospectively collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon four published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false-negative rate (FNR). Logistic regression performed comparably with our data when considering NPV (89.4 versus 93.6%); however, the model's specificity was not high enough to significantly reduce the rate of biopsies (SLN reduction rate of 2.9%). When applied to our data, the classification tree produced NPV and reduction in biopsy rates that were lower (87.7 versus 94.1 and 29.8 versus 14.3, respectively). Two published models could not be applied to our data due to model complexity and the use of proprietary software. Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset, or could not be validated. Differences in selection criteria and histopathologic interpretation likely resulted in underperformance. Statistical predictive models must be developed in a clinically applicable manner to allow for both validation and ultimately clinical utility.
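
    The three metrics used to compare the published models can all be read off a 2 × 2 table of predicted versus actual sentinel node status. This sketch assumes the standard definitions: NPV is the share of predicted-negative patients who are truly node-negative, RNP is the share of all patients predicted negative (those who would skip SLNB), and FNR is the share of node-positive patients predicted negative. The counts are hypothetical.

```python
def nodal_prediction_metrics(tp, fn, tn, fp):
    """NPV, RNP and FNR from a 2x2 table; 'positive' = positive sentinel node."""
    n = tp + fn + tn + fp
    predicted_negative = tn + fn
    return {
        "NPV": tn / predicted_negative,  # predicted-negative patients truly node-negative
        "RNP": predicted_negative / n,   # share of patients who would skip SLNB
        "FNR": fn / (fn + tp),           # node-positive patients the model misses
    }


# Hypothetical cohort of 1000 patients
print(nodal_prediction_metrics(tp=40, fn=10, tn=450, fp=500))
```

    The trade-off discussed above is visible in these formulas. Raising RNP (fewer biopsies) usually moves some node-positive patients into the predicted-negative group, which lowers NPV and raises FNR.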

  14. Identifying the critical success factors in the coverage of low vision services using the classification analysis and regression tree methodology.

    PubMed

    Chiang, Peggy Pei-Chia; Xie, Jing; Keeffe, Jill Elizabeth

    2011-04-25

    To identify the critical success factors (CSF) associated with coverage of low vision services. Data were collected from a survey distributed to Vision 2020 contacts, government, and non-government organizations (NGOs) in 195 countries. Classification and Regression Tree (CART) analysis was used to identify the critical success factors of low vision service coverage. Independent variables were sourced from the survey: policies, epidemiology, provision of services, equipment and infrastructure, barriers to services, human resources, and monitoring and evaluation. Socioeconomic and demographic independent variables (health expenditure, population statistics, development status, and human resources in general) were sourced from the World Health Organization (WHO), World Bank, and the United Nations (UN). The findings identified that having >50% of children obtaining devices when prescribed (χ² = 44; P < 0.001), multidisciplinary care (χ² = 14.54; P = 0.002), >3 rehabilitation workers per 10 million of population (χ² = 4.50; P = 0.034), a higher percentage of population urbanized (χ² = 14.54; P = 0.002), a level of private investment (χ² = 14.55; P = 0.015), and being fully funded by government (χ² = 6.02; P = 0.014) are critical success factors associated with coverage of low vision services. This study identified the most important predictors for countries with better low vision coverage. CART is a useful and suitable methodology in survey research and is a novel way to simplify a complex global public health issue in eye care.

  15. Use of admission serum lactate and sodium levels to predict mortality in necrotizing soft-tissue infections.

    PubMed

    Yaghoubian, Arezou; de Virgilio, Christian; Dauphine, Christine; Lewis, Roger J; Lin, Matthew

    2007-09-01

    Simple admission laboratory values can be used to classify patients with necrotizing soft-tissue infection (NSTI) into high and low mortality risk groups. Chart review. Public teaching hospital. All patients with NSTI from 1997 through 2006. Variables analyzed included medical history, admission vital signs, laboratory values, and microbiologic findings. Data analyses included univariate and classification and regression tree analyses. Mortality. One hundred twenty-four patients were identified with NSTI. The overall mortality rate was 21 of 124 (17%). On univariate analysis, factors associated with mortality included a history of cancer (P = .03), intravenous drug abuse (P < .001), low systolic blood pressure on admission (P = .03), base deficit (P = .009), and elevated white blood cell count (P = .06). On exploratory classification and regression tree analysis, admission serum lactate and sodium levels were predictors of mortality, with a sensitivity of 100%, specificity of 28%, positive predictive value of 23%, and negative predictive value of 100%. A serum lactate level greater than or equal to 54.1 mg/dL (6 mmol/L) alone was associated with a 32% mortality, whereas a serum sodium level greater than or equal to 135 mEq/L combined with a lactate level less than 54.1 mg/dL was associated with a mortality of 0%. Mortality for NSTIs remains high. A simple model, using admission serum lactate and serum sodium levels, may help identify patients at greatest risk for death.
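
    The exploratory tree above reduces to two admission cut-offs, which can be written as a small lookup. This sketch encodes only the groups and mortality figures stated in the abstract. Patients with lactate below 54.1 mg/dL and sodium below 135 mEq/L fall into a group whose mortality the abstract does not report, so the sketch says so rather than guessing. The function name is hypothetical.

```python
def nsti_risk_group(lactate_mg_dl, sodium_meq_l):
    """Groups from the exploratory CART analysis (cut-offs as stated in the abstract)."""
    if lactate_mg_dl >= 54.1:  # about 6 mmol/L
        return "high risk (32% mortality reported)"
    if sodium_meq_l >= 135:
        return "low risk (0% mortality reported)"
    return "not reported in the abstract"


print(nsti_risk_group(60.0, 138))  # high risk (32% mortality reported)
print(nsti_risk_group(30.0, 138))  # low risk (0% mortality reported)
print(nsti_risk_group(30.0, 130))  # not reported in the abstract
```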

  16. Validation of Statistical Predictive Models Meant to Select Melanoma Patients for Sentinel Lymph Node Biopsy

    PubMed Central

    Sabel, Michael S.; Rice, John D.; Griffith, Kent A.; Lowe, Lori; Wong, Sandra L.; Chang, Alfred E.; Johnson, Timothy M.; Taylor, Jeremy M.G.

    2013-01-01

    Introduction To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid SLN biopsy (SLNB), several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests and support vector machines. We sought to validate recently published models meant to predict sentinel node status. Methods We queried our comprehensive, prospectively collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon 4 published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false negative rate (FNR). Results Logistic regression performed comparably with our data when considering NPV (89.4% vs. 93.6%); however, the model’s specificity was not high enough to significantly reduce the rate of biopsies (SLN reduction rate of 2.9%). When applied to our data, the classification tree produced NPV and reduction in biopsy rates that were lower (87.7% vs. 94.1% and 29.8% vs. 14.3%, respectively). Two published models could not be applied to our data due to model complexity and the use of proprietary software. Conclusions Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset, or could not be validated. Differences in selection criteria and histopathologic interpretation likely resulted in underperformance. Statistical predictive models must be developed in a clinically applicable manner to allow for both validation and ultimately clinical utility. PMID:21822550

  17. Topographic influences on vegetation mosaics and tree diversity in the Chihuahuan Desert Borderlands.

    PubMed

    Poulos, Helen M; Camp, Ann E

    2010-04-01

    The abundance and distribution of species reflect how the niche requirements of species and the dynamics of populations interact with spatial and temporal variation in the environment. This study investigated the influence of geographical variation in environmental site conditions on tree dominance and diversity patterns in three topographically dissected mountain ranges in west Texas, USA, and northern Mexico. We measured tree abundance and basal area using a systematic sampling design across the forested areas of three mountain ranges and related these data to a suite of environmental parameters derived from field and digital elevation model data. We employed cluster analysis, classification and regression trees (CART), and rarefaction to identify (1) the dominant forest cover types across the three study sites and (2) environmental influences on tree distribution and diversity patterns. Elevation, topographic position, and incident solar radiation were the major influences on tree dominance and diversity. Mesic valley bottoms hosted high-diversity vegetation types, while hotter and drier mid-slopes and ridgetops supported lower tree diversity. Valley bottoms and other topographic positions shared few species, indicating high species turnover at the landscape scale. Mountain ranges with high topographic complexity also had higher species richness, suggesting that geographical variability in environmental conditions was a major influence on tree diversity. This study stressed the importance of landscape- and regional-scale topographic variability as a key factor controlling vegetation pattern and diversity in southwestern North America.

  18. Automated morphological analysis of bone marrow cells in microscopic images for diagnosis of leukemia: nucleus-plasma separation and cell classification using a hierarchical tree model of hematopoiesis

    NASA Astrophysics Data System (ADS)

    Krappe, Sebastian; Wittenberg, Thomas; Haferlach, Torsten; Münzenmayer, Christian

    2016-03-01

    The morphological differentiation of bone marrow is fundamental for the diagnosis of leukemia. Currently, the counting and classification of the different types of bone marrow cells is done manually using bright-field microscopy. This is a time-consuming, subjective, tedious and error-prone process. Furthermore, repeated examinations of a slide may yield intra- and inter-observer variances. For that reason, a computer-assisted diagnosis system for bone marrow differentiation is pursued. In this work we focus (a) on a new method for the separation of nucleus and plasma parts and (b) on a knowledge-based hierarchical tree classifier for the differentiation of bone marrow cells into 16 different classes. Classification trees are easily interpretable and understandable and provide a classification together with an explanation. Using classification trees, expert knowledge (i.e. knowledge about similar classes and cell lines in the tree model of hematopoiesis) is integrated into the structure of the tree. The proposed segmentation method is evaluated with more than 10,000 manually segmented cells. For the evaluation of the proposed hierarchical classifier, more than 140,000 automatically segmented bone marrow cells are used. Future automated solutions for the morphological analysis of bone marrow smears could potentially apply such an approach for the pre-classification of bone marrow cells and thereby shorten the examination time.
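
    In a knowledge-based hierarchical classifier, the shape of the tree is fixed by expert knowledge, such as related classes and cell lines, and each inner node only has to choose among its children. The path taken through the tree then serves as the explanation for the classification. The minimal sketch below shows that structure. The node names, features and thresholds are made up for illustration and are not the paper's 16-class model or its trained node classifiers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Node:
    """A node of a knowledge-based classification tree.

    Inner nodes carry a `route` function that picks one child by name;
    leaves (route=None) are final cell classes."""
    name: str
    route: Optional[Callable[[dict], str]] = None
    children: Dict[str, "Node"] = field(default_factory=dict)


def classify(root: Node, cell: dict) -> Tuple[str, List[str]]:
    """Return the predicted class and the root-to-leaf path as an explanation."""
    node, path = root, [root.name]
    while node.route is not None:
        node = node.children[node.route(cell)]
        path.append(node.name)
    return node.name, path


# Hypothetical two-level hierarchy with made-up features and thresholds
lineage_a = Node("lineage A",
                 route=lambda c: "class A1" if c["nucleus_area"] > 50 else "class A2",
                 children={"class A1": Node("class A1"), "class A2": Node("class A2")})
root = Node("bone marrow cell",
            route=lambda c: "lineage A" if c["granularity"] > 0.5 else "class B",
            children={"lineage A": lineage_a, "class B": Node("class B")})

print(classify(root, {"granularity": 0.8, "nucleus_area": 30}))
# ('class A2', ['bone marrow cell', 'lineage A', 'class A2'])
```

    In a real system each `route` would be a trained classifier specialised for the few classes below that node, rather than a single hand-set threshold.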

  19. Application of data mining techniques to explore predictors of upper urinary tract damage in patients with neurogenic bladder.

    PubMed

    Fang, H; Lu, B; Wang, X; Zheng, L; Sun, K; Cai, W

    2017-08-17

    This study proposed a decision tree model to screen for upper urinary tract damage (UUTD) in patients with neurogenic bladder (NGB). Thirty-four NGB patients with UUTD were recruited into the case group, while 78 without UUTD were included in the control group. A decision tree method, classification and regression tree (CART), was then applied to develop the model, in which UUTD was used as the dependent variable and history of urinary tract infections, bladder management, conservative treatment, and urodynamic findings were used as independent variables. The urethral function factor was found to be the primary screening information and was treated as the root node of the tree; Pabd max (maximum abdominal pressure, >14 cmH2O), Pves max (maximum intravesical pressure, ≤89 cmH2O), and gender (female) were also variables associated with UUTD. The accuracy of the proposed model was 84.8%, and the area under the curve was 0.901 (95% CI = 0.844-0.958), suggesting that the decision tree model might provide a new and convenient way to screen for UUTD in NGB patients in both undeveloped and developing areas.
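
    For a binary outcome such as UUTD, CART chooses cut points like the pressure thresholds above by minimising the weighted Gini impurity of the two child nodes. The minimal sketch below searches one numeric predictor. The pressures and labels are hypothetical and are not the study's data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gini_split(x, labels):
    """Best single threshold on x by weighted Gini impurity (CART classification).
    Returns (threshold, impurity); threshold is None if no split helps."""
    pairs = sorted(zip(x, labels))
    n = len(pairs)
    best_threshold, best_impurity = None, gini(labels)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # tied x values cannot be separated
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_impurity = impurity
    return best_threshold, best_impurity


# Hypothetical pressures (cmH2O) and damage labels, not the study's data
pressure = [5, 9, 12, 20, 25, 31, 40, 44]
damage = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_gini_split(pressure, damage))  # (22.5, 0.0)
```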

  20. Modified TAROT for cross-selling personal financial products

    NASA Astrophysics Data System (ADS)

    Tee, Ya-Mei; Lee, Lai-Soon; Lee, Chew-Ging; Seow, Hsin-Vonn

    2014-09-01

    The Top Application characteristics Remainder Offer characteristics Tree (TAROT) was first introduced in 2007. It is a modified Classification and Regression Tree (CART) used to decide which question(s) to ask potential applicants. The answers are used to customise an offer of a personal financial product so that it has a high probability of take-up. In this work, the authors further modify TAROT into cross TAROT, using its properties and modeling steps to address cross-selling. Because the bank already has existing customers, it is well placed to cross-sell financial products. Based on the initial offer, one can ask one or more further questions to identify and customise another financial product to offer.

  1. Detection of fraudulent financial statements using the hybrid data mining approach.

    PubMed

    Chen, Suduan

    2016-01-01

    The purpose of this study is to construct a valid and rigorous model for detecting fraudulent financial statements (FFS). The research subjects are companies that issued both fraudulent and non-fraudulent financial statements between 2002 and 2013. In the first stage, two decision tree algorithms are applied to select the major variables: classification and regression trees (CART) and the chi-squared automatic interaction detector (CHAID). The second stage combines CART, CHAID, a Bayesian belief network, a support vector machine and an artificial neural network to construct FFS detection models. According to the results, the CHAID-CART model is the most effective, with an overall accuracy of 87.97% and an FFS detection accuracy of 92.69%.
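
    CHAID screens predictors with chi-squared tests of independence between each categorical predictor and the target. The sketch below computes only the Pearson chi-square statistic from contingency tables and ranks predictors by it. The variable names and counts are hypothetical. Full CHAID also merges categories and uses Bonferroni-adjusted p-values, and this sketch omits both steps.

```python
from typing import Dict, List


def chi_square(table: List[List[int]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # an empty row/column contributes nothing
                stat += (observed - expected) ** 2 / expected
    return stat


def rank_predictors(tables: Dict[str, List[List[int]]]) -> List[str]:
    """Order predictors from strongest to weakest association with the target."""
    return sorted(tables, key=lambda name: chi_square(tables[name]), reverse=True)


# Hypothetical 2 x 2 tables: rows = predictor level, columns = (fraud, non-fraud).
tables = {
    "var_a": [[30, 10], [10, 30]],
    "var_b": [[20, 20], [21, 19]],
}
print(chi_square(tables["var_a"]))  # 20.0
print(rank_predictors(tables))      # ['var_a', 'var_b']
```

    The statistic has (r - 1)(c - 1) degrees of freedom. Turning it into a p-value requires the chi-square distribution, for example from a statistics library.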

  2. Stacked Denoising Autoencoders Applied to Star/Galaxy Classification

    NASA Astrophysics Data System (ADS)

    Qin, Hao-ran; Lin, Ji-ming; Wang, Jun-yi

    2017-04-01

    In recent years, deep learning algorithms have become increasingly popular because of their strong adaptability, high accuracy and structural complexity, but they have not yet been used in astronomy. Star/galaxy classification accuracy on Sloan Digital Sky Survey (SDSS) data is high for bright source sets but low for faint source sets. To address this problem, we introduced a deep learning algorithm, the stacked denoising autoencoder (SDA) neural network, together with the dropout fine-tuning technique; these can greatly improve robustness and noise resistance. We randomly selected bright and faint source sets with spectroscopic measurements from SDSS DR12 and DR7 and preprocessed them. We then randomly selected training and testing sets without replacement from the bright and faint source sets. Finally, we used these training sets to train SDA models for the bright and faint sources in SDSS DR7 and DR12, respectively. On the DR12 testing set, we compared the SDA model with the Library for Support Vector Machines (LibSVM), J48 decision tree, Logistic Model Tree (LMT), Support Vector Machine (SVM), Logistic Regression, and Decision Stump algorithms. On the DR7 testing set, we compared it with six kinds of decision trees. The experiments show that SDA achieves better classification accuracy than the other machine learning algorithms on the faint source sets of DR7 and DR12. In particular, when the completeness function is used as the evaluation index, the correctness rate of SDA improved by about 15% compared with the decision tree algorithms on the faint source set of SDSS DR7.
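
    The abstract does not define its "completeness function". Assuming the usual astronomical convention, completeness for a class is the fraction of true members that the classifier recovers (recall). It is often reported alongside purity (precision). A minimal Python sketch with made-up labels:

```python
from typing import Sequence


def completeness(truth: Sequence[str], predicted: Sequence[str], cls: str) -> float:
    """Fraction of true members of `cls` that were classified as `cls` (recall)."""
    members = [p for t, p in zip(truth, predicted) if t == cls]
    if not members:
        return float("nan")
    return sum(p == cls for p in members) / len(members)


def purity(truth: Sequence[str], predicted: Sequence[str], cls: str) -> float:
    """Fraction of objects classified as `cls` that truly belong to it (precision)."""
    assigned = [t for t, p in zip(truth, predicted) if p == cls]
    if not assigned:
        return float("nan")
    return sum(t == cls for t in assigned) / len(assigned)


truth = ["galaxy", "galaxy", "star", "galaxy", "star"]
predicted = ["galaxy", "star", "star", "galaxy", "star"]
print(completeness(truth, predicted, "galaxy"))  # 0.6666666666666666
print(purity(truth, predicted, "galaxy"))        # 1.0
```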

  3. Comparison of Naive Bayes and Decision Tree on Feature Selection Using Genetic Algorithm for Classification Problem

    NASA Astrophysics Data System (ADS)

    Rahmadani, S.; Dongoran, A.; Zarlis, M.; Zakarias

    2018-03-01

    This paper discusses feature selection using genetic algorithms (GA) on datasets for classification problems. The classification models used are the decision tree (DT) and Naive Bayes. We discuss how the Naive Bayes and decision tree models address the classification problem on a dataset whose features are selected using a GA. We then compare the performance of both models to determine whether accuracy increases. The results show that accuracy increases when features are selected using the GA. The proposed models are referred to as GADT (GA-Decision Tree) and GANB (GA-Naive Bayes). The datasets tested in this paper are taken from the UCI Machine Learning Repository.
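
    GA-based feature selection can be sketched as evolving binary masks over the feature set, where 1 means the feature is kept. The Python below uses binary tournament selection, one-point crossover and bit-flip mutation. These operator choices and the parameter values are assumptions. `toy_fitness` is a hypothetical stand-in for the validation accuracy of a Naive Bayes or decision tree model trained on the selected features.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]


def ga_select(n_features: int, fitness: Callable[[Mask], float],
              pop_size: int = 20, generations: int = 30,
              mutation_rate: float = 0.05, seed: int = 0) -> Tuple[Mask, float]:
    """Evolve binary feature masks (1 = keep feature); return the best found."""
    if n_features < 2:
        raise ValueError("one-point crossover needs at least two features")
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament() -> Mask:
        # Binary tournament: the fitter of two random individuals is selected.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]                      # bit-flip mutation
            children.append(child)
        population = children
        generation_best = max(population, key=fitness)
        if fitness(generation_best) > fitness(best):
            best = generation_best  # record of the best mask seen so far
    return best, fitness(best)


# Hypothetical fitness: reward 3 "informative" features, penalise mask size.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask: Mask) -> float:
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


mask, score = ga_select(8, toy_fitness)
print(mask, score)
```

    In real use, each fitness call trains and validates a classifier, so caching fitness values per mask is worthwhile.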

  4. A novel tree-based procedure for deciphering the genomic spectrum of clinical disease entities.

    PubMed

    Mbogning, Cyprien; Perdry, Hervé; Toussile, Wilson; Broët, Philippe

    2014-01-01

    Dissecting the genomic spectrum of clinical disease entities is a challenging task. Recursive partitioning (classification tree) methods provide powerful tools for exploring the complex interplay among genomic factors with respect to a main factor, and they can reveal hidden genomic patterns. To take confounding variables into account, the partially linear tree-based regression (PLTR) model was recently published. It combines regression models with tree-based methodology. However, it is computationally burdensome and not well suited to situations in which a large number of explanatory variables is expected. We developed a novel procedure as an alternative to the original PLTR procedure and considered different selection criteria. A simulation study with different scenarios was performed to compare the performance of the proposed procedure with the original PLTR strategy. With a Bayesian Information Criterion (BIC), the proposed procedure detected the hidden structure well compared with the original procedure. The novel procedure was used to analyze patterns of copy-number alterations in lung adenocarcinomas with respect to Kirsten Rat Sarcoma Viral Oncogene Homolog gene (KRAS) mutation status, while controlling for a cohort effect. The results highlight two subgroups of pure or nearly pure wild-type KRAS tumors with particular copy-number alteration patterns. The proposed procedure with a BIC criterion represents a powerful and practical alternative to the original procedure. Our procedure performs well in a general framework and is simple to implement.
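
    The Bayesian Information Criterion trades goodness of fit against model size: BIC = k ln(n) - 2 ln(L). Here k is the number of estimated parameters, n the number of observations and L the maximized likelihood, and lower BIC is better. The Python sketch below applies BIC to two hypothetical candidate trees whose log-likelihoods and parameter counts are invented for illustration. It is not the authors' PLTR selection procedure.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Hypothetical candidates: (maximized log-likelihood, number of parameters).
candidates = {"tree_3_leaves": (-120.0, 5), "tree_6_leaves": (-115.0, 10)}
n_obs = 200
scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(best)  # tree_3_leaves
```

    In this example, the larger tree improves -2 ln(L) by 10. It also adds 5 × ln(200) ≈ 26.5 to the penalty, so BIC prefers the smaller tree.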

  5. Using decision trees to understand structure in missing data

    PubMed Central

    Tierney, Nicholas J; Harden, Fiona A; Harden, Maurice J; Mengersen, Kerrie L

    2015-01-01

    Objectives Demonstrate the application of decision trees—classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs)—to understand structure in missing data. Setting Data taken from employees at 3 different industrial sites in Australia. Participants 7915 observations were included. Materials and methods The approach was evaluated using an occupational health data set comprising results of questionnaires, medical tests and environmental monitoring. Statistical methods included standard statistical tests and the ‘rpart’ and ‘gbm’ packages for CART and BRT analyses, respectively, from the statistical software ‘R’. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results CART and BRT models were effective in highlighting a missingness structure in the data, related to the type of data (medical or environmental), the site in which it was collected, the number of visits, and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured as compared to structured missingness. Discussion Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Conclusions Researchers are encouraged to use CART and BRT models to explore and understand missing data. PMID:26124509
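
    The core idea is to treat a missingness indicator as the response of a tree. It can be sketched in Python without rpart: score each candidate predictor by how much it reduces the Gini impurity of the 0/1 "is missing" flag. For brevity, this sketch uses a multiway split on each categorical variable rather than CART's binary splits, and the records are invented toy data.

```python
from typing import Dict, List, Sequence


def gini_binary(flags: Sequence[int]) -> float:
    """Gini impurity of a 0/1 indicator, 2p(1 - p)."""
    if not flags:
        return 0.0
    p = sum(flags) / len(flags)
    return 2.0 * p * (1.0 - p)


def impurity_decrease(levels: Sequence[str], missing: Sequence[int]) -> float:
    """Drop in Gini impurity of `missing` after a multiway split on `levels`."""
    n = len(missing)
    groups: Dict[str, List[int]] = {}
    for level, flag in zip(levels, missing):
        groups.setdefault(level, []).append(flag)
    weighted = sum(len(g) / n * gini_binary(g) for g in groups.values())
    return gini_binary(missing) - weighted


# Invented records; None marks a missing measurement.
records = [
    {"site": "A", "data_type": "medical", "value": 1.2},
    {"site": "A", "data_type": "environmental", "value": None},
    {"site": "B", "data_type": "medical", "value": 0.7},
    {"site": "B", "data_type": "environmental", "value": None},
    {"site": "C", "data_type": "medical", "value": 2.1},
    {"site": "C", "data_type": "environmental", "value": 0.4},
]
missing = [1 if r["value"] is None else 0 for r in records]
scores = {var: impurity_decrease([r[var] for r in records], missing)
          for var in ("site", "data_type")}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['data_type', 'site']
```

    A high score means the variable explains where values go missing, which points to structured rather than random missingness.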

  6. Establishing Decision Trees for Predicting Successful Postpyloric Nasoenteric Tube Placement in Critically Ill Patients.

    PubMed

    Chen, Weisheng; Sun, Cheng; Wei, Ru; Zhang, Yanlin; Ye, Heng; Chi, Ruibin; Zhang, Yichen; Hu, Bei; Lv, Bo; Chen, Lifang; Zhang, Xiunong; Lan, Huilan; Chen, Chunbo

    2016-08-31

    Despite the use of prokinetic agents, the overall success rate for postpyloric placement via a self-propelled spiral nasoenteric tube is quite low. This retrospective study was conducted in the intensive care units of 11 university hospitals from 2006 to 2016 among adult patients who underwent self-propelled spiral nasoenteric tube insertion. Success was defined as postpyloric nasoenteric tube placement confirmed by abdominal x-ray scan 24 hours after tube insertion. Chi-square automatic interaction detection (CHAID), simple classification and regression trees (SimpleCart), and J48 methodologies were used to develop decision tree models, and multiple logistic regression (LR) methodology was used to develop an LR model for predicting successful postpyloric nasoenteric tube placement. The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of these models. Successful postpyloric nasoenteric tube placement was confirmed in 427 of 939 patients enrolled. For predicting successful postpyloric nasoenteric tube placement, the performance of the 3 decision trees was similar in terms of the AUCs: 0.715 for the CHAID model, 0.682 for the SimpleCart model, and 0.671 for the J48 model. The AUC of the LR model was 0.729, which outperformed the J48 model. Both the CHAID and LR models achieved an acceptable discrimination for predicting successful postpyloric nasoenteric tube placement and were useful for intensivists in the setting of self-propelled spiral nasoenteric tube insertion. © 2016 American Society for Parenteral and Enteral Nutrition.
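
    AUC, the evaluation metric used above, equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. Ties count one half. A direct O(n·m) Python sketch with made-up scores:

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up predicted probabilities of successful placement (1 = success).
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
print(auc(scores, labels))  # 0.8333333333333334
```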

  8. Data mining for rapid prediction of facility fit and debottlenecking of biomanufacturing facilities.

    PubMed

    Yang, Yang; Farid, Suzanne S; Thornhill, Nina F

    2014-06-10

    Higher titre processes can pose facility fit challenges in legacy biopharmaceutical purification suites with capacities originally matched to lower titre processes. Bottlenecks caused by mismatches in equipment sizes, combined with process fluctuations upon scale-up, can result in discarding expensive product. This paper describes a data mining decisional tool for rapid prediction of facility fit issues and debottlenecking of biomanufacturing facilities exposed to batch-to-batch variability and higher titres. The predictive tool comprised advanced multivariate analysis techniques to interrogate Monte Carlo stochastic simulation datasets that mimicked batch fluctuations in cell culture titres, step yields and chromatography eluate volumes. A decision tree classification method, CART (classification and regression tree) was introduced to explore the impact of these process fluctuations on product mass loss and reveal the root causes of bottlenecks. The resulting pictorial decision tree determined a series of if-then rules for the critical combinations of factors that lead to different mass loss levels. Three different debottlenecking strategies were investigated involving changes to equipment sizes, using higher capacity chromatography resins and elution buffer optimisation. The analysis compared the impact of each strategy on mass output, direct cost of goods per gram and processing time, as well as consideration of extra capital investment and space requirements. Copyright © 2014 The Authors. Published by Elsevier B.V. All rights reserved.
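
    Each root-to-leaf path of a fitted tree corresponds to one if-then rule. The sketch below assumes a tree stored as nested dictionaries. This format is hypothetical and was chosen for the sketch; it is not the output of any particular CART package. The rules are collected recursively, and the factor names and thresholds are illustrative only.

```python
from typing import List, Tuple


def extract_rules(node: dict, conditions: Tuple[str, ...] = ()) -> List[str]:
    """Turn every root-to-leaf path of a nested-dict tree into an IF-THEN rule."""
    if "leaf" in node:
        premise = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {premise} THEN {node['leaf']}"]
    feature, threshold = node["feature"], node["threshold"]
    return (extract_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
            + extract_rules(node["right"], conditions + (f"{feature} > {threshold}",)))


# Hypothetical tree: factor names and thresholds are illustrative only.
tree = {
    "feature": "titre_g_per_L", "threshold": 5.0,
    "left": {"leaf": "low mass loss"},
    "right": {
        "feature": "eluate_volume_L", "threshold": 200.0,
        "left": {"leaf": "medium mass loss"},
        "right": {"leaf": "high mass loss"},
    },
}
for rule in extract_rules(tree):
    print(rule)
# IF titre_g_per_L <= 5.0 THEN low mass loss
# IF titre_g_per_L > 5.0 AND eluate_volume_L <= 200.0 THEN medium mass loss
# IF titre_g_per_L > 5.0 AND eluate_volume_L > 200.0 THEN high mass loss
```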

  9. Land cover and forest formation distributions for St. Kitts, Nevis, St. Eustatius, Grenada and Barbados from decision tree classification of cloud-cleared satellite imagery

    USGS Publications Warehouse

    Helmer, E.H.; Kennaway, T.A.; Pedreros, D.H.; Clark, M.L.; Marcano-Vega, H.; Tieszen, L.L.; Ruzycki, T.R.; Schill, S.R.; Carrington, C.M.S.

    2008-01-01

    Satellite image-based mapping of tropical forests is vital to conservation planning. Standard methods for automated image classification, however, limit classification detail in complex tropical landscapes. In this study, we test an approach to Landsat image interpretation on four islands of the Lesser Antilles, including Grenada and St. Kitts, Nevis and St. Eustatius, testing a more detailed classification than earlier work in the latter three islands. Secondly, we estimate the extents of land cover and protected forest by formation for five islands and ask how land cover has changed over the second half of the 20th century. The image interpretation approach combines image mosaics and ancillary geographic data, classifying the resulting set of raster data with decision tree software. Cloud-free image mosaics for one or two seasons were created by applying regression tree normalization to scene dates that could fill cloudy areas in a base scene. Such mosaics are also known as cloud-filled, cloud-minimized or cloud-cleared imagery, mosaics, or composites. The approach accurately distinguished several classes that more standard methods would confuse; the seamless mosaics aided reference data collection; and the multiseason imagery allowed us to separate drought deciduous forests and woodlands from semi-deciduous ones. Cultivated land areas declined 60% to 100% from about 1945 to 2000 on several islands. Meanwhile, forest cover has increased 50% to 950%. This trend will likely continue where sugar cane cultivation has dominated. As on the island of Puerto Rico, most higher-elevation forest formations are protected in formal or informal reserves. Similarly, lowland forests, which are drier forest types on these islands, are not well represented in reserves. Former cultivated lands in lowland areas could provide lands for new reserves of drier forest types. The land-use history of these islands may provide insight for planners in countries currently considering lowland forest clearing for agriculture. Copyright 2008 College of Arts and Sciences.

  10. Digital soil classification and elemental mapping using imaging Vis-NIR spectroscopy: How to explicitly quantify stagnic properties of a Luvisol under Norway spruce

    NASA Astrophysics Data System (ADS)

    Kriegs, Stefanie; Buddenbaum, Henning; Rogge, Derek; Steffens, Markus

    2015-04-01

    Laboratory imaging Vis-NIR spectroscopy of soil profiles is a novel technique in soil science. It can determine the quantity and quality of various chemical soil properties in undisturbed soil profiles at a previously unattained spatial resolution. We applied this technique to soil cores to obtain quantitative evidence of redoximorphic processes under two different tree species and to demonstrate tree-soil interactions at the microscale. Owing to the imaging capabilities of Vis-NIR spectroscopy, a spatially explicit understanding of soil processes and properties can be achieved, and the spatial heterogeneity of the soil profile can be taken into account. We took six 30 cm long rectangular soil columns of adjacent Luvisols in a forest soil near Freising, Bavaria, using stainless steel boxes (100×100×300 mm). The Luvisols are derived from Quaternary aeolian sediments (loess). Three profiles were sampled under Norway spruce and three under European beech. A hyperspectral camera (VNIR, 400-1000 nm in 160 spectral bands) with a spatial resolution of 63×63 µm² per pixel was used for data acquisition. Reference samples were taken at representative spots. They were analysed for organic carbon (OC) quantity and quality with a CN elemental analyser, and for iron oxide (Fe) content using dithionite extraction followed by ICP-OES measurement. We compared two supervised classification algorithms, Spectral Angle Mapper and Maximum Likelihood, using different sets of training areas and spectral libraries. As is established in chemometrics, we used multivariate methods to correlate chemical data with Vis-NIR spectra, namely partial least-squares regression (PLSR) and multivariate adaptive regression splines (MARS). As a result, elemental mapping of Fe and OC within the soil core at high spatial resolution was achieved. The regression model was validated with a new set of reference samples for chemical analysis. Digital soil classification readily visualizes soil properties within the soil profiles. By combining both techniques, detailed soil maps, elemental balances and a deeper understanding of soil-forming processes at the microscale become feasible for complete soil profiles.
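
    Spectral Angle Mapper is one of the two classifiers compared above. It assigns each pixel to the reference spectrum with the smallest angle between the two spectral vectors, which makes it insensitive to overall brightness scaling. A minimal Python sketch follows. The four-band spectra and class names are hypothetical; real use would involve the 160-band camera spectra and a spectral library.

```python
import math
from typing import Dict, Optional, Sequence


def spectral_angle(pixel: Sequence[float], reference: Sequence[float]) -> float:
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(a * b for a, b in zip(pixel, reference))
    norm = (math.sqrt(sum(a * a for a in pixel))
            * math.sqrt(sum(b * b for b in reference)))
    if norm == 0:
        raise ValueError("spectra must be non-zero")
    return math.acos(max(-1.0, min(1.0, dot / norm)))  # clamp rounding error


def sam_classify(pixel: Sequence[float], library: Dict[str, Sequence[float]],
                 max_angle: Optional[float] = None) -> Optional[str]:
    """Return the library class with the smallest angle, or None if above max_angle."""
    angles = {name: spectral_angle(pixel, ref) for name, ref in library.items()}
    best = min(angles, key=angles.get)
    if max_angle is not None and angles[best] > max_angle:
        return None
    return best


library = {"class_1": [0.1, 0.2, 0.3, 0.4], "class_2": [0.4, 0.3, 0.2, 0.1]}
pixel = [0.2, 0.4, 0.6, 0.8]  # twice as bright as class_1, same shape
print(sam_classify(pixel, library))  # class_1
```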

  11. Mapping Species Composition of Forests and Tree Plantations in Northeastern Costa Rica with an Integration of Hyperspectral and Multitemporal Landsat Imagery

    NASA Technical Reports Server (NTRS)

    Fagan, Matthew E.; Defries, Ruth S.; Sesnie, Steven E.; Arroyo-Mora, J. Pablo; Soto, Carlomagno; Singh, Aditya; Townsend, Philip A.; Chazdon, Robin L.

    2015-01-01

    An efficient means to map tree plantations is needed to detect tropical land use change and evaluate reforestation projects. To analyze recent tree plantation expansion in northeastern Costa Rica, we examined the potential of combining moderate-resolution hyperspectral imagery (2005 HyMap mosaic) with multitemporal, multispectral data (Landsat) to accurately classify (1) general forest types and (2) tree plantations by species composition. Following a linear discriminant analysis to reduce data dimensionality, we compared four Random Forest classification models: hyperspectral data (HD) alone; HD plus interannual spectral metrics; HD plus a multitemporal forest regrowth classification; and all three models combined. The fourth, combined model achieved overall accuracy of 88.5%. Adding multitemporal data significantly improved classification accuracy (p < 0.0001) of all forest types, although the effect on tree plantation accuracy was modest. The hyperspectral data alone classified six species of tree plantations with 75% to 93% producer's accuracy; adding multitemporal spectral data increased accuracy only for two species with dense canopies. Non-native tree species had higher classification accuracy overall and made up the majority of tree plantations in this landscape. Our results indicate that combining occasionally acquired hyperspectral data with widely available multitemporal satellite imagery enhances mapping and monitoring of reforestation in tropical landscapes.
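
    Producer's accuracy (per reference class) and user's accuracy (per mapped class) are read off a confusion matrix. The Python sketch below assumes that rows are reference classes and columns are mapped classes. The counts are invented for illustration.

```python
from typing import Dict, List, Tuple


def map_accuracies(confusion: List[List[int]], classes: List[str]
                   ) -> Tuple[float, Dict[str, float], Dict[str, float]]:
    """Overall, producer's and user's accuracy from a confusion matrix.

    confusion[i][j] = number of reference samples of class i mapped as class j.
    """
    k = len(classes)
    total = sum(sum(row) for row in confusion)
    overall = sum(confusion[i][i] for i in range(k)) / total
    # Producer's accuracy: correct / reference (row) total = 1 - omission error.
    producers = {classes[i]: confusion[i][i] / sum(confusion[i]) for i in range(k)}
    # User's accuracy: correct / mapped (column) total = 1 - commission error.
    users = {classes[j]: confusion[j][j] / sum(confusion[i][j] for i in range(k))
             for j in range(k)}
    return overall, producers, users


overall, producers, users = map_accuracies([[45, 5], [10, 40]],
                                           ["plantation", "forest"])
print(overall, producers, users)
```

    In this toy matrix, plantations have a producer's accuracy of 0.9 but a user's accuracy of about 0.82. Ten forest samples were wrongly mapped as plantation.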

  12. Using CART to Identify Thresholds and Hierarchies in the Determinants of Funding Decisions.

    PubMed

    Schilling, Chris; Mortimer, Duncan; Dalziel, Kim

    2017-02-01

    There is much interest in understanding decision-making processes that determine funding outcomes for health interventions. We use classification and regression trees (CART) to identify cost-effectiveness thresholds and hierarchies in the determinants of funding decisions. The hierarchical structure of CART is suited to analyzing complex conditional and nonlinear relationships. Our analysis uncovered hierarchies where interventions were grouped according to their type and objective. Cost-effectiveness thresholds varied markedly depending on which group the intervention belonged to: lifestyle-type interventions with a prevention objective had an incremental cost-effectiveness threshold of $2356, suggesting that such interventions need to be close to cost saving or dominant to be funded. For lifestyle-type interventions with a treatment objective, the threshold was much higher at $37,024. Lower down the tree, intervention attributes such as the level of patient contribution and the eligibility for government reimbursement influenced the likelihood of funding within groups of similar interventions. Comparison between our CART models and previously published results demonstrated concurrence with standard regression techniques while providing additional insights regarding the role of the funding environment and the structure of decision-maker preferences.
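
    The thresholds above are expressed as incremental cost-effectiveness ratios (ICERs). An ICER is the difference in cost divided by the difference in health effect between an intervention and its comparator. The Python sketch below computes an ICER for hypothetical costs and effects and compares it with the two group-specific thresholds reported above. It does not handle dominance and other cases that a full analysis would cover.

```python
def icer(cost_new: float, cost_old: float,
         effect_new: float, effect_old: float) -> float:
    """Incremental cost-effectiveness ratio: delta cost / delta effect."""
    d_effect = effect_new - effect_old
    if d_effect <= 0:
        # Sketch limitation: dominated or equally effective options are not handled.
        raise ValueError("this sketch assumes the new option is more effective")
    return (cost_new - cost_old) / d_effect


# Group-specific thresholds reported in the abstract above.
thresholds = {"lifestyle/prevention": 2356.0, "lifestyle/treatment": 37024.0}

# Hypothetical intervention: costs 10,000 more and gains 0.5 units of effect.
ratio = icer(cost_new=15000.0, cost_old=5000.0, effect_new=1.5, effect_old=1.0)
print(ratio)  # 20000.0
for group, limit in thresholds.items():
    print(group, "below threshold" if ratio <= limit else "above threshold")
```

    The same hypothetical intervention falls above the prevention threshold but below the treatment threshold. This illustrates why the group an intervention belongs to matters for its funding prospects.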

  13. Pre-operative prediction of surgical morbidity in children: comparison of five statistical models.

    PubMed

    Cooper, Jennifer N; Wei, Lai; Fernandez, Soledad A; Minneci, Peter C; Deans, Katherine J

    2015-02-01

    The accurate prediction of surgical risk is important to patients and physicians. Logistic regression (LR) models are typically used to estimate these risks. However, in the fields of data mining and machine learning, many alternative classification and prediction algorithms have been developed. This study aimed to compare the performance of LR with several data mining algorithms for predicting 30-day surgical morbidity in children. Using the 2012 National Surgical Quality Improvement Program-Pediatric dataset, we compared five approaches for predicting surgical morbidity: (1) an LR model that assumed linearity and additivity (simple LR model), (2) an LR model incorporating restricted cubic splines and interactions (flexible LR model), (3) a support vector machine, (4) a random forest and (5) boosted classification trees. The ensemble-based methods showed significantly higher accuracy, sensitivity, specificity, PPV, and NPV than the simple LR model. However, none of the models performed better than the flexible LR model in terms of the aforementioned measures or in model calibration or discrimination. Support vector machines, random forests, and boosted classification trees do not show better performance than LR for predicting pediatric surgical morbidity. After further validation, the flexible LR model derived in this study could be used to assist with clinical decision-making based on patient-specific surgical risks. Copyright © 2014 Elsevier Ltd. All rights reserved.

  14. [Analysis of dietary pattern and diabetes mellitus influencing factors identified by classification tree model in adults of Fujian].

    PubMed

    Yu, F L; Ye, Y; Yan, Y S

    2017-05-10

    Objective: To identify dietary patterns and explore the relationship between environmental factors (especially dietary patterns) and diabetes mellitus in adults in Fujian. Methods: A multi-stage sampling method was used to survey residents aged ≥18 years by questionnaire, physical examination and laboratory tests at 10 disease surveillance points in Fujian. Factor analysis was used to identify the dietary patterns. A logistic regression model was applied to analyze the relationship between dietary patterns and diabetes mellitus, and a classification tree model was adopted to identify the influencing factors for diabetes mellitus. Results: There were four dietary patterns in the population: meat, plant, high-quality protein, and fried food and beverages. The logistic analysis showed that the plant pattern was a protective factor against diabetes mellitus; this pattern has higher factor loadings for fresh fruit and vegetables and for cereals and tubers. Compared with people whose factor score was in the lowest quartile, the risk of diabetes mellitus at the T2 and T3 levels of factor score was 0.727 (95% CI: 0.561-0.943) and 0.736 (95% CI: 0.573-0.944) times as high, respectively. The classification tree model identified thirteen influencing factors and eleven groups at high risk for diabetes mellitus. The influencing factors were dyslipidemia, age, family history of diabetes, hypertension, physical activity, occupation, sex, sedentary time, abdominal adiposity, BMI, marital status, sleep time and the high-quality protein pattern. Conclusion: There is a close association between dietary patterns and diabetes mellitus. Preventing and controlling diabetes mellitus requires promoting a healthy and reasonable diet, strengthening the monitoring and control of blood lipids, blood pressure and body weight, and maintaining a good lifestyle.

  15. Building interpretable predictive models for pediatric hospital readmission using Tree-Lasso logistic regression.

    PubMed

    Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris

    2016-09-01

    Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, the high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population. It does so by integrating a data-driven model (sparse logistic regression) with domain knowledge based on the International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from 2009 to 2011 in the California State Inpatient Databases, Healthcare Cost and Utilization Project. We incorporated domain knowledge based on the ICD-9-CM hierarchy into a data-driven, Tree-Lasso regularized logistic regression model, which provides a framework for model interpretation. Compared with traditional Lasso logistic regression, this approach yielded models that are easier to interpret because they use fewer high-level diagnoses, with comparable prediction accuracy. The Tree-Lasso model was as accurate as traditional Lasso logistic regression, measured by the area under the receiver operating characteristic curve (AUC). However, integration with the ICD-9-CM hierarchy of diseases made the models more interpretable in terms of high-level diagnoses. Additionally, the interpretations of the models are in accordance with existing medical understanding of pediatric readmission. The best-performing models performed similarly, reaching AUC values of 0.783 and 0.779 for traditional Lasso and Tree-Lasso, respectively. However, the information loss of the Lasso models is 0.35 bits higher than that of the Tree-Lasso model. We propose a method for building predictive models that detect readmission risk from electronic health records. Integrating domain knowledge (in the form of the ICD-9-CM taxonomy) with a data-driven, sparse predictive algorithm (Tree-Lasso logistic regression) increased the interpretability of the resulting model. The models are interpreted for the readmission prediction problem in the general pediatric population in California, as well as in several important subpopulations, and the interpretations comply with existing medical understanding of pediatric readmission. Finally, a quantitative assessment of model interpretability is given that goes beyond simple counts of selected low-level features. Copyright © 2016 Elsevier B.V. All rights reserved.

  16. [An object-based information extraction technology for dominant tree species group types].

    PubMed

    Tian, Tian; Fan, Wen-yi; Lu, Wei; Xiao, Xiang

    2015-06-01

    Information extraction for dominant tree species group types is difficult in remote sensing image classification. However, object-oriented classification using high spatial resolution remote sensing data is a new method for extracting such type information accurately. In this paper, the research area is the Jiangle Forest Farm in Fujian Province, and the data are QuickBird images from 2013. An object-oriented method was adopted to identify farmland, shrub-herbaceous plants, young afforested land, Pinus massoniana, Cunninghamia lanceolata and broad-leaved tree types. Three types of classification factors, namely spectral, texture, and several vegetation index features, were used to establish a class hierarchy. At the different levels of this hierarchy, membership functions and decision tree classification rules were adopted. The object-oriented method using texture, spectral and vegetation index features achieved a classification accuracy of 91.3%. This was an increase of 5.7% over the accuracy obtained using only texture and spectral features.

  17. Analyzing Whitebark Pine Distribution in the Northern Rocky Mountains in Support of Grizzly Bear Recovery

    NASA Astrophysics Data System (ADS)

    Lawrence, R.; Landenburger, L.; Jewett, J.

    2007-12-01

    Whitebark pine seeds have long been identified as the most significant vegetative food source for grizzly bears in the Greater Yellowstone Ecosystem (GYE) and, hence, a crucial element of suitable grizzly bear habitat. The overall health and status of whitebark pine in the GYE is currently threatened by mountain pine beetle infestations and the spread of white pine blister rust. As part of a long-term inter-agency monitoring program, whitebark pine distribution (presence/absence) was mapped for the GYE using Landsat 7 Enhanced Thematic Mapper Plus (ETM+) imagery and topographic data. Logistic regression was compared with classification tree analysis (CTA) with and without boosting. The study area was the central portion of the GYE, covering three ETM+ images along a single path. Overall classification accuracies there ranged from 91.6% using logistic regression to 95.8% using See5's CTA algorithm with the maximum of 99 boosts. The analysis is being extended to the entire northern Rocky Mountain Ecosystem and over decadal time scales.
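
    See5 uses its own boosting variant, so the Python sketch below substitutes discrete AdaBoost with one-level decision stumps. It shows how boosting reweights misclassified samples and combines weak trees by weighted vote. Labels are assumed to be +1/-1 (e.g. presence/absence), and the toy data are made up.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[float, int, float, int]  # (alpha, feature index, threshold, polarity)


def stump_predict(x: Sequence[float], feature: int, threshold: float,
                  polarity: int) -> int:
    """Predict +polarity when x[feature] <= threshold, otherwise -polarity."""
    return polarity if x[feature] <= threshold else -polarity


def fit_adaboost(X: List[List[float]], y: List[int], n_rounds: int = 10) -> List[Stump]:
    """Discrete AdaBoost with decision stumps; labels must be +1 / -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble: List[Stump] = []
    for _ in range(n_rounds):
        best = None
        # Exhaustive search for the stump with the lowest weighted error.
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                for p in (1, -1):
                    err = sum(wi for xi, yi, wi in zip(X, y, w)
                              if stump_predict(xi, f, t, p) != yi)
                    if best is None or err < best[0]:
                        best = (err, f, t, p)
        err, f, t, p = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, t, p))
        # Up-weight misclassified samples, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, f, t, p))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble: List[Stump], x: Sequence[float]) -> int:
    """Weighted vote of all stumps."""
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1


X = [[1.0], [2.0], [3.0], [4.0]]
y = [1, 1, -1, -1]
model = fit_adaboost(X, y, n_rounds=5)
print([predict(model, x) for x in X])  # [1, 1, -1, -1]
```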

  18. Comparisons and Selections of Features and Classifiers for Short Text Classification

    NASA Astrophysics Data System (ADS)

    Wang, Ye; Zhou, Zhi; Jin, Shan; Liu, Debin; Lu, Mi

    2017-10-01

    Short text differs considerably from traditional long text documents in its shortness and conciseness, which hinders the application of conventional machine learning and data mining algorithms to short text classification. Following traditional artificial intelligence methods, we divide short text classification into three steps: preprocessing, feature selection and classifier comparison. In this paper, we illustrate our approach step by step. Specifically, in feature selection, we compared the performance and robustness of four methods: one-hot encoding, tf-idf weighting, word2vec and paragraph2vec; in the classification part, we chose and compared Naive Bayes, Logistic Regression, Support Vector Machine, K-nearest Neighbor and Decision Tree classifiers. We then compared and analysed the classifiers horizontally against each other and vertically across the feature selection methods. For the datasets, we crawled more than 400,000 short text files from the Shanghai and Shenzhen Stock Exchanges and manually labeled them into two classes, the big and the small. There are eight labels in the big class and 59 labels in the small class.
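    Of the four text representations compared, tf-idf weighting is the simplest to show directly. A minimal sketch, assuming pre-tokenized documents, term frequency as raw count divided by document length, and the unsmoothed idf log(N/df); libraries often use smoothed variants, and the tokens in the tests are made up:

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.
    tf = count / document length; idf = log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        vectors.append({t: (c / length) * math.log(n / df[t])
                        for t, c in counts.items()})
    return vectors
```

    A term that occurs in every document, such as `stock` in `tfidf([["stock", "rise"], ["stock", "fall"]])`, gets weight 0, while terms that distinguish documents are weighted up.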

  19. Texture classification of normal tissues in computed tomography using Gabor filters

    NASA Astrophysics Data System (ADS)

    Dettori, Lucia; Bashir, Alia; Hasemann, Julie

    2007-03-01

    The research presented in this article aims to develop an automated imaging system for classifying normal tissues in medical images obtained from computed tomography (CT) scans. Texture features based on a bank of Gabor filters are used to classify the following tissues of interest: liver, spleen, kidney, aorta, trabecular bone, lung, muscle, IP fat, and SQ fat. The approach consists of three steps: convolution of the regions of interest with a bank of 32 Gabor filters (4 frequencies and 8 orientations), extraction of two Gabor texture features per filter (mean and standard deviation), and creation of a Classification and Regression Tree-based classifier that automatically identifies the various tissues. The data set consists of approximately 1000 DICOM images from normal chest and abdominal CT scans of five patients. The regions of interest were labeled by expert radiologists. Optimal trees were generated using two techniques: 10-fold cross-validation and splitting of the data set into a training and a testing set. In both cases, perfect classification rules were obtained provided enough images were available for training (~65%). All performance measures (sensitivity, specificity, precision, and accuracy) for all regions of interest were at 100%. This significantly improves on previous results that used Wavelet, Ridgelet, and Curvelet texture features, which yielded accuracy values in the 85%-98% range. The Gabor filters' ability to isolate features at different frequencies and orientations allows a multi-resolution analysis of texture, which is essential when dealing with the sometimes very subtle differences in the texture of tissues in CT scans.
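    The first two steps of the pipeline above (filter bank, then per-filter mean and standard deviation) can be sketched in pure Python. The frequencies, kernel size, σ and aspect ratio γ below are hypothetical choices, and the features are taken from the real (cosine) filter response; the study's exact filter parameters are not given here. With 4 frequencies and 8 orientations spaced over [0, π), the bank yields 32 × 2 = 64 features per region of interest:

```python
import math


def gabor_kernel(freq, theta, sigma=2.0, gamma=0.5, half=3):
    """Real (cosine) Gabor kernel of size (2*half+1) x (2*half+1)."""
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)   # rotated coordinates
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * freq * xr))
        kernel.append(row)
    return kernel


def filter_valid(img, kernel):
    """Filter an image (list of rows) with a kernel, keeping only 'valid' positions.
    The cosine Gabor kernel is point-symmetric, so correlation equals convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(img) - kh + 1):
        for j in range(len(img[0]) - kw + 1):
            out.append(sum(kernel[a][b] * img[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
    return out


def gabor_features(img, freqs=(0.05, 0.1, 0.2, 0.4), n_orient=8):
    """Mean and standard deviation of each filter response: 2 * len(freqs) * n_orient values."""
    feats = []
    for f in freqs:
        for k in range(n_orient):
            resp = filter_valid(img, gabor_kernel(f, k * math.pi / n_orient))
            mean = sum(resp) / len(resp)
            std = math.sqrt(sum((r - mean) ** 2 for r in resp) / len(resp))
            feats.extend([mean, std])
    return feats
```

    The resulting 64-value vectors would then be the inputs to the tree classifier in the third step.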

  20. Instruction-matrix-based genetic programming.

    PubMed

    Li, Gang; Wang, Jin Feng; Lee, Kin Hong; Leung, Kwong-Sak

    2008-08-01

    In genetic programming (GP), evolving tree nodes separately would reduce the huge solution space. However, tree nodes are highly interdependent with respect to their fitness. In this paper, we propose a new GP framework, namely, instruction-matrix (IM)-based GP (IMGP), to handle their interactions. IMGP maintains an IM to evolve tree nodes and subtrees separately. IMGP extracts program trees from an IM and updates the IM with the information of the extracted program trees. As the IM keeps most of the information of the schemata of GP and evolves the schemata directly, IMGP is effective and efficient. Our experimental results on benchmark problems verify that IMGP is better than canonical GP in terms of both solution quality and the number of program evaluations, and that it also outperforms some related GP algorithms. IMGP can also be used to evolve programs for classification problems. The classifiers obtained have higher classification accuracies than four other GP classification algorithms on four benchmark classification problems. The testing errors are also comparable to or better than those obtained with well-known classifiers. Furthermore, an extended version, called condition matrix for rule learning, has been used successfully to handle multiclass classification problems.

  1. Prediction of Patient-Controlled Analgesic Consumption: A Multimodel Regression Tree Approach.

    PubMed

    Hu, Yuh-Jyh; Ku, Tien-Hsiung; Yang, Yu-Hung; Shen, Jia-Ying

    2018-01-01

    Several factors contribute to individual variability in postoperative pain; therefore, individuals consume postoperative analgesics at different rates. Although many statistical studies have analyzed postoperative pain and analgesic consumption, most have identified only correlations and have not subjected the statistical model to further tests to evaluate its predictive accuracy. In this study involving 3052 patients, a multistrategy computational approach was developed for predicting analgesic consumption. This approach uses data on patient-controlled analgesia demand behavior over time and combines clustering, classification, and regression to mitigate the limitations of current statistical models. Cross-validation results indicated that the proposed approach significantly outperforms various existing regression methods. Moreover, a comparison between the predictions of anesthesiologists and medical specialists and those of the computational approach, on an independent test data set of 60 patients, further demonstrated the superiority of the computational approach in predicting analgesic consumption, because it produced markedly lower root mean squared errors.
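    The comparison above rests on root mean squared error, RMSE = sqrt((1/n) Σ (y_i - ŷ_i)²), where lower is better. A minimal helper; the numbers in the tests are made up, not consumption data from the study:

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```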

  2. A simple and robust classification tree for differentiation between benign and malignant lesions in MR-mammography.

    PubMed

    Baltzer, Pascal A T; Dietzel, Matthias; Kaiser, Werner A

    2013-08-01

    In the face of multiple available diagnostic criteria in MR-mammography (MRM), a practical algorithm for lesion classification is needed. Such an algorithm should be as simple as possible and include only important independent lesion features to differentiate benign from malignant lesions. This investigation aimed to develop a simple classification tree for differential diagnosis in MRM. A total of 1,084 lesions in standardised MRM with subsequent histological verification (648 malignant, 436 benign) were investigated. Seventeen lesion criteria were assessed by 2 readers in consensus. Classification analysis was performed using the chi-squared automatic interaction detection (CHAID) method. Results include the probability of malignancy for every descriptor combination in the classification tree. A classification tree incorporating 5 lesion descriptors with a depth of 3 ramifications (1, root sign; 2, delayed enhancement pattern; 3, border, internal enhancement and oedema) was calculated. Of all 1,084 lesions, 262 malignant lesions (40.4 % of malignant) and 106 benign lesions (24.3 % of benign) could be classified with an accuracy above 95 %. Overall diagnostic accuracy was 88.4 %. The classification algorithm reduced the number of categorical descriptors from 17 to 5 (29.4 %) and still achieved a high classification accuracy. More than one third of all lesions could be classified with an accuracy above 95 %.
    • A practical algorithm has been developed to classify lesions found in MR-mammography.
    • A simple decision tree consisting of five criteria reaches a high accuracy of 88.4 %.
    • Unique to this approach, each classification is associated with a diagnostic certainty.
    • Diagnostic certainty greater than 95 % is achieved in 34 % of all cases.
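    CHAID grows its tree by testing each candidate descriptor against the outcome with a chi-squared test and splitting on the most significant one. A simplified sketch, assuming binary descriptors, so each test has 1 degree of freedom and the p-value is erfc(sqrt(χ²/2)); full CHAID also merges categories of multi-level predictors and applies Bonferroni-adjusted p-values, which this omits. The descriptor names and counts in the tests are hypothetical:

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def p_value_1df(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def best_binary_descriptor(tables):
    """tables: {descriptor: 2x2 table, rows = descriptor present/absent,
    columns = malignant/benign}. Returns the descriptor with the smallest p-value."""
    return min(tables, key=lambda name: p_value_1df(chi_squared(tables[name])))
```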

  3. Development and validation of a casemix classification to predict costs of specialist palliative care provision across inpatient hospice, hospital and community settings in the UK: a study protocol

    PubMed Central

    Guo, Ping; Dzingina, Mendwas; Firth, Alice M; Davies, Joanna M; Douiri, Abdel; O’Brien, Suzanne M; Pinto, Cathryn; Pask, Sophie; Higginson, Irene J; Eagar, Kathy; Murtagh, Fliss E M

    2018-01-01

    Introduction Provision of palliative care is inequitable with wide variations across conditions and settings in the UK. Lack of a standard way to classify by case complexity is one of the principal obstacles to addressing this. We aim to develop and validate a casemix classification to support the prediction of costs of specialist palliative care provision. Methods and analysis Phase I: A cohort study to determine the variables and potential classes to be included in a casemix classification. Data are collected from clinicians in palliative care services across inpatient hospice, hospital and community settings on: patient demographics, potential complexity/casemix criteria and patient-level resource use. Cost predictors are derived using multivariate regression and then incorporated into a classification using classification and regression trees. Internal validation will be conducted by bootstrapping to quantify any optimism in the predictive performance (calibration and discrimination) of the developed classification. Phase II: A mixed-methods cohort study across settings for external validation of the classification developed in phase I. Patient and family caregiver data will be collected longitudinally on demographics, potential complexity/casemix criteria and patient-level resource use. This will be triangulated with data collected from clinicians on potential complexity/casemix criteria and patient-level resource use, and with qualitative interviews with patients and caregivers about care provision across different settings. The classification will be refined on the basis of its performance in the validation data set. Ethics and dissemination The study has been approved by the National Health Service Health Research Authority Research Ethics Committee. 
The results are expected to be disseminated in 2018 through papers for publication in major palliative care journals; policy briefs for clinicians, commissioning leads and policy makers; and lay summaries for patients and public. Trial registration number ISRCTN90752212. PMID:29550781
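    The internal validation step bootstraps the model-building process to estimate optimism. A minimal sketch of the usual bootstrap optimism correction, assuming a one-split regression tree on a single cost predictor and R² as the performance measure; the protocol does not specify these, so the model, metric and toy data are placeholders:

```python
import random


def fit_stump(x, y):
    """One-split regression tree on a single predictor.
    Returns (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(x))[:-1]:  # skip the largest value so both sides are non-empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # a single distinct predictor value: predict the overall mean
        m = sum(y) / len(y)
        return (float("inf"), m, m)
    return best[1:]


def predict(model, xi):
    t, lm, rm = model
    return lm if xi <= t else rm


def r_squared(model, x, y):
    m = sum(y) / len(y)
    ss_tot = sum((v - m) ** 2 for v in y)
    ss_res = sum((v - predict(model, xi)) ** 2 for xi, v in zip(x, y))
    return 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0


def optimism_corrected_r2(x, y, n_boot=200, seed=0):
    """Refit on each bootstrap resample and compare its fit on the resample
    with its fit on the original data; the mean gap is the optimism."""
    rng = random.Random(seed)
    apparent = r_squared(fit_stump(x, y), x, y)
    n = len(x)
    optimism = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [x[i] for i in idx], [y[i] for i in idx]
        model = fit_stump(bx, by)
        optimism += r_squared(model, bx, by) - r_squared(model, x, y)
    return apparent, apparent - optimism / n_boot
```

    The corrected value is the apparent R² minus the average optimism, so it estimates how well the model-building procedure would perform on new patients from the same population.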

  4. Development and validation of a casemix classification to predict costs of specialist palliative care provision across inpatient hospice, hospital and community settings in the UK: a study protocol.

    PubMed

    Guo, Ping; Dzingina, Mendwas; Firth, Alice M; Davies, Joanna M; Douiri, Abdel; O'Brien, Suzanne M; Pinto, Cathryn; Pask, Sophie; Higginson, Irene J; Eagar, Kathy; Murtagh, Fliss E M

    2018-03-17

    Provision of palliative care is inequitable with wide variations across conditions and settings in the UK. Lack of a standard way to classify by case complexity is one of the principal obstacles to addressing this. We aim to develop and validate a casemix classification to support the prediction of costs of specialist palliative care provision. Phase I: A cohort study to determine the variables and potential classes to be included in a casemix classification. Data are collected from clinicians in palliative care services across inpatient hospice, hospital and community settings on: patient demographics, potential complexity/casemix criteria and patient-level resource use. Cost predictors are derived using multivariate regression and then incorporated into a classification using classification and regression trees. Internal validation will be conducted by bootstrapping to quantify any optimism in the predictive performance (calibration and discrimination) of the developed classification. Phase II: A mixed-methods cohort study across settings for external validation of the classification developed in phase I. Patient and family caregiver data will be collected longitudinally on demographics, potential complexity/casemix criteria and patient-level resource use. This will be triangulated with data collected from clinicians on potential complexity/casemix criteria and patient-level resource use, and with qualitative interviews with patients and caregivers about care provision across different settings. The classification will be refined on the basis of its performance in the validation data set. The study has been approved by the National Health Service Health Research Authority Research Ethics Committee. The results are expected to be disseminated in 2018 through papers for publication in major palliative care journals; policy briefs for clinicians, commissioning leads and policy makers; and lay summaries for patients and public. ISRCTN90752212. 
© Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.

  5. VEGETATION COVER ANALYSIS OF HAZARDOUS WASTE SITES IN UTAH AND ARIZONA USING HYPERSPECTRAL REMOTE SENSING

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Serrato, M.; Jungho, I.; Jensen, J.

    2012-01-17

    Remote sensing technology can provide a cost-effective tool for monitoring hazardous waste sites. This study investigated the usability of HyMap airborne hyperspectral remote sensing data (126 bands at 2.3 × 2.3 m spatial resolution) to characterize the vegetation at U.S. Department of Energy uranium processing sites near Monticello, Utah and Monument Valley, Arizona. Grass and shrub species were mixed on an engineered disposal cell cover at the Monticello site, while shrub species were dominant in the phytoremediation plantings at the Monument Valley site. The specific objectives of this study were to: (1) estimate the leaf area index (LAI) of the vegetation using three different methods (i.e., vegetation indices, red-edge positioning (REP), and machine learning regression trees), and (2) map the vegetation cover using machine learning decision trees based on either the scaled reflectance data or mixture tuned matched filtering (MTMF)-derived metrics and vegetation indices. Regression trees gave the best calibration performance for LAI estimation (R² > 0.80). REPs failed to predict LAI accurately (R² < 0.2). Using the MTMF-derived metrics (matched filter scores and infeasibility) and a range of vegetation indices in the decision trees improved vegetation mapping compared with decision tree classification using only the scaled reflectance. The results suggest that hyperspectral imagery is useful for characterizing biophysical characteristics (LAI) and vegetation cover on capped hazardous waste sites. However, vegetation mapping would likely benefit from higher spatial resolution hyperspectral data, because many of the vegetation patches found on the sites are small (< 1 m).
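    The abstract does not state which red-edge positioning method was used. One common choice is the linear four-point interpolation of Guyot and Baret, sketched below under that assumption; in practice the reflectances at 670, 700, 740 and 780 nm would be interpolated from the nearest HyMap bands, and the values in the tests are made up:

```python
def red_edge_position(r670, r700, r740, r780):
    """Red-edge position in nm by linear four-point interpolation."""
    r_inflection = (r670 + r780) / 2.0  # assumed reflectance at the inflection point
    return 700.0 + 40.0 * (r_inflection - r700) / (r740 - r700)
```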

  6. Chi-squared Automatic Interaction Detection Decision Tree Analysis of Risk Factors for Infant Anemia in Beijing, China

    PubMed Central

    Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin

    2016-01-01

    Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6–12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. Results: The prevalence of anemia was 12.60% with a range of 3.47%–40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by the logistic regression model (maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population), CHAID decision tree analysis also identified a fourth risk factor, maternal educational level, with higher overall classification accuracy and a larger area under the receiver operating characteristic curve. Conclusions: The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities. PMID:27174328
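    The comparison above uses the area under the ROC curve. A minimal sketch of how AUC can be computed directly, as the probability that a randomly chosen case (anemic infant) receives a higher predicted score than a randomly chosen non-case, with ties counted as one half (the Mann-Whitney formulation); the scores in the tests are made up:

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve from predicted scores of cases and non-cases."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```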

  7. Chi-squared Automatic Interaction Detection Decision Tree Analysis of Risk Factors for Infant Anemia in Beijing, China.

    PubMed

    Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin

    2016-05-20

    In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6-12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. The prevalence of anemia was 12.60% with a range of 3.47%-40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by the logistic regression model (maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population), CHAID decision tree analysis also identified a fourth risk factor, maternal educational level, with higher overall classification accuracy and a larger area under the receiver operating characteristic curve. The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities.

  8. Object-based methods for individual tree identification and tree species classification from high-spatial resolution imagery

    NASA Astrophysics Data System (ADS)

    Wang, Le

    2003-10-01

    Modern forest management poses an increasing need for detailed knowledge of forest information at different spatial scales. At the forest level, information on tree species assemblage is desired, whereas at or below the stand level, individual tree information is preferred. Remote sensing provides an effective tool to extract this information at multiple spatial scales in the continuous time domain. To date, the increasing volume and ready availability of high-spatial-resolution data have led to a much wider application of remotely sensed products. Nevertheless, conventional pixel-based classification methods are far from satisfactory for making effective use of the improving spatial resolution. Correspondingly, developing object-based methods has become a central challenge for researchers in the field of remote sensing. This thesis focuses on the development of methods for accurate individual tree identification and tree species classification. We develop a method in which individual tree crown boundaries and treetop locations are derived under a unified framework. We apply a two-stage approach with edge detection followed by marker-controlled watershed segmentation. Treetops are modeled from radiometric and geometric aspects: specifically, treetops are assumed to be represented by local radiation maxima and to be located near the center of the tree crown. A marker image is then created from the derived treetops to guide a watershed segmentation that further differentiates overlapping trees and produces a segmented image composed of individual tree crowns. The image segmentation method achieves promising results on a 256 × 256 CASI image. We then extend our methods to multiple scales constructed from a wavelet decomposition. 
Scale-consistency and geometric-consistency criteria are designed to examine gradients along the scale space in order to separate true crown boundaries from unwanted textures caused by branches and twigs. After the inverse wavelet transform, the tree crown boundary is enhanced while the unwanted textures are suppressed. Applying the two-stage method to the enhanced image yields an improvement on a high-resolution aerial photograph. To improve tree species classification, we develop a new method to choose the optimal scale parameter with the aid of the Bhattacharyya distance (BD), a well-known index of class separability in traditional pixel-based classification. The optimal scale parameter is then fed into a region-growing segmentation as a break-off value. Our object-based classification separates tree species more accurately than conventional Maximum Likelihood Classification (MLC). In summary, we develop two object-based methods for identifying individual trees and classifying tree species from high-spatial-resolution imagery. Both methods achieve promising results and will promote the integration of remote sensing and GIS in forest applications.
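    The scale-selection idea above can be sketched for a single feature, assuming each species class is summarized as a univariate Gaussian; the thesis's actual feature set and candidate scales are not given here, so the sample values in the tests are hypothetical. For two Gaussians the Bhattacharyya distance is D_B = (μ1 - μ2)² / (4(σ1² + σ2²)) + ½ ln((σ1² + σ2²) / (2σ1σ2)), and the candidate scale with the largest D_B is kept:

```python
import math
import statistics


def bhattacharyya_gaussian(a, b):
    """Bhattacharyya distance between two samples, each modeled as a univariate
    Gaussian. Both samples must have non-zero variance."""
    m1, m2 = statistics.mean(a), statistics.mean(b)
    v1, v2 = statistics.pvariance(a), statistics.pvariance(b)
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * math.log((v1 + v2) / (2 * math.sqrt(v1 * v2))))


def pick_scale(samples_by_scale):
    """samples_by_scale: {scale: (class_a_values, class_b_values)} of object features
    produced by segmenting at each candidate scale. Returns the most separable scale."""
    return max(samples_by_scale,
               key=lambda s: bhattacharyya_gaussian(*samples_by_scale[s]))
```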

  9. New Tree-Classification System Used by the Southern Forest Inventory and Analysis Unit

    Treesearch

    Dennis M. May; John S. Vissage; D. Vince Few

    1990-01-01

    Trees at USDA Forest Service, Southern Forest Inventory and Analysis, sample locations are classified as growing stock or cull based on their ability to produce sawlogs. The old and new classification systems are compared, and the impacts of the new system on the reporting of tree volumes are illustrated with inventory data from north Alabama.

  10. An Automated Algorithm to Screen Massive Training Samples for a Global Impervious Surface Classification

    NASA Technical Reports Server (NTRS)

    Tan, Bin; Brown de Colstoun, Eric; Wolfe, Robert E.; Tilton, James C.; Huang, Chengquan; Smith, Sarah E.

    2012-01-01

    An algorithm is developed to automatically screen outliers from massive training samples for the Global Land Survey - Imperviousness Mapping Project (GLS-IMP). GLS-IMP aims to produce a global 30 m spatial resolution impervious cover data set for the years 2000 and 2010 based on the Landsat Global Land Survey (GLS) data set. This unprecedented high-resolution impervious cover data set is significant not only for urbanization studies but also for global carbon, hydrology, and energy balance research. A supervised classification method, regression tree, is applied in this project. An accurate set of training samples is key to supervised classification. Here we developed global-scale training samples from fine-resolution (about 1 m) satellite data (Quickbird and Worldview2) and then aggregated the fine-resolution impervious cover map to 30 m resolution. To improve classification accuracy, the training samples should be screened before being used to train the regression tree. Manually screening 30 m resolution training samples collected globally is impossible. For example, in Europe alone there are 174 training sites, ranging in size from 4.5 km by 4.5 km to 8.1 km by 3.6 km, and the training samples number over six million. We therefore developed this automated, statistics-based algorithm to screen the training samples at two levels: site and scene. At the site level, all training samples are divided into 10 groups according to the percentage of impervious surface within a sample pixel; the samples falling within each 10% interval form one group. For each group, both univariate and multivariate outliers are detected and removed. The screening process then escalates to the scene level, where a similar process with a looser threshold is applied to account for possible variance due to site differences. 
We do not perform the screening across scenes because scenes may vary due to phenology, solar-view geometry, atmospheric conditions and other factors rather than actual land-cover differences. Finally, we will compare the classification results from screened and unscreened training samples to assess the improvement achieved by cleaning up the training samples.
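    The site-level screen above can be sketched for a single feature. This sketch assumes a z-score rule for univariate outliers with a hypothetical threshold of 3 standard deviations and omits the multivariate step; the abstract does not give the actual test or threshold. A looser scene-level pass could reuse the same function with a larger `z_max`:

```python
import statistics


def screen_samples(samples, z_max=3.0):
    """samples: list of (impervious_pct, feature_value) pairs.
    Groups samples into 10 bins of 10% impervious cover and drops samples whose
    feature lies more than z_max standard deviations from its bin mean."""
    bins = {}
    for pct, value in samples:
        # 100% cover falls into the last bin (90-100%).
        bins.setdefault(min(int(pct // 10), 9), []).append((pct, value))
    kept = []
    for group in bins.values():
        values = [v for _, v in group]
        if len(values) < 2:
            kept.extend(group)  # too few samples to estimate a spread
            continue
        mu, sd = statistics.mean(values), statistics.stdev(values)
        kept.extend(s for s in group if sd == 0 or abs(s[1] - mu) <= z_max * sd)
    return kept
```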

  11. A classification tree for the prediction of benign versus malignant disease in patients with small renal masses.

    PubMed

    Rendon, Ricardo A; Mason, Ross J; Kirkland, Susan; Lawen, Joseph G; Abdolell, Mohamed

    2014-08-01

    To develop a classification tree for the preoperative prediction of benign versus malignant disease in patients with small renal masses. This is a retrospective study including 395 consecutive patients who underwent surgical treatment for a renal mass < 5 cm in maximum diameter between July 1st 2001 and June 30th 2010. A classification tree to predict the risk of having a benign renal mass preoperatively was developed using recursive partitioning analysis for repeated measures outcomes. Age, sex, volume on preoperative imaging, tumor location (central/peripheral), degree of endophytic component (1%-100%), and tumor axis position were used as potential predictors to develop the model. Forty-five patients (11.4%) were found to have a benign mass postoperatively. A classification tree has been developed which can predict the risk of benign disease with an accuracy of 88.9% (95% CI: 85.3 to 91.8). The significant prognostic factors in the classification tree are tumor volume, degree of endophytic component and symptoms at diagnosis. As an example of its utilization, a renal mass with a volume of < 5.67 cm3 that is < 45% endophytic has a 52.6% chance of having benign pathology. Conversely, a renal mass with a volume ≥ 5.67 cm3 that is ≥ 35% endophytic has only a 5.3% possibility of being benign. A classification tree to predict the risk of benign disease in small renal masses has been developed to aid the clinician when deciding on treatment strategies for small renal masses.
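    The two example paths quoted above can be read as simple rules. This sketch encodes only those two paths, with the abstract's thresholds and probabilities; the rest of the tree, including the split on symptoms at diagnosis, is not reported in the abstract, so other inputs return `None`:

```python
def benign_risk(volume_cm3, endophytic_pct):
    """Probability of benign pathology for the two tree paths quoted in the
    abstract, or None for paths the abstract does not report."""
    if volume_cm3 < 5.67 and endophytic_pct < 45:
        return 0.526
    if volume_cm3 >= 5.67 and endophytic_pct >= 35:
        return 0.053
    return None
```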

  12. Multi-criteria manufacturability indices for ranking high-concentration monoclonal antibody formulations.

    PubMed

    Yang, Yang; Velayudhan, Ajoy; Thornhill, Nina F; Farid, Suzanne S

    2017-09-01

    The need for high-concentration formulations for subcutaneous delivery of therapeutic monoclonal antibodies (mAbs) can present manufacturability challenges for the final ultrafiltration/diafiltration (UF/DF) step. Viscosity levels and the propensity to aggregate are key considerations for high-concentration formulations. This work presents novel frameworks for deriving a set of manufacturability indices related to viscosity and thermostability to rank high-concentration mAb formulation conditions in terms of their ease of manufacture. This is illustrated by analyzing published high-throughput biophysical screening data that explore the influence of different formulation conditions (pH, ions, and excipients) on solution viscosity and product thermostability. CART (Classification and Regression Tree), a decision tree classification method, is used to identify the critical formulation conditions that influence viscosity and thermostability. In this work, three different multi-criteria data analysis frameworks were investigated to derive manufacturability indices from analysis of the stress maps and the process conditions experienced in the final UF/DF step. Polynomial regression techniques were used to transform the experimental data into a set of stress maps that show viscosity and thermostability as functions of the formulation conditions. A mathematical filtrate flux model was used to capture the time profiles of protein concentration and flux decay behavior during UF/DF. Multi-criteria decision-making analysis was used to identify the optimal formulation conditions that minimize the potential for both viscosity and aggregation issues during UF/DF. Biotechnol. Bioeng. 2017;114: 2043-2056. © 2017 The Authors. Biotechnology and Bioengineering Published by Wiley Periodicals, Inc.
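    CART chooses each split by minimizing node impurity, commonly the Gini index for classification. A minimal sketch for one numeric formulation variable, assuming a binary high/low viscosity label; the pH values and labels in the tests are hypothetical, not the screening data analyzed in the paper:

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(x, labels):
    """Threshold on one numeric variable that minimizes the size-weighted Gini
    impurity of the two child nodes. Returns (threshold, impurity) or None."""
    best = None
    n = len(x)
    for t in sorted(set(x))[:-1]:  # keep both child nodes non-empty
        left = [lab for xi, lab in zip(x, labels) if xi <= t]
        right = [lab for xi, lab in zip(x, labels) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (t, score)
    return best
```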

  13. Availability and capacity of substance abuse programs in correctional settings: A classification and regression tree analysis.

    PubMed

    Taxman, Faye S; Kitsantas, Panagiota

    2009-08-01

    OBJECTIVE TO BE ADDRESSED: The purpose of this study was to investigate the structural and organizational factors that contribute to the availability and increased capacity for substance abuse treatment programs in correctional settings. We used classification and regression tree statistical procedures to identify how multi-level data can explain the variability in availability and capacity of substance abuse treatment programs in jails and probation/parole offices. The data for this study combined the National Criminal Justice Treatment Practices (NCJTP) Survey and the 2000 Census. The NCJTP survey was a nationally representative sample of correctional administrators for jails and probation/parole agencies. The sample size included 295 substance abuse treatment programs that were classified according to the intensity of their services: high, medium, and low. The independent variables included jurisdictional-level structural variables, attributes of the correctional administrators, and program and service delivery characteristics of the correctional agency. The two most important variables in predicting the availability of all three types of services were stronger working relationships with other organizations and the adoption of a standardized substance abuse screening tool by correctional agencies. For high and medium intensive programs, the capacity increased when an organizational learning strategy was used by administrators and the organization used a substance abuse screening tool. Implications on advancing treatment practices in correctional settings are discussed, including further work to test theories on how to better understand access to intensive treatment services. This study presents the first phase of understanding capacity-related issues regarding treatment programs offered in correctional settings.

  14. Policy Implications and Suggestions on Administrative Measures of Urban Flood

    NASA Astrophysics Data System (ADS)

    Lee, S. V.; Lee, M. J.; Lee, C.; Yoon, J. H.; Chae, S. H.

    2017-12-01

    The frequency and intensity of floods are increasing worldwide as climate change progresses. Because urban areas suffer heavy flood damage, flood management in urban municipalities should be policy-oriented. The purpose of this study is therefore to prepare a flood susceptibility map using data mining models and to make policy suggestions on administrative measures for urban floods. We constructed a spatial database by collecting relevant factors, including topography, geology, soil and land use data, for the representative city of Seoul, the capital of Korea. The flood susceptibility map was constructed by applying the random forest and boosted tree data mining models to the input data and to flooded-area data from 2010. The susceptibility map was validated using 2011 flood-area data, which were not used for training. The predictor importance value of each factor was calculated in this process. Distance from water, the DEM and geology showed high predictor importance values, indicating that they should be high priorities for flood preparation policy. In the receiver operating characteristic (ROC) analysis, the random forest model achieved 78.78% (regression) and 79.18% (classification) accuracy, and the boosted tree model achieved 77.55% (regression) and 77.26% (classification). The results show that flood susceptibility maps can be applied to flood prevention and management, and can help determine priority areas for flood mitigation policy by providing useful information to policy makers.

  15. Chronic subdural hematoma: Surgical management and outcome in 986 cases: A classification and regression tree approach

    PubMed Central

    Rovlias, Aristedis; Theodoropoulos, Spyridon; Papoutsakis, Dimitrios

    2015-01-01

    Background: Chronic subdural hematoma (CSDH) is one of the most common clinical entities in daily neurosurgical practice and carries a very favorable prognosis. However, because of patients' advanced age and medical problems, surgical therapy is frequently associated with various complications. This study evaluated the clinical features, radiological findings, and neurological outcome in a large series of patients with CSDH. Methods: A classification and regression tree (CART) technique was employed to analyze data from 986 patients who were operated on at Asclepeion General Hospital of Athens from January 1986 to December 2011. Burr-hole evacuation with closed-system drainage has been the operative technique of first choice at our institution for 29 consecutive years. A total of 27 prognostic factors were examined to predict outcome at 3 months postoperatively. Results: Our results indicated that neurological status on admission was the best predictor of outcome. Among the other variables, age, brain atrophy, thickness and density of the hematoma, subdural accumulation of air, and antiplatelet and anticoagulant therapy were found to correlate significantly with prognosis. The overall cross-validated predictive accuracy of the CART model was 85.34%, with a cross-validated relative error of 0.326. Conclusions: Methodologically, the CART technique is quite different from the more commonly used methods, with the primary benefit of illustrating the important prognostic variables as they relate to outcome. Since the ideal therapy for CSDH is still under debate, this technique may prove useful in developing new therapeutic strategies and approaches for patients with CSDH. PMID:26257985
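    CART grows its tree by repeatedly choosing the predictor and cut-off that best separate the outcome classes. The sketch below illustrates that single step for one numeric predictor, assuming Gini impurity as the splitting criterion (the abstract does not state which criterion was used). The function names `gini` and `best_split` and any example data are hypothetical, not taken from the study.

```python
from typing import Dict, List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity 1 - sum_k p_k**2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[int, int] = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(x: List[float], y: List[int]) -> Tuple[float, float]:
    """Best binary split "x <= t" for one numeric predictor.

    Returns (threshold, weighted Gini impurity of the two children). Candidate
    thresholds are midpoints between consecutive distinct sorted values.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best_t, best_imp = float("nan"), float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / n
        if imp < best_imp:
            best_t, best_imp = (lo + hi) / 2, imp
    return best_t, best_imp
```

    For example, `best_split([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])` returns `(3.5, 0.0)`, a perfectly pure split. Running this search over every candidate predictor and keeping the lowest impurity gives one CART split; the full method then recurses on each child node and prunes the resulting tree.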

  16. Identifying changes in dissolved organic matter content and characteristics by fluorescence spectroscopy coupled with self-organizing map and classification and regression tree analysis during wastewater treatment.

    PubMed

    Yu, Huibin; Song, Yonghui; Liu, Ruixia; Pan, Hongwei; Xiang, Liancheng; Qian, Feng

    2014-10-01

    The stabilization of latent tracers of dissolved organic matter (DOM) in wastewater was analyzed by three-dimensional excitation-emission matrix (EEM) fluorescence spectroscopy coupled with self-organizing map and classification and regression tree (CART) analysis during wastewater treatment. DOM in water samples collected from the primary sedimentation, anaerobic, anoxic, oxic and secondary sedimentation tanks of a large-scale wastewater treatment plant contained four fluorescence components extracted by the self-organizing map: tryptophan-like (C1), tyrosine-like (C2), microbial humic-like (C3) and fulvic-like (C4) materials. These components showed good positive linear correlations with the dissolved organic carbon of DOM. C1 and C2 were the representative components in the wastewater, and they were removed to a greater extent than C3 and C4 in the treatment process. C2 was a latent parameter, determined by CART, that differentiated water samples of the oxic and secondary sedimentation tanks from those of the other treatment units, indirectly proving that most of the tyrosine-like material was degraded by anaerobic microorganisms. C1 was an accurate parameter for comprehensively separating the samples of the five treatment units from each other, indirectly indicating that tryptophan-like material was decomposed by anaerobic and aerobic bacteria. EEM fluorescence spectroscopy combined with self-organizing map and CART analysis can be an effective, nondestructive method for characterizing the structural components of DOM fractions and monitoring organic matter removal in the wastewater treatment process. Copyright © 2014 Elsevier Ltd. All rights reserved.

  17. Comparing models for quantitative risk assessment: an application to the European Registry of foreign body injuries in children.

    PubMed

    Berchialla, Paola; Scarinzi, Cecilia; Snidero, Silvia; Gregori, Dario

    2016-08-01

    Risk assessment is the systematic study of decisions subject to uncertain consequences. Increasing interest has focused on modeling techniques such as Bayesian Networks because of their capability of (1) combining different types of evidence, including both expert judgments and objective data, in a probabilistic framework; (2) overturning previous beliefs in light of new information; and (3) making predictions even with incomplete data. In this work, we propose a comparison between Bayesian Networks and other classical Quantitative Risk Assessment techniques, such as Neural Networks, Classification Trees, Random Forests and Logistic Regression models. Hybrid approaches combining Classification Trees and Bayesian Networks were also considered. Among Bayesian Networks, a clear distinction is made between a purely data-driven approach and the combination of expert knowledge with objective data. The aim of this paper is to evaluate which of these models can best be applied, within the framework of Quantitative Risk Assessment, to assess the safety of children exposed to the risk of inhalation/insertion/aspiration of consumer products. Preventing injuries in children is of paramount importance, in particular where product design is involved: quantifying the risk associated with product characteristics can be very useful in informing product safety design regulation. Data from the European Registry of Foreign Bodies Injuries formed the starting evidence for risk assessment. Results showed that Bayesian Networks offered both ease of interpretability and accuracy in prediction, although simpler models such as logistic regression still performed well. © The Author(s) 2013.

  18. Multi‐criteria manufacturability indices for ranking high‐concentration monoclonal antibody formulations

    PubMed Central

    Velayudhan, Ajoy; Thornhill, Nina F.

    2017-01-01

    ABSTRACT The need for high‐concentration formulations for subcutaneous delivery of therapeutic monoclonal antibodies (mAbs) can present manufacturability challenges for the final ultrafiltration/diafiltration (UF/DF) step. Viscosity levels and the propensity to aggregate are key considerations for high‐concentration formulations. This work presents novel frameworks for deriving a set of manufacturability indices related to viscosity and thermostability to rank high‐concentration mAb formulation conditions in terms of their ease of manufacture. This is illustrated by analyzing published high‐throughput biophysical screening data that explore the influence of different formulation conditions (pH, ions, and excipients) on solution viscosity and product thermostability. A decision tree classification method, CART (Classification and Regression Tree), is used to identify the critical formulation conditions that influence viscosity and thermostability. In this work, three different multi‐criteria data analysis frameworks were investigated to derive manufacturability indices from analysis of the stress maps and the process conditions experienced in the final UF/DF step. Polynomial regression techniques were used to transform the experimental data into a set of stress maps that show viscosity and thermostability as functions of the formulation conditions. A mathematical filtrate flux model was used to capture the time profiles of protein concentration and flux decay behavior during UF/DF. Multi‐criteria decision‐making analysis was used to identify the optimal formulation conditions that minimize the potential for both viscosity and aggregation issues during UF/DF. Biotechnol. Bioeng. 2017;114: 2043–2056. © 2017 The Authors. Biotechnology and Bioengineering Published by Wiley Periodicals, Inc. PMID:28464235

  19. [Predicting very early rebleeding after acute variceal bleeding based on classification and regression tree analysis (CRTA).].

    PubMed

    Altamirano, J; Augustin, S; Muntaner, L; Zapata, L; González-Angulo, A; Martínez, B; Flores-Arroyo, A; Camargo, L; Genescá, J

    2010-01-01

    Variceal bleeding (VB) is the main cause of death among cirrhotic patients. About 30-50% of early rebleeding occurs within a few days of the acute episode of VB. Patients at high risk of very early rebleeding (VER) need to be identified for more aggressive therapies. However, the prognostic models available for this purpose are few and incompletely understood. This study aimed to determine the risk factors associated with VER after an acute VB, and to assess and compare a novel prognostic model generated by Classification and Regression Tree Analysis (CART) with classically used models (MELD and Child-Pugh [CP]). Sixty consecutive cirrhotic patients with acute variceal bleeding were studied. CART analysis was performed and MELD and Child-Pugh scores were calculated at admission. Receiver operating characteristic (ROC) curves were constructed to evaluate the predictive performance of the models. The very early rebleeding rate was 13%. Variables associated with VER were serum albumin (p = 0.027), creatinine (p = 0.021) and transfused blood units in the first 24 hrs (p = 0.05). The areas under the ROC curve for MELD, Child-Pugh and CART were 0.46, 0.50 and 0.82, respectively. The cut-off values identified by CART for the significant variables were: 1) albumin 2.85 mg/dL, 2) packed red cells 2 units and 3) creatinine 1.65 mg/dL (the ABC-ROC). Serum albumin, creatinine and number of transfused blood units were associated with VER. A simple CART algorithm combining these variables allows an accurate predictive assessment of VER after acute variceal bleeding. Key words: cirrhosis, variceal bleeding, esophageal varices, prognosis, portal hypertension.
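    The models above are compared by the area under the ROC curve (AUC). The AUC equals the probability that a randomly chosen positive case (here, a patient with VER) receives a higher risk score than a randomly chosen negative case, so 0.5 corresponds to chance-level ranking. A minimal pure-Python sketch of that pairwise calculation follows; the function name `roc_auc` is illustrative, and it is not the software used in the study.

```python
from typing import List


def roc_auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive case
    has the higher score; ties count one half. This equals the Mann-Whitney U
    statistic divided by n_pos * n_neg.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

    With made-up scores `[0.9, 0.4, 0.6, 0.2]` and labels `[1, 1, 0, 0]`, three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75. The double loop is O(n_pos × n_neg), which is adequate for a cohort of sixty patients; a sort-based rank computation scales better to large data sets.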

  20. Detection of Aspens Using High Resolution Aerial Laser Scanning Data and Digital Aerial Images

    PubMed Central

    Säynäjoki, Raita; Packalén, Petteri; Maltamo, Matti; Vehmas, Mikko; Eerikäinen, Kalle

    2008-01-01

    The aim was to use high-resolution Aerial Laser Scanning (ALS) data and aerial images to detect European aspen (Populus tremula L.) from among other deciduous trees. The field data consisted of 14 sample plots of 30 m × 30 m located in Koli National Park in North Karelia, eastern Finland. A Canopy Height Model (CHM) was interpolated from the ALS data with a pulse density of 3.86/m2, low-pass filtered using Height-Based Filtering (HBF) and binarized to create the mask needed to separate the ground pixels from the canopy pixels within individual areas. Watershed segmentation was applied to the low-pass filtered CHM in order to create preliminary canopy segments, from which the non-canopy elements were extracted to obtain the final canopy segmentation, i.e. the ground mask was analysed against the canopy mask. A manual classification of aerial images was employed to separate the canopy segments of deciduous trees from those of coniferous trees. Finally, linear discriminant analysis was applied to the correctly classified canopy segments of deciduous trees to classify them into segments belonging to aspen and those belonging to other deciduous trees. The independent variables used in the classification were obtained from the first-pulse ALS point data. The accuracy of discrimination between aspen and other deciduous trees was 78.6%. The independent variables in the classification function were the proportion of vegetation hits, the standard deviation of pulse heights, accumulated intensity at the 90th percentile and the proportion of laser points reflected at the 60th height percentile. The accuracy of classification corresponded to the validation results of earlier ALS-based studies on the classification of individual deciduous trees to tree species. PMID:27873799

  1. Personalized Risk Prediction in Clinical Oncology Research: Applications and Practical Issues Using Survival Trees and Random Forests.

    PubMed

    Hu, Chen; Steingrimsson, Jon Arni

    2018-01-01

    A crucial component of making individualized treatment decisions is to accurately predict each patient's disease risk. In clinical oncology, disease risks are often measured through time-to-event data, such as overall survival and progression/recurrence-free survival, and are often subject to censoring. Risk prediction models based on recursive partitioning methods are becoming increasingly popular, largely due to their ability to handle nonlinear relationships, higher-order interactions, and/or high-dimensional covariates. The most popular recursive partitioning methods are versions of the Classification and Regression Tree (CART) algorithm, which builds a simple, interpretable tree-structured model. With the aim of increasing prediction accuracy, the random forest algorithm averages multiple CART trees, creating a flexible risk prediction model. Risk prediction models used in clinical oncology commonly incorporate traditional demographic and tumor pathological factors as well as high-dimensional genetic markers and treatment parameters from multimodality treatments. In this article, we describe the most commonly used extensions of the CART and random forest algorithms to right-censored outcomes. We focus on how they differ from the methods for noncensored outcomes, and how the different splitting rules and methods for cost-complexity pruning impact these algorithms. We demonstrate these algorithms by analyzing a randomized Phase III clinical trial of breast cancer. We also conduct Monte Carlo simulations to compare the prediction accuracy of survival forests with more commonly used regression models under various scenarios. These simulation studies aim to evaluate how sensitive the prediction accuracy is to the underlying model specifications, the choice of tuning parameters, and the degree of missingness in covariates.
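    Cost-complexity pruning, mentioned above, penalizes a tree T by its number of leaves, R_alpha(T) = R(T) + alpha * |T|, and removes branches in order of the "weakest link": the internal node t with the smallest g(t) = (R(t) - R(T_t)) / (|T_t| - 1), where R(t) is the risk if t were collapsed to a leaf and R(T_t) is the summed risk of the leaves below t. The sketch below computes that quantity for a hand-built tree, assuming binary splits and risks given as misclassification counts. The `Node` class is a hypothetical structure for illustration, and only the noncensored case is covered, not the censored-outcome versions the article describes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node; `risk` is R(t), the cost if t were made a leaf."""
    risk: float
    children: List["Node"] = field(default_factory=list)


def subtree_stats(node: Node) -> Tuple[float, int]:
    """Return (R(T_t), |T_t|): summed leaf risk and leaf count below `node`."""
    if not node.children:
        return node.risk, 1
    total, leaves = 0.0, 0
    for child in node.children:
        r, k = subtree_stats(child)
        total += r
        leaves += k
    return total, leaves


def weakest_link(root: Node) -> Tuple[Optional[Node], float]:
    """Return the internal node with the smallest g(t) and that g value.

    g(t) = (R(t) - R(T_t)) / (|T_t| - 1); binary splits are assumed, so every
    internal node has at least two leaves and the denominator is positive.
    """
    best_node: Optional[Node] = None
    best_g = float("inf")
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            r_sub, n_leaves = subtree_stats(node)
            g = (node.risk - r_sub) / (n_leaves - 1)
            if g < best_g:
                best_node, best_g = node, g
            stack.extend(node.children)
    return best_node, best_g
```

    For a root with risk 40 whose children are a leaf with risk 10 and a node with risk 20 that splits into leaves with risks 8 and 6, the root has g = (40 - 24) / 2 = 8.0 and the inner node has g = (20 - 14) / 1 = 6.0, so the inner node is the weakest link. Full CART collapses that node, recomputes g, and repeats, producing a nested sequence of subtrees from which cross-validation typically selects one.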

  2. QSRR modeling for diverse drugs using different feature selection methods coupled with linear and nonlinear regressions.

    PubMed

    Goodarzi, Mohammad; Jensen, Richard; Vander Heyden, Yvan

    2012-12-01

    A Quantitative Structure-Retention Relationship (QSRR) model is proposed to estimate the chromatographic retention of 83 diverse drugs on a Unisphere polybutadiene (PBD) column, using isocratic elutions at pH 11.7. Previous work generated QSRR models for these drugs using Classification And Regression Trees (CART). In this work, Ant Colony Optimization (ACO) is used as a feature selection method to find the best molecular descriptors from a large pool. In addition, several other selection methods, such as Genetic Algorithms, Stepwise Regression and the Relief method, were applied, not only to evaluate ACO as a feature selection method but also to investigate its ability to find the important descriptors in QSRR. Multiple Linear Regression (MLR) and Support Vector Machines (SVMs) were applied as linear and nonlinear regression methods, respectively, giving excellent correlation between the experimental and predicted logarithms of the retention factors of the drugs (logk(w)), where the experimental values were extrapolated to a mobile phase consisting of pure water. The overall best model was the SVM model built using descriptors selected by ACO. Copyright © 2012 Elsevier B.V. All rights reserved.

  3. Hierarchical Matching and Regression with Application to Photometric Redshift Estimation

    NASA Astrophysics Data System (ADS)

    Murtagh, Fionn

    2017-06-01

    This work emphasizes that heterogeneity, diversity, discontinuity, and discreteness in data are to be exploited in classification and regression problems. A global a priori model may not be desirable. For data analytics in cosmology, this is motivated by the variety of cosmological objects, such as elliptical, spiral, active, and merging galaxies, at a wide range of redshifts. Our aim is matching and similarity-based analytics that take account of discrete relationships in the data. The information structure of the data is represented by a hierarchy or tree in which the branch structure, rather than just the proximity, is important. The representation is related to p-adic number theory. The clustering or binning of the data values, related to the precision of the measurements, has a central role in this methodology. If used for regression, our approach is a method of cluster-wise regression, generalizing nearest-neighbour regression. Both to exemplify this analytics approach and to demonstrate computational benefits, we address the well-known photometric redshift or `photo-z' problem, seeking to match Sloan Digital Sky Survey (SDSS) spectroscopic and photometric redshifts.

  4. Forest tree species discrimination in western Himalaya using EO-1 Hyperion

    NASA Astrophysics Data System (ADS)

    George, Rajee; Padalia, Hitendra; Kushwaha, S. P. S.

    2014-05-01

    The information acquired in the narrow bands of hyperspectral remote sensing data has the potential to capture plant species spectral variability, thereby improving forest tree species mapping. This study assessed the utility of spaceborne EO-1 Hyperion data in the discrimination and classification of broadleaved evergreen and conifer forest tree species in western Himalaya. Pre-processing of the 242 Hyperion bands resulted in 160 noise-free, vertical-stripe-corrected reflectance bands. Of these, 29 bands were selected through stepwise exclusion of bands (Wilks' Lambda). Spectral Angle Mapper (SAM) and Support Vector Machine (SVM) algorithms were applied to the selected bands to assess their effectiveness in classification. SVM was also applied to broadband data (Landsat TM) to compare the variation in classification accuracy. All six commonly occurring gregarious tree species in western Himalaya, viz., white oak, brown oak, chir pine, blue pine, cedar and fir, could be effectively discriminated. SVM produced a better species classification (overall accuracy 82.27%, kappa statistic 0.79) than SAM (overall accuracy 74.68%, kappa statistic 0.70). Classification accuracy achieved with Hyperion bands was significantly higher than with Landsat TM bands (overall accuracy 69.62%, kappa statistic 0.65). The study demonstrated the potential utility of the narrow spectral bands of Hyperion data for discriminating tree species in hilly terrain.
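    Overall accuracy and the kappa statistic reported above are both derived from the classification confusion matrix; kappa additionally discounts the agreement expected by chance from the class totals. A small illustrative sketch of Cohen's kappa follows, assuming rows hold reference (ground-truth) classes and columns hold predicted classes. The function name and the counts in the usage note are made up and are not the study's data.

```python
from typing import List


def cohen_kappa(confusion: List[List[int]]) -> float:
    """Cohen's kappa for a square confusion matrix.

    Rows are reference (ground-truth) classes, columns are predicted classes.
    kappa = (p_o - p_e) / (1 - p_e), where p_o is overall accuracy and p_e is
    the agreement expected by chance from the row and column totals.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(confusion[i]) for i in range(k)]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)
```

    For the hypothetical matrix `[[20, 5], [10, 15]]`, overall accuracy is 0.70, chance agreement is 0.50, and kappa is 0.40, which shows why kappa is usually lower than the corresponding overall accuracy.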

  5. Identification and Mapping of Tree Species in Urban Areas Using WORLDVIEW-2 Imagery

    NASA Astrophysics Data System (ADS)

    Mustafa, Y. T.; Habeeb, H. N.; Stein, A.; Sulaiman, F. Y.

    2015-10-01

    Monitoring and mapping of urban trees are essential to provide urban forestry authorities with timely and consistent information. Modern techniques increasingly facilitate these tasks but require the development of semi-automatic tree detection and classification methods. In this article, we propose an approach to delineate and map the crowns of 15 tree species in the city of Duhok, Kurdistan Region of Iraq, using WorldView-2 (WV-2) imagery. A tree crown object is first identified and then delineated as an image object (IO) using vegetation indices and texture measurements. Next, three classification methods, Maximum Likelihood, Neural Network, and Support Vector Machine, were used to classify IOs using selected IO features. Support Vector Machine classification gave the best results and the best map of urban tree species in Duhok. The overall accuracy ranged from 60.93% to 88.92%, and the κ-coefficient from 0.57 to 0.75. We conclude that fifteen tree species were identified and mapped with satisfactory accuracy in the urban areas of this study.

  6. A novel approach to internal crown characterization for coniferous tree species classification

    NASA Astrophysics Data System (ADS)

    Harikumar, A.; Bovolo, F.; Bruzzone, L.

    2016-10-01

    Knowledge about individual trees in a forest is highly beneficial for forest management. High-density, small-footprint, multi-return airborne Light Detection and Ranging (LiDAR) data can provide very accurate information about the structural properties of individual trees in forests. Every tree species has a unique set of crown structural characteristics that can be used for tree species classification. In this paper, we use both the internal and external crown structural information of a conifer tree crown, derived from a high-density, small-footprint, multi-return LiDAR data acquisition, for species classification. Since branches are the major building blocks of a conifer tree crown, we obtain the internal crown structural information using a branch-level analysis. The structure of each conifer branch is represented using clusters in the LiDAR point cloud. We propose the joint use of k-means clustering and geometric shape fitting, on the LiDAR data projected onto a novel 3-dimensional space, to identify branch clusters. After mapping the identified clusters back to the original space, six internal geometric features are estimated using a branch-level analysis. The external crown characteristics are modeled using six least-correlated features based on cone fitting and convex hull. Species classification is performed using a sparse Support Vector Machine (sparse SVM) classifier.

  7. Forest Tree Species Distribution Mapping Using Landsat Satellite Imagery and Topographic Variables with the Maximum Entropy Method in Mongolia

    NASA Astrophysics Data System (ADS)

    Hao Chiang, Shou; Valdez, Miguel; Chen, Chi-Farn

    2016-06-01

    Forest is a very important ecosystem and natural resource for living things. Based on forest inventories, governments are able to make decisions to conserve, improve and manage forests in a sustainable way. Field work for forestry investigation is difficult and time-consuming, because it requires intensive physical labor and the costs are high, especially when surveying remote mountainous regions. A reliable forest inventory can provide more accurate and timely information for developing new and efficient approaches to forest management. Remote sensing technology has recently been used for forest investigation at a large scale. To produce an informative forest inventory, forest attributes, including tree species, must be considered. In this study the aim is to classify forest tree species in Erdenebulgan County, Huwsgul province, Mongolia, using the Maximum Entropy method. The study area is covered by dense forest, which occupies almost 70% of the total territory of Erdenebulgan County, and is located in a high mountain region in northern Mongolia. For this study, Landsat satellite imagery and a Digital Elevation Model (DEM) were acquired to perform tree species mapping. The forest tree species inventory map was obtained from the Forest Division of the Mongolian Ministry of Nature and Environment as training data and was also used as ground truth for the accuracy assessment of the tree species classification. Landsat images and the DEM were processed for maximum entropy modeling, and the model was applied in two experiments. The first used Landsat surface reflectance for tree species classification; the second incorporated terrain variables in addition to Landsat surface reflectance. All experimental results were compared with the tree species inventory to assess classification accuracy. Results show that the second experiment, which uses Landsat surface reflectance coupled with terrain variables, produced better results, with higher overall accuracy and kappa coefficient than the first experiment. The results indicate that the Maximum Entropy method is applicable, and that classifying tree species using satellite imagery coupled with terrain information can improve the classification of tree species in the study area.

  8. Factors associated with success of telaprevir- and boceprevir-based triple therapy for hepatitis C virus infection

    PubMed Central

    Bichoupan, Kian; Tandon, Neeta; Martel-Laferriere, Valerie; Patel, Neal M; Sachs, David; Ng, Michel; Schonfeld, Emily A; Pappas, Alexis; Crismale, James; Stivala, Alicia; Khaitova, Viktoriya; Gardenier, Donald; Linderman, Michael; Olson, William; Perumalswami, Ponni V; Schiano, Thomas D; Odin, Joseph A; Liu, Lawrence U; Dieterich, Douglas T; Branch, Andrea D

    2017-01-01

    AIM To evaluate new therapies for hepatitis C virus (HCV), data about real-world outcomes are needed. METHODS Outcomes of 223 patients with genotype 1 HCV who started telaprevir- or boceprevir-based triple therapy (May 2011-March 2012) at the Mount Sinai Medical Center were analyzed. Human immunodeficiency virus-positive patients and patients who received a liver transplant were excluded. Factors associated with sustained virological response (SVR24) and relapse were analyzed by univariable and multivariable logistic regression as well as classification and regression trees. Fast virological response (FVR) was defined as undetectable HCV RNA at week-4 (telaprevir) or week-8 (boceprevir). RESULTS The median age was 57 years, 18% were black, 44% had advanced fibrosis/cirrhosis (FIB-4 ≥ 3.25). Only 42% (94/223) of patients achieved SVR24 on an intention-to-treat basis. In a model that included platelets, SVR24 was associated with white race [odds ratio (OR) = 5.92, 95% confidence interval (CI): 2.34-14.96], HCV sub-genotype 1b (OR = 2.81, 95%CI: 1.45-5.44), platelet count (OR = 1.10, per 10⁴ cells/μL, 95%CI: 1.05-1.16), and IL28B CC genotype (OR = 3.54, 95%CI: 1.19-10.53). Platelet counts > 135 × 10³/μL were the strongest predictor of SVR by classification and regression tree. Relapse occurred in 25% (27/104) of patients with an end-of-treatment response and was associated with non-FVR (OR = 4.77, 95%CI: 1.68-13.56), HCV sub-genotype 1a (OR = 5.20; 95%CI: 1.40-18.97), and FIB-4 ≥ 3.25 (OR = 2.77; 95%CI: 1.07-7.22). CONCLUSION The SVR rate was 42% with telaprevir- or boceprevir-based triple therapy in real-world practice. Low platelets and advanced fibrosis were associated with treatment failure and relapse. PMID:28469811

  9. Factors associated with success of telaprevir- and boceprevir-based triple therapy for hepatitis C virus infection.

    PubMed

    Bichoupan, Kian; Tandon, Neeta; Martel-Laferriere, Valerie; Patel, Neal M; Sachs, David; Ng, Michel; Schonfeld, Emily A; Pappas, Alexis; Crismale, James; Stivala, Alicia; Khaitova, Viktoriya; Gardenier, Donald; Linderman, Michael; Olson, William; Perumalswami, Ponni V; Schiano, Thomas D; Odin, Joseph A; Liu, Lawrence U; Dieterich, Douglas T; Branch, Andrea D

    2017-04-18

    To evaluate new therapies for hepatitis C virus (HCV), data about real-world outcomes are needed. Outcomes of 223 patients with genotype 1 HCV who started telaprevir- or boceprevir-based triple therapy (May 2011-March 2012) at the Mount Sinai Medical Center were analyzed. Human immunodeficiency virus-positive patients and patients who received a liver transplant were excluded. Factors associated with sustained virological response (SVR24) and relapse were analyzed by univariable and multivariable logistic regression as well as classification and regression trees. Fast virological response (FVR) was defined as undetectable HCV RNA at week-4 (telaprevir) or week-8 (boceprevir). The median age was 57 years, 18% were black, 44% had advanced fibrosis/cirrhosis (FIB-4 ≥ 3.25). Only 42% (94/223) of patients achieved SVR24 on an intention-to-treat basis. In a model that included platelets, SVR24 was associated with white race [odds ratio (OR) = 5.92, 95% confidence interval (CI): 2.34-14.96], HCV sub-genotype 1b (OR = 2.81, 95%CI: 1.45-5.44), platelet count (OR = 1.10, per 10⁴ cells/μL, 95%CI: 1.05-1.16), and IL28B CC genotype (OR = 3.54, 95%CI: 1.19-10.53). Platelet counts > 135 × 10³/μL were the strongest predictor of SVR by classification and regression tree. Relapse occurred in 25% (27/104) of patients with an end-of-treatment response and was associated with non-FVR (OR = 4.77, 95%CI: 1.68-13.56), HCV sub-genotype 1a (OR = 5.20; 95%CI: 1.40-18.97), and FIB-4 ≥ 3.25 (OR = 2.77; 95%CI: 1.07-7.22). The SVR rate was 42% with telaprevir- or boceprevir-based triple therapy in real-world practice. Low platelets and advanced fibrosis were associated with treatment failure and relapse.

  10. Time Series of Images to Improve Tree Species Classification

    NASA Astrophysics Data System (ADS)

    Miyoshi, G. T.; Imai, N. N.; de Moraes, M. V. A.; Tommaselli, A. M. G.; Näsi, R.

    2017-10-01

    Tree species classification provides valuable information for forest monitoring and management. The high floristic variation of tree species is a challenging issue in tree species classification because vegetation characteristics change with the season. To help monitor this complex environment, imaging spectroscopy has been widely applied since the development of miniaturized sensors attached to Unmanned Aerial Vehicles (UAV). Considering the seasonal changes in forests and the higher spectral and spatial resolution acquired with UAV-mounted sensors, we present the use of time series of images to classify four tree species. The study area is an Atlantic Forest area located in the western part of São Paulo State. Images were acquired in August 2015 and August 2016, generating three data sets: only the image spectra of 2015; only the image spectra of 2016; and the layer stack of the 2015 and 2016 images. Four tree species were classified using Spectral Angle Mapper (SAM), Spectral Information Divergence (SID) and Random Forest (RF). The results showed that SAM and SID caused overfitting of the data, whereas RF showed better results, and the use of layer stacking improved the classification, achieving a kappa coefficient of 18.26%.

  11. Using CART to segment road images

    NASA Astrophysics Data System (ADS)

    Davies, Bob; Lienhart, Rainer

    2006-01-01

    The 2005 DARPA Grand Challenge was a 132-mile race through the desert for autonomous robotic vehicles. Lasers mounted on the car roof provide a map of the road up to 20 meters ahead of the car, but the car needs to see further in order to go fast enough to win the race. Computer vision can extend that map of the road ahead, but desert road is notoriously similar to the surrounding desert. The CART (Classification and Regression Trees) algorithm provided a machine learning boost for finding the road while also measuring when the road could not be distinguished from the surrounding desert.

  12. Impact of atmospheric correction and image filtering on hyperspectral classification of tree species using support vector machine

    NASA Astrophysics Data System (ADS)

    Shahriari Nia, Morteza; Wang, Daisy Zhe; Bohlman, Stephanie Ann; Gader, Paul; Graves, Sarah J.; Petrovic, Milenko

    2015-01-01

    Hyperspectral images can be used to identify savannah tree species at the landscape scale. This is a key step in measuring biomass and carbon and in tracking changes in species distributions, including invasive species, in these ecosystems. Image processing and atmospheric correction are often applied before automated species mapping, and they can affect the performance of classification algorithms. We determined how three processing and correction techniques (atmospheric correction, Gaussian filters, and shade/green vegetation filters) affect the accuracy of pixel-level tree species classification from airborne visible/infrared imaging spectrometer imagery of longleaf pine savanna in Central Florida, United States. Species classification using fast line-of-sight atmospheric analysis of spectral hypercubes (FLAASH) atmospheric correction outperformed ATCOR in the majority of cases. Green vegetation (normalized difference vegetation index) and shade (near-infrared) filters did not increase classification accuracy when applied to large and continuous patches of specific species. Finally, applying a Gaussian filter reduced interband noise and increased species classification accuracy. Using the optimal preprocessing steps, classification accuracy for six species classes was about 75%.
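The abstract does not give the kernel width or say exactly how the Gaussian filter was applied. One reading of "interband noise" is smoothing each pixel's spectrum along the band axis, and the NumPy-only sketch below implements that interpretation. The `sigma` value and the reflect-padding at the ends of the spectrum are assumptions.

```python
import numpy as np


def gaussian_kernel(sigma, radius=None):
    """Normalised 1-D Gaussian kernel; the radius defaults to 3 sigma."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    return kernel / kernel.sum()


def smooth_spectra(cube, sigma=1.0):
    """Gaussian-smooth every pixel spectrum along the band (last) axis of a
    (rows, cols, bands) cube, reflecting the spectrum at both ends."""
    cube = np.asarray(cube, dtype=float)
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    padded = np.pad(cube, [(0, 0), (0, 0), (r, r)], mode="reflect")
    out = np.empty_like(cube)
    for b in range(cube.shape[-1]):
        # Weighted sum of the 2r + 1 bands centred on band b.
        out[..., b] = padded[..., b:b + 2 * r + 1] @ kernel
    return out


# Hypothetical 4 x 4 pixel cube with 40 bands: a smooth spectrum plus noise.
rng = np.random.default_rng(0)
noisy = np.linspace(0.05, 0.5, 40) + rng.normal(0, 0.02, size=(4, 4, 40))
smoothed = smooth_spectra(noisy, sigma=1.5)
print(np.std(np.diff(noisy, axis=-1)), np.std(np.diff(smoothed, axis=-1)))
```

Band-to-band jumps shrink after smoothing. A larger `sigma` suppresses more noise but also blurs narrow absorption features.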

  13. Classification and Progression Based on CFS-GA and C5.0 Boost Decision Tree of TCM Zheng in Chronic Hepatitis B.

    PubMed

    Chen, Xiao Yu; Ma, Li Zhuang; Chu, Na; Zhou, Min; Hu, Yiyang

    2013-01-01

    Chronic hepatitis B (CHB) is a serious public health problem, and Traditional Chinese Medicine (TCM) plays an important role in the control and treatment of CHB. In TCM treatment, zheng discrimination is the most important step. In this paper, an approach based on CFS-GA (Correlation-based Feature Selection and Genetic Algorithm) and a boosted C5.0 decision tree is used for zheng classification and progression in the TCM treatment of CHB. CFS-GA performs better than the typical CFS method. The attribute subset acquired by CFS-GA is classified with the boosted C5.0 decision tree for TCM zheng classification of CHB. The C5.0 decision tree outperforms two typical decision trees, NBTree and REPTree, on CFS-GA, on CFS and without feature selection. Based on the critical indicators from the C5.0 decision tree, stepwise discriminant analysis identifies the laboratory indicators that matter most for expressing TCM zhengs in CHB, and changes in these indicators during zheng progression are also analyzed. In conclusion, all three decision trees perform better on CFS-GA than on CFS or without selection, and the C5.0 decision tree outperforms the two typical decision trees both with and without attribute selection.
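CFS scores a feature subset by rewarding correlation with the class and penalising correlation among the selected features. In CFS-GA, a genetic algorithm searches over subsets using such a score. The sketch computes Hall's CFS merit with absolute Pearson correlations, which assumes numeric (or 0/1-coded) features and class. The abstract does not give the paper's exact correlation measure or GA settings.

```python
import numpy as np


def cfs_merit(X, y, subset):
    """Hall's CFS merit of a feature subset:
        merit = k * mean|r_cf| / sqrt(k + k * (k - 1) * mean|r_ff|)
    where r_cf are feature-class correlations and r_ff feature-feature
    correlations (absolute Pearson correlations here, an assumption)."""
    subset = list(subset)
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return float(r_cf)
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                    for i, a in enumerate(subset) for b in subset[i + 1:]])
    return float(k * r_cf / np.sqrt(k + k * (k - 1) * r_ff))


# Synthetic data: one informative feature and one pure-noise feature.
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=300).astype(float)
X = np.column_stack([y + rng.normal(0, 0.3, size=300), rng.normal(size=300)])
for subset in ([0], [1], [0, 1]):
    print(subset, round(cfs_merit(X, y, subset), 3))
```

In a CFS-GA search, a function like `cfs_merit` would act as the fitness of each candidate subset (chromosome). The GA operators themselves are not shown.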

  14. Learning classification trees

    NASA Technical Reports Server (NTRS)

    Buntine, Wray

    1991-01-01

    Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. How a tree learning algorithm can be derived from Bayesian decision theory is outlined. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule turns out to be similar to Quinlan's information gain splitting rule, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, Quinlan's C4 and Breiman et al.'s CART show that the full Bayesian algorithm is consistently as good as or more accurate than these other approaches, though at a computational price.
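The abstract compares the Bayesian splitting rule to Quinlan's information gain, which is the drop in class entropy produced by a split. This sketch implements plain information gain, not Buntine's Bayesian rule. The example uses a toy split of 9 "yes" and 5 "no" labels.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)


parent = ["yes"] * 9 + ["no"] * 5
children = [["yes"] * 6 + ["no"] * 2, ["yes"] * 3 + ["no"] * 3]
print(round(information_gain(parent, children), 3))  # → 0.048
```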

  15. A Modified Decision Tree Algorithm Based on Genetic Algorithm for Mobile User Classification Problem

    PubMed Central

    Liu, Dong-sheng; Fan, Shu-jiang

    2014-01-01

    To offer mobile customers better service, mobile users must first be classified. To address the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification that uses a genetic algorithm to optimize the results of the decision tree algorithm. We also use context information as classification attributes for mobile users, dividing context into public and private context classes. We then analyze the processes and operators of the algorithm. Finally, we apply the algorithm to mobile user data. It classifies users into Basic service user, E-service user, Plus service user and Total service user classes, and it also yields rules describing mobile users. Compared with the C4.5 decision tree algorithm and SVM, the proposed algorithm is more accurate and simpler. PMID:24688389

  16. A Rough Set Decision Tree Based MLP-CNN for Very High Resolution Remotely Sensed Image Classification

    NASA Astrophysics Data System (ADS)

    Zhang, C.; Pan, X.; Zhang, S. Q.; Li, H. P.; Atkinson, P. M.

    2017-09-01

    Recent advances in remote sensing have produced a large amount of very high resolution (VHR) imagery acquired at sub-metre spatial resolution. Because of their high spatial complexity and heterogeneity, these VHR remotely sensed data pose enormous challenges for effective processing, analysis and classification. Many computer-aided classification methods based on machine learning have been developed over the past decades. Most of them target pixel-level spectral differentiation, e.g. the Multi-Layer Perceptron (MLP), and cannot exploit the abundant spatial detail in VHR images. This paper introduces a rough set model as a general framework to objectively characterize the uncertainty in CNN classification results and to partition them into correct and incorrect regions on the map. The correctly classified regions of the CNN were trusted and maintained, whereas the misclassified areas were reclassified using a decision tree with both CNN and MLP. The effectiveness of the proposed rough set decision tree based MLP-CNN was tested on an urban area in Bournemouth, United Kingdom. The MLP-CNN captured the complementarity between CNN and MLP well through the rough set based decision tree and achieved the best classification performance both visually and numerically. This research therefore paves the way towards fully automatic and effective VHR image classification.

  17. Proposal of a Clinical Decision Tree Algorithm Using Factors Associated with Severe Dengue Infection.

    PubMed

    Tamibmaniam, Jayashamani; Hussin, Narwani; Cheah, Wee Kooi; Ng, Kee Sing; Muninathan, Prema

    2016-01-01

    The WHO's 2009 classification (dengue with or without warning signs, and severe dengue) has led to large numbers of hospital admissions of dengue patients. This has imposed a huge economic and physical burden on many hospitals around the globe, particularly in South East Asia and Malaysia, where the disease has surged in recent years. The lack of a simple tool to differentiate mild from life-threatening infection has led to unnecessary hospitalization of dengue patients. We conducted a single-centre, retrospective study of serologically confirmed dengue fever patients admitted to a single ward in Hospital Kuala Lumpur, Malaysia. Data were collected over 4 months, from February to May 2014. Sociodemographic data, co-morbidities, days of illness before admission, symptoms, warning signs, vital signs and laboratory results were all recorded. Descriptive statistics were tabulated, and simple and multiple logistic regression analyses were performed to determine significant risk factors for severe dengue. Of the 657 patients with confirmed dengue who were analysed, 59 (9.0%) had severe dengue. Overall, the commonest warning signs were vomiting (36.1%) and abdominal pain (32.1%). In simple logistic regression, previous co-morbidity, vomiting, diarrhoea, pleural effusion, low systolic blood pressure, high haematocrit, low albumin and high urea were significant risk factors for severe dengue. However, in multiple logistic regression, the only significant risk factors were vomiting, pleural effusion and low systolic blood pressure. Using those 3 risk factors, we constructed an algorithm for predicting severe dengue. Compared with the classification of severe dengue based on the WHO criteria, the decision tree algorithm had a sensitivity of 0.81, specificity of 0.54, positive predictive value (PPV) of 0.16 and negative predictive value (NPV) of 0.96.
The decision tree algorithm proposed in this study showed high sensitivity and NPV in predicting patients with severe dengue who may warrant admission. After further validation, this tool could help clinicians decide on the further management of a patient at first encounter. It would also have a substantial impact on health resources, because low-risk patients could be managed as outpatients, reserving scarce hospital beds and medical resources for other patients in need.
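All four reported figures come from one 2x2 table of predicted classes versus the reference (WHO-criteria) classes. The counts below are hypothetical, chosen only to show why a sensitive rule can still have a low PPV and a high NPV when severe cases are a small minority (59 of 657 here).

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts (not the study's data): 50 severe cases among 500 patients.
metrics = diagnostic_metrics(tp=40, fp=200, fn=10, tn=250)
print({name: round(value, 3) for name, value in metrics.items()})
# → {'sensitivity': 0.8, 'specificity': 0.556, 'ppv': 0.167, 'npv': 0.962}
```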

  18. Quantifying tree mortality in a mixed species woodland using multitemporal high spatial resolution satellite imagery

    USGS Publications Warehouse

    Garrity, Steven R.; Allen, Craig D.; Brumby, Steven P.; Gangodagamage, Chandana; McDowell, Nate G.; Cai, D. Michael

    2013-01-01

    Widespread tree mortality events have recently been observed in several biomes. To effectively quantify the severity and extent of these events, tools that allow for rapid assessment at the landscape scale are required. Past studies using high spatial resolution satellite imagery have primarily focused on detecting green, red, and gray tree canopies during and shortly after tree damage or mortality has occurred. However, detecting trees in various stages of death is not always possible due to limited availability of archived satellite imagery. Here we assess the capability of high spatial resolution satellite imagery for tree mortality detection in a southwestern U.S. mixed species woodland using archived satellite images acquired prior to mortality and well after dead trees had dropped their leaves. We developed a multistep classification approach that uses supervised masking of non-tree image elements; bi-temporal (pre- and post-mortality) differencing of normalized difference vegetation index (NDVI) and red:green ratio (RGI); and unsupervised multivariate clustering of pixels into live and dead tree classes using a Gaussian mixture model. Classification accuracies were improved in a final step by tuning the rules of pixel classification using the posterior probabilities of class membership obtained from the Gaussian mixture model. Classifications were produced for two images acquired post-mortality with overall accuracies of 97.9% and 98.5%, respectively. Classified images were combined with land cover data to characterize the spatiotemporal characteristics of tree mortality across areas with differences in tree species composition. We found that 38% of tree crown area was lost during the drought period between 2002 and 2006. The majority of tree mortality during this period was concentrated in piñon-juniper (Pinus edulis-Juniperus monosperma) woodlands.
An additional 20% of the tree canopy died or was removed between 2006 and 2011, primarily in areas experiencing wildfire and management activity. Our results demonstrate that unsupervised clustering of bi-temporal NDVI and RGI differences can be used to detect tree mortality resulting from numerous causes and in several forest cover types.
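The bi-temporal features are per-pixel differences of two simple band indices, NDVI = (NIR - Red)/(NIR + Red) and RGI = Red/Green. The minimal NumPy sketch below assumes co-registered reflectance arrays for the pre- and post-mortality images and computes the differences as post minus pre. The masking and Gaussian mixture clustering steps are not shown; scikit-learn's `GaussianMixture` is one option for the clustering.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def rgi(red, green):
    """Red:green ratio index."""
    return red / green


def bitemporal_features(pre, post):
    """Per-pixel (dNDVI, dRGI) feature pairs, computed as post minus pre.

    `pre` and `post` are dicts of co-registered band arrays keyed by
    "nir", "red" and "green" (a hypothetical layout)."""
    d_ndvi = ndvi(post["nir"], post["red"]) - ndvi(pre["nir"], pre["red"])
    d_rgi = rgi(post["red"], post["green"]) - rgi(pre["red"], pre["green"])
    return np.stack([d_ndvi.ravel(), d_rgi.ravel()], axis=1)


# Two hypothetical pixels: one unchanged, one that lost its foliage.
pre = {"nir": np.array([0.40, 0.40]), "red": np.array([0.05, 0.05]),
       "green": np.array([0.08, 0.08])}
post = {"nir": np.array([0.40, 0.25]), "red": np.array([0.05, 0.15]),
        "green": np.array([0.08, 0.10])}
print(bitemporal_features(pre, post))
```

A dead crown shows a drop in NDVI and a rise in RGI, so live and dead pixels tend to fall into separate clusters in this two-dimensional feature space.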

  19. Mapping and characterizing selected canopy tree species at the Angkor World Heritage site in Cambodia using aerial data.

    PubMed

    Singh, Minerva; Evans, Damian; Tan, Boun Suy; Nin, Chan Samean

    2015-01-01

    At present, there is too little information on the ecology, distribution, and structure of Cambodia's tree species to guide suitable conservation measures. The aim of this study was to assess various methods of analysing aerial imagery to characterize forest mensuration variables (i.e., tree height and crown width) of selected tree species found in the forested region around the temples of Angkor Thom, Cambodia. Object-based image analysis (OBIA) with multiresolution segmentation was used to delineate individual tree crowns from very-high-resolution (VHR) aerial imagery and light detection and ranging (LiDAR) data. Crown width and tree height values extracted using multiresolution segmentation showed a high level of congruence with field-measured values (Spearman's rho 0.782 and 0.589, respectively). Individual tree crowns delineated from aerial imagery using multiresolution segmentation had a high level of segmentation accuracy (69.22%), whereas tree crowns delineated using watershed segmentation underestimated the field-measured crown widths. Both spectral angle mapper (SAM) and maximum likelihood (ML) classifications were applied to the aerial imagery to map the selected tree species. The latter was found to be more suitable for tree species classification. Individual tree species were identified with high accuracy. Inclusion of textural information further improved species identification, albeit marginally. Our findings suggest that VHR aerial imagery, in conjunction with OBIA-based segmentation methods (such as multiresolution segmentation) and supervised classification techniques, is useful for tree species mapping and for studies of forest mensuration variables.
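Spearman's rho, used above to compare extracted and field-measured crown widths and tree heights, is the Pearson correlation of the two variables' ranks. Below is a dependency-free sketch that gives tied values their average rank. The measurements in the usage lines are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


field_width = [4.1, 6.3, 5.0, 8.2, 7.4]   # hypothetical field measurements (m)
image_width = [3.8, 6.0, 5.5, 6.1, 7.9]   # hypothetical image-derived values (m)
print(round(spearman_rho(field_width, image_width), 3))  # → 0.9
```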

  20. Mapping and Characterizing Selected Canopy Tree Species at the Angkor World Heritage Site in Cambodia Using Aerial Data

    PubMed Central

    Singh, Minerva; Evans, Damian; Tan, Boun Suy; Nin, Chan Samean

    2015-01-01

    At present, there is too little information on the ecology, distribution, and structure of Cambodia’s tree species to guide suitable conservation measures. The aim of this study was to assess various methods of analysing aerial imagery to characterize forest mensuration variables (i.e., tree height and crown width) of selected tree species found in the forested region around the temples of Angkor Thom, Cambodia. Object-based image analysis (OBIA) with multiresolution segmentation was used to delineate individual tree crowns from very-high-resolution (VHR) aerial imagery and light detection and ranging (LiDAR) data. Crown width and tree height values extracted using multiresolution segmentation showed a high level of congruence with field-measured values (Spearman’s rho 0.782 and 0.589, respectively). Individual tree crowns delineated from aerial imagery using multiresolution segmentation had a high level of segmentation accuracy (69.22%), whereas tree crowns delineated using watershed segmentation underestimated the field-measured crown widths. Both spectral angle mapper (SAM) and maximum likelihood (ML) classifications were applied to the aerial imagery to map the selected tree species. The latter was found to be more suitable for tree species classification. Individual tree species were identified with high accuracy. Inclusion of textural information further improved species identification, albeit marginally. Our findings suggest that VHR aerial imagery, in conjunction with OBIA-based segmentation methods (such as multiresolution segmentation) and supervised classification techniques, is useful for tree species mapping and for studies of forest mensuration variables. PMID:25902148
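Maximum likelihood (ML) classification in remote sensing usually models each class as a multivariate Gaussian in band space and assigns each pixel to the class with the highest likelihood. The sketch below assumes equal class priors and adds a small ridge to each covariance for numerical stability. It is the textbook form, not necessarily the configuration used in the study.

```python
import numpy as np


class GaussianMLClassifier:
    """Per-class multivariate Gaussian maximum likelihood classifier
    (equal priors assumed)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            # Small ridge keeps the covariance invertible for tiny training sets.
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            self.params_.append((mean, cov))
        return self

    def log_likelihood(self, X):
        """Class log-likelihoods up to a constant shared by all classes."""
        X = np.asarray(X, dtype=float)
        scores = []
        for mean, cov in self.params_:
            diff = X - mean
            _, logdet = np.linalg.slogdet(cov)
            maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
            scores.append(-0.5 * (logdet + maha))
        return np.stack(scores, axis=1)

    def predict(self, X):
        return self.classes_[np.argmax(self.log_likelihood(X), axis=1)]


rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.1, 0.4], 0.02, size=(50, 2)),   # hypothetical class 0 pixels
               rng.normal([0.2, 0.3], 0.02, size=(50, 2))])  # hypothetical class 1 pixels
y = np.array([0] * 50 + [1] * 50)
model = GaussianMLClassifier().fit(X, y)
print(model.predict([[0.1, 0.4], [0.2, 0.3]]))
```

Unlike SAM, which compares only spectral shape, this classifier also uses each class's mean brightness and covariance.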

  1. Mining geriatric assessment data for in-patient fall prediction models and high-risk subgroups

    PubMed Central

    2012-01-01

    Background Hospital in-patient falls constitute a prominent problem in terms of costs and consequences. Geriatric institutions are most often affected, and common screening tools cannot predict in-patient falls consistently. Our objectives are to derive comprehensible fall risk classification models from a large data set of geriatric in-patients' assessment data and to evaluate their predictive performance (aim#1), and to identify high-risk subgroups from the data (aim#2). Methods A data set of n = 5,176 single in-patient episodes covering 1.5 years of admissions to a geriatric hospital was extracted from the hospital's database and matched with fall incident reports (n = 493). A classification tree model was induced using the C4.5 algorithm and a logistic regression model was also fitted, and the predictive performance of both was evaluated. Furthermore, high-risk subgroups were identified from extracted classification rules with a support of more than 100 instances. Results The classification tree model showed an overall classification accuracy of 66%, with a sensitivity of 55.4%, a specificity of 67.1%, and positive and negative predictive values of 15% and 93.5%, respectively. Five high-risk groups were identified, defined by high age, low Barthel index, cognitive impairment, multi-medication and co-morbidity. Conclusions Our results show that a little more than half of the fallers may be identified correctly by our model, but the positive predictive value is too low for practical application. Non-fallers, on the other hand, may be sorted out quite well with the model. The high-risk subgroups and the risk factors identified (age, low ADL score, cognitive impairment, institutionalization, polypharmacy and co-morbidity) reflect domain knowledge and may be used to screen certain subgroups of patients with a high risk of falling. Classification models derived from a large data set using data mining methods can compete with current dedicated fall risk screening tools, yet lack diagnostic precision.
High-risk subgroups may be identified automatically from existing geriatric assessment data, especially when combined with domain knowledge in a hybrid classification model. Further work is necessary to validate our approach in a controlled prospective setting. PMID:22417403
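In the study, high-risk subgroups come from classification rules extracted from the C4.5 tree, kept only if their support exceeds 100 instances. The sketch below shows that filtering step, assuming each rule is a predicate over a patient record. The records and rule conditions are synthetic placeholders, not the study's variables or thresholds.

```python
def rule_stats(records, condition, outcome="fell"):
    """Support (number of records matching the rule) and outcome rate among them."""
    matched = [r for r in records if condition(r)]
    support = len(matched)
    rate = sum(r[outcome] for r in matched) / support if support else 0.0
    return support, rate


def high_risk_subgroups(records, rules, min_support=100):
    """Keep rules whose support exceeds min_support, highest outcome rate first."""
    kept = []
    for name, condition in rules.items():
        support, rate = rule_stats(records, condition)
        if support > min_support:
            kept.append((name, support, rate))
    return sorted(kept, key=lambda item: item[2], reverse=True)


# Synthetic records and hypothetical rules, for illustration only.
records = [{"age": 70 + i % 30, "barthel": (i * 7) % 100, "fell": i % 5 == 0}
           for i in range(1000)]
rules = {
    "age>=85": lambda r: r["age"] >= 85,
    "age>=85 and barthel<40": lambda r: r["age"] >= 85 and r["barthel"] < 40,
    "age==99": lambda r: r["age"] == 99,   # only 33 matches: filtered out
}
for name, support, rate in high_risk_subgroups(records, rules):
    print(f"{name}: support={support}, fall rate={rate:.2f}")
```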

  2. Mining geriatric assessment data for in-patient fall prediction models and high-risk subgroups.

    PubMed

    Marschollek, Michael; Gövercin, Mehmet; Rust, Stefan; Gietzelt, Matthias; Schulze, Mareike; Wolf, Klaus-Hendrik; Steinhagen-Thiessen, Elisabeth

    2012-03-14

    Hospital in-patient falls constitute a prominent problem in terms of costs and consequences. Geriatric institutions are most often affected, and common screening tools cannot predict in-patient falls consistently. Our objectives are to derive comprehensible fall risk classification models from a large data set of geriatric in-patients' assessment data and to evaluate their predictive performance (aim#1), and to identify high-risk subgroups from the data (aim#2). A data set of n = 5,176 single in-patient episodes covering 1.5 years of admissions to a geriatric hospital was extracted from the hospital's database and matched with fall incident reports (n = 493). A classification tree model was induced using the C4.5 algorithm and a logistic regression model was also fitted, and the predictive performance of both was evaluated. Furthermore, high-risk subgroups were identified from extracted classification rules with a support of more than 100 instances. The classification tree model showed an overall classification accuracy of 66%, with a sensitivity of 55.4%, a specificity of 67.1%, and positive and negative predictive values of 15% and 93.5%, respectively. Five high-risk groups were identified, defined by high age, low Barthel index, cognitive impairment, multi-medication and co-morbidity. Our results show that a little more than half of the fallers may be identified correctly by our model, but the positive predictive value is too low for practical application. Non-fallers, on the other hand, may be sorted out quite well with the model. The high-risk subgroups and the risk factors identified (age, low ADL score, cognitive impairment, institutionalization, polypharmacy and co-morbidity) reflect domain knowledge and may be used to screen certain subgroups of patients with a high risk of falling. Classification models derived from a large data set using data mining methods can compete with current dedicated fall risk screening tools, yet lack diagnostic precision.
High-risk subgroups may be identified automatically from existing geriatric assessment data, especially when combined with domain knowledge in a hybrid classification model. Further work is necessary to validate our approach in a controlled prospective setting.

  3. Inattention in primary school is not good for your future school achievement—A pattern classification study

    PubMed Central

    Bøe, Tormod; Lundervold, Arvid

    2017-01-01

    Inattention in childhood is associated with academic problems later in life. The contribution of specific aspects of inattentive behaviour is, however, less well known. We investigated the feature importance of primary school teachers’ reports on nine aspects of inattentive behaviour, gender and age in predicting future academic achievement. Primary school teachers of n = 2491 children (7–9 years) rated nine items reflecting different aspects of inattentive behaviour in 2002. A mean academic achievement score from the previous semester in high school (2012) was available for each youth from an official school register. All scores were at a categorical level. Feature importance was assessed using multinomial logistic regression, classification and regression tree analysis, and a random forest algorithm. Finally, a comprehensive pattern classification procedure using k-fold cross-validation was implemented. Overall, inattention was rated as more severe in boys, who also obtained lower academic achievement scores in high school than girls. Problems related to sustained attention and distractibility, together with age and gender, were identified as the most important features for predicting future achievement scores. These four features were used as input to a collection of classifiers, evaluated with k-fold cross-validation, to predict academic achievement level, and the resulting classification accuracy, precision and recall were clearly better than chance levels. Primary school teachers’ reports of problems related to sustained attention and distractibility were identified as the two most important features of inattentive behaviour predicting academic achievement in high school. Identification and follow-up procedures for primary school children showing these characteristics should be prioritised to prevent future academic failure. PMID:29182663
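k-fold cross-validation splits the data into k folds, trains on k - 1 of them, tests on the held-out fold and averages the k scores. A dependency-free sketch follows. `fit` and `predict` are hypothetical callables. The majority-class baseline in the usage lines stands in for the "chance level" the abstract compares against; it is not one of the study's classifiers.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 once, deal them into k folds of near-equal size,
    and yield (train_indices, test_indices) for each fold in turn."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_val_accuracy(X, y, fit, predict, k=5, seed=0):
    """Mean test-fold accuracy over k folds."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        scores.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return sum(scores) / len(scores)


def fit_majority(X, y):
    """Chance-level baseline: remember the most frequent training label."""
    return Counter(y).most_common(1)[0][0]


def predict_majority(model, X):
    return [model] * len(X)


y = [0] * 80 + [1] * 20           # synthetic, imbalanced outcome
X = [[i] for i in range(100)]     # placeholder features (ignored by the baseline)
print(cross_val_accuracy(X, y, fit_majority, predict_majority))
```

With imbalanced outcome classes, the baseline already reaches the majority-class share, so "better than chance" should be judged against that figure rather than against 1 / (number of classes).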

  4. Inattention in primary school is not good for your future school achievement-A pattern classification study.

    PubMed

    Lundervold, Astri J; Bøe, Tormod; Lundervold, Arvid

    2017-01-01

    Inattention in childhood is associated with academic problems later in life. The contribution of specific aspects of inattentive behaviour is, however, less well known. We investigated the feature importance of primary school teachers' reports on nine aspects of inattentive behaviour, gender and age in predicting future academic achievement. Primary school teachers of n = 2491 children (7-9 years) rated nine items reflecting different aspects of inattentive behaviour in 2002. A mean academic achievement score from the previous semester in high school (2012) was available for each youth from an official school register. All scores were at a categorical level. Feature importance was assessed using multinomial logistic regression, classification and regression tree analysis, and a random forest algorithm. Finally, a comprehensive pattern classification procedure using k-fold cross-validation was implemented. Overall, inattention was rated as more severe in boys, who also obtained lower academic achievement scores in high school than girls. Problems related to sustained attention and distractibility, together with age and gender, were identified as the most important features for predicting future achievement scores. These four features were used as input to a collection of classifiers, evaluated with k-fold cross-validation, to predict academic achievement level, and the resulting classification accuracy, precision and recall were clearly better than chance levels. Primary school teachers' reports of problems related to sustained attention and distractibility were identified as the two most important features of inattentive behaviour predicting academic achievement in high school. Identification and follow-up procedures for primary school children showing these characteristics should be prioritised to prevent future academic failure.

  5. Audio stream classification for multimedia database search

    NASA Astrophysics Data System (ADS)

    Artese, M.; Bianco, S.; Gagliardi, I.; Gasparini, F.

    2013-03-01

    Search and retrieval in huge archives of multimedia data is a challenging task. A classification step is often used to reduce the number of entries on which the subsequent search is performed. In particular, when new entries are continuously added to the database, a fast classification based on simple threshold evaluation is desirable. In this work we present a CART-based (Classification And Regression Tree [1]) classification framework for audio streams belonging to multimedia databases. The database considered is the Archive of Ethnography and Social History (AESS) [2]. It is mainly composed of popular songs and other audio records describing popular traditions handed down from generation to generation, such as traditional fairs and customs. This database has three peculiarities: it is continuously updated, its audio recordings are acquired in unconstrained environments, and it is difficult for non-expert human users to create ground-truth labels for it. In our experiments, half of the available audio files were randomly extracted and used as the training set, and the remaining files were used as the test set. The classifier was trained to distinguish among three classes: speech, music, and song. Domain experts had previously labeled all the audio files in the dataset manually into these three classes.
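Once a CART model is trained, classifying a new archive entry only requires walking from the root to a leaf, comparing one feature with one threshold at each node. This is why the approach suits a continuously updated database. The sketch below shows that evaluation. The feature names, thresholds and tree shape are hypothetical placeholders, not the tree learned in the paper.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Node:
    """Internal node: go left if features[feature] <= threshold, else right."""
    feature: str
    threshold: float
    left: "Tree"
    right: "Tree"


Tree = Union[Node, str]  # a leaf is simply the class label


def predict(tree: Tree, features: dict) -> str:
    """Walk from the root to a leaf, doing one threshold comparison per node."""
    while isinstance(tree, Node):
        tree = tree.left if features[tree.feature] <= tree.threshold else tree.right
    return tree


# Hypothetical tree: feature names and thresholds are placeholders.
audio_tree = Node("zero_crossing_rate", 0.10,
                  left=Node("harmonicity", 0.50, left="music", right="song"),
                  right="speech")
print(predict(audio_tree, {"zero_crossing_rate": 0.05, "harmonicity": 0.7}))  # → song
```

The cost of classifying one entry is bounded by the tree depth, regardless of how large the training set was.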

  6. Diagnostic classification scheme in Iranian breast cancer patients using a decision tree.

    PubMed

    Malehi, Amal Saki

    2014-01-01

    The objective of this study was to determine a diagnostic classification scheme using a decision tree based model. The study was conducted as a retrospective case-control study at Imam Khomeini Hospital in Tehran from 2001 to 2009. Demographic and clinical-pathological data were uniformly collected from 624 females: 312 referred with a positive diagnosis of breast cancer (cases) and 312 healthy women (controls). A decision tree was used to develop a diagnostic classification scheme with CART 6.0 software. The area under the ROC curve (AUC) was used to measure the overall diagnostic classification performance of the decision tree. Five variables were identified as main risk factors for breast cancer, and six subgroups were identified as high risk. The results indicated that increasing age, low age at menarche, single or divorced status, an irregular menarche pattern and a family history of breast cancer are important diagnostic factors in Iranian breast cancer patients. The sensitivity and specificity of the analysis were 66% and 86.9%, respectively. The high AUC (0.82) also showed excellent classification and diagnostic performance of the model. A decision tree based model appears to be suitable for identifying risk factors and high- or low-risk subgroups. It can also assist clinicians in making decisions, because it can identify underlying prognostic relationships and the model is very easy to understand.
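The AUC equals the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The small sketch below computes it by direct pairwise comparison (the Mann-Whitney formulation). Tie handling matters for decision trees, because every patient in the same leaf gets the same score. The scores in the usage lines are hypothetical.

```python
def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties = 1/2),
    i.e. the Mann-Whitney U statistic divided by n_cases * n_controls."""
    wins = 0.0
    for p in case_scores:
        for q in control_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical leaf risk scores: many patients share a leaf, hence many ties.
cases = [0.8, 0.8, 0.6, 0.3]
controls = [0.6, 0.3, 0.3, 0.1, 0.1]
print(auc(cases, controls))  # → 0.875
```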

  7. A novel transferable individual tree crown delineation model based on Fishing Net Dragging and boundary classification

    NASA Astrophysics Data System (ADS)

    Liu, Tao; Im, Jungho; Quackenbush, Lindi J.

    2015-12-01

    This study provides a novel two-step approach to individual tree crown delineation (ITCD) in dense natural forests using airborne Light Detection and Ranging (LiDAR) data: crown boundary refinement based on a proposed Fishing Net Dragging (FiND) method, and segment merging based on boundary classification. FiND starts with approximate tree crown boundaries derived using a traditional watershed method with Gaussian filtering and refines these boundaries using an algorithm that mimics how a fisherman drags a fishing net. Random forest machine learning is then used to classify boundary segments into two classes: boundaries between trees and boundaries between branches that belong to a single tree. Three groups of LiDAR-derived features were used in the classification: two from the pseudo waveform generated along with crown boundaries and one from a canopy height model (CHM). The proposed ITCD approach was tested using LiDAR data collected over a mountainous region in the Adirondack Park, NY, USA. Overall accuracy of boundary classification was 82.4%. Features derived from the CHM were generally more important in the classification than the features extracted from the pseudo waveform. A comprehensive accuracy assessment scheme for ITCD that considers both the area of crown overlap and crown centroids was also introduced. Accuracy assessment using this new scheme shows that the proposed ITCD achieved overall accuracies of 74% and 78% for deciduous and mixed forest, respectively.

  8. Case-mix groups for VA hospital-based home care.

    PubMed

    Smith, M E; Baker, C R; Branch, L G; Walls, R C; Grimes, R M; Karklins, J M; Kashner, M; Burrage, R; Parks, A; Rogers, P

    1992-01-01

    The purpose of this study is to group hospital-based home care (HBHC) patients homogeneously by their characteristics with respect to cost of care, in order to develop alternative case-mix methods for management and reimbursement (allocation) purposes. Six Veterans Affairs (VA) HBHC programs in Fiscal Year (FY) 1986 that maximized patient, program, and regional variation were selected, all of which agreed to participate. All HBHC patients active in each program on October 1, 1987, in addition to all new admissions through September 30, 1988 (FY88), comprised the sample of 874 unique patients. Statistical methods included classification and regression trees (CART software: Statistical Software; Lafayette, CA), analysis of variance, and multiple linear regression. The resulting algorithm is a three-factor model that explains 20% of the cost variance (R² = 20%, with a cross-validation R² of 12%). Similar classifications explained less of the cost variance and are therefore less adequate for VA home care resource allocation. These include the RUG-II, which is used for VA nursing home and intermediate care; the VA outpatient resource allocation model; and the RUG-HHC, which is used in some states to reimburse home health care in the private sector.
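R² is the share of cost variance explained, 1 - SS_res/SS_tot. The cross-validation R² is the same quantity computed on held-out predictions, which is why it is lower (12% versus 20%). The sketch assumes a model that predicts each patient's cost as the mean cost of their case-mix group, a simplification of a CART leaf. The group labels and costs are synthetic.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def group_mean_predictions(groups, costs, train_idx, test_idx):
    """Predict each test case's cost as the mean training cost of its group,
    falling back to the overall training mean for groups unseen in training."""
    sums, counts = {}, {}
    for i in train_idx:
        sums[groups[i]] = sums.get(groups[i], 0.0) + costs[i]
        counts[groups[i]] = counts.get(groups[i], 0) + 1
    overall = sum(costs[i] for i in train_idx) / len(train_idx)
    return [sums[groups[i]] / counts[groups[i]] if groups[i] in counts else overall
            for i in test_idx]


def cross_validated_r2(groups, costs, k=5):
    """R^2 of out-of-fold predictions; fold f holds the cases with index % k == f."""
    n = len(costs)
    preds = [0.0] * n
    for f in range(k):
        test = [i for i in range(n) if i % k == f]
        train = [i for i in range(n) if i % k != f]
        for i, p in zip(test, group_mean_predictions(groups, costs, train, test)):
            preds[i] = p
    return r_squared(costs, preds)


# Synthetic example: three hypothetical case-mix groups with different mean costs.
groups = [i % 3 for i in range(30)]
costs = [100.0 * g + (i * 7) % 11 for i, g in enumerate(groups)]
everyone = list(range(len(costs)))
print(r_squared(costs, group_mean_predictions(groups, costs, everyone, everyone)))
print(cross_validated_r2(groups, costs))
```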

  9. Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival.

    PubMed

    Ishwaran, Hemant; Lu, Min

    2018-06-04

    Random forests are a popular nonparametric tree ensemble procedure with broad applications to data analysis. While its widespread popularity stems from its prediction performance, an equally important feature is that it provides a fully nonparametric measure of variable importance (VIMP). A current limitation of VIMP, however, is that no systematic method exists for estimating its variance. As a solution, we propose a subsampling approach that can be used to estimate the variance of VIMP and for constructing confidence intervals. The method is general enough that it can be applied to many useful settings, including regression, classification, and survival problems. Using extensive simulations, we demonstrate the effectiveness of the subsampling estimator and in particular find that the delete-d jackknife variance estimator, a close cousin, is especially effective under low subsampling rates due to its bias correction properties. These 2 estimators are highly competitive when compared with the .164 bootstrap estimator, a modified bootstrap procedure designed to deal with ties in out-of-sample data. Most importantly, subsampling is computationally fast, thus making it especially attractive for big data settings. Copyright © 2018 John Wiley & Sons, Ltd.
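Subsampling recomputes a statistic on many size-b subsets drawn without replacement and rescales the spread of those values to the full sample size n. The sketch uses the b/(n - b) scaling of the delete-d jackknife (d = n - b) and returns two variants: one centred on the mean of the subsample statistics, and one centred on the full-sample statistic. A plain sample mean stands in for VIMP, and the paper's exact estimators may differ in detail.

```python
import itertools
import numpy as np


def subsample_variance(x, statistic, subsets):
    """Estimate Var(statistic(x)) from the statistic recomputed on subsets of
    size b drawn without replacement (every subset must have the same size b < n).

    Returns (subsampling, delete_d_jackknife), both scaled by b / (n - b):
      subsampling:        spread of the subsample values around their own mean
      delete_d_jackknife: spread around the full-sample value (adds a bias term)
    """
    x = np.asarray(x, dtype=float)
    n, b = len(x), len(subsets[0])
    theta_full = statistic(x)
    theta = np.array([statistic(x[list(s)]) for s in subsets])
    scale = b / (n - b)
    subsampling = scale * np.mean((theta - theta.mean()) ** 2)
    jackknife = scale * np.mean((theta - theta_full) ** 2)
    return float(subsampling), float(jackknife)


# For small n, every size-b subset can be enumerated.
x = [2.0, 4.0, 7.0, 1.0, 9.0, 3.0]
all_subsets = list(itertools.combinations(range(len(x)), 3))
print(subsample_variance(x, np.mean, all_subsets))

# For larger n, a random sample of subsets is used instead.
rng = np.random.default_rng(1)
data = rng.normal(size=200)
random_subsets = [rng.choice(200, size=50, replace=False) for _ in range(500)]
print(subsample_variance(data, np.mean, random_subsets))  # both near 1/200
```

For the sample mean with all subsets enumerated, both estimates reduce exactly to the familiar s²/n. This makes a convenient sanity check before the same machinery is applied to a statistic such as VIMP, which has no closed-form variance.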

  10. An information-based network approach for protein classification

    PubMed Central

    Wan, Xiaogeng; Zhao, Xin; Yau, Stephen S. T.

    2017-01-01

    Protein classification is one of the critical problems in bioinformatics. Early studies used geometric distances and phylogenetic trees to classify proteins. These methods use binary trees to represent protein classification. In this paper, we propose a new protein classification method in which theories of information and networks are used to classify the multivariate relationships of proteins. In this study, the protein universe is modeled as an undirected network, where proteins are classified according to their connections. Our method is unsupervised, multivariate, and alignment-free. It can be applied to the classification of both protein sequences and structures. Nine examples are used to demonstrate the efficiency of our new method. PMID:28350835

  11. Tree species classification in subtropical forests using small-footprint full-waveform LiDAR data

    NASA Astrophysics Data System (ADS)

    Cao, Lin; Coops, Nicholas C.; Innes, John L.; Dai, Jinsong; Ruan, Honghua; She, Guanghui

    2016-07-01

    The accurate classification of tree species is critical for the management of forest ecosystems, particularly subtropical forests, which are highly diverse and complex ecosystems. While airborne Light Detection and Ranging (LiDAR) technology offers significant potential to estimate forest structural attributes, the capacity of this new tool to classify species is less well known. In this research, full-waveform metrics were extracted by a voxel-based composite waveform approach and examined with a Random Forests classifier to discriminate six subtropical tree species (i.e., Masson pine (Pinus massoniana Lamb.), Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.), Slash pine (Pinus elliottii Engelm.), Sawtooth oak (Quercus acutissima Carruth.) and Chinese holly (Ilex chinensis Sims.)) at three levels of discrimination. As part of the analysis, the optimal voxel size for modelling the composite waveforms was investigated, the most important predictor metrics for species classification were assessed, and the effect of scan angle on species discrimination was examined. Results demonstrate that all tree species were classified with relatively high accuracy (68.6% for six classes, 75.8% for four main species and 86.2% for conifers and broadleaved trees). Full-waveform metrics (based on height of median energy, waveform distance and number of waveform peaks) demonstrated high classification importance and were stable among various voxel sizes. The results also suggest that the voxel-based approach can alleviate some of the issues associated with large scan angles. In summary, the results indicate that full-waveform LiDAR data have significant potential for tree species classification in subtropical forests.
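    Two of the important metrics named above can be computed directly from a single composite waveform. The sketch below assumes the waveform is given as an amplitude per height bin, ordered from the ground upward, and treats amplitude as energy. Published definitions differ: some use squared amplitude, interpolate between bins, or measure from the first return. The function names and conventions here are therefore illustrative assumptions, not the authors' exact formulas.

```python
def height_of_median_energy(heights, amplitudes):
    """Height at which cumulative waveform energy first reaches half the total.

    heights: bin heights above ground (m), ascending; amplitudes: return per bin.
    """
    total = sum(amplitudes)
    if total <= 0:
        raise ValueError("waveform has no energy")
    cumulative = 0.0
    for h, a in zip(heights, amplitudes):
        cumulative += a
        if cumulative >= total / 2:
            return h
    return heights[-1]


def count_peaks(amplitudes, threshold=0.0):
    """Number of local maxima above threshold, counting peaks at either end."""
    padded = [float("-inf")] + list(amplitudes) + [float("-inf")]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > threshold
        and padded[i] > padded[i - 1]
        and padded[i] >= padded[i + 1]
    )


# Hypothetical voxel waveform: a ground return plus a canopy return.
heights = [0, 5, 10, 15, 20, 25]
amplitudes = [4, 1, 0, 2, 3, 0]
print(height_of_median_energy(heights, amplitudes), count_peaks(amplitudes))  # 5 2
```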

  12. An approach for combining airborne LiDAR and high-resolution aerial color imagery using Gaussian processes

    NASA Astrophysics Data System (ADS)

    Liu, Yansong; Monteiro, Sildomar T.; Saber, Eli

    2015-10-01

    Changes in vegetation cover, building construction, road networks and traffic conditions caused by urban expansion affect the human habitat as well as the natural environment in rapidly developing cities. It is crucial to assess these changes and respond accordingly by identifying man-made and natural structures with accurate classification algorithms. With the increasing use of multi-sensor remote sensing systems, researchers are able to obtain a more complete description of the scene of interest. By utilizing multi-sensor data, the accuracy of classification algorithms can be improved. In this paper, we propose a method for combining 3D LiDAR point clouds and high-resolution color images to classify urban areas using Gaussian processes (GP). GP classification is a powerful non-parametric classification method that yields probabilistic classification results, so its predictions account for the uncertainty of real-world data. We aim to identify man-made and natural objects in urban areas, including buildings, roads, trees, grass, water and vehicles. LiDAR features are derived from the 3D point clouds, and spatial and color features are extracted from the RGB images. For classification, we use the Laplace approximation for GP binary classification on the new combined feature space. Multiclass classification is implemented using a one-vs-all binary classification strategy. Results from support vector machine (SVM) and logistic regression (LR) classifiers are also provided for comparison. Our experiments show a clear improvement in classification results when the two sensors are combined rather than used separately. We also found that the GP approach can handle uncertainty in the classification result without compromising accuracy compared with SVM, which is considered the state-of-the-art classification method.
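    The one-vs-all strategy mentioned above turns any binary classifier into a multiclass one. One model is trained per class (that class against all the others), and a sample is assigned to the class whose model is most confident. The sketch below shows only that wrapping logic. For brevity it assumes a trivial nearest-centroid score in place of the GP classifier, so the scorer, feature values and class names are hypothetical. With a GP, the score would be the predicted probability of the positive class.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length feature tuples."""
    return [sum(col) / len(points) for col in zip(*points)]


def fit_binary(X, y):
    """Stand-in binary scorer: larger when x is nearer the positive-class centroid."""
    pos = centroid([x for x, t in zip(X, y) if t])
    neg = centroid([x for x, t in zip(X, y) if not t])
    return lambda x: math.dist(x, neg) - math.dist(x, pos)


def fit_one_vs_all(X, labels):
    """Train one binary scorer per class: that class versus all others."""
    return {c: fit_binary(X, [lab == c for lab in labels]) for c in sorted(set(labels))}


def predict(models, x):
    """Assign the class whose binary scorer gives the highest score."""
    return max(models, key=lambda c: models[c](x))


# Hypothetical two-feature samples (e.g. a height feature and a colour feature).
X = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
labels = ["road", "road", "building", "building", "tree", "tree"]
models = fit_one_vs_all(X, labels)
print(predict(models, (0.2, 0.5)), predict(models, (9.8, 0.7)))  # road tree
```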

  13. Decision-Tree, Rule-Based, and Random Forest Classification of High-Resolution Multispectral Imagery for Wetland Mapping and Inventory

    EPA Science Inventory

    Efforts are increasingly being made to classify the world’s wetland resources, an important ecosystem and habitat that is diminishing in abundance. There are multiple remote sensing classification methods, including a suite of nonparametric classifiers such as decision-tree...

  14. Mapping Urban Tree Canopy Cover Using Fused Airborne LIDAR and Satellite Imagery Data

    NASA Astrophysics Data System (ADS)

    Parmehr, Ebadat G.; Amati, Marco; Fraser, Clive S.

    2016-06-01

    Urban green spaces, particularly urban trees, play a key role in enhancing the liveability of cities. The availability of accurate and up-to-date maps of tree canopy cover is important for sustainable development of urban green spaces. LiDAR point clouds are widely used for the mapping of buildings and trees, and several LiDAR point cloud classification techniques have been proposed for automatic mapping. However, the effectiveness of point cloud classification techniques for automated tree extraction from LiDAR data can be impacted to the point of failure by the complexity of tree canopy shapes in urban areas. Multispectral imagery, which provides complementary information to LiDAR data, can improve point cloud classification quality. This paper proposes a reliable method for the extraction of tree canopy cover from fused LiDAR point cloud and multispectral satellite imagery data. The proposed method initially associates each LiDAR point with spectral information from the co-registered satellite imagery data. It calculates the normalised difference vegetation index (NDVI) value for each LiDAR point and corrects tree points which have been misclassified as buildings. Then, region growing of tree points, taking the NDVI value into account, is applied. Finally, the LiDAR points classified as tree points are utilised to generate a canopy cover map. The performance of the proposed tree canopy cover mapping method is experimentally evaluated on a data set of airborne LiDAR and WorldView 2 imagery covering a suburb in Melbourne, Australia.
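    The NDVI step described above uses the standard index NDVI = (NIR − Red) / (NIR + Red), computed for each LiDAR point from the co-registered imagery bands. The sketch below computes the index and relabels high-NDVI points that were classified as buildings. The point layout (dictionaries with `label`, `nir` and `red` keys) and the 0.3 threshold are hypothetical, since the paper's threshold and data structures are not given here. The region-growing step is omitted.

```python
def ndvi(nir, red):
    """Normalised difference vegetation index, in [-1, 1]; 0.0 if both bands are 0."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def correct_building_points(points, threshold=0.3):
    """Relabel 'building' points whose NDVI exceeds threshold as 'tree'.

    points: list of dicts with 'label', 'nir' and 'red' keys (hypothetical layout).
    Returns new dicts; the input list is left unchanged.
    """
    corrected = []
    for p in points:
        label = p["label"]
        if label == "building" and ndvi(p["nir"], p["red"]) > threshold:
            label = "tree"
        corrected.append({**p, "label": label})
    return corrected


points = [
    {"label": "building", "nir": 0.50, "red": 0.10},  # vegetated: NDVI about 0.67
    {"label": "building", "nir": 0.20, "red": 0.18},  # roof: NDVI about 0.05
    {"label": "tree", "nir": 0.45, "red": 0.05},
]
print([p["label"] for p in correct_building_points(points)])  # ['tree', 'building', 'tree']
```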

  15. Applying data mining techniques to explore factors contributing to occupational injuries in Taiwan's construction industry.

    PubMed

    Cheng, Ching-Wu; Leu, Sou-Sen; Cheng, Ying-Mei; Wu, Tsung-Chih; Lin, Chen-Chung

    2012-09-01

    Construction accident research involves the systematic sorting, classification, and encoding of comprehensive databases of injuries and fatalities. The present study explores the causes and distribution of occupational accidents in the Taiwan construction industry by analyzing such a database using the data mining method known as classification and regression tree (CART). Utilizing a database of 1542 accident cases during the period 2000-2009, the study seeks to establish potential cause-and-effect relationships regarding serious occupational accidents in the industry. The results of this study show that the occurrence rules for falls and collapses in both public and private project construction industries serve as key factors to predict the occurrence of occupational injuries. The results of the study provide a framework for improving the safety practices and training programs that are essential to protecting construction workers from occasional or unexpected accidents. Copyright © 2011 Elsevier Ltd. All rights reserved.
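    For a categorical outcome such as accident type, CART chooses splits that most reduce node impurity. Impurity is commonly measured with the Gini index 1 − Σ p_k², where p_k is the share of class k in the node. The sketch below evaluates one-value-versus-rest splits of a single categorical attribute. Full CART also searches groupings of several categories and grows, then prunes, the whole tree. The records, attribute and class names are hypothetical and do not come from the Taiwanese accident database.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_categorical_split(records, attribute, target):
    """Best split of the form attribute == value versus attribute != value.

    Returns (value, weighted child impurity); value is None if no split helps.
    """
    n = len(records)
    best = (None, gini([r[target] for r in records]))
    for value in sorted({r[attribute] for r in records}):
        left = [r[target] for r in records if r[attribute] == value]
        right = [r[target] for r in records if r[attribute] != value]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (value, impurity)
    return best


records = [
    {"activity": "roofing", "type": "fall"},
    {"activity": "roofing", "type": "fall"},
    {"activity": "scaffolding", "type": "fall"},
    {"activity": "scaffolding", "type": "fall"},
    {"activity": "excavation", "type": "collapse"},
    {"activity": "excavation", "type": "collapse"},
]
print(best_categorical_split(records, "activity", "type"))  # ('excavation', 0.0)
```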

  16. Comprehensive Chemical Fingerprinting of High-Quality Cocoa at Early Stages of Processing: Effectiveness of Combined Untargeted and Targeted Approaches for Classification and Discrimination.

    PubMed

    Magagna, Federico; Guglielmetti, Alessandro; Liberto, Erica; Reichenbach, Stephen E; Allegrucci, Elena; Gobino, Guido; Bicchi, Carlo; Cordero, Chiara

    2017-08-02

    This study investigates the chemical information in the volatile fractions of high-quality cocoa (Theobroma cacao L. Malvaceae) from different origins (Mexico, Ecuador, Venezuela, Colombia, Java, Trinidad, and São Tomé) produced for fine chocolate. It explores how the entire pattern of volatiles evolves with cocoa processing (raw, roasted, steamed, and ground beans). Advanced chemical fingerprinting (i.e., combined untargeted and targeted fingerprinting) with comprehensive two-dimensional gas chromatography coupled with mass spectrometry allows advanced pattern recognition for classification, discrimination, and sensory-quality characterization. The entire data set is analyzed for 595 reliable two-dimensional peak regions, including 130 known analytes and 13 potent odorants. Multivariate analysis with unsupervised exploration (principal component analysis) and simple supervised discrimination methods (Fisher ratios and linear regression trees) reveals informative patterns of similarities and differences and identifies characteristic compounds related to sample origin and manufacturing step.
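    A Fisher ratio ranks each chromatographic peak by how well it separates sample classes. It is the variance of the class means divided by the average within-class variance. Exact definitions vary between software packages. The sketch below assumes a class-size-weighted form, and the peak responses are hypothetical.

```python
from statistics import mean, pvariance


def fisher_ratio(groups):
    """Between-class variance of class means over pooled within-class variance.

    groups: one list of peak responses per class (e.g. per origin).
    Both variances are weighted by class size.
    """
    n = sum(len(g) for g in groups)
    grand = mean(v for g in groups for v in g)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups) / n
    within = sum(len(g) * pvariance(g) for g in groups) / n
    return between / within


peak_a = [[1.0, 2.0, 3.0], [7.0, 8.0, 9.0]]  # differs by origin
peak_b = [[4.0, 5.0, 6.0], [5.0, 6.0, 4.0]]  # same distribution in both classes
print(fisher_ratio(peak_a), fisher_ratio(peak_b))
```

    A large ratio (as for `peak_a`) marks a compound that discriminates between classes. A ratio near zero (as for `peak_b`) marks one that does not.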

  17. A preliminary case-mix classification system for Medicare home health clients.

    PubMed

    Branch, L G; Goldberg, H B

    1993-04-01

    In this study, a hierarchical case-mix model was developed for grouping Medicare home health beneficiaries homogeneously, based on the allowed charges for their home care. Based on information from a two-page form from 2,830 clients from ten states and using the classification and regression trees method, a four-component model was developed that yielded 11 case-mix groups and explained 22% of the variance for the test sample of 1,929 clients. The four components are rehabilitation, special care, skilled-nurse monitoring, and paralysis; each is categorized as present or absent. The range of mean allowed charges for the 11 groups in the total sample was $473 to $2,562, with a mean of $847. Of the six groups with mean charges above $1,000, none exceeded 5.2% of clients; thus, the high-cost groups are relatively rare.
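    The variance explained by a case-mix system is the share of total variation in charges accounted for by the group means: R² = 1 − SS_within / SS_total. The sketch below computes this quantity. To mirror the train/test evaluation described above, it can optionally apply group means estimated on one sample to another sample. The group names and charges are hypothetical.

```python
from statistics import mean


def variance_explained(groups, group_means=None):
    """Share of variance in charges explained by case-mix group membership.

    groups: dict mapping group name -> list of charges for that group.
    group_means: optional dict of means estimated on a separate (training)
    sample; if omitted, each group's own mean is used.
    """
    if group_means is None:
        group_means = {name: mean(charges) for name, charges in groups.items()}
    values = [v for charges in groups.values() for v in charges]
    grand = mean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_within = sum(
        (v - group_means[name]) ** 2 for name, charges in groups.items() for v in charges
    )
    return 1 - ss_within / ss_total


test_sample = {"rehabilitation": [900, 1100], "no_special_needs": [400, 600]}
print(round(variance_explained(test_sample), 3))  # 0.862
training_means = {"rehabilitation": 1050, "no_special_needs": 450}
print(round(variance_explained(test_sample, training_means), 3))  # 0.828
```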

  18. Non-English speakers attend gastroenterology clinic appointments at higher rates than English speakers in a vulnerable patient population

    PubMed Central

    Sewell, Justin L.; Kushel, Margot B.; Inadomi, John M.; Yee, Hal F.

    2009-01-01

    Goals We sought to identify factors associated with gastroenterology clinic attendance in an urban safety net healthcare system. Background Missed clinic appointments reduce the efficiency and availability of healthcare, but subspecialty clinic attendance among patients with established healthcare access has not been studied. Study We performed an observational study using secondary data from administrative sources to study patients referred to, and scheduled for an appointment in, the adult gastroenterology clinic serving the safety net healthcare system of San Francisco, California. Our dependent variable was whether subjects attended or missed a scheduled appointment. Analysis included multivariable logistic regression and classification tree analysis. 1,833 patients were referred and scheduled for an appointment between 05/2005 and 08/2006. Prisoners were excluded. All patients had a primary care provider. Results 683 patients (37.3%) missed their appointment; 1,150 (62.7%) attended. Language was highly associated with attendance in the logistic regression; non-English speakers were less likely than English speakers to miss an appointment (adjusted odds ratio 0.42 [0.28,0.63] for Spanish, 0.56 [0.38,0.82] for Asian language, p < 0.001). Other factors were also associated with attendance, but classification tree analysis identified language to be the most highly associated variable. Conclusions In an urban safety net healthcare population, among patients with established healthcare access and a scheduled gastroenterology clinic appointment, not speaking English was most strongly associated with higher attendance rates. Patient-related factors associated with not speaking English likely influence subspecialty clinic attendance rates, and these factors may differ from those affecting general healthcare access. PMID:19169147
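    The odds ratios above are adjusted estimates from a multivariable logistic regression. For intuition only, the sketch below computes the simpler crude (unadjusted) odds ratio from a 2×2 table, with a Woolf log-scale 95% confidence interval. The counts are hypothetical, and the result is not comparable with the adjusted values reported.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    2x2 table: a = group 1 missed, b = group 1 attended,
               c = group 2 missed, d = group 2 attended (all counts > 0).
    Returns (odds_ratio, lower, upper); z=1.96 gives an approximate 95% interval.
    """
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se)
    upper = math.exp(math.log(ratio) + z * se)
    return ratio, lower, upper


# Hypothetical counts: non-English speakers (group 1) vs English speakers (group 2).
ratio, lower, upper = odds_ratio(20, 80, 40, 60)
print(round(ratio, 3), round(lower, 2), round(upper, 2))
```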

  19. Predicting Redox Conditions in Groundwater Using Statistical Techniques: Implications for Nitrate Transport in Groundwater and Streams

    NASA Astrophysics Data System (ADS)

    Tesoriero, A. J.; Terziotti, S.

    2014-12-01

    Nitrate trends in streams often do not match expectations based on recent nitrogen source loadings to the land surface. Groundwater discharge with long travel times has been suggested as the likely cause for these observations. The fate of nitrate in groundwater depends to a large extent on the occurrence of denitrification along flow paths. Because denitrification in groundwater is inhibited when dissolved oxygen (DO) concentrations are high, defining the oxic-suboxic interface has been critical in determining pathways for nitrate transport in groundwater and to streams at the local scale. Predicting redox conditions on a regional scale is complicated by the spatial variability of reaction rates. In this study, logistic regression and boosted classification tree analysis were used to predict the probability of oxic water in groundwater in the Chesapeake Bay watershed. The probability of oxic water (DO > 2 mg/L) was predicted by relating DO concentrations in over 3,000 groundwater samples to indicators of residence time and/or electron donor availability. Variables that describe position in the flow system (e.g., depth to top of the open interval), soil drainage and surficial geology were the most important predictors of oxic water. Logistic regression and boosted classification tree analysis correctly predicted the presence or absence of oxic conditions in over 75% of the samples in both training and validation data sets. Predictions of the percentages of oxic wells in deciles of risk were very accurate (r² > 0.9) in both the training and validation data sets. Depth to the bottom of the oxic layer was predicted and is being used to estimate the effect that groundwater denitrification has on stream nitrate concentrations and the time lag between the application of nitrogen at the land surface and its effect on streams.
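    The decile check described above works in three steps. Samples are sorted by predicted probability and split into ten equal-sized groups. Each group's mean predicted probability is then compared with the observed share of oxic samples. An r² near 1 across the deciles indicates good calibration. The sketch below shows that grouping. The function name and the synthetic probabilities are assumptions, not the study's data.

```python
def percent_by_decile(probs, observed):
    """Mean predicted % and observed % positive within each decile of predicted risk.

    probs: predicted probabilities; observed: matching 0/1 outcomes
    (e.g. 1 = oxic, DO > 2 mg/L). Returns (predicted %, observed %) pairs,
    lowest-risk decile first.
    """
    pairs = sorted(zip(probs, observed))
    n = len(pairs)
    result = []
    for k in range(10):
        chunk = pairs[k * n // 10:(k + 1) * n // 10]
        if not chunk:  # fewer than ten samples in total
            continue
        predicted = 100 * sum(p for p, _ in chunk) / len(chunk)
        actual = 100 * sum(o for _, o in chunk) / len(chunk)
        result.append((predicted, actual))
    return result


probs = [i / 100 for i in range(100)]
observed = [1 if i >= 50 else 0 for i in range(100)]
for predicted, actual in percent_by_decile(probs, observed):
    print(f"{predicted:5.1f}% predicted, {actual:5.1f}% observed")
```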

  20. Mining Health App Data to Find More and Less Successful Weight Loss Subgroups

    PubMed Central

    2016-01-01

    Background More than half of all smartphone app downloads involve weight, diet, and exercise. If successful, these lifestyle apps may have far-reaching effects for disease prevention and health cost-savings, but few researchers have analyzed data from these apps. Objective The purposes of this study were to analyze data from a commercial health app (Lose It!) in order to identify successful weight loss subgroups via exploratory analyses and to verify the stability of the results. Methods Cross-sectional, de-identified data from Lose It! were analyzed. This dataset (n=12,427,196) was randomly split into 24 subsamples, and this study used 3 subsamples (combined n=972,687). Classification and regression tree methods were used to explore groupings of weight loss with one subsample, with descriptive analyses to examine other group characteristics. Data mining validation methods were conducted with 2 additional subsamples. Results In subsample 1, 14.96% of users lost 5% or more of their starting body weight. Classification and regression tree analysis identified 3 distinct subgroups: “the occasional users” had the lowest proportion (4.87%) of individuals who successfully lost weight; “the basic users” had 37.61% weight loss success; and “the power users” achieved the highest percentage of weight loss success at 72.70%. Behavioral factors delineated the subgroups, though app-related behavioral characteristics further distinguished them. Results were replicated in further analyses with separate subsamples. Conclusions This study demonstrates that distinct subgroups can be identified in “messy” commercial app data and the identified subgroups can be replicated in independent samples. Behavioral factors and use of custom app features characterized the subgroups. Targeting and tailoring information to particular subgroups could enhance weight loss success. Future studies should replicate data mining analyses to increase methodological rigor. PMID:27301853

  1. Classification of Urban Aerial Data Based on Pixel Labelling with Deep Convolutional Neural Networks and Logistic Regression

    NASA Astrophysics Data System (ADS)

    Yao, W.; Poleswki, P.; Krzystek, P.

    2016-06-01

    The recent success of deep convolutional neural networks (CNN) in a large number of applications can be attributed to large amounts of available training data and increasing computing power. In this paper, a semantic pixel labelling scheme for urban areas using multi-resolution CNN and hand-crafted spatial-spectral features of airborne remotely sensed data is presented. Both CNN and hand-crafted features are applied to image/DSM patches to produce per-pixel class probabilities with an L1-norm regularized logistic regression classifier. Evidence theory is used to infer a degree of belief for pixel labelling from the different sources, smoothing regions by handling the conflicts present in both classifiers while reducing uncertainty. The aerial data used in this study were provided by ISPRS as benchmark datasets for 2D semantic labelling tasks in urban areas and consist of two data sources, LiDAR and a color infrared camera. The test sites are parts of a city in Germany that is assumed to contain typical object classes, including impervious surfaces, trees, buildings, low vegetation, vehicles and clutter. The evaluation is based on pixel-based confusion matrices computed by random sampling. The performance of the strategy with respect to scene characteristics and method combination strategies is analyzed and discussed. The competitive classification accuracy can be explained not only by the nature of the input data sources (e.g., the above-ground height in the nDSM highlights the vertical dimension of houses, trees and even cars, and the near-infrared spectrum indicates vegetation) but also by the decision-level fusion of the CNN's texture-based approach with multichannel spatial-spectral hand-crafted features based on evidence combination theory.
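    The decision-level fusion described above relies on evidence (Dempster–Shafer) theory. Dempster's rule multiplies the masses that two sources assign to each pair of class sets and keeps the products on the sets' intersections. Mass falling on empty intersections is discarded as conflict, and the rest is renormalised. The sketch below implements that rule over frozensets of labels. How the authors convert CNN and logistic-regression probabilities into masses is not described here, so the mass values are hypothetical.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset of labels -> mass) by Dempster's rule.

    Returns (combined masses, conflict K), where K is the mass on empty intersections.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}, conflict


frame = frozenset({"tree", "building", "low_vegetation"})  # mass here = "don't know"
cnn = {frozenset({"tree"}): 0.6, frozenset({"low_vegetation"}): 0.2, frame: 0.2}
handcrafted = {frozenset({"tree"}): 0.5, frozenset({"building"}): 0.3, frame: 0.2}
fused, conflict = dempster_combine(cnn, handcrafted)
print(round(conflict, 2), round(fused[frozenset({"tree"})], 3))  # 0.34 0.788
```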

  2. Phylogenetic classification and the universal tree.

    PubMed

    Doolittle, W F

    1999-06-25

    From comparative analyses of the nucleotide sequences of genes encoding ribosomal RNAs and several proteins, molecular phylogeneticists have constructed a "universal tree of life," taking it as the basis for a "natural" hierarchical classification of all living things. Although confidence in some of the tree's early branches has recently been shaken, new approaches could still resolve many methodological uncertainties. More challenging is evidence that most archaeal and bacterial genomes (and the inferred ancestral eukaryotic nuclear genome) contain genes from multiple sources. If "chimerism" or "lateral gene transfer" cannot be dismissed as trivial in extent or limited to special categories of genes, then no hierarchical universal classification can be taken as natural. Molecular phylogeneticists will have failed to find the "true tree," not because their methods are inadequate or because they have chosen the wrong genes, but because the history of life cannot properly be represented as a tree. However, taxonomies based on molecular sequences will remain indispensable, and understanding of the evolutionary process will ultimately be enriched, not impoverished.

  3. Comparing ecoregional classifications for natural areas management in the Klamath Region, USA

    USGS Publications Warehouse

    Sarr, Daniel A.; Duff, Andrew; Dinger, Eric C.; Shafer, Sarah L.; Wing, Michael; Seavy, Nathaniel E.; Alexander, John D.

    2015-01-01

    We compared three existing ecoregional classification schemes (Bailey, Omernik, and World Wildlife Fund) with two derived schemes (Omernik Revised and Climate Zones) to explore their effectiveness in explaining species distributions and to better understand natural resource geography in the Klamath Region, USA. We analyzed presence/absence data derived from digital distribution maps for trees, amphibians, large mammals, small mammals, migrant birds, and resident birds using three statistical analyses of classification accuracy (Analysis of Similarity, Canonical Analysis of Principal Coordinates, and Classification Strength). The classifications were roughly comparable in classification accuracy, with Omernik Revised showing the best overall performance. Trees showed the strongest fidelity to the classifications, and large mammals showed the weakest fidelity. We discuss the implications for regional biogeography and describe how intermediate resolution ecoregional classifications may be appropriate for use as natural areas management domains.
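    Classification strength, one of the three accuracy measures used above, compares how similar sites are within the same ecoregion with how similar they are across ecoregions. The sketch below assumes one common formulation, W − B. Here W is the class-size-weighted mean of within-class pairwise similarities, and B is the mean similarity over all between-class pairs. The similarity matrix and labels are hypothetical. Classes with a single site contribute no within-class pairs and are skipped, and the study's exact weighting may differ.

```python
def classification_strength(sim, classes):
    """Weighted mean within-class similarity minus mean between-class similarity.

    sim: symmetric matrix (list of lists) of pairwise site similarities;
    classes: class label for each site. Requires at least two classes and
    at least one class with two or more sites.
    """
    n = len(classes)
    within = {}
    between = []
    for i in range(n):
        for j in range(i + 1, n):
            if classes[i] == classes[j]:
                within.setdefault(classes[i], []).append(sim[i][j])
            else:
                between.append(sim[i][j])
    sizes = {c: classes.count(c) for c in within}
    w = sum(sizes[c] * sum(v) / len(v) for c, v in within.items()) / sum(sizes.values())
    b = sum(between) / len(between)
    return w - b


sim = [
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.6],
    [0.1, 0.2, 0.6, 1.0],
]
print(round(classification_strength(sim, ["A", "A", "B", "B"]), 3))  # 0.5
```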

  4. Classification and Compression of Multi-Resolution Vectors: A Tree Structured Vector Quantizer Approach

    DTIC Science & Technology

    2002-01-01

    their expression profile and for classification of cells into tumorous and non-tumorous classes. Then we will present a parallel tree method for... cancerous cells. We will use the same dataset and use tree structured classifiers with multi-resolution analysis for classifying cancerous from non-cancerous ...cells. We have the expressions of 4096 genes from 98 different cell types. Of these 98, 72 are cancerous while 26 are non-cancerous. We are interested

  5. A research of selected textural features for detection of asbestos-cement roofing sheets using orthoimages

    NASA Astrophysics Data System (ADS)

    Książek, Judyta

    2015-10-01

    At present, there is great interest in the development of texture-based image classification methods in many different areas. This study presents the results of research carried out to assess the usefulness of selected textural features for detecting asbestos-cement roofs in orthophotomap classification. Two different orthophotomaps of southern Poland (with ground resolutions of 5 cm and 25 cm) were used. On both orthoimages, representative samples were selected for two classes: asbestos-cement roofing sheets and other roofing materials. The usefulness of texture analysis was estimated using machine learning methods based on decision trees (C5.0 algorithm). For this purpose, various sets of texture parameters were calculated in MaZda software. When building the decision trees, different numbers of texture parameter groups were considered. Cross-validation was performed to obtain the best settings for the decision tree models, and the models with the lowest mean classification error were selected. Classification accuracy was assessed on validation data sets that were not used for training. For the 5 cm ground resolution samples, the lowest mean classification error was 15.6%. For the 25 cm ground resolution, the lowest mean classification error was 20.0%. The results confirm the potential usefulness of texture-parameter image processing for detecting asbestos-cement roofing sheets. To improve accuracy, a further study should analyze additional textural features as well as spectral characteristics.
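    The model-selection step above, choosing settings by cross-validation and keeping the model with the lowest mean classification error, can be sketched generically. The code below is not C5.0. As an assumption, it uses a hypothetical one-feature threshold rule as the model, and the texture values are invented. Any `fit` function that returns a `predict` function could be substituted.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_error(X, y, fit, k=5, seed=0):
    """Mean misclassification rate over k folds; fit(X, y) must return predict(x)."""
    errors = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        wrong = sum(predict(X[i]) != y[i] for i in fold)
        errors.append(wrong / len(fold))
    return sum(errors) / len(errors)


def fit_threshold(X, y):
    """Hypothetical model: cut at the midpoint between the two class means."""
    m1 = sum(x for x, t in zip(X, y) if t == 1) / y.count(1)
    m0 = sum(x for x, t in zip(X, y) if t == 0) / y.count(0)
    cut = (m0 + m1) / 2
    return lambda x: int((x > cut) == (m1 > m0))


# Hypothetical texture feature: 1 = asbestos-cement roof, 0 = other roofing.
texture = [0.10, 0.20, 0.15, 0.30, 0.25, 0.80, 0.90, 0.85, 0.70, 0.95]
is_asbestos = [0] * 5 + [1] * 5
print(cv_error(texture, is_asbestos, fit_threshold))  # 0.0
```

    To compare candidate settings, one would call `cv_error` once per setting and keep the setting with the smallest value.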

  6. Forest tree species classification based on airborne hyper-spectral imagery

    NASA Astrophysics Data System (ADS)

    Dian, Yuanyong; Li, Zengyuan; Pang, Yong

    2013-10-01

    Precise forest classification products are basic data for forest resource surveys, updating forest subplot information, logging, and forest design. However, owing to the diversity of stand structure and the complexity of the forest growth environment, it is difficult to discriminate forest tree species using multispectral images. Airborne hyperspectral imaging captures forest canopy imagery at high spatial and spectral resolution, which makes it well suited to species-level classification. The aim of this paper was to test the effectiveness of combining spatial and spectral features in airborne hyperspectral image classification. CASI hyperspectral image data were acquired over the Liangshui natural reserve area. First, we used the minimum noise fraction (MNF) transform to reduce the dimensionality of the hyperspectral image and highlight variation. Second, we used the grey level co-occurrence matrix (GLCM) to extract texture features of the forest tree canopy from the hyperspectral image. Third, we fused the texture and spectral features of the forest canopy to classify tree species using a support vector machine (SVM) with different kernel functions. The results showed that, with the SVM classifier, combining MNF and texture-based features with a linear kernel function achieved the best overall accuracy, 85.92%. The results also confirmed that combining spatial and spectral information can improve the accuracy of tree species classification.
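    The GLCM texture features mentioned above are statistics of how often pairs of grey levels occur at a fixed pixel offset. The sketch below builds a symmetric, normalised GLCM for one offset and derives the contrast feature from it. The window size, number of grey levels, offsets and feature set used in the study are not stated here. The tiny image and the single horizontal offset are therefore illustrative assumptions.

```python
def glcm(image, dx=1, dy=0, levels=None):
    """Symmetric, normalised grey level co-occurrence matrix for offset (dx, dy).

    image: 2-D list of integer grey levels 0..levels-1. Each pixel pair at the
    given offset is counted in both directions, then counts are divided by the total.
    """
    if levels is None:
        levels = max(max(row) for row in image) + 1
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """GLCM contrast: sum over all cells of p(i, j) * (i - j) ** 2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# Tiny hypothetical 3x3 image with three grey levels; offset = one pixel to the right.
image = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 2],
]
p = glcm(image)
print(round(contrast(p), 3))  # 0.333
```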

  7. Method of Grassland Information Extraction Based on Multi-Level Segmentation and CART Model

    NASA Astrophysics Data System (ADS)

    Qiao, Y.; Chen, T.; He, J.; Wen, Q.; Liu, F.; Wang, Z.

    2018-04-01

    It is difficult to extract grassland accurately with traditional classification methods, such as pixel- or object-based supervised classification. This paper proposes a new method combining multi-level segmentation with a CART (classification and regression tree) model. The multi-level segmentation, which combines multi-resolution segmentation and spectral difference segmentation, avoids the over- and under-segmentation seen with a single segmentation mode. The CART model was built on spectral and texture features mined from the training sample data. Xilinhaote City in the Inner Mongolia Autonomous Region was chosen as the typical study area, and the proposed method was verified using visual interpretation results as approximate ground truth. The method was also compared with nearest neighbor supervised classification. The experimental results showed that the overall classification accuracy and Kappa coefficient of the proposed method were 95% and 0.9, respectively, whereas those of the nearest neighbor supervised classification method were 80% and 0.56. These results suggest that the proposed method is more accurate than nearest neighbor supervised classification. The experiment demonstrated that the proposed method is an effective way to extract grassland information, enhancing the boundaries of grassland classification and avoiding restrictions due to the scale of grassland distribution. The method is also applicable to extracting grassland information in other regions with complicated spatial features, where it can effectively avoid interference from woodland, arable land and water bodies.
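    Overall accuracy and the Kappa coefficient reported above both come from the confusion matrix of classified versus reference samples. Kappa discounts the agreement expected by chance, estimated from the row and column totals. The sketch below computes both measures from a hypothetical matrix.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of samples of reference class i assigned to class j.
    """
    k = len(confusion)
    n = sum(map(sum, confusion))
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return observed, (observed - expected) / (1 - expected)


# Hypothetical two-class matrix: grassland vs. other land cover.
acc, kappa = accuracy_and_kappa([[45, 5], [5, 45]])
print(acc, round(kappa, 3))  # 0.9 0.8
```

    With balanced classes, chance agreement here is 0.5, so 90% accuracy corresponds to a Kappa of only 0.8.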

  8. Mapping of taiga forest units using AIRSAR data and/or optical data, and retrieval of forest parameters

    NASA Technical Reports Server (NTRS)

    Rignot, Eric; Williams, Cynthia; Way, Jobea; Viereck, Leslie

    1993-01-01

    A maximum a posteriori Bayesian classifier for multifrequency polarimetric SAR data is used to perform a supervised classification of forest types in the floodplains of Alaska. The image classes include white spruce, balsam poplar, black spruce, alder, non-forest, and open water. The authors investigate the effect on classification accuracy of changing environmental conditions and of the frequency and polarization of the signal. The highest classification accuracy (86 percent of forest pixels correctly classified, and 91 percent overall) is obtained by combining fully polarimetric L- and C-band data acquired on a date when the forest is just recovering from flooding. The forest map compares favorably with a vegetation map assembled from digitized aerial photos, which took five years to complete and addresses the state of the forest in 1978, ignoring subsequent fires, changes in the course of the river, clear-cutting of trees, and tree growth. HV polarization is the most useful polarization at L- and C-band for classification. C-band VV (ERS-1 mode) and L-band HH (J-ERS-1 mode), alone or combined, yield unsatisfactory classification accuracies. Additional data acquired in the winter season on thawed and frozen days yield classification accuracies 20 percent and 30 percent lower, respectively, due to greater confusion between conifers and deciduous trees. Data acquired at the peak of flooding in May 1991 also yield classification accuracies 10 percent lower because of dominant trunk-ground interactions, which mask finer differences in radar backscatter between tree species. Combining several of these dates does not improve classification accuracy. For comparison, panchromatic optical data acquired by SPOT in the summer season of 1991 are used to classify the same area. The classification accuracy (78 percent for the forest types and 90 percent if open water is included) is lower than that obtained with AIRSAR, although conifers and deciduous trees are better separated owing to the presence of leaves on the deciduous trees. Optical data do not separate black spruce and white spruce as well as SAR data, cannot separate alder from balsam poplar, and are of course limited by the frequent cloud cover in polar regions. Yet combining SPOT and AIRSAR offers better chances of identifying vegetation types independently of ground truth information, using a combination of NDVI indexes from SPOT, biomass numbers from AIRSAR, and a segmentation map from either one.
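    A maximum a posteriori classifier assigns each pixel to the class that maximises the prior probability times the likelihood of the pixel's observed features. The sketch below shows that decision rule with independent Gaussian likelihoods per feature. This is a simplifying assumption: polarimetric SAR classifiers typically model the full complex covariance instead. The class priors, means and variances are hypothetical backscatter values in dB.

```python
import math


def gaussian_log_likelihood(x, mean, variance):
    """Log density of a univariate normal distribution at x."""
    return -0.5 * (math.log(2 * math.pi * variance) + (x - mean) ** 2 / variance)


def map_classify(features, classes):
    """Return the class with the largest log prior plus summed log-likelihoods.

    classes: {name: (prior, [(mean, variance), ...])}, one (mean, variance) pair per
    feature. Features are treated as independent (a simplifying assumption).
    """
    def log_posterior(name):
        prior, params = classes[name]
        return math.log(prior) + sum(
            gaussian_log_likelihood(x, m, v) for x, (m, v) in zip(features, params)
        )

    return max(classes, key=log_posterior)


# Hypothetical per-class statistics for two backscatter channels (dB).
classes = {
    "white_spruce": (0.3, [(-8.0, 1.0), (-14.0, 1.5)]),
    "black_spruce": (0.3, [(-11.0, 1.0), (-17.0, 1.5)]),
    "open_water": (0.1, [(-20.0, 2.0), (-25.0, 2.0)]),
}
print(map_classify([-8.5, -13.0], classes), map_classify([-19.0, -24.0], classes))
```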

  9. Regression trees for predicting mortality in patients with cardiovascular disease: What improvement is achieved by using ensemble-based methods?

    PubMed Central

    Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V

    2012-01-01

    In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
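    Bagging, the simplest of the ensemble methods compared above, fits the same tree learner to many bootstrap resamples of the data and averages their predictions. The sketch below bags depth-1 regression trees (stumps) on a 0/1 outcome, so the averaged prediction can be read as an estimated probability. The stump learner, the number of models and the data are assumptions for illustration. Random forests would additionally sample candidate predictors at each split, and boosting fits trees sequentially to residuals.

```python
import random


def fit_stump(x, y):
    """Depth-1 regression tree: the threshold on x with the smallest squared error."""
    best = None
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right side empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    if best is None:  # every x is identical: fall back to the overall mean
        overall = sum(y) / len(y)
        return lambda v: overall
    _, t, ml, mr = best
    return lambda v: ml if v <= t else mr


def fit_bagged(x, y, n_models=50, seed=0):
    """Bagging: fit one stump per bootstrap resample and average their predictions."""
    rng = random.Random(seed)
    n = len(x)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return lambda v: sum(m(v) for m in models) / len(models)


# Hypothetical risk score (x) and 30-day death indicator (y).
x = [0, 1, 2, 3, 4, 5, 6, 7]
died = [0, 0, 0, 0, 1, 1, 1, 1]
predict = fit_bagged(x, died)
print(round(predict(0.5), 2), round(predict(6.5), 2))
```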

  10. A P2P Botnet detection scheme based on decision tree and adaptive multilayer neural networks.

    PubMed

    Alauthaman, Mohammad; Aslam, Nauman; Zhang, Li; Alasem, Rafe; Hossain, M A

    2018-01-01

    In recent years, botnets have become a popular means of carrying and spreading malicious code on the Internet. This malicious code paves the way for many fraudulent activities, including spam mail, distributed denial-of-service attacks, and click fraud. While many botnets use a centralized communication architecture, peer-to-peer (P2P) botnets can adopt a decentralized architecture that uses an overlay network to exchange command and control data, making their detection even more difficult. This work presents a method of P2P bot detection based on an adaptive multilayer feed-forward neural network in cooperation with decision trees. A classification and regression tree is applied as a feature selection technique to select relevant features. With these features, a multilayer feed-forward neural network model is trained using a resilient back-propagation learning algorithm. A comparison of feature selection based on the decision tree, principal component analysis, and the ReliefF algorithm indicated that the neural network model with decision-tree-based feature selection achieves better identification accuracy along with lower false positive rates. The usefulness of the proposed approach is demonstrated through experiments on real network traffic datasets, in which an average detection rate of 99.08% with a false positive rate of 0.75% was observed.
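The network in this paper is trained with resilient back-propagation (Rprop), which adapts a separate step size for each weight using only the sign of its gradient. The sketch below implements the iRprop- variant on a toy quadratic rather than a neural network. The variant choice and the constants (eta+ = 1.2, eta- = 0.5, step bounds 1e-6 and 50) are the commonly cited defaults, assumed here because the abstract does not state the authors' settings.

```python
def sign(v):
    return (v > 0) - (v < 0)


def irprop_minus(grad, w0, steps=100, delta0=0.1, eta_plus=1.2, eta_minus=0.5,
                 delta_min=1e-6, delta_max=50.0):
    """Minimize a function with iRprop-: each weight keeps its own step size, which grows
    while its gradient keeps the same sign and shrinks when the sign flips.
    `grad(w)` must return a new list of partial derivatives; only their signs are used."""
    w = list(w0)
    delta = [delta0] * len(w)
    prev_g = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        for i in range(len(w)):
            if prev_g[i] * g[i] > 0:    # same direction as last step: accelerate
                delta[i] = min(delta[i] * eta_plus, delta_max)
            elif prev_g[i] * g[i] < 0:  # sign flip: minimum overshot, shrink and skip this step
                delta[i] = max(delta[i] * eta_minus, delta_min)
                g[i] = 0.0
            w[i] -= sign(g[i]) * delta[i]
        prev_g = g
    return w


def quad_grad(w):
    # Gradient of the toy loss f(w) = (w0 - 3)^2 + (w1 + 1)^2.
    return [2 * (w[0] - 3.0), 2 * (w[1] + 1.0)]


w = irprop_minus(quad_grad, [0.0, 0.0])
```

Because only gradient signs are used, the update is insensitive to the gradient's scale, which is one reason Rprop is popular for training feed-forward networks.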

  11. Data Clustering and Evolving Fuzzy Decision Tree for Data Base Classification Problems

    NASA Astrophysics Data System (ADS)

    Chang, Pei-Chann; Fan, Chin-Yuan; Wang, Yen-Wen

    Database classification suffers from two well-known difficulties: the high dimensionality of the data and non-stationary variations within large historical datasets. This paper presents a hybrid classification model that integrates a case-based reasoning technique, a Fuzzy Decision Tree (FDT), and Genetic Algorithms (GA) to construct a decision-making system for data classification in various database applications. The model is based mainly on the idea that the historical database can be transformed into a smaller case base together with a group of fuzzy decision rules. As a result, the model can respond more accurately to the data currently being classified, using inductions from these smaller case-based fuzzy decision trees. Hit rate is used as the performance measure, and the effectiveness of the proposed model is demonstrated by experimental comparison with other approaches on different database classification applications. The average hit rate of the proposed model is the highest among the approaches compared.

  12. A hybrid method for classifying cognitive states from fMRI data.

    PubMed

    Parida, S; Dehuri, S; Cho, S-B; Cacha, L A; Poznanski, R R

    2015-09-01

    Functional magnetic resonance imaging (fMRI) makes it possible to detect brain activity in order to elucidate cognitive states. The complex nature of fMRI data requires an understanding of the analyses applied in order to open possible avenues for developing models of cognitive-state classification and improving brain activity prediction. While many classification models for fMRI data analysis have been developed, in this paper we present a novel hybrid technique that combines the best attributes of genetic algorithms (GAs) and an ensemble decision tree technique and that consistently outperforms all other methods being used for cognitive-state classification. Specifically, this paper illustrates the combined use of a decision-tree ensemble and GAs for feature selection through an extensive simulation study and discusses the classification performance with respect to fMRI data. We show that the proposed method substantially reduces the number of features while achieving a clear edge in classification accuracy over an ensemble of decision trees alone.

  13. Evaluation of Skylab (EREP) data for forest and rangeland surveys. [Georgia, South Dakota, Colorado, and California]

    NASA Technical Reports Server (NTRS)

    Aldrich, R. C. (Principal Investigator); Dana, R. W.; Greentree, W. J.; Roberts, E. H.; Norick, N. X.; Waite, T. H.; Francis, R. E.; Driscoll, R. S.; Weber, F. P.

    1975-01-01

    The author has identified the following significant results. Four widely separated sites (near Augusta, Georgia; Lead, South Dakota; Manitou, Colorado; and Redding, California) were selected as typical sites for forest inventory, forest stress, rangeland inventory, and atmospheric and solar measurements, respectively. Results indicated that Skylab S190B color photography is good for classification of Level 1 forest and nonforest land (90 to 95 percent correct) and could be used as a database for sampling by small- and medium-scale photography using regression techniques. The accuracy for Level 2 forest and nonforest classes, however, varied from fair to poor. Results of plant community classification tests indicate that both visual and microdensitometric techniques can separate deciduous, coniferous, and grassland classes to the region level in the Ecoclass hierarchical classification system. There was no consistency in classifying tree categories at the series level by visual photointerpretation. The relationship between ground measurements and large-scale photo measurements of foliar cover had a correlation coefficient greater than 0.75. Some of the relationships, however, were site dependent.

  14. Heart rate time series characteristics for early detection of infections in critically ill patients.

    PubMed

    Tambuyzer, T; Guiza, F; Boonen, E; Meersseman, P; Vervenne, H; Hansen, T K; Bjerre, M; Van den Berghe, G; Berckmans, D; Aerts, J M; Meyfroidt, G

    2017-04-01

    It is difficult to distinguish between inflammation and infection, so new strategies are required to allow accurate detection of infection. Here, we hypothesize that infected and non-infected ICU patients can be distinguished based on dynamic features of serum cytokine concentrations and heart rate time series. Serum cytokine profiles and heart rate time series of 39 patients were available for this study. The serum concentrations of ten cytokines were measured in blood sampled every 10 min between 2100 and 0600 hours. Heart rate was recorded every minute. Ten metrics were used to extract features from these time series to obtain an accurate classification of infected patients. The predictive power of the metrics derived from the heart rate time series was investigated using decision tree analysis. Finally, logistic regression methods were used to examine whether classification performance improved with the inclusion of features derived from the cytokine time series. The AUC of a decision tree based on two heart rate features was 0.88. The model had good calibration, with a Hosmer-Lemeshow p value of 0.09. Adding static cytokine levels or cytokine time series information to the generated decision tree model provided no significant additional value. The results suggest that heart rate is a better marker for infection than information captured by cytokine time series when the exact stage of infection is not known. The predictive value of (expensive) biomarkers should always be weighed against routinely monitored data, and such biomarkers have to demonstrate added value.
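The reported AUC of 0.88 is the probability that a randomly chosen infected patient receives a higher model score than a randomly chosen non-infected one. A minimal sketch computes it directly from that pairwise definition (the Mann-Whitney form), counting ties as one half. The scores are hypothetical, and the O(n*m) double loop is adequate for a cohort of 39 patients.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve as the fraction of (positive, negative) pairs in which
    the positive case scores higher; tied pairs count one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical decision-tree scores for infected and non-infected patients.
infected = [0.9, 0.8, 0.4]
not_infected = [0.7, 0.3, 0.2]
print(auc(infected, not_infected))  # 8 of 9 pairs ranked correctly
```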

  15. Using classification tree analysis to predict oak wilt distribution in Minnesota and Texas

    Treesearch

    Marla C. Downing; Vernon L. Thomas; Jennifer Juzwik; David N. Appel; Robin M. Reich; Kim Camilli

    2008-01-01

    We developed a methodology and compared results for predicting the potential distribution of Ceratocystis fagacearum (causal agent of oak wilt), in both Anoka County, MN, and Fort Hood, TX. The Potential Distribution of Oak Wilt (PDOW) utilizes a binary classification tree statistical technique that incorporates: geographical information systems (GIS...

  16. A Quality Classification System for Young Hardwood Trees - The First Step in Predicting Future Products

    Treesearch

    David L. Sonderman; Robert L. Brisbin

    1978-01-01

    Forest managers have no objective way to determine the relative value of culturally treated forest stands in terms of product potential. This paper describes the first step in the development of a quality classification system based on the measurement of individual tree characteristics for young hardwood stands.

  17. An automated approach to the design of decision tree classifiers

    NASA Technical Reports Server (NTRS)

    Argentiero, P.; Chin, P.; Beaudet, P.

    1980-01-01

    The classification of high-dimensional data sets arising from the merging of remote sensing data with more traditional forms of ancillary data is considered. Decision tree classification, a popular approach to the problem, is characterized by the property that samples are subjected to a sequence of decision rules before they are assigned to a unique class. An automated technique for effective decision tree design that relies only on a priori statistics is presented. This procedure uses a set of two-dimensional canonical transforms and Bayes table look-up decision rules. An optimal design at each node is derived from the associated decision table. A procedure for computing the global probability of correct classification is also provided. An example is given in which class statistics obtained from an actual LANDSAT scene are used as input to the program. The resulting decision tree design has an associated probability of correct classification of 0.76, compared with the theoretically optimal 0.79 associated with a full-dimensional Bayes classifier. Recommendations for future research are included.

  18. Classification of driving workload affected by highway alignment conditions based on classification and regression tree algorithm.

    PubMed

    Hu, Jiangbi; Wang, Ronghua

    2018-02-17

    Guaranteeing a safe and comfortable driving workload can contribute to reducing traffic injuries. To provide safe and comfortable threshold values, this study attempted to classify driving workload from the aspects of human factors mainly affected by highway geometric conditions and to determine the thresholds of different workload classifications. The study hypothesized that driver workload values vary within a certain range. Driving workload scales were defined based on a comprehensive literature review. Through comparative analysis of different psychophysiological measures, heart rate variability (HRV) was chosen as the representative measure for quantifying driving workload in field experiments. Seventy-two participants (36 car drivers and 36 large truck drivers) and 6 highways with different geometric designs were selected for the field experiments. A wearable wireless dynamic multiparameter physiological detector (KF-2) was used to record physiological data, which were simultaneously correlated with the speed changes recorded by a Global Positioning System (GPS) receiver (testing time, driving speeds, running track, and distance). Through statistical analyses, including the distribution of HRV during flat, straight segments and P-P plots of modified HRV, a driving workload calculation model was proposed. By assigning values to the driving workload scales, the threshold of each scale of driving workload was determined with classification and regression tree (CART) algorithms. The driving workload calculation model was suitable for driving speeds in the range of 40 to 120 km/h. The experimental data of the 72 participants revealed that driving workload had a significant effect on modified HRV, with the effect varying with driving speed. When the driving speed was between 100 and 120 km/h, drivers showed an apparent increase in the corresponding modified HRV. The threshold value of the normal driving workload K was between -0.0011 and 0.056 for car drivers and between -0.00086 and 0.067 for truck drivers. Heart rate variability was a direct and effective index for measuring driving workload despite being affected by multiple highway alignment elements. The driving workload model and the thresholds of driving workload classifications can be used to evaluate the quality of highway geometric design. A higher quality of highway geometric design could keep driving workload within a safer and more comfortable range. This study provides insight into reducing traffic injuries from the perspective of disciplinary integration of highway engineering and human factors engineering.

  19. Characterisation of Feature Points in Eye Fundus Images

    NASA Astrophysics Data System (ADS)

    Calvo, D.; Ortega, M.; Penedo, M. G.; Rouco, J.

    The retinal vessel tree provides decisive information for the diagnosis of numerous ophthalmologic pathologies such as hypertension or diabetes. One of the problems in analyzing the retinal vessel tree is the lack of information about vessel depth, as image acquisition usually produces a 2D image. As a result, two different vessels that cross at a point could be interpreted as a single vessel forking into a bifurcation. That is why, for tracking and labelling the retinal vascular tree, bifurcations and crossovers of vessels are considered feature points. In this work, a novel method for detecting and classifying these retinal vessel tree feature points is introduced. The method applies image techniques such as filters or thinning to obtain a suitable structure for detecting the points and classifies each point by studying its surroundings. The methodology is tested using a standard database, and the results show high classification capabilities.
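Once the vessel tree is thinned to a one-pixel-wide skeleton, a common first pass for locating candidate feature points is the crossing number: half the number of 0/1 transitions around a pixel's eight neighbours, where 1 marks an end point, 2 a vessel continuation, 3 a bifurcation and 4 a crossover. The sketch below shows that generic heuristic, not the authors' method. In real thinned images a crossover often appears as two nearby bifurcations, which is why the paper classifies points by examining their surroundings. The skeletons in the usage lines are hypothetical.

```python
# Eight neighbours in circular order: N, NE, E, SE, S, SW, W, NW.
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def crossing_number(skel, r, c):
    """Half the number of 0/1 transitions around pixel (r, c); off-image pixels count as 0."""
    h, w = len(skel), len(skel[0])
    ring = [skel[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
            for dr, dc in OFFSETS]
    return sum(abs(ring[i] - ring[(i + 1) % 8]) for i in range(8)) // 2


def candidate_feature_points(skel):
    """Map skeleton pixels with crossing number 3 to 'bifurcation' and 4 or more to 'crossover'."""
    points = {}
    for r, row in enumerate(skel):
        for c, value in enumerate(row):
            if value:
                cn = crossing_number(skel, r, c)
                if cn == 3:
                    points[(r, c)] = "bifurcation"
                elif cn >= 4:
                    points[(r, c)] = "crossover"
    return points


# Hypothetical skeletons: a "+" (one vessel crossing another) and a "T" (a branch).
plus = [[0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0]]
tee = [[1, 1, 1, 1, 1],
       [0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0]]
print(candidate_feature_points(plus))
print(candidate_feature_points(tee))
```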

  20. Classification of savanna tree species, in the Greater Kruger National Park region, by integrating hyperspectral and LiDAR data in a Random Forest data mining environment

    NASA Astrophysics Data System (ADS)

    Naidoo, L.; Cho, M. A.; Mathieu, R.; Asner, G.

    2012-04-01

    The accurate classification and mapping of individual trees at species level in the savanna ecosystem can provide numerous benefits for the managerial authorities. Such benefits include the mapping of economically useful tree species, which are a key source of food production and fuel wood for the local communities, and of problematic alien invasive and bush encroaching species, which can threaten the integrity of the environment and livelihoods of the local communities. Species level mapping is particularly challenging in African savannas, which are complex, heterogeneous, and open environments with high intra-species spectral variability due to differences in geology, topography, rainfall, herbivory and human impacts within relatively short distances. Savanna vegetation is also highly irregular in canopy and crown shape, height and other structural dimensions, with a combination of open grassland patches and dense woody thicket, a stark contrast to the more homogeneous forest vegetation. This study classified eight common savanna tree species in the Greater Kruger National Park region, South Africa, using a combination of hyperspectral and Light Detection and Ranging (LiDAR)-derived structural parameters, in the form of seven predictor datasets, in an automated Random Forest modelling approach. The most important predictors, which were found to play an important role in the different classification models and contributed to the success of the hybrid dataset model when combined, were species tree height; NDVI; the chlorophyll b wavelength (466 nm) and a selection of raw, continuum removed and Spectral Angle Mapper (SAM) bands. It was also concluded that the hybrid predictor dataset Random Forest model yielded the highest classification accuracy and prediction success for the eight savanna tree species, with an overall classification accuracy of 87.68% and a KHAT value of 0.843.
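Overall accuracy and KHAT are both computed from the error (confusion) matrix. KHAT is the estimate of Cohen's kappa: observed agreement corrected for the agreement expected by chance from the row and column totals. The two-class matrix in the usage lines is hypothetical; the same function works unchanged for an eight-species matrix.

```python
def accuracy_and_khat(matrix):
    """Overall accuracy and KHAT (Cohen's kappa estimate) from a square confusion matrix
    with rows = reference classes and columns = predicted classes."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: sum over classes of (row share * column share).
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return observed, (observed - expected) / (1 - expected)


# Hypothetical two-species error matrix.
acc, khat = accuracy_and_khat([[45, 5], [10, 40]])
print(acc, khat)
```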

  1. Predictive models for subtypes of autism spectrum disorder based on single-nucleotide polymorphisms and magnetic resonance imaging.

    PubMed

    Jiao, Y; Chen, R; Ke, X; Cheng, L; Chu, K; Lu, Z; Herskovits, E H

    2011-01-01

    Autism spectrum disorder (ASD) is a neurodevelopmental disorder, of which Asperger syndrome and high-functioning autism are subtypes. Our goals were: 1) to determine whether a diagnostic model based on single-nucleotide polymorphisms (SNPs), brain regional thickness measurements, or brain regional volume measurements can distinguish Asperger syndrome from high-functioning autism; and 2) to compare the SNP-, thickness-, and volume-based diagnostic models. Our study included 18 children with ASD: 13 subjects with high-functioning autism and 5 subjects with Asperger syndrome. For each child, we obtained 25 SNPs for 8 ASD-related genes; we also computed regional cortical thicknesses and volumes for 66 brain structures, based on structural magnetic resonance (MR) examination. To generate diagnostic models, we employed five machine-learning techniques: decision stump, alternating decision trees, multi-class alternating decision trees, logistic model trees, and support vector machines. For SNP-based classification, three decision-tree-based models performed better than the other two machine-learning models. The performance metrics for the three decision-tree-based models were similar: decision stump was modestly better than the other two methods, with accuracy = 90%, sensitivity = 0.95 and specificity = 0.75. All thickness- and volume-based diagnostic models performed poorly. The SNP-based diagnostic models were superior to those based on thickness and volume. For SNP-based classification, rs878960 in GABRB3 (gamma-aminobutyric acid A receptor, beta 3) was selected by all tree-based models. Our analysis demonstrated that SNP-based classification was more accurate than morphometry-based classification in ASD subtype classification. Also, we found that one SNP, rs878960 in GABRB3, distinguishes Asperger syndrome from high-functioning autism.
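Accuracy, sensitivity and specificity all come from the four cells of a binary confusion table. The sketch below assumes one class is coded as positive (the abstract does not say which subtype was treated as positive) and uses hypothetical labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return accuracy, sensitivity, specificity


# Hypothetical labels: 1 = the subtype treated as positive, 0 = the other subtype.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
acc, sens, spec = binary_metrics(y_true, y_pred)
```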

  2. Towards a formal genealogical classification of the Lezgian languages (North Caucasus): testing various phylogenetic methods on lexical data.

    PubMed

    Kassian, Alexei

    2015-01-01

    A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different algorithms: traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). For several reasons (above all, the high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is an ideal testing ground for appraising phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies.

  3. Towards a Formal Genealogical Classification of the Lezgian Languages (North Caucasus): Testing Various Phylogenetic Methods on Lexical Data

    PubMed Central

    Kassian, Alexei

    2015-01-01

    A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different algorithms: traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). For several reasons (above all, the high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is an ideal testing ground for appraising phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies. PMID:25719456
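UPGMA, one of the distance-based methods compared, repeatedly merges the two closest clusters and sets the distance from the merged cluster to every other cluster to the size-weighted average of its members' distances. A minimal sketch that returns a Newick topology (without branch lengths) follows. The four-taxon distance matrix is hypothetical and could represent, for instance, proportions of non-cognate items between wordlists; this is not the software used in the study.

```python
def upgma(names, dist):
    """UPGMA clustering. `dist` maps frozenset({a, b}) to the distance between taxa a and b.
    Returns the tree topology as a Newick string without branch lengths."""
    sizes = {name: 1 for name in names}  # cluster label -> number of taxa it contains
    d = dict(dist)
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(((x, y) for x in sizes for y in sizes if x < y),
                   key=lambda pair: d[frozenset(pair)])
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        for other in sizes:
            # Distance to the merged cluster is the size-weighted mean of the old distances.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"


# Hypothetical pairwise distances between four lects A-D.
pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
dist = {frozenset(k): v for k, v in pairs.items()}
tree = upgma(["A", "B", "C", "D"], dist)
print(tree)
```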

  4. Automated method for identification and artery-venous classification of vessel trees in retinal vessel networks.

    PubMed

    Joshi, Vinayak S; Reinhardt, Joseph M; Garvin, Mona K; Abramoff, Michael D

    2014-01-01

    The separation of the retinal vessel network into distinct arterial and venous vessel trees is of high interest. We propose an automated method for identifying and separating retinal vessel trees in a retinal color image by converting a vessel segmentation image into a vessel segment map and identifying the individual vessel trees by graph search. The orientation, width, and intensity of each vessel segment are used to find the optimal graph of vessel segments. The separated vessel trees are labeled as primary vessels or branches. We use the separated vessel trees for arterial-venous (AV) classification, based on the color properties of the vessels in each tree graph. We applied our approach to a dataset of 50 fundus images from 50 subjects. The proposed method correctly classified 91.44% of vessel pixels as either artery or vein. The accuracy of correctly classified major vessel segments was 96.42%.

  5. Decision tree methods: applications for classification and prediction.

    PubMed

    Song, Yan-Yan; Lu, Ying

    2015-04-25

    Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets: the training dataset is used to build a decision tree model, and the validation dataset is used to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms for developing decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
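Song and Lu describe growing a tree on a training set and using a validation set to choose its size. A minimal pure-Python sketch follows, with Gini splits on numeric features and maximum depth standing in for tree size. That substitution is an assumption, since CART itself sizes trees by cost-complexity pruning. The toy data are hypothetical, and leaf labels are assumed not to be tuples, because tuples mark internal nodes.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build_tree(rows, labels, max_depth):
    """Grow a binary tree with Gini splits. Internal nodes are (feature, threshold, left, right)
    tuples; leaves are majority-class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no feature separates the rows
        return majority
    _, f, t, left, right = best
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left], max_depth - 1),
            build_tree([rows[i] for i in right], [labels[i] for i in right], max_depth - 1))


def predict(tree, row):
    """Follow the splits from the root down to a leaf."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


def choose_depth(train, validation, depths):
    """Fit one tree per candidate depth on the training data and return the depth
    with the best validation accuracy, preferring the shallower tree on ties."""
    (x_train, y_train), (x_val, y_val) = train, validation

    def val_accuracy(tree):
        return sum(predict(tree, x) == y for x, y in zip(x_val, y_val)) / len(y_val)

    scored = [(val_accuracy(build_tree(x_train, y_train, d)), -d, d) for d in depths]
    return max(scored)[2]


# Hypothetical one-feature data set split into training and validation parts.
train = ([[1], [2], [3], [4], [5], [6]], [0, 0, 0, 1, 1, 1])
validation = ([[1.5], [5.5]], [0, 1])
depth = choose_depth(train, validation, depths=[0, 1, 2, 3])
tree = build_tree(*train, depth)
```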

  6. Classification of the Gabon SAR Mosaic Using a Wavelet Based Rule Classifier

    NASA Technical Reports Server (NTRS)

    Simard, Marc; Saatchi, Sasan; DeGrandi, Gianfranco

    2000-01-01

    A method is developed for semi-automated classification of SAR images of the tropical forest. Information is extracted using the wavelet transform (WT), which allows structural information in the image to be extracted as a function of scale. To classify the SAR image, a decision tree classifier is used, and pruning is applied to optimize the classification rate versus tree size. The results give explicit insight into the type of information useful for a given class.

  7. Estimating Leaf Water Potential of Giant Sequoia Trees from Airborne Hyperspectral Imagery

    NASA Astrophysics Data System (ADS)

    Francis, E. J.; Asner, G. P.

    2015-12-01

    Recent drought-induced forest dieback events have motivated research on the mechanisms of tree survival and mortality during drought. Leaf water potential, a measure of the force exerted by the evaporation of water from the leaf surface, is an indicator of plant water stress and can help predict tree mortality in response to drought. Scientists have traditionally measured water potentials on a tree-by-tree basis, but have not been able to produce maps of tree water potential at the scale of a whole forest, leaving forest managers unaware of forest drought stress patterns and their ecosystem-level consequences. Imaging spectroscopy, a technique for remote measurement of chemical properties, has been used to successfully estimate leaf water potentials in wheat and maize crops and pinyon-pine and juniper trees, but these estimates have never been scaled to the canopy level. We used hyperspectral reflectance data collected by the Carnegie Airborne Observatory (CAO) to map leaf water potentials of giant sequoia trees (Sequoiadendron giganteum) in an 800-hectare grove in Sequoia National Park. During the current severe drought in California, we measured predawn and midday leaf water potentials of 48 giant sequoia trees, using the pressure bomb method on treetop foliage samples collected with tree-climbing techniques. The CAO collected hyperspectral reflectance data at 1-meter resolution from the same grove within 1-2 weeks of the tree-level measurements. A partial least squares regression was used to correlate reflectance data extracted from the 48 focal trees with their water potentials, producing a model that predicts the water potential of giant sequoia trees. Results show that giant sequoia trees can be mapped in the imagery with a classification accuracy of 0.94, and we predicted the water potential of the mapped trees to assess 1) similarities and differences between a leaf water potential map and a canopy water content map produced from airborne hyperspectral data, 2) spatial variability in leaf water potentials, and 3) relationships between water potential and tree leaf area, topography, and surrounding tree density. These results will help forest managers plan prescribed burns to maintain the health of giant sequoia trees during drought.

  8. PRIM versus CART in subgroup discovery: when patience is harmful.

    PubMed

    Abu-Hanna, Ameen; Nannings, Barry; Dongelmans, Dave; Hasman, Arie

    2010-10-01

    We systematically compare the established algorithms CART (Classification and Regression Trees) and PRIM (Patient Rule Induction Method) in a subgroup discovery task on a large, real-world, high-dimensional clinical database. Contrary to current conjectures, PRIM's performance was generally inferior to CART's. PRIM often considered "peeling off" a large chunk of data at a value of a relevant discrete ordinal variable unattractive, ultimately missing an important subgroup. This finding has considerable significance in clinical medicine, where ordinal scores are ubiquitous. PRIM's utility in clinical databases would increase if global information about (ordinal) variables were put to better use and if the search algorithm kept track of alternative solutions.
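PRIM's top-down peeling removes a small fraction alpha of the remaining data from one edge of the current box at each step, choosing the peel that most raises the mean response; this deliberate caution is the "patience" in the method's name. The sketch below shows that loop for a single numeric variable. The parameter names alpha and min_support are illustrative, the pasting stage is omitted, and tied values are split by sort order. That last point is a simplification: how real implementations treat discrete values is exactly what the abstract identifies as PRIM's weakness with ordinal scores.

```python
def prim_peel(xs, ys, alpha=0.1, min_support=0.2):
    """Top-down PRIM peeling on one numeric variable. Each step removes about an alpha
    fraction of the remaining points from the low or the high end, whichever leaves the
    higher mean response, and stops when support would fall below min_support or no peel
    raises the mean. Returns (low bound, high bound, box mean)."""
    n_total = len(xs)
    pts = sorted(zip(xs, ys))

    def mean(seq):
        return sum(y for _, y in seq) / len(seq)

    while True:
        k = max(1, int(alpha * len(pts)))  # number of points to peel this step
        if (len(pts) - k) / n_total < min_support:
            break
        peel_low, peel_high = pts[k:], pts[:-k]
        candidate = max(peel_low, peel_high, key=mean)
        if mean(candidate) <= mean(pts):
            break  # no peel improves the box mean
        pts = candidate
    return pts[0][0], pts[-1][0], mean(pts)


# Hypothetical data: the outcome is concentrated at high values of x.
box = prim_peel(list(range(10)), [0] * 5 + [1] * 5)
print(box)
```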

  9. An automated approach to the design of decision tree classifiers

    NASA Technical Reports Server (NTRS)

    Argentiero, P.; Chin, R.; Beaudet, P.

    1982-01-01

    An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.
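Under the independence assumption the abstract mentions, one straightforward way to obtain the global probability of correct classification is to multiply, for each class, the probabilities of a correct decision at every node on that class's path, and then weight by the class priors. The sketch below assumes that reading and a single leaf per class. The dictionary layout and the example numbers are hypothetical, and it does not reproduce the authors' error-matrix derivation.

```python
from math import prod


def global_correct_probability(priors, node_accuracy, paths):
    """Global probability of correct classification, assuming independent node decisions.

    priors: class -> prior probability
    node_accuracy: (node, class) -> probability that the rule at `node` sends a sample of
                   `class` down the branch leading to that class's leaf
    paths: class -> list of nodes from the root to that class's leaf
    """
    return sum(
        priors[c] * prod(node_accuracy[(node, c)] for node in paths[c])
        for c in priors
    )


# Hypothetical three-class tree: the root separates A from {B, C}; node n1 separates B from C.
priors = {"A": 0.5, "B": 0.25, "C": 0.25}
paths = {"A": ["root"], "B": ["root", "n1"], "C": ["root", "n1"]}
node_accuracy = {("root", "A"): 0.9, ("root", "B"): 0.8, ("root", "C"): 0.8,
                 ("n1", "B"): 0.75, ("n1", "C"): 0.5}
p_correct = global_correct_probability(priors, node_accuracy, paths)
print(p_correct)
```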

  10. Hydrologic classification of rivers based on cluster analysis of dimensionless hydrologic signatures: Applications for environmental instream flows

    NASA Astrophysics Data System (ADS)

    Praskievicz, S. J.; Luo, C.

    2017-12-01

    Classification of rivers is useful for a variety of purposes, such as generating and testing hypotheses about watershed controls on hydrology, predicting hydrologic variables for ungaged rivers, and setting goals for river management. In this research, we present a bottom-up (based on machine learning) river classification designed to investigate the underlying physical processes governing rivers' hydrologic regimes. The classification was developed for the entire state of Alabama, based on 248 United States Geological Survey (USGS) stream gages that met criteria for length and completeness of records. Five dimensionless hydrologic signatures were derived for each gage: slope of the flow duration curve (indicator of flow variability), baseflow index (ratio of baseflow to average streamflow), rising limb density (number of rising limbs per unit time), runoff ratio (ratio of long-term average streamflow to long-term average precipitation), and streamflow elasticity (sensitivity of streamflow to precipitation). We used a Bayesian clustering algorithm to classify the gages, based on the five hydrologic signatures, into distinct hydrologic regimes. We then used classification and regression trees (CART) to predict each gaged river's membership in different hydrologic regimes based on climatic and watershed variables. Using existing geospatial data, we applied the CART analysis to classify ungaged streams in Alabama, with the National Hydrography Dataset Plus (NHDPlus) catchment (average area 3 km²) as the unit of classification. The results of the classification can be used for meeting management and conservation objectives in Alabama, such as developing statewide standards for environmental instream flows. Such hydrologic classification approaches are promising for contributing to process-based understanding of river systems.
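Three of the five signatures have simple closed forms. The sketch below uses definitions common in the catchment-classification literature, which is an assumption since the abstract does not give the authors' formulas: runoff ratio as mean flow over mean precipitation, rising limb density as the number of rising limbs divided by the number of time steps spent rising, and the flow duration curve slope between the 33% and 66% exceedance flows in log space. Baseflow index and streamflow elasticity are omitted because they require a baseflow filter and an elasticity estimator that the abstract does not specify. Flows must be positive for the logarithm.

```python
import math


def runoff_ratio(flow, precip):
    """Long-term mean streamflow divided by long-term mean precipitation (same units)."""
    return (sum(flow) / len(flow)) / (sum(precip) / len(precip))


def rising_limb_density(flow):
    """Number of rising limbs divided by the number of time steps on which flow rises."""
    limbs, rising_steps, in_limb = 0, 0, False
    for prev, cur in zip(flow, flow[1:]):
        if cur > prev:
            rising_steps += 1
            if not in_limb:  # a new rising limb starts here
                limbs += 1
                in_limb = True
        else:
            in_limb = False
    return limbs / rising_steps if rising_steps else 0.0


def fdc_slope(flow, lo=0.33, hi=0.66):
    """Slope of the flow duration curve between the lo and hi exceedance probabilities,
    using log flows and a simple truncated-rank percentile (flows must be positive)."""
    ranked = sorted(flow, reverse=True)  # ranked[i] is exceeded about i/(n-1) of the time
    q_lo = ranked[int(lo * (len(ranked) - 1))]
    q_hi = ranked[int(hi * (len(ranked) - 1))]
    return (math.log(q_lo) - math.log(q_hi)) / (hi - lo)


# Hypothetical daily series (mm/day).
flow = [1.0, 2.0, 3.0, 2.0, 1.0, 2.0, 1.0]
precip = [2.0, 5.0, 4.0, 1.0, 0.0, 4.0, 0.0]
print(runoff_ratio(flow, precip), rising_limb_density(flow), fdc_slope(flow))
```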

  11. Usefulness of Beta2-Microglobulin as a Predictor of All-Cause and Nonculprit Lesion-Related Cardiovascular Events in Acute Coronary Syndromes (from the PROSPECT Study).

    PubMed

    Möckel, Martin; Muller, Reinhold; Searle, Julia; Slagman, Anna; De Bruyne, Bernard; Serruys, Patrick; Weisz, Giora; Xu, Ke; Holert, Fabian; Müller, Christian; Maehara, Akiko; Stone, Gregg W

    2015-10-01

    In the Providing Regional Observations to Study Predictors of Events in the Coronary Tree (PROSPECT) study, plaque burden, plaque composition, and minimal luminal area were associated with an increased risk of adverse cardiovascular events arising from untreated atherosclerotic lesions (vulnerable plaques) in patients with acute coronary syndromes (ACS). We sought to evaluate the utility of biomarker profiling and clinical risk factors to predict 3-year all-cause and nonculprit lesion-related major adverse cardiac events (MACEs). Of 697 patients who underwent successful percutaneous coronary intervention (PCI) for ACS, an array of 28 baseline biomarkers was analyzed. Median follow-up was 3.4 years. Beta2-microglobulin displayed the strongest predictive power of all variables assessed for all-cause and nonculprit lesion-related MACE. In a classification and regression tree analysis, patients with beta2-microglobulin >1.92 mg/L had an estimated 28.7% 3-year incidence of all-cause MACE; C-peptide <1.32 ng/ml was associated with a further increase in MACE to 51.2%. In a classification and regression tree analysis for untreated nonculprit lesion-related MACE, beta2-microglobulin >1.92 mg/L identified a cohort with a 3-year rate of 18.5%, and C-peptide <2.22 ng/ml was associated with a further increase to 25.5%. By multivariable analysis, beta2-microglobulin was the strongest predictor of all-cause and nonculprit MACE during follow-up. High-density lipoprotein (HDL), transferrin, and history of angina pectoris were also independent predictors of all-cause MACE, and HDL was an independent predictor of nonculprit MACE. In conclusion, in the PROSPECT study, beta2-microglobulin strongly predicted all-cause and nonculprit lesion-related MACE within 3 years after PCI in ACS. C-peptide and HDL provided further risk stratification to identify angiographically mild nonculprit lesions prone to future MACE. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. Exploring factors controlling the variability of pesticide concentrations in the Willamette River Basin using tree-based models

    USGS Publications Warehouse

    Qian, S.S.; Anderson, Chauncey W.

    1999-01-01

    We analyzed available concentration data of five commonly used herbicides and three pesticides collected from small streams in the Willamette River Basin in Oregon to identify factors that affect the variation of their concentrations in the area. The emphasis of this paper is the innovative use of classification and regression tree models for exploratory data analysis as well as for analyzing data with a substantial amount of left-censored values. Among variables included in this analysis, land-use pattern in the watershed is the most important for all but one (simazine) of the eight pesticides studied, followed by geographic location, intensity of agricultural activities in the watershed (represented by nutrient concentrations in the stream), and the size of the watershed. The significant difference between urban sites and agricultural sites is the variability of stream concentrations. While all 16 nonurban watersheds have significantly higher variation than urban sites, the same is not necessarily true for the mean concentrations. Seasonal variation accounts for only a small fraction of the total variance in all eight pesticides.

  13. Fast Image Texture Classification Using Decision Trees

    NASA Technical Reports Server (NTRS)

    Thompson, David R.

    2011-01-01

    Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation-hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.
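    The integral image (summed-area table) mentioned above is what makes integer-only box filtering cheap: after one pass over the image, the sum of any axis-aligned rectangle needs only four table lookups. The sketch below shows that standard transform in plain Python. The summary does not say which box features or tree were used, so nothing here reproduces the NTRS method, and the function names are illustrative.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Summed-area table with one row and one column of zero padding.

    ii[r][c] holds the sum of img[0..r-1][0..c-1]; only integer
    additions are used.
    """
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0  # running sum of img[r][0..c]
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii: Image, top: int, left: int, bottom: int, right: int) -> int:
    """Sum of img rows top..bottom-1 and columns left..right-1 (half-open),
    computed from four lookups regardless of box size."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]
```

    For img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]], box_sum(integral_image(img), 1, 1, 3, 3) returns 5 + 6 + 8 + 9 = 28. A texture feature could then be built from differences of such box sums, which stays within integer arithmetic.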

  14. Comparison of stream invertebrate response models for bioassessment metric

    USGS Publications Warehouse

    Waite, Ian R.; Kennen, Jonathan G.; May, Jason T.; Brown, Larry R.; Cuffney, Thomas F.; Jones, Kimberly A.; Orlando, James L.

    2012-01-01

    We aggregated invertebrate data from various sources to assemble data for modeling in two ecoregions in Oregon and one in California. Our goal was to compare the performance of models developed using multiple linear regression (MLR) techniques with models developed using three relatively new techniques: classification and regression trees (CART), random forest (RF), and boosted regression trees (BRT). We used tolerance of taxa based on richness (RICHTOL) and the ratio of observed to expected taxa (O/E) as response variables and land use/land cover as explanatory variables. Responses were generally linear; therefore, models using CART and RF showed little improvement over the MLR models. In general, the four modeling techniques (MLR, CART, RF, and BRT) consistently selected the same primary explanatory variables for each region. However, the BRT models showed significant improvement over the MLR models for each region, with increases in R² from 0.09 to 0.20. The O/E metric, which was derived from models specifically calibrated for Oregon, consistently had lower R² values than RICHTOL for the two regions tested. Modeled O/E R² values were between 0.06 and 0.10 lower for each of the four modeling methods applied in the Willamette Valley and between 0.19 and 0.36 lower for the Blue Mountains. As a result, BRT models may indeed represent a good alternative to MLR for modeling species distribution relative to environmental variables.

  15. Risk factors for amendment in type, duration and setting of prescribed outpatient parenteral antimicrobial therapy (OPAT) for adult patients with cellulitis: a retrospective cohort study and CART analysis.

    PubMed

    Quirke, Michael; Curran, Emma May; O'Kelly, Patrick; Moran, Ruth; Daly, Eimear; Aylward, Seamus; McElvaney, Gerry; Wakai, Abel

    2018-01-01

    To measure the percentage rate and risk factors for amendment in the type, duration and setting of outpatient parenteral antimicrobial therapy (OPAT) for the treatment of cellulitis. A retrospective cohort study of adult patients receiving OPAT for cellulitis was performed. Treatment amendment (TA) was defined as hospital admission or change in antibiotic therapy in order to achieve clinical response. Multivariable logistic regression (MVLR) and classification and regression tree (CART) analysis were performed. There were 307 patients enrolled. TA occurred in 36 patients (11.7%). Significant risk factors for TA on MVLR were increased age, increased Numerical Pain Scale Score (NPSS) and immunocompromise. The median OPAT duration was 7 days. Increased age, heart rate and C reactive protein were associated with treatment prolongation. CART analysis selected age <64.5 years, female gender and NPSS <2.5 in the final model, generating a low-sensitivity (27.8%), high-specificity (97.1%) decision tree. Increased age, NPSS and immunocompromise were associated with OPAT amendment. These identified risk factors can be used to support an evidence-based approach to patient selection for OPAT in cellulitis. The CART algorithm has good specificity but lacks sensitivity and is shown to be inferior in this study to logistic regression modelling. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
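    The low-sensitivity, high-specificity profile reported for the CART model follows directly from the confusion matrix. The sketch below shows the two ratios with made-up labels (1 = event, 0 = no event); it is a generic illustration and does not use the study's data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels coded 1 = event, 0 = no event."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # events flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # events missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-events cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    sensitivity = tp / (tp + fn)  # share of true events the model flags
    specificity = tn / (tn + fp)  # share of non-events the model clears
    return sensitivity, specificity
```

    With y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0] and y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1], the model catches one of four events (sensitivity 0.25) and clears five of six non-events (specificity about 0.83): a tree that rarely predicts the event scores well on specificity and poorly on sensitivity.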

  16. Association of tRNA methyltransferase NSUN2/IGF-II molecular signature with ovarian cancer survival.

    PubMed

    Yang, Jia-Cheng; Risch, Eric; Zhang, Meiqin; Huang, Chan; Huang, Huatian; Lu, Lingeng

    2017-09-01

    To investigate the association between the NSUN2/IGF-II signature and ovarian cancer survival. Using a publicly accessible dataset of RNA sequencing and clinical follow-up data, we performed classification and regression tree and survival analyses. Patients with NSUN2-high/IGF-II-low expression had significantly superior overall and progression-free survival, followed by those with NSUN2-low/IGF-II-low, NSUN2-high/IGF-II-high and NSUN2-low/IGF-II-high expression (p < 0.0001 for overall and p = 0.0024 for progression-free survival). The associations of the NSUN2/IGF-II signature with the risks of death and relapse remained significant in multivariate Cox regression models. Random-effects meta-analyses showed upregulated NSUN2 and IGF-II expression in ovarian cancer versus normal tissues. The NSUN2/IGF-II signature is associated with heterogeneous outcomes and may have clinical implications in managing ovarian cancer.

  17. Tree Species Classification of Broadleaved Forests in Nagano, Central Japan, Using Airborne Laser Data and Multispectral Images

    NASA Astrophysics Data System (ADS)

    Deng, S.; Katoh, M.; Takenaka, Y.; Cheung, K.; Ishii, A.; Fujii, N.; Gao, T.

    2017-10-01

    This study attempted to classify three coniferous and ten broadleaved tree species by combining airborne laser scanning (ALS) data and multispectral images. The study area, located in Nagano, central Japan, is within the broadleaved forests of the Afan Woodland area. A total of 235 trees were surveyed in 2016, and we recorded the species, DBH, and tree height. The geographical position of each tree was collected using a Global Navigation Satellite System (GNSS) device. Tree crowns were manually detected using GNSS position data, field photographs, true-color orthoimages with three bands (red-green-blue, RGB), 3D point clouds, and a canopy height model derived from ALS data. Then a total of 69 features, including 27 image-based and 42 point-based features, were extracted from the RGB images and the ALS data to classify tree species. Finally, the detected tree crowns were classified into two classes at the first level (coniferous and broadleaved trees), four classes at the second level (Pinus densiflora, Larix kaempferi, Cryptomeria japonica, and broadleaved trees), and 13 classes at the third level (three coniferous and ten broadleaved species), using each of four feature sets: the 27 image-based features, the 42 point-based features, all 69 features, and the best combination of features identified using a neighborhood component analysis algorithm. The overall classification accuracies reached 90% at the first and second levels but were less than 60% at the third level. The classifications using the best combinations of features had higher accuracies than those using the image-based features, the point-based features, or all 69 features combined.

  18. Characterization of Escherichia coli isolates from different fecal sources by means of classification tree analysis of fatty acid methyl ester (FAME) profiles.

    PubMed

    Seurinck, Sylvie; Deschepper, Ellen; Deboch, Bishaw; Verstraete, Willy; Siciliano, Steven

    2006-03-01

    Microbial source tracking (MST) methods need to be rapid, inexpensive and accurate. Unfortunately, many MST methods provide a wealth of information that is difficult to interpret for the regulators who use this information to make decisions. This paper describes the use of classification tree analysis to interpret the results of an MST method based on fatty acid methyl ester (FAME) profiles of Escherichia coli isolates, and to present results in a format readily interpretable by water quality managers. Raw sewage E. coli isolates and animal E. coli isolates from cow, dog, gull, and horse were isolated and their FAME profiles collected. The correct classification rate determined with leave-one-out cross-validation was low overall, at 61%. A higher overall correct classification rate of 85% was obtained when the animal isolates were pooled together and compared to the raw sewage isolates. Bootstrap aggregation or adaptive resampling and combining of the FAME profile data increased correct classification rates substantially. Other MST methods may be better suited to differentiate between different fecal sources, but classification tree analysis has enabled us to distinguish raw sewage from animal E. coli isolates, which previously had not been possible with other multivariate methods such as principal component analysis and cluster analysis.

  19. Unified framework for triaxial accelerometer-based fall event detection and classification using cumulants and hierarchical decision tree classifier.

    PubMed

    Kambhampati, Satya Samyukta; Singh, Vishal; Manikandan, M Sabarimalai; Ramkumar, Barathram

    2015-08-01

    In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from the acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers for effectively detecting and classifying different types of fall and non-fall events. In the first level of the proposed hierarchical decision tree algorithm, fall detection is implemented using fifth-order cumulants and a support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features, including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve detection and classification rates above 95%, as well as the lowest false alarm rate, 1.03%.

  20. A fuzzy decision tree for fault classification.

    PubMed

    Zio, Enrico; Baraldi, Piero; Popescu, Irina C

    2008-02-01

    In plant accident management, control room operators are required to identify the causes of the accident based on the different evolution patterns of the monitored process variables. This task is often quite challenging, given the large number of process parameters monitored and the intense emotional states under which it is performed. To aid the operators, various fault classification techniques have been engineered. An important requirement for their practical application is the physical interpretability of the relationships among the process variables underpinning the fault classification. In this view, the present work proposes a fuzzy approach to fault classification, which relies on fuzzy if-then rules inferred from the clustering of available preclassified signal data, which are then organized in a logical and transparent decision tree structure. The advantages of the proposed approach are precisely that a transparent fault classification model is mined from the signal data and that the underlying physical relationships among the process variables are easily interpretable as linguistic if-then rules that can be explicitly visualized in the decision tree structure. The approach is applied to a case study on the classification of simulated faults in the feedwater system of a boiling water reactor.

  1. A comparison of selected parametric and imputation methods for estimating snag density and snag quality attributes

    USGS Publications Warehouse

    Eskelson, Bianca N.I.; Hagar, Joan; Temesgen, Hailemariam

    2012-01-01

    Snags (standing dead trees) are an essential structural component of forests. Because wildlife use of snags depends on size and decay stage, snag density estimation without any information about snag quality attributes is of little value for wildlife management decision makers. Little work has been done to develop models that allow multivariate estimation of snag density by snag quality class. Using climate, topography, Landsat TM data, stand age and forest type collected for 2356 forested Forest Inventory and Analysis plots in western Washington and western Oregon, we evaluated two multivariate techniques for their abilities to estimate density of snags by three decay classes. The density of live trees and snags in three decay classes (D1: recently dead, little decay; D2: decay, without top, some branches and bark missing; D3: extensive decay, missing bark and most branches) with diameter at breast height (DBH) ≥ 12.7 cm was estimated using a nonparametric random forest nearest neighbor imputation technique (RF) and a parametric two-stage model (QPORD), for which the number of trees per hectare was estimated with a Quasipoisson model in the first stage and the probability of belonging to a tree status class (live, D1, D2, D3) was estimated with an ordinal regression model in the second stage. The presence of large snags with DBH ≥ 50 cm was predicted using a logistic regression and RF imputation. Because of the more homogeneous conditions on private forest lands, snag density by decay class was predicted with higher accuracies on private forest lands than on public lands, while presence of large snags was more accurately predicted on public lands, owing to the higher prevalence of large snags on public lands. RF outperformed the QPORD model in terms of percent accurate predictions, while QPORD provided smaller root mean square errors in predicting snag density by decay class. The logistic regression model achieved more accurate presence/absence classification of large snags than the RF imputation approach. Adjusting the decision threshold to account for the unequal sizes of the presence and absence classes is more straightforward for the logistic regression than for the RF imputation approach. Overall, model accuracies were poor in this study, which can be attributed to the poor predictive quality of the explanatory variables and the large range of forest types and geographic conditions observed in the data.

  2. Classification techniques on computerized systems to predict and/or to detect Apnea: A systematic review.

    PubMed

    Pombo, Nuno; Garcia, Nuno; Bousson, Kouamana

    2017-03-01

    Sleep apnea syndrome (SAS), which can significantly decrease quality of life, is associated with major health risks such as increased cardiovascular disease, sudden death, depression, irritability, hypertension, and learning difficulties. Thus, it is relevant and timely to present a systematic review describing significant applications of computational intelligence to SAS, including their performance, beneficial and challenging effects, and modeling for decision-making in multiple scenarios. This study aims to systematically review the literature on systems for the detection and/or prediction of apnea events using a classification model. Forty-five included studies revealed a combination of classification techniques for the diagnosis of apnea, namely threshold-based (14.75%) and machine learning (ML) models (85.25%). The ML models, clustered in a mind map, include neural networks (44.26%), regression (4.91%), instance-based methods (11.47%), Bayesian algorithms (1.63%), reinforcement learning (4.91%), dimensionality reduction (8.19%), ensemble learning (6.55%), and decision trees (3.27%). A classification model should be auto-adaptive and independent of external human action. In addition, the accuracy of classification models is related to effective feature selection. New high-quality studies based on randomized controlled trials, with models validated on large and multiple data samples, are recommended. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.

  3. Comparative Issues and Methods in Organizational Diagnosis. Report II. The Decision Tree Approach.

    DTIC Science & Technology

    organizational diagnosis. The advantages and disadvantages of the decision-tree approach generally, and in this study specifically, are examined. A pre-test, using a civilian sample of 174 work groups with Survey of Organizations data, was conducted to assess various decision-tree classification criteria, in terms of their similarity to the distance function used by Bowers and Hausser (1977). The results suggested the use of a large developmental sample, which should result in more distinctly defined boundary lines between classification profiles. Also, the decision matrix

  4. Boosted regression tree, table, and figure data

    EPA Pesticide Factsheets

    Spreadsheets are included here to support the manuscript Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. This dataset is associated with the following publication: Golden, H., C. Lane, A. Prues, and E. D'Amico. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition. JAWRA. American Water Resources Association, Middleburg, VA, USA, 52(5): 1251-1274, (2016).

  5. A review of logistic regression models used to predict post-fire tree mortality of western North American conifers

    Treesearch

    Travis Woolley; David C. Shaw; Lisa M. Ganio; Stephen Fitzgerald

    2012-01-01

    Logistic regression models used to predict tree mortality are critical to post-fire management, planning prescribed burns and understanding disturbance ecology. We review literature concerning post-fire mortality prediction using logistic regression models for coniferous tree species in the western USA. We include synthesis and review of: methods to develop, evaluate...

  6. Prospective identification of adolescent suicide ideation using classification tree analysis: Models for community-based screening.

    PubMed

    Hill, Ryan M; Oosterhoff, Benjamin; Kaplow, Julie B

    2017-07-01

    Although a large number of risk markers for suicide ideation have been identified, little guidance has been provided to prospectively identify adolescents at risk for suicide ideation within community settings. The current study addressed this gap in the literature by utilizing classification tree analysis (CTA) to provide a decision-making model for screening adolescents at risk for suicide ideation. Participants were N = 4,799 youth (mean age = 16.15 years, SD = 1.63) who completed both Waves 1 and 2 of the National Longitudinal Study of Adolescent to Adult Health. CTA was used to generate a series of decision rules for identifying adolescents at risk for reporting suicide ideation at Wave 2. Findings revealed 3 distinct solutions with varying sensitivity and specificity for identifying adolescents who reported suicide ideation. Sensitivity of the classification trees ranged from 44.6% to 77.6%. The tree with greatest specificity and lowest sensitivity was based on a history of suicide ideation. The tree with moderate sensitivity and high specificity was based on depressive symptoms, suicide attempts or suicide among family and friends, and social support. The most sensitive but least specific tree utilized these factors and gender, ethnicity, hours of sleep, school-related factors, and future orientation. These classification trees offer community organizations options for instituting large-scale screenings for suicide ideation risk depending on the available resources and modality of services to be provided. This study provides a theoretically and empirically driven model for prospectively identifying adolescents at risk for suicide ideation and has implications for preventive interventions among at-risk youth. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  7. Effects of sample survey design on the accuracy of classification tree models in species distribution models

    USGS Publications Warehouse

    Edwards, T.C.; Cutler, D.R.; Zimmermann, N.E.; Geiser, L.; Moisen, Gretchen G.

    2006-01-01

    We evaluated the effects of probabilistic (hereafter DESIGN) and non-probabilistic (PURPOSIVE) sample surveys on resultant classification tree models for predicting the presence of four lichen species in the Pacific Northwest, USA. Models derived from both survey forms were assessed using an independent data set (EVALUATION). Measures of accuracy as gauged by resubstitution rates were similar for each lichen species irrespective of the underlying sample survey form. Cross-validation estimates of prediction accuracies were lower than resubstitution accuracies for all species and both design types, and in all cases were closer to the true prediction accuracies based on the EVALUATION data set. We argue that greater emphasis should be placed on calculating and reporting cross-validation accuracy rates rather than simple resubstitution accuracy rates. Evaluation of the DESIGN and PURPOSIVE tree models on the EVALUATION data set shows significantly lower prediction accuracy for the PURPOSIVE tree models relative to the DESIGN models, indicating that non-probabilistic sample surveys may generate models with limited predictive capability. These differences were consistent across all four lichen species, with 11 of the 12 possible species and sample survey type comparisons having significantly lower accuracy rates. Some differences in accuracy were as large as 50%. The classification tree structures also differed considerably both among and within the modelled species, depending on the sample survey form. Overlap in the predictor variables selected by the DESIGN and PURPOSIVE tree models ranged from only 20% to 38%, indicating the classification trees fit the two evaluated survey forms on different sets of predictor variables. The magnitude of these differences in predictor variables throws doubt on ecological interpretation derived from prediction models based on non-probabilistic sample surveys. © 2006 Elsevier B.V. All rights reserved.
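    The gap between resubstitution and cross-validation accuracy can be reproduced even with a one-split classification tree (a decision stump). The sketch below is a minimal illustration under stated assumptions, not the authors' models: it fits a stump on a single numeric predictor by minimizing training misclassifications, then scores it once on its own training data (resubstitution) and once with leave-one-out cross-validation. The toy data and function names are hypothetical.

```python
from typing import List, Tuple

# (threshold, label predicted when x < threshold, label predicted otherwise)
Stump = Tuple[float, int, int]


def majority(labels: List[int]) -> int:
    """Majority class for 0/1 labels; ties (and empty lists) go to 0."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def predict(stump: Stump, xi: float) -> int:
    threshold, below, above = stump
    return below if xi < threshold else above


def fit_stump(x: List[float], y: List[int]) -> Stump:
    """Choose the split with the fewest training misclassifications."""
    # Fallback with no split: predict the majority class everywhere.
    best: Stump = (float("inf"), majority(y), majority(y))
    best_err = sum(1 for label in y if label != best[1])
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold between adjacent distinct values
        left = [label for xi, label in zip(x, y) if xi < t]
        right = [label for xi, label in zip(x, y) if xi >= t]
        cand: Stump = (t, majority(left), majority(right))
        err = sum(1 for xi, label in zip(x, y) if predict(cand, xi) != label)
        if err < best_err:  # strict comparison: earlier thresholds win ties
            best, best_err = cand, err
    return best


def resubstitution_accuracy(x: List[float], y: List[int]) -> float:
    """Accuracy measured on the same data used to fit the stump."""
    stump = fit_stump(x, y)
    return sum(predict(stump, xi) == yi for xi, yi in zip(x, y)) / len(y)


def loocv_accuracy(x: List[float], y: List[int]) -> float:
    """Leave-one-out: refit without each point, then predict that point."""
    correct = 0
    for i in range(len(y)):
        stump = fit_stump(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        correct += predict(stump, x[i]) == y[i]
    return correct / len(y)
```

    On the toy data x = [1, 2, 3, 4, 5, 6] and y = [0, 0, 1, 0, 1, 1], resubstitution accuracy is 5/6 while leave-one-out accuracy drops to 3/6, the same direction of optimism that the abstract cautions against.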

  8. A universal hybrid decision tree classifier design for human activity classification.

    PubMed

    Chien, Chieh; Pottie, Gregory J

    2012-01-01

    A system that reliably classifies daily life activities can contribute to more effective and economical treatments for patients with chronic conditions or undergoing rehabilitative therapy. We propose a universal hybrid decision tree classifier for this purpose. The tree classifier can flexibly implement different decision rules at its internal nodes and can be adapted from a population-based model when supplemented by training data for individuals. The system was tested using seven subjects, each monitored by 14 triaxial accelerometers. Each subject performed fourteen different activities typical of daily life. Using leave-one-out cross-validation, our decision tree produced an average classification accuracy of 89.9%. In contrast, personalized MATLAB tree classifiers, using Gini's diversity index as the split criterion with thresholds then optimally tuned for each subject, yielded 69.2%.
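    The baseline in this comparison grew trees with Gini's diversity index as the split criterion. The sketch below shows that index, 1 − Σ p_k², where p_k is the share of class k at a node, together with an exhaustive search for the best threshold on one numeric feature by weighted child impurity. It is a plain-Python illustration, not MATLAB's tree implementation, and the activity labels in the example are invented.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini diversity index: 1 - sum_k p_k^2 over class proportions p_k.

    0.0 means a pure node; the caller must pass a non-empty sequence.
    """
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_gini_split(x: Sequence[float], y: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (threshold, weighted child Gini) minimizing impurity for the rule x < threshold.

    Candidate thresholds are midpoints between adjacent distinct x values,
    so both children are always non-empty. With fewer than two distinct
    values there is no split and (nan, inf) is returned.
    """
    values = sorted(set(x))
    n = len(y)
    best_t, best_impurity = float("nan"), float("inf")
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [label for xi, label in zip(x, y) if xi < t]
        right = [label for xi, label in zip(x, y) if xi >= t]
        # Child impurities weighted by the fraction of samples in each child.
        impurity = len(left) / n * gini(left) + len(right) / n * gini(right)
        if impurity < best_impurity:
            best_t, best_impurity = t, impurity
    return best_t, best_impurity
```

    For x = [1, 2, 3, 4] with labels ["sit", "sit", "walk", "walk"], best_gini_split returns (2.5, 0.0): splitting at 2.5 leaves two pure children.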

  9. Effects of sample survey design on the accuracy of classification tree models in species distribution models

    Treesearch

    Thomas C. Edwards; D. Richard Cutler; Niklaus E. Zimmermann; Linda Geiser; Gretchen G. Moisen

    2006-01-01

    We evaluated the effects of probabilistic (hereafter DESIGN) and non-probabilistic (PURPOSIVE) sample surveys on resultant classification tree models for predicting the presence of four lichen species in the Pacific Northwest, USA. Models derived from both survey forms were assessed using an independent data set (EVALUATION). Measures of accuracy as gauged by...

  10. A Predictive Model of Daily Seismic Activity Induced by Mining, Developed with Data Mining Methods

    NASA Astrophysics Data System (ADS)

    Jakubowski, Jacek

    2014-12-01

    The article presents the development and evaluation of a predictive classification model of daily seismic energy emissions induced by longwall mining in sector XVI of the Piast coal mine in Poland. The model uses data on tremor energy, basic characteristics of the longwall face and mined output in this sector over the period from July 1987 to March 2011. The predicted binary variable is the occurrence of a daily sum of tremor seismic energies in a longwall that is greater than or equal to the threshold value of 10⁵ J. Three data mining analytical methods were applied: logistic regression, neural networks, and stochastic gradient boosted trees. The boosted trees model was chosen as the best for the purposes of the prediction. The validation sample results showed its good predictive capability, taking the complex nature of the phenomenon into account. This may indicate the applied model's suitability for sequential, short-term prediction of mining-induced seismic activity.

  11. Continuous fields of land cover for the conterminous United States using Landsat data: First results from the Web-Enabled Landsat Data (WELD) project

    USGS Publications Warehouse

    Hansen, M.C.; Egorov, Alexey; Roy, David P.; Potapov, P.; Ju, J.; Turubanova, S.; Kommareddy, I.; Loveland, Thomas R.

    2011-01-01

    Vegetation Continuous Field (VCF) layers of 30 m percent tree cover, bare ground, other vegetation and probability of water were derived for the conterminous United States (CONUS) using Landsat 7 Enhanced Thematic Mapper Plus (ETM+) data sets from the Web-Enabled Landsat Data (WELD) project. Turnkey approaches to land cover characterization were enabled by the systematic WELD Landsat processing, including conversion of digital numbers to calibrated top of atmosphere reflectance and brightness temperature, cloud masking, reprojection into a continental map projection and temporal compositing. Annual, seasonal and monthly WELD composites for 2008 were used as spectral inputs to a bagged regression and classification tree procedure using a large training data set derived from very high spatial resolution imagery and available ancillary data. The results illustrate the ability to perform Landsat land cover characterizations at continental scales that are internally consistent while retaining local spatial and thematic detail.
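
    The abstract names a bagged regression and classification tree procedure without giving its details. As a rough illustration of bagging only, the Python sketch below fits depth-one regression trees (stumps) to bootstrap resamples of a single hypothetical spectral feature and averages their predictions of percent tree cover. The function names, the data, and the use of stumps rather than full trees are assumptions made for brevity; this is not the WELD implementation.

```python
import random


def fit_regression_stump(xs, ys):
    """Depth-one regression tree: pick the threshold that minimises within-node squared error."""
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right node empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # the resample held a single distinct x, so no split exists
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean


def bagged_predictor(xs, ys, n_models=25, seed=0):
    """Bootstrap aggregation: fit one stump per bootstrap resample and average their outputs."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        models.append(fit_regression_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


# Hypothetical training pixels: one reflectance-like feature vs. percent tree cover.
reflectance = [0.02, 0.03, 0.04, 0.10, 0.12, 0.15]
tree_cover = [80, 75, 70, 10, 5, 0]
predict = bagged_predictor(reflectance, tree_cover)
print(predict(0.02), predict(0.15))
```

    A single stump is unstable with respect to small changes in its training sample; averaging many stumps fitted to bootstrap resamples smooths that variability, which is the motivation for bagging.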

  12. Identifying Risk Factors for Drug Use in an Iranian Treatment Sample: A Prediction Approach Using Decision Trees.

    PubMed

    Amirabadizadeh, Alireza; Nezami, Hossein; Vaughn, Michael G; Nakhaee, Samaneh; Mehrpour, Omid

    2018-05-12

    Substance abuse exacts considerable social and health care burdens throughout the world. The aim of this study was to create a prediction model to better identify risk factors for drug use. A prospective cross-sectional study was conducted in South Khorasan Province, Iran. Of the 678 eligible subjects, 70% (n = 474) were randomly selected to provide a training set for constructing decision tree and multiple logistic regression (MLR) models. The remaining 30% (n = 204) were employed as a holdout sample to test the performance of the decision tree and MLR models. Predictive performance of the models was analyzed with the receiver operating characteristic (ROC) curve using the testing set. Independent variables were selected from demographic characteristics and history of drug use. For the decision tree model, the sensitivity and specificity for identifying people at risk for drug abuse were 66% and 75%, respectively, while the MLR model was somewhat less effective at 60% and 73%. Key independent variables in the analyses included first substance experience, age at first drug use, age, place of residence, history of cigarette use, and occupational and marital status. While the study findings are exploratory and lack generalizability, they do suggest that the decision tree model holds promise as an effective classification approach for identifying risk factors for drug use. Consistent with prior research in Western contexts, age of drug use initiation was a critical factor predicting a substance use disorder.
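
    A minimal Python sketch of the evaluation design described in this abstract: a random 70/30 split of the 678 subjects, and sensitivity and specificity computed on the holdout set. The helper names are hypothetical, the decision tree and MLR models themselves are omitted, and the authors' actual sampling procedure may have differed.

```python
import random


def train_test_split(items, train_frac=0.7, seed=0):
    """Randomly split items into a training set and a holdout set."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = int(train_frac * len(shuffled))  # int() truncates the fractional part
    return shuffled[:n_train], shuffled[n_train:]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1 = at risk, 0 = not at risk."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


train, test = train_test_split(range(678))
print(len(train), len(test))  # 474 204
```

    Since 70% of 678 is 474.6, truncating to 474 training subjects leaves 204 for the holdout, matching the counts reported above.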

  13. Biomarkers of Host Response Predict Primary End-Point Radiological Pneumonia in Tanzanian Children with Clinical Pneumonia: A Prospective Cohort Study

    PubMed Central

    Erdman, Laura K.; D’Acremont, Valérie; Hayford, Kyla; Kilowoko, Mary; Kyungu, Esther; Hongoa, Philipina; Alamo, Leonor; Streiner, David L.; Genton, Blaise; Kain, Kevin C.

    2015-01-01

    Background Diagnosing pediatric pneumonia is challenging in low-resource settings. The World Health Organization (WHO) has defined primary end-point radiological pneumonia for use in epidemiological and vaccine studies. However, radiography requires expertise and is often inaccessible. We hypothesized that plasma biomarkers of inflammation and endothelial activation may be useful surrogates for end-point pneumonia, and may provide insight into its biological significance. Methods We studied children with WHO-defined clinical pneumonia (n = 155) within a prospective cohort of 1,005 consecutive febrile children presenting to Tanzanian outpatient clinics. Based on x-ray findings, participants were categorized as primary end-point pneumonia (n = 30), other infiltrates (n = 31), or normal chest x-ray (n = 94). Plasma levels of 7 host response biomarkers at presentation were measured by ELISA. Associations between biomarker levels and radiological findings were assessed by Kruskal-Wallis test and multivariable logistic regression. Biomarker ability to predict radiological findings was evaluated using receiver operating characteristic curve analysis and Classification and Regression Tree analysis. Results Compared to children with normal x-ray, children with end-point pneumonia had significantly higher C-reactive protein, procalcitonin and Chitinase 3-like-1, while those with other infiltrates had elevated procalcitonin and von Willebrand Factor and decreased soluble Tie-2 and endoglin. Clinical variables were not predictive of radiological findings. Classification and Regression Tree analysis generated multi-marker models with improved performance over single markers for discriminating between groups. 
A model based on C-reactive protein and Chitinase 3-like-1 discriminated between end-point pneumonia and non-end-point pneumonia with 93.3% sensitivity (95% confidence interval 76.5–98.8), 80.8% specificity (72.6–87.1), positive likelihood ratio 4.9 (3.4–7.1), negative likelihood ratio 0.083 (0.022–0.32), and misclassification rate 0.20 (standard error 0.038). Conclusions In Tanzanian children with WHO-defined clinical pneumonia, combinations of host biomarkers distinguished between end-point pneumonia, other infiltrates, and normal chest x-ray, whereas clinical variables did not. These findings generate pathophysiological hypotheses and may have potential research and clinical utility. PMID:26366571
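
    The likelihood ratios reported above follow from the sensitivity and specificity. A short Python check using the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity reproduces the published point estimates; the confidence intervals need the underlying counts and are not attempted here.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios of a binary test."""
    lr_pos = sensitivity / (1 - specificity)  # P(test+ | disease) / P(test+ | no disease)
    lr_neg = (1 - sensitivity) / specificity  # P(test- | disease) / P(test- | no disease)
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(0.933, 0.808)
print(round(lr_pos, 1), round(lr_neg, 3))  # 4.9 0.083
```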

  14. Predicting Tillage Patterns in the Tiffin River Watershed Using Remote Sensing Methods

    NASA Astrophysics Data System (ADS)

    Brooks, C.; McCarty, J. L.; Dean, D. B.; Mann, B. F.

    2012-12-01

    Previous research in tillage mapping has focused primarily on utilizing low- to no-cost, moderate-resolution (30 m to 15 m) satellite data. Successful data processing techniques published in the scientific literature have focused on extracting and/or classifying tillage patterns through manipulation of spectral bands. For instance, Daughtry et al. (2005) evaluated several spectral indices for estimating crop residue cover from satellite multispectral and hyperspectral data and for categorizing soil tillage intensity in agricultural fields. A weak to moderate relationship between Landsat Thematic Mapper (TM) indices and crop residue cover was found; similar results were reported in Minnesota. Building on the findings from the scientific literature and previous work done by MTRI in the heavily agricultural Tiffin watershed of northwest Ohio and southeast Michigan, a decision tree classifier approach (also referred to as a classification tree) was used, linking data from several satellites to on-the-ground tillage information in order to boost classification results. This approach included five tillage indices and derived products. A decision tree methodology enabled the development of statistically optimized (i.e., minimizing misclassification rates) classification algorithms at various desired time steps: monthly, seasonal, and annual over the 2006-2010 time period. Due to their flexibility, processing speed, and availability within all major remote sensing and statistical software packages, decision trees can ingest several data inputs from multiple sensors and satellite products, selecting only the bands, band ratios, indices, and products that further reduce misclassification errors.
The project team created crop-specific tillage pattern classification trees whereby a training data set (~50% of available ground data) was created for production of the actual decision tree and a validation data set (~50% of available ground data) was set aside in order to assess the accuracy of the classification. A seasonal time step was used, optimizing a decision tree based on seasonal ground data for tillage patterns and satellite data and products for years 2006 through 2010. Annual crop type maps derived by the project team and the USDA Cropland Data Layer project were used as an input to understand locations of corn, soybeans, wheat, etc. on a yearly basis. As previously stated, the robustness of the decision tree approach lies in its ability to implement various satellite data and products across temporal, spectral, and spatial resolutions, thereby improving the resulting classification and providing a reliable method that is not sensor-dependent. Tillage pattern classification from satellite imagery is not a simple task and has proven a challenge to previous researchers investigating this remote sensing topic. The team's decision tree method produced a practical, usable output within a focused project time period. Daughtry, C.S.T., Hunt Jr., E.R., Doraiswamy, P.C., McMurtrey III, J.E. 2005. Remote sensing the spatial distribution of crop residues. Agron. J. 97, 864-871.
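
    The splits "statistically optimized (i.e., minimizing misclassification rates)" described in this abstract can be illustrated with the smallest possible tree, a single split (decision stump). The Python sketch below uses invented values standing in for one tillage index: it picks the threshold that minimises errors on a training half and then scores that threshold on a held-out validation half. It illustrates the criterion only and is not the MTRI classifier; a real tree repeats this search recursively over many bands, ratios and indices.

```python
def fit_stump(xs, ys):
    """Find the threshold and orientation on one feature that minimise training errors.

    Predicts class 1 when x > threshold if flip is False, otherwise when x <= threshold.
    """
    best = None
    for threshold in sorted(set(xs)):
        for flip in (False, True):
            preds = [int((x > threshold) != flip) for x in xs]
            errors = sum(p != y for p, y in zip(preds, ys))
            if best is None or errors < best[0]:
                best = (errors, threshold, flip)
    return best[1], best[2]


def misclassification_rate(xs, ys, threshold, flip):
    """Fraction of samples the stump labels incorrectly."""
    preds = [int((x > threshold) != flip) for x in xs]
    return sum(p != y for p, y in zip(preds, ys)) / len(ys)


# Hypothetical index values; label 1 = conservation tillage, 0 = conventional tillage.
train_x, train_y = [0.1, 0.2, 0.3, 0.6, 0.7, 0.8], [0, 0, 0, 1, 1, 1]
val_x, val_y = [0.25, 0.5, 0.9], [0, 1, 1]
threshold, flip = fit_stump(train_x, train_y)
print(threshold, flip, misclassification_rate(val_x, val_y, threshold, flip))  # 0.3 False 0.0
```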

  15. Voxel classification based airway tree segmentation

    NASA Astrophysics Data System (ADS)

    Lo, Pechin; de Bruijne, Marleen

    2008-03-01

    This paper presents a voxel classification based method for segmenting the human airway tree in volumetric computed tomography (CT) images. In contrast to standard methods that use only voxel intensities, our method uses a more complex appearance model based on a set of local image appearance features and K-nearest neighbor (KNN) classification. The optimal set of features for classification is selected automatically from a large set of features describing the local image structure at several scales. The use of multiple features enables the appearance model to differentiate between airway tree voxels and other voxels of similar intensities in the lung, thus making the segmentation robust to pathologies such as emphysema. The classifier is trained on imperfect segmentations that can easily be obtained using region growing with a manual threshold selection. Experiments show that the proposed method results in a more robust segmentation that can grow into the smaller airway branches without leaking into emphysematous areas, and is able to segment many branches that are not present in the training set.
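
    For readers unfamiliar with KNN, the Python sketch below shows the basic majority-vote rule, using two hypothetical appearance features per voxel. It omits the paper's automatic feature selection and multi-scale features; the feature values and class names are invented.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points (Euclidean)."""
    dists = sorted((math.dist(x, query), label) for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature descriptions of labelled training voxels.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
y = ["airway"] * 3 + ["other"] * 3
print(knn_predict(X, y, (0.05, 0.05)))  # airway
```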

  16. A renewed perspective on agroforestry concepts and classification.

    PubMed

    Torquebiau, E F

    2000-11-01

    Agroforestry, the association of trees with farming practices, is progressively becoming a recognized land-use discipline. However, it is still perceived by some scientists, technicians and farmers as a sort of environmental fashion which does not deserve credit. The peculiar history of agroforestry and the complex relationships between agriculture and forestry explain some misunderstandings about the concepts and classification of agroforestry and reveal that, contrary to common perception, agroforestry is closer to agriculture than to forestry. Based on field experience from several countries, a structural classification of agroforestry into six simple categories is proposed: crops under tree cover, agroforests, agroforestry in a linear arrangement, animal agroforestry, sequential agroforestry and minor agroforestry techniques. It is argued that this pragmatic classification encompasses all major agroforestry associations and allows simultaneous agroforestry to be clearly differentiated from sequential agroforestry, two categories showing contrasting ecological tree-crop interactions. It can also contribute to improving the image of agroforestry and lead to a simplification of its definition.

  17. Detailed forest formation mapping in the land cover map series for the Caribbean islands

    NASA Astrophysics Data System (ADS)

    Helmer, E. H.; Schill, S.; Pedreros, D. H.; Tieszen, L. L.; Kennaway, T.; Cushing, M.; Ruzycki, T.

    2006-12-01

    Forest formation and land cover maps for several Caribbean islands were developed from Landsat ETM+ imagery as part of a multi-organizational project. The spatially explicit data on forest formation types will permit more refined estimates of some forest attributes. The woody vegetation classification scheme relates closely to that of Areces-Mallea et al. (1), who classify Caribbean vegetation according to standards of the US Federal Geographic Data Committee (FGDC, 1997), with modifications similar to those in Helmer et al. (2). For several of the islands, we developed image mosaics that filled cloudy parts of scenes with data from other scene dates after using regression tree normalization (3). The regression tree procedure permitted us to develop mosaics for wet and drought seasons for a few of the islands. The resulting multiseason imagery facilitated separation between classes such as seasonal evergreen forest, semi-deciduous forest (including semi-evergreen forest), and drought deciduous forest or woodland formations. We used decision tree classification methods to classify the Landsat image mosaics to detailed forest formations and land cover for Puerto Rico (4), St. Kitts and Nevis, St. Lucia, St. Vincent and the Grenadines and Grenada. The decision trees classified a stack of raster layers for each mapping area that included the Landsat image bands and various ancillary raster data layers. For Puerto Rico, for example, the ancillary data included climate parameters (5). For some islands, the ancillary data included topographic derivatives such as aspect, slope and slope position, SRTM (6) or other topographic data. Mapping forest formations with decision tree classifiers, ancillary geospatial data, and cloud-free image mosaics accurately distinguished spectrally similar forest formations, without the aid of ecological zone maps, on the islands where the approach was used.
The approach resulted in maps of forest formations with comparable or better detail than when IKONOS or Landsat imagery was hand-digitized, as it was for the Dominican Republic (7) and Barbados. 1. T. Kennaway, E. H. Helmer. (Intl Inst of Tropical Forestry, USDA Forest Service, Río Piedras, Puerto Rico, 2006). 2. A. Areces-Mallea et al. (The Nature Conservancy, 1999). 3. E. H. Helmer, O. Ramos, T. Lopez, M. Quiñones, W. Diaz, Carib J Sci 38, 165-183 (2002). 4. C. Daly, E. H. Helmer, M. Quiñones, Int J Climatology 23, 1359-1381 (2003). 5. T. G. Farr, M. Kobrick, Eos Transactions 81, 583-585 (2000). 6. E. H. Helmer, B. Ruefenacht, Photogrammetric Eng Rem Sens 71, 1079-1089 (2005). 7. S. Hernández, M. Pérez. (Secretaría de Estado de Medio Ambiente y Recursos Naturales de la República Dominicana, Santo Domingo, Dominican Republic, 2005).

  18. Comparison of Hyperspectral and Multispectral Satellites for Forest Alliance Classification in the San Francisco Bay Area

    NASA Astrophysics Data System (ADS)

    Clark, M. L.

    2016-12-01

    The goal of this study was to assess multi-temporal, Hyperspectral Infrared Imager (HyspIRI) satellite imagery for improved forest class mapping relative to multispectral satellites. The study area was the western San Francisco Bay Area, California and forest alliances (i.e., forest communities defined by dominant or co-dominant trees) were defined using the U.S. National Vegetation Classification System. Simulated 30-m HyspIRI, Landsat 8 and Sentinel-2 imagery were processed from image data acquired by NASA's AVIRIS airborne sensor in year 2015, with summer and multi-temporal (spring, summer, fall) data analyzed separately. HyspIRI reflectance was used to generate a suite of hyperspectral metrics that targeted key spectral features related to chemical and structural properties. The Random Forests classifier was applied to the simulated images and overall accuracies (OA) were compared to those from real Landsat 8 images. For each image group, broad land cover (e.g., Needle-leaf Trees, Broad-leaf Trees, Annual agriculture, Herbaceous, Built-up) was classified first, followed by a finer-detail forest alliance classification for pixels mapped as closed-canopy forest. There were 5 needle-leaf tree alliances and 16 broad-leaf tree alliances, including 7 Quercus (oak) alliance types. No forest alliance classification exceeded 50% OA, indicating that there was broad spectral similarity among alliances, most of which were not spectrally pure but rather a mix of tree species. In general, needle-leaf (Pine, Redwood, Douglas Fir) alliances had better class accuracies than broad-leaf alliances (Oaks, Madrone, Bay Laurel, Buckeye, etc.). Multi-temporal data classifications all had 5-6% greater OA than with comparable summer data. For simulated data, HyspIRI metrics had 4-5% greater OA than Landsat 8 and Sentinel-2 multispectral imagery and 3-4% greater OA than HyspIRI reflectance. Finally, HyspIRI metrics had 8% greater OA than real Landsat 8 imagery.
In conclusion, forest alliance classification was found to be a difficult remote sensing application with moderate resolution (30 m) satellite imagery; however, of the data tested, HyspIRI spectral metrics had the best performance relative to multispectral satellites.
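
    Overall accuracy (OA), the measure compared throughout this abstract, is the fraction of reference samples whose mapped class matches the reference class: the trace of the confusion matrix divided by its total. A minimal Python sketch with a hypothetical three-class confusion matrix:

```python
def overall_accuracy(confusion):
    """Overall accuracy: correctly classified samples (the diagonal) over all samples."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical counts; rows = reference class, columns = mapped class.
cm = [[40, 10, 0], [15, 30, 5], [5, 5, 40]]
print(overall_accuracy(cm))  # 110 correct out of 150 samples
```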

  19. A spatially explicit approach to the study of socio-demographic inequality in the spatial distribution of trees across Boston neighborhoods.

    PubMed

    Duncan, Dustin T; Kawachi, Ichiro; Kum, Susan; Aldstadt, Jared; Piras, Gianfranco; Matthews, Stephen A; Arbia, Giuseppe; Castro, Marcia C; White, Kellee; Williams, David R

    2014-04-01

    The racial/ethnic and income composition of neighborhoods often influences local amenities, including the potential spatial distribution of trees, which are important for population health and community wellbeing, particularly in urban areas. This ecological study used spatial analytical methods to assess the relationship between neighborhood socio-demographic characteristics (i.e. minority racial/ethnic composition and poverty) and tree density at the census tract level in Boston, Massachusetts (US). We examined spatial autocorrelation with the Global Moran's I for all study variables and in the ordinary least squares (OLS) regression residuals as well as computed Spearman correlations non-adjusted and adjusted for spatial autocorrelation between socio-demographic characteristics and tree density. Next, we fit traditional regressions (i.e. OLS regression models) and spatial regressions (i.e. spatial simultaneous autoregressive models), as appropriate. We found significant positive spatial autocorrelation for all neighborhood socio-demographic characteristics (Global Moran's I range from 0.24 to 0.86, all P = 0.001), for tree density (Global Moran's I = 0.452, P = 0.001), and in the OLS regression residuals (Global Moran's I range from 0.32 to 0.38, all P < 0.001). Therefore, we fit the spatial simultaneous autoregressive models. There was a negative correlation between neighborhood percent non-Hispanic Black and tree density (r_S = -0.19; conventional P-value = 0.016; spatially adjusted P-value = 0.299) as well as a negative correlation between predominantly non-Hispanic Black (over 60% Black) neighborhoods and tree density (r_S = -0.18; conventional P-value = 0.019; spatially adjusted P-value = 0.180).
While the conventional OLS regression model found a marginally significant inverse relationship between Black neighborhoods and tree density, we found no statistically significant relationship between neighborhood socio-demographic composition and tree density in the spatial regression models. Methodologically, our study suggests the need to take into account spatial autocorrelation as findings/conclusions can change when the spatial autocorrelation is ignored. Substantively, our findings suggest no need for policy intervention vis-à-vis trees in Boston, though we hasten to add that replication studies, and more nuanced data on tree quality, age and diversity are needed.
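
    Global Moran's I, used in this study to detect spatial autocorrelation, compares each tract's deviation from the mean with the deviations of its neighbours. The Python sketch below implements the standard formula with a user-supplied weight matrix; the four-tract strip and its binary contiguity weights are hypothetical, and the significance testing behind the reported P-values is not included.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and an n-by-n spatial weight matrix w_ij.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where W is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [x - mean for x in values]
    w_total = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / w_total) * cross / sum(d * d for d in dev)


# Four tracts in a row; tracts that share an edge are neighbours (weight 1).
line = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(morans_i([1, 1, 0, 0], line))  # about 0.333: similar values cluster
print(morans_i([1, 0, 1, 0], line))  # about -1.0: values alternate
```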

  20. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Singh, Kunwar P., E-mail: kpsingh_52@yahoo.com; Gupta, Shikha

    Ensemble learning-based decision treeboost (DTB) and decision tree forest (DTF) models are introduced in order to establish a quantitative structure–toxicity relationship (QSTR) for predicting the toxicity of 1450 diverse chemicals. Eight non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals was evaluated using the Tanimoto similarity index. DTB and DTF models, supplemented by stochastic gradient boosting and bagging algorithms, were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was paid to the prediction ability and robustness of the models, investigated through both external and 10-fold cross-validation. In complete data, optimal DTB and DTF models rendered accuracies of 98.90% and 98.83% in two-category and 98.14% and 98.14% in four-category toxicity classifications. Both models further yielded classification accuracies of 100% on external toxicity data of T. pyriformis. The constructed regression models (DTB and DTF) using five descriptors yielded correlation coefficients (R²) of 0.945 and 0.944 between the measured and predicted toxicities, with mean squared errors (MSEs) of 0.059 and 0.064 in complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R² and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest wide applicability of the inter-species models in predicting toxicity of new chemicals for regulatory purposes. These approaches provide a useful strategy and robust tools for screening the ecotoxicological risk or environmental hazard potential of chemicals. - Graphical abstract: Importance of input variables in DTB and DTF classification models for (a) two-category, and (b) four-category toxicity intervals in T. pyriformis data.
Generalization and predictive abilities of the constructed (c) DTB and (d) DTF regression models to predict the T. pyriformis toxicity of diverse chemicals. - Highlights: • Ensemble learning (EL) based models constructed for toxicity prediction of chemicals • Predictive models used a few simple non-quantum mechanical molecular descriptors. • EL-based DTB/DTF models successfully discriminated toxic and non-toxic chemicals. • DTB/DTF regression models precisely predicted toxicity of chemicals in multi-species. • Proposed EL based models can be used as tool to predict toxicity of new chemicals.
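
    The regression results in this record are summarised by R² and MSE. The Python sketch below computes MSE and R² in its common coefficient-of-determination form, 1 - SS_res / SS_tot. The abstract calls R² a correlation coefficient, so the authors may instead have used the squared Pearson correlation, which can differ from this form when predictions are biased. The measured and predicted values are invented.

```python
def mse_and_r2(measured, predicted):
    """Mean squared error and coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    n = len(measured)
    mean = sum(measured) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean) ** 2 for y in measured)
    return ss_res / n, 1 - ss_res / ss_tot


mse, r2 = mse_and_r2([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print(round(mse, 3), round(r2, 3))  # 0.25 0.8
```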

  1. Predicting Potential Changes in Suitable Habitat and Distribution by 2100 for Tree Species of the Eastern United States

    Treesearch

    Louis R Iverson; Anantha M. Prasad; Mark W. Schwartz

    2005-01-01

    We predict current distribution and abundance for tree species present in eastern North America, and subsequently estimate potential suitable habitat for those species under a changed climate with 2 x CO2. We used a series of statistical models (i.e., Regression Tree Analysis (RTA), Multivariate Adaptive Regression Splines (MARS), Bagging Trees (...

  2. Deep Multi-Task Learning for Tree Genera Classification

    NASA Astrophysics Data System (ADS)

    Ko, C.; Kang, J.; Sohn, G.

    2018-05-01

    The goal of our paper is to classify tree genera using airborne Light Detection and Ranging (LiDAR) data with a Convolutional Neural Network (CNN) Multi-task Network (MTN) implementation. Unlike a Single-task Network (STN), where only one task is assigned to the learning outcome, an MTN is a deep learning architecture for learning a main task (classification of tree genera) simultaneously with other tasks (in our study, classification of coniferous versus deciduous trees), with shared classification features. The main contribution of this paper is to improve classification accuracy from CNN-STN to CNN-MTN. This is achieved by introducing a concurrence loss (Lcd) to the designed MTN. This term regulates the overall network performance by minimizing the inconsistencies between the two tasks. Results show that we can increase the classification accuracy from 88.7% to 91.0% (from STN to MTN). The second goal of this paper is to solve the problem of small training sample size by multiple-view data generation. The motivation of this goal is to address one of the most common problems in implementing deep learning architectures: an insufficient amount of training data. We address this problem by simulating a training dataset with a multiple-view approach. The promising results from this paper provide a basis for classifying larger datasets with more classes in the future.

  3. Alzheimer Classification Using a Minimum Spanning Tree of High-Order Functional Network on fMRI Dataset

    PubMed Central

    Guo, Hao; Liu, Lei; Chen, Junjie; Xu, Yong; Jie, Xiang

    2017-01-01

    Functional magnetic resonance imaging (fMRI) is one of the most useful methods to generate functional connectivity networks of the brain. However, conventional network generation methods ignore dynamic changes of functional connectivity between brain regions. Previous studies proposed constructing high-order functional connectivity networks that consider the time-varying characteristics of functional connectivity, and a clustering method was performed to decrease computational cost. However, random selection of the initial clustering centers and the number of clusters negatively affected classification accuracy, and the network lost neurological interpretability. Here we propose a novel method that introduces the minimum spanning tree method to high-order functional connectivity networks. As an unbiased method, the minimum spanning tree simplifies high-order network structure while preserving its core framework. The dynamic characteristics of time series are not lost with this approach, and the neurological interpretation of the network is guaranteed. Simultaneously, we propose a multi-parameter optimization framework that involves extracting discriminative features from the minimum spanning tree high-order functional connectivity networks. Compared with the conventional methods, our resting-state fMRI classification method based on minimum spanning tree high-order functional connectivity networks greatly improved the diagnostic accuracy for Alzheimer's disease. PMID:29249926
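
    The abstract does not say how the minimum spanning tree is computed or how connectivity strengths become edge weights. As one possible reading, the Python sketch below converts a hypothetical correlation matrix between four regions into distances d = 1 - r (an assumption; other transforms are also used) and applies Prim's algorithm, which returns the n - 1 edges of minimum total distance that connect all nodes without cycles.

```python
def prim_mst(dist):
    """Minimum spanning tree of a complete graph given a symmetric distance matrix.

    Returns a list of (parent, child) edges. O(n^2) Prim's algorithm, suited to dense matrices.
    """
    n = len(dist)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge weight linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u))
        # Relax edges from the newly added node.
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges


# Hypothetical correlations between four brain regions, converted to distances d = 1 - r.
r = [[1.0, 0.9, 0.2, 0.1], [0.9, 1.0, 0.3, 0.8], [0.2, 0.3, 1.0, 0.4], [0.1, 0.8, 0.4, 1.0]]
dist = [[1 - x for x in row] for row in r]
print(prim_mst(dist))  # [(0, 1), (1, 3), (3, 2)]
```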

  4. Classification of Parkinsonian syndromes from FDG-PET brain data using decision trees with SSM/PCA features.

    PubMed

    Mudali, D; Teune, L K; Renken, R J; Leenders, K L; Roerdink, J B T M

    2015-01-01

    Medical imaging techniques like fluorodeoxyglucose positron emission tomography (FDG-PET) have been used to aid in the differential diagnosis of neurodegenerative brain diseases. In this study, the objective is to classify FDG-PET brain scans of subjects with Parkinsonian syndromes (Parkinson's disease, multiple system atrophy, and progressive supranuclear palsy) compared to healthy controls. The scaled subprofile model/principal component analysis (SSM/PCA) method was applied to FDG-PET brain image data to obtain covariance patterns and corresponding subject scores. The latter were used as features for supervised classification by the C4.5 decision tree method. Leave-one-out cross-validation was applied to determine classifier performance. We carried out a comparison with other types of classifiers. The big advantage of decision tree classification is that the results are easy for humans to understand. A visual representation of decision trees strongly supports the interpretation process, which is very important in the context of medical diagnosis. Further improvements are suggested based on enlarging the training data set, enhancing the decision tree method by bagging, and adding features based on (f)MRI data.
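
    C4.5 chooses splits by gain ratio: information gain divided by the entropy of the partition itself, which penalises attributes that fragment the data into many small groups. The Python sketch below computes it for a discrete attribute. For continuous features such as the SSM/PCA subject scores, C4.5 evaluates binary thresholds, so each candidate threshold would be supplied as a two-valued attribute; pruning and C4.5's other refinements are omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(attribute, labels):
    """C4.5 split criterion: information gain divided by split information."""
    n = len(labels)
    groups = {}
    for a, y in zip(attribute, labels):
        groups.setdefault(a, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(attribute)  # entropy of the partition induced by the attribute
    return gain / split_info if split_info > 0 else 0.0


print(gain_ratio(["x", "x", "y", "y"], ["a", "a", "b", "b"]))  # 1.0: perfect split
```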

  5. Image segmentation using hidden Markov Gauss mixture models.

    PubMed

    Pyun, Kyungsuk; Lim, Johan; Won, Chee Sun; Gray, Robert M

    2007-07-01

    Image segmentation is an important tool in image processing and can serve as an efficient front end to sophisticated algorithms and thereby simplify subsequent processing. We develop a multiclass image segmentation method using hidden Markov Gauss mixture models (HMGMMs) and provide examples of segmentation of aerial images and textures. HMGMMs incorporate supervised learning, fitting the observation probability distribution given each class by a Gauss mixture estimated using vector quantization with a minimum discrimination information (MDI) distortion. We formulate the image segmentation problem using a maximum a posteriori criterion and find the hidden states that maximize the posterior density given the observation. We estimate both the hidden Markov parameter and hidden states using a stochastic expectation-maximization algorithm. Our results demonstrate that HMGMM provides better classification in terms of Bayes risk and spatial homogeneity of the classified objects than do several popular methods, including classification and regression trees, learning vector quantization, causal hidden Markov models (HMMs), and multiresolution HMMs. The computational load of HMGMM is similar to that of the causal HMM.

  6. Development of a tree classifier for discrimination of surface mine activity from Landsat digital data

    NASA Technical Reports Server (NTRS)

    Solomon, J. L.; Miller, W. F.; Quattrochi, D. A.

    1979-01-01

    In a cooperative project with the Geological Survey of Alabama, the Mississippi State Remote Sensing Applications Program has developed a single-purpose decision-tree classifier using band-ratioing techniques to discriminate various stages of surface mining activity. The tree classifier has four levels and employs only two channels in classification at each level. An accurate computation of the amount of disturbed land resulting from the mining activity can be made as a product of the classification output. The utilization of Landsat data provides a cost-efficient, rapid, and accurate means of monitoring surface mining activities.

  7. Shape classification of wear particles by image boundary analysis using machine learning algorithms

    NASA Astrophysics Data System (ADS)

    Yuan, Wei; Chin, K. S.; Hua, Meng; Dong, Guangneng; Wang, Chunhui

    2016-05-01

    The shape features of wear particles generated from the wear track usually contain plenty of information about the wear state and operating condition of machinery. Techniques to identify types of wear particles quickly, so as to respond to the machine's operation and prolong its life, appear to be lacking and are yet to be established. To bridge rapid off-line feature recognition with on-line wear mode identification, this paper presents a new radial concave deviation (RCD) method that mainly involves the use of the particle boundary signal to analyze wear particle features. Signal output from the RCDs subsequently facilitates the determination of several other feature parameters, typically relevant to the shape and size of the wear particle. Debris feature and type are identified through the use of various classification methods, such as linear discriminant analysis, quadratic discriminant analysis, the naïve Bayesian method, and the classification and regression tree method (CART). The average training and test errors obtained via ten-fold cross-validation suggest that CART is a highly suitable approach for classifying and analyzing particle features. Furthermore, the results of the wear debris analysis enable the maintenance team to diagnose faults appropriately.
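
    CART grows classification trees by choosing, at each node, the split that most reduces Gini impurity. The Python sketch below searches a single feature for the best threshold. The "aspect ratio" values and particle-type labels are hypothetical, and neither the RCD features nor the ten-fold cross-validation are reproduced.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gini_split(values, labels):
    """Return (threshold, weighted impurity) of the best split x <= t on one feature."""
    n = len(labels)
    best = (None, float("inf"))
    for t in sorted(set(values))[:-1]:  # splitting at the maximum would leave one side empty
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        impurity = len(left) / n * gini(left) + len(right) / n * gini(right)
        if impurity < best[1]:
            best = (t, impurity)
    return best


aspect_ratio = [1.1, 1.2, 1.3, 2.5, 2.8, 3.0]
particle_type = ["sliding"] * 3 + ["cutting"] * 3
print(best_gini_split(aspect_ratio, particle_type))  # (1.3, 0.0)
```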

  8. The prediction of intelligence in preschool children using alternative models to regression.

    PubMed

    Finch, W Holmes; Chang, Mei; Davis, Andrew S; Holden, Jocelyn E; Rothlisberg, Barbara A; McIntosh, David E

    2011-12-01

    Statistical prediction of an outcome variable using multiple independent variables is a common practice in the social and behavioral sciences. For example, neuropsychologists are sometimes called upon to provide predictions of preinjury cognitive functioning for individuals who have suffered a traumatic brain injury. Typically, these predictions are made using standard multiple linear regression models with several demographic variables (e.g., gender, ethnicity, education level) as predictors. Prior research has shown conflicting evidence regarding the ability of such models to provide accurate predictions of outcome variables such as full-scale intelligence (FSIQ) test scores. The present study had two goals: (1) to demonstrate the utility of a set of alternative prediction methods that have been applied extensively in the natural sciences and business but have not been frequently explored in the social sciences and (2) to develop models that can be used to predict premorbid cognitive functioning in preschool children. Predictions of Stanford-Binet 5 FSIQ scores for preschool-aged children are used to compare the performance of a multiple regression model with several of these alternative methods. Results demonstrate that classification and regression trees provided more accurate predictions of FSIQ scores than did the more traditional regression approach. Implications of these results are discussed.
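
    A regression tree predicts a continuous score such as FSIQ by splitting the sample where the total within-group squared error is smallest and predicting each group's mean. The Python sketch below finds one such split. The predictor (years of parental education) and all scores are invented for illustration and are not the study's variables or data.

```python
def best_regression_split(values, targets):
    """One regression-tree split: the threshold minimising total within-node squared error.

    Each side of the split predicts the mean of its targets, as in CART regression trees.
    """
    def sse(ys):
        m = sum(ys) / len(ys)
        return sum((y - m) ** 2 for y in ys)

    best = (None, float("inf"))
    for t in sorted(set(values))[:-1]:  # the largest value would leave the right node empty
        left = [y for x, y in zip(values, targets) if x <= t]
        right = [y for x, y in zip(values, targets) if x > t]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (t, total)
    return best


education_years = [10, 12, 12, 16, 16, 18]
fsiq = [95, 97, 99, 108, 110, 112]
print(best_regression_split(education_years, fsiq))  # (12, 16.0)
```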

  9. Classification of Dust Days by Satellite Remotely Sensed Aerosol Products

    NASA Technical Reports Server (NTRS)

    Sorek-Hammer, M.; Cohen, A.; Levy, Robert C.; Ziv, B.; Broday, D. M.

    2013-01-01

    Considerable progress in satellite remote sensing (SRS) of dust particles has been seen in the last decade. From an environmental health perspective, detecting such events, once linked to ground particulate matter (PM) concentrations, can serve as a proxy for acute exposure to respirable particles with certain properties (e.g., size, composition, and toxicity). Because the Eastern Mediterranean, and Israel in particular, is considerably affected by atmospheric dust, previous studies there have focused on mechanistic and synoptic prediction, classification, and characterization of dust events. In particular, a scheme for identifying dust days (DD) in Israel based on ground PM10 (particulate matter smaller than 10 μm in diameter) measurements has been suggested and validated by compositional analysis. This scheme requires information regarding ground PM10 levels, which is naturally limited in places with sparse ground-monitoring coverage. In such cases, SRS may be an efficient and cost-effective alternative to ground measurements. This work demonstrates a new model for identifying DD and non-DD (NDD) over Israel based on an integration of aerosol products from different satellite platforms (Moderate Resolution Imaging Spectroradiometer (MODIS) and Ozone Monitoring Instrument (OMI)). Analysis of ground-monitoring data from 2007 to 2008 in southern Israel revealed 67 DD, with more than 88 percent occurring during winter and spring. A Classification and Regression Tree (CART) model applied to a database of ground-monitoring records (the dependent variable) and SRS aerosol product records (the independent variables) revealed an optimal set of binary variables for the identification of DD. These variables are combinations of the following primary variables: the calendar month, ground-level relative humidity (RH), the aerosol optical depth (AOD) from MODIS, and the aerosol absorbing index (AAI) from OMI. 
A logistic regression that uses these variables, coded as binary variables, achieved 93.2 percent correct classification of DD and NDD. Evaluation of the combined CART-logistic regression scheme in an adjacent geographical region (Gush Dan) demonstrated good results. Using SRS aerosol products for DD and NDD identification may enable us to distinguish between the health, ecological, and environmental effects that result from exposure to these distinct particle populations.

  10. Predictive mapping of soil organic carbon in wet cultivated lands using classification-tree based models: the case study of Denmark.

    PubMed

    Bou Kheir, Rania; Greve, Mogens H; Bøcher, Peder K; Greve, Mette B; Larsen, René; McCloy, Keith

    2010-05-01

    Soil organic carbon (SOC) is one of the most important carbon stocks globally and has large potential to affect global climate. Distribution patterns of SOC in Denmark constitute a nation-wide baseline for studies on soil carbon changes (with respect to the Kyoto Protocol). This paper predicts and maps the geographic distribution of SOC across Denmark using remote sensing (RS), geographic information systems (GISs) and decision-tree modeling (un-pruned and pruned classification trees). Seventeen parameters, i.e. parent material, soil type, landscape type, elevation, slope gradient, slope aspect, mean curvature, plan curvature, profile curvature, flow accumulation, specific catchment area, tangent slope, tangent curvature, steady-state wetness index, Normalized Difference Vegetation Index (NDVI), Normalized Difference Wetness Index (NDWI) and Soil Color Index (SCI), were generated to statistically explain SOC field measurements in the area of interest (Denmark). A large number of tree-based classification models (588) were developed using (i) all of the parameters, (ii) all Digital Elevation Model (DEM) parameters only, (iii) the primary DEM parameters only, (iv) the remote sensing (RS) indices only, (v) selected pairs of parameters, (vi) soil type, parent material and landscape type only, and (vii) the parameters having a high impact on SOC distribution in built pruned trees. The three best classification tree models, having both the lowest misclassification error (ME) and the lowest number of nodes (N), are: (i) the tree (T1) combining all of the parameters (ME=29.5%; N=54); (ii) the tree (T2) based on parent material, soil type and landscape type (ME=31.5%; N=14); and (iii) the tree (T3) constructed using parent material, soil type, landscape type, elevation, tangent slope and SCI (ME=30%; N=39). 
The SOC maps produced from these trees at a 1:50,000 cartographic scale match closely, with coincidence values of 90.5% (Map T1/Map T2), 95% (Map T1/Map T3) and 91% (Map T2/Map T3). The overall accuracies of these maps, when compared with field observations, were estimated to be 69.54% (Map T1), 68.87% (Map T2) and 69.41% (Map T3). The proposed tree models are relatively simple and may also be applied to other areas. Copyright 2010 Elsevier Ltd. All rights reserved.

  11. Estimating parameters for tree basal area growth with a system of equations and seemingly unrelated regressions

    Treesearch

    Charles E. Rose; Thomas B. Lynch

    2001-01-01

    A method was developed for estimating parameters in an individual tree basal area growth model using a system of equations based on dbh rank classes. The estimation method developed is a compromise between an individual tree and a stand level basal area growth model that accounts for the correlation between trees within a plot by using seemingly unrelated regression (...

  12. Using ROC curves to compare neural networks and logistic regression for modeling individual noncatastrophic tree mortality

    Treesearch

    Susan L. King

    2003-01-01

    The performance of two classifiers, logistic regression and neural networks, is compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...
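    The thresholding step described in this abstract, and the ROC curve obtained by sweeping that threshold, can be sketched as follows. This is a minimal illustration assuming binary labels (1 = dead, 0 = alive) and made-up classifier scores; the function names are hypothetical and this is not the author's code.

```python
def classify(scores, threshold):
    """Label a tree as dead (1) when its predicted probability >= threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs obtained by
    sweeping the threshold over every distinct score, highest first."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted dead
    for t in sorted(set(scores), reverse=True):
        pred = classify(scores, t)
        tp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical predicted mortality probabilities and observed status
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
curve = roc_points(scores, labels)
```

    Comparing the areas under two such curves gives a threshold-independent way to compare classifiers, which is why ROC analysis suits a comparison like the one described above.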

  13. Forest Stand Segmentation Using Airborne LIDAR Data and Very High Resolution Multispectral Imagery

    NASA Astrophysics Data System (ADS)

    Dechesne, Clément; Mallet, Clément; Le Bris, Arnaud; Gouet, Valérie; Hervieu, Alexandre

    2016-06-01

    Forest stands are the basic units for forest inventory and mapping. Stands are large forested areas (e.g., ≥ 2 ha) of homogeneous tree species composition. Forest stands are usually delineated accurately by human operators through visual analysis of very high resolution (VHR) optical images. This work is highly time-consuming and should be automated for scalability purposes. In this paper, a method based on the fusion of airborne laser scanning data (or lidar) and very high resolution multispectral imagery is proposed for automatic forest stand delineation and forest land-cover database updating. The multispectral images give access to the tree species, whereas 3D lidar point clouds provide geometric information on the trees. Therefore, multi-modal features are computed at both pixel and object levels. The objects are individual trees extracted from lidar data. A supervised classification is performed at the object level on the computed features in order to coarsely discriminate the existing tree species in the area of interest. The analysis at tree level is particularly relevant since it significantly improves the tree species classification. A probability map is generated through the tree species classification and combined with the pixel-based feature map in an energy-based framework. The proposed energy is then minimized using a standard graph-cut method (namely QPBO with α-expansion) in order to produce a segmentation map with a controlled level of detail. Comparison with an existing forest land-cover database shows that our method provides satisfactory results in terms of both stand labelling and delineation (matching ranges between 94% and 99%).

  14. Semi-supervised SVM for individual tree crown species classification

    NASA Astrophysics Data System (ADS)

    Dalponte, Michele; Ene, Liviu Theodor; Marconcini, Mattia; Gobakken, Terje; Næsset, Erik

    2015-12-01

    In this paper, a novel semi-supervised SVM classifier is presented, specifically developed for tree species classification at the individual tree crown (ITC) level. In ITC tree species classification, all the pixels belonging to an ITC should have the same label. This assumption is used in the learning of the proposed semi-supervised SVM classifier (ITC-S3VM). This method exploits the information contained in the unlabeled ITC samples in order to improve the classification accuracy of a standard SVM. The ITC-S3VM method can be easily implemented using freely available software libraries. The datasets used in this study include hyperspectral imagery and laser scanning data acquired over two boreal forest areas characterized by the presence of three information classes (Pine, Spruce, and Broadleaves). The experimental results quantify the effectiveness of the proposed approach, which provides classification accuracies significantly higher (from 2% to above 27%) than those obtained by the standard supervised SVM and by a state-of-the-art semi-supervised SVM (S3VM). In particular, when the number of training samples is reduced (i.e. from 100% to 25%, and from 100% to 5% for the two datasets, respectively), the proposed method still achieves results comparable to those of a supervised SVM trained with the full available training set. This property makes the method particularly suitable for practical forest inventory applications, in which collecting in situ information can be very expensive in terms of both cost and time.

  15. Contributions to "k"-Means Clustering and Regression via Classification Algorithms

    ERIC Educational Resources Information Center

    Salman, Raied

    2012-01-01

    The dissertation deals with clustering algorithms and transforming regression problems into classification problems. The main contributions of the dissertation are twofold; first, to improve (speed up) the clustering algorithms and second, to develop a strict learning environment for solving regression problems as classification tasks by using…

  16. Deep Phylogeny—How a Tree Can Help Characterize Early Life on Earth

    PubMed Central

    Gaucher, Eric A.; Kratzer, James T.; Randall, Ryan N.

    2010-01-01

    The Darwinian concept of biological evolution assumes that life on Earth shares a common ancestor. The diversification of this common ancestor through speciation events and vertical transmission of genetic material implies that the classification of life can be illustrated in a tree-like manner, commonly referred to as the Tree of Life. This article describes features of the Tree of Life, such as how the tree has been both pruned and become bushier throughout the past century as our knowledge of biology has expanded. We present current views that the classification of life may be best illustrated as a ring or even a coral with tree-like characteristics. This article also discusses how the organization of the Tree of Life offers clues about ancient life on Earth. In particular, we focus on the environmental conditions and temperature history of Precambrian life and show how chemical, biological, and geological data can converge to better understand this history. “You know, a tree is a tree.  How many more do you need to look at?” –Ronald Reagan (Governor of California), quoted in the Sacramento Bee, opposing expansion of Redwood National Park, March 3, 1966 PMID:20182607

  17. Deep phylogeny--how a tree can help characterize early life on Earth.

    PubMed

    Gaucher, Eric A; Kratzer, James T; Randall, Ryan N

    2010-01-01

    The Darwinian concept of biological evolution assumes that life on Earth shares a common ancestor. The diversification of this common ancestor through speciation events and vertical transmission of genetic material implies that the classification of life can be illustrated in a tree-like manner, commonly referred to as the Tree of Life. This article describes features of the Tree of Life, such as how the tree has been both pruned and become bushier throughout the past century as our knowledge of biology has expanded. We present current views that the classification of life may be best illustrated as a ring or even a coral with tree-like characteristics. This article also discusses how the organization of the Tree of Life offers clues about ancient life on Earth. In particular, we focus on the environmental conditions and temperature history of Precambrian life and show how chemical, biological, and geological data can converge to better understand this history. "You know, a tree is a tree. How many more do you need to look at?" (Ronald Reagan, Governor of California, quoted in the Sacramento Bee, opposing expansion of Redwood National Park, March 3, 1966.)

  18. Mapping forest tree species over large areas with partially cloudy Landsat imagery

    NASA Astrophysics Data System (ADS)

    Turlej, K.; Radeloff, V.

    2017-12-01

    Forests provide numerous services to natural systems and humankind, but which services forests provide depends greatly on their tree species composition. That makes it important not only to track changes in forest extent, something that remote sensing excels at, but also to map tree species. The main goal of our work was to map tree species with Landsat imagery, and to identify how to maximize mapping accuracy by including partially cloudy imagery. Our study area covered one Landsat footprint (26/28) in Northern Wisconsin, USA, with temperate and boreal forests. We selected this area because it contains numerous tree species and variable forest composition, providing an ideal study area to test the limits of Landsat data. We quantified how species-level classification accuracy was affected by a) the number of acquisitions, b) the seasonal distribution of observations, and c) the amount of cloud contamination. We classified a single-year stack of Landsat-7 and Landsat-8 images with a decision tree algorithm to generate a map of dominant tree species at the pixel and stand levels. We obtained three important results. First, we achieved producer's accuracies in the range 70-80% and user's accuracies in the range 80-90% for the most abundant tree species in our study area. Second, classification accuracy improved with more acquisitions and when observations were available from all seasons, and was best when images with up to 40% cloud cover were included. Finally, classifications for pure stands were 10 to 30 percentage points better than those for mixed stands. We conclude that including partially cloudy Landsat imagery makes it possible to map forest tree species with accuracies that were previously attainable only in rare years with many cloud-free observations. Our approach thus provides important information for both forest management and science.
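    Producer's and user's accuracies, as reported above, are both read off a confusion matrix: producer's accuracy divides each correctly mapped count by the reference total for that class, and user's accuracy divides it by the mapped total. The sketch below assumes a hypothetical two-species matrix with reference classes in rows and mapped classes in columns; the counts are invented and not from the study.

```python
def accuracy_report(matrix):
    """matrix[i][j] = number of reference samples of class i mapped as class j.
    Returns (overall, producers, users) accuracies."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    diag = [matrix[i][i] for i in range(k)]
    ref_totals = [sum(matrix[i]) for i in range(k)]                       # row sums
    map_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # column sums
    overall = sum(diag) / total
    # producer's accuracy: share of reference samples that were mapped correctly
    producers = [d / r if r else float("nan") for d, r in zip(diag, ref_totals)]
    # user's accuracy: share of mapped samples that are correct on the ground
    users = [d / m if m else float("nan") for d, m in zip(diag, map_totals)]
    return overall, producers, users


# Hypothetical counts: rows = reference species, columns = mapped species
overall, producers, users = accuracy_report([[8, 2],
                                             [1, 9]])
```

    Here producer's accuracy is 1 minus the omission error and user's accuracy is 1 minus the commission error, so the two can diverge sharply for rare classes.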

  19. Binary Logistic Regression Versus Boosted Regression Trees in Assessing Landslide Susceptibility for Multiple-Occurring Regional Landslide Events: Application to the 2009 Storm Event in Messina (Sicily, southern Italy).

    NASA Astrophysics Data System (ADS)

    Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.

    2014-12-01

    This study aims to compare the performance of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams, both stretching for a few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian sea. This area was struck on 1 October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly debris flows and debris avalanches involving the weathered layer of a low- to high-grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; in addition, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-fold tests so as to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performance, with a general congruence between BLR and BRT in predictor importance. In particular, the research highlighted that BRT models reached a higher prediction performance than BLR models for RP-based modelling, whilst for the SP-based models the difference in predictive skill between the two methods dropped drastically, converging to an analogous excellent performance. 
However, when looking at the precision of the probability estimates, BLR proved to produce more robust models in terms of selected predictors and coefficients, as well as of the dispersion of the estimated probabilities around the mean value for each mapped pixel. The difference in behaviour could be interpreted as the result of overfitting effects, which affect decision tree classification much more heavily than logistic regression techniques.

  20. Emerald ash borer (Agrilus planipennis): Towards a classification of tree health and early detection

    Treesearch

    Matthew P. Peters; Louis R. Iverson; T. Davis Sydnor

    2009-01-01

    Forty-five green ash (Fraxinus pennsylvanica) street trees in Toledo, Ohio were photographed, measured, and visually rated for conditions related to emerald ash borer (Agrilus planipennis) (EAB) attacks. These trees were later removed, and sections were examined from each tree to determine the length of time that growth rates had...

  1. The relationship between tree growth patterns and likelihood of mortality: A study of two tree species in the Sierra Nevada

    USGS Publications Warehouse

    Das, A.J.; Battles, J.J.; Stephenson, N.L.; van Mantgem, P.J.

    2007-01-01

    We examined mortality of Abies concolor (Gord. & Glend.) Lindl. (white fir) and Pinus lambertiana Dougl. (sugar pine) by developing logistic models using three growth indices obtained from tree rings: average growth, growth trend, and count of abrupt growth declines. For P. lambertiana, models with average growth, growth trend, and count of abrupt declines improved overall prediction (78.6% dead trees correctly classified, 83.7% live trees correctly classified) compared with a model with average recent growth alone (69.6% dead trees correctly classified, 67.3% live trees correctly classified). For A. concolor, counts of abrupt declines and longer time intervals improved overall classification (trees with DBH ≥20 cm: 78.9% dead trees correctly classified and 76.7% live trees correctly classified vs. 64.9% dead trees correctly classified and 77.9% live trees correctly classified; trees with DBH <20 cm: 71.6% dead trees correctly classified and 71.0% live trees correctly classified vs. 67.2% dead trees correctly classified and 66.7% live trees correctly classified). In general, count of abrupt declines improved live-tree classification. External validation of A. concolor models showed that they functioned well at stands not used in model development, and the development of size-specific models demonstrated important differences in mortality risk between understory and canopy trees. Population-level mortality-risk models were developed for A. concolor and generated realistic mortality rates at two sites. Our results support the contention that a more comprehensive use of the growth record yields a more robust assessment of mortality risk. © 2007 NRC.

  2. Detection of artificially ripened mango using spectrometric analysis

    NASA Astrophysics Data System (ADS)

    Mithun, B. S.; Mondal, Milton; Vishwakarma, Harsh; Shinde, Sujit; Kimbahune, Sanjay

    2017-05-01

    Hyperspectral sensing has been proven useful for determining the quality of food in general. It has also been used to distinguish naturally and artificially ripened mangoes by analyzing their spectral signatures. However, the focus has been on improving classification accuracy after performing dimensionality reduction, optimum feature selection and using a suitable learning algorithm on the complete visible and NIR spectrum range data, namely 350 nm to 1050 nm. In this paper we focus on (i) the use of a low-wavelength-resolution, low-cost multispectral sensor to reliably identify artificially ripened mangoes by selectively using the spectral information, so that classification accuracy is not sacrificed despite the low-resolution spectral data, and (ii) the use of visible-spectrum data, i.e. 390 nm to 700 nm, to accurately discriminate artificially ripened mangoes. Our results show that on low-resolution spectral data, logistic regression achieves an accuracy of 98.83% and significantly outperforms other methods such as classification trees and random forests. This is achieved by analyzing only 36 spectral reflectance data points instead of the complete 216 data points available in the visual and NIR range. Another interesting experimental observation is that we are able to achieve more than 98% classification accuracy by selecting only 15 irradiance values in the visible spectrum. The amount of data that needs to be collected with a hyperspectral or multispectral sensor can even be reduced by a factor of 24 while still classifying with a high degree of confidence.

  3. Discrimination of crop types with TerraSAR-X-derived information

    NASA Astrophysics Data System (ADS)

    Sonobe, Rei; Tani, Hiroshi; Wang, Xiufeng; Kobayashi, Nobuyuki; Shimamura, Hideki

    Although classification maps are required for management and for the estimation of agricultural disaster compensation, the techniques for producing them have yet to be established. This paper compares three different classification algorithms for mapping crops in Hokkaido, Japan, using TerraSAR-X (including TanDEM-X) dual-polarimetric data. In the study area, beans, beets, grasslands, maize, potatoes and winter wheat were cultivated. In this study, classification was performed using TerraSAR-X-derived information. Coherence values, polarimetric parameters and gamma nought values were also obtained and evaluated regarding their usefulness in crop classification. Accurate classification may be possible with currently existing supervised learning models. The classification and regression tree (CART), support vector machine (SVM) and random forests (RF) algorithms were compared. Even though J-M distances were lower than 1.0 on all TerraSAR-X acquisition days, good results were achieved (e.g., separability between winter wheat and grass) owing to the characteristics of the machine learning algorithms. SVM performed best, achieving an overall accuracy of 95.0% based on the polarimetric parameters and gamma nought values for HH and VV polarizations. The misclassified fields were less than 100 a in area, and 79.5-96.3% were less than 200 a, with the exception of grassland. When a feature such as a road or windbreak forest is present in the TerraSAR-X data, the ratio of its extent to that of the field is relatively higher for smaller fields, which leads to misclassifications.

  4. Improved wetland remote sensing in Yellowstone National Park using classification trees to combine TM imagery and ancillary environmental data

    USGS Publications Warehouse

    Wright, C.; Gallant, Alisa L.

    2007-01-01

    The U.S. Fish and Wildlife Service uses the term palustrine wetland to describe vegetated wetlands traditionally identified as marsh, bog, fen, swamp, or wet meadow. Landsat TM imagery was combined with image texture and ancillary environmental data to model probabilities of palustrine wetland occurrence in Yellowstone National Park using classification trees. Model training and test locations were identified from National Wetlands Inventory maps, and classification trees were built for seven years spanning a range of annual precipitation. At a coarse level, palustrine wetland was separated from upland. At a finer level, five palustrine wetland types were discriminated: aquatic bed (PAB), emergent (PEM), forested (PFO), scrub–shrub (PSS), and unconsolidated shore (PUS). TM-derived variables alone were relatively accurate at separating wetland from upland, but model error rates dropped incrementally as image texture, DEM-derived terrain variables, and other ancillary GIS layers were added. For classification trees making use of all available predictors, average overall test error rates were 7.8% for palustrine wetland/upland models and 17.0% for palustrine wetland type models, with consistent accuracies across years. However, models were prone to wetland over-prediction. While the predominant PEM class was classified with omission and commission error rates less than 14%, we had difficulty identifying the PAB and PSS classes. Ancillary vegetation information greatly improved PSS classification and moderately improved PFO discrimination. Association with geothermal areas distinguished PUS wetlands. Wetland over-prediction was exacerbated by class imbalance in likely combination with spatial and spectral limitations of the TM sensor. Wetland probability surfaces may be more informative than hard classification, and appear to respond to climate-driven wetland variability. 
The developed method is portable, relatively easy to implement, and should be applicable in other settings and over larger extents.

  5. In silico prediction of toxicity of phenols to Tetrahymena pyriformis by using genetic algorithm and decision tree-based modeling approach.

    PubMed

    Abbasitabar, Fatemeh; Zare-Shahabadi, Vahid

    2017-04-01

    Risk assessment of chemicals is an important issue in environmental protection; however, experimental data are lacking for a large number of end-points. The experimental determination of the toxicity of chemicals is costly and time-consuming. In silico tools such as quantitative structure-toxicity relationship (QSTR) models, which are constructed on the basis of computational molecular descriptors, can predict missing data for toxic end-points of existing chemicals or even of chemicals not yet synthesized. Phenol derivatives are known to be aquatic pollutants. With this background, we aimed to develop an accurate and reliable QSTR model for the prediction of the toxicity of 206 phenols to Tetrahymena pyriformis. A multiple linear regression (MLR)-based QSTR was obtained using a powerful descriptor selection tool named the Memorized_ACO algorithm. The statistical parameters of the model were 0.72 and 0.68 for training R² and test R², respectively. To develop a high-quality QSTR model, classification and regression tree (CART) was employed. Two approaches were considered: (1) phenols were classified into different modes of action using CART and (2) the phenols in the training set were partitioned into several subsets by a tree in such a manner that a high-quality MLR could be developed in each subset. For the first approach, the statistical parameters of the resultant QSTR model improved to 0.83 and 0.75 for training R² and test R², respectively. A genetic algorithm was employed in the second approach to obtain an optimal tree, and the final QSTR model was shown to provide excellent prediction accuracy for the training and test sets (training R² and test R² of 0.91 and 0.93, respectively). The mean absolute error for the test set was computed as 0.1615. Copyright © 2016 Elsevier Ltd. All rights reserved.

  6. Identification of pests and diseases of Dalbergia hainanensis based on EVI time series and classification of decision tree

    NASA Astrophysics Data System (ADS)

    Luo, Qiu; Xin, Wu; Qiming, Xiong

    2017-06-01

    Vegetation information extraction from remote sensing data often fails to consider phenological features, and remote sensing analysis algorithms often perform poorly. To address this problem, a method is proposed for extracting vegetation information from remote sensing data based on EVI time series and decision-tree classification using multi-source branch similarity. First, to improve the time-series stability of recognition accuracy, the seasonal features of vegetation are extracted based on the fitting span of the time series. Second, decision-tree similarity is determined by an adaptively selected path or by the probability parameter of component prediction. This similarity serves as an index to evaluate the degree of task association, to decide whether to migrate the multi-source decision tree, and to ensure the speed of migration. Finally, the classification and recognition accuracy for pests and diseases in commercial Dalbergia hainanensis forest reaches 87%–98%, which is significantly better than the 80%–96% accuracy of MODIS coverage in this area, supporting the validity of the proposed method.

  7. Comparison of standard maximum likelihood classification and polytomous logistic regression used in remote sensing

    Treesearch

    John Hogland; Nedret Billor; Nathaniel Anderson

    2013-01-01

    Discriminant analysis, referred to as maximum likelihood classification within popular remote sensing software packages, is a common supervised technique used by analysts. Polytomous logistic regression (PLR), also referred to as multinomial logistic regression, is an alternative classification approach that is less restrictive, more flexible, and easy to interpret. To...

  8. Identification of serum proteins discriminating colorectal cancer patients and healthy controls using surface-enhanced laser desorption ionisation-time of flight mass spectrometry.

    PubMed

    Engwegen, Judith Y M N; Helgason, Helgi H; Cats, Annemieke; Harris, Nathan; Bonfrer, Johannes M G; Schellens, Jan H M; Beijnen, Jos H

    2006-03-14

    To detect new serum biomarkers for colorectal cancer (CRC) by serum protein profiling with surface-enhanced laser desorption ionisation-time of flight mass spectrometry (SELDI-TOF MS). Two independent serum sample sets were analysed separately with the ProteinChip technology (set A: 40 CRC + 49 healthy controls; set B: 37 CRC + 31 healthy controls), using chips with a weak cation exchange moiety and buffer pH 5. The discriminative power of differentially expressed proteins was assessed with a classification tree algorithm. Sensitivities and specificities of the generated classification trees were obtained by blindly applying data from set A to the trees generated from set B and vice versa. CRC serum protein profiles were also compared with those from breast, ovarian, prostate, and non-small cell lung cancer. Mass-to-charge ratios (m/z) 3.1×10³, 3.3×10³, 4.5×10³, 6.6×10³ and 28×10³ were used as classifiers in the best-performing classification trees. Tree sensitivities and specificities were between 65% and 90%. Most of these discriminative m/z values were also different in the other tumour types investigated. M/z 3.3×10³, the main classifier in most trees, was a doubly charged form of the 6.6×10³-Da protein. The latter was identified as apolipoprotein C-I. M/z 3.1×10³ was identified as an N-terminal fragment of albumin, and m/z 28×10³ as apolipoprotein A-I. SELDI-TOF MS followed by classification tree pattern analysis is a suitable technique for finding new serum markers for CRC. Biomarkers can be identified and reproducibly detected in independent sample sets with high sensitivities and specificities. Although not specific for CRC, these biomarkers have a potential role in disease and treatment monitoring.

  9. A spatially explicit approach to the study of socio-demographic inequality in the spatial distribution of trees across Boston neighborhoods

    PubMed Central

    Duncan, Dustin T.; Kawachi, Ichiro; Kum, Susan; Aldstadt, Jared; Piras, Gianfranco; Matthews, Stephen A.; Arbia, Giuseppe; Castro, Marcia C.; White, Kellee; Williams, David R.

    2017-01-01

    The racial/ethnic and income composition of neighborhoods often influences local amenities, including the potential spatial distribution of trees, which are important for population health and community wellbeing, particularly in urban areas. This ecological study used spatial analytical methods to assess the relationship between neighborhood socio-demographic characteristics (i.e. minority racial/ethnic composition and poverty) and tree density at the census tract level in Boston, Massachusetts (US). We examined spatial autocorrelation with the Global Moran’s I for all study variables and in the ordinary least squares (OLS) regression residuals, and computed Spearman correlations, both unadjusted and adjusted for spatial autocorrelation, between socio-demographic characteristics and tree density. Next, we fit traditional regressions (i.e. OLS regression models) and spatial regressions (i.e. spatial simultaneous autoregressive models), as appropriate. We found significant positive spatial autocorrelation for all neighborhood socio-demographic characteristics (Global Moran’s I range from 0.24 to 0.86, all P=0.001), for tree density (Global Moran’s I=0.452, P=0.001), and in the OLS regression residuals (Global Moran’s I range from 0.32 to 0.38, all P<0.001). Therefore, we fit the spatial simultaneous autoregressive models. There was a negative correlation between neighborhood percent non-Hispanic Black and tree density (rS=−0.19; conventional P-value=0.016; spatially adjusted P-value=0.299) as well as a negative correlation between predominantly non-Hispanic Black (over 60% Black) neighborhoods and tree density (rS=−0.18; conventional P-value=0.019; spatially adjusted P-value=0.180). 
While the conventional OLS regression model found a marginally significant inverse relationship between Black neighborhoods and tree density, we found no statistically significant relationship between neighborhood socio-demographic composition and tree density in the spatial regression models. Methodologically, our study suggests the need to take into account spatial autocorrelation as findings/conclusions can change when the spatial autocorrelation is ignored. Substantively, our findings suggest no need for policy intervention vis-à-vis trees in Boston, though we hasten to add that replication studies, and more nuanced data on tree quality, age and diversity are needed. PMID:29354668
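The Global Moran's I statistic used above can be computed directly from its definition, I = (N / W) · Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all weights. The sketch below assumes a small hypothetical set of tracts and binary contiguity weights; the abstract does not state which weight matrix or significance procedure the authors used.

```python
from typing import List, Sequence


def morans_i(values: Sequence[float], weights: List[List[float]]) -> float:
    """Global Moran's I for values x_i and spatial weights w_ij (with w_ii = 0)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between neighbouring units.
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    denom = sum(d * d for d in dev)
    return (n / w_total) * cross / denom
```

For four hypothetical tracts in a row, each adjacent to its immediate neighbours, a smooth gradient of values 1, 2, 3, 4 gives I = 1/3 (positive autocorrelation), while alternating values 1, 2, 1, 2 give I = −1.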

  10. Elevation of B-Type Natriuretic Peptide at Discharge is Associated With 2-Year Mortality After Transcatheter Aortic Valve Replacement in Patients With Severe Aortic Stenosis: Insights From a Multicenter Prospective OCEAN-TAVI (Optimized Transcatheter Valvular Intervention-Transcatheter Aortic Valve Implantation) Registry.

    PubMed

    Mizutani, Kazuki; Hara, Masahiko; Iwata, Shinichi; Murakami, Takashi; Shibata, Toshihiko; Yoshiyama, Minoru; Naganuma, Toru; Yamanaka, Futoshi; Higashimori, Akihiro; Tada, Norio; Takagi, Kensuke; Araki, Motoharu; Ueno, Hiroshi; Tabata, Minoru; Shirai, Shinichi; Watanabe, Yusuke; Yamamoto, Masanori; Hayashida, Kentaro

    2017-07-14

    In this study, we sought to investigate the 2-year prognostic impact of B-type natriuretic peptide (BNP) levels at discharge, following transcatheter aortic valve replacement. We enrolled 1094 consecutive patients who underwent transcatheter aortic valve replacement between 2013 and 2016. Study patients were stratified into 2 groups according to survival classification and regression tree analysis (high versus low BNP groups). We evaluated the impact of high BNP on 2-year mortality compared with that of low BNP using a multivariable Cox model, and assessed whether this stratification would improve predictive accuracy for determining 2-year mortality by assessing time-dependent net reclassification improvement and integrated discrimination improvement. The median age of patients was 85 years (quartile 82-88), and 29.2% of the study population were men. The median Society of Thoracic Surgeons score was 6.8 (4.7-9.5), and BNP at discharge was 186 (93-378) pg/mL. All-cause mortality following discharge was 7.9% (95% CI, 5.8-9.9%) at 1 year and 15.4% (95% CI, 11.6-19.0%) at 2 years. The survival classification and regression tree analysis revealed that the discriminating BNP level to discern 2-year mortality was 202 pg/mL, and that elevated BNP had a statistically significant impact on outcomes, with an adjusted hazard ratio of 2.28 (1.36-3.82, P = 0.002). The time-dependent net reclassification improvement (P = 0.047) and integrated discrimination improvement (P = 0.029) analysis revealed that the incorporation of BNP stratification with other clinical variables significantly improved predictive accuracy for 2-year mortality. Elevation of BNP at discharge is associated with 2-year mortality after transcatheter aortic valve replacement. © 2017 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley.
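A survival tree finds a cutoff such as the 202 pg/mL BNP level by scanning candidate thresholds and scoring each split. The abstract does not state the splitting criterion; the sketch below assumes the two-sample log-rank statistic, which is one criterion used by some survival-tree methods, and runs on invented marker values and follow-up times.

```python
from typing import Optional, Sequence, Tuple


def logrank_stat(times: Sequence[float], events: Sequence[int], in_group1: Sequence[bool]) -> float:
    """Two-sample log-rank chi-square statistic (1 df)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_group1[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and in_group1[i])
        # Observed minus expected events in group 1 at this event time.
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0


def best_cutoff(marker: Sequence[float], times: Sequence[float],
                events: Sequence[int]) -> Tuple[Optional[float], float]:
    """Scan splits 'marker <= c' and return (c, statistic) with the largest log-rank statistic."""
    best: Tuple[Optional[float], float] = (None, -1.0)
    for c in sorted(set(marker))[:-1]:
        stat = logrank_stat(times, events, [m <= c for m in marker])
        if stat > best[1]:
            best = (c, stat)
    return best
```

Because the cutoff is chosen to maximise the statistic, its nominal significance is optimistic; this is one reason the authors also report adjusted Cox models and reclassification indices.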

  11. Classification and regression tree (CART) model to predict pulmonary tuberculosis in hospitalized patients.

    PubMed

    Aguiar, Fabio S; Almeida, Luciana L; Ruffino-Netto, Antonio; Kritski, Afranio Lineu; Mello, Fernanda Cq; Werneck, Guilherme L

    2012-08-07

    Tuberculosis (TB) remains a public health issue worldwide. The lack of specific clinical symptoms to diagnose TB makes the correct decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs without increasing the risk of hospital transmission. We present a model for predicting pulmonary TB in hospitalized patients in a high prevalence area in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Cross-sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART) model was generated and validated. The area under the ROC curve (AUC), sensitivity, specificity, positive and negative predictive values were used to evaluate the performance of the model. Validation of the model was performed with a different sample of patients admitted to the same hospital from January to December 2005. We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear) and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for prediction of TB diagnosis was chest radiograph results. 
Prospective validation is still necessary, but our model offers an alternative for deciding whether to isolate patients with clinical suspicion of TB in tertiary health facilities in countries with limited resources.
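The combination of a low positive predictive value with a high negative predictive value reported above depends on disease prevalence as well as on sensitivity and specificity. A short Bayes'-theorem sketch makes that dependence explicit; the inputs in the usage note are hypothetical and are not the study's validation counts.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test with the given accuracy at the given disease prevalence."""
    tp = sensitivity * prevalence                # true positives per unit population
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    tn = specificity * (1 - prevalence)          # true negatives
    fn = (1 - sensitivity) * prevalence          # false negatives
    return tp / (tp + fp), tn / (tn + fn)
```

For example, a hypothetical test with 90% sensitivity and 90% specificity has a PPV of 0.9 at 50% prevalence but only 0.5 at 10% prevalence, while its NPV rises to about 0.988.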

  12. Schistosoma mansoni reinfection: Analysis of risk factors by classification and regression tree (CART) modeling

    PubMed Central

    Oliveira-Prado, Roberta; Matoso, Leonardo Ferreira; Veloso, Bráulio M.; Andrade, Gisele; Kloos, Helmut; Bethony, Jeffrey M.; Assunção, Renato M.; Correa-Oliveira, Rodrigo

    2017-01-01

    Praziquantel (PZQ) is an effective chemotherapy for schistosomiasis mansoni and a mainstay of its control and potential elimination. However, it does not protect against reinfection, which can occur rapidly in areas with active transmission. A ranking of the risk factors for Schistosoma mansoni reinfection would greatly help in prioritizing resources and focusing prevention and control measures to prevent rapid reinfection. The objective of the current study was to explore the relationship among the socioeconomic, demographic, and epidemiological factors that can influence reinfection by S. mansoni one year after successful treatment with PZQ in school-aged children in northeastern Minas Gerais State, Brazil. Parasitological, socioeconomic, demographic, and water contact information was surveyed in 506 S. mansoni-infected individuals aged 6 to 15 years living in these endemic areas. Eligible individuals were treated with PZQ until they were determined to be negative by the absence of S. mansoni eggs in the feces on two consecutive days of Kato-Katz fecal thick smear. These individuals were surveyed again 12 months after successful treatment with PZQ. Classification and regression tree (CART) modeling was then used to explore the relationship between socioeconomic, demographic, and epidemiological variables and reinfection status. The most important risk factor identified for S. mansoni reinfection was “heavy” infection at baseline. Additional analyses, excluding heavy infection status, showed that lower socioeconomic status and a lower level of education of the household head were also among the most important risk factors for S. mansoni reinfection. Our results provide an important contribution toward the control and possible elimination of schistosomiasis by identifying three major risk factors that can be used for targeted treatment and monitoring of reinfection. 
We suggest that control measures that target heavily infected children in the most economically disadvantaged households would be most beneficial to maintain the success of mass chemotherapy campaigns. PMID:28813451
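CART ranks candidate risk factors by how much a split on each one reduces node impurity. The sketch below illustrates that idea for binary factors at the root node only, using the Gini impurity decrease; it ignores deeper splits, surrogate splits and pruning, and the factor names and records are hypothetical.

```python
from typing import Dict, List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 outcomes."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def rank_binary_factors(records: List[Dict[str, int]], outcome: str) -> List[Tuple[str, float]]:
    """Rank 0/1 factors by the Gini impurity decrease of a single root split on each."""
    parent = gini([r[outcome] for r in records])
    n = len(records)
    scores = []
    for factor in records[0]:
        if factor == outcome:
            continue
        yes = [r[outcome] for r in records if r[factor] == 1]
        no = [r[outcome] for r in records if r[factor] == 0]
        child = (len(yes) * gini(yes) + len(no) * gini(no)) / n
        scores.append((factor, parent - child))
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

Re-running the ranking with the top factor removed mirrors, in a simplified way, the authors' additional analysis that excluded heavy infection status.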

  13. Integrated Change Detection and Classification in Urban Areas Based on Airborne Laser Scanning Point Clouds.

    PubMed

    Tran, Thi Huong Giang; Ressl, Camillo; Pfeifer, Norbert

    2018-02-03

    This paper proposes a new approach to change detection (CD) in 3D point clouds. It combines classification and CD in one step using machine learning. The point cloud data of both epochs are merged to compute features of four types: features describing the point distribution, a feature relating to relative terrain elevation, features specific to the multi-target capability of laser scanning, and features combining the point clouds of both epochs to identify the change. All these features are attached to the points, and training samples are then acquired to create the model for supervised classification, which is then applied to the whole study area. The final results reach an overall accuracy of over 90% for both epochs across eight classes: lost tree, new tree, lost building, new building, changed ground, unchanged building, unchanged tree, and unchanged ground.
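Features that combine both epochs, as described above, are what allow a single classifier to label change. The abstract does not list them, so the two below are hypothetical examples of such cross-epoch features: the 3D distance from each point to its nearest neighbour in the other epoch, and its height relative to the highest point of the other epoch within a horizontal radius. Brute-force search is used for brevity; real pipelines would use a spatial index.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, z)


def cross_epoch_distance(epoch_a: List[Point], epoch_b: List[Point]) -> List[float]:
    """For each point in epoch_a, the 3D distance to its nearest neighbour in epoch_b."""
    return [min(math.dist(p, q) for q in epoch_b) for p in epoch_a]


def height_change(epoch_a: List[Point], epoch_b: List[Point], radius: float) -> List[float]:
    """z of each epoch_a point minus the max z of epoch_b points within `radius` (NaN if none)."""
    out = []
    for x, y, z in epoch_a:
        zs = [qz for qx, qy, qz in epoch_b if math.hypot(qx - x, qy - y) <= radius]
        out.append(z - max(zs) if zs else float("nan"))
    return out
```

A point on a new building would show a large positive height change, whereas points on unchanged ground would stay near zero for both features.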

  14. Use of multi-frequency, multi-polarization, multi-angle airborne radars for class discrimination in a southern temperate forest

    NASA Technical Reports Server (NTRS)

    Mehta, N. C.

    1984-01-01

    The utility of radar scatterometers for discrimination and characterization of natural vegetation was investigated. Backscatter measurements were acquired with airborne multi-frequency, multi-polarization, multi-angle radar scatterometers over a test site in a southern temperate forest. Separability between ground cover classes was studied using a two-class separability measure. Very good separability is achieved between most classes. Longer wavelength is useful in separating trees from non-tree classes, while shorter wavelength and cross polarization are helpful for discrimination among tree classes. Using the maximum likelihood classifier, 50% overall classification accuracy is achieved using a single, short-wavelength scatterometer channel. Addition of multiple incidence angles and another radar band improves classification accuracy by 20% and 50%, respectively, over the single channel accuracy. Incorporation of a third radar band seems redundant for vegetation classification. Vertical transmit polarization is critically important for all classes.
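The abstract does not name its two-class separability measure. As an assumption, the sketch below uses the Jeffries-Matusita (JM) distance, a measure common in remote sensing, for two classes whose backscatter in a single channel is modelled as univariate normal. It uses the convention JM = 2(1 − e^(−B)), bounded by [0, 2], where B is the Bhattacharyya distance.

```python
import math


def bhattacharyya_gaussian(mu1: float, var1: float, mu2: float, var2: float) -> float:
    """Bhattacharyya distance between two univariate normal distributions."""
    mean_term = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    var_term = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return mean_term + var_term


def jeffries_matusita(mu1: float, var1: float, mu2: float, var2: float) -> float:
    """JM distance in [0, 2]; values near 2 indicate well-separated classes."""
    return 2.0 * (1.0 - math.exp(-bhattacharyya_gaussian(mu1, var1, mu2, var2)))
```

Identical class distributions give 0, and the value saturates towards 2 as the class means move apart relative to their variances. Multichannel versions replace the means and variances with mean vectors and covariance matrices.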

  15. Classification of Tree Species in Overstorey Canopy of Subtropical Forest Using QuickBird Images.

    PubMed

    Lin, Chinsu; Popescu, Sorin C; Thomson, Gavin; Tsogt, Khongor; Chang, Chein-I

    2015-01-01

    This paper proposes a supervised classification scheme to identify 40 tree species (2 coniferous, 38 broadleaf) belonging to 22 families and 36 genera in high spatial resolution QuickBird multispectral images (HMS). The overall kappa coefficient (OKC) and species conditional kappa coefficients (SCKC) were used to evaluate classification performance in training samples and to estimate accuracy and uncertainty in test samples. Baseline classification using HMS images and vegetation index (VI) images yielded OKC values of 0.58 and 0.48, respectively, but performance improved significantly (up to 0.99) when these were combined with an HMS spectral-spatial texture image (SpecTex). One of the 40 species had very high conditional kappa performance (SCKC ≥ 0.95) using the 4-band HMS and 5-band VI images, but only five species had lower performance (0.68 ≤ SCKC ≤ 0.94) using the SpecTex images. When SpecTex images were combined with a Visible Atmospherically Resistant Index (VARI), performance in the training samples improved significantly. The same level of improvement could not be replicated in the test samples, indicating a high degree of uncertainty in species classification accuracy, which may be due to individual tree crown density, leaf greenness (inter-canopy gaps), and noise in the background environment (intra-canopy gaps). These factors increase uncertainty in the spectral texture features and therefore represent potential problems when using pixel-based classification techniques for multi-species classification.
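The overall and per-species kappa coefficients used above can be computed from a confusion matrix. The sketch assumes rows hold the classified labels and columns hold the reference labels. It uses Cohen's kappa for the overall value and one common form of conditional kappa (after Rosenfield and Fitzpatrick-Lins) for each class. The abstract does not specify the exact formulation, and the matrix in the test is hypothetical.

```python
from typing import List


def overall_kappa(cm: List[List[int]]) -> float:
    """Cohen's kappa from a square confusion matrix (rows = classified, columns = reference)."""
    n = sum(map(sum, cm))
    k = len(cm)
    observed = sum(cm[i][i] for i in range(k)) / n
    row = [sum(cm[i]) for i in range(k)]
    col = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row[i] * col[i] for i in range(k)) / n ** 2  # chance agreement
    return (observed - expected) / (1 - expected)


def conditional_kappa(cm: List[List[int]], i: int) -> float:
    """Conditional kappa for class i, computed over the row of samples classified as i."""
    n = sum(map(sum, cm))
    row_i = sum(cm[i])
    col_i = sum(r[i] for r in cm)
    return (n * cm[i][i] - row_i * col_i) / (n * row_i - row_i * col_i)
```

Kappa discounts the agreement expected by chance from the class marginals. This is why it is usually lower than the raw percentage of correctly classified samples.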

  16. Extending airborne electromagnetic surveys for regional active layer and permafrost mapping with remote sensing and ancillary data, Yukon Flats ecoregion, central Alaska

    USGS Publications Warehouse

    Pastick, Neal J.; Jorgenson, M. Torre; Wylie, Bruce K.; Minsley, Burke J.; Ji, Lei; Walvoord, Michelle Ann; Smith, Bruce D.; Abraham, Jared D.; Rose, Joshua R.

    2013-01-01

    Machine-learning regression tree models were used to extrapolate airborne electromagnetic resistivity data collected along flight lines in the Yukon Flats Ecoregion, central Alaska, for regional mapping of permafrost. This method of extrapolation (r = 0.86) used subsurface resistivity, Landsat Thematic Mapper (TM) at-sensor reflectance, thermal, TM-derived spectral indices, digital elevation models and other relevant spatial data to estimate near-surface (0–2.6-m depth) resistivity at 30-m resolution. A piecewise regression model (r = 0.82) and a presence/absence decision tree classification (accuracy of 87%) were used to estimate active-layer thickness (ALT) (< 101 cm) and the probability of near-surface (up to 123-cm depth) permafrost occurrence from field data, modelled near-surface (0–2.6 m) resistivity, and other relevant remote sensing and map data. At site scale, the predicted ALTs were similar to those previously observed for different vegetation types. At the landscape scale, the predicted ALTs tended to be thinner on higher-elevation loess deposits than on low-lying alluvial and sand sheet deposits of the Yukon Flats. The ALT and permafrost maps provide a baseline for future permafrost monitoring, serve as inputs for modelling hydrological and carbon cycles at local to regional scales, and offer insight into the ALT response to fire and thaw processes.
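A regression tree builds its piecewise predictions by repeatedly choosing the split that most reduces squared error. The sketch below shows a single such split on one hypothetical covariate. The study's models used many covariates (reflectance, thermal data, spectral indices, elevation) and dedicated software, none of which is reproduced here.

```python
from typing import Sequence, Tuple


def sse(ys: Sequence[float]) -> float:
    """Sum of squared deviations from the mean."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def fit_regression_stump(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Return (threshold, left_mean, right_mean) minimising the total squared error."""
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between equal x values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            t = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (cost, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2], best[3]
```

Applying the same procedure recursively to each side, with a stopping rule, yields a full regression tree.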

  17. Benchmarking protein classification algorithms via supervised cross-validation.

    PubMed

    Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor

    2008-04-24

    Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates of how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database, has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. 
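The core idea of supervised cross-validation is that all members of a known subtype go into the test set together, so the classifier is never trained on close relatives of the test proteins. A minimal sketch over one level of a hypothetical subtype labelling follows; the benchmark itself works at several levels of a classification hierarchy. scikit-learn's `LeaveOneGroupOut` implements the same splitting idea.

```python
from typing import Iterator, List, Sequence, Tuple


def supervised_cv_splits(subtypes: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_subtype, train_idx, test_idx), holding out one whole subtype at a time."""
    for held_out in sorted(set(subtypes)):
        test = [i for i, s in enumerate(subtypes) if s == held_out]
        train = [i for i, s in enumerate(subtypes) if s != held_out]
        yield held_out, train, test
```

In contrast, random k-fold splitting would usually place members of the same subtype on both sides. This tends to inflate performance estimates, which is consistent with the lower estimates the authors report for supervised schemes.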

  18. An assessment of the effectiveness of a random forest classifier for land-cover classification

    NASA Astrophysics Data System (ADS)

    Rodriguez-Galiano, V. F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J. P.

    2012-01-01

    Land cover monitoring using remotely sensed data requires robust classification methods that allow for the accurate mapping of complex land cover and land use categories. Random forest (RF) is a powerful machine learning classifier that is relatively unknown in land remote sensing and has not been evaluated thoroughly by the remote sensing community compared to more conventional pattern recognition techniques. Key advantages of RF include its non-parametric nature, high classification accuracy, and capability to determine variable importance. However, the split rules for classification are unknown; therefore, RF can be considered a black-box classifier. RF also provides an algorithm for estimating missing values and the flexibility to perform several types of data analysis, including regression, classification, survival analysis, and unsupervised learning. In this paper, the performance of the RF classifier for land cover classification of a complex area is explored. Evaluation was based on several criteria: mapping accuracy, and sensitivity to data set size and noise. Landsat-5 Thematic Mapper data captured in European spring and summer were used with auxiliary variables derived from a digital terrain model to classify 14 different land categories in the south of Spain. Results show that the RF algorithm yields accurate land cover classifications, with 92% overall accuracy and a Kappa index of 0.92. RF is robust to training data reduction and noise: significant differences in kappa values were observed only when the training data were reduced by more than 50% or the added noise exceeded 20%. Additionally, the variables that RF identified as most important for classifying land cover coincided with expectations. A McNemar test indicates an overall better performance of the random forest model over a single decision tree at the 0.00001 significance level.
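The McNemar test mentioned above compares two classifiers on the same test samples using only the samples on which they disagree. The sketch computes the 1-df chi-square statistic, with optional continuity correction. The p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)). The counts in the test are hypothetical.

```python
import math
from typing import Tuple


def mcnemar(b: int, c: int, continuity: bool = True) -> Tuple[float, float]:
    """McNemar chi-square (1 df) and p-value.

    b: samples classifier 1 got right and classifier 2 got wrong
    c: samples classifier 1 got wrong and classifier 2 got right
    """
    if b + c == 0:
        return 0.0, 1.0
    diff = abs(b - c) - (1 if continuity else 0)
    stat = max(diff, 0) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p
```

Because only discordant samples enter the statistic, the test accounts for the pairing of the two classifiers' predictions on the same pixels. A comparison of two independent accuracy percentages would ignore that pairing.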

  19. A stepwise regression tree for nonlinear approximation: applications to estimating subpixel land cover

    USGS Publications Warehouse

    Huang, C.; Townshend, J.R.G.

    2003-01-01

    A stepwise regression tree (SRT) algorithm was developed for approximating complex nonlinear relationships. Based on the regression tree of Breiman et al. (BRT) and a stepwise linear regression (SLR) method, this algorithm represents an improvement over SLR in that it can approximate nonlinear relationships, and over BRT in that it gives more realistic predictions. The applicability of this method to estimating subpixel forest cover was demonstrated using three test data sets, on all of which it gave more accurate predictions than SLR and BRT. SRT also generated more compact trees and performed better than, or at least as well as, BRT in all 10 equal forest-proportion intervals ranging from 0 to 100%. This method is appealing for estimating subpixel land cover over large areas.
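The SRT idea of combining tree splits with linear regression in the leaves can be illustrated by a single split whose two leaves each hold a least-squares line. This is a simplified model-tree sketch, not the authors' SRT. It uses one predictor and plain least squares and omits the stepwise variable selection.

```python
from typing import Optional, Sequence, Tuple

Line = Tuple[float, float]  # (intercept, slope)


def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares line y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = 0.0 if sxx == 0 else sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


def fit_model_tree_stump(xs: Sequence[float], ys: Sequence[float],
                         min_leaf: int = 2) -> Optional[Tuple[float, Line, Line, float]]:
    """Split on x (x <= threshold goes left), fit a line per leaf; return (threshold, left, right, RSS)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[k - 1][0] == pairs[k][0]:
            continue
        lx, ly = zip(*pairs[:k])
        rx, ry = zip(*pairs[k:])
        left, right = linear_fit(lx, ly), linear_fit(rx, ry)
        total = left[2] + right[2]
        if best is None or total < best[3]:
            best = (pairs[k - 1][0], left[:2], right[:2], total)
    return best  # None if there are too few points for two leaves
```

Compared with a constant-leaf regression tree, linear leaves let a small tree follow a trend within each region. This is one reason such hybrids can produce more compact trees.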

  20. Integrated approach using data mining-based decision tree and object-based image analysis for high-resolution urban mapping of WorldView-2 satellite sensor data

    NASA Astrophysics Data System (ADS)

    Hamedianfar, Alireza; Shafri, Helmi Zulhaidi Mohd

    2016-04-01

    This paper integrates decision tree-based data mining (DM) and object-based image analysis (OBIA) to provide a transferable model for the detailed characterization of urban land-cover classes using WorldView-2 (WV-2) satellite images. Many articles have been published on OBIA in recent years based on DM for different applications. However, less attention has been paid to the generation of a transferable model for characterizing detailed urban land cover features. Three subsets of WV-2 images were used in this paper to generate transferable OBIA rule-sets. Many features were explored by using a DM algorithm, which created the classification rules as a decision tree (DT) structure from the first study area. The developed DT algorithm was applied to object-based classifications in the first study area. After this process, we validated the capability and transferability of the classification rules into second and third subsets. Detailed ground truth samples were collected to assess the classification results. The first, second, and third study areas achieved 88%, 85%, and 85% overall accuracies, respectively. Results from the investigation indicate that DM was an efficient method to provide the optimal and transferable classification rules for OBIA, which accelerates the rule-sets creation stage in the OBIA classification domain.
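Transferring a mined decision tree to new study areas, as described above, amounts to flattening it into an ordered OBIA rule set and applying those rules unchanged to the objects of each new image. The sketch below shows both steps. The feature names and thresholds are hypothetical, not the rules derived from the WorldView-2 data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Node:
    feature: str
    threshold: float
    left: "Tree"   # branch taken when feature <= threshold
    right: "Tree"  # branch taken when feature > threshold


Tree = Union[Node, str]  # a leaf is simply a class label


def extract_rules(tree: Tree, conditions: Tuple[str, ...] = ()) -> List[str]:
    """Flatten a decision tree into IF-THEN rules, one per leaf."""
    if isinstance(tree, str):
        body = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {body} THEN {tree}"]
    rules = extract_rules(tree.left, conditions + (f"{tree.feature} <= {tree.threshold}",))
    rules += extract_rules(tree.right, conditions + (f"{tree.feature} > {tree.threshold}",))
    return rules


def classify(tree: Tree, obj: Dict[str, float]) -> str:
    """Apply the tree to one image object's features (e.g. an object from another study area)."""
    while isinstance(tree, Node):
        tree = tree.left if obj[tree.feature] <= tree.threshold else tree.right
    return tree
```

Transferability is then assessed by comparing the classified objects in each area with ground-truth samples, as in the accuracies reported for the three subsets.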

  1. The information extraction of Gannan citrus orchard based on the GF-1 remote sensing image

    NASA Astrophysics Data System (ADS)

    Wang, S.; Chen, Y. L.

    2017-02-01

    The production of Gannan oranges is the largest in China and occupies an important position worldwide. Rapid and effective extraction of citrus orchards is of great significance for fruit pathogen defense, fruit production and industrial planning. Traditional pixel-based spectral extraction methods for citrus orchards have low classification accuracy and have difficulty avoiding the “pepper phenomenon”. Under the influence of noise, the problem of different objects sharing the same spectrum is severe. Taking the citrus planting area of Xunwu County, Ganzhou, as the study area, and addressing the low accuracy of traditional pixel-based classification methods, a decision tree classification method based on an object-oriented rule set is proposed. First, multi-scale segmentation is performed on the GF-1 remote sensing image data of the study area. Next, sample objects are selected for statistical analysis of spectral and geometric features. Finally, following the concept of decision tree classification, empirical thresholds on single bands, NDVI, band combinations and object geometry are applied hierarchically to extract information for the study area, thereby implementing multi-scale segmentation and hierarchical decision tree classification. The classification results are verified with a confusion matrix, and the overall Kappa index is 87.91%.
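The hierarchical rule set described above applies one group of thresholds at each level to segmented image objects. The sketch below follows that pattern: a spectral index level (NDVI = (NIR − Red) / (NIR + Red)), then a single-band level, then a geometry level. The order of levels, the thresholds and the "compactness" feature are hypothetical placeholders for the empirical values the authors derived from sample objects.

```python
from typing import Dict


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index, (NIR - Red) / (NIR + Red)."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def classify_object(obj: Dict[str, float], ndvi_min: float = 0.4,
                    nir_max: float = 0.35, compactness_min: float = 0.5) -> str:
    """Apply hypothetical hierarchical rules to one segmented object's mean band values and geometry."""
    # Level 1: spectral index separates vegetation from non-vegetation.
    if ndvi(obj["nir"], obj["red"]) < ndvi_min:
        return "non-vegetation"
    # Level 2: single-band threshold separates other vegetation types.
    if obj["nir"] > nir_max:
        return "other vegetation"
    # Level 3: object geometry (shape) threshold.
    if obj["compactness"] < compactness_min:
        return "other vegetation"
    return "citrus orchard"
```

Working on objects rather than pixels lets the geometry level act on whole segments. This is how object-based methods reduce the speckled "pepper" appearance of pixel-based maps.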

  2. Black-backed woodpecker habitat suitability mapping using conifer snag basal area estimated from airborne laser scanning

    NASA Astrophysics Data System (ADS)

    Casas Planes, Á.; Garcia, M.; Siegel, R.; Koltunov, A.; Ramirez, C.; Ustin, S.

    2015-12-01

    Occupancy and habitat suitability models for snag-dependent wildlife species are commonly defined as a function of snag basal area. Although critical for predicting or assessing habitat suitability, spatially distributed estimates of snag basal area are not generally available across landscapes at spatial scales relevant for conservation planning. This study evaluates the use of airborne laser scanning (ALS) to 1) identify individual conifer snags and map their basal area across a recently burned forest, and 2) map habitat suitability for a wildlife species known to be dependent on snag basal area, specifically the black-backed woodpecker (Picoides arcticus). This study focuses on the Rim Fire, a megafire that took place in 2013 in the Sierra Nevada Mountains of California, creating large patches of medium- and high-severity burned forest. We use forest inventory plots, single-tree ALS-derived metrics and Gaussian process classification and regression to identify conifer snags and estimate their stem diameter and basal area. Then, we use the results to map habitat suitability for the black-backed woodpecker using thresholds for conifer basal area from a previously published habitat suitability model. Local maxima detection and watershed segmentation algorithms resulted in 75% detection of trees with stem diameter larger than 30 cm. Snags are identified with an overall accuracy of 91.8% and conifer snags with an overall accuracy of 84.8%. Finally, Gaussian process regression reliably estimated stem diameter (R2 = 0.8) using height and crown area. This work provides a fast and efficient methodology to characterize the extent of a burned forest at the tree level and a critical tool for early wildlife assessment in post-fire forest management and biodiversity conservation.
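Two steps mentioned above can be sketched simply: finding candidate treetops as local maxima of a gridded canopy height model (CHM), and converting an estimated stem diameter into basal area. The grid, the 2 m minimum height and the 8-neighbour window are assumptions for illustration. The watershed segmentation and Gaussian process models used in the study are not reproduced.

```python
import math
from typing import List, Tuple


def local_maxima(chm: List[List[float]], min_height: float = 2.0) -> List[Tuple[int, int]]:
    """Cells that are >= all 8 neighbours and at least min_height (candidate treetops)."""
    rows, cols = len(chm), len(chm[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            h = chm[r][c]
            if h < min_height:
                continue
            neighbours = [chm[rr][cc]
                          for rr in range(max(r - 1, 0), min(r + 2, rows))
                          for cc in range(max(c - 1, 0), min(c + 2, cols))
                          if (rr, cc) != (r, c)]
            if all(h >= v for v in neighbours):
                tops.append((r, c))
    return tops


def basal_area_m2(dbh_cm: float) -> float:
    """Stem cross-sectional area in m^2 from a diameter in cm: pi * (d / 200)^2."""
    return math.pi * (dbh_cm / 200.0) ** 2
```

Flat crowns with equal-height neighbouring cells produce several maxima per tree, which is one reason smoothing or segmentation usually follows. Summing `basal_area_m2` over the detected snags in a cell gives the per-area basal area that the habitat thresholds are defined on.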

  3. Importance of physical and hydraulic characteristics to unionid mussels: A retrospective analysis in a reach of large river

    USGS Publications Warehouse

    Zigler, S.J.; Newton, T.J.; Steuer, J.J.; Bartsch, M.R.; Sauer, J.S.

    2008-01-01

    Interest in understanding physical and hydraulic factors that might drive distribution and abundance of freshwater mussels has been increasing due to their decline throughout North America. We assessed whether the spatial distribution of unionid mussels could be predicted from physical and hydraulic variables in a reach of the Upper Mississippi River. Classification and regression tree (CART) models were constructed using mussel data compiled from various sources and explanatory variables derived from GIS coverages. Prediction success of CART models for presence-absence of mussels ranged from 71 to 76% across three gears (brail, sled-dredge, and dive-quadrat), and the models explained 51% of the deviance in abundance. Models were largely driven by shear stress and substrate stability variables, but interactions with simple physical variables, especially slope, were also important. Geospatial models, which were based on tree model results, predicted few mussels in poorly connected backwater areas (e.g., floodplain lakes) and the navigation channel, whereas main channel border areas with high geomorphic complexity (e.g., river bends, islands, side channel entrances) and small side channels were typically favorable to mussels. Moreover, bootstrap aggregation of discharge-specific regression tree models of dive-quadrat data indicated that variables measured at low discharge were about 25% more predictive (PMSE = 14.8) than variables measured at median discharge (PMSE = 20.4), with high-discharge variables (PMSE = 17.1) intermediate. This result suggests that episodic events such as droughts and floods were important in structuring mussel distributions. Although the substantial mussel and ancillary data in our study reach are unusual, our approach to developing exploratory statistical and geospatial models should be useful even when data are more limited. © 2007 Springer Science+Business Media B.V.
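Bootstrap aggregation, as used above for the discharge-specific models, fits one tree per bootstrap resample and averages their predictions. The models are then compared by prediction mean squared error (PMSE). The sketch uses single-split regression trees on one hypothetical predictor. For brevity, PMSE is evaluated on the training data in the usage; a fair comparison would use held-out or out-of-bag samples.

```python
import random
from typing import Callable, List, Sequence

Predictor = Callable[[float], float]


def fit_stump(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    """Single-split regression tree minimising squared error; returns a prediction function."""
    pairs = sorted(zip(xs, ys))
    mean_all = sum(ys) / len(ys)
    best = (sum((y - mean_all) ** 2 for y in ys), None, mean_all, mean_all)
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        cost = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if cost < best[0]:
            best = (cost, (pairs[k - 1][0] + pairs[k][0]) / 2, ml, mr)
    _, t, ml, mr = best
    if t is None:  # no useful split: predict the overall mean
        return lambda x: mean_all
    return lambda x: ml if x <= t else mr


def bagged_predictor(xs: Sequence[float], ys: Sequence[float],
                     n_bags: int = 50, seed: int = 0) -> Predictor:
    """Average the predictions of stumps fit to bootstrap resamples of the data."""
    rng = random.Random(seed)
    n = len(xs)
    models: List[Predictor] = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(m(x) for m in models) / len(models)


def pmse(predict: Predictor, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Prediction mean squared error."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Averaging over resamples smooths the hard step of a single tree and reduces its variance, which makes PMSE comparisons between predictor sets more stable.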

  4. Demographic and clinical predictors of mortality from highly pathogenic avian influenza A (H5N1) virus infection: CART analysis of international cases.

    PubMed

    Patel, Rita B; Mathur, Maya B; Gould, Michael; Uyeki, Timothy M; Bhattacharya, Jay; Xiao, Yang; Khazeni, Nayer

    2014-01-01

    Human infections with highly pathogenic avian influenza (HPAI) A (H5N1) viruses have occurred in 15 countries, with high mortality to date. Determining risk factors for morbidity and mortality from HPAI H5N1 can inform preventive and therapeutic interventions. We included all cases of human HPAI H5N1 reported in World Health Organization Global Alert and Response updates and those identified through a systematic search of multiple databases (PubMed, Scopus, and Google Scholar), including articles in all languages. We abstracted predefined clinical and demographic predictors and mortality and used bivariate logistic regression analyses to examine the relationship of each candidate predictor with mortality. We developed and pruned a decision tree using nonparametric Classification and Regression Tree methods to create risk strata for mortality. We identified 617 human cases of HPAI H5N1 occurring between December 1997 and April 2013. The median age of subjects was 18 years (interquartile range 6-29 years) and 54% were female. HPAI H5N1 case-fatality proportion was 59%. The final decision tree for mortality included age, country, per capita government health expenditure, and delay from symptom onset to hospitalization, with an area under the receiver operating characteristic (ROC) curve of 0.81 (95% CI: 0.76-0.86). A model defined by four clinical and demographic predictors successfully estimated the probability of mortality from HPAI H5N1 illness. These parameters highlight the importance of early diagnosis and treatment and may enable early, targeted pharmaceutical therapy and supportive care for symptomatic patients with HPAI H5N1 virus infection.
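A pruned tree assigns every case the mortality proportion of its terminal node (its risk stratum). The ROC AUC of those scores equals the probability that a randomly chosen fatal case scores higher than a randomly chosen survivor. Ties, which are frequent when there are only a few strata, count as one half. The sketch below computes this directly; the scores and outcomes in the test are hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise comparison of positives (label 1) and negatives (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied stratum scores count as half
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in sample size; a rank-based (Mann-Whitney) computation gives the same value more efficiently for large datasets.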

  5. Identifying pollution sources and predicting urban air quality using ensemble learning methods

    NASA Astrophysics Data System (ADS)

    Singh, Kunwar P.; Gupta, Shikha; Rai, Premanjali

    2013-12-01

    In this study, principal components analysis (PCA) was performed to identify air pollution sources, and tree-based ensemble learning models were constructed to predict the urban air quality of Lucknow (India) using air quality and meteorological databases covering a period of five years. PCA identified vehicular emissions and fuel combustion as major air pollution sources. The air quality indices revealed that air quality was unhealthy during summer and winter. Ensemble models were constructed to discriminate between seasonal air qualities, to identify the factors responsible for the discrimination, and to predict the air quality indices. Accordingly, single decision tree (SDT), decision tree forest (DTF), and decision treeboost (DTB) models were constructed, and their generalization and predictive performance were evaluated in terms of several statistical parameters and compared with a conventional machine learning benchmark, support vector machines (SVM). The DT and SVM models discriminated seasonal air quality with misclassification rates (MR) of 8.32% (SDT), 4.12% (DTF), 5.62% (DTB), and 6.18% (SVM), respectively, in the complete data. The AQI and CAQI regression models yielded correlations between measured and predicted values and root mean squared errors of 0.901, 6.67 and 0.825, 9.45 (SDT); 0.951, 4.85 and 0.922, 6.56 (DTF); 0.959, 4.38 and 0.929, 6.30 (DTB); and 0.890, 7.00 and 0.836, 9.16 (SVR) in the complete data. The DTF and DTB models outperformed the SVM in both classification and regression, which could be attributed to the incorporation of the bagging and boosting algorithms in these models. The proposed ensemble models successfully predicted the urban ambient air quality and can be used as effective tools for its management.
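The three performance measures reported above can be defined in a few lines. They are the misclassification rate for the seasonal classifiers, and the correlation between measured and predicted values plus the root mean squared error for the AQI regressions. The sketch uses the standard definitions, with Pearson's r assumed for "correlation"; the inputs in the test are hypothetical.

```python
import math
from typing import Hashable, Sequence


def misclassification_rate(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted class differs from the true class."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Correlation and RMSE capture different failures: a model can correlate well with measurements yet carry a systematic bias that only the RMSE reveals, which is why both are reported.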

  6. Single-accelerometer-based daily physical activity classification.

    PubMed

    Long, Xi; Yin, Bin; Aarts, Ronald M

    2009-01-01

    In this study, a single tri-axial accelerometer placed on the waist was used to record acceleration data for human physical activity classification. The data collection involved 24 subjects performing daily real-life activities in a naturalistic environment without researcher intervention. For the purpose of assessing customers' daily energy expenditure, walking, running, cycling, driving, and sports were chosen as the target activities for classification. This study compared a Bayesian classifier with a Decision Tree-based approach. A Bayes classifier has the advantage of being more extensible, requiring little effort for classifier retraining and software updates when the target activities are further expanded or modified. Principal components analysis was applied to remove the correlation among features and to reduce the dimension of the feature vector. Experiments using leave-one-subject-out and 10-fold cross-validation protocols revealed a classification accuracy of approximately 80%, which was comparable with that obtained by a Decision Tree classifier.
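Leave-one-subject-out validation differs from 10-fold cross-validation because every recording from one person is held out together. This prevents the classifier from being tested on someone whose data it has already seen. The minimal Python sketch below generates such splits. The subject identifiers are hypothetical, and the classifier itself is not shown.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_subject_out(
    subject_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held-out subject, train indices, test indices), one fold per subject.

    subject_ids[i] names the subject who produced sample i (e.g. one feature window).
    """
    groups: Dict[str, List[int]] = {}
    for i, sid in enumerate(subject_ids):
        groups.setdefault(sid, []).append(i)
    for sid, test_idx in groups.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(subject_ids)) if i not in held_out]
        yield sid, train_idx, test_idx
```

With 24 subjects, this yields 24 folds. The reported accuracy would then be the mean (or pooled) accuracy across those folds.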

  7. Estimating tree biomass regressions and their error, proceedings of the workshop on tree biomass regression functions and their contribution to the error

    Treesearch

    Eric H. Wharton; Tiberius Cunia

    1987-01-01

    Proceedings of a workshop co-sponsored by the USDA Forest Service, the State University of New York, and the Society of American Foresters. Presented were papers on the methodology of sample tree selection, tree biomass measurement, construction of biomass tables and estimation of their error, and combining the error of biomass tables with that of the sample plots or...

  8. Comparative study of biodegradability prediction of chemicals using decision trees, functional trees, and logistic regression.

    PubMed

    Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M

    2014-12-01

    Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods for assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out using 2 independent test sets of 777 and 27 chemicals. The functional inner regression tree exhibited the best predictability, with predictive accuracies of 81.5% and 81.0% on the training set (825 chemicals) and test set I (777 chemicals), respectively. The performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, and this comparison also showed better predictability for the functional inner regression tree model. The model built in the present study exhibits reasonable predictability compared with existing models while possessing a transparent algorithm. The mechanisms of biodegradation were also interpreted based on the models developed. © 2014 SETAC.
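C4.5 chooses splits by gain ratio, which is information gain divided by the split information of the attribute. The minimal Python sketch below computes this quantity for a categorical attribute. It is a generic illustration, not the authors' implementation. Continuous descriptors (which C4.5 handles with thresholds), missing values, and pruning are omitted, and any example attribute, such as the presence of a functional group, is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(attribute: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of splitting on `attribute`, normalised by split information.

    Split information is the entropy of the attribute's own value distribution; it
    penalises attributes that fragment the data into many small subsets.
    """
    n = len(labels)
    subsets: dict = {}
    for a, y in zip(attribute, labels):
        subsets.setdefault(a, []).append(y)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    gain = entropy(labels) - remainder
    split_info = entropy(attribute)
    return gain / split_info if split_info > 0 else 0.0
```

An attribute that perfectly separates readily and non-readily biodegradable compounds into two equal groups scores 1.0. An attribute that is unrelated to the class scores 0.0.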

  9. Tree planting in the Allegheny section

    Treesearch

    Northeastern Forest Experiment Station

    1961-01-01

    Tree planting involves many considerations - site classification, selection of species, planting practices, and protection from fire, insects, and diseases. The information about many of these aspects of planting is scattered and fragmentary.

  10. Rapid Erosion Modeling in a Western Kenya Watershed using Visible Near Infrared Reflectance, Classification Tree Analysis and 137Cesium.

    PubMed

    deGraffenried, Jeff B; Shepherd, Keith D

    2009-12-15

    Human-induced soil erosion has severe economic and environmental impacts throughout the world. It is more severe in the tropics than elsewhere and results in diminished food production and security. Kenya has limited arable land, and 30 percent of the country experiences severe to very severe human-induced soil degradation. The purpose of this research was to test visible near-infrared diffuse reflectance spectroscopy (VNIR) as a tool for rapid assessment and benchmarking of soil condition and erosion severity class. The study was conducted in the Saiwa River watershed in the northern Rift Valley Province of western Kenya, a tropical highland area. Soil 137Cs concentration was measured to validate the spectrally derived erosion classes and to establish background levels for different land use types. Results indicate that VNIR could be used to accurately evaluate a large and diverse soil data set and predict soil erosion characteristics. Soil condition was spectrally assessed and modeled. Analysis of mean raw spectra indicated significant reflectance differences between soil erosion classes. The largest differences occurred between 1,350 and 1,950 nm, with the largest separation at 1,920 nm. Classification and Regression Tree (CART) analysis indicated that the spectral model had practical predictive success (72%), with an area under the Receiver Operating Characteristic (ROC) curve of 0.74. The change in 137Cs concentrations supported the premise that VNIR is an effective tool for rapid screening of soil erosion condition.
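A single ROC figure such as the 0.74 reported here (or the 0.81 in the HPAI H5N1 study) is normally the area under the ROC curve (AUC). One way to compute the AUC, shown as a minimal Python sketch, uses the Mann-Whitney formulation: the AUC is the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counting one half. The scores and labels in the example are hypothetical, and the quadratic loop is for clarity, not speed.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic (labels are 0/1)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0  # positive ranked above negative
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

For instance, `roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` gives 0.75, because three of the four positive/negative pairs are ordered correctly.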

  11. Development of an automated ultrasonic testing system

    NASA Astrophysics Data System (ADS)

    Shuxiang, Jiao; Wong, Brian Stephen

    2005-04-01

    Non-destructive testing is necessary where defects in structures emerge over time through wear and tear and structural integrity must be maintained to keep the structure usable. However, manual testing has many limitations: high training costs, long training procedures, and, worse, inconsistent test results. A prime objective of this project is to develop an automatic non-destructive testing system for the wheel-axle shaft of a railway carriage. Various methods, such as neural networks, pattern recognition methods, and knowledge-based systems, are used for this artificial intelligence problem. In this paper, a statistical pattern recognition approach, the Classification Tree, is applied. Before feature selection, a thorough study of the ultrasonic signals produced was carried out. Based on this analysis, three signal processing methods were developed to enhance the ultrasonic signals: cross-correlation, zero-phase filtering, and averaging. The aim of this step is to reduce noise and make the signal characteristics more distinguishable. Four features are selected: (1) autoregressive model coefficients, (2) standard deviation, (3) Pearson correlation, and (4) dispersion uniformity degree. A Classification Tree is then created and applied to recognize peak positions and amplitudes. A local-maximum search is carried out before feature computation, which substantially reduces computation time in real-time testing. Based on this algorithm, a software package called SOFRA was developed to recognize peaks, calibrate automatically, and test a simulated shaft automatically, and the corresponding automatic calibration and automatic shaft testing procedures were developed.
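Two of the enhancement steps named above, zero-phase filtering and averaging, can be illustrated with a minimal Python sketch. The paper's actual filter design is not stated here. As an assumption, the sketch uses a first-order exponential smoother run forward and then backward, so that the phase shifts of the two passes cancel and peaks stay in place. Averaging is a sample-by-sample mean of repeated, time-aligned traces. Cross-correlation alignment is not shown, and the smoothing constant `alpha` is a hypothetical default.

```python
from typing import List, Sequence


def exp_smooth(x: Sequence[float], alpha: float) -> List[float]:
    """Causal first-order low-pass filter: y[n] = alpha*x[n] + (1-alpha)*y[n-1]."""
    y: List[float] = []
    prev = x[0]  # start from the first sample to avoid a start-up transient
    for v in x:
        prev = alpha * v + (1.0 - alpha) * prev
        y.append(prev)
    return y


def zero_phase_smooth(x: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Filter forward, then backward, so that the net phase shift is zero."""
    if not x:
        return []
    forward = exp_smooth(x, alpha)
    backward = exp_smooth(forward[::-1], alpha)
    return backward[::-1]


def average_traces(traces: Sequence[Sequence[float]]) -> List[float]:
    """Sample-by-sample mean of repeated, time-aligned A-scans; random noise shrinks."""
    n = len(traces)
    return [sum(samples) / n for samples in zip(*traces)]
```

A one-pass causal filter would delay every echo. Running it in both directions keeps each peak at its original sample index, which matters when peak positions are later used as features.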

  12. Evaluation of the Nutritional Changes Caused by Huanglongbing (HLB) to Citrus Plants Using Laser-Induced Breakdown Spectroscopy.

    PubMed

    Ranulfi, Anielle Coelho; Romano, Renan Arnon; Bebeachibuli Magalhães, Aida; Ferreira, Ednaldo José; Ribeiro Villas-Boas, Paulino; Marcondes Bastos Pereira Milori, Débora

    2017-07-01

    Huanglongbing (HLB) is the most recent and destructive bacterial disease of citrus and has no cure yet. A promising alternative to conventional methods is to use laser-induced breakdown spectroscopy (LIBS), a multi-elemental analytical technique, to identify the nutritional changes the disease provokes in citrus leaves and to associate the mineral composition profile with health status. The leaves were collected from adult citrus trees and identified by visual inspection as healthy, HLB-symptomatic, or HLB-asymptomatic. LIBS measurements were performed on fresh leaves without sample preparation. Nutritional variations were evaluated using statistical tools, such as Student's t-test and analysis of variance applied to the LIBS spectra; the largest variations were found for Ca, Mg, and K. Based on these nutritional profile changes, a classifier induced by classification via regression combined with partial least squares regression was built, achieving an accuracy of 73% in distinguishing the three categories of leaves.

  13. Modeling the prediction of business intelligence system effectiveness.

    PubMed

    Weng, Sung-Shun; Yang, Ming-Hsien; Koo, Tian-Lih; Hsiao, Pei-I

    2016-01-01

    Although business intelligence (BI) technologies are continually evolving, the capability to apply them has become an indispensable resource for enterprises operating in today's complex, uncertain, and dynamic business environment. This study performed pioneering work by constructing models and rules for predicting business intelligence system effectiveness (BISE) in relation to the implementation of BI solutions. For enterprises, effectively managing the critical attributes that determine BISE, in order to develop prediction models with a set of rules for self-evaluating the effectiveness of BI solutions, is necessary to improve BI implementation and ensure its success. The main findings identified the critical indicators of BISE that are important for forecasting BI performance and highlighted five classification and prediction rules of BISE derived from decision tree structures, as well as a refined regression prediction model with four critical prediction indicators constructed by logistic regression analysis. These results can help enterprises improve BISE while effectively managing BI solution implementation, and they also address academics for whom theory is important.

  14. Unrealistic phylogenetic trees may improve phylogenetic footprinting.

    PubMed

    Nettling, Martin; Treutler, Hendrik; Cerquides, Jesus; Grosse, Ivo

    2017-06-01

    The computational investigation of DNA binding motifs from binding sites is one of the classic tasks in bioinformatics and a prerequisite for understanding gene regulation as a whole. Owing to the development of sequencing technologies and the increasing number of available genomes, approaches based on phylogenetic footprinting are becoming increasingly attractive. Phylogenetic footprinting requires phylogenetic trees with attached substitution probabilities for quantifying the evolution of binding sites, but these trees and substitution probabilities are typically not known and cannot be estimated easily. Here, we investigate the influence of phylogenetic trees with different substitution probabilities on the classification performance of phylogenetic footprinting using synthetic and real data. For synthetic data, we find that the classification performance is highest when the substitution probability used for phylogenetic footprinting is similar to that used for data generation. For real data, however, we typically find that the classification performance of phylogenetic footprinting surprisingly increases with increasing substitution probabilities and is often highest for unrealistically high substitution probabilities close to one. This finding suggests that choosing realistic model assumptions might not always yield optimal predictions in general and that choosing unrealistically high substitution probabilities close to one might actually improve the classification performance of phylogenetic footprinting. The proposed PF is implemented in Java and can be downloaded from https://github.com/mgledi/PhyFoo. Contact: martin.nettling@informatik.uni-halle.de. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.

  15. Estimating population extinction thresholds with categorical classification trees for Louisiana black bears

    USGS Publications Warehouse

    Laufenberg, Jared S.; Clark, Joseph D.; Chandler, Richard B.

    2018-01-01

    Monitoring vulnerable species is critical for their conservation. Thresholds or tipping points are commonly used to indicate when populations become vulnerable to extinction and to trigger changes in conservation actions. However, quantitative methods to determine such thresholds have not been well explored. The Louisiana black bear (Ursus americanus luteolus) was removed from the list of threatened and endangered species under the U.S. Endangered Species Act in 2016, and our objectives were to determine the most appropriate parameters and thresholds for monitoring and management action. Capture-mark-recapture (CMR) data from 2006 to 2012 were used to estimate population parameters and variances. We used stochastic population simulations and conditional classification trees to identify the demographic rates for monitoring that would be most indicative of heightened extinction risk. We then identified thresholds that would be reliable predictors of population viability. Conditional classification trees indicated that the annual apparent survival rate for adult females averaged over 5 years ([Formula: see text]) was the best predictor of population persistence. Specifically, population persistence was estimated to be ≥95% over 100 years when [Formula: see text], suggesting that this statistic can be used as a threshold to trigger management intervention. Our evaluation produced monitoring protocols that reliably predicted population persistence and were cost-effective. We conclude that population projections and conditional classification trees can be valuable tools for identifying extinction thresholds used in monitoring programs.

  16. Estimating population extinction thresholds with categorical classification trees for Louisiana black bears.

    PubMed

    Laufenberg, Jared S; Clark, Joseph D; Chandler, Richard B

    2018-01-01

    Monitoring vulnerable species is critical for their conservation. Thresholds or tipping points are commonly used to indicate when populations become vulnerable to extinction and to trigger changes in conservation actions. However, quantitative methods to determine such thresholds have not been well explored. The Louisiana black bear (Ursus americanus luteolus) was removed from the list of threatened and endangered species under the U.S. Endangered Species Act in 2016, and our objectives were to determine the most appropriate parameters and thresholds for monitoring and management action. Capture-mark-recapture (CMR) data from 2006 to 2012 were used to estimate population parameters and variances. We used stochastic population simulations and conditional classification trees to identify the demographic rates for monitoring that would be most indicative of heightened extinction risk. We then identified thresholds that would be reliable predictors of population viability. Conditional classification trees indicated that the annual apparent survival rate for adult females averaged over 5 years ([Formula: see text]) was the best predictor of population persistence. Specifically, population persistence was estimated to be ≥95% over 100 years when [Formula: see text], suggesting that this statistic can be used as a threshold to trigger management intervention. Our evaluation produced monitoring protocols that reliably predicted population persistence and were cost-effective. We conclude that population projections and conditional classification trees can be valuable tools for identifying extinction thresholds used in monitoring programs.

  17. Wetland habitat disturbance best predicts metrics of an amphibian index of biotic integrity

    USGS Publications Warehouse

    Stapanian, Martin A.; Micacchion, Mick; Adams, Jean V.

    2015-01-01

    Regression and classification trees were used to identify the best predictors of the five component metrics of the Ohio Amphibian Index of Biotic Integrity (AmphIBI) in 54 wetlands in Ohio, USA. Of the 17 wetland- and surrounding landscape-scale variables considered, the best predictor for all AmphIBI metrics was habitat alteration and development within the wetland. The results were qualitatively similar to the best predictors for a wetland vegetation index of biotic integrity, suggesting that similar management practices (e.g., reducing or eliminating nutrient enrichment from agriculture, mowing, grazing, logging, and removing down woody debris) within the boundaries of the wetland can be applied to effectively increase the quality of wetland vegetation and amphibian communities.
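For a continuous response such as an AmphIBI metric, a regression tree splits where the combined sum of squared errors (SSE) of the two child nodes is smallest. The minimal Python sketch below shows this split search for a single numeric predictor. It is a generic illustration, not the authors' software, and the example values (for instance, a disturbance score paired with a metric value) are hypothetical.

```python
from typing import List, Optional, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_regression_split(x: List[float], y: List[float]) -> Optional[Tuple[float, float]]:
    """Return (threshold, total child SSE) for the split of x that most reduces SSE in y."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # tied predictor values cannot be separated
        t = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        score = sse(left) + sse(right)
        if best is None or score < best[1]:
            best = (t, score)
    return best
```

Running the search over every candidate predictor and keeping the one with the lowest child SSE identifies the "best predictor" at the root of the tree. The same search is then repeated within each child node.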

  18. Mapping trees outside forests using high-resolution aerial imagery: a comparison of pixel- and object based classification approaches

    Treesearch

    Dacia M. Meneguzzo; Greg C. Liknes; Mark D. Nelson

    2013-01-01

    Discrete trees and small groups of trees in nonforest settings are considered an essential resource around the world and are collectively referred to as trees outside forests (ToF). ToF provide important functions across the landscape, such as protecting soil and water resources, providing wildlife habitat, and improving farmstead energy efficiency and aesthetics....

  19. A modified tree classification for use in growth studies and timber marking in Black Hills ponderosa pine

    Treesearch

    E. M. Hornibrook

    1939-01-01

    A satisfactory silvicultural management of ponderosa pine stands requires a judicious selection of trees to be left in the reserve stand. The timber marker must know what type of tree has the greatest growth potentialities and what type of tree will respond but slightly upon being released. The silvicultural problem in marking therefore is one of recognizing the...

  20. Automated structural classification of lipids by machine learning.

    PubMed

    Taylor, Ryan; Miller, Ryan H; Miller, Ryan D; Porter, Michael; Dalgleish, James; Prince, John T

    2015-03-01

    Modern lipidomics is largely dependent upon structural ontologies because of the great diversity exhibited in the lipidome, but no automated lipid classification exists to facilitate this partitioning. The size of the putative lipidome far exceeds the number currently classified, despite a decade of work. Automated classification would benefit ongoing classification efforts by decreasing the time needed and increasing the accuracy of classification while providing classifications for mass spectral identification algorithms. We introduce a tool that automates classification into the LIPID MAPS ontology of known lipids with >95% accuracy and novel lipids with 63% accuracy. The classification is based upon simple chemical characteristics and modern machine learning algorithms. The decision trees produced are intelligible and can be used to clarify implicit assumptions about the current LIPID MAPS classification scheme. These characteristics and decision trees are made available to facilitate alternative implementations. We also discovered many hundreds of lipids that are currently misclassified in the LIPID MAPS database, strongly underscoring the need for automated classification. Source code and chemical characteristic lists as SMARTS search strings are available under an open-source license at https://www.github.com/princelab/lipid_classifier. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  1. Multi-phenology WorldView-2 imagery improves remote sensing of savannah tree species

    NASA Astrophysics Data System (ADS)

    Madonsela, Sabelo; Cho, Moses Azong; Mathieu, Renaud; Mutanga, Onisimo; Ramoelo, Abel; Kaszta, Żaneta; Kerchove, Ruben Van De; Wolff, Eléonore

    2017-06-01

    Biodiversity mapping in African savannah is important for monitoring changes and ensuring sustainable use of ecosystem resources. Biodiversity mapping can benefit from multi-spectral instruments such as WorldView-2, which has very high spatial resolution and a spectral configuration encompassing important spectral regions not previously available for vegetation mapping. This study investigated i) the benefits of the eight-band WorldView-2 (WV-2) spectral configuration for discriminating tree species in Southern African savannah and ii) whether multiple images acquired at key points of the typical phenological development of savannahs (peak productivity, transition to senescence) improve tree species classifications. We first assessed the discriminatory power of WV-2 bands using the interspecies Spectral Angle Mapper (SAM) via a Band Add-On procedure and tested the spectral capability of WorldView-2 against simulated IKONOS for tree species classification. The results from the interspecies SAM procedure identified the yellow and red bands as the most statistically significant bands (p = 0.000251 and p = 0.000039, respectively) in the discriminatory power of WV-2 during the transition from wet to dry season (April). Using a Random Forest classifier, the classification scenarios investigated showed that i) the eight bands of the WV-2 sensor achieved higher classification accuracy for the April date (transition from wet to dry season, senescence) than for the March date (peak productivity season), ii) the WV-2 spectral configuration systematically outperformed the IKONOS sensor spectral configuration, and iii) the multi-temporal approach (March and April combined) improved the discrimination of tree species and produced the highest overall accuracy, at 80.4%. Consistent with the interspecies SAM procedure, the yellow (605 nm) band also showed a statistically significant contribution to the improved classification accuracy from WV-2.
    These results highlight the mapping opportunities presented by WV-2 data for monitoring the distribution status of, e.g., species often harvested by local communities (e.g. Sclerocarya birrea), encroaching species, or species-specific tree losses induced by elephants.

  2. Hierarchical classification with a competitive evolutionary neural tree.

    PubMed

    Adams, R G.; Butchart, K; Davey, N

    1999-04-01

    A new, dynamic, tree-structured network, the Competitive Evolutionary Neural Tree (CENT), is introduced. The network is able to provide a hierarchical classification of unlabelled data sets. The main advantage that the CENT offers over other hierarchical competitive networks is its ability to self-determine the number and structure of the competitive nodes in the network, without the need for externally set parameters. The network produces stable classificatory structures by halting its growth using locally calculated heuristics. The results of network simulations are presented over a range of data sets, including Anderson's Iris data set. The CENT network demonstrates its ability to produce a representative hierarchical structure to classify a broad range of data sets.

  3. Dynamic travel time estimation using regression trees.

    DOT National Transportation Integrated Search

    2008-10-01

    This report presents a methodology for travel time estimation by using regression trees. The dissemination of travel time information has become crucial for effective traffic management, especially under congested road conditions. In the absence of c...

  4. Differences in forest area classification based on tree tally from variable- and fixed-radius plots

    Treesearch

    David Azuma; Vicente J. Monleon

    2011-01-01

    In forest inventory, it is not enough to formulate a definition; it is also necessary to define the "measurement procedure." In the classification of forestland by dominant cover type, the measurement design (the plot) can affect the outcome of the classification. We present results of a simulation study comparing classification of the dominant cover type...

  5. An analysis of tree mortality using high resolution remotely-sensed data for mixed-conifer forests in San Diego county

    NASA Astrophysics Data System (ADS)

    Freeman, Mary Pyott

    The montane mixed-conifer forests of San Diego County are currently experiencing extensive tree mortality, defined here as dieback in which whole stands are affected. This mortality is likely the result of the complex interaction of many variables, such as altered fire regimes, climatic conditions such as drought, forest pathogens, and past management strategies. Conifer tree mortality, its spatial pattern, and its change over time were examined in three components. In component 1, two remote sensing approaches, a spatial contextual approach and an OBIA (object-based image analysis) approach, were compared for their effectiveness in delineating dead trees, using various dates and spatial resolutions of airborne image data. For each approach, transforms and masking techniques were explored and found to improve classifications, and an object-based assessment approach was tested. In component 2, dead tree maps produced by the most effective techniques from component 1 were used for point pattern and vector analyses to further understand spatio-temporal changes in tree mortality for the years 1997, 2000, 2002, and 2005 in three study areas: Palomar, Volcan, and Laguna mountains. Plot-based fieldwork was conducted to further assess mortality patterns. Results indicate that conifer mortality was significantly clustered, increased substantially between 2002 and 2005, and was non-random with respect to tree species and diameter class. In component 3, multiple environmental variables were used to develop a Generalized Linear Model (GLM; logistic regression) and a decision tree classifier, revealing the importance of climatic and topographic factors, such as precipitation and elevation, for predicting areas at high risk of tree mortality.
The results from this study highlight the importance of multi-scale spatial as well as temporal analyses, in order to understand mixed-conifer forest structure, dynamics, and processes of decline, which can lead to more sustainable management of forests with continued natural and anthropogenic disturbance.

  6. Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications.

    PubMed

    Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi

    2017-11-02

    New data mining algorithms continue to be proposed as related disciplines develop, and these algorithms differ in their applicable scope and performance. Hence, finding a suitable algorithm for a dataset has become an important concern for biomedical researchers seeking to solve practical problems promptly. In this paper, seven sophisticated, widely used algorithms, namely C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to classification tasks on the 12 most frequently accessed ("top-click") UCI public datasets, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, sample size of each class, correlation coefficients between variables, class entropy of the task variable, and ratio of the sample size of the largest class to that of the smallest class were calculated to characterize the 12 datasets. The two ensemble algorithms reached high classification accuracy on most datasets. Moreover, random forest performed better than AdaBoost on the unbalanced multi-class dataset. Simple algorithms, such as naïve Bayes and logistic regression, are suitable for small datasets with high correlation between the task variable and the other, non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on both binary- and multi-class task datasets. Support vector machine is better suited to small, balanced binary-class datasets. No algorithm maintains the best performance on all datasets. The applicability of the seven data mining algorithms to datasets with different characteristics was summarized to provide a reference for biomedical researchers and beginners in different fields.
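Two of the dataset characteristics listed above, the class entropy of the task variable and the largest-to-smallest class ratio, can be computed directly from the class labels. The minimal Python sketch below assumes Shannon entropy in bits without normalisation. The paper may have normalised the entropy differently, for example by log2 of the number of classes, so treat this as an assumption.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def class_entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (bits) of the class distribution of the task variable."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def imbalance_ratio(labels: Sequence[Hashable]) -> float:
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)
```

A balanced binary task has an entropy of 1 bit and a ratio of 1. Skewed class distributions lower the entropy and raise the ratio, which is the kind of "unbalanced" setting in which the abstract reports random forest outperforming AdaBoost.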

  7. Predicting acute aquatic toxicity of structurally diverse chemicals in fish using artificial intelligence approaches.

    PubMed

    Singh, Kunwar P; Gupta, Shikha; Rai, Premanjali

    2013-09-01

    The research aims to develop global modeling tools capable of categorizing structurally diverse chemicals into various toxicity classes according to the EEC and European Community directives, and of predicting their acute toxicity in fathead minnow using a set of selected molecular descriptors. Accordingly, classification and regression models based on artificial intelligence approaches, namely probabilistic neural networks (PNN), generalized regression neural networks (GRNN), multilayer perceptron neural network (MLPN), radial basis function neural network (RBFN), support vector machines (SVM), gene expression programming (GEP), and decision tree (DT), were constructed using the experimental toxicity data. Diversity and non-linearity in the chemicals' data were tested using the Tanimoto similarity index and Brock-Dechert-Scheinkman statistics. The predictive and generalization abilities of the models constructed here were compared using several statistical parameters. PNN and GRNN models performed relatively better than MLPN, RBFN, SVM, GEP, and DT. In both two- and four-category classifications, PNN yielded considerably high classification accuracy in the training (95.85 percent and 90.07 percent) and validation data (91.30 percent and 86.96 percent), respectively. GRNN rendered a high correlation between the measured and model-predicted -log LC50 values for both the training (0.929) and validation (0.910) data, and low prediction errors (RMSE) of 0.52 and 0.49 for the two sets. The efficiency of the selected PNN and GRNN models in predicting the acute toxicity of new chemicals was adequately validated using external datasets of different fish species (fathead minnow, bluegill, trout, and guppy). The PNN and GRNN models showed good predictive and generalization abilities and can be used as tools for predicting the toxicities of structurally diverse chemical compounds. Copyright © 2013 Elsevier Inc. All rights reserved.
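The Tanimoto index mentioned above compares two binary molecular fingerprints by dividing the number of shared "on" bits by the number of bits set in either fingerprint. The minimal Python sketch below represents each fingerprint as a set of bit positions and averages the index over all pairs as a rough indicator of structural diversity. The bit positions are hypothetical, not the descriptors used in the study, and treating two empty fingerprints as identical is an assumed convention.

```python
from itertools import combinations
from typing import AbstractSet, Sequence


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto similarity of two binary fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 1.0  # assumed convention: two empty fingerprints count as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def mean_pairwise_similarity(fingerprints: Sequence[AbstractSet[int]]) -> float:
    """Average Tanimoto similarity over all pairs; low values suggest a diverse set."""
    pairs = list(combinations(fingerprints, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)
```

For example, fingerprints {1, 2, 3} and {2, 3, 4} share two of four distinct bits, giving a similarity of 0.5.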

  8. Field responses of Prunus serotina and Asclepias syriaca to ozone around southern Lake Michigan.

    PubMed

    Bennett, J P; Jepsen, E A; Roth, J A

    2006-07-01

    Higher ozone concentrations east of southern Lake Michigan compared with west of the lake were used to test hypotheses about injury and growth effects on two plant species. We measured approximately 1000 black cherry trees and over 3000 milkweed stems from 1999 to 2001 for this purpose. Black cherry branch elongation and milkweed growth and pod formation were significantly higher west of Lake Michigan, while ozone injury was greater east of Lake Michigan. Using classification and regression tree (CART) analyses, we determined that departures from normal precipitation, soil nitrogen, and ozone exposure/peak hourly concentrations were the most important variables affecting cherry branch elongation and milkweed stem height and pod formation. The effects of ozone were not consistently comparable with the effects of soil nutrients, weather, insect or disease injury, and depended on species. Ozone SUM06 exposures greater than 13 ppm-h decreased cherry branch elongation by 18%; peak 1-h exposures greater than 93 ppb reduced milkweed stem height by 13%; and peak 1-h concentrations greater than 98 ppb reduced milkweed pod formation by 11%.
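The SUM06 index used in the split above is commonly defined as the sum of all hourly average ozone concentrations at or above 0.06 ppm over a chosen period, expressed in ppm-h. The minimal Python sketch below computes that sum, together with the peak 1-h concentration in ppb. The abstract does not state the averaging window (for example, daylight hours of the growing season), so selecting the hours is left to the caller.

```python
from typing import Sequence


def sum06(hourly_ppm: Sequence[float], cutoff: float = 0.06) -> float:
    """SUM06 exposure index (ppm-h): sum of hourly averages at or above the cutoff.

    Hours below the cutoff contribute nothing; the caller chooses which hours to pass in.
    """
    return sum(c for c in hourly_ppm if c >= cutoff)


def peak_hourly_ppb(hourly_ppm: Sequence[float]) -> float:
    """Highest 1-h average concentration, converted from ppm to ppb."""
    return max(hourly_ppm) * 1000.0
```

With a site's hourly series, `sum06(series) > 13` and `peak_hourly_ppb(series) > 93` would reproduce the kind of threshold tests that the CART splits above describe.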

  9. Analyzing tree-shape anatomical structures using topological descriptors of branching and ensemble of classifiers.

    PubMed

    Skoura, Angeliki; Bakic, Predrag R; Megalooikonomou, Vasilis

    2013-01-01

    The analysis of anatomical tree-shape structures visualized in medical images provides insight into the relationship between tree topology and pathology of the corresponding organs. In this paper, we propose three methods to extract descriptive features of the branching topology: the asymmetry index, the encoding of branching patterns using a node labeling scheme, and an extension of the Sholl analysis. Based on these descriptors, we present classification schemes for tree topologies with respect to the underlying pathology. Moreover, we present a classifier ensemble approach that combines the predictions of the individual classifiers to optimize classification accuracy. We applied the proposed methodology to a dataset of x-ray galactograms, medical images that visualize the breast ductal tree, in order to recognize images with radiological findings regarding breast cancer. The experimental results demonstrate the effectiveness of the proposed framework compared to state-of-the-art techniques, suggesting that the proposed descriptors provide more valuable information regarding the topological patterns of ductal trees and indicating their potential to facilitate early breast cancer diagnosis.

  10. Analyzing tree-shape anatomical structures using topological descriptors of branching and ensemble of classifiers

    PubMed Central

    Skoura, Angeliki; Bakic, Predrag R.; Megalooikonomou, Vasilis

    2014-01-01

    The analysis of anatomical tree-shape structures visualized in medical images provides insight into the relationship between tree topology and pathology of the corresponding organs. In this paper, we propose three methods to extract descriptive features of the branching topology: the asymmetry index, the encoding of branching patterns using a node labeling scheme, and an extension of the Sholl analysis. Based on these descriptors, we present classification schemes for tree topologies with respect to the underlying pathology. Moreover, we present a classifier ensemble approach that combines the predictions of the individual classifiers to optimize classification accuracy. We applied the proposed methodology to a dataset of x-ray galactograms, medical images that visualize the breast ductal tree, in order to recognize images with radiological findings regarding breast cancer. The experimental results demonstrate the effectiveness of the proposed framework compared to state-of-the-art techniques, suggesting that the proposed descriptors provide more valuable information regarding the topological patterns of ductal trees and indicating their potential to facilitate early breast cancer diagnosis. PMID:25414850

  11. Statistical analysis of texture in trunk images for biometric identification of tree species.

    PubMed

    Bressane, Adriano; Roveda, José A F; Martins, Antônio C G

    2015-04-01

    The identification of tree species is a key step in sustainable management plans for forest resources, as well as in several other applications based on such surveys. However, currently available techniques depend on the presence of tree structures such as flowers, fruits, and leaves, limiting identification to certain periods of the year. Therefore, this article presents a study on the application of statistical parameters to the texture classification of tree trunk images. For that purpose, 540 samples from five Brazilian native deciduous species were acquired, and measures of entropy, uniformity, smoothness, asymmetry (third moment), mean, and standard deviation were obtained from the textures. Using a decision tree, a biometric species identification system was constructed, resulting in a 0.84 average precision rate for species classification, with 0.83 accuracy and 0.79 agreement. Thus, the use of texture in trunk images can represent an important advance in tree identification, since it overcomes the limitations of current techniques.
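    The six texture measures listed above are first-order statistics of a grayscale patch's intensity histogram. The sketch below assumes the common textbook histogram definitions (smoothness as 1 - 1/(1 + variance), with variance scaled by (levels - 1)^2); the study does not state its exact normalization, so treat these formulas as an assumption. The resulting feature vector is what a decision tree would then split on.

```python
import math
from collections import Counter


def texture_stats(pixels, levels=256):
    """First-order (histogram-based) texture descriptors of a grayscale patch.

    pixels: flat list of integer gray levels in [0, levels - 1].
    """
    n = len(pixels)
    p = {z: c / n for z, c in Counter(pixels).items()}  # normalized histogram
    mean = sum(z * pz for z, pz in p.items())
    var = sum((z - mean) ** 2 * pz for z, pz in p.items())
    var_norm = var / (levels - 1) ** 2  # scale variance to [0, 1] before computing smoothness
    return {
        "mean": mean,
        "std": math.sqrt(var),
        "smoothness": 1 - 1 / (1 + var_norm),
        "third_moment": sum((z - mean) ** 3 * pz for z, pz in p.items()),
        "uniformity": sum(pz ** 2 for pz in p.values()),
        "entropy": -sum(pz * math.log2(pz) for pz in p.values()),
    }


# Hypothetical 4 x 4 patch alternating between black and white pixels.
print(texture_stats([0, 255] * 8))
```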

  12. Object based technique for delineating and mapping 15 tree species using VHR WorldView-2 imagery

    NASA Astrophysics Data System (ADS)

    Mustafa, Yaseen T.; Habeeb, Hindav N.

    2014-10-01

    Monitoring and analyzing forests and trees are required tasks for managing forests and establishing good plans for their sustainability. Achieving this requires collecting information and data on the trees, and satellite remote sensing is the fastest and a relatively low-cost technique for doing so. In this study, we proposed an approach to identify and map 15 tree species in the Mangish sub-district, Kurdistan Region-Iraq. Image-objects (IOs) were used as the tree species mapping unit. This was achieved using the shadow index, normalized difference vegetation index and texture measurements. Four classification methods (Maximum Likelihood, Mahalanobis Distance, Neural Network, and Spectral Angle Mapper) were used to classify IOs using selected IO features derived from WorldView-2 imagery. Results showed that the Neural Network method increased overall accuracy by 5–8% compared with the other methods, with a Kappa coefficient of 69%. Applying the Neural Network method with IO techniques to WorldView-2 imagery therefore gives reasonable results for classifying various tree species.
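    The Kappa coefficient reported above (69%, i.e. 0.69) measures agreement between the classified map and reference data after discounting agreement expected by chance. A minimal sketch of Cohen's kappa, assuming a square confusion matrix with reference classes as rows; the counts are hypothetical.

```python
def cohen_kappa(cm):
    """Cohen's kappa from a square confusion matrix (rows = reference, cols = predicted)."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(k)) / total  # overall accuracy
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / total ** 2  # chance agreement
    return (observed - expected) / (1 - expected)


# Hypothetical two-class confusion matrix.
print(cohen_kappa([[20, 5], [10, 15]]))
```

    Here overall accuracy is 0.70 but chance agreement is 0.50, so kappa is only about 0.40, which is why kappa is usually reported alongside accuracy.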

  13. Using nonlinear quantile regression to estimate the self-thinning boundary curve

    Treesearch

    Quang V. Cao; Thomas J. Dean

    2015-01-01

    The relationship between tree size (quadratic mean diameter) and tree density (number of trees per unit area) has been a topic of research and discussion for many decades. Starting with Reineke in 1933, the maximum size-density relationship, on a log-log scale, has been assumed to be linear. Several techniques, including linear quantile regression, have been employed...
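    Quantile regression fits a boundary such as the self-thinning line by minimizing the check (pinball) loss at a high quantile τ instead of squared error. The sketch below shows only that loss and a brute-force constant fit, whose minimizer is an empirical τ-quantile; the record's nonlinear model form is not given here, and the data are hypothetical.

```python
def check_loss(residual, tau):
    """Quantile-regression check (pinball) loss for one residual r = y - fitted."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def best_constant_quantile(ys, tau):
    """Brute-force the constant c among the observations that minimizes summed check loss."""
    return min(ys, key=lambda c: sum(check_loss(y - c, tau) for y in ys))


ys = list(range(1, 11))  # hypothetical responses 1..10
print(best_constant_quantile(ys, 0.85))  # 9
```

    Replacing the constant with a linear or nonlinear function of log density, and minimizing the same loss over its parameters, gives the linear or nonlinear quantile regression boundary.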

  14. Evolving optimised decision rules for intrusion detection using particle swarm paradigm

    NASA Astrophysics Data System (ADS)

    Sivatha Sindhu, Siva S.; Geetha, S.; Kannan, A.

    2012-12-01

    The aim of this article is to construct a practical intrusion detection system (IDS) that properly analyses the statistics of network traffic patterns and classifies them as normal or anomalous. The objective is to show that the choice of effective network traffic features and a proficient machine-learning paradigm enhances the detection accuracy of an IDS. A rule-based approach is introduced that uses a family of six decision tree classifiers, namely Decision Stump, C4.5, Naive Bayes Tree, Random Forest, Random Tree and Representative Tree, to detect anomalous network patterns. In particular, the proposed swarm-optimisation-based approach selects the instances that compose the training set, and the optimised decision tree operates over this training set, producing classification rules with improved coverage, classification capability and generalisation ability. Experiments with the Knowledge Discovery and Data mining (KDD) data set, which contains information on traffic patterns during normal and intrusive behaviour, show that the proposed algorithm produces optimised decision rules and outperforms other machine-learning algorithms.
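    Among the listed classifiers, C4.5 chooses splits on categorical attributes by gain ratio: information gain divided by the entropy of the split itself. A minimal sketch of that criterion only (not the particle-swarm instance selection described above); the feature values and labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature, labels):
    """Information gain of splitting on a categorical feature divided by the split information."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(list(feature))  # penalizes attributes with many distinct values
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical connection records: protocol type vs traffic class.
print(gain_ratio(["tcp", "tcp", "udp", "udp"], ["normal", "normal", "attack", "attack"]))
```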

  15. A Comparative Assessment of the Influences of Human Impacts on Soil Cd Concentrations Based on Stepwise Linear Regression, Classification and Regression Tree, and Random Forest Models

    PubMed Central

    Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S.

    2016-01-01

    Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0–20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R2 value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R2 values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. 
The good performance of the RF model was attributable to its ability to handle the non-linear and hierarchical relationships between soil Cd and environmental variables. These results confirm that the RF approach is promising for the prediction and spatial distribution mapping of soil Cd at the regional scale. PMID:26964095
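    The ME, MAE, RMSE and R2 values above summarize how far predicted soil Cd departs from observed values. A minimal sketch, assuming errors are computed as predicted minus observed and R2 as 1 - SSres/SStot; the abstract does not state either convention, and some studies instead report the squared correlation. The values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """ME, MAE, RMSE and R2 for paired observations and predictions."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]  # sign convention: predicted minus observed
    mean_obs = sum(observed) / n
    ss_res = sum(e ** 2 for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "ME": sum(errors) / n,  # bias; positive means over-prediction
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(ss_res / n),
        "R2": 1 - ss_res / ss_tot,
    }


# Hypothetical Cd concentrations (mg/kg).
print(regression_metrics([0.1, 0.2, 0.3], [0.1, 0.2, 0.4]))
```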

  16. A Comparative Assessment of the Influences of Human Impacts on Soil Cd Concentrations Based on Stepwise Linear Regression, Classification and Regression Tree, and Random Forest Models.

    PubMed

    Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S

    2016-01-01

    Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0-20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R2 value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R2 values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. 
The good performance of the RF model was attributable to its ability to handle the non-linear and hierarchical relationships between soil Cd and environmental variables. These results confirm that the RF approach is promising for the prediction and spatial distribution mapping of soil Cd at the regional scale.

  17. A new tool for post-AGB SED classification

    NASA Astrophysics Data System (ADS)

    Bendjoya, P.; Suarez, O.; Galluccio, L.; Michel, O.

    We present the results of an unsupervised classification method applied to a set of 344 spectral energy distributions (SEDs) of post-AGB stars extracted from the Torun catalogue of Galactic post-AGB stars. The method aims to provide a new, unbiased classification of post-AGB stars based on the information contained in the IR region of the SED (fluxes, IR excess, colours). We used data from the IRAS and MSX satellites and from the 2MASS survey. The classification method is based on constructing a minimal spanning tree (MST) of the dataset with Prim's algorithm. To build this tree, different metrics were tested on both flux and color indices. Our method classifies the set of 344 post-AGB stars into 9 distinct groups according to their SEDs.
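    Prim's algorithm grows a minimal spanning tree from one vertex, repeatedly attaching the nearest vertex not yet in the tree. The sketch below assumes Euclidean distance on hypothetical feature vectors (the study tested several metrics) and forms groups by cutting the k - 1 longest MST edges, a common MST clustering approach that may differ from the authors' grouping step.

```python
import math


def prim_mst(points):
    """Prim's algorithm on the complete Euclidean graph; returns MST edges as (weight, i, j)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight linking each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(points, k):
    """Drop the k - 1 longest MST edges and label the remaining connected components."""
    kept = sorted(prim_mst(points))[: len(points) - k]
    label = list(range(len(points)))

    def find(i):
        while label[i] != i:
            label[i] = label[label[i]]  # path halving
            i = label[i]
        return i

    for _, i, j in kept:
        label[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


# Hypothetical two-colour feature vectors for four stars.
print(mst_clusters([(0, 0), (0, 1), (10, 10), (10, 11)], k=2))
```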

  18. Recruiting Conventional Tree Architecture Models into State-of-the-Art LiDAR Mapping for Investigating Tree Growth Habits in Structure.

    PubMed

    Lin, Yi; Jiang, Miao; Pellikka, Petri; Heiskanen, Janne

    2018-01-01

    Mensuration of tree growth habits is of considerable importance for understanding forest ecosystem processes and forest biophysical responses to climate change. However, the complexity of tree crown morphology, which typically forms over many years of growth, tends to make this a non-trivial task, even for the state-of-the-art 3D forest mapping technology, light detection and ranging (LiDAR). Fortunately, botanists have condensed the large structural diversity of tree forms into a limited number of tree architecture models, which can provide a-priori knowledge about tree structure, growth, and other attributes for different species. This study attempted to recruit Hallé architecture models (HAMs) into LiDAR mapping to investigate tree growth habits in structure. First, following the HAM-characterized rules of tree structure organization, we ran the kernel procedure of tree species classification on the LiDAR-collected point clouds using a support vector machine classifier in leave-one-out cross-validation mode. Then, the HAM corresponding to each classified tree species was identified based on expert knowledge, assisted by comparison of the LiDAR-derived feature parameters. Next, the structural growth habits of each tree species were derived from the determined HAM. For four tree species growing in the boreal environment, the tests indicated that classification accuracy reached 85.0% and that their growth habits could be derived by qualitative and quantitative means. Overall, the strategy of recruiting conventional HAMs into LiDAR mapping for investigating tree growth habits in structure was validated, paving a new way for efficiently reflecting tree growth habits and projecting forest structure dynamics.

  19. Recruiting Conventional Tree Architecture Models into State-of-the-Art LiDAR Mapping for Investigating Tree Growth Habits in Structure

    PubMed Central

    Lin, Yi; Jiang, Miao; Pellikka, Petri; Heiskanen, Janne

    2018-01-01

    Mensuration of tree growth habits is of considerable importance for understanding forest ecosystem processes and forest biophysical responses to climate change. However, the complexity of tree crown morphology, which typically forms over many years of growth, tends to make this a non-trivial task, even for the state-of-the-art 3D forest mapping technology, light detection and ranging (LiDAR). Fortunately, botanists have condensed the large structural diversity of tree forms into a limited number of tree architecture models, which can provide a-priori knowledge about tree structure, growth, and other attributes for different species. This study attempted to recruit Hallé architecture models (HAMs) into LiDAR mapping to investigate tree growth habits in structure. First, following the HAM-characterized rules of tree structure organization, we ran the kernel procedure of tree species classification on the LiDAR-collected point clouds using a support vector machine classifier in leave-one-out cross-validation mode. Then, the HAM corresponding to each classified tree species was identified based on expert knowledge, assisted by comparison of the LiDAR-derived feature parameters. Next, the structural growth habits of each tree species were derived from the determined HAM. For four tree species growing in the boreal environment, the tests indicated that classification accuracy reached 85.0% and that their growth habits could be derived by qualitative and quantitative means. Overall, the strategy of recruiting conventional HAMs into LiDAR mapping for investigating tree growth habits in structure was validated, paving a new way for efficiently reflecting tree growth habits and projecting forest structure dynamics. PMID:29515616
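    Leave-one-out cross-validation holds out each sample once, trains on all the others, and scores the held-out prediction. The sketch below illustrates only the validation loop; to stay self-contained it swaps in a simple nearest-centroid classifier in place of the study's support vector machine, and the tree features and species labels are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Assign x to the class whose mean feature vector is closest (squared Euclidean)."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def leave_one_out_accuracy(X, y, predict=nearest_centroid_predict):
    """Hold out each sample once, train on the rest, and return the fraction predicted correctly."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += predict(train_X, train_y, X[i]) == y[i]
    return correct / len(X)


# Hypothetical crown features (e.g. height ratio, branch angle) for two species.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
y = ["pine", "pine", "pine", "birch", "birch", "birch"]
print(leave_one_out_accuracy(X, y))
```

    Any classifier with the same `predict(train_X, train_y, x)` shape can be plugged in; this signature is a convention chosen for the sketch.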

  20. Potential habitat distribution for the freshwater diatom Didymosphenia geminata in the continental US

    USGS Publications Warehouse

    Kumar, S.; Spaulding, S.A.; Stohlgren, T.J.; Hermann, K.A.; Schmidt, T.S.; Bahls, L.L.

    2009-01-01

    The diatom Didymosphenia geminata is a single-celled alga found in lakes, streams, and rivers. Nuisance blooms of D. geminata affect the diversity, abundance, and productivity of other aquatic organisms. Because D. geminata can be transported by humans on waders and other gear, accurate spatial prediction of habitat suitability is urgently needed for early detection and rapid response, as well as for evaluation of monitoring and control programs. We compared four modeling methods to predict D. geminata's habitat distribution: two methods use presence-absence data (logistic regression and classification and regression tree [CART]), and two use presence-only data (maximum entropy model [Maxent] and genetic algorithm for rule-set production [GARP]). Using these methods, we evaluated spatially explicit, bioclimatic and environmental variables as predictors of diatom distribution. The Maxent model provided the most accurate predictions, followed by logistic regression, CART, and GARP. The most suitable habitats were predicted to occur in the western US, in relatively cool sites, and at high elevations with a high base-flow index. The results provide insights into the factors that affect the distribution of D. geminata and a spatial basis for the prediction of nuisance blooms. © The Ecological Society of America.

  1. Operational Tree Species Mapping in a Diverse Tropical Forest with Airborne Imaging Spectroscopy.

    PubMed

    Baldeck, Claire A; Asner, Gregory P; Martin, Robin E; Anderson, Christopher B; Knapp, David E; Kellner, James R; Wright, S Joseph

    2015-01-01

    Remote identification and mapping of canopy tree species can contribute valuable information towards our understanding of ecosystem biodiversity and function over large spatial scales. However, the extreme challenges posed by highly diverse, closed-canopy tropical forests have prevented automated remote species mapping of non-flowering tree crowns in these ecosystems. We set out to identify individuals of three focal canopy tree species amongst a diverse background of tree and liana species on Barro Colorado Island, Panama, using airborne imaging spectroscopy data. First, we compared two leading single-class classification methods, binary support vector machine (SVM) and biased SVM, for their performance in identifying pixels of a single focal species. From this comparison, we determined that biased SVM was more precise, and we created a multi-species classification model by combining the three biased SVM models. This model was applied to the imagery to identify pixels belonging to the three focal species, and the prediction results were then processed to create a map of focal species crown objects. Crown-level cross-validation of the training data indicated that the multi-species classification model had pixel-level producer's accuracies of 94-97% for the three focal species, and field validation of the predicted crown objects indicated that these had user's accuracies of 94-100%. Our results demonstrate the ability of high spatial and spectral resolution remote sensing to accurately detect non-flowering crowns of focal species within a diverse tropical forest. We attribute the success of our model to recent classification and mapping techniques adapted to species detection in diverse closed-canopy forests, which can pave the way for remote species mapping in a wider variety of ecosystems.

  2. Operational Tree Species Mapping in a Diverse Tropical Forest with Airborne Imaging Spectroscopy

    PubMed Central

    Baldeck, Claire A.; Asner, Gregory P.; Martin, Robin E.; Anderson, Christopher B.; Knapp, David E.; Kellner, James R.; Wright, S. Joseph

    2015-01-01

    Remote identification and mapping of canopy tree species can contribute valuable information towards our understanding of ecosystem biodiversity and function over large spatial scales. However, the extreme challenges posed by highly diverse, closed-canopy tropical forests have prevented automated remote species mapping of non-flowering tree crowns in these ecosystems. We set out to identify individuals of three focal canopy tree species amongst a diverse background of tree and liana species on Barro Colorado Island, Panama, using airborne imaging spectroscopy data. First, we compared two leading single-class classification methods, binary support vector machine (SVM) and biased SVM, for their performance in identifying pixels of a single focal species. From this comparison, we determined that biased SVM was more precise, and we created a multi-species classification model by combining the three biased SVM models. This model was applied to the imagery to identify pixels belonging to the three focal species, and the prediction results were then processed to create a map of focal species crown objects. Crown-level cross-validation of the training data indicated that the multi-species classification model had pixel-level producer’s accuracies of 94–97% for the three focal species, and field validation of the predicted crown objects indicated that these had user’s accuracies of 94–100%. Our results demonstrate the ability of high spatial and spectral resolution remote sensing to accurately detect non-flowering crowns of focal species within a diverse tropical forest. We attribute the success of our model to recent classification and mapping techniques adapted to species detection in diverse closed-canopy forests, which can pave the way for remote species mapping in a wider variety of ecosystems. PMID:26153693
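    Producer's accuracy (how much of each reference class the map captured) and user's accuracy (how much of each mapped class is actually correct) both come from the diagonal of a confusion matrix, divided by reference or predicted totals respectively. A minimal sketch, assuming rows hold reference labels and columns hold map predictions; the counts are hypothetical.

```python
def producer_user_accuracy(cm):
    """Per-class producer's and user's accuracy from a confusion matrix.

    Convention assumed here: rows = reference (ground truth), columns = map prediction.
    """
    k = len(cm)
    producers = [cm[i][i] / sum(cm[i]) for i in range(k)]  # correct / reference total
    users = [cm[j][j] / sum(cm[i][j] for i in range(k)) for j in range(k)]  # correct / predicted total
    return producers, users


# Hypothetical counts: focal species vs background.
print(producer_user_accuracy([[45, 5], [10, 40]]))
```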

  3. Predicting the limits to tree height using statistical regressions of leaf traits.

    PubMed

    Burgess, Stephen S O; Dawson, Todd E

    2007-01-01

    Leaf morphology and physiological functioning demonstrate considerable plasticity within tree crowns, with various leaf traits often exhibiting pronounced vertical gradients in very tall trees. It has been proposed that the trajectory of these gradients, as determined by regression methods, could be used in conjunction with theoretical biophysical limits to estimate the maximum height to which trees can grow. Here, we examined this approach using published and new experimental data from tall conifer and angiosperm species. We showed that height predictions were sensitive to tree-to-tree variation in the shape of the regression and to the biophysical endpoints selected. We examined the suitability of proposed end-points and their theoretical validity. We also noted that site and environment influenced height predictions considerably. Use of leaf mass per unit area or leaf water potential coupled with vulnerability of twigs to cavitation poses a number of difficulties for predicting tree height. Photosynthetic rate and carbon isotope discrimination show more promise, but in the latter case, the complex relationship between light, water availability, photosynthetic capacity and internal conductance to CO2 must first be characterized.
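    The approach described above extrapolates a fitted trait-versus-height regression to the height at which the trait reaches a theoretical biophysical endpoint. A minimal sketch, assuming a straight-line (ordinary least-squares) fit; the abstract stresses that predictions depend on the regression shape, so the linear form and the numbers here are illustrative assumptions only.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def height_at_endpoint(heights, trait, endpoint):
    """Solve the fitted trait-height line for the height where the trait reaches the endpoint."""
    a, b = fit_line(heights, trait)
    return (endpoint - a) / b


# Hypothetical leaf trait declining with height (m); endpoint value chosen for illustration.
print(height_at_endpoint([10, 20, 30], [80, 60, 40], endpoint=20))
```

    A different regression shape (for example, a saturating curve) fitted to the same points can move the extrapolated height substantially, which is the sensitivity the authors report.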

  4. A global reference database from very high resolution commercial satellite data and methodology for application to Landsat derived 30 m continuous field tree cover data

    USGS Publications Warehouse

    Pengra, Bruce; Long, Jordan; Dahal, Devendra; Stehman, Stephen V.; Loveland, Thomas R.

    2015-01-01

    The methodology for selection, creation, and application of a global remote sensing validation dataset using high resolution commercial satellite data is presented. High resolution data are obtained for a stratified random sample of 500 primary sampling units (5 km × 5 km sample blocks), where the stratification based on Köppen climate classes is used to distribute the sample globally among biomes. The high resolution data are classified to categorical land cover maps using an analyst mediated classification workflow. Our initial application of these data is to evaluate a global 30 m Landsat-derived, continuous field tree cover product. For this application, the categorical reference classification produced at 2 m resolution is converted to percent tree cover per 30 m pixel (secondary sampling unit) for comparison to Landsat-derived estimates of tree cover. We provide example results (based on a subsample of 25 sample blocks in South America) illustrating basic analyses of agreement that can be produced from these reference data. Commercial high resolution data availability and data quality are shown to provide a viable means of validating continuous field tree cover. When completed, the reference classifications for the full sample of 500 blocks will be released for public use.
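    Converting the 2 m categorical reference map to percent tree cover per 30 m pixel amounts to counting tree cells in each 15 × 15 block (30 / 2 = 15). A minimal sketch, assuming the two grids are perfectly aligned and dropping incomplete edge blocks; the real workflow's registration and edge handling are not described here.

```python
def percent_cover(tree_mask, block=15):
    """Aggregate a 2 m tree (1) / non-tree (0) grid to percent tree cover per 30 m cell.

    Assumes aligned grids, so each 30 m cell covers exactly block x block fine cells.
    """
    rows, cols = len(tree_mask), len(tree_mask[0])
    out = []
    for r0 in range(0, rows - rows % block, block):
        out_row = []
        for c0 in range(0, cols - cols % block, block):
            trees = sum(tree_mask[r][c] for r in range(r0, r0 + block) for c in range(c0, c0 + block))
            out_row.append(100.0 * trees / block ** 2)
        out.append(out_row)
    return out


# Hypothetical 30 m x 60 m strip: trees fill the top three 2 m rows of the left 30 m cell only.
mask = [[1 if (r < 3 and c < 15) else 0 for c in range(30)] for r in range(15)]
print(percent_cover(mask))  # [[20.0, 0.0]]
```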

  5. Can Statistical Machine Learning Algorithms Help for Classification of Obstructive Sleep Apnea Severity to Optimal Utilization of Polysomnography Resources?

    PubMed

    Bozkurt, Selen; Bostanci, Asli; Turhan, Murat

    2017-08-11

    The goal of this study is to evaluate machine learning methods for classifying the OSA severity of patients with suspected sleep-disordered breathing as normal, mild, moderate or severe based on non-polysomnographic variables: 1) clinical data, 2) symptoms and 3) physical examination. To produce classification models for OSA severity, five machine learning methods (Bayesian network, Decision Tree, Random Forest, Neural Networks and Logistic Regression) were trained, with relevant variables and their relationships derived empirically from the observed data. Each model was trained and evaluated using 10-fold cross-validation, and classification performance was assessed using true positive rate (TPR), false positive rate (FPR), Positive Predictive Value (PPV), F measure and Area Under the Receiver Operating Characteristic curve (ROC-AUC). Results of 10-fold cross-validated tests with different variable settings indicated that the OSA severity of suspected OSA patients can be classified using non-polysomnographic features, with a highest true positive rate of 0.71 and a lowest false positive rate of 0.15. Moreover, tests with different variable settings revealed that the accuracy of the classification models improved significantly when physical examination variables were added to the model. The results showed that machine learning methods can be used to estimate the probabilities of no, mild, moderate, and severe obstructive sleep apnea, and such approaches may improve initial OSA screening and help refer only patients with suspected moderate or severe OSA to sleep laboratories for the expensive tests.
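    The evaluation described above combines 10-fold cross-validation with rates derived from confusion-matrix counts. A minimal sketch of both pieces, assuming a single class treated one-vs-rest (as is usual when reporting these rates for a four-level outcome); the counts are hypothetical, and in practice the indices would be shuffled before being cut into folds.

```python
def binary_rates(tp, fp, tn, fn):
    """TPR (recall), FPR, PPV (precision) and F-measure from confusion-matrix counts."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    ppv = tp / (tp + fp)
    f_measure = 2 * ppv * tpr / (ppv + tpr)  # harmonic mean of precision and recall
    return {"TPR": tpr, "FPR": fpr, "PPV": ppv, "F": f_measure}


def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Hypothetical counts for "severe OSA" vs all other classes in one fold.
print(binary_rates(tp=8, fp=4, tn=6, fn=2))
```

    Each fold serves once as the test set while the remaining nine train the model; the per-fold rates are then averaged or pooled.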

  6. Decision tree analysis in subarachnoid hemorrhage: prediction of outcome parameters during the course of aneurysmal subarachnoid hemorrhage using decision tree analysis.

    PubMed

    Hostettler, Isabel Charlotte; Muroi, Carl; Richter, Johannes Konstantin; Schmid, Josef; Neidert, Marian Christoph; Seule, Martin; Boss, Oliver; Pangalu, Athina; Germans, Menno Robbert; Keller, Emanuela

    2018-01-19

    OBJECTIVE The aim of this study was to create prediction models for outcome parameters by decision tree analysis based on clinical and laboratory data in patients with aneurysmal subarachnoid hemorrhage (aSAH). METHODS The database consisted of clinical and laboratory parameters of 548 patients with aSAH who were admitted to the Neurocritical Care Unit, University Hospital Zurich. To examine the model performance, the cohort was randomly divided into a derivation cohort (60% [n = 329]; training data set) and a validation cohort (40% [n = 219]; test data set). The classification and regression tree prediction algorithm was applied to predict death, functional outcome, and ventriculoperitoneal (VP) shunt dependency. Chi-square automatic interaction detection was applied to predict delayed cerebral infarction on days 1, 3, and 7. RESULTS The overall mortality was 18.4%. The accuracy of the decision tree models was good for survival on day 1 and favorable functional outcome at all time points, with a difference between the training and test data sets of < 5%. Prediction accuracy for survival on day 1 was 75.2%. The most important differentiating factor was the interleukin-6 (IL-6) level on day 1. Favorable functional outcome, defined as Glasgow Outcome Scale scores of 4 and 5, was observed in 68.6% of patients. Favorable functional outcome at all time points had a prediction accuracy of 71.1% in the training data set, with procalcitonin on day 1 being the most important differentiating factor at all time points. A total of 148 patients (27%) developed VP shunt dependency. The most important differentiating factor was hyperglycemia on admission. CONCLUSIONS The multiple variable analysis capability of decision trees enables exploration of dependent variables in the context of multiple changing influences over the course of an illness. 
The decision tree generated in this study increases awareness of the early systemic stress response, which appears to be pertinent for prognostication.
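    The study's 60/40 random division of 548 patients into derivation (n = 329) and validation (n = 219) cohorts, followed by a check that training and test accuracy differ by less than 5%, can be sketched as below. The seed, the rounding of the training size, and reading "< 5%" as percentage points are assumptions; the accuracies passed to the check are hypothetical.

```python
import random


def split_cohort(ids, train_frac=0.6, seed=0):
    """Randomly partition ids into derivation (training) and validation (test) cohorts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def accuracy_gap_ok(train_acc, test_acc, max_gap=0.05):
    """True if training and test accuracies differ by less than max_gap (absolute)."""
    return abs(train_acc - test_acc) < max_gap


derivation, validation = split_cohort(range(548))
print(len(derivation), len(validation))  # 329 219
print(accuracy_gap_ok(0.752, 0.73))
```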

  7. SVM-based tree-type neural networks as a critic in adaptive critic designs for control.

    PubMed

    Deb, Alok Kanti; Jayadeva; Gopal, Madan; Chandra, Suresh

    2007-07-01

    In this paper, we use the adaptive critic design (ACD) approach for control, specifically the action-dependent heuristic dynamic programming (ADHDP) method. A least squares support vector machine (SVM) regressor is used to generate the control actions, while an SVM-based tree-type neural network (NN) is used as the critic. After a failure occurs, the critic and action are retrained in tandem using the failure data. Failure data are binary classification data in which failure states are very few compared with no-failure states. Conventional multilayer feedforward NNs have difficulty learning this type of classification data; this is overcome by the SVM-based tree-type NN, which adds neurons to learn misclassified data and can therefore learn any binary classification data without an a priori choice of the number of neurons or the structure of the network. The capability of the trained controller to handle unforeseen situations is demonstrated.

  8. [A strategy for assessing environmental influence on airway allergy using a regression binary tree-based method].

    PubMed

    Yoshioka, Fumi; Azuma, Emiko; Nakajima, Takae; Hashimoto, Masafumi; Toyoshima, Kyoichiro; Komachi, Yoshio

    2004-08-01

    To clarify the living environment factors that increase the risk of allergic sensitization to house dust mites, we applied a regression binary tree-based method (CART, Classification & Regression Trees) to an epidemiological study on airway allergy. The utility of the tree map in personal sanitary guidance for preventing allergic sensitization was examined with respect to feasibility and validity. A questionnaire about their individual living environments was given to 386 healthy adult women. Blood samples were also collected to measure Dermatophagoides pteronyssinus (Dp)-specific IgE, with Dp sensitization recorded as positive or negative. The questionnaire consisted of nine items on (1) home ventilation by keeping windows open, (2) personal or family smoking habits, (3) use of air conditioners in hot weather, (4) type of flooring (tatami/wooden/carpet) in the living room, (5) visible mold proliferation in the kitchen, (6) type of housing (concrete/wooden), (7) residential area (heavy or light traffic), (8) heating system (use of unventilated combustion appliances), and (9) frequency of cleaning (every day or less often). There were also questions about past history of airway allergic diseases, such as bronchial asthma and allergic rhinitis. CART and multivariate logistic regression analysis (MLRA) were performed. The subjects were first classified into two groups, with and without a history of airway allergic diseases (Groups WPH and WOPH). In each group, the involvement of living environment factors in Dp sensitization was examined using CART and MLRA. In the MLRA, individual living environment factors showed promotional or suppressive effects on Dp sensitization, with differences between the two groups. In the CART results, each group was first split by the factor that had the most significant odds ratio in the MLRA.
In Group WPH, which had a Dp-sensitization risk of 19.5%, the first split was by visible mold proliferation in the kitchen. This produced a factor-present group with a risk of 45.5% and a factor-absent group with a risk of 13.5%. The mold-proliferation group was then split by frequency of cleaning. The risk rose to 75% in the group that did not clean frequently, and to 100% when family smoking habits were also reported. Group WOPH (risk 10.8%) was first split according to whether air conditioners were used in hot weather for more than 6 hours a day, giving risks of 16.7% and 6.9%, respectively. In the group that used air conditioners intensively, the risk fell to 8.3% with tatami flooring in the living room and rose to 20.8% with other flooring. In the group that used air conditioners less, the risk fell to 4.0% without wooden flooring. CART analysis enables us to express complex relationships between living environment factors and Dp sensitization simply, as a binary regression tree. This points to preventive strategies that can be flexibly adapted to the individual living environments of the subjects.
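CART grows a tree by repeatedly choosing the split that most reduces node impurity, and each node's risk is the share of positive cases it contains. The Python sketch below illustrates one such step for 0/1 questionnaire items. It assumes the Gini impurity criterion, a common CART default; the abstract does not state which criterion the authors used. All function names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcomes: 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def node_risk(labels):
    """Share of positive (1) outcomes in a node, e.g. a sensitization risk."""
    return sum(labels) / len(labels)

def best_binary_split(rows, labels):
    """Return (feature_index, weighted_gini) for the best split on 0/1 features.

    rows: feature vectors with 0/1 entries (factor absent/present).
    labels: 0/1 outcomes. feature_index is None if no split reduces impurity.
    """
    n = len(labels)
    best = (None, gini(labels))
    for j in range(len(rows[0])):
        present = [y for x, y in zip(rows, labels) if x[j] == 1]
        absent = [y for x, y in zip(rows, labels) if x[j] == 0]
        if not present or not absent:
            continue  # this factor does not divide the node
        score = (len(present) * gini(present) + len(absent) * gini(absent)) / n
        if score < best[1]:
            best = (j, score)
    return best
```

Applying `best_binary_split` again inside each child node, until a stopping rule is met, produces a tree of the kind described above.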

  9. Woodland Mapping at Single-Tree Levels Using Object-Oriented Classification of Unmanned Aerial Vehicle (UAV) Images

    NASA Astrophysics Data System (ADS)

    Chenari, A.; Erfanifard, Y.; Dehghani, M.; Pourghasemi, H. R.

    2017-09-01

    Remotely sensed datasets offer a reliable means of precisely estimating the biophysical characteristics of individual species sparsely distributed in open woodlands. Moreover, object-oriented classification has shown significant advantages over other classification methods for delineating tree crowns and recognizing species in various types of ecosystems. However, it is still unclear whether this widely used classification method retains these advantages on unmanned aerial vehicle (UAV) digital images for mapping vegetation cover at the single-tree level. In this study, UAV orthoimagery was classified using the object-oriented classification method to map part of the wild pistachio nature reserve in the Zagros open woodlands, Fars Province, Iran. This research focused on recognizing the two main species of the study area (wild pistachio and wild almond) and estimating their mean crown area. The orthoimage of the study area consisted of 1,076 images with a spatial resolution of 3.47 cm. It was georeferenced using 12 ground control points (RMSE = 8 cm) gathered by the real-time kinematic (RTK) method. The UAV orthoimagery classified by the object-oriented method efficiently estimated the mean crown area of wild pistachios (52.09 ± 24.67 m²) and wild almonds (3.97 ± 1.69 m²). These estimates did not differ significantly from the observed values (α = 0.05). In addition, image segmentation recognized both wild pistachios (accuracy 0.90, precision 0.92) and wild almonds (accuracy 0.90, precision 0.89) well. In general, we concluded that UAV orthoimagery can efficiently produce precise biophysical data on vegetation stands at the single-tree level. It is therefore suitable for assessing and monitoring open woodlands.

  10. New machine learning tools for predictive vegetation mapping after climate change: Bagging and Random Forest perform better than Regression Tree Analysis

    Treesearch

    L.R. Iverson; A.M. Prasad; A. Liaw

    2004-01-01

    More and better machine learning tools are becoming available to help landscape ecologists understand species-environment relationships and map probable species occurrence now and potentially into the future. To that end, we evaluated three statistical models: Regression Tree Analysis (RTA), Bagging Trees (BT) and Random Forest (RF) for their utility in...

  11. Equations for predicting biomass in 2- to 6-year-old Eucalyptus saligna in Hawaii

    Treesearch

    Craig D. Whitesell; Susan C. Miyasaka; Robert F. Strand; Thomas H. Schubert; Katharine E. McDuffie

    1988-01-01

    Eucalyptus saligna trees grown in short-rotation plantations on the island of Hawaii were measured, harvested, and weighed to provide data for developing regression equations using non-destructive stand measurements. Regression analysis of the data from 190 trees in the 2.0- to 3.5-year range and 96 trees in the 4- to 6-year range related stem-only...

  12. Estimating cavity tree and snag abundance using negative binomial regression models and nearest neighbor imputation methods

    Treesearch

    Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett

    2009-01-01

    Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....
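This abstract compares negative binomial (NB), zero-inflated NB, and zero-altered NB models for counts with many zeros. As a minimal sketch, the three probability mass functions can be written in Python under an assumed mean/size parameterization: mean `mu` and size `k`, so the variance is mu + mu²/k. The function names and parameter values are hypothetical, and no model fitting or covariates are shown.

```python
import math

def nb_pmf(y, mu, k):
    """Negative binomial P(Y = y) with mean mu > 0 and size (dispersion) k > 0."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

def zinb_pmf(y, mu, k, pi):
    """Zero-inflated NB: with probability pi the count is a structural zero;
    otherwise it comes from the NB distribution (which can also yield 0)."""
    base = (1 - pi) * nb_pmf(y, mu, k)
    return pi + base if y == 0 else base

def zanb_pmf(y, mu, k, pi_zero):
    """Zero-altered (hurdle) NB: zeros occur with probability pi_zero, and
    positive counts follow an NB distribution truncated at zero."""
    if y == 0:
        return pi_zero
    return (1 - pi_zero) * nb_pmf(y, mu, k) / (1 - nb_pmf(0, mu, k))
```

The design difference is that the zero-inflated model adds extra zeros on top of the NB zeros, whereas the zero-altered model handles all zeros in a separate hurdle step.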

  13. Statistical classification of drug incidents due to look-alike sound-alike mix-ups.

    PubMed

    Wong, Zoie Shui Yee

    2016-06-01

    It has been recognised that medication names that look or sound similar are a cause of medication errors. This study builds statistical classifiers for identifying medication incidents due to look-alike sound-alike mix-ups. A total of 227 patient safety incident advisories related to medication were obtained from the Canadian Patient Safety Institute's Global Patient Safety Alerts system. Eight feature selection strategies based on frequent terms, frequent drug terms, and constituent terms were used. Statistical text classifiers were trained and tested. These were based on logistic regression, on support vector machines with linear, polynomial, radial-basis, and sigmoid kernels, and on decision trees. The models developed achieved an average accuracy above 0.8 across all model settings. The receiver operating characteristic curves indicated that the classifiers performed reasonably well. The results suggest that statistical text classification can be a feasible method for identifying medication incidents due to look-alike sound-alike mix-ups, based on a database of advisories from Global Patient Safety Alerts. © The Author(s) 2014.

  14. Predicting the disease of Alzheimer with SNP biomarkers and clinical data using data mining classification approach: decision tree.

    PubMed

    Erdoğan, Onur; Aydin Son, Yeşim

    2014-01-01

    Single nucleotide polymorphisms (SNPs) are the most common genomic variations, in which only a single nucleotide differs between individuals. Individual SNPs and SNP profiles associated with diseases can be used as biological markers. However, it is necessary to determine which SNP subsets and which patient clinical data are informative for diagnosis. Data mining approaches have the highest potential for extracting knowledge from genomic datasets. They can also select the representative SNPs, as well as the most effective and informative clinical features, for the clinical diagnosis of diseases. In this study, we applied one of the most widely used data mining classification methods, the decision tree, to associate SNP biomarkers and significant clinical data with Alzheimer's disease (AD), the most common form of dementia. Different tree construction parameters were compared for optimization, and the most accurate tree for predicting AD is presented.

  15. Evaluation of forest cover estimates for Haiti using supervised classification of Landsat data

    NASA Astrophysics Data System (ADS)

    Churches, Christopher E.; Wampler, Peter J.; Sun, Wanxiao; Smith, Andrew J.

    2014-08-01

    This study uses 2010-2011 Landsat Thematic Mapper (TM) imagery to estimate the total forested area in Haiti. The thematic map was generated in two steps. First, digital numbers were radiometrically normalized using a modified normalization method based on pseudo-invariant polygons (PIPs). Second, the mosaicked image was classified by supervised classification using the Land Cover Classification System of the Food and Agriculture Organization (FAO) of the United Nations. Classification results were compared with other sources of land-cover data produced for similar years, with an emphasis on the statistics presented by the FAO. Three global land cover datasets (GLC2000, Globcover (2009), and MODIS MCD12Q1) were reclassified and compared, together with a national-scale dataset: a land cover analysis by the Haitian National Centre for Geospatial Information (CNIGS). According to our classification, approximately 32.3% of Haiti's total land area was tree covered in 2010-2011. This result was confirmed using an error-adjusted area estimator, which predicted a tree-covered area of 32.4%. Standardization to the FAO's forest cover class definition reduces the tree cover of our supervised classification to 29.4%. This result is greater than the FAO-reported value of 4% and the value of 7.0% for the recoded GLC2000 dataset. It is, however, comparable to the values for three other recoded datasets: MCD12Q1 (21.1%), Globcover (2009) (26.9%), and CNIGS (19.5%). We propose that at coarse resolutions, the segmented and patchy nature of Haiti's forests resulted in a systematic underestimation of forest cover extent. Our results differ significantly from the FAO statistics and the compared datasets. The best explanation for this difference appears to be the accuracy of the data sources and the resolution of the imagery used for land cover analyses. Analysis of the recoded global datasets and the results from this study suggest a strong linear relationship (R² = 0.996 for tree cover) between spatial resolution and land cover estimates.
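This abstract does not specify which error-adjusted area estimator was used. One widely used option, assumed here, is the stratified estimator. Each map class i is a stratum with mapped-area weight W_i. The estimated true proportion of class j is then p_j = sum over i of W_i * n_ij / n_i, where n_ij counts reference samples of class j drawn from map class i and n_i is the total number of samples drawn from map class i. The Python sketch below uses hypothetical names and a made-up matrix, not the study's data.

```python
def error_adjusted_proportions(counts, map_weights):
    """Stratified (error-adjusted) estimate of each class's true area proportion.

    counts[i][j]: reference samples of class j among samples drawn from map class i.
    map_weights[i]: fraction of the mapped area labelled class i (sums to 1).
    """
    k = len(counts)
    proportions = [0.0] * k
    for i, row in enumerate(counts):
        n_i = sum(row)  # samples drawn from map stratum i
        for j, n_ij in enumerate(row):
            proportions[j] += map_weights[i] * n_ij / n_i
    return proportions

# Hypothetical 2-class example: 0 = tree cover, 1 = other.
counts = [[90, 10],   # samples mapped as tree cover
          [5, 95]]    # samples mapped as other
print(error_adjusted_proportions(counts, [0.3, 0.7]))
```

Multiplying each proportion by the total mapped area gives an error-adjusted area. In this toy case, the 30% mapped tree cover becomes an estimated 30.5% once omission errors in the "other" stratum are counted.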

  16. A tree classification for the selection forests of the Sierra Nevada

    Treesearch

    Duncan Dunning

    1928-01-01

    Individuality in man is accepted without question. In domestic animals, also, good and bad individuals are generally recognized. Even in some cultivated plants —orange trees and rubber trees— the poor producers are searched out and eliminated. Indeed, individual variability is a normal condition in all groups of organisms. Yet forest trees are...

  17. The CERAD Neuropsychological Assessment Battery Is Sensitive to Alcohol-Related Cognitive Deficiencies in Elderly Patients: A Retrospective Matched Case-Control Study.

    PubMed

    Kaufmann, Liane; Huber, Stefan; Mayer, Daniel; Moeller, Korbinian; Marksteiner, Josef

    2018-04-01

    Adverse effects of heavy drinking on cognition have frequently been reported. In the present study, we systematically examined, for the first time, whether clinical neuropsychological assessments may be sensitive to alcohol abuse in elderly patients with suspected minor neurocognitive disorder. A total of 144 elderly patients with and without alcohol abuse (n = 72 per group; mean age 66.7 years) were selected from a pool of 738 patients by propensity score matching. This statistical method matches participants in the experimental and control groups by balancing various covariates, in order to reduce selection bias. Accordingly, the study groups were almost perfectly matched for age, education, gender, and Mini Mental State Examination score. Neuropsychological performance was measured using the CERAD (Consortium to Establish a Registry for Alzheimer's Disease) battery. Classification analyses (decision tree and boosted tree models) were conducted to examine whether CERAD variables or the total score contributed to group classification. Decision tree models showed that the groups could be reliably classified based on two CERAD variables. These were "Word List Discriminability" (tapping verbal recognition memory, 64% classification accuracy) and "Trail Making Test A" (measuring visuo-motor speed, 59% classification accuracy). Boosted tree analyses further indicated that "Word List Recall" (measuring free verbal recall) was sensitive for discriminating elderly patients with versus without a history of alcohol abuse. This indicates that specific CERAD variables seem to be sensitive to alcohol-related cognitive dysfunction in elderly patients with suspected minor neurocognitive disorder. (JINS, 2018, 24, 360-371).
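Propensity score matching pairs each exposed participant with the control whose estimated propensity score is closest. The abstract does not say which matching algorithm was used. The Python sketch below assumes greedy 1:1 nearest-neighbour matching without replacement on scores that have already been estimated (for example, by logistic regression on the covariates, which is not shown). An optional caliper, an assumed maximum allowed score difference, can reject poor matches. Names and scores are hypothetical.

```python
def match_nearest(treated, controls, caliper=None):
    """Greedy 1:1 nearest-neighbour matching on propensity scores,
    without replacement.

    treated, controls: dicts mapping subject ID -> propensity score.
    caliper: optional maximum allowed score difference for a match.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)
    pairs = []
    # Match treated subjects in descending score order (one common heuristic).
    for t_id, t_score in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if caliper is None or abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs
```

Matching without replacement keeps the two groups the same size, as in the 72 + 72 design above. A caliper trades sample size for closer matches.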

  18. An ecological classification system for the central hardwoods region: The Hoosier National Forest

    Treesearch

    James E. Van Kley; George R. Parker

    1993-01-01

    This study, a multifactor ecological classification system, using vegetation, soil characteristics, and physiography, was developed for the landscape of the Hoosier National Forest in Southern Indiana. Measurements of ground flora, saplings, and canopy trees from selected stands older than 80 years were subjected to TWINSPAN classification and DECORANA ordination....

  19. A comprehensive simulation study on classification of RNA-Seq data.

    PubMed

    Zararsız, Gökmen; Goksuluk, Dincer; Korkmaz, Selcuk; Eldem, Vahap; Zararsiz, Gozde Erturk; Duru, Izzet Parug; Ozturk, Ahmet

    2017-01-01

    RNA sequencing (RNA-Seq) is a powerful technique for gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging, powerful method for diagnosis, disease classification, and monitoring at the molecular level, as well as for providing potential disease markers. Most statistical methods proposed for classifying gene-expression data either assume a continuous scale (e.g., microarray data) or require a normal distribution. Hence, these methods cannot be applied directly to RNA-Seq data, since such data violate both the data-structure and the distributional assumptions. However, these algorithms can be applied to RNA-Seq data with appropriate modifications. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis (PLDA) and negative binomial linear discriminant analysis (NBLDA). Another way is to bring the data closer to microarrays and apply microarray-based classifiers. In this study, we compared several classifiers, including PLDA with and without power transformation, NBLDA, single SVM, bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined how several parameters affect model performance: overdispersion, sample size, number of genes, number of classes, differential-expression rate, and the transformation method. A comprehensive simulation study was conducted, and the results were compared with those from two miRNA and two mRNA experimental datasets. Classification accuracy increased with larger sample sizes and higher differential-expression rates, and with smaller dispersion parameters and fewer groups. As in differential-expression studies, classification of RNA-Seq data requires careful handling of data overdispersion.
We conclude that, as a count-based classifier, the power transformed PLDA and, as a microarray-based classifier, vst or rlog transformed RF and SVM classifiers may be a good choice for classification. An R/BIOCONDUCTOR package, MLSeq, is freely available at https://www.bioconductor.org/packages/release/bioc/html/MLSeq.html.

  20. Dendroagricultural Signal in Algeria

    NASA Astrophysics Data System (ADS)

    Touchan, R.; Kherchouche, D.; Anchukaitis, K. J.; Oudjehih, B.; Touchane, H.; Slimani, S.; Meko, D. M.

    2015-12-01

    Drought is one of the main natural factors behind declining tree-ring growth and agricultural crop production in Algeria. Here we address the variability of growing conditions for wheat in Algeria using climatic data and a tree-ring reconstruction of January-June precipitation from ten Pinus halepensis tree-ring chronologies. A regression-based reconstruction equation explains up to 74% of the variance of precipitation in the 1970-2011 calibration period and cross-validates well. Dry years were classified as those below the 30th percentile of observed precipitation (131 mm). This classification yields a maximum drought length of five years (1877-1881) and an increasing frequency of dry years in the late 20th and early 21st centuries. A correlation-based sensitivity analysis shows a similar pattern of dependence of tree growth and wheat production on monthly and seasonal precipitation, but contrasting patterns of dependence on temperature. The patterns are interpreted with reference to phenology, growth phases, and, for wheat, agricultural practices. We apply these interpretations to understand possible impacts of climate variability on the agricultural productivity of past civilizations in the Mediterranean.
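The dry-year rule in this abstract has two steps. First, find a percentile threshold of observed precipitation. Then find the longest run of consecutive years at or below it. The Python sketch below assumes linear-interpolation percentiles (the same rule as NumPy's default method) and treats a year as dry if its precipitation is at or below the threshold. The abstract does not state either convention. Function names and data are hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile, q in [0, 100] (assumed convention)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def longest_dry_spell(years, precip, threshold):
    """Return (start_year, end_year, length) of the longest run of
    consecutive years whose precipitation is at or below threshold."""
    best = (None, None, 0)
    run_start, run_len = None, 0
    for year, p in zip(years, precip):
        if p <= threshold:
            if run_len == 0:
                run_start = year
            run_len += 1
            if run_len > best[2]:
                best = (run_start, year, run_len)
        else:
            run_len = 0  # a wet year ends the current run
    return best

# Hypothetical 10-year record (mm); not the study's data.
years = list(range(1870, 1880))
precip = [200, 120, 110, 100, 130, 125, 250, 90, 300, 180]
threshold = percentile(precip, 30)
print(threshold, longest_dry_spell(years, precip, threshold))
```

Applied to the reconstructed series, the same two steps would yield the maximum drought length reported in the abstract. The exact threshold depends on the percentile convention.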

  1. Land Cover Mapping using GEOBIA to Estimate Loss of Salacca zalacca Trees in Landslide Area of Clapar, Madukara District of Banjarnegara

    NASA Astrophysics Data System (ADS)

    Permata, Anggi; Juniansah, Anwar; Nurcahyati, Eka; Dimas Afrizal, Mousafi; Adnan Shafry Untoro, Muhammad; Arifatha, Na'ima; Ramadhani Yudha Adiwijaya, Raden; Farda, Nur Mohammad

    2016-11-01

    Landslides are unpredictable natural disasters that commonly occur in high-slope areas. Small-format aerial photography is an acquisition method that can reach such areas and obtain high-resolution spatial data faster than other methods. It provides products such as an orthomosaic and a Digital Surface Model (DSM). The study area contained the landslide area in Clapar, Madukara District of Banjarnegara. Aerial photographs of the landslide area made objects highly visible. Object characteristics such as shape, size, and texture could be clearly seen, so GEOBIA (Geographic Object-Based Image Analysis) was a suitable method for classifying land cover in the study area. The PPA (per-pixel analysis) method uses spectral information as the basis of object detection. GEOBIA, by contrast, can use spatial elements as a classification basis, which produces a more accurate land cover map. The GEOBIA method used a classification hierarchy to divide post-disaster land cover into three main objects: vegetation, landslide/soil, and buildings. These three classes provided the detailed information needed to estimate the loss caused by the landslide and to establish a land cover map of the landslide area. Loss in the landslide area was estimated from the damage to Salak (Salacca zalacca) plantations. The number of Salak trees carried away by the landslide was estimated on the assumption that every damaged tree had the same age and production class as the undamaged trees. The loss was calculated by approximating the number of damaged trees in the landslide area from data on the surrounding trees, obtained with the GEOBIA classification method.

  2. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition

    EPA Science Inventory

    Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on ...

  3. Phylogenetic classification of bony fishes.

    PubMed

    Betancur-R, Ricardo; Wiley, Edward O; Arratia, Gloria; Acero, Arturo; Bailly, Nicolas; Miya, Masaki; Lecointre, Guillaume; Ortí, Guillermo

    2017-07-06

    Fish classifications, like those of most other taxonomic groups, are being transformed drastically as new molecular phylogenies provide support for natural groups that previous studies did not anticipate. A brief review of the main criteria used by ichthyologists to define their classifications during the last 50 years, however, reveals slow progress towards using an explicit phylogenetic framework. Instead, the trend has been to rely, in varying degrees, on deep-rooted anatomical concepts and authority. This has often meant mixing taxa that have explicit phylogenetic support with arbitrary groupings. Two leading sources frequently used for fish classifications in ichthyology are JS Nelson's volumes of Fishes of the World and W. Eschmeyer's Catalog of Fishes. Both fail to adopt a global phylogenetic framework despite much recent progress towards resolving the fish Tree of Life. The first explicit phylogenetic classification of bony fishes was published in 2013, based on a comprehensive molecular phylogeny (www.deepfin.org). We here update the first version of that classification by incorporating the most recent phylogenetic results. The updated classification presented here is based on phylogenies inferred using molecular and genomic data for nearly 2000 fishes. A total of 72 orders (and 79 suborders) are recognized in this version, compared with 66 orders in version 1. The phylogeny resolves the placement of 410 families, or ~80% of the 514 families of bony fishes currently recognized. The ordinal status of 30 percomorph families included in this study, however, remains uncertain (incertae sedis in the series Carangaria, Ovalentaria, or Eupercaria). Comments supporting taxonomic decisions and comparisons with conflicting taxonomic groups proposed by others are presented. We also highlight cases where morphological support exists for the groups being classified.
This version of the phylogenetic classification of bony fishes is substantially improved, providing resolution for more taxa than previous versions, based on more densely sampled phylogenetic trees. The classification presented in this study represents, unlike any other, the most up-to-date hypothesis of the Tree of Life of fishes.

  4. A multitemporal (1979-2009) land-use/land-cover dataset of the binational Santa Cruz Watershed

    USGS Publications Warehouse

    2011-01-01

    Trends derived from multitemporal land-cover data can be used to make informed land management decisions and to help managers model future change scenarios. We developed a multitemporal land-use/land-cover dataset for the binational Santa Cruz watershed of southern Arizona, United States, and northern Sonora, Mexico. We created a series of land-cover maps at decadal intervals (1979, 1989, 1999, and 2009) using Landsat Multispectral Scanner and Thematic Mapper data and a classification and regression tree classifier. The classification model exploited phenological changes in the spectral signatures of different land covers. It did so by using biseasonal imagery collected during the (dry) early summer and the (wet) late summer following rains from the North American monsoon. Landsat images were corrected to remove atmospheric influences, and the data were converted from raw digital numbers to surface reflectance values. The 14-class land-cover classification scheme is based on the 2001 National Land Cover Database. It focuses on "Developed" land-use classes and on the riverine "Forest" and "Wetlands" cover classes required for specific watershed models. The classification procedure included the creation of several image-derived and topographic variables, including digital elevation model derivatives, image variance, and multitemporal Kauth-Thomas transformations. The accuracy of the land-cover maps was assessed using a random-stratified sampling design, reference aerial photography, and digital imagery. The assessment showed high accuracy, with kappa values (the statistical measure of agreement between map and reference data) ranging from 0.80 to 0.85.
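Cohen's kappa, the agreement measure cited in this abstract, corrects the observed agreement p_o for the agreement p_e expected by chance. Here p_o is the diagonal share of the map-versus-reference confusion matrix, and p_e is computed from the row and column totals: kappa = (p_o - p_e) / (1 - p_e). A minimal Python sketch follows. The function name and the example matrix are hypothetical and are not data from the study.

```python
def cohens_kappa(matrix):
    """Cohen's kappa from a square confusion matrix
    (rows: map classes, columns: reference classes)."""
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n  # p_o
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2  # p_e
    return (observed - expected) / (1 - expected)

# Hypothetical 2-class accuracy-assessment sample.
print(cohens_kappa([[45, 5], [5, 45]]))
```

In this toy matrix, 90% raw agreement becomes a kappa of 0.8, because half of the agreement would be expected by chance alone.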

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Chun-Chieh; Department of Medical Imaging and Radiological Science, Chang Gung University School of Medicine, Taoyuan, Taiwan; Lai, Chyong-Huey

    Purpose: To study the prognostic value of human papillomavirus (HPV) genotypes in cervical cancer patients undergoing radiotherapy. Patients and Methods: A total of 1,010 patients with cervical cancer treated with radiotherapy between 1993 and 2000 were eligible for this study. The HPV genotypes were determined by a genechip that detects 38 types of HPV. Patient characteristics and treatment outcomes were analyzed using the Cox proportional hazards regression model and the classification and regression tree decision tree method. Results: A total of 25 HPV genotypes were detected in 992 specimens (98.2%). The leading 8 types were HPV16, 58, 18, 33, 52, 39, 31, and 45. These types belong to two high-risk HPV species: alpha-7 (HPV18, 39, 45) and alpha-9 (HPV16, 31, 33, 52, 58). Three HPV-based risk groups were associated with survival outcomes. These groups were independent of established prognostic factors, such as International Federation of Gynecology and Obstetrics stage, age, pathologic features, squamous cell carcinoma antigen, and lymph node metastasis. The high-risk group consisted of patients without HPV infection or those infected with the alpha-7 species only. Patients co-infected with the alpha-7 and alpha-9 species belonged to the medium-risk group, and the others were included in the low-risk group. Conclusion: The results of the present study confirm the prognostic value of HPV genotypes in cervical cancer treated with radiotherapy. The differing effects of the alpha-7 and alpha-9 species on radiation response deserve further exploration.
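The three HPV-based risk groups described in this abstract amount to a small decision rule. The Python sketch below encodes it under stated assumptions. The species sets contain only the eight types named in the abstract, so other members of these species are not listed. "Alpha-7 only" is read as alpha-7 without alpha-9, and any detected type counts as HPV infection. The function and constant names are hypothetical.

```python
# Only the types named in the abstract; other members of these species are omitted.
ALPHA7 = {18, 39, 45}
ALPHA9 = {16, 31, 33, 52, 58}

def hpv_risk_group(genotypes):
    """Assign the HPV-based risk group described in the abstract.

    genotypes: set of detected HPV types (an empty set means HPV-negative).
    Assumption: 'alpha-7 only' means alpha-7 present and alpha-9 absent.
    """
    has_a7 = bool(genotypes & ALPHA7)
    has_a9 = bool(genotypes & ALPHA9)
    if not genotypes or (has_a7 and not has_a9):
        return "high"
    if has_a7 and has_a9:
        return "medium"
    return "low"  # all other patients
```

Under this reading, for example, a patient with HPV16 alone falls in the low-risk group, while HPV18 plus HPV16 falls in the medium-risk group.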

  6. Regression trees modeling and forecasting of PM10 air pollution in urban areas

    NASA Astrophysics Data System (ADS)

    Stoimenova, M.; Voynikova, D.; Ivanov, A.; Gocheva-Ilieva, S.; Iliev, I.

    2017-10-01

    Particulate matter (PM10) air pollution is a serious problem affecting the health of the population in many Bulgarian cities. As an example, this study examines PM10 pollution in the town of Pleven, Northern Bulgaria. The measured concentrations of this air pollutant in the city consistently exceeded the permissible limits set by European and national legislation. The analysis is based on data for the last 6 years (2011-2016). It shows that the exceedances apply both to the daily limit of 50 micrograms per cubic meter and to the allowed number of exceedances of that limit (35 per year). Also, the average annual concentration of PM10 exceeded the prescribed norm of no more than 40 micrograms per cubic meter. The aim of this work is to build high-performance mathematical models for effective prediction and forecasting of the PM10 pollution level. The study was conducted with the flexible data mining technique Classification and Regression Trees (CART). PM10 values were fitted against meteorological data, such as maximum and minimum air temperature, relative humidity, and wind speed and direction, as well as against time and autoregressive variables. The resulting CART models demonstrate high predictive ability and fit the actual data at a level of up to 80%. The best models were applied to forecast the pollution level 3 to 7 days ahead. An interpretation of the modeling results is presented.
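A regression tree such as CART chooses, for a continuous target like PM10, the threshold that minimizes the summed squared error of the two child nodes. Autoregressive predictors are simply lagged copies of the series. The Python sketch below shows one split and one lag under these assumptions. Function names and data are hypothetical, and the authors' software and settings are not reproduced.

```python
def sse(values):
    """Sum of squared deviations from the mean (node impurity for regression)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_threshold(x, y):
    """Best single split 'x <= t' for a regression tree, minimising total SSE.

    Candidate thresholds are midpoints between consecutive distinct sorted x
    values. Returns (threshold, total_sse); threshold is None if no split helps.
    """
    pairs = sorted(zip(x, y))
    best = (None, sse(y))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold separates equal x values
        t = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (t, total)
    return best

def add_lag(series, lag=1):
    """Autoregressive predictor: the value `lag` days earlier (lag >= 1);
    None where no earlier value exists."""
    return [None] * lag + series[:-lag]
```

For example, with a hypothetical wind speed `x = [1, 2, 3, 10, 11, 12]` and PM10 `y = [80, 82, 78, 30, 32, 28]`, the best split is at 6.5. A full tree repeats this search over all predictors in every node.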

  7. Concurrent validation of a neurocognitive assessment protocol for clients with mental illness in job matching as shop sales in supported employment.

    PubMed

    Ng, S S W; Lak, D C C; Lee, S C K; Ng, P P K

    2015-03-01

    Occupational therapists play a major role in the assessment and referral of clients with severe mental illness for supported employment. Nonetheless, there is scarce literature about the content and predictive validity of this process. In addition, the criteria for successful job matching have not been analysed, and job supervisors have relied on experience rather than objective standards in recruitment. This study had two aims. The first was to explore the profile of successful clients working in 'shop sales' in a supportive environment using a neurocognitive assessment protocol. The second was to validate the protocol against the 'internal standards' of the job supervisors. This was a concurrent validation study of criterion-related scales for a single job type. The supervisors' subjective ratings were concurrently validated against the results of neurocognitive assessment of intellectual function and work-related cognitive behaviour. A regression model was established for clients who succeeded and failed in employment, using the supervisors' ratings and a cutoff value of 10.5 on the Performance Fitness Rating Scale (R² = 0.918, F[41] = 3.794, p = 0.003). A classification and regression tree was also plotted to identify the profile of cases, with an overall accuracy of 0.861 (relative error, 0.26). Using both inferential statistics and data mining techniques makes the decision tree of neurocognitive assessments easier for therapists to apply in vocational rehabilitation. This directly improves the efficiency and efficacy of the process.

  8. Decision Tree Repository and Rule Set Based Mingjiang River Estuarine Wetlands Classification

    NASA Astrophysics Data System (ADS)

    Zhang, W.; Li, X.; Xiao, W.

    2018-05-01

    Increasing urbanization and industrialization have led to wetland losses in the estuarine area of the Mingjiang River over the past three decades. Increasing attention has been given to producing wetland inventories using remote sensing and GIS technology. Because training sites and training samples are inconsistent, traditional pixel-based image classification methods cannot achieve comparable results across different organizations. Meanwhile, object-oriented image classification shows great potential to solve this problem, and Landsat moderate-resolution remote sensing images are widely used to fulfill this requirement. First, standardized atmospheric correction and spectrally high-fidelity texture feature enhancement were conducted before implementing the object-oriented wetland classification method in eCognition. Second, we performed multi-scale segmentation, taking the scale, hue, shape, compactness, and smoothness of the image into account to obtain appropriate parameters. Using the top-and-down region-merging algorithm from the single-pixel level, we determined the optimal texture segmentation scale for each type of feature. Then, each segmented object was used as the classification unit to calculate spectral information such as the mean, maximum, minimum, brightness, and normalized values. Spatial features of the image objects (area, length, tightness, and shape rule) and texture features (mean, variance, and entropy) were used as classification features of the training samples. Based on the reference images and the sampling points from on-the-spot investigation, typical training samples were selected uniformly and randomly for each type of ground object. The ranges of values of the spectral, texture, and spatial characteristics of each feature type in each feature layer were used to create the decision tree repository.
Finally, with the help of high resolution reference images, the random sampling method is used to conduct the field investigation, achieve an overall accuracy of 90.31 %, and the Kappa coefficient is 0.88. The classification method based on decision tree threshold values and rule set developed by the repository, outperforms the results obtained from the traditional methodology. Our decision tree repository and rule set based object-oriented classification technique was an effective method for producing comparable and consistency wetlands data set.
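    The overall accuracy and Kappa coefficient reported for this map are standard accuracy summaries derived from a confusion (error) matrix. A minimal sketch of both calculations follows, assuming reference classes as rows and mapped classes as columns; the matrix values are hypothetical and are not the study's data.

```python
def accuracy_and_kappa(confusion):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts samples whose reference class is i and whose
    mapped (predicted) class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Observed agreement: share of samples on the diagonal (= overall accuracy).
    p_observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: sum over classes of row total * column total / N^2.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes)) for j in range(n_classes)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if p_expected == 1:
        # Kappa is undefined when chance agreement is already perfect.
        return p_observed, float("nan")
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


# Hypothetical two-class matrix (rows: reference, columns: mapped).
oa, k = accuracy_and_kappa([[50, 10], [5, 35]])
print(f"overall accuracy = {oa:.2%}, kappa = {k:.3f}")
# prints: overall accuracy = 85.00%, kappa = 0.694
```

    Kappa discounts the agreement expected by chance from the class marginals, which is why it is usually reported alongside overall accuracy.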

  9. Using high-resolution topography and hyperspectral data to classify tree species at the San Joaquin Experimental Range

    NASA Astrophysics Data System (ADS)

    Dibb, S. D.; Ustin, S.; Grigsby, S.

    2015-12-01

    Air- and space-borne remote sensing instruments allow rapid and precise study of the diversity of the Earth's ecosystems. After atmospheric correction and ground validation are performed, the gathered hyperspectral and topographic data can be assembled into a stack of layers for land cover classification. Data for this project were collected in multiple field campaigns, including the 2013 NSF NEON California campaign and the 2015 NASA SARP campaign. Using hyperspectral and high-resolution topography data, 25 discriminatory attributes were processed in Exelis' ENVI software and collected for use in a decision forest to classify the four major tree species (Blue Oak, Live Oak, California Buckeye, and Foothill Pine) at the San Joaquin Experimental Range near Fresno, CA. These attributes include 21 classic vegetation indices, a number of other spectral characteristics such as color and albedo, and four topographic layers: slope, aspect, elevation, and tree height. Additionally, a number of nearby terrain classes were created, including bare earth, asphalt, water, rock, shadow, structures, and grass. Fifty training pixels were used for each class; the training pixels for each tree species came from GPS points collected in the field. Bootstrap aggregation (bagging) of decision trees was performed in MATLAB, with an arbitrarily chosen number of 500 trees grown. The tree that produced the minimum out-of-bag classification error (4.65%) was selected to classify the entire scene. The classification accurately distinguished between oak species but was suboptimal in dense areas. The entire San Joaquin Experimental Range was mapped with an overall accuracy of 94.7% and a Kappa coefficient of 0.94. Finally, the average commission and omission errors were 5.3% each. A highly accurate map of tree species at this scale supports studies of drought effects, disease, and species-specific growth traits.
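    Bootstrap aggregation and out-of-bag (OOB) error can be illustrated with a small pure-Python sketch. It is a simplification under stated assumptions, not the authors' MATLAB workflow: each "tree" is a one-split decision stump on a single numeric attribute, the data are hypothetical, and the OOB error is the ensemble's majority-vote error on the training points each stump did not see.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a one-split decision stump on a single numeric feature.

    Returns (threshold, left_label, right_label); points with x <= threshold
    go left. If no split beats predicting the majority label, both sides
    predict that label.
    """
    majority = Counter(ys).most_common(1)[0][0]
    values = sorted(set(xs))
    best = (values[0], majority, majority)
    best_errors = sum(y != majority for y in ys)
    # Try a threshold halfway between each pair of consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if errors < best_errors:
            best, best_errors = (t, left_label, right_label), errors
    return best


def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


def bagged_stumps_oob_error(xs, ys, n_trees=500, seed=0):
    """Bag n_trees stumps on bootstrap samples; return (stumps, OOB error).

    Each training point is scored only by stumps whose bootstrap sample did
    not contain it; the OOB error is the share of scored points whose
    majority vote is wrong.
    """
    rng = random.Random(seed)
    n = len(xs)
    oob_votes = [Counter() for _ in range(n)]
    stumps = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # n indices drawn with replacement
        stump = fit_stump([xs[i] for i in sample], [ys[i] for i in sample])
        stumps.append(stump)
        in_bag = set(sample)
        for i in range(n):
            if i not in in_bag:  # point i is out-of-bag for this stump
                oob_votes[i][predict_stump(stump, xs[i])] += 1
    scored = [i for i in range(n) if oob_votes[i]]
    if not scored:
        return stumps, float("nan")
    wrong = sum(oob_votes[i].most_common(1)[0][0] != ys[i] for i in scored)
    return stumps, wrong / len(scored)


# Hypothetical 1-D data: one attribute value per training pixel, two classes.
xs = [float(v) for v in range(20)]
ys = ["pine" if v < 10 else "oak" for v in range(20)]
stumps, oob = bagged_stumps_oob_error(xs, ys, n_trees=500, seed=42)
print(f"{len(stumps)} stumps, OOB error = {oob:.2%}")
```

    Because every bootstrap sample leaves out roughly a third of the points, the OOB error gives a validation estimate without a separate test set.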

  10. Relating FIA data to habitat classifications via tree-based models of canopy cover

    Treesearch

    Mark D. Nelson; Brian G. Tavernia; Chris Toney; Brian F. Walters

    2012-01-01

    Wildlife species-habitat matrices are used to relate lists of species with abundance of their habitats. The Forest Inventory and Analysis Program provides data on forest composition and structure, but these attributes may not correspond directly with definitions of wildlife habitats. We used FIA tree data and tree crown diameter models to estimate canopy cover, from...

  11. A Regional Simulation to Explore Impacts of Resource Use and Constraints

    DTIC Science & Technology

    2007-03-01

    mountaintops. (10) Deciduous Forest - This class is composed of forests which contain at least 75% deciduous trees in the canopy, deciduous ... trees, pine plantations, and evergreen woodlands. (12) Mixed Forest - This class includes forests with mixed deciduous/coniferous canopies, natural...reflective surfaces. Classification of forested wetlands dominated by deciduous trees is probably more accurate than that in areas with ...

  12. Impact of an Onsite Clinic on Utilization of Preventive Services.

    PubMed

    Ostovari, Mina; Yu, Denny; Yih, Yuehwern; Steele-Morris, Charlotte Joy

    2017-07-01

    To assess the impact of an onsite clinic on the utilization of preventive services by employees of a public university and their dependents. Descriptive statistics, logistic regression and classification tree techniques were used to analyze health claims data, to identify changes in patterns of healthcare utilization and factors affecting use of the onsite clinic. Utilization of preventive services increased significantly, by 9% for female employees and 14% for male employees, one year after the onsite clinic was implemented. Hourly-paid employees, employees without diabetes, and employees whose spouses opted out or had no coverage were more likely to go to the onsite clinic. An adapted framework for assessing the performance of onsite clinics based on the use of health informatics would help identify health utilization patterns and the interaction between the onsite clinic and offsite health providers.

  13. [Characterization of Mexican households with food insecurity].

    PubMed

    Mundo-Rosas, Verónica; Méndez-Gómez Humarán, Ignacio; Shamah-Levy, Teresa

    2014-01-01

    To describe the sociodemographic and health characteristics associated with food insecurity (FI) in Mexican households. The study included information about 40 809 households from the National Health and Nutrition Survey 2012. The Latin American and Caribbean Food Security Scale (ELCSA) was used to categorize households in terms of food insecurity. Classification and regression trees were used to identify the most significant characteristics of households with a high prevalence of FI. The characteristics associated with a higher prevalence of FI were: the lowest quintiles of welfare status, a household head with no schooling or with a walking or mobility disability, and not receiving money from social programmes, pensions or remittances. The factors that favor FI need to be monitored to detect social groups excluded from the right to food.

  14. Discrimination of rectal cancer through human serum using surface-enhanced Raman spectroscopy

    NASA Astrophysics Data System (ADS)

    Li, Xiaozhou; Yang, Tianyue; Li, Siqi; Zhang, Su; Jin, Lili

    2015-05-01

    In this paper, surface-enhanced Raman spectroscopy (SERS) was used to detect the changes in blood serum components that accompany rectal cancer. The differences in serum SERS data between rectal cancer patients and healthy controls were examined. Postoperative rectal cancer patients also participated in the comparison to monitor the effects of cancer treatments. The results show significant variations at certain wavenumbers, indicating alterations in the corresponding biological substances. Principal component analysis (PCA) and intensity-ratio parameters were applied to the original SERS spectra to extract feature variables. These feature variables were then analyzed with linear discriminant analysis (LDA) and classification and regression trees (CART) for discrimination. Accuracies of 93.5% and 92.4% were obtained for PCA-LDA and parameter-CART, respectively.

  15. Electric Trees and Pond Creatures.

    ERIC Educational Resources Information Center

    Weaver, Helen; Hounshell, Paul B.

    1978-01-01

    Two learning activities are presented to develop observation and classification skills at the elementary level. The first is an electric box that associates tree names with leaf and bark specimens, and the second is a pond water observation and slide preparation activity. (BB)

  16. Impact of Resident Rotations on Critically Ill Patient Outcomes: Results of a French Multicenter Observational Study.

    PubMed

    Chousterman, Benjamin G; Pirracchio, Romain; Guidet, Bertrand; Aegerter, Philippe; Mentec, Hervé

    2016-01-01

    The impact of resident rotation on patient outcomes in the intensive care unit (ICU) has been poorly studied. The aim of this study was to address this question using a large ICU database. We retrospectively analyzed the French CUB-REA database. French residents rotate every six months. Two periods were compared: the first (POST) and fifth (PRE) months of the rotation. The primary endpoint was ICU mortality. The secondary endpoints were the length of ICU stay (LOS), the number of organ supports, and the duration of mechanical ventilation (DMV). The impact of resident rotation was explored using multivariate regression, classification tree and random forest models. Between 1996 and 2010, 262,772 patients were included in the database. The patient characteristics were similar between the PRE (n = 44,431) and POST (n = 49,979) periods. Multivariate analysis did not reveal any impact of resident rotation on ICU mortality (OR = 1.01, 95% CI = 0.94-1.07, p = 0.91). Based on the classification trees, the SAPS II and the number of organ failures were the strongest predictors of ICU mortality. In the less severe patients (SAPS II < 24), the POST period was associated with increased mortality (OR = 1.65, 95% CI = 1.17-2.33, p = 0.004). After adjustment, no significant association was observed between the rotation period and the LOS, the number of organ supports, or the DMV. Resident rotation has no impact on overall ICU mortality at French teaching hospitals but might affect the prognosis of less severe ICU patients. Surveillance should be reinforced when treating those patients.
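    The odds ratios and 95% confidence intervals in this record come from adjusted multivariate models. As a simpler illustration of what those quantities mean, the sketch below computes an unadjusted odds ratio with a Wald (Woolf) interval on the log scale from a single 2x2 table. This is an assumption-level example: it does no covariate adjustment, applies no continuity correction, and the counts are hypothetical.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome,   b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    z=1.96 gives an approximate 95% interval. All cells must be positive.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) (Woolf's formula).
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts: deaths / survivors in the POST and PRE periods.
or_, lo, hi = odds_ratio_wald_ci(30, 70, 20, 80)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

    An interval that includes 1, as for the overall mortality result above, is consistent with no association.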

  17. A data mining approach to predict in situ chlorinated ethene detoxification potential

    NASA Astrophysics Data System (ADS)

    Lee, J.; Im, J.; Kim, U.; Loeffler, F. E.

    2015-12-01

    Despite major advances in physicochemical remediation technologies, in situ biostimulation and bioaugmentation treatment aimed at stimulating Dehalococcoides mccartyi (Dhc) reductive dechlorination activity remains a cornerstone approach to remediating sites contaminated with chlorinated ethenes. In practice, selecting the best remedial strategy is challenging due to uncertainties associated with the microbiology (e.g., presence and activity of Dhc) and with geochemical factors influencing Dhc activity. Extensive groundwater datasets collected over decades of monitoring exist but have not been systematically analyzed. In the present study, geochemical and microbial datasets collected from 35 wells at 5 contaminated sites were used to develop a predictive empirical model using a machine learning algorithm (i) to rank the relative importance of parameters that affect in situ reductive dechlorination potential, and (ii) to provide recommendations for selecting the optimal remediation strategy at a specific site. Classification and regression tree (CART) analysis was applied, and a representative classification tree model was developed that allowed short-term prediction of dechlorination potential. Indirect indicators of low dissolved oxygen (e.g., low NO3- and NO2-, high Fe2+ and CH4) were the most influential factors for predicting dechlorination potential, followed by total organic carbon (TOC) content and Dhc cell abundance. These findings indicate that machine learning-based data mining techniques applied to groundwater monitoring data can lead to the development of predictive groundwater remediation models. A major need for improving the predictive capabilities of the data mining approach is a curated, up-to-date and comprehensive collection of groundwater monitoring data.
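    CART grows a classification tree by repeatedly choosing the binary split that most reduces node impurity. A minimal sketch of a single split search using the Gini criterion follows, assuming numeric predictors. The feature names (dissolved oxygen, TOC) and all values are hypothetical illustrations, not the study's data; a full tree repeats this search recursively in each child node and is then pruned.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gini_split(rows, labels):
    """Search every numeric feature for the binary split (x[f] <= t) that
    minimizes the size-weighted Gini impurity of the two child nodes.

    Returns (feature_index, threshold, weighted_impurity), or None if no
    feature has two distinct values.
    """
    n = len(rows)
    best = None
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate cut halfway between observed values
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


# Hypothetical wells: [dissolved oxygen (mg/L), TOC (mg/L)] -> dechlorination potential.
rows = [[0.5, 12.0], [0.8, 10.0], [1.0, 3.0], [4.0, 11.0], [5.5, 2.0], [6.0, 4.0]]
labels = ["high", "high", "high", "low", "low", "low"]
print(best_gini_split(rows, labels))  # (0, 2.5, 0.0)
```

    The order in which predictors are chosen for splits across the grown tree is one common basis for ranking their relative importance.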

  18. Use of CHAID Decision Trees to Formulate Pathways for the Early Detection of Metabolic Syndrome in Young Adults

    PubMed Central

    Liu, Pei-Yang

    2014-01-01

    Metabolic syndrome (MetS) in young adults (age 20–39) is often undiagnosed. A simple screening tool using a surrogate measure might be invaluable in the early detection of MetS. Methods. A chi-squared automatic interaction detection (CHAID) decision tree analysis, with waist circumference user-specified as the first level, was used to detect MetS in young adults using data from the National Health and Nutrition Examination Survey (NHANES) 2009-2010 cohort as a representative sample of the United States population (n = 745). Results. Twenty percent of the sample met the National Cholesterol Education Program Adult Treatment Panel III (NCEP) classification criteria for MetS. The user-specified CHAID model was compared with both a CHAID model with no user-specified first level and a logistic regression-based model. This analysis identified waist circumference as a strong predictor in the MetS diagnosis. The final model, with waist circumference user-specified as the first level, had an accuracy of 92.3% and detected MetS with a sensitivity of 71.8%, outperforming the comparison models. Conclusions. Preliminary findings suggest that young adults at risk for MetS could be identified for further follow-up based on their waist circumference. Decision tree methods show promise for the development of a preliminary detection algorithm for MetS. PMID:24817904
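    CHAID chooses splits by testing, with a chi-squared test, how strongly each predictor's categories are associated with the outcome. A minimal sketch of the Pearson chi-squared statistic for one contingency table follows; the category merging and Bonferroni-adjusted p-values that full CHAID uses are omitted, and the counts are hypothetical. For a 2x2 table (1 degree of freedom), the upper-tail p-value has the closed form erfc(sqrt(stat / 2)).

```python
import math


def pearson_chi_square(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c
    table of observed counts (rows: predictor categories, columns: outcome)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # an empty row or column contributes nothing
                stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi_square_p_value_df1(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical counts: rows = waist circumference above / below a cut point,
# columns = MetS present / absent.
stat, df = pearson_chi_square([[45, 55], [15, 185]])
print(f"chi2 = {stat:.2f}, df = {df}, p = {chi_square_p_value_df1(stat):.3g}")
```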

  19. A Decision Tree to Identify Children Affected by Prenatal Alcohol Exposure

    PubMed Central

    Goh, Patrick K.; Doyle, Lauren R.; Glass, Leila; Jones, Kenneth L.; Riley, Edward P.; Coles, Claire D.; Hoyme, H. Eugene; Kable, Julie A.; May, Philip A.; Kalberg, Wendy O.; Sowell, Elizabeth R.; Wozniak, Jeffrey R.; Mattson, Sarah N.

    2017-01-01

    Objective To develop and validate a hierarchical decision tree model, combining neurobehavioral and physical measures, for identification of children affected by prenatal alcohol exposure even when facial dysmorphology is not present. Study design Data were collected as part of a multisite study across the United States. The model was developed after evaluating over 1000 neurobehavioral and dysmorphology variables collected from 434 children (aged 8–16 years) with prenatal alcohol exposure, with and without fetal alcohol syndrome (FAS), and non-exposed controls, with and without other clinically relevant behavioral or cognitive concerns. The model was subsequently validated in an independent sample of 454 children in two age ranges (5–7 or 10–16 years). In all analyses, the discriminatory ability of each model step was tested with logistic regression. Classification accuracies and positive and negative predictive values were calculated. Results The model consisted of variables from 4 measures (2 parent questionnaires, an IQ score, and a physical examination). Overall accuracy rates for both the development and validation samples met or exceeded our goal of 80% overall accuracy. Conclusions The decision tree model distinguished children affected by prenatal alcohol exposure from non-exposed controls, including those with other behavioral concerns or conditions. Improving identification of this population will streamline access to clinical services, including multidisciplinary evaluation and treatment. PMID:27476634
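    The classification accuracy and the positive and negative predictive values mentioned in this record all come from the same 2x2 table of classified versus true status. A minimal sketch of these standard definitions follows, with sensitivity and specificity included for completeness; the counts are hypothetical.

```python
def screening_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity, specificity, PPV and NPV from 2x2 counts.

    tp/fn: affected children classified positive/negative;
    fp/tn: unaffected children classified positive/negative.
    Undefined ratios (zero denominators) are returned as NaN.
    """
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),  # positive predictive value
        "npv": ratio(tn, tn + fn),  # negative predictive value
    }


# Hypothetical counts, not the study's data.
for name, value in screening_metrics(tp=80, fp=15, fn=20, tn=85).items():
    print(f"{name:12s} {value:.1%}")
```

    Unlike sensitivity and specificity, PPV and NPV depend on how common the condition is in the sample, so they can shift between development and validation cohorts.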

  20. Investigating the limitations of tree species classification using the Combined Cluster and Discriminant Analysis method for low density ALS data from a dense forest region in Aggtelek (Hungary)

    NASA Astrophysics Data System (ADS)

    Koma, Zsófia; Deák, Márton; Kovács, József; Székely, Balázs; Kelemen, Kristóf; Standovár, Tibor

    2016-04-01

    Airborne Laser Scanning (ALS) is a widely used technology for forestry classification applications. However, single-tree detection and species classification from low-density ALS point clouds are limited in dense forest regions. In this study, we investigate the division of a forest into homogeneous groups at the stand level. The study area is located in the Aggtelek karst region (Northeast Hungary), which has complex relief. The ALS dataset from the study area, acquired during the leaf-on season, contained only 4 discrete echoes (at a density of 2-4 pt/m²). Ground-truth measurements of canopy closure and the proportion of tree species cover are available from 500 m² circular plots spaced every 70 m. In the first step, the ALS data were processed, and geometrical and intensity-based features were calculated on a 5 m × 5 m raster grid. The derived features included basic statistics of relative height, canopy RMS, echo ratio, openness, pulse penetration ratio, and basic statistics of radiometric features. In the second step, the data were investigated using Combined Cluster and Discriminant Analysis (CCDA, Kovács et al., 2014). The CCDA method first determines a basic grouping of the circular sampling locations using hierarchical clustering; then, for each of the resulting candidate groupings, a core cycle compares the goodness of the investigated grouping with that of random groupings. These comparisons yield difference values that indicate the optimal grouping among those investigated. If sub-groups are then investigated further, one might even find homogeneous groups. We found that the classification of low-density ALS data into homogeneous groups is highly dependent on canopy closure and on the proportion of the dominant tree species. The presented results show the high potential of CCDA for determining homogeneous, separable groups in LiDAR-based tree species classification.
    Aggtelek Karst/Slovakian Karst Caves" (HUSK/1101/221/0180, Aggtelek NP), data evaluation: 'Multipurpose assessment serving forest biodiversity conservation in the Carpathian region of Hungary', Swiss-Hungarian Cooperation Programme (SH/4/13 Project). BS contributed as an Alexander von Humboldt Research Fellow.

    J. Kovács, S. Kovács, N. Magyar, P. Tanos, I. G. Hatvani, and A. Anda (2014), Classification into homogeneous groups using combined cluster and discriminant analysis, Environmental Modelling & Software, 57, 52-59.

  1. Terrain Classification and Identification of Tree Stems Using Ground-Based Lidar

    DTIC Science & Technology

    2012-12-01

    hailing from North America and Eastern Asia. Stands are mixed age and very diverse, making this an appealing test site in terms of tree variety...sparse scene in Fig. 3(b) contains several deciduous trees and shrubs, but is largely open. The moderate scene, shown in Fig. 3(c), is cluttered with...numerous deciduous trees and shrubs, and significant ground cover. The remaining two data sets, dense1 and dense2 were collected at Breakheart

  2. Scalable Regression Tree Learning on Hadoop using OpenPlanet

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yin, Wei; Simmhan, Yogesh; Prasanna, Viktor

    As scientific and engineering domains attempt to effectively analyze the deluge of data arriving from sensors and instruments, machine learning is becoming a key data mining tool for building prediction models. The regression tree is a popular learning model that combines decision trees and linear regression to forecast numerical target variables based on a set of input features. MapReduce is well suited to such data-intensive learning applications, and PLANET, a proprietary regression tree algorithm using MapReduce, was proposed earlier. In this paper, we describe an open-source implementation of this algorithm, OpenPlanet, on the Hadoop framework using a hybrid approach. Further, we evaluate the performance of OpenPlanet using real-world datasets from the Smart Power Grid domain to perform energy use forecasting, and propose tuning strategies for Hadoop parameters that improve the performance of the default configuration by 75% for a training dataset of 17 million tuples on a 64-core Hadoop cluster on FutureGrid.
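    Split evaluation for a regression tree parallelizes well because the quality of a split depends only on additive sufficient statistics (count, sum of y, sum of y²) of each child. The sketch below simulates that map/reduce pattern in a single process. It is an assumption-level illustration in the spirit of PLANET, not OpenPlanet or Hadoop code: it handles one numeric feature with predefined candidate thresholds, uses constant-valued leaves (the linear-regression component described above is not reproduced), and the data are hypothetical.

```python
from collections import defaultdict


def map_partition(records, feature, thresholds):
    """Simulated map task: for one data shard, accumulate [count, sum(y),
    sum(y^2)] of the left child (x <= t) for each candidate threshold t,
    plus the shard's totals under the key None."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for x, y in records:
        keys = [t for t in thresholds if x[feature] <= t] + [None]
        for key in keys:
            s = stats[key]
            s[0] += 1
            s[1] += y
            s[2] += y * y
    return stats


def reduce_stats(partials):
    """Simulated reduce step: sum the per-key statistics from all shards."""
    merged = defaultdict(lambda: [0, 0.0, 0.0])
    for part in partials:
        for key, (n, s, ss) in part.items():
            m = merged[key]
            m[0] += n
            m[1] += s
            m[2] += ss
    return merged


def sse(n, s, ss):
    """Sum of squared deviations from the mean, from (count, sum, sum of squares)."""
    return ss - s * s / n if n else 0.0


def best_split(shards, feature, thresholds):
    """Pick the threshold minimizing the children's total squared error."""
    totals = reduce_stats(map_partition(shard, feature, thresholds) for shard in shards)
    n, s, ss = totals[None]
    best = None
    for t in thresholds:
        ln, ls, lss = totals.get(t, (0, 0.0, 0.0))
        rn, rs, rss = n - ln, s - ls, ss - lss  # right child = node minus left child
        if ln == 0 or rn == 0:
            continue  # skip splits that leave a child empty
        cost = sse(ln, ls, lss) + sse(rn, rs, rss)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


shards = [
    [((1.0,), 10.0), ((2.0,), 11.0)],  # shard handled by one mapper
    [((3.0,), 30.0), ((4.0,), 31.0)],  # shard handled by another mapper
]
print(best_split(shards, feature=0, thresholds=[1.5, 2.5, 3.5]))  # (2.5, 1.0)
```

    Because the statistics are additive, mappers can emit them for their own shard and reducers can sum them without ever moving the raw records.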

  3. Mining hidden data to predict patient prognosis: texture feature extraction and machine learning in mammography

    NASA Astrophysics Data System (ADS)

    Leighs, J. A.; Halling-Brown, M. D.; Patel, M. N.

    2018-03-01

    The UK currently has a national breast cancer-screening program, and images are routinely collected from a number of screening sites, representing a wealth of invaluable data that is currently under-used. Radiologists evaluate screening images manually and recall suspicious cases for further analysis such as biopsy. Histological testing of biopsy samples confirms the malignancy of the tumour, along with other diagnostic and prognostic characteristics such as disease grade. Machine learning is becoming increasingly popular for clinical image classification problems, as it is capable of discovering patterns in data that are otherwise invisible. This is particularly true when it is applied to medical imaging features; however, clinical datasets are often relatively small. A texture feature extraction toolkit has been developed to mine a wide range of features from medical images such as mammograms. This study analysed a dataset of 1,366 radiologist-marked, biopsy-proven malignant lesions obtained from the OPTIMAM Medical Image Database (OMI-DB). Exploratory data analysis methods were employed to better understand the extracted features. Machine learning techniques including Classification and Regression Trees (CART), ensemble methods (e.g. random forests), and logistic regression were applied to the data to predict the disease grade of the analysed lesions. Prediction scores of up to 83% were achieved; the sensitivity and specificity of the trained models are discussed to put the results into a clinical context. The results show promise in the ability to predict prognostic indicators from the extracted texture features and thus to enable prioritisation of care for patients at greatest risk.

  4. A decision tree algorithm for investigation of model biases related to dynamical cores and physical parameterizations: CESM/CAM EVALUATION BY DECISION TREES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Soner Yorgun, M.; Rood, Richard B.

    An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center for Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smooth topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena by the spectral transform method of the CAM Eulerian spectral dynamical core are prominent and are an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, in both the horizontal and vertical dimensions, have a significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study of the biases produced by GCMs should analyze daily (or even hourly) output, rather than monthly means, over local scales.

  5. A decision tree algorithm for investigation of model biases related to dynamical cores and physical parameterizations: CESM/CAM EVALUATION BY DECISION TREES

    DOE PAGES

    Soner Yorgun, M.; Rood, Richard B.

    2016-11-11

    An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center for Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smooth topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena by the spectral transform method of the CAM Eulerian spectral dynamical core are prominent and are an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, in both the horizontal and vertical dimensions, have a significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study of the biases produced by GCMs should analyze daily (or even hourly) output, rather than monthly means, over local scales.

  6. Spatial modeling and classification of corneal shape.

    PubMed

    Marsolo, Keith; Twa, Michael; Bullimore, Mark A; Parthasarathy, Srinivasan

    2007-03-01

    One of the most promising applications of data mining is in biomedical data used in patient diagnosis. Any method of data analysis intended to support the clinical decision-making process should meet several criteria: it should capture clinically relevant features, be computationally feasible, and provide easily interpretable results. In an initial study, we examined the feasibility of using Zernike polynomials to represent biomedical instrument data in conjunction with a decision tree classifier to distinguish between diseased and non-diseased eyes. Here, we provide a comprehensive follow-up to that work, examining a second representation, pseudo-Zernike polynomials, to determine whether they provide any increase in classification accuracy. We compare the fidelity of both methods using residual root-mean-square (rms) error and evaluate accuracy using several classifiers: neural networks, C4.5 decision trees, Voting Feature Intervals, and Naïve Bayes. We also examine the effect of several meta-learning strategies: boosting, bagging, and Random Forests (RFs). We present results comparing accuracy as it relates to dataset and transformation resolution over a larger, more challenging, multi-class dataset. The results show that classification accuracy is similar for both data transformations but differs by classifier. We find that the Zernike polynomials provide better feature representation than the pseudo-Zernikes and that the decision trees yield the best balance of classification accuracy and interpretability.
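    In this line of work, each corneal surface is represented by the coefficients of a Zernike polynomial fit, and those coefficients then serve as classifier features. A minimal sketch of evaluating the Zernike basis functions on the unit disk follows, using the standard radial-polynomial formula without normalization factors. Fitting the coefficients to instrument data (for example, by least squares) and the pseudo-Zernike variant are not shown; the choice of normalization is an assumption.

```python
import math


def zernike_radial(n, m, rho):
    """Radial part R_n^m(rho) of a Zernike polynomial, for 0 <= rho <= 1.

    Defined for n >= |m| with n - |m| even; returns 0 otherwise.
    """
    m = abs(m)
    if n < m or (n - m) % 2:
        return 0.0
    total = 0.0
    for k in range((n - m) // 2 + 1):
        coeff = ((-1) ** k * math.factorial(n - k)
                 / (math.factorial(k)
                    * math.factorial((n + m) // 2 - k)
                    * math.factorial((n - m) // 2 - k)))
        total += coeff * rho ** (n - 2 * k)
    return total


def zernike(n, m, rho, theta):
    """Unnormalized Zernike polynomial Z_n^m(rho, theta) on the unit disk:
    cosine angular term for m >= 0, sine angular term for m < 0."""
    radial = zernike_radial(n, m, rho)
    return radial * (math.cos(m * theta) if m >= 0 else math.sin(-m * theta))


print(zernike_radial(2, 0, 0.5))  # -0.5, since R_2^0 = 2*rho^2 - 1
```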

  7. Application of decision tree for prediction of cutaneous leishmaniasis incidence based on environmental and topographic factors in Isfahan Province, Iran.

    PubMed

    Ramezankhani, Roghieh; Sajjadi, Nooshin; Nezakati Esmaeilzadeh, Roya; Jozi, Seyed Ali; Shirzadi, Mohammad Reza

    2018-05-08

    Cutaneous leishmaniasis (CL) is a neglected tropical disease that continues to be a health problem in Iran. Nearly 350 million people worldwide are thought to be at risk. We investigated the impact of environmental factors on CL incidence during the period 2007-2015 in a known endemic area for this disease in Isfahan Province, Iran. After collecting data on the climatic, topographic and vegetation-coverage conditions and on CL cases in the study area, a decision tree model was built using the classification and regression tree algorithm. CL data for 2007 to 2012 were used for model construction, and data for 2013 to 2015 were used for testing the model. The root-mean-square error and the correlation coefficient were used to evaluate the predictive performance of the decision tree model. We found that wind speeds less than 14 m/s, altitudes between 1234 and 1810 m above mean sea level, vegetation coverage according to the normalized difference vegetation index (NDVI) less than 0.12, rainfall less than 1.6 mm and air temperatures higher than 30°C corresponded to a seasonal incidence of 163.28 per 100,000 persons, whereas for wind speeds less than 14 m/s, altitudes less than 1,810 m and NDVI higher than 0.12, the mean seasonal incidence of the disease was 2.27 per 100,000 persons. Environmental factors were found to be important predictive variables for CL incidence and should be considered in surveillance and prevention programmes for CL control.
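    The two leaf rules reported in this abstract can be written as plain conditionals, and the evaluation metrics (root-mean-square error and correlation between observed and predicted incidence) are short formulas. This is a minimal sketch with hypothetical function and parameter names: the full fitted tree has other leaves that the abstract does not report (so the function returns None outside the two quoted rules), treating the altitude range as inclusive is an assumption, and Pearson's r is assumed for the correlation measure.

```python
import math


def cl_incidence_rule(wind_ms, altitude_m, ndvi, rain_mm, temp_c):
    """Seasonal CL incidence per 100,000 for the two tree leaves quoted in
    the abstract; None when neither quoted rule applies. Treating the
    1234-1810 m altitude range as inclusive is an assumption."""
    if (wind_ms < 14 and 1234 <= altitude_m <= 1810 and ndvi < 0.12
            and rain_mm < 1.6 and temp_c > 30):
        return 163.28
    if wind_ms < 14 and altitude_m < 1810 and ndvi > 0.12:
        return 2.27
    return None


def rmse(observed, predicted):
    """Root-mean-square error between paired observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


print(cl_incidence_rule(wind_ms=8, altitude_m=1500, ndvi=0.08, rain_mm=0.5, temp_c=33))  # 163.28
```

    In a hold-out design like the one described, `rmse` and `pearson_r` would be applied to the 2013-2015 observed incidences and the tree's predictions for those seasons.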

  8. Microhabitat selection by three common bird species of montane farmlands in Northern Greece.

    PubMed

    Tsiakiris, Rigas; Stara, Kalliopi; Pantis, John; Sgardelis, Stefanos

    2009-11-01

    Common farmland birds are declining throughout Europe; however, marginal farmlands that escaped intensification or land abandonment remain a haven for farmland species in some Mediterranean mountains. The purpose of this study is to identify the most important anthropogenic microhabitat characteristics for the Red-Backed Shrike (Lanius collurio), Corn Bunting (Miliaria calandra) and Common Whitethroat (Sylvia communis) in three such areas within the newly established Northern Pindos National Park. We compare the land-use, structural and physiognomic characteristics of the habitat within 133 plots containing birds, paired with randomly selected "non-bird" plots. Using logistic regression and classification-tree models, we identify the specific habitat requirements of each of the three birds. The three species show a preference for agricultural mosaics dominated by rangelands with scattered shrubs or short trees mixed with arable land. Areas with dikes and dirt roads are preferred by all three species, while the presence of fences and of periodically burned bushes and hedges is of particular importance for the Red-Backed Shrike. Across the gradient of vegetation density and height, M. calandra is mostly found in grasslands with few dwarf shrubs and short trees, S. communis in places with denser and taller vegetation of shrubs, trees and hedges, and L. collurio, being a typical bird of ecotones, occurs in both habitats and in intermediate situations. In all cases, these requirements are associated with habitat features maintained directly or indirectly by the traditional agricultural activities in the area, particularly the long-established, extensive controlled grazing that prevents shrub expansion.

  9. Microhabitat Selection by Three Common Bird Species of Montane Farmlands in Northern Greece

    NASA Astrophysics Data System (ADS)

    Tsiakiris, Rigas; Stara, Kalliopi; Pantis, John; Sgardelis, Stefanos

    2009-11-01

    Common farmland birds are declining throughout Europe; however, marginal farmlands that escaped intensification or land abandonment remain a haven for farmland species in some Mediterranean mountains. The purpose of this study is to identify the most important anthropogenic microhabitat characteristics for the Red-Backed Shrike (Lanius collurio), Corn Bunting (Miliaria calandra) and Common Whitethroat (Sylvia communis) in three such areas within the newly established Northern Pindos National Park. We compare the land-use, structural and physiognomic characteristics of the habitat within 133 plots containing birds, paired with randomly selected “non-bird” plots. Using logistic regression and classification-tree models, we identify the specific habitat requirements of each of the three birds. The three species show a preference for agricultural mosaics dominated by rangelands with scattered shrubs or short trees mixed with arable land. Areas with dikes and dirt roads are preferred by all three species, while the presence of fences and of periodically burned bushes and hedges is of particular importance for the Red-Backed Shrike. Across the gradient of vegetation density and height, M. calandra is mostly found in grasslands with few dwarf shrubs and short trees, S. communis in places with denser and taller vegetation of shrubs, trees and hedges, and L. collurio, being a typical bird of ecotones, occurs in both habitats and in intermediate situations. In all cases, these requirements are associated with habitat features maintained directly or indirectly by the traditional agricultural activities in the area, particularly the long-established, extensive controlled grazing that prevents shrub expansion.

  10. Tree Colors: Color Schemes for Tree-Structured Data.

    PubMed

    Tennekes, Martijn; de Jonge, Edwin

    2014-12-01

    We present a method to map tree structures to colors from the Hue-Chroma-Luminance color model, which is known for its well-balanced perceptual properties. The Tree Colors method can be tuned with several parameters, whose effect on the resulting color schemes is discussed in detail. We provide a free and open source implementation with sensible parameter defaults. Categorical data are very common in statistical graphics, and these categories often form a classification tree. We evaluate Tree Colors applied to tree-structured data with a survey of a large group of users from a national statistical institute. Our user study suggests that Tree Colors are useful not only for improving node-link diagrams, but also for revealing tree structure in non-hierarchical visualizations.
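    The core idea of mapping a tree to hues can be sketched as recursive partitioning of the hue circle: each node's hue range is divided among its children, so sibling branches get distinguishable hues and descendants stay close to their parent's hue. This is a simplified illustration, not the authors' algorithm or implementation: the `fraction` gap parameter, the constant chroma and the rule that luminance increases with depth are all assumptions of this sketch.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, List[str]]  # hypothetical representation: parent -> children


def tree_hcl(tree: Tree, root: str, fraction: float = 0.75,
             chroma: float = 60.0, l_start: float = 70.0,
             l_step: float = 10.0) -> Dict[str, Tuple[float, float, float]]:
    """Assign an (H, C, L) triple to every non-root node (simplified sketch)."""
    colors: Dict[str, Tuple[float, float, float]] = {}

    def visit(node: str, lo: float, hi: float, depth: int) -> None:
        children = tree.get(node, [])
        if not children:
            return
        width = (hi - lo) / len(children)
        margin = width * (1.0 - fraction) / 2.0  # gap kept between sibling ranges
        for i, child in enumerate(children):
            child_lo = lo + i * width + margin
            child_hi = lo + (i + 1) * width - margin
            hue = (child_lo + child_hi) / 2.0  # centre of the child's hue range
            # Deeper nodes get lighter colors (an assumption of this sketch).
            colors[child] = (hue, chroma, min(l_start + l_step * depth, 100.0))
            visit(child, child_lo, child_hi, depth + 1)

    visit(root, 0.0, 360.0, 0)
    return colors


tree = {"root": ["A", "B"], "A": ["A1", "A2"]}
for name, hcl in tree_hcl(tree, "root").items():
    print(name, hcl)
```

    The resulting triples can be converted to RGB with any HCL (polar CIELUV) conversion routine before plotting.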

  11. Method for estimating potential tree-grade distributions for northeastern forest species

    Treesearch

    Daniel A. Yaussy

    1993-01-01

    Generalized logistic regression was used to distribute trees into four potential tree grades for 20 northeastern species groups. The potential tree grade is defined as the tree grade based on the length and amount of clear cuttings and defects only, disregarding minimum grading diameter. The algorithms described use site index and tree diameter as the predictive...
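    Distributing trees among several grades with a generalized logistic model amounts to computing one probability per grade from the predictors and multiplying by the number of trees. The abstract does not give the model form, so the sketch below assumes a multinomial-logit (softmax) form with the last grade as the reference category; the function name and all coefficients are hypothetical placeholders, not Yaussy's fitted values.

```python
import math
from typing import List, Sequence, Tuple


def grade_probabilities(site_index: float, dbh: float,
                        coefs: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Multinomial-logit probabilities for K potential tree grades.

    `coefs` holds (intercept, b_site, b_dbh) for grades 1..K-1; grade K is
    the reference category with a linear predictor fixed at 0.
    """
    etas = [a + b * site_index + c * dbh for a, b, c in coefs] + [0.0]
    m = max(etas)  # subtract the maximum for numerical stability
    exps = [math.exp(e - m) for e in etas]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical coefficients for four grades (three non-reference rows).
COEFS = [(1.0, -0.01, 0.05), (0.5, 0.0, 0.02), (-0.3, 0.01, 0.0)]
probs = grade_probabilities(site_index=70.0, dbh=16.0, coefs=COEFS)
expected_trees = [100 * p for p in probs]  # distribute 100 trees across grades
print([round(p, 3) for p in probs], [round(n, 1) for n in expected_trees])
```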

  12. Additivity of nonlinear biomass equations

    Treesearch

    Bernard R. Parresol

    2001-01-01

    Two procedures that guarantee the property of additivity among the components of tree biomass and total tree biomass utilizing nonlinear functions are developed. Procedure 1 is a simple combination approach, and procedure 2 is based on nonlinear joint-generalized regression (nonlinear seemingly unrelated regressions) with parameter restrictions. Statistical theory is...
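    Additivity means that the predicted total biomass equals the sum of the predicted component biomasses. A minimal sketch in the spirit of the simple combination approach (Procedure 1) defines the total as the sum of separately specified component functions, so additivity holds by construction. The power-law form y = a·D^b and all coefficients below are hypothetical and only illustrate the idea; Procedure 2 (joint estimation with parameter restrictions) is not shown.

```python
from typing import Dict, Tuple

# Hypothetical component equations y = a * D**b (D = diameter, cm; y = kg).
COMPONENTS: Dict[str, Tuple[float, float]] = {
    "stem": (0.05, 2.4),
    "branches": (0.02, 2.2),
    "foliage": (0.03, 1.6),
}


def component_biomass(diameter: float) -> Dict[str, float]:
    """Predicted biomass of each component from its own nonlinear equation."""
    return {name: a * diameter ** b for name, (a, b) in COMPONENTS.items()}


def total_biomass(diameter: float) -> float:
    """Additive by construction: the total is the sum of the component predictions."""
    return sum(component_biomass(diameter).values())


parts = component_biomass(20.0)
print(parts, total_biomass(20.0))
```

    A separately fitted total equation would generally not equal this sum, which is the inconsistency both procedures are designed to remove.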

  13. Identification of phreatophytic groundwater dependent ecosystems using geospatial technologies

    NASA Astrophysics Data System (ADS)

    Perez Hoyos, Isabel Cristina

    The protection of groundwater-dependent ecosystems (GDEs) is increasingly recognized as an essential aspect of the sustainable management and allocation of water resources. Ecosystem services are crucial for human well-being and for a variety of flora and fauna. However, GDEs can only be conserved if their location and extent are known. Several studies have focused on identifying GDEs at specific locations using ground-based measurements. However, recent progress in technologies such as remote sensing, and their integration with geographic information systems (GIS), has provided alternative ways to map GDEs at much larger spatial extents. This study is concerned with discovering patterns in geospatial data sets using data mining techniques in order to map phreatophytic GDEs in the United States at 1 km spatial resolution. A methodology is developed to estimate the probability that an ecosystem is groundwater dependent. Probabilities are obtained by modeling the relationship between the known locations of GDEs and the main factors influencing groundwater dependency, namely water table depth (WTD) and aridity index (AI). A methodology is proposed to predict WTD at 1 km spatial resolution using relevant geospatial data sets calibrated with WTD observations. An ensemble learning algorithm called random forest (RF) is used to model the distribution of groundwater in three study areas (Nevada, California and Washington) as well as in the entire United States. RF regression performance is compared with that of a single regression tree (RT). The comparison contrasts the training error, true prediction error and variable importance estimates of both methods. Additionally, remote sensing variables are omitted when fitting the RF model, to evaluate how much model performance deteriorates when these variables are not used as inputs.
    Research results suggest that although a single RT is less accurate than RF, single trees can still be used to understand the interactions that might be taking place between the predictor variables and the response variable. RF shows great potential for using the power of an ensemble of trees to predict WTD. The superior capability of RF to accurately map water table position in Nevada, California and Washington demonstrates that this technique can be applied at scales larger than the regional level. It is also shown that removing remote sensing variables from the RF training process degrades model performance. Using the predicted WTD, the probability that an ecosystem is groundwater dependent (GDE probability) is estimated at 1 km spatial resolution. The modeling technique is evaluated in the state of Nevada, USA, to develop a systematic approach for identifying GDEs, and is then applied to the entire United States. The modeling approach used to develop the GDE probability map was selected by comparing the performance of classification trees (CT) and classification forests (CF). The most accurate model is selected using a threshold-independent evaluation technique, and the prediction accuracy of both models is assessed in greater detail using threshold-dependent measures. The resulting GDE probability map could be used to define conservation areas, since it can be translated into a binary classification map with two classes, GDE and NON-GDE, by selecting a probability threshold. The choice of this threshold is shown to have dramatic effects on deterministic model performance measures.
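    The contrast between threshold-independent and threshold-dependent evaluation can be made concrete with a few lines of code. The abstract does not name the threshold-independent technique; the sketch below assumes the area under the ROC curve (AUC), computed as the probability that a randomly chosen GDE location receives a higher score than a randomly chosen non-GDE location, and uses overall accuracy as the threshold-dependent measure. The probabilities and labels are invented.

```python
from typing import List


def binarize(probs: List[float], threshold: float) -> List[int]:
    """Map GDE probabilities to 1 (GDE) or 0 (NON-GDE)."""
    return [1 if p >= threshold else 0 for p in probs]


def accuracy(pred: List[int], truth: List[int]) -> float:
    """Threshold-dependent measure: fraction of correctly classified cells."""
    return sum(1 for p, t in zip(pred, truth) if p == t) / len(truth)


def auc(probs: List[float], truth: List[int]) -> float:
    """Threshold-independent AUC; ties between a positive and a negative count 1/2."""
    pos = [p for p, t in zip(probs, truth) if t == 1]
    neg = [p for p, t in zip(probs, truth) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


probs = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]  # hypothetical GDE probabilities
truth = [1, 1, 1, 0, 0, 0]              # hypothetical reference labels
print("AUC:", auc(probs, truth))
for thr in (0.3, 0.5, 0.8):
    print(thr, accuracy(binarize(probs, thr), truth))
```

    The AUC is a single number for the whole probability map, whereas the accuracy changes as the threshold moves, which is the sensitivity to threshold choice the abstract reports.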

  14. An optimal sample data usage strategy to minimize overfitting and underfitting effects in regression tree models based on remotely-sensed data

    USGS Publications Warehouse

    Gu, Yingxin; Wylie, Bruce K.; Boyte, Stephen; Picotte, Joshua J.; Howard, Danny; Smith, Kelcy; Nelson, Kurtis

    2016-01-01

    Regression tree models have been widely used for remote sensing-based ecosystem mapping. Improper use of the sample data (model training and testing data) may cause overfitting and underfitting effects in the model. The goal of this study is to develop an optimal sample data usage strategy for any dataset and to identify an appropriate number of rules in the regression tree model that will improve its accuracy and robustness. Landsat 8 data and Moderate-Resolution Imaging Spectroradiometer-scaled Normalized Difference Vegetation Index (NDVI) were used to develop regression tree models. A Python procedure was designed to generate random replications of model parameter options across a range of model development data sizes and rule number constraints. The mean absolute difference (MAD) between the predicted and actual NDVI (scaled NDVI, values from 0–200), and its variability across the randomized replications, were calculated to assess the accuracy and stability of the models. In our case study, a six-rule regression tree model developed from 80% of the sample data had the lowest MAD (training MAD = 2.5, testing MAD = 2.4) and was therefore identified as the optimal model. This study demonstrates how the choice of training data size and rule number affects model accuracy, and provides important guidance for future remote sensing-based ecosystem modeling.
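    The evaluation loop described above (random replications at a given training-data fraction, scored by MAD and its variability) can be sketched as follows. The study's own Python procedure is not reproduced here: `replicate`, `fit_mean` and the `Fit` signature are hypothetical, and `fit_mean` is only a placeholder where a regression tree constrained to a given number of rules would be fitted.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

# A fit function takes training (xs, ys) and returns a predictor.
Fit = Callable[[List[float], List[float]], Callable[[float], float]]


def mad(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute difference between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def replicate(xs: List[float], ys: List[float], fit: Fit, train_frac: float,
              n_reps: int, seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation of test MAD over random train/test splits.

    Requires 0 < train_frac < 1 (so the test set is non-empty) and n_reps >= 2.
    """
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    scores = []
    for _ in range(n_reps):
        rng.shuffle(idx)
        k = int(round(train_frac * len(idx)))
        train, test = idx[:k], idx[k:]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mad([model(xs[i]) for i in test], [ys[i] for i in test]))
    return statistics.mean(scores), statistics.stdev(scores)


def fit_mean(xs: List[float], ys: List[float]) -> Callable[[float], float]:
    """Placeholder model that predicts the training mean; swap in a regression tree."""
    m = statistics.mean(ys)
    return lambda x: m


xs = [float(i) for i in range(50)]
ys = [2.0 * x + 10.0 for x in xs]  # hypothetical scaled-NDVI-like response
for frac in (0.5, 0.7, 0.8, 0.9):
    print(frac, replicate(xs, ys, fit_mean, frac, n_reps=20))
```

    Repeating this for each rule-number constraint gives a grid of mean MAD and spread values from which the most accurate and stable configuration can be picked.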

  15. An Australian casemix classification for palliative care: technical development and results.

    PubMed

    Eagar, Kathy; Green, Janette; Gordon, Robert

    2004-04-01

    The aim was to develop a palliative care casemix classification for use in all settings, including hospital, hospice and home-based care. The study covered 3866 palliative care patients who, in a three-month period, had 4596 episodes of care provided by 58 palliative care services in Australia and New Zealand. A detailed clinical and service utilization profile was collected for each patient, with staff time and other resources measured daily. Each day of care was costed using actual cost data from each study site. Regression tree analysis was used to group episodes of care with similar costs and clinical characteristics. In the resulting classification, the Australian National Sub-acute and Non-acute Patient (AN-SNAP) Classification Version 1, the branch for classifying inpatient palliative care episodes (including hospice care) has 11 classes and explains 20.98% of the variance in inpatient palliative care phase costs using trimmed data. The ambulatory palliative care branch has 22 classes and explains 17.14% of the variance in ambulatory phase costs using trimmed data. The term 'subacute' is used in Australia to describe health care in which the goal (a change in functional status or an improvement in quality of life) is a better predictor of the need for, and the cost of, care than the patient's underlying diagnosis. The results suggest that phase of care (stage of illness) is the best predictor of the cost of Australian palliative care. Other predictors of cost are functional status and age. In the ambulatory setting, symptom severity and the model of palliative care are also predictive of cost. These variables are used in the AN-SNAP Version 1 classification to create 33 palliative care classes. The classification has clinical meaning, but its overall statistical performance is only moderate. Its structure allows it to be improved over time as models of palliative care service delivery develop.
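    The "percentage of variance explained" reported for each branch is the reduction in cost variance achieved by grouping episodes into classes: 1 − SS_within / SS_total, where SS_within sums squared deviations from each class mean and SS_total sums squared deviations from the overall mean. The sketch below computes that statistic for a given class assignment; the costs and class labels are invented, and the trimming rule applied to the study's data is not given in the abstract, so no trimming is shown.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def variance_explained(costs: Sequence[float], classes: Sequence[Hashable]) -> float:
    """Share of total cost variance explained by a class assignment."""
    n = len(costs)
    grand = sum(costs) / n
    ss_total = sum((c - grand) ** 2 for c in costs)
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for c, k in zip(costs, classes):
        groups[k].append(c)
    ss_within = 0.0
    for vals in groups.values():
        m = sum(vals) / len(vals)  # class mean cost
        ss_within += sum((v - m) ** 2 for v in vals)
    return 1.0 - ss_within / ss_total


# Hypothetical episode costs grouped into two classes.
costs = [10.0, 12.0, 30.0, 32.0]
classes = ["stable", "stable", "deteriorating", "deteriorating"]
print(variance_explained(costs, classes))  # 400/404, about 0.99
```

    A regression tree chooses its splits so as to reduce SS_within greedily, so this statistic rises with each additional class.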

  16. A High Performance Computing Approach to Tree Cover Delineation in 1-m NAIP Imagery Using a Probabilistic Learning Framework

    NASA Technical Reports Server (NTRS)

    Basu, Saikat; Ganguly, Sangram; Michaelis, Andrew; Votava, Petr; Roy, Anshuman; Mukhopadhyay, Supratik; Nemani, Ramakrishna

    2015-01-01

    Tree cover delineation is a useful instrument for deriving Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) airborne imagery. Numerous algorithms have been designed to address this problem, but most of them do not scale to these datasets, which are on the order of terabytes. In this paper, we present a semi-automated probabilistic framework for the segmentation and classification of 1-m National Agriculture Imagery Program (NAIP) imagery for tree-cover delineation across the whole of the continental United States, using a High Performance Computing architecture. Classification is performed using a multi-layer feedforward backpropagation neural network, and segmentation is performed using a Statistical Region Merging algorithm. The outputs of the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on a Conditional Random Field, which helps capture higher-order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and retrained by relabeling misclassified image patches. This leads to a significant increase in true positive rates and reduction in false positive rates. Tree cover maps were generated for the whole state of California, spanning 11,095 NAIP tiles that cover a total geographical area of 163,696 sq. miles. The framework produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates below 2% for both landscapes. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR canopy height model (CHM) showed the effectiveness of our framework for generating accurate high-resolution tree-cover maps.
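    The true and false positive rates quoted above are pixel-wise comparisons of a predicted tree-cover mask with a reference mask: TPR = TP / (TP + FN) is the share of reference tree pixels that were detected, and FPR = FP / (FP + TN) is the share of non-tree pixels wrongly labeled as tree. A minimal sketch, using small invented masks rather than NAIP tiles:

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 = tree cover, 0 = other; one entry per pixel


def tpr_fpr(pred: Mask, truth: Mask) -> Tuple[float, float]:
    """True and false positive rates of a predicted tree-cover mask."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp / (tp + fn), fp / (fp + tn)


pred = [[1, 1, 0], [0, 1, 0]]   # hypothetical classifier output
truth = [[1, 0, 1], [0, 1, 0]]  # hypothetical reference labels
print(tpr_fpr(pred, truth))
```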

  17. A New Morphological Phylogeny of the Ophiuroidea (Echinodermata) Accords with Molecular Evidence and Renders Microfossils Accessible for Cladistics

    PubMed Central

    Thuy, Ben; Stöhr, Sabine

    2016-01-01

    Ophiuroid systematics is currently in a state of upheaval, with recent molecular estimates fundamentally clashing with traditional, morphology-based classifications. Here, we attempt a long overdue recast of a morphological phylogeny estimate of the Ophiuroidea, taking into account the latest insights on microstructural features of the arm skeleton. Our final estimate is based on a total of 45 ingroup taxa, including 41 recent species covering the full range of extant ophiuroid higher taxon diversity and 4 fossil species known from exceptionally preserved material, with the Lower Carboniferous Aganaster gregarius as the outgroup. A total of 130 characters were scored directly on specimens. The tree resulting from the Bayesian inference analysis of the full data matrix is reasonably well resolved and well supported, and refutes all previous classifications, with most traditional families discredited as poly- or paraphyletic. In contrast, our tree agrees remarkably well with the latest molecular estimate, paving the way towards an integrated new classification of the Ophiuroidea. From the characters that were qualitatively found to accord best with our tree topology, we selected a list of potential synapomorphies for future formal clade definitions. Furthermore, an analysis with 13 of the ingroup taxa reduced to their lateral arm plate characters produced a tree essentially similar to the full-dataset tree. This suggests that dissociated lateral arm plates can be analysed in combination with fully known taxa, which effectively unlocks the extensive record of fossil lateral arm plates for phylogenetic estimates. Finally, the ages and positions of the fossil taxa within our tree imply that the ophiuroid crown group had started to diversify by the Early Triassic. PMID:27227685

  18. A High Performance Computing Approach to Tree Cover Delineation in 1-m NAIP Imagery using a Probabilistic Learning Framework

    NASA Astrophysics Data System (ADS)

    Basu, S.; Ganguly, S.; Michaelis, A.; Votava, P.; Roy, A.; Mukhopadhyay, S.; Nemani, R. R.

    2015-12-01

    Tree cover delineation is a useful instrument for deriving Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) airborne imagery. Numerous algorithms have been designed to address this problem, but most of them do not scale to these datasets, which are on the order of terabytes. In this paper, we present a semi-automated probabilistic framework for the segmentation and classification of 1-m National Agriculture Imagery Program (NAIP) imagery for tree-cover delineation across the whole of the continental United States, using a High Performance Computing architecture. Classification is performed using a multi-layer feedforward backpropagation neural network, and segmentation is performed using a Statistical Region Merging algorithm. The outputs of the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on a Conditional Random Field, which helps capture higher-order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and retrained by relabeling misclassified image patches. This leads to a significant increase in true positive rates and reduction in false positive rates. Tree cover maps were generated for the whole state of California, spanning 11,095 NAIP tiles that cover a total geographical area of 163,696 sq. miles. The framework produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates below 2% for both landscapes. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR canopy height model (CHM) showed the effectiveness of our framework for generating accurate high-resolution tree-cover maps.

  19. Components of Antagonism and Mutualism in Ips pini–Fungal Interactions: Relationship to a Life History of Colonizing Highly Stressed and Dead Trees

    Treesearch

    Brian J. Kopper; Kier D. Klepzig; Kenneth F. Raffa

    2004-01-01

    Efforts to describe the complex relationships between bark beetles and the ophiostomatoid (stain) fungi they transport have largely resulted in a dichotomous classification. These symbioses have been viewed as either mutualistic (i.e., fungi help bark beetles colonize living trees by overcoming tree defenses or by providing nutrients after colonization in return for...

  20. Tree species classification using within crown localization of waveform LiDAR attributes

    NASA Astrophysics Data System (ADS)

    Blomley, Rosmarie; Hovi, Aarne; Weinmann, Martin; Hinz, Stefan; Korpela, Ilkka; Jutzi, Boris

    2017-11-01

    Since forest planning increasingly takes an ecological, diversity-oriented perspective into account, remote sensing technologies are becoming ever more important for assessing existing resources with reduced manual effort. While light detection and ranging (LiDAR) technology provides a good basis for predicting tree height and biomass, identifying tree species from this type of data is particularly challenging in structurally heterogeneous forests. In this paper, we analyse existing approaches with respect to the geometrical scale of feature extraction (whole tree, within-crown partitions or within laser footprint) and conclude that features are currently always extracted separately at the different scales. Since multi-scale approaches have, however, proven successful in other applications, we aim to use the within-crown distribution of within-footprint signal characteristics as additional features. To do so, we adapt a spin image algorithm, originally devised for extracting 3D surface features in object recognition. This algorithm spins an image plane around a defined axis, e.g. the tree stem, and records the number of LiDAR returns, or the mean values of return attributes, per pixel. From this representation, spin image features are extracted that comprise only those components with the highest variability among a given set of library trees. The relative performance, and the combined improvement, of these spin image features with respect to non-spatial statistical metrics of the waveform (WF) attributes are evaluated for the species classification of Scots pine (Pinus sylvestris L.), Norway spruce (Picea abies (L.) Karst.) and Silver/Downy birch (Betula pendula Roth/Betula pubescens Ehrh.) in a boreal forest environment. This evaluation is performed on two WF LiDAR datasets that differ in footprint size, pulse density at ground, laser wavelength and pulse width.
    Furthermore, we evaluate the robustness of the proposed method with respect to internal parameters and tree size. The results reveal that considering the crown-internal distribution of within-footprint signal characteristics, as captured in spin image features, improves the classification results in nearly all test cases.
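    Spinning an image plane around an axis reduces each 3D point to two cylindrical coordinates: its radial distance α from the axis and its signed height β along the axis. Binning (α, β) gives a 2D image that ignores rotation about the stem. The sketch below counts returns per bin, using the standard spin-image coordinates; the function name, the bin layout (β bins centred on the base point) and the example points are assumptions, and averaging a return attribute per bin instead of counting would follow the same pattern.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def spin_image(points: Sequence[Point], base: Point, axis: Point,
               bin_size: float, n_alpha: int, n_beta: int) -> List[List[int]]:
    """Count LiDAR returns per (beta, alpha) bin around an axis through `base`.

    alpha = radial distance from the axis; beta = signed height along it.
    Beta bins are centred on the base point; points outside the grid are dropped.
    """
    norm = math.sqrt(sum(a * a for a in axis))
    u = tuple(a / norm for a in axis)  # unit axis direction, e.g. the stem
    img = [[0] * n_alpha for _ in range(n_beta)]
    for p in points:
        d = tuple(pi - bi for pi, bi in zip(p, base))
        beta = sum(di * ui for di, ui in zip(d, u))
        alpha = math.sqrt(max(sum(di * di for di in d) - beta * beta, 0.0))
        i = int(alpha // bin_size)
        j = int(math.floor(beta / bin_size)) + n_beta // 2
        if 0 <= i < n_alpha and 0 <= j < n_beta:
            img[j][i] += 1
    return img


# Hypothetical returns around a vertical stem at the origin.
pts = [(0.5, 0.0, 0.5), (1.5, 0.0, -0.5), (0.0, 3.0, 0.0)]
for row in spin_image(pts, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 1.0, 3, 4):
    print(row)
```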
