Sample records for classification-tree based models

  1. Predictive models for subtypes of autism spectrum disorder based on single-nucleotide polymorphisms and magnetic resonance imaging.

    PubMed

    Jiao, Y; Chen, R; Ke, X; Cheng, L; Chu, K; Lu, Z; Herskovits, E H

    2011-01-01

    Autism spectrum disorder (ASD) is a neurodevelopmental disorder, of which Asperger syndrome and high-functioning autism are subtypes. Our goal is: 1) to determine whether a diagnostic model based on single-nucleotide polymorphisms (SNPs), brain regional thickness measurements, or brain regional volume measurements can distinguish Asperger syndrome from high-functioning autism; and 2) to compare the SNP, thickness, and volume-based diagnostic models. Our study included 18 children with ASD: 13 subjects with high-functioning autism and 5 subjects with Asperger syndrome. For each child, we obtained 25 SNPs for 8 ASD-related genes; we also computed regional cortical thicknesses and volumes for 66 brain structures, based on structural magnetic resonance (MR) examination. To generate diagnostic models, we employed five machine-learning techniques: decision stump, alternating decision trees, multi-class alternating decision trees, logistic model trees, and support vector machines. For SNP-based classification, three decision-tree-based models performed better than the other two machine-learning models. The performance metrics for three decision-tree-based models were similar: decision stump was modestly better than the other two methods, with accuracy = 90%, sensitivity = 0.95 and specificity = 0.75. All thickness and volume-based diagnostic models performed poorly. The SNP-based diagnostic models were superior to those based on thickness and volume. For SNP-based classification, rs878960 in GABRB3 (gamma-aminobutyric acid A receptor, beta 3) was selected by all tree-based models. Our analysis demonstrated that SNP-based classification was more accurate than morphometry-based classification in ASD subtype classification. Also, we found that one SNP--rs878960 in GABRB3--distinguishes Asperger syndrome from high-functioning autism.

  2. Data Clustering and Evolving Fuzzy Decision Tree for Data Base Classification Problems

    NASA Astrophysics Data System (ADS)

    Chang, Pei-Chann; Fan, Chin-Yuan; Wang, Yen-Wen

    Data base classification suffers from two well known difficulties, i.e., the high dimensionality and non-stationary variations within the large historic data. This paper presents a hybrid classification model by integrating a case based reasoning technique, a Fuzzy Decision Tree (FDT), and Genetic Algorithms (GA) to construct a decision-making system for data classification in various data base applications. The model is major based on the idea that the historic data base can be transformed into a smaller case-base together with a group of fuzzy decision rules. As a result, the model can be more accurately respond to the current data under classifying from the inductions by these smaller cases based fuzzy decision trees. Hit rate is applied as a performance measure and the effectiveness of our proposed model is demonstrated by experimentally compared with other approaches on different data base classification applications. The average hit rate of our proposed model is the highest among others.

  3. Comparative analysis of tree classification models for detecting fusarium oxysporum f. sp cubense (TR4) based on multi soil sensor parameters

    NASA Astrophysics Data System (ADS)

    Estuar, Maria Regina Justina; Victorino, John Noel; Coronel, Andrei; Co, Jerelyn; Tiausas, Francis; Señires, Chiara Veronica

    2017-09-01

    Use of wireless sensor networks and smartphone integration design to monitor environmental parameters surrounding plantations is made possible because of readily available and affordable sensors. Providing low cost monitoring devices would be beneficial, especially to small farm owners, in a developing country like the Philippines, where agriculture covers a significant amount of the labor market. This study discusses the integration of wireless soil sensor devices and smartphones to create an application that will use multidimensional analysis to detect the presence or absence of plant disease. Specifically, soil sensors are designed to collect soil quality parameters in a sink node from which the smartphone collects data from via Bluetooth. Given these, there is a need to develop a classification model on the mobile phone that will report infection status of a soil. Though tree classification is the most appropriate approach for continuous parameter-based datasets, there is a need to determine whether tree models will result to coherent results or not. Soil sensor data that resides on the phone is modeled using several variations of decision tree, namely: decision tree (DT), best-fit (BF) decision tree, functional tree (FT), Naive Bayes (NB) decision tree, J48, J48graft and LAD tree, where decision tree approaches the problem by considering all sensor nodes as one. Results show that there are significant differences among soil sensor parameters indicating that there are variances in scores between the infected and uninfected sites. Furthermore, analysis of variance in accuracy, recall, precision and F1 measure scores from tree classification models homogeneity among NBTree, J48graft and J48 tree classification models.

  4. From learning taxonomies to phylogenetic learning: integration of 16S rRNA gene data into FAME-based bacterial classification.

    PubMed

    Slabbinck, Bram; Waegeman, Willem; Dawyndt, Peter; De Vos, Paul; De Baets, Bernard

    2010-01-30

    Machine learning techniques have shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure allows to easily evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. Summarized, by phylogenetic learning we are able to situate and evaluate FAME-based bacterial species classification in a more informative context.

  5. From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

    PubMed Central

    2010-01-01

    Background Machine learning techniques have shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure allows to easily evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. Summarized, by phylogenetic learning we are able to situate and evaluate FAME-based bacterial species classification in a more informative context. PMID:20113515

  6. Support-vector-machine tree-based domain knowledge learning toward automated sports video classification

    NASA Astrophysics Data System (ADS)

    Xiao, Guoqiang; Jiang, Yang; Song, Gang; Jiang, Jianmin

    2010-12-01

    We propose a support-vector-machine (SVM) tree to hierarchically learn from domain knowledge represented by low-level features toward automatic classification of sports videos. The proposed SVM tree adopts a binary tree structure to exploit the nature of SVM's binary classification, where each internal node is a single SVM learning unit, and each external node represents the classified output type. Such a SVM tree presents a number of advantages, which include: 1. low computing cost; 2. integrated learning and classification while preserving individual SVM's learning strength; and 3. flexibility in both structure and learning modules, where different numbers of nodes and features can be added to address specific learning requirements, and various learning models can be added as individual nodes, such as neural networks, AdaBoost, hidden Markov models, dynamic Bayesian networks, etc. Experiments support that the proposed SVM tree achieves good performances in sports video classifications.

  7. Effects of sample survey design on the accuracy of classification tree models in species distribution models

    USGS Publications Warehouse

    Edwards, T.C.; Cutler, D.R.; Zimmermann, N.E.; Geiser, L.; Moisen, Gretchen G.

    2006-01-01

    We evaluated the effects of probabilistic (hereafter DESIGN) and non-probabilistic (PURPOSIVE) sample surveys on resultant classification tree models for predicting the presence of four lichen species in the Pacific Northwest, USA. Models derived from both survey forms were assessed using an independent data set (EVALUATION). Measures of accuracy as gauged by resubstitution rates were similar for each lichen species irrespective of the underlying sample survey form. Cross-validation estimates of prediction accuracies were lower than resubstitution accuracies for all species and both design types, and in all cases were closer to the true prediction accuracies based on the EVALUATION data set. We argue that greater emphasis should be placed on calculating and reporting cross-validation accuracy rates rather than simple resubstitution accuracy rates. Evaluation of the DESIGN and PURPOSIVE tree models on the EVALUATION data set shows significantly lower prediction accuracy for the PURPOSIVE tree models relative to the DESIGN models, indicating that non-probabilistic sample surveys may generate models with limited predictive capability. These differences were consistent across all four lichen species, with 11 of the 12 possible species and sample survey type comparisons having significantly lower accuracy rates. Some differences in accuracy were as large as 50%. The classification tree structures also differed considerably both among and within the modelled species, depending on the sample survey form. Overlap in the predictor variables selected by the DESIGN and PURPOSIVE tree models ranged from only 20% to 38%, indicating the classification trees fit the two evaluated survey forms on different sets of predictor variables. The magnitude of these differences in predictor variables throws doubt on ecological interpretation derived from prediction models based on non-probabilistic sample surveys. ?? 2006 Elsevier B.V. All rights reserved.

  8. Diagnostic classification scheme in Iranian breast cancer patients using a decision tree.

    PubMed

    Malehi, Amal Saki

    2014-01-01

    The objective of this study was to determine a diagnostic classification scheme using a decision tree based model. The study was conducted as a retrospective case-control study in Imam Khomeini hospital in Tehran during 2001 to 2009. Data, including demographic and clinical-pathological characteristics, were uniformly collected from 624 females, 312 of them were referred with positive diagnosis of breast cancer (cases) and 312 healthy women (controls). The decision tree was implemented to develop a diagnostic classification scheme using CART 6.0 Software. The AUC (area under curve), was measured as the overall performance of diagnostic classification of the decision tree. Five variables as main risk factors of breast cancer and six subgroups as high risk were identified. The results indicated that increasing age, low age at menarche, single and divorced statues, irregular menarche pattern and family history of breast cancer are the important diagnostic factors in Iranian breast cancer patients. The sensitivity and specificity of the analysis were 66% and 86.9% respectively. The high AUC (0.82) also showed an excellent classification and diagnostic performance of the model. Decision tree based model appears to be suitable for identifying risk factors and high or low risk subgroups. It can also assists clinicians in making a decision, since it can identify underlying prognostic relationships and understanding the model is very explicit.

  9. PCA based feature reduction to improve the accuracy of decision tree c4.5 classification

    NASA Astrophysics Data System (ADS)

    Nasution, M. Z. F.; Sitompul, O. S.; Ramli, M.

    2018-03-01

    Splitting attribute is a major process in Decision Tree C4.5 classification. However, this process does not give a significant impact on the establishment of the decision tree in terms of removing irrelevant features. It is a major problem in decision tree classification process called over-fitting resulting from noisy data and irrelevant features. In turns, over-fitting creates misclassification and data imbalance. Many algorithms have been proposed to overcome misclassification and overfitting on classifications Decision Tree C4.5. Feature reduction is one of important issues in classification model which is intended to remove irrelevant data in order to improve accuracy. The feature reduction framework is used to simplify high dimensional data to low dimensional data with non-correlated attributes. In this research, we proposed a framework for selecting relevant and non-correlated feature subsets. We consider principal component analysis (PCA) for feature reduction to perform non-correlated feature selection and Decision Tree C4.5 algorithm for the classification. From the experiments conducted using available data sets from UCI Cervical cancer data set repository with 858 instances and 36 attributes, we evaluated the performance of our framework based on accuracy, specificity and precision. Experimental results show that our proposed framework is robust to enhance classification accuracy with 90.70% accuracy rates.

  10. A novel transferable individual tree crown delineation model based on Fishing Net Dragging and boundary classification

    NASA Astrophysics Data System (ADS)

    Liu, Tao; Im, Jungho; Quackenbush, Lindi J.

    2015-12-01

    This study provides a novel approach to individual tree crown delineation (ITCD) using airborne Light Detection and Ranging (LiDAR) data in dense natural forests using two main steps: crown boundary refinement based on a proposed Fishing Net Dragging (FiND) method, and segment merging based on boundary classification. FiND starts with approximate tree crown boundaries derived using a traditional watershed method with Gaussian filtering and refines these boundaries using an algorithm that mimics how a fisherman drags a fishing net. Random forest machine learning is then used to classify boundary segments into two classes: boundaries between trees and boundaries between branches that belong to a single tree. Three groups of LiDAR-derived features-two from the pseudo waveform generated along with crown boundaries and one from a canopy height model (CHM)-were used in the classification. The proposed ITCD approach was tested using LiDAR data collected over a mountainous region in the Adirondack Park, NY, USA. Overall accuracy of boundary classification was 82.4%. Features derived from the CHM were generally more important in the classification than the features extracted from the pseudo waveform. A comprehensive accuracy assessment scheme for ITCD was also introduced by considering both area of crown overlap and crown centroids. Accuracy assessment using this new scheme shows the proposed ITCD achieved 74% and 78% as overall accuracy, respectively, for deciduous and mixed forest.

  11. Online adaptive decision trees: pattern classification and function approximation.

    PubMed

    Basak, Jayanta

    2006-09-01

    Recently we have shown that decision trees can be trained in the online adaptive (OADT) mode (Basak, 2004), leading to better generalization score. OADTs were bottlenecked by the fact that they are able to handle only two-class classification tasks with a given structure. In this article, we provide an architecture based on OADT, ExOADT, which can handle multiclass classification tasks and is able to perform function approximation. ExOADT is structurally similar to OADT extended with a regression layer. We also show that ExOADT is capable not only of adapting the local decision hyperplanes in the nonterminal nodes but also has the potential of smoothly changing the structure of the tree depending on the data samples. We provide the learning rules based on steepest gradient descent for the new model ExOADT. Experimentally we demonstrate the effectiveness of ExOADT in the pattern classification and function approximation tasks. Finally, we briefly discuss the relationship of ExOADT with other classification models.

  12. Voxel classification based airway tree segmentation

    NASA Astrophysics Data System (ADS)

    Lo, Pechin; de Bruijne, Marleen

    2008-03-01

    This paper presents a voxel classification based method for segmenting the human airway tree in volumetric computed tomography (CT) images. In contrast to standard methods that use only voxel intensities, our method uses a more complex appearance model based on a set of local image appearance features and Kth nearest neighbor (KNN) classification. The optimal set of features for classification is selected automatically from a large set of features describing the local image structure at several scales. The use of multiple features enables the appearance model to differentiate between airway tree voxels and other voxels of similar intensities in the lung, thus making the segmentation robust to pathologies such as emphysema. The classifier is trained on imperfect segmentations that can easily be obtained using region growing with a manual threshold selection. Experiments show that the proposed method results in a more robust segmentation that can grow into the smaller airway branches without leaking into emphysematous areas, and is able to segment many branches that are not present in the training set.

  13. A research of selected textural features for detection of asbestos-cement roofing sheets using orthoimages

    NASA Astrophysics Data System (ADS)

    Książek, Judyta

    2015-10-01

    At present, there has been a great interest in the development of texture based image classification methods in many different areas. This study presents the results of research carried out to assess the usefulness of selected textural features for detection of asbestos-cement roofs in orthophotomap classification. Two different orthophotomaps of southern Poland (with ground resolution: 5 cm and 25 cm) were used. On both orthoimages representative samples for two classes: asbestos-cement roofing sheets and other roofing materials were selected. Estimation of texture analysis usefulness was conducted using machine learning methods based on decision trees (C5.0 algorithm). For this purpose, various sets of texture parameters were calculated in MaZda software. During the calculation of decision trees different numbers of texture parameters groups were considered. In order to obtain the best settings for decision trees models cross-validation was performed. Decision trees models with the lowest mean classification error were selected. The accuracy of the classification was held based on validation data sets, which were not used for the classification learning. For 5 cm ground resolution samples, the lowest mean classification error was 15.6%. The lowest mean classification error in the case of 25 cm ground resolution was 20.0%. The obtained results confirm potential usefulness of the texture parameter image processing for detection of asbestos-cement roofing sheets. In order to improve the accuracy another extended study should be considered in which additional textural features as well as spectral characteristics should be analyzed.

  14. Prospective identification of adolescent suicide ideation using classification tree analysis: Models for community-based screening.

    PubMed

    Hill, Ryan M; Oosterhoff, Benjamin; Kaplow, Julie B

    2017-07-01

    Although a large number of risk markers for suicide ideation have been identified, little guidance has been provided to prospectively identify adolescents at risk for suicide ideation within community settings. The current study addressed this gap in the literature by utilizing classification tree analysis (CTA) to provide a decision-making model for screening adolescents at risk for suicide ideation. Participants were N = 4,799 youth (Mage = 16.15 years, SD = 1.63) who completed both Waves 1 and 2 of the National Longitudinal Study of Adolescent to Adult Health. CTA was used to generate a series of decision rules for identifying adolescents at risk for reporting suicide ideation at Wave 2. Findings revealed 3 distinct solutions with varying sensitivity and specificity for identifying adolescents who reported suicide ideation. Sensitivity of the classification trees ranged from 44.6% to 77.6%. The tree with greatest specificity and lowest sensitivity was based on a history of suicide ideation. The tree with moderate sensitivity and high specificity was based on depressive symptoms, suicide attempts or suicide among family and friends, and social support. The most sensitive but least specific tree utilized these factors and gender, ethnicity, hours of sleep, school-related factors, and future orientation. These classification trees offer community organizations options for instituting large-scale screenings for suicide ideation risk depending on the available resources and modality of services to be provided. This study provides a theoretically and empirically driven model for prospectively identifying adolescents at risk for suicide ideation and has implications for preventive interventions among at-risk youth. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  15. Integrated approach using data mining-based decision tree and object-based image analysis for high-resolution urban mapping of WorldView-2 satellite sensor data

    NASA Astrophysics Data System (ADS)

    Hamedianfar, Alireza; Shafri, Helmi Zulhaidi Mohd

    2016-04-01

    This paper integrates decision tree-based data mining (DM) and object-based image analysis (OBIA) to provide a transferable model for the detailed characterization of urban land-cover classes using WorldView-2 (WV-2) satellite images. Many articles have been published on OBIA in recent years based on DM for different applications. However, less attention has been paid to the generation of a transferable model for characterizing detailed urban land cover features. Three subsets of WV-2 images were used in this paper to generate transferable OBIA rule-sets. Many features were explored by using a DM algorithm, which created the classification rules as a decision tree (DT) structure from the first study area. The developed DT algorithm was applied to object-based classifications in the first study area. After this process, we validated the capability and transferability of the classification rules into second and third subsets. Detailed ground truth samples were collected to assess the classification results. The first, second, and third study areas achieved 88%, 85%, and 85% overall accuracies, respectively. Results from the investigation indicate that DM was an efficient method to provide the optimal and transferable classification rules for OBIA, which accelerates the rule-sets creation stage in the OBIA classification domain.

  16. Predicting tree species presence and basal area in Utah: A comparison of stochastic gradient boosting, generalized additive models, and tree-based methods

    Treesearch

    Gretchen G. Moisen; Elizabeth A. Freeman; Jock A. Blackard; Tracey S. Frescino; Niklaus E. Zimmermann; Thomas C. Edwards

    2006-01-01

    Many efforts are underway to produce broad-scale forest attribute maps by modelling forest class and structure variables collected in forest inventories as functions of satellite-based and biophysical information. Typically, variants of classification and regression trees implemented in Rulequest's© See5 and Cubist (for binary and continuous responses,...

  17. Predictive mapping of soil organic carbon in wet cultivated lands using classification-tree based models: the case study of Denmark.

    PubMed

    Bou Kheir, Rania; Greve, Mogens H; Bøcher, Peder K; Greve, Mette B; Larsen, René; McCloy, Keith

    2010-05-01

    Soil organic carbon (SOC) is one of the most important carbon stocks globally and has large potential to affect global climate. Distribution patterns of SOC in Denmark constitute a nation-wide baseline for studies on soil carbon changes (with respect to Kyoto protocol). This paper predicts and maps the geographic distribution of SOC across Denmark using remote sensing (RS), geographic information systems (GISs) and decision-tree modeling (un-pruned and pruned classification trees). Seventeen parameters, i.e. parent material, soil type, landscape type, elevation, slope gradient, slope aspect, mean curvature, plan curvature, profile curvature, flow accumulation, specific catchment area, tangent slope, tangent curvature, steady-state wetness index, Normalized Difference Vegetation Index (NDVI), Normalized Difference Wetness Index (NDWI) and Soil Color Index (SCI) were generated to statistically explain SOC field measurements in the area of interest (Denmark). A large number of tree-based classification models (588) were developed using (i) all of the parameters, (ii) all Digital Elevation Model (DEM) parameters only, (iii) the primary DEM parameters only, (iv), the remote sensing (RS) indices only, (v) selected pairs of parameters, (vi) soil type, parent material and landscape type only, and (vii) the parameters having a high impact on SOC distribution in built pruned trees. The best constructed classification tree models (in the number of three) with the lowest misclassification error (ME) and the lowest number of nodes (N) as well are: (i) the tree (T1) combining all of the parameters (ME=29.5%; N=54); (ii) the tree (T2) based on the parent material, soil type and landscape type (ME=31.5%; N=14); and (iii) the tree (T3) constructed using parent material, soil type, landscape type, elevation, tangent slope and SCI (ME=30%; N=39). The produced SOC maps at 1:50,000 cartographic scale using these trees are highly matching with coincidence values equal to 90.5% (Map T1/Map T2), 95% (Map T1/Map T3) and 91% (Map T2/Map T3). The overall accuracies of these maps once compared with field observations were estimated to be 69.54% (Map T1), 68.87% (Map T2) and 69.41% (Map T3). The proposed tree models are relatively simple, and may be also applied to other areas. Copyright 2010 Elsevier Ltd. All rights reserved.

  18. a Rough Set Decision Tree Based Mlp-Cnn for Very High Resolution Remotely Sensed Image Classification

    NASA Astrophysics Data System (ADS)

    Zhang, C.; Pan, X.; Zhang, S. Q.; Li, H. P.; Atkinson, P. M.

    2017-09-01

    Recent advances in remote sensing have witnessed a great amount of very high resolution (VHR) images acquired at sub-metre spatial resolution. These VHR remotely sensed data has post enormous challenges in processing, analysing and classifying them effectively due to the high spatial complexity and heterogeneity. Although many computer-aid classification methods that based on machine learning approaches have been developed over the past decades, most of them are developed toward pixel level spectral differentiation, e.g. Multi-Layer Perceptron (MLP), which are unable to exploit abundant spatial details within VHR images. This paper introduced a rough set model as a general framework to objectively characterize the uncertainty in CNN classification results, and further partition them into correctness and incorrectness on the map. The correct classification regions of CNN were trusted and maintained, whereas the misclassification areas were reclassified using a decision tree with both CNN and MLP. The effectiveness of the proposed rough set decision tree based MLP-CNN was tested using an urban area at Bournemouth, United Kingdom. The MLP-CNN, well capturing the complementarity between CNN and MLP through the rough set based decision tree, achieved the best classification performance both visually and numerically. Therefore, this research paves the way to achieve fully automatic and effective VHR image classification.

  19. Individualized Prediction of Heat Stress in Firefighters: A Data-Driven Approach Using Classification and Regression Trees.

    PubMed

    Mani, Ashutosh; Rao, Marepalli; James, Kelley; Bhattacharya, Amit

    2015-01-01

    The purpose of this study was to explore data-driven models, based on decision trees, to develop practical and easy to use predictive models for early identification of firefighters who are likely to cross the threshold of hyperthermia during live-fire training. Predictive models were created for three consecutive live-fire training scenarios. The final predicted outcome was a categorical variable: will a firefighter cross the upper threshold of hyperthermia - Yes/No. Two tiers of models were built, one with and one without taking into account the outcome (whether a firefighter crossed hyperthermia or not) from the previous training scenario. First tier of models included age, baseline heart rate and core body temperature, body mass index, and duration of training scenario as predictors. The second tier of models included the outcome of the previous scenario in the prediction space, in addition to all the predictors from the first tier of models. Classification and regression trees were used independently for prediction. The response variable for the regression tree was the quantitative variable: core body temperature at the end of each scenario. The predicted quantitative variable from regression trees was compared to the upper threshold of hyperthermia (38°C) to predict whether a firefighter would enter hyperthermia. The performance of classification and regression tree models was satisfactory for the second (success rate = 79%) and third (success rate = 89%) training scenarios but not for the first (success rate = 43%). Data-driven models based on decision trees can be a useful tool for predicting physiological response without modeling the underlying physiological systems. Early prediction of heat stress coupled with proactive interventions, such as pre-cooling, can help reduce heat stress in firefighters.

  20. Automated morphological analysis of bone marrow cells in microscopic images for diagnosis of leukemia: nucleus-plasma separation and cell classification using a hierarchical tree model of hematopoesis

    NASA Astrophysics Data System (ADS)

    Krappe, Sebastian; Wittenberg, Thomas; Haferlach, Torsten; Münzenmayer, Christian

    2016-03-01

    The morphological differentiation of bone marrow is fundamental for the diagnosis of leukemia. Currently, the counting and classification of the different types of bone marrow cells is done manually under the use of bright field microscopy. This is a time-consuming, subjective, tedious and error-prone process. Furthermore, repeated examinations of a slide may yield intra- and inter-observer variances. For that reason a computer assisted diagnosis system for bone marrow differentiation is pursued. In this work we focus (a) on a new method for the separation of nucleus and plasma parts and (b) on a knowledge-based hierarchical tree classifier for the differentiation of bone marrow cells in 16 different classes. Classification trees are easily interpretable and understandable and provide a classification together with an explanation. Using classification trees, expert knowledge (i.e. knowledge about similar classes and cell lines in the tree model of hematopoiesis) is integrated in the structure of the tree. The proposed segmentation method is evaluated with more than 10,000 manually segmented cells. For the evaluation of the proposed hierarchical classifier more than 140,000 automatically segmented bone marrow cells are used. Future automated solutions for the morphological analysis of bone marrow smears could potentially apply such an approach for the pre-classification of bone marrow cells and thereby shortening the examination time.

  1. Simulation of land use change in the three gorges reservoir area based on CART-CA

    NASA Astrophysics Data System (ADS)

    Yuan, Min

    2018-05-01

    This study proposes a new method to simulate spatiotemporal complex multiple land uses by using classification and regression tree algorithm (CART) based CA model. In this model, we use classification and regression tree algorithm to calculate land class conversion probability, and combine neighborhood factor, random factor to extract cellular transformation rules. The overall Kappa coefficient is 0.8014 and the overall accuracy is 0.8821 in the land dynamic simulation results of the three gorges reservoir area from 2000 to 2010, and the simulation results are satisfactory.

  2. Presence of indicator plant species as a predictor of wetland vegetation integrity

    USGS Publications Warehouse

    Stapanian, Martin A.; Adams, Jean V.; Gara, Brian

    2013-01-01

    We fit regression and classification tree models to vegetation data collected from Ohio (USA) wetlands to determine (1) which species best predict Ohio vegetation index of biotic integrity (OVIBI) score and (2) which species best predict high-quality wetlands (OVIBI score >75). The simplest regression tree model predicted OVIBI score based on the occurrence of three plant species: skunk-cabbage (Symplocarpus foetidus), cinnamon fern (Osmunda cinnamomea), and swamp rose (Rosa palustris). The lowest OVIBI scores were best predicted by the absence of the selected plant species rather than by the presence of other species. The simplest classification tree model predicted high-quality wetlands based on the occurrence of two plant species: skunk-cabbage and marsh-fern (Thelypteris palustris). The overall misclassification rate from this tree was 13 %. Again, low-quality wetlands were better predicted than high-quality wetlands by the absence of selected species rather than the presence of other species using the classification tree model. Our results suggest that a species’ wetland status classification and coefficient of conservatism are of little use in predicting wetland quality. A simple, statistically derived species checklist such as the one created in this study could be used by field biologists to quickly and efficiently identify wetland sites likely to be regulated as high-quality, and requiring more intensive field assessments. Alternatively, it can be used for advanced determinations of low-quality wetlands. Agencies can save considerable money by screening wetlands for the presence/absence of such “indicator” species before issuing permits.

  3. A classification tree based modeling approach for segment related crashes on multilane highways.

    PubMed

    Pande, Anurag; Abdel-Aty, Mohamed; Das, Abhishek

    2010-10-01

    This study presents a classification tree based alternative to crash frequency analysis for analyzing crashes on mid-block segments of multilane arterials. The traditional approach of modeling counts of crashes that occur over a period of time works well for intersection crashes where each intersection itself provides a well-defined unit over which to aggregate the crash data. However, in the case of mid-block segments the crash frequency based approach requires segmentation of the arterial corridor into segments of arbitrary lengths. In this study we have used random samples of time, day of week, and location (i.e., milepost) combinations and compared them with the sample of crashes from the same arterial corridor. For crash and non-crash cases, geometric design/roadside and traffic characteristics were derived based on their milepost locations. The variables used in the analysis are non-event specific and therefore more relevant for roadway safety feature improvement programs. First classification tree model is a model comparing all crashes with the non-crash data and then four groups of crashes (rear-end, lane-change related, pedestrian, and single-vehicle/off-road crashes) are separately compared to the non-crash cases. The classification tree models provide a list of significant variables as well as a measure to classify crash from non-crash cases. ADT along with time of day/day of week are significantly related to all crash types with different groups of crashes being more likely to occur at different times. From the classification performance of different models it was apparent that using non-event specific information may not be suitable for single vehicle/off-road crashes. The study provides the safety analysis community an additional tool to assess safety without having to aggregate the corridor crash data over arbitrary segment lengths. Copyright © 2010. Published by Elsevier Ltd.

  4. Decision tree methods: applications for classification and prediction.

    PubMed

    Song, Yan-Yan; Lu, Ying

    2015-04-25

    Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets. Using the training dataset to build a decision tree model and a validation dataset to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms used to develop decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.

  5. Relating FIA data to habitat classifications via tree-based models of canopy cover

    Treesearch

    Mark D. Nelson; Brian G. Tavernia; Chris Toney; Brian F. Walters

    2012-01-01

    Wildlife species-habitat matrices are used to relate lists of species with abundance of their habitats. The Forest Inventory and Analysis Program provides data on forest composition and structure, but these attributes may not correspond directly with definitions of wildlife habitats. We used FIA tree data and tree crown diameter models to estimate canopy cover, from...

  6. Tree Species Classification of Broadleaved Forests in Nagano, Central Japan, Using Airborne Laser Data and Multispectral Images

    NASA Astrophysics Data System (ADS)

    Deng, S.; Katoh, M.; Takenaka, Y.; Cheung, K.; Ishii, A.; Fujii, N.; Gao, T.

    2017-10-01

    This study attempted to classify three coniferous and ten broadleaved tree species by combining airborne laser scanning (ALS) data and multispectral images. The study area, located in Nagano, central Japan, is within the broadleaved forests of the Afan Woodland area. A total of 235 trees were surveyed in 2016, and we recorded the species, DBH, and tree height. The geographical position of each tree was collected using a Global Navigation Satellite System (GNSS) device. Tree crowns were manually detected using GNSS position data, field photographs, true-color orthoimages with three bands (red-green-blue, RGB), 3D point clouds, and a canopy height model derived from ALS data. Then a total of 69 features, including 27 image-based and 42 point-based features, were extracted from the RGB images and the ALS data to classify tree species. Finally, the detected tree crowns were classified into two classes for the first level (coniferous and broadleaved trees), four classes for the second level (Pinus densiflora, Larix kaempferi, Cryptomeria japonica, and broadleaved trees), and 13 classes for the third level (three coniferous and ten broadleaved species), using the 27 image-based features, 42 point-based features, all 69 features, and the best combination of features identified using a neighborhood component analysis algorithm, respectively. The overall classification accuracies reached 90 % at the first and second levels but less than 60 % at the third level. The classifications using the best combinations of features had higher accuracies than those using the image-based and point-based features and the combination of all of the 69 features.

  7. An information-based network approach for protein classification

    PubMed Central

    Wan, Xiaogeng; Zhao, Xin; Yau, Stephen S. T.

    2017-01-01

    Protein classification is one of the critical problems in bioinformatics. Early studies used geometric distances and polygenetic-tree to classify proteins. These methods use binary trees to present protein classification. In this paper, we propose a new protein classification method, whereby theories of information and networks are used to classify the multivariate relationships of proteins. In this study, protein universe is modeled as an undirected network, where proteins are classified according to their connections. Our method is unsupervised, multivariate, and alignment-free. It can be applied to the classification of both protein sequences and structures. Nine examples are used to demonstrate the efficiency of our new method. PMID:28350835

  8. Application of classification tree and logistic regression for the management and health intervention plans in a community-based study.

    PubMed

    Teng, Ju-Hsi; Lin, Kuan-Chia; Ho, Bin-Shenq

    2007-10-01

    A community-based aboriginal study was conducted and analysed to explore the application of classification tree and logistic regression. A total of 1066 aboriginal residents in Yilan County were screened during 2003-2004. The independent variables include demographic characteristics, physical examinations, geographic location, health behaviours, dietary habits and family hereditary diseases history. Risk factors of cardiovascular diseases were selected as the dependent variables in further analysis. The completion rate for heath interview is 88.9%. The classification tree results find that if body mass index is higher than 25.72 kg m(-2) and the age is above 51 years, the predicted probability for number of cardiovascular risk factors > or =3 is 73.6% and the population is 322. If body mass index is higher than 26.35 kg m(-2) and geographical latitude of the village is lower than 24 degrees 22.8', the predicted probability for number of cardiovascular risk factors > or =4 is 60.8% and the population is 74. As the logistic regression results indicate that body mass index, drinking habit and menopause are the top three significant independent variables. The classification tree model specifically shows the discrimination paths and interactions between the risk groups. The logistic regression model presents and analyses the statistical independent factors of cardiovascular risks. Applying both models to specific situations will provide a different angle for the design and management of future health intervention plans after community-based study.

  9. Object-based methods for individual tree identification and tree species classification from high-spatial resolution imagery

    NASA Astrophysics Data System (ADS)

    Wang, Le

    2003-10-01

    Modern forest management poses an increasing need for detailed knowledge of forest information at different spatial scales. At the forest level, the information for tree species assemblage is desired whereas at or below the stand level, individual tree related information is preferred. Remote Sensing provides an effective tool to extract the above information at multiple spatial scales in the continuous time domain. To date, the increasing volume and readily availability of high-spatial-resolution data have lead to a much wider application of remotely sensed products. Nevertheless, to make effective use of the improving spatial resolution, conventional pixel-based classification methods are far from satisfactory. Correspondingly, developing object-based methods becomes a central challenge for researchers in the field of Remote Sensing. This thesis focuses on the development of methods for accurate individual tree identification and tree species classification. We develop a method in which individual tree crown boundaries and treetop locations are derived under a unified framework. We apply a two-stage approach with edge detection followed by marker-controlled watershed segmentation. Treetops are modeled from radiometry and geometry aspects. Specifically, treetops are assumed to be represented by local radiation maxima and to be located near the center of the tree-crown. As a result, a marker image was created from the derived treetop to guide a watershed segmentation to further differentiate overlapping trees and to produce a segmented image comprised of individual tree crowns. The image segmentation method developed achieves a promising result for a 256 x 256 CASI image. Then further effort is made to extend our methods to the multiscales which are constructed from a wavelet decomposition. A scale consistency and geometric consistency are designed to examine the gradients along the scale-space for the purpose of separating true crown boundary from unwanted textures occurring due to branches and twigs. As a result from the inverse wavelet transform, the tree crown boundary is enhanced while the unwanted textures are suppressed. Based on the enhanced image, an improvement is achieved when applying the two-stage methods to a high resolution aerial photograph. To improve tree species classification, we develop a new method to choose the optimal scale parameter with the aid of Bhattacharya Distance (BD), a well-known index of class separability in traditional pixel-based classification. The optimal scale parameter is then fed in the process of a region-growing-based segmentation as a break-off value. Our object classification achieves a better accuracy in separating tree species when compared to the conventional Maximum Likelihood Classification (MLC). In summary, we develop two object-based methods for identifying individual trees and classifying tree species from high-spatial resolution imagery. Both methods achieve promising results and will promote integration of Remote Sensing and GIS in forest applications.

  10. Comprehensive decision tree models in bioinformatics.

    PubMed

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by so called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model that is constrained exclusively by the dimensions of the produced decision tree. The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expected significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics.

  11. Comprehensive Decision Tree Models in Bioinformatics

    PubMed Central

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Purpose Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. Methods This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by so called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model that is constrained exclusively by the dimensions of the produced decision tree. Results The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expected significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics. PMID:22479449

  12. A Mixtures-of-Trees Framework for Multi-Label Classification

    PubMed Central

    Hong, Charmgil; Batal, Iyad; Hauskrecht, Milos

    2015-01-01

    We propose a new probabilistic approach for multi-label classification that aims to represent the class posterior distribution P(Y|X). Our approach uses a mixture of tree-structured Bayesian networks, which can leverage the computational advantages of conditional tree-structured models and the abilities of mixtures to compensate for tree-structured restrictions. We develop algorithms for learning the model from data and for performing multi-label predictions using the learned model. Experiments on multiple datasets demonstrate that our approach outperforms several state-of-the-art multi-label classification methods. PMID:25927011

  13. Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036

    ERIC Educational Resources Information Center

    Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.

    2014-01-01

    This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the more difficult to interpret logistic regression model. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…

  14. Tree species classification in subtropical forests using small-footprint full-waveform LiDAR data

    NASA Astrophysics Data System (ADS)

    Cao, Lin; Coops, Nicholas C.; Innes, John L.; Dai, Jinsong; Ruan, Honghua; She, Guanghui

    2016-07-01

    The accurate classification of tree species is critical for the management of forest ecosystems, particularly subtropical forests, which are highly diverse and complex ecosystems. While airborne Light Detection and Ranging (LiDAR) technology offers significant potential to estimate forest structural attributes, the capacity of this new tool to classify species is less well known. In this research, full-waveform metrics were extracted by a voxel-based composite waveform approach and examined with a Random Forests classifier to discriminate six subtropical tree species (i.e., Masson pine (Pinus massoniana Lamb.)), Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.), Slash pines (Pinus elliottii Engelm.), Sawtooth oak (Quercus acutissima Carruth.) and Chinese holly (Ilex chinensis Sims.) at three levels of discrimination. As part of the analysis, the optimal voxel size for modelling the composite waveforms was investigated, the most important predictor metrics for species classification assessed and the effect of scan angle on species discrimination examined. Results demonstrate that all tree species were classified with relatively high accuracy (68.6% for six classes, 75.8% for four main species and 86.2% for conifers and broadleaved trees). Full-waveform metrics (based on height of median energy, waveform distance and number of waveform peaks) demonstrated high classification importance and were stable among various voxel sizes. The results also suggest that the voxel based approach can alleviate some of the issues associated with large scan angles. In summary, the results indicate that full-waveform LIDAR data have significant potential for tree species classification in the subtropical forests.

  15. Combining a generic process-based productivity model classification method to predict the presence and absence species in the Pacific Northwest, U.S.A

    Treesearch

    Nicholas C. Coops; Richard H. Waring; Todd A. Schroeder

    2009-01-01

    Although long-lived tree species experience considerable environmental variation over their life spans, their geographical distributions reflect sensitivity mainly to mean monthly climatic conditions.We introduce an approach that incorporates a physiologically based growth model to illustrate how a half-dozen tree species differ in their responses to monthly variation...

  16. Optimal land use/cover classification using remote sensing imagery for hydrological modelling in a Himalayan watershed

    NASA Astrophysics Data System (ADS)

    Saran, Sameer; Sterk, Geert; Kumar, Suresh

    2007-10-01

    Land use/cover is an important watershed surface characteristic that affects surface runoff and erosion. Many of the available hydrological models divide the watershed into Hydrological Response Units (HRU), which are spatial units with expected similar hydrological behaviours. The division into HRU's requires good-quality spatial data on land use/cover. This paper presents different approaches to attain an optimal land use/cover map based on remote sensing imagery for a Himalayan watershed in northern India. First digital classifications using maximum likelihood classifier (MLC) and a decision tree classifier were applied. The results obtained from the decision tree were better and even improved after post classification sorting. But the obtained land use/cover map was not sufficient for the delineation of HRUs, since the agricultural land use/cover class did not discriminate between the two major crops in the area i.e. paddy and maize. Therefore we adopted a visual classification approach using optical data alone and also fused with ENVISAT ASAR data. This second step with detailed classification system resulted into better classification accuracy within the 'agricultural land' class which will be further combined with topography and soil type to derive HRU's for physically-based hydrological modelling.

  17. Forest Tree Species Distribution Mapping Using Landsat Satellite Imagery and Topographic Variables with the Maximum Entropy Method in Mongolia

    NASA Astrophysics Data System (ADS)

    Hao Chiang, Shou; Valdez, Miguel; Chen, Chi-Farn

    2016-06-01

    Forest is a very important ecosystem and natural resource for living things. Based on forest inventories, government is able to make decisions to converse, improve and manage forests in a sustainable way. Field work for forestry investigation is difficult and time consuming, because it needs intensive physical labor and the costs are high, especially surveying in remote mountainous regions. A reliable forest inventory can give us a more accurate and timely information to develop new and efficient approaches of forest management. The remote sensing technology has been recently used for forest investigation at a large scale. To produce an informative forest inventory, forest attributes, including tree species are unavoidably required to be considered. In this study the aim is to classify forest tree species in Erdenebulgan County, Huwsgul province in Mongolia, using Maximum Entropy method. The study area is covered by a dense forest which is almost 70% of total territorial extension of Erdenebulgan County and is located in a high mountain region in northern Mongolia. For this study, Landsat satellite imagery and a Digital Elevation Model (DEM) were acquired to perform tree species mapping. The forest tree species inventory map was collected from the Forest Division of the Mongolian Ministry of Nature and Environment as training data and also used as ground truth to perform the accuracy assessment of the tree species classification. Landsat images and DEM were processed for maximum entropy modeling, and this study applied the model with two experiments. The first one is to use Landsat surface reflectance for tree species classification; and the second experiment incorporates terrain variables in addition to the Landsat surface reflectance to perform the tree species classification. All experimental results were compared with the tree species inventory to assess the classification accuracy. Results show that the second one which uses Landsat surface reflectance coupled with terrain variables produced better result, with the higher overall accuracy and kappa coefficient than first experiment. The results indicate that the Maximum Entropy method is an applicable, and to classify tree species using satellite imagery data coupled with terrain information can improve the classification of tree species in the study area.

  18. Comparison of Naive Bayes and Decision Tree on Feature Selection Using Genetic Algorithm for Classification Problem

    NASA Astrophysics Data System (ADS)

    Rahmadani, S.; Dongoran, A.; Zarlis, M.; Zakarias

    2018-03-01

    This paper discusses the problem of feature selection using genetic algorithms on a dataset for classification problems. The classification model used is the decicion tree (DT), and Naive Bayes. In this paper we will discuss how the Naive Bayes and Decision Tree models to overcome the classification problem in the dataset, where the dataset feature is selectively selected using GA. Then both models compared their performance, whether there is an increase in accuracy or not. From the results obtained shows an increase in accuracy if the feature selection using GA. The proposed model is referred to as GADT (GA-Decision Tree) and GANB (GA-Naive Bayes). The data sets tested in this paper are taken from the UCI Machine Learning repository.

  19. Detection of Aspens Using High Resolution Aerial Laser Scanning Data and Digital Aerial Images

    PubMed Central

    Säynäjoki, Raita; Packalén, Petteri; Maltamo, Matti; Vehmas, Mikko; Eerikäinen, Kalle

    2008-01-01

    The aim was to use high resolution Aerial Laser Scanning (ALS) data and aerial images to detect European aspen (Populus tremula L.) from among other deciduous trees. The field data consisted of 14 sample plots of 30 m × 30 m size located in the Koli National Park in the North Karelia, Eastern Finland. A Canopy Height Model (CHM) was interpolated from the ALS data with a pulse density of 3.86/m2, low-pass filtered using Height-Based Filtering (HBF) and binarized to create the mask needed to separate the ground pixels from the canopy pixels within individual areas. Watershed segmentation was applied to the low-pass filtered CHM in order to create preliminary canopy segments, from which the non-canopy elements were extracted to obtain the final canopy segmentation, i.e. the ground mask was analysed against the canopy mask. A manual classification of aerial images was employed to separate the canopy segments of deciduous trees from those of coniferous trees. Finally, linear discriminant analysis was applied to the correctly classified canopy segments of deciduous trees to classify them into segments belonging to aspen and those belonging to other deciduous trees. The independent variables used in the classification were obtained from the first pulse ALS point data. The accuracy of discrimination between aspen and other deciduous trees was 78.6%. The independent variables in the classification function were the proportion of vegetation hits, the standard deviation of in pulse heights, accumulated intensity at the 90th percentile and the proportion of laser points reflected at the 60th height percentile. The accuracy of classification corresponded to the validation results of earlier ALS-based studies on the classification of individual deciduous trees to tree species. PMID:27873799

  20. A new tree classification system for southern hardwoods

    Treesearch

    James S. Meadows; Daniel A. Jr. Skojac

    2008-01-01

    A new tree classification system for southern hardwoods is described. The new system is based on the Putnam tree classification system, originally developed by Putnam et al., 1960, Management ond inventory of southern hardwoods, Agriculture Handbook 181, US For. Sew., Washington, DC, which consists of four tree classes: (1) preferred growing stock, (2) reserve growing...

  1. A combined M5P tree and hazard-based duration model for predicting urban freeway traffic accident durations.

    PubMed

    Lin, Lei; Wang, Qian; Sadek, Adel W

    2016-06-01

    The duration of freeway traffic accidents duration is an important factor, which affects traffic congestion, environmental pollution, and secondary accidents. Among previous studies, the M5P algorithm has been shown to be an effective tool for predicting incident duration. M5P builds a tree-based model, like the traditional classification and regression tree (CART) method, but with multiple linear regression models as its leaves. The problem with M5P for accident duration prediction, however, is that whereas linear regression assumes that the conditional distribution of accident durations is normally distributed, the distribution for a "time-to-an-event" is almost certainly nonsymmetrical. A hazard-based duration model (HBDM) is a better choice for this kind of a "time-to-event" modeling scenario, and given this, HBDMs have been previously applied to analyze and predict traffic accidents duration. Previous research, however, has not yet applied HBDMs for accident duration prediction, in association with clustering or classification of the dataset to minimize data heterogeneity. The current paper proposes a novel approach for accident duration prediction, which improves on the original M5P tree algorithm through the construction of a M5P-HBDM model, in which the leaves of the M5P tree model are HBDMs instead of linear regression models. Such a model offers the advantage of minimizing data heterogeneity through dataset classification, and avoids the need for the incorrect assumption of normality for traffic accident durations. The proposed model was then tested on two freeway accident datasets. For each dataset, the first 500 records were used to train the following three models: (1) an M5P tree; (2) a HBDM; and (3) the proposed M5P-HBDM, and the remainder of data were used for testing. The results show that the proposed M5P-HBDM managed to identify more significant and meaningful variables than either M5P or HBDMs. Moreover, the M5P-HBDM had the lowest overall mean absolute percentage error (MAPE). Copyright © 2016 Elsevier Ltd. All rights reserved.

  2. Recruiting Conventional Tree Architecture Models into State-of-the-Art LiDAR Mapping for Investigating Tree Growth Habits in Structure.

    PubMed

    Lin, Yi; Jiang, Miao; Pellikka, Petri; Heiskanen, Janne

    2018-01-01

    Mensuration of tree growth habits is of considerable importance for understanding forest ecosystem processes and forest biophysical responses to climate changes. However, the complexity of tree crown morphology that is typically formed after many years of growth tends to render it a non-trivial task, even for the state-of-the-art 3D forest mapping technology-light detection and ranging (LiDAR). Fortunately, botanists have deduced the large structural diversity of tree forms into only a limited number of tree architecture models, which can present a-priori knowledge about tree structure, growth, and other attributes for different species. This study attempted to recruit Hallé architecture models (HAMs) into LiDAR mapping to investigate tree growth habits in structure. First, following the HAM-characterized tree structure organization rules, we run the kernel procedure of tree species classification based on the LiDAR-collected point clouds using a support vector machine classifier in the leave-one-out-for-cross-validation mode. Then, the HAM corresponding to each of the classified tree species was identified based on expert knowledge, assisted by the comparison of the LiDAR-derived feature parameters. Next, the tree growth habits in structure for each of the tree species were derived from the determined HAM. In the case of four tree species growing in the boreal environment, the tests indicated that the classification accuracy reached 85.0%, and their growth habits could be derived by qualitative and quantitative means. Overall, the strategy of recruiting conventional HAMs into LiDAR mapping for investigating tree growth habits in structure was validated, thereby paving a new way for efficiently reflecting tree growth habits and projecting forest structure dynamics.

  3. Recruiting Conventional Tree Architecture Models into State-of-the-Art LiDAR Mapping for Investigating Tree Growth Habits in Structure

    PubMed Central

    Lin, Yi; Jiang, Miao; Pellikka, Petri; Heiskanen, Janne

    2018-01-01

    Mensuration of tree growth habits is of considerable importance for understanding forest ecosystem processes and forest biophysical responses to climate changes. However, the complexity of tree crown morphology that is typically formed after many years of growth tends to render it a non-trivial task, even for the state-of-the-art 3D forest mapping technology—light detection and ranging (LiDAR). Fortunately, botanists have deduced the large structural diversity of tree forms into only a limited number of tree architecture models, which can present a-priori knowledge about tree structure, growth, and other attributes for different species. This study attempted to recruit Hallé architecture models (HAMs) into LiDAR mapping to investigate tree growth habits in structure. First, following the HAM-characterized tree structure organization rules, we run the kernel procedure of tree species classification based on the LiDAR-collected point clouds using a support vector machine classifier in the leave-one-out-for-cross-validation mode. Then, the HAM corresponding to each of the classified tree species was identified based on expert knowledge, assisted by the comparison of the LiDAR-derived feature parameters. Next, the tree growth habits in structure for each of the tree species were derived from the determined HAM. In the case of four tree species growing in the boreal environment, the tests indicated that the classification accuracy reached 85.0%, and their growth habits could be derived by qualitative and quantitative means. Overall, the strategy of recruiting conventional HAMs into LiDAR mapping for investigating tree growth habits in structure was validated, thereby paving a new way for efficiently reflecting tree growth habits and projecting forest structure dynamics. PMID:29515616

  4. Systematic Model-in-the-Loop Test of Embedded Control Systems

    NASA Astrophysics Data System (ADS)

    Krupp, Alexander; Müller, Wolfgang

    Current model-based development processes offer new opportunities for verification automation, e.g., in automotive development. The duty of functional verification is the detection of design flaws. Current functional verification approaches exhibit a major gap between requirement definition and formal property definition, especially when analog signals are involved. Besides lack of methodical support for natural language formalization, there does not exist a standardized and accepted means for formal property definition as a target for verification planning. This article addresses several shortcomings of embedded system verification. An Enhanced Classification Tree Method is developed based on the established Classification Tree Method for Embeded Systems CTM/ES which applies a hardware verification language to define a verification environment.

  5. Landsat TM Classifications For SAFIS Using FIA Field Plots

    Treesearch

    William H. Cooke; Andrew J. Hartsell

    2001-01-01

    Wall-to-wall Landsat Thematic Mapper (TM) classification efforts in Georgia require field validation. We developed a new crown modeling procedure based on Forest Health Monitoring (FHM) data to test Forest Inventory and Analysis (FIA) data. These models simulate the proportion of tree crowns that reflect light on a FIA subplot basis. We averaged subplot crown...

  6. A universal hybrid decision tree classifier design for human activity classification.

    PubMed

    Chien, Chieh; Pottie, Gregory J

    2012-01-01

    A system that reliably classifies daily life activities can contribute to more effective and economical treatments for patients with chronic conditions or undergoing rehabilitative therapy. We propose a universal hybrid decision tree classifier for this purpose. The tree classifier can flexibly implement different decision rules at its internal nodes, and can be adapted from a population-based model when supplemented by training data for individuals. The system was tested using seven subjects each monitored by 14 triaxial accelerometers. Each subject performed fourteen different activities typical of daily life. Using leave-one-out cross validation, our decision tree produced average classification accuracies of 89.9%. In contrast, the MATLAB personalized tree classifiers using Gini's diversity index as the split criterion followed by optimally tuning the thresholds for each subject yielded 69.2%.

  7. Crown-level tree species classification from AISA hyperspectral imagery using an innovative pixel-weighting approach

    NASA Astrophysics Data System (ADS)

    Liu, Haijian; Wu, Changshan

    2018-06-01

    Crown-level tree species classification is a challenging task due to the spectral similarity among different tree species. Shadow, underlying objects, and other materials within a crown may decrease the purity of extracted crown spectra and further reduce classification accuracy. To address this problem, an innovative pixel-weighting approach was developed for tree species classification at the crown level. The method utilized high density discrete LiDAR data for individual tree delineation and Airborne Imaging Spectrometer for Applications (AISA) hyperspectral imagery for pure crown-scale spectra extraction. Specifically, three steps were included: 1) individual tree identification using LiDAR data, 2) pixel-weighted representative crown spectra calculation using hyperspectral imagery, with which pixel-based illuminated-leaf fractions estimated using a linear spectral mixture analysis (LSMA) were employed as weighted factors, and 3) representative spectra based tree species classification was performed through applying a support vector machine (SVM) approach. Analysis of results suggests that the developed pixel-weighting approach (OA = 82.12%, Kc = 0.74) performed better than treetop-based (OA = 70.86%, Kc = 0.58) and pixel-majority methods (OA = 72.26, Kc = 0.62) in terms of classification accuracy. McNemar tests indicated the differences in accuracy between pixel-weighting and treetop-based approaches as well as that between pixel-weighting and pixel-majority approaches were statistically significant.

  8. Classification of Parkinsonian syndromes from FDG-PET brain data using decision trees with SSM/PCA features.

    PubMed

    Mudali, D; Teune, L K; Renken, R J; Leenders, K L; Roerdink, J B T M

    2015-01-01

    Medical imaging techniques like fluorodeoxyglucose positron emission tomography (FDG-PET) have been used to aid in the differential diagnosis of neurodegenerative brain diseases. In this study, the objective is to classify FDG-PET brain scans of subjects with Parkinsonian syndromes (Parkinson's disease, multiple system atrophy, and progressive supranuclear palsy) compared to healthy controls. The scaled subprofile model/principal component analysis (SSM/PCA) method was applied to FDG-PET brain image data to obtain covariance patterns and corresponding subject scores. The latter were used as features for supervised classification by the C4.5 decision tree method. Leave-one-out cross validation was applied to determine classifier performance. We carried out a comparison with other types of classifiers. The big advantage of decision tree classification is that the results are easy to understand by humans. A visual representation of decision trees strongly supports the interpretation process, which is very important in the context of medical diagnosis. Further improvements are suggested based on enlarging the number of the training data, enhancing the decision tree method by bagging, and adding additional features based on (f)MRI data.

  9. A classification model of Hyperion image base on SAM combined decision tree

    NASA Astrophysics Data System (ADS)

    Wang, Zhenghai; Hu, Guangdao; Zhou, YongZhang; Liu, Xin

    2009-10-01

    Monitoring the Earth using imaging spectrometers has necessitated more accurate analyses and new applications to remote sensing. A very high dimensional input space requires an exponentially large amount of data to adequately and reliably represent the classes in that space. On the other hand, with increase in the input dimensionality the hypothesis space grows exponentially, which makes the classification performance highly unreliable. Traditional classification algorithms Classification of hyperspectral images is challenging. New algorithms have to be developed for hyperspectral data classification. The Spectral Angle Mapper (SAM) is a physically-based spectral classification that uses an ndimensional angle to match pixels to reference spectra. The algorithm determines the spectral similarity between two spectra by calculating the angle between the spectra, treating them as vectors in a space with dimensionality equal to the number of bands. The key and difficulty is that we should artificial defining the threshold of SAM. The classification precision depends on the rationality of the threshold of SAM. In order to resolve this problem, this paper proposes a new automatic classification model of remote sensing image using SAM combined with decision tree. It can automatic choose the appropriate threshold of SAM and improve the classify precision of SAM base on the analyze of field spectrum. The test area located in Heqing Yunnan was imaged by EO_1 Hyperion imaging spectrometer using 224 bands in visual and near infrared. The area included limestone areas, rock fields, soil and forests. The area was classified into four different vegetation and soil types. The results show that this method choose the appropriate threshold of SAM and eliminates the disturbance and influence of unwanted objects effectively, so as to improve the classification precision. Compared with the likelihood classification by field survey data, the classification precision of this model heightens 9.9%.

  10. Ensemble classification of individual Pinus crowns from multispectral satellite imagery and airborne LiDAR

    NASA Astrophysics Data System (ADS)

    Kukunda, Collins B.; Duque-Lazo, Joaquín; González-Ferreiro, Eduardo; Thaden, Hauke; Kleinn, Christoph

    2018-03-01

    Distinguishing tree species is relevant in many contexts of remote sensing assisted forest inventory. Accurate tree species maps support management and conservation planning, pest and disease control and biomass estimation. This study evaluated the performance of applying ensemble techniques with the goal of automatically distinguishing Pinus sylvestris L. and Pinus uncinata Mill. Ex Mirb within a 1.3 km2 mountainous area in Barcelonnette (France). Three modelling schemes were examined, based on: (1) high-density LiDAR data (160 returns m-2), (2) Worldview-2 multispectral imagery, and (3) Worldview-2 and LiDAR in combination. Variables related to the crown structure and height of individual trees were extracted from the normalized LiDAR point cloud at individual-tree level, after performing individual tree crown (ITC) delineation. Vegetation indices and the Haralick texture indices were derived from Worldview-2 images and served as independent spectral variables. Selection of the best predictor subset was done after a comparison of three variable selection procedures: (1) Random Forests with cross validation (AUCRFcv), (2) Akaike Information Criterion (AIC) and (3) Bayesian Information Criterion (BIC). To classify the species, 9 regression techniques were combined using ensemble models. Predictions were evaluated using cross validation and an independent dataset. Integration of datasets and models improved individual tree species classification (True Skills Statistic, TSS; from 0.67 to 0.81) over individual techniques and maintained strong predictive power (Relative Operating Characteristic, ROC = 0.91). Assemblage of regression models and integration of the datasets provided more reliable species distribution maps and associated tree-scale mapping uncertainties. Our study highlights the potential of model and data assemblage at improving species classifications needed in present-day forest planning and management.

  11. Spectral analysis of white ash response to emerald ash borer infestations

    NASA Astrophysics Data System (ADS)

    Calandra, Laura

    The emerald ash borer (EAB) (Agrilus planipennis Fairmaire) is an invasive insect that has killed over 50 million ash trees in the US. The goal of this research was to establish a method to identify ash trees infested with EAB using remote sensing techniques at the leaf-level and tree crown level. First, a field-based study at the leaf-level used the range of spectral bands from the WorldView-2 sensor to determine if there was a significant difference between EAB-infested white ash (Fraxinus americana) and healthy leaves. Binary logistic regression models were developed using individual and combinations of wavelengths; the most successful model included 545 and 950 nm bands. The second half of this research employed imagery to identify healthy and EAB-infested trees, comparing pixel- and object-based methods by applying an unsupervised classification approach and a tree crown delineation algorithm, respectively. The pixel-based models attained the highest overall accuracies.

  12. Using methods from the data mining and machine learning literature for disease classification and prediction: A case study examining classification of heart failure sub-types

    PubMed Central

    Austin, Peter C.; Tu, Jack V.; Ho, Jennifer E.; Levy, Daniel; Lee, Douglas S.

    2014-01-01

    Objective Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study design and Setting We compared the performance of these classification methods with those of conventional classification trees to classify patients with heart failure according to the following sub-types: heart failure with preserved ejection fraction (HFPEF) vs. heart failure with reduced ejection fraction (HFREF). We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results We found that modern, flexible tree-based methods from the data mining literature offer substantial improvement in prediction and classification of heart failure sub-type compared to conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared to the methods proposed in the data mining literature. Conclusion The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying heart failure subtypes in a population-based sample of patients from Ontario. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF. PMID:23384592

  13. Automated rule-base creation via CLIPS-Induce

    NASA Technical Reports Server (NTRS)

    Murphy, Patrick M.

    1994-01-01

    Many CLIPS rule-bases contain one or more rule groups that perform classification. In this paper we describe CLIPS-Induce, an automated system for the creation of a CLIPS classification rule-base from a set of test cases. CLIPS-Induce consists of two components, a decision tree induction component and a CLIPS production extraction component. ID3, a popular decision tree induction algorithm, is used to induce a decision tree from the test cases. CLIPS production extraction is accomplished through a top-down traversal of the decision tree. Nodes of the tree are used to construct query rules, and branches of the tree are used to construct classification rules. The learned CLIPS productions may easily be incorporated into a large CLIPS system that perform tasks such as accessing a database or displaying information.

  14. Modeling time-to-event (survival) data using classification tree analysis.

    PubMed

    Linden, Ariel; Yarnold, Paul R

    2017-12-01

    Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. © 2017 John Wiley & Sons, Ltd.

  15. Evolving optimised decision rules for intrusion detection using particle swarm paradigm

    NASA Astrophysics Data System (ADS)

    Sivatha Sindhu, Siva S.; Geetha, S.; Kannan, A.

    2012-12-01

    The aim of this article is to construct a practical intrusion detection system (IDS) that properly analyses the statistics of network traffic pattern and classify them as normal or anomalous class. The objective of this article is to prove that the choice of effective network traffic features and a proficient machine-learning paradigm enhances the detection accuracy of IDS. In this article, a rule-based approach with a family of six decision tree classifiers, namely Decision Stump, C4.5, Naive Baye's Tree, Random Forest, Random Tree and Representative Tree model to perform the detection of anomalous network pattern is introduced. In particular, the proposed swarm optimisation-based approach selects instances that compose training set and optimised decision tree operate over this trained set producing classification rules with improved coverage, classification capability and generalisation ability. Experiment with the Knowledge Discovery and Data mining (KDD) data set which have information on traffic pattern, during normal and intrusive behaviour shows that the proposed algorithm produces optimised decision rules and outperforms other machine-learning algorithm.

  16. A novel approach to internal crown characterization for coniferous tree species classification

    NASA Astrophysics Data System (ADS)

    Harikumar, A.; Bovolo, F.; Bruzzone, L.

    2016-10-01

    The knowledge about individual trees in forest is highly beneficial in forest management. High density small foot- print multi-return airborne Light Detection and Ranging (LiDAR) data can provide a very accurate information about the structural properties of individual trees in forests. Every tree species has a unique set of crown structural characteristics that can be used for tree species classification. In this paper, we use both the internal and external crown structural information of a conifer tree crown, derived from a high density small foot-print multi-return LiDAR data acquisition for species classification. Considering the fact that branches are the major building blocks of a conifer tree crown, we obtain the internal crown structural information using a branch level analysis. The structure of each conifer branch is represented using clusters in the LiDAR point cloud. We propose the joint use of the k-means clustering and geometric shape fitting, on the LiDAR data projected onto a novel 3-dimensional space, to identify branch clusters. After mapping the identified clusters back to the original space, six internal geometric features are estimated using a branch-level analysis. The external crown characteristics are modeled by using six least correlated features based on cone fitting and convex hull. Species classification is performed using a sparse Support Vector Machines (sparse SVM) classifier.

  17. Optimal land use/land cover classification using remote sensing imagery for hydrological modeling in a Himalayan watershed

    NASA Astrophysics Data System (ADS)

    Saran, Sameer; Sterk, Geert; Kumar, Suresh

    2009-10-01

    Land use/land cover is an important watershed surface characteristic that affects surface runoff and erosion. Many of the available hydrological models divide the watershed into Hydrological Response Units (HRU), which are spatial units with expected similar hydrological behaviours. The division into HRU's requires good-quality spatial data on land use/land cover. This paper presents different approaches to attain an optimal land use/land cover map based on remote sensing imagery for a Himalayan watershed in northern India. First digital classifications using maximum likelihood classifier (MLC) and a decision tree classifier were applied. The results obtained from the decision tree were better and even improved after post classification sorting. But the obtained land use/land cover map was not sufficient for the delineation of HRUs, since the agricultural land use/land cover class did not discriminate between the two major crops in the area i.e. paddy and maize. Subsequently the digital classification on fused data (ASAR and ASTER) were attempted to map land use/land cover classes with emphasis to delineate the paddy and maize crops but the supervised classification over fused datasets did not provide the desired accuracy and proper delineation of paddy and maize crops. Eventually, we adopted a visual classification approach on fused data. This second step with detailed classification system resulted into better classification accuracy within the 'agricultural land' class which will be further combined with topography and soil type to derive HRU's for physically-based hydrological modeling.

  18. Mapping Species Composition of Forests and Tree Plantations in Northeastern Costa Rica with an Integration of Hyperspectral and Multitemporal Landsat Imagery

    NASA Technical Reports Server (NTRS)

    Fagan, Matthew E.; Defries, Ruth S.; Sesnie, Steven E.; Arroyo-Mora, J. Pablo; Soto, Carlomagno; Singh, Aditya; Townsend, Philip A.; Chazdon, Robin L.

    2015-01-01

    An efficient means to map tree plantations is needed to detect tropical land use change and evaluate reforestation projects. To analyze recent tree plantation expansion in northeastern Costa Rica, we examined the potential of combining moderate-resolution hyperspectral imagery (2005 HyMap mosaic) with multitemporal, multispectral data (Landsat) to accurately classify (1) general forest types and (2) tree plantations by species composition. Following a linear discriminant analysis to reduce data dimensionality, we compared four Random Forest classification models: hyperspectral data (HD) alone; HD plus interannual spectral metrics; HD plus a multitemporal forest regrowth classification; and all three models combined. The fourth, combined model achieved overall accuracy of 88.5%. Adding multitemporal data significantly improved classification accuracy (p less than 0.0001) of all forest types, although the effect on tree plantation accuracy was modest. The hyperspectral data alone classified six species of tree plantations with 75% to 93% producer's accuracy; adding multitemporal spectral data increased accuracy only for two species with dense canopies. Non-native tree species had higher classification accuracy overall and made up the majority of tree plantations in this landscape. Our results indicate that combining occasionally acquired hyperspectral data with widely available multitemporal satellite imagery enhances mapping and monitoring of reforestation in tropical landscapes.

  19. Lidar-based individual tree species classification using convolutional neural network

    NASA Astrophysics Data System (ADS)

    Mizoguchi, Tomohiro; Ishii, Akira; Nakamura, Hiroyuki; Inoue, Tsuyoshi; Takamatsu, Hisashi

    2017-06-01

    Terrestrial lidar is commonly used for detailed documentation in the field of forest inventory investigation. Recent improvements of point cloud processing techniques enabled efficient and precise computation of an individual tree shape parameters, such as breast-height diameter, height, and volume. However, tree species are manually specified by skilled workers to date. Previous works for automatic tree species classification mainly focused on aerial or satellite images, and few works have been reported for classification techniques using ground-based sensor data. Several candidate sensors can be considered for classification, such as RGB or multi/hyper spectral cameras. Above all candidates, we use terrestrial lidar because it can obtain high resolution point cloud in the dark forest. We selected bark texture for the classification criteria, since they clearly represent unique characteristics of each tree and do not change their appearance under seasonable variation and aged deterioration. In this paper, we propose a new method for automatic individual tree species classification based on terrestrial lidar using Convolutional Neural Network (CNN). The key component is the creation step of a depth image which well describe the characteristics of each species from a point cloud. We focus on Japanese cedar and cypress which cover the large part of domestic forest. Our experimental results demonstrate the effectiveness of our proposed method.

  20. Evaluating multimedia chemical persistence: Classification and regression tree analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bennett, D.H.; McKone, T.E.; Kastenberg, W.E.

    2000-04-01

    For the thousands of chemicals continuously released into the environment, it is desirable to make prospective assessments of those likely to be persistent. Widely distributed persistent chemicals are impossible to remove from the environment and remediation by natural processes may take decades, which is problematic if adverse health or ecological effects are discovered after prolonged release into the environment. A tiered approach using a classification scheme and a multimedia model for determining persistence is presented. Using specific criteria for persistence, a classification tree is developed to classify a chemical as persistent or nonpersistent based on the chemical properties. In thismore » approach, the classification is derived from the results of a standardized unit world multimedia model. Thus, the classifications are more robust for multimedia pollutants than classifications using a single medium half-life. The method can be readily implemented and provides insight without requiring extensive and often unavailable data. This method can be used to classify chemicals when only a few properties are known and can be used to direct further data collection. Case studies are presented to demonstrate the advantages of the approach.« less

  1. C-fuzzy variable-branch decision tree with storage and classification error rate constraints

    NASA Astrophysics Data System (ADS)

    Yang, Shiueng-Bien

    2009-10-01

    The C-fuzzy decision tree (CFDT), which is based on the fuzzy C-means algorithm, has recently been proposed. The CFDT is grown by selecting the nodes to be split according to its classification error rate. However, the CFDT design does not consider the classification time taken to classify the input vector. Thus, the CFDT can be improved. We propose a new C-fuzzy variable-branch decision tree (CFVBDT) with storage and classification error rate constraints. The design of the CFVBDT consists of two phases-growing and pruning. The CFVBDT is grown by selecting the nodes to be split according to the classification error rate and the classification time in the decision tree. Additionally, the pruning method selects the nodes to prune based on the storage requirement and the classification time of the CFVBDT. Furthermore, the number of branches of each internal node is variable in the CFVBDT. Experimental results indicate that the proposed CFVBDT outperforms the CFDT and other methods.

  2. Use of classification trees to apportion single echo detections to species: Application to the pelagic fish community of Lake Superior

    USGS Publications Warehouse

    Yule, Daniel L.; Adams, Jean V.; Hrabik, Thomas R.; Vinson, Mark R.; Woiak, Zebadiah; Ahrenstroff, Tyler D.

    2013-01-01

    Acoustic methods are used to estimate the density of pelagic fish in large lakes with results of midwater trawling used to assign species composition. Apportionment in lakes having mixed species can be challenging because only a small fraction of the water sampled acoustically is sampled with trawl gear. Here we describe a new method where single echo detections (SEDs) are assigned to species based on classification tree models developed from catch data that separate species based on fish size and the spatial habitats they occupy. During the summer of 2011, we conducted a spatially-balanced lake-wide acoustic and midwater trawl survey of Lake Superior. A total of 51 sites in four bathymetric depth strata (0–30 m, 30–100 m, 100–200 m, and >200 m) were sampled. We developed classification tree models for each stratum and found fish length was the most important variable for separating species. To apply these trees to the acoustic data, we needed to identify a target strength to length (TS-to-L) relationship appropriate for all abundant Lake Superior pelagic species. We tested performance of 7 general (i.e., multi-species) relationships derived from three published studies. The best-performing relationship was identified by comparing predicted and observed catch compositions using a second independent Lake Superior data set. Once identified, the relationship was used to predict lengths of SEDs from the lake-wide survey, and the classification tree models were used to assign each SED to a species. Exotic rainbow smelt (Osmerus mordax) were the most common species at bathymetric depths 100 m (384 million; 6.0 kt). Cisco (Coregonus artedi) were widely distributed over all strata with their population estimated at 182 million (44 kt). The apportionment method we describe should be transferable to other large lakes provided fish are not tightly aggregated, and an appropriate TS-to-L relationship for abundant pelagic fish species can be determined.

  3. An improved classification tree analysis of high cost modules based upon an axiomatic definition of complexity

    NASA Technical Reports Server (NTRS)

    Tian, Jianhui; Porter, Adam; Zelkowitz, Marvin V.

    1992-01-01

    Identification of high cost modules has been viewed as one mechanism to improve overall system reliability, since such modules tend to produce more than their share of problems. A decision tree model was used to identify such modules. In this current paper, a previously developed axiomatic model of program complexity is merged with the previously developed decision tree process for an improvement in the ability to identify such modules. This improvement was tested using data from the NASA Software Engineering Laboratory.

  4. Integrated Change Detection and Classification in Urban Areas Based on Airborne Laser Scanning Point Clouds.

    PubMed

    Tran, Thi Huong Giang; Ressl, Camillo; Pfeifer, Norbert

    2018-02-03

    This paper suggests a new approach for change detection (CD) in 3D point clouds. It combines classification and CD in one step using machine learning. The point cloud data of both epochs are merged for computing features of four types: features describing the point distribution, a feature relating to relative terrain elevation, features specific for the multi-target capability of laser scanning, and features combining the point clouds of both epochs to identify the change. All these features are merged in the points and then training samples are acquired to create the model for supervised classification, which is then applied to the whole study area. The final results reach an overall accuracy of over 90% for both epochs of eight classes: lost tree, new tree, lost building, new building, changed ground, unchanged building, unchanged tree, and unchanged ground.

  5. A fuzzy decision tree for fault classification.

    PubMed

    Zio, Enrico; Baraldi, Piero; Popescu, Irina C

    2008-02-01

    In plant accident management, the control room operators are required to identify the causes of the accident, based on the different patterns of evolution of the monitored process variables thereby developing. This task is often quite challenging, given the large number of process parameters monitored and the intense emotional states under which it is performed. To aid the operators, various techniques of fault classification have been engineered. An important requirement for their practical application is the physical interpretability of the relationships among the process variables underpinning the fault classification. In this view, the present work propounds a fuzzy approach to fault classification, which relies on fuzzy if-then rules inferred from the clustering of available preclassified signal data, which are then organized in a logical and transparent decision tree structure. The advantages offered by the proposed approach are precisely that a transparent fault classification model is mined out of the signal data and that the underlying physical relationships among the process variables are easily interpretable as linguistic if-then rules that can be explicitly visualized in the decision tree structure. The approach is applied to a case study regarding the classification of simulated faults in the feedwater system of a boiling water reactor.

  6. Effects of sample survey design on the accuracy of classification tree models in species distribution models

    Treesearch

    Thomas C. Edwards; D. Richard Cutler; Niklaus E. Zimmermann; Linda Geiser; Gretchen G. Moisen

    2006-01-01

    We evaluated the effects of probabilistic (hereafter DESIGN) and non-probabilistic (PURPOSIVE) sample surveys on resultant classification tree models for predicting the presence of four lichen species in the Pacific Northwest, USA. Models derived from both survey forms were assessed using an independent data set (EVALUATION). Measures of accuracy as gauged by...

  7. [An object-based information extraction technology for dominant tree species group types].

    PubMed

    Tian, Tian; Fan, Wen-yi; Lu, Wei; Xiao, Xiang

    2015-06-01

    Information extraction for dominant tree group types is difficult in remote sensing image classification, howevers, the object-oriented classification method using high spatial resolution remote sensing data is a new method to realize the accurate type information extraction. In this paper, taking the Jiangle Forest Farm in Fujian Province as the research area, based on the Quickbird image data in 2013, the object-oriented method was adopted to identify the farmland, shrub-herbaceous plant, young afforested land, Pinus massoniana, Cunninghamia lanceolata and broad-leave tree types. Three types of classification factors including spectral, texture, and different vegetation indices were used to establish a class hierarchy. According to the different levels, membership functions and the decision tree classification rules were adopted. The results showed that the method based on the object-oriented method by using texture, spectrum and the vegetation indices achieved the classification accuracy of 91.3%, which was increased by 5.7% compared with that by only using the texture and spectrum.

  8. A self-trained classification technique for producing 30 m percent-water maps from Landsat data

    USGS Publications Warehouse

    Rover, Jennifer R.; Wylie, Bruce K.; Ji, Lei

    2010-01-01

    Small bodies of water can be mapped with moderate-resolution satellite data using methods where water is mapped as subpixel fractions using field measurements or high-resolution images as training datasets. A new method, developed from a regression-tree technique, uses a 30 m Landsat image for training the regression tree that, in turn, is applied to the same image to map subpixel water. The self-trained method was evaluated by comparing the percent-water map with three other maps generated from established percent-water mapping methods: (1) a regression-tree model trained with a 5 m SPOT 5 image, (2) a regression-tree model based on endmembers and (3) a linear unmixing classification technique. The results suggest that subpixel water fractions can be accurately estimated when high-resolution satellite data or intensively interpreted training datasets are not available, which increases our ability to map small water bodies or small changes in lake size at a regional scale.

  9. A decision tree algorithm for investigation of model biases related to dynamical cores and physical parameterizations: CESM/CAM EVALUATION BY DECISION TREES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Soner Yorgun, M.; Rood, Richard B.

    An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center of Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smoothmore » topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena due to the spectral transform method of CAM Eulerian spectral dynamical core is prominent, and is an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, both in horizontal and vertical dimensions, have significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study about the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly mean) analysis over local scales.« less

  10. A decision tree algorithm for investigation of model biases related to dynamical cores and physical parameterizations: CESM/CAM EVALUATION BY DECISION TREES

    DOE PAGES

    Soner Yorgun, M.; Rood, Richard B.

    2016-11-11

    An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center of Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smoothmore » topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena due to the spectral transform method of CAM Eulerian spectral dynamical core is prominent, and is an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, both in horizontal and vertical dimensions, have significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study about the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly mean) analysis over local scales.« less

  11. Comparison of Single and Multi-Scale Method for Leaf and Wood Points Classification from Terrestrial Laser Scanning Data

    NASA Astrophysics Data System (ADS)

    Wei, Hongqiang; Zhou, Guiyun; Zhou, Junjie

    2018-04-01

    The classification of leaf and wood points is an essential preprocessing step for extracting inventory measurements and canopy characterization of trees from the terrestrial laser scanning (TLS) data. The geometry-based approach is one of the widely used classification method. In the geometry-based method, it is common practice to extract salient features at one single scale before the features are used for classification. It remains unclear how different scale(s) used affect the classification accuracy and efficiency. To assess the scale effect on the classification accuracy and efficiency, we extracted the single-scale and multi-scale salient features from the point clouds of two oak trees of different sizes and conducted the classification on leaf and wood. Our experimental results show that the balanced accuracy of the multi-scale method is higher than the average balanced accuracy of the single-scale method by about 10 % for both trees. The average speed-up ratio of single scale classifiers over multi-scale classifier for each tree is higher than 30.

  12. Using Baidu Search Index to Predict Dengue Outbreak in China

    NASA Astrophysics Data System (ADS)

    Liu, Kangkang; Wang, Tao; Yang, Zhicong; Huang, Xiaodong; Milinovich, Gabriel J.; Lu, Yi; Jing, Qinlong; Xia, Yao; Zhao, Zhengyang; Yang, Yang; Tong, Shilu; Hu, Wenbiao; Lu, Jiahai

    2016-12-01

    This study identified the possible threshold to predict dengue fever (DF) outbreaks using Baidu Search Index (BSI). Time-series classification and regression tree models based on BSI were used to develop a predictive model for DF outbreak in Guangzhou and Zhongshan, China. In the regression tree models, the mean autochthonous DF incidence rate increased approximately 30-fold in Guangzhou when the weekly BSI for DF at the lagged moving average of 1-3 weeks was more than 382. When the weekly BSI for DF at the lagged moving average of 1-5 weeks was more than 91.8, there was approximately 9-fold increase of the mean autochthonous DF incidence rate in Zhongshan. In the classification tree models, the results showed that when the weekly BSI for DF at the lagged moving average of 1-3 weeks was more than 99.3, there was 89.28% chance of DF outbreak in Guangzhou, while, in Zhongshan, when the weekly BSI for DF at the lagged moving average of 1-5 weeks was more than 68.1, the chance of DF outbreak rose up to 100%. The study indicated that less cost internet-based surveillance systems can be the valuable complement to traditional DF surveillance in China.

  13. Modeling brook trout presence and absence from landscape variables using four different analytical methods

    USGS Publications Warehouse

    Steen, Paul J.; Passino-Reader, Dora R.; Wiley, Michael J.

    2006-01-01

    As a part of the Great Lakes Regional Aquatic Gap Analysis Project, we evaluated methodologies for modeling associations between fish species and habitat characteristics at a landscape scale. To do this, we created brook trout Salvelinus fontinalis presence and absence models based on four different techniques: multiple linear regression, logistic regression, neural networks, and classification trees. The models were tested in two ways: by application to an independent validation database and cross-validation using the training data, and by visual comparison of statewide distribution maps with historically recorded occurrences from the Michigan Fish Atlas. Although differences in the accuracy of our models were slight, the logistic regression model predicted with the least error, followed by multiple regression, then classification trees, then the neural networks. These models will provide natural resource managers a way to identify habitats requiring protection for the conservation of fish species.

  14. Modelling spruce bark beetle infestation probability

    Treesearch

    Paulius Zolubas; Jose Negron; A. Steven Munson

    2009-01-01

    Spruce bark beetle (Ips typographus L.) risk model, based on pure Norway spruce (Picea abies Karst.) stand characteristics in experimental and control plots was developed using classification and regression tree statistical technique under endemic pest population density. The most significant variable in spruce bark beetle...

  15. Classification and regression tree analysis vs. multivariable linear and logistic regression methods as statistical tools for studying haemophilia.

    PubMed

    Henrard, S; Speybroeck, N; Hermans, C

    2015-11-01

    Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in details. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.

  16. Fuzzy support vector machine: an efficient rule-based classification technique for microarrays.

    PubMed

    Hajiloo, Mohsen; Rabiee, Hamid R; Anooshahpour, Mahdi

    2013-01-01

    The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable for tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces fuzzy support vector machine which is a learning algorithm based on combination of fuzzy classifiers and kernel machines for microarray classification. Experimental results on public leukemia, prostate, and colon cancer datasets show that fuzzy support vector machine applied in combination with filter or wrapper feature selection methods develops a robust model with higher accuracy than the conventional microarray classification models such as support vector machine, artificial neural network, decision trees, k nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule-base inferred from fuzzy support vector machine helps extracting biological knowledge from microarray data. Fuzzy support vector machine as a new classification model with high generalization power, robustness, and good interpretability seems to be a promising tool for gene expression microarray classification.

  17. Tree Classification Software

    NASA Technical Reports Server (NTRS)

    Buntine, Wray

    1993-01-01

    This paper introduces the IND Tree Package to prospective users. IND does supervised learning using classification trees. This learning task is a basic tool used in the development of diagnosis, monitoring and expert systems. The IND Tree Package was developed as part of a NASA project to semi-automate the development of data analysis and modelling algorithms using artificial intelligence techniques. The IND Tree Package integrates features from CART and C4 with newer Bayesian and minimum encoding methods for growing classification trees and graphs. The IND Tree Package also provides an experimental control suite on top. The newer features give improved probability estimates often required in diagnostic and screening tasks. The package comes with a manual, Unix 'man' entries, and a guide to tree methods and research. The IND Tree Package is implemented in C under Unix and was beta-tested at university and commercial research laboratories in the United States.

  18. Learning accurate very fast decision trees from uncertain data streams

    NASA Astrophysics Data System (ADS)

    Liang, Chunquan; Zhang, Yang; Shi, Peng; Hu, Zhengguo

    2015-12-01

    Most existing works on data stream classification assume the streaming data is precise and definite. Such assumption, however, does not always hold in practice, since data uncertainty is ubiquitous in data stream applications due to imprecise measurement, missing values, privacy protection, etc. The goal of this paper is to learn accurate decision tree models from uncertain data streams for classification analysis. On the basis of very fast decision tree (VFDT) algorithms, we proposed an algorithm for constructing an uncertain VFDT tree with classifiers at tree leaves (uVFDTc). The uVFDTc algorithm can exploit uncertain information effectively and efficiently in both the learning and the classification phases. In the learning phase, it uses Hoeffding bound theory to learn from uncertain data streams and yield fast and reasonable decision trees. In the classification phase, at tree leaves it uses uncertain naive Bayes (UNB) classifiers to improve the classification performance. Experimental results on both synthetic and real-life datasets demonstrate the strong ability of uVFDTc to classify uncertain data streams. The use of UNB at tree leaves has improved the performance of uVFDTc, especially the any-time property, the benefit of exploiting uncertain information, and the robustness against uncertainty.

  19. Towards a formal genealogical classification of the Lezgian languages (North Caucasus): testing various phylogenetic methods on lexical data.

    PubMed

    Kassian, Alexei

    2015-01-01

    A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different algorithms: traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). Due to certain reasons (first of all, high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is a perfect testing area for appraisal of phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced the trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies.

  20. Towards a Formal Genealogical Classification of the Lezgian Languages (North Caucasus): Testing Various Phylogenetic Methods on Lexical Data

    PubMed Central

    Kassian, Alexei

    2015-01-01

    A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different algorithms: traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). Due to certain reasons (first of all, high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is a perfect testing area for appraisal of phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced the trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies. PMID:25719456

  1. Classification and Progression Based on CFS-GA and C5.0 Boost Decision Tree of TCM Zheng in Chronic Hepatitis B.

    PubMed

    Chen, Xiao Yu; Ma, Li Zhuang; Chu, Na; Zhou, Min; Hu, Yiyang

    2013-01-01

    Chronic hepatitis B (CHB) is a serious public health problem, and Traditional Chinese Medicine (TCM) plays an important role in the control and treatment for CHB. In the treatment of TCM, zheng discrimination is the most important step. In this paper, an approach based on CFS-GA (Correlation based Feature Selection and Genetic Algorithm) and C5.0 boost decision tree is used for zheng classification and progression in the TCM treatment of CHB. The CFS-GA performs better than the typical method of CFS. By CFS-GA, the acquired attribute subset is classified by C5.0 boost decision tree for TCM zheng classification of CHB, and C5.0 decision tree outperforms two typical decision trees of NBTree and REPTree on CFS-GA, CFS, and nonselection in comparison. Based on the critical indicators from C5.0 decision tree, important lab indicators in zheng progression are obtained by the method of stepwise discriminant analysis for expressing TCM zhengs in CHB, and alterations of the important indicators are also analyzed in zheng progression. In conclusion, all the three decision trees perform better on CFS-GA than on CFS and nonselection, and C5.0 decision tree outperforms the two typical decision trees both on attribute selection and nonselection.

  2. Modeling of air pollutant removal by dry deposition to urban trees using a WRF/CMAQ/i-Tree Eco coupled system.

    PubMed

    Cabaraban, Maria Theresa I; Kroll, Charles N; Hirabayashi, Satoshi; Nowak, David J

    2013-05-01

    A distributed adaptation of i-Tree Eco was used to simulate dry deposition in an urban area. This investigation focused on the effects of varying temperature, LAI, and NO2 concentration inputs on estimated NO2 dry deposition to trees in Baltimore, MD. A coupled modeling system is described, wherein WRF provided temperature and LAI fields, and CMAQ provided NO2 concentrations. A base case simulation was conducted using built-in distributed i-Tree Eco tools, and simulations using different inputs were compared against this base case. Differences in land cover classification and tree cover between the distributed i-Tree Eco and WRF resulted in changes in estimated LAI, which in turn resulted in variations in simulated NO2 dry deposition. Estimated NO2 removal decreased when CMAQ-derived concentration was applied to the distributed i-Tree Eco simulation. Discrepancies in temperature inputs did little to affect estimates of NO2 removal by dry deposition to trees in Baltimore. Copyright © 2013 Elsevier Ltd. All rights reserved.

  3. The CERAD Neuropsychological Assessment Battery Is Sensitive to Alcohol-Related Cognitive Deficiencies in Elderly Patients: A Retrospective Matched Case-Control Study.

    PubMed

    Kaufmann, Liane; Huber, Stefan; Mayer, Daniel; Moeller, Korbinian; Marksteiner, Josef

    2018-04-01

    Adverse effects of heavy drinking on cognition have frequently been reported. In the present study, we systematically examined for the first time whether clinical neuropsychological assessments may be sensitive to alcohol abuse in elderly patients with suspected minor neurocognitive disorder. A total of 144 elderly with and without alcohol abuse (each group n=72; mean age 66.7 years) were selected from a patient pool of n=738 by applying propensity score matching (a statistical method allowing to match participants in experimental and control group by balancing various covariates to reduce selection bias). Accordingly, study groups were almost perfectly matched regarding age, education, gender, and Mini Mental State Examination score. Neuropsychological performance was measured using the CERAD (Consortium to Establish a Registry for Alzheimer's Disease). Classification analyses (i.e., decision tree and boosted trees models) were conducted to examine whether CERAD variables or total score contributed to group classification. Decision tree models disclosed that groups could be reliably classified based on the CERAD variables "Word List Discriminability" (tapping verbal recognition memory, 64% classification accuracy) and "Trail Making Test A" (measuring visuo-motor speed, 59% classification accuracy). Boosted tree analyses further indicated the sensitivity of "Word List Recall" (measuring free verbal recall) for discriminating elderly with versus without a history of alcohol abuse. This indicates that specific CERAD variables seem to be sensitive to alcohol-related cognitive dysfunctions in elderly patients with suspected minor neurocognitive disorder. (JINS, 2018, 24, 360-371).

  4. Analyses of rear-end crashes based on classification tree models.

    PubMed

    Yan, Xuedong; Radwan, Essam

    2006-09-01

    Signalized intersections are accident-prone areas especially for rear-end crashes due to the fact that the diversity of the braking behaviors of drivers increases during the signal change. The objective of this article is to improve knowledge of the relationship between rear-end crashes occurring at signalized intersections and a series of potential traffic risk factors classified by driver characteristics, environments, and vehicle types. Based on the 2001 Florida crash database, the classification tree method and Quasi-induced exposure concept were used to perform the statistical analysis. Two binary classification tree models were developed in this study. One was used for the crash comparison between rear-end and non-rear-end to identify those specific trends of the rear-end crashes. The other was constructed for the comparison between striking vehicles/drivers (at-fault) and struck vehicles/drivers (not-at-fault) to find more complex crash pattern associated with the traffic attributes of driver, vehicle, and environment. The modeling results showed that the rear-end crashes are over-presented in the higher speed limits (45-55 mph); the rear-end crash propensity for daytime is apparently larger than nighttime; and the reduction of braking capacity due to wet and slippery road surface conditions would definitely contribute to rear-end crashes, especially at intersections with higher speed limits. The tree model segmented drivers into four homogeneous age groups: < 21 years, 21-31 years, 32-75 years, and > 75 years. The youngest driver group shows the largest crash propensity; in the 21-31 age group, the male drivers are over-involved in rear-end crashes under adverse weather conditions and the 32-75 years drivers driving large size vehicles have a larger crash propensity compared to those driving passenger vehicles. Combined with the quasi-induced exposure concept, the classification tree method is a proper statistical tool for traffic-safety analysis to investigate crash propensity. Compared to the logistic regression models, tree models have advantages for handling continuous independent variables and easily explaining the complex interaction effect with more than two independent variables. This research recommended that at signalized intersections with higher speed limits, reducing the speed limit to 40 mph efficiently contribute to a lower accident rate. Drivers involved in alcohol use may increase not only rear-end crash risk but also the driver injury severity. Education and enforcement countermeasures should focus on the driver group younger than 21 years. Further studies are suggested to compare crash risk distributions of the driver age for other main crash types to seek corresponding traffic countermeasures.

  5. Pre-operative prediction of surgical morbidity in children: comparison of five statistical models.

    PubMed

    Cooper, Jennifer N; Wei, Lai; Fernandez, Soledad A; Minneci, Peter C; Deans, Katherine J

    2015-02-01

    The accurate prediction of surgical risk is important to patients and physicians. Logistic regression (LR) models are typically used to estimate these risks. However, in the fields of data mining and machine-learning, many alternative classification and prediction algorithms have been developed. This study aimed to compare the performance of LR to several data mining algorithms for predicting 30-day surgical morbidity in children. We used the 2012 National Surgical Quality Improvement Program-Pediatric dataset to compare the performance of (1) a LR model that assumed linearity and additivity (simple LR model) (2) a LR model incorporating restricted cubic splines and interactions (flexible LR model) (3) a support vector machine, (4) a random forest and (5) boosted classification trees for predicting surgical morbidity. The ensemble-based methods showed significantly higher accuracy, sensitivity, specificity, PPV, and NPV than the simple LR model. However, none of the models performed better than the flexible LR model in terms of the aforementioned measures or in model calibration or discrimination. Support vector machines, random forests, and boosted classification trees do not show better performance than LR for predicting pediatric surgical morbidity. After further validation, the flexible LR model derived in this study could be used to assist with clinical decision-making based on patient-specific surgical risks. Copyright © 2014 Elsevier Ltd. All rights reserved.

  6. Discriminant forest classification method and system

    DOEpatents

    Chen, Barry Y.; Hanley, William G.; Lemmond, Tracy D.; Hiller, Lawrence J.; Knapp, David A.; Mugge, Marshall J.

    2012-11-06

    A hybrid machine learning methodology and system for classification that combines classical random forest (RF) methodology with discriminant analysis (DA) techniques to provide enhanced classification capability. A DA technique which uses feature measurements of an object to predict its class membership, such as linear discriminant analysis (LDA) or Andersen-Bahadur linear discriminant technique (AB), is used to split the data at each node in each of its classification trees to train and grow the trees and the forest. When training is finished, a set of n DA-based decision trees of a discriminant forest is produced for use in predicting the classification of new samples of unknown class.

  7. Comparing Methodologies for Developing an Early Warning System: Classification and Regression Tree Model versus Logistic Regression. REL 2015-077

    ERIC Educational Resources Information Center

    Koon, Sharon; Petscher, Yaacov

    2015-01-01

    The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…

  8. Stratification of the severity of critically ill patients with classification trees

    PubMed Central

    2009-01-01

    Background Development of three classification trees (CT) based on the CART (Classification and Regression Trees), CHAID (Chi-Square Automatic Interaction Detection) and C4.5 methodologies for the calculation of probability of hospital mortality; the comparison of the results with the APACHE II, SAPS II and MPM II-24 scores, and with a model based on multiple logistic regression (LR). Methods Retrospective study of 2864 patients. Random partition (70:30) into a Development Set (DS) n = 1808 and Validation Set (VS) n = 808. Their properties of discrimination are compared with the ROC curve (AUC CI 95%), Percent of correct classification (PCC CI 95%); and the calibration with the Calibration Curve and the Standardized Mortality Ratio (SMR CI 95%). Results CTs are produced with a different selection of variables and decision rules: CART (5 variables and 8 decision rules), CHAID (7 variables and 15 rules) and C4.5 (6 variables and 10 rules). The common variables were: inotropic therapy, Glasgow, age, (A-a)O2 gradient and antecedent of chronic illness. In VS: all the models achieved acceptable discrimination with AUC above 0.7. CT: CART (0.75(0.71-0.81)), CHAID (0.76(0.72-0.79)) and C4.5 (0.76(0.73-0.80)). PCC: CART (72(69-75)), CHAID (72(69-75)) and C4.5 (76(73-79)). Calibration (SMR) better in the CT: CART (1.04(0.95-1.31)), CHAID (1.06(0.97-1.15) and C4.5 (1.08(0.98-1.16)). Conclusion With different methodologies of CTs, trees are generated with different selection of variables and decision rules. The CTs are easy to interpret, and they stratify the risk of hospital mortality. The CTs should be taken into account for the classification of the prognosis of critically ill patients. PMID:20003229

  9. Automated Decision Tree Classification of Corneal Shape

    PubMed Central

    Twa, Michael D.; Parthasarathy, Srinivasan; Roberts, Cynthia; Mahmoud, Ashraf M.; Raasch, Thomas W.; Bullimore, Mark A.

    2011-01-01

    Purpose The volume and complexity of data produced during videokeratography examinations present a challenge of interpretation. As a consequence, results are often analyzed qualitatively by subjective pattern recognition or reduced to comparisons of summary indices. We describe the application of decision tree induction, an automated machine learning classification method, to discriminate between normal and keratoconic corneal shapes in an objective and quantitative way. We then compared this method with other known classification methods. Methods The corneal surface was modeled with a seventh-order Zernike polynomial for 132 normal eyes of 92 subjects and 112 eyes of 71 subjects diagnosed with keratoconus. A decision tree classifier was induced using the C4.5 algorithm, and its classification performance was compared with the modified Rabinowitz–McDonnell index, Schwiegerling’s Z3 index (Z3), Keratoconus Prediction Index (KPI), KISA%, and Cone Location and Magnitude Index using recommended classification thresholds for each method. We also evaluated the area under the receiver operator characteristic (ROC) curve for each classification method. Results Our decision tree classifier performed equal to or better than the other classifiers tested: accuracy was 92% and the area under the ROC curve was 0.97. Our decision tree classifier reduced the information needed to distinguish between normal and keratoconus eyes using four of 36 Zernike polynomial coefficients. The four surface features selected as classification attributes by the decision tree method were inferior elevation, greater sagittal depth, oblique toricity, and trefoil. Conclusions Automated decision tree classification of corneal shape through Zernike polynomials is an accurate quantitative method of classification that is interpretable and can be generated from any instrument platform capable of raw elevation data output. This method of pattern classification is extendable to other classification problems. PMID:16357645

  10. The process and utility of classification and regression tree methodology in nursing research

    PubMed Central

    Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda

    2014-01-01

    Aim This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Background Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Design Discussion paper. Data sources English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984–2013. Discussion Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Implications for Nursing Research Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Conclusion Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. PMID:24237048

  11. The process and utility of classification and regression tree methodology in nursing research.

    PubMed

    Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda

    2014-06-01

    This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Discussion paper. English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984-2013. Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. © 2013 The Authors. Journal of Advanced Nursing Published by John Wiley & Sons Ltd.

  12. Identifying type 1 and type 2 diabetic cases using administrative data: a tree-structured model.

    PubMed

    Lo-Ciganic, Weihsuan; Zgibor, Janice C; Ruppert, Kristine; Arena, Vincent C; Stone, Roslyn A

    2011-05-01

    To date, few administrative diabetes mellitus (DM) registries have distinguished type 1 diabetes mellitus (T1DM) from type 2 diabetes mellitus (T2DM). Using a classification tree model, a prediction rule was developed to distinguish T1DM from T2DM in a large administrative database. The Medical Archival Retrieval System at the University of Pittsburgh Medical Center included administrative and clinical data from January 1, 2000, through September 30, 2009, for 209,647 DM patients aged ≥18 years. Probable cases (8,173 T1DM and 125,111 T2DM) were identified by applying clinical criteria to administrative data. Nonparametric classification tree models were fit using TIBCO Spotfire S+ 8.1 (TIBCO Software), with model size based on 10-fold cross validation. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of T1DM were estimated. The main predictors that distinguished T1DM from T2DM are age <40 years; International Classification of Disease, 9th revision, codes of T1DM or T2DM diagnosis; inpatient oral hypoglycemic agent use; inpatient insulin use; and episode(s) of diabetic ketoacidosis diagnosis. Compared with a complex clinical algorithm, the tree-structured model to predict T1DM had 92.8% sensitivity, 99.3% specificity, 89.5% PPV, and 99.5% NPV. The preliminary predictive rule appears to be promising. Being able to distinguish between DM subtypes in administrative databases will allow large-scale subtype-specific analyses of medical care costs, morbidity, and mortality. © 2011 Diabetes Technology Society.

  13. Classification of savanna tree species, in the Greater Kruger National Park region, by integrating hyperspectral and LiDAR data in a Random Forest data mining environment

    NASA Astrophysics Data System (ADS)

    Naidoo, L.; Cho, M. A.; Mathieu, R.; Asner, G.

    2012-04-01

    The accurate classification and mapping of individual trees at species level in the savanna ecosystem can provide numerous benefits for the managerial authorities. Such benefits include the mapping of economically useful tree species, which are a key source of food production and fuel wood for the local communities, and of problematic alien invasive and bush encroaching species, which can threaten the integrity of the environment and livelihoods of the local communities. Species level mapping is particularly challenging in African savannas which are complex, heterogeneous, and open environments with high intra-species spectral variability due to differences in geology, topography, rainfall, herbivory and human impacts within relatively short distances. Savanna vegetation are also highly irregular in canopy and crown shape, height and other structural dimensions with a combination of open grassland patches and dense woody thicket - a stark contrast to the more homogeneous forest vegetation. This study classified eight common savanna tree species in the Greater Kruger National Park region, South Africa, using a combination of hyperspectral and Light Detection and Ranging (LiDAR)-derived structural parameters, in the form of seven predictor datasets, in an automated Random Forest modelling approach. The most important predictors, which were found to play an important role in the different classification models and contributed to the success of the hybrid dataset model when combined, were species tree height; NDVI; the chlorophyll b wavelength (466 nm) and a selection of raw, continuum removed and Spectral Angle Mapper (SAM) bands. It was also concluded that the hybrid predictor dataset Random Forest model yielded the highest classification accuracy and prediction success for the eight savanna tree species with an overall classification accuracy of 87.68% and KHAT value of 0.843.

  14. A Hierarchical Object-oriented Urban Land Cover Classification Using WorldView-2 Imagery and Airborne LiDAR data

    NASA Astrophysics Data System (ADS)

    Wu, M. F.; Sun, Z. C.; Yang, B.; Yu, S. S.

    2016-11-01

    In order to reduce the “salt and pepper” in pixel-based urban land cover classification and expand the application of fusion of multi-source data in the field of urban remote sensing, WorldView-2 imagery and airborne Light Detection and Ranging (LiDAR) data were used to improve the classification of urban land cover. An approach of object- oriented hierarchical classification was proposed in our study. The processing of proposed method consisted of two hierarchies. (1) In the first hierarchy, LiDAR Normalized Digital Surface Model (nDSM) image was segmented to objects. The NDVI, Costal Blue and nDSM thresholds were set for extracting building objects. (2) In the second hierarchy, after removing building objects, WorldView-2 fused imagery was obtained by Haze-ratio-based (HR) fusion, and was segmented. A SVM classifier was applied to generate road/parking lot, vegetation and bare soil objects. (3) Trees and grasslands were split based on an nDSM threshold (2.4 meter). The results showed that compared with pixel-based and non-hierarchical object-oriented approach, proposed method provided a better performance of urban land cover classification, the overall accuracy (OA) and overall kappa (OK) improved up to 92.75% and 0.90. Furthermore, proposed method reduced “salt and pepper” in pixel-based classification, improved the extraction accuracy of buildings based on LiDAR nDSM image segmentation, and reduced the confusion between trees and grasslands through setting nDSM threshold.

  15. Estimating probabilities of infestation and extent of damage by the roundheaded pine beetle in ponderosa pine in the Sacramento Mountains, New Mexico

    Treesearch

    Jose Negron

    1997-01-01

    Classification trees and linear regression analysis were used to build models to predict probabilities of infestation and amount of tree mortality in terms of basal area resulting from roundheaded pine beetle, Dendroctonus adjunctus Blandford, activity in ponderosa pine, Pinus ponderosa Laws., in the Sacramento Mountains, New Mexico. Classification trees were built for...

  16. The relationship between tree growth patterns and likelihood of mortality: A study of two tree species in the Sierra Nevada

    USGS Publications Warehouse

    Das, A.J.; Battles, J.J.; Stephenson, N.L.; van Mantgem, P.J.

    2007-01-01

    We examined mortality of Abies concolor (Gord. & Glend.) Lindl. (white fir) and Pinus lambertiana Dougl. (sugar pine) by developing logistic models using three growth indices obtained from tree rings: average growth, growth trend, and count of abrupt growth declines. For P. lambertiana, models with average growth, growth trend, and count of abrupt declines improved overall prediction (78.6% dead trees correctly classified, 83.7% live trees correctly classified) compared with a model with average recent growth alone (69.6% dead trees correctly classified, 67.3% live trees correctly classified). For A. concolor, counts of abrupt declines and longer time intervals improved overall classification (trees with DBH ???20 cm: 78.9% dead trees correctly classified and 76.7% live trees correctly classified vs. 64.9% dead trees correctly classified and 77.9% live trees correctly classified; trees with DBH <20 cm: 71.6% dead trees correctly classified and 71.0% live trees correctly classified vs. 67.2% dead trees correctly classified and 66.7% live trees correctly classified). In general, count of abrupt declines improved live-tree classification. External validation of A. concolor models showed that they functioned well at stands not used in model development, and the development of size-specific models demonstrated important differences in mortality risk between understory and canopy trees. Population-level mortality-risk models were developed for A. concolor and generated realistic mortality rates at two sites. Our results support the contention that a more comprehensive use of the growth record yields a more robust assessment of mortality risk. ?? 2007 NRC.

  17. Modeling ready biodegradability of fragrance materials.

    PubMed

    Ceriani, Lidia; Papa, Ester; Kovarich, Simona; Boethling, Robert; Gramatica, Paola

    2015-06-01

    In the present study, quantitative structure activity relationships were developed for predicting ready biodegradability of approximately 200 heterogeneous fragrance materials. Two classification methods, classification and regression tree (CART) and k-nearest neighbors (kNN), were applied to perform the modeling. The models were validated with multiple external prediction sets, and the structural applicability domain was verified by the leverage approach. The best models had good sensitivity (internal ≥80%; external ≥68%), specificity (internal ≥80%; external 73%), and overall accuracy (≥75%). Results from the comparison with BIOWIN global models, based on group contribution method, show that specific models developed in the present study perform better in prediction than BIOWIN6, in particular for the correct classification of not readily biodegradable fragrance materials. © 2015 SETAC.

  18. Instruction-matrix-based genetic programming.

    PubMed

    Li, Gang; Wang, Jin Feng; Lee, Kin Hong; Leung, Kwong-Sak

    2008-08-01

    In genetic programming (GP), evolving tree nodes separately would reduce the huge solution space. However, tree nodes are highly interdependent with respect to their fitness. In this paper, we propose a new GP framework, namely, instruction-matrix (IM)-based GP (IMGP), to handle their interactions. IMGP maintains an IM to evolve tree nodes and subtrees separately. IMGP extracts program trees from an IM and updates the IM with the information of the extracted program trees. As the IM actually keeps most of the information of the schemata of GP and evolves the schemata directly, IMGP is effective and efficient. Our experimental results on benchmark problems have verified that IMGP is not only better than those of canonical GP in terms of the qualities of the solutions and the number of program evaluations, but they are also better than some of the related GP algorithms. IMGP can also be used to evolve programs for classification problems. The classifiers obtained have higher classification accuracies than four other GP classification algorithms on four benchmark classification problems. The testing errors are also comparable to or better than those obtained with well-known classifiers. Furthermore, an extended version, called condition matrix for rule learning, has been used successfully to handle multiclass classification problems.

  19. Generation of 2D Land Cover Maps for Urban Areas Using Decision Tree Classification

    NASA Astrophysics Data System (ADS)

    Höhle, J.

    2014-09-01

    A 2D land cover map can automatically and efficiently be generated from high-resolution multispectral aerial images. First, a digital surface model is produced and each cell of the elevation model is then supplemented with attributes. A decision tree classification is applied to extract map objects like buildings, roads, grassland, trees, hedges, and walls from such an "intelligent" point cloud. The decision tree is derived from training areas which borders are digitized on top of a false-colour orthoimage. The produced 2D land cover map with six classes is then subsequently refined by using image analysis techniques. The proposed methodology is described step by step. The classification, assessment, and refinement is carried out by the open source software "R"; the generation of the dense and accurate digital surface model by the "Match-T DSM" program of the Trimble Company. A practical example of a 2D land cover map generation is carried out. Images of a multispectral medium-format aerial camera covering an urban area in Switzerland are used. The assessment of the produced land cover map is based on class-wise stratified sampling where reference values of samples are determined by means of stereo-observations of false-colour stereopairs. The stratified statistical assessment of the produced land cover map with six classes and based on 91 points per class reveals a high thematic accuracy for classes "building" (99 %, 95 % CI: 95 %-100 %) and "road and parking lot" (90 %, 95 % CI: 83 %-95 %). Some other accuracy measures (overall accuracy, kappa value) and their 95 % confidence intervals are derived as well. The proposed methodology has a high potential for automation and fast processing and may be applied to other scenes and sensors.

  20. Land Cover Mapping using GEOBIA to Estimate Loss of Salacca zalacca Trees in Landslide Area of Clapar, Madukara District of Banjarnegara

    NASA Astrophysics Data System (ADS)

    Permata, Anggi; Juniansah, Anwar; Nurcahyati, Eka; Dimas Afrizal, Mousafi; Adnan Shafry Untoro, Muhammad; Arifatha, Na'ima; Ramadhani Yudha Adiwijaya, Raden; Farda, Nur Mohammad

    2016-11-01

    Landslide is an unpredictable natural disaster which commonly happens in highslope area. Aerial photography in small format is one of acquisition method that can reach and obtain high resolution spatial data faster than other methods, and provide data such as orthomosaic and Digital Surface Model (DSM). The study area contained landslide area in Clapar, Madukara District of Banjarnegara. Aerial photographs of landslide area provided advantage in objects visibility. Object's characters such as shape, size, and texture were clearly seen, therefore GEOBIA (Geography Object Based Image Analysis) was compatible as method for classifying land cover in study area. Dissimilar with PPA (PerPixel Analyst) method that used spectral information as base object detection, GEOBIA could use spatial elements as classification basis to establish a land cover map with better accuracy. GEOBIA method used classification hierarchy to divide post disaster land cover into three main objects: vegetation, landslide/soil, and building. Those three were required to obtain more detailed information that can be used in estimating loss caused by landslide and establishing land cover map in landslide area. Estimating loss in landslide area related to damage in Salak (Salacca zalacca) plantations. This estimation towards quantity of Salak tree that were drifted away by landslide was calculated in assumption that every tree damaged by landslide had same age and production class with other tree that weren't damaged. Loss calculation was done by approximating quantity of damaged trees in landslide area with data of trees around area that were acquired from GEOBIA classification method.

  1. Comparing the performance of flat and hierarchical Habitat/Land-Cover classification models in a NATURA 2000 site

    NASA Astrophysics Data System (ADS)

    Gavish, Yoni; O'Connell, Jerome; Marsh, Charles J.; Tarantino, Cristina; Blonda, Palma; Tomaselli, Valeria; Kunin, William E.

    2018-02-01

    The increasing need for high quality Habitat/Land-Cover (H/LC) maps has triggered considerable research into novel machine-learning based classification models. In many cases, H/LC classes follow pre-defined hierarchical classification schemes (e.g., CORINE), in which fine H/LC categories are thematically nested within more general categories. However, none of the existing machine-learning algorithms account for this pre-defined hierarchical structure. Here we introduce a novel Random Forest (RF) based application of hierarchical classification, which fits a separate local classification model in every branching point of the thematic tree, and then integrates all the different local models to a single global prediction. We applied the hierarchal RF approach in a NATURA 2000 site in Italy, using two land-cover (CORINE, FAO-LCCS) and one habitat classification scheme (EUNIS) that differ from one another in the shape of the class hierarchy. For all 3 classification schemes, both the hierarchical model and a flat model alternative provided accurate predictions, with kappa values mostly above 0.9 (despite using only 2.2-3.2% of the study area as training cells). The flat approach slightly outperformed the hierarchical models when the hierarchy was relatively simple, while the hierarchical model worked better under more complex thematic hierarchies. Most misclassifications came from habitat pairs that are thematically distant yet spectrally similar. In 2 out of 3 classification schemes, the additional constraints of the hierarchical model resulted with fewer such serious misclassifications relative to the flat model. The hierarchical model also provided valuable information on variable importance which can shed light into "black-box" based machine learning algorithms like RF. We suggest various ways by which hierarchical classification models can increase the accuracy and interpretability of H/LC classification maps.

  2. New Tree-Classification System Used by the Southern Forest Inventory and Analysis Unit

    Treesearch

    Dennis M. May; John S. Vissage; D. Vince Few

    1990-01-01

    Trees at USDA Forest Service, Southern Forest Inventory and Analysis, sample locations are classified as growing stock or cull based on their ability to produce sawlogs. The old and new classification systems are compared, and the impacts of the new system on the reporting of tree volumes are illustrated with inventory data from north Alabama.

  3. Improving ensemble decision tree performance using Adaboost and Bagging

    NASA Astrophysics Data System (ADS)

    Hasan, Md. Rajib; Siraj, Fadzilah; Sainin, Mohd Shamrie

    2015-12-01

    Ensemble classifier systems are considered as one of the most promising in medical data classification and the performance of deceision tree classifier can be increased by the ensemble method as it is proven to be better than single classifiers. However, in a ensemble settings the performance depends on the selection of suitable base classifier. This research employed two prominent esemble s namely Adaboost and Bagging with base classifiers such as Random Forest, Random Tree, j48, j48grafts and Logistic Model Regression (LMT) that have been selected independently. The empirical study shows that the performance varries when different base classifiers are selected and even some places overfitting issue also been noted. The evidence shows that ensemble decision tree classfiers using Adaboost and Bagging improves the performance of selected medical data sets.

  4. A framework for mapping tree species combining hyperspectral and LiDAR data: Role of selected classifiers and sensor across three spatial scales

    NASA Astrophysics Data System (ADS)

    Ghosh, Aniruddha; Fassnacht, Fabian Ewald; Joshi, P. K.; Koch, Barbara

    2014-02-01

    Knowledge of tree species distribution is important worldwide for sustainable forest management and resource evaluation. The accuracy and information content of species maps produced using remote sensing images vary with scale, sensor (optical, microwave, LiDAR), classification algorithm, verification design and natural conditions like tree age, forest structure and density. Imaging spectroscopy reduces the inaccuracies making use of the detailed spectral response. However, the scale effect still has a strong influence and cannot be neglected. This study aims to bridge the knowledge gap in understanding the scale effect in imaging spectroscopy when moving from 4 to 30 m pixel size for tree species mapping, keeping in mind that most current and future hyperspectral satellite based sensors work with spatial resolution around 30 m or more. Two airborne (HyMAP) and one spaceborne (Hyperion) imaging spectroscopy dataset with pixel sizes of 4, 8 and 30 m, respectively were available to examine the effect of scale over a central European forest. The forest under examination is a typical managed forest with relatively homogenous stands featuring mostly two canopy layers. Normalized digital surface model (nDSM) derived from LiDAR data was used additionally to examine the effect of height information in tree species mapping. Six different sets of predictor variables (reflectance value of all bands, selected components of a Minimum Noise Fraction (MNF), Vegetation Indices (VI) and each of these sets combined with LiDAR derived height) were explored at each scale. Supervised kernel based (Support Vector Machines) and ensemble based (Random Forest) machine learning algorithms were applied on the dataset to investigate the effect of the classifier. Iterative bootstrap-validation with 100 iterations was performed for classification model building and testing for all the trials. For scale, analysis of overall classification accuracy and kappa values indicated that 8 m spatial resolution (reaching kappa values of over 0.83) slightly outperformed the results obtained from 4 m for the study area and five tree species under examination. The 30 m resolution Hyperion image produced sound results (kappa values of over 0.70), which in some areas of the test site were comparable with the higher spatial resolution imagery when qualitatively assessing the map outputs. Considering input predictor sets, MNF bands performed best at 4 and 8 m resolution. Optical bands were found to be best for 30 m spatial resolution. Classification with MNF as input predictors produced better visual appearance of tree species patches when compared with reference maps. Based on the analysis, it was concluded that there is no significant effect of height information on tree species classification accuracies for the present framework and study area. Furthermore, in the examined cases there was no single best choice among the two classifiers across scales and predictors. It can be concluded that tree species mapping from imaging spectroscopy for forest sites comparable to the one under investigation is possible with reliable accuracies not only from airborne but also from spaceborne imaging spectroscopy datasets.

  5. Newer classification and regression tree techniques: Bagging and Random Forests for ecological prediction

    Treesearch

    Anantha M. Prasad; Louis R. Iverson; Andy Liaw; Andy Liaw

    2006-01-01

    We evaluated four statistical models - Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS) - for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.

  6. Prevalence and Determinants of Preterm Birth in Tehran, Iran: A Comparison between Logistic Regression and Decision Tree Methods.

    PubMed

    Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi

    2017-06-01

    Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB ( p < 0.05). Identifying and training mothers at risk as well as improving prenatal care may reduce the PTB rate. We also recommend that statisticians utilize the logistic regression model for the classification of risk groups for PTB.

  7. Improved wetland remote sensing in Yellowstone National Park using classification trees to combine TM imagery and ancillary environmental data

    USGS Publications Warehouse

    Wright, C.; Gallant, Alisa L.

    2007-01-01

    The U.S. Fish and Wildlife Service uses the term palustrine wetland to describe vegetated wetlands traditionally identified as marsh, bog, fen, swamp, or wet meadow. Landsat TM imagery was combined with image texture and ancillary environmental data to model probabilities of palustrine wetland occurrence in Yellowstone National Park using classification trees. Model training and test locations were identified from National Wetlands Inventory maps, and classification trees were built for seven years spanning a range of annual precipitation. At a coarse level, palustrine wetland was separated from upland. At a finer level, five palustrine wetland types were discriminated: aquatic bed (PAB), emergent (PEM), forested (PFO), scrub–shrub (PSS), and unconsolidated shore (PUS). TM-derived variables alone were relatively accurate at separating wetland from upland, but model error rates dropped incrementally as image texture, DEM-derived terrain variables, and other ancillary GIS layers were added. For classification trees making use of all available predictors, average overall test error rates were 7.8% for palustrine wetland/upland models and 17.0% for palustrine wetland type models, with consistent accuracies across years. However, models were prone to wetland over-prediction. While the predominant PEM class was classified with omission and commission error rates less than 14%, we had difficulty identifying the PAB and PSS classes. Ancillary vegetation information greatly improved PSS classification and moderately improved PFO discrimination. Association with geothermal areas distinguished PUS wetlands. Wetland over-prediction was exacerbated by class imbalance in likely combination with spatial and spectral limitations of the TM sensor. Wetland probability surfaces may be more informative than hard classification, and appear to respond to climate-driven wetland variability. The developed method is portable, relatively easy to implement, and should be applicable in other settings and over larger extents.

  8. Improved Fuzzy K-Nearest Neighbor Using Modified Particle Swarm Optimization

    NASA Astrophysics Data System (ADS)

    Jamaluddin; Siringoringo, Rimbun

    2017-12-01

    Fuzzy k-Nearest Neighbor (FkNN) is one of the most powerful classification methods. The presence of fuzzy concepts in this method successfully improves its performance on almost all classification issues. The main drawbackof FKNN is that it is difficult to determine the parameters. These parameters are the number of neighbors (k) and fuzzy strength (m). Both parameters are very sensitive. This makes it difficult to determine the values of ‘m’ and ‘k’, thus making FKNN difficult to control because no theories or guides can deduce how proper ‘m’ and ‘k’ should be. This study uses Modified Particle Swarm Optimization (MPSO) to determine the best value of ‘k’ and ‘m’. MPSO is focused on the Constriction Factor Method. Constriction Factor Method is an improvement of PSO in order to avoid local circumstances optima. The model proposed in this study was tested on the German Credit Dataset. The test of the data/The data test has been standardized by UCI Machine Learning Repository which is widely applied to classification problems. The application of MPSO to the determination of FKNN parameters is expected to increase the value of classification performance. Based on the experiments that have been done indicating that the model offered in this research results in a better classification performance compared to the Fk-NN model only. The model offered in this study has an accuracy rate of 81%, while. With using Fk-NN model, it has the accuracy of 70%. At the end is done comparison of research model superiority with 2 other classification models;such as Naive Bayes and Decision Tree. This research model has a better performance level, where Naive Bayes has accuracy 75%, and the decision tree model has 70%

  9. A hybrid method for classifying cognitive states from fMRI data.

    PubMed

    Parida, S; Dehuri, S; Cho, S-B; Cacha, L A; Poznanski, R R

    2015-09-01

    Functional magnetic resonance imaging (fMRI) makes it possible to detect brain activities in order to elucidate cognitive-states. The complex nature of fMRI data requires under-standing of the analyses applied to produce possible avenues for developing models of cognitive state classification and improving brain activity prediction. While many models of classification task of fMRI data analysis have been developed, in this paper, we present a novel hybrid technique through combining the best attributes of genetic algorithms (GAs) and ensemble decision tree technique that consistently outperforms all other methods which are being used for cognitive-state classification. Specifically, this paper illustrates the combined effort of decision-trees ensemble and GAs for feature selection through an extensive simulation study and discusses the classification performance with respect to fMRI data. We have shown that our proposed method exhibits significant reduction of the number of features with clear edge classification accuracy over ensemble of decision-trees.

  10. [RS estimation of inventory parameters and carbon storage of moso bamboo forest based on synergistic use of object-based image analysis and decision tree].

    PubMed

    Du, Hua Qiang; Sun, Xiao Yan; Han, Ning; Mao, Fang Jie

    2017-10-01

    By synergistically using the object-based image analysis (OBIA) and the classification and regression tree (CART) methods, the distribution information, the indexes (including diameter at breast, tree height, and crown closure), and the aboveground carbon storage (AGC) of moso bamboo forest in Shanchuan Town, Anji County, Zhejiang Province were investigated. The results showed that the moso bamboo forest could be accurately delineated by integrating the multi-scale ima ge segmentation in OBIA technique and CART, which connected the image objects at various scales, with a pretty good producer's accuracy of 89.1%. The investigation of indexes estimated by regression tree model that was constructed based on the features extracted from the image objects reached normal or better accuracy, in which the crown closure model archived the best estimating accuracy of 67.9%. The estimating accuracy of diameter at breast and tree height was relatively low, which was consistent with conclusion that estimating diameter at breast and tree height using optical remote sensing could not achieve satisfactory results. Estimation of AGC reached relatively high accuracy, and accuracy of the region of high value achieved above 80%.

  11. Remote sensing-based predictors improve distribution models of rare, early successional and boradleaf tree species in Utah

    Treesearch

    N. E. Zimmermann; T. C. Edwards; G. G. Moisen; T. S. Frescino; J. A. Blackard

    2007-01-01

    Compared to bioclimatic variables, remote sensing predictors are rarely used for predictive species modelling. When used, the predictors represent typically habitat classifications or filters rather than gradual spectral, surface or biophysical properties. Consequently, the full potential of remotely sensed predictors for modelling the spatial distribution of species...

  12. Boosted classification trees result in minor to modest improvement in the accuracy in classifying cardiovascular outcomes compared to conventional classification trees

    PubMed Central

    Austin, Peter C; Lee, Douglas S

    2011-01-01

    Purpose: Classification trees are increasingly being used to classifying patients according to the presence or absence of a disease or health outcome. A limitation of classification trees is their limited predictive accuracy. In the data-mining and machine learning literature, boosting has been developed to improve classification. Boosting with classification trees iteratively grows classification trees in a sequence of reweighted datasets. In a given iteration, subjects that were misclassified in the previous iteration are weighted more highly than subjects that were correctly classified. Classifications from each of the classification trees in the sequence are combined through a weighted majority vote to produce a final classification. The authors' objective was to examine whether boosting improved the accuracy of classification trees for predicting outcomes in cardiovascular patients. Methods: We examined the utility of boosting classification trees for classifying 30-day mortality outcomes in patients hospitalized with either acute myocardial infarction or congestive heart failure. Results: Improvements in the misclassification rate using boosted classification trees were at best minor compared to when conventional classification trees were used. Minor to modest improvements to sensitivity were observed, with only a negligible reduction in specificity. For predicting cardiovascular mortality, boosted classification trees had high specificity, but low sensitivity. Conclusions: Gains in predictive accuracy for predicting cardiovascular outcomes were less impressive than gains in performance observed in the data mining literature. PMID:22254181

  13. Land Covers Classification Based on Random Forest Method Using Features from Full-Waveform LIDAR Data

    NASA Astrophysics Data System (ADS)

    Ma, L.; Zhou, M.; Li, C.

    2017-09-01

    In this study, a Random Forest (RF) based land covers classification method is presented to predict the types of land covers in Miyun area. The returned full-waveforms which were acquired by a LiteMapper 5600 airborne LiDAR system were processed, including waveform filtering, waveform decomposition and features extraction. The commonly used features that were distance, intensity, Full Width at Half Maximum (FWHM), skewness and kurtosis were extracted. These waveform features were used as attributes of training data for generating the RF prediction model. The RF prediction model was applied to predict the types of land covers in Miyun area as trees, buildings, farmland and ground. The classification results of these four types of land covers were obtained according to the ground truth information acquired from CCD image data of the same region. The RF classification results were compared with that of SVM method and show better results. The RF classification accuracy reached 89.73% and the classification Kappa was 0.8631.

  14. Computer aided diagnosis system for the Alzheimer's disease based on partial least squares and random forest SPECT image classification.

    PubMed

    Ramírez, J; Górriz, J M; Segovia, F; Chaves, R; Salas-Gonzalez, D; López, M; Alvarez, I; Padilla, P

    2010-03-19

    This letter shows a computer aided diagnosis (CAD) technique for the early detection of the Alzheimer's disease (AD) by means of single photon emission computed tomography (SPECT) image classification. The proposed method is based on partial least squares (PLS) regression model and a random forest (RF) predictor. The challenge of the curse of dimensionality is addressed by reducing the large dimensionality of the input data by downscaling the SPECT images and extracting score features using PLS. A RF predictor then forms an ensemble of classification and regression tree (CART)-like classifiers being its output determined by a majority vote of the trees in the forest. A baseline principal component analysis (PCA) system is also developed for reference. The experimental results show that the combined PLS-RF system yields a generalization error that converges to a limit when increasing the number of trees in the forest. Thus, the generalization error is reduced when using PLS and depends on the strength of the individual trees in the forest and the correlation between them. Moreover, PLS feature extraction is found to be more effective for extracting discriminative information from the data than PCA yielding peak sensitivity, specificity and accuracy values of 100%, 92.7%, and 96.9%, respectively. Moreover, the proposed CAD system outperformed several other recently developed AD CAD systems. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.

  15. An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees.

    PubMed

    Liang, Ying; Liao, Bo; Zhu, Wen

    2017-01-01

    Tumourigenesis is a mutation accumulation process, which is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution through genetic variation data. Copy number variation (CNV) is the major genetic marker of the genome with more genes, disease loci, and functional elements involved. Fluorescence in situ hybridization (FISH) accurately measures multiple gene copy number of hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic tree based on FISH platform. The topology analysis of tumor progression tree shows that the pathway of tumor subcell expansion varies greatly during different stages of tumor formation. And the classification experiment shows that tree-based features are better than data-based features in distinguishing tumor. The constructed phylogenetic trees have great performance in characterizing tumor development process, which outperforms other similar algorithms.

  16. Influence of Texture and Colour in Breast TMA Classification

    PubMed Central

    Fernández-Carrobles, M. Milagro; Bueno, Gloria; Déniz, Oscar; Salido, Jesús; García-Rojo, Marcial; González-López, Lucía

    2015-01-01

    Breast cancer diagnosis is still done by observation of biopsies under the microscope. The development of automated methods for breast TMA classification would reduce diagnostic time. This paper is a step towards the solution for this problem and shows a complete study of breast TMA classification based on colour models and texture descriptors. The TMA images were divided into four classes: i) benign stromal tissue with cellularity, ii) adipose tissue, iii) benign and benign anomalous structures, and iv) ductal and lobular carcinomas. A relevant set of features was obtained on eight different colour models from first and second order Haralick statistical descriptors obtained from the intensity image, Fourier, Wavelets, Multiresolution Gabor, M-LBP and textons descriptors. Furthermore, four types of classification experiments were performed using six different classifiers: (1) classification per colour model individually, (2) classification by combination of colour models, (3) classification by combination of colour models and descriptors, and (4) classification by combination of colour models and descriptors with a previous feature set reduction. The best result shows an average of 99.05% accuracy and 98.34% positive predictive value. These results have been obtained by means of a bagging tree classifier with combination of six colour models and the use of 1719 non-correlated (correlation threshold of 97%) textural features based on Statistical, M-LBP, Gabor and Spatial textons descriptors. PMID:26513238

  17. Unrealistic phylogenetic trees may improve phylogenetic footprinting.

    PubMed

    Nettling, Martin; Treutler, Hendrik; Cerquides, Jesus; Grosse, Ivo

    2017-06-01

    The computational investigation of DNA binding motifs from binding sites is one of the classic tasks in bioinformatics and a prerequisite for understanding gene regulation as a whole. Due to the development of sequencing technologies and the increasing number of available genomes, approaches based on phylogenetic footprinting become increasingly attractive. Phylogenetic footprinting requires phylogenetic trees with attached substitution probabilities for quantifying the evolution of binding sites, but these trees and substitution probabilities are typically not known and cannot be estimated easily. Here, we investigate the influence of phylogenetic trees with different substitution probabilities on the classification performance of phylogenetic footprinting using synthetic and real data. For synthetic data we find that the classification performance is highest when the substitution probability used for phylogenetic footprinting is similar to that used for data generation. For real data, however, we typically find that the classification performance of phylogenetic footprinting surprisingly increases with increasing substitution probabilities and is often highest for unrealistically high substitution probabilities close to one. This finding suggests that choosing realistic model assumptions might not always yield optimal predictions in general and that choosing unrealistically high substitution probabilities close to one might actually improve the classification performance of phylogenetic footprinting. The proposed PF is implemented in JAVA and can be downloaded from https://github.com/mgledi/PhyFoo. : martin.nettling@informatik.uni-halle.de. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.

  18. The Early Detection of the Emerald Ash Borer (EAB) Using Advanced Geospacial Technologies

    NASA Astrophysics Data System (ADS)

    Hu, B.; Li, J.; Wang, J.; Hall, B.

    2014-11-01

    The objectives of this study were to exploit Light Detection And Ranging (LiDAR) and very high spatial resolution (VHR) data and their synergy with hyperspectral imagery in the early detection of the EAB presence in trees within urban areas and to develop a framework to combine information extracted from multiple data sources. To achieve these, an object-oriented framework was developed to combine information derived from available data sets to characterize ash trees. Within this framework, individual trees were first extracted and then classified into different species based on their spectral information derived from hyperspectral imagery, spatial information from VHR imagery, and for each ash tree its health state and EAB infestation stage were determined based on hyperspectral imagery. The developed framework and methods were demonstrated to be effective according to the results obtained on two study sites in the city of Toronto, Ontario Canada. The individual tree delineation method provided satisfactory results with an overall accuracy of 78 % and 19 % commission and 23 % omission errors when used on the combined very high-spatial resolution imagery and LiDAR data. In terms of the identification of ash trees, given sufficient representative training data, our classification model was able to predict tree species with above 75 % overall accuracy, and mis-classification occurred mainly between ash and maple trees. The hypothesis that a strong correlation exists between general tree stress and EAB infestation was confirmed. Vegetation indices sensitive to leaf chlorophyll content derived from hyperspectral imagery can be used to predict the EAB infestation levels for each ash tree.

  19. Spatial prediction of landslides using a hybrid machine learning approach based on Random Subspace and Classification and Regression Trees

    NASA Astrophysics Data System (ADS)

    Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu

    2018-02-01

    A hybrid machine learning approach of Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. This model is a combination of the RSS method which is known as an efficient ensemble technique and the CART which is a state of the art classifier. The Luc Yen district of Yen Bai province, a prominent landslide prone area of Viet Nam, was selected for the model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In the development of model, ten important landslide affecting factors related with geomorphology, geology and geo-environment were considered namely slope angles, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. Performance of the RSSCART model (AUC = 0.841) is the best compared with other popular landslide models namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that performance of the RSSCART is a promising method for spatial landslide prediction.

  20. A High Performance Computing Approach to Tree Cover Delineation in 1-m NAIP Imagery Using a Probabilistic Learning Framework

    NASA Technical Reports Server (NTRS)

    Basu, Saikat; Ganguly, Sangram; Michaelis, Andrew; Votava, Petr; Roy, Anshuman; Mukhopadhyay, Supratik; Nemani, Ramakrishna

    2015-01-01

    Tree cover delineation is a useful instrument in deriving Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) airborne imagery data. Numerous algorithms have been designed to address this problem, but most of them do not scale to these datasets, which are of the order of terabytes. In this paper, we present a semi-automated probabilistic framework for the segmentation and classification of 1-m National Agriculture Imagery Program (NAIP) for tree-cover delineation for the whole of Continental United States, using a High Performance Computing Architecture. Classification is performed using a multi-layer Feedforward Backpropagation Neural Network and segmentation is performed using a Statistical Region Merging algorithm. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field, which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by relabeling misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the whole state of California, spanning a total of 11,095 NAIP tiles covering a total geographical area of 163,696 sq. miles. The framework produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates lower than 2% for both landscapes. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR canopy height model (CHM) showed the effectiveness of our framework for generating accurate high-resolution tree-cover maps.

  1. A High Performance Computing Approach to Tree Cover Delineation in 1-m NAIP Imagery using a Probabilistic Learning Framework

    NASA Astrophysics Data System (ADS)

    Basu, S.; Ganguly, S.; Michaelis, A.; Votava, P.; Roy, A.; Mukhopadhyay, S.; Nemani, R. R.

    2015-12-01

    Tree cover delineation is a useful instrument in deriving Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) airborne imagery data. Numerous algorithms have been designed to address this problem, but most of them do not scale to these datasets which are of the order of terabytes. In this paper, we present a semi-automated probabilistic framework for the segmentation and classification of 1-m National Agriculture Imagery Program (NAIP) for tree-cover delineation for the whole of Continental United States, using a High Performance Computing Architecture. Classification is performed using a multi-layer Feedforward Backpropagation Neural Network and segmentation is performed using a Statistical Region Merging algorithm. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field, which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by relabeling misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the whole state of California, spanning a total of 11,095 NAIP tiles covering a total geographical area of 163,696 sq. miles. The framework produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates lower than 2% for both landscapes. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR canopy height model (CHM) showed the effectiveness of our framework for generating accurate high-resolution tree-cover maps.

  2. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran.

    PubMed

    Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali

    2016-01-01

    Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristics curve (ROC). From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103 and for CART and RF the AUC were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results while predicting locations of springs followed by CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy.

  3. Exploiting machine learning algorithms for tree species classification in a semiarid woodland using RapidEye image

    NASA Astrophysics Data System (ADS)

    Adelabu, Samuel; Mutanga, Onisimo; Adam, Elhadi; Cho, Moses Azong

    2013-01-01

    Classification of different tree species in semiarid areas can be challenging as a result of the change in leaf structure and orientation due to soil moisture constraints. Tree species mapping is, however, a key parameter for forest management in semiarid environments. In this study, we examined the suitability of 5-band RapidEye satellite data for the classification of five tree species in mopane woodland of Botswana using machine leaning algorithms with limited training samples.We performed classification using random forest (RF) and support vector machines (SVM) based on EnMap box. The overall accuracies for classifying the five tree species was 88.75 and 85% for both SVM and RF, respectively. We also demonstrated that the new red-edge band in the RapidEye sensor has the potential for classifying tree species in semiarid environments when integrated with other standard bands. Similarly, we observed that where there are limited training samples, SVM is preferred over RF. Finally, we demonstrated that the two accuracy measures of quantity and allocation disagreement are simpler and more helpful for the vast majority of remote sensing classification process than the kappa coefficient. Overall, high species classification can be achieved using strategically located RapidEye bands integrated with advanced processing algorithms.

  4. Investigating the limitations of tree species classification using the Combined Cluster and Discriminant Analysis method for low density ALS data from a dense forest region in Aggtelek (Hungary)

    NASA Astrophysics Data System (ADS)

    Koma, Zsófia; Deák, Márton; Kovács, József; Székely, Balázs; Kelemen, Kristóf; Standovár, Tibor

    2016-04-01

    Airborne Laser Scanning (ALS) is a widely used technology for forestry classification applications. However, single tree detection and species classification from low density ALS point cloud is limited in a dense forest region. In this study we investigate the division of a forest into homogenous groups at stand level. The study area is located in the Aggtelek karst region (Northeast Hungary) with a complex relief topography. The ALS dataset contained only 4 discrete echoes (at 2-4 pt/m2 density) from the study area during leaf-on season. Ground-truth measurements about canopy closure and proportion of tree species cover are available for every 70 meter in 500 square meter circular plots. In the first step, ALS data were processed and geometrical and intensity based features were calculated into a 5×5 meter raster based grid. The derived features contained: basic statistics of relative height, canopy RMS, echo ratio, openness, pulse penetration ratio, basic statistics of radiometric feature. In the second step the data were investigated using Combined Cluster and Discriminant Analysis (CCDA, Kovács et al., 2014). The CCDA method first determines a basic grouping for the multiple circle shaped sampling locations using hierarchical clustering and then for the arising grouping possibilities a core cycle is executed comparing the goodness of the investigated groupings with random ones. Out of these comparisons difference values arise, yielding information about the optimal grouping out of the investigated ones. If sub-groups are then further investigated, one might even find homogeneous groups. We found that low density ALS data classification into homogeneous groups are highly dependent on canopy closure, and the proportion of the dominant tree species. The presented results show high potential using CCDA for determination of homogenous separable groups in LiDAR based tree species classification. Aggtelek Karst/Slovakian Karst Caves" (HUSK/1101/221/0180, Aggtelek NP), data evaluation: 'Multipurpose assessment serving forest biodiversity conservation in the Carpathian region of Hungary', Swiss-Hungarian Cooperation Programme (SH/4/13 Project). BS contributed as an Alexander von Humboldt Research Fellow. J. Kovács, S. Kovács, N. Magyar, P. Tanos, I. G. Hatvani, and A. Anda (2014), Classification into homogeneous groups using combined cluster and discriminant analysis, Environmental Modelling & Software, 57, 52-59.

  5. Using cluster analysis and a classification and regression tree model to developed cover types in the Sky Islands of southeastern Arizona

    Treesearch

    Jose M. Iniguez; Joseph L. Ganey; Peter J. Daughtery; John D. Bailey

    2005-01-01

    The objective of this study was to develop a rule based cover type classification system for the forest and woodland vegetation in the Sky Islands of southeastern Arizona. In order to develop such a system we qualitatively and quantitatively compared a hierarchical (Ward’s) and a non-hierarchical (k-means) clustering method. Ecologically, unique groups represented by...

  6. Using cluster analysis and a classification and regression tree model to developed cover types in the Sky Islands of southeastern Arizona [Abstract

    Treesearch

    Jose M. Iniguez; Joseph L. Ganey; Peter J. Daugherty; John D. Bailey

    2005-01-01

    The objective of this study was to develop a rule based cover type classification system for the forest and woodland vegetation in the Sky Islands of southeastern Arizona. In order to develop such system we qualitatively and quantitatively compared a hierarchical (Ward’s) and a non-hierarchical (k-means) clustering method. Ecologically, unique groups and plots...

  7. Classification techniques on computerized systems to predict and/or to detect Apnea: A systematic review.

    PubMed

    Pombo, Nuno; Garcia, Nuno; Bousson, Kouamana

    2017-03-01

    Sleep apnea syndrome (SAS), which can significantly decrease the quality of life is associated with a major risk factor of health implications such as increased cardiovascular disease, sudden death, depression, irritability, hypertension, and learning difficulties. Thus, it is relevant and timely to present a systematic review describing significant applications in the framework of computational intelligence-based SAS, including its performance, beneficial and challenging effects, and modeling for the decision-making on multiple scenarios. This study aims to systematically review the literature on systems for the detection and/or prediction of apnea events using a classification model. Forty-five included studies revealed a combination of classification techniques for the diagnosis of apnea, such as threshold-based (14.75%) and machine learning (ML) models (85.25%). In addition, the ML models, were clustered in a mind map, include neural networks (44.26%), regression (4.91%), instance-based (11.47%), Bayesian algorithms (1.63%), reinforcement learning (4.91%), dimensionality reduction (8.19%), ensemble learning (6.55%), and decision trees (3.27%). A classification model should provide an auto-adaptive and no external-human action dependency. In addition, the accuracy of the classification models is related with the effective features selection. New high-quality studies based on randomized controlled trials and validation of models using a large and multiple sample of data are recommended. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.

  8. Decision-Tree, Rule-Based, and Random Forest Classification of High-Resolution Multispectral Imagery for Wetland Mapping and Inventory

    EPA Science Inventory

    Efforts are increasingly being made to classify the world’s wetland resources, an important ecosystem and habitat that is diminishing in abundance. There are multiple remote sensing classification methods, including a suite of nonparametric classifiers such as decision-tree...

  9. Object-based class modelling for multi-scale riparian forest habitat mapping

    NASA Astrophysics Data System (ADS)

    Strasser, Thomas; Lang, Stefan

    2015-05-01

    Object-based class modelling allows for mapping complex, hierarchical habitat systems. The riparian zone, including forests, represents such a complex ecosystem. Forests within riparian zones are biologically high productive and characterized by a rich biodiversity; thus considered of high community interest with an imperative to be protected and regularly monitored. Satellite earth observation (EO) provides tools for capturing the current state of forest habitats such as forest composition including intermixture of non-native tree species. Here we present a semi-automated object based image analysis (OBIA) approach for the mapping of riparian forests by applying class modelling of habitats based on the European Nature Information System (EUNIS) habitat classifications and the European Habitats Directive (HabDir) Annex 1. A very high resolution (VHR) WorldView-2 satellite image provided the required spatial and spectral details for a multi-scale image segmentation and rule-base composition to generate a six-level hierarchical representation of riparian forest habitats. Thereby habitats were hierarchically represented within an image object hierarchy as forest stands, stands of homogenous tree species and single trees represented by sunlit tree crowns. 522 EUNIS level 3 (EUNIS-3) habitat patches with a mean patch size (MPS) of 12,349.64 m2 were modelled from 938 forest stand patches (MPS = 6868.20 m2) and 43,742 tree stand patches (MPS = 140.79 m2). The delineation quality of the modelled EUNIS-3 habitats (focal level) was quantitatively assessed to an expert-based visual interpretation showing a mean deviation of 11.71%.

  10. Classification of Amazonian rosewood essential oil by Raman spectroscopy and PLS-DA with reliability estimation.

    PubMed

    Almeida, Mariana R; Fidelis, Carlos H V; Barata, Lauro E S; Poppi, Ronei J

    2013-12-15

    The Amazon tree Aniba rosaeodora Ducke (rosewood) provides an essential oil valuable for the perfume industry, but after decades of predatory extraction it is at risk of extinction. The extraction of the essential oil from wood implies the cutting of the tree, and then the study of oil extracted from the leaves is important as a sustainable alternative. The goal of this study was to test the applicability of Raman spectroscopy and Partial Least Square Discriminant Analysis (PLS-DA) as means to classify the essential oil extracted from different parties (wood, leaves and branches) of the Brazilian tree A. rosaeodora. For the development of classification models, the Raman spectra were split into two sets: training and test. The value of the limit that separates the classes was calculated based on the distribution of samples of training. This value was calculated in a manner that the classes are divided with a lower probability of incorrect classification for future estimates. The best model presented sensitivity and specificity of 100%, predictive accuracy and efficiency of 100%. These results give an overall vision of the behavior of the model, but do not give information about individual samples; in this case, the confidence interval for each sample of classification was also calculated using the resampling bootstrap technique. The methodology developed have the potential to be an alternative for standard procedures used for oil analysis and it can be employed as screening method, since it is fast, non-destructive and robust. © 2013 Elsevier B.V. All rights reserved.

  11. Observations and Modelling of Alternative Tree Cover States of the Boreal Ecosystem

    NASA Astrophysics Data System (ADS)

    Abis, B.; Brovkin, V.

    2017-12-01

    Recently, multimodality of the tree cover distribution of the boreal forests has been detected, revealing the existence of three alternative vegetation modes. Identifying which are the regions with a potential for alternative tree cover states, and assessing which are the main factors underlying their existence, is important to project future change of natural vegetation cover and its effect on climate.Through the use of generalised additive models and phase-space analysis, we study the link between tree cover distribution and eight globally-observed environmental factors, such as rainfall, temperature, and permafrost distribution. Using a classification based on these factors, we show the location of areas with potentially alternative tree cover states under the same environmental conditions in the boreal region. Furthermore, to explain the multimodality found in the data and the asymmetry between North America and Eurasia, we study a conceptual model based on tree species competition, and use it to simulate the sensitivity of tree cover to changes in environmental factors.We find that the link between individual environmental variables and tree cover differs regionally. Nonetheless, environmental conditions uniquely determine the vegetation state among the three dominant modes in ˜95% of the cases. On the other hand, areas with potentially alternative tree cover states encompass ˜1.1 million km2, and correspond to possible transition zones with a reduced resilience to disturbances. Employing our conceptual model, we show that multimodality can be explained through competition between tree species with different adaptations to environmental factors and disturbances. Moreover, the model is able to reproduce the asymmetry in tree species distribution between Eurasia and North America. Finally, we find that changes in permafrost could be associated with bifurcation points of the model, corroborating the importance of permafrost in a changing climate.

  12. Multiclass Classification by Adaptive Network of Dendritic Neurons with Binary Synapses Using Structural Plasticity

    PubMed Central

    Hussain, Shaista; Basu, Arindam

    2016-01-01

    The development of power-efficient neuromorphic devices presents the challenge of designing spike pattern classification algorithms which can be implemented on low-precision hardware and can also achieve state-of-the-art performance. In our pursuit of meeting this challenge, we present a pattern classification model which uses a sparse connection matrix and exploits the mechanism of nonlinear dendritic processing to achieve high classification accuracy. A rate-based structural learning rule for multiclass classification is proposed which modifies a connectivity matrix of binary synaptic connections by choosing the best “k” out of “d” inputs to make connections on every dendritic branch (k < < d). Because learning only modifies connectivity, the model is well suited for implementation in neuromorphic systems using address-event representation (AER). We develop an ensemble method which combines several dendritic classifiers to achieve enhanced generalization over individual classifiers. We have two major findings: (1) Our results demonstrate that an ensemble created with classifiers comprising moderate number of dendrites performs better than both ensembles of perceptrons and of complex dendritic trees. (2) In order to determine the moderate number of dendrites required for a specific classification problem, a two-step solution is proposed. First, an adaptive approach is proposed which scales the relative size of the dendritic trees of neurons for each class. It works by progressively adding dendrites with fixed number of synapses to the network, thereby allocating synaptic resources as per the complexity of the given problem. As a second step, theoretical capacity calculations are used to convert each neuronal dendritic tree to its optimal topology where dendrites of each class are assigned different number of synapses. The performance of the model is evaluated on classification of handwritten digits from the benchmark MNIST dataset and compared with other spike classifiers. We show that our system can achieve classification accuracy within 1 − 2% of other reported spike-based classifiers while using much less synaptic resources (only 7%) compared to that used by other methods. Further, an ensemble classifier created with adaptively learned sizes can attain accuracy of 96.4% which is at par with the best reported performance of spike-based classifiers. Moreover, the proposed method achieves this by using about 20% of the synapses used by other spike algorithms. We also present results of applying our algorithm to classify the MNIST-DVS dataset collected from a real spike-based image sensor and show results comparable to the best reported ones (88.1% accuracy). For VLSI implementations, we show that the reduced synaptic memory can save upto 4X area compared to conventional crossbar topologies. Finally, we also present a biologically realistic spike-based version for calculating the correlations required by the structural learning rule and demonstrate the correspondence between the rate-based and spike-based methods of learning. PMID:27065782

  13. Comparing statistical and machine learning classifiers: alternatives for predictive modeling in human factors research.

    PubMed

    Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann

    2003-01-01

    Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.

  14. Design of a hybrid model for cardiac arrhythmia classification based on Daubechies wavelet transform.

    PubMed

    Rajagopal, Rekha; Ranganathan, Vidhyapriya

    2018-06-05

    Automation in cardiac arrhythmia classification helps medical professionals make accurate decisions about the patient's health. The aim of this work was to design a hybrid classification model to classify cardiac arrhythmias. The design phase of the classification model comprises the following stages: preprocessing of the cardiac signal by eliminating detail coefficients that contain noise, feature extraction through Daubechies wavelet transform, and arrhythmia classification using a collaborative decision from the K nearest neighbor classifier (KNN) and a support vector machine (SVM). The proposed model is able to classify 5 arrhythmia classes as per the ANSI/AAMI EC57: 1998 classification standard. Level 1 of the proposed model involves classification using the KNN and the classifier is trained with examples from all classes. Level 2 involves classification using an SVM and is trained specifically to classify overlapped classes. The final classification of a test heartbeat pertaining to a particular class is done using the proposed KNN/SVM hybrid model. The experimental results demonstrated that the average sensitivity of the proposed model was 92.56%, the average specificity 99.35%, the average positive predictive value 98.13%, the average F-score 94.5%, and the average accuracy 99.78%. The results obtained using the proposed model were compared with the results of discriminant, tree, and KNN classifiers. The proposed model is able to achieve a high classification accuracy.

  15. Classification

    NASA Astrophysics Data System (ADS)

    Oza, Nikunj

    2012-03-01

    A supervised learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. A set of training examples— examples with known output values—is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. Within supervised learning, one type of task is a classification learning task, in which each output is one or more classes to which the input belongs. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate’s measurements. The generalization performance of a learned model (how closely the target outputs and the model’s predicted outputs agree for patterns that have not been presented to the learning algorithm) would provide an indication of how well the model has learned the desired mapping. More formally, a classification learning algorithm L takes a training set T as its input. The training set consists of |T| examples or instances. It is assumed that there is a probability distribution D from which all training examples are drawn independently—that is, all the training examples are independently and identically distributed (i.i.d.). The ith training example is of the form (x_i, y_i), where x_i is a vector of values of several features and y_i represents the class to be predicted.* In the sunspot classification example given above, each training example would represent one sunspot’s classification (y_i) and the corresponding set of measurements (x_i). The output of a supervised learning algorithm is a model h that approximates the unknown mapping from the inputs to the outputs. In our example, h would map from the sunspot measurements to the type of sunspot. We may have a test set S—a set of examples not used in training that we use to test how well the model h predicts the outputs on new examples. Just as with the examples in T, the examples in S are assumed to be independent and identically distributed (i.i.d.) draws from the distribution D. We measure the error of h on the test set as the proportion of test cases that h misclassifies: 1/|S| Sigma(x,y union S)[I(h(x)!= y)] where I(v) is the indicator function—it returns 1 if v is true and 0 otherwise. In our sunspot classification example, we would identify additional examples of sunspots that were not used in generating the model, and use these to determine how accurate the model is—the fraction of the test samples that the model classifies correctly. An example of a classification model is the decision tree shown in Figure 23.1. We will discuss the decision tree learning algorithm in more detail later—for now, we assume that, given a training set with examples of sunspots, this decision tree is derived. This can be used to classify previously unseen examples of sunpots. For example, if a new sunspot’s inputs indicate that its "Group Length" is in the range 10-15, then the decision tree would classify the sunspot as being of type “E,” whereas if the "Group Length" is "NULL," the "Magnetic Type" is "bipolar," and the "Penumbra" is "rudimentary," then it would be classified as type "C." In this chapter, we will add to the above description of classification problems. We will discuss decision trees and several other classification models. In particular, we will discuss the learning algorithms that generate these classification models, how to use them to classify new examples, and the strengths and weaknesses of these models. We will end with pointers to further reading on classification methods applied to astronomy data.

  16. A Quality Classification System for Young Hardwood Trees - The First Step in Predicting Future Products

    Treesearch

    David L. Sonderman; Robert L. Brisbin

    1978-01-01

    Forest managers have no objective way to determine the relative value of culturally treated forest stands in terms of product potential. This paper describes the first step in the development of a quality classification system based on the measurement of individual tree characteristics for young hardwood stands.

  17. Markov-random-field-based super-resolution mapping for identification of urban trees in VHR images

    NASA Astrophysics Data System (ADS)

    Ardila, Juan P.; Tolpekin, Valentyn A.; Bijker, Wietske; Stein, Alfred

    2011-11-01

    Identification of tree crowns from remote sensing requires detailed spectral information and submeter spatial resolution imagery. Traditional pixel-based classification techniques do not fully exploit the spatial and spectral characteristics of remote sensing datasets. We propose a contextual and probabilistic method for detection of tree crowns in urban areas using a Markov random field based super resolution mapping (SRM) approach in very high resolution images. Our method defines an objective energy function in terms of the conditional probabilities of panchromatic and multispectral images and it locally optimizes the labeling of tree crown pixels. Energy and model parameter values are estimated from multiple implementations of SRM in tuning areas and the method is applied in QuickBird images to produce a 0.6 m tree crown map in a city of The Netherlands. The SRM output shows an identification rate of 66% and commission and omission errors in small trees and shrub areas. The method outperforms tree crown identification results obtained with maximum likelihood, support vector machines and SRM at nominal resolution (2.4 m) approaches.

  18. Estimating global distribution of boreal, temperate, and tropical tree plant functional types using clustering techniques

    NASA Astrophysics Data System (ADS)

    Wang, Audrey; Price, David T.

    2007-03-01

    A simple integrated algorithm was developed to relate global climatology to distributions of tree plant functional types (PFT). Multivariate cluster analysis was performed to analyze the statistical homogeneity of the climate space occupied by individual tree PFTs. Forested regions identified from the satellite-based GLC2000 classification were separated into tropical, temperate, and boreal sub-PFTs for use in the Canadian Terrestrial Ecosystem Model (CTEM). Global data sets of monthly minimum temperature, growing degree days, an index of climatic moisture, and estimated PFT cover fractions were then used as variables in the cluster analysis. The statistical results for individual PFT clusters were found consistent with other global-scale classifications of dominant vegetation. As an improvement of the quantification of the climatic limitations on PFT distributions, the results also demonstrated overlapping of PFT cluster boundaries that reflected vegetation transitions, for example, between tropical and temperate biomes. The resulting global database should provide a better basis for simulating the interaction of climate change and terrestrial ecosystem dynamics using global vegetation models.

  19. Protein classification based on text document classification techniques.

    PubMed

    Cheng, Betty Yee Man; Carbonell, Jaime G; Klein-Seetharaman, Judith

    2005-03-01

    The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to extreme diversity among its members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden markov model (HMM) and support vector machine (SVM) using alignment-based features have suggested that classifiers at the complexity of SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naive Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. Using the GPCR dataset and evaluation protocol from the previous study, the Naive Bayes classifier attained an accuracy of 93.0 and 92.4% in level I and level II subfamily classification respectively, while SVM has a reported accuracy of 88.4 and 86.3%. This is a 39.7 and 44.5% reduction in residual error for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, our method can be generalized to other protein families by applying it to the superfamily of nuclear receptors with 94.5, 97.8 and 93.6% accuracy in family, level I and level II subfamily classification respectively. Copyright 2005 Wiley-Liss, Inc.

  20. Mapping and characterizing selected canopy tree species at the Angkor World Heritage site in Cambodia using aerial data.

    PubMed

    Singh, Minerva; Evans, Damian; Tan, Boun Suy; Nin, Chan Samean

    2015-01-01

    At present, there is very limited information on the ecology, distribution, and structure of Cambodia's tree species to warrant suitable conservation measures. The aim of this study was to assess various methods of analysis of aerial imagery for characterization of the forest mensuration variables (i.e., tree height and crown width) of selected tree species found in the forested region around the temples of Angkor Thom, Cambodia. Object-based image analysis (OBIA) was used (using multiresolution segmentation) to delineate individual tree crowns from very-high-resolution (VHR) aerial imagery and light detection and ranging (LiDAR) data. Crown width and tree height values that were extracted using multiresolution segmentation showed a high level of congruence with field-measured values of the trees (Spearman's rho 0.782 and 0.589, respectively). Individual tree crowns that were delineated from aerial imagery using multiresolution segmentation had a high level of segmentation accuracy (69.22%), whereas tree crowns delineated using watershed segmentation underestimated the field-measured tree crown widths. Both spectral angle mapper (SAM) and maximum likelihood (ML) classifications were applied to the aerial imagery for mapping of selected tree species. The latter was found to be more suitable for tree species classification. Individual tree species were identified with high accuracy. Inclusion of textural information further improved species identification, albeit marginally. Our findings suggest that VHR aerial imagery, in conjunction with OBIA-based segmentation methods (such as multiresolution segmentation) and supervised classification techniques are useful for tree species mapping and for studies of the forest mensuration variables.

  1. Mapping and Characterizing Selected Canopy Tree Species at the Angkor World Heritage Site in Cambodia Using Aerial Data

    PubMed Central

    Singh, Minerva; Evans, Damian; Tan, Boun Suy; Nin, Chan Samean

    2015-01-01

    At present, there is very limited information on the ecology, distribution, and structure of Cambodia’s tree species to warrant suitable conservation measures. The aim of this study was to assess various methods of analysis of aerial imagery for characterization of the forest mensuration variables (i.e., tree height and crown width) of selected tree species found in the forested region around the temples of Angkor Thom, Cambodia. Object-based image analysis (OBIA) was used (using multiresolution segmentation) to delineate individual tree crowns from very-high-resolution (VHR) aerial imagery and light detection and ranging (LiDAR) data. Crown width and tree height values that were extracted using multiresolution segmentation showed a high level of congruence with field-measured values of the trees (Spearman’s rho 0.782 and 0.589, respectively). Individual tree crowns that were delineated from aerial imagery using multiresolution segmentation had a high level of segmentation accuracy (69.22%), whereas tree crowns delineated using watershed segmentation underestimated the field-measured tree crown widths. Both spectral angle mapper (SAM) and maximum likelihood (ML) classifications were applied to the aerial imagery for mapping of selected tree species. The latter was found to be more suitable for tree species classification. Individual tree species were identified with high accuracy. Inclusion of textural information further improved species identification, albeit marginally. Our findings suggest that VHR aerial imagery, in conjunction with OBIA-based segmentation methods (such as multiresolution segmentation) and supervised classification techniques are useful for tree species mapping and for studies of the forest mensuration variables. PMID:25902148

  2. Quantifying tree mortality in a mixed species woodland using multitemporal high spatial resolution satellite imagery

    USGS Publications Warehouse

    Garrity, Steven R.; Allen, Craig D.; Brumby, Steven P.; Gangodagamage, Chandana; McDowell, Nate G.; Cai, D. Michael

    2013-01-01

    Widespread tree mortality events have recently been observed in several biomes. To effectively quantify the severity and extent of these events, tools that allow for rapid assessment at the landscape scale are required. Past studies using high spatial resolution satellite imagery have primarily focused on detecting green, red, and gray tree canopies during and shortly after tree damage or mortality has occurred. However, detecting trees in various stages of death is not always possible due to limited availability of archived satellite imagery. Here we assess the capability of high spatial resolution satellite imagery for tree mortality detection in a southwestern U.S. mixed species woodland using archived satellite images acquired prior to mortality and well after dead trees had dropped their leaves. We developed a multistep classification approach that uses: supervised masking of non-tree image elements; bi-temporal (pre- and post-mortality) differencing of normalized difference vegetation index (NDVI) and red:green ratio (RGI); and unsupervised multivariate clustering of pixels into live and dead tree classes using a Gaussian mixture model. Classification accuracies were improved in a final step by tuning the rules of pixel classification using the posterior probabilities of class membership obtained from the Gaussian mixture model. Classifications were produced for two images acquired post-mortality with overall accuracies of 97.9% and 98.5%, respectively. Classified images were combined with land cover data to characterize the spatiotemporal characteristics of tree mortality across areas with differences in tree species composition. We found that 38% of tree crown area was lost during the drought period between 2002 and 2006. The majority of tree mortality during this period was concentrated in piñon-juniper (Pinus edulis-Juniperus monosperma) woodlands. An additional 20% of the tree canopy died or was removed between 2006 and 2011, primarily in areas experiencing wildfire and management activity. -Our results demonstrate that unsupervised clustering of bi-temporal NDVI and RGI differences can be used to detect tree mortality resulting from numerous causes and in several forest cover types.

  3. On the Accuracy of Language Trees

    PubMed Central

    Pompei, Simone; Loreto, Vittorio; Tria, Francesca

    2011-01-01

    Historical linguistics aims at inferring the most likely language phylogenetic tree starting from information concerning the evolutionary relatedness of languages. The available information are typically lists of homologous (lexical, phonological, syntactic) features or characters for many different languages: a set of parallel corpora whose compilation represents a paramount achievement in linguistics. From this perspective the reconstruction of language trees is an example of inverse problems: starting from present, incomplete and often noisy, information, one aims at inferring the most likely past evolutionary history. A fundamental issue in inverse problems is the evaluation of the inference made. A standard way of dealing with this question is to generate data with artificial models in order to have full access to the evolutionary process one is going to infer. This procedure presents an intrinsic limitation: when dealing with real data sets, one typically does not know which model of evolution is the most suitable for them. A possible way out is to compare algorithmic inference with expert classifications. This is the point of view we take here by conducting a thorough survey of the accuracy of reconstruction methods as compared with the Ethnologue expert classifications. We focus in particular on state-of-the-art distance-based methods for phylogeny reconstruction using worldwide linguistic databases. In order to assess the accuracy of the inferred trees we introduce and characterize two generalizations of standard definitions of distances between trees. Based on these scores we quantify the relative performances of the distance-based algorithms considered. Further we quantify how the completeness and the coverage of the available databases affect the accuracy of the reconstruction. Finally we draw some conclusions about where the accuracy of the reconstructions in historical linguistics stands and about the leading directions to improve it. PMID:21674034

  4. Operational Tree Species Mapping in a Diverse Tropical Forest with Airborne Imaging Spectroscopy.

    PubMed

    Baldeck, Claire A; Asner, Gregory P; Martin, Robin E; Anderson, Christopher B; Knapp, David E; Kellner, James R; Wright, S Joseph

    2015-01-01

    Remote identification and mapping of canopy tree species can contribute valuable information towards our understanding of ecosystem biodiversity and function over large spatial scales. However, the extreme challenges posed by highly diverse, closed-canopy tropical forests have prevented automated remote species mapping of non-flowering tree crowns in these ecosystems. We set out to identify individuals of three focal canopy tree species amongst a diverse background of tree and liana species on Barro Colorado Island, Panama, using airborne imaging spectroscopy data. First, we compared two leading single-class classification methods--binary support vector machine (SVM) and biased SVM--for their performance in identifying pixels of a single focal species. From this comparison we determined that biased SVM was more precise and created a multi-species classification model by combining the three biased SVM models. This model was applied to the imagery to identify pixels belonging to the three focal species and the prediction results were then processed to create a map of focal species crown objects. Crown-level cross-validation of the training data indicated that the multi-species classification model had pixel-level producer's accuracies of 94-97% for the three focal species, and field validation of the predicted crown objects indicated that these had user's accuracies of 94-100%. Our results demonstrate the ability of high spatial and spectral resolution remote sensing to accurately detect non-flowering crowns of focal species within a diverse tropical forest. We attribute the success of our model to recent classification and mapping techniques adapted to species detection in diverse closed-canopy forests, which can pave the way for remote species mapping in a wider variety of ecosystems.

  5. Operational Tree Species Mapping in a Diverse Tropical Forest with Airborne Imaging Spectroscopy

    PubMed Central

    Baldeck, Claire A.; Asner, Gregory P.; Martin, Robin E.; Anderson, Christopher B.; Knapp, David E.; Kellner, James R.; Wright, S. Joseph

    2015-01-01

    Remote identification and mapping of canopy tree species can contribute valuable information towards our understanding of ecosystem biodiversity and function over large spatial scales. However, the extreme challenges posed by highly diverse, closed-canopy tropical forests have prevented automated remote species mapping of non-flowering tree crowns in these ecosystems. We set out to identify individuals of three focal canopy tree species amongst a diverse background of tree and liana species on Barro Colorado Island, Panama, using airborne imaging spectroscopy data. First, we compared two leading single-class classification methods—binary support vector machine (SVM) and biased SVM—for their performance in identifying pixels of a single focal species. From this comparison we determined that biased SVM was more precise and created a multi-species classification model by combining the three biased SVM models. This model was applied to the imagery to identify pixels belonging to the three focal species and the prediction results were then processed to create a map of focal species crown objects. Crown-level cross-validation of the training data indicated that the multi-species classification model had pixel-level producer’s accuracies of 94–97% for the three focal species, and field validation of the predicted crown objects indicated that these had user’s accuracies of 94–100%. Our results demonstrate the ability of high spatial and spectral resolution remote sensing to accurately detect non-flowering crowns of focal species within a diverse tropical forest. We attribute the success of our model to recent classification and mapping techniques adapted to species detection in diverse closed-canopy forests, which can pave the way for remote species mapping in a wider variety of ecosystems. PMID:26153693

  6. Voice based gender classification using machine learning

    NASA Astrophysics Data System (ADS)

    Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.

    2017-11-01

    Gender identification is one of the major problem speech analysis today. Tracing the gender from acoustic data i.e., pitch, median, frequency etc. Machine learning gives promising results for classification problem in all the research domains. There are several performance metrics to evaluate algorithms of an area. Our Comparative model algorithm for evaluating 5 different machine learning algorithms based on eight different metrics in gender classification from acoustic data. Agenda is to identify gender, with five different algorithms: Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF), and Support Vector Machine (SVM) on basis of eight different metrics. The main parameter in evaluating any algorithms is its performance. Misclassification rate must be less in classification problems, which says that the accuracy rate must be high. Location and gender of the person have become very crucial in economic markets in the form of AdSense. Here with this comparative model algorithm, we are trying to assess the different ML algorithms and find the best fit for gender classification of acoustic data.

  7. Forest tree species clssification based on airborne hyper-spectral imagery

    NASA Astrophysics Data System (ADS)

    Dian, Yuanyong; Li, Zengyuan; Pang, Yong

    2013-10-01

    Forest precision classification products were the basic data for surveying of forest resource, updating forest subplot information, logging and design of forest. However, due to the diversity of stand structure, complexity of the forest growth environment, it's difficult to discriminate forest tree species using multi-spectral image. The airborne hyperspectral images can achieve the high spatial and spectral resolution imagery of forest canopy, so it will good for tree species level classification. The aim of this paper was to test the effective of combining spatial and spectral features in airborne hyper-spectral image classification. The CASI hyper spectral image data were acquired from Liangshui natural reserves area. Firstly, we use the MNF (minimum noise fraction) transform method for to reduce the hyperspectral image dimensionality and highlighting variation. And secondly, we use the grey level co-occurrence matrix (GLCM) to extract the texture features of forest tree canopy from the hyper-spectral image, and thirdly we fused the texture and the spectral features of forest canopy to classify the trees species using support vector machine (SVM) with different kernel functions. The results showed that when using the SVM classifier, MNF and texture-based features combined with linear kernel function can achieve the best overall accuracy which was 85.92%. It was also confirm that combine the spatial and spectral information can improve the accuracy of tree species classification.

  8. Crown-Level Tree Species Classification Using Integrated Airborne Hyperspectral and LIDAR Remote Sensing Data

    NASA Astrophysics Data System (ADS)

    Wang, Z.; Wu, J.; Wang, Y.; Kong, X.; Bao, H.; Ni, Y.; Ma, L.; Jin, J.

    2018-05-01

    Mapping tree species is essential for sustainable planning as well as to improve our understanding of the role of different trees as different ecological service. However, crown-level tree species automatic classification is a challenging task due to the spectral similarity among diversified tree species, fine-scale spatial variation, shadow, and underlying objects within a crown. Advanced remote sensing data such as airborne Light Detection and Ranging (LiDAR) and hyperspectral imagery offer a great potential opportunity to derive crown spectral, structure and canopy physiological information at the individual crown scale, which can be useful for mapping tree species. In this paper, an innovative approach was developed for tree species classification at the crown level. The method utilized LiDAR data for individual tree crown delineation and morphological structure extraction, and Compact Airborne Spectrographic Imager (CASI) hyperspectral imagery for pure crown-scale spectral extraction. Specifically, four steps were include: 1) A weighted mean filtering method was developed to improve the accuracy of the smoothed Canopy Height Model (CHM) derived from LiDAR data; 2) The marker-controlled watershed segmentation algorithm was, therefore, also employed to delineate the tree-level canopy from the CHM image in this study, and then individual tree height and tree crown were calculated according to the delineated crown; 3) Spectral features within 3 × 3 neighborhood regions centered on the treetops detected by the treetop detection algorithm were derived from the spectrally normalized CASI imagery; 4) The shape characteristics related to their crown diameters and heights were established, and different crown-level tree species were classified using the combination of spectral and shape characteristics. Analysis of results suggests that the developed classification strategy in this paper (OA = 85.12 %, Kc = 0.90) performed better than LiDAR-metrics method (OA = 79.86 %, Kc = 0.81) and spectral-metircs method (OA = 71.26, Kc = 0.69) in terms of classification accuracy, which indicated that the advanced method of data processing and sensitive feature selection are critical for improving the accuracy of crown-level tree species classification.

  9. Probabilistic grammatical model for helix‐helix contact site classification

    PubMed Central

    2013-01-01

    Background Hidden Markov Models power many state‐of‐the‐art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not convey directly information on medium‐ and long‐range residue‐residue interactions. This requires an expressive power of at least context‐free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited. Results In this work, we present a probabilistic grammatical framework for problem‐specific protein languages and apply it to classification of transmembrane helix‐helix pairs configurations. The core of the model consists of a probabilistic context‐free grammar, automatically inferred by a genetic algorithm from only a generic set of expert‐based rules and positive training samples. The model was applied to produce sequence based descriptors of four classes of transmembrane helix‐helix contact site configurations. The highest performance of the classifiers reached AUCROC of 0.70. The analysis of grammar parse trees revealed the ability of representing structural features of helix‐helix contact sites. Conclusions We demonstrated that our probabilistic context‐free framework for analysis of protein sequences outperforms the state of the art in the task of helix‐helix contact site classification. However, this is achieved without necessarily requiring modeling long range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human‐readable. Thus they could provide biologically meaningful information for molecular biologists. PMID:24350601

  10. Detection of motor imagery of swallow EEG signals based on the dual-tree complex wavelet transform and adaptive model selection

    NASA Astrophysics Data System (ADS)

    Yang, Huijuan; Guan, Cuntai; Sui Geok Chua, Karen; San Chok, See; Wang, Chuan Chu; Kok Soon, Phua; Tang, Christina Ka Yin; Keng Ang, Kai

    2014-06-01

    Objective. Detection of motor imagery of hand/arm has been extensively studied for stroke rehabilitation. This paper firstly investigates the detection of motor imagery of swallow (MI-SW) and motor imagery of tongue protrusion (MI-Ton) in an attempt to find a novel solution for post-stroke dysphagia rehabilitation. Detection of MI-SW from a simple yet relevant modality such as MI-Ton is then investigated, motivated by the similarity in activation patterns between tongue movements and swallowing and there being fewer movement artifacts in performing tongue movements compared to swallowing. Approach. Novel features were extracted based on the coefficients of the dual-tree complex wavelet transform to build multiple training models for detecting MI-SW. The session-to-session classification accuracy was boosted by adaptively selecting the training model to maximize the ratio of between-classes distances versus within-class distances, using features of training and evaluation data. Main results. Our proposed method yielded averaged cross-validation (CV) classification accuracies of 70.89% and 73.79% for MI-SW and MI-Ton for ten healthy subjects, which are significantly better than the results from existing methods. In addition, averaged CV accuracies of 66.40% and 70.24% for MI-SW and MI-Ton were obtained for one stroke patient, demonstrating the detectability of MI-SW and MI-Ton from the idle state. Furthermore, averaged session-to-session classification accuracies of 72.08% and 70% were achieved for ten healthy subjects and one stroke patient using the MI-Ton model. Significance. These results and the subjectwise strong correlations in classification accuracies between MI-SW and MI-Ton demonstrated the feasibility of detecting MI-SW from MI-Ton models.

  11. A new method of building footprints detection using airborne laser scanning data and multispectral image

    NASA Astrophysics Data System (ADS)

    Luo, Yiping; Jiang, Ting; Gao, Shengli; Wang, Xin

    2010-10-01

    It presents a new approach for detecting building footprints in a combination of registered aerial image with multispectral bands and airborne laser scanning data synchronously obtained by Leica-Geosystems ALS40 and Applanix DACS-301 on the same platform. A two-step method for building detection was presented consisting of selecting 'building' candidate points and then classifying candidate points. A digital surface model(DSM) derived from last pulse laser scanning data was first filtered and the laser points were classified into classes 'ground' and 'building or tree' based on mathematic morphological filter. Then, 'ground' points were resample into digital elevation model(DEM), and a Normalized DSM(nDSM) was generated from DEM and DSM. The candidate points were selected from 'building or tree' points by height value and area threshold in nDSM. The candidate points were further classified into building points and tree points by using the support vector machines(SVM) classification method. Two classification tests were carried out using features only from laser scanning data and associated features from two input data sources. The features included height, height finite difference, RGB bands value, and so on. The RGB value of points was acquired by matching laser scanning data and image using collinear equation. The features of training points were presented as input data for SVM classification method, and cross validation was used to select best classification parameters. The determinant function could be constructed by the classification parameters and the class of candidate points was determined by determinant function. The result showed that associated features from two input data sources were superior to features only from laser scanning data. The accuracy of more than 90% was achieved for buildings in first kind of features.

  12. The information extraction of Gannan citrus orchard based on the GF-1 remote sensing image

    NASA Astrophysics Data System (ADS)

    Wang, S.; Chen, Y. L.

    2017-02-01

    The production of Gannan oranges is the largest in China, which occupied an important part in the world. The extraction of citrus orchard quickly and effectively has important significance for fruit pathogen defense, fruit production and industrial planning. The traditional spectra extraction method of citrus orchard based on pixel has a lower classification accuracy, difficult to avoid the “pepper phenomenon”. In the influence of noise, the phenomenon that different spectrums of objects have the same spectrum is graveness. Taking Xunwu County citrus fruit planting area of Ganzhou as the research object, aiming at the disadvantage of the lower accuracy of the traditional method based on image element classification method, a decision tree classification method based on object-oriented rule set is proposed. Firstly, multi-scale segmentation is performed on the GF-1 remote sensing image data of the study area. Subsequently the sample objects are selected for statistical analysis of spectral features and geometric features. Finally, combined with the concept of decision tree classification, a variety of empirical values of single band threshold, NDVI, band combination and object geometry characteristics are used hierarchically to execute the information extraction of the research area, and multi-scale segmentation and hierarchical decision tree classification is implemented. The classification results are verified with the confusion matrix, and the overall Kappa index is 87.91%.

  13. Tree Classification with Fused Mobile Laser Scanning and Hyperspectral Data

    PubMed Central

    Puttonen, Eetu; Jaakkola, Anttoni; Litkey, Paula; Hyyppä, Juha

    2011-01-01

    Mobile Laser Scanning data were collected simultaneously with hyperspectral data using the Finnish Geodetic Institute Sensei system. The data were tested for tree species classification. The test area was an urban garden in the City of Espoo, Finland. Point clouds representing 168 individual tree specimens of 23 tree species were determined manually. The classification of the trees was done using first only the spatial data from point clouds, then with only the spectral data obtained with a spectrometer, and finally with the combined spatial and hyperspectral data from both sensors. Two classification tests were performed: the separation of coniferous and deciduous trees, and the identification of individual tree species. All determined tree specimens were used in distinguishing coniferous and deciduous trees. A subset of 133 trees and 10 tree species was used in the tree species classification. The best classification results for the fused data were 95.8% for the separation of the coniferous and deciduous classes. The best overall tree species classification succeeded with 83.5% accuracy for the best tested fused data feature combination. The respective results for paired structural features derived from the laser point cloud were 90.5% for the separation of the coniferous and deciduous classes and 65.4% for the species classification. Classification accuracies with paired hyperspectral reflectance value data were 90.5% for the separation of coniferous and deciduous classes and 62.4% for different species. The results are among the first of their kind and they show that mobile collected fused data outperformed single-sensor data in both classification tests and by a significant margin. PMID:22163894

  14. Tree classification with fused mobile laser scanning and hyperspectral data.

    PubMed

    Puttonen, Eetu; Jaakkola, Anttoni; Litkey, Paula; Hyyppä, Juha

    2011-01-01

    Mobile Laser Scanning data were collected simultaneously with hyperspectral data using the Finnish Geodetic Institute Sensei system. The data were tested for tree species classification. The test area was an urban garden in the City of Espoo, Finland. Point clouds representing 168 individual tree specimens of 23 tree species were determined manually. The classification of the trees was done using first only the spatial data from point clouds, then with only the spectral data obtained with a spectrometer, and finally with the combined spatial and hyperspectral data from both sensors. Two classification tests were performed: the separation of coniferous and deciduous trees, and the identification of individual tree species. All determined tree specimens were used in distinguishing coniferous and deciduous trees. A subset of 133 trees and 10 tree species was used in the tree species classification. The best classification results for the fused data were 95.8% for the separation of the coniferous and deciduous classes. The best overall tree species classification succeeded with 83.5% accuracy for the best tested fused data feature combination. The respective results for paired structural features derived from the laser point cloud were 90.5% for the separation of the coniferous and deciduous classes and 65.4% for the species classification. Classification accuracies with paired hyperspectral reflectance value data were 90.5% for the separation of coniferous and deciduous classes and 62.4% for different species. The results are among the first of their kind and they show that mobile collected fused data outperformed single-sensor data in both classification tests and by a significant margin.

  15. Probability of infestation and extent of mortality associated with the Douglas-fir beetle in the Colorado Front Range

    Treesearch

    Jose F. Negron

    1998-01-01

    Infested and uninfested areas within Douglas fir, Pseudotsuga menziesii Mirb.. Franco, stands affected by the Douglas-fir beetle, Dendroctonus pseudotsugae Hopk. were sampled in the Colorado Front Range, CO. Classification tree models were built to predict probabilities of infestation. Regression trees and linear regression analysis were used to model amount of tree...

  16. Identification of pests and diseases of Dalbergia hainanensis based on EVI time series and classification of decision tree

    NASA Astrophysics Data System (ADS)

    Luo, Qiu; Xin, Wu; Qiming, Xiong

    2017-06-01

    In the process of vegetation remote sensing information extraction, the problem of phenological features and low performance of remote sensing analysis algorithm is not considered. To solve this problem, the method of remote sensing vegetation information based on EVI time-series and the classification of decision-tree of multi-source branch similarity is promoted. Firstly, to improve the time-series stability of recognition accuracy, the seasonal feature of vegetation is extracted based on the fitting span range of time-series. Secondly, the decision-tree similarity is distinguished by adaptive selection path or probability parameter of component prediction. As an index, it is to evaluate the degree of task association, decide whether to perform migration of multi-source decision tree, and ensure the speed of migration. Finally, the accuracy of classification and recognition of pests and diseases can reach 87%--98% of commercial forest in Dalbergia hainanensis, which is significantly better than that of MODIS coverage accuracy of 80%--96% in this area. Therefore, the validity of the proposed method can be verified.

  17. Hierarchy-associated semantic-rule inference framework for classifying indoor scenes

    NASA Astrophysics Data System (ADS)

    Yu, Dan; Liu, Peng; Ye, Zhipeng; Tang, Xianglong; Zhao, Wei

    2016-03-01

    Typically, the initial task of classifying indoor scenes is challenging, because the spatial layout and decoration of a scene can vary considerably. Recent efforts at classifying object relationships commonly depend on the results of scene annotation and predefined rules, making classification inflexible. Furthermore, annotation results are easily affected by external factors. Inspired by human cognition, a scene-classification framework was proposed using the empirically based annotation (EBA) and a match-over rule-based (MRB) inference system. The semantic hierarchy of images is exploited by EBA to construct rules empirically for MRB classification. The problem of scene classification is divided into low-level annotation and high-level inference from a macro perspective. Low-level annotation involves detecting the semantic hierarchy and annotating the scene with a deformable-parts model and a bag-of-visual-words model. In high-level inference, hierarchical rules are extracted to train the decision tree for classification. The categories of testing samples are generated from the parts to the whole. Compared with traditional classification strategies, the proposed semantic hierarchy and corresponding rules reduce the effect of a variable background and improve the classification performance. The proposed framework was evaluated on a popular indoor scene dataset, and the experimental results demonstrate its effectiveness.

  18. Analysis of dual tree M-band wavelet transform based features for brain image classification.

    PubMed

    Ayalapogu, Ratna Raju; Pabboju, Suresh; Ramisetty, Rajeswara Rao

    2018-04-29

    The most complex organ in the human body is the brain. The unrestrained growth of cells in the brain is called a brain tumor. The cause of a brain tumor is still unknown and the survival rate is lower than other types of cancers. Hence, early detection is very important for proper treatment. In this study, an efficient computer-aided diagnosis (CAD) system is presented for brain image classification by analyzing MRI of the brain. At first, the MRI brain images of normal and abnormal categories are modeled by using the statistical features of dual tree m-band wavelet transform (DTMBWT). A maximum margin classifier, support vector machine (SVM) is then used for the classification and validated with k-fold approach. Results show that the system provides promising results on a repository of molecular brain neoplasia data (REMBRANDT) with 97.5% accuracy using 4 th level statistical features of DTMBWT. Viewing the experimental results, we conclude that the system gives a satisfactory performance for the brain image classification. © 2018 International Society for Magnetic Resonance in Medicine.

  19. Classification of the Gabon SAR Mosaic Using a Wavelet Based Rule Classifier

    NASA Technical Reports Server (NTRS)

    Simard, Marc; Saatchi, Sasan; DeGrandi, Gianfranco

    2000-01-01

    A method is developed for semi-automated classification of SAR images of the tropical forest. Information is extracted using the wavelet transform (WT). The transform allows for extraction of structural information in the image as a function of scale. In order to classify the SAR image, a Desicion Tree Classifier is used. The method of pruning is used to optimize classification rate versus tree size. The results give explicit insight on the type of information useful for a given class.

  20. Use of Binary Partition Tree and energy minimization for object-based classification of urban land cover

    NASA Astrophysics Data System (ADS)

    Li, Mengmeng; Bijker, Wietske; Stein, Alfred

    2015-04-01

    Two main challenges are faced when classifying urban land cover from very high resolution satellite images: obtaining an optimal image segmentation and distinguishing buildings from other man-made objects. For optimal segmentation, this work proposes a hierarchical representation of an image by means of a Binary Partition Tree (BPT) and an unsupervised evaluation of image segmentations by energy minimization. For building extraction, we apply fuzzy sets to create a fuzzy landscape of shadows which in turn involves a two-step procedure. The first step is a preliminarily image classification at a fine segmentation level to generate vegetation and shadow information. The second step models the directional relationship between building and shadow objects to extract building information at the optimal segmentation level. We conducted the experiments on two datasets of Pléiades images from Wuhan City, China. To demonstrate its performance, the proposed classification is compared at the optimal segmentation level with Maximum Likelihood Classification and Support Vector Machine classification. The results show that the proposed classification produced the highest overall accuracies and kappa coefficients, and the smallest over-classification and under-classification geometric errors. We conclude first that integrating BPT with energy minimization offers an effective means for image segmentation. Second, we conclude that the directional relationship between building and shadow objects represented by a fuzzy landscape is important for building extraction.

  1. Classification and Sequential Pattern Analysis for Improving Managerial Efficiency and Providing Better Medical Service in Public Healthcare Centers

    PubMed Central

    Chung, Sukhoon; Rhee, Hyunsill; Suh, Yongmoo

    2010-01-01

    Objectives This study sought to find answers to the following questions: 1) Can we predict whether a patient will revisit a healthcare center? 2) Can we anticipate diseases of patients who revisit the center? Methods For the first question, we applied 5 classification algorithms (decision tree, artificial neural network, logistic regression, Bayesian networks, and Naïve Bayes) and the stacking-bagging method for building classification models. To solve the second question, we performed sequential pattern analysis. Results We determined: 1) In general, the most influential variables which impact whether a patient of a public healthcare center will revisit it or not are personal burden, insurance bill, period of prescription, age, systolic pressure, name of disease, and postal code. 2) The best plain classification model is dependent on the dataset. 3) Based on average of classification accuracy, the proposed stacking-bagging method outperformed all traditional classification models and our sequential pattern analysis revealed 16 sequential patterns. Conclusions Classification models and sequential patterns can help public healthcare centers plan and implement healthcare service programs and businesses that are more appropriate to local residents, encouraging them to revisit public health centers. PMID:21818426

  2. Distribution of cavity trees in midwestern old-growth and second-growth forests

    Treesearch

    Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R. Thompson; David R. Larsen

    2003-01-01

    We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...

  3. Distribution of cavity trees in midwesternold-growth and second-growth forests

    Treesearch

    Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R., III Thompson; David R. Larsen

    2003-01-01

    We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...

  4. The wisdom of the commons: ensemble tree classifiers for prostate cancer prognosis.

    PubMed

    Koziol, James A; Feng, Anne C; Jia, Zhenyu; Wang, Yipeng; Goodison, Seven; McClelland, Michael; Mercola, Dan

    2009-01-01

    Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors.

  5. A classification tree for the prediction of benign versus malignant disease in patients with small renal masses.

    PubMed

    Rendon, Ricardo A; Mason, Ross J; Kirkland, Susan; Lawen, Joseph G; Abdolell, Mohamed

    2014-08-01

    To develop a classification tree for the preoperative prediction of benign versus malignant disease in patients with small renal masses. This is a retrospective study including 395 consecutive patients who underwent surgical treatment for a renal mass < 5 cm in maximum diameter between July 1st 2001 and June 30th 2010. A classification tree to predict the risk of having a benign renal mass preoperatively was developed using recursive partitioning analysis for repeated measures outcomes. Age, sex, volume on preoperative imaging, tumor location (central/peripheral), degree of endophytic component (1%-100%), and tumor axis position were used as potential predictors to develop the model. Forty-five patients (11.4%) were found to have a benign mass postoperatively. A classification tree has been developed which can predict the risk of benign disease with an accuracy of 88.9% (95% CI: 85.3 to 91.8). The significant prognostic factors in the classification tree are tumor volume, degree of endophytic component and symptoms at diagnosis. As an example of its utilization, a renal mass with a volume of < 5.67 cm3 that is < 45% endophytic has a 52.6% chance of having benign pathology. Conversely, a renal mass with a volume ≥ 5.67 cm3 that is ≥ 35% endophytic has only a 5.3% possibility of being benign. A classification tree to predict the risk of benign disease in small renal masses has been developed to aid the clinician when deciding on treatment strategies for small renal masses.

  6. Attitudes of High School Teachers to Educational Research Using Classification-Tree Method

    ERIC Educational Resources Information Center

    Akcoltekin, Alpturk; Engin, Ali Osman; Sevgin, Hikmet

    2017-01-01

    Purpose: The main objective is to investigate high school teachers' attitudes relating to educational research with respect to demographic variables. Research Methods: The study is based on the relational screening model. Data was obtained through an adapted scale to determine high school teachers' attitudes toward educational research. The study…

  7. Extensions and applications of ensemble-of-trees methods in machine learning

    NASA Astrophysics Data System (ADS)

    Bleich, Justin

    Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate its superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of violence during probation hearings in court systems.

  8. A Modified Decision Tree Algorithm Based on Genetic Algorithm for Mobile User Classification Problem

    PubMed Central

    Liu, Dong-sheng; Fan, Shu-jiang

    2014-01-01

    In order to offer mobile customers better service, we should classify the mobile user firstly. Aimed at the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduced genetic algorithm to optimize the results of the decision tree algorithm. We also take the context information as a classification attributes for the mobile user and we classify the context into public context and private context classes. Then we analyze the processes and operators of the algorithm. At last, we make an experiment on the mobile user with the algorithm, we can classify the mobile user into Basic service user, E-service user, Plus service user, and Total service user classes and we can also get some rules about the mobile user. Compared to C4.5 decision tree algorithm and SVM algorithm, the algorithm we proposed in this paper has higher accuracy and more simplicity. PMID:24688389

  9. Random forests and stochastic gradient boosting for predicting tree canopy cover: Comparing tuning processes and model performance

    Treesearch

    E. Freeman; G. Moisen; J. Coulston; B. Wilson

    2014-01-01

    Random forests (RF) and stochastic gradient boosting (SGB), both involving an ensemble of classification and regression trees, are compared for modeling tree canopy cover for the 2011 National Land Cover Database (NLCD). The objectives of this study were twofold. First, sensitivity of RF and SGB to choices in tuning parameters was explored. Second, performance of the...

  10. Mining geriatric assessment data for in-patient fall prediction models and high-risk subgroups

    PubMed Central

    2012-01-01

    Background Hospital in-patient falls constitute a prominent problem in terms of costs and consequences. Geriatric institutions are most often affected, and common screening tools cannot predict in-patient falls consistently. Our objectives are to derive comprehensible fall risk classification models from a large data set of geriatric in-patients' assessment data and to evaluate their predictive performance (aim#1), and to identify high-risk subgroups from the data (aim#2). Methods A data set of n = 5,176 single in-patient episodes covering 1.5 years of admissions to a geriatric hospital were extracted from the hospital's data base and matched with fall incident reports (n = 493). A classification tree model was induced using the C4.5 algorithm as well as a logistic regression model, and their predictive performance was evaluated. Furthermore, high-risk subgroups were identified from extracted classification rules with a support of more than 100 instances. Results The classification tree model showed an overall classification accuracy of 66%, with a sensitivity of 55.4%, a specificity of 67.1%, positive and negative predictive values of 15% resp. 93.5%. Five high-risk groups were identified, defined by high age, low Barthel index, cognitive impairment, multi-medication and co-morbidity. Conclusions Our results show that a little more than half of the fallers may be identified correctly by our model, but the positive predictive value is too low to be applicable. Non-fallers, on the other hand, may be sorted out with the model quite well. The high-risk subgroups and the risk factors identified (age, low ADL score, cognitive impairment, institutionalization, polypharmacy and co-morbidity) reflect domain knowledge and may be used to screen certain subgroups of patients with a high risk of falling. Classification models derived from a large data set using data mining methods can compete with current dedicated fall risk screening tools, yet lack diagnostic precision. High-risk subgroups may be identified automatically from existing geriatric assessment data, especially when combined with domain knowledge in a hybrid classification model. Further work is necessary to validate our approach in a controlled prospective setting. PMID:22417403

  11. Mining geriatric assessment data for in-patient fall prediction models and high-risk subgroups.

    PubMed

    Marschollek, Michael; Gövercin, Mehmet; Rust, Stefan; Gietzelt, Matthias; Schulze, Mareike; Wolf, Klaus-Hendrik; Steinhagen-Thiessen, Elisabeth

    2012-03-14

    Hospital in-patient falls constitute a prominent problem in terms of costs and consequences. Geriatric institutions are most often affected, and common screening tools cannot predict in-patient falls consistently. Our objectives are to derive comprehensible fall risk classification models from a large data set of geriatric in-patients' assessment data and to evaluate their predictive performance (aim#1), and to identify high-risk subgroups from the data (aim#2). A data set of n = 5,176 single in-patient episodes covering 1.5 years of admissions to a geriatric hospital were extracted from the hospital's data base and matched with fall incident reports (n = 493). A classification tree model was induced using the C4.5 algorithm as well as a logistic regression model, and their predictive performance was evaluated. Furthermore, high-risk subgroups were identified from extracted classification rules with a support of more than 100 instances. The classification tree model showed an overall classification accuracy of 66%, with a sensitivity of 55.4%, a specificity of 67.1%, positive and negative predictive values of 15% resp. 93.5%. Five high-risk groups were identified, defined by high age, low Barthel index, cognitive impairment, multi-medication and co-morbidity. Our results show that a little more than half of the fallers may be identified correctly by our model, but the positive predictive value is too low to be applicable. Non-fallers, on the other hand, may be sorted out with the model quite well. The high-risk subgroups and the risk factors identified (age, low ADL score, cognitive impairment, institutionalization, polypharmacy and co-morbidity) reflect domain knowledge and may be used to screen certain subgroups of patients with a high risk of falling. Classification models derived from a large data set using data mining methods can compete with current dedicated fall risk screening tools, yet lack diagnostic precision. High-risk subgroups may be identified automatically from existing geriatric assessment data, especially when combined with domain knowledge in a hybrid classification model. Further work is necessary to validate our approach in a controlled prospective setting.

  12. Identification of terrain cover using the optimum polarimetric classifier

    NASA Technical Reports Server (NTRS)

    Kong, J. A.; Swartz, A. A.; Yueh, H. A.; Novak, L. M.; Shin, R. T.

    1988-01-01

    A systematic approach for the identification of terrain media such as vegetation canopy, forest, and snow-covered fields is developed using the optimum polarimetric classifier. The covariance matrices for various terrain cover are computed from theoretical models of random medium by evaluating the scattering matrix elements. The optimal classification scheme makes use of a quadratic distance measure and is applied to classify a vegetation canopy consisting of both trees and grass. Experimentally measured data are used to validate the classification scheme. Analytical and Monte Carlo simulated classification errors using the fully polarimetric feature vector are compared with classification based on single features which include the phase difference between the VV and HH polarization returns. It is shown that the full polarimetric results are optimal and provide better classification performance than single feature measurements.

  13. Discriminative Hierarchical K-Means Tree for Large-Scale Image Classification.

    PubMed

    Chen, Shizhi; Yang, Xiaodong; Tian, Yingli

    2015-09-01

    A key challenge in large-scale image classification is how to achieve efficiency in terms of both computation and memory without compromising classification accuracy. The learning-based classifiers achieve the state-of-the-art accuracies, but have been criticized for the computational complexity that grows linearly with the number of classes. The nonparametric nearest neighbor (NN)-based classifiers naturally handle large numbers of categories, but incur prohibitively expensive computation and memory costs. In this brief, we present a novel classification scheme, i.e., discriminative hierarchical K-means tree (D-HKTree), which combines the advantages of both learning-based and NN-based classifiers. The complexity of the D-HKTree only grows sublinearly with the number of categories, which is much better than the recent hierarchical support vector machines-based methods. The memory requirement is the order of magnitude less than the recent Naïve Bayesian NN-based approaches. The proposed D-HKTree classification scheme is evaluated on several challenging benchmark databases and achieves the state-of-the-art accuracies, while with significantly lower computation cost and memory requirement.

  14. Classification of US hydropower dams by their modes of operation

    DOE PAGES

    McManamay, Ryan A.; Oigbokie, II, Clement O.; Kao, Shih -Chieh; ...

    2016-02-19

    A key challenge to understanding ecohydrologic responses to dam regulation is the absence of a universally transferable classification framework for how dams operate. In the present paper, we develop a classification system to organize the modes of operation (MOPs) for U.S. hydropower dams and powerplants. To determine the full diversity of MOPs, we mined federal documents, open-access data repositories, and internet sources. W then used CART classification trees to predict MOPs based on physical characteristics, regulation, and project generation. Finally, we evaluated how much variation MOPs explained in sub-daily discharge patterns for stream gages downstream of hydropower dams. After reviewingmore » information for 721 dams and 597 power plants, we developed a 2-tier hierarchical classification based on 1) the storage and control of flows to powerplants, and 2) the presence of a diversion around the natural stream bed. This resulted in nine tier-1 MOPs representing a continuum of operations from strictly peaking, to reregulating, to run-of-river, and two tier-2 MOPs, representing diversion and integral dam-powerhouse configurations. Although MOPs differed in physical characteristics and energy production, classification trees had low accuracies (<62%), which suggested accurate evaluations of MOPs may require individual attention. MOPs and dam storage explained 20% of the variation in downstream subdaily flow characteristics and showed consistent alterations in subdaily flow patterns from reference streams. Lastly, this standardized classification scheme is important for future research including estimating reservoir operations for large-scale hydrologic models and evaluating project economics, environmental impacts, and mitigation.« less

  15. Classification of US hydropower dams by their modes of operation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McManamay, Ryan A.; Oigbokie, II, Clement O.; Kao, Shih -Chieh

    A key challenge to understanding ecohydrologic responses to dam regulation is the absence of a universally transferable classification framework for how dams operate. In the present paper, we develop a classification system to organize the modes of operation (MOPs) for U.S. hydropower dams and powerplants. To determine the full diversity of MOPs, we mined federal documents, open-access data repositories, and internet sources. W then used CART classification trees to predict MOPs based on physical characteristics, regulation, and project generation. Finally, we evaluated how much variation MOPs explained in sub-daily discharge patterns for stream gages downstream of hydropower dams. After reviewingmore » information for 721 dams and 597 power plants, we developed a 2-tier hierarchical classification based on 1) the storage and control of flows to powerplants, and 2) the presence of a diversion around the natural stream bed. This resulted in nine tier-1 MOPs representing a continuum of operations from strictly peaking, to reregulating, to run-of-river, and two tier-2 MOPs, representing diversion and integral dam-powerhouse configurations. Although MOPs differed in physical characteristics and energy production, classification trees had low accuracies (<62%), which suggested accurate evaluations of MOPs may require individual attention. MOPs and dam storage explained 20% of the variation in downstream subdaily flow characteristics and showed consistent alterations in subdaily flow patterns from reference streams. Lastly, this standardized classification scheme is important for future research including estimating reservoir operations for large-scale hydrologic models and evaluating project economics, environmental impacts, and mitigation.« less

  16. Method of Grassland Information Extraction Based on Multi-Level Segmentation and Cart Model

    NASA Astrophysics Data System (ADS)

    Qiao, Y.; Chen, T.; He, J.; Wen, Q.; Liu, F.; Wang, Z.

    2018-04-01

    It is difficult to extract grassland accurately by traditional classification methods, such as supervised method based on pixels or objects. This paper proposed a new method combing the multi-level segmentation with CART (classification and regression tree) model. The multi-level segmentation which combined the multi-resolution segmentation and the spectral difference segmentation could avoid the over and insufficient segmentation seen in the single segmentation mode. The CART model was established based on the spectral characteristics and texture feature which were excavated from training sample data. Xilinhaote City in Inner Mongolia Autonomous Region was chosen as the typical study area and the proposed method was verified by using visual interpretation results as approximate truth value. Meanwhile, the comparison with the nearest neighbor supervised classification method was obtained. The experimental results showed that the total precision of classification and the Kappa coefficient of the proposed method was 95 % and 0.9, respectively. However, the total precision of classification and the Kappa coefficient of the nearest neighbor supervised classification method was 80 % and 0.56, respectively. The result suggested that the accuracy of classification proposed in this paper was higher than the nearest neighbor supervised classification method. The experiment certificated that the proposed method was an effective extraction method of grassland information, which could enhance the boundary of grassland classification and avoid the restriction of grassland distribution scale. This method was also applicable to the extraction of grassland information in other regions with complicated spatial features, which could avoid the interference of woodland, arable land and water body effectively.

  17. A method of real-time fault diagnosis for power transformers based on vibration analysis

    NASA Astrophysics Data System (ADS)

    Hong, Kaixing; Huang, Hai; Zhou, Jianping; Shen, Yimin; Li, Yujie

    2015-11-01

    In this paper, a novel probability-based classification model is proposed for real-time fault detection of power transformers. First, the transformer vibration principle is introduced, and two effective feature extraction techniques are presented. Next, the details of the classification model based on support vector machine (SVM) are shown. The model also includes a binary decision tree (BDT) which divides transformers into different classes according to health state. The trained model produces posterior probabilities of membership to each predefined class for a tested vibration sample. During the experiments, the vibrations of transformers under different conditions are acquired, and the corresponding feature vectors are used to train the SVM classifiers. The effectiveness of this model is illustrated experimentally on typical in-service transformers. The consistency between the results of the proposed model and the actual condition of the test transformers indicates that the model can be used as a reliable method for transformer fault detection.

  18. A Semi-Automated Machine Learning Algorithm for Tree Cover Delineation from 1-m Naip Imagery Using a High Performance Computing Architecture

    NASA Astrophysics Data System (ADS)

    Basu, S.; Ganguly, S.; Nemani, R. R.; Mukhopadhyay, S.; Milesi, C.; Votava, P.; Michaelis, A.; Zhang, G.; Cook, B. D.; Saatchi, S. S.; Boyda, E.

    2014-12-01

    Accurate tree cover delineation is a useful instrument in the derivation of Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) satellite imagery data. Numerous algorithms have been designed to perform tree cover delineation in high to coarse resolution satellite imagery, but most of them do not scale to terabytes of data, typical in these VHR datasets. In this paper, we present an automated probabilistic framework for the segmentation and classification of 1-m VHR data as obtained from the National Agriculture Imagery Program (NAIP) for deriving tree cover estimates for the whole of Continental United States, using a High Performance Computing Architecture. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field (CRF), which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by incorporating expert knowledge through the relabeling of misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the state of California, which covers a total of 11,095 NAIP tiles and spans a total geographical area of 163,696 sq. miles. Our framework produced correct detection rates of around 85% for fragmented forests and 70% for urban tree cover areas, with false positive rates lower than 3% for both regions. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR high-resolution canopy height model shows the effectiveness of our algorithm in generating accurate high-resolution tree cover maps.

  19. SVM-based tree-type neural networks as a critic in adaptive critic designs for control.

    PubMed

    Deb, Alok Kanti; Jayadeva; Gopal, Madan; Chandra, Suresh

    2007-07-01

    In this paper, we use the approach of adaptive critic design (ACD) for control, specifically, the action-dependent heuristic dynamic programming (ADHDP) method. A least squares support vector machine (SVM) regressor has been used for generating the control actions, while an SVM-based tree-type neural network (NN) is used as the critic. After a failure occurs, the critic and action are retrained in tandem using the failure data. Failure data is binary classification data, where the number of failure states are very few as compared to the number of no-failure states. The difficulty of conventional multilayer feedforward NNs in learning this type of classification data has been overcome by using the SVM-based tree-type NN, which due to its feature to add neurons to learn misclassified data, has the capability to learn any binary classification data without a priori choice of the number of neurons or the structure of the network. The capability of the trained controller to handle unforeseen situations is demonstrated.

  20. The wisdom of the commons: ensemble tree classifiers for prostate cancer prognosis

    PubMed Central

    Koziol, James A.; Feng, Anne C.; Jia, Zhenyu; Wang, Yipeng; Goodison, Seven; McClelland, Michael; Mercola, Dan

    2009-01-01

    Motivation: Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. Results: Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors. Contact: dmercola@uci.edu PMID:18628288

  1. Factor complexity of crash occurrence: An empirical demonstration using boosted regression trees.

    PubMed

    Chung, Yi-Shih

    2013-12-01

    Factor complexity is a characteristic of traffic crashes. This paper proposes a novel method, namely boosted regression trees (BRT), to investigate the complex and nonlinear relationships in high-variance traffic crash data. The Taiwanese 2004-2005 single-vehicle motorcycle crash data are used to demonstrate the utility of BRT. Traditional logistic regression and classification and regression tree (CART) models are also used to compare their estimation results and external validities. Both the in-sample cross-validation and out-of-sample validation results show that an increase in tree complexity provides improved, although declining, classification performance, indicating a limited factor complexity of single-vehicle motorcycle crashes. The effects of crucial variables including geographical, time, and sociodemographic factors explain some fatal crashes. Relatively unique fatal crashes are better approximated by interactive terms, especially combinations of behavioral factors. BRT models generally provide improved transferability than conventional logistic regression and CART models. This study also discusses the implications of the results for devising safety policies. Copyright © 2012 Elsevier Ltd. All rights reserved.

  2. Single-accelerometer-based daily physical activity classification.

    PubMed

    Long, Xi; Yin, Bin; Aarts, Ronald M

    2009-01-01

    In this study, a single tri-axial accelerometer placed on the waist was used to record the acceleration data for human physical activity classification. The data collection involved 24 subjects performing daily real-life activities in a naturalistic environment without researchers' intervention. For the purpose of assessing customers' daily energy expenditure, walking, running, cycling, driving, and sports were chosen as target activities for classification. This study compared a Bayesian classification with that of a Decision Tree based approach. A Bayes classifier has the advantage to be more extensible, requiring little effort in classifier retraining and software update upon further expansion or modification of the target activities. Principal components analysis was applied to remove the correlation among features and to reduce the feature vector dimension. Experiments using leave-one-subject-out and 10-fold cross validation protocols revealed a classification accuracy of approximately 80%, which was comparable with that obtained by a Decision Tree classifier.

  3. Phylogenetic classification and the universal tree.

    PubMed

    Doolittle, W F

    1999-06-25

    From comparative analyses of the nucleotide sequences of genes encoding ribosomal RNAs and several proteins, molecular phylogeneticists have constructed a "universal tree of life," taking it as the basis for a "natural" hierarchical classification of all living things. Although confidence in some of the tree's early branches has recently been shaken, new approaches could still resolve many methodological uncertainties. More challenging is evidence that most archaeal and bacterial genomes (and the inferred ancestral eukaryotic nuclear genome) contain genes from multiple sources. If "chimerism" or "lateral gene transfer" cannot be dismissed as trivial in extent or limited to special categories of genes, then no hierarchical universal classification can be taken as natural. Molecular phylogeneticists will have failed to find the "true tree," not because their methods are inadequate or because they have chosen the wrong genes, but because the history of life cannot properly be represented as a tree. However, taxonomies based on molecular sequences will remain indispensable, and understanding of the evolutionary process will ultimately be enriched, not impoverished.

  4. Effectiveness of repeated examination to diagnose enterobiasis in nursery school groups.

    PubMed

    Remm, Mare; Remm, Kalle

    2009-09-01

    The aim of this study was to estimate the benefit from repeated examinations in the diagnosis of enterobiasis in nursery school groups, and to test the effectiveness of individual-based risk predictions using different methods. A total of 604 children were examined using double, and 96 using triple, anal swab examinations. The questionnaires for parents, structured observations, and interviews with supervisors were used to identify factors of possible infection risk. In order to model the risk of enterobiasis at individual level, a similarity-based machine learning and prediction software Constud was compared with data mining methods in the Statistica 8 Data Miner software package. Prevalence according to a single examination was 22.5%; the increase as a result of double examinations was 8.2%. Single swabs resulted in an estimated prevalence of 20.1% among children examined 3 times; double swabs increased this by 10.1%, and triple swabs by 7.3%. Random forest classification, boosting classification trees, and Constud correctly predicted about 2/3 of the results of the second examination. Constud estimated a mean prevalence of 31.5% in groups. Constud was able to yield the highest overall fit of individual-based predictions while boosting classification tree and random forest models were more effective in recognizing Enterobius positive persons. As a rule, the actual prevalence of enterobiasis is higher than indicated by a single examination. We suggest using either the values of the mean increase in prevalence after double examinations compared to single examinations or group estimations deduced from individual-level modelled risk predictions.

  5. Effectiveness of Repeated Examination to Diagnose Enterobiasis in Nursery School Groups

    PubMed Central

    Remm, Kalle

    2009-01-01

    The aim of this study was to estimate the benefit from repeated examinations in the diagnosis of enterobiasis in nursery school groups, and to test the effectiveness of individual-based risk predictions using different methods. A total of 604 children were examined using double, and 96 using triple, anal swab examinations. The questionnaires for parents, structured observations, and interviews with supervisors were used to identify factors of possible infection risk. In order to model the risk of enterobiasis at individual level, a similarity-based machine learning and prediction software Constud was compared with data mining methods in the Statistica 8 Data Miner software package. Prevalence according to a single examination was 22.5%; the increase as a result of double examinations was 8.2%. Single swabs resulted in an estimated prevalence of 20.1% among children examined 3 times; double swabs increased this by 10.1%, and triple swabs by 7.3%. Random forest classification, boosting classification trees, and Constud correctly predicted about 2/3 of the results of the second examination. Constud estimated a mean prevalence of 31.5% in groups. Constud was able to yield the highest overall fit of individual-based predictions while boosting classification tree and random forest models were more effective in recognizing Enterobius positive persons. As a rule, the actual prevalence of enterobiasis is higher than indicated by a single examination. We suggest using either the values of the mean increase in prevalence after double examinations compared to single examinations or group estimations deduced from individual-level modelled risk predictions. PMID:19724696

  6. Stand conditions associated with roundheaded pine beetle (Coleoptera: Scolytidae) infestations in Arizona and Utah

    Treesearch

    Jose F. Negron; Jill L. Wilson; John A. Anhold

    2000-01-01

    Stand conditions associated with outbreak populations of the roundheaded pine beetle, Dendroctonus adjunctus Blandford, in ponderosa pine, Pinus ponderosa Dougl. ex Laws., forests were studied in the Pinaleno Mountains, AZ, and the Pine Valley Mountains, UT. Classification tree models to estimate the probability of infestation based on stand attributes were built for...

  7. Assessing herbivore foraging behavior with GPS collars in a semiarid grassland.

    PubMed

    Augustine, David J; Derner, Justin D

    2013-03-15

    Advances in global positioning system (GPS) technology have dramatically enhanced the ability to track and study distributions of free-ranging livestock. Understanding factors controlling the distribution of free-ranging livestock requires the ability to assess when and where they are foraging. For four years (2008-2011), we periodically collected GPS and activity sensor data together with direct observations of collared cattle grazing semiarid rangeland in eastern Colorado. From these data, we developed classification tree models that allowed us to discriminate between grazing and non-grazing activities. We evaluated: (1) which activity sensor measurements from the GPS collars were most valuable in predicting cattle foraging behavior, (2) the accuracy of binary (grazing, non-grazing) activity models vs. models with multiple activity categories (grazing, resting, traveling, mixed), and (3) the accuracy of models that are robust across years vs. models specific to a given year. A binary classification tree correctly removed 86.5% of the non-grazing locations, while correctly retaining 87.8% of the locations where the animal was grazing, for an overall misclassification rate of 12.9%. A classification tree that separated activity into four different categories yielded a greater misclassification rate of 16.0%. Distance travelled in a 5 minute interval and the proportion of the interval with the sensor indicating a head down position were the two most important variables predicting grazing activity. Fitting annual models of cattle foraging activity did not improve model accuracy compared to a single model based on all four years combined. This suggests that increased sample size was more valuable than accounting for interannual variation in foraging behavior associated with variation in forage production. Our models differ from previous assessments in semiarid rangeland of Israel and mesic pastures in the United States in terms of the value of different activity sensor measurements for identifying grazing activity, suggesting that the use of GPS collars to classify cattle grazing behavior will require calibrations specific to the environment and vegetation being studied.

  8. Fragment-based prediction of skin sensitization using recursive partitioning

    NASA Astrophysics Data System (ADS)

    Lu, Jing; Zheng, Mingyue; Wang, Yong; Shen, Qiancheng; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian

    2011-09-01

    Skin sensitization is an important toxic endpoint in the risk assessment of chemicals. In this paper, structure-activity relationships analysis was performed on the skin sensitization potential of 357 compounds with local lymph node assay data. Structural fragments were extracted by GASTON (GrAph/Sequence/Tree extractiON) from the training set. Eight fragments with accuracy significantly higher than 0.73 ( p < 0.1) were retained to make up an indicator descriptor fragment. The fragment descriptor and eight other physicochemical descriptors closely related to the endpoint were calculated to construct the recursive partitioning tree (RP tree) for classification. The balanced accuracy of the training set, test set I, and test set II in the leave-one-out model were 0.846, 0.800, and 0.809, respectively. The results highlight that fragment-based RP tree is a preferable method for identifying skin sensitizers. Moreover, the selected fragments provide useful structural information for exploring sensitization mechanisms, and RP tree creates a graphic tree to identify the most important properties associated with skin sensitization. They can provide some guidance for designing of drugs with lower sensitization level.

  9. Semantic Segmentation of Forest Stands of Pure Species as a Global Optimization Problem

    NASA Astrophysics Data System (ADS)

    Dechesne, C.; Mallet, C.; Le Bris, A.; Gouet-Brunet, V.

    2017-05-01

    Forest stand delineation is a fundamental task for forest management purposes, that is still mainly manually performed through visual inspection of geospatial (very) high spatial resolution images. Stand detection has been barely addressed in the literature which has mainly focused, in forested environments, on individual tree extraction and tree species classification. From a methodological point of view, stand detection can be considered as a semantic segmentation problem. It offers two advantages. First, one can retrieve the dominant tree species per segment. Secondly, one can benefit from existing low-level tree species label maps from the literature as a basis for high-level object extraction. Thus, the semantic segmentation issue becomes a regularization issue in a weakly structured environment and can be formulated in an energetical framework. This papers aims at investigating which regularization strategies of the literature are the most adapted to delineate and classify forest stands of pure species. Both airborne lidar point clouds and multispectral very high spatial resolution images are integrated for that purpose. The local methods (such as filtering and probabilistic relaxation) are not adapted for such problem since the increase of the classification accuracy is below 5%. The global methods, based on an energy model, tend to be more efficient with an accuracy gain up to 15%. The segmentation results using such models have an accuracy ranging from 96% to 99%.

  10. Temporal expansion of annual crop classification layers for the CONUS using the C5 decision tree classifier

    USGS Publications Warehouse

    Friesz, Aaron M.; Wylie, Bruce K.; Howard, Daniel M.

    2017-01-01

    Crop cover maps have become widely used in a range of research applications. Multiple crop cover maps have been developed to suite particular research interests. The National Agricultural Statistics Service (NASS) Cropland Data Layers (CDL) are a series of commonly used crop cover maps for the conterminous United States (CONUS) that span from 2008 to 2013. In this investigation, we sought to contribute to the availability of consistent CONUS crop cover maps by extending temporal coverage of the NASS CDL archive back eight additional years to 2000 by creating annual NASS CDL-like crop cover maps derived from a classification tree model algorithm. We used over 11 million records to train a classification tree algorithm and develop a crop classification model (CCM). The model was used to create crop cover maps for the CONUS for years 2000–2013 at 250 m spatial resolution. The CCM and the maps for years 2008–2013 were assessed for accuracy relative to resampled NASS CDLs. The CCM performed well against a withheld test data set with a model prediction accuracy of over 90%. The assessment of the crop cover maps indicated that the model performed well spatially, placing crop cover pixels within their known domains; however, the model did show a bias towards the ‘Other’ crop cover class, which caused frequent misclassifications of pixels around the periphery of large crop cover patch clusters and of pixels that form small, sparsely dispersed crop cover patches.

  11. A novel chemometric classification for FTIR spectra of mycotoxin-contaminated maize and peanuts at regulatory limits.

    PubMed

    Kos, Gregor; Sieger, Markus; McMullin, David; Zahradnik, Celine; Sulyok, Michael; Öner, Tuba; Mizaikoff, Boris; Krska, Rudolf

    2016-10-01

    The rapid identification of mycotoxins such as deoxynivalenol and aflatoxin B 1 in agricultural commodities is an ongoing concern for food importers and processors. While sophisticated chromatography-based methods are well established for regulatory testing by food safety authorities, few techniques exist to provide a rapid assessment for traders. This study advances the development of a mid-infrared spectroscopic method, recording spectra with little sample preparation. Spectral data were classified using a bootstrap-aggregated (bagged) decision tree method, evaluating the protein and carbohydrate absorption regions of the spectrum. The method was able to classify 79% of 110 maize samples at the European Union regulatory limit for deoxynivalenol of 1750 µg kg -1 and, for the first time, 77% of 92 peanut samples at 8 µg kg -1 of aflatoxin B 1 . A subset model revealed a dependency on variety and type of fungal infection. The employed CRC and SBL maize varieties could be pooled in the model with a reduction of classification accuracy from 90% to 79%. Samples infected with Fusarium verticillioides were removed, leaving samples infected with F. graminearum and F. culmorum in the dataset improving classification accuracy from 73% to 79%. A 500 µg kg -1 classification threshold for deoxynivalenol in maize performed even better with 85% accuracy. This is assumed to be due to a larger number of samples around the threshold increasing representativity. Comparison with established principal component analysis classification, which consistently showed overlapping clusters, confirmed the superior performance of bagged decision tree classification.

  12. Evaluating Chemical Persistence in a Multimedia Environment: ACART Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bennett, D.H.; McKone, T.E.; Kastenberg, W.E.

    1999-02-01

    For the thousands of chemicals continuously released into the environment, it is desirable to make prospective assessments of those likely to be persistent. Persistent chemicals are difficult to remove if adverse health or ecological effects are later discovered. A tiered approach using a classification scheme and a multimedia model for determining persistence is presented. Using specific criteria for persistence, a classification tree is developed to classify a chemical as ''persistent'' or ''non-persistent'' based on the chemical properties. In this approach, the classification is derived from the results of a standardized unit world multimedia model. Thus, the classifications are more robustmore » for multimedia pollutants than classifications using a single medium half-life. The method can be readily implemented and provides insight without requiring extensive and often unavailable data. This method can be used to classify chemicals when only a few properties are known and be used to direct further data collection. Case studies are presented to demonstrate the advantages of the approach.« less

  13. Automated method for identification and artery-venous classification of vessel trees in retinal vessel networks.

    PubMed

    Joshi, Vinayak S; Reinhardt, Joseph M; Garvin, Mona K; Abramoff, Michael D

    2014-01-01

    The separation of the retinal vessel network into distinct arterial and venous vessel trees is of high interest. We propose an automated method for identification and separation of retinal vessel trees in a retinal color image by converting a vessel segmentation image into a vessel segment map and identifying the individual vessel trees by graph search. Orientation, width, and intensity of each vessel segment are utilized to find the optimal graph of vessel segments. The separated vessel trees are labeled as primary vessel or branches. We utilize the separated vessel trees for arterial-venous (AV) classification, based on the color properties of the vessels in each tree graph. We applied our approach to a dataset of 50 fundus images from 50 subjects. The proposed method resulted in an accuracy of 91.44% correctly classified vessel pixels as either artery or vein. The accuracy of correctly classified major vessel segments was 96.42%.

  14. Object-based analysis of multispectral airborne laser scanner data for land cover classification and map updating

    NASA Astrophysics Data System (ADS)

    Matikainen, Leena; Karila, Kirsi; Hyyppä, Juha; Litkey, Paula; Puttonen, Eetu; Ahokas, Eero

    2017-06-01

    During the last 20 years, airborne laser scanning (ALS), often combined with passive multispectral information from aerial images, has shown its high feasibility for automated mapping processes. The main benefits have been achieved in the mapping of elevated objects such as buildings and trees. Recently, the first multispectral airborne laser scanners have been launched, and active multispectral information is for the first time available for 3D ALS point clouds from a single sensor. This article discusses the potential of this new technology in map updating, especially in automated object-based land cover classification and change detection in a suburban area. For our study, Optech Titan multispectral ALS data over a suburban area in Finland were acquired. Results from an object-based random forests analysis suggest that the multispectral ALS data are very useful for land cover classification, considering both elevated classes and ground-level classes. The overall accuracy of the land cover classification results with six classes was 96% compared with validation points. The classes under study included building, tree, asphalt, gravel, rocky area and low vegetation. Compared to classification of single-channel data, the main improvements were achieved for ground-level classes. According to feature importance analyses, multispectral intensity features based on several channels were more useful than those based on one channel. Automatic change detection for buildings and roads was also demonstrated by utilising the new multispectral ALS data in combination with old map vectors. In change detection of buildings, an old digital surface model (DSM) based on single-channel ALS data was also used. Overall, our analyses suggest that the new data have high potential for further increasing the automation level in mapping. Unlike passive aerial imaging commonly used in mapping, the multispectral ALS technology is independent of external illumination conditions, and there are no shadows on intensity images produced from the data. These are significant advantages in developing automated classification and change detection procedures.

  15. Comparison of Ordinal and Nominal Classification Trees to Predict Ordinal Expert-Based Occupational Exposure Estimates in a Case–Control Study

    PubMed Central

    Wheeler, David C.; Archer, Kellie J.; Burstyn, Igor; Yu, Kai; Stewart, Patricia A.; Colt, Joanne S.; Baris, Dalsu; Karagas, Margaret R.; Schwenn, Molly; Johnson, Alison; Armenti, Karla; Silverman, Debra T.; Friesen, Melissa C.

    2015-01-01

    Objectives: To evaluate occupational exposures in case–control studies, exposure assessors typically review each job individually to assign exposure estimates. This process lacks transparency and does not provide a mechanism for recreating the decision rules in other studies. In our previous work, nominal (unordered categorical) classification trees (CTs) generally successfully predicted expert-assessed ordinal exposure estimates (i.e. none, low, medium, high) derived from occupational questionnaire responses, but room for improvement remained. Our objective was to determine if using recently developed ordinal CTs would improve the performance of nominal trees in predicting ordinal occupational diesel exhaust exposure estimates in a case–control study. Methods: We used one nominal and four ordinal CT methods to predict expert-assessed probability, intensity, and frequency estimates of occupational diesel exhaust exposure (each categorized as none, low, medium, or high) derived from questionnaire responses for the 14983 jobs in the New England Bladder Cancer Study. To replicate the common use of a single tree, we applied each method to a single sample of 70% of the jobs, using 15% to test and 15% to validate each method. To characterize variability in performance, we conducted a resampling analysis that repeated the sample draws 100 times. We evaluated agreement between the tree predictions and expert estimates using Somers’ d, which measures differences in terms of ordinal association between predicted and observed scores and can be interpreted similarly to a correlation coefficient. Results: From the resampling analysis, compared with the nominal tree, an ordinal CT method that used a quadratic misclassification function and controlled tree size based on total misclassification cost had a slightly better predictive performance that was statistically significant for the frequency metric (Somers’ d: nominal tree = 0.61; ordinal tree = 0.63) and similar performance for the probability (nominal = 0.65; ordinal = 0.66) and intensity (nominal = 0.65; ordinal = 0.65) metrics. The best ordinal CT predicted fewer cases of large disagreement with the expert assessments (i.e. no exposure predicted for a job with high exposure and vice versa) compared with the nominal tree across all of the exposure metrics. For example, the percent of jobs with expert-assigned high intensity of exposure that the model predicted as no exposure was 29% for the nominal tree and 22% for the best ordinal tree. Conclusions: The overall agreements were similar across CT models; however, the use of ordinal models reduced the magnitude of the discrepancy when disagreements occurred. As the best performing model can vary by situation, researchers should consider evaluating multiple CT methods to maximize the predictive performance within their data. PMID:25433003

  16. a Two-Step Classification Approach to Distinguishing Similar Objects in Mobile LIDAR Point Clouds

    NASA Astrophysics Data System (ADS)

    He, H.; Khoshelham, K.; Fraser, C.

    2017-09-01

    Nowadays, lidar is widely used in cultural heritage documentation, urban modeling, and driverless car technology for its fast and accurate 3D scanning ability. However, full exploitation of the potential of point cloud data for efficient and automatic object recognition remains elusive. Recently, feature-based methods have become very popular in object recognition on account of their good performance in capturing object details. Compared with global features describing the whole shape of the object, local features recording the fractional details are more discriminative and are applicable for object classes with considerable similarity. In this paper, we propose a two-step classification approach based on point feature histograms and the bag-of-features method for automatic recognition of similar objects in mobile lidar point clouds. Lamp post, street light and traffic sign are grouped as one category in the first-step classification for their inter similarity compared with tree and vehicle. A finer classification of the lamp post, street light and traffic sign based on the result of the first-step classification is implemented in the second step. The proposed two-step classification approach is shown to yield a considerable improvement over the conventional one-step classification approach.

  17. Decision Tree Repository and Rule Set Based Mingjiang River Estuarine Wetlands Classifaction

    NASA Astrophysics Data System (ADS)

    Zhang, W.; Li, X.; Xiao, W.

    2018-05-01

    The increasing urbanization and industrialization have led to wetland losses in estuarine area of Mingjiang River over past three decades. There has been increasing attention given to produce wetland inventories using remote sensing and GIS technology. Due to inconsistency training site and training sample, traditionally pixel-based image classification methods can't achieve a comparable result within different organizations. Meanwhile, object-oriented image classification technique shows grate potential to solve this problem and Landsat moderate resolution remote sensing images are widely used to fulfill this requirement. Firstly, the standardized atmospheric correct, spectrally high fidelity texture feature enhancement was conducted before implementing the object-oriented wetland classification method in eCognition. Secondly, we performed the multi-scale segmentation procedure, taking the scale, hue, shape, compactness and smoothness of the image into account to get the appropriate parameters, using the top and down region merge algorithm from single pixel level, the optimal texture segmentation scale for different types of features is confirmed. Then, the segmented object is used as the classification unit to calculate the spectral information such as Mean value, Maximum value, Minimum value, Brightness value and the Normalized value. The Area, length, Tightness and the Shape rule of the image object Spatial features and texture features such as Mean, Variance and Entropy of image objects are used as classification features of training samples. Based on the reference images and the sampling points of on-the-spot investigation, typical training samples are selected uniformly and randomly for each type of ground objects. The spectral, texture and spatial characteristics of each type of feature in each feature layer corresponding to the range of values are used to create the decision tree repository. Finally, with the help of high resolution reference images, the random sampling method is used to conduct the field investigation, achieve an overall accuracy of 90.31 %, and the Kappa coefficient is 0.88. The classification method based on decision tree threshold values and rule set developed by the repository, outperforms the results obtained from the traditional methodology. Our decision tree repository and rule set based object-oriented classification technique was an effective method for producing comparable and consistency wetlands data set.

  18. Models of Marine Fish Biodiversity: Assessing Predictors from Three Habitat Classification Schemes.

    PubMed

    Yates, Katherine L; Mellin, Camille; Caley, M Julian; Radford, Ben T; Meeuwig, Jessica J

    2016-01-01

    Prioritising biodiversity conservation requires knowledge of where biodiversity occurs. Such knowledge, however, is often lacking. New technologies for collecting biological and physical data coupled with advances in modelling techniques could help address these gaps and facilitate improved management outcomes. Here we examined the utility of environmental data, obtained using different methods, for developing models of both uni- and multivariate biodiversity metrics. We tested which biodiversity metrics could be predicted best and evaluated the performance of predictor variables generated from three types of habitat data: acoustic multibeam sonar imagery, predicted habitat classification, and direct observer habitat classification. We used boosted regression trees (BRT) to model metrics of fish species richness, abundance and biomass, and multivariate regression trees (MRT) to model biomass and abundance of fish functional groups. We compared model performance using different sets of predictors and estimated the relative influence of individual predictors. Models of total species richness and total abundance performed best; those developed for endemic species performed worst. Abundance models performed substantially better than corresponding biomass models. In general, BRT and MRTs developed using predicted habitat classifications performed less well than those using multibeam data. The most influential individual predictor was the abiotic categorical variable from direct observer habitat classification and models that incorporated predictors from direct observer habitat classification consistently outperformed those that did not. Our results show that while remotely sensed data can offer considerable utility for predictive modelling, the addition of direct observer habitat classification data can substantially improve model performance. Thus it appears that there are aspects of marine habitats that are important for modelling metrics of fish biodiversity that are not fully captured by remotely sensed data. As such, the use of remotely sensed data to model biodiversity represents a compromise between model performance and data availability.

  19. Models of Marine Fish Biodiversity: Assessing Predictors from Three Habitat Classification Schemes

    PubMed Central

    Yates, Katherine L.; Mellin, Camille; Caley, M. Julian; Radford, Ben T.; Meeuwig, Jessica J.

    2016-01-01

    Prioritising biodiversity conservation requires knowledge of where biodiversity occurs. Such knowledge, however, is often lacking. New technologies for collecting biological and physical data coupled with advances in modelling techniques could help address these gaps and facilitate improved management outcomes. Here we examined the utility of environmental data, obtained using different methods, for developing models of both uni- and multivariate biodiversity metrics. We tested which biodiversity metrics could be predicted best and evaluated the performance of predictor variables generated from three types of habitat data: acoustic multibeam sonar imagery, predicted habitat classification, and direct observer habitat classification. We used boosted regression trees (BRT) to model metrics of fish species richness, abundance and biomass, and multivariate regression trees (MRT) to model biomass and abundance of fish functional groups. We compared model performance using different sets of predictors and estimated the relative influence of individual predictors. Models of total species richness and total abundance performed best; those developed for endemic species performed worst. Abundance models performed substantially better than corresponding biomass models. In general, BRT and MRTs developed using predicted habitat classifications performed less well than those using multibeam data. The most influential individual predictor was the abiotic categorical variable from direct observer habitat classification and models that incorporated predictors from direct observer habitat classification consistently outperformed those that did not. Our results show that while remotely sensed data can offer considerable utility for predictive modelling, the addition of direct observer habitat classification data can substantially improve model performance. Thus it appears that there are aspects of marine habitats that are important for modelling metrics of fish biodiversity that are not fully captured by remotely sensed data. As such, the use of remotely sensed data to model biodiversity represents a compromise between model performance and data availability. PMID:27333202

  20. DIF Trees: Using Classification Trees to Detect Differential Item Functioning

    ERIC Educational Resources Information Center

    Vaughn, Brandon K.; Wang, Qiu

    2010-01-01

    A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…

  1. A P2P Botnet detection scheme based on decision tree and adaptive multilayer neural networks.

    PubMed

    Alauthaman, Mohammad; Aslam, Nauman; Zhang, Li; Alasem, Rafe; Hossain, M A

    2018-01-01

    In recent years, Botnets have been adopted as a popular method to carry and spread many malicious codes on the Internet. These malicious codes pave the way to execute many fraudulent activities including spam mail, distributed denial-of-service attacks and click fraud. While many Botnets are set up using centralized communication architecture, the peer-to-peer (P2P) Botnets can adopt a decentralized architecture using an overlay network for exchanging command and control data making their detection even more difficult. This work presents a method of P2P Bot detection based on an adaptive multilayer feed-forward neural network in cooperation with decision trees. A classification and regression tree is applied as a feature selection technique to select relevant features. With these features, a multilayer feed-forward neural network training model is created using a resilient back-propagation learning algorithm. A comparison of feature set selection based on the decision tree, principal component analysis and the ReliefF algorithm indicated that the neural network model with features selection based on decision tree has a better identification accuracy along with lower rates of false positives. The usefulness of the proposed approach is demonstrated by conducting experiments on real network traffic datasets. In these experiments, an average detection rate of 99.08 % with false positive rate of 0.75 % was observed.

  2. A new approach to enhance the performance of decision tree for classifying gene expression data.

    PubMed

    Hassan, Md; Kotagiri, Ramamohanarao

    2013-12-20

    Gene expression data classification is a challenging task due to the large dimensionality and very small number of samples. Decision tree is one of the popular machine learning approaches to address such classification problems. However, the existing decision tree algorithms use a single gene feature at each node to split the data into its child nodes and hence might suffer from poor performance specially when classifying gene expression dataset. By using a new decision tree algorithm where, each node of the tree consists of more than one gene, we enhance the classification performance of traditional decision tree classifiers. Our method selects suitable genes that are combined using a linear function to form a derived composite feature. To determine the structure of the tree we use the area under the Receiver Operating Characteristics curve (AUC). Experimental analysis demonstrates higher classification accuracy using the new decision tree compared to the other existing decision trees in literature. We experimentally compare the effect of our scheme against other well known decision tree techniques. Experiments show that our algorithm can substantially boost the classification performance of the decision tree.

  3. Computational approaches for the classification of seed storage proteins.

    PubMed

    Radhika, V; Rao, V Sree Hari

    2015-07-01

    Seed storage proteins comprise a major part of the protein content of the seed and have an important role on the quality of the seed. These storage proteins are important because they determine the total protein content and have an effect on the nutritional quality and functional properties for food processing. Transgenic plants are being used to develop improved lines for incorporation into plant breeding programs and the nutrient composition of seeds is a major target of molecular breeding programs. Hence, classification of these proteins is crucial for the development of superior varieties with improved nutritional quality. In this study we have applied machine learning algorithms for classification of seed storage proteins. We have presented an algorithm based on nearest neighbor approach for classification of seed storage proteins and compared its performance with decision tree J48, multilayer perceptron neural (MLP) network and support vector machine (SVM) libSVM. The model based on our algorithm has been able to give higher classification accuracy in comparison to the other methods.

  4. An automated approach to the design of decision tree classifiers

    NASA Technical Reports Server (NTRS)

    Argentiero, P.; Chin, P.; Beaudet, P.

    1980-01-01

    The classification of large dimensional data sets arising from the merging of remote sensing data with more traditional forms of ancillary data is considered. Decision tree classification, a popular approach to the problem, is characterized by the property that samples are subjected to a sequence of decision rules before they are assigned to a unique class. An automated technique for effective decision tree design which relies only on apriori statistics is presented. This procedure utilizes a set of two dimensional canonical transforms and Bayes table look-up decision rules. An optimal design at each node is derived based on the associated decision table. A procedure for computing the global probability of correct classfication is also provided. An example is given in which class statistics obtained from an actual LANDSAT scene are used as input to the program. The resulting decision tree design has an associated probability of correct classification of .76 compared to the theoretically optimum .79 probability of correct classification associated with a full dimensional Bayes classifier. Recommendations for future research are included.

  5. A new tool for post-AGB SED classification

    NASA Astrophysics Data System (ADS)

    Bendjoya, P.; Suarez, O.; Galluccio, L.; Michel, O.

    We present the results of an unsupervised classification method applied on a set of 344 spectral energy distributions (SED) of post-AGB stars extracted from the Torun catalogue of Galactic post-AGB stars. This method aims to find a new unbiased method for post-AGB star classification based on the information contained in the IR region of the SED (fluxes, IR excess, colours). We used the data from IRAS and MSX satellites, and from the 2MASS survey. We applied a classification method based on the construction of the dataset of a minimal spanning tree (MST) with the Prim's algorithm. In order to build this tree, different metrics have been tested on both flux and color indices. Our method is able to classify the set of 344 post-AGB stars in 9 distinct groups according to their SEDs.

  6. Aspen, climate, and sudden decline in western USA

    Treesearch

    Gerald E. Rehfeldt; Dennis E. Ferguson; Nicholas L. Crookston

    2009-01-01

    A bioclimate model predicting the presence or absence of aspen, Populus tremuloides, in western USA from climate variables was developed by using the Random Forests classification tree on Forest Inventory data from about 118,000 permanent sample plots. A reasonably parsimonious model used eight predictors to describe aspen's climate profile. Classification errors...

  7. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    PubMed Central

    Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%. PMID:25302338

  8. A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements.

    PubMed

    Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%.

  9. EULAR/ACR classification criteria for adult and juvenile idiopathic inflammatory myopathies and their major subgroups: a methodology report

    PubMed Central

    Bottai, Matteo; Tjärnlund, Anna; Santoni, Giola; Werth, Victoria P; Pilkington, Clarissa; de Visser, Marianne; Alfredsson, Lars; Amato, Anthony A; Barohn, Richard J; Liang, Matthew H; Aggarwal, Rohit; Arnardottir, Snjolaug; Chinoy, Hector; Cooper, Robert G; Danko, Katalin; Dimachkie, Mazen M; Feldman, Brian M; García-De La Torre, Ignacio; Gordon, Patrick; Hayashi, Taichi; Katz, James D; Kohsaka, Hitoshi; Lachenbruch, Peter A; Lang, Bianca A; Li, Yuhui; Oddis, Chester V; Olesinka, Marzena; Reed, Ann M; Rutkowska-Sak, Lidia; Sanner, Helga; Selva-O’Callaghan, Albert; Wook Song, Yeong; Ytterberg, Steven R; Miller, Frederick W; Rider, Lisa G; Lundberg, Ingrid E; Amoruso, Maria

    2017-01-01

    Objective To describe the methodology used to develop new classification criteria for adult and juvenile idiopathic inflammatory myopathies (IIMs) and their major subgroups. Methods An international, multidisciplinary group of myositis experts produced a set of 93 potentially relevant variables to be tested for inclusion in the criteria. Rheumatology, dermatology, neurology and paediatric clinics worldwide collected data on 976 IIM cases (74% adults, 26% children) and 624 non-IIM comparator cases with mimicking conditions (82% adults, 18% children). The participating clinicians classified each case as IIM or non-IIM. Generally, the classification of any given patient was based on few variables, leaving remaining variables unmeasured. We investigated the strength of the association between all variables and between these and the disease status as determined by the physician. We considered three approaches: (1) a probability-score approach, (2) a sum-of-items approach criteria and (3) a classification-tree approach. Results The approaches yielded several candidate models that were scrutinised with respect to statistical performance and clinical relevance. The probability-score approach showed superior statistical performance and clinical practicability and was therefore preferred over the others. We developed a classification tree for subclassification of patients with IIM. A calculator for electronic devices, such as computers and smartphones, facilitates the use of the European League Against Rheumatism/American College of Rheumatology (EULAR/ACR) classification criteria. Conclusions The new EULAR/ACR classification criteria provide a patient’s probability of having IIM for use in clinical and research settings. The probability is based on a score obtained by summing the weights associated with a set of criteria items. PMID:29177080

  10. Speaker gender identification based on majority vote classifiers

    NASA Astrophysics Data System (ADS)

    Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri

    2017-03-01

    Speaker gender identification is considered among the most important tools in several multimedia applications namely in automatic speech recognition, interactive voice response systems and audio browsing systems. Gender identification systems performance is closely linked to the selected feature set and the employed classification model. Typical techniques are based on selecting the best performing classification method or searching optimum tuning of one classifier parameters through experimentation. In this paper, we consider a relevant and rich set of features involving pitch, MFCCs as well as other temporal and frequency-domain descriptors. Five classification models including decision tree, discriminant analysis, nave Bayes, support vector machine and k-nearest neighbor was experimented. The three best perming classifiers among the five ones will contribute by majority voting between their scores. Experimentations were performed on three different datasets spoken in three languages: English, German and Arabic in order to validate language independency of the proposed scheme. Results confirm that the presented system has reached a satisfying accuracy rate and promising classification performance thanks to the discriminating abilities and diversity of the used features combined with mid-level statistics.

  11. Multispectral imaging burn wound tissue classification system: a comparison of test accuracies between several common machine learning algorithms

    NASA Astrophysics Data System (ADS)

    Squiers, John J.; Li, Weizhi; King, Darlene R.; Mo, Weirong; Zhang, Xu; Lu, Yang; Sellke, Eric W.; Fan, Wensheng; DiMaio, J. Michael; Thatcher, Jeffrey E.

    2016-03-01

    The clinical judgment of expert burn surgeons is currently the standard on which diagnostic and therapeutic decisionmaking regarding burn injuries is based. Multispectral imaging (MSI) has the potential to increase the accuracy of burn depth assessment and the intraoperative identification of viable wound bed during surgical debridement of burn injuries. A highly accurate classification model must be developed using machine-learning techniques in order to translate MSI data into clinically-relevant information. An animal burn model was developed to build an MSI training database and to study the burn tissue classification ability of several models trained via common machine-learning algorithms. The algorithms tested, from least to most complex, were: K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), weighted linear discriminant analysis (W-LDA), quadratic discriminant analysis (QDA), ensemble linear discriminant analysis (EN-LDA), ensemble K-nearest neighbors (EN-KNN), and ensemble decision tree (EN-DT). After the ground-truth database of six tissue types (healthy skin, wound bed, blood, hyperemia, partial injury, full injury) was generated by histopathological analysis, we used 10-fold cross validation to compare the algorithms' performances based on their accuracies in classifying data against the ground truth, and each algorithm was tested 100 times. The mean test accuracy of the algorithms were KNN 68.3%, DT 61.5%, LDA 70.5%, W-LDA 68.1%, QDA 68.9%, EN-LDA 56.8%, EN-KNN 49.7%, and EN-DT 36.5%. LDA had the highest test accuracy, reflecting the bias-variance tradeoff over the range of complexities inherent to the algorithms tested. Several algorithms were able to match the current standard in burn tissue classification, the clinical judgment of expert burn surgeons. These results will guide further development of an MSI burn tissue classification system. Given that there are few surgeons and facilities specializing in burn care, this technology may improve the standard of burn care for patients without access to specialized facilities.

  12. Forest Stand Segmentation Using Airborne LIDAR Data and Very High Resolution Multispectral Imagery

    NASA Astrophysics Data System (ADS)

    Dechesne, Clément; Mallet, Clément; Le Bris, Arnaud; Gouet, Valérie; Hervieu, Alexandre

    2016-06-01

    Forest stands are the basic units for forest inventory and mapping. Stands are large forested areas (e.g., ≥ 2 ha) of homogeneous tree species composition. The accurate delineation of forest stands is usually performed by visual analysis of human operators on very high resolution (VHR) optical images. This work is highly time consuming and should be automated for scalability purposes. In this paper, a method based on the fusion of airborne laser scanning data (or lidar) and very high resolution multispectral imagery for automatic forest stand delineation and forest land-cover database update is proposed. The multispectral images give access to the tree species whereas 3D lidar point clouds provide geometric information on the trees. Therefore, multi-modal features are computed, both at pixel and object levels. The objects are individual trees extracted from lidar data. A supervised classification is performed at the object level on the computed features in order to coarsely discriminate the existing tree species in the area of interest. The analysis at tree level is particularly relevant since it significantly improves the tree species classification. A probability map is generated through the tree species classification and inserted with the pixel-based features map in an energetical framework. The proposed energy is then minimized using a standard graph-cut method (namely QPBO with α-expansion) in order to produce a segmentation map with a controlled level of details. Comparison with an existing forest land cover database shows that our method provides satisfactory results both in terms of stand labelling and delineation (matching ranges between 94% and 99%).

  13. Validation of statistical predictive models meant to select melanoma patients for sentinel lymph node biopsy.

    PubMed

    Sabel, Michael S; Rice, John D; Griffith, Kent A; Lowe, Lori; Wong, Sandra L; Chang, Alfred E; Johnson, Timothy M; Taylor, Jeremy M G

    2012-01-01

    To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid sentinel lymph node biopsy (SLNB), several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests, and support vector machines. We sought to validate recently published models meant to predict sentinel node status. We queried our comprehensive, prospectively collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon four published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false-negative rate (FNR). Logistic regression performed comparably with our data when considering NPV (89.4 versus 93.6%); however, the model's specificity was not high enough to significantly reduce the rate of biopsies (SLN reduction rate of 2.9%). When applied to our data, the classification tree produced NPV and reduction in biopsy rates that were lower (87.7 versus 94.1 and 29.8 versus 14.3, respectively). Two published models could not be applied to our data due to model complexity and the use of proprietary software. Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset, or could not be validated. Differences in selection criteria and histopathologic interpretation likely resulted in underperformance. Statistical predictive models must be developed in a clinically applicable manner to allow for both validation and ultimately clinical utility.

  14. Applying Data Mining Techniques to Extract Hidden Patterns about Breast Cancer Survival in an Iranian Cohort Study.

    PubMed

    Khalkhali, Hamid Reza; Lotfnezhad Afshar, Hadi; Esnaashari, Omid; Jabbari, Nasrollah

    2016-01-01

    Breast cancer survival has been analyzed by many standard data mining algorithms. A group of these algorithms belonged to the decision tree category. Ability of the decision tree algorithms in terms of visualizing and formulating of hidden patterns among study variables were main reasons to apply an algorithm from the decision tree category in the current study that has not studied already. The classification and regression trees (CART) was applied to a breast cancer database contained information on 569 patients in 2007-2010. The measurement of Gini impurity used for categorical target variables was utilized. The classification error that is a function of tree size was measured by 10-fold cross-validation experiments. The performance of created model was evaluated by the criteria as accuracy, sensitivity and specificity. The CART model produced a decision tree with 17 nodes, 9 of which were associated with a set of rules. The rules were meaningful clinically. They showed in the if-then format that Stage was the most important variable for predicting breast cancer survival. The scores of accuracy, sensitivity and specificity were: 80.3%, 93.5% and 53%, respectively. The current study model as the first one created by the CART was able to extract useful hidden rules from a relatively small size dataset.

  15. Study on Ecological Risk Assessment of Guangxi Coastal Zone Based on 3s Technology

    NASA Astrophysics Data System (ADS)

    Zhong, Z.; Luo, H.; Ling, Z. Y.; Huang, Y.; Ning, W. Y.; Tang, Y. B.; Shao, G. Z.

    2018-05-01

    This paper takes Guangxi coastal zone as the study area, following the standards of land use type, divides the coastal zone of ecological landscape into seven kinds of natural wetland landscape types such as woodland, farmland, grassland, water, urban land and wetlands. Using TM data of 2000-2015 such 15 years, with the CART decision tree algorithm, for analysis the characteristic of types of landscape's remote sensing image and build decision tree rules of landscape classification to extract information classification. Analyzing of the evolution process of the landscape pattern in Guangxi coastal zone in nearly 15 years, we may understand the distribution characteristics and change rules. Combined with the natural disaster data, we use of landscape index and the related risk interference degree and construct ecological risk evaluation model in Guangxi coastal zone for ecological risk assessment results of Guangxi coastal zone.

  16. An AI-based approach to structural damage identification by modal analysis

    NASA Technical Reports Server (NTRS)

    Glass, B. J.; Hanagud, S.

    1990-01-01

    Flexible-structure damage is presently addressed by a combined model- and parameter-identification approach which employs the AI methodologies of classification, heuristic search, and object-oriented model knowledge representation. The conditions for model-space search convergence to the best model are discussed in terms of search-tree organization and initial model parameter error. In the illustrative example of a truss structure presented, the use of both model and parameter identification is shown to lead to smaller parameter corrections than would be required by parameter identification alone.

  17. Molecular classification of pesticides including persistent organic pollutants, phenylurea and sulphonylurea herbicides.

    PubMed

    Torrens, Francisco; Castellano, Gloria

    2014-06-05

    Pesticide residues in wine were analyzed by liquid chromatography-tandem mass spectrometry. Retentions are modelled by structure-property relationships. Bioplastic evolution is an evolutionary perspective conjugating effect of acquired characters and evolutionary indeterminacy-morphological determination-natural selection principles; its application to design co-ordination index barely improves correlations. Fractal dimensions and partition coefficient differentiate pesticides. Classification algorithms are based on information entropy and its production. Pesticides allow a structural classification by nonplanarity, and number of O, S, N and Cl atoms and cycles; different behaviours depend on number of cycles. The novelty of the approach is that the structural parameters are related to retentions. Classification algorithms are based on information entropy. When applying procedures to moderate-sized sets, excessive results appear compatible with data suffering a combinatorial explosion. However, equipartition conjecture selects criterion resulting from classification between hierarchical trees. Information entropy permits classifying compounds agreeing with principal component analyses. Periodic classification shows that pesticides in the same group present similar properties; those also in equal period, maximum resemblance. The advantage of the classification is to predict the retentions for molecules not included in the categorization. Classification extends to phenyl/sulphonylureas and the application will be to predict their retentions.

  18. Validation of Statistical Predictive Models Meant to Select Melanoma Patients for Sentinel Lymph Node Biopsy

    PubMed Central

    Sabel, Michael S.; Rice, John D.; Griffith, Kent A.; Lowe, Lori; Wong, Sandra L.; Chang, Alfred E.; Johnson, Timothy M.; Taylor, Jeremy M.G.

    2013-01-01

    Introduction To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid SLN biopsy (SLNB). Several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests and support vector machines. We sought to validate recently published models meant to predict sentinel node status. Methods We queried our comprehensive, prospectively-collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon 4 published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false negative rate (FNR). Results Logistic regression performed comparably with our data when considering NPV (89.4% vs. 93.6%); however the model’s specificity was not high enough to significantly reduce the rate of biopsies (SLN reduction rate of 2.9%). When applied to our data, the classification tree produced NPV and reduction in biopsies rates that were lower 87.7% vs. 94.1% and 29.8% vs. 14.3%, respectively. Two published models could not be applied to our data due to model complexity and the use of proprietary software. Conclusions Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset, or could not be validated. Differences in selection criteria and histopathologic interpretation likely resulted in underperformance. Development of statistical predictive models must be created in a clinically applicable manner to allow for both validation and ultimately clinical utility. PMID:21822550

  19. Surveying alignment-free features for Ortholog detection in related yeast proteomes by using supervised big data classifiers.

    PubMed

    Galpert, Deborah; Fernández, Alberto; Herrera, Francisco; Antunes, Agostinho; Molina-Ruiz, Reinaldo; Agüero-Chapin, Guillermin

    2018-05-03

    The development of new ortholog detection algorithms and the improvement of existing ones are of major importance in functional genomics. We have previously introduced a successful supervised pairwise ortholog classification approach implemented in a big data platform that considered several pairwise protein features and the low ortholog pair ratios found between two annotated proteomes (Galpert, D et al., BioMed Research International, 2015). The supervised models were built and tested using a Saccharomycete yeast benchmark dataset proposed by Salichos and Rokas (2011). Despite several pairwise protein features being combined in a supervised big data approach; they all, to some extent were alignment-based features and the proposed algorithms were evaluated on a unique test set. Here, we aim to evaluate the impact of alignment-free features on the performance of supervised models implemented in the Spark big data platform for pairwise ortholog detection in several related yeast proteomes. The Spark Random Forest and Decision Trees with oversampling and undersampling techniques, and built with only alignment-based similarity measures or combined with several alignment-free pairwise protein features showed the highest classification performance for ortholog detection in three yeast proteome pairs. Although such supervised approaches outperformed traditional methods, there were no significant differences between the exclusive use of alignment-based similarity measures and their combination with alignment-free features, even within the twilight zone of the studied proteomes. Just when alignment-based and alignment-free features were combined in Spark Decision Trees with imbalance management, a higher success rate (98.71%) within the twilight zone could be achieved for a yeast proteome pair that underwent a whole genome duplication. The feature selection study showed that alignment-based features were top-ranked for the best classifiers while the runners-up were alignment-free features related to amino acid composition. The incorporation of alignment-free features in supervised big data models did not significantly improve ortholog detection in yeast proteomes regarding the classification qualities achieved with just alignment-based similarity measures. However, the similarity of their classification performance to that of traditional ortholog detection methods encourages the evaluation of other alignment-free protein pair descriptors in future research.

  20. Mapping trees outside forests using high-resolution aerial imagery: a comparison of pixel- and object based classification approaches

    Treesearch

    Dacia M. Meneguzzo; Greg C. Liknes; Mark D. Nelson

    2013-01-01

    Discrete trees and small groups of trees in nonforest settings are considered an essential resource around the world and are collectively referred to as trees outside forests (ToF). ToF provide important functions across the landscape, such as protecting soil and water resources, providing wildlife habitat, and improving farmstead energy efficiency and aesthetics....

  1. Differences in forest area classification based on tree tally from variable- and fixed-radius plots

    Treesearch

    David Azuma; Vicente J. Monleon

    2011-01-01

    In forest inventory, it is not enough to formulate a definition; it is also necessary to define the "measurement procedure." In the classification of forestland by dominant cover type, the measurement design (the plot) can affect the outcome of the classification. We present results of a simulation study comparing classification of the dominant cover type...

  2. Fire frequency in the Interior Columbia River Basin: Building regional models from fire history data

    USGS Publications Warehouse

    McKenzie, D.; Peterson, D.L.; Agee, James K.

    2000-01-01

    Fire frequency affects vegetation composition and successional pathways; thus it is essential to understand fire regimes in order to manage natural resources at broad spatial scales. Fire history data are lacking for many regions for which fire management decisions are being made, so models are needed to estimate past fire frequency where local data are not yet available. We developed multiple regression models and tree-based (classification and regression tree, or CART) models to predict fire return intervals across the interior Columbia River basin at 1-km resolution, using georeferenced fire history, potential vegetation, cover type, and precipitation databases. The models combined semiqualitative methods and rigorous statistics. The fire history data are of uneven quality; some estimates are based on only one tree, and many are not cross-dated. Therefore, we weighted the models based on data quality and performed a sensitivity analysis of the effects on the models of estimation errors that are due to lack of cross-dating. The regression models predict fire return intervals from 1 to 375 yr for forested areas, whereas the tree-based models predict a range of 8 to 150 yr. Both types of models predict latitudinal and elevational gradients of increasing fire return intervals. Examination of regional-scale output suggests that, although the tree-based models explain more of the variation in the original data, the regression models are less likely to produce extrapolation errors. Thus, the models serve complementary purposes in elucidating the relationships among fire frequency, the predictor variables, and spatial scale. The models can provide local managers with quantitative information and provide data to initialize coarse-scale fire-effects models, although predictions for individual sites should be treated with caution because of the varying quality and uneven spatial coverage of the fire history database. The models also demonstrate the integration of qualitative and quantitative methods when requisite data for fully quantitative models are unavailable. They can be tested by comparing new, independent fire history reconstructions against their predictions and can be continually updated, as better fire history data become available.

  3. Multi-test decision tree and its application to microarray data classification.

    PubMed

    Czajkowski, Marcin; Grześ, Marek; Kretowski, Marek

    2014-05-01

    The desirable property of tools used to investigate biological data is easy to understand models and predictive decisions. Decision trees are particularly promising in this regard due to their comprehensible nature that resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees have tendency to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity. We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions. Experimental validation was performed on several real-life gene expression datasets. Comparison results with eight classifiers show that MTDT has a statistically significantly higher accuracy than popular decision tree classifiers, and it was highly competitive with ensemble learning algorithms. The proposed solution managed to outperform its baseline algorithm on 14 datasets by an average 6%. A study performed on one of the datasets showed that the discovered genes used in the MTDT classification model are supported by biological evidence in the literature. This paper introduces a new type of decision tree which is more suitable for solving biological problems. MTDTs are relatively easy to analyze and much more powerful in modeling high dimensional microarray data than their popular counterparts. Copyright © 2014 Elsevier B.V. All rights reserved.

  4. Ensemble methods with simple features for document zone classification

    NASA Astrophysics Data System (ADS)

    Obafemi-Ajayi, Tayo; Agam, Gady; Xie, Bingqing

    2012-01-01

    Document layout analysis is of fundamental importance for document image understanding and information retrieval. It requires the identification of blocks extracted from a document image via features extraction and block classification. In this paper, we focus on the classification of the extracted blocks into five classes: text (machine printed), handwriting, graphics, images, and noise. We propose a new set of features for efficient classifications of these blocks. We present a comparative evaluation of three ensemble based classification algorithms (boosting, bagging, and combined model trees) in addition to other known learning algorithms. Experimental results are demonstrated for a set of 36503 zones extracted from 416 document images which were randomly selected from the tobacco legacy document collection. The results obtained verify the robustness and effectiveness of the proposed set of features in comparison to the commonly used Ocropus recognition features. When used in conjunction with the Ocropus feature set, we further improve the performance of the block classification system to obtain a classification accuracy of 99.21%.

  5. Sampling strategies for improving tree accuracy and phylogenetic analyses: a case study in ciliate protists, with notes on the genus Paramecium.

    PubMed

    Yi, Zhenzhen; Strüder-Kypke, Michaela; Hu, Xiaozhong; Lin, Xiaofeng; Song, Weibo

    2014-02-01

    In order to assess how dataset-selection for multi-gene analyses affects the accuracy of inferred phylogenetic trees in ciliates, we chose five genes and the genus Paramecium, one of the most widely used model protist genera, and compared tree topologies of the single- and multi-gene analyses. Our empirical study shows that: (1) Using multiple genes improves phylogenetic accuracy, even when their one-gene topologies are in conflict with each other. (2) The impact of missing data on phylogenetic accuracy is ambiguous: resolution power and topological similarity, but not number of represented taxa, are the most important criteria of a dataset for inclusion in concatenated analyses. (3) As an example, we tested the three classification models of the genus Paramecium with a multi-gene based approach, and only the monophyly of the subgenus Paramecium is supported. Copyright © 2013 Elsevier Inc. All rights reserved.

  6. Circum-Arctic petroleum systems identified using decision-tree chemometrics

    USGS Publications Warehouse

    Peters, K.E.; Ramos, L.S.; Zumberge, J.E.; Valin, Z.C.; Scotese, C.R.; Gautier, D.L.

    2007-01-01

    Source- and age-related biomarker and isotopic data were measured for more than 1000 crude oil samples from wells and seeps collected above approximately 55??N latitude. A unique, multitiered chemometric (multivariate statistical) decision tree was created that allowed automated classification of 31 genetically distinct circumArctic oil families based on a training set of 622 oil samples. The method, which we call decision-tree chemometrics, uses principal components analysis and multiple tiers of K-nearest neighbor and SIMCA (soft independent modeling of class analogy) models to classify and assign confidence limits for newly acquired oil samples and source rock extracts. Geochemical data for each oil sample were also used to infer the age, lithology, organic matter input, depositional environment, and identity of its source rock. These results demonstrate the value of large petroleum databases where all samples were analyzed using the same procedures and instrumentation. Copyright ?? 2007. The American Association of Petroleum Geologists. All rights reserved.

  7. Analysis of occlusal variables, dental attrition, and age for distinguishing healthy controls from female patients with intracapsular temporomandibular disorders.

    PubMed

    Seligman, D A; Pullinger, A G

    2000-01-01

    Confusion about the relationship of occlusion to temporomandibular disorders (TMD) persists. This study attempted to identify occlusal and attrition factors plus age that would characterize asymptomatic normal female subjects. A total of 124 female patients with intracapsular TMD were compared with 47 asymptomatic female controls for associations to 9 occlusal factors, 3 attrition severity measures, and age using classification tree, multiple stepwise logistic regression, and univariate analyses. Models were tested for accuracy (sensitivity and specificity) and total contribution to the variance. The classification tree model had 4 terminal nodes that used only anterior attrition and age. "Normals" were mainly characterized by low attrition levels, whereas patients had higher attrition and tended to be younger. The tree model was only moderately useful (sensitivity 63%, specificity 94%) in predicting normals. The logistic regression model incorporated unilateral posterior crossbite and mediotrusive attrition severity in addition to the 2 factors in the tree, but was slightly less accurate than the tree (sensitivity 51%, specificity 90%). When only occlusal factors were considered in the analysis, normals were additionally characterized by a lack of anterior open bite, smaller overjet, and smaller RCP-ICP slides. The log likelihood accounted for was similar for both the tree (pseudo R(2) = 29.38%; mean deviance = 0.95) and the multiple logistic regression (Cox Snell R(2) = 30.3%, mean deviance = 0.84) models. The occlusal and attrition factors studied were only moderately useful in differentiating normals from TMD patients.

  8. The decision tree approach to classification

    NASA Technical Reports Server (NTRS)

    Wu, C.; Landgrebe, D. A.; Swain, P. H.

    1975-01-01

    A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.

  9. Upscaling of spectroradiometer data for stress detection in orchards with remote sensing

    NASA Astrophysics Data System (ADS)

    Kempeneers, Pieter; De Backer, Steve; Delalieux, Stephanie; Sterckx, Sindy; Debruyn, Walter; Coppin, Pol; Scheunders, Paul

    2004-10-01

    This paper studies the detection of vegetation stress in orchards via remote sensing. During previous research, it was shown that stress can be detected reliably on hyperspectral reflectances of the fresh leaves, using a generic wavelet based hyperspectral classification. In this work, we demonstrate the capability to detect stress from airborne/spaceborne hyperspectral sensors by upscaling the leaf reflectances to top of atmosphere (TOA) radiances. Several data sets are generated, measuring the foliar reflectance with a portable field spectroradiometer, covering different time periods, fruit variants and stress types. We concentrated on the Jonagold and Golden Delicious apple trees, induced with mildew and nitrogen deficiency. First, a directional homogeneous canopy reflectance model (ACRM) is applied on these data sets for simulating top of canopy (TOC) spectra. Then, the TOC level is further upscaled to TOA, using the atmospheric radiative transfer model MODTRAN4. To simulate hyperspectral imagery acquired with real airborne/spaceborne sensors, the spectrum is further filtered and subsampled to the available resolution. Using these simulated upscaled TOC and TOA spectra in classification, we will demonstrate that there is still a differentiation possible between stresses and non-stressed trees. Furthermore, results show it is possible to train a classifier with simulated TOA data, to make a classification of real hyperspectral imagery over the orchard.

  10. Unified framework for triaxial accelerometer-based fall event detection and classification using cumulants and hierarchical decision tree classifier.

    PubMed

    Kambhampati, Satya Samyukta; Singh, Vishal; Manikandan, M Sabarimalai; Ramkumar, Barathram

    2015-08-01

    In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from the acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers in effectively detecting and classifying different types of fall and non-fall events. It was discovered that the first level of the proposed hierarchical decision tree algorithm implements fall detection using fifth-order cumulants and support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve optimal detection and classification rates of above 95%, as well as the lowest false alarm rate of 1.03%.

  11. Wall-to-wall Landsat TM classifications for Georgia in support of SAFIS using FIA plots for training and verification

    Treesearch

    William H. Cooke; Andrew J. Hartsell

    2000-01-01

    Wall-to-wall Landsat TM classification efforts in Georgia require field validation. Validation uslng FIA data was testing by developing a new crown modeling procedure. A methodology is under development at the Southern Research Station to model crown diameter using Forest Health monitoring data. These models are used to simulate the proportion of tree crowns that...

  12. Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data

    PubMed Central

    Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.

    2016-01-01

    We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with the other ensemble methods, BET requires much fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides variable selection criterion and interpretation for each subset. We developed an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872

  13. Assessing the Effectiveness of Statistical Classification Techniques in Predicting Future Employment of Participants in the Temporary Assistance for Needy Families Program

    ERIC Educational Resources Information Center

    Montoya, Isaac D.

    2008-01-01

    Three classification techniques (Chi-square Automatic Interaction Detection [CHAID], Classification and Regression Tree [CART], and discriminant analysis) were tested to determine their accuracy in predicting Temporary Assistance for Needy Families program recipients' future employment. Technique evaluation was based on proportion of correctly…

  14. Computer-aided diagnosis of focal liver lesions by use of physicians' subjective classification of echogenic patterns in baseline and contrast-enhanced ultrasonography.

    PubMed

    Sugimoto, Katsutoshi; Shiraishi, Junji; Moriyasu, Fuminori; Doi, Kunio

    2009-04-01

    To develop a computer-aided diagnostic (CAD) scheme for classifying focal liver lesions (FLLs) by use of physicians' subjective classification of echogenic patterns of FLLs on baseline and contrast-enhanced ultrasonography (US). A total of 137 hepatic lesions in 137 patients were evaluated with B-mode and NC100100 (Sonazoid)-enhanced pulse-inversion US; lesions included 74 hepatocellular carcinomas (HCCs) (23: well-differentiated, 36: moderately differentiated, 15: poorly differentiated HCCs), 33 liver metastases, and 30 liver hemangiomas. Three physicians evaluated single images at B-mode and arterial phases with a cine mode. Physicians were asked to classify each lesion into one of eight B-mode and one of eight enhancement patterns, but did not make a diagnosis. To classify five types of FLLs, we employed a decision tree model with four decision nodes and four artificial neural networks (ANNs). The results of the physicians' pattern classifications were used successively for four different ANNs in making decisions at each of the decision nodes in the decision tree model. The classification accuracies for the 137 FLLs were 84.8% for metastasis, 93.3% for hemangioma, and 98.6% for all HCCs. In addition, the classification accuracies for histological differentiation types of HCCs were 65.2% for well-differentiated HCC, 41.7% for moderately differentiated HCC, and 80.0% for poorly differentiated HCC. This CAD scheme has the potential to improve the diagnostic accuracy of liver lesions. However, the accuracy in the histologic differential diagnosis of HCC based on baseline and contrast-enhanced US is still limited.

  15. Predication of different stages of Alzheimer's disease using neighborhood component analysis and ensemble decision tree.

    PubMed

    Jin, Mingwu; Deng, Weishu

    2018-05-15

    There is a spectrum of the progression from healthy control (HC) to mild cognitive impairment (MCI) without conversion to Alzheimer's disease (AD), to MCI with conversion to AD (cMCI), and to AD. This study aims to predict the different disease stages using brain structural information provided by magnetic resonance imaging (MRI) data. The neighborhood component analysis (NCA) is applied to select most powerful features for prediction. The ensemble decision tree classifier is built to predict which group the subject belongs to. The best features and model parameters are determined by cross validation of the training data. Our results show that 16 out of a total of 429 features were selected by NCA using 240 training subjects, including MMSE score and structural measures in memory-related regions. The boosting tree model with NCA features can achieve prediction accuracy of 56.25% on 160 test subjects. Principal component analysis (PCA) and sequential feature selection (SFS) are used for feature selection, while support vector machine (SVM) is used for classification. The boosting tree model with NCA features outperforms all other combinations of feature selection and classification methods. The results suggest that NCA be a better feature selection strategy than PCA and SFS for the data used in this study. Ensemble tree classifier with boosting is more powerful than SVM to predict the subject group. However, more advanced feature selection and classification methods or additional measures besides structural MRI may be needed to improve the prediction performance. Copyright © 2018 Elsevier B.V. All rights reserved.

  16. A comparison of supervised classification methods for the prediction of substrate type using multibeam acoustic and legacy grain-size data.

    PubMed

    Stephens, David; Diesing, Markus

    2014-01-01

    Detailed seabed substrate maps are increasingly in demand for effective planning and management of marine ecosystems and resources. It has become common to use remotely sensed multibeam echosounder data in the form of bathymetry and acoustic backscatter in conjunction with ground-truth sampling data to inform the mapping of seabed substrates. Whilst, until recently, such data sets have typically been classified by expert interpretation, it is now obvious that more objective, faster and repeatable methods of seabed classification are required. This study compares the performances of a range of supervised classification techniques for predicting substrate type from multibeam echosounder data. The study area is located in the North Sea, off the north-east coast of England. A total of 258 ground-truth samples were classified into four substrate classes. Multibeam bathymetry and backscatter data, and a range of secondary features derived from these datasets were used in this study. Six supervised classification techniques were tested: Classification Trees, Support Vector Machines, k-Nearest Neighbour, Neural Networks, Random Forest and Naive Bayes. Each classifier was trained multiple times using different input features, including i) the two primary features of bathymetry and backscatter, ii) a subset of the features chosen by a feature selection process and iii) all of the input features. The predictive performances of the models were validated using a separate test set of ground-truth samples. The statistical significance of model performances relative to a simple baseline model (Nearest Neighbour predictions on bathymetry and backscatter) were tested to assess the benefits of using more sophisticated approaches. The best performing models were tree based methods and Naive Bayes which achieved accuracies of around 0.8 and kappa coefficients of up to 0.5 on the test set. The models that used all input features didn't generally perform well, highlighting the need for some means of feature selection.

  17. A renewed perspective on agroforestry concepts and classification.

    PubMed

    Torquebiau, E F

    2000-11-01

    Agroforestry, the association of trees with farming practices, is progressively becoming a recognized land-use discipline. However, it is still perceived by some scientists, technicians and farmers as a sort of environmental fashion which does not deserve credit. The peculiar history of agroforestry and the complex relationships between agriculture and forestry explain some misunderstandings about the concepts and classification of agroforestry and reveal that, contrarily to common perception, agroforestry is closer to agriculture than to forestry. Based on field experience from several countries, a structural classification of agroforestry into six simple categories is proposed: crops under tree cover, agroforests, agroforestry in a linear arrangement, animal agroforestry, sequential agroforestry and minor agroforestry techniques. It is argued that this pragmatic classification encompasses all major agroforestry associations and allows simultaneous agroforestry to be clearly differentiated from sequential agroforestry, two categories showing contrasting ecological tree-crop interactions. It can also contribute to a betterment of the image of agroforestry and lead to a simplification of its definition.

  18. Characterization of Escherichia coli isolates from different fecal sources by means of classification tree analysis of fatty acid methyl ester (FAME) profiles.

    PubMed

    Seurinck, Sylvie; Deschepper, Ellen; Deboch, Bishaw; Verstraete, Willy; Siciliano, Steven

    2006-03-01

    Microbial source tracking (MST) methods need to be rapid, inexpensive and accurate. Unfortunately, many MST methods provide a wealth of information that is difficult to interpret by the regulators who use this information to make decisions. This paper describes the use of classification tree analysis to interpret the results of a MST method based on fatty acid methyl ester (FAME) profiles of Escherichia coli isolates, and to present results in a format readily interpretable by water quality managers. Raw sewage E. coli isolates and animal E. coli isolates from cow, dog, gull, and horse were isolated and their FAME profiles collected. Correct classification rates determined with leaveone-out cross-validation resulted in an overall low correct classification rate of 61%. A higher overall correct classification rate of 85% was obtained when the animal isolates were pooled together and compared to the raw sewage isolates. Bootstrap aggregation or adaptive resampling and combining of the FAME profile data increased correct classification rates substantially. Other MST methods may be better suited to differentiate between different fecal sources but classification tree analysis has enabled us to distinguish raw sewage from animal E. coli isolates, which previously had not been possible with other multivariate methods such as principal component analysis and cluster analysis.

  19. Research on Classification of Chinese Text Data Based on SVM

    NASA Astrophysics Data System (ADS)

    Lin, Yuan; Yu, Hongzhi; Wan, Fucheng; Xu, Tao

    2017-09-01

    Data Mining has important application value in today’s industry and academia. Text classification is a very important technology in data mining. At present, there are many mature algorithms for text classification. KNN, NB, AB, SVM, decision tree and other classification methods all show good classification performance. Support Vector Machine’ (SVM) classification method is a good classifier in machine learning research. This paper will study the classification effect based on the SVM method in the Chinese text data, and use the support vector machine method in the chinese text to achieve the classify chinese text, and to able to combination of academia and practical application.

  20. Phylogenetic classification of bony fishes.

    PubMed

    Betancur-R, Ricardo; Wiley, Edward O; Arratia, Gloria; Acero, Arturo; Bailly, Nicolas; Miya, Masaki; Lecointre, Guillaume; Ortí, Guillermo

    2017-07-06

    Fish classifications, as those of most other taxonomic groups, are being transformed drastically as new molecular phylogenies provide support for natural groups that were unanticipated by previous studies. A brief review of the main criteria used by ichthyologists to define their classifications during the last 50 years, however, reveals slow progress towards using an explicit phylogenetic framework. Instead, the trend has been to rely, in varying degrees, on deep-rooted anatomical concepts and authority, often mixing taxa with explicit phylogenetic support with arbitrary groupings. Two leading sources in ichthyology frequently used for fish classifications (JS Nelson's volumes of Fishes of the World and W. Eschmeyer's Catalog of Fishes) fail to adopt a global phylogenetic framework despite much recent progress made towards the resolution of the fish Tree of Life. The first explicit phylogenetic classification of bony fishes was published in 2013, based on a comprehensive molecular phylogeny ( www.deepfin.org ). We here update the first version of that classification by incorporating the most recent phylogenetic results. The updated classification presented here is based on phylogenies inferred using molecular and genomic data for nearly 2000 fishes. A total of 72 orders (and 79 suborders) are recognized in this version, compared with 66 orders in version 1. The phylogeny resolves placement of 410 families, or ~80% of the total of 514 families of bony fishes currently recognized. The ordinal status of 30 percomorph families included in this study, however, remains uncertain (incertae sedis in the series Carangaria, Ovalentaria, or Eupercaria). Comments to support taxonomic decisions and comparisons with conflicting taxonomic groups proposed by others are presented. We also highlight cases were morphological support exist for the groups being classified. This version of the phylogenetic classification of bony fishes is substantially improved, providing resolution for more taxa than previous versions, based on more densely sampled phylogenetic trees. The classification presented in this study represents, unlike any other, the most up-to-date hypothesis of the Tree of Life of fishes.

  1. Semantic Labelling of Ultra Dense Mls Point Clouds in Urban Road Corridors Based on Fusing Crf with Shape Priors

    NASA Astrophysics Data System (ADS)

    Yao, W.; Polewski, P.; Krzystek, P.

    2017-09-01

    In this paper, a labelling method for the semantic analysis of ultra-high point density MLS data (up to 4000 points/m2) in urban road corridors is developed based on combining a conditional random field (CRF) for the context-based classification of 3D point clouds with shape priors. The CRF uses a Random Forest (RF) for generating the unary potentials of nodes and a variant of the contrastsensitive Potts model for the pair-wise potentials of node edges. The foundations of the classification are various geometric features derived by means of co-variance matrices and local accumulation map of spatial coordinates based on local neighbourhoods. Meanwhile, in order to cope with the ultra-high point density, a plane-based region growing method combined with a rule-based classifier is applied to first fix semantic labels for man-made objects. Once such kind of points that usually account for majority of entire data amount are pre-labeled; the CRF classifier can be solved by optimizing the discriminative probability for nodes within a subgraph structure excluded from pre-labeled nodes. The process can be viewed as an evidence fusion step inferring a degree of belief for point labelling from different sources. The MLS data used for this study were acquired by vehicle-borne Z+F phase-based laser scanner measurement, which permits the generation of a point cloud with an ultra-high sampling rate and accuracy. The test sites are parts of Munich City which is assumed to consist of seven object classes including impervious surfaces, tree, building roof/facade, low vegetation, vehicle and pole. The competitive classification performance can be explained by the diverse factors: e.g. the above ground height highlights the vertical dimension of houses, trees even cars, but also attributed to decision-level fusion of graph-based contextual classification approach with shape priors. The use of context-based classification methods mainly contributed to smoothing of labelling by removing outliers and the improvement in underrepresented object classes. In addition, the routine operation of a context-based classification for such high density MLS data becomes much more efficient being comparable to non-contextual classification schemes.

  2. A non-parametric, supervised classification of vegetation types on the Kaibab National Forest using decision trees

    Treesearch

    Suzanne M. Joy; R. M. Reich; Richard T. Reynolds

    2003-01-01

    Traditional land classification techniques for large areas that use Landsat Thematic Mapper (TM) imagery are typically limited to the fixed spatial resolution of the sensors (30m). However, the study of some ecological processes requires land cover classifications at finer spatial resolutions. We model forest vegetation types on the Kaibab National Forest (KNF) in...

  3. Landslide susceptibility mapping using decision-tree based CHi-squared automatic interaction detection (CHAID) and Logistic regression (LR) integration

    NASA Astrophysics Data System (ADS)

    Althuwaynee, Omar F.; Pradhan, Biswajeet; Ahmad, Noordin

    2014-06-01

    This article uses methodology based on chi-squared automatic interaction detection (CHAID), as a multivariate method that has an automatic classification capacity to analyse large numbers of landslide conditioning factors. This new algorithm was developed to overcome the subjectivity of the manual categorization of scale data of landslide conditioning factors, and to predict rainfall-induced susceptibility map in Kuala Lumpur city and surrounding areas using geographic information system (GIS). The main objective of this article is to use CHi-squared automatic interaction detection (CHAID) method to perform the best classification fit for each conditioning factor, then, combining it with logistic regression (LR). LR model was used to find the corresponding coefficients of best fitting function that assess the optimal terminal nodes. A cluster pattern of landslide locations was extracted in previous study using nearest neighbor index (NNI), which were then used to identify the clustered landslide locations range. Clustered locations were used as model training data with 14 landslide conditioning factors such as; topographic derived parameters, lithology, NDVI, land use and land cover maps. Pearson chi-squared value was used to find the best classification fit between the dependent variable and conditioning factors. Finally the relationship between conditioning factors were assessed and the landslide susceptibility map (LSM) was produced. An area under the curve (AUC) was used to test the model reliability and prediction capability with the training and validation landslide locations respectively. This study proved the efficiency and reliability of decision tree (DT) model in landslide susceptibility mapping. Also it provided a valuable scientific basis for spatial decision making in planning and urban management studies.

  4. Comparison of rule induction, decision trees and formal concept analysis approaches for classification

    NASA Astrophysics Data System (ADS)

    Kotelnikov, E. V.; Milov, V. R.

    2018-05-01

    Rule-based learning algorithms have higher transparency and easiness to interpret in comparison with neural networks and deep learning algorithms. These properties make it possible to effectively use such algorithms to solve descriptive tasks of data mining. The choice of an algorithm depends also on its ability to solve predictive tasks. The article compares the quality of the solution of the problems with binary and multiclass classification based on the experiments with six datasets from the UCI Machine Learning Repository. The authors investigate three algorithms: Ripper (rule induction), C4.5 (decision trees), In-Close (formal concept analysis). The results of the experiments show that In-Close demonstrates the best quality of classification in comparison with Ripper and C4.5, however the latter two generate more compact rule sets.

  5. Impact of genomics on the understanding of microbial evolution and classification: the importance of Darwin's views on classification.

    PubMed

    Gupta, Radhey S

    2016-07-01

    Analyses of genome sequences, by some approaches, suggest that the widespread occurrence of horizontal gene transfers (HGTs) in prokaryotes disguises their evolutionary relationships and have led to questioning of the Darwinian model of evolution for prokaryotes. These inferences are critically examined in the light of comparative genome analysis, characteristic synapomorphies, phylogenetic trees and Darwin's views on examining evolutionary relationships. Genome sequences are enabling discovery of numerous molecular markers (synapomorphies) such as conserved signature indels (CSIs) and conserved signature proteins (CSPs), which are distinctive characteristics of different prokaryotic taxa. Based on these molecular markers, exhibiting high degree of specificity and predictive ability, numerous prokaryotic taxa of different ranks, currently identified based on the 16S rRNA gene trees, can now be reliably demarcated in molecular terms. Within all studied groups, multiple CSIs and CSPs have been identified for successive nested clades providing reliable information regarding their hierarchical relationships and these inferences are not affected by HGTs. These results strongly support Darwin's views on evolution and classification and supplement the current phylogenetic framework based on 16S rRNA in important respects. The identified molecular markers provide important means for developing novel diagnostics, therapeutics and for functional studies providing important insights regarding prokaryotic taxa. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  6. Guide to the measurement of tree characteristics important to the quality classification for young hardwood trees

    Treesearch

    David L. Sonderman

    1979-01-01

    A procedure is shown for measuring external tree characteristics that are important in determining the current and future quality of young hardwood trees. This guide supplements a precious study which describes the quality classification system for young hardwood trees

  7. Identification of extremely premature infants at high risk of rehospitalization.

    PubMed

    Ambalavanan, Namasivayam; Carlo, Waldemar A; McDonald, Scott A; Yao, Qing; Das, Abhik; Higgins, Rosemary D

    2011-11-01

    Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002-2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%-42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge.

  8. Identification of Extremely Premature Infants at High Risk of Rehospitalization

    PubMed Central

    Carlo, Waldemar A.; McDonald, Scott A.; Yao, Qing; Das, Abhik; Higgins, Rosemary D.

    2011-01-01

    OBJECTIVE: Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. METHODS: Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002–2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. RESULTS: A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%–42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. CONCLUSIONS: The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge. PMID:22007016

  9. Harmonic regression of Landsat time series for modeling attributes from national forest inventory data

    NASA Astrophysics Data System (ADS)

    Wilson, Barry T.; Knight, Joseph F.; McRoberts, Ronald E.

    2018-03-01

    Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several methods have previously been developed for use with finer temporal resolution imagery (e.g. AVHRR and MODIS), including image compositing and harmonic regression using Fourier series. The manuscript presents a study, using Minnesota, USA during the years 2009-2013 as the study area and timeframe. The study examined the relative predictive power of land cover models, in particular those related to tree cover, using predictor variables based solely on composite imagery versus those using estimated harmonic regression coefficients. The study used two common non-parametric modeling approaches (i.e. k-nearest neighbors and random forests) for fitting classification and regression models of multiple attributes measured on USFS Forest Inventory and Analysis plots using all available Landsat imagery for the study area and timeframe. The estimated Fourier coefficients developed by harmonic regression of tasseled cap transformation time series data were shown to be correlated with land cover, including tree cover. Regression models using estimated Fourier coefficients as predictor variables showed a two- to threefold increase in explained variance for a small set of continuous response variables, relative to comparable models using monthly image composites. Similarly, the overall accuracies of classification models using the estimated Fourier coefficients were approximately 10-20 percentage points higher than the models using the image composites, with corresponding individual class accuracies between six and 45 percentage points higher.

  10. Fast Image Texture Classification Using Decision Trees

    NASA Technical Reports Server (NTRS)

    Thompson, David R.

    2011-01-01

    Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation- hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.

  11. Data mining of tree-based models to analyze freeway accident frequency.

    PubMed

    Chang, Li-Yen; Chen, Wen-Chieh

    2005-01-01

    Statistical models, such as Poisson or negative binomial regression models, have been employed to analyze vehicle accident frequency for many years. However, these models have their own model assumptions and pre-defined underlying relationship between dependent and independent variables. If these assumptions are violated, the model could lead to erroneous estimation of accident likelihood. Classification and Regression Tree (CART), one of the most widely applied data mining techniques, has been commonly employed in business administration, industry, and engineering. CART does not require any pre-defined underlying relationship between target (dependent) variable and predictors (independent variables) and has been shown to be a powerful tool, particularly for dealing with prediction and classification problems. This study collected the 2001-2002 accident data of National Freeway 1 in Taiwan. A CART model and a negative binomial regression model were developed to establish the empirical relationship between traffic accidents and highway geometric variables, traffic characteristics, and environmental factors. The CART findings indicated that the average daily traffic volume and precipitation variables were the key determinants for freeway accident frequencies. By comparing the prediction performance between the CART and the negative binomial regression models, this study demonstrates that CART is a good alternative method for analyzing freeway accident frequencies. By comparing the prediction performance between the CART and the negative binomial regression models, this study demonstrates that CART is a good alternative method for analyzing freeway accident frequencies.

  12. Genome-Based Taxonomic Classification of Bacteroidetes

    DOE PAGES

    Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina; ...

    2016-12-20

    The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogeneticmore » analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved.« less

  13. Genome-Based Taxonomic Classification of Bacteroidetes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina

    The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogeneticmore » analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved.« less

  14. Genome-Based Taxonomic Classification of Bacteroidetes

    PubMed Central

    Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina; Mukherjee, Supratim; Huntemann, Marcel; Ivanova, Natalia N.; Woyke, Tanja; Kyrpides, Nikos C.; Klenk, Hans-Peter; Göker, Markus

    2016-01-01

    The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved. PMID:28066339

  15. A global reference database from very high resolution commercial satellite data and methodology for application to Landsat derived 30 m continuous field tree cover data

    USGS Publications Warehouse

    Pengra, Bruce; Long, Jordan; Dahal, Devendra; Stehman, Stephen V.; Loveland, Thomas R.

    2015-01-01

    The methodology for selection, creation, and application of a global remote sensing validation dataset using high resolution commercial satellite data is presented. High resolution data are obtained for a stratified random sample of 500 primary sampling units (5 km  ×  5 km sample blocks), where the stratification based on Köppen climate classes is used to distribute the sample globally among biomes. The high resolution data are classified to categorical land cover maps using an analyst mediated classification workflow. Our initial application of these data is to evaluate a global 30 m Landsat-derived, continuous field tree cover product. For this application, the categorical reference classification produced at 2 m resolution is converted to percent tree cover per 30 m pixel (secondary sampling unit)for comparison to Landsat-derived estimates of tree cover. We provide example results (based on a subsample of 25 sample blocks in South America) illustrating basic analyses of agreement that can be produced from these reference data. Commercial high resolution data availability and data quality are shown to provide a viable means of validating continuous field tree cover. When completed, the reference classifications for the full sample of 500 blocks will be released for public use.

  16. Application of classification-tree methods to identify nitrate sources in ground water

    USGS Publications Warehouse

    Spruill, T.B.; Showers, W.J.; Howe, S.S.

    2002-01-01

    A study was conducted to determine if nitrate sources in ground water (fertilizer on crops, fertilizer on golf courses, irrigation spray from hog (Sus scrofa) wastes, and leachate from poultry litter and septic systems) could be classified with 80% or greater success. Two statistical classification-tree models were devised from 48 water samples containing nitrate from five source categories. Model I was constructed by evaluating 32 variables and selecting four primary predictor variables (??15N, nitrate to ammonia ratio, sodium to potassium ratio, and zinc) to identify nitrate sources. A ??15N value of nitrate plus potassium 18.2 indicated inorganic or soil organic N. A nitrate to ammonia ratio 575 indicated nitrate from golf courses. A sodium to potassium ratio 3.2 indicated spray or poultry wastes. A value for zinc 2.8 indicated poultry wastes. Model 2 was devised by using all variables except ??15N. This model also included four variables (sodium plus potassium, nitrate to ammonia ratio, calcium to magnesium ratio, and sodium to potassium ratio) to distinguish categories. Both models were able to distinguish all five source categories with better than 80% overall success and with 71 to 100% success in individual categories using the learning samples. Seventeen water samples that were not used in model development were tested using Model 2 for three categories, and all were correctly classified. Classification-tree models show great potential in identifying sources of contamination and variables important in the source-identification process.

  17. Classification tree and minimum-volume ellipsoid analyses of the distribution of ponderosa pine in the western USA

    USGS Publications Warehouse

    Norris, Jodi R.; Jackson, Stephen T.; Betancourt, Julio L.

    2006-01-01

    Aim? Ponderosa pine (Pinus ponderosa Douglas ex Lawson & C. Lawson) is an economically and ecologically important conifer that has a wide geographic range in the western USA, but is mostly absent from the geographic centre of its distribution - the Great Basin and adjoining mountain ranges. Much of its modern range was achieved by migration of geographically distinct Sierra Nevada (P. ponderosa var. ponderosa) and Rocky Mountain (P. ponderosa var. scopulorum) varieties in the last 10,000 years. Previous research has confirmed genetic differences between the two varieties, and measurable genetic exchange occurs where their ranges now overlap in western Montana. A variety of approaches in bioclimatic modelling is required to explore the ecological differences between these varieties and their implications for historical biogeography and impending changes in western landscapes. Location? Western USA. Methods? We used a classification tree analysis and a minimum-volume ellipsoid as models to explain the broad patterns of distribution of ponderosa pine in modern environments using climatic and edaphic variables. Most biogeographical modelling assumes that the target group represents a single, ecologically uniform taxonomic population. Classification tree analysis does not require this assumption because it allows the creation of pathways that predict multiple positive and negative outcomes. Thus, classification tree analysis can be used to test the ecological uniformity of the species. In addition, a multidimensional ellipsoid was constructed to describe the niche of each variety of ponderosa pine, and distances from the niche were calculated and mapped on a 4-km grid for each ecological variable. Results? The resulting classification tree identified three dominant pathways predicting ponderosa pine presence. Two of these three pathways correspond roughly to the distribution of var. ponderosa, and the third pathway generally corresponds to the distribution of var. scopulorum. The classification tree and minimum-volume ellipsoid model show that both varieties have very similar temperature limitations, although var. ponderosa is more limited by the temperature extremes of the continental interior. The precipitation limitations of the two varieties are seasonally different, with var. ponderosa requiring significant winter moisture and var. scopulorum requiring significant summer moisture. Great Basin mountain ranges are too cold at higher elevations to support either variety of ponderosa pine, and at lower elevations are too dry in summer for var. scopulorum and too dry in winter for var. ponderosa. Main conclusions? The classification tree analysis indicates that var. ponderosa is ecologically as well as genetically distinct from var. scopulorum. Ecological differences may maintain genetic separation in spite of a limited zone of introgression between the two varieties in western Montana. Two hypotheses about past and future movements of ponderosa pine emerge from our analyses. The first hypothesis is that, during the last glacial period, colder and/or drier summers truncated most of the range of var. scopulorum in the central Rockies, but had less dramatic effects on the more maritime and winter-wet distribution of var. ponderosa. The second hypothesis is that, all other factors held constant, increasing summer temperatures in the future should produce changes in the distribution of var. scopulorum that are likely to involve range expansions in the central Rockies with the warming of mountain ranges currently too cold but sufficiently wet in summer for var. scopulorum. Finally, our results underscore the growing need to focus on genotypes in biogeographical modelling and ecological forecasting.

  18. BCDForest: a boosting cascade deep forest model towards the classification of cancer subtypes based on gene expression data.

    PubMed

    Guo, Yang; Liu, Shuhui; Li, Zhanhuai; Shang, Xuequn

    2018-04-11

    The classification of cancer subtypes is of great importance to cancer disease diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially of deep learning based approaches. Recently, the deep forest model has been proposed as an alternative of deep neural networks to learn hyper-representations by using cascade ensemble decision trees. It has been proved that the deep forest model has competitive or even better performance than deep neural networks in some extent. However, the standard deep forest model may face overfitting and ensemble diversity challenges when dealing with small sample size and high-dimensional biology data. In this paper, we propose a deep learning model, so-called BCDForest, to address cancer subtype classification on small-scale biology datasets, which can be viewed as a modification of the standard deep forest model. The BCDForest distinguishes from the standard deep forest model with the following two main contributions: First, a named multi-class-grained scanning method is proposed to train multiple binary classifiers to encourage diversity of ensemble. Meanwhile, the fitting quality of each classifier is considered in representation learning. Second, we propose a boosting strategy to emphasize more important features in cascade forests, thus to propagate the benefits of discriminative features among cascade layers to improve the classification performance. Systematic comparison experiments on both microarray and RNA-Seq gene expression datasets demonstrate that our method consistently outperforms the state-of-the-art methods in application of cancer subtype classification. The multi-class-grained scanning and boosting strategy in our model provide an effective solution to ease the overfitting challenge and improve the robustness of deep forest model working on small-scale data. Our model provides a useful approach to the classification of cancer subtypes by using deep learning on high-dimensional and small-scale biology data.

  19. Classification of building infrastructure and automatic building footprint delineation using airborne laser swath mapping data

    NASA Astrophysics Data System (ADS)

    Caceres, Jhon

    Three-dimensional (3D) models of urban infrastructure comprise critical data for planners working on problems in wireless communications, environmental monitoring, civil engineering, and urban planning, among other tasks. Photogrammetric methods have been the most common approach to date to extract building models. However, Airborne Laser Swath Mapping (ALSM) observations offer a competitive alternative because they overcome some of the ambiguities that arise when trying to extract 3D information from 2D images. Regardless of the source data, the building extraction process requires segmentation and classification of the data and building identification. In this work, approaches for classifying ALSM data, separating building and tree points, and delineating ALSM footprints from the classified data are described. Digital aerial photographs are used in some cases to verify results, but the objective of this work is to develop methods that can work on ALSM data alone. A robust approach for separating tree and building points in ALSM data is presented. The method is based on supervised learning of the classes (tree vs. building) in a high dimensional feature space that yields good class separability. Features used for classification are based on the generation of local mappings, from three-dimensional space to two-dimensional space, known as "spin images" for each ALSM point to be classified. The method discriminates ALSM returns in compact spaces and even where the classes are very close together or overlapping spatially. A modified algorithm of the Hough Transform is used to orient the spin images, and the spin image parameters are specified such that the mutual information between the spin image pixel values and class labels is maximized. This new approach to ALSM classification allows us to fully exploit the 3D point information in the ALSM data while still achieving good class separability, which has been a difficult trade-off in the past. Supported by the spin image analysis for obtaining an initial classification, an automatic approach for delineating accurate building footprints is presented. The physical fact that laser pulses that happen to strike building edges can produce very different 1st and last return elevations has been long recognized. However, in older generation ALSM systems (<50 kHz pulse rates) such points were too few and far between to delineate building footprints precisely. Furthermore, without the robust separation of nearby trees and vegetation from the buildings, simply extracting ALSM shots where the elevation of the first return was much higher than the elevation of the last return, was not a reliable means of identifying building footprints. However, with the advent of ALSM systems with pulse rates in excess of 100 kHz, and by using spin-imaged based segmentation, it is now possible to extract building edges from the point cloud. A refined classification resulting from incorporating "on-edge" information is developed for obtaining quadrangular footprints. The footprint fitting process involves line generalization, least squares-based clustering and dominant points finding for segmenting individual building edges. In addition, an algorithm for fitting complex footprints using the segmented edges and data inside footprints is also proposed.

  20. Deriving Continuous Fields of Tree Cover at 1-m over the Continental United States From the National Agriculture Imagery Program (NAIP) Imagery to Reduce Uncertainties in Forest Carbon Stock Estimation

    NASA Astrophysics Data System (ADS)

    Ganguly, S.; Basu, S.; Mukhopadhyay, S.; Michaelis, A.; Milesi, C.; Votava, P.; Nemani, R. R.

    2013-12-01

    An unresolved issue with coarse-to-medium resolution satellite-based forest carbon mapping over regional to continental scales is the high level of uncertainty in above ground biomass (AGB) estimates caused by the absence of forest cover information at a high enough spatial resolution (current spatial resolution is limited to 30-m). To put confidence in existing satellite-derived AGB density estimates, it is imperative to create continuous fields of tree cover at a sufficiently high resolution (e.g. 1-m) such that large uncertainties in forested area are reduced. The proposed work will provide means to reduce uncertainty in present satellite-derived AGB maps and Forest Inventory and Analysis (FIA) based regional estimates. Our primary objective will be to create Very High Resolution (VHR) estimates of tree cover at a spatial resolution of 1-m for the Continental United States using all available National Agriculture Imaging Program (NAIP) color-infrared imagery from 2010 till 2012. We will leverage the existing capabilities of the NASA Earth Exchange (NEX) high performance computing and storage facilities. The proposed 1-m tree cover map can be further aggregated to provide percent tree cover at any medium-to-coarse resolution spatial grid, which will aid in reducing uncertainties in AGB density estimation at the respective grid and overcome current limitations imposed by medium-to-coarse resolution land cover maps. We have implemented a scalable and computationally-efficient parallelized framework for tree-cover delineation - the core components of the algorithm [that] include a feature extraction process, a Statistical Region Merging image segmentation algorithm and a classification algorithm based on Deep Belief Network and a Feedforward Backpropagation Neural Network algorithm. An initial pilot exercise has been performed over the state of California (~11,000 scenes) to create a wall-to-wall 1-m tree cover map and the classification accuracy has been assessed. Results show an improvement in accuracy of tree-cover delineation as compared to existing forest cover maps from NLCD, especially over fragmented, heterogeneous and urban landscapes. Estimates of VHR tree cover will complement and enhance the accuracy of present remote-sensing based AGB modeling approaches and forest inventory based estimates at both national and local scales. A requisite step will be to characterize the inherent uncertainties in tree cover estimates and propagate them to estimate AGB.

  1. Activity classification using realistic data from wearable sensors.

    PubMed

    Pärkkä, Juha; Ermes, Miikka; Korpipää, Panu; Mäntyjärvi, Jani; Peltola, Johannes; Korhonen, Ilkka

    2006-01-01

    Automatic classification of everyday activities can be used for promotion of health-enhancing physical activities and a healthier lifestyle. In this paper, methods used for classification of everyday activities like walking, running, and cycling are described. The aim of the study was to find out how to recognize activities, which sensors are useful and what kind of signal processing and classification is required. A large and realistic data library of sensor data was collected. Sixteen test persons took part in the data collection, resulting in approximately 31 h of annotated, 35-channel data recorded in an everyday environment. The test persons carried a set of wearable sensors while performing several activities during the 2-h measurement session. Classification results of three classifiers are shown: custom decision tree, automatically generated decision tree, and artificial neural network. The classification accuracies using leave-one-subject-out cross validation range from 58 to 97% for custom decision tree classifier, from 56 to 97% for automatically generated decision tree, and from 22 to 96% for artificial neural network. Total classification accuracy is 82 % for custom decision tree classifier, 86% for automatically generated decision tree, and 82% for artificial neural network.

  2. Sample classification for improved performance of PLS models applied to the quality control of deep-frying oils of different botanic origins analyzed using ATR-FTIR spectroscopy.

    PubMed

    Kuligowski, Julia; Carrión, David; Quintás, Guillermo; Garrigues, Salvador; de la Guardia, Miguel

    2011-01-01

    The selection of an appropriate calibration set is a critical step in multivariate method development. In this work, the effect of using different calibration sets, based on a previous classification of unknown samples, on the partial least squares (PLS) regression model performance has been discussed. As an example, attenuated total reflection (ATR) mid-infrared spectra of deep-fried vegetable oil samples from three botanical origins (olive, sunflower, and corn oil), with increasing polymerized triacylglyceride (PTG) content induced by a deep-frying process were employed. The use of a one-class-classifier partial least squares-discriminant analysis (PLS-DA) and a rooted binary directed acyclic graph tree provided accurate oil classification. Oil samples fried without foodstuff could be classified correctly, independent of their PTG content. However, class separation of oil samples fried with foodstuff, was less evident. The combined use of double-cross model validation with permutation testing was used to validate the obtained PLS-DA classification models, confirming the results. To discuss the usefulness of the selection of an appropriate PLS calibration set, the PTG content was determined by calculating a PLS model based on the previously selected classes. In comparison to a PLS model calculated using a pooled calibration set containing samples from all classes, the root mean square error of prediction could be improved significantly using PLS models based on the selected calibration sets using PLS-DA, ranging between 1.06 and 2.91% (w/w).

  3. Delineation of marsh types from Corpus Christi Bay, Texas, to Perdido Bay, Alabama, in 2010

    USGS Publications Warehouse

    Enwright, Nicholas M.; Hartley, Stephen B.; Couvillion, Brady R.; Michael G. Brasher,; Jenneke M. Visser,; Michael K. Mitchell,; Bart M. Ballard,; Mark W. Parr,; Barry C. Wilson,

    2015-07-23

    This study incorporates about 9,800 ground reference locations collected via helicopter surveys in coastal wetland areas. Decision-tree analyses were used to classify emergent marsh vegetation types by using ground reference data from helicopter vegetation surveys and independent variables such as multitemporal satellite-based multispectral imagery from 2009 to 2011, bare-earth digital elevation models based on airborne light detection and ranging (lidar), alternative contemporary land cover classifications, and other spatially explicit variables. Image objects were created from 2010 National Agriculture Imagery Program color-infrared aerial photography. The final classification is a 10-meter raster dataset that was produced by using a majority filter to classify image objects according to the marsh vegetation type covering the majority of each image object. The classification is dated 2010 because the year is both the midpoint of the classified multitemporal satellite-based imagery (2009–11) and the date of the high-resolution airborne imagery that was used to develop image objects. The seamless classification produced through this work can be used to help develop and refine conservation efforts for priority natural resources.

  4. Multi-Cohort Stand Structural Classification: Ground- and LiDAR-based Approaches for Boreal Mixedwood and Black Spruce Forest Types of Northeastern Ontario

    NASA Astrophysics Data System (ADS)

    Kuttner, Benjamin George

    Natural fire return intervals are relatively long in eastern Canadian boreal forests and often allow for the development of stands with multiple, successive cohorts of trees. Multi-cohort forest management (MCM) provides a strategy to maintain such multi-cohort stands that focuses on three broad phases of increasingly complex, post-fire stand development, termed "cohorts", and recommends different silvicultural approaches be applied to emulate different cohort types. Previous research on structural cohort typing has relied upon primarily subjective classification methods; in this thesis, I develop more comprehensive and objective methods for three common boreal mixedwood and black spruce forest types in northeastern Ontario. Additionally, I examine relationships between cohort types and stand age, productivity, and disturbance history and the utility of airborne LiDAR to retrieve ground-based classifications and to extend structural cohort typing from plot- to stand-levels. In both mixedwood and black spruce forest types, stand age and age-related deadwood features varied systematically with cohort classes in support of an age-based interpretation of increasing cohort complexity. However, correlations of stand age with cohort classes were surprisingly weak. Differences in site productivity had a significant effect on the accrual of increasingly complex multi-cohort stand structure in both forest types, especially in black spruce stands. The effects of past harvesting in predictive models of class membership were only significant when considered in isolation of age. As an age-emulation strategy, the three cohort model appeared to be poorly suited to black spruce forests where the accrual of structural complexity appeared to be more a function of site productivity than age. Airborne LiDAR data appear to be particularly useful in recovering plot-based cohort types and extending them to the stand-level. The main gradients of structural variability detected using LiDAR were similar between boreal mixedwood and black spruce forest types; the best LiDAR-based models of cohort type relied upon combinations of tree size, size heterogeneity, and tree density related variables. The methods described here to measure, classify, and predict cohort-related structural complexity assist in translating the conceptual three cohort model to a more precise, measurement-based management system. In addition, the approaches presented here to measure and classify stand structural complexity promise to significantly enhance the detail of structural information in operational forest inventories in support of a wide array of forest management and conservation applications.

  5. Alzheimer Classification Using a Minimum Spanning Tree of High-Order Functional Network on fMRI Dataset

    PubMed Central

    Guo, Hao; Liu, Lei; Chen, Junjie; Xu, Yong; Jie, Xiang

    2017-01-01

    Functional magnetic resonance imaging (fMRI) is one of the most useful methods to generate functional connectivity networks of the brain. However, conventional network generation methods ignore dynamic changes of functional connectivity between brain regions. Previous studies proposed constructing high-order functional connectivity networks that consider the time-varying characteristics of functional connectivity, and a clustering method was performed to decrease computational cost. However, random selection of the initial clustering centers and the number of clusters negatively affected classification accuracy, and the network lost neurological interpretability. Here we propose a novel method that introduces the minimum spanning tree method to high-order functional connectivity networks. As an unbiased method, the minimum spanning tree simplifies high-order network structure while preserving its core framework. The dynamic characteristics of time series are not lost with this approach, and the neurological interpretation of the network is guaranteed. Simultaneously, we propose a multi-parameter optimization framework that involves extracting discriminative features from the minimum spanning tree high-order functional connectivity networks. Compared with the conventional methods, our resting-state fMRI classification method based on minimum spanning tree high-order functional connectivity networks greatly improved the diagnostic accuracy for Alzheimer's disease. PMID:29249926

  6. Response of Demographic Rates of Tropical Trees to Light Availability: Can Position-Based Competition Indices Replace Information from Canopy Census Data?

    PubMed Central

    Grote, Steffi; Condit, Richard; Hubbell, Stephen; Wirth, Christian; Rüger, Nadja

    2013-01-01

    For trees in tropical forests, competition for light is thought to be a central process that offers opportunities for niche differentiation through light gradient partitioning. In previous studies, a canopy index based on three-dimensional canopy census data has been shown to be a good predictor of species-specific demographic rates across the entire tree community on Barro Colorado Island, Panama, and has allowed quantifying between-species variation in light response. However, almost all other forest census plots lack data on the canopy structure. Hence, this study aims at assessing whether position-based neighborhood competition indices can replace information from canopy census data and produce similar estimates of the interspecific variation of light responses. We used inventory data from the census plot at Barro Colorado Island and calculated neighborhood competition indices with varying relative effects of the size and distance of neighboring trees. Among these indices, we selected the one that was most strongly correlated with the canopy index. We then compared outcomes of hierarchical Bayesian models for species-specific recruitment and growth rates including either the canopy index or the selected neighborhood competition index as predictor. Mean posterior estimates of light response parameters were highly correlated between models (r>0.85) and indicated that most species regenerate and grow better in higher light. Both light estimation approaches consistently found that the interspecific variation of light response was larger for recruitment than for growth rates. However, the classification of species into different groups of light response, e.g. weaker than linear (decelerating) vs. stronger than linear (accelerating) differed between approaches. These results imply that while the classification into light response groups might be biased when using neighborhood competition indices, they may be useful for determining species rankings and between-species variation of light response and therefore enable large comparative studies between different forest census plots. PMID:24324723

  7. Classification tree models for predicting distributions of michigan stream fish from landscape variables

    USGS Publications Warehouse

    Steen, P.J.; Zorn, T.G.; Seelbach, P.W.; Schaeffer, J.S.

    2008-01-01

    Traditionally, fish habitat requirements have been described from local-scale environmental variables. However, recent studies have shown that studying landscape-scale processes improves our understanding of what drives species assemblages and distribution patterns across the landscape. Our goal was to learn more about constraints on the distribution of Michigan stream fish by examining landscape-scale habitat variables. We used classification trees and landscape-scale habitat variables to create and validate presence-absence models and relative abundance models for Michigan stream fishes. We developed 93 presence-absence models that on average were 72% correct in making predictions for an independent data set, and we developed 46 relative abundance models that were 76% correct in making predictions for independent data. The models were used to create statewide predictive distribution and abundance maps that have the potential to be used for a variety of conservation and scientific purposes. ?? Copyright by the American Fisheries Society 2008.

  8. A phylogenomic approach to bacterial subspecies classification: proof of concept in Mycobacterium abscessus.

    PubMed

    Tan, Joon Liang; Khang, Tsung Fei; Ngeow, Yun Fong; Choo, Siew Woh

    2013-12-13

    Mycobacterium abscessus is a rapidly growing mycobacterium that is often associated with human infections. The taxonomy of this species has undergone several revisions and is still being debated. In this study, we sequenced the genomes of 12 M. abscessus strains and used phylogenomic analysis to perform subspecies classification. A data mining approach was used to rank and select informative genes based on the relative entropy metric for the construction of a phylogenetic tree. The resulting tree topology was similar to that generated using the concatenation of five classical housekeeping genes: rpoB, hsp65, secA, recA and sodA. Additional support for the reliability of the subspecies classification came from the analysis of erm41 and ITS gene sequences, single nucleotide polymorphisms (SNPs)-based classification and strain clustering demonstrated by a variable number tandem repeat (VNTR) assay and a multilocus sequence analysis (MLSA). We subsequently found that the concatenation of a minimal set of three median-ranked genes: DNA polymerase III subunit alpha (polC), 4-hydroxy-2-ketovalerate aldolase (Hoa) and cell division protein FtsZ (ftsZ), is sufficient to recover the same tree topology. PCR assays designed specifically for these genes showed that all three genes could be amplified in the reference strain of M. abscessus ATCC 19977T. This study provides proof of concept that whole-genome sequence-based data mining approach can provide confirmatory evidence of the phylogenetic informativeness of existing markers, as well as lead to the discovery of a more economical and informative set of markers that produces similar subspecies classification in M. abscessus. The systematic procedure used in this study to choose the informative minimal set of gene markers can potentially be applied to species or subspecies classification of other bacteria.

  9. Improving medical diagnosis reliability using Boosted C5.0 decision tree empowered by Particle Swarm Optimization.

    PubMed

    Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin

    2015-08-01

    Improving accuracy of supervised classification algorithms in biomedical applications is one of active area of research. In this study, we improve the performance of Particle Swarm Optimization (PSO) combined with C4.5 decision tree (PSO+C4.5) classifier by applying Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of our proposed method, it is implemented on 1 microarray dataset and 5 different medical data sets obtained from UCI machine learning databases. Moreover, the results of PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine under the kernel of Radial Basis Function, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest neighbor). Repeated five-fold cross-validation method was used to justify the performance of classifiers. Experimental results show that our proposed method not only improve the performance of PSO+C4.5 but also obtains higher classification accuracy compared to the other classification methods.

  10. Event Classification and Identification Based on the Characteristic Ellipsoid of Phasor Measurement

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ma, Jian; Diao, Ruisheng; Makarov, Yuri V.

    2011-09-23

    In this paper, a method to classify and identify power system events based on the characteristic ellipsoid of phasor measurement is presented. The decision tree technique is used to perform the event classification and identification. Event types, event locations and clearance times are identified by decision trees based on the indices of the characteristic ellipsoid. A sufficiently large number of transient events were simulated on the New England 10-machine 39-bus system based on different system configurations. Transient simulations taking into account different event types, clearance times and various locations are conducted to simulate phasor measurement. Bus voltage magnitudes and recordedmore » reactive and active power flows are used to build the characteristic ellipsoid. The volume, eccentricity, center and projection of the longest axis in the parameter space coordinates of the characteristic ellipsoids are used to classify and identify events. Results demonstrate that the characteristic ellipsoid and the decision tree are capable to detect the event type, location, and clearance time with very high accuracy.« less

  11. Accuracy and efficiency of area classifications based on tree tally

    Treesearch

    Michael S. Williams; Hans T. Schreuder; Raymond L. Czaplewski

    2001-01-01

    Inventory data are often used to estimate the area of the land base that is classified as a specific condition class. Examples include areas classified as old-growth forest, private ownership, or suitable habitat for a given species. Many inventory programs rely on classification algorithms of varying complexity to determine condition class. These algorithms can be...

  12. Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping.

    PubMed

    Shafizadeh-Moghadam, Hossein; Valavi, Roozbeh; Shahabi, Himan; Chapi, Kamran; Shirzadi, Ataollah

    2018-07-01

    In this research, eight individual machine learning and statistical models are implemented and compared, and based on their results, seven ensemble models for flood susceptibility assessment are introduced. The individual models included artificial neural networks, classification and regression trees, flexible discriminant analysis, generalized linear model, generalized additive model, boosted regression trees, multivariate adaptive regression splines, and maximum entropy, and the ensemble models were Ensemble Model committee averaging (EMca), Ensemble Model confidence interval Inferior (EMciInf), Ensemble Model confidence interval Superior (EMciSup), Ensemble Model to estimate the coefficient of variation (EMcv), Ensemble Model to estimate the mean (EMmean), Ensemble Model to estimate the median (EMmedian), and Ensemble Model based on weighted mean (EMwmean). The data set covered 201 flood events in the Haraz watershed (Mazandaran province in Iran) and 10,000 randomly selected non-occurrence points. Among the individual models, the Area Under the Receiver Operating Characteristic (AUROC), which showed the highest value, belonged to boosted regression trees (0.975) and the lowest value was recorded for generalized linear model (0.642). On the other hand, the proposed EMmedian resulted in the highest accuracy (0.976) among all models. In spite of the outstanding performance of some models, nevertheless, variability among the prediction of individual models was considerable. Therefore, to reduce uncertainty, creating more generalizable, more stable, and less sensitive models, ensemble forecasting approaches and in particular the EMmedian is recommended for flood susceptibility assessment. Copyright © 2018 Elsevier Ltd. All rights reserved.

  13. Applying decision tree for identification of a low risk population for type 2 diabetes. Tehran Lipid and Glucose Study.

    PubMed

    Ramezankhani, Azra; Pournik, Omid; Shahrabi, Jamal; Khalili, Davood; Azizi, Fereidoun; Hadaegh, Farzad

    2014-09-01

    The aim of this study was to create a prediction model using data mining approach to identify low risk individuals for incidence of type 2 diabetes, using the Tehran Lipid and Glucose Study (TLGS) database. For a 6647 population without diabetes, aged ≥20 years, followed for 12 years, a prediction model was developed using classification by the decision tree technique. Seven hundred and twenty-nine (11%) diabetes cases occurred during the follow-up. Predictor variables were selected from demographic characteristics, smoking status, medical and drug history and laboratory measures. We developed the predictive models by decision tree using 60 input variables and one output variable. The overall classification accuracy was 90.5%, with 31.1% sensitivity, 97.9% specificity; and for the subjects without diabetes, precision and f-measure were 92% and 0.95, respectively. The identified variables included fasting plasma glucose, body mass index, triglycerides, mean arterial blood pressure, family history of diabetes, educational level and job status. In conclusion, decision tree analysis, using routine demographic, clinical, anthropometric and laboratory measurements, created a simple tool to predict individuals at low risk for type 2 diabetes. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  14. Predicting cannabis abuse screening test (CAST) scores: a recursive partitioning analysis using survey data from Czech Republic, Italy, the Netherlands and Sweden.

    PubMed

    Blankers, Matthijs; Frijns, Tom; Belackova, Vendula; Rossi, Carla; Svensson, Bengt; Trautmann, Franz; van Laar, Margriet

    2014-01-01

    Cannabis is Europe's most commonly used illicit drug. Some users do not develop dependence or other problems, whereas others do. Many factors are associated with the occurrence of cannabis-related disorders. This makes it difficult to identify key risk factors and markers to profile at-risk cannabis users using traditional hypothesis-driven approaches. Therefore, the use of a data-mining technique called binary recursive partitioning is demonstrated in this study by creating a classification tree to profile at-risk users. 59 variables on cannabis use and drug market experiences were extracted from an internet-based survey dataset collected in four European countries (Czech Republic, Italy, Netherlands and Sweden), n = 2617. These 59 potential predictors of problematic cannabis use were used to partition individual respondents into subgroups with low and high risk of having a cannabis use disorder, based on their responses on the Cannabis Abuse Screening Test. Both a generic model for the four countries combined and four country-specific models were constructed. Of the 59 variables included in the first analysis step, only three variables were required to construct a generic partitioning model to classify high risk cannabis users with 65-73% accuracy. Based on the generic model for the four countries combined, the highest risk for cannabis use disorder is seen in participants reporting a cannabis use on more than 200 days in the last 12 months. In comparison to the generic model, the country-specific models led to modest, non-significant improvements in classification accuracy, with an exception for Italy (p = 0.01). Using recursive partitioning, it is feasible to construct classification trees based on only a few variables with acceptable performance to classify cannabis users into groups with low or high risk of meeting criteria for cannabis use disorder. The number of cannabis use days in the last 12 months is the most relevant variable. The identified variables may be considered for use in future screeners for cannabis use disorders.

  15. Toward extending terrestrial laser scanning applications in forestry: a case study of broad- and needle-leaf tree classification

    NASA Astrophysics Data System (ADS)

    Lin, Yi; Jiang, Miao

    2017-01-01

    Tree species information is essential for forest research and management purposes, which in turn require approaches for accurate and precise classification of tree species. One such remote sensing technology, terrestrial laser scanning (TLS), has proved to be capable of characterizing detailed tree structures, such as tree stem geometry. Can TLS further differentiate between broad- and needle-leaves? If the answer is positive, TLS data can be used for classification of taxonomic tree groups by directly examining their differences in leaf morphology. An analysis was proposed to assess TLS-represented broad- and needle-leaf structures, followed by a Bayes classifier to perform the classification. Tests indicated that the proposed method can basically implement the task, with an overall accuracy of 77.78%. This study indicates a way of implementing the classification of the two major broad- and needle-leaf taxonomies measured by TLS in accordance to their literal definitions, and manifests the potential of extending TLS applications in forestry.

  16. Prediction of strontium bromide laser efficiency using cluster and decision tree analysis

    NASA Astrophysics Data System (ADS)

    Iliev, Iliycho; Gocheva-Ilieva, Snezhana; Kulin, Chavdar

    2018-01-01

    Subject of investigation is a new high-powered strontium bromide (SrBr2) vapor laser emitting in multiline region of wavelengths. The laser is an alternative to the atom strontium lasers and electron free lasers, especially at the line 6.45 μm which line is used in surgery for medical processing of biological tissues and bones with minimal damage. In this paper the experimental data from measurements of operational and output characteristics of the laser are statistically processed by means of cluster analysis and tree-based regression techniques. The aim is to extract the more important relationships and dependences from the available data which influence the increase of the overall laser efficiency. There are constructed and analyzed a set of cluster models. It is shown by using different cluster methods that the seven investigated operational characteristics (laser tube diameter, length, supplied electrical power, and others) and laser efficiency are combined in 2 clusters. By the built regression tree models using Classification and Regression Trees (CART) technique there are obtained dependences to predict the values of efficiency, and especially the maximum efficiency with over 95% accuracy.

  17. A hybrid approach to select features and classify diseases based on medical data

    NASA Astrophysics Data System (ADS)

    AbdelLatif, Hisham; Luo, Jiawei

    2018-03-01

    Feature selection is popular problem in the classification of diseases in clinical medicine. Here, we developing a hybrid methodology to classify diseases, based on three medical datasets, Arrhythmia, Breast cancer, and Hepatitis datasets. This methodology called k-means ANOVA Support Vector Machine (K-ANOVA-SVM) uses K-means cluster with ANOVA statistical to preprocessing data and selection the significant features, and Support Vector Machines in the classification process. To compare and evaluate the performance, we choice three classification algorithms, decision tree Naïve Bayes, Support Vector Machines and applied the medical datasets direct to these algorithms. Our methodology was a much better classification accuracy is given of 98% in Arrhythmia datasets, 92% in Breast cancer datasets and 88% in Hepatitis datasets, Compare to use the medical data directly with decision tree Naïve Bayes, and Support Vector Machines. Also, the ROC curve and precision with (K-ANOVA-SVM) Achieved best results than other algorithms

  18. Sorting Olive Batches for the Milling Process Using Image Processing

    PubMed Central

    Puerto, Daniel Aguilera; Martínez Gila, Diego Manuel; Gámez García, Javier; Gómez Ortega, Juan

    2015-01-01

    The quality of virgin olive oil obtained in the milling process is directly bound to the characteristics of the olives. Hence, the correct classification of the different incoming olive batches is crucial to reach the maximum quality of the oil. The aim of this work is to provide an automatic inspection system, based on computer vision, and to classify automatically different batches of olives entering the milling process. The classification is based on the differentiation between ground and tree olives. For this purpose, three different species have been studied (Picudo, Picual and Hojiblanco). The samples have been obtained by picking the olives directly from the tree or from the ground. The feature vector of the samples has been obtained on the basis of the olive image histograms. Moreover, different image preprocessing has been employed, and two classification techniques have been used: these are discriminant analysis and neural networks. The proposed methodology has been validated successfully, obtaining good classification results. PMID:26147729

  19. Beating the Odds: Trees to Success in Different Countries

    ERIC Educational Resources Information Center

    Finch, W. Holmes; Marchant, Gregory J.

    2017-01-01

    A recursive partitioning model approach in the form of classification and regression trees (CART) was used with 2012 PISA data for five countries (Canada, Finland, Germany, Singapore-China, and the Unites States). The objective of the study was to determine demographic and educational variables that differentiated between low SES student that were…

  20. Terrain Classification and Identification of Tree Stems Using Ground-Based Lidar

    DTIC Science & Technology

    2012-12-01

    hailing from North America and Eastern Asia. Stands are mixed age and very diverse, making this an appealing test site in terms of tree variety...sparse scene in Fig. 3(b) contains several deciduous trees and shrubs, but is largely open. The moderate scene, shown in Fig. 3(c), is cluttered with...numerous deciduous trees and shrubs, and significant ground cover. The remaining two data sets, dense1 and dense2 were collected at Breakheart

  1. Classification Rule for 5-year Cardiovascular Diseases Risk using decision tree in Primary Care Chinese Patients with Type 2 Diabetes Mellitus.

    PubMed

    Wan, Eric Yuk Fai; Fong, Daniel Yee Tak; Fung, Colman Siu Cheung; Yu, Esther Yee Tak; Chin, Weng Yee; Chan, Anca Ka Chun; Lam, Cindy Lo Kuen

    2017-11-10

    Cardiovascular disease(CVD) is the leading cause of mortality among patients with type 2 diabetes mellitus(T2DM), and a risk classification model for CVD among primary care diabetic patients is pivotal for risk-based interventions and patient information. This study developed a simple tool for a 5-year CVD risk prediction for primary care Chinese patients with T2DM. A retrospective cohort study was conducted on 137,935 primary care Chinese T2DM patients aged 18-79 years without history of CVD between 1 January 2010 and 31 December 2010. New events of CVD of the cohort over a median follow up of 5 years were extracted from the medical records. A classification rule of 5-year CVD risk was obtained from the derivation cohort and validated in the validation cohort. Significant risk factors included in decision tree were age, gender, smoking status, diagnosis duration, obesity, unsatisfactory control on haemoglobin A1c and cholesterol, albuminuria and stage of chronic kidney disease, which categorized patients into five 5-year CVD risk groups(<5%; 5-9%; 10-14%; 15-19% and ≥20%). Taking the group with the lowest CVD risk, the hazard ratios varied from 1.92(1.77,2.08) to 8.46(7.75,9.24). The present prediction model performed comparable discrimination and better calibration from the plot compared to other current existing models.

  2. Ensemble Pruning for Glaucoma Detection in an Unbalanced Data Set.

    PubMed

    Adler, Werner; Gefeller, Olaf; Gul, Asma; Horn, Folkert K; Khan, Zardad; Lausen, Berthold

    2016-12-07

    Random forests are successful classifier ensemble methods consisting of typically 100 to 1000 classification trees. Ensemble pruning techniques reduce the computational cost, especially the memory demand, of random forests by reducing the number of trees without relevant loss of performance or even with increased performance of the sub-ensemble. The application to the problem of an early detection of glaucoma, a severe eye disease with low prevalence, based on topographical measurements of the eye background faces specific challenges. We examine the performance of ensemble pruning strategies for glaucoma detection in an unbalanced data situation. The data set consists of 102 topographical features of the eye background of 254 healthy controls and 55 glaucoma patients. We compare the area under the receiver operating characteristic curve (AUC), and the Brier score on the total data set, in the majority class, and in the minority class of pruned random forest ensembles obtained with strategies based on the prediction accuracy of greedily grown sub-ensembles, the uncertainty weighted accuracy, and the similarity between single trees. To validate the findings and to examine the influence of the prevalence of glaucoma in the data set, we additionally perform a simulation study with lower prevalences of glaucoma. In glaucoma classification all three pruning strategies lead to improved AUC and smaller Brier scores on the total data set with sub-ensembles as small as 30 to 80 trees compared to the classification results obtained with the full ensemble consisting of 1000 trees. In the simulation study, we were able to show that the prevalence of glaucoma is a critical factor and lower prevalence decreases the performance of our pruning strategies. The memory demand for glaucoma classification in an unbalanced data situation based on random forests could effectively be reduced by the application of pruning strategies without loss of performance in a population with increased risk of glaucoma.

  3. A preliminary case-mix classification system for Medicare home health clients.

    PubMed

    Branch, L G; Goldberg, H B

    1993-04-01

    In this study, a hierarchical case-mix model was developed for grouping Medicare home health beneficiaries homogeneously, based on the allowed charges for their home care. Based on information from a two-page form from 2,830 clients from ten states and using the classification and regression trees method, a four-component model was developed that yielded 11 case-mix groups and explained 22% of the variance for the test sample of 1,929 clients. The four components are rehabilitation, special care, skilled-nurse monitoring, and paralysis; each are categorized as present or absent. The range of mean-allowed charges for the 11 groups in the total sample was $473 to $2,562 with a mean of $847. Of the six groups with mean charges above $1,000, none exceeded 5.2% of clients; thus, the high-cost groups are relatively rare.

  4. Heart rate time series characteristics for early detection of infections in critically ill patients.

    PubMed

    Tambuyzer, T; Guiza, F; Boonen, E; Meersseman, P; Vervenne, H; Hansen, T K; Bjerre, M; Van den Berghe, G; Berckmans, D; Aerts, J M; Meyfroidt, G

    2017-04-01

    It is difficult to make a distinction between inflammation and infection. Therefore, new strategies are required to allow accurate detection of infection. Here, we hypothesize that we can distinguish infected from non-infected ICU patients based on dynamic features of serum cytokine concentrations and heart rate time series. Serum cytokine profiles and heart rate time series of 39 patients were available for this study. The serum concentration of ten cytokines were measured using blood sampled every 10 min between 2100 and 0600 hours. Heart rate was recorded every minute. Ten metrics were used to extract features from these time series to obtain an accurate classification of infected patients. The predictive power of the metrics derived from the heart rate time series was investigated using decision tree analysis. Finally, logistic regression methods were used to examine whether classification performance improved with inclusion of features derived from the cytokine time series. The AUC of a decision tree based on two heart rate features was 0.88. The model had good calibration with 0.09 Hosmer-Lemeshow p value. There was no significant additional value of adding static cytokine levels or cytokine time series information to the generated decision tree model. The results suggest that heart rate is a better marker for infection than information captured by cytokine time series when the exact stage of infection is not known. The predictive value of (expensive) biomarkers should always be weighed against the routinely monitored data, and such biomarkers have to demonstrate added value.

  5. Using Bayesian neural networks to classify forest scenes

    NASA Astrophysics Data System (ADS)

    Vehtari, Aki; Heikkonen, Jukka; Lampinen, Jouko; Juujarvi, Jouni

    1998-10-01

    We present results that compare the performance of Bayesian learning methods for neural networks on the task of classifying forest scenes into trees and background. Classification task is demanding due to the texture richness of the trees, occlusions of the forest scene objects and diverse lighting conditions under operation. This makes it difficult to determine which are optimal image features for the classification. A natural way to proceed is to extract many different types of potentially suitable features, and to evaluate their usefulness in later processing stages. One approach to cope with large number of features is to use Bayesian methods to control the model complexity. Bayesian learning uses a prior on model parameters, combines this with evidence from a training data, and the integrates over the resulting posterior to make predictions. With this method, we can use large networks and many features without fear of overfitting. For this classification task we compare two Bayesian learning methods for multi-layer perceptron (MLP) neural networks: (1) The evidence framework of MacKay uses a Gaussian approximation to the posterior weight distribution and maximizes with respect to hyperparameters. (2) In a Markov Chain Monte Carlo (MCMC) method due to Neal, the posterior distribution of the network parameters is numerically integrated using the MCMC method. As baseline classifiers for comparison we use (3) MLP early stop committee, (4) K-nearest-neighbor and (5) Classification And Regression Tree.

  6. A tree-based statistical classification algorithm (CHAID) for identifying variables responsible for the occurrence of faecal indicator bacteria during waterworks operations

    NASA Astrophysics Data System (ADS)

    Bichler, Andrea; Neumaier, Arnold; Hofmann, Thilo

    2014-11-01

    Microbial contamination of groundwater used for drinking water can affect public health and is of major concern to local water authorities and water suppliers. Potential hazards need to be identified in order to protect raw water resources. We propose a non-parametric data mining technique for exploring the presence of total coliforms (TC) in a groundwater abstraction well and its relationship to readily available, continuous time series of hydrometric monitoring parameters (seven year records of precipitation, river water levels, and groundwater heads). The original monitoring parameters were used to create an extensive generic dataset of explanatory variables by considering different accumulation or averaging periods, as well as temporal offsets of the explanatory variables. A classification tree based on the Chi-Squared Automatic Interaction Detection (CHAID) recursive partitioning algorithm revealed statistically significant relationships between precipitation and the presence of TC in both a production well and a nearby monitoring well. Different secondary explanatory variables were identified for the two wells. Elevated water levels and short-term water table fluctuations in the nearby river were found to be associated with TC in the observation well. The presence of TC in the production well was found to relate to elevated groundwater heads and fluctuations in groundwater levels. The generic variables created proved useful for increasing significance levels. The tree-based model was used to predict the occurrence of TC on the basis of hydrometric variables.

  7. Gibbs measures with memory of length 2 on an arbitrary-order Cayley tree

    NASA Astrophysics Data System (ADS)

    Akın, Hasan

    In this paper, we consider the Ising-Vanniminus model on an arbitrary-order Cayley tree. We generalize the results conjectured by Akın [Chinese J. Phys. 54(4), 635-649 (2016) and Int. J. Mod. Phys. B 31(13), 1750093 (2017)] for an arbitrary-order Cayley tree. We establish the existence and a full classification of translation-invariant Gibbs measures (TIGMs) with a memory of length 2 associated with the model on arbitrary-order Cayley tree. We construct the recurrence equations corresponding to the generalized ANNNI model. We satisfy the Kolmogorov consistency condition. We propose a rigorous measure-theoretical approach to investigate the Gibbs measures with a memory of length 2 for the model. We explain if the number of branches of the tree does not change the number of Gibbs measures. Also, we try to determine when the phase transition does occur.

  8. Tree Colors: Color Schemes for Tree-Structured Data.

    PubMed

    Tennekes, Martijn; de Jonge, Edwin

    2014-12-01

    We present a method to map tree structures to colors from the Hue-Chroma-Luminance color model, which is known for its well balanced perceptual properties. The Tree Colors method can be tuned with several parameters, whose effect on the resulting color schemes is discussed in detail. We provide a free and open source implementation with sensible parameter defaults. Categorical data are very common in statistical graphics, and often these categories form a classification tree. We evaluate applying Tree Colors to tree structured data with a survey on a large group of users from a national statistical institute. Our user study suggests that Tree Colors are useful, not only for improving node-link diagrams, but also for unveiling tree structure in non-hierarchical visualizations.

  9. Multispectral LiDAR Data for Land Cover Classification of Urban Areas

    PubMed Central

    Morsy, Salem; Shaker, Ahmed; El-Rabbany, Ahmed

    2017-01-01

    Airborne Light Detection And Ranging (LiDAR) systems usually operate at a monochromatic wavelength measuring the range and the strength of the reflected energy (intensity) from objects. Recently, multispectral LiDAR sensors, which acquire data at different wavelengths, have emerged. This allows for recording of a diversity of spectral reflectance from objects. In this context, we aim to investigate the use of multispectral LiDAR data in land cover classification using two different techniques. The first is image-based classification, where intensity and height images are created from LiDAR points and then a maximum likelihood classifier is applied. The second is point-based classification, where ground filtering and Normalized Difference Vegetation Indices (NDVIs) computation are conducted. A dataset of an urban area located in Oshawa, Ontario, Canada, is classified into four classes: buildings, trees, roads and grass. An overall accuracy of up to 89.9% and 92.7% is achieved from image classification and 3D point classification, respectively. A radiometric correction model is also applied to the intensity data in order to remove the attenuation due to the system distortion and terrain height variation. The classification process is then repeated, and the results demonstrate that there are no significant improvements achieved in the overall accuracy. PMID:28445432

  10. Multispectral LiDAR Data for Land Cover Classification of Urban Areas.

    PubMed

    Morsy, Salem; Shaker, Ahmed; El-Rabbany, Ahmed

    2017-04-26

    Airborne Light Detection And Ranging (LiDAR) systems usually operate at a monochromatic wavelength measuring the range and the strength of the reflected energy (intensity) from objects. Recently, multispectral LiDAR sensors, which acquire data at different wavelengths, have emerged. This allows for recording of a diversity of spectral reflectance from objects. In this context, we aim to investigate the use of multispectral LiDAR data in land cover classification using two different techniques. The first is image-based classification, where intensity and height images are created from LiDAR points and then a maximum likelihood classifier is applied. The second is point-based classification, where ground filtering and Normalized Difference Vegetation Indices (NDVIs) computation are conducted. A dataset of an urban area located in Oshawa, Ontario, Canada, is classified into four classes: buildings, trees, roads and grass. An overall accuracy of up to 89.9% and 92.7% is achieved from image classification and 3D point classification, respectively. A radiometric correction model is also applied to the intensity data in order to remove the attenuation due to the system distortion and terrain height variation. The classification process is then repeated, and the results demonstrate that there are no significant improvements achieved in the overall accuracy.

  11. Iris Image Classification Based on Hierarchical Visual Codebook.

    PubMed

    Zhenan Sun; Hui Zhang; Tieniu Tan; Jianyu Wang

    2014-06-01

    Iris recognition as a reliable method for personal identification has been well-studied with the objective to assign the class label of each iris image to a unique subject. In contrast, iris image classification aims to classify an iris image to an application specific category, e.g., iris liveness detection (classification of genuine and fake iris images), race classification (e.g., classification of iris images of Asian and non-Asian subjects), coarse-to-fine iris identification (classification of all iris images in the central database into multiple categories). This paper proposes a general framework for iris image classification based on texture analysis. A novel texture pattern representation method called Hierarchical Visual Codebook (HVC) is proposed to encode the texture primitives of iris images. The proposed HVC method is an integration of two existing Bag-of-Words models, namely Vocabulary Tree (VT), and Locality-constrained Linear Coding (LLC). The HVC adopts a coarse-to-fine visual coding strategy and takes advantages of both VT and LLC for accurate and sparse representation of iris texture. Extensive experimental results demonstrate that the proposed iris image classification method achieves state-of-the-art performance for iris liveness detection, race classification, and coarse-to-fine iris identification. A comprehensive fake iris image database simulating four types of iris spoof attacks is developed as the benchmark for research of iris liveness detection.

  12. Patient casemix classification for medicare psychiatric prospective payment.

    PubMed

    Drozd, Edward M; Cromwell, Jerry; Gage, Barbara; Maier, Jan; Greenwald, Leslie M; Goldman, Howard H

    2006-04-01

    For a proposed Medicare prospective payment system for inpatient psychiatric facility treatment, the authors developed a casemix classification to capture differences in patients' real daily resource use. Primary data on patient characteristics and daily time spent in various activities were collected in a survey of 696 patients from 40 inpatient psychiatric facilities. Survey data were combined with Medicare claims data to estimate intensity-adjusted daily cost. Classification and Regression Trees (CART) analysis of average daily routine and ancillary costs yielded several hierarchical classification groupings. Regression analysis was used to control for facility and day-of-stay effects in order to compare hierarchical models with models based on the recently proposed payment system of the Centers for Medicare & Medicaid Services. CART analysis identified a small set of patient characteristics strongly associated with higher daily costs, including age, psychiatric diagnosis, deficits in daily living activities, and detox or ECT use. A parsimonious, 16-group, fully interactive model that used five major DSM-IV categories and stratified by age, illness severity, deficits in daily living activities, dangerousness, and use of ECT explained 40% (out of a possible 76%) of daily cost variation not attributable to idiosyncratic daily changes within patients. A noninteractive model based on diagnosis-related groups, age, and medical comorbidity had explanatory power of only 32%. A regression model with 16 casemix groups restricted to using "appropriate" payment variables (i.e., those with clinical face validity and low administrative burden that are easily validated and provide proper care incentives) produced more efficient and equitable payments than did a noninteractive system based on diagnosis-related groups.

  13. Remote sensing of aquatic vegetation distribution in Taihu Lake using an improved classification tree with modified thresholds.

    PubMed

    Zhao, Dehua; Jiang, Hao; Yang, Tangwu; Cai, Ying; Xu, Delin; An, Shuqing

    2012-03-01

    Classification trees (CT) have been used successfully in the past to classify aquatic vegetation from spectral indices (SI) obtained from remotely-sensed images. However, applying CT models developed for certain image dates to other time periods within the same year or among different years can reduce the classification accuracy. In this study, we developed CT models with modified thresholds using extreme SI values (CT(m)) to improve the stability of the models when applying them to different time periods. A total of 903 ground-truth samples were obtained in September of 2009 and 2010 and classified as emergent, floating-leaf, or submerged vegetation or other cover types. Classification trees were developed for 2009 (Model-09) and 2010 (Model-10) using field samples and a combination of two images from winter and summer. Overall accuracies of these models were 92.8% and 94.9%, respectively, which confirmed the ability of CT analysis to map aquatic vegetation in Taihu Lake. However, Model-10 had only 58.9-71.6% classification accuracy and 31.1-58.3% agreement (i.e., pixels classified the same in the two maps) for aquatic vegetation when it was applied to image pairs from both a different time period in 2010 and a similar time period in 2009. We developed a method to estimate the effects of extrinsic (EF) and intrinsic (IF) factors on model uncertainty using Modis images. Results indicated that 71.1% of the instability in classification between time periods was due to EF, which might include changes in atmospheric conditions, sun-view angle and water quality. The remainder was due to IF, such as phenological and growth status differences between time periods. The modified version of Model-10 (i.e. CT(m)) performed better than traditional CT with different image dates. When applied to 2009 images, the CT(m) version of Model-10 had very similar thresholds and performance as Model-09, with overall accuracies of 92.8% and 90.5% for Model-09 and the CT(m) version of Model-10, respectively. CT(m) decreased the variability related to EF and IF and thereby improved the applicability of the models to different time periods. In both practice and theory, our results suggested that CT(m) was more stable than traditional CT models and could be used to map aquatic vegetation in time periods other than the one for which the model was developed. Copyright © 2011 Elsevier Ltd. All rights reserved.

  14. Gene function prediction based on the Gene Ontology hierarchical structure.

    PubMed

    Cheng, Liangxi; Lin, Hongfei; Hu, Yuncui; Wang, Jian; Yang, Zhihao

    2014-01-01

    The information of the Gene Ontology annotation is helpful in the explanation of life science phenomena, and can provide great support for the research of the biomedical field. The use of the Gene Ontology is gradually affecting the way people store and understand bioinformatic data. To facilitate the prediction of gene functions with the aid of text mining methods and existing resources, we transform it into a multi-label top-down classification problem and develop a method that uses the hierarchical relationships in the Gene Ontology structure to relieve the quantitative imbalance of positive and negative training samples. Meanwhile the method enhances the discriminating ability of classifiers by retaining and highlighting the key training samples. Additionally, the top-down classifier based on a tree structure takes the relationship of target classes into consideration and thus solves the incompatibility between the classification results and the Gene Ontology structure. Our experiment on the Gene Ontology annotation corpus achieves an F-value performance of 50.7% (precision: 52.7% recall: 48.9%). The experimental results demonstrate that when the size of training set is small, it can be expanded via topological propagation of associated documents between the parent and child nodes in the tree structure. The top-down classification model applies to the set of texts in an ontology structure or with a hierarchical relationship.

  15. Mapping and detecting bark beetle-caused tree mortality in the western United States

    NASA Astrophysics Data System (ADS)

    Meddens, Arjan J. H.

    Recently, insect outbreaks across North America have dramatically increased and the forest area affected by bark beetles is similar to that affected by fire. Remote sensing offers the potential to detect insect outbreaks with high accuracy. Chapter one involved detection of insect-caused tree mortality on the tree level for a 90km2 area in northcentral Colorado. Classes of interest included green trees, multiple stages of post-insect attack tree mortality including dead trees with red needles ("red-attack") and dead trees without needles ("gray-attack"), and non-forest. The results illustrated that classification of an image with a spatial resolution similar to the area of a tree crown outperformed that from finer and coarser resolution imagery for mapping tree mortality and non-forest classes. I also demonstrated that multispectral imagery could be used to separate multiple postoutbreak attack stages (i.e., red-attack and gray-attack) from other classes in the image. In Chapter 2, I compared and improved methods for detecting bark beetle-caused tree mortality using medium-resolution satellite data. I found that overall classification accuracy was similar between single-date and multi-date classification methods. I developed regression models to predict percent red attack within a 30-m grid cell and these models explained >75% of the variance using three Landsat spectral explanatory variables. Results of the final product showed that approximately 24% of the forest within the Landsat scene was comprised of tree mortality caused by bark beetles. In Chapter 3, I developed a gridded data set with 1-km2 resolution using aerial survey data and improved estimates of tree mortality across the western US and British Columbia. In the US, I also produced an upper estimate by forcing the mortality area to match that from high-resolution imagery in Idaho, Colorado, and New Mexico. Cumulative mortality area from all bark beetles was 5.46 Mha in British Columbia in 2001-2010 and 0.47-5.37 Mha (lower and upper estimate) in the western conterminous US during 1997-2010. Improved methods for detection and mapping of insect outbreak areas will lead to improved assessments of the effects of these forest disturbances on the economy, carbon cycle (and feedback to climate change), fuel loads, hydrology and forest ecology.

  16. Assessment of wastewater treatment facility compliance with decreasing ammonia discharge limits using a regression tree model.

    PubMed

    Suchetana, Bihu; Rajagopalan, Balaji; Silverstein, JoAnn

    2017-11-15

    A regression tree-based diagnostic approach is developed to evaluate factors affecting US wastewater treatment plant compliance with ammonia discharge permit limits using Discharge Monthly Report (DMR) data from a sample of 106 municipal treatment plants for the period of 2004-2008. Predictor variables used to fit the regression tree are selected using random forests, and consist of the previous month's effluent ammonia, influent flow rates and plant capacity utilization. The tree models are first used to evaluate compliance with existing ammonia discharge standards at each facility and then applied assuming more stringent discharge limits, under consideration in many states. The model predicts that the ability to meet both current and future limits depends primarily on the previous month's treatment performance. With more stringent discharge limits predicted ammonia concentration relative to the discharge limit, increases. In-sample validation shows that the regression trees can provide a median classification accuracy of >70%. The regression tree model is validated using ammonia discharge data from an operating wastewater treatment plant and is able to accurately predict the observed ammonia discharge category approximately 80% of the time, indicating that the regression tree model can be applied to predict compliance for individual treatment plants providing practical guidance for utilities and regulators with an interest in controlling ammonia discharges. The proposed methodology is also used to demonstrate how to delineate reliable sources of demand and supply in a point source-to-point source nutrient credit trading scheme, as well as how planners and decision makers can set reasonable discharge limits in future. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. Using laser altimetry-based segmentation to refine automated tree identification in managed forests of the Black Hills, South Dakota

    Treesearch

    Eric Rowell; Carl Selelstad; Lee Vierling; Lloyd Queen; Wayne Sheppard

    2006-01-01

    The success of a local maximum (LM) tree detection algorithm for detecting individual trees from lidar data depends on stand conditions that are often highly variable. A laser height variance and percent canopy cover (PCC) classification is used to segment the landscape by stand condition prior to stem detection. We test the performance of the LM algorithm using canopy...

  18. Aneurysmal subarachnoid hemorrhage prognostic decision-making algorithm using classification and regression tree analysis.

    PubMed

    Lo, Benjamin W Y; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A H

    2016-01-01

    Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56-2.45, P < 0.01). A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.

  19. Can SLE classification rules be effectively applied to diagnose unclear SLE cases?

    PubMed Central

    Mesa, Annia; Fernandez, Mitch; Wu, Wensong; Narasimhan, Giri; Greidinger, Eric L.; Mills, DeEtta K.

    2016-01-01

    Summary Objective Develop a novel classification criteria to distinguish between unclear SLE and MCTD cases. Methods A total of 205 variables from 111 SLE and 55 MCTD patients were evaluated to uncover unique molecular and clinical markers for each disease. Binomial logistic regressions (BLR) were performed on currently used SLE and MCTD classification criteria sets to obtain six reduced models with power to discriminate between unclear SLE and MCTD patients which were confirmed by Receiving Operating Characteristic (ROC) curve. Decision trees were employed to delineate novel classification rules to discriminate between unclear SLE and MCTD patients. Results SLE and MCTD patients exhibited contrasting molecular markers and clinical manifestations. Furthermore, reduced models highlighted SLE patients exhibit prevalence of skin rashes and renal disease while MCTD cases show dominance of myositis and muscle weakness. Additionally decision trees analyses revealed a novel classification rule tailored to differentiate unclear SLE and MCTD patients (Lu-vs-M) with an overall accuracy of 88%. Conclusions Validation of our novel proposed classification rule (Lu-vs-M) includes novel contrasting characteristics (calcinosis, CPK elevated and anti-IgM reactivity for U1-70K, U1A and U1C) between SLE and MCTD patients and showed a 33% improvement in distinguishing these disorders when compare to currently used classification criteria sets. Pending additional validation, our novel classification rule is a promising method to distinguish between patients with unclear SLE and MCTD diagnosis. PMID:27353506

  20. Mastectomy or breast conserving surgery? Factors affecting type of surgical treatment for breast cancer--a classification tree approach.

    PubMed

    Martin, Michael A; Meyricke, Ramona; O'Neill, Terry; Roberts, Steven

    2006-04-20

    A critical choice facing breast cancer patients is which surgical treatment--mastectomy or breast conserving surgery (BCS)--is most appropriate. Several studies have investigated factors that impact the type of surgery chosen, identifying features such as place of residence, age at diagnosis, tumor size, socio-economic and racial/ethnic elements as relevant. Such assessment of "propensity" is important in understanding issues such as a reported under-utilisation of BCS among women for whom such treatment was not contraindicated. Using Western Australian (WA) data, we further examine the factors associated with the type of surgical treatment for breast cancer using a classification tree approach. This approach deals naturally with complicated interactions between factors, and so allows flexible and interpretable models for treatment choice to be built that add to the current understanding of this complex decision process. Data was extracted from the WA Cancer Registry on women diagnosed with breast cancer in WA from 1990 to 2000. Subjects' treatment preferences were predicted from covariates using both classification trees and logistic regression. Tumor size was the primary determinant of patient choice, subjects with tumors smaller than 20 mm in diameter preferring BCS. For subjects with tumors greater than 20 mm in diameter factors such as patient age, nodal status, and tumor histology become relevant as predictors of patient choice. Classification trees perform as well as logistic regression for predicting patient choice, but are much easier to interpret for clinical use. The selected tree can inform clinicians' advice to patients.

  1. Object-based locust habitat mapping using high-resolution multispectral satellite data in the southern Aral Sea basin

    NASA Astrophysics Data System (ADS)

    Navratil, Peter; Wilps, Hans

    2013-01-01

    Three different object-based image classification techniques are applied to high-resolution satellite data for the mapping of the habitats of Asian migratory locust (Locusta migratoria migratoria) in the southern Aral Sea basin, Uzbekistan. A set of panchromatic and multispectral Système Pour l'Observation de la Terre-5 satellite images was spectrally enhanced by normalized difference vegetation index and tasseled cap transformation and segmented into image objects, which were then classified by three different classification approaches: a rule-based hierarchical fuzzy threshold (HFT) classification method was compared to a supervised nearest neighbor classifier and classification tree analysis by the quick, unbiased, efficient statistical trees algorithm. Special emphasis was laid on the discrimination of locust feeding and breeding habitats due to the significance of this discrimination for practical locust control. Field data on vegetation and land cover, collected at the time of satellite image acquisition, was used to evaluate classification accuracy. The results show that a robust HFT classifier outperformed the two automated procedures by 13% overall accuracy. The classification method allowed a reliable discrimination of locust feeding and breeding habitats, which is of significant importance for the application of the resulting data for an economically and environmentally sound control of locust pests because exact spatial knowledge on the habitat types allows a more effective surveying and use of pesticides.

  2. Analyzing tree-shape anatomical structures using topological descriptors of branching and ensemble of classifiers.

    PubMed

    Skoura, Angeliki; Bakic, Predrag R; Megalooikonomou, Vasilis

    2013-01-01

    The analysis of anatomical tree-shape structures visualized in medical images provides insight into the relationship between tree topology and pathology of the corresponding organs. In this paper, we propose three methods to extract descriptive features of the branching topology; the asymmetry index, the encoding of branching patterns using a node labeling scheme and an extension of the Sholl analysis. Based on these descriptors, we present classification schemes for tree topologies with respect to the underlying pathology. Moreover, we present a classifier ensemble approach which combines the predictions of the individual classifiers to optimize the classification accuracy. We applied the proposed methodology to a dataset of x-ray galactograms, medical images which visualize the breast ductal tree, in order to recognize images with radiological findings regarding breast cancer. The experimental results demonstrate the effectiveness of the proposed framework compared to state-of-the-art techniques suggesting that the proposed descriptors provide more valuable information regarding the topological patterns of ductal trees and indicating the potential of facilitating early breast cancer diagnosis.

  3. Analyzing tree-shape anatomical structures using topological descriptors of branching and ensemble of classifiers

    PubMed Central

    Skoura, Angeliki; Bakic, Predrag R.; Megalooikonomou, Vasilis

    2014-01-01

    The analysis of anatomical tree-shape structures visualized in medical images provides insight into the relationship between tree topology and pathology of the corresponding organs. In this paper, we propose three methods to extract descriptive features of the branching topology; the asymmetry index, the encoding of branching patterns using a node labeling scheme and an extension of the Sholl analysis. Based on these descriptors, we present classification schemes for tree topologies with respect to the underlying pathology. Moreover, we present a classifier ensemble approach which combines the predictions of the individual classifiers to optimize the classification accuracy. We applied the proposed methodology to a dataset of x-ray galactograms, medical images which visualize the breast ductal tree, in order to recognize images with radiological findings regarding breast cancer. The experimental results demonstrate the effectiveness of the proposed framework compared to state-of-the-art techniques suggesting that the proposed descriptors provide more valuable information regarding the topological patterns of ductal trees and indicating the potential of facilitating early breast cancer diagnosis. PMID:25414850

  4. Statistical analysis of texture in trunk images for biometric identification of tree species.

    PubMed

    Bressane, Adriano; Roveda, José A F; Martins, Antônio C G

    2015-04-01

    The identification of tree species is a key step for sustainable management plans of forest resources, as well as for several other applications that are based on such surveys. However, the present available techniques are dependent on the presence of tree structures, such as flowers, fruits, and leaves, limiting the identification process to certain periods of the year. Therefore, this article introduces a study on the application of statistical parameters for texture classification of tree trunk images. For that, 540 samples from five Brazilian native deciduous species were acquired and measures of entropy, uniformity, smoothness, asymmetry (third moment), mean, and standard deviation were obtained from the presented textures. Using a decision tree, a biometric species identification system was constructed and resulted to a 0.84 average precision rate for species classification with 0.83accuracy and 0.79 agreement. Thus, it can be considered that the use of texture presented in trunk images can represent an important advance in tree identification, since the limitations of the current techniques can be overcome.

  5. Object based technique for delineating and mapping 15 tree species using VHR WorldView-2 imagery

    NASA Astrophysics Data System (ADS)

    Mustafa, Yaseen T.; Habeeb, Hindav N.

    2014-10-01

    Monitoring and analyzing forests and trees are required task to manage and establish a good plan for the forest sustainability. To achieve such a task, information and data collection of the trees are requested. The fastest way and relatively low cost technique is by using satellite remote sensing. In this study, we proposed an approach to identify and map 15 tree species in the Mangish sub-district, Kurdistan Region-Iraq. Image-objects (IOs) were used as the tree species mapping unit. This is achieved using the shadow index, normalized difference vegetation index and texture measurements. Four classification methods (Maximum Likelihood, Mahalanobis Distance, Neural Network, and Spectral Angel Mapper) were used to classify IOs using selected IO features derived from WorldView-2 imagery. Results showed that overall accuracy was increased 5-8% using the Neural Network method compared with other methods with a Kappa coefficient of 69%. This technique gives reasonable results of various tree species classifications by means of applying the Neural Network method with IOs techniques on WorldView-2 imagery.

  6. An enhanced forest classification scheme for modeling vegetation-climate interactions based on national forest inventory data

    NASA Astrophysics Data System (ADS)

    Majasalmi, Titta; Eisner, Stephanie; Astrup, Rasmus; Fridman, Jonas; Bright, Ryan M.

    2018-01-01

    Forest management affects the distribution of tree species and the age class of a forest, shaping its overall structure and functioning and in turn the surface-atmosphere exchanges of mass, energy, and momentum. In order to attribute climate effects to anthropogenic activities like forest management, good accounts of forest structure are necessary. Here, using Fennoscandia as a case study, we make use of Fennoscandic National Forest Inventory (NFI) data to systematically classify forest cover into groups of similar aboveground forest structure. An enhanced forest classification scheme and related lookup table (LUT) of key forest structural attributes (i.e., maximum growing season leaf area index (LAImax), basal-area-weighted mean tree height, tree crown length, and total stem volume) was developed, and the classification was applied for multisource NFI (MS-NFI) maps from Norway, Sweden, and Finland. To provide a complete surface representation, our product was integrated with the European Space Agency Climate Change Initiative Land Cover (ESA CCI LC) map of present day land cover (v.2.0.7). Comparison of the ESA LC and our enhanced LC products (https://doi.org/10.21350/7zZEy5w3) showed that forest extent notably (κ = 0.55, accuracy 0.64) differed between the two products. To demonstrate the potential of our enhanced LC product to improve the description of the maximum growing season LAI (LAImax) of managed forests in Fennoscandia, we compared our LAImax map with reference LAImax maps created using the ESA LC product (and related cross-walking table) and PFT-dependent LAImax values used in three leading land models. Comparison of the LAImax maps showed that our product provides a spatially more realistic description of LAImax in managed Fennoscandian forests compared to reference maps. This study presents an approach to account for the transient nature of forest structural attributes due to human intervention in different land models.

  7. A New Morphological Phylogeny of the Ophiuroidea (Echinodermata) Accords with Molecular Evidence and Renders Microfossils Accessible for Cladistics

    PubMed Central

    Thuy, Ben; Stöhr, Sabine

    2016-01-01

    Ophiuroid systematics is currently in a state of upheaval, with recent molecular estimates fundamentally clashing with traditional, morphology-based classifications. Here, we attempt a long overdue recast of a morphological phylogeny estimate of the Ophiuroidea taking into account latest insights on microstructural features of the arm skeleton. Our final estimate is based on a total of 45 ingroup taxa, including 41 recent species covering the full range of extant ophiuroid higher taxon diversity and 4 fossil species known from exceptionally preserved material, and the Lower Carboniferous Aganaster gregarius as the outgroup. A total of 130 characters were scored directly on specimens. The tree resulting from the Bayesian inference analysis of the full data matrix is reasonably well resolved and well supported, and refutes all previous classifications, with most traditional families discredited as poly- or paraphyletic. In contrast, our tree agrees remarkably well with the latest molecular estimate, thus paving the way towards an integrated new classification of the Ophiuroidea. Among the characters which were qualitatively found to accord best with our tree topology, we selected a list of potential synapomorphies for future formal clade definitions. Furthermore, an analysis with 13 of the ingroup taxa reduced to the lateral arm plate characters produced a tree which was essentially similar to the full dataset tree. This suggests that dissociated lateral arm plates can be analysed in combination with fully known taxa and thus effectively unlocks the extensive record of fossil lateral arm plates for phylogenetic estimates. Finally, the age and position within our tree implies that the ophiuroid crown-group had started to diversify by the Early Triassic. PMID:27227685

  8. A Novel Multi-Class Ensemble Model for Classifying Imbalanced Biomedical Datasets

    NASA Astrophysics Data System (ADS)

    Bikku, Thulasi; Sambasiva Rao, N., Dr; Rao, Akepogu Ananda, Dr

    2017-08-01

    This paper mainly focuseson developing aHadoop based framework for feature selection and classification models to classify high dimensionality data in heterogeneous biomedical databases. Wide research has been performing in the fields of Machine learning, Big data and Data mining for identifying patterns. The main challenge is extracting useful features generated from diverse biological systems. The proposed model can be used for predicting diseases in various applications and identifying the features relevant to particular diseases. There is an exponential growth of biomedical repositories such as PubMed and Medline, an accurate predictive model is essential for knowledge discovery in Hadoop environment. Extracting key features from unstructured documents often lead to uncertain results due to outliers and missing values. In this paper, we proposed a two phase map-reduce framework with text preprocessor and classification model. In the first phase, mapper based preprocessing method was designed to eliminate irrelevant features, missing values and outliers from the biomedical data. In the second phase, a Map-Reduce based multi-class ensemble decision tree model was designed and implemented in the preprocessed mapper data to improve the true positive rate and computational time. The experimental results on the complex biomedical datasets show that the performance of our proposed Hadoop based multi-class ensemble model significantly outperforms state-of-the-art baselines.

  9. Classification of Tree Species in Overstorey Canopy of Subtropical Forest Using QuickBird Images.

    PubMed

    Lin, Chinsu; Popescu, Sorin C; Thomson, Gavin; Tsogt, Khongor; Chang, Chein-I

    2015-01-01

    This paper proposes a supervised classification scheme to identify 40 tree species (2 coniferous, 38 broadleaf) belonging to 22 families and 36 genera in high spatial resolution QuickBird multispectral images (HMS). Overall kappa coefficient (OKC) and species conditional kappa coefficients (SCKC) were used to evaluate classification performance in training samples and estimate accuracy and uncertainty in test samples. Baseline classification performance using HMS images and vegetation index (VI) images were evaluated with an OKC value of 0.58 and 0.48 respectively, but performance improved significantly (up to 0.99) when used in combination with an HMS spectral-spatial texture image (SpecTex). One of the 40 species had very high conditional kappa coefficient performance (SCKC ≥ 0.95) using 4-band HMS and 5-band VIs images, but, only five species had lower performance (0.68 ≤ SCKC ≤ 0.94) using the SpecTex images. When SpecTex images were combined with a Visible Atmospherically Resistant Index (VARI), there was a significant improvement in performance in the training samples. The same level of improvement could not be replicated in the test samples indicating that a high degree of uncertainty exists in species classification accuracy which may be due to individual tree crown density, leaf greenness (inter-canopy gaps), and noise in the background environment (intra-canopy gaps). These factors increase uncertainty in the spectral texture features and therefore represent potential problems when using pixel-based classification techniques for multi-species classification.

  10. Improved Frame Mode Selection for AMR-WB+ Based on Decision Tree

    NASA Astrophysics Data System (ADS)

    Kim, Jong Kyu; Kim, Nam Soo

    In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. In order to reduce computation while maintaining good performance, decision tree classifier is adopted with the closed loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. Through an evaluation test on a database covering both speech and music materials, the proposed method is found to achieve a much better mode selection accuracy compared with the open loop mode selection module in the AMR-WB+.

  11. Monsoon Forecasting based on Imbalanced Classification Techniques

    NASA Astrophysics Data System (ADS)

    Ribera, Pedro; Troncoso, Alicia; Asencio-Cortes, Gualberto; Vega, Inmaculada; Gallego, David

    2017-04-01

    Monsoonal systems are quasiperiodic processes of the climatic system that control seasonal precipitation over different regions of the world. The Western North Pacific Summer Monsoon (WNPSM) is one of those monsoons and it is known to have a great impact both over the global climate and over the total precipitation of very densely populated areas. The interannual variability of the WNPSM along the last 50-60 years has been related to different climatic indices such as El Niño, El Niño Modoki, the Indian Ocean Dipole or the Pacific Decadal Oscillation. Recently, a new and longer series characterizing the monthly evolution of the WNPSM, the WNP Directional Index (WNPDI), has been developed, extending its previous length from about 50 years to more than 100 years (1900-2007). Imbalanced classification techniques have been applied to the WNPDI in order to check the capability of traditional climate indices to capture and forecast the evolution of the WNPSM. The problem of forecasting has been transformed into a binary classification problem, in which the positive class represents the occurrence of an extreme monsoon event. Given that the number of extreme monsoons is much lower than the number of non-extreme monsoons, the resultant classification problem is highly imbalanced. The complete dataset is composed of 1296 instances, where only 71 (5.47%) samples correspond to extreme monsoons. Twenty predictor variables based on the cited climatic indices have been proposed, and namely, models based on trees, black box models such as neural networks, support vector machines and nearest neighbors, and finally ensemble-based techniques as random forests have been used in order to forecast the occurrence of extreme monsoons. It can be concluded that the methodology proposed here reports promising results according to the quality parameters evaluated and predicts extreme monsoons for a temporal horizon of a month with a high accuracy. From a climatological point of view, models based on trees show that the index of the El Niño Modoki in the months previous to an extreme monsoon acts as its best predictor. In most cases, the value of the Indian Ocean Dipole index acts as a second order classifier. But El Niño index, more frequently, or the Pacific Decadal Oscillation index, only in one case, do also modulate the intensity of the WNPSM in some cases.

  12. Developing collaborative classifiers using an expert-based model

    USGS Publications Warehouse

    Mountrakis, G.; Watts, R.; Luo, L.; Wang, Jingyuan

    2009-01-01

    This paper presents a hierarchical, multi-stage adaptive strategy for image classification. We iteratively apply various classification methods (e.g., decision trees, neural networks), identify regions of parametric and geographic space where accuracy is low, and in these regions, test and apply alternate methods repeating the process until the entire image is classified. Currently, classifiers are evaluated through human input using an expert-based system; therefore, this paper acts as the proof of concept for collaborative classifiers. Because we decompose the problem into smaller, more manageable sub-tasks, our classification exhibits increased flexibility compared to existing methods since classification methods are tailored to the idiosyncrasies of specific regions. A major benefit of our approach is its scalability and collaborative support since selected low-accuracy classifiers can be easily replaced with others without affecting classification accuracy in high accuracy areas. At each stage, we develop spatially explicit accuracy metrics that provide straightforward assessment of results by non-experts and point to areas that need algorithmic improvement or ancillary data. Our approach is demonstrated in the task of detecting impervious surface areas, an important indicator for human-induced alterations to the environment, using a 2001 Landsat scene from Las Vegas, Nevada. ?? 2009 American Society for Photogrammetry and Remote Sensing.

  13. T-RMSD: a web server for automated fine-grained protein structural classification.

    PubMed

    Magis, Cedrik; Di Tommaso, Paolo; Notredame, Cedric

    2013-07-01

    This article introduces the T-RMSD web server (tree-based on root-mean-square deviation), a service allowing the online computation of structure-based protein classification. It has been developed to address the relation between structural and functional similarity in proteins, and it allows a fine-grained structural clustering of a given protein family or group of structurally related proteins using distance RMSD (dRMSD) variations. These distances are computed between all pairs of equivalent residues, as defined by the ungapped columns within a given multiple sequence alignment. Using these generated distance matrices (one per equivalent position), T-RMSD produces a structural tree with support values for each cluster node, reminiscent of bootstrap values. These values, associated with the tree topology, allow a quantitative estimate of structural distances between proteins or group of proteins defined by the tree topology. The clusters thus defined have been shown to be structurally and functionally informative. The T-RMSD web server is a free website open to all users and available at http://tcoffee.crg.cat/apps/tcoffee/do:trmsd.

  14. T-RMSD: a web server for automated fine-grained protein structural classification

    PubMed Central

    Magis, Cedrik; Di Tommaso, Paolo; Notredame, Cedric

    2013-01-01

    This article introduces the T-RMSD web server (tree-based on root-mean-square deviation), a service allowing the online computation of structure-based protein classification. It has been developed to address the relation between structural and functional similarity in proteins, and it allows a fine-grained structural clustering of a given protein family or group of structurally related proteins using distance RMSD (dRMSD) variations. These distances are computed between all pairs of equivalent residues, as defined by the ungapped columns within a given multiple sequence alignment. Using these generated distance matrices (one per equivalent position), T-RMSD produces a structural tree with support values for each cluster node, reminiscent of bootstrap values. These values, associated with the tree topology, allow a quantitative estimate of structural distances between proteins or group of proteins defined by the tree topology. The clusters thus defined have been shown to be structurally and functionally informative. The T-RMSD web server is a free website open to all users and available at http://tcoffee.crg.cat/apps/tcoffee/do:trmsd. PMID:23716642

  15. Real-Time Speech/Music Classification With a Hierarchical Oblique Decision Tree

    DTIC Science & Technology

    2008-04-01

    REAL-TIME SPEECH/ MUSIC CLASSIFICATION WITH A HIERARCHICAL OBLIQUE DECISION TREE Jun Wang, Qiong Wu, Haojiang Deng, Qin Yan Institute of Acoustics...time speech/ music classification with a hierarchical oblique decision tree. A set of discrimination features in frequency domain are selected...handle signals without discrimination and can not work properly in the existence of multimedia signals. This paper proposes a real-time speech/ music

  16. Automated identification and geometrical features extraction of individual trees from Mobile Laser Scanning data in Budapest

    NASA Astrophysics Data System (ADS)

    Koma, Zsófia; Székely, Balázs; Folly-Ritvay, Zoltán; Skobrák, Ferenc; Koenig, Kristina; Höfle, Bernhard

    2016-04-01

    Mobile Laser Scanning (MLS) is an evolving operational measurement technique for urban environment providing large amounts of high resolution information about trees, street features, pole-like objects on the street sides or near to motorways. In this study we investigate a robust segmentation method to extract the individual trees automatically in order to build an object-based tree database system. We focused on the large urban parks in Budapest (Margitsziget and Városliget; KARESZ project) which contained large diversity of different kind of tree species. The MLS data contained high density point cloud data with 1-8 cm mean absolute accuracy 80-100 meter distance from streets. The robust segmentation method contained following steps: The ground points are determined first. As a second step cylinders are fitted in vertical slice 1-1.5 meter relative height above ground, which is used to determine the potential location of each single trees trunk and cylinder-like object. Finally, residual values are calculated as deviation of each point from a vertically expanded fitted cylinder; these residual values are used to separate cylinder-like object from individual trees. After successful parameterization, the model parameters and the corresponding residual values of the fitted object are extracted and imported into the tree database. Additionally, geometric features are calculated for each segmented individual tree like crown base, crown width, crown length, diameter of trunk, volume of the individual trees. In case of incompletely scanned trees, the extraction of geometric features is based on fitted circles. The result of the study is a tree database containing detailed information about urban trees, which can be a valuable dataset for ecologist, city planners, planting and mapping purposes. Furthermore, the established database will be the initial point for classification trees into single species. MLS data used in this project had been measured in the framework of KARESZ project for whole Budapest. BSz contributed as an Alexander von Humboldt Research Fellow.

  17. Comparison between Decision Tree and Genetic Programming to distinguish healthy from stroke postural sway patterns.

    PubMed

    Marrega, Luiz H G; Silva, Simone M; Manffra, Elisangela F; Nievola, Julio C

    2015-01-01

    Maintaining balance is a motor task of crucial importance for humans to perform their daily activities safely and independently. Studies in the field of Artificial Intelligence have considered different classification methods in order to distinguish healthy subjects from patients with certain motor disorders based on their postural strategies during the balance control. The main purpose of this paper is to compare the performance between Decision Tree (DT) and Genetic Programming (GP) - both classification methods of easy interpretation by health professionals - to distinguish postural sway patterns produced by healthy and stroke individuals based on 16 widely used posturographic variables. For this purpose, we used a posturographic dataset of time-series of center-of-pressure displacements derived from 19 stroke patients and 19 healthy matched subjects in three quiet standing tasks of balance control. Then, DT and GP models were trained and tested under two different experiments where accuracy, sensitivity and specificity were adopted as performance metrics. The DT method has performed statistically significant (P < 0.05) better in both cases, showing for example an accuracy of 72.8% against 69.2% from GP in the second experiment of this paper.

  18. Use of CHAID Decision Trees to Formulate Pathways for the Early Detection of Metabolic Syndrome in Young Adults

    PubMed Central

    Liu, Pei-Yang

    2014-01-01

    Metabolic syndrome (MetS) in young adults (age 20–39) is often undiagnosed. A simple screening tool using a surrogate measure might be invaluable in the early detection of MetS. Methods. A chi-squared automatic interaction detection (CHAID) decision tree analysis with waist circumference user-specified as the first level was used to detect MetS in young adults using data from the National Health and Nutrition Examination Survey (NHANES) 2009-2010 Cohort as a representative sample of the United States population (n = 745). Results. Twenty percent of the sample met the National Cholesterol Education Program Adult Treatment Panel III (NCEP) classification criteria for MetS. The user-specified CHAID model was compared to both CHAID model with no user-specified first level and logistic regression based model. This analysis identified waist circumference as a strong predictor in the MetS diagnosis. The accuracy of the final model with waist circumference user-specified as the first level was 92.3% with its ability to detect MetS at 71.8% which outperformed comparison models. Conclusions. Preliminary findings suggest that young adults at risk for MetS could be identified for further followup based on their waist circumference. Decision tree methods show promise for the development of a preliminary detection algorithm for MetS. PMID:24817904

  19. Knowledge discovery from patients' behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services.

    PubMed

    Zare Hosseini, Zeinab; Mohammadzadeh, Mahdi

    2016-01-01

    The rapid growing of information technology (IT) motivates and makes competitive advantages in health care industry. Nowadays, many hospitals try to build a successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction and finally maximize their profitability. Many hospitals have large data warehouses containing customer demographic and transactions information. Data mining techniques can be used to analyze this data and discover hidden knowledge of customers. This research develops an extended RFM model, namely RFML (added parameter: Length) based on health care services for a public sector hospital in Iran with the idea that there is contrast between patient and customer loyalty, to estimate customer life time value (CLV) for each patient. We used Two-step and K-means algorithms as clustering methods and Decision tree (CHAID) as classification technique to segment the patients to find out target, potential and loyal customers in order to implement strengthen CRM. Two approaches are used for classification: first, the result of clustering is considered as Decision attribute in classification process and second, the result of segmentation based on CLV value of patients (estimated by RFML) is considered as Decision attribute. Finally the results of CHAID algorithm show the significant hidden rules and identify existing patterns of hospital consumers.

  20. Knowledge discovery from patients’ behavior via clustering-classification algorithms based on weighted eRFM and CLV model: An empirical study in public health care services

    PubMed Central

    Zare Hosseini, Zeinab; Mohammadzadeh, Mahdi

    2016-01-01

    The rapid growing of information technology (IT) motivates and makes competitive advantages in health care industry. Nowadays, many hospitals try to build a successful customer relationship management (CRM) to recognize target and potential patients, increase patient loyalty and satisfaction and finally maximize their profitability. Many hospitals have large data warehouses containing customer demographic and transactions information. Data mining techniques can be used to analyze this data and discover hidden knowledge of customers. This research develops an extended RFM model, namely RFML (added parameter: Length) based on health care services for a public sector hospital in Iran with the idea that there is contrast between patient and customer loyalty, to estimate customer life time value (CLV) for each patient. We used Two-step and K-means algorithms as clustering methods and Decision tree (CHAID) as classification technique to segment the patients to find out target, potential and loyal customers in order to implement strengthen CRM. Two approaches are used for classification: first, the result of clustering is considered as Decision attribute in classification process and second, the result of segmentation based on CLV value of patients (estimated by RFML) is considered as Decision attribute. Finally the results of CHAID algorithm show the significant hidden rules and identify existing patterns of hospital consumers. PMID:27610177

  1. Automated structural classification of lipids by machine learning.

    PubMed

    Taylor, Ryan; Miller, Ryan H; Miller, Ryan D; Porter, Michael; Dalgleish, James; Prince, John T

    2015-03-01

    Modern lipidomics is largely dependent upon structural ontologies because of the great diversity exhibited in the lipidome, but no automated lipid classification exists to facilitate this partitioning. The size of the putative lipidome far exceeds the number currently classified, despite a decade of work. Automated classification would benefit ongoing classification efforts by decreasing the time needed and increasing the accuracy of classification while providing classifications for mass spectral identification algorithms. We introduce a tool that automates classification into the LIPID MAPS ontology of known lipids with >95% accuracy and novel lipids with 63% accuracy. The classification is based upon simple chemical characteristics and modern machine learning algorithms. The decision trees produced are intelligible and can be used to clarify implicit assumptions about the current LIPID MAPS classification scheme. These characteristics and decision trees are made available to facilitate alternative implementations. We also discovered many hundreds of lipids that are currently misclassified in the LIPID MAPS database, strongly underscoring the need for automated classification. Source code and chemical characteristic lists as SMARTS search strings are available under an open-source license at https://www.github.com/princelab/lipid_classifier. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  2. Bioinformatics in proteomics: application, terminology, and pitfalls.

    PubMed

    Wiemer, Jan C; Prokudin, Alexander

    2004-01-01

    Bioinformatics applies data mining, i.e., modern computer-based statistics, to biomedical data. It leverages on machine learning approaches, such as artificial neural networks, decision trees and clustering algorithms, and is ideally suited for handling huge data amounts. In this article, we review the analysis of mass spectrometry data in proteomics, starting with common pre-processing steps and using single decision trees and decision tree ensembles for classification. Special emphasis is put on the pitfall of overfitting, i.e., of generating too complex single decision trees. Finally, we discuss the pros and cons of the two different decision tree usages.

  3. SUPERFAMILY 1.75 including a domain-centric gene ontology method.

    PubMed

    de Lima Morais, David A; Fang, Hai; Rackham, Owen J L; Wilson, Derek; Pethica, Ralph; Chothia, Cyrus; Gough, Julian

    2011-01-01

    The SUPERFAMILY resource provides protein domain assignments at the structural classification of protein (SCOP) superfamily level for over 1400 completely sequenced genomes, over 120 metagenomes and other gene collections such as UniProt. All models and assignments are available to browse and download at http://supfam.org. A new hidden Markov model library based on SCOP 1.75 has been created and a previously ignored class of SCOP, coiled coils, is now included. Our scoring component now uses HMMER3, which is in orders of magnitude faster and produces superior results. A cloud-based pipeline was implemented and is publicly available at Amazon web services elastic computer cloud. The SUPERFAMILY reference tree of life has been improved allowing the user to highlight a chosen superfamily, family or domain architecture on the tree of life. The most significant advance in SUPERFAMILY is that now it contains a domain-based gene ontology (GO) at the superfamily and family levels. A new methodology was developed to ensure a high quality GO annotation. The new methodology is general purpose and has been used to produce domain-based phenotypic ontologies in addition to GO.

  4. Data-driven modeling of hydroclimatic trends and soil moisture: Multi-scale data integration and decision support

    NASA Astrophysics Data System (ADS)

    Coopersmith, Evan Joseph

    The techniques and information employed for decision-making vary with the spatial and temporal scope of the assessment required. In modern agriculture, the farm owner or manager makes decisions on a day-to-day or even hour-to-hour basis for dozens of fields scattered over as much as a fifty-mile radius from some central location. Following precipitation events, land begins to dry. Land-owners and managers often trace serpentine paths of 150+ miles every morning to inspect the conditions of their various parcels. His or her objective lies in appropriate resource usage -- is a given tract of land dry enough to be workable at this moment or would he or she be better served waiting patiently? Longer-term, these owners and managers decide upon which seeds will grow most effectively and which crops will make their operations profitable. At even longer temporal scales, decisions are made regarding which fields must be acquired and sold and what types of equipment will be necessary in future operations. This work develops and validates algorithms for these shorter-term decisions, along with models of national climate patterns and climate changes to enable longer-term operational planning. A test site at the University of Illinois South Farms (Urbana, IL, USA) served as the primary location to validate machine learning algorithms, employing public sources of precipitation and potential evapotranspiration to model the wetting/drying process. In expanding such local decision support tools to locations on a national scale, one must recognize the heterogeneity of hydroclimatic and soil characteristics throughout the United States. Machine learning algorithms modeling the wetting/drying process must address this variability, and yet it is wholly impractical to construct a separate algorithm for every conceivable location. For this reason, a national hydrological classification system is presented, allowing clusters of hydroclimatic similarity to emerge naturally from annual regime curve data and facilitate the development of cluster-specific algorithms. Given the desire to enable intelligent decision-making at any location, this classification system is developed in a manner that will allow for classification anywhere in the U.S., even in an ungauged basin. Daily time series data from 428 catchments in the MOPEX database are analyzed to produce an empirical classification tree, partitioning the United States into regions of hydroclimatic similarity. In constructing a classification tree based upon 55 years of data, it is important to recognize the non-stationary nature of climate data. The shifts in climatic regimes will cause certain locations to shift their ultimate position within the classification tree, requiring decision-makers to alter land usage, farming practices, and equipment needs, and algorithms to adjust accordingly. This work adapts the classification model to address the issue of regime shifts over larger temporal scales and suggests how land-usage and farming protocol may vary from hydroclimatic shifts in decades to come. Finally, the generalizability of the hydroclimatic classification system is tested with a physically-based soil moisture model calibrated at several locations throughout the continental United States. The soil moisture model is calibrated at a given site and then applied with the same parameters at other sites within and outside the same hydroclimatic class. The model's performance deteriorates minimally if the calibration and validation location are within the same hydroclimatic class, but deteriorates significantly if the calibration and validates sites are located in different hydroclimatic classes. These soil moisture estimates at the field scale are then further refined by the introduction of LiDAR elevation data, distinguishing faster-drying peaks and ridges from slower-drying valleys. The inclusion of LiDAR enabled multiple locations within the same field to be predicted accurately despite non-identical topography. This cross-application of parametric calibrations and LiDAR-driven disaggregation facilitates decision-support at locations without proximally-located soil moisture sensors.

  5. Effective and extensible feature extraction method using genetic algorithm-based frequency-domain feature search for epileptic EEG multiclassification

    PubMed Central

    Wen, Tingxi; Zhang, Zhongnan

    2017-01-01

    Abstract In this paper, genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance and intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in the extraction of effective features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy. PMID:28489789

  6. Effective and extensible feature extraction method using genetic algorithm-based frequency-domain feature search for epileptic EEG multiclassification.

    PubMed

    Wen, Tingxi; Zhang, Zhongnan

    2017-05-01

    In this paper, genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance and intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in the extraction of effective features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy.

  7. Machine learning models in breast cancer survival prediction.

    PubMed

    Montazeri, Mitra; Montazeri, Mohadeseh; Montazeri, Mahdieh; Beigzadeh, Amin

    2016-01-01

    Breast cancer is one of the most common cancers with a high mortality rate among women. With the early diagnosis of breast cancer survival will increase from 56% to more than 86%. Therefore, an accurate and reliable system is necessary for the early diagnosis of this cancer. The proposed model is the combination of rules and different machine learning techniques. Machine learning models can help physicians to reduce the number of false decisions. They try to exploit patterns and relationships among a large number of cases and predict the outcome of a disease using historical cases stored in datasets. The objective of this study is to propose a rule-based classification method with machine learning techniques for the prediction of different types of Breast cancer survival. We use a dataset with eight attributes that include the records of 900 patients in which 876 patients (97.3%) and 24 (2.7%) patients were females and males respectively. Naive Bayes (NB), Trees Random Forest (TRF), 1-Nearest Neighbor (1NN), AdaBoost (AD), Support Vector Machine (SVM), RBF Network (RBFN), and Multilayer Perceptron (MLP) machine learning techniques with 10-cross fold technique were used with the proposed model for the prediction of breast cancer survival. The performance of machine learning techniques were evaluated with accuracy, precision, sensitivity, specificity, and area under ROC curve. Out of 900 patients, 803 patients and 97 patients were alive and dead, respectively. In this study, Trees Random Forest (TRF) technique showed better results in comparison to other techniques (NB, 1NN, AD, SVM and RBFN, MLP). The accuracy, sensitivity and the area under ROC curve of TRF are 96%, 96%, 93%, respectively. However, 1NN machine learning technique provided poor performance (accuracy 91%, sensitivity 91% and area under ROC curve 78%). This study demonstrates that Trees Random Forest model (TRF) which is a rule-based classification model was the best model with the highest level of accuracy. Therefore, this model is recommended as a useful tool for breast cancer survival prediction as well as medical decision making.

  8. Predicting 30-day Hospital Readmission with Publicly Available Administrative Database. A Conditional Logistic Regression Modeling Approach.

    PubMed

    Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P

    2015-01-01

    This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict what patients are at risk to be readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex to understand among the hospital practitioners. Explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaning decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of more than 10% over the standard classification models, which can be translated to correct labeling of additional 400 - 500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictor identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise the awareness of collecting data on additional markers and developing necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.

  9. Molecular systematics of the barklouse family Psocidae (Insecta: Psocodea: 'Psocoptera') and implications for morphological and behavioral evolution.

    PubMed

    Yoshizawa, Kazunori; Johnson, Kevin P

    2008-02-01

    We evaluated the higher level classification within the family Psocidae (Insecta: Psocodea: 'Psocoptera') based on combined analyses of nuclear 18S, Histone 3, wingless and mitochondrial 12S, 16S and COI gene sequences. Various analyses (inclusion/exclusion of incomplete taxa and/or rapidly evolving genes, data partitioning, and analytical method selection) all provided similar results, which were generally concordant with relationships inferred using morphological observations. Based on the phylogenetic trees estimated for Psocidae, we propose a revised higher level classification of this family, although uncertainty still exists regarding some aspects of this classification. This classification includes a basal division into two subfamilies, 'Amphigerontiinae' (possibly paraphyletic) and Psocinae. The Amphigerontiinae is divided into the tribes Kaindipsocini (new tribe), Blastini, Amphigerontini, and Stylatopsocini. Psocinae is divided into the tribes 'Ptyctini' (probably paraphyletic), Psocini, Atrichadenotecnini (new tribe), Sigmatoneurini, Metylophorini, and Thyrsophorini (the latter includes the taxon previously recognized as Cerastipsocini). We examined the evolution of symmetric/asymmetric male genitalia over this tree and found this character to be quite homoplasious.

  10. [Study on extraction method of Panax notoginseng plots in Wenshan of Yunnan province based on decision tree model].

    PubMed

    Shi, Ting-Ting; Zhang, Xiao-Bo; Guo, Lan-Ping; Huang, Lu-Qi

    2017-11-01

    The herbs used as the material for traditional Chinese medicine are always planted in the mountainous area where the natural environment is suitable. As the mountain terrain is complex and the distribution of planting plots is scattered, the traditional survey method is difficult to obtain accurate planting area. It is of great significance to provide decision support for the conservation and utilization of traditional Chinese medicine resources by studying the method of extraction of Chinese herbal medicine planting area based on remote sensing and realizing the dynamic monitoring and reserve estimation of Chinese herbal medicines. In this paper, taking the Panax notoginseng plots in Wenshan prefecture of Yunnan province as an example, the China-made GF-1multispectral remote sensing images with a 16 m×16 m resolution were obtained. Then, the time series that can reflect the difference of spectrum of P. notoginseng shed and the background objects were selected to the maximum extent, and the decision tree model of extraction the of P. notoginseng plots was constructed according to the spectral characteristics of the surface features. The results showed that the remote sensing classification method based on the decision tree model could extract P. notoginseng plots in the study area effectively. The method can provide technical support for extraction of P. notoginseng plots at county level. Copyright© by the Chinese Pharmaceutical Association.

  11. Spectral and spatial resolution analysis of multi sensor satellite data for coral reef mapping: Tioman Island, Malaysia

    NASA Astrophysics Data System (ADS)

    Pradhan, Biswajeet; Kabiri, Keivan

    2012-07-01

    This paper describes an assessment of coral reef mapping using multi sensor satellite images such as Landsat ETM, SPOT and IKONOS images for Tioman Island, Malaysia. The study area is known to be one of the best Islands in South East Asia for its unique collection of diversified coral reefs and serves host to thousands of tourists every year. For the coral reef identification, classification and analysis, Landsat ETM, SPOT and IKONOS images were collected processed and classified using hierarchical classification schemes. At first, Decision tree classification method was implemented to separate three main land cover classes i.e. water, rural and vegetation and then maximum likelihood supervised classification method was used to classify these main classes. The accuracy of the classification result is evaluated by a separated test sample set, which is selected based on the fieldwork survey and view interpretation from IKONOS image. Few types of ancillary data in used are: (a) DGPS ground control points; (b) Water quality parameters measured by Hydrolab DS4a; (c) Sea-bed substrates spectrum measured by Unispec and; (d) Landcover observation photos along Tioman island coastal area. The overall accuracy of the final classification result obtained was 92.25% with the kappa coefficient is 0.8940. Key words: Coral reef, Multi-spectral Segmentation, Pixel-Based Classification, Decision Tree, Tioman Island

  12. Building Diversified Multiple Trees for classification in high dimensional noisy biomedical data.

    PubMed

    Li, Jiuyong; Liu, Lin; Liu, Jixue; Green, Ryan

    2017-12-01

    It is common that a trained classification model is applied to the operating data that is deviated from the training data because of noise. This paper will test an ensemble method, Diversified Multiple Tree (DMT), on its capability for classifying instances in a new laboratory using the classifier built on the instances of another laboratory. DMT is tested on three real world biomedical data sets from different laboratories in comparison with four benchmark ensemble methods, AdaBoost, Bagging, Random Forests, and Random Trees. Experiments have also been conducted on studying the limitation of DMT and its possible variations. Experimental results show that DMT is significantly more accurate than other benchmark ensemble classifiers on classifying new instances of a different laboratory from the laboratory where instances are used to build the classifier. This paper demonstrates that an ensemble classifier, DMT, is more robust in classifying noisy data than other widely used ensemble methods. DMT works on the data set that supports multiple simple trees.

  13. BAYESIAN METHODS FOR REGIONAL-SCALE EUTROPHICATION MODELS. (R830887)

    EPA Science Inventory

    We demonstrate a Bayesian classification and regression tree (CART) approach to link multiple environmental stressors to biological responses and quantify uncertainty in model predictions. Such an approach can: (1) report prediction uncertainty, (2) be consistent with the amou...

  14. Feature Relevance Assessment of Multispectral Airborne LIDAR Data for Tree Species Classification

    NASA Astrophysics Data System (ADS)

    Amiri, N.; Heurich, M.; Krzystek, P.; Skidmore, A. K.

    2018-04-01

    The presented experiment investigates the potential of Multispectral Laser Scanning (MLS) point clouds for single tree species classification. The basic idea is to simulate a MLS sensor by combining two different Lidar sensors providing three different wavelngthes. The available data were acquired in the summer 2016 at the same date in a leaf-on condition with an average point density of 37 points/m2. For the purpose of classification, we segmented the combined 3D point clouds consisiting of three different spectral channels into 3D clusters using Normalized Cut segmentation approach. Then, we extracted four group of features from the 3D point cloud space. Once a varity of features has been extracted, we applied forward stepwise feature selection in order to reduce the number of irrelevant or redundant features. For the classification, we used multinomial logestic regression with L1 regularization. Our study is conducted using 586 ground measured single trees from 20 sample plots in the Bavarian Forest National Park, in Germany. Due to lack of reference data for some rare species, we focused on four classes of species. The results show an improvement between 4-10 pp for the tree species classification by using MLS data in comparison to a single wavelength based approach. A cross validated (15-fold) accuracy of 0.75 can be achieved when all feature sets from three different spectral channels are used. Our results cleary indicates that the use of MLS point clouds has great potential to improve detailed forest species mapping.

  15. Explicit area-based accuracy assessment for mangrove tree crown delineation using Geographic Object-Based Image Analysis (GEOBIA)

    NASA Astrophysics Data System (ADS)

    Kamal, Muhammad; Johansen, Kasper

    2017-10-01

    Effective mangrove management requires spatially explicit information of mangrove tree crown map as a basis for ecosystem diversity study and health assessment. Accuracy assessment is an integral part of any mapping activities to measure the effectiveness of the classification approach. In geographic object-based image analysis (GEOBIA) the assessment of the geometric accuracy (shape, symmetry and location) of the created image objects from image segmentation is required. In this study we used an explicit area-based accuracy assessment to measure the degree of similarity between the results of the classification and reference data from different aspects, including overall quality (OQ), user's accuracy (UA), producer's accuracy (PA) and overall accuracy (OA). We developed a rule set to delineate the mangrove tree crown using WorldView-2 pan-sharpened image. The reference map was obtained by visual delineation of the mangrove tree crowns boundaries form a very high-spatial resolution aerial photograph (7.5cm pixel size). Ten random points with a 10 m radius circular buffer were created to calculate the area-based accuracy assessment. The resulting circular polygons were used to clip both the classified image objects and reference map for area comparisons. In this case, the area-based accuracy assessment resulted 64% and 68% for the OQ and OA, respectively. The overall quality of the calculation results shows the class-related area accuracy; which is the area of correctly classified as tree crowns was 64% out of the total area of tree crowns. On the other hand, the overall accuracy of 68% was calculated as the percentage of all correctly classified classes (tree crowns and canopy gaps) in comparison to the total class area (an entire image). Overall, the area-based accuracy assessment was simple to implement and easy to interpret. It also shows explicitly the omission and commission error variations of object boundary delineation with colour coded polygons.

  16. Classification of Liss IV Imagery Using Decision Tree Methods

    NASA Astrophysics Data System (ADS)

    Verma, Amit Kumar; Garg, P. K.; Prasad, K. S. Hari; Dadhwal, V. K.

    2016-06-01

    Image classification is a compulsory step in any remote sensing research. Classification uses the spectral information represented by the digital numbers in one or more spectral bands and attempts to classify each individual pixel based on this spectral information. Crop classification is the main concern of remote sensing applications for developing sustainable agriculture system. Vegetation indices computed from satellite images gives a good indication of the presence of vegetation. It is an indicator that describes the greenness, density and health of vegetation. Texture is also an important characteristics which is used to identifying objects or region of interest is an image. This paper illustrate the use of decision tree method to classify the land in to crop land and non-crop land and to classify different crops. In this paper we evaluate the possibility of crop classification using an integrated approach methods based on texture property with different vegetation indices for single date LISS IV sensor 5.8 meter high spatial resolution data. Eleven vegetation indices (NDVI, DVI, GEMI, GNDVI, MSAVI2, NDWI, NG, NR, NNIR, OSAVI and VI green) has been generated using green, red and NIR band and then image is classified using decision tree method. The other approach is used integration of texture feature (mean, variance, kurtosis and skewness) with these vegetation indices. A comparison has been done between these two methods. The results indicate that inclusion of textural feature with vegetation indices can be effectively implemented to produce classifiedmaps with 8.33% higher accuracy for Indian satellite IRS-P6, LISS IV sensor images.

  17. Regional Estimates of Drought-Induced Tree Canopy Loss across Texas

    NASA Astrophysics Data System (ADS)

    Schwantes, A.; Swenson, J. J.; González-Roglich, M.; Johnson, D. M.; Domec, J. C.; Jackson, R. B.

    2015-12-01

    The severe drought of 2011 killed millions of trees across the state of Texas. Drought-induced tree-mortality can have significant impacts to carbon cycling, regional biophysics, and community composition. We quantified canopy cover loss across the state using remotely sensed imagery from before and after the drought at multiple scales. First, we classified ~200 orthophotos (1-m spatial resolution) from the National Agriculture Imagery Program, using a supervised maximum likelihood classification. Area of canopy cover loss in these classifications was highly correlated (R2 = 0.8) with ground estimates of canopy cover loss, measured in 74 plots across 15 different sites in Texas. These 1-m orthophoto classifications were then used to calibrate and validate coarser scale (30-m) Landsat imagery to create wall-to-wall tree canopy cover loss maps across the state of Texas. We quantified percent dead and live canopy within each pixel of Landsat to create continuous maps of dead and live tree cover, using two approaches: (1) a zero-inflated beta distribution model and (2) a random forest algorithm. Widespread canopy loss occurred across all the major natural systems of Texas, with the Edwards Plateau region most affected. In this region, on average, 10% of the forested area was lost due to the 2011 drought. We also identified climatic thresholds that controlled the spatial distribution of tree canopy loss across the state. However, surprisingly, there were many local hot spots of canopy loss, suggesting that not only climatic factors could explain the spatial patterns of canopy loss, but rather other factors related to soil, landscape, management, and stand density also likely played a role. As increases in extreme droughts are predicted to occur with climate change, it will become important to define methods that can detect associated drought-induced tree mortality across large regions. These maps could then be used (1) to quantify impacts to carbon cycling and regional biophysics, (2) to better understand the spatiotemporal dynamics of tree mortality, and (3) to calibrate and/or validate mortality algorithms in regional models.

  18. Retinal artery-vein classification via topology estimation

    PubMed Central

    Estrada, Rolando; Allingham, Michael J.; Mettu, Priyatham S.; Cousins, Scott W.; Tomasi, Carlo; Farsiu, Sina

    2015-01-01

    We propose a novel, graph-theoretic framework for distinguishing arteries from veins in a fundus image. We make use of the underlying vessel topology to better classify small and midsized vessels. We extend our previously proposed tree topology estimation framework by incorporating expert, domain-specific features to construct a simple, yet powerful global likelihood model. We efficiently maximize this model by iteratively exploring the space of possible solutions consistent with the projected vessels. We tested our method on four retinal datasets and achieved classification accuracies of 91.0%, 93.5%, 91.7%, and 90.9%, outperforming existing methods. Our results show the effectiveness of our approach, which is capable of analyzing the entire vasculature, including peripheral vessels, in wide field-of-view fundus photographs. This topology-based method is a potentially important tool for diagnosing diseases with retinal vascular manifestation. PMID:26068204

  19. Tree species classification using within crown localization of waveform LiDAR attributes

    NASA Astrophysics Data System (ADS)

    Blomley, Rosmarie; Hovi, Aarne; Weinmann, Martin; Hinz, Stefan; Korpela, Ilkka; Jutzi, Boris

    2017-11-01

    Since forest planning is increasingly taking an ecological, diversity-oriented perspective into account, remote sensing technologies are becoming ever more important in assessing existing resources with reduced manual effort. While the light detection and ranging (LiDAR) technology provides a good basis for predictions of tree height and biomass, tree species identification based on this type of data is particularly challenging in structurally heterogeneous forests. In this paper, we analyse existing approaches with respect to the geometrical scale of feature extraction (whole tree, within crown partitions or within laser footprint) and conclude that currently features are always extracted separately from the different scales. Since multi-scale approaches however have proven successful in other applications, we aim to utilize the within-tree-crown distribution of within-footprint signal characteristics as additional features. To do so, a spin image algorithm, originally devised for the extraction of 3D surface features in object recognition, is adapted. This algorithm relies on spinning an image plane around a defined axis, e.g. the tree stem, collecting the number of LiDAR returns or mean values of returns attributes per pixel as respective values. Based on this representation, spin image features are extracted that comprise only those components of highest variability among a given set of library trees. The relative performance and the combined improvement of these spin image features with respect to non-spatial statistical metrics of the waveform (WF) attributes are evaluated for the tree species classification of Scots pine (Pinus sylvestris L.), Norway spruce (Picea abies (L.) Karst.) and Silver/Downy birch (Betula pendula Roth/Betula pubescens Ehrh.) in a boreal forest environment. This evaluation is performed for two WF LiDAR datasets that differ in footprint size, pulse density at ground, laser wavelength and pulse width. Furthermore, we evaluate the robustness of the proposed method with respect to internal parameters and tree size. The results reveal, that the consideration of the crown-internal distribution of within-footprint signal characteristics captured in spin image features improves the classification results in nearly all test cases.

  20. Land cover and land use mapping of the iSimangaliso Wetland Park, South Africa: comparison of oblique and orthogonal random forest algorithms

    NASA Astrophysics Data System (ADS)

    Bassa, Zaakirah; Bob, Urmilla; Szantoi, Zoltan; Ismail, Riyad

    2016-01-01

    In recent years, the popularity of tree-based ensemble methods for land cover classification has increased significantly. Using WorldView-2 image data, we evaluate the potential of the oblique random forest algorithm (oRF) to classify a highly heterogeneous protected area. In contrast to the random forest (RF) algorithm, the oRF algorithm builds multivariate trees by learning the optimal split using a supervised model. The oRF binary algorithm is adapted to a multiclass land cover and land use application using both the "one-against-one" and "one-against-all" combination approaches. Results show that the oRF algorithms are capable of achieving high classification accuracies (>80%). However, there was no statistical difference in classification accuracies obtained by the oRF algorithms and the more popular RF algorithm. For all the algorithms, user accuracies (UAs) and producer accuracies (PAs) >80% were recorded for most of the classes. Both the RF and oRF algorithms poorly classified the indigenous forest class as indicated by the low UAs and PAs. Finally, the results from this study advocate and support the utility of the oRF algorithm for land cover and land use mapping of protected areas using WorldView-2 image data.

  1. Vlsi implementation of flexible architecture for decision tree classification in data mining

    NASA Astrophysics Data System (ADS)

    Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak

    2017-07-01

    The Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a terrific raise in the size of the data being collected and analyzed. Classification is the main difficulty faced in data mining. In a number of the solutions developed for this problem, most accepted one is Decision Tree Classification (DTC) that gives high precision while handling very large amount of data. This paper presents VLSI implementation of flexible architecture for Decision Tree classification in data mining using c4.5 algorithm.

  2. Reconstruction of forest geometries from terrestrial laser scanning point clouds for canopy radiative transfer modelling

    NASA Astrophysics Data System (ADS)

    Bremer, Magnus; Schmidtner, Korbinian; Rutzinger, Martin

    2015-04-01

    The architecture of forest canopies is a key parameter for forest ecological issues helping to model the variability of wood biomass and foliage in space and time. In order to understand the nature of subpixel effects of optical space-borne sensors with coarse spatial resolution, hypothetical 3D canopy models are widely used for the simulation of radiative transfer in forests. Thereby, radiation is traced through the atmosphere and canopy geometries until it reaches the optical sensor. For a realistic simulation scene we decompose terrestrial laser scanning point cloud data of leaf-off larch forest plots in the Austrian Alps and reconstruct detailed model ready input data for radiative transfer simulations. The point clouds are pre-classified into primitive classes using Principle Component Analysis (PCA) using scale adapted radius neighbourhoods. Elongated point structures are extracted as tree trunks. The tree trunks are used as seeds for a Dijkstra-growing procedure, in order to obtain single tree segmentation in the interlinked canopies. For the optimized reconstruction of branching architectures as vector models, point cloud skeletonisation is used in combination with an iterative Dijkstra-growing and by applying distance constraints. This allows conducting a hierarchical reconstruction preferring the tree trunk and higher order branches and avoiding over-skeletonization effects. Based on the reconstructed branching architectures, larch needles are modelled based on the hierarchical level of branches and the geometrical openness of the canopy. For radiative transfer simulations, branch architectures are used as mesh geometries representing branches as cylindrical pipes. Needles are either used as meshes or as voxel-turbids. The presented workflow allows an automatic classification and single tree segmentation in interlinked canopies. The iterative Dijkstra-growing using distance constraints generated realistic reconstruction results. As the mesh representation of branches proved to be sufficient for the simulation approach, the modelling of huge amounts of needles is much more efficient in voxel-turbid representation.

  3. Phylogeny of the cycads based on multiple single copy nuclear genes: congruence of concatenation and species tree inference methods

    USDA-ARS?s Scientific Manuscript database

    Despite a recent new classification, a stable tree of life for the cycads has been elusive, particularly regarding resolution of Bowenia, Stangeria and Dioon. In this study we apply five single copy nuclear genes (SCNGs) to the phylogeny of the order Cycadales. We specifically aim to evaluate seve...

  4. Post-fire tree establishment patterns at the alpine treeline ecotone: Mount Rainier National Park, Washington, USA

    Treesearch

    Kirk M. Stueve; Dawna L. Cerney; Regina M. Rochefort; Laurie L. Kurth

    2009-01-01

    We performed classification analysis of 1970 satellite imagery and 2003 aerial photography to delineate establishment. Local site conditions were calculated from a LIDAR-based DEM, ancillary climate data, and 1970 tree locations in a GIS. We used logistic regression on a spatially weighted landscape matrix to rank variables.

  5. An analysis of tree mortality using high resolution remotely-sensed data for mixed-conifer forests in San Diego county

    NASA Astrophysics Data System (ADS)

    Freeman, Mary Pyott

    ABSTRACT An Analysis of Tree Mortality Using High Resolution Remotely-Sensed Data for Mixed-Conifer Forests in San Diego County by Mary Pyott Freeman The montane mixed-conifer forests of San Diego County are currently experiencing extensive tree mortality, which is defined as dieback where whole stands are affected. This mortality is likely the result of the complex interaction of many variables, such as altered fire regimes, climatic conditions such as drought, as well as forest pathogens and past management strategies. Conifer tree mortality and its spatial pattern and change over time were examined in three components. In component 1, two remote sensing approaches were compared for their effectiveness in delineating dead trees, a spatial contextual approach and an OBIA (object based image analysis) approach, utilizing various dates and spatial resolutions of airborne image data. For each approach transforms and masking techniques were explored, which were found to improve classifications, and an object-based assessment approach was tested. In component 2, dead tree maps produced by the most effective techniques derived from component 1 were utilized for point pattern and vector analyses to further understand spatio-temporal changes in tree mortality for the years 1997, 2000, 2002, and 2005 for three study areas: Palomar, Volcan and Laguna mountains. Plot-based fieldwork was conducted to further assess mortality patterns. Results indicate that conifer mortality was significantly clustered, increased substantially between 2002 and 2005, and was non-random with respect to tree species and diameter class sizes. In component 3, multiple environmental variables were used in Generalized Linear Model (GLM-logistic regression) and decision tree classifier model development, revealing the importance of climate and topographic factors such as precipitation and elevation, in being able to predict areas of high risk for tree mortality. The results from this study highlight the importance of multi-scale spatial as well as temporal analyses, in order to understand mixed-conifer forest structure, dynamics, and processes of decline, which can lead to more sustainable management of forests with continued natural and anthropogenic disturbance.

  6. A novel tree-based procedure for deciphering the genomic spectrum of clinical disease entities.

    PubMed

    Mbogning, Cyprien; Perdry, Hervé; Toussile, Wilson; Broët, Philippe

    2014-01-01

    Dissecting the genomic spectrum of clinical disease entities is a challenging task. Recursive partitioning (or classification trees) methods provide powerful tools for exploring complex interplay among genomic factors, with respect to a main factor, that can reveal hidden genomic patterns. To take confounding variables into account, the partially linear tree-based regression (PLTR) model has been recently published. It combines regression models and tree-based methodology. It is however computationally burdensome and not well suited for situations for which a large number of exploratory variables is expected. We developed a novel procedure that represents an alternative to the original PLTR procedure, and considered different selection criteria. A simulation study with different scenarios has been performed to compare the performances of the proposed procedure to the original PLTR strategy. The proposed procedure with a Bayesian Information Criterion (BIC) achieved good performances to detect the hidden structure as compared to the original procedure. The novel procedure was used for analyzing patterns of copy-number alterations in lung adenocarcinomas, with respect to Kirsten Rat Sarcoma Viral Oncogene Homolog gene (KRAS) mutation status, while controlling for a cohort effect. Results highlight two subgroups of pure or nearly pure wild-type KRAS tumors with particular copy-number alteration patterns. The proposed procedure with a BIC criterion represents a powerful and practical alternative to the original procedure. Our procedure performs well in a general framework and is simple to implement.

  7. Classification of document page images based on visual similarity of layout structures

    NASA Astrophysics Data System (ADS)

    Shin, Christian K.; Doermann, David S.

    1999-12-01

    Searching for documents by their type or genre is a natural way to enhance the effectiveness of document retrieval. The layout of a document contains a significant amount of information that can be used to classify a document's type in the absence of domain specific models. A document type or genre can be defined by the user based primarily on layout structure. Our classification approach is based on 'visual similarity' of the layout structure by building a supervised classifier, given examples of the class. We use image features, such as the percentages of tex and non-text (graphics, image, table, and ruling) content regions, column structures, variations in the point size of fonts, the density of content area, and various statistics on features of connected components which can be derived from class samples without class knowledge. In order to obtain class labels for training samples, we conducted a user relevance test where subjects ranked UW-I document images with respect to the 12 representative images. We implemented our classification scheme using the OC1, a decision tree classifier, and report our findings.

  8. Predicting Tillage Patterns in the Tiffin River Watershed Using Remote Sensing Methods

    NASA Astrophysics Data System (ADS)

    Brooks, C.; McCarty, J. L.; Dean, D. B.; Mann, B. F.

    2012-12-01

    Previous research in tillage mapping has focused primarily on utilizing low to no-cost, moderate (30 m to 15 m) resolution satellite data. Successful data processing techniques published in the scientific literature have focused on extracting and/or classifying tillage patterns through manipulation of spectral bands. For instance, Daughtry et al. (2005) evaluated several spectral indices for crop residue cover using satellite multispectral and hyperspectral data and to categorize soil tillage intensity in agricultural fields. A weak to moderate relationship between Landsat Thematic Mapper (TM) indices and crop residue cover was found; similar results were reported in Minnesota. Building on the findings from the scientific literature and previous work done by MTRI in the heavily agricultural Tiffin watershed of northwest Ohio and southeast Michigan, a decision tree classifier approach (also referred to as a classification tree) was used, linking several satellite data to on-the-ground tillage information in order to boost classification results. This approach included five tillage indices and derived products. A decision tree methodology enabled the development of statistically optimized (i.e., minimizing misclassification rates) classification algorithms at various desired time steps: monthly, seasonally, and annual over the 2006-2010 time period. Due to their flexibility, processing speed, and availability within all major remote sensing and statistical software packages, decision trees can ingest several data inputs from multiple sensors and satellite products, selecting only the bands, band ratios, indices, and products that further reduce misclassification errors. The project team created crop-specific tillage pattern classification trees whereby a training data set (~ 50% of available ground data) was created for production of the actual decision tree and a validation data set was set aside (~ 50% of available ground data) in order to assess the accuracy of the classification. A seasonal time step was used, optimizing a decision tree based on seasonal ground data for tillage patterns and satellite data and products for years 2006 through 2010. Annual crop type maps derived by the project team and the USDA Cropland Data Layer project was used an input to understand locations of corn, soybeans, wheat, etc. on a yearly basis. As previously stated, the robustness of the decision tree approach is the ability to implement various satellite data and products across temporal, spectral, and spatial resolutions, thereby improving the resulting classification and providing a reliable method that is not sensor-dependent. Tillage pattern classification from satellite imagery is not a simple task and has proven a challenge to previous researchers investigating this remote sensing topic. The team's decision tree method produced a practical, usable output within a focused project time period. Daughtry, C.S.T., Hunt Jr., E.R., Doraiswamy, P.C., McMurtrey III, J.E. 2005. Remote sensing the spatial distribution of crop residues. Agron. J. 97, 864-871.

  9. Multiclassifier system with hybrid learning applied to the control of bioprosthetic hand.

    PubMed

    Kurzynski, Marek; Krysmann, Maciej; Trajdos, Pawel; Wolczowski, Andrzej

    2016-02-01

    In this paper the problem of recognition of the intended hand movements for the control of bio-prosthetic hand is addressed. The proposed method is based on recognition of electromiographic (EMG) and mechanomiographic (MMG) biosignals using a multiclassifier system (MCS) working in a two-level structure with a dynamic ensemble selection (DES) scheme and original concepts of competence function. Additionally, feedback information coming from bioprosthesis sensors on the correct/incorrect classification is applied to the adjustment of the combining mechanism during MCS operation through adaptive tuning competences of base classifiers depending on their decisions. Three MCS systems operating in decision tree structure and with different tuning algorithms are developed. In the MCS1 system, competence is uniformly allocated to each class belonging to the group indicated by the feedback signal. In the MCS2 system, the modification of competence depends on the node of decision tree at which a correct/incorrect classification is made. In the MCS3 system, the randomized model of classifier and the concept of cross-competence are used in the tuning procedure. Experimental investigations on the real data and computer-simulated procedure of generating feedback signals are performed. In these investigations classification accuracy of the MCS systems developed is compared and furthermore, the MCS systems are evaluated with respect to the effectiveness of the procedure of tuning competence. The results obtained indicate that modification of competence of base classifiers during the working phase essentially improves performance of the MCS system and that this improvement depends on the MCS system and tuning method used. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Spatial modeling and classification of corneal shape.

    PubMed

    Marsolo, Keith; Twa, Michael; Bullimore, Mark A; Parthasarathy, Srinivasan

    2007-03-01

    One of the most promising applications of data mining is in biomedical data used in patient diagnosis. Any method of data analysis intended to support the clinical decision-making process should meet several criteria: it should capture clinically relevant features, be computationally feasible, and provide easily interpretable results. In an initial study, we examined the feasibility of using Zernike polynomials to represent biomedical instrument data in conjunction with a decision tree classifier to distinguish between the diseased and non-diseased eyes. Here, we provide a comprehensive follow-up to that work, examining a second representation, pseudo-Zernike polynomials, to determine whether they provide any increase in classification accuracy. We compare the fidelity of both methods using residual root-mean-square (rms) error and evaluate accuracy using several classifiers: neural networks, C4.5 decision trees, Voting Feature Intervals, and Naïve Bayes. We also examine the effect of several meta-learning strategies: boosting, bagging, and Random Forests (RFs). We present results comparing accuracy as it relates to dataset and transformation resolution over a larger, more challenging, multi-class dataset. They show that classification accuracy is similar for both data transformations, but differs by classifier. We find that the Zernike polynomials provide better feature representation than the pseudo-Zernikes and that the decision trees yield the best balance of classification accuracy and interpretability.

  11. Welcome to pandoraviruses at the ‘Fourth TRUC’ club

    PubMed Central

    Sharma, Vikas; Colson, Philippe; Chabrol, Olivier; Scheid, Patrick; Pontarotti, Pierre; Raoult, Didier

    2015-01-01

    Nucleocytoplasmic large DNA viruses, or representatives of the proposed order Megavirales, belong to families of giant viruses that infect a broad range of eukaryotic hosts. Megaviruses have been previously described to comprise a fourth monophylogenetic TRUC (things resisting uncompleted classification) together with cellular domains in the universal tree of life. Recently described pandoraviruses have large (1.9–2.5 MB) and highly divergent genomes. In the present study, we updated the classification of pandoraviruses and other reported giant viruses. Phylogenetic trees were constructed based on six informational genes. Hierarchical clustering was performed based on a set of informational genes from Megavirales members and cellular organisms. Homologous sequences were selected from cellular organisms using TimeTree software, comprising comprehensive, and representative sets of members from Bacteria, Archaea, and Eukarya. Phylogenetic analyses based on three conserved core genes clustered pandoraviruses with phycodnaviruses, exhibiting their close relatedness. Additionally, hierarchical clustering analyses based on informational genes grouped pandoraviruses with Megavirales members as a super group distinct from cellular organisms. Thus, the analyses based on core conserved genes revealed that pandoraviruses are new genuine members of the ‘Fourth TRUC’ club, encompassing distinct life forms compared with cellular organisms. PMID:26042093

  12. Welcome to pandoraviruses at the 'Fourth TRUC' club.

    PubMed

    Sharma, Vikas; Colson, Philippe; Chabrol, Olivier; Scheid, Patrick; Pontarotti, Pierre; Raoult, Didier

    2015-01-01

    Nucleocytoplasmic large DNA viruses, or representatives of the proposed order Megavirales, belong to families of giant viruses that infect a broad range of eukaryotic hosts. Megaviruses have been previously described to comprise a fourth monophylogenetic TRUC (things resisting uncompleted classification) together with cellular domains in the universal tree of life. Recently described pandoraviruses have large (1.9-2.5 MB) and highly divergent genomes. In the present study, we updated the classification of pandoraviruses and other reported giant viruses. Phylogenetic trees were constructed based on six informational genes. Hierarchical clustering was performed based on a set of informational genes from Megavirales members and cellular organisms. Homologous sequences were selected from cellular organisms using TimeTree software, comprising comprehensive, and representative sets of members from Bacteria, Archaea, and Eukarya. Phylogenetic analyses based on three conserved core genes clustered pandoraviruses with phycodnaviruses, exhibiting their close relatedness. Additionally, hierarchical clustering analyses based on informational genes grouped pandoraviruses with Megavirales members as a super group distinct from cellular organisms. Thus, the analyses based on core conserved genes revealed that pandoraviruses are new genuine members of the 'Fourth TRUC' club, encompassing distinct life forms compared with cellular organisms.

  13. Seeing the wood for the trees: a forest of methods for optimization and omic-network integration in metabolic modelling.

    PubMed

    Vijayakumar, Supreeta; Conway, Max; Lió, Pietro; Angione, Claudio

    2017-05-30

    Metabolic modelling has entered a mature phase with dozens of methods and software implementations available to the practitioner and the theoretician. It is not easy for a modeller to be able to see the wood (or the forest) for the trees. Driven by this analogy, we here present a 'forest' of principal methods used for constraint-based modelling in systems biology. This provides a tree-based view of methods available to prospective modellers, also available in interactive version at http://modellingmetabolism.net, where it will be kept updated with new methods after the publication of the present manuscript. Our updated classification of existing methods and tools highlights the most promising in the different branches, with the aim to develop a vision of how existing methods could hybridize and become more complex. We then provide the first hands-on tutorial for multi-objective optimization of metabolic models in R. We finally discuss the implementation of multi-view machine learning approaches in poly-omic integration. Throughout this work, we demonstrate the optimization of trade-offs between multiple metabolic objectives, with a focus on omic data integration through machine learning. We anticipate that the combination of a survey, a perspective on multi-view machine learning and a step-by-step R tutorial should be of interest for both the beginner and the advanced user. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  14. Predictive Models of the Hydrological Regime of Unregulated Streams in Arizona

    USGS Publications Warehouse

    Anning, David W.; Parker, John T.C.

    2009-01-01

    Three statistical models were developed by the U.S. Geological Survey in cooperation with the Arizona Department of Environmental Quality to improve the predictability of flow occurrence in unregulated streams throughout Arizona. The models can be used to predict the probabilities of the hydrological regime being one of four categories developed by this investigation: perennial, which has streamflow year-round; nearly perennial, which has streamflow 90 to 99.9 percent of the year; weakly perennial, which has streamflow 80 to 90 percent of the year; or nonperennial, which has streamflow less than 80 percent of the year. The models were developed to assist the Arizona Department of Environmental Quality in selecting sites for participation in the U.S. Environmental Protection Agency's Environmental Monitoring and Assessment Program. One model was developed for each of the three hydrologic provinces in Arizona - the Plateau Uplands, the Central Highlands, and the Basin and Range Lowlands. The models for predicting the hydrological regime were calibrated using statistical methods and explanatory variables of discharge, drainage-area, altitude, and location data for selected U.S. Geological Survey streamflow-gaging stations and a climate index derived from annual precipitation data. Models were calibrated on the basis of streamflow data from 46 stations for the Plateau Uplands province, 82 stations for the Central Highlands province, and 90 stations for the Basin and Range Lowlands province. The models were developed using classification trees that facilitated the analysis of mixed numeric and factor variables. In all three models, a threshold stream discharge was the initial variable to be considered within the classification tree and was the single most important explanatory variable. If a stream discharge value at a station was below the threshold, then the station record was determined as being nonperennial. If, however, the stream discharge was above the threshold, subsequent decisions were made according to the classification tree and explanatory variables to determine the hydrological regime of the reach as being perennial, nearly perennial, weakly perennial, or nonperennial. Using model calibration data, misclassification rates for each model were 17 percent for the Plateau Uplands, 15 percent for the Central Highlands, and 14 percent for the Basin and Range Lowlands models. The actual misclassification rate may be higher; however, the model has not been field verified for a full error assessment. The calibrated models were used to classify stream reaches for which the Arizona Department of Environmental Quality had collected miscellaneous discharge measurements. A total of 5,080 measurements at 696 sites were routed through the appropriate classification tree to predict the hydrological regime of the reaches in which the measurements were made. The predictions resulted in classification of all stream reaches as perennial or nonperennial; no reaches were predicted as nearly perennial or weakly perennial. The percentages of sites predicted as being perennial and nonperennial, respectively, were 77 and 23 for the Plateau Uplands, 87 and 13 for the Central Highlands, and 76 and 24 for the Basin and Range Lowlands.

  15. Quad-polarized synthetic aperture radar and multispectral data classification using classification and regression tree and support vector machine-based data fusion system

    NASA Astrophysics Data System (ADS)

    Bigdeli, Behnaz; Pahlavani, Parham

    2017-01-01

    Interpretation of synthetic aperture radar (SAR) data processing is difficult because the geometry and spectral range of SAR are different from optical imagery. Consequently, SAR imaging can be a complementary data to multispectral (MS) optical remote sensing techniques because it does not depend on solar illumination and weather conditions. This study presents a multisensor fusion of SAR and MS data based on the use of classification and regression tree (CART) and support vector machine (SVM) through a decision fusion system. First, different feature extraction strategies were applied on SAR and MS data to produce more spectral and textural information. To overcome the redundancy and correlation between features, an intrinsic dimension estimation method based on noise-whitened Harsanyi, Farrand, and Chang determines the proper dimension of the features. Then, principal component analysis and independent component analysis were utilized on stacked feature space of two data. Afterward, SVM and CART classified each reduced feature space. Finally, a fusion strategy was utilized to fuse the classification results. To show the effectiveness of the proposed methodology, single classification on each data was compared to the obtained results. A coregistered Radarsat-2 and WorldView-2 data set from San Francisco, USA, was available to examine the effectiveness of the proposed method. The results show that combinations of SAR data with optical sensor based on the proposed methodology improve the classification results for most of the classes. The proposed fusion method provided approximately 93.24% and 95.44% for two different areas of the data.

  16. Crop classification modelling using remote sensing and environmental data in the Greater Platte River Basin, USA

    USGS Publications Warehouse

    Howard, Daniel M.; Wylie, Bruce K.; Tieszen, Larry L.

    2012-01-01

    With an ever expanding population, potential climate variability and an increasing demand for agriculture-based alternative fuels, accurate agricultural land-cover classification for specific crops and their spatial distributions are becoming critical to researchers, policymakers, land managers and farmers. It is important to ensure the sustainability of these and other land uses and to quantify the net impacts that certain management practices have on the environment. Although other quality crop classification products are often available, temporal and spatial coverage gaps can create complications for certain regional or time-specific applications. Our goal was to develop a model capable of classifying major crops in the Greater Platte River Basin (GPRB) for the post-2000 era to supplement existing crop classification products. This study identifies annual spatial distributions and area totals of corn, soybeans, wheat and other crops across the GPRB from 2000 to 2009. We developed a regression tree classification model based on 2.5 million training data points derived from the National Agricultural Statistics Service (NASS) Cropland Data Layer (CDL) in relation to a variety of other relevant input environmental variables. The primary input variables included the weekly 250 m US Geological Survey Earth Observing System Moderate Resolution Imaging Spectroradiometer normalized differential vegetation index, average long-term growing season temperature, average long-term growing season precipitation and yearly start of growing season. An overall model accuracy rating of 78% was achieved for a test sample of roughly 215 000 independent points that were withheld from model training. Ten 250 m resolution annual crop classification maps were produced and evaluated for the GPRB region, one for each year from 2000 to 2009. In addition to the model accuracy assessment, our validation focused on spatial distribution and county-level crop area totals in comparison with the NASS CDL and county statistics from the US Department of Agriculture (USDA) Census of Agriculture. The results showed that our model produced crop classification maps that closely resembled the spatial distribution trends observed in the NASS CDL and exhibited a close linear agreement with county-by-county crop area totals from USDA census data (R 2 = 0.90).

  17. Random Forest Application for NEXRAD Radar Data Quality Control

    NASA Astrophysics Data System (ADS)

    Keem, M.; Seo, B. C.; Krajewski, W. F.

    2017-12-01

    Identification and elimination of non-meteorological radar echoes (e.g., returns from ground, wind turbines, and biological targets) are the basic data quality control steps before radar data use in quantitative applications (e.g., precipitation estimation). Although WSR-88Ds' recent upgrade to dual-polarization has enhanced this quality control and echo classification, there are still challenges to detect some non-meteorological echoes that show precipitation-like characteristics (e.g., wind turbine or anomalous propagation clutter embedded in rain). With this in mind, a new quality control method using Random Forest is proposed in this study. This classification algorithm is known to produce reliable results with less uncertainty. The method introduces randomness into sampling and feature selections and integrates consequent multiple decision trees. The multidimensional structure of the trees can characterize the statistical interactions of involved multiple features in complex situations. The authors explore the performance of Random Forest method for NEXRAD radar data quality control. Training datasets are selected using several clear cases of precipitation and non-precipitation (but with some non-meteorological echoes). The model is structured using available candidate features (from the NEXRAD data) such as horizontal reflectivity, differential reflectivity, differential phase shift, copolar correlation coefficient, and their horizontal textures (e.g., local standard deviation). The influence of each feature on classification results are quantified by variable importance measures that are automatically estimated by the Random Forest algorithm. Therefore, the number and types of features in the final forest can be examined based on the classification accuracy. The authors demonstrate the capability of the proposed approach using several cases ranging from distinct to complex rain/no-rain events and compare the performance with the existing algorithms (e.g., MRMS). They also discuss operational feasibility based on the observed strength and weakness of the method.

  18. Application of a hybrid association rules/decision tree model for drought monitoring

    NASA Astrophysics Data System (ADS)

    Nourani, Vahid; Molajou, Amir

    2017-12-01

    The previous researches have shown that the incorporation of the oceanic-atmospheric climate phenomena such as Sea Surface Temperature (SST) into hydro-climatic models could provide important predictive information about hydro-climatic variability. In this paper, the hybrid application of two data mining techniques (decision tree and association rules) was offered to discover affiliation between drought of Tabriz and Kermanshah synoptic stations (located in Iran) and de-trend SSTs of the Black, Mediterranean and Red Seas. Two major steps of the proposed model were the classification of de-trend SST data and selecting the most effective groups and extracting hidden information involved in the data. The techniques of decision tree which can identify the good traits from a data set for the classification purpose were used for classification and selecting the most effective groups and association rules were employed to extract the hidden predictive information from the large observed data. To examine the accuracy of the rules, confidence and Heidke Skill Score (HSS) measures were calculated and compared for different considering lag times. The computed measures confirm reliable performance of the proposed hybrid data mining method to forecast drought and the results show a relative correlation between the Mediterranean, Black and Red Sea de-trend SSTs and drought of Tabriz and Kermanshah synoptic stations so that the confidence between the monthly Standardized Precipitation Index (SPI) values and the de-trend SST of seas is higher than 70 and 80% respectively for Tabriz and Kermanshah synoptic stations.

  19. A Climatic Classification for Citrus Winter Survival in China.

    NASA Astrophysics Data System (ADS)

    Shou, Bo Huang

    1991-05-01

    The citrus tree is susceptible to frost damage. Winter injury to citrus from freezing weather is the major meteorological problem in the northern pail of citrus growing regions in China. Based on meteorological data collected at 120 stations in southern China and on the extent of citrus freezing injury, five climatic regions for citrus winter survival in China were developed. They were: 1) no citrus tree injury. 2) light injury to mandarins (citrus reticulate) or moderate injury to oranges (citrus sinensis), 3) moderate injury to mandarins or heavy injury to oranges, 4) heavy injury to mandarins, and 5) impossible citrus tree growth. This citrus climatic classification was an attempt to provide guidelines for regulation of citrus production, to effectively utilize land and climatic resources, to chose suitable citrus varieties, and to develop methods to prevent injury by freezing.

  20. Decision trees in epidemiological research.

    PubMed

    Venkatasubramaniam, Ashwini; Wolfson, Julian; Mitchell, Nathan; Barnes, Timothy; JaKa, Meghan; French, Simone

    2017-01-01

    In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.

  1. Personalized Modeling for Prediction with Decision-Path Models

    PubMed Central

    Visweswaran, Shyam; Ferreira, Antonio; Ribeiro, Guilherme A.; Oliveira, Alexandre C.; Cooper, Gregory F.

    2015-01-01

    Deriving predictive models in medicine typically relies on a population approach where a single model is developed from a dataset of individuals. In this paper we describe and evaluate a personalized approach in which we construct a new type of decision tree model called decision-path model that takes advantage of the particular features of a given person of interest. We introduce three personalized methods that derive personalized decision-path models. We compared the performance of these methods to that of Classification And Regression Tree (CART) that is a population decision tree to predict seven different outcomes in five medical datasets. Two of the three personalized methods performed statistically significantly better on area under the ROC curve (AUC) and Brier skill score compared to CART. The personalized approach of learning decision path models is a new approach for predictive modeling that can perform better than a population approach. PMID:26098570

  2. Mapping Urban Tree Canopy Coverage and Structure using Data Fusion of High Resolution Satellite Imagery and Aerial Lidar

    NASA Astrophysics Data System (ADS)

    Elmes, A.; Rogan, J.; Williams, C. A.; Martin, D. G.; Ratick, S.; Nowak, D.

    2015-12-01

    Urban tree canopy (UTC) coverage is a critical component of sustainable urban areas. Trees provide a number of important ecosystem services, including air pollution mitigation, water runoff control, and aesthetic and cultural values. Critically, urban trees also act to mitigate the urban heat island (UHI) effect by shading impervious surfaces and via evaporative cooling. The cooling effect of urban trees can be seen locally, with individual trees reducing home HVAC costs, and at a citywide scale, reducing the extent and magnitude of an urban areas UHI. In order to accurately model the ecosystem services of a given urban forest, it is essential to map in detail the condition and composition of these trees at a fine scale, capturing individual tree crowns and their vertical structure. This paper presents methods for delineating UTC and measuring canopy structure at fine spatial resolution (<1m). These metrics are essential for modeling the HVAC benefits from UTC for individual homes, and for assessing the ecosystem services for entire urban areas. Such maps have previously been made using a variety of methods, typically relying on high resolution aerial or satellite imagery. This paper seeks to contribute to this growing body of methods, relying on a data fusion method to combine the information contained in high resolution WorldView-3 satellite imagery and aerial lidar data using an object-based image classification approach. The study area, Worcester, MA, has recently undergone a large-scale tree removal and reforestation program, following a pest eradication effort. Therefore, the urban canopy in this location provides a wide mix of tree age class and functional type, ideal for illustrating the effectiveness of the proposed methods. Early results show that the object-based classifier is indeed capable of identifying individual tree crowns, while continued research will focus on extracting crown structural characteristics using lidar-derived metrics. Ultimately, the resulting fine resolution UTC map will be compared with previously created UTC maps of the same area but for earlier dates, producing a canopy change map corresponding to the Worcester area tree removal and replanting effort.

  3. An Analysis of Machine- and Human-Analytics in Classification.

    PubMed

    Tam, Gary K L; Kothari, Vivek; Chen, Min

    2017-01-01

    In this work, we present a study that traces the technical and cognitive processes in two visual analytics applications to a common theoretic model of soft knowledge that may be added into a visual analytics process for constructing a decision-tree model. Both case studies involved the development of classification models based on the "bag of features" approach. Both compared a visual analytics approach using parallel coordinates with a machine-learning approach using information theory. Both found that the visual analytics approach had some advantages over the machine learning approach, especially when sparse datasets were used as the ground truth. We examine various possible factors that may have contributed to such advantages, and collect empirical evidence for supporting the observation and reasoning of these factors. We propose an information-theoretic model as a common theoretic basis to explain the phenomena exhibited in these two case studies. Together we provide interconnected empirical and theoretical evidence to support the usefulness of visual analytics.

  4. Indicators of Terrorism Vulnerability in Africa

    DTIC Science & Technology

    2015-03-26

    the terror threat and vulnerabilities across Africa. Key words: Terrorism, Africa, Negative Binomial Regression, Classification Tree iv I would like...31 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 Log -likelihood...70 viii Page 5.3 Classification Tree Description

  5. Land cover and forest formation distributions for St. Kitts, Nevis, St. Eustatius, Grenada and Barbados from decision tree classification of cloud-cleared satellite imagery. Caribbean Journal of Science. 44(2):175-198.

    Treesearch

    E.H. Helmer; T.A. Kennaway; D.H. Pedreros; M.L. Clark; H. Marcano-Vega; L.L. Tieszen; S.R. Schill; C.M.S. Carrington

    2008-01-01

    Satellite image-based mapping of tropical forests is vital to conservation planning. Standard methods for automated image classification, however, limit classification detail in complex tropical landscapes. In this study, we test an approach to Landsat image interpretation on four islands of the Lesser Antilles, including Grenada and St. Kitts, Nevis and St. Eustatius...

  6. Lung tumor diagnosis and subtype discovery by gene expression profiling.

    PubMed

    Wang, Lu-yong; Tu, Zhuowen

    2006-01-01

    The optimal treatment of patients with complex diseases, such as cancers, depends on the accurate diagnosis by using a combination of clinical and histopathological data. In many scenarios, it becomes tremendously difficult because of the limitations in clinical presentation and histopathology. To accurate diagnose complex diseases, the molecular classification based on gene or protein expression profiles are indispensable for modern medicine. Moreover, many heterogeneous diseases consist of various potential subtypes in molecular basis and differ remarkably in their response to therapies. It is critical to accurate predict subgroup on disease gene expression profiles. More fundamental knowledge of the molecular basis and classification of disease could aid in the prediction of patient outcome, the informed selection of therapies, and identification of novel molecular targets for therapy. In this paper, we propose a new disease diagnostic method, probabilistic boosting tree (PB tree) method, on gene expression profiles of lung tumors. It enables accurate disease classification and subtype discovery in disease. It automatically constructs a tree in which each node combines a number of weak classifiers into a strong classifier. Also, subtype discovery is naturally embedded in the learning process. Our algorithm achieves excellent diagnostic performance, and meanwhile it is capable of detecting the disease subtype based on gene expression profile.

  7. Method of center localization for objects containing concentric arcs

    NASA Astrophysics Data System (ADS)

    Kuznetsova, Elena G.; Shvets, Evgeny A.; Nikolaev, Dmitry P.

    2015-02-01

    This paper proposes a method for automatic center location of objects containing concentric arcs. The method utilizes structure tensor analysis and voting scheme optimized with Fast Hough Transform. Two applications of the proposed method are considered: (i) wheel tracking in video-based system for automatic vehicle classification and (ii) tree growth rings analysis on a tree cross cut image.

  8. Classifying dysmorphic syndromes by using artificial neural network based hierarchical decision tree.

    PubMed

    Özdemir, Merve Erkınay; Telatar, Ziya; Eroğul, Osman; Tunca, Yusuf

    2018-05-01

    Dysmorphic syndromes have different facial malformations. These malformations are significant to an early diagnosis of dysmorphic syndromes and contain distinctive information for face recognition. In this study we define the certain features of each syndrome by considering facial malformations and classify Fragile X, Hurler, Prader Willi, Down, Wolf Hirschhorn syndromes and healthy groups automatically. The reference points are marked on the face images and ratios between the points' distances are taken into consideration as features. We suggest a neural network based hierarchical decision tree structure in order to classify the syndrome types. We also implement k-nearest neighbor (k-NN) and artificial neural network (ANN) classifiers to compare classification accuracy with our hierarchical decision tree. The classification accuracy is 50, 73 and 86.7% with k-NN, ANN and hierarchical decision tree methods, respectively. Then, the same images are shown to a clinical expert who achieve a recognition rate of 46.7%. We develop an efficient system to recognize different syndrome types automatically in a simple, non-invasive imaging data, which is independent from the patient's age, sex and race at high accuracy. The promising results indicate that our method can be used for pre-diagnosis of the dysmorphic syndromes by clinical experts.

  9. North American vegetation model for land-use planning in a changing climate: A solution to large classification problems

    Treesearch

    Gerald E. Rehfeldt; Nicholas L. Crookston; Cuauhtemoc Saenz-Romero; Elizabeth M. Campbell

    2012-01-01

    Data points intensively sampling 46 North American biomes were used to predict the geographic distribution of biomes from climate variables using the Random Forests classification tree. Techniques were incorporated to accommodate a large number of classes and to predict the future occurrence of climates beyond the contemporary climatic range of the biomes. Errors of...

  10. Binary Classification using Decision Tree based Genetic Programming and Its Application to Analysis of Bio-mass Data

    NASA Astrophysics Data System (ADS)

    To, Cuong; Pham, Tuan D.

    2010-01-01

    In machine learning, pattern recognition may be the most popular task. "Similar" patterns identification is also very important in biology because first, it is useful for prediction of patterns associated with disease, for example cancer tissue (normal or tumor); second, similarity or dissimilarity of the kinetic patterns is used to identify coordinately controlled genes or proteins involved in the same regulatory process. Third, similar genes (proteins) share similar functions. In this paper, we present an algorithm which uses genetic programming to create decision tree for binary classification problem. The application of the algorithm was implemented on five real biological databases. Base on the results of comparisons with well-known methods, we see that the algorithm is outstanding in most of cases.

  11. Improved data retrieval from TreeBASE via taxonomic and linguistic data enrichment

    PubMed Central

    Anwar, Nadia; Hunt, Ela

    2009-01-01

    Background TreeBASE, the only data repository for phylogenetic studies, is not being used effectively since it does not meet the taxonomic data retrieval requirements of the systematics community. We show, through an examination of the queries performed on TreeBASE, that data retrieval using taxon names is unsatisfactory. Results We report on a new wrapper supporting taxon queries on TreeBASE by utilising a Taxonomy and Classification Database (TCl-Db) we created. TCl-Db holds merged and consolidated taxonomic names from multiple data sources and can be used to translate hierarchical, vernacular and synonym queries into specific query terms in TreeBASE. The query expansion supported by TCl-Db shows very significant information retrieval quality improvement. The wrapper can be accessed at the URL The methodology we developed is scalable and can be applied to new data, as those become available in the future. Conclusion Significantly improved data retrieval quality is shown for all queries, and additional flexibility is achieved via user-driven taxonomy selection. PMID:19426482

  12. Mapping regional distribution of a single tree species: Whitebark pine in the Greater Yellowstone Ecosystem

    USGS Publications Warehouse

    Landenburger, L.; Lawrence, R.L.; Podruzny, S.; Schwartz, C.C.

    2008-01-01

    Moderate resolution satellite imagery traditionally has been thought to be inadequate for mapping vegetation at the species level. This has made comprehensive mapping of regional distributions of sensitive species, such as whitebark pine, either impractical or extremely time consuming. We sought to determine whether using a combination of moderate resolution satellite imagery (Landsat Enhanced Thematic Mapper Plus), extensive stand data collected by land management agencies for other purposes, and modern statistical classification techniques (boosted classification trees) could result in successful mapping of whitebark pine. Overall classification accuracies exceeded 90%, with similar individual class accuracies. Accuracies on a localized basis varied based on elevation. Accuracies also varied among administrative units, although we were not able to determine whether these differences related to inherent spatial variations or differences in the quality of available reference data.

  13. Predicting Chemically Induced Duodenal Ulcer and Adrenal Necrosis with Classification Trees

    NASA Astrophysics Data System (ADS)

    Giampaolo, Casimiro; Gray, Andrew T.; Olshen, Richard A.; Szabo, Sandor

    1991-07-01

    Binary tree-structured statistical classification algorithms and properties of 56 model alkyl nucleophiles were brought to bear on two problems of experimental pharmacology and toxicology. Each rat of a learning sample of 745 was administered one compound and autopsied to determine the presence of duodenal ulcer or adrenal hemorrhagic necrosis. The cited statistical classification schemes were then applied to these outcomes and 67 features of the compounds to ascertain those characteristics that are associated with biologic activity. For predicting duodenal ulceration, dipole moment, melting point, and solubility in octanol are particularly important, while for predicting adrenal necrosis, important features include the number of sulfhydryl groups and double bonds. These methods may constitute inexpensive but powerful ways to screen untested compounds for possible organ-specific toxicity. Mechanisms for the etiology and pathogenesis of the duodenal and adrenal lesions are suggested, as are additional avenues for drug design.

  14. Predicting outcome on admission and post-admission for acetaminophen-induced acute liver failure using classification and regression tree models.

    PubMed

    Speiser, Jaime Lynn; Lee, William M; Karvellas, Constantine J

    2015-01-01

    Assessing prognosis for acetaminophen-induced acute liver failure (APAP-ALF) patients often presents significant challenges. King's College (KCC) has been validated on hospital admission, but little has been published on later phases of illness. We aimed to improve determinations of prognosis both at the time of and following admission for APAP-ALF using Classification and Regression Tree (CART) models. CART models were applied to US ALFSG registry data to predict 21-day death or liver transplant early (on admission) and post-admission (days 3-7) for 803 APAP-ALF patients enrolled 01/1998-09/2013. Accuracy in prediction of outcome (AC), sensitivity (SN), specificity (SP), and area under receiver-operating curve (AUROC) were compared between 3 models: KCC (INR, creatinine, coma grade, pH), CART analysis using only KCC variables (KCC-CART) and a CART model using new variables (NEW-CART). Traditional KCC yielded 69% AC, 90% SP, 27% SN, and 0.58 AUROC on admission, with similar performance post-admission. KCC-CART at admission offered predictive 66% AC, 65% SP, 67% SN, and 0.74 AUROC. Post-admission, KCC-CART had predictive 82% AC, 86% SP, 46% SN and 0.81 AUROC. NEW-CART models using MELD (Model for end stage liver disease), lactate and mechanical ventilation on admission yielded predictive 72% AC, 71% SP, 77% SN and AUROC 0.79. For later stages, NEW-CART (MELD, lactate, coma grade) offered predictive AC 86%, SP 91%, SN 46%, AUROC 0.73. CARTs offer simple prognostic models for APAP-ALF patients, which have higher AUROC and SN than KCC, with similar AC and negligibly worse SP. Admission and post-admission predictions were developed. • Prognostication in acetaminophen-induced acute liver failure (APAP-ALF) is challenging beyond admission • Little has been published regarding the use of King's College Criteria (KCC) beyond admission and KCC has shown limited sensitivity in subsequent studies • Classification and Regression Tree (CART) methodology allows the development of predictive models using binary splits and offers an intuitive method for predicting outcome, using processes familiar to clinicians • Data from the ALFSG registry suggested that CART prognosis models for the APAP population offer improved sensitivity and model performance over traditional regression-based KCC, while maintaining similar accuracy and negligibly worse specificity • KCC-CART models offered modest improvement over traditional KCC, with NEW-CART models performing better than KCC-CART particularly at late time points.

  15. Evaluating the predictability of distance race performance in NCAA cross country and track and field from high school race times in the United States.

    PubMed

    Brusa, Jamie L

    2017-12-30

    Successful recruiting for collegiate track & field athletes has become a more competitive and essential component of coaching. This study aims to determine the relationship between race performances of distance runners at the United States high school and National Collegiate Athletic Association (NCAA) levels. Conditional inference classification tree models were built and analysed to predict the probability that runners would qualify for the NCAA Division I National Cross Country Meet and/or the East or West NCAA Division I Outdoor Track & Field Preliminary Round based on their high school race times in the 800 m, 1600 m, and 3200 m. Prediction accuracies of the classification trees ranged from 60.0 to 76.6 percent. The models produced the most reliable estimates for predicting qualifiers in cross country, the 1500 m, and the 800 m for females and cross country, the 5000 m, and the 800 m for males. NCAA track & field coaches can use the results from this study as a guideline for recruiting decisions. Additionally, future studies can apply the methodological foundations of this research to predicting race performances set at different metrics, such as national meets in other countries or Olympic qualifications, from previous race data.

  16. Benchmarking protein classification algorithms via supervised cross-validation.

    PubMed

    Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor

    2008-04-24

    Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison.

  17. Generation and Termination of Binary Decision Trees for Nonparametric Multiclass Classification.

    DTIC Science & Technology

    1984-10-01

    O M coF=F;; UMBER2. GOVT ACCE5SION NO.1 3 . REC,PINS :A7AL:,G NUMBER ( ’eneration and Terminat_,on :)f Binary D-ecision jC j ik; Trees for Nonnararetrc...1-I . v)IAMO 0~I4 EDvt" O F I 00 . 3 15I OR%.OL.ETL - S-S OCTOBER 1984 LIDS-P-1411 GENERATION AND TERMINATION OF BINARY DECISION TREES FOR...minimizes the Bayes risk. Tree generation and termination are based on the training and test samples, respectively. 0 0 0/ 6 0¢ A 3 I. Introduction We state

  18. Statistical classification of drug incidents due to look-alike sound-alike mix-ups.

    PubMed

    Wong, Zoie Shui Yee

    2016-06-01

    It has been recognised that medication names that look or sound similar are a cause of medication errors. This study builds statistical classifiers for identifying medication incidents due to look-alike sound-alike mix-ups. A total of 227 patient safety incident advisories related to medication were obtained from the Canadian Patient Safety Institute's Global Patient Safety Alerts system. Eight feature selection strategies based on frequent terms, frequent drug terms and constituent terms were performed. Statistical text classifiers based on logistic regression, support vector machines with linear, polynomial, radial-basis and sigmoid kernels and decision tree were trained and tested. The models developed achieved an average accuracy of above 0.8 across all the model settings. The receiver operating characteristic curves indicated the classifiers performed reasonably well. The results obtained in this study suggest that statistical text classification can be a feasible method for identifying medication incidents due to look-alike sound-alike mix-ups based on a database of advisories from Global Patient Safety Alerts. © The Author(s) 2014.

  19. ARG-based genome-wide analysis of cacao cultivars.

    PubMed

    Utro, Filippo; Cornejo, Omar Eduardo; Livingstone, Donald; Motamayor, Juan Carlos; Parida, Laxmi

    2012-01-01

    Ancestral recombinations graph (ARG) is a topological structure that captures the relationship between the extant genomic sequences in terms of genetic events including recombinations. IRiS is a system that estimates the ARG on sequences of individuals, at genomic scales, capturing the relationship between these individuals of the species. Recently, this system was used to estimate the ARG of the recombining X Chromosome of a collection of human populations using relatively dense, bi-allelic SNP data. While the ARG is a natural model for capturing the inter-relationship between a single chromosome of the individuals of a species, it is not immediately apparent how the model can utilize whole-genome (across chromosomes) diploid data. Also, the sheer complexity of an ARG structure presents a challenge to graph visualization techniques. In this paper we examine the ARG reconstruction for (1) genome-wide or multiple chromosomes, (2) multi-allelic and (3) extremely sparse data. To aid in the visualization of the results of the reconstructed ARG, we additionally construct a much simplified topology, a classification tree, suggested by the ARG.As the test case, we study the problem of extracting the relationship between populations of Theobroma cacao. The chocolate tree is an outcrossing species in the wild, due to self-incompatibility mechanisms at play. Thus a principled approach to understanding the inter-relationships between the different populations must take the shuffling of the genomic segments into account. The polymorphisms in the test data are short tandem repeats (STR) and are multi-allelic (sometimes as high as 30 distinct possible values at a locus). Each is at a genomic location that is bilaterally transmitted, hence the ARG is a natural model for this data. Another characteristic of this plant data set is that while it is genome-wide, across 10 linkage groups or chromosomes, it is very sparse, i.e., only 96 loci from a genome of approximately 400 megabases. The results are visualized both as MDS plots and as classification trees. To evaluate the accuracy of the ARG approach, we compare the results with those available in literature. We have extended the ARG model to incorporate genome-wide (ensemble of multiple chromosomes) data in a natural way. We present a simple scheme to implement this in practice. Finally, this is the first time that a plant population data set is being studied by estimating its underlying ARG. We demonstrate an overall precision of 0.92 and an overall recall of 0.93 of the ARG-based classification, with respect to the gold standard. While we have corroborated the classification of the samples with that in literature, this opens the door to other potential studies that can be made on the ARG.

  20. ARG-based genome-wide analysis of cacao cultivars

    PubMed Central

    2012-01-01

    Background Ancestral recombinations graph (ARG) is a topological structure that captures the relationship between the extant genomic sequences in terms of genetic events including recombinations. IRiS is a system that estimates the ARG on sequences of individuals, at genomic scales, capturing the relationship between these individuals of the species. Recently, this system was used to estimate the ARG of the recombining X Chromosome of a collection of human populations using relatively dense, bi-allelic SNP data. Results While the ARG is a natural model for capturing the inter-relationship between a single chromosome of the individuals of a species, it is not immediately apparent how the model can utilize whole-genome (across chromosomes) diploid data. Also, the sheer complexity of an ARG structure presents a challenge to graph visualization techniques. In this paper we examine the ARG reconstruction for (1) genome-wide or multiple chromosomes, (2) multi-allelic and (3) extremely sparse data. To aid in the visualization of the results of the reconstructed ARG, we additionally construct a much simplified topology, a classification tree, suggested by the ARG. As the test case, we study the problem of extracting the relationship between populations of Theobroma cacao. The chocolate tree is an outcrossing species in the wild, due to self-incompatibility mechanisms at play. Thus a principled approach to understanding the inter-relationships between the different populations must take the shuffling of the genomic segments into account. The polymorphisms in the test data are short tandem repeats (STR) and are multi-allelic (sometimes as high as 30 distinct possible values at a locus). Each is at a genomic location that is bilaterally transmitted, hence the ARG is a natural model for this data. Another characteristic of this plant data set is that while it is genome-wide, across 10 linkage groups or chromosomes, it is very sparse, i.e., only 96 loci from a genome of approximately 400 megabases. The results are visualized both as MDS plots and as classification trees. To evaluate the accuracy of the ARG approach, we compare the results with those available in literature. Conclusions We have extended the ARG model to incorporate genome-wide (ensemble of multiple chromosomes) data in a natural way. We present a simple scheme to implement this in practice. Finally, this is the first time that a plant population data set is being studied by estimating its underlying ARG. We demonstrate an overall precision of 0.92 and an overall recall of 0.93 of the ARG-based classification, with respect to the gold standard. While we have corroborated the classification of the samples with that in literature, this opens the door to other potential studies that can be made on the ARG. PMID:23281769

  1. Learning machines and sleeping brains: Automatic sleep stage classification using decision-tree multi-class support vector machines.

    PubMed

    Lajnef, Tarek; Chaibi, Sahbi; Ruby, Perrine; Aguera, Pierre-Emmanuel; Eichenlaub, Jean-Baptiste; Samet, Mounir; Kachouri, Abdennaceur; Jerbi, Karim

    2015-07-30

    Sleep staging is a critical step in a range of electrophysiological signal processing pipelines used in clinical routine as well as in sleep research. Although the results currently achievable with automatic sleep staging methods are promising, there is need for improvement, especially given the time-consuming and tedious nature of visual sleep scoring. Here we propose a sleep staging framework that consists of a multi-class support vector machine (SVM) classification based on a decision tree approach. The performance of the method was evaluated using polysomnographic data from 15 subjects (electroencephalogram (EEG), electrooculogram (EOG) and electromyogram (EMG) recordings). The decision tree, or dendrogram, was obtained using a hierarchical clustering technique and a wide range of time and frequency-domain features were extracted. Feature selection was carried out using forward sequential selection and classification was evaluated using k-fold cross-validation. The dendrogram-based SVM (DSVM) achieved mean specificity, sensitivity and overall accuracy of 0.92, 0.74 and 0.88 respectively, compared to expert visual scoring. Restricting DSVM classification to data where both experts' scoring was consistent (76.73% of the data) led to a mean specificity, sensitivity and overall accuracy of 0.94, 0.82 and 0.92 respectively. The DSVM framework outperforms classification with more standard multi-class "one-against-all" SVM and linear-discriminant analysis. The promising results of the proposed methodology suggest that it may be a valuable alternative to existing automatic methods and that it could accelerate visual scoring by providing a robust starting hypnogram that can be further fine-tuned by expert inspection. Copyright © 2015 Elsevier B.V. All rights reserved.

  2. Fault detection and diagnosis of induction motors using motor current signature analysis and a hybrid FMM-CART model.

    PubMed

    Seera, Manjeevan; Lim, Chee Peng; Ishak, Dahaman; Singh, Harapajan

    2012-01-01

    In this paper, a novel approach to detect and classify comprehensive fault conditions of induction motors using a hybrid fuzzy min-max (FMM) neural network and classification and regression tree (CART) is proposed. The hybrid model, known as FMM-CART, exploits the advantages of both FMM and CART for undertaking data classification and rule extraction problems. A series of real experiments is conducted, whereby the motor current signature analysis method is applied to form a database comprising stator current signatures under different motor conditions. The signal harmonics from the power spectral density are extracted as discriminative input features for fault detection and classification with FMM-CART. A comprehensive list of induction motor fault conditions, viz., broken rotor bars, unbalanced voltages, stator winding faults, and eccentricity problems, has been successfully classified using FMM-CART with good accuracy rates. The results are comparable, if not better, than those reported in the literature. Useful explanatory rules in the form of a decision tree are also elicited from FMM-CART to analyze and understand different fault conditions of induction motors.

  3. Stacked Denoising Autoencoders Applied to Star/Galaxy Classification

    NASA Astrophysics Data System (ADS)

    Qin, Hao-ran; Lin, Ji-ming; Wang, Jun-yi

    2017-04-01

    In recent years, the deep learning algorithm, with the characteristics of strong adaptability, high accuracy, and structural complexity, has become more and more popular, but it has not yet been used in astronomy. In order to solve the problem that the star/galaxy classification accuracy is high for the bright source set, but low for the faint source set of the Sloan Digital Sky Survey (SDSS) data, we introduced the new deep learning algorithm, namely the SDA (stacked denoising autoencoder) neural network and the dropout fine-tuning technique, which can greatly improve the robustness and antinoise performance. We randomly selected respectively the bright source sets and faint source sets from the SDSS DR12 and DR7 data with spectroscopic measurements, and made preprocessing on them. Then, we randomly selected respectively the training sets and testing sets without replacement from the bright source sets and faint source sets. At last, using these training sets we made the training to obtain the SDA models of the bright sources and faint sources in the SDSS DR7 and DR12, respectively. We compared the test result of the SDA model on the DR12 testing set with the test results of the Library for Support Vector Machines (LibSVM), J48 decision tree, Logistic Model Tree (LMT), Support Vector Machine (SVM), Logistic Regression, and Decision Stump algorithm, and compared the test result of the SDA model on the DR7 testing set with the test results of six kinds of decision trees. The experiments show that the SDA has a better classification accuracy than other machine learning algorithms for the faint source sets of DR7 and DR12. Especially, when the completeness function is used as the evaluation index, compared with the decision tree algorithms, the correctness rate of SDA has improved about 15% for the faint source set of SDSS-DR7.

  4. A retrospective analysis to identify the factors affecting infection in patients undergoing chemotherapy.

    PubMed

    Park, Ji Hyun; Kim, Hyeon-Young; Lee, Hanna; Yun, Eun Kyoung

    2015-12-01

    This study compares the performance of the logistic regression and decision tree analysis methods for assessing the risk factors for infection in cancer patients undergoing chemotherapy. The subjects were 732 cancer patients who were receiving chemotherapy at K university hospital in Seoul, Korea. The data were collected between March 2011 and February 2013 and were processed for descriptive analysis, logistic regression and decision tree analysis using the IBM SPSS Statistics 19 and Modeler 15.1 programs. The most common risk factors for infection in cancer patients receiving chemotherapy were identified as alkylating agents, vinca alkaloid and underlying diabetes mellitus. The logistic regression explained 66.7% of the variation in the data in terms of sensitivity and 88.9% in terms of specificity. The decision tree analysis accounted for 55.0% of the variation in the data in terms of sensitivity and 89.0% in terms of specificity. As for the overall classification accuracy, the logistic regression explained 88.0% and the decision tree analysis explained 87.2%. The logistic regression analysis showed a higher degree of sensitivity and classification accuracy. Therefore, logistic regression analysis is concluded to be the more effective and useful method for establishing an infection prediction model for patients undergoing chemotherapy. Copyright © 2015 Elsevier Ltd. All rights reserved.

  5. Non-monophyly and intricate morphological evolution within the avian family Cettiidae revealed by multilocus analysis of a taxonomically densely sampled dataset

    PubMed Central

    2011-01-01

    Background The avian family Cettiidae, including the genera Cettia, Urosphena, Tesia, Abroscopus and Tickellia and Orthotomus cucullatus, has recently been proposed based on analysis of a small number of loci and species. The close relationship of most of these taxa was unexpected, and called for a comprehensive study based on multiple loci and dense taxon sampling. In the present study, we infer the relationships of all except one of the species in this family using one mitochondrial and three nuclear loci. We use traditional gene tree methods (Bayesian inference, maximum likelihood bootstrapping, parsimony bootstrapping), as well as a recently developed Bayesian species tree approach (*BEAST) that accounts for lineage sorting processes that might produce discordance between gene trees. We also analyse mitochondrial DNA for a larger sample, comprising multiple individuals and a large number of subspecies of polytypic species. Results There are many topological incongruences among the single-locus trees, although none of these is strongly supported. The multi-locus tree inferred using concatenated sequences and the species tree agree well with each other, and are overall well resolved and well supported by the data. The main discrepancy between these trees concerns the most basal split. Both methods infer the genus Cettia to be highly non-monophyletic, as it is scattered across the entire family tree. Deep intraspecific divergences are revealed, and one or two species and one subspecies are inferred to be non-monophyletic (differences between methods). Conclusions The molecular phylogeny presented here is strongly inconsistent with the traditional, morphology-based classification. The remarkably high degree of non-monophyly in the genus Cettia is likely to be one of the most extraordinary examples of misconceived relationships in an avian genus. The phylogeny suggests instances of parallel evolution, as well as highly unequal rates of morphological divergence in different lineages. This complex morphological evolution apparently misled earlier taxonomists. These results underscore the well-known but still often neglected problem of basing classifications on overall morphological similarity. Based on the molecular data, a revised taxonomy is proposed. Although the traditional and species tree methods inferred much the same tree in the present study, the assumption by species tree methods that all species are monophyletic is a limitation in these methods, as some currently recognized species might have more complex histories. PMID:22142197

  6. A simple and robust classification tree for differentiation between benign and malignant lesions in MR-mammography.

    PubMed

    Baltzer, Pascal A T; Dietzel, Matthias; Kaiser, Werner A

    2013-08-01

    In the face of multiple available diagnostic criteria in MR-mammography (MRM), a practical algorithm for lesion classification is needed. Such an algorithm should be as simple as possible and include only important independent lesion features to differentiate benign from malignant lesions. This investigation aimed to develop a simple classification tree for differential diagnosis in MRM. A total of 1,084 lesions in standardised MRM with subsequent histological verification (648 malignant, 436 benign) were investigated. Seventeen lesion criteria were assessed by 2 readers in consensus. Classification analysis was performed using the chi-squared automatic interaction detection (CHAID) method. Results include the probability for malignancy for every descriptor combination in the classification tree. A classification tree incorporating 5 lesion descriptors with a depth of 3 ramifications (1, root sign; 2, delayed enhancement pattern; 3, border, internal enhancement and oedema) was calculated. Of all 1,084 lesions, 262 (40.4 %) and 106 (24.3 %) could be classified as malignant and benign with an accuracy above 95 %, respectively. Overall diagnostic accuracy was 88.4 %. The classification algorithm reduced the number of categorical descriptors from 17 to 5 (29.4 %), resulting in a high classification accuracy. More than one third of all lesions could be classified with accuracy above 95 %. • A practical algorithm has been developed to classify lesions found in MR-mammography. • A simple decision tree consisting of five criteria reaches high accuracy of 88.4 %. • Unique to this approach, each classification is associated with a diagnostic certainty. • Diagnostic certainty of greater than 95 % is achieved in 34 % of all cases.

  7. Application Of Decision Tree Approach To Student Selection Model- A Case Study

    NASA Astrophysics Data System (ADS)

    Harwati; Sudiya, Amby

    2016-01-01

    The main purpose of the institution is to provide quality education to the students and to improve the quality of managerial decisions. One of the ways to improve the quality of students is to arrange the selection of new students with a more selective. This research takes the case in the selection of new students at Islamic University of Indonesia, Yogyakarta, Indonesia. One of the university's selection is through filtering administrative selection based on the records of prospective students at the high school without paper testing. Currently, that kind of selection does not yet has a standard model and criteria. Selection is only done by comparing candidate application file, so the subjectivity of assessment is very possible to happen because of the lack standard criteria that can differentiate the quality of students from one another. By applying data mining techniques classification, can be built a model selection for new students which includes criteria to certain standards such as the area of origin, the status of the school, the average value and so on. These criteria are determined by using rules that appear based on the classification of the academic achievement (GPA) of the students in previous years who entered the university through the same way. The decision tree method with C4.5 algorithm is used here. The results show that students are given priority for admission is that meet the following criteria: came from the island of Java, public school, majoring in science, an average value above 75, and have at least one achievement during their study in high school.

  8. Five centuries of Central European temperature extremes reconstructed from tree-ring density and documentary evidence

    NASA Astrophysics Data System (ADS)

    Battipaglia, G.; Frank, D.; Buentgen, U.; Dobrovolný, P.; Brázdil, R.; Pfister, C.; Esper, J.

    2009-09-01

    In this project three different summer temperature sensitive tree-ring chronologies across the European Alpine region were compiled and analyzed to make a calendar of extreme warm and cold summers. We identified 100 extreme events during the past millennium from the tree ring data, and 44 extreme years during the 1550-2003 period based upon tree-ring, documentary and instrumental evidence. Comparisons with long instrumental series and documentary evidence verify the tree-ring extremes and indicate the possibility to use this dataset towards a better understanding of the characteristics prior to the instrumental period. Potential links between the occurrence of extreme events over Alps and anomalous large-scale patterns were explored and indicate that the average pattern of the 20 warmest summers (over the 1700-2002 period) describes maximum positive anomalies over Central Europe, whereas the average pattern of the 20 coldest summers shows maximum negative anomalies over Western Europe. Challenges with the present approach included determining an appropriate classification scheme for extreme events and the development of a methodology able to identify and characterize the occurrence of extreme episodes back in time. As a future step, our approach will be extended to help verify the sparse documentary data from the beginning of the past millennium and will be used in conjunction with climate models to assess model capabilities in reproducing characteristics of temperature extremes.

  9. Antidepressant Use After Aneurysmal Subarachnoid Hemorrhage: A Population-Based Case-Control Study.

    PubMed

    Huttunen, Jukka; Lindgren, Antti; Kurki, Mitja I; Huttunen, Terhi; Frösen, Juhana; von Und Zu Fraunberg, Mikael; Koivisto, Timo; Kälviäinen, Reetta; Räikkönen, Katri; Viinamäki, Heimo; Jääskeläinen, Juha E; Immonen, Arto

    2016-09-01

    To elucidate the predictors of antidepressant use after subarachnoid hemorrhage from saccular intracranial aneurysm (sIA-SAH) in a population-based cohort with matched controls. The Kuopio sIA database includes all unruptured and ruptured sIA cases admitted to the Kuopio University Hospital from its defined catchment population in Eastern Finland, with 3 matched controls for each patient. The use of all prescribed medicines has been fused from the Finnish national registry of prescribed medicines. In the present study, 2 or more purchases of antidepressant medication indicated antidepressant use. The risk factors of the antidepressant use were analyzed in 940 patients alive 12 months after sIA-SAH, and the classification tree analysis was used to create a predicting model for antidepressant use after sIA-SAH. The 940 12-month survivors of sIA-SAH had significantly more antidepressant use (odds ratio, 2.6; 95% confidence interval, 2.2-3.1) than their 2676 matched controls (29% versus 14%). Classification tree analysis, based on independent risk factors, was used for the best prediction model of antidepressant use after sIA-SAH. Modified Rankin Scale until 12 months was the most potent predictor, followed by condition (Hunt and Hess Scale) and age on admission for sIA-SAH. The sIA-SAH survivors use significantly more often antidepressants, indicative of depression, than their matched population controls. Even with a seemingly good recovery (modified Rankin Scale score, 0) at 12 months after sIA-SAH, there is a significant risk of depression requiring antidepressant medication. © 2016 American Heart Association, Inc.

  10. Habitat features and predictive habitat modeling for the Colorado chipmunk in southern New Mexico

    USGS Publications Warehouse

    Rivieccio, M.; Thompson, B.C.; Gould, W.R.; Boykin, K.G.

    2003-01-01

    Two subspecies of Colorado chipmunk (state threatened and federal species of concern) occur in southern New Mexico: Tamias quadrivittatus australis in the Organ Mountains and T. q. oscuraensis in the Oscura Mountains. We developed a GIS model of potentially suitable habitat based on vegetation and elevation features, evaluated site classifications of the GIS model, and determined vegetation and terrain features associated with chipmunk occurrence. We compared GIS model classifications with actual vegetation and elevation features measured at 37 sites. At 60 sites we measured 18 habitat variables regarding slope, aspect, tree species, shrub species, and ground cover. We used logistic regression to analyze habitat variables associated with chipmunk presence/absence. All (100%) 37 sample sites (28 predicted suitable, 9 predicted unsuitable) were classified correctly by the GIS model regarding elevation and vegetation. For 28 sites predicted suitable by the GIS model, 18 sites (64%) appeared visually suitable based on habitat variables selected from logistic regression analyses, of which 10 sites (36%) were specifically predicted as suitable habitat via logistic regression. We detected chipmunks at 70% of sites deemed suitable via the logistic regression models. Shrub cover, tree density, plant proximity, presence of logs, and presence of rock outcrop were retained in the logistic model for the Oscura Mountains; litter, shrub cover, and grass cover were retained in the logistic model for the Organ Mountains. Evaluation of predictive models illustrates the need for multi-stage analyses to best judge performance. Microhabitat analyses indicate prospective needs for different management strategies between the subspecies. Sensitivities of each population of the Colorado chipmunk to natural and prescribed fire suggest that partial burnings of areas inhabited by Colorado chipmunks in southern New Mexico may be beneficial. These partial burnings may later help avoid a fire that could substantially reduce habitat of chipmunks over a mountain range.

  11. A survey of supervised machine learning models for mobile-phone based pathogen identification and classification

    NASA Astrophysics Data System (ADS)

    Ceylan Koydemir, Hatice; Feng, Steve; Liang, Kyle; Nadkarni, Rohan; Tseng, Derek; Benien, Parul; Ozcan, Aydogan

    2017-03-01

    Giardia lamblia causes a disease known as giardiasis, which results in diarrhea, abdominal cramps, and bloating. Although conventional pathogen detection methods used in water analysis laboratories offer high sensitivity and specificity, they are time consuming, and need experts to operate bulky equipment and analyze the samples. Here we present a field-portable and cost-effective smartphone-based waterborne pathogen detection platform that can automatically classify Giardia cysts using machine learning. Our platform enables the detection and quantification of Giardia cysts in one hour, including sample collection, labeling, filtration, and automated counting steps. We evaluated the performance of three prototypes using Giardia-spiked water samples from different sources (e.g., reagent-grade, tap, non-potable, and pond water samples). We populated a training database with >30,000 cysts and estimated our detection sensitivity and specificity using 20 different classifier models, including decision trees, nearest neighbor classifiers, support vector machines (SVMs), and ensemble classifiers, and compared their speed of training and classification, as well as predicted accuracies. Among them, cubic SVM, medium Gaussian SVM, and bagged-trees were the most promising classifier types with accuracies of 94.1%, 94.2%, and 95%, respectively; we selected the latter as our preferred classifier for the detection and enumeration of Giardia cysts that are imaged using our mobile-phone fluorescence microscope. Without the need for any experts or microbiologists, this field-portable pathogen detection platform can present a useful tool for water quality monitoring in resource-limited-settings.

  12. Modified TAROT for cross-selling personal financial products

    NASA Astrophysics Data System (ADS)

    Tee, Ya-Mei; LEE, Lai-Soon; LEE, Chew-Ging; SEOW, Hsin-Vonn

    2014-09-01

    The Top Application characteristics Remainder Offer characteristics Tree (TAROT) was first introduced in 2007. This is a modified Classification and Regression Trees (CART) used to help decide which question(s) to ask potential applicants to customise an offer of a personal financial product so that it would have a high probability of take up. In this piece of work the authors are presenting, they have further modified the TAROT to cross TAROT, using its properties and modeling steps to deal with the issue of cross-selling. Since the bank already has ready customers, it would be ideal to cross-sell the financial products seeing that one can ask one (or more) further question(s) based on the initial offer to identify and customise another financial product to offer.

  13. Microscopic saw mark analysis: an empirical approach.

    PubMed

    Love, Jennifer C; Derrick, Sharon M; Wiersema, Jason M; Peters, Charles

    2015-01-01

    Microscopic saw mark analysis is a well published and generally accepted qualitative analytical method. However, little research has focused on identifying and mitigating potential sources of error associated with the method. The presented study proposes the use of classification trees and random forest classifiers as an optimal, statistically sound approach to mitigate the potential for error of variability and outcome error in microscopic saw mark analysis. The statistical model was applied to 58 experimental saw marks created with four types of saws. The saw marks were made in fresh human femurs obtained through anatomical gift and were analyzed using a Keyence digital microscope. The statistical approach weighed the variables based on discriminatory value and produced decision trees with an associated outcome error rate of 8.62-17.82%. © 2014 American Academy of Forensic Sciences.

  14. Applying an Ensemble Classification Tree Approach to the Prediction of Completion of a 12-Step Facilitation Intervention with Stimulant Abusers

    PubMed Central

    Doyle, Suzanne R.; Donovan, Dennis M.

    2014-01-01

    Aims The purpose of this study was to explore the selection of predictor variables in the evaluation of drug treatment completion using an ensemble approach with classification trees. The basic methodology is reviewed and the subagging procedure of random subsampling is applied. Methods Among 234 individuals with stimulant use disorders randomized to a 12-Step facilitative intervention shown to increase stimulant use abstinence, 67.52% were classified as treatment completers. A total of 122 baseline variables were used to identify factors associated with completion. Findings The number of types of self-help activity involvement prior to treatment was the predominant predictor. Other effective predictors included better coping self-efficacy for substance use in high-risk situations, more days of prior meeting attendance, greater acceptance of the Disease model, higher confidence for not resuming use following discharge, lower ASI Drug and Alcohol composite scores, negative urine screens for cocaine or marijuana, and fewer employment problems. Conclusions The application of an ensemble subsampling regression tree method utilizes the fact that classification trees are unstable but, on average, produce an improved prediction of the completion of drug abuse treatment. The results support the notion there are early indicators of treatment completion that may allow for modification of approaches more tailored to fitting the needs of individuals and potentially provide more successful treatment engagement and improved outcomes. PMID:25134038

  15. Multi-Layer Identification of Highly-Potent ABCA1 Up-Regulators Targeting LXRβ Using Multiple QSAR Modeling, Structural Similarity Analysis, and Molecular Docking.

    PubMed

    Chen, Meimei; Yang, Fafu; Kang, Jie; Yang, Xuemei; Lai, Xinmei; Gao, Yuxing

    2016-11-29

    In this study, in silico approaches, including multiple QSAR modeling, structural similarity analysis, and molecular docking, were applied to develop QSAR classification models as a fast screening tool for identifying highly-potent ABCA1 up-regulators targeting LXRβ based on a series of new flavonoids. Initially, four modeling approaches, including linear discriminant analysis, support vector machine, radial basis function neural network, and classification and regression trees, were applied to construct different QSAR classification models. The statistics results indicated that these four kinds of QSAR models were powerful tools for screening highly potent ABCA1 up-regulators. Then, a consensus QSAR model was developed by combining the predictions from these four models. To discover new ABCA1 up-regulators at maximum accuracy, the compounds in the ZINC database that fulfilled the requirement of structural similarity of 0.7 compared to known potent ABCA1 up-regulator were subjected to the consensus QSAR model, which led to the discovery of 50 compounds. Finally, they were docked into the LXRβ binding site to understand their role in up-regulating ABCA1 expression. The excellent binding modes and docking scores of 10 hit compounds suggested they were highly-potent ABCA1 up-regulators targeting LXRβ. Overall, this study provided an effective strategy to discover highly potent ABCA1 up-regulators.

  16. Decision-Tree Analysis for Predicting First-Time Pass/Fail Rates for the NCLEX-RN® in Associate Degree Nursing Students.

    PubMed

    Chen, Hsiu-Chin; Bennett, Sean

    2016-08-01

    Little evidence shows the use of decision-tree algorithms in identifying predictors and analyzing their associations with pass rates for the NCLEX-RN(®) in associate degree nursing students. This longitudinal and retrospective cohort study investigated whether a decision-tree algorithm could be used to develop an accurate prediction model for the students' passing or failing the NCLEX-RN. This study used archived data from 453 associate degree nursing students in a selected program. The chi-squared automatic interaction detection analysis of the decision trees module was used to examine the effect of the collected predictors on passing/failing the NCLEX-RN. The actual percentage scores of Assessment Technologies Institute®'s RN Comprehensive Predictor(®) accurately identified students at risk of failing. The classification model correctly classified 92.7% of the students for passing. This study applied the decision-tree model to analyze a sequence database for developing a prediction model for early remediation in preparation for the NCLEXRN. [J Nurs Educ. 2016;55(8):454-457.]. Copyright 2016, SLACK Incorporated.

  17. A comprehensive simulation study on classification of RNA-Seq data.

    PubMed

    Zararsız, Gökmen; Goksuluk, Dincer; Korkmaz, Selcuk; Eldem, Vahap; Zararsiz, Gozde Erturk; Duru, Izzet Parug; Ozturk, Ahmet

    2017-01-01

    RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging powerful method for diagnosis, disease classification and monitoring at molecular level, as well as providing potential markers of diseases. Most of the statistical methods proposed for the classification of gene-expression data are either based on a continuous scale (eg. microarray data) or require a normal distribution assumption. Hence, these methods cannot be directly applied to RNA-Seq data since they violate both data structure and distributional assumptions. However, it is possible to apply these algorithms with appropriate modifications to RNA-Seq data. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis and negative binomial linear discriminant analysis. Another way is to bring the data closer to microarrays and apply microarray-based classifiers. In this study, we compared several classifiers including PLDA with and without power transformation, NBLDA, single SVM, bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and the transformation method on model performances. A comprehensive simulation study is conducted and the results are compared with the results of two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size, differential-expression rate and decreasing the dispersion parameter and number of groups lead to an increase in classification accuracy. Similar with differential-expression studies, the classification of RNA-Seq data requires careful attention when handling data overdispersion. We conclude that, as a count-based classifier, the power transformed PLDA and, as a microarray-based classifier, vst or rlog transformed RF and SVM classifiers may be a good choice for classification. An R/BIOCONDUCTOR package, MLSeq, is freely available at https://www.bioconductor.org/packages/release/bioc/html/MLSeq.html.

  18. Classification and regression trees

    Treesearch

    G. G. Moisen

    2008-01-01

    Frequently, ecologists are interested in exploring ecological relationships, describing patterns and processes, or making spatial or temporal predictions. These purposes often can be addressed by modeling the relationship between some outcome or response and a set of features or explanatory variables.

  19. Boosting bonsai trees for handwritten/printed text discrimination

    NASA Astrophysics Data System (ADS)

    Ricquebourg, Yann; Raymond, Christian; Poirriez, Baptiste; Lemaitre, Aurélie; Coüasnon, Bertrand

    2013-12-01

    Boosting over decision-stumps proved its efficiency in Natural Language Processing essentially with symbolic features, and its good properties (fast, few and not critical parameters, not sensitive to over-fitting) could be of great interest in the numeric world of pixel images. In this article we investigated the use of boosting over small decision trees, in image classification processing, for the discrimination of handwritten/printed text. Then, we conducted experiments to compare it to usual SVM-based classification revealing convincing results with very close performance, but with faster predictions and behaving far less as a black-box. Those promising results tend to make use of this classifier in more complex recognition tasks like multiclass problems.

  20. Modern modeling techniques had limited external validity in predicting mortality from traumatic brain injury.

    PubMed

    van der Ploeg, Tjeerd; Nieboer, Daan; Steyerberg, Ewout W

    2016-10-01

    Prediction of medical outcomes may potentially benefit from using modern statistical modeling techniques. We aimed to externally validate modeling strategies for prediction of 6-month mortality of patients suffering from traumatic brain injury (TBI) with predictor sets of increasing complexity. We analyzed individual patient data from 15 different studies including 11,026 TBI patients. We consecutively considered a core set of predictors (age, motor score, and pupillary reactivity), an extended set with computed tomography scan characteristics, and a further extension with two laboratory measurements (glucose and hemoglobin). With each of these sets, we predicted 6-month mortality using default settings with five statistical modeling techniques: logistic regression (LR), classification and regression trees, random forests (RFs), support vector machines (SVM) and neural nets. For external validation, a model developed on one of the 15 data sets was applied to each of the 14 remaining sets. This process was repeated 15 times for a total of 630 validations. The area under the receiver operating characteristic curve (AUC) was used to assess the discriminative ability of the models. For the most complex predictor set, the LR models performed best (median validated AUC value, 0.757), followed by RF and support vector machine models (median validated AUC value, 0.735 and 0.732, respectively). With each predictor set, the classification and regression trees models showed poor performance (median validated AUC value, <0.7). The variability in performance across the studies was smallest for the RF- and LR-based models (inter quartile range for validated AUC values from 0.07 to 0.10). In the area of predicting mortality from TBI, nonlinear and nonadditive effects are not pronounced enough to make modern prediction methods beneficial. Copyright © 2016 Elsevier Inc. All rights reserved.

  1. Blood oxygen level dependent magnetic resonance imaging for detecting pathological patterns in lupus nephritis patients: a preliminary study using a decision tree model.

    PubMed

    Shi, Huilan; Jia, Junya; Li, Dong; Wei, Li; Shang, Wenya; Zheng, Zhenfeng

    2018-02-09

    Precise renal histopathological diagnosis will guide therapy strategy in patients with lupus nephritis. Blood oxygen level dependent (BOLD) magnetic resonance imaging (MRI) has been applicable noninvasive technique in renal disease. This current study was performed to explore whether BOLD MRI could contribute to diagnose renal pathological pattern. Adult patients with lupus nephritis renal pathological diagnosis were recruited for this study. Renal biopsy tissues were assessed based on the lupus nephritis ISN/RPS 2003 classification. The Blood oxygen level dependent magnetic resonance imaging (BOLD-MRI) was used to obtain functional magnetic resonance parameter, R2* values. Several functions of R2* values were calculated and used to construct algorithmic models for renal pathological patterns. In addition, the algorithmic models were compared as to their diagnostic capability. Both Histopathology and BOLD MRI were used to examine a total of twelve patients. Renal pathological patterns included five classes III (including 3 as class III + V) and seven classes IV (including 4 as class IV + V). Three algorithmic models, including decision tree, line discriminant, and logistic regression, were constructed to distinguish the renal pathological pattern of class III and class IV. The sensitivity of the decision tree model was better than that of the line discriminant model (71.87% vs 59.48%, P < 0.001) and inferior to that of the Logistic regression model (71.87% vs 78.71%, P < 0.001). The specificity of decision tree model was equivalent to that of the line discriminant model (63.87% vs 63.73%, P = 0.939) and higher than that of the logistic regression model (63.87% vs 38.0%, P < 0.001). The Area under the ROC curve (AUROCC) of the decision tree model was greater than that of the line discriminant model (0.765 vs 0.629, P < 0.001) and logistic regression model (0.765 vs 0.662, P < 0.001). BOLD MRI is a useful non-invasive imaging technique for the evaluation of lupus nephritis. Decision tree models constructed using functions of R2* values may facilitate the prediction of renal pathological patterns.

  2. Spatial and thematic assessment of object-based forest stand delineation using an OFA-matrix

    NASA Astrophysics Data System (ADS)

    Hernando, A.; Tiede, D.; Albrecht, F.; Lang, S.

    2012-10-01

    The delineation and classification of forest stands is a crucial aspect of forest management. Object-based image analysis (OBIA) can be used to produce detailed maps of forest stands from either orthophotos or very high resolution satellite imagery. However, measures are then required for evaluating and quantifying both the spatial and thematic accuracy of the OBIA output. In this paper we present an approach for delineating forest stands and a new Object Fate Analysis (OFA) matrix for accuracy assessment. A two-level object-based orthophoto analysis was first carried out to delineate stands on the Dehesa Boyal public land in central Spain (Avila Province). Two structural features were first created for use in class modelling, enabling good differentiation between stands: a relational tree cover cluster feature, and an arithmetic ratio shadow/tree feature. We then extended the OFA comparison approach with an OFA-matrix to enable concurrent validation of thematic and spatial accuracies. Its diagonal shows the proportion of spatial and thematic coincidence between a reference data and the corresponding classification. New parameters for Spatial Thematic Loyalty (STL), Spatial Thematic Loyalty Overall (STLOVERALL) and Maximal Interfering Object (MIO) are introduced to summarise the OFA-matrix accuracy assessment. A stands map generated by OBIA (classification data) was compared with a map of the same area produced from photo interpretation and field data (reference data). In our example the OFA-matrix results indicate good spatial and thematic accuracies (>65%) for all stand classes except for the shrub stands (31.8%), and a good STLOVERALL (69.8%). The OFA-matrix has therefore been shown to be a valid tool for OBIA accuracy assessment.

  3. Raster Vs. Point Cloud LiDAR Data Classification

    NASA Astrophysics Data System (ADS)

    El-Ashmawy, N.; Shaker, A.

    2014-09-01

    Airborne Laser Scanning systems with light detection and ranging (LiDAR) technology is one of the fast and accurate 3D point data acquisition techniques. Generating accurate digital terrain and/or surface models (DTM/DSM) is the main application of collecting LiDAR range data. Recently, LiDAR range and intensity data have been used for land cover classification applications. Data range and Intensity, (strength of the backscattered signals measured by the LiDAR systems), are affected by the flying height, the ground elevation, scanning angle and the physical characteristics of the objects surface. These effects may lead to uneven distribution of point cloud or some gaps that may affect the classification process. Researchers have investigated the conversion of LiDAR range point data to raster image for terrain modelling. Interpolation techniques have been used to achieve the best representation of surfaces, and to fill the gaps between the LiDAR footprints. Interpolation methods are also investigated to generate LiDAR range and intensity image data for land cover classification applications. In this paper, different approach has been followed to classifying the LiDAR data (range and intensity) for land cover mapping. The methodology relies on the classification of the point cloud data based on their range and intensity and then converted the classified points into raster image. The gaps in the data are filled based on the classes of the nearest neighbour. Land cover maps are produced using two approaches using: (a) the conventional raster image data based on point interpolation; and (b) the proposed point data classification. A study area covering an urban district in Burnaby, British Colombia, Canada, is selected to compare the results of the two approaches. Five different land cover classes can be distinguished in that area: buildings, roads and parking areas, trees, low vegetation (grass), and bare soil. The results show that an improvement of around 10 % in the classification results can be achieved by using the proposed approach.

  4. Classification and regression tree (CART) analyses of genomic signatures reveal sets of tetramers that discriminate temperature optima of archaea and bacteria

    PubMed Central

    Dyer, Betsey D.; Kahn, Michael J.; LeBlanc, Mark D.

    2008-01-01

    Classification and regression tree (CART) analysis was applied to genome-wide tetranucleotide frequencies (genomic signatures) of 195 archaea and bacteria. Although genomic signatures have typically been used to classify evolutionary divergence, in this study, convergent evolution was the focus. Temperature optima for most of the organisms examined could be distinguished by CART analyses of tetranucleotide frequencies. This suggests that pervasive (nonlinear) qualities of genomes may reflect certain environmental conditions (such as temperature) in which those genomes evolved. The predominant use of GAGA and AGGA as the discriminating tetramers in CART models suggests that purine-loading and codon biases of thermophiles may explain some of the results. PMID:19054742

  5. The application of data mining techniques to oral cancer prognosis.

    PubMed

    Tseng, Wan-Ting; Chiang, Wei-Fan; Liu, Shyun-Yeu; Roan, Jinsheng; Lin, Chun-Nan

    2015-05-01

    This study adopted an integrated procedure that combines the clustering and classification features of data mining technology to determine the differences between the symptoms shown in past cases where patients died from or survived oral cancer. Two data mining tools, namely decision tree and artificial neural network, were used to analyze the historical cases of oral cancer, and their performance was compared with that of logistic regression, the popular statistical analysis tool. Both decision tree and artificial neural network models showed superiority to the traditional statistical model. However, as to clinician, the trees created by the decision tree models are relatively easier to interpret compared to that of the artificial neural network models. Cluster analysis also discovers that those stage 4 patients whose also possess the following four characteristics are having an extremely low survival rate: pN is N2b, level of RLNM is level I-III, AJCC-T is T4, and cells mutate situation (G) is moderate.

  6. Building interpretable predictive models for pediatric hospital readmission using Tree-Lasso logistic regression.

    PubMed

    Jovanovic, Milos; Radovanovic, Sandro; Vukicevic, Milan; Van Poucke, Sven; Delibasic, Boris

    2016-09-01

    Quantification and early identification of unplanned readmission risk have the potential to improve the quality of care during hospitalization and after discharge. However, high dimensionality, sparsity, and class imbalance of electronic health data and the complexity of risk quantification, challenge the development of accurate predictive models. Predictive models require a certain level of interpretability in order to be applicable in real settings and create actionable insights. This paper aims to develop accurate and interpretable predictive models for readmission in a general pediatric patient population, by integrating a data-driven model (sparse logistic regression) and domain knowledge based on the international classification of diseases 9th-revision clinical modification (ICD-9-CM) hierarchy of diseases. Additionally, we propose a way to quantify the interpretability of a model and inspect the stability of alternative solutions. The analysis was conducted on >66,000 pediatric hospital discharge records from California, State Inpatient Databases, Healthcare Cost and Utilization Project between 2009 and 2011. We incorporated domain knowledge based on the ICD-9-CM hierarchy in a data driven, Tree-Lasso regularized logistic regression model, providing the framework for model interpretation. This approach was compared with traditional Lasso logistic regression resulting in models that are easier to interpret by fewer high-level diagnoses, with comparable prediction accuracy. The results revealed that the use of a Tree-Lasso model was as competitive in terms of accuracy (measured by area under the receiver operating characteristic curve-AUC) as the traditional Lasso logistic regression, but integration with the ICD-9-CM hierarchy of diseases provided more interpretable models in terms of high-level diagnoses. Additionally, interpretations of models are in accordance with existing medical understanding of pediatric readmission. Best performing models have similar performances reaching AUC values 0.783 and 0.779 for traditional Lasso and Tree-Lasso, respectfully. However, information loss of Lasso models is 0.35 bits higher compared to Tree-Lasso model. We propose a method for building predictive models applicable for the detection of readmission risk based on Electronic Health records. Integration of domain knowledge (in the form of ICD-9-CM taxonomy) and a data-driven, sparse predictive algorithm (Tree-Lasso Logistic Regression) resulted in an increase of interpretability of the resulting model. The models are interpreted for the readmission prediction problem in general pediatric population in California, as well as several important subpopulations, and the interpretations of models comply with existing medical understanding of pediatric readmission. Finally, quantitative assessment of the interpretability of the models is given, that is beyond simple counts of selected low-level features. Copyright © 2016 Elsevier B.V. All rights reserved.

  7. Personalized Risk Prediction in Clinical Oncology Research: Applications and Practical Issues Using Survival Trees and Random Forests.

    PubMed

    Hu, Chen; Steingrimsson, Jon Arni

    2018-01-01

    A crucial component of making individualized treatment decisions is to accurately predict each patient's disease risk. In clinical oncology, disease risks are often measured through time-to-event data, such as overall survival and progression/recurrence-free survival, and are often subject to censoring. Risk prediction models based on recursive partitioning methods are becoming increasingly popular largely due to their ability to handle nonlinear relationships, higher-order interactions, and/or high-dimensional covariates. The most popular recursive partitioning methods are versions of the Classification and Regression Tree (CART) algorithm, which builds a simple interpretable tree structured model. With the aim of increasing prediction accuracy, the random forest algorithm averages multiple CART trees, creating a flexible risk prediction model. Risk prediction models used in clinical oncology commonly use both traditional demographic and tumor pathological factors as well as high-dimensional genetic markers and treatment parameters from multimodality treatments. In this article, we describe the most commonly used extensions of the CART and random forest algorithms to right-censored outcomes. We focus on how they differ from the methods for noncensored outcomes, and how the different splitting rules and methods for cost-complexity pruning impact these algorithms. We demonstrate these algorithms by analyzing a randomized Phase III clinical trial of breast cancer. We also conduct Monte Carlo simulations to compare the prediction accuracy of survival forests with more commonly used regression models under various scenarios. These simulation studies aim to evaluate how sensitive the prediction accuracy is to the underlying model specifications, the choice of tuning parameters, and the degrees of missing covariates.

  8. Using classification tree modelling to investigate drug prescription practices at health facilities in rural Tanzania.

    PubMed

    Kajungu, Dan K; Selemani, Majige; Masanja, Irene; Baraka, Amuri; Njozi, Mustafa; Khatib, Rashid; Dodoo, Alexander N; Binka, Fred; Macq, Jean; D'Alessandro, Umberto; Speybroeck, Niko

    2012-09-05

    Drug prescription practices depend on several factors related to the patient, health worker and health facilities. A better understanding of the factors influencing prescription patterns is essential to develop strategies to mitigate the negative consequences associated with poor practices in both the public and private sectors. A cross-sectional study was conducted in rural Tanzania among patients attending health facilities, and health workers. Patients, health workers and health facilities-related factors with the potential to influence drug prescription patterns were used to build a model of key predictors. Standard data mining methodology of classification tree analysis was used to define the importance of the different factors on prescription patterns. This analysis included 1,470 patients and 71 health workers practicing in 30 health facilities. Patients were mostly treated in dispensaries. Twenty two variables were used to construct two classification tree models: one for polypharmacy (prescription of ≥3 drugs) on a single clinic visit and one for co-prescription of artemether-lumefantrine (AL) with antibiotics. The most important predictor of polypharmacy was the diagnosis of several illnesses. Polypharmacy was also associated with little or no supervision of the health workers, administration of AL and private facilities. Co-prescription of AL with antibiotics was more frequent in children under five years of age and the other important predictors were transmission season, mode of diagnosis and the location of the health facility. Standard data mining methodology is an easy-to-implement analytical approach that can be useful for decision-making. Polypharmacy is mainly due to the diagnosis of multiple illnesses.

  9. Tree mortality based fire severity classification for forest inventories: A Pacific Northwest national forests example

    Treesearch

    Thomas R. Whittier; Andrew N. Gray

    2016-01-01

    Determining how the frequency, severity, and extent of forest fires are changing in response to changes in management and climate is a key concern in many regions where fire is an important natural disturbance. In the USA the only national-scale fire severity classification uses satellite image changedetection to produce maps for large (>400 ha) fires, and is...

  10. Simulation of Sentinel-2A Red Edge Bands with RPAS Based Multispectral Data

    NASA Astrophysics Data System (ADS)

    Davids, Corine; Storvold, Rune; Haarpaintner, Jorg; Arnason, Kolbeinn

    2016-08-01

    Very high spatial and spectral resolution multispectral data was collected over the Hallormstađur test site in eastern Iceland using a fixed wing remotely piloted aerial system as part of the EU FP7 project North State (www.northstatefp7.eu). The North State project uses forest variable estimates derived from optical and radar satellite data as either input or validation for carbon flux models. The RPAS data from the Hallormsstađur forest test site in Iceland is here used to simulate Landsat and Sentinel-2A data and to explore the advantages of the new Sentinel-2A red edge bands for forest vegetation mapping. Simple supervised classification shows that the inclusion of the red edge bands improves the tree species classification considerably.

  11. [Analysis of dietary pattern and diabetes mellitus influencing factors identified by classification tree model in adults of Fujian].

    PubMed

    Yu, F L; Ye, Y; Yan, Y S

    2017-05-10

    Objective: To find out the dietary patterns and explore the relationship between environmental factors (especially dietary patterns) and diabetes mellitus in the adults of Fujian. Methods: Multi-stage sampling method were used to survey residents aged ≥18 years by questionnaire, physical examination and laboratory detection in 10 disease surveillance points in Fujian. Factor analysis was used to identify the dietary patterns, while logistic regression model was applied to analyze relationship between dietary patterns and diabetes mellitus, and classification tree model was adopted to identify the influencing factors for diabetes mellitus. Results: There were four dietary patterns in the population, including meat, plant, high-quality protein, and fried food and beverages patterns. The result of logistic analysis showed that plant pattern, which has higher factor loading of fresh fruit-vegetables and cereal-tubers, was a protective factor for non-diabetes mellitus. The risk of diabetes mellitus in the population at T2 and T3 levels of factor score were 0.727 (95 %CI: 0.561-0.943) times and 0.736 (95 %CI : 0.573-0.944) times higher, respectively, than those whose factor score was in lowest quartile. Thirteen influencing factors and eleven group at high-risk for diabetes mellitus were identified by classification tree model. The influencing factors were dyslipidemia, age, family history of diabetes, hypertension, physical activity, career, sex, sedentary time, abdominal adiposity, BMI, marital status, sleep time and high-quality protein pattern. Conclusion: There is a close association between dietary patterns and diabetes mellitus. It is necessary to promote healthy and reasonable diet, strengthen the monitoring and control of blood lipids, blood pressure and body weight, and have good lifestyle for the prevention and control of diabetes mellitus.

  12. A Pruning Neural Network Model in Credit Classification Analysis

    PubMed Central

    Tang, Yajiao; Ji, Junkai; Dai, Hongwei; Yu, Yang; Todo, Yuki

    2018-01-01

    Nowadays, credit classification models are widely applied because they can help financial decision-makers to handle credit classification issues. Among them, artificial neural networks (ANNs) have been widely accepted as the convincing methods in the credit industry. In this paper, we propose a pruning neural network (PNN) and apply it to solve credit classification problem by adopting the well-known Australian and Japanese credit datasets. The model is inspired by synaptic nonlinearity of a dendritic tree in a biological neural model. And it is trained by an error back-propagation algorithm. The model is capable of realizing a neuronal pruning function by removing the superfluous synapses and useless dendrites and forms a tidy dendritic morphology at the end of learning. Furthermore, we utilize logic circuits (LCs) to simulate the dendritic structures successfully which makes PNN be implemented on the hardware effectively. The statistical results of our experiments have verified that PNN obtains superior performance in comparison with other classical algorithms in terms of accuracy and computational efficiency. PMID:29606961

  13. The Pediatric Home Care/Expenditure Classification Model (P/ECM): A Home Care Case-Mix Model for Children Facing Special Health Care Challenges.

    PubMed

    Phillips, Charles D

    2015-01-01

    Case-mix classification and payment systems help assure that persons with similar needs receive similar amounts of care resources, which is a major equity concern for consumers, providers, and programs. Although health service programs for adults regularly use case-mix payment systems, programs providing health services to children and youth rarely use such models. This research utilized Medicaid home care expenditures and assessment data on 2,578 children receiving home care in one large state in the USA. Using classification and regression tree analyses, a case-mix model for long-term pediatric home care was developed. The Pediatric Home Care/Expenditure Classification Model (P/ECM) grouped children and youth in the study sample into 24 groups, explaining 41% of the variance in annual home care expenditures. The P/ECM creates the possibility of a more equitable, and potentially more effective, allocation of home care resources among children and youth facing serious health care challenges.

  14. The Pediatric Home Care/Expenditure Classification Model (P/ECM): A Home Care Case-Mix Model for Children Facing Special Health Care Challenges

    PubMed Central

    Phillips, Charles D.

    2015-01-01

    Case-mix classification and payment systems help assure that persons with similar needs receive similar amounts of care resources, which is a major equity concern for consumers, providers, and programs. Although health service programs for adults regularly use case-mix payment systems, programs providing health services to children and youth rarely use such models. This research utilized Medicaid home care expenditures and assessment data on 2,578 children receiving home care in one large state in the USA. Using classification and regression tree analyses, a case-mix model for long-term pediatric home care was developed. The Pediatric Home Care/Expenditure Classification Model (P/ECM) grouped children and youth in the study sample into 24 groups, explaining 41% of the variance in annual home care expenditures. The P/ECM creates the possibility of a more equitable, and potentially more effective, allocation of home care resources among children and youth facing serious health care challenges. PMID:26740744

  15. Forest tree species discrimination in western Himalaya using EO-1 Hyperion

    NASA Astrophysics Data System (ADS)

    George, Rajee; Padalia, Hitendra; Kushwaha, S. P. S.

    2014-05-01

    The information acquired in the narrow bands of hyperspectral remote sensing data has potential to capture plant species spectral variability, thereby improving forest tree species mapping. This study assessed the utility of spaceborne EO-1 Hyperion data in discrimination and classification of broadleaved evergreen and conifer forest tree species in western Himalaya. The pre-processing of 242 bands of Hyperion data resulted into 160 noise-free and vertical stripe corrected reflectance bands. Of these, 29 bands were selected through step-wise exclusion of bands (Wilk's Lambda). Spectral Angle Mapper (SAM) and Support Vector Machine (SVM) algorithms were applied to the selected bands to assess their effectiveness in classification. SVM was also applied to broadband data (Landsat TM) to compare the variation in classification accuracy. All commonly occurring six gregarious tree species, viz., white oak, brown oak, chir pine, blue pine, cedar and fir in western Himalaya could be effectively discriminated. SVM produced a better species classification (overall accuracy 82.27%, kappa statistic 0.79) than SAM (overall accuracy 74.68%, kappa statistic 0.70). It was noticed that classification accuracy achieved with Hyperion bands was significantly higher than Landsat TM bands (overall accuracy 69.62%, kappa statistic 0.65). Study demonstrated the potential utility of narrow spectral bands of Hyperion data in discriminating tree species in a hilly terrain.

  16. Identification and Mapping of Tree Species in Urban Areas Using WORLDVIEW-2 Imagery

    NASA Astrophysics Data System (ADS)

    Mustafa, Y. T.; Habeeb, H. N.; Stein, A.; Sulaiman, F. Y.

    2015-10-01

    Monitoring and mapping of urban trees are essential to provide urban forestry authorities with timely and consistent information. Modern techniques increasingly facilitate these tasks, but require the development of semi-automatic tree detection and classification methods. In this article, we propose an approach to delineate and map the crown of 15 tree species in the city of Duhok, Kurdistan Region of Iraq using WorldView-2 (WV-2) imagery. A tree crown object is identified first and is subsequently delineated as an image object (IO) using vegetation indices and texture measurements. Next, three classification methods: Maximum Likelihood, Neural Network, and Support Vector Machine were used to classify IOs using selected IO features. The best results are obtained with Support Vector Machine classification that gives the best map of urban tree species in Duhok. The overall accuracy was between 60.93% to 88.92% and κ-coefficient was between 0.57 to 0.75. We conclude that fifteen tree species were identified and mapped at a satisfactory accuracy in urban areas of this study.

  17. Can Statistical Machine Learning Algorithms Help for Classification of Obstructive Sleep Apnea Severity to Optimal Utilization of Polysomnography Resources?

    PubMed

    Bozkurt, Selen; Bostanci, Asli; Turhan, Murat

    2017-08-11

    The goal of this study is to evaluate the results of machine learning methods for the classification of OSA severity of patients with suspected sleep disorder breathing as normal, mild, moderate and severe based on non-polysomnographic variables: 1) clinical data, 2) symptoms and 3) physical examination. In order to produce classification models for OSA severity, five different machine learning methods (Bayesian network, Decision Tree, Random Forest, Neural Networks and Logistic Regression) were trained while relevant variables and their relationships were derived empirically from observed data. Each model was trained and evaluated using 10-fold cross-validation and to evaluate classification performances of all methods, true positive rate (TPR), false positive rate (FPR), Positive Predictive Value (PPV), F measure and Area Under Receiver Operating Characteristics curve (ROC-AUC) were used. Results of 10-fold cross validated tests with different variable settings promisingly indicated that the OSA severity of suspected OSA patients can be classified, using non-polysomnographic features, with 0.71 true positive rate as the highest and, 0.15 false positive rate as the lowest, respectively. Moreover, the test results of different variables settings revealed that the accuracy of the classification models was significantly improved when physical examination variables were added to the model. Study results showed that machine learning methods can be used to estimate the probabilities of no, mild, moderate, and severe obstructive sleep apnea and such approaches may improve accurate initial OSA screening and help referring only the suspected moderate or severe OSA patients to sleep laboratories for the expensive tests.

  18. T-RMSD: a fine-grained, structure-based classification method and its application to the functional characterization of TNF receptors.

    PubMed

    Magis, Cedrik; Stricher, François; van der Sloot, Almer M; Serrano, Luis; Notredame, Cedric

    2010-07-16

    This study addresses the relation between structural and functional similarity in proteins. We introduce a novel method named tree based on root mean square deviation (T-RMSD), which uses distance RMSD (dRMSD) variations to build fine-grained structure-based classifications of proteins. The main improvement of the T-RMSD over similar methods, such as Dali, is its capacity to produce the equivalent of a bootstrap value for each cluster node. We validated our approach on two domain families studied extensively for their role in many biological and pathological pathways: the small GTPase RAS superfamily and the cysteine-rich domains (CRDs) associated with the tumor necrosis factor receptors (TNFRs) family. Our analysis showed that T-RMSD is able to automatically recover and refine existing classifications. In the case of the small GTPase ARF subfamily, T-RMSD can distinguish GTP- from GDP-bound states, while in the case of CRDs it can identify two new subgroups associated with well defined functional features (ligand binding and formation of ligand pre-assembly complex). We show how hidden Markov models (HMMs) can be built on these new groups and propose a methodology to use these models simultaneously in order to do fine-grained functional genomic annotation without known 3D structures. T-RMSD, an open source freeware incorporated in the T-Coffee package, is available online. 2010 Elsevier Ltd. All rights reserved.

  19. Comparison of SAM and OBIA as Tools for Lava Morphology Classification - A Case Study in Krafla, NE Iceland

    NASA Astrophysics Data System (ADS)

    Aufaristama, Muhammad; Hölbling, Daniel; Höskuldsson, Ármann; Jónsdóttir, Ingibjörg

    2017-04-01

    The Krafla volcanic system is part of the Icelandic North Volcanic Zone (NVZ). During Holocene, two eruptive events occurred in Krafla, 1724-1729 and 1975-1984. The last eruptive episode (1975-1984), known as the "Krafla Fires", resulted in nine volcanic eruption episodes. The total area covered by the lavas from this eruptive episode is 36 km2 and the volume is about 0.25-0.3 km3. Lava morphology is related to the characteristics of the surface morphology of a lava flow after solidification. The typical morphology of lava can be used as primary basis for the classification of lava flows when rheological properties cannot be directly observed during emplacement, and also for better understanding the behavior of lava flow models. Although mapping of lava flows in the field is relatively accurate such traditional methods are time consuming, especially when the lava covers large areas such as it is the case in Krafla. Semi-automatic mapping methods that make use of satellite remote sensing data allow for an efficient and fast mapping of lava morphology. In this study, two semi-automatic methods for lava morphology classification are presented and compared using Landsat 8 (30 m spatial resolution) and SPOT-5 (10 m spatial resolution) satellite images. For assessing the classification accuracy, the results from semi-automatic mapping were compared to the respective results from visual interpretation. On the one hand, the Spectral Angle Mapper (SAM) classification method was used. With this method an image is classified according to the spectral similarity between the image reflectance spectrums and the reference reflectance spectra. SAM successfully produced detailed lava surface morphology maps. However, the pixel-based approach partly leads to a salt-and-pepper effect. On the other hand, we applied the Random Forest (RF) classification method within an object-based image analysis (OBIA) framework. This statistical classifier uses a randomly selected subset of training samples to produce multiple decision trees. For final classification of pixels or - in the present case - image objects, the average of the class assignments probability predicted by the different decision trees is used. While the resulting OBIA classification of lava morphology types shows a high coincidence with the reference data, the approach is sensitive to the segmentation-derived image objects that constitute the base units for classification. Both semi-automatic methods produce reasonable results in the Krafla lava field, even if the identification of different pahoehoe and aa types of lava appeared to be difficult. The use of satellite remote sensing data shows a high potential for fast and efficient classification of lava morphology, particularly over large and inaccessible areas.

  20. Urban forest ecosystem analysis using fused airborne hyperspectral and lidar data

    NASA Astrophysics Data System (ADS)

    Alonzo, Mike Gerard

    Urban trees are strategically important in a city's effort to mitigate their carbon footprint, heat island effects, air pollution, and stormwater runoff. Currently, the most common method for quantifying urban forest structure and ecosystem function is through field plot sampling. However, taking intensive structural measurements on private properties throughout a city is difficult, and the outputs from sample inventories are not spatially explicit. The overarching goal of this dissertation is to develop methods for mapping urban forest structure and function using fused hyperspectral imagery and waveform lidar data at the individual tree crown scale. Urban forest ecosystem services estimated using the USDA Forest Service's i-Tree Eco (formerly UFORE) model are based largely on tree species and leaf area index (LAI). Accordingly, tree species were mapped in my Santa Barbara, California study area for 29 species comprising >80% of canopy. Crown-scale discriminant analysis methods were introduced for fusing Airborne Visible Infrared Imaging Spectrometry (AVIRIS) data with a suite of lidar structural metrics (e.g., tree height, crown porosity) to maximize classification accuracy in a complex environment. AVIRIS imagery was critical to achieving an overall species-level accuracy of 83.4% while lidar data was most useful for improving the discrimination of small and morphologically unique species. LAI was estimated at both the field-plot scale using laser penetration metrics and at the crown scale using allometry. Agreement of the former with photographic estimates of gap fraction and the latter with allometric estimates based on field measurements was examined. Results indicate that lidar may be used reasonably to measure LAI in an urban environment lacking in continuous canopy and characterized by high species diversity. Finally, urban ecosystem services such as carbon storage and building energy-use modification were analyzed through combination of aforementioned methods and the i-Tree Eco modeling framework. The remote sensing methods developed in this dissertation will allow researchers to more precisely constrain urban ecosystem spatial analyses and equip cities to better manage their urban forest resource.

  1. In silico prediction of toxicity of phenols to Tetrahymena pyriformis by using genetic algorithm and decision tree-based modeling approach.

    PubMed

    Abbasitabar, Fatemeh; Zare-Shahabadi, Vahid

    2017-04-01

    Risk assessment of chemicals is an important issue in environmental protection; however, there is a huge lack of experimental data for a large number of end-points. The experimental determination of toxicity of chemicals involves high costs and time-consuming process. In silico tools such as quantitative structure-toxicity relationship (QSTR) models, which are constructed on the basis of computational molecular descriptors, can predict missing data for toxic end-points for existing or even not yet synthesized chemicals. Phenol derivatives are known to be aquatic pollutants. With this background, we aimed to develop an accurate and reliable QSTR model for the prediction of toxicity of 206 phenols to Tetrahymena pyriformis. A multiple linear regression (MLR)-based QSTR was obtained using a powerful descriptor selection tool named Memorized_ACO algorithm. Statistical parameters of the model were 0.72 and 0.68 for R training 2 and R test 2 , respectively. To develop a high-quality QSTR model, classification and regression tree (CART) was employed. Two approaches were considered: (1) phenols were classified into different modes of action using CART and (2) the phenols in the training set were partitioned to several subsets by a tree in such a manner that in each subset, a high-quality MLR could be developed. For the first approach, the statistical parameters of the resultant QSTR model were improved to 0.83 and 0.75 for R training 2 and R test 2 , respectively. Genetic algorithm was employed in the second approach to obtain an optimal tree, and it was shown that the final QSTR model provided excellent prediction accuracy for the training and test sets (R training 2 and R test 2 were 0.91 and 0.93, respectively). The mean absolute error for the test set was computed as 0.1615. Copyright © 2016 Elsevier Ltd. All rights reserved.

  2. Explaining Match Outcome During The Men’s Basketball Tournament at The Olympic Games

    PubMed Central

    Leicht, Anthony S.; Gómez, Miguel A.; Woods, Carl T.

    2017-01-01

    In preparation for the Olympics, there is a limited opportunity for coaches and athletes to interact regularly with team performance indicators providing important guidance to coaches for enhanced match success at the elite level. This study examined the relationship between match outcome and team performance indicators during men’s basketball tournaments at the Olympic Games. Twelve team performance indicators were collated from all men’s teams and matches during the basketball tournament of the 2004-2016 Olympic Games (n = 156). Linear and non-linear analyses examined the relationship between match outcome and team performance indicator characteristics; namely, binary logistic regression and a conditional interference (CI) classification tree. The most parsimonious logistic regression model retained ‘assists’, ‘defensive rebounds’, ‘field-goal percentage’, ‘fouls’, ‘fouls against’, ‘steals’ and ‘turnovers’ (delta AIC <0.01; Akaike weight = 0.28) with a classification accuracy of 85.5%. Conversely, four performance indicators were retained with the CI classification tree with an average classification accuracy of 81.4%. However, it was the combination of ‘field-goal percentage’ and ‘defensive rebounds’ that provided the greatest probability of winning (93.2%). Match outcome during the men’s basketball tournaments at the Olympic Games was identified by a unique combination of performance indicators. Despite the average model accuracy being marginally higher for the logistic regression analysis, the CI classification tree offered a greater practical utility for coaches through its resolution of non-linear phenomena to guide team success. Key points A unique combination of team performance indicators explained 93.2% of winning observations in men’s basketball at the Olympics. Monitoring of these team performance indicators may provide coaches with the capability to devise multiple game plans or strategies to enhance their likelihood of winning. Incorporation of machine learning techniques with team performance indicators may provide a valuable and strategic approach to explain patterns within multivariate datasets in sport science. PMID:29238245

  3. Classification Model for Damage Localization in a Plate Structure

    NASA Astrophysics Data System (ADS)

    Janeliukstis, R.; Ruchevskis, S.; Chate, A.

    2018-01-01

    The present study is devoted to the problem of damage localization by means of data classification. The commercial ANSYS finite-elements program was used to make a model of a cantilevered composite plate equipped with numerous strain sensors. The plate was divided into zones, and, for data classification purposes, each of them housed several points to which a point mass of magnitude 5 and 10% of plate mass was applied. At each of these points, a numerical modal analysis was performed, from which the first few natural frequencies and strain readings were extracted. The strain data for every point were the input for a classification procedure involving k nearest neighbors and decision trees. The classification model was trained and optimized by finetuning the key parameters of both algorithms. Finally, two new query points were simulated and subjected to a classification in terms of assigning a label to one of the zones of the plate, thus localizing these points. Damage localization results were compared for both algorithms and were found to be in good agreement with the actual application positions of point load.

  4. Sensor-based fall risk assessment--an expert 'to go'.

    PubMed

    Marschollek, M; Rehwald, A; Wolf, K H; Gietzelt, M; Nemitz, G; Meyer Zu Schwabedissen, H; Haux, R

    2011-01-01

    Falls are a predominant problem in our aging society, often leading to severe somatic and psychological consequences, and having an incidence of about 30% in the group of persons aged 65 years or above. In order to identify persons at risk, many assessment tools and tests have been developed, but most of these have to be conducted in a supervised setting and are dependent on an expert rater. The overall aim of our research work is to develop an objective and unobtrusive method to determine individual fall risk based on the use of motion sensor data. The aims of our work for this paper are to derive a fall risk model based on sensor data that may potentially be measured during typical activities of daily life (aim #1), and to evaluate the resulting model with data from a one-year follow-up study (aim #2). A sample of n = 119 geriatric inpatients wore an accelerometer on the waist during a Timed 'Up & Go' test and a 20 m walk. Fifty patients were included in a one-year follow-up study, assessing fall events and scoring average physical activity at home in telephone interviews. The sensor data were processed to extract gait and dynamic balance parameters, from which four fall risk models--two classification trees and two logistic regression models--were computed: models CT#1 and SL#1 using accelerometer data only, models CT#2 and SL#2 including the physical activity score. The risk models were evaluated in a ten-times tenfold cross-validation procedure, calculating sensitivity (SENS), specificity (SPEC), positive and negative predictive values (PPV, NPV), classification accuracy, area under the curve (AUC) and the Brier score. Both classification trees show a fair to good performance (models CT#1/CT#2): SENS 74%/58%, SPEC 96%/82%, PPV 92%/ 74%, NPV 77%/82%, accuracy 80%/78%, AUC 0.83/0.87 and Brier scores 0.14/0.14. The logistic regression models (SL#1/SL#2) perform worse: SENS 42%/58%, SPEC 82%/ 78%, PPV 62%/65%, NPV 67%/72%, accuracy 65%/70%, AUC 0.65/0.72 and Brier scores 0.23/0.21. Our results suggest that accelerometer data may be used to predict falls in an unsupervised setting. Furthermore, the parameters used for prediction are measurable with an unobtrusive sensor device during normal activities of daily living. These promising results have to be validated in a larger, long-term prospective trial.

  5. Reticulate classification of mosaic microbial genomes using NeAT website.

    PubMed

    Lima-Mendez, Gipsi

    2012-01-01

    The tree of life is the classical representation of the evolutionary relationships between existent species. A tree is appropriate to display the divergence of species through mutation, i.e., by vertical descent. However, lateral gene transfer (LGT) is excluded from such representations. When LGT contribution to genome evolution cannot be neglected (e.g., for prokaryotes and mobile genetic elements), the tree becomes misleading. Networks appear as an intuitive way to represent both vertical and horizontal relationships, while overlapping groups within such graphs are more suitable for their classification. Here, we describe a method to represent both vertical and horizontal relationships. We start with a set of genomes whose coded proteins have been grouped into families based on sequence similarity. Next, all pairs of genomes are compared, counting the number of proteins classified into the same family. From this comparison, we derive a weighted graph where genomes with a significant number of similar proteins are linked. Finally, we apply a two-step clustering of this graph to produce a classification where nodes can be assigned to multiple clusters. The procedure can be performed using the Network Analysis Tools (NeAT) website.

  6. Holoentropy enabled-decision tree for automatic classification of diabetic retinopathy using retinal fundus images.

    PubMed

    Mane, Vijay Mahadeo; Jadhav, D V

    2017-05-24

    Diabetic retinopathy (DR) is the most common diabetic eye disease. Doctors are using various test methods to detect DR. But, the availability of test methods and requirements of domain experts pose a new challenge in the automatic detection of DR. In order to fulfill this objective, a variety of algorithms has been developed in the literature. In this paper, we propose a system consisting of a novel sparking process and a holoentropy-based decision tree for automatic classification of DR images to further improve the effectiveness. The sparking process algorithm is developed for automatic segmentation of blood vessels through the estimation of optimal threshold. The holoentropy enabled decision tree is newly developed for automatic classification of retinal images into normal or abnormal using hybrid features which preserve the disease-level patterns even more than the signal level of the feature. The effectiveness of the proposed system is analyzed using standard fundus image databases DIARETDB0 and DIARETDB1 for sensitivity, specificity and accuracy. The proposed system yields sensitivity, specificity and accuracy values of 96.72%, 97.01% and 96.45%, respectively. The experimental result reveals that the proposed technique outperforms the existing algorithms.

  7. A sun-crown-sensor model and adapted C-correction logic for topographic correction of high resolution forest imagery

    NASA Astrophysics Data System (ADS)

    Fan, Yuanchao; Koukal, Tatjana; Weisberg, Peter J.

    2014-10-01

    Canopy shadowing mediated by topography is an important source of radiometric distortion on remote sensing images of rugged terrain. Topographic correction based on the sun-canopy-sensor (SCS) model significantly improved over those based on the sun-terrain-sensor (STS) model for surfaces with high forest canopy cover, because the SCS model considers and preserves the geotropic nature of trees. The SCS model accounts for sub-pixel canopy shadowing effects and normalizes the sunlit canopy area within a pixel. However, it does not account for mutual shadowing between neighboring pixels. Pixel-to-pixel shadowing is especially apparent for fine resolution satellite images in which individual tree crowns are resolved. This paper proposes a new topographic correction model: the sun-crown-sensor (SCnS) model based on high-resolution satellite imagery (IKONOS) and high-precision LiDAR digital elevation model. An improvement on the C-correction logic with a radiance partitioning method to address the effects of diffuse irradiance is also introduced (SCnS + C). In addition, we incorporate a weighting variable, based on pixel shadow fraction, on the direct and diffuse radiance portions to enhance the retrieval of at-sensor radiance and reflectance of highly shadowed tree pixels and form another variety of SCnS model (SCnS + W). Model evaluation with IKONOS test data showed that the new SCnS model outperformed the STS and SCS models in quantifying the correlation between terrain-regulated illumination factor and at-sensor radiance. Our adapted C-correction logic based on the sun-crown-sensor geometry and radiance partitioning better represented the general additive effects of diffuse radiation than C parameters derived from the STS or SCS models. The weighting factor Wt also significantly enhanced correction results by reducing within-class standard deviation and balancing the mean pixel radiance between sunlit and shaded slopes. We analyzed these improvements with model comparison on the red and near infrared bands. The advantages of SCnS + C and SCnS + W on both bands are expected to facilitate forest classification and change detection applications.

  8. Neuropsychological Test Selection for Cognitive Impairment Classification: A Machine Learning Approach

    PubMed Central

    Williams, Jennifer A.; Schmitter-Edgecombe, Maureen; Cook, Diane J.

    2016-01-01

    Introduction Reducing the amount of testing required to accurately detect cognitive impairment is clinically relevant. The aim of this research was to determine the fewest number of clinical measures required to accurately classify participants as healthy older adult, mild cognitive impairment (MCI) or dementia using a suite of classification techniques. Methods Two variable selection machine learning models (i.e., naive Bayes, decision tree), a logistic regression, and two participant datasets (i.e., clinical diagnosis, clinical dementia rating; CDR) were explored. Participants classified using clinical diagnosis criteria included 52 individuals with dementia, 97 with MCI, and 161 cognitively healthy older adults. Participants classified using CDR included 154 individuals CDR = 0, 93 individuals with CDR = 0.5, and 25 individuals with CDR = 1.0+. Twenty-seven demographic, psychological, and neuropsychological variables were available for variable selection. Results No significant difference was observed between naive Bayes, decision tree, and logistic regression models for classification of both clinical diagnosis and CDR datasets. Participant classification (70.0 – 99.1%), geometric mean (60.9 – 98.1%), sensitivity (44.2 – 100%), and specificity (52.7 – 100%) were generally satisfactory. Unsurprisingly, the MCI/CDR = 0.5 participant group was the most challenging to classify. Through variable selection only 2 – 9 variables were required for classification and varied between datasets in a clinically meaningful way. Conclusions The current study results reveal that machine learning techniques can accurately classifying cognitive impairment and reduce the number of measures required for diagnosis. PMID:26332171

  9. Application of decision tree model for the ground subsidence hazard mapping near abandoned underground coal mines.

    PubMed

    Lee, Saro; Park, Inhye

    2013-09-30

    Subsidence of ground caused by underground mines poses hazards to human life and property. This study analyzed the hazard to ground subsidence using factors that can affect ground subsidence and a decision tree approach in a geographic information system (GIS). The study area was Taebaek, Gangwon-do, Korea, where many abandoned underground coal mines exist. Spatial data, topography, geology, and various ground-engineering data for the subsidence area were collected and compiled in a database for mapping ground-subsidence hazard (GSH). The subsidence area was randomly split 50/50 for training and validation of the models. A data-mining classification technique was applied to the GSH mapping, and decision trees were constructed using the chi-squared automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. The frequency ratio model was also applied to the GSH mapping for comparing with probabilistic model. The resulting GSH maps were validated using area-under-the-curve (AUC) analysis with the subsidence area data that had not been used for training the model. The highest accuracy was achieved by the decision tree model using CHAID algorithm (94.01%) comparing with QUEST algorithms (90.37%) and frequency ratio model (86.70%). These accuracies are higher than previously reported results for decision tree. Decision tree methods can therefore be used efficiently for GSH analysis and might be widely used for prediction of various spatial events. Copyright © 2013. Published by Elsevier Ltd.

  10. Preliminary Classification of Novel Hemorrhagic Fever-Causing Viruses Using Sequence-Based PAirwise Sequence Comparison (PASC) Analysis.

    PubMed

    Bào, Yīmíng; Kuhn, Jens H

    2018-01-01

    During the last decade, genome sequence-based classification of viruses has become increasingly prominent. Viruses can be even classified based on coding-complete genome sequence data alone. Nevertheless, classification remains arduous as experts are required to establish phylogenetic trees to depict the evolutionary relationships of such sequences for preliminary taxonomic placement. Pairwise sequence comparison (PASC) of genomes is one of several novel methods for establishing relationships among viruses. This method, provided by the US National Center for Biotechnology Information as an open-access tool, circumvents phylogenetics, and yet PASC results are often in agreement with those of phylogenetic analyses. Computationally inexpensive, PASC can be easily performed by non-taxonomists. Here we describe how to use the PASC tool for the preliminary classification of novel viral hemorrhagic fever-causing viruses.

  11. Differential Diagnosis of Erythmato-Squamous Diseases Using Classification and Regression Tree.

    PubMed

    Maghooli, Keivan; Langarizadeh, Mostafa; Shahmoradi, Leila; Habibi-Koolaee, Mahdi; Jebraeily, Mohamad; Bouraghi, Hamid

    2016-10-01

    Differential diagnosis of Erythmato-Squamous Diseases (ESD) is a major challenge in the field of dermatology. The ESD diseases are placed into six different classes. Data mining is the process for detection of hidden patterns. In the case of ESD, data mining help us to predict the diseases. Different algorithms were developed for this purpose. we aimed to use the Classification and Regression Tree (CART) to predict differential diagnosis of ESD. we used the Cross Industry Standard Process for Data Mining (CRISP-DM) methodology. For this purpose, the dermatology data set from machine learning repository, UCI was obtained. The Clementine 12.0 software from IBM Company was used for modelling. In order to evaluation of the model we calculate the accuracy, sensitivity and specificity of the model. The proposed model had an accuracy of 94.84% (. 24.42) in order to correct prediction of the ESD disease. Results indicated that using of this classifier could be useful. But, it would be strongly recommended that the combination of machine learning methods could be more useful in terms of prediction of ESD.

  12. Time Series of Images to Improve Tree Species Classification

    NASA Astrophysics Data System (ADS)

    Miyoshi, G. T.; Imai, N. N.; de Moraes, M. V. A.; Tommaselli, A. M. G.; Näsi, R.

    2017-10-01

    Tree species classification provides valuable information to forest monitoring and management. The high floristic variation of the tree species appears as a challenging issue in the tree species classification because the vegetation characteristics changes according to the season. To help to monitor this complex environment, the imaging spectroscopy has been largely applied since the development of miniaturized sensors attached to Unmanned Aerial Vehicles (UAV). Considering the seasonal changes in forests and the higher spectral and spatial resolution acquired with sensors attached to UAV, we present the use of time series of images to classify four tree species. The study area is an Atlantic Forest area located in the western part of São Paulo State. Images were acquired in August 2015 and August 2016, generating three data sets of images: only with the image spectra of 2015; only with the image spectra of 2016; with the layer stacking of images from 2015 and 2016. Four tree species were classified using Spectral angle mapper (SAM), Spectral information divergence (SID) and Random Forest (RF). The results showed that SAM and SID caused an overfitting of the data whereas RF showed better results and the use of the layer stacking improved the classification achieving a kappa coefficient of 18.26 %.

  13. A tree canopy height delineation method based on Morphological Reconstruction—Open Crown Decomposition

    NASA Astrophysics Data System (ADS)

    Liu, Q.; Jing, L.; Li, Y.; Tang, Y.; Li, H.; Lin, Q.

    2016-04-01

    For the purpose of forest management, high resolution LIDAR and optical remote sensing imageries are used for treetop detection, tree crown delineation, and classification. The purpose of this study is to develop a self-adjusted dominant scales calculation method and a new crown horizontal cutting method of tree canopy height model (CHM) to detect and delineate tree crowns from LIDAR, under the hypothesis that a treetop is radiometric or altitudinal maximum and tree crowns consist of multi-scale branches. The major concept of the method is to develop an automatic selecting strategy of feature scale on CHM, and a multi-scale morphological reconstruction-open crown decomposition (MRCD) to get morphological multi-scale features of CHM by: cutting CHM from treetop to the ground; analysing and refining the dominant multiple scales with differential horizontal profiles to get treetops; segmenting LiDAR CHM using watershed a segmentation approach marked with MRCD treetops. This method has solved the problems of false detection of CHM side-surface extracted by the traditional morphological opening canopy segment (MOCS) method. The novel MRCD delineates more accurate and quantitative multi-scale features of CHM, and enables more accurate detection and segmentation of treetops and crown. Besides, the MRCD method can also be extended to high optical remote sensing tree crown extraction. In an experiment on aerial LiDAR CHM of a forest of multi-scale tree crowns, the proposed method yielded high-quality tree crown maps.

  14. Impact of atmospheric correction and image filtering on hyperspectral classification of tree species using support vector machine

    NASA Astrophysics Data System (ADS)

    Shahriari Nia, Morteza; Wang, Daisy Zhe; Bohlman, Stephanie Ann; Gader, Paul; Graves, Sarah J.; Petrovic, Milenko

    2015-01-01

    Hyperspectral images can be used to identify savannah tree species at the landscape scale, which is a key step in measuring biomass and carbon, and tracking changes in species distributions, including invasive species, in these ecosystems. Before automated species mapping can be performed, image processing and atmospheric correction is often performed, which can potentially affect the performance of classification algorithms. We determine how three processing and correction techniques (atmospheric correction, Gaussian filters, and shade/green vegetation filters) affect the prediction accuracy of classification of tree species at pixel level from airborne visible/infrared imaging spectrometer imagery of longleaf pine savanna in Central Florida, United States. Species classification using fast line-of-sight atmospheric analysis of spectral hypercubes (FLAASH) atmospheric correction outperformed ATCOR in the majority of cases. Green vegetation (normalized difference vegetation index) and shade (near-infrared) filters did not increase classification accuracy when applied to large and continuous patches of specific species. Finally, applying a Gaussian filter reduces interband noise and increases species classification accuracy. Using the optimal preprocessing steps, our classification accuracy of six species classes is about 75%.

  15. Advanced Subspace Techniques for Modeling Channel and Session Variability in a Speaker Recognition System

    DTIC Science & Technology

    2012-03-01

    with each SVM discriminating between a pair of the N total speakers in the data set. The (( + 1))/2 classifiers then vote on the final...classification of a test sample. The Random Forest classifier is an ensemble classifier that votes amongst decision trees generated with each node using...Forest vote , and the effects of overtraining will be mitigated by the fact that each decision tree is overtrained differently (due to the random

  16. Learning classification trees

    NASA Technical Reports Server (NTRS)

    Buntine, Wray

    1991-01-01

    Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. How a tree learning algorithm can be derived from Bayesian decision theory is outlined. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule turns out to be similar to Quinlan's information gain splitting rule, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, Quinlan's C4 and Breiman et al. Cart show the full Bayesian algorithm is consistently as good, or more accurate than these other approaches though at a computational price.

  17. A risk-based predictive tool to prevent accidental introductions of nonindigenous marine species.

    PubMed

    Floerl, Oliver; Inglis, Graeme J; Hayden, Barbara J

    2005-06-01

    Preventing the introduction of nonindigenous species (NIS) is the most efficient way to avoid the costs and impacts of biological invasions. The transport of fouling species on ship hulls is an important vector for the introduction of marine NIS. We use quantitative risk screening techniques to develop a predictive tool of the abundance and variety of organisms being transported by ocean-going yachts. We developed and calibrated an ordinal rank scale of the abundance of fouling assemblages on the hulls of international yacht hulls arriving in New Zealand. Fouling ranks were allocated to 783 international yachts that arrived in New Zealand between 2002 and 2004. Classification tree analysis was used to identify relationships between the fouling ranks and predictor variables that described the maintenance and travel history of the yachts. The fouling ranks provided reliable indications of the actual abundance and variety of fouling assemblages on the yachts and identified most (60%) yachts that had fouling on their hulls. However, classification tree models explained comparatively little of the variation in the distribution of fouling ranks (22.1%), had high misclassification rates (approximately 43%), and low predictive power. In agreement with other studies, the best model selected the age of the toxic antifouling paint on yacht hulls as the principal risk factor for hull fouling. Our study shows that the transport probability of fouling organisms is the result of a complex suite of interacting factors and that large sample sizes will be needed for calibration of robust risk models.

  18. Automated construction of arterial and venous trees in retinal images.

    PubMed

    Hu, Qiao; Abràmoff, Michael D; Garvin, Mona K

    2015-10-01

    While many approaches exist to segment retinal vessels in fundus photographs, only a limited number focus on the construction and disambiguation of arterial and venous trees. Previous approaches are local and/or greedy in nature, making them susceptible to errors or limiting their applicability to large vessels. We propose a more global framework to generate arteriovenous trees in retinal images, given a vessel segmentation. In particular, our approach consists of three stages. The first stage is to generate an overconnected vessel network, named the vessel potential connectivity map (VPCM), consisting of vessel segments and the potential connectivity between them. The second stage is to disambiguate the VPCM into multiple anatomical trees, using a graph-based metaheuristic algorithm. The third stage is to classify these trees into arterial or venous (A/V) trees. We evaluated our approach with a ground truth built based on a public database, showing a pixel-wise classification accuracy of 88.15% using a manual vessel segmentation as input, and 86.11% using an automatic vessel segmentation as input.

  19. Land-cover classification in a moist tropical region of Brazil with Landsat TM imagery.

    PubMed

    Li, Guiying; Lu, Dengsheng; Moran, Emilio; Hetrick, Scott

    2011-01-01

    This research aims to improve land-cover classification accuracy in a moist tropical region in Brazil by examining the use of different remote sensing-derived variables and classification algorithms. Different scenarios based on Landsat Thematic Mapper (TM) spectral data and derived vegetation indices and textural images, and different classification algorithms - maximum likelihood classification (MLC), artificial neural network (ANN), classification tree analysis (CTA), and object-based classification (OBC), were explored. The results indicated that a combination of vegetation indices as extra bands into Landsat TM multispectral bands did not improve the overall classification performance, but the combination of textural images was valuable for improving vegetation classification accuracy. In particular, the combination of both vegetation indices and textural images into TM multispectral bands improved overall classification accuracy by 5.6% and kappa coefficient by 6.25%. Comparison of the different classification algorithms indicated that CTA and ANN have poor classification performance in this research, but OBC improved primary forest and pasture classification accuracies. This research indicates that use of textural images or use of OBC are especially valuable for improving the vegetation classes such as upland and liana forest classes having complex stand structures and having relatively large patch sizes.

  20. Land-cover classification in a moist tropical region of Brazil with Landsat TM imagery

    PubMed Central

    LI, GUIYING; LU, DENGSHENG; MORAN, EMILIO; HETRICK, SCOTT

    2011-01-01

    This research aims to improve land-cover classification accuracy in a moist tropical region in Brazil by examining the use of different remote sensing-derived variables and classification algorithms. Different scenarios based on Landsat Thematic Mapper (TM) spectral data and derived vegetation indices and textural images, and different classification algorithms – maximum likelihood classification (MLC), artificial neural network (ANN), classification tree analysis (CTA), and object-based classification (OBC), were explored. The results indicated that a combination of vegetation indices as extra bands into Landsat TM multispectral bands did not improve the overall classification performance, but the combination of textural images was valuable for improving vegetation classification accuracy. In particular, the combination of both vegetation indices and textural images into TM multispectral bands improved overall classification accuracy by 5.6% and kappa coefficient by 6.25%. Comparison of the different classification algorithms indicated that CTA and ANN have poor classification performance in this research, but OBC improved primary forest and pasture classification accuracies. This research indicates that use of textural images or use of OBC are especially valuable for improving the vegetation classes such as upland and liana forest classes having complex stand structures and having relatively large patch sizes. PMID:22368311

  1. Geographical and climatic limits of needle types of one- and two-needled pinyon pines

    USGS Publications Warehouse

    Cole, K.L.; Fisher, J.; Arundel, S.T.; Cannella, J.; Swift, S.

    2008-01-01

    Aim: The geographical extent and climatic tolerances of one- and two-needled pinyon pines (Pinus subsect. Cembroides) are the focus of questions in taxonomy, palaeoclimatology and modelling of future distributions. The identification of these pines, traditionally classified by one- versus two-needled fascicles, is complicated by populations with both one- and two-needled fascicles on the same tree, and the description of two more recently described one-needled varieties: the fallax-type and californiarum-type. Because previous studies have suggested correlations between needle anatomy and climate, including anatomical plasticity reflecting annual precipitation, we approached this study at the level of the anatomy of individual pine needles rather than species. Location: Western North America. Methods: We synthesized available and new data from field and herbarium collections of needles to compile maps of their current distributions across western North America. Annual frequencies of needle types were compared with local precipitation histories for some stands. Historical North American climates were modelled on a c. 1-km grid using monthly temperature and precipitation values. A geospatial model (ClimLim), which analyses the effect of climate-modulated physiological and ecosystem processes, was used to rank the importance of seasonal climate variables in limiting the distributions of anatomical needle types. Results: The pinyon needles were classified into four distinct types based upon the number of needles per fascicle, needle thickness and the number of stomatal rows and resin canals. The individual needles fit well into four categories of needle types, whereas some trees exhibit a mixture of two needle types. Trees from central Arizona containing a mixture of Pinus edulis and fallax-type needles increased their percentage of fallax-type needles following dry years. All four needle types occupy broader geographical regions with distinctive precipitation regimes. Pinus monophylla and californiarum-type needles occur in regions with high winter precipitation. Pinus edulis and fallax-type needles are found in regions with high monsoon precipitation. Areas supporting californiarum-type and fallax-type needle distributions are additionally characterized by a more extreme May-June drought. Main conclusions: These pinyon needle types seem to reflect the amount and seasonality of precipitation. The single needle fascicle characterizing the fallax type may be an adaptation to early summer or periodic drought, while the single needle of Pinus monophylla may be an adaptation to summer-autumn drought. Although the needles fit into four distinct categories, the parent trees are sometimes less easily classified, especially near their ancestral Pleistocene ranges in the Mojave and northern Sonoran deserts. The abundance of trees with both one- and two-needled fascicles in the zones between P. monophylla, P. edulis and fallax-type populations suggest that needle fascicle number is an unreliable characteristic for species classification. Disregarding needle fascicle number, the fallax-type needles are nearly identical to P. edulis, supporting Little's (1968) initial classification of these trees as P. edulis var. fallax, while the californiarum-type needles have a distinctive morphology supporting Bailey's (1987) classification of this tree as Pinus californiarum.

  2. [Analysis of the characteristics of the older adults with depression using data mining decision tree analysis].

    PubMed

    Park, Myonghwa; Choi, Sora; Shin, A Mi; Koo, Chul Hoi

    2013-02-01

    The purpose of this study was to develop a prediction model for the characteristics of older adults with depression using the decision tree method. A large dataset from the 2008 Korean Elderly Survey was used and data of 14,970 elderly people were analyzed. Target variable was depression and 53 input variables were general characteristics, family & social relationship, economic status, health status, health behavior, functional status, leisure & social activity, quality of life, and living environment. Data were analyzed by decision tree analysis, a data mining technique using SPSS Window 19.0 and Clementine 12.0 programs. The decision trees were classified into five different rules to define the characteristics of older adults with depression. Classification & Regression Tree (C&RT) showed the best prediction with an accuracy of 80.81% among data mining models. Factors in the rules were life satisfaction, nutritional status, daily activity difficulty due to pain, functional limitation for basic or instrumental daily activities, number of chronic diseases and daily activity difficulty due to disease. The different rules classified by the decision tree model in this study should contribute as baseline data for discovering informative knowledge and developing interventions tailored to these individual characteristics.

  3. Systematization method for distinguishing plastic groups by using NIR spectroscopy.

    PubMed

    Kaihara, Mikio; Satoh, Minami; Satoh, Minoru

    2007-07-01

    A systematic classification method for polymers is not yet available in case of using near infrared spectra (NIR). That is why we have been searching for a systematic method. Because raw NIR spectra usually have few obvious peaks, NIR spectra have been pretreated by 2nd derivation for taking well modulated spectra. After the pretreatment, we applied classification and regression trees (CART) to the discrimination between the spectra and the species of polymers. As a result, we obtained a relatively simple classification tree. Judging from the obtained splitting conditions and the classified polymers, we concluded that obtained knowledge on the chemical function groups estimated by the important wavelength regions is not always applicable to this classification tree. However, we clarified the splitting rules for polymer species from the NIR spectral point of view.

  4. Classification of Complete Proteomes of Different Organisms and Protein Sets Based on Their Protein Distributions in Terms of Some Key Attributes of Proteins

    PubMed Central

    Ma, Yue; Tuskan, Gerald A.

    2018-01-01

    The existence of complete genome sequences makes it important to develop different approaches for classification of large-scale data sets and to make extraction of biological insights easier. Here, we propose an approach for classification of complete proteomes/protein sets based on protein distributions on some basic attributes. We demonstrate the usefulness of this approach by determining protein distributions in terms of two attributes: protein lengths and protein intrinsic disorder contents (ID). The protein distributions based on L and ID are surveyed for representative proteome organisms and protein sets from the three domains of life. The two-dimensional maps (designated as fingerprints here) from the protein distribution densities in the LD space defined by ln(L) and ID are then constructed. The fingerprints for different organisms and protein sets are found to be distinct with each other, and they can therefore be used for comparative studies. As a test case, phylogenetic trees have been constructed based on the protein distribution densities in the fingerprints of proteomes of organisms without performing any protein sequence comparison and alignments. The phylogenetic trees generated are biologically meaningful, demonstrating that the protein distributions in the LD space may serve as unique phylogenetic signals of the organisms at the proteome level. PMID:29686995

  5. A multitemporal (1979-2009) land-use/land-cover dataset of the binational Santa Cruz Watershed

    USGS Publications Warehouse

    2011-01-01

    Trends derived from multitemporal land-cover data can be used to make informed land management decisions and to help managers model future change scenarios. We developed a multitemporal land-use/land-cover dataset for the binational Santa Cruz watershed of southern Arizona, United States, and northern Sonora, Mexico by creating a series of land-cover maps at decadal intervals (1979, 1989, 1999, and 2009) using Landsat Multispectral Scanner and Thematic Mapper data and a classification and regression tree classifier. The classification model exploited phenological changes of different land-cover spectral signatures through the use of biseasonal imagery collected during the (dry) early summer and (wet) late summer following rains from the North American monsoon. Landsat images were corrected to remove atmospheric influences, and the data were converted from raw digital numbers to surface reflectance values. The 14-class land-cover classification scheme is based on the 2001 National Land Cover Database with a focus on "Developed" land-use classes and riverine "Forest" and "Wetlands" cover classes required for specific watershed models. The classification procedure included the creation of several image-derived and topographic variables, including digital elevation model derivatives, image variance, and multitemporal Kauth-Thomas transformations. The accuracy of the land-cover maps was assessed using a random-stratified sampling design, reference aerial photography, and digital imagery. This showed high accuracy results, with kappa values (the statistical measure of agreement between map and reference data) ranging from 0.80 to 0.85.

  6. Evaluation of metabolites extraction strategies for identifying different brain regions and their relationship with alcohol preference and gender difference using NMR metabolomics.

    PubMed

    Wang, Jie; Zeng, Hao-Long; Du, Hongying; Liu, Zeyuan; Cheng, Ji; Liu, Taotao; Hu, Ting; Kamal, Ghulam Mustafa; Li, Xihai; Liu, Huili; Xu, Fuqiang

    2018-03-01

    Metabolomics generate a profile of small molecules from cellular/tissue metabolism, which could directly reflect the mechanisms of complex networks of biochemical reactions. Traditional metabolomics methods, such as OPLS-DA, PLS-DA are mainly used for binary class discrimination. Multiple groups are always involved in the biological system, especially for brain research. Multiple brain regions are involved in the neuronal study of brain metabolic dysfunctions such as alcoholism, Alzheimer's disease, etc. In the current study, 10 different brain regions were utilized for comparative studies between alcohol preferring and non-preferring rats, male and female rats respectively. As many classes are involved (ten different regions and four types of animals), traditional metabolomics methods are no longer efficient for showing differentiation. Here, a novel strategy based on the decision tree algorithm was employed for successfully constructing different classification models to screen out the major characteristics of ten brain regions at the same time. Subsequently, this method was also utilized to select the major effective brain regions related to alcohol preference and gender difference. Compared with the traditional multivariate statistical methods, the decision tree could construct acceptable and understandable classification models for multi-class data analysis. Therefore, the current technology could also be applied to other general metabolomics studies involving multi class data. Copyright © 2017 Elsevier B.V. All rights reserved.

  7. The Efficiency of Random Forest Method for Shoreline Extraction from LANDSAT-8 and GOKTURK-2 Imageries

    NASA Astrophysics Data System (ADS)

    Bayram, B.; Erdem, F.; Akpinar, B.; Ince, A. K.; Bozkurt, S.; Catal Reis, H.; Seker, D. Z.

    2017-11-01

    Coastal monitoring plays a vital role in environmental planning and hazard management related issues. Since shorelines are fundamental data for environment management, disaster management, coastal erosion studies, modelling of sediment transport and coastal morphodynamics, various techniques have been developed to extract shorelines. Random Forest is one of these techniques which is used in this study for shoreline extraction.. This algorithm is a machine learning method based on decision trees. Decision trees analyse classes of training data creates rules for classification. In this study, Terkos region has been chosen for the proposed method within the scope of "TUBITAK Project (Project No: 115Y718) titled "Integration of Unmanned Aerial Vehicles for Sustainable Coastal Zone Monitoring Model - Three-Dimensional Automatic Coastline Extraction and Analysis: Istanbul-Terkos Example". Random Forest algorithm has been implemented to extract the shoreline of the Black Sea where near the lake from LANDSAT-8 and GOKTURK-2 satellite imageries taken in 2015. The MATLAB environment was used for classification. To obtain land and water-body classes, the Random Forest method has been applied to NIR bands of LANDSAT-8 (5th band) and GOKTURK-2 (4th band) imageries. Each image has been digitized manually and shorelines obtained for accuracy assessment. According to accuracy assessment results, Random Forest method is efficient for both medium and high resolution images for shoreline extraction studies.

  8. Performance of Case-Based Reasoning Retrieval Using Classification Based on Associations versus Jcolibri and FreeCBR: A Further Validation Study

    NASA Astrophysics Data System (ADS)

    Aljuboori, Ahmed S.; Coenen, Frans; Nsaif, Mohammed; Parsons, David J.

    2018-05-01

    Case-Based Reasoning (CBR) plays a major role in expert system research. However, a critical problem can be met when a CBR system retrieves incorrect cases. Class Association Rules (CARs) have been utilized to offer a potential solution in a previous work. The aim of this paper was to perform further validation of Case-Based Reasoning using a Classification based on Association Rules (CBRAR) to enhance the performance of Similarity Based Retrieval (SBR). The CBRAR strategy uses a classed frequent pattern tree algorithm (FP-CAR) in order to disambiguate wrongly retrieved cases in CBR. The research reported in this paper makes contributions to both fields of CBR and Association Rules Mining (ARM) in that full target cases can be extracted from the FP-CAR algorithm without invoking P-trees and union operations. The dataset used in this paper provided more efficient results when the SBR retrieves unrelated answers. The accuracy of the proposed CBRAR system outperforms the results obtained by existing CBR tools such as Jcolibri and FreeCBR.

  9. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Singh, Kunwar P., E-mail: kpsingh_52@yahoo.com; Gupta, Shikha

    Ensemble learning approach based decision treeboost (DTB) and decision tree forest (DTF) models are introduced in order to establish quantitative structure–toxicity relationship (QSTR) for the prediction of toxicity of 1450 diverse chemicals. Eight non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals was evaluated using Tanimoto similarity index. Stochastic gradient boosting and bagging algorithms supplemented DTB and DTF models were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was drawn to prediction ability and robustness of the models, investigated both in external and 10-fold cross validation processes. In complete data,more » optimal DTB and DTF models rendered accuracies of 98.90%, 98.83% in two-category and 98.14%, 98.14% in four-category toxicity classifications. Both the models further yielded classification accuracies of 100% in external toxicity data of T. pyriformis. The constructed regression models (DTB and DTF) using five descriptors yielded correlation coefficients (R{sup 2}) of 0.945, 0.944 between the measured and predicted toxicities with mean squared errors (MSEs) of 0.059, and 0.064 in complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R{sup 2} and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest for wide applicability of the inter-species models in predicting toxicity of new chemicals for regulatory purposes. These approaches provide useful strategy and robust tools in the screening of ecotoxicological risk or environmental hazard potential of chemicals. - Graphical abstract: Importance of input variables in DTB and DTF classification models for (a) two-category, and (b) four-category toxicity intervals in T. pyriformis data. Generalization and predictive abilities of the constructed (c) DTB and (d) DTF regression models to predict the T. pyriformis toxicity of diverse chemicals. - Highlights: • Ensemble learning (EL) based models constructed for toxicity prediction of chemicals • Predictive models used a few simple non-quantum mechanical molecular descriptors. • EL-based DTB/DTF models successfully discriminated toxic and non-toxic chemicals. • DTB/DTF regression models precisely predicted toxicity of chemicals in multi-species. • Proposed EL based models can be used as tool to predict toxicity of new chemicals.« less

  10. Identification of phreatophytic groundwater dependent ecosystems using geospatial technologies

    NASA Astrophysics Data System (ADS)

    Perez Hoyos, Isabel Cristina

    The protection of groundwater dependent ecosystems (GDEs) is increasingly being recognized as an essential aspect for the sustainable management and allocation of water resources. Ecosystem services are crucial for human well-being and for a variety of flora and fauna. However, the conservation of GDEs is only possible if knowledge about their location and extent is available. Several studies have focused on the identification of GDEs at specific locations using ground-based measurements. However, recent progress in technologies such as remote sensing and their integration with geographic information systems (GIS) has provided alternative ways to map GDEs at much larger spatial extents. This study is concerned with the discovery of patterns in geospatial data sets using data mining techniques for mapping phreatophytic GDEs in the United States at 1 km spatial resolution. A methodology to identify the probability of an ecosystem to be groundwater dependent is developed. Probabilities are obtained by modeling the relationship between the known locations of GDEs and main factors influencing groundwater dependency, namely water table depth (WTD) and aridity index (AI). A methodology is proposed to predict WTD at 1 km spatial resolution using relevant geospatial data sets calibrated with WTD observations. An ensemble learning algorithm called random forest (RF) is used in order to model the distribution of groundwater in three study areas: Nevada, California, and Washington, as well as in the entire United States. RF regression performance is compared with a single regression tree (RT). The comparison is based on contrasting training error, true prediction error, and variable importance estimates of both methods. Additionally, remote sensing variables are omitted from the process of fitting the RF model to the data to evaluate the deterioration in the model performance when these variables are not used as an input. Research results suggest that although the prediction accuracy of a single RT is reduced in comparison with RFs, single trees can still be used to understand the interactions that might be taking place between predictor variables and the response variable. Regarding RF, there is a great potential in using the power of an ensemble of trees for prediction of WTD. The superior capability of RF to accurately map water table position in Nevada, California, and Washington demonstrate that this technique can be applied at scales larger than regional levels. It is also shown that the removal of remote sensing variables from the RF training process degrades the performance of the model. Using the predicted WTD, the probability of an ecosystem to be groundwater dependent (GDE probability) is estimated at 1 km spatial resolution. The modeling technique is evaluated in the state of Nevada, USA to develop a systematic approach for the identification of GDEs and it is then applied in the United States. The modeling approach selected for the development of the GDE probability map results from a comparison of the performance of classification trees (CT) and classification forests (CF). Predictive performance evaluation for the selection of the most accurate model is achieved using a threshold independent technique, and the prediction accuracy of both models is assessed in greater detail using threshold-dependent measures. The resulting GDE probability map can potentially be used for the definition of conservation areas since it can be translated into a binary classification map with two classes: GDE and NON-GDE. These maps are created by selecting a probability threshold. It is demonstrated that the choice of this threshold has dramatic effects on deterministic model performance measures.

  11. A cross-sectional study for predicting tail biting risk in pig farms using classification and regression tree analysis.

    PubMed

    Scollo, Annalisa; Gottardo, Flaviana; Contiero, Barbara; Edwards, Sandra A

    2017-10-01

    Tail biting in pigs has been an identified behavioural, welfare and economic problem for decades, and requires appropriate but sometimes difficult on-farm interventions. The aim of the paper is to introduce the Classification and Regression Tree (CRT) methodologies to develop a tool for prevention of acute tail biting lesions in pigs on-farm. A sample of 60 commercial farms rearing heavy pigs were involved; an on-farm visit and an interview with the farmer collected data on general management, herd health, disease prevention, climate control, feeding and production traits. Results suggest a value for the CRT analysis in managing the risk factors behind tail biting on a farm-specific level, showing 86.7% sensitivity for the Classification Tree and a correlation of 0.7 between observed and predicted prevalence of tail biting obtained with the Regression Tree. CRT analysis showed five main variables (stocking density, ammonia levels, number of pigs per stockman, type of floor and timeliness in feed supply) as critical predictors of acute tail biting lesions, which demonstrate different importance in different farms subgroups. The model might have reliable and practical applications for the support and implementation of tail biting prevention interventions, especially in case of subgroups of pigs with higher risk, helping farmers and veterinarians to assess the risk in their own farm and to manage their predisposing variables in order to reduce acute tail biting lesions. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Using Classification and Regression Trees (CART) and random forests to analyze attrition: Results from two simulations.

    PubMed

    Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J

    2015-12-01

    In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. (c) 2015 APA, all rights reserved).

  13. Using Classification and Regression Trees (CART) and Random Forests to Analyze Attrition: Results From Two Simulations

    PubMed Central

    Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J.

    2016-01-01

    In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. PMID:26389526

  14. Predicting U.S. Army Reserve Unit Manning Using Market Demographics

    DTIC Science & Technology

    2015-06-01

    develops linear regression , classification tree, and logistic regression models to determine the ability of the location to support manning requirements... logistic regression model delivers predictive results that allow decision-makers to identify locations with a high probability of meeting unit...manning requirements. The recommendation of this thesis is that the USAR implement the logistic regression model. 14. SUBJECT TERMS U.S

  15. Supervised DNA Barcodes species classification: analysis, comparisons and results

    PubMed Central

    2014-01-01

    Background Specific fragments, coming from short portions of DNA (e.g., mitochondrial, nuclear, and plastid sequences), have been defined as DNA Barcode and can be used as markers for organisms of the main life kingdoms. Species classification with DNA Barcode sequences has been proven effective on different organisms. Indeed, specific gene regions have been identified as Barcode: COI in animals, rbcL and matK in plants, and ITS in fungi. The classification problem assigns an unknown specimen to a known species by analyzing its Barcode. This task has to be supported with reliable methods and algorithms. Methods In this work the efficacy of supervised machine learning methods to classify species with DNA Barcode sequences is shown. The Weka software suite, which includes a collection of supervised classification methods, is adopted to address the task of DNA Barcode analysis. Classifier families are tested on synthetic and empirical datasets belonging to the animal, fungus, and plant kingdoms. In particular, the function-based method Support Vector Machines (SVM), the rule-based RIPPER, the decision tree C4.5, and the Naïve Bayes method are considered. Additionally, the classification results are compared with respect to ad-hoc and well-established DNA Barcode classification methods. Results A software that converts the DNA Barcode FASTA sequences to the Weka format is released, to adapt different input formats and to allow the execution of the classification procedure. The analysis of results on synthetic and real datasets shows that SVM and Naïve Bayes outperform on average the other considered classifiers, although they do not provide a human interpretable classification model. Rule-based methods have slightly inferior classification performances, but deliver the species specific positions and nucleotide assignments. On synthetic data the supervised machine learning methods obtain superior classification performances with respect to the traditional DNA Barcode classification methods. On empirical data their classification performances are at a comparable level to the other methods. Conclusions The classification analysis shows that supervised machine learning methods are promising candidates for handling with success the DNA Barcoding species classification problem, obtaining excellent performances. To conclude, a powerful tool to perform species identification is now available to the DNA Barcoding community. PMID:24721333

  16. Evaluation of Decision Trees for Cloud Detection from AVHRR Data

    NASA Technical Reports Server (NTRS)

    Shiffman, Smadar; Nemani, Ramakrishna

    2005-01-01

    Automated cloud detection and tracking is an important step in assessing changes in radiation budgets associated with global climate change via remote sensing. Data products based on satellite imagery are available to the scientific community for studying trends in the Earth's atmosphere. The data products include pixel-based cloud masks that assign cloud-cover classifications to pixels. Many cloud-mask algorithms have the form of decision trees. The decision trees employ sequential tests that scientists designed based on empirical astrophysics studies and simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In a previous study we compared automatically learned decision trees to cloud masks included in Advanced Very High Resolution Radiometer (AVHRR) data products from the year 2000. In this paper we report the replication of the study for five-year data, and for a gold standard based on surface observations performed by scientists at weather stations in the British Islands. For our sample data, the accuracy of automatically learned decision trees was greater than the accuracy of the cloud masks p < 0.001.

  17. Application of data mining techniques to explore predictors of upper urinary tract damage in patients with neurogenic bladder.

    PubMed

    Fang, H; Lu, B; Wang, X; Zheng, L; Sun, K; Cai, W

    2017-08-17

    This study proposed a decision tree model to screen upper urinary tract damage (UUTD) for patients with neurogenic bladder (NGB). Thirty-four NGB patients with UUTD were recruited in the case group, while 78 without UUTD were included in the control group. A decision tree method, classification and regression tree (CART), was then applied to develop the model in which UUTD was used as a dependent variable and history of urinary tract infections, bladder management, conservative treatment, and urodynamic findings were used as independent variables. The urethra function factor was found to be the primary screening information of patients and treated as the root node of the tree; Pabd max (maximum abdominal pressure, >14 cmH2O), Pves max (maximum intravesical pressure, ≤89 cmH2O), and gender (female) were also variables associated with UUTD. The accuracy of the proposed model was 84.8%, and the area under curve was 0.901 (95%CI=0.844-0.958), suggesting that the decision tree model might provide a new and convenient way to screen UUTD for NGB patients in both undeveloped and developing areas.

  18. A Method for Application of Classification Tree Models to Map Aquatic Vegetation Using Remotely Sensed Images from Different Sensors and Dates

    PubMed Central

    Jiang, Hao; Zhao, Dehua; Cai, Ying; An, Shuqing

    2012-01-01

    In previous attempts to identify aquatic vegetation from remotely-sensed images using classification trees (CT), the images used to apply CT models to different times or locations necessarily originated from the same satellite sensor as that from which the original images used in model development came, greatly limiting the application of CT. We have developed an effective normalization method to improve the robustness of CT models when applied to images originating from different sensors and dates. A total of 965 ground-truth samples of aquatic vegetation types were obtained in 2009 and 2010 in Taihu Lake, China. Using relevant spectral indices (SI) as classifiers, we manually developed a stable CT model structure and then applied a standard CT algorithm to obtain quantitative (optimal) thresholds from 2009 ground-truth data and images from Landsat7-ETM+, HJ-1B-CCD, Landsat5-TM and ALOS-AVNIR-2 sensors. Optimal CT thresholds produced average classification accuracies of 78.1%, 84.7% and 74.0% for emergent vegetation, floating-leaf vegetation and submerged vegetation, respectively. However, the optimal CT thresholds for different sensor images differed from each other, with an average relative variation (RV) of 6.40%. We developed and evaluated three new approaches to normalizing the images. The best-performing method (Method of 0.1% index scaling) normalized the SI images using tailored percentages of extreme pixel values. Using the images normalized by Method of 0.1% index scaling, CT models for a particular sensor in which thresholds were replaced by those from the models developed for images originating from other sensors provided average classification accuracies of 76.0%, 82.8% and 68.9% for emergent vegetation, floating-leaf vegetation and submerged vegetation, respectively. Applying the CT models developed for normalized 2009 images to 2010 images resulted in high classification (78.0%–93.3%) and overall (92.0%–93.1%) accuracies. Our results suggest that Method of 0.1% index scaling provides a feasible way to apply CT models directly to images from sensors or time periods that differ from those of the images used to develop the original models.

  19. Classification of driving workload affected by highway alignment conditions based on classification and regression tree algorithm.

    PubMed

    Hu, Jiangbi; Wang, Ronghua

    2018-02-17

    Guaranteeing a safe and comfortable driving workload can contribute to reducing traffic injuries. In order to provide safe and comfortable threshold values, this study attempted to classify driving workload from the aspects of human factors mainly affected by highway geometric conditions and to determine the thresholds of different workload classifications. This article stated a hypothesis that the values of driver workload change within a certain range. Driving workload scales were stated based on a comprehensive literature review. Through comparative analysis of different psychophysiological measures, heart rate variability (HRV) was chosen as the representative measure for quantifying driving workload by field experiments. Seventy-two participants (36 car drivers and 36 large truck drivers) and 6 highways with different geometric designs were selected to conduct field experiments. A wearable wireless dynamic multiparameter physiological detector (KF-2) was employed to detect physiological data that were simultaneously correlated to the speed changes recorded by a Global Positioning System (GPS) (testing time, driving speeds, running track, and distance). Through performing statistical analyses, including the distribution of HRV during the flat, straight segments and P-P plots of modified HRV, a driving workload calculation model was proposed. Integrating driving workload scales with values, the threshold of each scale of driving workload was determined by classification and regression tree (CART) algorithms. The driving workload calculation model was suitable for driving speeds in the range of 40 to 120 km/h. The experimental data of 72 participants revealed that driving workload had a significant effect on modified HRV, revealing a change in driving speed. When the driving speed was between 100 and 120 km/h, drivers showed an apparent increase in the corresponding modified HRV. The threshold value of the normal driving workload K was between -0.0011 and 0.056 for a car driver and between -0.00086 and 0.067 for a truck driver. Heart rate variability was a direct and effective index for measuring driving workload despite being affected by multiple highway alignment elements. The driving workload model and the thresholds of driving workload classifications can be used to evaluate the quality of highway geometric design. A higher quality of highway geometric design could keep driving workload within a safer and more comfortable range. This study provided insight into reducing traffic injuries from the perspective of disciplinary integration of highway engineering and human factor engineering.

  20. [Predicting very early rebleeding after acute variceal bleeding based in classification and regression tree analysis (CRTA).].

    PubMed

    Altamirano, J; Augustin, S; Muntaner, L; Zapata, L; González-Angulo, A; Martínez, B; Flores-Arroyo, A; Camargo, L; Genescá, J

    2010-01-01

    Variceal bleeding (VB) is the main cause of death among cirrhotic patients. About 30-50% of early rebleeding is encountered few days after the acute episode of VB. It is necessary to stratify patients with high risk of very early rebleeding (VER) for more aggressive therapies. However, there are few and incompletely understood prognostic models for this purpose. To determine the risk factors associated with VER after an acute VB. Assessment and comparison of a novel prognostic model generated by Classification and Regression Tree Analysis (CART) with classic-used models (MELD and Child-Pugh [CP]). Sixty consecutive cirrhotic patients with acute variceal bleeding. CART analysis, MELD and Child-Pugh scores were performed at admission. Receiver operating characteristic (ROC) curves were constructed to evaluate the predictive performance of the models. Very early rebleeding rate was 13%. Variables associated with VER were: serum albumin (p = 0.027), creatinine (p = 0.021) and transfused blood units in the first 24 hrs (p = 0.05). The area under the ROC for MELD, CHILD-Pugh and CART were 0.46, 0.50 and 0.82, respectively. The value of cut analyzed by CART for the significant variables were: 1) Albumin 2.85 mg/dL, 2) Packed red cells 2 units and 3) Creatinine 1.65 mg/dL the ABC-ROC. Serum albumin, creatinine and number of transfused blood units were associated with VER. A simple CART algorithm combining these variables allows an accurate predictive assessment of VER after acute variceal bleeding. Key words: cirrhosis, variceal bleeding, esophageal varices, prognosis, portal hypertension.

  1. Development of an automated ultrasonic testing system

    NASA Astrophysics Data System (ADS)

    Shuxiang, Jiao; Wong, Brian Stephen

    2005-04-01

    Non-Destructive Testing is necessary in areas where defects in structures emerge over time due to wear and tear and structural integrity is necessary to maintain its usability. However, manual testing results in many limitations: high training cost, long training procedure, and worse, the inconsistent test results. A prime objective of this project is to develop an automatic Non-Destructive testing system for a shaft of the wheel axle of a railway carriage. Various methods, such as the neural network, pattern recognition methods and knowledge-based system are used for the artificial intelligence problem. In this paper, a statistical pattern recognition approach, Classification Tree is applied. Before feature selection, a thorough study on the ultrasonic signals produced was carried out. Based on the analysis of the ultrasonic signals, three signal processing methods were developed to enhance the ultrasonic signals: Cross-Correlation, Zero-Phase filter and Averaging. The target of this step is to reduce the noise and make the signal character more distinguishable. Four features: 1. The Auto Regressive Model Coefficients. 2. Standard Deviation. 3. Pearson Correlation 4. Dispersion Uniformity Degree are selected. And then a Classification Tree is created and applied to recognize the peak positions and amplitudes. Searching local maximum is carried out before feature computing. This procedure reduces much computation time in the real-time testing. Based on this algorithm, a software package called SOFRA was developed to recognize the peaks, calibrate automatically and test a simulated shaft automatically. The automatic calibration procedure and the automatic shaft testing procedure are developed.

  2. Putting Priors in Mixture Density Mercer Kernels

    NASA Technical Reports Server (NTRS)

    Srivastava, Ashok N.; Schumann, Johann; Fischer, Bernd

    2004-01-01

    This paper presents a new methodology for automatic knowledge driven data mining based on the theory of Mercer Kernels, which are highly nonlinear symmetric positive definite mappings from the original image space to a very high, possibly infinite dimensional feature space. We describe a new method called Mixture Density Mercer Kernels to learn kernel function directly from data, rather than using predefined kernels. These data adaptive kernels can en- code prior knowledge in the kernel using a Bayesian formulation, thus allowing for physical information to be encoded in the model. We compare the results with existing algorithms on data from the Sloan Digital Sky Survey (SDSS). The code for these experiments has been generated with the AUTOBAYES tool, which automatically generates efficient and documented C/C++ code from abstract statistical model specifications. The core of the system is a schema library which contains template for learning and knowledge discovery algorithms like different versions of EM, or numeric optimization methods like conjugate gradient methods. The template instantiation is supported by symbolic- algebraic computations, which allows AUTOBAYES to find closed-form solutions and, where possible, to integrate them into the code. The results show that the Mixture Density Mercer-Kernel described here outperforms tree-based classification in distinguishing high-redshift galaxies from low- redshift galaxies by approximately 16% on test data, bagged trees by approximately 7%, and bagged trees built on a much larger sample of data by approximately 2%.

  3. Automatic Classification of Trees from Laser Scanning Point Clouds

    NASA Astrophysics Data System (ADS)

    Sirmacek, B.; Lindenbergh, R.

    2015-08-01

    Development of laser scanning technologies has promoted tree monitoring studies to a new level, as the laser scanning point clouds enable accurate 3D measurements in a fast and environmental friendly manner. In this paper, we introduce a probability matrix computation based algorithm for automatically classifying laser scanning point clouds into 'tree' and 'non-tree' classes. Our method uses the 3D coordinates of the laser scanning points as input and generates a new point cloud which holds a label for each point indicating if it belongs to the 'tree' or 'non-tree' class. To do so, a grid surface is assigned to the lowest height level of the point cloud. The grids are filled with probability values which are calculated by checking the point density above the grid. Since the tree trunk locations appear with very high values in the probability matrix, selecting the local maxima of the grid surface help to detect the tree trunks. Further points are assigned to tree trunks if they appear in the close proximity of trunks. Since heavy mathematical computations (such as point cloud organization, detailed shape 3D detection methods, graph network generation) are not required, the proposed algorithm works very fast compared to the existing methods. The tree classification results are found reliable even on point clouds of cities containing many different objects. As the most significant weakness, false detection of light poles, traffic signs and other objects close to trees cannot be prevented. Nevertheless, the experimental results on mobile and airborne laser scanning point clouds indicate the possible usage of the algorithm as an important step for tree growth observation, tree counting and similar applications. While the laser scanning point cloud is giving opportunity to classify even very small trees, accuracy of the results is reduced in the low point density areas further away than the scanning location. These advantages and disadvantages of two laser scanning point cloud sources are discussed in detail.

  4. Mapping Urban Tree Canopy Cover Using Fused Airborne LIDAR and Satellite Imagery Data

    NASA Astrophysics Data System (ADS)

    Parmehr, Ebadat G.; Amati, Marco; Fraser, Clive S.

    2016-06-01

    Urban green spaces, particularly urban trees, play a key role in enhancing the liveability of cities. The availability of accurate and up-to-date maps of tree canopy cover is important for sustainable development of urban green spaces. LiDAR point clouds are widely used for the mapping of buildings and trees, and several LiDAR point cloud classification techniques have been proposed for automatic mapping. However, the effectiveness of point cloud classification techniques for automated tree extraction from LiDAR data can be impacted to the point of failure by the complexity of tree canopy shapes in urban areas. Multispectral imagery, which provides complementary information to LiDAR data, can improve point cloud classification quality. This paper proposes a reliable method for the extraction of tree canopy cover from fused LiDAR point cloud and multispectral satellite imagery data. The proposed method initially associates each LiDAR point with spectral information from the co-registered satellite imagery data. It calculates the normalised difference vegetation index (NDVI) value for each LiDAR point and corrects tree points which have been misclassified as buildings. Then, region growing of tree points, taking the NDVI value into account, is applied. Finally, the LiDAR points classified as tree points are utilised to generate a canopy cover map. The performance of the proposed tree canopy cover mapping method is experimentally evaluated on a data set of airborne LiDAR and WorldView 2 imagery covering a suburb in Melbourne, Australia.

  5. Determination of the ecological connectivity between landscape patches obtained using the knowledge engineer (expert) classification technique

    NASA Astrophysics Data System (ADS)

    Selim, Serdar; Sonmez, Namik Kemal; Onur, Isin; Coslu, Mesut

    2017-10-01

    Connection of similar landscape patches with ecological corridors supports habitat quality of these patches, increases urban ecological quality, and constitutes an important living and expansion area for wild life. Furthermore, habitat connectivity provided by urban green areas is supporting biodiversity in urban areas. In this study, possible ecological connections between landscape patches, which were achieved by using Expert classification technique and modeled with probabilistic connection index. Firstly, the reflection responses of plants to various bands are used as data in hypotheses. One of the important features of this method is being able to use more than one image at the same time in the formation of the hypothesis. For this reason, before starting the application of the Expert classification, the base images are prepared. In addition to the main image, the hypothesis conditions were also created for each class with the NDVI image which is commonly used in the vegetation researches. Besides, the results of the previously conducted supervised classification were taken into account. We applied this classification method by using the raster imagery with user-defined variables. Hereupon, to provide ecological connections of the tree cover which was achieved from the classification, we used Probabilistic Connection (PC) index. The probabilistic connection model which is used for landscape planning and conservation studies via detecting and prioritization critical areas for ecological connection characterizes the possibility of direct connection between habitats. As a result we obtained over % 90 total accuracy in accuracy assessment analysis. We provided ecological connections with PC index and we created inter-connected green spaces system. Thus, we offered and implicated green infrastructure system model takes place in the agenda of recent years.

  6. Application of decision tree for prediction of cutaneous leishmaniasis incidence based on environmental and topographic factors in Isfahan Province, Iran.

    PubMed

    Ramezankhani, Roghieh; Sajjadi, Nooshin; Nezakati Esmaeilzadeh, Roya; Jozi, Seyed Ali; Shirzadi, Mohammad Reza

    2018-05-08

    Cutaneous Leishmaniasis (CL) is a neglected tropical disease that continues to be a health problem in Iran. Nearly 350 million people are thought to be at risk. We investigated the impact of the environmental factors on CL incidence during the period 2007- 2015 in a known endemic area for this disease in Isfahan Province, Iran. After collecting data with regard to the climatic, topographic, vegetation coverage and CL cases in the study area, a decision tree model was built using the classification and regression tree algorithm. CL data for the years 2007 until 2012 were used for model construction and the data for the years 2013 until 2015 were used for testing the model. The Root Mean Square error and the correlation factor were used to evaluate the predictive performance of the decision tree model. We found that wind speeds less than 14 m/s, altitudes between 1234 and 1810 m above the mean sea level, vegetation coverage according to the normalized difference vegetation index (NDVI) less than 0.12, rainfall less than 1.6 mm and air temperatures higher than 30°C would correspond to a seasonal incidence of 163.28 per 100,000 persons, while if wind speed is less than 14 m/s, altitude less than 1,810 m and NDVI higher than 0.12, then the mean seasonal incidence of the disease would be 2.27 per 100,000 persons. Environmental factors were found to be important predictive variables for CL incidence and should be considered in surveillance and prevention programmes for CL control.

  7. Vector quantizer designs for joint compression and terrain categorization of multispectral imagery

    NASA Technical Reports Server (NTRS)

    Gorman, John D.; Lyons, Daniel F.

    1994-01-01

    Two vector quantizer designs for compression of multispectral imagery and their impact on terrain categorization performance are evaluated. The mean-squared error (MSE) and classification performance of the two quantizers are compared, and it is shown that a simple two-stage design minimizing MSE subject to a constraint on classification performance has a significantly better classification performance than a standard MSE-based tree-structured vector quantizer followed by maximum likelihood classification. This improvement in classification performance is obtained with minimal loss in MSE performance. The results show that it is advantageous to tailor compression algorithm designs to the required data exploitation tasks. Applications of joint compression/classification include compression for the archival or transmission of Landsat imagery that is later used for land utility surveys and/or radiometric analysis.

  8. Decision tree analysis in subarachnoid hemorrhage: prediction of outcome parameters during the course of aneurysmal subarachnoid hemorrhage using decision tree analysis.

    PubMed

    Hostettler, Isabel Charlotte; Muroi, Carl; Richter, Johannes Konstantin; Schmid, Josef; Neidert, Marian Christoph; Seule, Martin; Boss, Oliver; Pangalu, Athina; Germans, Menno Robbert; Keller, Emanuela

    2018-01-19

    OBJECTIVE The aim of this study was to create prediction models for outcome parameters by decision tree analysis based on clinical and laboratory data in patients with aneurysmal subarachnoid hemorrhage (aSAH). METHODS The database consisted of clinical and laboratory parameters of 548 patients with aSAH who were admitted to the Neurocritical Care Unit, University Hospital Zurich. To examine the model performance, the cohort was randomly divided into a derivation cohort (60% [n = 329]; training data set) and a validation cohort (40% [n = 219]; test data set). The classification and regression tree prediction algorithm was applied to predict death, functional outcome, and ventriculoperitoneal (VP) shunt dependency. Chi-square automatic interaction detection was applied to predict delayed cerebral infarction on days 1, 3, and 7. RESULTS The overall mortality was 18.4%. The accuracy of the decision tree models was good for survival on day 1 and favorable functional outcome at all time points, with a difference between the training and test data sets of < 5%. Prediction accuracy for survival on day 1 was 75.2%. The most important differentiating factor was the interleukin-6 (IL-6) level on day 1. Favorable functional outcome, defined as Glasgow Outcome Scale scores of 4 and 5, was observed in 68.6% of patients. Favorable functional outcome at all time points had a prediction accuracy of 71.1% in the training data set, with procalcitonin on day 1 being the most important differentiating factor at all time points. A total of 148 patients (27%) developed VP shunt dependency. The most important differentiating factor was hyperglycemia on admission. CONCLUSIONS The multiple variable analysis capability of decision trees enables exploration of dependent variables in the context of multiple changing influences over the course of an illness. The decision tree currently generated increases awareness of the early systemic stress response, which is seemingly pertinent for prognostication.

  9. An automatic graph-based approach for artery/vein classification in retinal images.

    PubMed

    Dashtbozorg, Behdad; Mendonça, Ana Maria; Campilho, Aurélio

    2014-03-01

    The classification of retinal vessels into artery/vein (A/V) is an important phase for automating the detection of vascular changes, and for the calculation of characteristic signs associated with several systemic diseases such as diabetes, hypertension, and other cardiovascular conditions. This paper presents an automatic approach for A/V classification based on the analysis of a graph extracted from the retinal vasculature. The proposed method classifies the entire vascular tree deciding on the type of each intersection point (graph nodes) and assigning one of two labels to each vessel segment (graph links). Final classification of a vessel segment as A/V is performed through the combination of the graph-based labeling results with a set of intensity features. The results of this proposed method are compared with manual labeling for three public databases. Accuracy values of 88.3%, 87.4%, and 89.8% are obtained for the images of the INSPIRE-AVR, DRIVE, and VICAVR databases, respectively. These results demonstrate that our method outperforms recent approaches for A/V classification.

  10. Comparing ecoregional classifications for natural areas management in the Klamath Region, USA

    USGS Publications Warehouse

    Sarr, Daniel A.; Duff, Andrew; Dinger, Eric C.; Shafer, Sarah L.; Wing, Michael; Seavy, Nathaniel E.; Alexander, John D.

    2015-01-01

    We compared three existing ecoregional classification schemes (Bailey, Omernik, and World Wildlife Fund) with two derived schemes (Omernik Revised and Climate Zones) to explore their effectiveness in explaining species distributions and to better understand natural resource geography in the Klamath Region, USA. We analyzed presence/absence data derived from digital distribution maps for trees, amphibians, large mammals, small mammals, migrant birds, and resident birds using three statistical analyses of classification accuracy (Analysis of Similarity, Canonical Analysis of Principal Coordinates, and Classification Strength). The classifications were roughly comparable in classification accuracy, with Omernik Revised showing the best overall performance. Trees showed the strongest fidelity to the classifications, and large mammals showed the weakest fidelity. We discuss the implications for regional biogeography and describe how intermediate resolution ecoregional classifications may be appropriate for use as natural areas management domains.

  11. Perceived Organizational Support for Enhancing Welfare at Work: A Regression Tree Model

    PubMed Central

    Giorgi, Gabriele; Dubin, David; Perez, Javier Fiz

    2016-01-01

    When trying to examine outcomes such as welfare and well-being, research tends to focus on main effects and take into account limited numbers of variables at a time. There are a number of techniques that may help address this problem. For example, many statistical packages available in R provide easy-to-use methods of modeling complicated analysis such as classification and tree regression (i.e., recursive partitioning). The present research illustrates the value of recursive partitioning in the prediction of perceived organizational support in a sample of more than 6000 Italian bankers. Utilizing the tree function party package in R, we estimated a regression tree model predicting perceived organizational support from a multitude of job characteristics including job demand, lack of job control, lack of supervisor support, training, etc. The resulting model appears particularly helpful in pointing out several interactions in the prediction of perceived organizational support. In particular, training is the dominant factor. Another dimension that seems to influence organizational support is reporting (perceived communication about safety and stress concerns). Results are discussed from a theoretical and methodological point of view. PMID:28082924

  12. Remote Sensing Image Classification Applied to the First National Geographical Information Census of China

    NASA Astrophysics Data System (ADS)

    Yu, Xin; Wen, Zongyong; Zhu, Zhaorong; Xia, Qiang; Shun, Lan

    2016-06-01

    Image classification will still be a long way in the future, although it has gone almost half a century. In fact, researchers have gained many fruits in the image classification domain, but there is still a long distance between theory and practice. However, some new methods in the artificial intelligence domain will be absorbed into the image classification domain and draw on the strength of each to offset the weakness of the other, which will open up a new prospect. Usually, networks play the role of a high-level language, as is seen in Artificial Intelligence and statistics, because networks are used to build complex model from simple components. These years, Bayesian Networks, one of probabilistic networks, are a powerful data mining technique for handling uncertainty in complex domains. In this paper, we apply Tree Augmented Naive Bayesian Networks (TAN) to texture classification of High-resolution remote sensing images and put up a new method to construct the network topology structure in terms of training accuracy based on the training samples. Since 2013, China government has started the first national geographical information census project, which mainly interprets geographical information based on high-resolution remote sensing images. Therefore, this paper tries to apply Bayesian network to remote sensing image classification, in order to improve image interpretation in the first national geographical information census project. In the experiment, we choose some remote sensing images in Beijing. Experimental results demonstrate TAN outperform than Naive Bayesian Classifier (NBC) and Maximum Likelihood Classification Method (MLC) in the overall classification accuracy. In addition, the proposed method can reduce the workload of field workers and improve the work efficiency. Although it is time consuming, it will be an attractive and effective method for assisting office operation of image interpretation.

  13. Classification and Compression of Multi-Resolution Vectors: A Tree Structured Vector Quantizer Approach

    DTIC Science & Technology

    2002-01-01

    their expression profile and for classification of cells into tumerous and non- tumerous classes. Then we will present a parallel tree method for... cancerous cells. We will use the same dataset and use tree structured classifiers with multi-resolution analysis for classifying cancerous from non- cancerous ...cells. We have the expressions of 4096 genes from 98 different cell types. Of these 98, 72 are cancerous while 26 are non- cancerous . We are interested

  14. A digital spatial predictive model of land-use change using economic and environmental inputs and a statistical tree classification approach: Thailand, 1970s--1990s

    NASA Astrophysics Data System (ADS)

    Felkner, John Sames

    The scale and extent of global land use change is massive, and has potentially powerful effects on the global climate and global atmospheric composition (Turner & Meyer, 1994). Because of this tremendous change and impact, there is an urgent need for quantitative, empirical models of land use change, especially predictive models with an ability to capture the trajectories of change (Agarwal, Green, Grove, Evans, & Schweik, 2000; Lambin et al., 1999). For this research, a spatial statistical predictive model of land use change was created and run in two provinces of Thailand. The model utilized an extensive spatial database, and used a classification tree approach for explanatory model creation and future land use (Breiman, Friedman, Olshen, & Stone, 1984). Eight input variables were used, and the trees were run on a dependent variable of land use change measured from 1979 to 1989 using classified satellite imagery. The derived tree models were used to create probability of change surfaces, and these were then used to create predicted land cover maps for 1999. These predicted 1999 maps were compared with actual 1999 landcover derived from 1999 Landsat 7 imagery. The primary research hypothesis was that an explanatory model using both economic and environmental input variables would better predict future land use change than would either a model using only economic variables or a model using only environmental. Thus, the eight input variables included four economic and four environmental variables. The results indicated a very slight superiority of the full models to predict future agricultural change and future deforestation, but a slight superiority of the economic models to predict future built change. However, the margins of superiority were too small to be statistically significant. The resulting tree structures were used, however, to derive a series of principles or "rules" governing land use change in both provinces. The model was able to predict future land use, given a series of assumptions, with 90 percent overall accuracies. The model can be used in other developing or developed country locations for future land use prediction, determination of future threatened areas, or to derive "rules" or principles driving land use change.

  15. Mapping of taiga forest units using AIRSAR data and/or optical data, and retrieval of forest parameters

    NASA Technical Reports Server (NTRS)

    Rignot, Eric; Williams, Cynthia; Way, Jobea; Viereck, Leslie

    1993-01-01

    A maximum a posteriori Bayesian classifier for multifrequency polarimetric SAR data is used to perform a supervised classification of forest types in the floodplains of Alaska. The image classes include white spruce, balsam poplar, black spruce, alder, non-forests, and open water. The authors investigate the effect on classification accuracy of changing environmental conditions, and of frequency and polarization of the signal. The highest classification accuracy (86 percent correctly classified forest pixels, and 91 percent overall) is obtained combining L- and C-band frequencies fully polarimetric on a date where the forest is just recovering from flooding. The forest map compares favorably with a vegetation map assembled from digitized aerial photos which took five years for completion, and address the state of the forest in 1978, ignoring subsequent fires, changes in the course of the river, clear-cutting of trees, and tree growth. HV-polarization is the most useful polarization at L- and C-band for classification. C-band VV (ERS-1 mode) and L-band HH (J-ERS-1 mode) alone or combined yield unsatisfactory classification accuracies. Additional data acquired in the winter season during thawed and frozen days yield classification accuracies respectively 20 percent and 30 percent lower due to a greater confusion between conifers and deciduous trees. Data acquired at the peak of flooding in May 1991 also yield classification accuracies 10 percent lower because of dominant trunk-ground interactions which mask out finer differences in radar backscatter between tree species. Combination of several of these dates does not improve classification accuracy. For comparison, panchromatic optical data acquired by SPOT in the summer season of 1991 are used to classify the same area. The classification accuracy (78 percent for the forest types and 90 percent if open water is included) is lower than that obtained with AIRSAR although conifers and deciduous trees are better separated due to the presence of leaves on the deciduous trees. Optical data do not separate black spruce and white spruce as well as SAR data, cannot separate alder from balsam poplar, and are of course limited by the frequent cloud cover in the polar regions. Yet, combining SPOT and AIRSAR offers better chances to identify vegetation types independent of ground truth information using a combination of NDVI indexes from SPOT, biomass numbers from AIRSAR, and a segmentation map from either one.

  16. Object-Based Point Cloud Analysis of Full-Waveform Airborne Laser Scanning Data for Urban Vegetation Classification

    PubMed Central

    Rutzinger, Martin; Höfle, Bernhard; Hollaus, Markus; Pfeifer, Norbert

    2008-01-01

    Airborne laser scanning (ALS) is a remote sensing technique well-suited for 3D vegetation mapping and structure characterization because the emitted laser pulses are able to penetrate small gaps in the vegetation canopy. The backscattered echoes from the foliage, woody vegetation, the terrain, and other objects are detected, leading to a cloud of points. Higher echo densities (>20 echoes/m2) and additional classification variables from full-waveform (FWF) ALS data, namely echo amplitude, echo width and information on multiple echoes from one shot, offer new possibilities in classifying the ALS point cloud. Currently FWF sensor information is hardly used for classification purposes. This contribution presents an object-based point cloud analysis (OBPA) approach, combining segmentation and classification of the 3D FWF ALS points designed to detect tall vegetation in urban environments. The definition tall vegetation includes trees and shrubs, but excludes grassland and herbage. In the applied procedure FWF ALS echoes are segmented by a seeded region growing procedure. All echoes sorted descending by their surface roughness are used as seed points. Segments are grown based on echo width homogeneity. Next, segment statistics (mean, standard deviation, and coefficient of variation) are calculated by aggregating echo features such as amplitude and surface roughness. For classification a rule base is derived automatically from a training area using a statistical classification tree. To demonstrate our method we present data of three sites with around 500,000 echoes each. The accuracy of the classified vegetation segments is evaluated for two independent validation sites. In a point-wise error assessment, where the classification is compared with manually classified 3D points, completeness and correctness better than 90% are reached for the validation sites. In comparison to many other algorithms the proposed 3D point classification works on the original measurements directly, i.e. the acquired points. Gridding of the data is not necessary, a process which is inherently coupled to loss of data and precision. The 3D properties provide especially a good separability of buildings and terrain points respectively, if they are occluded by vegetation. PMID:27873771

  17. Comparison of Sub-Pixel Classification Approaches for Crop-Specific Mapping

    EPA Science Inventory

    This paper examined two non-linear models, Multilayer Perceptron (MLP) regression and Regression Tree (RT), for estimating sub-pixel crop proportions using time-series MODIS-NDVI data. The sub-pixel proportions were estimated for three major crop types including corn, soybean, a...

  18. Identification of Chinese medicine syndromes in persistent insomnia associated with major depressive disorder: a latent tree analysis.

    PubMed

    Yeung, Wing-Fai; Chung, Ka-Fai; Zhang, Nevin Lian-Wen; Zhang, Shi Ping; Yung, Kam-Ping; Chen, Pei-Xian; Ho, Yan-Yee

    2016-01-01

    Chinese medicine (CM) syndrome (zheng) differentiation is based on the co-occurrence of CM manifestation profiles, such as signs and symptoms, and pulse and tongue features. Insomnia is a symptom that frequently occurs in major depressive disorder despite adequate antidepressant treatment. This study aims to identify co-occurrence patterns in participants with persistent insomnia and major depressive disorder from clinical feature data using latent tree analysis, and to compare the latent variables with relevant CM syndromes. One hundred and forty-two participants with persistent insomnia and a history of major depressive disorder completed a standardized checklist (the Chinese Medicine Insomnia Symptom Checklist) specially developed for CM syndrome classification of insomnia. The checklist covers symptoms and signs, including tongue and pulse features. The clinical features assessed by the checklist were analyzed using Lantern software. CM practitioners with relevant experience compared the clinical feature variables under each latent variable with reference to relevant CM syndromes, based on a previous review of CM syndromes. The symptom data were analyzed to build the latent tree model and the model with the highest Bayes information criterion score was regarded as the best model. This model contained 18 latent variables, each of which divided participants into two clusters. Six clusters represented more than 50 % of the sample. The clinical feature co-occurrence patterns of these six clusters were interpreted as the CM syndromes Liver qi stagnation transforming into fire, Liver fire flaming upward, Stomach disharmony, Hyperactivity of fire due to yin deficiency, Heart-kidney noninteraction, and Qi deficiency of the heart and gallbladder. The clinical feature variables that contributed significant cumulative information coverage (at least 95 %) were identified. Latent tree model analysis on a sample of depressed participants with insomnia revealed 13 clinical feature co-occurrence patterns, four mutual-exclusion patterns, and one pattern with a single clinical feature variable.

  19. An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea.

    PubMed

    McDonald, Daniel; Price, Morgan N; Goodrich, Julia; Nawrocki, Eric P; DeSantis, Todd Z; Probst, Alexander; Andersen, Gary L; Knight, Rob; Hugenholtz, Philip

    2012-03-01

    Reference phylogenies are crucial for providing a taxonomic framework for interpretation of marker gene and metagenomic surveys, which continue to reveal novel species at a remarkable rate. Greengenes is a dedicated full-length 16S rRNA gene database that provides users with a curated taxonomy based on de novo tree inference. We developed a 'taxonomy to tree' approach for transferring group names from an existing taxonomy to a tree topology, and used it to apply the Greengenes, National Center for Biotechnology Information (NCBI) and cyanoDB (Cyanobacteria only) taxonomies to a de novo tree comprising 408,315 sequences. We also incorporated explicit rank information provided by the NCBI taxonomy to group names (by prefixing rank designations) for better user orientation and classification consistency. The resulting merged taxonomy improved the classification of 75% of the sequences by one or more ranks relative to the original NCBI taxonomy with the most pronounced improvements occurring in under-classified environmental sequences. We also assessed candidate phyla (divisions) currently defined by NCBI and present recommendations for consolidation of 34 redundantly named groups. All intermediate results from the pipeline, which includes tree inference, jackknifing and transfer of a donor taxonomy to a recipient tree (tax2tree) are available for download. The improved Greengenes taxonomy should provide important infrastructure for a wide range of megasequencing projects studying ecosystems on scales ranging from our own bodies (the Human Microbiome Project) to the entire planet (the Earth Microbiome Project). The implementation of the software can be obtained from http://sourceforge.net/projects/tax2tree/.

  20. An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea

    PubMed Central

    McDonald, Daniel; Price, Morgan N; Goodrich, Julia; Nawrocki, Eric P; DeSantis, Todd Z; Probst, Alexander; Andersen, Gary L; Knight, Rob; Hugenholtz, Philip

    2012-01-01

    Reference phylogenies are crucial for providing a taxonomic framework for interpretation of marker gene and metagenomic surveys, which continue to reveal novel species at a remarkable rate. Greengenes is a dedicated full-length 16S rRNA gene database that provides users with a curated taxonomy based on de novo tree inference. We developed a ‘taxonomy to tree' approach for transferring group names from an existing taxonomy to a tree topology, and used it to apply the Greengenes, National Center for Biotechnology Information (NCBI) and cyanoDB (Cyanobacteria only) taxonomies to a de novo tree comprising 408 315 sequences. We also incorporated explicit rank information provided by the NCBI taxonomy to group names (by prefixing rank designations) for better user orientation and classification consistency. The resulting merged taxonomy improved the classification of 75% of the sequences by one or more ranks relative to the original NCBI taxonomy with the most pronounced improvements occurring in under-classified environmental sequences. We also assessed candidate phyla (divisions) currently defined by NCBI and present recommendations for consolidation of 34 redundantly named groups. All intermediate results from the pipeline, which includes tree inference, jackknifing and transfer of a donor taxonomy to a recipient tree (tax2tree) are available for download. The improved Greengenes taxonomy should provide important infrastructure for a wide range of megasequencing projects studying ecosystems on scales ranging from our own bodies (the Human Microbiome Project) to the entire planet (the Earth Microbiome Project). The implementation of the software can be obtained from http://sourceforge.net/projects/tax2tree/. PMID:22134646

  1. A conceptual approach to approximate tree root architecture in infinite slope models

    NASA Astrophysics Data System (ADS)

    Schmaltz, Elmar; Glade, Thomas

    2016-04-01

    Vegetation-related properties - particularly tree root distribution and coherent hydrologic and mechanical effects on the underlying soil mantle - are commonly not considered in infinite slope models. Indeed, from a geotechnical point of view, these effects appear to be difficult to be reproduced reliably in a physically-based modelling approach. The growth of a tree and the expansion of its root architecture are directly connected with both intrinsic properties such as species and age, and extrinsic factors like topography, availability of nutrients, climate and soil type. These parameters control four main issues of the tree root architecture: 1) Type of rooting; 2) maximum growing distance to the tree stem (radius r); 3) maximum growing depth (height h); and 4) potential deformation of the root system. Geometric solids are able to approximate the distribution of a tree root system. The objective of this paper is to investigate whether it is possible to implement root systems and the connected hydrological and mechanical attributes sufficiently in a 3-dimensional slope stability model. Hereby, a spatio-dynamic vegetation module should cope with the demands of performance, computation time and significance. However, in this presentation, we focus only on the distribution of roots. The assumption is that the horizontal root distribution around a tree stem on a 2-dimensional plane can be described by a circle with the stem located at the centroid and a distinct radius r that is dependent on age and species. We classified three main types of tree root systems and reproduced the species-age-related root distribution with three respective mathematical solids in a synthetic 3-dimensional hillslope ambience. Thus, two solids in an Euclidian space were distinguished to represent the three root systems: i) cylinders with radius r and height h, whilst the dimension of latter defines the shape of a taproot-system or a shallow-root-system respectively; ii) elliptic paraboloids represent a cordate-root-system with radius r, height h and a constant, species-independent curvature. This procedure simplifies the classification of tree species into the three defined geometric solids. In this study we introduce a conceptual approach to estimate the 2- and 3-dimensional distribution of different tree root systems, and to implement it in a raster environment, as it is used in infinite slope models. Hereto we used the PCRaster extension in a python framework. The results show that root distribution and root growth are spatially reproducible in a simple raster framework. The outputs exhibit significant effects for a synthetically generated slope on local scale for equal time-steps. The preliminary results depict an initial step to develop a vegetation module that can be coupled with hydro-mechanical slope stability models. This approach is expected to yield a valuable contribution to the implementation of vegetation-related properties, in particular effects of root-reinforcement, into physically-based approaches using infinite slope models.

  2. A respiratory alert model for the Shenandoah Valley, Virginia, USA

    NASA Astrophysics Data System (ADS)

    Hondula, David M.; Davis, Robert E.; Knight, David B.; Sitka, Luke J.; Enfield, Kyle; Gawtry, Stephen B.; Stenger, Phillip J.; Deaton, Michael L.; Normile, Caroline P.; Lee, Temple R.

    2013-01-01

    Respiratory morbidity (particularly COPD and asthma) can be influenced by short-term weather fluctuations that affect air quality and lung function. We developed a model to evaluate meteorological conditions associated with respiratory hospital admissions in the Shenandoah Valley of Virginia, USA. We generated ensembles of classification trees based on six years of respiratory-related hospital admissions (64,620 cases) and a suite of 83 potential environmental predictor variables. As our goal was to identify short-term weather linkages to high admission periods, the dependent variable was formulated as a binary classification of five-day moving average respiratory admission departures from the seasonal mean value. Accounting for seasonality removed the long-term apparent inverse relationship between temperature and admissions. We generated eight total models specific to the northern and southern portions of the valley for each season. All eight models demonstrate predictive skill (mean odds ratio = 3.635) when evaluated using a randomization procedure. The predictor variables selected by the ensembling algorithm vary across models, and both meteorological and air quality variables are included. In general, the models indicate complex linkages between respiratory health and environmental conditions that may be difficult to identify using more traditional approaches.

  3. A Transform-Based Feature Extraction Approach for Motor Imagery Tasks Classification

    PubMed Central

    Khorshidtalab, Aida; Mesbah, Mostefa; Salami, Momoh J. E.

    2015-01-01

    In this paper, we present a new motor imagery classification method in the context of electroencephalography (EEG)-based brain–computer interface (BCI). This method uses a signal-dependent orthogonal transform, referred to as linear prediction singular value decomposition (LP-SVD), for feature extraction. The transform defines the mapping as the left singular vectors of the LP coefficient filter impulse response matrix. Using a logistic tree-based model classifier; the extracted features are classified into one of four motor imagery movements. The proposed approach was first benchmarked against two related state-of-the-art feature extraction approaches, namely, discrete cosine transform (DCT) and adaptive autoregressive (AAR)-based methods. By achieving an accuracy of 67.35%, the LP-SVD approach outperformed the other approaches by large margins (25% compared with DCT and 6 % compared with AAR-based methods). To further improve the discriminatory capability of the extracted features and reduce the computational complexity, we enlarged the extracted feature subset by incorporating two extra features, namely, Q- and the Hotelling’s \\documentclass[12pt]{minimal} \\usepackage{amsmath} \\usepackage{wasysym} \\usepackage{amsfonts} \\usepackage{amssymb} \\usepackage{amsbsy} \\usepackage{upgreek} \\usepackage{mathrsfs} \\setlength{\\oddsidemargin}{-69pt} \\begin{document} }{}$T^{2}$ \\end{document} statistics of the transformed EEG and introduced a new EEG channel selection method. The performance of the EEG classification based on the expanded feature set and channel selection method was compared with that of a number of the state-of-the-art classification methods previously reported with the BCI IIIa competition data set. Our method came second with an average accuracy of 81.38%. PMID:27170898

  4. Multiple Spectral-Spatial Classification Approach for Hyperspectral Data

    NASA Technical Reports Server (NTRS)

    Tarabalka, Yuliya; Benediktsson, Jon Atli; Chanussot, Jocelyn; Tilton, James C.

    2010-01-01

    A .new multiple classifier approach for spectral-spatial classification of hyperspectral images is proposed. Several classifiers are used independently to classify an image. For every pixel, if all the classifiers have assigned this pixel to the same class, the pixel is kept as a marker, i.e., a seed of the spatial region, with the corresponding class label. We propose to use spectral-spatial classifiers at the preliminary step of the marker selection procedure, each of them combining the results of a pixel-wise classification and a segmentation map. Different segmentation methods based on dissimilar principles lead to different classification results. Furthermore, a minimum spanning forest is built, where each tree is rooted on a classification -driven marker and forms a region in the spectral -spatial classification: map. Experimental results are presented for two hyperspectral airborne images. The proposed method significantly improves classification accuracies, when compared to previously proposed classification techniques.

  5. FCMPSO: An Imputation for Missing Data Features in Heart Disease Classification

    NASA Astrophysics Data System (ADS)

    Salleh, Mohd Najib Mohd; Ashikin Samat, Nurul

    2017-08-01

    The application of data mining and machine learning in directing clinical research into possible hidden knowledge is becoming greatly influential in medical areas. Heart Disease is a killer disease around the world, and early prevention through efficient methods can help to reduce the mortality number. Medical data may contain many uncertainties, as they are fuzzy and vague in nature. Nonetheless, imprecise features data such as no values and missing values can affect quality of classification results. Nevertheless, the other complete features are still capable to give information in certain features. Therefore, an imputation approach based on Fuzzy C-Means and Particle Swarm Optimization (FCMPSO) is developed in preprocessing stage to help fill in the missing values. Then, the complete dataset is trained in classification algorithm, Decision Tree. The experiment is trained with Heart Disease dataset and the performance is analysed using accuracy, precision, and ROC values. Results show that the performance of Decision Tree is increased after the application of FCMSPO for imputation.

  6. DeepSAT: A Deep Learning Approach to Tree-Cover Delineation in 1-m NAIP Imagery for the Continental United States

    NASA Technical Reports Server (NTRS)

    Ganguly, Sangram; Basu, Saikat; Nemani, Ramakrishna R.; Mukhopadhyay, Supratik; Michaelis, Andrew; Votava, Petr

    2016-01-01

    High resolution tree cover classification maps are needed to increase the accuracy of current land ecosystem and climate model outputs. Limited studies are in place that demonstrates the state-of-the-art in deriving very high resolution (VHR) tree cover products. In addition, most methods heavily rely on commercial softwares that are difficult to scale given the region of study (e.g. continents to globe). Complexities in present approaches relate to (a) scalability of the algorithm, (b) large image data processing (compute and memory intensive), (c) computational cost, (d) massively parallel architecture, and (e) machine learning automation. In addition, VHR satellite datasets are of the order of terabytes and features extracted from these datasets are of the order of petabytes. In our present study, we have acquired the National Agriculture Imagery Program (NAIP) dataset for the Continental United States at a spatial resolution of 1-m. This data comes as image tiles (a total of quarter million image scenes with 60 million pixels) and has a total size of 65 terabytes for a single acquisition. Features extracted from the entire dataset would amount to 8-10 petabytes. In our proposed approach, we have implemented a novel semi-automated machine learning algorithm rooted on the principles of "deep learning" to delineate the percentage of tree cover. Using the NASA Earth Exchange (NEX) initiative, we have developed an end-to-end architecture by integrating a segmentation module based on Statistical Region Merging, a classification algorithm using Deep Belief Network and a structured prediction algorithm using Conditional Random Fields to integrate the results from the segmentation and classification modules to create per-pixel class labels. The training process is scaled up using the power of GPUs and the prediction is scaled to quarter million NAIP tiles spanning the whole of Continental United States using the NEX HPC supercomputing cluster. An initial pilot over the state of California spanning a total of 11,095 NAIP tiles covering a total geographical area of 163,696 sq. miles has produced true positive rates of around 88 percent for fragmented forests and 74 percent for urban tree cover areas, with false positive rates lower than 2 percent for both landscapes.

  7. DeepSAT: A Deep Learning Approach to Tree-cover Delineation in 1-m NAIP Imagery for the Continental United States

    NASA Astrophysics Data System (ADS)

    Ganguly, S.; Basu, S.; Nemani, R. R.; Mukhopadhyay, S.; Michaelis, A.; Votava, P.

    2016-12-01

    High resolution tree cover classification maps are needed to increase the accuracy of current land ecosystem and climate model outputs. Limited studies are in place that demonstrates the state-of-the-art in deriving very high resolution (VHR) tree cover products. In addition, most methods heavily rely on commercial softwares that are difficult to scale given the region of study (e.g. continents to globe). Complexities in present approaches relate to (a) scalability of the algorithm, (b) large image data processing (compute and memory intensive), (c) computational cost, (d) massively parallel architecture, and (e) machine learning automation. In addition, VHR satellite datasets are of the order of terabytes and features extracted from these datasets are of the order of petabytes. In our present study, we have acquired the National Agriculture Imagery Program (NAIP) dataset for the Continental United States at a spatial resolution of 1-m. This data comes as image tiles (a total of quarter million image scenes with 60 million pixels) and has a total size of 65 terabytes for a single acquisition. Features extracted from the entire dataset would amount to 8-10 petabytes. In our proposed approach, we have implemented a novel semi-automated machine learning algorithm rooted on the principles of "deep learning" to delineate the percentage of tree cover. Using the NASA Earth Exchange (NEX) initiative, we have developed an end-to-end architecture by integrating a segmentation module based on Statistical Region Merging, a classification algorithm using Deep Belief Network and a structured prediction algorithm using Conditional Random Fields to integrate the results from the segmentation and classification modules to create per-pixel class labels. The training process is scaled up using the power of GPUs and the prediction is scaled to quarter million NAIP tiles spanning the whole of Continental United States using the NEX HPC supercomputing cluster. An initial pilot over the state of California spanning a total of 11,095 NAIP tiles covering a total geographical area of 163,696 sq. miles has produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates lower than 2% for both landscapes.

  8. Classification of large-scale fundus image data sets: a cloud-computing framework.

    PubMed

    Roychowdhury, Sohini

    2016-08-01

    Large medical image data sets with high dimensionality require substantial amount of computation time for data creation and data processing. This paper presents a novel generalized method that finds optimal image-based feature sets that reduce computational time complexity while maximizing overall classification accuracy for detection of diabetic retinopathy (DR). First, region-based and pixel-based features are extracted from fundus images for classification of DR lesions and vessel-like structures. Next, feature ranking strategies are used to distinguish the optimal classification feature sets. DR lesion and vessel classification accuracies are computed using the boosted decision tree and decision forest classifiers in the Microsoft Azure Machine Learning Studio platform, respectively. For images from the DIARETDB1 data set, 40 of its highest-ranked features are used to classify four DR lesion types with an average classification accuracy of 90.1% in 792 seconds. Also, for classification of red lesion regions and hemorrhages from microaneurysms, accuracies of 85% and 72% are observed, respectively. For images from STARE data set, 40 high-ranked features can classify minor blood vessels with an accuracy of 83.5% in 326 seconds. Such cloud-based fundus image analysis systems can significantly enhance the borderline classification performances in automated screening systems.

  9. Annual crop type classification of the U.S. Great Plains for 2000 to 2011

    USGS Publications Warehouse

    Howard, Daniel M.; Wylie, Bruce K.

    2014-01-01

    The purpose of this study was to increase the spatial and temporal availability of crop classification data. In this study, nearly 16.2 million crop observation points were used in the training of the US Great Plains classification tree crop type model (CTM). Each observation point was further defined by weekly Normalized Difference Vegetation Index, annual climate, and a number of other biogeophysical environmental characteristics. This study accounted for the most prevalent crop types in the region, including, corn, soybeans, winter wheat, spring wheat, cotton, sorghum, and alfalfa. Annual CTM crop maps of the US Great Plains were created for 2000 to 2011 at a spatial resolution of 250 meters. The CTM achieved an 87 percent classification success rate on 1.8 million observation points that were withheld from model training. Product validation was performed on greater than 15,000 county records with a coefficient of determination of R2 = 0.76.

  10. Community evolution mining and analysis in social network

    NASA Astrophysics Data System (ADS)

    Liu, Hongtao; Tian, Yuan; Liu, Xueyan; Jian, Jie

    2017-03-01

    With the development of digital and network technology, various social platforms emerge. These social platforms have greatly facilitated access to information, attracting more and more users. They use these social platforms every day to work, study and communicate, so every moment social platforms are generating massive amounts of data. These data can often be modeled as complex networks, making large-scale social network analysis possible. In this paper, the existing evolution classification model of community has been improved based on community evolution relationship over time in dynamic social network, and the Evolution-Tree structure is proposed which can show the whole life cycle of the community more clearly. The comparative test result shows that the improved model can excavate the evolution relationship of the community well.

  11. Bariatric Outcomes and Obesity Modeling: Study Meeting

    DTIC Science & Technology

    2010-09-17

    to obesity. 15. SUBJECT TERMS Bariatric Surgery , Cost Effectiveness, Surgical Outcome 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF a. REPORT...EFFECTIVENESS MODEL OVERVIEW  Two parts: 1) Decision Tree and 2) Natural History Model  Results: Bariatric Surgery is cost-effective compared to no...9,300 for AGB $10,600 for LRYGB AGB: Adjustable gastric banding LRYGB: laparoscopic Roux-en-Y gastric bypass A Financial Model of Bariatric Surgery for

  12. Classification and regression tree analysis of acute-on-chronic hepatitis B liver failure: Seeing the forest for the trees.

    PubMed

    Shi, K-Q; Zhou, Y-Y; Yan, H-D; Li, H; Wu, F-L; Xie, Y-Y; Braddock, M; Lin, X-Y; Zheng, M-H

    2017-02-01

    At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model by using the classification and regression tree (CART) analysis. A total of 1047 patients from two separate medical centres with suspected ACHBLF were screened in the study, which were recognized as derivation cohort and validation cohort, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was tested using the area under the receiver operating characteristic curve, which was compared with the model for end-stage liver disease (MELD) score and a new logistic regression model. CART analysis identified four variables as prognostic factors of ACHBLF: total bilirubin, age, serum sodium and INR, and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed with four independent factors, including age, total bilirubin, serum sodium and prothrombin activity by multivariate logistic regression analysis. The performances of the CART model (0.896), similar to the logistic regression model (0.914, P=.382), exceeded that of MELD score (0.667, P<.001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model superior to MELD for predicting three-month mortality of patients with ACHBLF. Thus, the CART model could facilitate medical decision-making and provide clinicians with a validated practical bedside tool for ACHBLF risk stratification. © 2016 John Wiley & Sons Ltd.

  13. Comparative study of classification algorithms for damage classification in smart composite laminates

    NASA Astrophysics Data System (ADS)

    Khan, Asif; Ryoo, Chang-Kyung; Kim, Heung Soo

    2017-04-01

    This paper presents a comparative study of different classification algorithms for the classification of various types of inter-ply delaminations in smart composite laminates. Improved layerwise theory is used to model delamination at different interfaces along the thickness and longitudinal directions of the smart composite laminate. The input-output data obtained through surface bonded piezoelectric sensor and actuator is analyzed by the system identification algorithm to get the system parameters. The identified parameters for the healthy and delaminated structure are supplied as input data to the classification algorithms. The classification algorithms considered in this study are ZeroR, Classification via regression, Naïve Bayes, Multilayer Perceptron, Sequential Minimal Optimization, Multiclass-Classifier, and Decision tree (J48). The open source software of Waikato Environment for Knowledge Analysis (WEKA) is used to evaluate the classification performance of the classifiers mentioned above via 75-25 holdout and leave-one-sample-out cross-validation regarding classification accuracy, precision, recall, kappa statistic and ROC Area.

  14. Quaternion-Based Signal Analysis for Motor Imagery Classification from Electroencephalographic Signals.

    PubMed

    Batres-Mendoza, Patricia; Montoro-Sanjose, Carlos R; Guerra-Hernandez, Erick I; Almanza-Ojeda, Dora L; Rostro-Gonzalez, Horacio; Romero-Troncoso, Rene J; Ibarra-Manzano, Mario A

    2016-03-05

    Quaternions can be used as an alternative to model the fundamental patterns of electroencephalographic (EEG) signals in the time domain. Thus, this article presents a new quaternion-based technique known as quaternion-based signal analysis (QSA) to represent EEG signals obtained using a brain-computer interface (BCI) device to detect and interpret cognitive activity. This quaternion-based signal analysis technique can extract features to represent brain activity related to motor imagery accurately in various mental states. Experimental tests in which users where shown visual graphical cues related to left and right movements were used to collect BCI-recorded signals. These signals were then classified using decision trees (DT), support vector machine (SVM) and k-nearest neighbor (KNN) techniques. The quantitative analysis of the classifiers demonstrates that this technique can be used as an alternative in the EEG-signal modeling phase to identify mental states.

  15. Quaternion-Based Signal Analysis for Motor Imagery Classification from Electroencephalographic Signals

    PubMed Central

    Batres-Mendoza, Patricia; Montoro-Sanjose, Carlos R.; Guerra-Hernandez, Erick I.; Almanza-Ojeda, Dora L.; Rostro-Gonzalez, Horacio; Romero-Troncoso, Rene J.; Ibarra-Manzano, Mario A.

    2016-01-01

    Quaternions can be used as an alternative to model the fundamental patterns of electroencephalographic (EEG) signals in the time domain. Thus, this article presents a new quaternion-based technique known as quaternion-based signal analysis (QSA) to represent EEG signals obtained using a brain-computer interface (BCI) device to detect and interpret cognitive activity. This quaternion-based signal analysis technique can extract features to represent brain activity related to motor imagery accurately in various mental states. Experimental tests in which users where shown visual graphical cues related to left and right movements were used to collect BCI-recorded signals. These signals were then classified using decision trees (DT), support vector machine (SVM) and k-nearest neighbor (KNN) techniques. The quantitative analysis of the classifiers demonstrates that this technique can be used as an alternative in the EEG-signal modeling phase to identify mental states. PMID:26959029

  16. Using Classification Trees to Predict Alumni Giving for Higher Education

    ERIC Educational Resources Information Center

    Weerts, David J.; Ronca, Justin M.

    2009-01-01

    As the relative level of public support for higher education declines, colleges and universities aim to maximize alumni-giving to keep their programs competitive. Anchored in a utility maximization framework, this study employs the classification and regression tree methodology to examine characteristics of alumni donors and non-donors at a…

  17. Using classification tree analysis to predict oak wilt distribution in Minnesota and Texas

    Treesearch

    Marla c. Downing; Vernon L. Thomas; Jennifer Juzwik; David N. Appel; Robin M. Reich; Kim Camilli

    2008-01-01

    We developed a methodology and compared results for predicting the potential distribution of Ceratocystis fagacearum (causal agent of oak wilt), in both Anoka County, MN, and Fort Hood, TX. The Potential Distribution of Oak Wilt (PDOW) utilizes a binary classification tree statistical technique that incorporates: geographical information systems (GIS...

  18. Chi-squared Automatic Interaction Detection Decision Tree Analysis of Risk Factors for Infant Anemia in Beijing, China

    PubMed Central

    Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin

    2016-01-01

    Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6–12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. Results: The prevalence of anemia was 12.60% with a range of 3.47%–40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. Conclusions: The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities. PMID:27174328

  19. Chi-squared Automatic Interaction Detection Decision Tree Analysis of Risk Factors for Infant Anemia in Beijing, China.

    PubMed

    Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin

    2016-05-20

    In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6-12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. The prevalence of anemia was 12.60% with a range of 3.47%-40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities.

  20. Genetic Algorithms and Classification Trees in Feature Discovery: Diabetes and the NHANES database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Heredia-Langner, Alejandro; Jarman, Kristin H.; Amidan, Brett G.

    2013-09-01

    This paper presents a feature selection methodology that can be applied to datasets containing a mixture of continuous and categorical variables. Using a Genetic Algorithm (GA), this method explores a dataset and selects a small set of features relevant for the prediction of a binary (1/0) response. Binary classification trees and an objective function based on conditional probabilities are used to measure the fitness of a given subset of features. The method is applied to health data in order to find factors useful for the prediction of diabetes. Results show that our algorithm is capable of narrowing down the setmore » of predictors to around 8 factors that can be validated using reputable medical and public health resources.« less

Top