The process and utility of classification and regression tree methodology in nursing research
Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda
2014-01-01
Aim This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Background Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Design Discussion paper. Data sources English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984–2013. Discussion Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Implications for Nursing Research Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Conclusion Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. PMID:24237048
The process and utility of classification and regression tree methodology in nursing research.
Kuhn, Lisa; Page, Karen; Ward, John; Worrall-Carter, Linda
2014-06-01
This paper presents a discussion of classification and regression tree analysis and its utility in nursing research. Classification and regression tree analysis is an exploratory research method used to illustrate associations between variables not suited to traditional regression analysis. Complex interactions are demonstrated between covariates and variables of interest in inverted tree diagrams. Discussion paper. English language literature was sourced from eBooks, Medline Complete and CINAHL Plus databases, Google and Google Scholar, hard copy research texts and retrieved reference lists for terms including classification and regression tree* and derivatives and recursive partitioning from 1984-2013. Classification and regression tree analysis is an important method used to identify previously unknown patterns amongst data. Whilst there are several reasons to embrace this method as a means of exploratory quantitative research, issues regarding quality of data as well as the usefulness and validity of the findings should be considered. Classification and regression tree analysis is a valuable tool to guide nurses to reduce gaps in the application of evidence to practice. With the ever-expanding availability of data, it is important that nurses understand the utility and limitations of the research method. Classification and regression tree analysis is an easily interpreted method for modelling interactions between health-related variables that would otherwise remain obscured. Knowledge is presented graphically, providing insightful understanding of complex and hierarchical relationships in an accessible and useful way to nursing and other health professions. © 2013 The Authors. Journal of Advanced Nursing Published by John Wiley & Sons Ltd.
DIF Trees: Using Classification Trees to Detect Differential Item Functioning
ERIC Educational Resources Information Center
Vaughn, Brandon K.; Wang, Qiu
2010-01-01
A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…
Lo, Benjamin W Y; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A H
2016-01-01
Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56-2.45, P < 0.01). A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.
Discriminant forest classification method and system
Chen, Barry Y.; Hanley, William G.; Lemmond, Tracy D.; Hiller, Lawrence J.; Knapp, David A.; Mugge, Marshall J.
2012-11-06
A hybrid machine learning methodology and system for classification that combines classical random forest (RF) methodology with discriminant analysis (DA) techniques to provide enhanced classification capability. A DA technique which uses feature measurements of an object to predict its class membership, such as linear discriminant analysis (LDA) or Andersen-Bahadur linear discriminant technique (AB), is used to split the data at each node in each of its classification trees to train and grow the trees and the forest. When training is finished, a set of n DA-based decision trees of a discriminant forest is produced for use in predicting the classification of new samples of unknown class.
A new approach to enhance the performance of decision tree for classifying gene expression data.
Hassan, Md; Kotagiri, Ramamohanarao
2013-12-20
Gene expression data classification is a challenging task due to the large dimensionality and very small number of samples. Decision tree is one of the popular machine learning approaches to address such classification problems. However, the existing decision tree algorithms use a single gene feature at each node to split the data into its child nodes and hence might suffer from poor performance specially when classifying gene expression dataset. By using a new decision tree algorithm where, each node of the tree consists of more than one gene, we enhance the classification performance of traditional decision tree classifiers. Our method selects suitable genes that are combined using a linear function to form a derived composite feature. To determine the structure of the tree we use the area under the Receiver Operating Characteristics curve (AUC). Experimental analysis demonstrates higher classification accuracy using the new decision tree compared to the other existing decision trees in literature. We experimentally compare the effect of our scheme against other well known decision tree techniques. Experiments show that our algorithm can substantially boost the classification performance of the decision tree.
NASA Astrophysics Data System (ADS)
Estuar, Maria Regina Justina; Victorino, John Noel; Coronel, Andrei; Co, Jerelyn; Tiausas, Francis; Señires, Chiara Veronica
2017-09-01
Use of wireless sensor networks and smartphone integration design to monitor environmental parameters surrounding plantations is made possible because of readily available and affordable sensors. Providing low cost monitoring devices would be beneficial, especially to small farm owners, in a developing country like the Philippines, where agriculture covers a significant amount of the labor market. This study discusses the integration of wireless soil sensor devices and smartphones to create an application that will use multidimensional analysis to detect the presence or absence of plant disease. Specifically, soil sensors are designed to collect soil quality parameters in a sink node from which the smartphone collects data from via Bluetooth. Given these, there is a need to develop a classification model on the mobile phone that will report infection status of a soil. Though tree classification is the most appropriate approach for continuous parameter-based datasets, there is a need to determine whether tree models will result to coherent results or not. Soil sensor data that resides on the phone is modeled using several variations of decision tree, namely: decision tree (DT), best-fit (BF) decision tree, functional tree (FT), Naive Bayes (NB) decision tree, J48, J48graft and LAD tree, where decision tree approaches the problem by considering all sensor nodes as one. Results show that there are significant differences among soil sensor parameters indicating that there are variances in scores between the infected and uninfected sites. Furthermore, analysis of variance in accuracy, recall, precision and F1 measure scores from tree classification models homogeneity among NBTree, J48graft and J48 tree classification models.
New Tree-Classification System Used by the Southern Forest Inventory and Analysis Unit
Dennis M. May; John S. Vissage; D. Vince Few
1990-01-01
Trees at USDA Forest Service, Southern Forest Inventory and Analysis, sample locations are classified as growing stock or cull based on their ability to produce sawlogs. The old and new classification systems are compared, and the impacts of the new system on the reporting of tree volumes are illustrated with inventory data from north Alabama.
NASA Astrophysics Data System (ADS)
Liu, Haijian; Wu, Changshan
2018-06-01
Crown-level tree species classification is a challenging task due to the spectral similarity among different tree species. Shadow, underlying objects, and other materials within a crown may decrease the purity of extracted crown spectra and further reduce classification accuracy. To address this problem, an innovative pixel-weighting approach was developed for tree species classification at the crown level. The method utilized high density discrete LiDAR data for individual tree delineation and Airborne Imaging Spectrometer for Applications (AISA) hyperspectral imagery for pure crown-scale spectra extraction. Specifically, three steps were included: 1) individual tree identification using LiDAR data, 2) pixel-weighted representative crown spectra calculation using hyperspectral imagery, with which pixel-based illuminated-leaf fractions estimated using a linear spectral mixture analysis (LSMA) were employed as weighted factors, and 3) representative spectra based tree species classification was performed through applying a support vector machine (SVM) approach. Analysis of results suggests that the developed pixel-weighting approach (OA = 82.12%, Kc = 0.74) performed better than treetop-based (OA = 70.86%, Kc = 0.58) and pixel-majority methods (OA = 72.26, Kc = 0.62) in terms of classification accuracy. McNemar tests indicated the differences in accuracy between pixel-weighting and treetop-based approaches as well as that between pixel-weighting and pixel-majority approaches were statistically significant.
Modeling time-to-event (survival) data using classification tree analysis.
Linden, Ariel; Yarnold, Paul R
2017-12-01
Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. © 2017 John Wiley & Sons, Ltd.
Fast Image Texture Classification Using Decision Trees
NASA Technical Reports Server (NTRS)
Thompson, David R.
2011-01-01
Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation- hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.
NASA Astrophysics Data System (ADS)
Krappe, Sebastian; Wittenberg, Thomas; Haferlach, Torsten; Münzenmayer, Christian
2016-03-01
The morphological differentiation of bone marrow is fundamental for the diagnosis of leukemia. Currently, the counting and classification of the different types of bone marrow cells is done manually under the use of bright field microscopy. This is a time-consuming, subjective, tedious and error-prone process. Furthermore, repeated examinations of a slide may yield intra- and inter-observer variances. For that reason a computer assisted diagnosis system for bone marrow differentiation is pursued. In this work we focus (a) on a new method for the separation of nucleus and plasma parts and (b) on a knowledge-based hierarchical tree classifier for the differentiation of bone marrow cells in 16 different classes. Classification trees are easily interpretable and understandable and provide a classification together with an explanation. Using classification trees, expert knowledge (i.e. knowledge about similar classes and cell lines in the tree model of hematopoiesis) is integrated in the structure of the tree. The proposed segmentation method is evaluated with more than 10,000 manually segmented cells. For the evaluation of the proposed hierarchical classifier more than 140,000 automatically segmented bone marrow cells are used. Future automated solutions for the morphological analysis of bone marrow smears could potentially apply such an approach for the pre-classification of bone marrow cells and thereby shortening the examination time.
NASA Astrophysics Data System (ADS)
Lin, Yi; Jiang, Miao
2017-01-01
Tree species information is essential for forest research and management purposes, which in turn require approaches for accurate and precise classification of tree species. One such remote sensing technology, terrestrial laser scanning (TLS), has proved to be capable of characterizing detailed tree structures, such as tree stem geometry. Can TLS further differentiate between broad- and needle-leaves? If the answer is positive, TLS data can be used for classification of taxonomic tree groups by directly examining their differences in leaf morphology. An analysis was proposed to assess TLS-represented broad- and needle-leaf structures, followed by a Bayes classifier to perform the classification. Tests indicated that the proposed method can basically implement the task, with an overall accuracy of 77.78%. This study indicates a way of implementing the classification of the two major broad- and needle-leaf taxonomies measured by TLS in accordance to their literal definitions, and manifests the potential of extending TLS applications in forestry.
Slabbinck, Bram; Waegeman, Willem; Dawyndt, Peter; De Vos, Paul; De Baets, Bernard
2010-01-30
Machine learning techniques have shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure allows to easily evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. Summarized, by phylogenetic learning we are able to situate and evaluate FAME-based bacterial species classification in a more informative context.
2010-01-01
Background Machine learning techniques have shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure allows to easily evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. Summarized, by phylogenetic learning we are able to situate and evaluate FAME-based bacterial species classification in a more informative context. PMID:20113515
Seurinck, Sylvie; Deschepper, Ellen; Deboch, Bishaw; Verstraete, Willy; Siciliano, Steven
2006-03-01
Microbial source tracking (MST) methods need to be rapid, inexpensive and accurate. Unfortunately, many MST methods provide a wealth of information that is difficult to interpret by the regulators who use this information to make decisions. This paper describes the use of classification tree analysis to interpret the results of a MST method based on fatty acid methyl ester (FAME) profiles of Escherichia coli isolates, and to present results in a format readily interpretable by water quality managers. Raw sewage E. coli isolates and animal E. coli isolates from cow, dog, gull, and horse were isolated and their FAME profiles collected. Correct classification rates determined with leaveone-out cross-validation resulted in an overall low correct classification rate of 61%. A higher overall correct classification rate of 85% was obtained when the animal isolates were pooled together and compared to the raw sewage isolates. Bootstrap aggregation or adaptive resampling and combining of the FAME profile data increased correct classification rates substantially. Other MST methods may be better suited to differentiate between different fecal sources but classification tree analysis has enabled us to distinguish raw sewage from animal E. coli isolates, which previously had not been possible with other multivariate methods such as principal component analysis and cluster analysis.
NASA Technical Reports Server (NTRS)
Buntine, Wray
1993-01-01
This paper introduces the IND Tree Package to prospective users. IND does supervised learning using classification trees. This learning task is a basic tool used in the development of diagnosis, monitoring and expert systems. The IND Tree Package was developed as part of a NASA project to semi-automate the development of data analysis and modelling algorithms using artificial intelligence techniques. The IND Tree Package integrates features from CART and C4 with newer Bayesian and minimum encoding methods for growing classification trees and graphs. The IND Tree Package also provides an experimental control suite on top. The newer features give improved probability estimates often required in diagnostic and screening tasks. The package comes with a manual, Unix 'man' entries, and a guide to tree methods and research. The IND Tree Package is implemented in C under Unix and was beta-tested at university and commercial research laboratories in the United States.
ERIC Educational Resources Information Center
Brabant, Marie-Eve; Hebert, Martine; Chagnon, Francois
2013-01-01
This study explored the clinical profiles of 77 female teenager survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression,…
Learning accurate very fast decision trees from uncertain data streams
NASA Astrophysics Data System (ADS)
Liang, Chunquan; Zhang, Yang; Shi, Peng; Hu, Zhengguo
2015-12-01
Most existing works on data stream classification assume the streaming data is precise and definite. Such assumption, however, does not always hold in practice, since data uncertainty is ubiquitous in data stream applications due to imprecise measurement, missing values, privacy protection, etc. The goal of this paper is to learn accurate decision tree models from uncertain data streams for classification analysis. On the basis of very fast decision tree (VFDT) algorithms, we proposed an algorithm for constructing an uncertain VFDT tree with classifiers at tree leaves (uVFDTc). The uVFDTc algorithm can exploit uncertain information effectively and efficiently in both the learning and the classification phases. In the learning phase, it uses Hoeffding bound theory to learn from uncertain data streams and yield fast and reasonable decision trees. In the classification phase, at tree leaves it uses uncertain naive Bayes (UNB) classifiers to improve the classification performance. Experimental results on both synthetic and real-life datasets demonstrate the strong ability of uVFDTc to classify uncertain data streams. The use of UNB at tree leaves has improved the performance of uVFDTc, especially the any-time property, the benefit of exploiting uncertain information, and the robustness against uncertainty.
ERIC Educational Resources Information Center
Koon, Sharon; Petscher, Yaacov
2015-01-01
The purpose of this report was to explicate the use of logistic regression and classification and regression tree (CART) analysis in the development of early warning systems. It was motivated by state education leaders' interest in maintaining high classification accuracy while simultaneously improving practitioner understanding of the rules by…
PCA based feature reduction to improve the accuracy of decision tree c4.5 classification
NASA Astrophysics Data System (ADS)
Nasution, M. Z. F.; Sitompul, O. S.; Ramli, M.
2018-03-01
Splitting attribute is a major process in Decision Tree C4.5 classification. However, this process does not give a significant impact on the establishment of the decision tree in terms of removing irrelevant features. It is a major problem in decision tree classification process called over-fitting resulting from noisy data and irrelevant features. In turns, over-fitting creates misclassification and data imbalance. Many algorithms have been proposed to overcome misclassification and overfitting on classifications Decision Tree C4.5. Feature reduction is one of important issues in classification model which is intended to remove irrelevant data in order to improve accuracy. The feature reduction framework is used to simplify high dimensional data to low dimensional data with non-correlated attributes. In this research, we proposed a framework for selecting relevant and non-correlated feature subsets. We consider principal component analysis (PCA) for feature reduction to perform non-correlated feature selection and Decision Tree C4.5 algorithm for the classification. From the experiments conducted using available data sets from UCI Cervical cancer data set repository with 858 instances and 36 attributes, we evaluated the performance of our framework based on accuracy, specificity and precision. Experimental results show that our proposed framework is robust to enhance classification accuracy with 90.70% accuracy rates.
Jose Negron
1997-01-01
Classification trees and linear regression analysis were used to build models to predict probabilities of infestation and amount of tree mortality in terms of basal area resulting from roundheaded pine beetle, Dendroctonus adjunctus Blandford, activity in ponderosa pine, Pinus ponderosa Laws., in the Sacramento Mountains, New Mexico. Classification trees were built for...
Comparing ecoregional classifications for natural areas management in the Klamath Region, USA
Sarr, Daniel A.; Duff, Andrew; Dinger, Eric C.; Shafer, Sarah L.; Wing, Michael; Seavy, Nathaniel E.; Alexander, John D.
2015-01-01
We compared three existing ecoregional classification schemes (Bailey, Omernik, and World Wildlife Fund) with two derived schemes (Omernik Revised and Climate Zones) to explore their effectiveness in explaining species distributions and to better understand natural resource geography in the Klamath Region, USA. We analyzed presence/absence data derived from digital distribution maps for trees, amphibians, large mammals, small mammals, migrant birds, and resident birds using three statistical analyses of classification accuracy (Analysis of Similarity, Canonical Analysis of Principal Coordinates, and Classification Strength). The classifications were roughly comparable in classification accuracy, with Omernik Revised showing the best overall performance. Trees showed the strongest fidelity to the classifications, and large mammals showed the weakest fidelity. We discuss the implications for regional biogeography and describe how intermediate resolution ecoregional classifications may be appropriate for use as natural areas management domains.
Differences in Risk Factors for Rotator Cuff Tears between Elderly Patients and Young Patients.
Watanabe, Akihisa; Ono, Qana; Nishigami, Tomohiko; Hirooka, Takahiko; Machida, Hirohisa
2018-02-01
It has been unclear whether the risk factors for rotator cuff tears are the same at all ages or differ between young and older populations. In this study, we examined the risk factors for rotator cuff tears using classification and regression tree analysis as methods of nonlinear regression analysis. There were 65 patients in the rotator cuff tears group and 45 patients in the intact rotator cuff group. Classification and regression tree analysis was performed to predict rotator cuff tears. The target factor was rotator cuff tears; explanatory variables were age, sex, trauma, and critical shoulder angle≥35°. In the results of classification and regression tree analysis, the tree was divided at age 64. For patients aged≥64, the tree was divided at trauma. For patients aged<64, the tree was divided at critical shoulder angle≥35°. The odds ratio for critical shoulder angle≥35° was significant for all ages (5.89), and for patients aged<64 (10.3) while trauma was only a significant factor for patients aged≥64 (5.13). Age, trauma, and critical shoulder angle≥35° were related to rotator cuff tears in this study. However, these risk factors showed different trends according to age group, not a linear relationship.
A novel approach to internal crown characterization for coniferous tree species classification
NASA Astrophysics Data System (ADS)
Harikumar, A.; Bovolo, F.; Bruzzone, L.
2016-10-01
The knowledge about individual trees in forest is highly beneficial in forest management. High density small foot- print multi-return airborne Light Detection and Ranging (LiDAR) data can provide a very accurate information about the structural properties of individual trees in forests. Every tree species has a unique set of crown structural characteristics that can be used for tree species classification. In this paper, we use both the internal and external crown structural information of a conifer tree crown, derived from a high density small foot-print multi-return LiDAR data acquisition for species classification. Considering the fact that branches are the major building blocks of a conifer tree crown, we obtain the internal crown structural information using a branch level analysis. The structure of each conifer branch is represented using clusters in the LiDAR point cloud. We propose the joint use of the k-means clustering and geometric shape fitting, on the LiDAR data projected onto a novel 3-dimensional space, to identify branch clusters. After mapping the identified clusters back to the original space, six internal geometric features are estimated using a branch-level analysis. The external crown characteristics are modeled by using six least correlated features based on cone fitting and convex hull. Species classification is performed using a sparse Support Vector Machines (sparse SVM) classifier.
Austin, Peter C; Lee, Douglas S
2011-01-01
Purpose: Classification trees are increasingly being used to classifying patients according to the presence or absence of a disease or health outcome. A limitation of classification trees is their limited predictive accuracy. In the data-mining and machine learning literature, boosting has been developed to improve classification. Boosting with classification trees iteratively grows classification trees in a sequence of reweighted datasets. In a given iteration, subjects that were misclassified in the previous iteration are weighted more highly than subjects that were correctly classified. Classifications from each of the classification trees in the sequence are combined through a weighted majority vote to produce a final classification. The authors' objective was to examine whether boosting improved the accuracy of classification trees for predicting outcomes in cardiovascular patients. Methods: We examined the utility of boosting classification trees for classifying 30-day mortality outcomes in patients hospitalized with either acute myocardial infarction or congestive heart failure. Results: Improvements in the misclassification rate using boosted classification trees were at best minor compared to when conventional classification trees were used. Minor to modest improvements to sensitivity were observed, with only a negligible reduction in specificity. For predicting cardiovascular mortality, boosted classification trees had high specificity, but low sensitivity. Conclusions: Gains in predictive accuracy for predicting cardiovascular outcomes were less impressive than gains in performance observed in the data mining literature. PMID:22254181
Using classification tree analysis to predict oak wilt distribution in Minnesota and Texas
Marla c. Downing; Vernon L. Thomas; Jennifer Juzwik; David N. Appel; Robin M. Reich; Kim Camilli
2008-01-01
We developed a methodology and compared results for predicting the potential distribution of Ceratocystis fagacearum (causal agent of oak wilt), in both Anoka County, MN, and Fort Hood, TX. The Potential Distribution of Oak Wilt (PDOW) utilizes a binary classification tree statistical technique that incorporates: geographical information systems (GIS...
Singh, Minerva; Evans, Damian; Tan, Boun Suy; Nin, Chan Samean
2015-01-01
At present, there is very limited information on the ecology, distribution, and structure of Cambodia's tree species to warrant suitable conservation measures. The aim of this study was to assess various methods of analysis of aerial imagery for characterization of the forest mensuration variables (i.e., tree height and crown width) of selected tree species found in the forested region around the temples of Angkor Thom, Cambodia. Object-based image analysis (OBIA) was used (using multiresolution segmentation) to delineate individual tree crowns from very-high-resolution (VHR) aerial imagery and light detection and ranging (LiDAR) data. Crown width and tree height values that were extracted using multiresolution segmentation showed a high level of congruence with field-measured values of the trees (Spearman's rho 0.782 and 0.589, respectively). Individual tree crowns that were delineated from aerial imagery using multiresolution segmentation had a high level of segmentation accuracy (69.22%), whereas tree crowns delineated using watershed segmentation underestimated the field-measured tree crown widths. Both spectral angle mapper (SAM) and maximum likelihood (ML) classifications were applied to the aerial imagery for mapping of selected tree species. The latter was found to be more suitable for tree species classification. Individual tree species were identified with high accuracy. Inclusion of textural information further improved species identification, albeit marginally. Our findings suggest that VHR aerial imagery, in conjunction with OBIA-based segmentation methods (such as multiresolution segmentation) and supervised classification techniques are useful for tree species mapping and for studies of the forest mensuration variables.
Singh, Minerva; Evans, Damian; Tan, Boun Suy; Nin, Chan Samean
2015-01-01
At present, there is very limited information on the ecology, distribution, and structure of Cambodia’s tree species to warrant suitable conservation measures. The aim of this study was to assess various methods of analysis of aerial imagery for characterization of the forest mensuration variables (i.e., tree height and crown width) of selected tree species found in the forested region around the temples of Angkor Thom, Cambodia. Object-based image analysis (OBIA) was used (using multiresolution segmentation) to delineate individual tree crowns from very-high-resolution (VHR) aerial imagery and light detection and ranging (LiDAR) data. Crown width and tree height values that were extracted using multiresolution segmentation showed a high level of congruence with field-measured values of the trees (Spearman’s rho 0.782 and 0.589, respectively). Individual tree crowns that were delineated from aerial imagery using multiresolution segmentation had a high level of segmentation accuracy (69.22%), whereas tree crowns delineated using watershed segmentation underestimated the field-measured tree crown widths. Both spectral angle mapper (SAM) and maximum likelihood (ML) classifications were applied to the aerial imagery for mapping of selected tree species. The latter was found to be more suitable for tree species classification. Individual tree species were identified with high accuracy. Inclusion of textural information further improved species identification, albeit marginally. Our findings suggest that VHR aerial imagery, in conjunction with OBIA-based segmentation methods (such as multiresolution segmentation) and supervised classification techniques are useful for tree species mapping and for studies of the forest mensuration variables. PMID:25902148
2002-01-01
their expression profile and for classification of cells into tumerous and non- tumerous classes. Then we will present a parallel tree method for... cancerous cells. We will use the same dataset and use tree structured classifiers with multi-resolution analysis for classifying cancerous from non- cancerous ...cells. We have the expressions of 4096 genes from 98 different cell types. Of these 98, 72 are cancerous while 26 are non- cancerous . We are interested
ERIC Educational Resources Information Center
Cohen, Ira L.; Liu, Xudong; Hudson, Melissa; Gillis, Jennifer; Cavalari, Rachel N. S.; Romanczyk, Raymond G.; Karmel, Bernard Z.; Gardner, Judith M.
2016-01-01
In order to improve discrimination accuracy between Autism Spectrum Disorder (ASD) and similar neurodevelopmental disorders, a data mining procedure, Classification and Regression Trees (CART), was used on a large multi-site sample of PDD Behavior Inventory (PDDBI) forms on children with and without ASD. Discrimination accuracy exceeded 80%,…
Bevilacqua, M; Ciarapica, F E; Giacchetta, G
2008-07-01
This work is an attempt to apply classification tree methods to data regarding accidents in a medium-sized refinery, so as to identify the important relationships between the variables, which can be considered as decision-making rules when adopting any measures for improvement. The results obtained using the CART (Classification And Regression Trees) method proved to be the most precise and, in general, they are encouraging concerning the use of tree diagrams as preliminary explorative techniques for the assessment of the ergonomic, management and operational parameters which influence high accident risk situations. The Occupational Injury analysis carried out in this paper was planned as a dynamic process and can be repeated systematically. The CART technique, which considers a very wide set of objective and predictive variables, shows new cause-effect correlations in occupational safety which had never been previously described, highlighting possible injury risk groups and supporting decision-making in these areas. The use of classification trees must not, however, be seen as an attempt to supplant other techniques, but as a complementary method which can be integrated into traditional types of analysis.
NASA Technical Reports Server (NTRS)
Fagan, Matthew E.; Defries, Ruth S.; Sesnie, Steven E.; Arroyo-Mora, J. Pablo; Soto, Carlomagno; Singh, Aditya; Townsend, Philip A.; Chazdon, Robin L.
2015-01-01
An efficient means to map tree plantations is needed to detect tropical land use change and evaluate reforestation projects. To analyze recent tree plantation expansion in northeastern Costa Rica, we examined the potential of combining moderate-resolution hyperspectral imagery (2005 HyMap mosaic) with multitemporal, multispectral data (Landsat) to accurately classify (1) general forest types and (2) tree plantations by species composition. Following a linear discriminant analysis to reduce data dimensionality, we compared four Random Forest classification models: hyperspectral data (HD) alone; HD plus interannual spectral metrics; HD plus a multitemporal forest regrowth classification; and all three models combined. The fourth, combined model achieved overall accuracy of 88.5%. Adding multitemporal data significantly improved classification accuracy (p less than 0.0001) of all forest types, although the effect on tree plantation accuracy was modest. The hyperspectral data alone classified six species of tree plantations with 75% to 93% producer's accuracy; adding multitemporal spectral data increased accuracy only for two species with dense canopies. Non-native tree species had higher classification accuracy overall and made up the majority of tree plantations in this landscape. Our results indicate that combining occasionally acquired hyperspectral data with widely available multitemporal satellite imagery enhances mapping and monitoring of reforestation in tropical landscapes.
NASA Astrophysics Data System (ADS)
Wang, Z.; Wu, J.; Wang, Y.; Kong, X.; Bao, H.; Ni, Y.; Ma, L.; Jin, J.
2018-05-01
Mapping tree species is essential for sustainable planning as well as to improve our understanding of the role of different trees as different ecological service. However, crown-level tree species automatic classification is a challenging task due to the spectral similarity among diversified tree species, fine-scale spatial variation, shadow, and underlying objects within a crown. Advanced remote sensing data such as airborne Light Detection and Ranging (LiDAR) and hyperspectral imagery offer a great potential opportunity to derive crown spectral, structure and canopy physiological information at the individual crown scale, which can be useful for mapping tree species. In this paper, an innovative approach was developed for tree species classification at the crown level. The method utilized LiDAR data for individual tree crown delineation and morphological structure extraction, and Compact Airborne Spectrographic Imager (CASI) hyperspectral imagery for pure crown-scale spectral extraction. Specifically, four steps were include: 1) A weighted mean filtering method was developed to improve the accuracy of the smoothed Canopy Height Model (CHM) derived from LiDAR data; 2) The marker-controlled watershed segmentation algorithm was, therefore, also employed to delineate the tree-level canopy from the CHM image in this study, and then individual tree height and tree crown were calculated according to the delineated crown; 3) Spectral features within 3 × 3 neighborhood regions centered on the treetops detected by the treetop detection algorithm were derived from the spectrally normalized CASI imagery; 4) The shape characteristics related to their crown diameters and heights were established, and different crown-level tree species were classified using the combination of spectral and shape characteristics. Analysis of results suggests that the developed classification strategy in this paper (OA = 85.12 %, Kc = 0.90) performed better than LiDAR-metrics method (OA = 79.86 %, Kc = 0.81) and spectral-metircs method (OA = 71.26, Kc = 0.69) in terms of classification accuracy, which indicated that the advanced method of data processing and sensitive feature selection are critical for improving the accuracy of crown-level tree species classification.
ERIC Educational Resources Information Center
Kitsantas, Anastasia; Kitsantas, Panagiota; Kitsantas, Thomas
2012-01-01
The purpose of this exploratory study was to assess the relative importance of a number of variables in predicting students' interest in math and/or computer science. Classification and regression trees (CART) were employed in the analysis of survey data collected from 276 college students enrolled in two U.S. and Greek universities. The results…
Anantha M. Prasad; Louis R. Iverson; Andy Liaw; Andy Liaw
2006-01-01
We evaluated four statistical models - Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS) - for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.
NASA Astrophysics Data System (ADS)
Shahriari Nia, Morteza; Wang, Daisy Zhe; Bohlman, Stephanie Ann; Gader, Paul; Graves, Sarah J.; Petrovic, Milenko
2015-01-01
Hyperspectral images can be used to identify savannah tree species at the landscape scale, which is a key step in measuring biomass and carbon, and tracking changes in species distributions, including invasive species, in these ecosystems. Before automated species mapping can be performed, image processing and atmospheric correction is often performed, which can potentially affect the performance of classification algorithms. We determine how three processing and correction techniques (atmospheric correction, Gaussian filters, and shade/green vegetation filters) affect the prediction accuracy of classification of tree species at pixel level from airborne visible/infrared imaging spectrometer imagery of longleaf pine savanna in Central Florida, United States. Species classification using fast line-of-sight atmospheric analysis of spectral hypercubes (FLAASH) atmospheric correction outperformed ATCOR in the majority of cases. Green vegetation (normalized difference vegetation index) and shade (near-infrared) filters did not increase classification accuracy when applied to large and continuous patches of specific species. Finally, applying a Gaussian filter reduces interband noise and increases species classification accuracy. Using the optimal preprocessing steps, our classification accuracy of six species classes is about 75%.
Tree Classification with Fused Mobile Laser Scanning and Hyperspectral Data
Puttonen, Eetu; Jaakkola, Anttoni; Litkey, Paula; Hyyppä, Juha
2011-01-01
Mobile Laser Scanning data were collected simultaneously with hyperspectral data using the Finnish Geodetic Institute Sensei system. The data were tested for tree species classification. The test area was an urban garden in the City of Espoo, Finland. Point clouds representing 168 individual tree specimens of 23 tree species were determined manually. The classification of the trees was done using first only the spatial data from point clouds, then with only the spectral data obtained with a spectrometer, and finally with the combined spatial and hyperspectral data from both sensors. Two classification tests were performed: the separation of coniferous and deciduous trees, and the identification of individual tree species. All determined tree specimens were used in distinguishing coniferous and deciduous trees. A subset of 133 trees and 10 tree species was used in the tree species classification. The best classification results for the fused data were 95.8% for the separation of the coniferous and deciduous classes. The best overall tree species classification succeeded with 83.5% accuracy for the best tested fused data feature combination. The respective results for paired structural features derived from the laser point cloud were 90.5% for the separation of the coniferous and deciduous classes and 65.4% for the species classification. Classification accuracies with paired hyperspectral reflectance value data were 90.5% for the separation of coniferous and deciduous classes and 62.4% for different species. The results are among the first of their kind and they show that mobile collected fused data outperformed single-sensor data in both classification tests and by a significant margin. PMID:22163894
Tree classification with fused mobile laser scanning and hyperspectral data.
Puttonen, Eetu; Jaakkola, Anttoni; Litkey, Paula; Hyyppä, Juha
2011-01-01
Mobile Laser Scanning data were collected simultaneously with hyperspectral data using the Finnish Geodetic Institute Sensei system. The data were tested for tree species classification. The test area was an urban garden in the City of Espoo, Finland. Point clouds representing 168 individual tree specimens of 23 tree species were determined manually. The classification of the trees was done using first only the spatial data from point clouds, then with only the spectral data obtained with a spectrometer, and finally with the combined spatial and hyperspectral data from both sensors. Two classification tests were performed: the separation of coniferous and deciduous trees, and the identification of individual tree species. All determined tree specimens were used in distinguishing coniferous and deciduous trees. A subset of 133 trees and 10 tree species was used in the tree species classification. The best classification results for the fused data were 95.8% for the separation of the coniferous and deciduous classes. The best overall tree species classification succeeded with 83.5% accuracy for the best tested fused data feature combination. The respective results for paired structural features derived from the laser point cloud were 90.5% for the separation of the coniferous and deciduous classes and 65.4% for the species classification. Classification accuracies with paired hyperspectral reflectance value data were 90.5% for the separation of coniferous and deciduous classes and 62.4% for different species. The results are among the first of their kind and they show that mobile collected fused data outperformed single-sensor data in both classification tests and by a significant margin.
Baltzer, Pascal A T; Dietzel, Matthias; Kaiser, Werner A
2013-08-01
In the face of multiple available diagnostic criteria in MR-mammography (MRM), a practical algorithm for lesion classification is needed. Such an algorithm should be as simple as possible and include only important independent lesion features to differentiate benign from malignant lesions. This investigation aimed to develop a simple classification tree for differential diagnosis in MRM. A total of 1,084 lesions in standardised MRM with subsequent histological verification (648 malignant, 436 benign) were investigated. Seventeen lesion criteria were assessed by 2 readers in consensus. Classification analysis was performed using the chi-squared automatic interaction detection (CHAID) method. Results include the probability for malignancy for every descriptor combination in the classification tree. A classification tree incorporating 5 lesion descriptors with a depth of 3 ramifications (1, root sign; 2, delayed enhancement pattern; 3, border, internal enhancement and oedema) was calculated. Of all 1,084 lesions, 262 (40.4 %) and 106 (24.3 %) could be classified as malignant and benign with an accuracy above 95 %, respectively. Overall diagnostic accuracy was 88.4 %. The classification algorithm reduced the number of categorical descriptors from 17 to 5 (29.4 %), resulting in a high classification accuracy. More than one third of all lesions could be classified with accuracy above 95 %. • A practical algorithm has been developed to classify lesions found in MR-mammography. • A simple decision tree consisting of five criteria reaches high accuracy of 88.4 %. • Unique to this approach, each classification is associated with a diagnostic certainty. • Diagnostic certainty of greater than 95 % is achieved in 34 % of all cases.
Distribution of cavity trees in midwestern old-growth and second-growth forests
Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R. Thompson; David R. Larsen
2003-01-01
We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...
Distribution of cavity trees in midwesternold-growth and second-growth forests
Zhaofei Fan; Stephen R. Shifley; Martin A. Spetich; Frank R., III Thompson; David R. Larsen
2003-01-01
We used classification and regression tree analysis to determine the primary variables associated with the occurrence of cavity trees and the hierarchical structure among those variables. We applied that information to develop logistic models predicting cavity tree probability as a function of diameter, species group, and decay class. Inventories of cavity abundance in...
Hill, Ryan M; Oosterhoff, Benjamin; Kaplow, Julie B
2017-07-01
Although a large number of risk markers for suicide ideation have been identified, little guidance has been provided to prospectively identify adolescents at risk for suicide ideation within community settings. The current study addressed this gap in the literature by utilizing classification tree analysis (CTA) to provide a decision-making model for screening adolescents at risk for suicide ideation. Participants were N = 4,799 youth (Mage = 16.15 years, SD = 1.63) who completed both Waves 1 and 2 of the National Longitudinal Study of Adolescent to Adult Health. CTA was used to generate a series of decision rules for identifying adolescents at risk for reporting suicide ideation at Wave 2. Findings revealed 3 distinct solutions with varying sensitivity and specificity for identifying adolescents who reported suicide ideation. Sensitivity of the classification trees ranged from 44.6% to 77.6%. The tree with greatest specificity and lowest sensitivity was based on a history of suicide ideation. The tree with moderate sensitivity and high specificity was based on depressive symptoms, suicide attempts or suicide among family and friends, and social support. The most sensitive but least specific tree utilized these factors and gender, ethnicity, hours of sleep, school-related factors, and future orientation. These classification trees offer community organizations options for instituting large-scale screenings for suicide ideation risk depending on the available resources and modality of services to be provided. This study provides a theoretically and empirically driven model for prospectively identifying adolescents at risk for suicide ideation and has implications for preventive interventions among at-risk youth. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Rendon, Ricardo A; Mason, Ross J; Kirkland, Susan; Lawen, Joseph G; Abdolell, Mohamed
2014-08-01
To develop a classification tree for the preoperative prediction of benign versus malignant disease in patients with small renal masses. This is a retrospective study including 395 consecutive patients who underwent surgical treatment for a renal mass < 5 cm in maximum diameter between July 1st 2001 and June 30th 2010. A classification tree to predict the risk of having a benign renal mass preoperatively was developed using recursive partitioning analysis for repeated measures outcomes. Age, sex, volume on preoperative imaging, tumor location (central/peripheral), degree of endophytic component (1%-100%), and tumor axis position were used as potential predictors to develop the model. Forty-five patients (11.4%) were found to have a benign mass postoperatively. A classification tree has been developed which can predict the risk of benign disease with an accuracy of 88.9% (95% CI: 85.3 to 91.8). The significant prognostic factors in the classification tree are tumor volume, degree of endophytic component and symptoms at diagnosis. As an example of its utilization, a renal mass with a volume of < 5.67 cm3 that is < 45% endophytic has a 52.6% chance of having benign pathology. Conversely, a renal mass with a volume ≥ 5.67 cm3 that is ≥ 35% endophytic has only a 5.3% possibility of being benign. A classification tree to predict the risk of benign disease in small renal masses has been developed to aid the clinician when deciding on treatment strategies for small renal masses.
Vetter, Jeffrey S.
2005-02-01
The method and system described herein presents a technique for performance analysis that helps users understand the communication behavior of their message passing applications. The method and system described herein may automatically classifies individual communication operations and reveal the cause of communication inefficiencies in the application. This classification allows the developer to quickly focus on the culprits of truly inefficient behavior, rather than manually foraging through massive amounts of performance data. Specifically, the method and system described herein trace the message operations of Message Passing Interface (MPI) applications and then classify each individual communication event using a supervised learning technique: decision tree classification. The decision tree may be trained using microbenchmarks that demonstrate both efficient and inefficient communication. Since the method and system described herein adapt to the target system's configuration through these microbenchmarks, they simultaneously automate the performance analysis process and improve classification accuracy. The method and system described herein may improve the accuracy of performance analysis and dramatically reduce the amount of data that users must encounter.
Chen, Xiao Yu; Ma, Li Zhuang; Chu, Na; Zhou, Min; Hu, Yiyang
2013-01-01
Chronic hepatitis B (CHB) is a serious public health problem, and Traditional Chinese Medicine (TCM) plays an important role in the control and treatment for CHB. In the treatment of TCM, zheng discrimination is the most important step. In this paper, an approach based on CFS-GA (Correlation based Feature Selection and Genetic Algorithm) and C5.0 boost decision tree is used for zheng classification and progression in the TCM treatment of CHB. The CFS-GA performs better than the typical method of CFS. By CFS-GA, the acquired attribute subset is classified by C5.0 boost decision tree for TCM zheng classification of CHB, and C5.0 decision tree outperforms two typical decision trees of NBTree and REPTree on CFS-GA, CFS, and nonselection in comparison. Based on the critical indicators from C5.0 decision tree, important lab indicators in zheng progression are obtained by the method of stepwise discriminant analysis for expressing TCM zhengs in CHB, and alterations of the important indicators are also analyzed in zheng progression. In conclusion, all the three decision trees perform better on CFS-GA than on CFS and nonselection, and C5.0 decision tree outperforms the two typical decision trees both on attribute selection and nonselection.
Characterisation of Feature Points in Eye Fundus Images
NASA Astrophysics Data System (ADS)
Calvo, D.; Ortega, M.; Penedo, M. G.; Rouco, J.
The retinal vessel tree adds decisive knowledge in the diagnosis of numerous opthalmologic pathologies such as hypertension or diabetes. One of the problems in the analysis of the retinal vessel tree is the lack of information in terms of vessels depth as the image acquisition usually leads to a 2D image. This situation provokes a scenario where two different vessels coinciding in a point could be interpreted as a vessel forking into a bifurcation. That is why, for traking and labelling the retinal vascular tree, bifurcations and crossovers of vessels are considered feature points. In this work a novel method for these retinal vessel tree feature points detection and classification is introduced. The method applies image techniques such as filters or thinning to obtain the adequate structure to detect the points and sets a classification of these points studying its environment. The methodology is tested using a standard database and the results show high classification capabilities.
A hybrid method for classifying cognitive states from fMRI data.
Parida, S; Dehuri, S; Cho, S-B; Cacha, L A; Poznanski, R R
2015-09-01
Functional magnetic resonance imaging (fMRI) makes it possible to detect brain activities in order to elucidate cognitive-states. The complex nature of fMRI data requires under-standing of the analyses applied to produce possible avenues for developing models of cognitive state classification and improving brain activity prediction. While many models of classification task of fMRI data analysis have been developed, in this paper, we present a novel hybrid technique through combining the best attributes of genetic algorithms (GAs) and ensemble decision tree technique that consistently outperforms all other methods which are being used for cognitive-state classification. Specifically, this paper illustrates the combined effort of decision-trees ensemble and GAs for feature selection through an extensive simulation study and discusses the classification performance with respect to fMRI data. We have shown that our proposed method exhibits significant reduction of the number of features with clear edge classification accuracy over ensemble of decision-trees.
Detection of Aspens Using High Resolution Aerial Laser Scanning Data and Digital Aerial Images
Säynäjoki, Raita; Packalén, Petteri; Maltamo, Matti; Vehmas, Mikko; Eerikäinen, Kalle
2008-01-01
The aim was to use high resolution Aerial Laser Scanning (ALS) data and aerial images to detect European aspen (Populus tremula L.) from among other deciduous trees. The field data consisted of 14 sample plots of 30 m × 30 m size located in the Koli National Park in the North Karelia, Eastern Finland. A Canopy Height Model (CHM) was interpolated from the ALS data with a pulse density of 3.86/m2, low-pass filtered using Height-Based Filtering (HBF) and binarized to create the mask needed to separate the ground pixels from the canopy pixels within individual areas. Watershed segmentation was applied to the low-pass filtered CHM in order to create preliminary canopy segments, from which the non-canopy elements were extracted to obtain the final canopy segmentation, i.e. the ground mask was analysed against the canopy mask. A manual classification of aerial images was employed to separate the canopy segments of deciduous trees from those of coniferous trees. Finally, linear discriminant analysis was applied to the correctly classified canopy segments of deciduous trees to classify them into segments belonging to aspen and those belonging to other deciduous trees. The independent variables used in the classification were obtained from the first pulse ALS point data. The accuracy of discrimination between aspen and other deciduous trees was 78.6%. The independent variables in the classification function were the proportion of vegetation hits, the standard deviation of in pulse heights, accumulated intensity at the 90th percentile and the proportion of laser points reflected at the 60th height percentile. The accuracy of classification corresponded to the validation results of earlier ALS-based studies on the classification of individual deciduous trees to tree species. PMID:27873799
Diagnostic classification scheme in Iranian breast cancer patients using a decision tree.
Malehi, Amal Saki
2014-01-01
The objective of this study was to determine a diagnostic classification scheme using a decision tree based model. The study was conducted as a retrospective case-control study in Imam Khomeini hospital in Tehran during 2001 to 2009. Data, including demographic and clinical-pathological characteristics, were uniformly collected from 624 females, 312 of them were referred with positive diagnosis of breast cancer (cases) and 312 healthy women (controls). The decision tree was implemented to develop a diagnostic classification scheme using CART 6.0 Software. The AUC (area under curve), was measured as the overall performance of diagnostic classification of the decision tree. Five variables as main risk factors of breast cancer and six subgroups as high risk were identified. The results indicated that increasing age, low age at menarche, single and divorced statues, irregular menarche pattern and family history of breast cancer are the important diagnostic factors in Iranian breast cancer patients. The sensitivity and specificity of the analysis were 66% and 86.9% respectively. The high AUC (0.82) also showed an excellent classification and diagnostic performance of the model. Decision tree based model appears to be suitable for identifying risk factors and high or low risk subgroups. It can also assists clinicians in making a decision, since it can identify underlying prognostic relationships and understanding the model is very explicit.
Forest Stand Segmentation Using Airborne LIDAR Data and Very High Resolution Multispectral Imagery
NASA Astrophysics Data System (ADS)
Dechesne, Clément; Mallet, Clément; Le Bris, Arnaud; Gouet, Valérie; Hervieu, Alexandre
2016-06-01
Forest stands are the basic units for forest inventory and mapping. Stands are large forested areas (e.g., ≥ 2 ha) of homogeneous tree species composition. The accurate delineation of forest stands is usually performed by visual analysis of human operators on very high resolution (VHR) optical images. This work is highly time consuming and should be automated for scalability purposes. In this paper, a method based on the fusion of airborne laser scanning data (or lidar) and very high resolution multispectral imagery for automatic forest stand delineation and forest land-cover database update is proposed. The multispectral images give access to the tree species whereas 3D lidar point clouds provide geometric information on the trees. Therefore, multi-modal features are computed, both at pixel and object levels. The objects are individual trees extracted from lidar data. A supervised classification is performed at the object level on the computed features in order to coarsely discriminate the existing tree species in the area of interest. The analysis at tree level is particularly relevant since it significantly improves the tree species classification. A probability map is generated through the tree species classification and inserted with the pixel-based features map in an energetical framework. The proposed energy is then minimized using a standard graph-cut method (namely QPBO with α-expansion) in order to produce a segmentation map with a controlled level of details. Comparison with an existing forest land cover database shows that our method provides satisfactory results both in terms of stand labelling and delineation (matching ranges between 94% and 99%).
Skoura, Angeliki; Bakic, Predrag R; Megalooikonomou, Vasilis
2013-01-01
The analysis of anatomical tree-shape structures visualized in medical images provides insight into the relationship between tree topology and pathology of the corresponding organs. In this paper, we propose three methods to extract descriptive features of the branching topology; the asymmetry index, the encoding of branching patterns using a node labeling scheme and an extension of the Sholl analysis. Based on these descriptors, we present classification schemes for tree topologies with respect to the underlying pathology. Moreover, we present a classifier ensemble approach which combines the predictions of the individual classifiers to optimize the classification accuracy. We applied the proposed methodology to a dataset of x-ray galactograms, medical images which visualize the breast ductal tree, in order to recognize images with radiological findings regarding breast cancer. The experimental results demonstrate the effectiveness of the proposed framework compared to state-of-the-art techniques suggesting that the proposed descriptors provide more valuable information regarding the topological patterns of ductal trees and indicating the potential of facilitating early breast cancer diagnosis.
Skoura, Angeliki; Bakic, Predrag R.; Megalooikonomou, Vasilis
2014-01-01
The analysis of anatomical tree-shape structures visualized in medical images provides insight into the relationship between tree topology and pathology of the corresponding organs. In this paper, we propose three methods to extract descriptive features of the branching topology; the asymmetry index, the encoding of branching patterns using a node labeling scheme and an extension of the Sholl analysis. Based on these descriptors, we present classification schemes for tree topologies with respect to the underlying pathology. Moreover, we present a classifier ensemble approach which combines the predictions of the individual classifiers to optimize the classification accuracy. We applied the proposed methodology to a dataset of x-ray galactograms, medical images which visualize the breast ductal tree, in order to recognize images with radiological findings regarding breast cancer. The experimental results demonstrate the effectiveness of the proposed framework compared to state-of-the-art techniques suggesting that the proposed descriptors provide more valuable information regarding the topological patterns of ductal trees and indicating the potential of facilitating early breast cancer diagnosis. PMID:25414850
NASA Astrophysics Data System (ADS)
Hamedianfar, Alireza; Shafri, Helmi Zulhaidi Mohd
2016-04-01
This paper integrates decision tree-based data mining (DM) and object-based image analysis (OBIA) to provide a transferable model for the detailed characterization of urban land-cover classes using WorldView-2 (WV-2) satellite images. Many articles have been published on OBIA in recent years based on DM for different applications. However, less attention has been paid to the generation of a transferable model for characterizing detailed urban land cover features. Three subsets of WV-2 images were used in this paper to generate transferable OBIA rule-sets. Many features were explored by using a DM algorithm, which created the classification rules as a decision tree (DT) structure from the first study area. The developed DT algorithm was applied to object-based classifications in the first study area. After this process, we validated the capability and transferability of the classification rules into second and third subsets. Detailed ground truth samples were collected to assess the classification results. The first, second, and third study areas achieved 88%, 85%, and 85% overall accuracies, respectively. Results from the investigation indicate that DM was an efficient method to provide the optimal and transferable classification rules for OBIA, which accelerates the rule-sets creation stage in the OBIA classification domain.
Tree species classification in subtropical forests using small-footprint full-waveform LiDAR data
NASA Astrophysics Data System (ADS)
Cao, Lin; Coops, Nicholas C.; Innes, John L.; Dai, Jinsong; Ruan, Honghua; She, Guanghui
2016-07-01
The accurate classification of tree species is critical for the management of forest ecosystems, particularly subtropical forests, which are highly diverse and complex ecosystems. While airborne Light Detection and Ranging (LiDAR) technology offers significant potential to estimate forest structural attributes, the capacity of this new tool to classify species is less well known. In this research, full-waveform metrics were extracted by a voxel-based composite waveform approach and examined with a Random Forests classifier to discriminate six subtropical tree species (i.e., Masson pine (Pinus massoniana Lamb.)), Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.), Slash pines (Pinus elliottii Engelm.), Sawtooth oak (Quercus acutissima Carruth.) and Chinese holly (Ilex chinensis Sims.) at three levels of discrimination. As part of the analysis, the optimal voxel size for modelling the composite waveforms was investigated, the most important predictor metrics for species classification assessed and the effect of scan angle on species discrimination examined. Results demonstrate that all tree species were classified with relatively high accuracy (68.6% for six classes, 75.8% for four main species and 86.2% for conifers and broadleaved trees). Full-waveform metrics (based on height of median energy, waveform distance and number of waveform peaks) demonstrated high classification importance and were stable among various voxel sizes. The results also suggest that the voxel based approach can alleviate some of the issues associated with large scan angles. In summary, the results indicate that full-waveform LIDAR data have significant potential for tree species classification in the subtropical forests.
NASA Astrophysics Data System (ADS)
Książek, Judyta
2015-10-01
At present, there has been a great interest in the development of texture based image classification methods in many different areas. This study presents the results of research carried out to assess the usefulness of selected textural features for detection of asbestos-cement roofs in orthophotomap classification. Two different orthophotomaps of southern Poland (with ground resolution: 5 cm and 25 cm) were used. On both orthoimages representative samples for two classes: asbestos-cement roofing sheets and other roofing materials were selected. Estimation of texture analysis usefulness was conducted using machine learning methods based on decision trees (C5.0 algorithm). For this purpose, various sets of texture parameters were calculated in MaZda software. During the calculation of decision trees different numbers of texture parameters groups were considered. In order to obtain the best settings for decision trees models cross-validation was performed. Decision trees models with the lowest mean classification error were selected. The accuracy of the classification was held based on validation data sets, which were not used for the classification learning. For 5 cm ground resolution samples, the lowest mean classification error was 15.6%. The lowest mean classification error in the case of 25 cm ground resolution was 20.0%. The obtained results confirm potential usefulness of the texture parameter image processing for detection of asbestos-cement roofing sheets. In order to improve the accuracy another extended study should be considered in which additional textural features as well as spectral characteristics should be analyzed.
ERIC Educational Resources Information Center
Montoya, Isaac D.
2008-01-01
Three classification techniques (Chi-square Automatic Interaction Detection [CHAID], Classification and Regression Tree [CART], and discriminant analysis) were tested to determine their accuracy in predicting Temporary Assistance for Needy Families program recipients' future employment. Technique evaluation was based on proportion of correctly…
NASA Astrophysics Data System (ADS)
Kotelnikov, E. V.; Milov, V. R.
2018-05-01
Rule-based learning algorithms have higher transparency and easiness to interpret in comparison with neural networks and deep learning algorithms. These properties make it possible to effectively use such algorithms to solve descriptive tasks of data mining. The choice of an algorithm depends also on its ability to solve predictive tasks. The article compares the quality of the solution of the problems with binary and multiclass classification based on the experiments with six datasets from the UCI Machine Learning Repository. The authors investigate three algorithms: Ripper (rule induction), C4.5 (decision trees), In-Close (formal concept analysis). The results of the experiments show that In-Close demonstrates the best quality of classification in comparison with Ripper and C4.5, however the latter two generate more compact rule sets.
A new tree classification system for southern hardwoods
James S. Meadows; Daniel A. Jr. Skojac
2008-01-01
A new tree classification system for southern hardwoods is described. The new system is based on the Putnam tree classification system, originally developed by Putnam et al., 1960, Management ond inventory of southern hardwoods, Agriculture Handbook 181, US For. Sew., Washington, DC, which consists of four tree classes: (1) preferred growing stock, (2) reserve growing...
Park, Ji Hyun; Kim, Hyeon-Young; Lee, Hanna; Yun, Eun Kyoung
2015-12-01
This study compares the performance of the logistic regression and decision tree analysis methods for assessing the risk factors for infection in cancer patients undergoing chemotherapy. The subjects were 732 cancer patients who were receiving chemotherapy at K university hospital in Seoul, Korea. The data were collected between March 2011 and February 2013 and were processed for descriptive analysis, logistic regression and decision tree analysis using the IBM SPSS Statistics 19 and Modeler 15.1 programs. The most common risk factors for infection in cancer patients receiving chemotherapy were identified as alkylating agents, vinca alkaloid and underlying diabetes mellitus. The logistic regression explained 66.7% of the variation in the data in terms of sensitivity and 88.9% in terms of specificity. The decision tree analysis accounted for 55.0% of the variation in the data in terms of sensitivity and 89.0% in terms of specificity. As for the overall classification accuracy, the logistic regression explained 88.0% and the decision tree analysis explained 87.2%. The logistic regression analysis showed a higher degree of sensitivity and classification accuracy. Therefore, logistic regression analysis is concluded to be the more effective and useful method for establishing an infection prediction model for patients undergoing chemotherapy. Copyright © 2015 Elsevier Ltd. All rights reserved.
Learning from examples - Generation and evaluation of decision trees for software resource analysis
NASA Technical Reports Server (NTRS)
Selby, Richard W.; Porter, Adam A.
1988-01-01
A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for software resource data analysis. The trees identify classes of objects (software modules) that had high development effort. Sixteen software systems ranging from 3,000 to 112,000 source lines were selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4,700 objects, captured information about the development effort, faults, changes, design style, and implementation style. A total of 9,600 decision trees were automatically generated and evaluated. The trees correctly identified 79.3 percent of the software modules that had high development effort or faults, and the trees generated from the best parameter combinations correctly identified 88.4 percent of the modules on the average.
Relating FIA data to habitat classifications via tree-based models of canopy cover
Mark D. Nelson; Brian G. Tavernia; Chris Toney; Brian F. Walters
2012-01-01
Wildlife species-habitat matrices are used to relate lists of species with abundance of their habitats. The Forest Inventory and Analysis Program provides data on forest composition and structure, but these attributes may not correspond directly with definitions of wildlife habitats. We used FIA tree data and tree crown diameter models to estimate canopy cover, from...
Jose F. Negron
1998-01-01
Infested and uninfested areas within Douglas fir, Pseudotsuga menziesii Mirb.. Franco, stands affected by the Douglas-fir beetle, Dendroctonus pseudotsugae Hopk. were sampled in the Colorado Front Range, CO. Classification tree models were built to predict probabilities of infestation. Regression trees and linear regression analysis were used to model amount of tree...
Jiao, Y; Chen, R; Ke, X; Cheng, L; Chu, K; Lu, Z; Herskovits, E H
2011-01-01
Autism spectrum disorder (ASD) is a neurodevelopmental disorder, of which Asperger syndrome and high-functioning autism are subtypes. Our goal is: 1) to determine whether a diagnostic model based on single-nucleotide polymorphisms (SNPs), brain regional thickness measurements, or brain regional volume measurements can distinguish Asperger syndrome from high-functioning autism; and 2) to compare the SNP, thickness, and volume-based diagnostic models. Our study included 18 children with ASD: 13 subjects with high-functioning autism and 5 subjects with Asperger syndrome. For each child, we obtained 25 SNPs for 8 ASD-related genes; we also computed regional cortical thicknesses and volumes for 66 brain structures, based on structural magnetic resonance (MR) examination. To generate diagnostic models, we employed five machine-learning techniques: decision stump, alternating decision trees, multi-class alternating decision trees, logistic model trees, and support vector machines. For SNP-based classification, three decision-tree-based models performed better than the other two machine-learning models. The performance metrics for three decision-tree-based models were similar: decision stump was modestly better than the other two methods, with accuracy = 90%, sensitivity = 0.95 and specificity = 0.75. All thickness and volume-based diagnostic models performed poorly. The SNP-based diagnostic models were superior to those based on thickness and volume. For SNP-based classification, rs878960 in GABRB3 (gamma-aminobutyric acid A receptor, beta 3) was selected by all tree-based models. Our analysis demonstrated that SNP-based classification was more accurate than morphometry-based classification in ASD subtype classification. Also, we found that one SNP--rs878960 in GABRB3--distinguishes Asperger syndrome from high-functioning autism.
Tan, Joon Liang; Khang, Tsung Fei; Ngeow, Yun Fong; Choo, Siew Woh
2013-12-13
Mycobacterium abscessus is a rapidly growing mycobacterium that is often associated with human infections. The taxonomy of this species has undergone several revisions and is still being debated. In this study, we sequenced the genomes of 12 M. abscessus strains and used phylogenomic analysis to perform subspecies classification. A data mining approach was used to rank and select informative genes based on the relative entropy metric for the construction of a phylogenetic tree. The resulting tree topology was similar to that generated using the concatenation of five classical housekeeping genes: rpoB, hsp65, secA, recA and sodA. Additional support for the reliability of the subspecies classification came from the analysis of erm41 and ITS gene sequences, single nucleotide polymorphisms (SNPs)-based classification and strain clustering demonstrated by a variable number tandem repeat (VNTR) assay and a multilocus sequence analysis (MLSA). We subsequently found that the concatenation of a minimal set of three median-ranked genes: DNA polymerase III subunit alpha (polC), 4-hydroxy-2-ketovalerate aldolase (Hoa) and cell division protein FtsZ (ftsZ), is sufficient to recover the same tree topology. PCR assays designed specifically for these genes showed that all three genes could be amplified in the reference strain of M. abscessus ATCC 19977T. This study provides proof of concept that whole-genome sequence-based data mining approach can provide confirmatory evidence of the phylogenetic informativeness of existing markers, as well as lead to the discovery of a more economical and informative set of markers that produces similar subspecies classification in M. abscessus. The systematic procedure used in this study to choose the informative minimal set of gene markers can potentially be applied to species or subspecies classification of other bacteria.
Jin, Mingwu; Deng, Weishu
2018-05-15
There is a spectrum of the progression from healthy control (HC) to mild cognitive impairment (MCI) without conversion to Alzheimer's disease (AD), to MCI with conversion to AD (cMCI), and to AD. This study aims to predict the different disease stages using brain structural information provided by magnetic resonance imaging (MRI) data. The neighborhood component analysis (NCA) is applied to select most powerful features for prediction. The ensemble decision tree classifier is built to predict which group the subject belongs to. The best features and model parameters are determined by cross validation of the training data. Our results show that 16 out of a total of 429 features were selected by NCA using 240 training subjects, including MMSE score and structural measures in memory-related regions. The boosting tree model with NCA features can achieve prediction accuracy of 56.25% on 160 test subjects. Principal component analysis (PCA) and sequential feature selection (SFS) are used for feature selection, while support vector machine (SVM) is used for classification. The boosting tree model with NCA features outperforms all other combinations of feature selection and classification methods. The results suggest that NCA be a better feature selection strategy than PCA and SFS for the data used in this study. Ensemble tree classifier with boosting is more powerful than SVM to predict the subject group. However, more advanced feature selection and classification methods or additional measures besides structural MRI may be needed to improve the prediction performance. Copyright © 2018 Elsevier B.V. All rights reserved.
The decision tree approach to classification
NASA Technical Reports Server (NTRS)
Wu, C.; Landgrebe, D. A.; Swain, P. H.
1975-01-01
A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.
Method of center localization for objects containing concentric arcs
NASA Astrophysics Data System (ADS)
Kuznetsova, Elena G.; Shvets, Evgeny A.; Nikolaev, Dmitry P.
2015-02-01
This paper proposes a method for automatic center location of objects containing concentric arcs. The method utilizes structure tensor analysis and voting scheme optimized with Fast Hough Transform. Two applications of the proposed method are considered: (i) wheel tracking in video-based system for automatic vehicle classification and (ii) tree growth rings analysis on a tree cross cut image.
NASA Astrophysics Data System (ADS)
Deng, S.; Katoh, M.; Takenaka, Y.; Cheung, K.; Ishii, A.; Fujii, N.; Gao, T.
2017-10-01
This study attempted to classify three coniferous and ten broadleaved tree species by combining airborne laser scanning (ALS) data and multispectral images. The study area, located in Nagano, central Japan, is within the broadleaved forests of the Afan Woodland area. A total of 235 trees were surveyed in 2016, and we recorded the species, DBH, and tree height. The geographical position of each tree was collected using a Global Navigation Satellite System (GNSS) device. Tree crowns were manually detected using GNSS position data, field photographs, true-color orthoimages with three bands (red-green-blue, RGB), 3D point clouds, and a canopy height model derived from ALS data. Then a total of 69 features, including 27 image-based and 42 point-based features, were extracted from the RGB images and the ALS data to classify tree species. Finally, the detected tree crowns were classified into two classes for the first level (coniferous and broadleaved trees), four classes for the second level (Pinus densiflora, Larix kaempferi, Cryptomeria japonica, and broadleaved trees), and 13 classes for the third level (three coniferous and ten broadleaved species), using the 27 image-based features, 42 point-based features, all 69 features, and the best combination of features identified using a neighborhood component analysis algorithm, respectively. The overall classification accuracies reached 90 % at the first and second levels but less than 60 % at the third level. The classifications using the best combinations of features had higher accuracies than those using the image-based and point-based features and the combination of all of the 69 features.
Mudali, D; Teune, L K; Renken, R J; Leenders, K L; Roerdink, J B T M
2015-01-01
Medical imaging techniques like fluorodeoxyglucose positron emission tomography (FDG-PET) have been used to aid in the differential diagnosis of neurodegenerative brain diseases. In this study, the objective is to classify FDG-PET brain scans of subjects with Parkinsonian syndromes (Parkinson's disease, multiple system atrophy, and progressive supranuclear palsy) compared to healthy controls. The scaled subprofile model/principal component analysis (SSM/PCA) method was applied to FDG-PET brain image data to obtain covariance patterns and corresponding subject scores. The latter were used as features for supervised classification by the C4.5 decision tree method. Leave-one-out cross validation was applied to determine classifier performance. We carried out a comparison with other types of classifiers. The big advantage of decision tree classification is that the results are easy to understand by humans. A visual representation of decision trees strongly supports the interpretation process, which is very important in the context of medical diagnosis. Further improvements are suggested based on enlarging the number of the training data, enhancing the decision tree method by bagging, and adding additional features based on (f)MRI data.
Single-accelerometer-based daily physical activity classification.
Long, Xi; Yin, Bin; Aarts, Ronald M
2009-01-01
In this study, a single tri-axial accelerometer placed on the waist was used to record the acceleration data for human physical activity classification. The data collection involved 24 subjects performing daily real-life activities in a naturalistic environment without researchers' intervention. For the purpose of assessing customers' daily energy expenditure, walking, running, cycling, driving, and sports were chosen as target activities for classification. This study compared a Bayesian classification with that of a Decision Tree based approach. A Bayes classifier has the advantage to be more extensible, requiring little effort in classifier retraining and software update upon further expansion or modification of the target activities. Principal components analysis was applied to remove the correlation among features and to reduce the feature vector dimension. Experiments using leave-one-subject-out and 10-fold cross validation protocols revealed a classification accuracy of approximately 80%, which was comparable with that obtained by a Decision Tree classifier.
NASA Astrophysics Data System (ADS)
Koma, Zsófia; Deák, Márton; Kovács, József; Székely, Balázs; Kelemen, Kristóf; Standovár, Tibor
2016-04-01
Airborne Laser Scanning (ALS) is a widely used technology for forestry classification applications. However, single tree detection and species classification from low density ALS point cloud is limited in a dense forest region. In this study we investigate the division of a forest into homogenous groups at stand level. The study area is located in the Aggtelek karst region (Northeast Hungary) with a complex relief topography. The ALS dataset contained only 4 discrete echoes (at 2-4 pt/m2 density) from the study area during leaf-on season. Ground-truth measurements about canopy closure and proportion of tree species cover are available for every 70 meter in 500 square meter circular plots. In the first step, ALS data were processed and geometrical and intensity based features were calculated into a 5×5 meter raster based grid. The derived features contained: basic statistics of relative height, canopy RMS, echo ratio, openness, pulse penetration ratio, basic statistics of radiometric feature. In the second step the data were investigated using Combined Cluster and Discriminant Analysis (CCDA, Kovács et al., 2014). The CCDA method first determines a basic grouping for the multiple circle shaped sampling locations using hierarchical clustering and then for the arising grouping possibilities a core cycle is executed comparing the goodness of the investigated groupings with random ones. Out of these comparisons difference values arise, yielding information about the optimal grouping out of the investigated ones. If sub-groups are then further investigated, one might even find homogeneous groups. We found that low density ALS data classification into homogeneous groups are highly dependent on canopy closure, and the proportion of the dominant tree species. The presented results show high potential using CCDA for determination of homogenous separable groups in LiDAR based tree species classification. Aggtelek Karst/Slovakian Karst Caves" (HUSK/1101/221/0180, Aggtelek NP), data evaluation: 'Multipurpose assessment serving forest biodiversity conservation in the Carpathian region of Hungary', Swiss-Hungarian Cooperation Programme (SH/4/13 Project). BS contributed as an Alexander von Humboldt Research Fellow. J. Kovács, S. Kovács, N. Magyar, P. Tanos, I. G. Hatvani, and A. Anda (2014), Classification into homogeneous groups using combined cluster and discriminant analysis, Environmental Modelling & Software, 57, 52-59.
Identification of extremely premature infants at high risk of rehospitalization.
Ambalavanan, Namasivayam; Carlo, Waldemar A; McDonald, Scott A; Yao, Qing; Das, Abhik; Higgins, Rosemary D
2011-11-01
Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002-2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%-42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge.
Identification of Extremely Premature Infants at High Risk of Rehospitalization
Carlo, Waldemar A.; McDonald, Scott A.; Yao, Qing; Das, Abhik; Higgins, Rosemary D.
2011-01-01
OBJECTIVE: Extremely low birth weight infants often require rehospitalization during infancy. Our objective was to identify at the time of discharge which extremely low birth weight infants are at higher risk for rehospitalization. METHODS: Data from extremely low birth weight infants in Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network centers from 2002–2005 were analyzed. The primary outcome was rehospitalization by the 18- to 22-month follow-up, and secondary outcome was rehospitalization for respiratory causes in the first year. Using variables and odds ratios identified by stepwise logistic regression, scoring systems were developed with scores proportional to odds ratios. Classification and regression-tree analysis was performed by recursive partitioning and automatic selection of optimal cutoff points of variables. RESULTS: A total of 3787 infants were evaluated (mean ± SD birth weight: 787 ± 136 g; gestational age: 26 ± 2 weeks; 48% male, 42% black). Forty-five percent of the infants were rehospitalized by 18 to 22 months; 14.7% were rehospitalized for respiratory causes in the first year. Both regression models (area under the curve: 0.63) and classification and regression-tree models (mean misclassification rate: 40%–42%) were moderately accurate. Predictors for the primary outcome by regression were shunt surgery for hydrocephalus, hospital stay of >120 days for pulmonary reasons, necrotizing enterocolitis stage II or higher or spontaneous gastrointestinal perforation, higher fraction of inspired oxygen at 36 weeks, and male gender. By classification and regression-tree analysis, infants with hospital stays of >120 days for pulmonary reasons had a 66% rehospitalization rate compared with 42% without such a stay. CONCLUSIONS: The scoring systems and classification and regression-tree analysis models identified infants at higher risk of rehospitalization and might assist planning for care after discharge. PMID:22007016
A classification tree based modeling approach for segment related crashes on multilane highways.
Pande, Anurag; Abdel-Aty, Mohamed; Das, Abhishek
2010-10-01
This study presents a classification tree based alternative to crash frequency analysis for analyzing crashes on mid-block segments of multilane arterials. The traditional approach of modeling counts of crashes that occur over a period of time works well for intersection crashes where each intersection itself provides a well-defined unit over which to aggregate the crash data. However, in the case of mid-block segments the crash frequency based approach requires segmentation of the arterial corridor into segments of arbitrary lengths. In this study we have used random samples of time, day of week, and location (i.e., milepost) combinations and compared them with the sample of crashes from the same arterial corridor. For crash and non-crash cases, geometric design/roadside and traffic characteristics were derived based on their milepost locations. The variables used in the analysis are non-event specific and therefore more relevant for roadway safety feature improvement programs. First classification tree model is a model comparing all crashes with the non-crash data and then four groups of crashes (rear-end, lane-change related, pedestrian, and single-vehicle/off-road crashes) are separately compared to the non-crash cases. The classification tree models provide a list of significant variables as well as a measure to classify crash from non-crash cases. ADT along with time of day/day of week are significantly related to all crash types with different groups of crashes being more likely to occur at different times. From the classification performance of different models it was apparent that using non-event specific information may not be suitable for single vehicle/off-road crashes. The study provides the safety analysis community an additional tool to assess safety without having to aggregate the corridor crash data over arbitrary segment lengths. Copyright © 2010. Published by Elsevier Ltd.
NASA Astrophysics Data System (ADS)
Xiao, Guoqiang; Jiang, Yang; Song, Gang; Jiang, Jianmin
2010-12-01
We propose a support-vector-machine (SVM) tree to hierarchically learn from domain knowledge represented by low-level features toward automatic classification of sports videos. The proposed SVM tree adopts a binary tree structure to exploit the nature of SVM's binary classification, where each internal node is a single SVM learning unit, and each external node represents the classified output type. Such a SVM tree presents a number of advantages, which include: 1. low computing cost; 2. integrated learning and classification while preserving individual SVM's learning strength; and 3. flexibility in both structure and learning modules, where different numbers of nodes and features can be added to address specific learning requirements, and various learning models can be added as individual nodes, such as neural networks, AdaBoost, hidden Markov models, dynamic Bayesian networks, etc. Experiments support that the proposed SVM tree achieves good performances in sports video classifications.
David L. Sonderman
1979-01-01
A procedure is shown for measuring external tree characteristics that are important in determining the current and future quality of young hardwood trees. This guide supplements a precious study which describes the quality classification system for young hardwood trees
NASA Astrophysics Data System (ADS)
Luo, Qiu; Xin, Wu; Qiming, Xiong
2017-06-01
In the process of vegetation remote sensing information extraction, the problem of phenological features and low performance of remote sensing analysis algorithm is not considered. To solve this problem, the method of remote sensing vegetation information based on EVI time-series and the classification of decision-tree of multi-source branch similarity is promoted. Firstly, to improve the time-series stability of recognition accuracy, the seasonal feature of vegetation is extracted based on the fitting span range of time-series. Secondly, the decision-tree similarity is distinguished by adaptive selection path or probability parameter of component prediction. As an index, it is to evaluate the degree of task association, decide whether to perform migration of multi-source decision tree, and ensure the speed of migration. Finally, the accuracy of classification and recognition of pests and diseases can reach 87%--98% of commercial forest in Dalbergia hainanensis, which is significantly better than that of MODIS coverage accuracy of 80%--96% in this area. Therefore, the validity of the proposed method can be verified.
Spectral difference analysis and airborne imaging classification for citrus greening infected trees
USDA-ARS?s Scientific Manuscript database
Citrus greening, also called Huanglongbing (HLB), became a devastating disease spread through citrus groves in Florida, since it was first found in 2005. Multispectral (MS) and hyperspectral (HS) airborne images of citrus groves in Florida were acquired to detect citrus greening infected trees in 20...
Automated Decision Tree Classification of Corneal Shape
Twa, Michael D.; Parthasarathy, Srinivasan; Roberts, Cynthia; Mahmoud, Ashraf M.; Raasch, Thomas W.; Bullimore, Mark A.
2011-01-01
Purpose The volume and complexity of data produced during videokeratography examinations present a challenge of interpretation. As a consequence, results are often analyzed qualitatively by subjective pattern recognition or reduced to comparisons of summary indices. We describe the application of decision tree induction, an automated machine learning classification method, to discriminate between normal and keratoconic corneal shapes in an objective and quantitative way. We then compared this method with other known classification methods. Methods The corneal surface was modeled with a seventh-order Zernike polynomial for 132 normal eyes of 92 subjects and 112 eyes of 71 subjects diagnosed with keratoconus. A decision tree classifier was induced using the C4.5 algorithm, and its classification performance was compared with the modified Rabinowitz–McDonnell index, Schwiegerling’s Z3 index (Z3), Keratoconus Prediction Index (KPI), KISA%, and Cone Location and Magnitude Index using recommended classification thresholds for each method. We also evaluated the area under the receiver operator characteristic (ROC) curve for each classification method. Results Our decision tree classifier performed equal to or better than the other classifiers tested: accuracy was 92% and the area under the ROC curve was 0.97. Our decision tree classifier reduced the information needed to distinguish between normal and keratoconus eyes using four of 36 Zernike polynomial coefficients. The four surface features selected as classification attributes by the decision tree method were inferior elevation, greater sagittal depth, oblique toricity, and trefoil. Conclusions Automated decision tree classification of corneal shape through Zernike polynomials is an accurate quantitative method of classification that is interpretable and can be generated from any instrument platform capable of raw elevation data output. This method of pattern classification is extendable to other classification problems. PMID:16357645
Evaluation of forest cover estimates for Haiti using supervised classification of Landsat data
NASA Astrophysics Data System (ADS)
Churches, Christopher E.; Wampler, Peter J.; Sun, Wanxiao; Smith, Andrew J.
2014-08-01
This study uses 2010-2011 Landsat Thematic Mapper (TM) imagery to estimate total forested area in Haiti. The thematic map was generated using radiometric normalization of digital numbers by a modified normalization method utilizing pseudo-invariant polygons (PIPs), followed by supervised classification of the mosaicked image using the Food and Agriculture Organization (FAO) of the United Nations Land Cover Classification System. Classification results were compared to other sources of land-cover data produced for similar years, with an emphasis on the statistics presented by the FAO. Three global land cover datasets (GLC2000, Globcover, 2009, and MODIS MCD12Q1), and a national-scale dataset (a land cover analysis by Haitian National Centre for Geospatial Information (CNIGS)) were reclassified and compared. According to our classification, approximately 32.3% of Haiti's total land area was tree covered in 2010-2011. This result was confirmed using an error-adjusted area estimator, which predicted a tree covered area of 32.4%. Standardization to the FAO's forest cover class definition reduces the amount of tree cover of our supervised classification to 29.4%. This result was greater than the reported FAO value of 4% and the value for the recoded GLC2000 dataset of 7.0%, but is comparable to values for three other recoded datasets: MCD12Q1 (21.1%), Globcover (2009) (26.9%), and CNIGS (19.5%). We propose that at coarse resolutions, the segmented and patchy nature of Haiti's forests resulted in a systematic underestimation of the extent of forest cover. It appears the best explanation for the significant difference between our results, FAO statistics, and compared datasets is the accuracy of the data sources and the resolution of the imagery used for land cover analyses. Analysis of recoded global datasets and results from this study suggest a strong linear relationship (R2 = 0.996 for tree cover) between spatial resolution and land cover estimates.
Scollo, Annalisa; Gottardo, Flaviana; Contiero, Barbara; Edwards, Sandra A
2017-10-01
Tail biting in pigs has been an identified behavioural, welfare and economic problem for decades, and requires appropriate but sometimes difficult on-farm interventions. The aim of the paper is to introduce the Classification and Regression Tree (CRT) methodologies to develop a tool for prevention of acute tail biting lesions in pigs on-farm. A sample of 60 commercial farms rearing heavy pigs were involved; an on-farm visit and an interview with the farmer collected data on general management, herd health, disease prevention, climate control, feeding and production traits. Results suggest a value for the CRT analysis in managing the risk factors behind tail biting on a farm-specific level, showing 86.7% sensitivity for the Classification Tree and a correlation of 0.7 between observed and predicted prevalence of tail biting obtained with the Regression Tree. CRT analysis showed five main variables (stocking density, ammonia levels, number of pigs per stockman, type of floor and timeliness in feed supply) as critical predictors of acute tail biting lesions, which demonstrate different importance in different farms subgroups. The model might have reliable and practical applications for the support and implementation of tail biting prevention interventions, especially in case of subgroups of pigs with higher risk, helping farmers and veterinarians to assess the risk in their own farm and to manage their predisposing variables in order to reduce acute tail biting lesions. Copyright © 2017 Elsevier B.V. All rights reserved.
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in details. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
Activity classification using realistic data from wearable sensors.
Pärkkä, Juha; Ermes, Miikka; Korpipää, Panu; Mäntyjärvi, Jani; Peltola, Johannes; Korhonen, Ilkka
2006-01-01
Automatic classification of everyday activities can be used for promotion of health-enhancing physical activities and a healthier lifestyle. In this paper, methods used for classification of everyday activities like walking, running, and cycling are described. The aim of the study was to find out how to recognize activities, which sensors are useful and what kind of signal processing and classification is required. A large and realistic data library of sensor data was collected. Sixteen test persons took part in the data collection, resulting in approximately 31 h of annotated, 35-channel data recorded in an everyday environment. The test persons carried a set of wearable sensors while performing several activities during the 2-h measurement session. Classification results of three classifiers are shown: custom decision tree, automatically generated decision tree, and artificial neural network. The classification accuracies using leave-one-subject-out cross validation range from 58 to 97% for custom decision tree classifier, from 56 to 97% for automatically generated decision tree, and from 22 to 96% for artificial neural network. Total classification accuracy is 82 % for custom decision tree classifier, 86% for automatically generated decision tree, and 82% for artificial neural network.
Austin, Peter C.; Tu, Jack V.; Ho, Jennifer E.; Levy, Daniel; Lee, Douglas S.
2014-01-01
Objective Physicians classify patients into those with or without a specific disease. Furthermore, there is often interest in classifying patients according to disease etiology or subtype. Classification trees are frequently used to classify patients according to the presence or absence of a disease. However, classification trees can suffer from limited accuracy. In the data-mining and machine learning literature, alternate classification schemes have been developed. These include bootstrap aggregation (bagging), boosting, random forests, and support vector machines. Study design and Setting We compared the performance of these classification methods with those of conventional classification trees to classify patients with heart failure according to the following sub-types: heart failure with preserved ejection fraction (HFPEF) vs. heart failure with reduced ejection fraction (HFREF). We also compared the ability of these methods to predict the probability of the presence of HFPEF with that of conventional logistic regression. Results We found that modern, flexible tree-based methods from the data mining literature offer substantial improvement in prediction and classification of heart failure sub-type compared to conventional classification and regression trees. However, conventional logistic regression had superior performance for predicting the probability of the presence of HFPEF compared to the methods proposed in the data mining literature. Conclusion The use of tree-based methods offers superior performance over conventional classification and regression trees for predicting and classifying heart failure subtypes in a population-based sample of patients from Ontario. However, these methods do not offer substantial improvements over logistic regression for predicting the presence of HFPEF. PMID:23384592
Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin
2015-08-01
Improving accuracy of supervised classification algorithms in biomedical applications is one of active area of research. In this study, we improve the performance of Particle Swarm Optimization (PSO) combined with C4.5 decision tree (PSO+C4.5) classifier by applying Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of our proposed method, it is implemented on 1 microarray dataset and 5 different medical data sets obtained from UCI machine learning databases. Moreover, the results of PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine under the kernel of Radial Basis Function, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest neighbor). Repeated five-fold cross-validation method was used to justify the performance of classifiers. Experimental results show that our proposed method not only improve the performance of PSO+C4.5 but also obtains higher classification accuracy compared to the other classification methods.
Geospatial relationships of tree species damage caused by Hurricane Katrina in south Mississippi
Mark W. Garrigues; Zhaofei Fan; David L. Evans; Scott D. Roberts; William H. Cooke III
2012-01-01
Hurricane Katrina generated substantial impacts on the forests and biological resources of the affected area in Mississippi. This study seeks to use classification tree analysis (CTA) to determine which variables are significant in predicting hurricane damage (shear or windthrow) in the Southeast Mississippi Institute for Forest Inventory District. Logistic regressions...
Thuy, Ben; Stöhr, Sabine
2016-01-01
Ophiuroid systematics is currently in a state of upheaval, with recent molecular estimates fundamentally clashing with traditional, morphology-based classifications. Here, we attempt a long overdue recast of a morphological phylogeny estimate of the Ophiuroidea taking into account latest insights on microstructural features of the arm skeleton. Our final estimate is based on a total of 45 ingroup taxa, including 41 recent species covering the full range of extant ophiuroid higher taxon diversity and 4 fossil species known from exceptionally preserved material, and the Lower Carboniferous Aganaster gregarius as the outgroup. A total of 130 characters were scored directly on specimens. The tree resulting from the Bayesian inference analysis of the full data matrix is reasonably well resolved and well supported, and refutes all previous classifications, with most traditional families discredited as poly- or paraphyletic. In contrast, our tree agrees remarkably well with the latest molecular estimate, thus paving the way towards an integrated new classification of the Ophiuroidea. Among the characters which were qualitatively found to accord best with our tree topology, we selected a list of potential synapomorphies for future formal clade definitions. Furthermore, an analysis with 13 of the ingroup taxa reduced to the lateral arm plate characters produced a tree which was essentially similar to the full dataset tree. This suggests that dissociated lateral arm plates can be analysed in combination with fully known taxa and thus effectively unlocks the extensive record of fossil lateral arm plates for phylogenetic estimates. Finally, the age and position within our tree implies that the ophiuroid crown-group had started to diversify by the Early Triassic. PMID:27227685
NASA Astrophysics Data System (ADS)
Susanti, Yuliana; Zukhronah, Etik; Pratiwi, Hasih; Respatiwulan; Sri Sulistijowati, H.
2017-11-01
To achieve food resilience in Indonesia, food diversification by exploring potentials of local food is required. Corn is one of alternating staple food of Javanese society. For that reason, corn production needs to be improved by considering the influencing factors. CHAID and CRT are methods of data mining which can be used to classify the influencing variables. The present study seeks to dig up information on the potentials of local food availability of corn in regencies and cities in Java Island. CHAID analysis yields four classifications with accuracy of 78.8%, while CRT analysis yields seven classifications with accuracy of 79.6%.
Norris, Jodi R.; Jackson, Stephen T.; Betancourt, Julio L.
2006-01-01
Aim? Ponderosa pine (Pinus ponderosa Douglas ex Lawson & C. Lawson) is an economically and ecologically important conifer that has a wide geographic range in the western USA, but is mostly absent from the geographic centre of its distribution - the Great Basin and adjoining mountain ranges. Much of its modern range was achieved by migration of geographically distinct Sierra Nevada (P. ponderosa var. ponderosa) and Rocky Mountain (P. ponderosa var. scopulorum) varieties in the last 10,000 years. Previous research has confirmed genetic differences between the two varieties, and measurable genetic exchange occurs where their ranges now overlap in western Montana. A variety of approaches in bioclimatic modelling is required to explore the ecological differences between these varieties and their implications for historical biogeography and impending changes in western landscapes. Location? Western USA. Methods? We used a classification tree analysis and a minimum-volume ellipsoid as models to explain the broad patterns of distribution of ponderosa pine in modern environments using climatic and edaphic variables. Most biogeographical modelling assumes that the target group represents a single, ecologically uniform taxonomic population. Classification tree analysis does not require this assumption because it allows the creation of pathways that predict multiple positive and negative outcomes. Thus, classification tree analysis can be used to test the ecological uniformity of the species. In addition, a multidimensional ellipsoid was constructed to describe the niche of each variety of ponderosa pine, and distances from the niche were calculated and mapped on a 4-km grid for each ecological variable. Results? The resulting classification tree identified three dominant pathways predicting ponderosa pine presence. Two of these three pathways correspond roughly to the distribution of var. ponderosa, and the third pathway generally corresponds to the distribution of var. scopulorum. The classification tree and minimum-volume ellipsoid model show that both varieties have very similar temperature limitations, although var. ponderosa is more limited by the temperature extremes of the continental interior. The precipitation limitations of the two varieties are seasonally different, with var. ponderosa requiring significant winter moisture and var. scopulorum requiring significant summer moisture. Great Basin mountain ranges are too cold at higher elevations to support either variety of ponderosa pine, and at lower elevations are too dry in summer for var. scopulorum and too dry in winter for var. ponderosa. Main conclusions? The classification tree analysis indicates that var. ponderosa is ecologically as well as genetically distinct from var. scopulorum. Ecological differences may maintain genetic separation in spite of a limited zone of introgression between the two varieties in western Montana. Two hypotheses about past and future movements of ponderosa pine emerge from our analyses. The first hypothesis is that, during the last glacial period, colder and/or drier summers truncated most of the range of var. scopulorum in the central Rockies, but had less dramatic effects on the more maritime and winter-wet distribution of var. ponderosa. The second hypothesis is that, all other factors held constant, increasing summer temperatures in the future should produce changes in the distribution of var. scopulorum that are likely to involve range expansions in the central Rockies with the warming of mountain ranges currently too cold but sufficiently wet in summer for var. scopulorum. Finally, our results underscore the growing need to focus on genotypes in biogeographical modelling and ecological forecasting.
Statistical analysis of texture in trunk images for biometric identification of tree species.
Bressane, Adriano; Roveda, José A F; Martins, Antônio C G
2015-04-01
The identification of tree species is a key step for sustainable management plans of forest resources, as well as for several other applications that are based on such surveys. However, the present available techniques are dependent on the presence of tree structures, such as flowers, fruits, and leaves, limiting the identification process to certain periods of the year. Therefore, this article introduces a study on the application of statistical parameters for texture classification of tree trunk images. For that, 540 samples from five Brazilian native deciduous species were acquired and measures of entropy, uniformity, smoothness, asymmetry (third moment), mean, and standard deviation were obtained from the presented textures. Using a decision tree, a biometric species identification system was constructed and resulted to a 0.84 average precision rate for species classification with 0.83accuracy and 0.79 agreement. Thus, it can be considered that the use of texture presented in trunk images can represent an important advance in tree identification, since the limitations of the current techniques can be overcome.
The information extraction of Gannan citrus orchard based on the GF-1 remote sensing image
NASA Astrophysics Data System (ADS)
Wang, S.; Chen, Y. L.
2017-02-01
The production of Gannan oranges is the largest in China, which occupied an important part in the world. The extraction of citrus orchard quickly and effectively has important significance for fruit pathogen defense, fruit production and industrial planning. The traditional spectra extraction method of citrus orchard based on pixel has a lower classification accuracy, difficult to avoid the “pepper phenomenon”. In the influence of noise, the phenomenon that different spectrums of objects have the same spectrum is graveness. Taking Xunwu County citrus fruit planting area of Ganzhou as the research object, aiming at the disadvantage of the lower accuracy of the traditional method based on image element classification method, a decision tree classification method based on object-oriented rule set is proposed. Firstly, multi-scale segmentation is performed on the GF-1 remote sensing image data of the study area. Subsequently the sample objects are selected for statistical analysis of spectral features and geometric features. Finally, combined with the concept of decision tree classification, a variety of empirical values of single band threshold, NDVI, band combination and object geometry characteristics are used hierarchically to execute the information extraction of the research area, and multi-scale segmentation and hierarchical decision tree classification is implemented. The classification results are verified with the confusion matrix, and the overall Kappa index is 87.91%.
Brabant, Marie-Eve; Hébert, Martine; Chagnon, François
2013-01-01
This study explored the clinical profiles of 77 female teenager survivors of sexual abuse and examined the association of abuse-related and personal variables with suicidal ideations. Analyses revealed that 64% of participants experienced suicidal ideations. Findings from classification and regression tree analysis indicated that depression, posttraumatic stress symptoms, and hopelessness discriminated profiles of suicidal and nonsuicidal survivors. The elevated prevalence of suicidal ideations among adolescent survivors of sexual abuse underscores the importance of investigating the presence of suicidal ideations in sexual abuse survivors. However, suicidal ideation is not the sole variable that needs to be investigated; depression, hopelessness and posttraumatic stress symptoms are also related to suicidal ideations in survivors and could therefore guide interventions.
Bioinformatics in proteomics: application, terminology, and pitfalls.
Wiemer, Jan C; Prokudin, Alexander
2004-01-01
Bioinformatics applies data mining, i.e., modern computer-based statistics, to biomedical data. It leverages on machine learning approaches, such as artificial neural networks, decision trees and clustering algorithms, and is ideally suited for handling huge data amounts. In this article, we review the analysis of mass spectrometry data in proteomics, starting with common pre-processing steps and using single decision trees and decision tree ensembles for classification. Special emphasis is put on the pitfall of overfitting, i.e., of generating too complex single decision trees. Finally, we discuss the pros and cons of the two different decision tree usages.
NASA Astrophysics Data System (ADS)
Pradhan, Biswajeet; Kabiri, Keivan
2012-07-01
This paper describes an assessment of coral reef mapping using multi sensor satellite images such as Landsat ETM, SPOT and IKONOS images for Tioman Island, Malaysia. The study area is known to be one of the best Islands in South East Asia for its unique collection of diversified coral reefs and serves host to thousands of tourists every year. For the coral reef identification, classification and analysis, Landsat ETM, SPOT and IKONOS images were collected processed and classified using hierarchical classification schemes. At first, Decision tree classification method was implemented to separate three main land cover classes i.e. water, rural and vegetation and then maximum likelihood supervised classification method was used to classify these main classes. The accuracy of the classification result is evaluated by a separated test sample set, which is selected based on the fieldwork survey and view interpretation from IKONOS image. Few types of ancillary data in used are: (a) DGPS ground control points; (b) Water quality parameters measured by Hydrolab DS4a; (c) Sea-bed substrates spectrum measured by Unispec and; (d) Landcover observation photos along Tioman island coastal area. The overall accuracy of the final classification result obtained was 92.25% with the kappa coefficient is 0.8940. Key words: Coral reef, Multi-spectral Segmentation, Pixel-Based Classification, Decision Tree, Tioman Island
Electromagnetic Compatibility (EMC) in Microelectronics.
1983-02-01
Fault Tree Analysis", System Saftey Symposium, June 8-9, 1965, Seattle: The Boeing Company . 12. Fussell, J.B., "Fault Tree Analysis-Concepts and...procedure for assessing EMC in microelectronics and for applying DD, 1473 EOiTO OP I, NOV6 IS OESOL.ETE UNCLASSIFIED SECURITY CLASSIFICATION OF THIS...CRITERIA 2.1 Background 2 2.2 The Probabilistic Nature of EMC 2 2.3 The Probabilistic Approach 5 2.4 The Compatibility Factor 6 3 APPLYING PROBABILISTIC
Kirk M. Stueve; Dawna L. Cerney; Regina M. Rochefort; Laurie L. Kurth
2009-01-01
We performed classification analysis of 1970 satellite imagery and 2003 aerial photography to delineate establishment. Local site conditions were calculated from a LIDAR-based DEM, ancillary climate data, and 1970 tree locations in a GIS. We used logistic regression on a spatially weighted landscape matrix to rank variables.
Evaluating multimedia chemical persistence: Classification and regression tree analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bennett, D.H.; McKone, T.E.; Kastenberg, W.E.
2000-04-01
For the thousands of chemicals continuously released into the environment, it is desirable to make prospective assessments of those likely to be persistent. Widely distributed persistent chemicals are impossible to remove from the environment and remediation by natural processes may take decades, which is problematic if adverse health or ecological effects are discovered after prolonged release into the environment. A tiered approach using a classification scheme and a multimedia model for determining persistence is presented. Using specific criteria for persistence, a classification tree is developed to classify a chemical as persistent or nonpersistent based on the chemical properties. In thismore » approach, the classification is derived from the results of a standardized unit world multimedia model. Thus, the classifications are more robust for multimedia pollutants than classifications using a single medium half-life. The method can be readily implemented and provides insight without requiring extensive and often unavailable data. This method can be used to classify chemicals when only a few properties are known and can be used to direct further data collection. Case studies are presented to demonstrate the advantages of the approach.« less
Comprehensive decision tree models in bioinformatics.
Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter
2012-01-01
Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by so called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model that is constrained exclusively by the dimensions of the produced decision tree. The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expected significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics.
Comprehensive Decision Tree Models in Bioinformatics
Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter
2012-01-01
Purpose Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. Methods This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by so called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model that is constrained exclusively by the dimensions of the produced decision tree. Results The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expected significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics. PMID:22479449
C-fuzzy variable-branch decision tree with storage and classification error rate constraints
NASA Astrophysics Data System (ADS)
Yang, Shiueng-Bien
2009-10-01
The C-fuzzy decision tree (CFDT), which is based on the fuzzy C-means algorithm, has recently been proposed. The CFDT is grown by selecting the nodes to be split according to its classification error rate. However, the CFDT design does not consider the classification time taken to classify the input vector. Thus, the CFDT can be improved. We propose a new C-fuzzy variable-branch decision tree (CFVBDT) with storage and classification error rate constraints. The design of the CFVBDT consists of two phases-growing and pruning. The CFVBDT is grown by selecting the nodes to be split according to the classification error rate and the classification time in the decision tree. Additionally, the pruning method selects the nodes to prune based on the storage requirement and the classification time of the CFVBDT. Furthermore, the number of branches of each internal node is variable in the CFVBDT. Experimental results indicate that the proposed CFVBDT outperforms the CFDT and other methods.
Landsat TM Classifications For SAFIS Using FIA Field Plots
William H. Cooke; Andrew J. Hartsell
2001-01-01
Wall-to-wall Landsat Thematic Mapper (TM) classification efforts in Georgia require field validation. We developed a new crown modeling procedure based on Forest Health Monitoring (FHM) data to test Forest Inventory and Analysis (FIA) data. These models simulate the proportion of tree crowns that reflect light on a FIA subplot basis. We averaged subplot crown...
A Mixtures-of-Trees Framework for Multi-Label Classification
Hong, Charmgil; Batal, Iyad; Hauskrecht, Milos
2015-01-01
We propose a new probabilistic approach for multi-label classification that aims to represent the class posterior distribution P(Y|X). Our approach uses a mixture of tree-structured Bayesian networks, which can leverage the computational advantages of conditional tree-structured models and the abilities of mixtures to compensate for tree-structured restrictions. We develop algorithms for learning the model from data and for performing multi-label predictions using the learned model. Experiments on multiple datasets demonstrate that our approach outperforms several state-of-the-art multi-label classification methods. PMID:25927011
Engwegen, Judith Y M N; Helgason, Helgi H; Cats, Annemieke; Harris, Nathan; Bonfrer, Johannes M G; Schellens, Jan H M; Beijnen, Jos H
2006-03-14
To detect the new serum biomarkers for colorectal cancer (CRC) by serum protein profiling with surface-enhanced laser desorption ionisation--time of flight mass spectrometry (SELDI-TOF MS). Two independent serum sample sets were analysed separately with the ProteinChip technology (set A: 40 CRC+49 healthy controls; set B: 37 CRC+31 healthy controls), using chips with a weak cation exchange moiety and buffer pH 5. Discriminative power of differentially expressed proteins was assessed with a classification tree algorithm. Sensitivities and specificities of the generated classification trees were obtained by blindly applying data from set A to the generated trees from set B and vice versa. CRC serum protein profiles were also compared with those from breast, ovarian, prostate, and non-small cell lung cancer. Mass-to-charge ratios (m/z) 3.1x10(3), 3.3x10(3), 4.5x10(3), 6.6x10(3) and 28x10(3) were used as classifiers in the best-performing classification trees. Tree sensitivities and specificities were between 65% and 90%. Most of these discriminative m/z values were also different in the other tumour types investigated. M/z 3.3x10(3), main classifier in most trees, was a doubly charged form of the 6.6x10(3)-Da protein. The latter was identified as apolipoprotein C-I. M/z 3.1x10(3) was identified as an N-terminal fragment of albumin, and m/z 28x10(3) as apolipoprotein A-I. SELDI-TOF MS followed by classification tree pattern analysis is a suitable technique for finding new serum markers for CRC. Biomarkers can be identified and reproducibly detected in independent sample sets with high sensitivities and specificities. Although not specific for CRC, these biomarkers have a potential role in disease and treatment monitoring.
Kitsantas, Panagiota; Kitsantas, Anastasia; Anagnostopoulou, Tanya
2008-01-01
In this cross-cultural study, the authors attempted to identify high-risk subgroups for alcohol consumption among college students. American and Greek students (N = 132) answered questions about alcohol consumption, religious beliefs, attitudes toward drinking, advertisement influences, parental monitoring, and drinking consequences. Heavy drinkers in the American group were younger and less religious than were infrequent drinkers. In the Greek group, heavy drinkers tended to deny the negative results of drinking alcohol and use a permissive attitude to justify it, whereas infrequent drinkers were more likely to be monitored by their parents. These results suggest that parental monitoring and an emphasis on informing students about the negative effects of alcohol on their health and social and academic lives may be effective methods of reducing alcohol consumption. Classification tree analysis revealed that student attitudes toward drinking were important in the classification of American and Greek drinkers, indicating that this is a powerful predictor of alcohol consumption regardless of ethnic background.
Real-Time Speech/Music Classification With a Hierarchical Oblique Decision Tree
2008-04-01
REAL-TIME SPEECH/ MUSIC CLASSIFICATION WITH A HIERARCHICAL OBLIQUE DECISION TREE Jun Wang, Qiong Wu, Haojiang Deng, Qin Yan Institute of Acoustics...time speech/ music classification with a hierarchical oblique decision tree. A set of discrimination features in frequency domain are selected...handle signals without discrimination and can not work properly in the existence of multimedia signals. This paper proposes a real-time speech/ music
NASA Astrophysics Data System (ADS)
Adelabu, Samuel; Mutanga, Onisimo; Adam, Elhadi; Cho, Moses Azong
2013-01-01
Classification of different tree species in semiarid areas can be challenging as a result of the change in leaf structure and orientation due to soil moisture constraints. Tree species mapping is, however, a key parameter for forest management in semiarid environments. In this study, we examined the suitability of 5-band RapidEye satellite data for the classification of five tree species in mopane woodland of Botswana using machine leaning algorithms with limited training samples.We performed classification using random forest (RF) and support vector machines (SVM) based on EnMap box. The overall accuracies for classifying the five tree species was 88.75 and 85% for both SVM and RF, respectively. We also demonstrated that the new red-edge band in the RapidEye sensor has the potential for classifying tree species in semiarid environments when integrated with other standard bands. Similarly, we observed that where there are limited training samples, SVM is preferred over RF. Finally, we demonstrated that the two accuracy measures of quantity and allocation disagreement are simpler and more helpful for the vast majority of remote sensing classification process than the kappa coefficient. Overall, high species classification can be achieved using strategically located RapidEye bands integrated with advanced processing algorithms.
Lidar-based individual tree species classification using convolutional neural network
NASA Astrophysics Data System (ADS)
Mizoguchi, Tomohiro; Ishii, Akira; Nakamura, Hiroyuki; Inoue, Tsuyoshi; Takamatsu, Hisashi
2017-06-01
Terrestrial lidar is commonly used for detailed documentation in the field of forest inventory investigation. Recent improvements of point cloud processing techniques enabled efficient and precise computation of an individual tree shape parameters, such as breast-height diameter, height, and volume. However, tree species are manually specified by skilled workers to date. Previous works for automatic tree species classification mainly focused on aerial or satellite images, and few works have been reported for classification techniques using ground-based sensor data. Several candidate sensors can be considered for classification, such as RGB or multi/hyper spectral cameras. Above all candidates, we use terrestrial lidar because it can obtain high resolution point cloud in the dark forest. We selected bark texture for the classification criteria, since they clearly represent unique characteristics of each tree and do not change their appearance under seasonable variation and aged deterioration. In this paper, we propose a new method for automatic individual tree species classification based on terrestrial lidar using Convolutional Neural Network (CNN). The key component is the creation step of a depth image which well describe the characteristics of each species from a point cloud. We focus on Japanese cedar and cypress which cover the large part of domestic forest. Our experimental results demonstrate the effectiveness of our proposed method.
Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J
2015-12-01
In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. (c) 2015 APA, all rights reserved).
Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J.
2016-01-01
In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers. PMID:26389526
Developing a methodology to predict oak wilt distribution using classification tree analysis
Marla C. Downing; Vernon L. Thomas; Robin M. Reich
2006-01-01
Oak wilt (Ceratocystis fagacearum), a fungal disease that causes some species of oak trees to wilt and die rapidly, is a threat to oak forested resources in 22 states in the United States. We developed a methodology for predicting the Potential Distribution of Oak Wilt (PDOW) using Anoka County, Minnesota as our study area. The PDOW utilizes GIS; the...
Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson
2010-08-01
Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. Copyright (c) 2010 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Espinosa Aldama, Mariana
2015-04-01
The gravity apple tree is a genealogical tree of the gravitation theories developed during the past century. The graphic representation is full of information such as guides in heuristic principles, names of main proponents, dates and references for original articles (See under Supplementary Data for the graphic representation). This visual presentation and its particular classification allows a quick synthetic view for a plurality of theories, many of them well validated in the Solar System domain. Its diachronic structure organizes information in a shape of a tree following similarities through a formal concept analysis. It can be used for educational purposes or as a tool for philosophical discussion.
NASA Astrophysics Data System (ADS)
Wei, Hongqiang; Zhou, Guiyun; Zhou, Junjie
2018-04-01
The classification of leaf and wood points is an essential preprocessing step for extracting inventory measurements and canopy characterization of trees from the terrestrial laser scanning (TLS) data. The geometry-based approach is one of the widely used classification method. In the geometry-based method, it is common practice to extract salient features at one single scale before the features are used for classification. It remains unclear how different scale(s) used affect the classification accuracy and efficiency. To assess the scale effect on the classification accuracy and efficiency, we extracted the single-scale and multi-scale salient features from the point clouds of two oak trees of different sizes and conducted the classification on leaf and wood. Our experimental results show that the balanced accuracy of the multi-scale method is higher than the average balanced accuracy of the single-scale method by about 10 % for both trees. The average speed-up ratio of single scale classifiers over multi-scale classifier for each tree is higher than 30.
Automated rule-base creation via CLIPS-Induce
NASA Technical Reports Server (NTRS)
Murphy, Patrick M.
1994-01-01
Many CLIPS rule-bases contain one or more rule groups that perform classification. In this paper we describe CLIPS-Induce, an automated system for the creation of a CLIPS classification rule-base from a set of test cases. CLIPS-Induce consists of two components, a decision tree induction component and a CLIPS production extraction component. ID3, a popular decision tree induction algorithm, is used to induce a decision tree from the test cases. CLIPS production extraction is accomplished through a top-down traversal of the decision tree. Nodes of the tree are used to construct query rules, and branches of the tree are used to construct classification rules. The learned CLIPS productions may easily be incorporated into a large CLIPS system that perform tasks such as accessing a database or displaying information.
Teng, Ju-Hsi; Lin, Kuan-Chia; Ho, Bin-Shenq
2007-10-01
A community-based aboriginal study was conducted and analysed to explore the application of classification tree and logistic regression. A total of 1066 aboriginal residents in Yilan County were screened during 2003-2004. The independent variables include demographic characteristics, physical examinations, geographic location, health behaviours, dietary habits and family hereditary diseases history. Risk factors of cardiovascular diseases were selected as the dependent variables in further analysis. The completion rate for heath interview is 88.9%. The classification tree results find that if body mass index is higher than 25.72 kg m(-2) and the age is above 51 years, the predicted probability for number of cardiovascular risk factors > or =3 is 73.6% and the population is 322. If body mass index is higher than 26.35 kg m(-2) and geographical latitude of the village is lower than 24 degrees 22.8', the predicted probability for number of cardiovascular risk factors > or =4 is 60.8% and the population is 74. As the logistic regression results indicate that body mass index, drinking habit and menopause are the top three significant independent variables. The classification tree model specifically shows the discrimination paths and interactions between the risk groups. The logistic regression model presents and analyses the statistical independent factors of cardiovascular risks. Applying both models to specific situations will provide a different angle for the design and management of future health intervention plans after community-based study.
Vlsi implementation of flexible architecture for decision tree classification in data mining
NASA Astrophysics Data System (ADS)
Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak
2017-07-01
The Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a terrific raise in the size of the data being collected and analyzed. Classification is the main difficulty faced in data mining. In a number of the solutions developed for this problem, most accepted one is Decision Tree Classification (DTC) that gives high precision while handling very large amount of data. This paper presents VLSI implementation of flexible architecture for Decision Tree classification in data mining using c4.5 algorithm.
Sorting Olive Batches for the Milling Process Using Image Processing
Puerto, Daniel Aguilera; Martínez Gila, Diego Manuel; Gámez García, Javier; Gómez Ortega, Juan
2015-01-01
The quality of virgin olive oil obtained in the milling process is directly bound to the characteristics of the olives. Hence, the correct classification of the different incoming olive batches is crucial to reach the maximum quality of the oil. The aim of this work is to provide an automatic inspection system, based on computer vision, and to classify automatically different batches of olives entering the milling process. The classification is based on the differentiation between ground and tree olives. For this purpose, three different species have been studied (Picudo, Picual and Hojiblanco). The samples have been obtained by picking the olives directly from the tree or from the ground. The feature vector of the samples has been obtained on the basis of the olive image histograms. Moreover, different image preprocessing has been employed, and two classification techniques have been used: these are discriminant analysis and neural networks. The proposed methodology has been validated successfully, obtaining good classification results. PMID:26147729
Thin-layer chromatographic identification of Chinese propolis using chemometric fingerprinting.
Tang, Tie-xin; Guo, Wei-yan; Xu, Ye; Zhang, Si-ming; Xu, Xin-jun; Wang, Dong-mei; Zhao, Zhi-min; Zhu, Long-ping; Yang, De-po
2014-01-01
Poplar tree gum has a similar chemical composition and appearance to Chinese propolis (bee glue) and has been widely used as a counterfeit propolis because Chinese propolis is typically the poplar-type propolis, the chemical composition of which is determined mainly by the resin of poplar trees. The discrimination of Chinese propolis from poplar tree gum is a challenging task. To develop a rapid thin-layer chromatographic (TLC) identification method using chemometric fingerprinting to discriminate Chinese propolis from poplar tree gum. A new TLC method using a combination of ammonia and hydrogen peroxide vapours as the visualisation reagent was developed to characterise the chemical profile of Chinese propolis. Three separate people performed TLC on eight Chinese propolis samples and three poplar tree gum samples of varying origins. Five chemometric methods, including similarity analysis, hierarchical clustering, k-means clustering, neural network and support vector machine, were compared for use in classifying the samples based on their densitograms obtained from the TLC chromatograms via image analysis. Hierarchical clustering, neural network and support vector machine analyses achieved a correct classification rate of 100% in classifying the samples. A strategy for TLC identification of Chinese propolis using chemometric fingerprinting was proposed and it provided accurate sample classification. The study has shown that the TLC identification method using chemometric fingerprinting is a rapid, low-cost method for the discrimination of Chinese propolis from poplar tree gum and may be used for the quality control of Chinese propolis. Copyright © 2014 John Wiley & Sons, Ltd.
Seligman, D A; Pullinger, A G
2000-01-01
Confusion about the relationship of occlusion to temporomandibular disorders (TMD) persists. This study attempted to identify occlusal and attrition factors plus age that would characterize asymptomatic normal female subjects. A total of 124 female patients with intracapsular TMD were compared with 47 asymptomatic female controls for associations to 9 occlusal factors, 3 attrition severity measures, and age using classification tree, multiple stepwise logistic regression, and univariate analyses. Models were tested for accuracy (sensitivity and specificity) and total contribution to the variance. The classification tree model had 4 terminal nodes that used only anterior attrition and age. "Normals" were mainly characterized by low attrition levels, whereas patients had higher attrition and tended to be younger. The tree model was only moderately useful (sensitivity 63%, specificity 94%) in predicting normals. The logistic regression model incorporated unilateral posterior crossbite and mediotrusive attrition severity in addition to the 2 factors in the tree, but was slightly less accurate than the tree (sensitivity 51%, specificity 90%). When only occlusal factors were considered in the analysis, normals were additionally characterized by a lack of anterior open bite, smaller overjet, and smaller RCP-ICP slides. The log likelihood accounted for was similar for both the tree (pseudo R(2) = 29.38%; mean deviance = 0.95) and the multiple logistic regression (Cox Snell R(2) = 30.3%, mean deviance = 0.84) models. The occlusal and attrition factors studied were only moderately useful in differentiating normals from TMD patients.
Spatial modeling and classification of corneal shape.
Marsolo, Keith; Twa, Michael; Bullimore, Mark A; Parthasarathy, Srinivasan
2007-03-01
One of the most promising applications of data mining is in biomedical data used in patient diagnosis. Any method of data analysis intended to support the clinical decision-making process should meet several criteria: it should capture clinically relevant features, be computationally feasible, and provide easily interpretable results. In an initial study, we examined the feasibility of using Zernike polynomials to represent biomedical instrument data in conjunction with a decision tree classifier to distinguish between the diseased and non-diseased eyes. Here, we provide a comprehensive follow-up to that work, examining a second representation, pseudo-Zernike polynomials, to determine whether they provide any increase in classification accuracy. We compare the fidelity of both methods using residual root-mean-square (rms) error and evaluate accuracy using several classifiers: neural networks, C4.5 decision trees, Voting Feature Intervals, and Naïve Bayes. We also examine the effect of several meta-learning strategies: boosting, bagging, and Random Forests (RFs). We present results comparing accuracy as it relates to dataset and transformation resolution over a larger, more challenging, multi-class dataset. They show that classification accuracy is similar for both data transformations, but differs by classifier. We find that the Zernike polynomials provide better feature representation than the pseudo-Zernikes and that the decision trees yield the best balance of classification accuracy and interpretability.
NASA Astrophysics Data System (ADS)
Permata, Anggi; Juniansah, Anwar; Nurcahyati, Eka; Dimas Afrizal, Mousafi; Adnan Shafry Untoro, Muhammad; Arifatha, Na'ima; Ramadhani Yudha Adiwijaya, Raden; Farda, Nur Mohammad
2016-11-01
Landslide is an unpredictable natural disaster which commonly happens in highslope area. Aerial photography in small format is one of acquisition method that can reach and obtain high resolution spatial data faster than other methods, and provide data such as orthomosaic and Digital Surface Model (DSM). The study area contained landslide area in Clapar, Madukara District of Banjarnegara. Aerial photographs of landslide area provided advantage in objects visibility. Object's characters such as shape, size, and texture were clearly seen, therefore GEOBIA (Geography Object Based Image Analysis) was compatible as method for classifying land cover in study area. Dissimilar with PPA (PerPixel Analyst) method that used spectral information as base object detection, GEOBIA could use spatial elements as classification basis to establish a land cover map with better accuracy. GEOBIA method used classification hierarchy to divide post disaster land cover into three main objects: vegetation, landslide/soil, and building. Those three were required to obtain more detailed information that can be used in estimating loss caused by landslide and establishing land cover map in landslide area. Estimating loss in landslide area related to damage in Salak (Salacca zalacca) plantations. This estimation towards quantity of Salak tree that were drifted away by landslide was calculated in assumption that every tree damaged by landslide had same age and production class with other tree that weren't damaged. Loss calculation was done by approximating quantity of damaged trees in landslide area with data of trees around area that were acquired from GEOBIA classification method.
NASA Astrophysics Data System (ADS)
Rahmadani, S.; Dongoran, A.; Zarlis, M.; Zakarias
2018-03-01
This paper discusses the problem of feature selection using genetic algorithms on a dataset for classification problems. The classification model used is the decicion tree (DT), and Naive Bayes. In this paper we will discuss how the Naive Bayes and Decision Tree models to overcome the classification problem in the dataset, where the dataset feature is selectively selected using GA. Then both models compared their performance, whether there is an increase in accuracy or not. From the results obtained shows an increase in accuracy if the feature selection using GA. The proposed model is referred to as GADT (GA-Decision Tree) and GANB (GA-Naive Bayes). The data sets tested in this paper are taken from the UCI Machine Learning repository.
NASA Astrophysics Data System (ADS)
Kaur, Parneet; Singh, Sukhwinder; Garg, Sushil; Harmanpreet
2010-11-01
In this paper we study about classification algorithms for farm DSS. By applying classification algorithms i.e. Limited search, ID3, CHAID, C4.5, Improved C4.5 and One VS all Decision Tree on common data set of crop with specified class, results are obtained. The tool used to derive results is SPINA. The graphical results obtained from tool are compared to suggest best technique to develop farm Decision Support System. This analysis would help to researchers to design effective and fast DSS for farmer to take decision for enhancing their yield.
76 FR 32933 - International Standard-Setting Activities
Federal Register 2010, 2011, 2012, 2013, 2014
2011-06-07
... or re-evaluation by JECFA. Proposed amendments to the Risk Analysis Principles for CCRVDF for comments and consideration at the next session. Proposed revision of Risk Analysis Principles Applied by... the Classification of Foods and Animal Feeds: Tree Nuts, Herbs and Spices. Draft Principle and...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Soner Yorgun, M.; Rood, Richard B.
An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center of Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smoothmore » topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena due to the spectral transform method of CAM Eulerian spectral dynamical core is prominent, and is an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, both in horizontal and vertical dimensions, have significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study about the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly mean) analysis over local scales.« less
Soner Yorgun, M.; Rood, Richard B.
2016-11-11
An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center of Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smoothmore » topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena due to the spectral transform method of CAM Eulerian spectral dynamical core is prominent, and is an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, both in horizontal and vertical dimensions, have significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study about the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly mean) analysis over local scales.« less
NASA Astrophysics Data System (ADS)
Wang, Audrey; Price, David T.
2007-03-01
A simple integrated algorithm was developed to relate global climatology to distributions of tree plant functional types (PFT). Multivariate cluster analysis was performed to analyze the statistical homogeneity of the climate space occupied by individual tree PFTs. Forested regions identified from the satellite-based GLC2000 classification were separated into tropical, temperate, and boreal sub-PFTs for use in the Canadian Terrestrial Ecosystem Model (CTEM). Global data sets of monthly minimum temperature, growing degree days, an index of climatic moisture, and estimated PFT cover fractions were then used as variables in the cluster analysis. The statistical results for individual PFT clusters were found consistent with other global-scale classifications of dominant vegetation. As an improvement of the quantification of the climatic limitations on PFT distributions, the results also demonstrated overlapping of PFT cluster boundaries that reflected vegetation transitions, for example, between tropical and temperate biomes. The resulting global database should provide a better basis for simulating the interaction of climate change and terrestrial ecosystem dynamics using global vegetation models.
Almeida, Mariana R; Fidelis, Carlos H V; Barata, Lauro E S; Poppi, Ronei J
2013-12-15
The Amazon tree Aniba rosaeodora Ducke (rosewood) provides an essential oil valuable for the perfume industry, but after decades of predatory extraction it is at risk of extinction. The extraction of the essential oil from wood implies the cutting of the tree, and then the study of oil extracted from the leaves is important as a sustainable alternative. The goal of this study was to test the applicability of Raman spectroscopy and Partial Least Square Discriminant Analysis (PLS-DA) as means to classify the essential oil extracted from different parties (wood, leaves and branches) of the Brazilian tree A. rosaeodora. For the development of classification models, the Raman spectra were split into two sets: training and test. The value of the limit that separates the classes was calculated based on the distribution of samples of training. This value was calculated in a manner that the classes are divided with a lower probability of incorrect classification for future estimates. The best model presented sensitivity and specificity of 100%, predictive accuracy and efficiency of 100%. These results give an overall vision of the behavior of the model, but do not give information about individual samples; in this case, the confidence interval for each sample of classification was also calculated using the resampling bootstrap technique. The methodology developed have the potential to be an alternative for standard procedures used for oil analysis and it can be employed as screening method, since it is fast, non-destructive and robust. © 2013 Elsevier B.V. All rights reserved.
Indicators of Terrorism Vulnerability in Africa
2015-03-26
the terror threat and vulnerabilities across Africa. Key words: Terrorism, Africa, Negative Binomial Regression, Classification Tree iv I would like...31 Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 Log -likelihood...70 viii Page 5.3 Classification Tree Description
NASA Technical Reports Server (NTRS)
Tian, Jianhui; Porter, Adam; Zelkowitz, Marvin V.
1992-01-01
Identification of high cost modules has been viewed as one mechanism to improve overall system reliability, since such modules tend to produce more than their share of problems. A decision tree model was used to identify such modules. In this current paper, a previously developed axiomatic model of program complexity is merged with the previously developed decision tree process for an improvement in the ability to identify such modules. This improvement was tested using data from the NASA Software Engineering Laboratory.
Seera, Manjeevan; Lim, Chee Peng; Ishak, Dahaman; Singh, Harapajan
2012-01-01
In this paper, a novel approach to detect and classify comprehensive fault conditions of induction motors using a hybrid fuzzy min-max (FMM) neural network and classification and regression tree (CART) is proposed. The hybrid model, known as FMM-CART, exploits the advantages of both FMM and CART for undertaking data classification and rule extraction problems. A series of real experiments is conducted, whereby the motor current signature analysis method is applied to form a database comprising stator current signatures under different motor conditions. The signal harmonics from the power spectral density are extracted as discriminative input features for fault detection and classification with FMM-CART. A comprehensive list of induction motor fault conditions, viz., broken rotor bars, unbalanced voltages, stator winding faults, and eccentricity problems, has been successfully classified using FMM-CART with good accuracy rates. The results are comparable, if not better, than those reported in the literature. Useful explanatory rules in the form of a decision tree are also elicited from FMM-CART to analyze and understand different fault conditions of induction motors.
[An object-based information extraction technology for dominant tree species group types].
Tian, Tian; Fan, Wen-yi; Lu, Wei; Xiao, Xiang
2015-06-01
Information extraction for dominant tree group types is difficult in remote sensing image classification, howevers, the object-oriented classification method using high spatial resolution remote sensing data is a new method to realize the accurate type information extraction. In this paper, taking the Jiangle Forest Farm in Fujian Province as the research area, based on the Quickbird image data in 2013, the object-oriented method was adopted to identify the farmland, shrub-herbaceous plant, young afforested land, Pinus massoniana, Cunninghamia lanceolata and broad-leave tree types. Three types of classification factors including spectral, texture, and different vegetation indices were used to establish a class hierarchy. According to the different levels, membership functions and the decision tree classification rules were adopted. The results showed that the method based on the object-oriented method by using texture, spectrum and the vegetation indices achieved the classification accuracy of 91.3%, which was increased by 5.7% compared with that by only using the texture and spectrum.
Instruction-matrix-based genetic programming.
Li, Gang; Wang, Jin Feng; Lee, Kin Hong; Leung, Kwong-Sak
2008-08-01
In genetic programming (GP), evolving tree nodes separately would reduce the huge solution space. However, tree nodes are highly interdependent with respect to their fitness. In this paper, we propose a new GP framework, namely, instruction-matrix (IM)-based GP (IMGP), to handle their interactions. IMGP maintains an IM to evolve tree nodes and subtrees separately. IMGP extracts program trees from an IM and updates the IM with the information of the extracted program trees. As the IM actually keeps most of the information of the schemata of GP and evolves the schemata directly, IMGP is effective and efficient. Our experimental results on benchmark problems have verified that IMGP is not only better than those of canonical GP in terms of the qualities of the solutions and the number of program evaluations, but they are also better than some of the related GP algorithms. IMGP can also be used to evolve programs for classification problems. The classifiers obtained have higher classification accuracies than four other GP classification algorithms on four benchmark classification problems. The testing errors are also comparable to or better than those obtained with well-known classifiers. Furthermore, an extended version, called condition matrix for rule learning, has been used successfully to handle multiclass classification problems.
Classification tree for the assessment of sedentary lifestyle among hypertensive.
Castelo Guedes Martins, Larissa; Venícios de Oliveira Lopes, Marcos; Gomes Guedes, Nirla; Paixão de Menezes, Angélica; de Oliveira Farias, Odaleia; Alves Dos Santos, Naftale
2016-04-01
To develop a classification tree of clinical indicators for the correct prediction of the nursing diagnosis "Sedentary lifestyle" (SL) in people with high blood pressure (HTN). A cross-sectional study conducted in an outpatient care center specializing in high blood pressure and Mellitus diabetes located in northeastern Brazil. The sample consisted of 285 people between 19 and 59 years old diagnosed with high blood pressure and was applied an interview and physical examination, obtaining socio-demographic information, related factors and signs and symptoms that made the defining characteristics for the diagnosis under study. The tree was generated using the CHAID algorithm (Chi-square Automatic Interaction Detection). The construction of the decision tree allowed establishing the interactions between clinical indicators that facilitate a probabilistic analysis of multiple situations allowing quantify the probability of an individual presenting a sedentary lifestyle. The tree included the clinical indicator Choose daily routine without exercise as the first node. People with this indicator showed a probability of 0.88 of presenting the SL. The second node was composed of the indicator Does not perform physical activity during leisure, with 0.99 probability of presenting the SL with these two indicators. The predictive capacity of the tree was established at 69.5%. Decision trees help nurses who care HTN people in decision-making in assessing the characteristics that increase the probability of SL nursing diagnosis, optimizing the time for diagnostic inference.
NASA Astrophysics Data System (ADS)
Ghosh, Aniruddha; Fassnacht, Fabian Ewald; Joshi, P. K.; Koch, Barbara
2014-02-01
Knowledge of tree species distribution is important worldwide for sustainable forest management and resource evaluation. The accuracy and information content of species maps produced using remote sensing images vary with scale, sensor (optical, microwave, LiDAR), classification algorithm, verification design and natural conditions like tree age, forest structure and density. Imaging spectroscopy reduces the inaccuracies making use of the detailed spectral response. However, the scale effect still has a strong influence and cannot be neglected. This study aims to bridge the knowledge gap in understanding the scale effect in imaging spectroscopy when moving from 4 to 30 m pixel size for tree species mapping, keeping in mind that most current and future hyperspectral satellite based sensors work with spatial resolution around 30 m or more. Two airborne (HyMAP) and one spaceborne (Hyperion) imaging spectroscopy dataset with pixel sizes of 4, 8 and 30 m, respectively were available to examine the effect of scale over a central European forest. The forest under examination is a typical managed forest with relatively homogenous stands featuring mostly two canopy layers. Normalized digital surface model (nDSM) derived from LiDAR data was used additionally to examine the effect of height information in tree species mapping. Six different sets of predictor variables (reflectance value of all bands, selected components of a Minimum Noise Fraction (MNF), Vegetation Indices (VI) and each of these sets combined with LiDAR derived height) were explored at each scale. Supervised kernel based (Support Vector Machines) and ensemble based (Random Forest) machine learning algorithms were applied on the dataset to investigate the effect of the classifier. Iterative bootstrap-validation with 100 iterations was performed for classification model building and testing for all the trials. For scale, analysis of overall classification accuracy and kappa values indicated that 8 m spatial resolution (reaching kappa values of over 0.83) slightly outperformed the results obtained from 4 m for the study area and five tree species under examination. The 30 m resolution Hyperion image produced sound results (kappa values of over 0.70), which in some areas of the test site were comparable with the higher spatial resolution imagery when qualitatively assessing the map outputs. Considering input predictor sets, MNF bands performed best at 4 and 8 m resolution. Optical bands were found to be best for 30 m spatial resolution. Classification with MNF as input predictors produced better visual appearance of tree species patches when compared with reference maps. Based on the analysis, it was concluded that there is no significant effect of height information on tree species classification accuracies for the present framework and study area. Furthermore, in the examined cases there was no single best choice among the two classifiers across scales and predictors. It can be concluded that tree species mapping from imaging spectroscopy for forest sites comparable to the one under investigation is possible with reliable accuracies not only from airborne but also from spaceborne imaging spectroscopy datasets.
Salas-Leiva, Dayana E; Meerow, Alan W; Calonje, Michael; Griffith, M Patrick; Francisco-Ortega, Javier; Nakamura, Kyoko; Stevenson, Dennis W; Lewis, Carl E; Namoff, Sandra
2013-11-01
Despite a recent new classification, a stable phylogeny for the cycads has been elusive, particularly regarding resolution of Bowenia, Stangeria and Dioon. In this study, five single-copy nuclear genes (SCNGs) are applied to the phylogeny of the order Cycadales. The specific aim is to evaluate several gene tree-species tree reconciliation approaches for developing an accurate phylogeny of the order, to contrast them with concatenated parsimony analysis and to resolve the erstwhile problematic phylogenetic position of these three genera. DNA sequences of five SCNGs were obtained for 20 cycad species representing all ten genera of Cycadales. These were analysed with parsimony, maximum likelihood (ML) and three Bayesian methods of gene tree-species tree reconciliation, using Cycas as the outgroup. A calibrated date estimation was developed with Bayesian methods, and biogeographic analysis was also conducted. Concatenated parsimony, ML and three species tree inference methods resolve exactly the same tree topology with high support at most nodes. Dioon and Bowenia are the first and second branches of Cycadales after Cycas, respectively, followed by an encephalartoid clade (Macrozamia-Lepidozamia-Encephalartos), which is sister to a zamioid clade, of which Ceratozamia is the first branch, and in which Stangeria is sister to Microcycas and Zamia. A single, well-supported phylogenetic hypothesis of the generic relationships of the Cycadales is presented. However, massive extinction events inferred from the fossil record that eliminated broader ancestral distributions within Zamiaceae compromise accurate optimization of ancestral biogeographical areas for that hypothesis. While major lineages of Cycadales are ancient, crown ages of all modern genera are no older than 12 million years, supporting a recent hypothesis of mostly Miocene radiations. This phylogeny can contribute to an accurate infrafamilial classification of Zamiaceae.
NASA Astrophysics Data System (ADS)
Zou, Xiaoliang; Zhao, Guihua; Li, Jonathan; Yang, Yuanxi; Fang, Yong
2016-06-01
With the rapid developments of the sensor technology, high spatial resolution imagery and airborne Lidar point clouds can be captured nowadays, which make classification, extraction, evaluation and analysis of a broad range of object features available. High resolution imagery, Lidar dataset and parcel map can be widely used for classification as information carriers. Therefore, refinement of objects classification is made possible for the urban land cover. The paper presents an approach to object based image analysis (OBIA) combing high spatial resolution imagery and airborne Lidar point clouds. The advanced workflow for urban land cover is designed with four components. Firstly, colour-infrared TrueOrtho photo and laser point clouds were pre-processed to derive the parcel map of water bodies and nDSM respectively. Secondly, image objects are created via multi-resolution image segmentation integrating scale parameter, the colour and shape properties with compactness criterion. Image can be subdivided into separate object regions. Thirdly, image objects classification is performed on the basis of segmentation and a rule set of knowledge decision tree. These objects imagery are classified into six classes such as water bodies, low vegetation/grass, tree, low building, high building and road. Finally, in order to assess the validity of the classification results for six classes, accuracy assessment is performed through comparing randomly distributed reference points of TrueOrtho imagery with the classification results, forming the confusion matrix and calculating overall accuracy and Kappa coefficient. The study area focuses on test site Vaihingen/Enz and a patch of test datasets comes from the benchmark of ISPRS WG III/4 test project. The classification results show higher overall accuracy for most types of urban land cover. Overall accuracy is 89.5% and Kappa coefficient equals to 0.865. The OBIA approach provides an effective and convenient way to combine high resolution imagery and Lidar ancillary data for classification of urban land cover.
Ramírez, J; Górriz, J M; Segovia, F; Chaves, R; Salas-Gonzalez, D; López, M; Alvarez, I; Padilla, P
2010-03-19
This letter shows a computer aided diagnosis (CAD) technique for the early detection of the Alzheimer's disease (AD) by means of single photon emission computed tomography (SPECT) image classification. The proposed method is based on partial least squares (PLS) regression model and a random forest (RF) predictor. The challenge of the curse of dimensionality is addressed by reducing the large dimensionality of the input data by downscaling the SPECT images and extracting score features using PLS. A RF predictor then forms an ensemble of classification and regression tree (CART)-like classifiers being its output determined by a majority vote of the trees in the forest. A baseline principal component analysis (PCA) system is also developed for reference. The experimental results show that the combined PLS-RF system yields a generalization error that converges to a limit when increasing the number of trees in the forest. Thus, the generalization error is reduced when using PLS and depends on the strength of the individual trees in the forest and the correlation between them. Moreover, PLS feature extraction is found to be more effective for extracting discriminative information from the data than PCA yielding peak sensitivity, specificity and accuracy values of 100%, 92.7%, and 96.9%, respectively. Moreover, the proposed CAD system outperformed several other recently developed AD CAD systems. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
Analysis of data mining classification by comparison of C4.5 and ID algorithms
NASA Astrophysics Data System (ADS)
Sudrajat, R.; Irianingsih, I.; Krisnawan, D.
2017-01-01
The rapid development of information technology, triggered by the intensive use of information technology. For example, data mining widely used in investment. Many techniques that can be used assisting in investment, the method that used for classification is decision tree. Decision tree has a variety of algorithms, such as C4.5 and ID3. Both algorithms can generate different models for similar data sets and different accuracy. C4.5 and ID3 algorithms with discrete data provide accuracy are 87.16% and 99.83% and C4.5 algorithm with numerical data is 89.69%. C4.5 and ID3 algorithms with discrete data provides 520 and 598 customers and C4.5 algorithm with numerical data is 546 customers. From the analysis of the both algorithm it can classified quite well because error rate less than 15%.
Reticulate classification of mosaic microbial genomes using NeAT website.
Lima-Mendez, Gipsi
2012-01-01
The tree of life is the classical representation of the evolutionary relationships between existent species. A tree is appropriate to display the divergence of species through mutation, i.e., by vertical descent. However, lateral gene transfer (LGT) is excluded from such representations. When LGT contribution to genome evolution cannot be neglected (e.g., for prokaryotes and mobile genetic elements), the tree becomes misleading. Networks appear as an intuitive way to represent both vertical and horizontal relationships, while overlapping groups within such graphs are more suitable for their classification. Here, we describe a method to represent both vertical and horizontal relationships. We start with a set of genomes whose coded proteins have been grouped into families based on sequence similarity. Next, all pairs of genomes are compared, counting the number of proteins classified into the same family. From this comparison, we derive a weighted graph where genomes with a significant number of similar proteins are linked. Finally, we apply a two-step clustering of this graph to produce a classification where nodes can be assigned to multiple clusters. The procedure can be performed using the Network Analysis Tools (NeAT) website.
NASA Astrophysics Data System (ADS)
Wang, Le
2003-10-01
Modern forest management poses an increasing need for detailed knowledge of forest information at different spatial scales. At the forest level, the information for tree species assemblage is desired whereas at or below the stand level, individual tree related information is preferred. Remote Sensing provides an effective tool to extract the above information at multiple spatial scales in the continuous time domain. To date, the increasing volume and readily availability of high-spatial-resolution data have lead to a much wider application of remotely sensed products. Nevertheless, to make effective use of the improving spatial resolution, conventional pixel-based classification methods are far from satisfactory. Correspondingly, developing object-based methods becomes a central challenge for researchers in the field of Remote Sensing. This thesis focuses on the development of methods for accurate individual tree identification and tree species classification. We develop a method in which individual tree crown boundaries and treetop locations are derived under a unified framework. We apply a two-stage approach with edge detection followed by marker-controlled watershed segmentation. Treetops are modeled from radiometry and geometry aspects. Specifically, treetops are assumed to be represented by local radiation maxima and to be located near the center of the tree-crown. As a result, a marker image was created from the derived treetop to guide a watershed segmentation to further differentiate overlapping trees and to produce a segmented image comprised of individual tree crowns. The image segmentation method developed achieves a promising result for a 256 x 256 CASI image. Then further effort is made to extend our methods to the multiscales which are constructed from a wavelet decomposition. A scale consistency and geometric consistency are designed to examine the gradients along the scale-space for the purpose of separating true crown boundary from unwanted textures occurring due to branches and twigs. As a result from the inverse wavelet transform, the tree crown boundary is enhanced while the unwanted textures are suppressed. Based on the enhanced image, an improvement is achieved when applying the two-stage methods to a high resolution aerial photograph. To improve tree species classification, we develop a new method to choose the optimal scale parameter with the aid of Bhattacharya Distance (BD), a well-known index of class separability in traditional pixel-based classification. The optimal scale parameter is then fed in the process of a region-growing-based segmentation as a break-off value. Our object classification achieves a better accuracy in separating tree species when compared to the conventional Maximum Likelihood Classification (MLC). In summary, we develop two object-based methods for identifying individual trees and classifying tree species from high-spatial resolution imagery. Both methods achieve promising results and will promote integration of Remote Sensing and GIS in forest applications.
Dyer, Betsey D.; Kahn, Michael J.; LeBlanc, Mark D.
2008-01-01
Classification and regression tree (CART) analysis was applied to genome-wide tetranucleotide frequencies (genomic signatures) of 195 archaea and bacteria. Although genomic signatures have typically been used to classify evolutionary divergence, in this study, convergent evolution was the focus. Temperature optima for most of the organisms examined could be distinguished by CART analyses of tetranucleotide frequencies. This suggests that pervasive (nonlinear) qualities of genomes may reflect certain environmental conditions (such as temperature) in which those genomes evolved. The predominant use of GAGA and AGGA as the discriminating tetramers in CART models suggests that purine-loading and codon biases of thermophiles may explain some of the results. PMID:19054742
Analysis of dual tree M-band wavelet transform based features for brain image classification.
Ayalapogu, Ratna Raju; Pabboju, Suresh; Ramisetty, Rajeswara Rao
2018-04-29
The most complex organ in the human body is the brain. The unrestrained growth of cells in the brain is called a brain tumor. The cause of a brain tumor is still unknown and the survival rate is lower than other types of cancers. Hence, early detection is very important for proper treatment. In this study, an efficient computer-aided diagnosis (CAD) system is presented for brain image classification by analyzing MRI of the brain. At first, the MRI brain images of normal and abnormal categories are modeled by using the statistical features of dual tree m-band wavelet transform (DTMBWT). A maximum margin classifier, support vector machine (SVM) is then used for the classification and validated with k-fold approach. Results show that the system provides promising results on a repository of molecular brain neoplasia data (REMBRANDT) with 97.5% accuracy using 4 th level statistical features of DTMBWT. Viewing the experimental results, we conclude that the system gives a satisfactory performance for the brain image classification. © 2018 International Society for Magnetic Resonance in Medicine.
[Changes of Forest Canopy Spectral Reflectance with Seasons in Lang Ya Mountains].
Li, Wei-tao; Peng, Dao-li; Zhang, Yan; Wu, Jian; Chen, Tai-sheng
2015-08-01
The physiological mechanism and ecological structure of forest trees can change with the changes of years. In a certain extent, the changes were expressed through the canopy spectral features. The mastery of changing rules about spectral characteristics of trees over the years is benefit to remote sensing interpretation and provide scientific basis for the classification of different trees. The study adopted high-resolution spectrometer to measure the canopy spectral characteristics for seven major deciduous trees and seven evergreen trees to gain the spectrum curve of four different ages and calculate the first derivative curve. The analysis of changing rules about spectral characteristics of different deciduous trees and evergreen trees and the comparison of changes about spectrum of various trees in the visible and infrared band could find the best year and best band for identification of trees. The results showed that the canopy spectral reflectance of deciduous and evergreen trees increases with the increase of age. And the spectral changes of two species were most obvious in the near infrared band.
Barlin, Joyce N; Zhou, Qin; St Clair, Caryn M; Iasonos, Alexia; Soslow, Robert A; Alektiar, Kaled M; Hensley, Martee L; Leitao, Mario M; Barakat, Richard R; Abu-Rustum, Nadeem R
2013-09-01
The objectives of the study are to evaluate which clinicopathologic factors influenced overall survival (OS) in endometrial carcinoma and to determine if the surgical effort to assess para-aortic (PA) lymph nodes (LNs) at initial staging surgery impacts OS. All patients diagnosed with endometrial cancer from 1/1993-12/2011 who had LNs excised were included. PALN assessment was defined by the identification of one or more PALNs on final pathology. A multivariate analysis was performed to assess the effect of PALNs on OS. A form of recursive partitioning called classification and regression tree (CART) analysis was implemented. Variables included: age, stage, tumor subtype, grade, myometrial invasion, total LNs removed, evaluation of PALNs, and adjuvant chemotherapy. The cohort included 1920 patients, with a median age of 62 years. The median number of LNs removed was 16 (range, 1-99). The removal of PALNs was not associated with OS (P=0.450). Using the CART hierarchically, stage I vs. stages II-IV and grades 1-2 vs. grade 3 emerged as predictors of OS. If the tree was allowed to grow, further branching was based on age and myometrial invasion. Total number of LNs removed and assessment of PALNs as defined in this study were not predictive of OS. This innovative CART analysis emphasized the importance of proper stage assignment and a binary grading system in impacting OS. Notably, the total number of LNs removed and specific evaluation of PALNs as defined in this study were not important predictors of OS. Copyright © 2013 Elsevier Inc. All rights reserved.
Simmons, T; Goodburn, B; Singhrao, S K
2016-01-01
This feasibility study was undertaken to describe and record the histological characteristics of burnt and unburnt cranial bone fragments from human and non-human bones. Reference series of fully mineralized, transverse sections of cranial bone, from all variables and specimen states, were prepared by manual cutting and semi-automated grinding and polishing methods. A photomicrograph catalogue reflecting differences in burnt and unburnt bone from human and non-humans was recorded and qualitative analysis was performed using an established classification system based on primary bone characteristics. The histomorphology associated with human and non-human samples was, for the main part, preserved following burning at high temperature. Clearly, fibro-lamellar complex tissue subtypes, such as plexiform or laminar primary bone, were only present in non-human bones. A decision tree analysis based on histological features provided a definitive identification key for distinguishing human from non-human bone, with an accuracy of 100%. The decision tree for samples where burning was unknown was 96% accurate, and multi-step classification to taxon was possible with 100% accuracy. The results of this feasibility study strongly suggest that histology remains a viable alternative technique if fragments of cranial bone require forensic examination in both burnt and unburnt states. The decision tree analysis may provide an additional but vital tool to enhance data interpretation. Further studies are needed to assess variation in histomorphology taking into account other cranial bones, ontogeny, species and burning conditions. © The Author(s) 2015.
NASA Astrophysics Data System (ADS)
Bigdeli, Behnaz; Pahlavani, Parham
2017-01-01
Interpretation of synthetic aperture radar (SAR) data processing is difficult because the geometry and spectral range of SAR are different from optical imagery. Consequently, SAR imaging can be a complementary data to multispectral (MS) optical remote sensing techniques because it does not depend on solar illumination and weather conditions. This study presents a multisensor fusion of SAR and MS data based on the use of classification and regression tree (CART) and support vector machine (SVM) through a decision fusion system. First, different feature extraction strategies were applied on SAR and MS data to produce more spectral and textural information. To overcome the redundancy and correlation between features, an intrinsic dimension estimation method based on noise-whitened Harsanyi, Farrand, and Chang determines the proper dimension of the features. Then, principal component analysis and independent component analysis were utilized on stacked feature space of two data. Afterward, SVM and CART classified each reduced feature space. Finally, a fusion strategy was utilized to fuse the classification results. To show the effectiveness of the proposed methodology, single classification on each data was compared to the obtained results. A coregistered Radarsat-2 and WorldView-2 data set from San Francisco, USA, was available to examine the effectiveness of the proposed method. The results show that combinations of SAR data with optical sensor based on the proposed methodology improve the classification results for most of the classes. The proposed fusion method provided approximately 93.24% and 95.44% for two different areas of the data.
Forest tree species discrimination in western Himalaya using EO-1 Hyperion
NASA Astrophysics Data System (ADS)
George, Rajee; Padalia, Hitendra; Kushwaha, S. P. S.
2014-05-01
The information acquired in the narrow bands of hyperspectral remote sensing data has potential to capture plant species spectral variability, thereby improving forest tree species mapping. This study assessed the utility of spaceborne EO-1 Hyperion data in discrimination and classification of broadleaved evergreen and conifer forest tree species in western Himalaya. The pre-processing of 242 bands of Hyperion data resulted into 160 noise-free and vertical stripe corrected reflectance bands. Of these, 29 bands were selected through step-wise exclusion of bands (Wilk's Lambda). Spectral Angle Mapper (SAM) and Support Vector Machine (SVM) algorithms were applied to the selected bands to assess their effectiveness in classification. SVM was also applied to broadband data (Landsat TM) to compare the variation in classification accuracy. All commonly occurring six gregarious tree species, viz., white oak, brown oak, chir pine, blue pine, cedar and fir in western Himalaya could be effectively discriminated. SVM produced a better species classification (overall accuracy 82.27%, kappa statistic 0.79) than SAM (overall accuracy 74.68%, kappa statistic 0.70). It was noticed that classification accuracy achieved with Hyperion bands was significantly higher than Landsat TM bands (overall accuracy 69.62%, kappa statistic 0.65). Study demonstrated the potential utility of narrow spectral bands of Hyperion data in discriminating tree species in a hilly terrain.
Jose M. Iniguez; Joseph L. Ganey; Peter J. Daughtery; John D. Bailey
2005-01-01
The objective of this study was to develop a rule based cover type classification system for the forest and woodland vegetation in the Sky Islands of southeastern Arizona. In order to develop such a system we qualitatively and quantitatively compared a hierarchical (Wardâs) and a non-hierarchical (k-means) clustering method. Ecologically, unique groups represented by...
Jose M. Iniguez; Joseph L. Ganey; Peter J. Daugherty; John D. Bailey
2005-01-01
The objective of this study was to develop a rule based cover type classification system for the forest and woodland vegetation in the Sky Islands of southeastern Arizona. In order to develop such system we qualitatively and quantitatively compared a hierarchical (Wardâs) and a non-hierarchical (k-means) clustering method. Ecologically, unique groups and plots...
Identification and Mapping of Tree Species in Urban Areas Using WORLDVIEW-2 Imagery
NASA Astrophysics Data System (ADS)
Mustafa, Y. T.; Habeeb, H. N.; Stein, A.; Sulaiman, F. Y.
2015-10-01
Monitoring and mapping of urban trees are essential to provide urban forestry authorities with timely and consistent information. Modern techniques increasingly facilitate these tasks, but require the development of semi-automatic tree detection and classification methods. In this article, we propose an approach to delineate and map the crown of 15 tree species in the city of Duhok, Kurdistan Region of Iraq using WorldView-2 (WV-2) imagery. A tree crown object is identified first and is subsequently delineated as an image object (IO) using vegetation indices and texture measurements. Next, three classification methods: Maximum Likelihood, Neural Network, and Support Vector Machine were used to classify IOs using selected IO features. The best results are obtained with Support Vector Machine classification that gives the best map of urban tree species in Duhok. The overall accuracy was between 60.93% to 88.92% and κ-coefficient was between 0.57 to 0.75. We conclude that fifteen tree species were identified and mapped at a satisfactory accuracy in urban areas of this study.
Vector quantizer designs for joint compression and terrain categorization of multispectral imagery
NASA Technical Reports Server (NTRS)
Gorman, John D.; Lyons, Daniel F.
1994-01-01
Two vector quantizer designs for compression of multispectral imagery and their impact on terrain categorization performance are evaluated. The mean-squared error (MSE) and classification performance of the two quantizers are compared, and it is shown that a simple two-stage design minimizing MSE subject to a constraint on classification performance has a significantly better classification performance than a standard MSE-based tree-structured vector quantizer followed by maximum likelihood classification. This improvement in classification performance is obtained with minimal loss in MSE performance. The results show that it is advantageous to tailor compression algorithm designs to the required data exploitation tasks. Applications of joint compression/classification include compression for the archival or transmission of Landsat imagery that is later used for land utility surveys and/or radiometric analysis.
Genome-Based Taxonomic Classification of Bacteroidetes
Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina; ...
2016-12-20
The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogeneticmore » analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved.« less
Genome-Based Taxonomic Classification of Bacteroidetes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina
The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogeneticmore » analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved.« less
Genome-Based Taxonomic Classification of Bacteroidetes
Hahnke, Richard L.; Meier-Kolthoff, Jan P.; García-López, Marina; Mukherjee, Supratim; Huntemann, Marcel; Ivanova, Natalia N.; Woyke, Tanja; Kyrpides, Nikos C.; Klenk, Hans-Peter; Göker, Markus
2016-01-01
The bacterial phylum Bacteroidetes, characterized by a distinct gliding motility, occurs in a broad variety of ecosystems, habitats, life styles, and physiologies. Accordingly, taxonomic classification of the phylum, based on a limited number of features, proved difficult and controversial in the past, for example, when decisions were based on unresolved phylogenetic trees of the 16S rRNA gene sequence. Here we use a large collection of type-strain genomes from Bacteroidetes and closely related phyla for assessing their taxonomy based on the principles of phylogenetic classification and trees inferred from genome-scale data. No significant conflict between 16S rRNA gene and whole-genome phylogenetic analysis is found, whereas many but not all of the involved taxa are supported as monophyletic groups, particularly in the genome-scale trees. Phenotypic and phylogenomic features support the separation of Balneolaceae as new phylum Balneolaeota from Rhodothermaeota and of Saprospiraceae as new class Saprospiria from Chitinophagia. Epilithonimonas is nested within the older genus Chryseobacterium and without significant phenotypic differences; thus merging the two genera is proposed. Similarly, Vitellibacter is proposed to be included in Aequorivita. Flexibacter is confirmed as being heterogeneous and dissected, yielding six distinct genera. Hallella seregens is a later heterotypic synonym of Prevotella dentalis. Compared to values directly calculated from genome sequences, the G+C content mentioned in many species descriptions is too imprecise; moreover, corrected G+C content values have a significantly better fit to the phylogeny. Corresponding emendations of species descriptions are provided where necessary. Whereas most observed conflict with the current classification of Bacteroidetes is already visible in 16S rRNA gene trees, as expected whole-genome phylogenies are much better resolved. PMID:28066339
Proceedings of the 14th Annual Software Engineering Workshop
NASA Technical Reports Server (NTRS)
1989-01-01
Several software related topics are presented. Topics covered include studies and experiment at the Software Engineering Laboratory at the Goddard Space Flight Center, predicting project success from the Software Project Management Process, software environments, testing in a reuse environment, domain directed reuse, and classification tree analysis using the Amadeus measurement and empirical analysis.
Acknowledging Different Needs: Developing a Taxonomy of Welfare Leavers.
ERIC Educational Resources Information Center
Julnes, George; Hayashi, Kentaro; Anderson, Steven
2001-01-01
Used cluster analysis of survey data for 506 respondents to create a taxonomy of welfare leavers in Illinois based on their self-reported well-being after leaving welfare. Used classification tree analysis to identify factors associated with different types of leavers. Findings highlight the existence of many marginally successful leavers who…
Assessing wildfire risks at multiple spatial scales
Justin Fitch
2008-01-01
In continuation of the efforts to advance wildfire science and develop tools for wildland fire managers, a spatial wildfire risk assessment was carried out using Classification and Regression Tree analysis (CART) and Geographic Information Systems (GIS). The analysis was performed at two scales. The small-scale assessment covered the entire state of New Mexico, while...
NASA Astrophysics Data System (ADS)
Hao Chiang, Shou; Valdez, Miguel; Chen, Chi-Farn
2016-06-01
Forest is a very important ecosystem and natural resource for living things. Based on forest inventories, government is able to make decisions to converse, improve and manage forests in a sustainable way. Field work for forestry investigation is difficult and time consuming, because it needs intensive physical labor and the costs are high, especially surveying in remote mountainous regions. A reliable forest inventory can give us a more accurate and timely information to develop new and efficient approaches of forest management. The remote sensing technology has been recently used for forest investigation at a large scale. To produce an informative forest inventory, forest attributes, including tree species are unavoidably required to be considered. In this study the aim is to classify forest tree species in Erdenebulgan County, Huwsgul province in Mongolia, using Maximum Entropy method. The study area is covered by a dense forest which is almost 70% of total territorial extension of Erdenebulgan County and is located in a high mountain region in northern Mongolia. For this study, Landsat satellite imagery and a Digital Elevation Model (DEM) were acquired to perform tree species mapping. The forest tree species inventory map was collected from the Forest Division of the Mongolian Ministry of Nature and Environment as training data and also used as ground truth to perform the accuracy assessment of the tree species classification. Landsat images and DEM were processed for maximum entropy modeling, and this study applied the model with two experiments. The first one is to use Landsat surface reflectance for tree species classification; and the second experiment incorporates terrain variables in addition to the Landsat surface reflectance to perform the tree species classification. All experimental results were compared with the tree species inventory to assess the classification accuracy. Results show that the second one which uses Landsat surface reflectance coupled with terrain variables produced better result, with the higher overall accuracy and kappa coefficient than first experiment. The results indicate that the Maximum Entropy method is an applicable, and to classify tree species using satellite imagery data coupled with terrain information can improve the classification of tree species in the study area.
Garland, Ellen C; Castellote, Manuel; Berchok, Catherine L
2015-06-01
Beluga whales, Delphinapterus leucas, have a graded call system; call types exist on a continuum making classification challenging. A description of vocalizations from the eastern Beaufort Sea beluga population during its spring migration are presented here, using both a non-parametric classification tree analysis (CART), and a Random Forest analysis. Twelve frequency and duration measurements were made on 1019 calls recorded over 14 days off Icy Cape, Alaska, resulting in 34 identifiable call types with 83% agreement in classification for both CART and Random Forest analyses. This high level of agreement in classification, with an initial subjective classification of calls into 36 categories, demonstrates that the methods applied here provide a quantitative analysis of a graded call dataset. Further, as calls cannot be attributed to individuals using single sensor passive acoustic monitoring efforts, these methods provide a comprehensive analysis of data where the influence of pseudo-replication of calls from individuals is unknown. This study is the first to describe the vocal repertoire of a beluga population using a robust and repeatable methodology. A baseline eastern Beaufort Sea beluga population repertoire is presented here, against which the call repertoire of other seasonally sympatric Alaskan beluga populations can be compared.
Time Series of Images to Improve Tree Species Classification
NASA Astrophysics Data System (ADS)
Miyoshi, G. T.; Imai, N. N.; de Moraes, M. V. A.; Tommaselli, A. M. G.; Näsi, R.
2017-10-01
Tree species classification provides valuable information to forest monitoring and management. The high floristic variation of the tree species appears as a challenging issue in the tree species classification because the vegetation characteristics changes according to the season. To help to monitor this complex environment, the imaging spectroscopy has been largely applied since the development of miniaturized sensors attached to Unmanned Aerial Vehicles (UAV). Considering the seasonal changes in forests and the higher spectral and spatial resolution acquired with sensors attached to UAV, we present the use of time series of images to classify four tree species. The study area is an Atlantic Forest area located in the western part of São Paulo State. Images were acquired in August 2015 and August 2016, generating three data sets of images: only with the image spectra of 2015; only with the image spectra of 2016; with the layer stacking of images from 2015 and 2016. Four tree species were classified using Spectral angle mapper (SAM), Spectral information divergence (SID) and Random Forest (RF). The results showed that SAM and SID caused an overfitting of the data whereas RF showed better results and the use of the layer stacking improved the classification achieving a kappa coefficient of 18.26 %.
Park, Myonghwa; Choi, Sora; Shin, A Mi; Koo, Chul Hoi
2013-02-01
The purpose of this study was to develop a prediction model for the characteristics of older adults with depression using the decision tree method. A large dataset from the 2008 Korean Elderly Survey was used and data of 14,970 elderly people were analyzed. Target variable was depression and 53 input variables were general characteristics, family & social relationship, economic status, health status, health behavior, functional status, leisure & social activity, quality of life, and living environment. Data were analyzed by decision tree analysis, a data mining technique using SPSS Window 19.0 and Clementine 12.0 programs. The decision trees were classified into five different rules to define the characteristics of older adults with depression. Classification & Regression Tree (C&RT) showed the best prediction with an accuracy of 80.81% among data mining models. Factors in the rules were life satisfaction, nutritional status, daily activity difficulty due to pain, functional limitation for basic or instrumental daily activities, number of chronic diseases and daily activity difficulty due to disease. The different rules classified by the decision tree model in this study should contribute as baseline data for discovering informative knowledge and developing interventions tailored to these individual characteristics.
Musmeci, Nicoló; Aste, Tomaso; Di Matteo, T
2015-01-01
We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns comparing the clustering structure with the underlying industrial activity classification. We apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree and we compare it with other methods including the Linkage and k-medoids. By taking the industrial sector classification of stocks as a benchmark partition, we evaluate how the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform other methods, being able to retrieve more information with fewer clusters. Moreover,we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. The dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, like crises. These results can be of interest for all the applications of clustering methods to portfolio optimization and risk hedging [corrected].
Musmeci, Nicoló; Aste, Tomaso; Di Matteo, T.
2015-01-01
We quantify the amount of information filtered by different hierarchical clustering methods on correlations between stock returns comparing the clustering structure with the underlying industrial activity classification. We apply, for the first time to financial data, a novel hierarchical clustering approach, the Directed Bubble Hierarchical Tree and we compare it with other methods including the Linkage and k-medoids. By taking the industrial sector classification of stocks as a benchmark partition, we evaluate how the different methods retrieve this classification. The results show that the Directed Bubble Hierarchical Tree can outperform other methods, being able to retrieve more information with fewer clusters. Moreover, we show that the economic information is hidden at different levels of the hierarchical structures depending on the clustering method. The dynamical analysis on a rolling window also reveals that the different methods show different degrees of sensitivity to events affecting financial markets, like crises. These results can be of interest for all the applications of clustering methods to portfolio optimization and risk hedging. PMID:25786703
Chung, Sukhoon; Rhee, Hyunsill; Suh, Yongmoo
2010-01-01
Objectives This study sought to find answers to the following questions: 1) Can we predict whether a patient will revisit a healthcare center? 2) Can we anticipate diseases of patients who revisit the center? Methods For the first question, we applied 5 classification algorithms (decision tree, artificial neural network, logistic regression, Bayesian networks, and Naïve Bayes) and the stacking-bagging method for building classification models. To solve the second question, we performed sequential pattern analysis. Results We determined: 1) In general, the most influential variables which impact whether a patient of a public healthcare center will revisit it or not are personal burden, insurance bill, period of prescription, age, systolic pressure, name of disease, and postal code. 2) The best plain classification model is dependent on the dataset. 3) Based on average of classification accuracy, the proposed stacking-bagging method outperformed all traditional classification models and our sequential pattern analysis revealed 16 sequential patterns. Conclusions Classification models and sequential patterns can help public healthcare centers plan and implement healthcare service programs and businesses that are more appropriate to local residents, encouraging them to revisit public health centers. PMID:21818426
Architecture Analysis with AADL: The Speed Regulation Case-Study
2014-11-01
Overview Functional Hazard Analysis ( FHA ) Failures inventory with description, classification, etc. Fault-Tree Analysis (FTA) Dependencies between...University Pittsburgh, PA 15213 Julien Delange Report Documentation Page Form ApprovedOMB No. 0704-0188 Public reporting burden for the collection of...Information Operations and Reports , 1215 Jefferson Davis Highway, Suite 1204, Arlington VA 22202-4302. Respondents should be aware that notwithstanding any
NASA Technical Reports Server (NTRS)
Buntine, Wray
1991-01-01
Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. How a tree learning algorithm can be derived from Bayesian decision theory is outlined. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule turns out to be similar to Quinlan's information gain splitting rule, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, Quinlan's C4 and Breiman et al. Cart show the full Bayesian algorithm is consistently as good, or more accurate than these other approaches though at a computational price.
A Modified Decision Tree Algorithm Based on Genetic Algorithm for Mobile User Classification Problem
Liu, Dong-sheng; Fan, Shu-jiang
2014-01-01
In order to offer mobile customers better service, we should classify the mobile user firstly. Aimed at the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduced genetic algorithm to optimize the results of the decision tree algorithm. We also take the context information as a classification attributes for the mobile user and we classify the context into public context and private context classes. Then we analyze the processes and operators of the algorithm. At last, we make an experiment on the mobile user with the algorithm, we can classify the mobile user into Basic service user, E-service user, Plus service user, and Total service user classes and we can also get some rules about the mobile user. Compared to C4.5 decision tree algorithm and SVM algorithm, the algorithm we proposed in this paper has higher accuracy and more simplicity. PMID:24688389
NASA Astrophysics Data System (ADS)
To, Cuong; Pham, Tuan D.
2010-01-01
In machine learning, pattern recognition may be the most popular task. "Similar" patterns identification is also very important in biology because first, it is useful for prediction of patterns associated with disease, for example cancer tissue (normal or tumor); second, similarity or dissimilarity of the kinetic patterns is used to identify coordinately controlled genes or proteins involved in the same regulatory process. Third, similar genes (proteins) share similar functions. In this paper, we present an algorithm which uses genetic programming to create decision tree for binary classification problem. The application of the algorithm was implemented on five real biological databases. Base on the results of comparisons with well-known methods, we see that the algorithm is outstanding in most of cases.
Guo, Huey-Ming; Shyu, Yea-Ing Lotus; Chang, Her-Kun
2006-01-01
In this article, the authors provide an overview of a research method to predict quality of care in home health nursing data set. The results of this study can be visualized through classification an regression tree (CART) graphs. The analysis was more effective, and the results were more informative since the home health nursing dataset was analyzed with a combination of the logistic regression and CART, these two techniques complete each other. And the results more informative that more patients' characters were related to quality of care in home care. The results contributed to home health nurse predict patient outcome in case management. Improved prediction is needed for interventions to be appropriately targeted for improved patient outcome and quality of care.
NASA Astrophysics Data System (ADS)
Zhang, C.; Pan, X.; Zhang, S. Q.; Li, H. P.; Atkinson, P. M.
2017-09-01
Recent advances in remote sensing have witnessed a great amount of very high resolution (VHR) images acquired at sub-metre spatial resolution. These VHR remotely sensed data has post enormous challenges in processing, analysing and classifying them effectively due to the high spatial complexity and heterogeneity. Although many computer-aid classification methods that based on machine learning approaches have been developed over the past decades, most of them are developed toward pixel level spectral differentiation, e.g. Multi-Layer Perceptron (MLP), which are unable to exploit abundant spatial details within VHR images. This paper introduced a rough set model as a general framework to objectively characterize the uncertainty in CNN classification results, and further partition them into correctness and incorrectness on the map. The correct classification regions of CNN were trusted and maintained, whereas the misclassification areas were reclassified using a decision tree with both CNN and MLP. The effectiveness of the proposed rough set decision tree based MLP-CNN was tested using an urban area at Bournemouth, United Kingdom. The MLP-CNN, well capturing the complementarity between CNN and MLP through the rough set based decision tree, achieved the best classification performance both visually and numerically. Therefore, this research paves the way to achieve fully automatic and effective VHR image classification.
Systematization method for distinguishing plastic groups by using NIR spectroscopy.
Kaihara, Mikio; Satoh, Minami; Satoh, Minoru
2007-07-01
A systematic classification method for polymers is not yet available in case of using near infrared spectra (NIR). That is why we have been searching for a systematic method. Because raw NIR spectra usually have few obvious peaks, NIR spectra have been pretreated by 2nd derivation for taking well modulated spectra. After the pretreatment, we applied classification and regression trees (CART) to the discrimination between the spectra and the species of polymers. As a result, we obtained a relatively simple classification tree. Judging from the obtained splitting conditions and the classified polymers, we concluded that obtained knowledge on the chemical function groups estimated by the important wavelength regions is not always applicable to this classification tree. However, we clarified the splitting rules for polymer species from the NIR spectral point of view.
Analyses of rear-end crashes based on classification tree models.
Yan, Xuedong; Radwan, Essam
2006-09-01
Signalized intersections are accident-prone areas especially for rear-end crashes due to the fact that the diversity of the braking behaviors of drivers increases during the signal change. The objective of this article is to improve knowledge of the relationship between rear-end crashes occurring at signalized intersections and a series of potential traffic risk factors classified by driver characteristics, environments, and vehicle types. Based on the 2001 Florida crash database, the classification tree method and Quasi-induced exposure concept were used to perform the statistical analysis. Two binary classification tree models were developed in this study. One was used for the crash comparison between rear-end and non-rear-end to identify those specific trends of the rear-end crashes. The other was constructed for the comparison between striking vehicles/drivers (at-fault) and struck vehicles/drivers (not-at-fault) to find more complex crash pattern associated with the traffic attributes of driver, vehicle, and environment. The modeling results showed that the rear-end crashes are over-presented in the higher speed limits (45-55 mph); the rear-end crash propensity for daytime is apparently larger than nighttime; and the reduction of braking capacity due to wet and slippery road surface conditions would definitely contribute to rear-end crashes, especially at intersections with higher speed limits. The tree model segmented drivers into four homogeneous age groups: < 21 years, 21-31 years, 32-75 years, and > 75 years. The youngest driver group shows the largest crash propensity; in the 21-31 age group, the male drivers are over-involved in rear-end crashes under adverse weather conditions and the 32-75 years drivers driving large size vehicles have a larger crash propensity compared to those driving passenger vehicles. Combined with the quasi-induced exposure concept, the classification tree method is a proper statistical tool for traffic-safety analysis to investigate crash propensity. Compared to the logistic regression models, tree models have advantages for handling continuous independent variables and easily explaining the complex interaction effect with more than two independent variables. This research recommended that at signalized intersections with higher speed limits, reducing the speed limit to 40 mph efficiently contribute to a lower accident rate. Drivers involved in alcohol use may increase not only rear-end crash risk but also the driver injury severity. Education and enforcement countermeasures should focus on the driver group younger than 21 years. Further studies are suggested to compare crash risk distributions of the driver age for other main crash types to seek corresponding traffic countermeasures.
Garrity, Steven R.; Allen, Craig D.; Brumby, Steven P.; Gangodagamage, Chandana; McDowell, Nate G.; Cai, D. Michael
2013-01-01
Widespread tree mortality events have recently been observed in several biomes. To effectively quantify the severity and extent of these events, tools that allow for rapid assessment at the landscape scale are required. Past studies using high spatial resolution satellite imagery have primarily focused on detecting green, red, and gray tree canopies during and shortly after tree damage or mortality has occurred. However, detecting trees in various stages of death is not always possible due to limited availability of archived satellite imagery. Here we assess the capability of high spatial resolution satellite imagery for tree mortality detection in a southwestern U.S. mixed species woodland using archived satellite images acquired prior to mortality and well after dead trees had dropped their leaves. We developed a multistep classification approach that uses: supervised masking of non-tree image elements; bi-temporal (pre- and post-mortality) differencing of normalized difference vegetation index (NDVI) and red:green ratio (RGI); and unsupervised multivariate clustering of pixels into live and dead tree classes using a Gaussian mixture model. Classification accuracies were improved in a final step by tuning the rules of pixel classification using the posterior probabilities of class membership obtained from the Gaussian mixture model. Classifications were produced for two images acquired post-mortality with overall accuracies of 97.9% and 98.5%, respectively. Classified images were combined with land cover data to characterize the spatiotemporal characteristics of tree mortality across areas with differences in tree species composition. We found that 38% of tree crown area was lost during the drought period between 2002 and 2006. The majority of tree mortality during this period was concentrated in piñon-juniper (Pinus edulis-Juniperus monosperma) woodlands. An additional 20% of the tree canopy died or was removed between 2006 and 2011, primarily in areas experiencing wildfire and management activity. -Our results demonstrate that unsupervised clustering of bi-temporal NDVI and RGI differences can be used to detect tree mortality resulting from numerous causes and in several forest cover types.
Ramezankhani, Azra; Pournik, Omid; Shahrabi, Jamal; Khalili, Davood; Azizi, Fereidoun; Hadaegh, Farzad
2014-09-01
The aim of this study was to create a prediction model using data mining approach to identify low risk individuals for incidence of type 2 diabetes, using the Tehran Lipid and Glucose Study (TLGS) database. For a 6647 population without diabetes, aged ≥20 years, followed for 12 years, a prediction model was developed using classification by the decision tree technique. Seven hundred and twenty-nine (11%) diabetes cases occurred during the follow-up. Predictor variables were selected from demographic characteristics, smoking status, medical and drug history and laboratory measures. We developed the predictive models by decision tree using 60 input variables and one output variable. The overall classification accuracy was 90.5%, with 31.1% sensitivity, 97.9% specificity; and for the subjects without diabetes, precision and f-measure were 92% and 0.95, respectively. The identified variables included fasting plasma glucose, body mass index, triglycerides, mean arterial blood pressure, family history of diabetes, educational level and job status. In conclusion, decision tree analysis, using routine demographic, clinical, anthropometric and laboratory measurements, created a simple tool to predict individuals at low risk for type 2 diabetes. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
2011-01-01
Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but has presently a limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning methods like Neural Networks, Support Vector Machines and Random Forests can improve accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non parametric classifiers derived from data mining methods (Multilayer Perceptrons Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, Area under the ROC curve and Press'Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using the Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the larger overall classification accuracy (Median (Me) = 0.76) an area under the ROC (Me = 0.90). However this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forest ranked second in overall accuracy (Me = 0.73) with high area under the ROC (Me = 0.73) specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC (Me = 0.72) specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed overall classification accuracy above a median value of 0.63, but for most sensitivity was around or even lower than a median value of 0.5. Conclusions When taking into account sensitivity, specificity and overall classification accuracy Random Forests and Linear Discriminant analysis rank first among all the classifiers tested in prediction of dementia using several neuropsychological tests. These methods may be used to improve accuracy, sensitivity and specificity of Dementia predictions from neuropsychological testing. PMID:21849043
NASA Astrophysics Data System (ADS)
Navratil, Peter; Wilps, Hans
2013-01-01
Three different object-based image classification techniques are applied to high-resolution satellite data for the mapping of the habitats of Asian migratory locust (Locusta migratoria migratoria) in the southern Aral Sea basin, Uzbekistan. A set of panchromatic and multispectral Système Pour l'Observation de la Terre-5 satellite images was spectrally enhanced by normalized difference vegetation index and tasseled cap transformation and segmented into image objects, which were then classified by three different classification approaches: a rule-based hierarchical fuzzy threshold (HFT) classification method was compared to a supervised nearest neighbor classifier and classification tree analysis by the quick, unbiased, efficient statistical trees algorithm. Special emphasis was laid on the discrimination of locust feeding and breeding habitats due to the significance of this discrimination for practical locust control. Field data on vegetation and land cover, collected at the time of satellite image acquisition, was used to evaluate classification accuracy. The results show that a robust HFT classifier outperformed the two automated procedures by 13% overall accuracy. The classification method allowed a reliable discrimination of locust feeding and breeding habitats, which is of significant importance for the application of the resulting data for an economically and environmentally sound control of locust pests because exact spatial knowledge on the habitat types allows a more effective surveying and use of pesticides.
Asubonteng, Kwabena; Pfeffer, Karin; Ros-Tonen, Mirjam; Verbesselt, Jan; Baud, Isa
2018-05-11
Tree crops such as cocoa and oil palm are important to smallholders' livelihoods and national economies of tropical producer countries. Governments seek to expand tree-crop acreages and improve yields. Existing literature has analyzed socioeconomic and environmental effects of tree-crop expansion, but its spatial effects on the landscape are yet to be explored. This study aims to assess the effects of tree-crop farming on the composition and the extent of land-cover transitions in a mixed cocoa/oil palm landscape in Ghana. Land-cover maps of 1986 and 2015 produced through ISODATA, and maximum likelihood classification were validated with field reference, Google Earth data, and key respondent interviews. Post-classification change detection was conducted and the transition matrix analyzed using intensity analysis. Cocoa and oil palm areas have increased in extent by 8.9% and 11.2%, respectively, mainly at the expense of food-crop land and forest. The intensity of forest loss to both tree crops is at a lower intensity than the loss of food-crop land. There were transitions between cocoa and oil palm, but the gains in oil palm outweigh those of cocoa. Cocoa and oil palm have increased in area and dominance. The main cover types converted to tree-crop areas are food-crop land and off-reserve forest. This is beginning to have serious implications for food security and livelihood options that depend on ecosystem services provided by the mosaic landscape. Tree-crop policies should take account of the geographical distribution of tree-commodity production at landscape level and its implications for food production and ecosystems services.
NASA Astrophysics Data System (ADS)
Squiers, John J.; Li, Weizhi; King, Darlene R.; Mo, Weirong; Zhang, Xu; Lu, Yang; Sellke, Eric W.; Fan, Wensheng; DiMaio, J. Michael; Thatcher, Jeffrey E.
2016-03-01
The clinical judgment of expert burn surgeons is currently the standard on which diagnostic and therapeutic decisionmaking regarding burn injuries is based. Multispectral imaging (MSI) has the potential to increase the accuracy of burn depth assessment and the intraoperative identification of viable wound bed during surgical debridement of burn injuries. A highly accurate classification model must be developed using machine-learning techniques in order to translate MSI data into clinically-relevant information. An animal burn model was developed to build an MSI training database and to study the burn tissue classification ability of several models trained via common machine-learning algorithms. The algorithms tested, from least to most complex, were: K-nearest neighbors (KNN), decision tree (DT), linear discriminant analysis (LDA), weighted linear discriminant analysis (W-LDA), quadratic discriminant analysis (QDA), ensemble linear discriminant analysis (EN-LDA), ensemble K-nearest neighbors (EN-KNN), and ensemble decision tree (EN-DT). After the ground-truth database of six tissue types (healthy skin, wound bed, blood, hyperemia, partial injury, full injury) was generated by histopathological analysis, we used 10-fold cross validation to compare the algorithms' performances based on their accuracies in classifying data against the ground truth, and each algorithm was tested 100 times. The mean test accuracy of the algorithms were KNN 68.3%, DT 61.5%, LDA 70.5%, W-LDA 68.1%, QDA 68.9%, EN-LDA 56.8%, EN-KNN 49.7%, and EN-DT 36.5%. LDA had the highest test accuracy, reflecting the bias-variance tradeoff over the range of complexities inherent to the algorithms tested. Several algorithms were able to match the current standard in burn tissue classification, the clinical judgment of expert burn surgeons. These results will guide further development of an MSI burn tissue classification system. Given that there are few surgeons and facilities specializing in burn care, this technology may improve the standard of burn care for patients without access to specialized facilities.
NASA Astrophysics Data System (ADS)
Liu, Tao; Im, Jungho; Quackenbush, Lindi J.
2015-12-01
This study provides a novel approach to individual tree crown delineation (ITCD) using airborne Light Detection and Ranging (LiDAR) data in dense natural forests using two main steps: crown boundary refinement based on a proposed Fishing Net Dragging (FiND) method, and segment merging based on boundary classification. FiND starts with approximate tree crown boundaries derived using a traditional watershed method with Gaussian filtering and refines these boundaries using an algorithm that mimics how a fisherman drags a fishing net. Random forest machine learning is then used to classify boundary segments into two classes: boundaries between trees and boundaries between branches that belong to a single tree. Three groups of LiDAR-derived features-two from the pseudo waveform generated along with crown boundaries and one from a canopy height model (CHM)-were used in the classification. The proposed ITCD approach was tested using LiDAR data collected over a mountainous region in the Adirondack Park, NY, USA. Overall accuracy of boundary classification was 82.4%. Features derived from the CHM were generally more important in the classification than the features extracted from the pseudo waveform. A comprehensive accuracy assessment scheme for ITCD was also introduced by considering both area of crown overlap and crown centroids. Accuracy assessment using this new scheme shows the proposed ITCD achieved 74% and 78% as overall accuracy, respectively, for deciduous and mixed forest.
Classification tree analysis to enhance targeting for follow-up exam of colorectal cancer screening
2013-01-01
Background Follow-up rate after a fecal occult blood test (FOBT) is low worldwide. In order to increase the follow-up rate, segmentation of the target population has been proposed as a promising strategy, because an intervention can then be tailored toward specific subgroups of the population rather than using one type of intervention for all groups. The aim of this study is to identify subgroups that share the same patterns of characteristics related to follow-up exams after FOBT. Methods The study sample consisted of 143 patients aged 50–69 years who were requested to undergo follow-up exams after FOBT. A classification tree analysis was performed, using the follow-up rate as a dependent variable and sociodemographic variables, psychological variables, past FOBT and follow-up exam, family history of colorectal cancer (CRC), and history of bowel disease as predictive variables. Results The follow-up rate in 143 participants was 74.1% (n = 106). A classification tree analysis identified four subgroups as follows; (1) subgroup with a high degree of fear of CRC, unemployed and with a history of bowel disease (n = 24, 100.0% follow-up rate), (2) subgroup with a high degree of fear of CRC, unemployed and with no history of bowel disease (n = 17, 82.4% follow-up rate), (3) subgroup with a high degree of fear of CRC and employed (n = 24, 66.7% follow-up rate), and (4) subgroup with a low degree of fear of CRC (n = 78, 66.7% follow-up rate). Conclusion The identification of four subgroups with a diverse range of follow-up rates for CRC screening indicates the direction to take in future development of an effective tailored intervention strategy. PMID:24112563
Yu, F L; Ye, Y; Yan, Y S
2017-05-10
Objective: To find out the dietary patterns and explore the relationship between environmental factors (especially dietary patterns) and diabetes mellitus in the adults of Fujian. Methods: Multi-stage sampling method were used to survey residents aged ≥18 years by questionnaire, physical examination and laboratory detection in 10 disease surveillance points in Fujian. Factor analysis was used to identify the dietary patterns, while logistic regression model was applied to analyze relationship between dietary patterns and diabetes mellitus, and classification tree model was adopted to identify the influencing factors for diabetes mellitus. Results: There were four dietary patterns in the population, including meat, plant, high-quality protein, and fried food and beverages patterns. The result of logistic analysis showed that plant pattern, which has higher factor loading of fresh fruit-vegetables and cereal-tubers, was a protective factor for non-diabetes mellitus. The risk of diabetes mellitus in the population at T2 and T3 levels of factor score were 0.727 (95 %CI: 0.561-0.943) times and 0.736 (95 %CI : 0.573-0.944) times higher, respectively, than those whose factor score was in lowest quartile. Thirteen influencing factors and eleven group at high-risk for diabetes mellitus were identified by classification tree model. The influencing factors were dyslipidemia, age, family history of diabetes, hypertension, physical activity, career, sex, sedentary time, abdominal adiposity, BMI, marital status, sleep time and high-quality protein pattern. Conclusion: There is a close association between dietary patterns and diabetes mellitus. It is necessary to promote healthy and reasonable diet, strengthen the monitoring and control of blood lipids, blood pressure and body weight, and have good lifestyle for the prevention and control of diabetes mellitus.
Digital computer processing of peach orchard multispectral aerial photography
NASA Technical Reports Server (NTRS)
Atkinson, R. J.
1976-01-01
Several methods of analysis using digital computers applicable to digitized multispectral aerial photography, are described, with particular application to peach orchard test sites. This effort was stimulated by the recent premature death of peach trees in the Southeastern United States. The techniques discussed are: (1) correction of intensity variations by digital filtering, (2) automatic detection and enumeration of trees in five size categories, (3) determination of unhealthy foliage by infrared reflectances, and (4) four band multispectral classification into healthy and declining categories.
An information-based network approach for protein classification
Wan, Xiaogeng; Zhao, Xin; Yau, Stephen S. T.
2017-01-01
Protein classification is one of the critical problems in bioinformatics. Early studies used geometric distances and polygenetic-tree to classify proteins. These methods use binary trees to present protein classification. In this paper, we propose a new protein classification method, whereby theories of information and networks are used to classify the multivariate relationships of proteins. In this study, protein universe is modeled as an undirected network, where proteins are classified according to their connections. Our method is unsupervised, multivariate, and alignment-free. It can be applied to the classification of both protein sequences and structures. Nine examples are used to demonstrate the efficiency of our new method. PMID:28350835
NASA Astrophysics Data System (ADS)
Weller, Andrew F.; Harris, Anthony J.; Ware, J. Andrew; Jarvis, Paul S.
2006-11-01
The classification of sedimentary organic matter (OM) images can be improved by determining the saliency of image analysis (IA) features measured from them. Knowing the saliency of IA feature measurements means that only the most significant discriminating features need be used in the classification process. This is an important consideration for classification techniques such as artificial neural networks (ANNs), where too many features can lead to the 'curse of dimensionality'. The classification scheme adopted in this work is a hybrid of morphologically and texturally descriptive features from previous manual classification schemes. Some of these descriptive features are assigned to IA features, along with several others built into the IA software (Halcon) to ensure that a valid cross-section is available. After an image is captured and segmented, a total of 194 features are measured for each particle. To reduce this number to a more manageable magnitude, the SPSS AnswerTree Exhaustive CHAID (χ 2 automatic interaction detector) classification tree algorithm is used to establish each measurement's saliency as a classification discriminator. In the case of continuous data as used here, the F-test is used as opposed to the published algorithm. The F-test checks various statistical hypotheses about the variance of groups of IA feature measurements obtained from the particles to be classified. The aim is to reduce the number of features required to perform the classification without reducing its accuracy. In the best-case scenario, 194 inputs are reduced to 8, with a subsequent multi-layer back-propagation ANN recognition rate of 98.65%. This paper demonstrates the ability of the algorithm to reduce noise, help overcome the curse of dimensionality, and facilitate an understanding of the saliency of IA features as discriminators for sedimentary OM classification.
Efforts are increasingly being made to classify the world’s wetland resources, an important ecosystem and habitat that is diminishing in abundance. There are multiple remote sensing classification methods, including a suite of nonparametric classifiers such as decision-tree...
Mapping Urban Tree Canopy Cover Using Fused Airborne LIDAR and Satellite Imagery Data
NASA Astrophysics Data System (ADS)
Parmehr, Ebadat G.; Amati, Marco; Fraser, Clive S.
2016-06-01
Urban green spaces, particularly urban trees, play a key role in enhancing the liveability of cities. The availability of accurate and up-to-date maps of tree canopy cover is important for sustainable development of urban green spaces. LiDAR point clouds are widely used for the mapping of buildings and trees, and several LiDAR point cloud classification techniques have been proposed for automatic mapping. However, the effectiveness of point cloud classification techniques for automated tree extraction from LiDAR data can be impacted to the point of failure by the complexity of tree canopy shapes in urban areas. Multispectral imagery, which provides complementary information to LiDAR data, can improve point cloud classification quality. This paper proposes a reliable method for the extraction of tree canopy cover from fused LiDAR point cloud and multispectral satellite imagery data. The proposed method initially associates each LiDAR point with spectral information from the co-registered satellite imagery data. It calculates the normalised difference vegetation index (NDVI) value for each LiDAR point and corrects tree points which have been misclassified as buildings. Then, region growing of tree points, taking the NDVI value into account, is applied. Finally, the LiDAR points classified as tree points are utilised to generate a canopy cover map. The performance of the proposed tree canopy cover mapping method is experimentally evaluated on a data set of airborne LiDAR and WorldView 2 imagery covering a suburb in Melbourne, Australia.
Species of the Mississippi River Headwaters Reservoirs Region.
1976-07-01
inventory and analysis system (ERIAS). _ 00, ’<*" 14175 JAM TO WJ COITION OF > MOV «S IS OBSOLETE UNCLASSIFIED SECURITY CLASSIFICATION1 OF THIS...COMMON) SUNFISH, GREEN (SUNFISH), PUMPKIN SEED SWALLOW, BANK SWALLOW, BARN SWALLOW, ROUGH-WINGED SWALLOW, TREE SWAN, WHISTLING SWIFT
Lajnef, Tarek; Chaibi, Sahbi; Ruby, Perrine; Aguera, Pierre-Emmanuel; Eichenlaub, Jean-Baptiste; Samet, Mounir; Kachouri, Abdennaceur; Jerbi, Karim
2015-07-30
Sleep staging is a critical step in a range of electrophysiological signal processing pipelines used in clinical routine as well as in sleep research. Although the results currently achievable with automatic sleep staging methods are promising, there is need for improvement, especially given the time-consuming and tedious nature of visual sleep scoring. Here we propose a sleep staging framework that consists of a multi-class support vector machine (SVM) classification based on a decision tree approach. The performance of the method was evaluated using polysomnographic data from 15 subjects (electroencephalogram (EEG), electrooculogram (EOG) and electromyogram (EMG) recordings). The decision tree, or dendrogram, was obtained using a hierarchical clustering technique and a wide range of time and frequency-domain features were extracted. Feature selection was carried out using forward sequential selection and classification was evaluated using k-fold cross-validation. The dendrogram-based SVM (DSVM) achieved mean specificity, sensitivity and overall accuracy of 0.92, 0.74 and 0.88 respectively, compared to expert visual scoring. Restricting DSVM classification to data where both experts' scoring was consistent (76.73% of the data) led to a mean specificity, sensitivity and overall accuracy of 0.94, 0.82 and 0.92 respectively. The DSVM framework outperforms classification with more standard multi-class "one-against-all" SVM and linear-discriminant analysis. The promising results of the proposed methodology suggest that it may be a valuable alternative to existing automatic methods and that it could accelerate visual scoring by providing a robust starting hypnogram that can be further fine-tuned by expert inspection. Copyright © 2015 Elsevier B.V. All rights reserved.
An Assessment of Worldview-2 Imagery for the Classification Of a Mixed Deciduous Forest
NASA Astrophysics Data System (ADS)
Carter, Nahid
Remote sensing provides a variety of methods for classifying forest communities and can be a valuable tool for the impact assessment of invasive species. The emerald ash borer (Agrilus planipennis) infestation of ash trees (Fraxinus) in the United States has resulted in the mortality of large stands of ash throughout the Northeast. This study assessed the suitability of multi-temporal Worldview-2 multispectral satellite imagery for classifying a mixed deciduous forest in Upstate New York. Training sites were collected using a Global Positioning System (GPS) receiver, with each training site consisting of a single tree of a corresponding class. Six classes were collected; Ash, Maple, Oak, Beech, Evergreen, and Other. Three different classifications were investigated on four data sets. A six class classification (6C), a two class classification consisting of ash and all other classes combined (2C), and a merging of the ash and maple classes for a five class classification (5C). The four data sets included Worldview-2 multispectral data collection from June 2010 (J-WV2) and September 2010 (S-WV2), a layer stacked data set using J-WV2 and S-WV2 (LS-WV2), and a reduced data set (RD-WV2). RD-WV2 was created using a statistical analysis of the processed and unprocessed imagery. Statistical analysis was used to reduce the dimensionality of the data and identify key bands to create a fourth data set (RD-WV2). Overall accuracy varied considerably depending upon the classification type, but results indicated that ash was confused with maple in a majority of the classifications. Ash was most accurately identified using the 2C classification and RD-WV2 data set (81.48%). A combination of the ash and maple classes yielded an accuracy of 89.41%. Future work should focus on separating the ash and maple classifiers by using data sources such as hyperspectral imagery, LiDAR, or extensive forest surveys.
Phylogenetic classification and the universal tree.
Doolittle, W F
1999-06-25
From comparative analyses of the nucleotide sequences of genes encoding ribosomal RNAs and several proteins, molecular phylogeneticists have constructed a "universal tree of life," taking it as the basis for a "natural" hierarchical classification of all living things. Although confidence in some of the tree's early branches has recently been shaken, new approaches could still resolve many methodological uncertainties. More challenging is evidence that most archaeal and bacterial genomes (and the inferred ancestral eukaryotic nuclear genome) contain genes from multiple sources. If "chimerism" or "lateral gene transfer" cannot be dismissed as trivial in extent or limited to special categories of genes, then no hierarchical universal classification can be taken as natural. Molecular phylogeneticists will have failed to find the "true tree," not because their methods are inadequate or because they have chosen the wrong genes, but because the history of life cannot properly be represented as a tree. However, taxonomies based on molecular sequences will remain indispensable, and understanding of the evolutionary process will ultimately be enriched, not impoverished.
Forest tree species clssification based on airborne hyper-spectral imagery
NASA Astrophysics Data System (ADS)
Dian, Yuanyong; Li, Zengyuan; Pang, Yong
2013-10-01
Forest precision classification products were the basic data for surveying of forest resource, updating forest subplot information, logging and design of forest. However, due to the diversity of stand structure, complexity of the forest growth environment, it's difficult to discriminate forest tree species using multi-spectral image. The airborne hyperspectral images can achieve the high spatial and spectral resolution imagery of forest canopy, so it will good for tree species level classification. The aim of this paper was to test the effective of combining spatial and spectral features in airborne hyper-spectral image classification. The CASI hyper spectral image data were acquired from Liangshui natural reserves area. Firstly, we use the MNF (minimum noise fraction) transform method for to reduce the hyperspectral image dimensionality and highlighting variation. And secondly, we use the grey level co-occurrence matrix (GLCM) to extract the texture features of forest tree canopy from the hyper-spectral image, and thirdly we fused the texture and the spectral features of forest canopy to classify the trees species using support vector machine (SVM) with different kernel functions. The results showed that when using the SVM classifier, MNF and texture-based features combined with linear kernel function can achieve the best overall accuracy which was 85.92%. It was also confirm that combine the spatial and spectral information can improve the accuracy of tree species classification.
NASA Astrophysics Data System (ADS)
Gibril, Mohamed Barakat A.; Idrees, Mohammed Oludare; Yao, Kouame; Shafri, Helmi Zulhaidi Mohd
2018-01-01
The growing use of optimization for geographic object-based image analysis and the possibility to derive a wide range of information about the image in textual form makes machine learning (data mining) a versatile tool for information extraction from multiple data sources. This paper presents application of data mining for land-cover classification by fusing SPOT-6, RADARSAT-2, and derived dataset. First, the images and other derived indices (normalized difference vegetation index, normalized difference water index, and soil adjusted vegetation index) were combined and subjected to segmentation process with optimal segmentation parameters obtained using combination of spatial and Taguchi statistical optimization. The image objects, which carry all the attributes of the input datasets, were extracted and related to the target land-cover classes through data mining algorithms (decision tree) for classification. To evaluate the performance, the result was compared with two nonparametric classifiers: support vector machine (SVM) and random forest (RF). Furthermore, the decision tree classification result was evaluated against six unoptimized trials segmented using arbitrary parameter combinations. The result shows that the optimized process produces better land-use land-cover classification with overall classification accuracy of 91.79%, 87.25%, and 88.69% for SVM and RF, respectively, while the results of the six unoptimized classifications yield overall accuracy between 84.44% and 88.08%. Higher accuracy of the optimized data mining classification approach compared to the unoptimized results indicates that the optimization process has significant impact on the classification quality.
The wisdom of the commons: ensemble tree classifiers for prostate cancer prognosis.
Koziol, James A; Feng, Anne C; Jia, Zhenyu; Wang, Yipeng; Goodison, Seven; McClelland, Michael; Mercola, Dan
2009-01-01
Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors.
NASA Technical Reports Server (NTRS)
Rignot, Eric; Williams, Cynthia; Way, Jobea; Viereck, Leslie
1993-01-01
A maximum a posteriori Bayesian classifier for multifrequency polarimetric SAR data is used to perform a supervised classification of forest types in the floodplains of Alaska. The image classes include white spruce, balsam poplar, black spruce, alder, non-forests, and open water. The authors investigate the effect on classification accuracy of changing environmental conditions, and of frequency and polarization of the signal. The highest classification accuracy (86 percent correctly classified forest pixels, and 91 percent overall) is obtained combining L- and C-band frequencies fully polarimetric on a date where the forest is just recovering from flooding. The forest map compares favorably with a vegetation map assembled from digitized aerial photos which took five years for completion, and address the state of the forest in 1978, ignoring subsequent fires, changes in the course of the river, clear-cutting of trees, and tree growth. HV-polarization is the most useful polarization at L- and C-band for classification. C-band VV (ERS-1 mode) and L-band HH (J-ERS-1 mode) alone or combined yield unsatisfactory classification accuracies. Additional data acquired in the winter season during thawed and frozen days yield classification accuracies respectively 20 percent and 30 percent lower due to a greater confusion between conifers and deciduous trees. Data acquired at the peak of flooding in May 1991 also yield classification accuracies 10 percent lower because of dominant trunk-ground interactions which mask out finer differences in radar backscatter between tree species. Combination of several of these dates does not improve classification accuracy. For comparison, panchromatic optical data acquired by SPOT in the summer season of 1991 are used to classify the same area. The classification accuracy (78 percent for the forest types and 90 percent if open water is included) is lower than that obtained with AIRSAR although conifers and deciduous trees are better separated due to the presence of leaves on the deciduous trees. Optical data do not separate black spruce and white spruce as well as SAR data, cannot separate alder from balsam poplar, and are of course limited by the frequent cloud cover in the polar regions. Yet, combining SPOT and AIRSAR offers better chances to identify vegetation types independent of ground truth information using a combination of NDVI indexes from SPOT, biomass numbers from AIRSAR, and a segmentation map from either one.
NASA Astrophysics Data System (ADS)
Khan, Asif; Ryoo, Chang-Kyung; Kim, Heung Soo
2017-04-01
This paper presents a comparative study of different classification algorithms for the classification of various types of inter-ply delaminations in smart composite laminates. Improved layerwise theory is used to model delamination at different interfaces along the thickness and longitudinal directions of the smart composite laminate. The input-output data obtained through surface bonded piezoelectric sensor and actuator is analyzed by the system identification algorithm to get the system parameters. The identified parameters for the healthy and delaminated structure are supplied as input data to the classification algorithms. The classification algorithms considered in this study are ZeroR, Classification via regression, Naïve Bayes, Multilayer Perceptron, Sequential Minimal Optimization, Multiclass-Classifier, and Decision tree (J48). The open source software of Waikato Environment for Knowledge Analysis (WEKA) is used to evaluate the classification performance of the classifiers mentioned above via 75-25 holdout and leave-one-sample-out cross-validation regarding classification accuracy, precision, recall, kappa statistic and ROC Area.
NASA Astrophysics Data System (ADS)
Kamal, Muhammad; Johansen, Kasper
2017-10-01
Effective mangrove management requires spatially explicit information of mangrove tree crown map as a basis for ecosystem diversity study and health assessment. Accuracy assessment is an integral part of any mapping activities to measure the effectiveness of the classification approach. In geographic object-based image analysis (GEOBIA) the assessment of the geometric accuracy (shape, symmetry and location) of the created image objects from image segmentation is required. In this study we used an explicit area-based accuracy assessment to measure the degree of similarity between the results of the classification and reference data from different aspects, including overall quality (OQ), user's accuracy (UA), producer's accuracy (PA) and overall accuracy (OA). We developed a rule set to delineate the mangrove tree crown using WorldView-2 pan-sharpened image. The reference map was obtained by visual delineation of the mangrove tree crowns boundaries form a very high-spatial resolution aerial photograph (7.5cm pixel size). Ten random points with a 10 m radius circular buffer were created to calculate the area-based accuracy assessment. The resulting circular polygons were used to clip both the classified image objects and reference map for area comparisons. In this case, the area-based accuracy assessment resulted 64% and 68% for the OQ and OA, respectively. The overall quality of the calculation results shows the class-related area accuracy; which is the area of correctly classified as tree crowns was 64% out of the total area of tree crowns. On the other hand, the overall accuracy of 68% was calculated as the percentage of all correctly classified classes (tree crowns and canopy gaps) in comparison to the total class area (an entire image). Overall, the area-based accuracy assessment was simple to implement and easy to interpret. It also shows explicitly the omission and commission error variations of object boundary delineation with colour coded polygons.
Data Clustering and Evolving Fuzzy Decision Tree for Data Base Classification Problems
NASA Astrophysics Data System (ADS)
Chang, Pei-Chann; Fan, Chin-Yuan; Wang, Yen-Wen
Data base classification suffers from two well known difficulties, i.e., the high dimensionality and non-stationary variations within the large historic data. This paper presents a hybrid classification model by integrating a case based reasoning technique, a Fuzzy Decision Tree (FDT), and Genetic Algorithms (GA) to construct a decision-making system for data classification in various data base applications. The model is major based on the idea that the historic data base can be transformed into a smaller case-base together with a group of fuzzy decision rules. As a result, the model can be more accurately respond to the current data under classifying from the inductions by these smaller cases based fuzzy decision trees. Hit rate is applied as a performance measure and the effectiveness of our proposed model is demonstrated by experimentally compared with other approaches on different data base classification applications. The average hit rate of our proposed model is the highest among others.
Online adaptive decision trees: pattern classification and function approximation.
Basak, Jayanta
2006-09-01
Recently we have shown that decision trees can be trained in the online adaptive (OADT) mode (Basak, 2004), leading to better generalization score. OADTs were bottlenecked by the fact that they are able to handle only two-class classification tasks with a given structure. In this article, we provide an architecture based on OADT, ExOADT, which can handle multiclass classification tasks and is able to perform function approximation. ExOADT is structurally similar to OADT extended with a regression layer. We also show that ExOADT is capable not only of adapting the local decision hyperplanes in the nonterminal nodes but also has the potential of smoothly changing the structure of the tree depending on the data samples. We provide the learning rules based on steepest gradient descent for the new model ExOADT. Experimentally we demonstrate the effectiveness of ExOADT in the pattern classification and function approximation tasks. Finally, we briefly discuss the relationship of ExOADT with other classification models.
Kajungu, Dan K; Selemani, Majige; Masanja, Irene; Baraka, Amuri; Njozi, Mustafa; Khatib, Rashid; Dodoo, Alexander N; Binka, Fred; Macq, Jean; D'Alessandro, Umberto; Speybroeck, Niko
2012-09-05
Drug prescription practices depend on several factors related to the patient, health worker and health facilities. A better understanding of the factors influencing prescription patterns is essential to develop strategies to mitigate the negative consequences associated with poor practices in both the public and private sectors. A cross-sectional study was conducted in rural Tanzania among patients attending health facilities, and health workers. Patients, health workers and health facilities-related factors with the potential to influence drug prescription patterns were used to build a model of key predictors. Standard data mining methodology of classification tree analysis was used to define the importance of the different factors on prescription patterns. This analysis included 1,470 patients and 71 health workers practicing in 30 health facilities. Patients were mostly treated in dispensaries. Twenty two variables were used to construct two classification tree models: one for polypharmacy (prescription of ≥3 drugs) on a single clinic visit and one for co-prescription of artemether-lumefantrine (AL) with antibiotics. The most important predictor of polypharmacy was the diagnosis of several illnesses. Polypharmacy was also associated with little or no supervision of the health workers, administration of AL and private facilities. Co-prescription of AL with antibiotics was more frequent in children under five years of age and the other important predictors were transmission season, mode of diagnosis and the location of the health facility. Standard data mining methodology is an easy-to-implement analytical approach that can be useful for decision-making. Polypharmacy is mainly due to the diagnosis of multiple illnesses.
Using Classification Trees to Predict Alumni Giving for Higher Education
ERIC Educational Resources Information Center
Weerts, David J.; Ronca, Justin M.
2009-01-01
As the relative level of public support for higher education declines, colleges and universities aim to maximize alumni-giving to keep their programs competitive. Anchored in a utility maximization framework, this study employs the classification and regression tree methodology to examine characteristics of alumni donors and non-donors at a…
David L. Sonderman; Robert L. Brisbin
1978-01-01
Forest managers have no objective way to determine the relative value of culturally treated forest stands in terms of product potential. This paper describes the first step in the development of a quality classification system based on the measurement of individual tree characteristics for young hardwood stands.
Zuo, Yu; Xie, Wenfang; Pang, Yue; Li, Tiesong; Li, Qingwei; Li, Yingying
2017-01-01
The composition of the bacterial communities in the hindgut contents of Lampetrs japonica was surveyed by Illumina MiSeq of the 16S rRNA gene. An average of 32385 optimized reads was obtained from three samples. The rarefaction curve based on the operational taxonomic units tended to approach the asymptote. The rank abundance curve representing the species richness and evenness was calculated. The composition of microbe in six classification levels was also analyzed. Top 20 members in genera level were displayed as the classification tree. The abundance of microorganisms in different individuals was displayed as the pie charts at the branch nodes in the classification tree. The differences of top 50 genera in abundance between individuals of lamprey are displayed as a heatmap. The pairwise comparison of bacterial taxa abundance revealed that there are no significant differences of gut microbiota between three individuals of lamprey at a given rarefied depth. Also, the gut microbiota derived from L. japonica displays little similarity with other aquatic organism of Vertebrata after UPGMA analysis. The metabolic function of the bacterial communities was predicted through KEGG analysis. This study represents the first analysis of the bacterial community composition in the gut content of L. japonica. The investigation of the gut microbiota associated with L. japonica will broaden our understanding of this unique organism.
An automated approach to the design of decision tree classifiers
NASA Technical Reports Server (NTRS)
Argentiero, P.; Chin, P.; Beaudet, P.
1980-01-01
The classification of large dimensional data sets arising from the merging of remote sensing data with more traditional forms of ancillary data is considered. Decision tree classification, a popular approach to the problem, is characterized by the property that samples are subjected to a sequence of decision rules before they are assigned to a unique class. An automated technique for effective decision tree design which relies only on apriori statistics is presented. This procedure utilizes a set of two dimensional canonical transforms and Bayes table look-up decision rules. An optimal design at each node is derived based on the associated decision table. A procedure for computing the global probability of correct classfication is also provided. An example is given in which class statistics obtained from an actual LANDSAT scene are used as input to the program. The resulting decision tree design has an associated probability of correct classification of .76 compared to the theoretically optimum .79 probability of correct classification associated with a full dimensional Bayes classifier. Recommendations for future research are included.
Using decision tree analysis to identify risk factors for relapse to smoking
Piper, Megan E.; Loh, Wei-Yin; Smith, Stevens S.; Japuntich, Sandra J.; Baker, Timothy B.
2010-01-01
This research used classification tree analysis and logistic regression models to identify risk factors related to short- and long-term abstinence. Baseline and cessation outcome data from two smoking cessation trials, conducted from 2001 to 2002, in two Midwestern urban areas, were analyzed. There were 928 participants (53.1% women, 81.8% white) with complete data. Both analyses suggest that relapse risk is produced by interactions of risk factors and that early and late cessation outcomes reflect different vulnerability factors. The results illustrate the dynamic nature of relapse risk and suggest the importance of efficient modeling of interactions in relapse prediction. PMID:20397871
Voice based gender classification using machine learning
NASA Astrophysics Data System (ADS)
Raahul, A.; Sapthagiri, R.; Pankaj, K.; Vijayarajan, V.
2017-11-01
Gender identification is one of the major problem speech analysis today. Tracing the gender from acoustic data i.e., pitch, median, frequency etc. Machine learning gives promising results for classification problem in all the research domains. There are several performance metrics to evaluate algorithms of an area. Our Comparative model algorithm for evaluating 5 different machine learning algorithms based on eight different metrics in gender classification from acoustic data. Agenda is to identify gender, with five different algorithms: Linear Discriminant Analysis (LDA), K-Nearest Neighbour (KNN), Classification and Regression Trees (CART), Random Forest (RF), and Support Vector Machine (SVM) on basis of eight different metrics. The main parameter in evaluating any algorithms is its performance. Misclassification rate must be less in classification problems, which says that the accuracy rate must be high. Location and gender of the person have become very crucial in economic markets in the form of AdSense. Here with this comparative model algorithm, we are trying to assess the different ML algorithms and find the best fit for gender classification of acoustic data.
ERIC Educational Resources Information Center
Snowden, Jessica A.; Leon, Scott C.; Bryant, Fred B.; Lyons, John S.
2007-01-01
This study explored clinical and nonclinical predictors of inpatient hospital admission decisions across a sample of children in foster care over 4 years (N = 13,245). Forty-eight percent of participants were female and the mean age was 13.4 (SD = 3.5 years). Optimal data analysis (Yarnold & Soltysik, 2005) was used to construct a nonlinear…
NASA Technical Reports Server (NTRS)
Lure, Y. M. Fleming; Grody, Norman C.; Chiou, Y. S. Peter; Yeh, H. Y. Michael
1993-01-01
A data fusion system with artificial neural networks (ANN) is used for fast and accurate classification of five earth surface conditions and surface changes, based on seven SSMI multichannel microwave satellite measurements. The measurements include brightness temperatures at 19, 22, 37, and 85 GHz at both H and V polarizations (only V at 22 GHz). The seven channel measurements are processed through a convolution computation such that all measurements are located at same grid. Five surface classes including non-scattering surface, precipitation over land, over ocean, snow, and desert are identified from ground-truth observations. The system processes sensory data in three consecutive phases: (1) pre-processing to extract feature vectors and enhance separability among detected classes; (2) preliminary classification of Earth surface patterns using two separate and parallely acting classifiers: back-propagation neural network and binary decision tree classifiers; and (3) data fusion of results from preliminary classifiers to obtain the optimal performance in overall classification. Both the binary decision tree classifier and the fusion processing centers are implemented by neural network architectures. The fusion system configuration is a hierarchical neural network architecture, in which each functional neural net will handle different processing phases in a pipelined fashion. There is a total of around 13,500 samples for this analysis, of which 4 percent are used as the training set and 96 percent as the testing set. After training, this classification system is able to bring up the detection accuracy to 94 percent compared with 88 percent for back-propagation artificial neural networks and 80 percent for binary decision tree classifiers. The neural network data fusion classification is currently under progress to be integrated in an image processing system at NOAA and to be implemented in a prototype of a massively parallel and dynamically reconfigurable Modular Neural Ring (MNR).
Keller, Thomas E.; Blakeslee, Jennifer E.; Lemon, Stephenie C.; Courtney, Mark E.
2010-01-01
Objective: Distinctive combinations of factors are likely to be associated with serious alcohol problems among adolescents about to emancipate from the foster care system and face the difficult transition to independent adulthood. This study identifies particular subpopulations of older foster youths that differ markedly in the probability of a lifetime diagnosis for alcohol abuse or dependence. Method: Classification and regression tree (CART) analysis was applied to a large, representative sample (N = 732) of individuals, 17 years of age or older, placed in the child welfare system for more than 1 year. CART evaluated two exploratory sets of variables for optimal splits into groups distinguished from each other on the criterion of lifetime alcohol-use disorder diagnosis. Results: Each classification tree yielded four terminal groups with different rates of lifetime alcohol-use disorder diagnosis. Notable groups in the first tree included one characterized by high levels of both delinquency and violence exposure (53% diagnosed) and another that featured lower delinquency but an independent-living placement (21% diagnosed). Notable groups in the second tree included African American adolescents (only 8% diagnosed), White adolescents not close to caregivers (40% diagnosed), and White adolescents closer to caregivers but with a history of psychological abuse (36% diagnosed). Conclusions: Analyses incorporating variables that could be comorbid with or symptomatic of alcohol problems, such as delinquency, yielded classifications potentially useful for assessment and service planning. Analyses without such variables identified other factors, such as quality of caregiving relationships and maltreatment, associated with serious alcohol problems, suggesting opportunities for prevention or intervention. PMID:20946738
NASA Astrophysics Data System (ADS)
Naidoo, L.; Cho, M. A.; Mathieu, R.; Asner, G.
2012-04-01
The accurate classification and mapping of individual trees at species level in the savanna ecosystem can provide numerous benefits for the managerial authorities. Such benefits include the mapping of economically useful tree species, which are a key source of food production and fuel wood for the local communities, and of problematic alien invasive and bush encroaching species, which can threaten the integrity of the environment and livelihoods of the local communities. Species level mapping is particularly challenging in African savannas which are complex, heterogeneous, and open environments with high intra-species spectral variability due to differences in geology, topography, rainfall, herbivory and human impacts within relatively short distances. Savanna vegetation are also highly irregular in canopy and crown shape, height and other structural dimensions with a combination of open grassland patches and dense woody thicket - a stark contrast to the more homogeneous forest vegetation. This study classified eight common savanna tree species in the Greater Kruger National Park region, South Africa, using a combination of hyperspectral and Light Detection and Ranging (LiDAR)-derived structural parameters, in the form of seven predictor datasets, in an automated Random Forest modelling approach. The most important predictors, which were found to play an important role in the different classification models and contributed to the success of the hybrid dataset model when combined, were species tree height; NDVI; the chlorophyll b wavelength (466 nm) and a selection of raw, continuum removed and Spectral Angle Mapper (SAM) bands. It was also concluded that the hybrid predictor dataset Random Forest model yielded the highest classification accuracy and prediction success for the eight savanna tree species with an overall classification accuracy of 87.68% and KHAT value of 0.843.
Kassian, Alexei
2015-01-01
A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different algorithms: traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). Due to certain reasons (first of all, high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is a perfect testing area for appraisal of phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced the trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies.
Kassian, Alexei
2015-01-01
A lexicostatistical classification is proposed for 20 languages and dialects of the Lezgian group of the North Caucasian family, based on meticulously compiled 110-item wordlists, published as part of the Global Lexicostatistical Database project. The lexical data have been subsequently analyzed with the aid of the principal phylogenetic methods, both distance-based and character-based: Starling neighbor joining (StarlingNJ), Neighbor joining (NJ), Unweighted pair group method with arithmetic mean (UPGMA), Bayesian Markov chain Monte Carlo (MCMC), Unweighted maximum parsimony (UMP). Cognation indexes within the input matrix were marked by two different algorithms: traditional etymological approach and phonetic similarity, i.e., the automatic method of consonant classes (Levenshtein distances). Due to certain reasons (first of all, high lexicographic quality of the wordlists and a consensus about the Lezgian phylogeny among Caucasologists), the Lezgian database is a perfect testing area for appraisal of phylogenetic methods. For the etymology-based input matrix, all the phylogenetic methods, with the possible exception of UMP, have yielded trees that are sufficiently compatible with each other to generate a consensus phylogenetic tree of the Lezgian lects. The obtained consensus tree agrees with the traditional expert classification as well as some of the previously proposed formal classifications of this linguistic group. Contrary to theoretical expectations, the UMP method has suggested the least plausible tree of all. In the case of the phonetic similarity-based input matrix, the distance-based methods (StarlingNJ, NJ, UPGMA) have produced the trees that are rather close to the consensus etymology-based tree and the traditional expert classification, whereas the character-based methods (Bayesian MCMC, UMP) have yielded less likely topologies. PMID:25719456
Probabilistic grammatical model for helix‐helix contact site classification
2013-01-01
Background Hidden Markov Models power many state‐of‐the‐art tools in the field of protein bioinformatics. While excelling in their tasks, these methods of protein analysis do not convey directly information on medium‐ and long‐range residue‐residue interactions. This requires an expressive power of at least context‐free grammars. However, application of more powerful grammar formalisms to protein analysis has been surprisingly limited. Results In this work, we present a probabilistic grammatical framework for problem‐specific protein languages and apply it to classification of transmembrane helix‐helix pairs configurations. The core of the model consists of a probabilistic context‐free grammar, automatically inferred by a genetic algorithm from only a generic set of expert‐based rules and positive training samples. The model was applied to produce sequence based descriptors of four classes of transmembrane helix‐helix contact site configurations. The highest performance of the classifiers reached AUCROC of 0.70. The analysis of grammar parse trees revealed the ability of representing structural features of helix‐helix contact sites. Conclusions We demonstrated that our probabilistic context‐free framework for analysis of protein sequences outperforms the state of the art in the task of helix‐helix contact site classification. However, this is achieved without necessarily requiring modeling long range dependencies between interacting residues. A significant feature of our approach is that grammar rules and parse trees are human‐readable. Thus they could provide biologically meaningful information for molecular biologists. PMID:24350601
2011-01-01
Background The avian family Cettiidae, including the genera Cettia, Urosphena, Tesia, Abroscopus and Tickellia and Orthotomus cucullatus, has recently been proposed based on analysis of a small number of loci and species. The close relationship of most of these taxa was unexpected, and called for a comprehensive study based on multiple loci and dense taxon sampling. In the present study, we infer the relationships of all except one of the species in this family using one mitochondrial and three nuclear loci. We use traditional gene tree methods (Bayesian inference, maximum likelihood bootstrapping, parsimony bootstrapping), as well as a recently developed Bayesian species tree approach (*BEAST) that accounts for lineage sorting processes that might produce discordance between gene trees. We also analyse mitochondrial DNA for a larger sample, comprising multiple individuals and a large number of subspecies of polytypic species. Results There are many topological incongruences among the single-locus trees, although none of these is strongly supported. The multi-locus tree inferred using concatenated sequences and the species tree agree well with each other, and are overall well resolved and well supported by the data. The main discrepancy between these trees concerns the most basal split. Both methods infer the genus Cettia to be highly non-monophyletic, as it is scattered across the entire family tree. Deep intraspecific divergences are revealed, and one or two species and one subspecies are inferred to be non-monophyletic (differences between methods). Conclusions The molecular phylogeny presented here is strongly inconsistent with the traditional, morphology-based classification. The remarkably high degree of non-monophyly in the genus Cettia is likely to be one of the most extraordinary examples of misconceived relationships in an avian genus. The phylogeny suggests instances of parallel evolution, as well as highly unequal rates of morphological divergence in different lineages. This complex morphological evolution apparently misled earlier taxonomists. These results underscore the well-known but still often neglected problem of basing classifications on overall morphological similarity. Based on the molecular data, a revised taxonomy is proposed. Although the traditional and species tree methods inferred much the same tree in the present study, the assumption by species tree methods that all species are monophyletic is a limitation in these methods, as some currently recognized species might have more complex histories. PMID:22142197
Joshi, Vinayak S; Reinhardt, Joseph M; Garvin, Mona K; Abramoff, Michael D
2014-01-01
The separation of the retinal vessel network into distinct arterial and venous vessel trees is of high interest. We propose an automated method for identification and separation of retinal vessel trees in a retinal color image by converting a vessel segmentation image into a vessel segment map and identifying the individual vessel trees by graph search. Orientation, width, and intensity of each vessel segment are utilized to find the optimal graph of vessel segments. The separated vessel trees are labeled as primary vessel or branches. We utilize the separated vessel trees for arterial-venous (AV) classification, based on the color properties of the vessels in each tree graph. We applied our approach to a dataset of 50 fundus images from 50 subjects. The proposed method resulted in an accuracy of 91.44% correctly classified vessel pixels as either artery or vein. The accuracy of correctly classified major vessel segments was 96.42%.
Decision tree methods: applications for classification and prediction.
Song, Yan-Yan; Lu, Ying
2015-04-25
Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets. Using the training dataset to build a decision tree model and a validation dataset to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms used to develop decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
Classification of the Gabon SAR Mosaic Using a Wavelet Based Rule Classifier
NASA Technical Reports Server (NTRS)
Simard, Marc; Saatchi, Sasan; DeGrandi, Gianfranco
2000-01-01
A method is developed for semi-automated classification of SAR images of the tropical forest. Information is extracted using the wavelet transform (WT). The transform allows for extraction of structural information in the image as a function of scale. In order to classify the SAR image, a Desicion Tree Classifier is used. The method of pruning is used to optimize classification rate versus tree size. The results give explicit insight on the type of information useful for a given class.
Ozge, C; Toros, F; Bayramkaya, E; Camdeviren, H; Sasmaz, T
2006-08-01
The purpose of this study is to evaluate the most important sociodemographic factors on smoking status of high school students using a broad randomised epidemiological survey. Using in-class, self administered questionnaire about their sociodemographic variables and smoking behaviour, a representative sample of total 3304 students of preparatory, 9th, 10th, and 11th grades, from 22 randomly selected schools of Mersin, were evaluated and discriminative factors have been determined using appropriate statistics. In addition to binary logistic regression analysis, the study evaluated combined effects of these factors using classification and regression tree methodology, as a new statistical method. The data showed that 38% of the students reported lifetime smoking and 16.9% of them reported current smoking with a male predominancy and increasing prevalence by age. Second hand smoking was reported at a 74.3% frequency with father predominance (56.6%). The significantly important factors that affect current smoking in these age groups were increased by household size, late birth rank, certain school types, low academic performance, increased second hand smoking, and stress (especially reported as separation from a close friend or because of violence at home). Classification and regression tree methodology showed the importance of some neglected sociodemographic factors with a good classification capacity. It was concluded that, as closely related with sociocultural factors, smoking was a common problem in this young population, generating important academic and social burden in youth life and with increasing data about this behaviour and using new statistical methods, effective coping strategies could be composed.
The wisdom of the commons: ensemble tree classifiers for prostate cancer prognosis
Koziol, James A.; Feng, Anne C.; Jia, Zhenyu; Wang, Yipeng; Goodison, Seven; McClelland, Michael; Mercola, Dan
2009-01-01
Motivation: Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. Results: Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors. Contact: dmercola@uci.edu PMID:18628288
An automated approach to the design of decision tree classifiers
NASA Technical Reports Server (NTRS)
Argentiero, P.; Chin, R.; Beaudet, P.
1982-01-01
An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.
Chen, Hsiu-Chin; Bennett, Sean
2016-08-01
Little evidence shows the use of decision-tree algorithms in identifying predictors and analyzing their associations with pass rates for the NCLEX-RN(®) in associate degree nursing students. This longitudinal and retrospective cohort study investigated whether a decision-tree algorithm could be used to develop an accurate prediction model for the students' passing or failing the NCLEX-RN. This study used archived data from 453 associate degree nursing students in a selected program. The chi-squared automatic interaction detection analysis of the decision trees module was used to examine the effect of the collected predictors on passing/failing the NCLEX-RN. The actual percentage scores of Assessment Technologies Institute®'s RN Comprehensive Predictor(®) accurately identified students at risk of failing. The classification model correctly classified 92.7% of the students for passing. This study applied the decision-tree model to analyze a sequence database for developing a prediction model for early remediation in preparation for the NCLEXRN. [J Nurs Educ. 2016;55(8):454-457.]. Copyright 2016, SLACK Incorporated.
USDA-ARS?s Scientific Manuscript database
Risk factors for obesity and weight gain are typically evaluated individually while "adjusting for" the influence of other confounding factors, and few studies, if any, have created risk profiles by clustering risk factors. We identified subgroups of postmenopausal women homogeneous in their cluster...
Estimating forest characteristics using NAIP imagery and ArcObjects
John S Hogland; Nathaniel M. Anderson; Woodam Chung; Lucas Wells
2014-01-01
Detailed, accurate, efficient, and inexpensive methods of estimating basal area, trees, and aboveground biomass per acre across broad extents are needed to effectively manage forests. In this study we present such a methodology using readily available National Agriculture Imagery Program imagery, Forest Inventory Analysis samples, a two stage classification and...
Many units in public housing or other low-income urban dwellings may have elevated pesticide residues, given recurring infestation, but it would be logistically and economically infeasible to sample a large number of units to identify highly exposed households to design interven...
VanEngelsdorp, Dennis; Speybroeck, Niko; Evans, Jay D; Nguyen, Bach Kim; Mullin, Chris; Frazier, Maryann; Frazier, Jim; Cox-Foster, Diana; Chen, Yanping; Tarpy, David R; Haubruge, Eric; Pettis, Jeffrey S; Saegerman, Claude
2010-10-01
Colony collapse disorder (CCD), a syndrome whose defining trait is the rapid loss of adult worker honey bees, Apis mellifera L., is thought to be responsible for a minority of the large overwintering losses experienced by U.S. beekeepers since the winter 2006-2007. Using the same data set developed to perform a monofactorial analysis (PloS ONE 4: e6481, 2009), we conducted a classification and regression tree (CART) analysis in an attempt to better understand the relative importance and interrelations among different risk variables in explaining CCD. Fifty-five exploratory variables were used to construct two CART models: one model with and one model without a cost of misclassifying a CCD-diagnosed colony as a non-CCD colony. The resulting model tree that permitted for misclassification had a sensitivity and specificity of 85 and 74%, respectively. Although factors measuring colony stress (e.g., adult bee physiological measures, such as fluctuating asymmetry or mass of head) were important discriminating values, six of the 19 variables having the greatest discriminatory value were pesticide levels in different hive matrices. Notably, coumaphos levels in brood (a miticide commonly used by beekeepers) had the highest discriminatory value and were highest in control (healthy) colonies. Our CART analysis provides evidence that CCD is probably the result of several factors acting in concert, making afflicted colonies more susceptible to disease. This analysis highlights several areas that warrant further attention, including the effect of sublethal pesticide exposure on pathogen prevalence and the role of variability in bee tolerance to pesticides on colony survivorship.
Spectral analysis of white ash response to emerald ash borer infestations
NASA Astrophysics Data System (ADS)
Calandra, Laura
The emerald ash borer (EAB) (Agrilus planipennis Fairmaire) is an invasive insect that has killed over 50 million ash trees in the US. The goal of this research was to establish a method to identify ash trees infested with EAB using remote sensing techniques at the leaf-level and tree crown level. First, a field-based study at the leaf-level used the range of spectral bands from the WorldView-2 sensor to determine if there was a significant difference between EAB-infested white ash (Fraxinus americana) and healthy leaves. Binary logistic regression models were developed using individual and combinations of wavelengths; the most successful model included 545 and 950 nm bands. The second half of this research employed imagery to identify healthy and EAB-infested trees, comparing pixel- and object-based methods by applying an unsupervised classification approach and a tree crown delineation algorithm, respectively. The pixel-based models attained the highest overall accuracies.
An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees.
Liang, Ying; Liao, Bo; Zhu, Wen
2017-01-01
Tumourigenesis is a mutation accumulation process, which is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution through genetic variation data. Copy number variation (CNV) is the major genetic marker of the genome with more genes, disease loci, and functional elements involved. Fluorescence in situ hybridization (FISH) accurately measures multiple gene copy number of hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic tree based on FISH platform. The topology analysis of tumor progression tree shows that the pathway of tumor subcell expansion varies greatly during different stages of tumor formation. And the classification experiment shows that tree-based features are better than data-based features in distinguishing tumor. The constructed phylogenetic trees have great performance in characterizing tumor development process, which outperforms other similar algorithms.
NASA Technical Reports Server (NTRS)
Fomenkova, M. N.
1997-01-01
The computer-intensive project consisted of the analysis and synthesis of existing data on composition of comet Halley dust particles. The main objective was to obtain a complete inventory of sulfur containing compounds in the comet Halley dust by building upon the existing classification of organic and inorganic compounds and applying a variety of statistical techniques for cluster and cross-correlational analyses. A student hired for this project wrote and tested the software to perform cluster analysis. The following tasks were carried out: (1) selecting the data from existing database for the proposed project; (2) finding access to a standard library of statistical routines for cluster analysis; (3) reformatting the data as necessary for input into the library routines; (4) performing cluster analysis and constructing hierarchical cluster trees using three methods to define the proximity of clusters; (5) presenting the output results in different formats to facilitate the interpretation of the obtained cluster trees; (6) selecting groups of data points common for all three trees as stable clusters. We have also considered the chemistry of sulfur in inorganic compounds.
NASA Astrophysics Data System (ADS)
Khodasevich, M. A.; Sinitsyn, G. V.; Skorbanova, E. A.; Rogovaya, M. V.; Kambur, E. I.; Aseev, V. A.
2016-06-01
Analysis of multiparametric data on transmission spectra of 24 divins (Moldovan cognacs) in the 190-2600 nm range allows identification of outliers and their removal from a sample under study in the following consideration. The principal component analysis and classification tree with a single-rank predictor constructed in the 2D space of principal components allow classification of divin manufacturers. It is shown that the accuracy of syringaldehyde, ethyl acetate, vanillin, and gallic acid concentrations in divins calculated with the regression to latent structures depends on the sample volume and is 3, 6, 16, and 20%, respectively, which is acceptable for the application.
Ebolavirus Classification Based on Natural Vectors
Zheng, Hui; Yin, Changchuan; Hoang, Tung; He, Rong Lucy; Yang, Jie
2015-01-01
According to the WHO, ebolaviruses have resulted in 8818 human deaths in West Africa as of January 2015. To better understand the evolutionary relationship of the ebolaviruses and infer virulence from the relationship, we applied the alignment-free natural vector method to classify the newest ebolaviruses. The dataset includes three new Guinea viruses as well as 99 viruses from Sierra Leone. For the viruses of the family of Filoviridae, both genus label classification and species label classification achieve an accuracy rate of 100%. We represented the relationships among Filoviridae viruses by Unweighted Pair Group Method with Arithmetic Mean (UPGMA) phylogenetic trees and found that the filoviruses can be separated well by three genera. We performed the phylogenetic analysis on the relationship among different species of Ebolavirus by their coding-complete genomes and seven viral protein genes (glycoprotein [GP], nucleoprotein [NP], VP24, VP30, VP35, VP40, and RNA polymerase [L]). The topology of the phylogenetic tree by the viral protein VP24 shows consistency with the variations of virulence of ebolaviruses. The result suggests that VP24 be a pharmaceutical target for treating or preventing ebolaviruses. PMID:25803489
Du, Hua Qiang; Sun, Xiao Yan; Han, Ning; Mao, Fang Jie
2017-10-01
By synergistically using the object-based image analysis (OBIA) and the classification and regression tree (CART) methods, the distribution information, the indexes (including diameter at breast, tree height, and crown closure), and the aboveground carbon storage (AGC) of moso bamboo forest in Shanchuan Town, Anji County, Zhejiang Province were investigated. The results showed that the moso bamboo forest could be accurately delineated by integrating the multi-scale ima ge segmentation in OBIA technique and CART, which connected the image objects at various scales, with a pretty good producer's accuracy of 89.1%. The investigation of indexes estimated by regression tree model that was constructed based on the features extracted from the image objects reached normal or better accuracy, in which the crown closure model archived the best estimating accuracy of 67.9%. The estimating accuracy of diameter at breast and tree height was relatively low, which was consistent with conclusion that estimating diameter at breast and tree height using optical remote sensing could not achieve satisfactory results. Estimation of AGC reached relatively high accuracy, and accuracy of the region of high value achieved above 80%.
The application of data mining techniques to oral cancer prognosis.
Tseng, Wan-Ting; Chiang, Wei-Fan; Liu, Shyun-Yeu; Roan, Jinsheng; Lin, Chun-Nan
2015-05-01
This study adopted an integrated procedure that combines the clustering and classification features of data mining technology to determine the differences between the symptoms shown in past cases where patients died from or survived oral cancer. Two data mining tools, namely decision tree and artificial neural network, were used to analyze the historical cases of oral cancer, and their performance was compared with that of logistic regression, the popular statistical analysis tool. Both decision tree and artificial neural network models showed superiority to the traditional statistical model. However, as to clinician, the trees created by the decision tree models are relatively easier to interpret compared to that of the artificial neural network models. Cluster analysis also discovers that those stage 4 patients whose also possess the following four characteristics are having an extremely low survival rate: pN is N2b, level of RLNM is level I-III, AJCC-T is T4, and cells mutate situation (G) is moderate.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ruan, D; Shao, W; Low, D
Purpose: To evaluate and test the hypothesis that plan quality may be systematically affected by treatment delivery techniques and target-tocritical structure geometric relationship in radiotherapy for brain tumor. Methods: Thirty-four consecutive brain tumor patients treated between 2011–2014 were analyzed. Among this cohort, 10 were planned with 3DCRT, 11 with RadipArc, and 13 with helical IMRT on TomoTherapy. The selected dosimetric endpoints (i.e., PTV V100, maximum brainstem/chiasm/ optic nerve doses) were considered as a vector in a highdimensional space. A Pareto analysis was performed to identify the subset of Pareto-efficient plans.The geometric relationships, specifically the overlapping volume and centroid-of-mass distance betweenmore » each critical structure to the PTV were extracted as potential geometric features. The classification-tree analyses were repeated using these geometric features with and without the treatment modality as an additional categorical predictor. In both scenarios, the dominant features to prognosticate the Pareto membership were identified and the tree structures to provide optimal inference were recorded. The classification performance was further analyzed to determine the role of treatment modality in affecting plan quality. Results: Seven Pareto-efficient plans were identified based on dosimetric endpoints (3 from 3DCRT, 3 from RapicArc, 1 from Tomo), which implies that the evaluated treatment modality may have a minor influence on plan quality. Classification trees with/without the treatment modality as a predictor both achieved accuracy of 88.2%: with 100% sensitivity and 87.1% specificity for the former, and 66.7% sensitivity and 96.0% specificity for the latter. The coincidence of accuracy from both analyses further indicates no-to-weak dependence of plan quality on treatment modality. Both analyses have identified the brainstem to PTV distance as the primary predictive feature for Pareto-efficiency. Conclusion: Pareto evaluation and classification-tree analyses have indicated that plan quality depends strongly on geometry for brain tumor, specifically PTV-tobrain-stem-distance but minimally on treatment modality.« less
Land use/cover classification in the Brazilian Amazon using satellite images.
Lu, Dengsheng; Batistella, Mateus; Li, Guiying; Moran, Emilio; Hetrick, Scott; Freitas, Corina da Costa; Dutra, Luciano Vieira; Sant'anna, Sidnei João Siqueira
2012-09-01
Land use/cover classification is one of the most important applications in remote sensing. However, mapping accurate land use/cover spatial distribution is a challenge, particularly in moist tropical regions, due to the complex biophysical environment and limitations of remote sensing data per se. This paper reviews experiments related to land use/cover classification in the Brazilian Amazon for a decade. Through comprehensive analysis of the classification results, it is concluded that spatial information inherent in remote sensing data plays an essential role in improving land use/cover classification. Incorporation of suitable textural images into multispectral bands and use of segmentation-based method are valuable ways to improve land use/cover classification, especially for high spatial resolution images. Data fusion of multi-resolution images within optical sensor data is vital for visual interpretation, but may not improve classification performance. In contrast, integration of optical and radar data did improve classification performance when the proper data fusion method was used. Of the classification algorithms available, the maximum likelihood classifier is still an important method for providing reasonably good accuracy, but nonparametric algorithms, such as classification tree analysis, has the potential to provide better results. However, they often require more time to achieve parametric optimization. Proper use of hierarchical-based methods is fundamental for developing accurate land use/cover classification, mainly from historical remotely sensed data.
Land use/cover classification in the Brazilian Amazon using satellite images
Lu, Dengsheng; Batistella, Mateus; Li, Guiying; Moran, Emilio; Hetrick, Scott; Freitas, Corina da Costa; Dutra, Luciano Vieira; Sant’Anna, Sidnei João Siqueira
2013-01-01
Land use/cover classification is one of the most important applications in remote sensing. However, mapping accurate land use/cover spatial distribution is a challenge, particularly in moist tropical regions, due to the complex biophysical environment and limitations of remote sensing data per se. This paper reviews experiments related to land use/cover classification in the Brazilian Amazon for a decade. Through comprehensive analysis of the classification results, it is concluded that spatial information inherent in remote sensing data plays an essential role in improving land use/cover classification. Incorporation of suitable textural images into multispectral bands and use of segmentation-based method are valuable ways to improve land use/cover classification, especially for high spatial resolution images. Data fusion of multi-resolution images within optical sensor data is vital for visual interpretation, but may not improve classification performance. In contrast, integration of optical and radar data did improve classification performance when the proper data fusion method was used. Of the classification algorithms available, the maximum likelihood classifier is still an important method for providing reasonably good accuracy, but nonparametric algorithms, such as classification tree analysis, has the potential to provide better results. However, they often require more time to achieve parametric optimization. Proper use of hierarchical-based methods is fundamental for developing accurate land use/cover classification, mainly from historical remotely sensed data. PMID:24353353
Kambhampati, Satya Samyukta; Singh, Vishal; Manikandan, M Sabarimalai; Ramkumar, Barathram
2015-08-01
In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from the acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers in effectively detecting and classifying different types of fall and non-fall events. It was discovered that the first level of the proposed hierarchical decision tree algorithm implements fall detection using fifth-order cumulants and support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve optimal detection and classification rates of above 95%, as well as the lowest false alarm rate of 1.03%.
A fuzzy decision tree for fault classification.
Zio, Enrico; Baraldi, Piero; Popescu, Irina C
2008-02-01
In plant accident management, the control room operators are required to identify the causes of the accident, based on the different patterns of evolution of the monitored process variables thereby developing. This task is often quite challenging, given the large number of process parameters monitored and the intense emotional states under which it is performed. To aid the operators, various techniques of fault classification have been engineered. An important requirement for their practical application is the physical interpretability of the relationships among the process variables underpinning the fault classification. In this view, the present work propounds a fuzzy approach to fault classification, which relies on fuzzy if-then rules inferred from the clustering of available preclassified signal data, which are then organized in a logical and transparent decision tree structure. The advantages offered by the proposed approach are precisely that a transparent fault classification model is mined out of the signal data and that the underlying physical relationships among the process variables are easily interpretable as linguistic if-then rules that can be explicitly visualized in the decision tree structure. The approach is applied to a case study regarding the classification of simulated faults in the feedwater system of a boiling water reactor.
NASA Astrophysics Data System (ADS)
Ko, C.; Sohn, G.; Remmel, T. K.
2012-07-01
We present a comparative study between two different approaches for tree genera classification using descriptors derived from tree geometry and those derived from the vertical profile analysis of LiDAR point data. The different methods provide two perspectives for processing LiDAR point clouds for tree genera identification. The geometric perspective analyzes individual tree crowns in relation to valuable information related to characteristics of clusters and line segments derived within crowns and overall tree shapes to highlight the spatial distribution of LiDAR points within the crown. Conversely, analyzing vertical profiles retrieves information about the point distributions with respect to height percentiles; this perspective emphasizes of the importance that point distributions at specific heights express, accommodating for the decreased point density with respect to depth of canopy penetration by LiDAR pulses. The targeted species include white birch, maple, oak, poplar, white pine and jack pine at a study site northeast of Sault Ste. Marie, Ontario, Canada.
Conterminous U.S. and Alaska Forest Type Mapping Using Forest Inventory and Analysis Data
B. Ruefenacht; M.V. Finco; M.D. Nelson; R. Czaplewski; E.H. Helmer; J. A. Blackard; G.R. Holden; A.J. Lister; D. Salajanu; D. Weyermann; K. Winterberger
2008-01-01
Classification-trees were used to model forest type groups and forest types for the conterminous United States and Alaska. The predictor data were a geospatial data set with a spatial resolution of 250 m developed by the U.S. Department of Agriculture Forest Service (USFS). The response data were plot data from the USFS Forest Inventory and Analysis program. Overall...
Study on Ecological Risk Assessment of Guangxi Coastal Zone Based on 3s Technology
NASA Astrophysics Data System (ADS)
Zhong, Z.; Luo, H.; Ling, Z. Y.; Huang, Y.; Ning, W. Y.; Tang, Y. B.; Shao, G. Z.
2018-05-01
This paper takes Guangxi coastal zone as the study area, following the standards of land use type, divides the coastal zone of ecological landscape into seven kinds of natural wetland landscape types such as woodland, farmland, grassland, water, urban land and wetlands. Using TM data of 2000-2015 such 15 years, with the CART decision tree algorithm, for analysis the characteristic of types of landscape's remote sensing image and build decision tree rules of landscape classification to extract information classification. Analyzing of the evolution process of the landscape pattern in Guangxi coastal zone in nearly 15 years, we may understand the distribution characteristics and change rules. Combined with the natural disaster data, we use of landscape index and the related risk interference degree and construct ecological risk evaluation model in Guangxi coastal zone for ecological risk assessment results of Guangxi coastal zone.
Steen, Paul J.; Passino-Reader, Dora R.; Wiley, Michael J.
2006-01-01
As a part of the Great Lakes Regional Aquatic Gap Analysis Project, we evaluated methodologies for modeling associations between fish species and habitat characteristics at a landscape scale. To do this, we created brook trout Salvelinus fontinalis presence and absence models based on four different techniques: multiple linear regression, logistic regression, neural networks, and classification trees. The models were tested in two ways: by application to an independent validation database and cross-validation using the training data, and by visual comparison of statewide distribution maps with historically recorded occurrences from the Michigan Fish Atlas. Although differences in the accuracy of our models were slight, the logistic regression model predicted with the least error, followed by multiple regression, then classification trees, then the neural networks. These models will provide natural resource managers a way to identify habitats requiring protection for the conservation of fish species.
Comparative Issues and Methods in Organizational Diagnosis. Report II. The Decision Tree Approach.
organizational diagnosis . The advantages and disadvantages of the decision-tree approach generally, and in this study specifically, are examined. A pre-test, using a civilian sample of 174 work groups with Survey of Organizations data, was conducted to assess various decision-tree classification criteria, in terms of their similarity to the distance function used by Bowers and Hausser (1977). The results suggested the use of a large developmental sample, which should result in more distinctly defined boundary lines between classification profiles. Also, the decision matrix
Fu, Chen; Zhang, Nevin Lianwen; Chen, Bao-Xin; Chen, Zhou Rong; Jin, Xiang Lan; Guo, Rong-Juan; Chen, Zhi-Gang; Zhang, Yun-Ling
2017-05-01
To treat patients with vascular mild cognitive impairment (VMCI) using traditional Chinese medicine (TCM), it is necessary to classify the patients into TCM syndrome types and to apply different treatments to different types. In this paper, we investigate how to properly carry out the classification for patients with VMCI aged 50 or above using a novel data-driven method known as latent tree analysis (LTA). A cross-sectional survey on VMCI was carried out in several regions in Northern China between February 2008 and February 2012 which resulted in a data set that involves 803 patients and 93 symptoms. LTA was performed on the data to reveal symptom co-occurrence patterns, and the patients were partitioned into clusters in multiple ways based on the patterns. The patient clusters were matched up with syndrome types, and population statistics of the clusters are used to quantify the syndrome types and to establish classification rules. Eight syndrome types are identified: Qi deficiency, Qi stagnation, Blood deficiency, Blood stasis, Phlegm-dampness, Fire-heat, Yang deficiency, and Yin deficiency. The prevalence and symptom occurrence characteristics of each syndrome type are determined. Quantitative classification rules are established for determining whether a patient belongs to each of the syndrome types. A solution for the TCM syndrome classification problem for patients with VMCI and aged 50 or above is established based on the LTA of unlabeled symptom survey data. The results can be used as a reference in clinic practice to improve the quality of syndrome differentiation and to reduce diagnosis variances across physicians. They can also be used for patient selection in research projects aimed at finding biomarkers for the syndrome types and in randomized control trials aimed at determining the efficacy of TCM treatments of VMCI.
Land-cover classification in a moist tropical region of Brazil with Landsat TM imagery.
Li, Guiying; Lu, Dengsheng; Moran, Emilio; Hetrick, Scott
2011-01-01
This research aims to improve land-cover classification accuracy in a moist tropical region in Brazil by examining the use of different remote sensing-derived variables and classification algorithms. Different scenarios based on Landsat Thematic Mapper (TM) spectral data and derived vegetation indices and textural images, and different classification algorithms - maximum likelihood classification (MLC), artificial neural network (ANN), classification tree analysis (CTA), and object-based classification (OBC), were explored. The results indicated that a combination of vegetation indices as extra bands into Landsat TM multispectral bands did not improve the overall classification performance, but the combination of textural images was valuable for improving vegetation classification accuracy. In particular, the combination of both vegetation indices and textural images into TM multispectral bands improved overall classification accuracy by 5.6% and kappa coefficient by 6.25%. Comparison of the different classification algorithms indicated that CTA and ANN have poor classification performance in this research, but OBC improved primary forest and pasture classification accuracies. This research indicates that use of textural images or use of OBC are especially valuable for improving the vegetation classes such as upland and liana forest classes having complex stand structures and having relatively large patch sizes.
Land-cover classification in a moist tropical region of Brazil with Landsat TM imagery
LI, GUIYING; LU, DENGSHENG; MORAN, EMILIO; HETRICK, SCOTT
2011-01-01
This research aims to improve land-cover classification accuracy in a moist tropical region in Brazil by examining the use of different remote sensing-derived variables and classification algorithms. Different scenarios based on Landsat Thematic Mapper (TM) spectral data and derived vegetation indices and textural images, and different classification algorithms – maximum likelihood classification (MLC), artificial neural network (ANN), classification tree analysis (CTA), and object-based classification (OBC), were explored. The results indicated that a combination of vegetation indices as extra bands into Landsat TM multispectral bands did not improve the overall classification performance, but the combination of textural images was valuable for improving vegetation classification accuracy. In particular, the combination of both vegetation indices and textural images into TM multispectral bands improved overall classification accuracy by 5.6% and kappa coefficient by 6.25%. Comparison of the different classification algorithms indicated that CTA and ANN have poor classification performance in this research, but OBC improved primary forest and pasture classification accuracies. This research indicates that use of textural images or use of OBC are especially valuable for improving the vegetation classes such as upland and liana forest classes having complex stand structures and having relatively large patch sizes. PMID:22368311
Ensemble methods with simple features for document zone classification
NASA Astrophysics Data System (ADS)
Obafemi-Ajayi, Tayo; Agam, Gady; Xie, Bingqing
2012-01-01
Document layout analysis is of fundamental importance for document image understanding and information retrieval. It requires the identification of blocks extracted from a document image via features extraction and block classification. In this paper, we focus on the classification of the extracted blocks into five classes: text (machine printed), handwriting, graphics, images, and noise. We propose a new set of features for efficient classifications of these blocks. We present a comparative evaluation of three ensemble based classification algorithms (boosting, bagging, and combined model trees) in addition to other known learning algorithms. Experimental results are demonstrated for a set of 36503 zones extracted from 416 document images which were randomly selected from the tobacco legacy document collection. The results obtained verify the robustness and effectiveness of the proposed set of features in comparison to the commonly used Ocropus recognition features. When used in conjunction with the Ocropus feature set, we further improve the performance of the block classification system to obtain a classification accuracy of 99.21%.
Edwards, T.C.; Cutler, D.R.; Zimmermann, N.E.; Geiser, L.; Moisen, Gretchen G.
2006-01-01
We evaluated the effects of probabilistic (hereafter DESIGN) and non-probabilistic (PURPOSIVE) sample surveys on resultant classification tree models for predicting the presence of four lichen species in the Pacific Northwest, USA. Models derived from both survey forms were assessed using an independent data set (EVALUATION). Measures of accuracy as gauged by resubstitution rates were similar for each lichen species irrespective of the underlying sample survey form. Cross-validation estimates of prediction accuracies were lower than resubstitution accuracies for all species and both design types, and in all cases were closer to the true prediction accuracies based on the EVALUATION data set. We argue that greater emphasis should be placed on calculating and reporting cross-validation accuracy rates rather than simple resubstitution accuracy rates. Evaluation of the DESIGN and PURPOSIVE tree models on the EVALUATION data set shows significantly lower prediction accuracy for the PURPOSIVE tree models relative to the DESIGN models, indicating that non-probabilistic sample surveys may generate models with limited predictive capability. These differences were consistent across all four lichen species, with 11 of the 12 possible species and sample survey type comparisons having significantly lower accuracy rates. Some differences in accuracy were as large as 50%. The classification tree structures also differed considerably both among and within the modelled species, depending on the sample survey form. Overlap in the predictor variables selected by the DESIGN and PURPOSIVE tree models ranged from only 20% to 38%, indicating the classification trees fit the two evaluated survey forms on different sets of predictor variables. The magnitude of these differences in predictor variables throws doubt on ecological interpretation derived from prediction models based on non-probabilistic sample surveys. ?? 2006 Elsevier B.V. All rights reserved.
Predicted seafloor facies of Central Santa Monica Bay, California
Dartnell, Peter; Gardner, James V.
2004-01-01
Summary -- Mapping surficial seafloor facies (sand, silt, muddy sand, rock, etc.) should be the first step in marine geological studies and is crucial when modeling sediment processes, pollution transport, deciphering tectonics, and defining benthic habitats. This report outlines an empirical technique that predicts the distribution of seafloor facies for a large area offshore Los Angeles, CA using high-resolution bathymetry and co-registered, calibrated backscatter from multibeam echosounders (MBES) correlated to ground-truth sediment samples. The technique uses a series of procedures that involve supervised classification and a hierarchical decision tree classification that are now available in advanced image-analysis software packages. Derivative variance images of both bathymetry and acoustic backscatter are calculated from the MBES data and then used in a hierarchical decision-tree framework to classify the MBES data into areas of rock, gravelly muddy sand, muddy sand, and mud. A quantitative accuracy assessment on the classification results is performed using ground-truth sediment samples. The predicted facies map is also ground-truthed using seafloor photographs and high-resolution sub-bottom seismic-reflection profiles. This Open-File Report contains the predicted seafloor facies map as a georeferenced TIFF image along with the multibeam bathymetry and acoustic backscatter data used in the study as well as an explanation of the empirical classification process.
A universal hybrid decision tree classifier design for human activity classification.
Chien, Chieh; Pottie, Gregory J
2012-01-01
A system that reliably classifies daily life activities can contribute to more effective and economical treatments for patients with chronic conditions or undergoing rehabilitative therapy. We propose a universal hybrid decision tree classifier for this purpose. The tree classifier can flexibly implement different decision rules at its internal nodes, and can be adapted from a population-based model when supplemented by training data for individuals. The system was tested using seven subjects each monitored by 14 triaxial accelerometers. Each subject performed fourteen different activities typical of daily life. Using leave-one-out cross validation, our decision tree produced average classification accuracies of 89.9%. In contrast, the MATLAB personalized tree classifiers using Gini's diversity index as the split criterion followed by optimally tuning the thresholds for each subject yielded 69.2%.
Thomas C. Edwards; D. Richard Cutler; Niklaus E. Zimmermann; Linda Geiser; Gretchen G. Moisen
2006-01-01
We evaluated the effects of probabilistic (hereafter DESIGN) and non-probabilistic (PURPOSIVE) sample surveys on resultant classification tree models for predicting the presence of four lichen species in the Pacific Northwest, USA. Models derived from both survey forms were assessed using an independent data set (EVALUATION). Measures of accuracy as gauged by...
A Theoretical Analysis of Why Hybrid Ensembles Work.
Hsu, Kuo-Wei
2017-01-01
Inspired by the group decision making process, ensembles or combinations of classifiers have been found favorable in a wide variety of application domains. Some researchers propose to use the mixture of two different types of classification algorithms to create a hybrid ensemble. Why does such an ensemble work? The question remains. Following the concept of diversity, which is one of the fundamental elements of the success of ensembles, we conduct a theoretical analysis of why hybrid ensembles work, connecting using different algorithms to accuracy gain. We also conduct experiments on classification performance of hybrid ensembles of classifiers created by decision tree and naïve Bayes classification algorithms, each of which is a top data mining algorithm and often used to create non-hybrid ensembles. Therefore, through this paper, we provide a complement to the theoretical foundation of creating and using hybrid ensembles.
Geometric subspace methods and time-delay embedding for EEG artifact removal and classification.
Anderson, Charles W; Knight, James N; O'Connor, Tim; Kirby, Michael J; Sokolov, Artem
2006-06-01
Generalized singular-value decomposition is used to separate multichannel electroencephalogram (EEG) into components found by optimizing a signal-to-noise quotient. These components are used to filter out artifacts. Short-time principal components analysis of time-delay embedded EEG is used to represent windowed EEG data to classify EEG according to which mental task is being performed. Examples are presented of the filtering of various artifacts and results are shown of classification of EEG from five mental tasks using committees of decision trees.
Classifying smoking urges via machine learning
Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin
2016-01-01
Background and objective Smoking is the largest preventable cause of death and diseases in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. Methods To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated observing sensitivity, specificity, accuracy and precision. Results The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with an accuracy of the classifications up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. Conclusions In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms’ performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time by developing novel expert systems capable of delivering real-time interventions. PMID:28110725
Classifying smoking urges via machine learning.
Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin
2016-12-01
Smoking is the largest preventable cause of death and diseases in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated observing sensitivity, specificity, accuracy and precision. The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with an accuracy of the classifications up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. In conclusion, machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time by developing novel expert systems capable of delivering real-time interventions. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Organic cattle products: Authenticating production origin by analysis of serum mineral content.
Rodríguez-Bermúdez, Ruth; Herrero-Latorre, Carlos; López-Alonso, Marta; Losada, David E; Iglesias, Roberto; Miranda, Marta
2018-10-30
An authentication procedure for differentiating between organic and non-organic cattle production on the basis of analysis of serum samples has been developed. For this purpose, the concentrations of fourteen mineral elements (As, Cd, Co, Cr, Cu, Fe, Hg, I, Mn, Mo, Ni, Pb, Se and Zn) in 522 serum samples from cows (341 from organic farms and 181 from non-organic farms), determined by inductively coupled plasma spectrometry, were used. The chemical information provided by serum analysis was employed to construct different pattern recognition classification models that predict the origin of each sample: organic or non-organic class. Among all classification procedures considered, the best results were obtained with the decision tree C5.0, Random Forest and AdaBoost neural networks, with hit levels close to 90% for both production types. The proposed method, involving analysis of serum samples, provided rapid, accurate in vivo classification of cattle according to organic and non-organic production type. Copyright © 2018 Elsevier Ltd. All rights reserved.
USDA-ARS?s Scientific Manuscript database
Sudden losses of managed honey bee (Apis mellifera L.) colonies are considered an important problem worldwide but the underlying cause or causes of these losses are currently unknown. In the United States, this syndrome was termed Colony Collapse Disorder (CCD), since the defining trait was a rapid ...
Remote sensing based approach for monitoring urban growth in Mexico city, Mexico: A case study
NASA Astrophysics Data System (ADS)
Obade, Vincent
The world is experiencing a rapid rate of urban expansion, largely contributed by the population growth. Other factors supporting urban growth include the improved efficiency in the transportation sector and increasing dependence on cars as a means of transport. The problems attributed to the urban growth include: depletion of energy resources, water and air pollution; loss of landscapes and wildlife, loss of agricultural land, inadequate social security and lack of employment or underemployment. Aerial photography is one of the popular techniques for analyzing, planning and minimizing urbanization related problems. However, with the advances in space technology, satellite remote sensing is increasingly being utilized in the analysis and planning of the urban environment. This article outlines the strengths and limitations of potential remote sensing techniques for monitoring urban growth. The selected methods include: Principal component analysis, Maximum likelihood classification and "decision tree". The results indicate that the "classification tree" approach is the most promising for monitoring urban change, given the improved accuracy and smooth transition between the various land cover classes
Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson
2010-01-01
Summary Objective Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this Review was to assess machine learning alternatives to logistic regression which may accomplish the same goals but with fewer assumptions or greater accuracy. Study Design and Setting We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. Results We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (CART), and meta-classifiers (in particular, boosting). Conclusion While the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and to a lesser extent decision trees (particularly CART) appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. PMID:20630332
NASA Astrophysics Data System (ADS)
Dechesne, Clément; Mallet, Clément; Le Bris, Arnaud; Gouet-Brunet, Valérie
2017-04-01
Forest stands are the basic units for forest inventory and mapping. Stands are defined as large forested areas (e.g., ⩾ 2 ha) of homogeneous tree species composition and age. Their accurate delineation is usually performed by human operators through visual analysis of very high resolution (VHR) infra-red images. This task is tedious, highly time consuming, and should be automated for scalability and efficient updating purposes. In this paper, a method based on the fusion of airborne lidar data and VHR multispectral images is proposed for the automatic delineation of forest stands containing one dominant species (purity superior to 75%). This is the key preliminary task for forest land-cover database update. The multispectral images give information about the tree species whereas 3D lidar point clouds provide geometric information on the trees and allow their individual extraction. Multi-modal features are computed, both at pixel and object levels: the objects are individual trees extracted from lidar data. A supervised classification is then performed at the object level in order to coarsely discriminate the existing tree species in each area of interest. The classification results are further processed to obtain homogeneous areas with smooth borders by employing an energy minimum framework, where additional constraints are joined to form the energy function. The experimental results show that the proposed method provides very satisfactory results both in terms of stand labeling and delineation (overall accuracy ranges between 84 % and 99 %).
High-resolution tree canopy mapping for New York City using LIDAR and object-based image analysis
NASA Astrophysics Data System (ADS)
MacFaden, Sean W.; O'Neil-Dunne, Jarlath P. M.; Royar, Anna R.; Lu, Jacqueline W. T.; Rundle, Andrew G.
2012-01-01
Urban tree canopy is widely believed to have myriad environmental, social, and human-health benefits, but a lack of precise canopy estimates has hindered quantification of these benefits in many municipalities. This problem was addressed for New York City using object-based image analysis (OBIA) to develop a comprehensive land-cover map, including tree canopy to the scale of individual trees. Mapping was performed using a rule-based expert system that relied primarily on high-resolution LIDAR, specifically its capacity for evaluating the height and texture of aboveground features. Multispectral imagery was also used, but shadowing and varying temporal conditions limited its utility. Contextual analysis was a key part of classification, distinguishing trees according to their physical and spectral properties as well as their relationships to adjacent, nonvegetated features. The automated product was extensively reviewed and edited via manual interpretation, and overall per-pixel accuracy of the final map was 96%. Although manual editing had only a marginal effect on accuracy despite requiring a majority of project effort, it maximized aesthetic quality and ensured the capture of small, isolated trees. Converting high-resolution LIDAR and imagery into usable information is a nontrivial exercise, requiring significant processing time and labor, but an expert system-based combination of OBIA and manual review was an effective method for fine-scale canopy mapping in a complex urban environment.
Generation of 2D Land Cover Maps for Urban Areas Using Decision Tree Classification
NASA Astrophysics Data System (ADS)
Höhle, J.
2014-09-01
A 2D land cover map can automatically and efficiently be generated from high-resolution multispectral aerial images. First, a digital surface model is produced and each cell of the elevation model is then supplemented with attributes. A decision tree classification is applied to extract map objects like buildings, roads, grassland, trees, hedges, and walls from such an "intelligent" point cloud. The decision tree is derived from training areas which borders are digitized on top of a false-colour orthoimage. The produced 2D land cover map with six classes is then subsequently refined by using image analysis techniques. The proposed methodology is described step by step. The classification, assessment, and refinement is carried out by the open source software "R"; the generation of the dense and accurate digital surface model by the "Match-T DSM" program of the Trimble Company. A practical example of a 2D land cover map generation is carried out. Images of a multispectral medium-format aerial camera covering an urban area in Switzerland are used. The assessment of the produced land cover map is based on class-wise stratified sampling where reference values of samples are determined by means of stereo-observations of false-colour stereopairs. The stratified statistical assessment of the produced land cover map with six classes and based on 91 points per class reveals a high thematic accuracy for classes "building" (99 %, 95 % CI: 95 %-100 %) and "road and parking lot" (90 %, 95 % CI: 83 %-95 %). Some other accuracy measures (overall accuracy, kappa value) and their 95 % confidence intervals are derived as well. The proposed methodology has a high potential for automation and fast processing and may be applied to other scenes and sensors.
Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann
2003-01-01
Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.
Classification Algorithms for Big Data Analysis, a Map Reduce Approach
NASA Astrophysics Data System (ADS)
Ayma, V. A.; Ferreira, R. S.; Happ, P.; Oliveira, D.; Feitosa, R.; Costa, G.; Plaza, A.; Gamba, P.
2015-03-01
Since many years ago, the scientific community is concerned about how to increase the accuracy of different classification methods, and major achievements have been made so far. Besides this issue, the increasing amount of data that is being generated every day by remote sensors raises more challenges to be overcome. In this work, a tool within the scope of InterIMAGE Cloud Platform (ICP), which is an open-source, distributed framework for automatic image interpretation, is presented. The tool, named ICP: Data Mining Package, is able to perform supervised classification procedures on huge amounts of data, usually referred as big data, on a distributed infrastructure using Hadoop MapReduce. The tool has four classification algorithms implemented, taken from WEKA's machine learning library, namely: Decision Trees, Naïve Bayes, Random Forest and Support Vector Machines (SVM). The results of an experimental analysis using a SVM classifier on data sets of different sizes for different cluster configurations demonstrates the potential of the tool, as well as aspects that affect its performance.
Predicting Tillage Patterns in the Tiffin River Watershed Using Remote Sensing Methods
NASA Astrophysics Data System (ADS)
Brooks, C.; McCarty, J. L.; Dean, D. B.; Mann, B. F.
2012-12-01
Previous research in tillage mapping has focused primarily on utilizing low to no-cost, moderate (30 m to 15 m) resolution satellite data. Successful data processing techniques published in the scientific literature have focused on extracting and/or classifying tillage patterns through manipulation of spectral bands. For instance, Daughtry et al. (2005) evaluated several spectral indices for crop residue cover using satellite multispectral and hyperspectral data and to categorize soil tillage intensity in agricultural fields. A weak to moderate relationship between Landsat Thematic Mapper (TM) indices and crop residue cover was found; similar results were reported in Minnesota. Building on the findings from the scientific literature and previous work done by MTRI in the heavily agricultural Tiffin watershed of northwest Ohio and southeast Michigan, a decision tree classifier approach (also referred to as a classification tree) was used, linking several satellite data to on-the-ground tillage information in order to boost classification results. This approach included five tillage indices and derived products. A decision tree methodology enabled the development of statistically optimized (i.e., minimizing misclassification rates) classification algorithms at various desired time steps: monthly, seasonally, and annual over the 2006-2010 time period. Due to their flexibility, processing speed, and availability within all major remote sensing and statistical software packages, decision trees can ingest several data inputs from multiple sensors and satellite products, selecting only the bands, band ratios, indices, and products that further reduce misclassification errors. The project team created crop-specific tillage pattern classification trees whereby a training data set (~ 50% of available ground data) was created for production of the actual decision tree and a validation data set was set aside (~ 50% of available ground data) in order to assess the accuracy of the classification. A seasonal time step was used, optimizing a decision tree based on seasonal ground data for tillage patterns and satellite data and products for years 2006 through 2010. Annual crop type maps derived by the project team and the USDA Cropland Data Layer project was used an input to understand locations of corn, soybeans, wheat, etc. on a yearly basis. As previously stated, the robustness of the decision tree approach is the ability to implement various satellite data and products across temporal, spectral, and spatial resolutions, thereby improving the resulting classification and providing a reliable method that is not sensor-dependent. Tillage pattern classification from satellite imagery is not a simple task and has proven a challenge to previous researchers investigating this remote sensing topic. The team's decision tree method produced a practical, usable output within a focused project time period. Daughtry, C.S.T., Hunt Jr., E.R., Doraiswamy, P.C., McMurtrey III, J.E. 2005. Remote sensing the spatial distribution of crop residues. Agron. J. 97, 864-871.
Voxel classification based airway tree segmentation
NASA Astrophysics Data System (ADS)
Lo, Pechin; de Bruijne, Marleen
2008-03-01
This paper presents a voxel classification based method for segmenting the human airway tree in volumetric computed tomography (CT) images. In contrast to standard methods that use only voxel intensities, our method uses a more complex appearance model based on a set of local image appearance features and Kth nearest neighbor (KNN) classification. The optimal set of features for classification is selected automatically from a large set of features describing the local image structure at several scales. The use of multiple features enables the appearance model to differentiate between airway tree voxels and other voxels of similar intensities in the lung, thus making the segmentation robust to pathologies such as emphysema. The classifier is trained on imperfect segmentations that can easily be obtained using region growing with a manual threshold selection. Experiments show that the proposed method results in a more robust segmentation that can grow into the smaller airway branches without leaking into emphysematous areas, and is able to segment many branches that are not present in the training set.
A renewed perspective on agroforestry concepts and classification.
Torquebiau, E F
2000-11-01
Agroforestry, the association of trees with farming practices, is progressively becoming a recognized land-use discipline. However, it is still perceived by some scientists, technicians and farmers as a sort of environmental fashion which does not deserve credit. The peculiar history of agroforestry and the complex relationships between agriculture and forestry explain some misunderstandings about the concepts and classification of agroforestry and reveal that, contrarily to common perception, agroforestry is closer to agriculture than to forestry. Based on field experience from several countries, a structural classification of agroforestry into six simple categories is proposed: crops under tree cover, agroforests, agroforestry in a linear arrangement, animal agroforestry, sequential agroforestry and minor agroforestry techniques. It is argued that this pragmatic classification encompasses all major agroforestry associations and allows simultaneous agroforestry to be clearly differentiated from sequential agroforestry, two categories showing contrasting ecological tree-crop interactions. It can also contribute to a betterment of the image of agroforestry and lead to a simplification of its definition.
Rutzinger, Martin; Höfle, Bernhard; Hollaus, Markus; Pfeifer, Norbert
2008-01-01
Airborne laser scanning (ALS) is a remote sensing technique well-suited for 3D vegetation mapping and structure characterization because the emitted laser pulses are able to penetrate small gaps in the vegetation canopy. The backscattered echoes from the foliage, woody vegetation, the terrain, and other objects are detected, leading to a cloud of points. Higher echo densities (>20 echoes/m2) and additional classification variables from full-waveform (FWF) ALS data, namely echo amplitude, echo width and information on multiple echoes from one shot, offer new possibilities in classifying the ALS point cloud. Currently FWF sensor information is hardly used for classification purposes. This contribution presents an object-based point cloud analysis (OBPA) approach, combining segmentation and classification of the 3D FWF ALS points designed to detect tall vegetation in urban environments. The definition tall vegetation includes trees and shrubs, but excludes grassland and herbage. In the applied procedure FWF ALS echoes are segmented by a seeded region growing procedure. All echoes sorted descending by their surface roughness are used as seed points. Segments are grown based on echo width homogeneity. Next, segment statistics (mean, standard deviation, and coefficient of variation) are calculated by aggregating echo features such as amplitude and surface roughness. For classification a rule base is derived automatically from a training area using a statistical classification tree. To demonstrate our method we present data of three sites with around 500,000 echoes each. The accuracy of the classified vegetation segments is evaluated for two independent validation sites. In a point-wise error assessment, where the classification is compared with manually classified 3D points, completeness and correctness better than 90% are reached for the validation sites. In comparison to many other algorithms the proposed 3D point classification works on the original measurements directly, i.e. the acquired points. Gridding of the data is not necessary, a process which is inherently coupled to loss of data and precision. The 3D properties provide especially a good separability of buildings and terrain points respectively, if they are occluded by vegetation. PMID:27873771
Data mining for rapid prediction of facility fit and debottlenecking of biomanufacturing facilities.
Yang, Yang; Farid, Suzanne S; Thornhill, Nina F
2014-06-10
Higher titre processes can pose facility fit challenges in legacy biopharmaceutical purification suites with capacities originally matched to lower titre processes. Bottlenecks caused by mismatches in equipment sizes, combined with process fluctuations upon scale-up, can result in discarding expensive product. This paper describes a data mining decisional tool for rapid prediction of facility fit issues and debottlenecking of biomanufacturing facilities exposed to batch-to-batch variability and higher titres. The predictive tool comprised advanced multivariate analysis techniques to interrogate Monte Carlo stochastic simulation datasets that mimicked batch fluctuations in cell culture titres, step yields and chromatography eluate volumes. A decision tree classification method, CART (classification and regression tree) was introduced to explore the impact of these process fluctuations on product mass loss and reveal the root causes of bottlenecks. The resulting pictorial decision tree determined a series of if-then rules for the critical combinations of factors that lead to different mass loss levels. Three different debottlenecking strategies were investigated involving changes to equipment sizes, using higher capacity chromatography resins and elution buffer optimisation. The analysis compared the impact of each strategy on mass output, direct cost of goods per gram and processing time, as well as consideration of extra capital investment and space requirements. Copyright © 2014 The Authors. Published by Elsevier B.V. All rights reserved.
Özge, C; Toros, F; Bayramkaya, E; Çamdeviren, H; Şaşmaz, T
2006-01-01
Background The purpose of this study is to evaluate the most important sociodemographic factors on smoking status of high school students using a broad randomised epidemiological survey. Methods Using in‐class, self administered questionnaire about their sociodemographic variables and smoking behaviour, a representative sample of total 3304 students of preparatory, 9th, 10th, and 11th grades, from 22 randomly selected schools of Mersin, were evaluated and discriminative factors have been determined using appropriate statistics. In addition to binary logistic regression analysis, the study evaluated combined effects of these factors using classification and regression tree methodology, as a new statistical method. Results The data showed that 38% of the students reported lifetime smoking and 16.9% of them reported current smoking with a male predominancy and increasing prevalence by age. Second hand smoking was reported at a 74.3% frequency with father predominance (56.6%). The significantly important factors that affect current smoking in these age groups were increased by household size, late birth rank, certain school types, low academic performance, increased second hand smoking, and stress (especially reported as separation from a close friend or because of violence at home). Classification and regression tree methodology showed the importance of some neglected sociodemographic factors with a good classification capacity. Conclusions It was concluded that, as closely related with sociocultural factors, smoking was a common problem in this young population, generating important academic and social burden in youth life and with increasing data about this behaviour and using new statistical methods, effective coping strategies could be composed. PMID:16891446
Using spectrotemporal indices to improve the fruit-tree crop classification accuracy
NASA Astrophysics Data System (ADS)
Peña, M. A.; Liao, R.; Brenning, A.
2017-06-01
This study assesses the potential of spectrotemporal indices derived from satellite image time series (SITS) to improve the classification accuracy of fruit-tree crops. Six major fruit-tree crop types in the Aconcagua Valley, Chile, were classified by applying various linear discriminant analysis (LDA) techniques on a Landsat-8 time series of nine images corresponding to the 2014-15 growing season. As features we not only used the complete spectral resolution of the SITS, but also all possible normalized difference indices (NDIs) that can be constructed from any two bands of the time series, a novel approach to derive features from SITS. Due to the high dimensionality of this "enhanced" feature set we used the lasso and ridge penalized variants of LDA (PLDA). Although classification accuracies yielded by the standard LDA applied on the full-band SITS were good (misclassification error rate, MER = 0.13), they were further improved by 23% (MER = 0.10) with ridge PLDA using the enhanced feature set. The most important bands to discriminate the crops of interest were mainly concentrated on the first two image dates of the time series, corresponding to the crops' greenup stage. Despite the high predictor weights provided by the red and near infrared bands, typically used to construct greenness spectral indices, other spectral regions were also found important for the discrimination, such as the shortwave infrared band at 2.11-2.19 μm, sensitive to foliar water changes. These findings support the usefulness of spectrotemporal indices in the context of SITS-based crop type classifications, which until now have been mainly constructed by the arithmetic combination of two bands of the same image date in order to derive greenness temporal profiles like those from the normalized difference vegetation index.
NASA Astrophysics Data System (ADS)
Clark, M. L.
2016-12-01
The goal of this study was to assess multi-temporal, Hyperspectral Infrared Imager (HyspIRI) satellite imagery for improved forest class mapping relative to multispectral satellites. The study area was the western San Francisco Bay Area, California and forest alliances (e.g., forest communities defined by dominant or co-dominant trees) were defined using the U.S. National Vegetation Classification System. Simulated 30-m HyspIRI, Landsat 8 and Sentinel-2 imagery were processed from image data acquired by NASA's AVIRIS airborne sensor in year 2015, with summer and multi-temporal (spring, summer, fall) data analyzed separately. HyspIRI reflectance was used to generate a suite of hyperspectral metrics that targeted key spectral features related to chemical and structural properties. The Random Forests classifier was applied to the simulated images and overall accuracies (OA) were compared to those from real Landsat 8 images. For each image group, broad land cover (e.g., Needle-leaf Trees, Broad-leaf Trees, Annual agriculture, Herbaceous, Built-up) was classified first, followed by a finer-detail forest alliance classification for pixels mapped as closed-canopy forest. There were 5 needle-leaf tree alliances and 16 broad-leaf tree alliances, including 7 Quercus (oak) alliance types. No forest alliance classification exceeded 50% OA, indicating that there was broad spectral similarity among alliances, most of which were not spectrally pure but rather a mix of tree species. In general, needle-leaf (Pine, Redwood, Douglas Fir) alliances had better class accuracies than broad-leaf alliances (Oaks, Madrone, Bay Laurel, Buckeye, etc). Multi-temporal data classifications all had 5-6% greater OA than with comparable summer data. For simulated data, HyspIRI metrics had 4-5% greater OA than Landsat 8 and Sentinel-2 multispectral imagery and 3-4% greater OA than HyspIRI reflectance. Finally, HyspIRI metrics had 8% greater OA than real Landsat 8 imagery. In conclusion, forest alliance classification was found to be a difficult remote sensing application with moderate resolution (30 m) satellite imagery; however, of the data tested, HyspIRI spectral metrics had the best performance relative to multispectral satellites.
Kos, Gregor; Sieger, Markus; McMullin, David; Zahradnik, Celine; Sulyok, Michael; Öner, Tuba; Mizaikoff, Boris; Krska, Rudolf
2016-10-01
The rapid identification of mycotoxins such as deoxynivalenol and aflatoxin B 1 in agricultural commodities is an ongoing concern for food importers and processors. While sophisticated chromatography-based methods are well established for regulatory testing by food safety authorities, few techniques exist to provide a rapid assessment for traders. This study advances the development of a mid-infrared spectroscopic method, recording spectra with little sample preparation. Spectral data were classified using a bootstrap-aggregated (bagged) decision tree method, evaluating the protein and carbohydrate absorption regions of the spectrum. The method was able to classify 79% of 110 maize samples at the European Union regulatory limit for deoxynivalenol of 1750 µg kg -1 and, for the first time, 77% of 92 peanut samples at 8 µg kg -1 of aflatoxin B 1 . A subset model revealed a dependency on variety and type of fungal infection. The employed CRC and SBL maize varieties could be pooled in the model with a reduction of classification accuracy from 90% to 79%. Samples infected with Fusarium verticillioides were removed, leaving samples infected with F. graminearum and F. culmorum in the dataset improving classification accuracy from 73% to 79%. A 500 µg kg -1 classification threshold for deoxynivalenol in maize performed even better with 85% accuracy. This is assumed to be due to a larger number of samples around the threshold increasing representativity. Comparison with established principal component analysis classification, which consistently showed overlapping clusters, confirmed the superior performance of bagged decision tree classification.
Shi, K-Q; Zhou, Y-Y; Yan, H-D; Li, H; Wu, F-L; Xie, Y-Y; Braddock, M; Lin, X-Y; Zheng, M-H
2017-02-01
At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model by using the classification and regression tree (CART) analysis. A total of 1047 patients from two separate medical centres with suspected ACHBLF were screened in the study, which were recognized as derivation cohort and validation cohort, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was tested using the area under the receiver operating characteristic curve, which was compared with the model for end-stage liver disease (MELD) score and a new logistic regression model. CART analysis identified four variables as prognostic factors of ACHBLF: total bilirubin, age, serum sodium and INR, and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed with four independent factors, including age, total bilirubin, serum sodium and prothrombin activity by multivariate logistic regression analysis. The performances of the CART model (0.896), similar to the logistic regression model (0.914, P=.382), exceeded that of MELD score (0.667, P<.001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model superior to MELD for predicting three-month mortality of patients with ACHBLF. Thus, the CART model could facilitate medical decision-making and provide clinicians with a validated practical bedside tool for ACHBLF risk stratification. © 2016 John Wiley & Sons Ltd.
Nonparametric analysis of Minnesota spruce and aspen tree data and LANDSAT data
NASA Technical Reports Server (NTRS)
Scott, D. W.; Jee, R.
1984-01-01
The application of nonparametric methods in data-intensive problems faced by NASA is described. The theoretical development of efficient multivariate density estimators and the novel use of color graphics workstations are reviewed. The use of nonparametric density estimates for data representation and for Bayesian classification are described and illustrated. Progress in building a data analysis system in a workstation environment is reviewed and preliminary runs presented.
NASA Astrophysics Data System (ADS)
Matikainen, Leena; Karila, Kirsi; Hyyppä, Juha; Litkey, Paula; Puttonen, Eetu; Ahokas, Eero
2017-06-01
During the last 20 years, airborne laser scanning (ALS), often combined with passive multispectral information from aerial images, has shown its high feasibility for automated mapping processes. The main benefits have been achieved in the mapping of elevated objects such as buildings and trees. Recently, the first multispectral airborne laser scanners have been launched, and active multispectral information is for the first time available for 3D ALS point clouds from a single sensor. This article discusses the potential of this new technology in map updating, especially in automated object-based land cover classification and change detection in a suburban area. For our study, Optech Titan multispectral ALS data over a suburban area in Finland were acquired. Results from an object-based random forests analysis suggest that the multispectral ALS data are very useful for land cover classification, considering both elevated classes and ground-level classes. The overall accuracy of the land cover classification results with six classes was 96% compared with validation points. The classes under study included building, tree, asphalt, gravel, rocky area and low vegetation. Compared to classification of single-channel data, the main improvements were achieved for ground-level classes. According to feature importance analyses, multispectral intensity features based on several channels were more useful than those based on one channel. Automatic change detection for buildings and roads was also demonstrated by utilising the new multispectral ALS data in combination with old map vectors. In change detection of buildings, an old digital surface model (DSM) based on single-channel ALS data was also used. Overall, our analyses suggest that the new data have high potential for further increasing the automation level in mapping. Unlike passive aerial imaging commonly used in mapping, the multispectral ALS technology is independent of external illumination conditions, and there are no shadows on intensity images produced from the data. These are significant advantages in developing automated classification and change detection procedures.
Explaining Match Outcome During The Men’s Basketball Tournament at The Olympic Games
Leicht, Anthony S.; Gómez, Miguel A.; Woods, Carl T.
2017-01-01
In preparation for the Olympics, there is a limited opportunity for coaches and athletes to interact regularly with team performance indicators providing important guidance to coaches for enhanced match success at the elite level. This study examined the relationship between match outcome and team performance indicators during men’s basketball tournaments at the Olympic Games. Twelve team performance indicators were collated from all men’s teams and matches during the basketball tournament of the 2004-2016 Olympic Games (n = 156). Linear and non-linear analyses examined the relationship between match outcome and team performance indicator characteristics; namely, binary logistic regression and a conditional interference (CI) classification tree. The most parsimonious logistic regression model retained ‘assists’, ‘defensive rebounds’, ‘field-goal percentage’, ‘fouls’, ‘fouls against’, ‘steals’ and ‘turnovers’ (delta AIC <0.01; Akaike weight = 0.28) with a classification accuracy of 85.5%. Conversely, four performance indicators were retained with the CI classification tree with an average classification accuracy of 81.4%. However, it was the combination of ‘field-goal percentage’ and ‘defensive rebounds’ that provided the greatest probability of winning (93.2%). Match outcome during the men’s basketball tournaments at the Olympic Games was identified by a unique combination of performance indicators. Despite the average model accuracy being marginally higher for the logistic regression analysis, the CI classification tree offered a greater practical utility for coaches through its resolution of non-linear phenomena to guide team success. Key points A unique combination of team performance indicators explained 93.2% of winning observations in men’s basketball at the Olympics. Monitoring of these team performance indicators may provide coaches with the capability to devise multiple game plans or strategies to enhance their likelihood of winning. Incorporation of machine learning techniques with team performance indicators may provide a valuable and strategic approach to explain patterns within multivariate datasets in sport science. PMID:29238245
Schelhorn, J; Benndorf, M; Dietzel, M; Burmeister, H P; Kaiser, W A; Baltzer, P A T
2012-09-01
To evaluate the diagnostic accuracy of qualitative descriptors alone and in combination for the classification of focal liver lesions (FLLs) suspicious for metastasis in gadolinium-EOB-DTPA-enhanced liver MR imaging. Consecutive patients with clinically suspected liver metastases were eligible for this retrospective investigation. 50 patients met the inclusion criteria. All underwent Gd-EOB-DTPA-enhanced liver MRI (T2w, chemical shift T1w, dynamic T1w). Primary liver malignancies or treated lesions were excluded. All investigations were read by two blinded observers (O1, O2). Both independently identified the presence of lesions and evaluated predefined qualitative lesion descriptors (signal intensities, enhancement pattern and morphology). A reference standard was determined under consideration of all clinical and follow-up information. Statistical analysis besides contingency tables (chi square, kappa statistics) included descriptor combinations using classification trees (CHAID methodology) as well as ROC analysis. In 38 patients, 120 FLLs (52 benign, 68 malignant) were present. 115 (48 benign, 67 malignant) were identified by the observers. The enhancement pattern, relative SI upon T2w and late enhanced T1w images contributed significantly to the differentiation of FLLs. The overall classification accuracy was 91.3 % (O1) and 88.7 % (O2), kappa = 0.902. The combination of qualitative lesion descriptors proposed in this work revealed high diagnostic accuracy and interobserver agreement in the differentiation of focal liver lesions suspicious for metastases using Gd-EOB-DTPA-enhanced liver MRI. © Georg Thieme Verlag KG Stuttgart · New York.
Deep Multi-Task Learning for Tree Genera Classification
NASA Astrophysics Data System (ADS)
Ko, C.; Kang, J.; Sohn, G.
2018-05-01
The goal for our paper is to classify tree genera using airborne Light Detection and Ranging (LiDAR) data with Convolution Neural Network (CNN) - Multi-task Network (MTN) implementation. Unlike Single-task Network (STN) where only one task is assigned to the learning outcome, MTN is a deep learning architect for learning a main task (classification of tree genera) with other tasks (in our study, classification of coniferous and deciduous) simultaneously, with shared classification features. The main contribution of this paper is to improve classification accuracy from CNN-STN to CNN-MTN. This is achieved by introducing a concurrence loss (Lcd) to the designed MTN. This term regulates the overall network performance by minimizing the inconsistencies between the two tasks. Results show that we can increase the classification accuracy from 88.7 % to 91.0 % (from STN to MTN). The second goal of this paper is to solve the problem of small training sample size by multiple-view data generation. The motivation of this goal is to address one of the most common problems in implementing deep learning architecture, the insufficient number of training data. We address this problem by simulating training dataset with multiple-view approach. The promising results from this paper are providing a basis for classifying a larger number of dataset and number of classes in the future.
Guo, Hao; Liu, Lei; Chen, Junjie; Xu, Yong; Jie, Xiang
2017-01-01
Functional magnetic resonance imaging (fMRI) is one of the most useful methods to generate functional connectivity networks of the brain. However, conventional network generation methods ignore dynamic changes of functional connectivity between brain regions. Previous studies proposed constructing high-order functional connectivity networks that consider the time-varying characteristics of functional connectivity, and a clustering method was performed to decrease computational cost. However, random selection of the initial clustering centers and the number of clusters negatively affected classification accuracy, and the network lost neurological interpretability. Here we propose a novel method that introduces the minimum spanning tree method to high-order functional connectivity networks. As an unbiased method, the minimum spanning tree simplifies high-order network structure while preserving its core framework. The dynamic characteristics of time series are not lost with this approach, and the neurological interpretation of the network is guaranteed. Simultaneously, we propose a multi-parameter optimization framework that involves extracting discriminative features from the minimum spanning tree high-order functional connectivity networks. Compared with the conventional methods, our resting-state fMRI classification method based on minimum spanning tree high-order functional connectivity networks greatly improved the diagnostic accuracy for Alzheimer's disease. PMID:29249926
Microscopic saw mark analysis: an empirical approach.
Love, Jennifer C; Derrick, Sharon M; Wiersema, Jason M; Peters, Charles
2015-01-01
Microscopic saw mark analysis is a well published and generally accepted qualitative analytical method. However, little research has focused on identifying and mitigating potential sources of error associated with the method. The presented study proposes the use of classification trees and random forest classifiers as an optimal, statistically sound approach to mitigate the potential for error of variability and outcome error in microscopic saw mark analysis. The statistical model was applied to 58 experimental saw marks created with four types of saws. The saw marks were made in fresh human femurs obtained through anatomical gift and were analyzed using a Keyence digital microscope. The statistical approach weighed the variables based on discriminatory value and produced decision trees with an associated outcome error rate of 8.62-17.82%. © 2014 American Academy of Forensic Sciences.
NASA Technical Reports Server (NTRS)
Solomon, J. L.; Miller, W. F.; Quattrochi, D. A.
1979-01-01
In a cooperative project with the Geological Survey of Alabama, the Mississippi State Remote Sensing Applications Program has developed a single purpose, decision-tree classifier using band-ratioing techniques to discriminate various stages of surface mining activity. The tree classifier has four levels and employs only two channels in classification at each level. An accurate computation of the amount of disturbed land resulting from the mining activity can be made as a product of the classification output. The utilization of Landsat data provides a cost-efficient, rapid, and accurate means of monitoring surface mining activities.
Using machine learning techniques to automate sky survey catalog generation
NASA Technical Reports Server (NTRS)
Fayyad, Usama M.; Roden, J. C.; Doyle, R. J.; Weir, Nicholas; Djorgovski, S. G.
1993-01-01
We describe the application of machine classification techniques to the development of an automated tool for the reduction of a large scientific data set. The 2nd Palomar Observatory Sky Survey provides comprehensive photographic coverage of the northern celestial hemisphere. The photographic plates are being digitized into images containing on the order of 10(exp 7) galaxies and 10(exp 8) stars. Since the size of this data set precludes manual analysis and classification of objects, our approach is to develop a software system which integrates independently developed techniques for image processing and data classification. Image processing routines are applied to identify and measure features of sky objects. Selected features are used to determine the classification of each object. GID3* and O-BTree, two inductive learning techniques, are used to automatically learn classification decision trees from examples. We describe the techniques used, the details of our specific application, and the initial encouraging results which indicate that our approach is well-suited to the problem. The benefits of the approach are increased data reduction throughput, consistency of classification, and the automated derivation of classification rules that will form an objective, examinable basis for classifying sky objects. Furthermore, astronomers will be freed from the tedium of an intensely visual task to pursue more challenging analysis and interpretation problems given automatically cataloged data.
Bào, Yīmíng; Kuhn, Jens H
2018-01-01
During the last decade, genome sequence-based classification of viruses has become increasingly prominent. Viruses can be even classified based on coding-complete genome sequence data alone. Nevertheless, classification remains arduous as experts are required to establish phylogenetic trees to depict the evolutionary relationships of such sequences for preliminary taxonomic placement. Pairwise sequence comparison (PASC) of genomes is one of several novel methods for establishing relationships among viruses. This method, provided by the US National Center for Biotechnology Information as an open-access tool, circumvents phylogenetics, and yet PASC results are often in agreement with those of phylogenetic analyses. Computationally inexpensive, PASC can be easily performed by non-taxonomists. Here we describe how to use the PASC tool for the preliminary classification of novel viral hemorrhagic fever-causing viruses.
Genetic analysis of a novel Xylella fastidiosa subspecies found in the southwestern United States.
Randall, Jennifer J; Goldberg, Natalie P; Kemp, John D; Radionenko, Maxim; French, Jason M; Olsen, Mary W; Hanson, Stephen F
2009-09-01
Xylella fastidiosa, the causal agent of several scorch diseases, is associated with leaf scorch symptoms in Chitalpa tashkentensis, a common ornamental landscape plant used throughout the southwestern United States. For a number of years, many chitalpa trees in southern New Mexico and Arizona exhibited leaf scorch symptoms, and the results from a regional survey show that chitalpa trees from New Mexico, Arizona, and California are frequently infected with X. fastidiosa. Phylogenetic analysis of multiple loci was used to compare the X. fastidiosa infecting chitalpa strains from New Mexico, Arizona, and trees imported into New Mexico nurseries with previously reported X. fastidiosa strains. Loci analyzed included the 16S ribosome, 16S-23S ribosomal intergenic spacer region, gyrase-B, simple sequence repeat sequences, X. fastidiosa-specific sequences, and the virulence-associated protein (VapD). This analysis indicates that the X. fastidiosa isolates associated with infected chitalpa trees in the Southwest are a highly related group that is distinct from the four previously defined taxons X. fastidiosa subsp. fastidiosa (piercei), X. fastidiosa subsp. multiplex, X. fastidiosa subsp. sandyi, and X. fastidiosa subsp. pauca. Therefore, the classification proposed for this new subspecies is X. fastidiosa subsp. tashke.
A Theoretical Analysis of Why Hybrid Ensembles Work
2017-01-01
Inspired by the group decision making process, ensembles or combinations of classifiers have been found favorable in a wide variety of application domains. Some researchers propose to use the mixture of two different types of classification algorithms to create a hybrid ensemble. Why does such an ensemble work? The question remains. Following the concept of diversity, which is one of the fundamental elements of the success of ensembles, we conduct a theoretical analysis of why hybrid ensembles work, connecting using different algorithms to accuracy gain. We also conduct experiments on classification performance of hybrid ensembles of classifiers created by decision tree and naïve Bayes classification algorithms, each of which is a top data mining algorithm and often used to create non-hybrid ensembles. Therefore, through this paper, we provide a complement to the theoretical foundation of creating and using hybrid ensembles. PMID:28255296
Presence of indicator plant species as a predictor of wetland vegetation integrity
Stapanian, Martin A.; Adams, Jean V.; Gara, Brian
2013-01-01
We fit regression and classification tree models to vegetation data collected from Ohio (USA) wetlands to determine (1) which species best predict Ohio vegetation index of biotic integrity (OVIBI) score and (2) which species best predict high-quality wetlands (OVIBI score >75). The simplest regression tree model predicted OVIBI score based on the occurrence of three plant species: skunk-cabbage (Symplocarpus foetidus), cinnamon fern (Osmunda cinnamomea), and swamp rose (Rosa palustris). The lowest OVIBI scores were best predicted by the absence of the selected plant species rather than by the presence of other species. The simplest classification tree model predicted high-quality wetlands based on the occurrence of two plant species: skunk-cabbage and marsh-fern (Thelypteris palustris). The overall misclassification rate from this tree was 13 %. Again, low-quality wetlands were better predicted than high-quality wetlands by the absence of selected species rather than the presence of other species using the classification tree model. Our results suggest that a species’ wetland status classification and coefficient of conservatism are of little use in predicting wetland quality. A simple, statistically derived species checklist such as the one created in this study could be used by field biologists to quickly and efficiently identify wetland sites likely to be regulated as high-quality, and requiring more intensive field assessments. Alternatively, it can be used for advanced determinations of low-quality wetlands. Agencies can save considerable money by screening wetlands for the presence/absence of such “indicator” species before issuing permits.
Bou Kheir, Rania; Greve, Mogens H; Bøcher, Peder K; Greve, Mette B; Larsen, René; McCloy, Keith
2010-05-01
Soil organic carbon (SOC) is one of the most important carbon stocks globally and has large potential to affect global climate. Distribution patterns of SOC in Denmark constitute a nation-wide baseline for studies on soil carbon changes (with respect to Kyoto protocol). This paper predicts and maps the geographic distribution of SOC across Denmark using remote sensing (RS), geographic information systems (GISs) and decision-tree modeling (un-pruned and pruned classification trees). Seventeen parameters, i.e. parent material, soil type, landscape type, elevation, slope gradient, slope aspect, mean curvature, plan curvature, profile curvature, flow accumulation, specific catchment area, tangent slope, tangent curvature, steady-state wetness index, Normalized Difference Vegetation Index (NDVI), Normalized Difference Wetness Index (NDWI) and Soil Color Index (SCI) were generated to statistically explain SOC field measurements in the area of interest (Denmark). A large number of tree-based classification models (588) were developed using (i) all of the parameters, (ii) all Digital Elevation Model (DEM) parameters only, (iii) the primary DEM parameters only, (iv), the remote sensing (RS) indices only, (v) selected pairs of parameters, (vi) soil type, parent material and landscape type only, and (vii) the parameters having a high impact on SOC distribution in built pruned trees. The best constructed classification tree models (in the number of three) with the lowest misclassification error (ME) and the lowest number of nodes (N) as well are: (i) the tree (T1) combining all of the parameters (ME=29.5%; N=54); (ii) the tree (T2) based on the parent material, soil type and landscape type (ME=31.5%; N=14); and (iii) the tree (T3) constructed using parent material, soil type, landscape type, elevation, tangent slope and SCI (ME=30%; N=39). The produced SOC maps at 1:50,000 cartographic scale using these trees are highly matching with coincidence values equal to 90.5% (Map T1/Map T2), 95% (Map T1/Map T3) and 91% (Map T2/Map T3). The overall accuracies of these maps once compared with field observations were estimated to be 69.54% (Map T1), 68.87% (Map T2) and 69.41% (Map T3). The proposed tree models are relatively simple, and may be also applied to other areas. Copyright 2010 Elsevier Ltd. All rights reserved.
Semi-supervised SVM for individual tree crown species classification
NASA Astrophysics Data System (ADS)
Dalponte, Michele; Ene, Liviu Theodor; Marconcini, Mattia; Gobakken, Terje; Næsset, Erik
2015-12-01
In this paper a novel semi-supervised SVM classifier is presented, specifically developed for tree species classification at individual tree crown (ITC) level. In ITC tree species classification, all the pixels belonging to an ITC should have the same label. This assumption is used in the learning of the proposed semi-supervised SVM classifier (ITC-S3VM). This method exploits the information contained in the unlabeled ITC samples in order to improve the classification accuracy of a standard SVM. The ITC-S3VM method can be easily implemented using freely available software libraries. The datasets used in this study include hyperspectral imagery and laser scanning data acquired over two boreal forest areas characterized by the presence of three information classes (Pine, Spruce, and Broadleaves). The experimental results quantify the effectiveness of the proposed approach, which provides classification accuracies significantly higher (from 2% to above 27%) than those obtained by the standard supervised SVM and by a state-of-the-art semi-supervised SVM (S3VM). Particularly, by reducing the number of training samples (i.e. from 100% to 25%, and from 100% to 5% for the two datasets, respectively) the proposed method still exhibits results comparable to the ones of a supervised SVM trained with the full available training set. This property of the method makes it particularly suitable for practical forest inventory applications in which collection of in situ information can be very expensive both in terms of cost and time.
Site classification of ponderosa pine stands under stocking control in California
Robert F. Powers; William W. Oliver
1978-01-01
Existing systems for estimating site index of ponderosa pine (Pinus ponderosa Laws.) do not apply well to California stands where stocking is controlled. A more suitable system has been developed using trends in natural height growth, derived from stem analysis of dominant trees in California. This site index system produces polymorphic patterns of...
Knowledge and Community: The Effect of a First-Year Seminar on Student Persistence
ERIC Educational Resources Information Center
Pittendrigh, Adele; Borkowski, John; Swinford, Steven; Plumb, Carolyn
2016-01-01
This study explores the effects of an academic seminar on the persistence of first-year college students, including effects on students most at risk of dropping out. A secondary interest was demonstrating the utility of using classification and regression tree analysis to identify relevant predictors of student persistence. The results of the…
Evaluation of open source data mining software packages
Bonnie Ruefenacht; Greg Liknes; Andrew J. Lister; Haans Fisk; Dan Wendt
2009-01-01
Since 2001, the USDA Forest Service (USFS) has used classification and regression-tree technology to map USFS Forest Inventory and Analysis (FIA) biomass, forest type, forest type groups, and National Forest vegetation. This prior work used Cubist/See5 software for the analyses. The objective of this project, sponsored by the Remote Sensing Steering Committee (RSSC),...
Deep Phylogeny—How a Tree Can Help Characterize Early Life on Earth
Gaucher, Eric A.; Kratzer, James T.; Randall, Ryan N.
2010-01-01
The Darwinian concept of biological evolution assumes that life on Earth shares a common ancestor. The diversification of this common ancestor through speciation events and vertical transmission of genetic material implies that the classification of life can be illustrated in a tree-like manner, commonly referred to as the Tree of Life. This article describes features of the Tree of Life, such as how the tree has been both pruned and become bushier throughout the past century as our knowledge of biology has expanded. We present current views that the classification of life may be best illustrated as a ring or even a coral with tree-like characteristics. This article also discusses how the organization of the Tree of Life offers clues about ancient life on Earth. In particular, we focus on the environmental conditions and temperature history of Precambrian life and show how chemical, biological, and geological data can converge to better understand this history. “You know, a tree is a tree. How many more do you need to look at?” –Ronald Reagan (Governor of California), quoted in the Sacramento Bee, opposing expansion of Redwood National Park, March 3, 1966 PMID:20182607
Deep phylogeny--how a tree can help characterize early life on Earth.
Gaucher, Eric A; Kratzer, James T; Randall, Ryan N
2010-01-01
The Darwinian concept of biological evolution assumes that life on Earth shares a common ancestor. The diversification of this common ancestor through speciation events and vertical transmission of genetic material implies that the classification of life can be illustrated in a tree-like manner, commonly referred to as the Tree of Life. This article describes features of the Tree of Life, such as how the tree has been both pruned and become bushier throughout the past century as our knowledge of biology has expanded. We present current views that the classification of life may be best illustrated as a ring or even a coral with tree-like characteristics. This article also discusses how the organization of the Tree of Life offers clues about ancient life on Earth. In particular, we focus on the environmental conditions and temperature history of Precambrian life and show how chemical, biological, and geological data can converge to better understand this history."You know, a tree is a tree. How many more do you need to look at?"--Ronald Reagan (Governor of California), quoted in the Sacramento Bee, opposing expansion of Redwood National Park, March 3, 1966.
Mapping forest tree species over large areas with partially cloudy Landsat imagery
NASA Astrophysics Data System (ADS)
Turlej, K.; Radeloff, V.
2017-12-01
Forests provide numerous services to natural systems and humankind, but which services forest provide depends greatly on their tree species composition. That makes it important to track not only changes in forest extent, something that remote sensing excels in, but also to map tree species. The main goal of our work was to map tree species with Landsat imagery, and to identify how to maximize mapping accuracy by including partially cloudy imagery. Our study area covered one Landsat footprint (26/28) in Northern Wisconsin, USA, with temperate and boreal forests. We selected this area because it contains numerous tree species and variable forest composition providing an ideal study area to test the limits of Landsat data. We quantified how species-level classification accuracy was affected by a) the number of acquisitions, b) the seasonal distribution of observations, and c) the amount of cloud contamination. We classified a single year stack of Landsat-7, and -8 images data with a decision tree algorithm to generate a map of dominant tree species at the pixel- and stand-level. We obtained three important results. First, we achieved producer's accuracies in the range 70-80% and user's accuracies in range 80-90% for the most abundant tree species in our study area. Second, classification accuracy improved with more acquisitions, when observations were available from all seasons, and is the best when images with up to 40% cloud cover are included. Finally, classifications for pure stands were 10 to 30 percentage points better than those for mixed stands. We conclude that including partially cloudy Landsat imagery allows to map forest tree species with accuracies that were previously only possible for rare years with many cloud-free observations. Our approach thus provides important information for both forest management and science.
Prediction of strontium bromide laser efficiency using cluster and decision tree analysis
NASA Astrophysics Data System (ADS)
Iliev, Iliycho; Gocheva-Ilieva, Snezhana; Kulin, Chavdar
2018-01-01
Subject of investigation is a new high-powered strontium bromide (SrBr2) vapor laser emitting in multiline region of wavelengths. The laser is an alternative to the atom strontium lasers and electron free lasers, especially at the line 6.45 μm which line is used in surgery for medical processing of biological tissues and bones with minimal damage. In this paper the experimental data from measurements of operational and output characteristics of the laser are statistically processed by means of cluster analysis and tree-based regression techniques. The aim is to extract the more important relationships and dependences from the available data which influence the increase of the overall laser efficiency. There are constructed and analyzed a set of cluster models. It is shown by using different cluster methods that the seven investigated operational characteristics (laser tube diameter, length, supplied electrical power, and others) and laser efficiency are combined in 2 clusters. By the built regression tree models using Classification and Regression Trees (CART) technique there are obtained dependences to predict the values of efficiency, and especially the maximum efficiency with over 95% accuracy.
Emerald ash borer (Agrilus planipennis): Towards a classification of tree health and early detection
Matthew P. Peters; Louis R. Iverson; T. Davis Sydnor
2009-01-01
Forty-five green ash (Fraxinus pennsylvanica) street trees in Toledo, Ohio were photographed, measured, and visually rated for conditions related to emerald ash borer (Agrilus planipennis) (EAB) attacks. These trees were later removed, and sections were examined from each tree to determine the length of time that growth rates had...
Das, A.J.; Battles, J.J.; Stephenson, N.L.; van Mantgem, P.J.
2007-01-01
We examined mortality of Abies concolor (Gord. & Glend.) Lindl. (white fir) and Pinus lambertiana Dougl. (sugar pine) by developing logistic models using three growth indices obtained from tree rings: average growth, growth trend, and count of abrupt growth declines. For P. lambertiana, models with average growth, growth trend, and count of abrupt declines improved overall prediction (78.6% dead trees correctly classified, 83.7% live trees correctly classified) compared with a model with average recent growth alone (69.6% dead trees correctly classified, 67.3% live trees correctly classified). For A. concolor, counts of abrupt declines and longer time intervals improved overall classification (trees with DBH ???20 cm: 78.9% dead trees correctly classified and 76.7% live trees correctly classified vs. 64.9% dead trees correctly classified and 77.9% live trees correctly classified; trees with DBH <20 cm: 71.6% dead trees correctly classified and 71.0% live trees correctly classified vs. 67.2% dead trees correctly classified and 66.7% live trees correctly classified). In general, count of abrupt declines improved live-tree classification. External validation of A. concolor models showed that they functioned well at stands not used in model development, and the development of size-specific models demonstrated important differences in mortality risk between understory and canopy trees. Population-level mortality-risk models were developed for A. concolor and generated realistic mortality rates at two sites. Our results support the contention that a more comprehensive use of the growth record yields a more robust assessment of mortality risk. ?? 2007 NRC.
Proposition of a Classification of Adult Patients with Hemiparesis in Chronic Phase.
Chantraine, Frédéric; Filipetti, Paul; Schreiber, Céline; Remacle, Angélique; Kolanowski, Elisabeth; Moissenet, Florent
2016-01-01
Patients who have developed hemiparesis as a result of a central nervous system lesion, often experience reduced walking capacity and worse gait quality. Although clinically, similar gait patterns have been observed, presently, no clinically driven classification has been validated to group these patients' gait abnormalities at the level of the hip, knee and ankle joints. This study has thus intended to put forward a new gait classification for adult patients with hemiparesis in chronic phase, and to validate its discriminatory capacity. Twenty-six patients with hemiparesis were included in this observational study. Following a clinical examination, a clinical gait analysis, complemented by a video analysis, was performed whereby participants were requested to walk spontaneously on a 10m walkway. A patient's classification was established from clinical examination data and video analysis. This classification was made up of three groups, including two sub-groups, defined with key abnormalities observed whilst walking. Statistical analysis was achieved on the basis of 25 parameters resulting from the clinical gait analysis in order to assess the discriminatory characteristic of the classification as displayed by the walking speed and kinematic parameters. Results revealed that the parameters related to the discriminant criteria of the proposed classification were all significantly different between groups and subgroups. More generally, nearly two thirds of the 25 parameters showed significant differences (p<0.05) between the groups and sub-groups. However, prior to being fully validated, this classification must still be tested on a larger number of patients, and the repeatability of inter-operator measures must be assessed. This classification enables patients to be grouped on the basis of key abnormalities observed whilst walking and has the advantage of being able to be used in clinical routines without necessitating complex apparatus. In the midterm, this classification may allow a decision-tree of therapies to be developed on the basis of the group in which the patient has been categorised.
Proposition of a Classification of Adult Patients with Hemiparesis in Chronic Phase
Filipetti, Paul; Remacle, Angélique; Kolanowski, Elisabeth
2016-01-01
Background Patients who have developed hemiparesis as a result of a central nervous system lesion, often experience reduced walking capacity and worse gait quality. Although clinically, similar gait patterns have been observed, presently, no clinically driven classification has been validated to group these patients’ gait abnormalities at the level of the hip, knee and ankle joints. This study has thus intended to put forward a new gait classification for adult patients with hemiparesis in chronic phase, and to validate its discriminatory capacity. Methods and Findings Twenty-six patients with hemiparesis were included in this observational study. Following a clinical examination, a clinical gait analysis, complemented by a video analysis, was performed whereby participants were requested to walk spontaneously on a 10m walkway. A patient’s classification was established from clinical examination data and video analysis. This classification was made up of three groups, including two sub-groups, defined with key abnormalities observed whilst walking. Statistical analysis was achieved on the basis of 25 parameters resulting from the clinical gait analysis in order to assess the discriminatory characteristic of the classification as displayed by the walking speed and kinematic parameters. Results revealed that the parameters related to the discriminant criteria of the proposed classification were all significantly different between groups and subgroups. More generally, nearly two thirds of the 25 parameters showed significant differences (p<0.05) between the groups and sub-groups. However, prior to being fully validated, this classification must still be tested on a larger number of patients, and the repeatability of inter-operator measures must be assessed. Conclusions This classification enables patients to be grouped on the basis of key abnormalities observed whilst walking and has the advantage of being able to be used in clinical routines without necessitating complex apparatus. In the midterm, this classification may allow a decision-tree of therapies to be developed on the basis of the group in which the patient has been categorised. PMID:27271533
Fragment-based prediction of skin sensitization using recursive partitioning
NASA Astrophysics Data System (ADS)
Lu, Jing; Zheng, Mingyue; Wang, Yong; Shen, Qiancheng; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian
2011-09-01
Skin sensitization is an important toxic endpoint in the risk assessment of chemicals. In this paper, structure-activity relationships analysis was performed on the skin sensitization potential of 357 compounds with local lymph node assay data. Structural fragments were extracted by GASTON (GrAph/Sequence/Tree extractiON) from the training set. Eight fragments with accuracy significantly higher than 0.73 ( p < 0.1) were retained to make up an indicator descriptor fragment. The fragment descriptor and eight other physicochemical descriptors closely related to the endpoint were calculated to construct the recursive partitioning tree (RP tree) for classification. The balanced accuracy of the training set, test set I, and test set II in the leave-one-out model were 0.846, 0.800, and 0.809, respectively. The results highlight that fragment-based RP tree is a preferable method for identifying skin sensitizers. Moreover, the selected fragments provide useful structural information for exploring sensitization mechanisms, and RP tree creates a graphic tree to identify the most important properties associated with skin sensitization. They can provide some guidance for designing of drugs with lower sensitization level.
Wright, C.; Gallant, Alisa L.
2007-01-01
The U.S. Fish and Wildlife Service uses the term palustrine wetland to describe vegetated wetlands traditionally identified as marsh, bog, fen, swamp, or wet meadow. Landsat TM imagery was combined with image texture and ancillary environmental data to model probabilities of palustrine wetland occurrence in Yellowstone National Park using classification trees. Model training and test locations were identified from National Wetlands Inventory maps, and classification trees were built for seven years spanning a range of annual precipitation. At a coarse level, palustrine wetland was separated from upland. At a finer level, five palustrine wetland types were discriminated: aquatic bed (PAB), emergent (PEM), forested (PFO), scrub–shrub (PSS), and unconsolidated shore (PUS). TM-derived variables alone were relatively accurate at separating wetland from upland, but model error rates dropped incrementally as image texture, DEM-derived terrain variables, and other ancillary GIS layers were added. For classification trees making use of all available predictors, average overall test error rates were 7.8% for palustrine wetland/upland models and 17.0% for palustrine wetland type models, with consistent accuracies across years. However, models were prone to wetland over-prediction. While the predominant PEM class was classified with omission and commission error rates less than 14%, we had difficulty identifying the PAB and PSS classes. Ancillary vegetation information greatly improved PSS classification and moderately improved PFO discrimination. Association with geothermal areas distinguished PUS wetlands. Wetland over-prediction was exacerbated by class imbalance in likely combination with spatial and spectral limitations of the TM sensor. Wetland probability surfaces may be more informative than hard classification, and appear to respond to climate-driven wetland variability. The developed method is portable, relatively easy to implement, and should be applicable in other settings and over larger extents.
Robertson, Dale M.; Saad, D.A.; Heisey, D.M.
2006-01-01
Various approaches are used to subdivide large areas into regions containing streams that have similar reference or background water quality and that respond similarly to different factors. For many applications, such as establishing reference conditions, it is preferable to use physical characteristics that are not affected by human activities to delineate these regions. However, most approaches, such as ecoregion classifications, rely on land use to delineate regions or have difficulties compensating for the effects of land use. Land use not only directly affects water quality, but it is often correlated with the factors used to define the regions. In this article, we describe modifications to SPARTA (spatial regression-tree analysis), a relatively new approach applied to water-quality and environmental characteristic data to delineate zones with similar factors affecting water quality. In this modified approach, land-use-adjusted (residualized) water quality and environmental characteristics are computed for each site. Regression-tree analysis is applied to the residualized data to determine the most statistically important environmental characteristics describing the distribution of a specific water-quality constituent. Geographic information for small basins throughout the study area is then used to subdivide the area into relatively homogeneous environmental water-quality zones. For each zone, commonly used approaches are subsequently used to define its reference water quality and how its water quality responds to changes in land use. SPARTA is used to delineate zones of similar reference concentrations of total phosphorus and suspended sediment throughout the upper Midwestern part of the United States. ?? 2006 Springer Science+Business Media, Inc.
Parker, G; McCraw, S; Hadzi-Pavlovic, D
2015-07-15
Studies suggest that differentiating melancholic from non-melancholic depressive disorders is advanced by use of illness course as well as symptom variables but, in practice, potentially differentiating variables are generally positioned as having equal value. Judging that differentiating features are more likely to vary in their signal intensity, we sought to determine the number of features required to effect differentiation and their hierarchical order. The 24-item clinician-rated Sydney Melancholia Prototype Index (SMPI-CR) was completed for 364 unipolar depressed patients. The sample was divided into two cohorts according to the recruitment period. An RPART classification tree analysis identified the most discriminating SMPI items in the development sample of 197 patients, and examined the sensitivity and specificity of the diagnostic decisions, then sought to replicate findings in a validation sample of 169 patients. Independent analyses of putative SMPI items identified only seven items as required to discriminate those with clinically-diagnosed melancholic or non-melancholic depression when the conditions were examined separately. An RPART analysis considering differentiation of melancholic and non-melancholic depression in the total samples retained five of those items in the classification tree, three of which were non-symptom items, and with 92% sensitivity and 80% specificity in the development sample. This reduced item set showed 93% sensitivity and 82% specificity in the validation sample. Our clinical judgment of melancholic or non-melancholic depression may not correspond with the clinical logic employed by other clinicians. Only five SMPI items were required to derive a succinct and efficient decision tree, comprising high sensitivity and specificity in differentiating melancholic and non-melancholic depression. Current study findings provide an empirical model that could enrich clinicians׳ approach to differentiating melancholic and non-melancholic depression. Copyright © 2015 Elsevier B.V. All rights reserved.
Tran, Thi Huong Giang; Ressl, Camillo; Pfeifer, Norbert
2018-02-03
This paper suggests a new approach for change detection (CD) in 3D point clouds. It combines classification and CD in one step using machine learning. The point cloud data of both epochs are merged for computing features of four types: features describing the point distribution, a feature relating to relative terrain elevation, features specific for the multi-target capability of laser scanning, and features combining the point clouds of both epochs to identify the change. All these features are merged in the points and then training samples are acquired to create the model for supervised classification, which is then applied to the whole study area. The final results reach an overall accuracy of over 90% for both epochs of eight classes: lost tree, new tree, lost building, new building, changed ground, unchanged building, unchanged tree, and unchanged ground.
NASA Technical Reports Server (NTRS)
Mehta, N. C.
1984-01-01
The utility of radar scatterometers for discrimination and characterization of natural vegetation was investigated. Backscatter measurements were acquired with airborne multi-frequency, multi-polarization, multi-angle radar scatterometers over a test site in a southern temperate forest. Separability between ground cover classes was studied using a two-class separability measure. Very good separability is achieved between most classes. Longer wavelength is useful in separating trees from non-tree classes, while shorter wavelength and cross polarization are helpful for discrimination among tree classes. Using the maximum likelihood classifier, 50% overall classification accuracy is achieved using a single, short-wavelength scatterometer channel. Addition of multiple incidence angles and another radar band improves classification accuracy by 20% and 50%, respectively, over the single channel accuracy. Incorporation of a third radar band seems redundant for vegetation classification. Vertical transmit polarization is critically important for all classes.
Classification of Tree Species in Overstorey Canopy of Subtropical Forest Using QuickBird Images.
Lin, Chinsu; Popescu, Sorin C; Thomson, Gavin; Tsogt, Khongor; Chang, Chein-I
2015-01-01
This paper proposes a supervised classification scheme to identify 40 tree species (2 coniferous, 38 broadleaf) belonging to 22 families and 36 genera in high spatial resolution QuickBird multispectral images (HMS). Overall kappa coefficient (OKC) and species conditional kappa coefficients (SCKC) were used to evaluate classification performance in training samples and estimate accuracy and uncertainty in test samples. Baseline classification performance using HMS images and vegetation index (VI) images were evaluated with an OKC value of 0.58 and 0.48 respectively, but performance improved significantly (up to 0.99) when used in combination with an HMS spectral-spatial texture image (SpecTex). One of the 40 species had very high conditional kappa coefficient performance (SCKC ≥ 0.95) using 4-band HMS and 5-band VIs images, but, only five species had lower performance (0.68 ≤ SCKC ≤ 0.94) using the SpecTex images. When SpecTex images were combined with a Visible Atmospherically Resistant Index (VARI), there was a significant improvement in performance in the training samples. The same level of improvement could not be replicated in the test samples indicating that a high degree of uncertainty exists in species classification accuracy which may be due to individual tree crown density, leaf greenness (inter-canopy gaps), and noise in the background environment (intra-canopy gaps). These factors increase uncertainty in the spectral texture features and therefore represent potential problems when using pixel-based classification techniques for multi-species classification.
Meneguzzo, Dacia M; Liknes, Greg C; Nelson, Mark D
2013-08-01
Discrete trees and small groups of trees in nonforest settings are considered an essential resource around the world and are collectively referred to as trees outside forests (ToF). ToF provide important functions across the landscape, such as protecting soil and water resources, providing wildlife habitat, and improving farmstead energy efficiency and aesthetics. Despite the significance of ToF, forest and other natural resource inventory programs and geospatial land cover datasets that are available at a national scale do not include comprehensive information regarding ToF in the United States. Additional ground-based data collection and acquisition of specialized imagery to inventory these resources are expensive alternatives. As a potential solution, we identified two remote sensing-based approaches that use free high-resolution aerial imagery from the National Agriculture Imagery Program (NAIP) to map all tree cover in an agriculturally dominant landscape. We compared the results obtained using an unsupervised per-pixel classifier (independent component analysis-[ICA]) and an object-based image analysis (OBIA) procedure in Steele County, Minnesota, USA. Three types of accuracy assessments were used to evaluate how each method performed in terms of: (1) producing a county-level estimate of total tree-covered area, (2) correctly locating tree cover on the ground, and (3) how tree cover patch metrics computed from the classified outputs compared to those delineated by a human photo interpreter. Both approaches were found to be viable for mapping tree cover over a broad spatial extent and could serve to supplement ground-based inventory data. The ICA approach produced an estimate of total tree cover more similar to the photo-interpreted result, but the output from the OBIA method was more realistic in terms of describing the actual observed spatial pattern of tree cover.
[The changes of forest canopy spectral reflectance with seasons in Xiaoxing'anling].
Xu, Guang-Cai; Pang, Yong; Li, Zeng-Yuan; Zhao, Kai-Rui; Liu, Lu-Xia
2013-12-01
The ASD FieldSpec portable spectrometer was adopted to collect canopy reflectance spectrum data of the 9 main tree species in study area by a long-term observation to get the data of the four seasons Then the smoothed reflectance curve and the first derivation curve from 350 to 1400 nm and several commonly used vegetation spectral characteristic parameters were generated to analyse seasonal change characteristics and variation of the 9 tree species in visible and near-infrared band and to explore the best band characteristics and period for species identification. The results showed that different trees had different and rather unique spectral features during the four seasons. The spectral characteristics of the deciduous trees have regular changes with the cycle of the seasons, whereas those of the evergreen tree species have no significant changes in one year. As well changes in the spectral characteristics could effectively reflect forest phenology changes, and it is proposed that the optimal strategy for tree species classification may be the integration and analysis of multi-seasonal spectral data. Evergreen trees and deciduous trees in the winter have obvious differences in the canopy spectral characteristics and the best single-season remote sensing data for tree species recognition is in summer.
Salas-Leiva, Dayana E.; Meerow, Alan W.; Calonje, Michael; Griffith, M. Patrick; Francisco-Ortega, Javier; Nakamura, Kyoko; Stevenson, Dennis W.; Lewis, Carl E.; Namoff, Sandra
2013-01-01
Background and aims Despite a recent new classification, a stable phylogeny for the cycads has been elusive, particularly regarding resolution of Bowenia, Stangeria and Dioon. In this study, five single-copy nuclear genes (SCNGs) are applied to the phylogeny of the order Cycadales. The specific aim is to evaluate several gene tree–species tree reconciliation approaches for developing an accurate phylogeny of the order, to contrast them with concatenated parsimony analysis and to resolve the erstwhile problematic phylogenetic position of these three genera. Methods DNA sequences of five SCNGs were obtained for 20 cycad species representing all ten genera of Cycadales. These were analysed with parsimony, maximum likelihood (ML) and three Bayesian methods of gene tree–species tree reconciliation, using Cycas as the outgroup. A calibrated date estimation was developed with Bayesian methods, and biogeographic analysis was also conducted. Key Results Concatenated parsimony, ML and three species tree inference methods resolve exactly the same tree topology with high support at most nodes. Dioon and Bowenia are the first and second branches of Cycadales after Cycas, respectively, followed by an encephalartoid clade (Macrozamia–Lepidozamia–Encephalartos), which is sister to a zamioid clade, of which Ceratozamia is the first branch, and in which Stangeria is sister to Microcycas and Zamia. Conclusions A single, well-supported phylogenetic hypothesis of the generic relationships of the Cycadales is presented. However, massive extinction events inferred from the fossil record that eliminated broader ancestral distributions within Zamiaceae compromise accurate optimization of ancestral biogeographical areas for that hypothesis. While major lineages of Cycadales are ancient, crown ages of all modern genera are no older than 12 million years, supporting a recent hypothesis of mostly Miocene radiations. This phylogeny can contribute to an accurate infrafamilial classification of Zamiaceae. PMID:23997230
Wheeler, David C.; Archer, Kellie J.; Burstyn, Igor; Yu, Kai; Stewart, Patricia A.; Colt, Joanne S.; Baris, Dalsu; Karagas, Margaret R.; Schwenn, Molly; Johnson, Alison; Armenti, Karla; Silverman, Debra T.; Friesen, Melissa C.
2015-01-01
Objectives: To evaluate occupational exposures in case–control studies, exposure assessors typically review each job individually to assign exposure estimates. This process lacks transparency and does not provide a mechanism for recreating the decision rules in other studies. In our previous work, nominal (unordered categorical) classification trees (CTs) generally successfully predicted expert-assessed ordinal exposure estimates (i.e. none, low, medium, high) derived from occupational questionnaire responses, but room for improvement remained. Our objective was to determine if using recently developed ordinal CTs would improve the performance of nominal trees in predicting ordinal occupational diesel exhaust exposure estimates in a case–control study. Methods: We used one nominal and four ordinal CT methods to predict expert-assessed probability, intensity, and frequency estimates of occupational diesel exhaust exposure (each categorized as none, low, medium, or high) derived from questionnaire responses for the 14983 jobs in the New England Bladder Cancer Study. To replicate the common use of a single tree, we applied each method to a single sample of 70% of the jobs, using 15% to test and 15% to validate each method. To characterize variability in performance, we conducted a resampling analysis that repeated the sample draws 100 times. We evaluated agreement between the tree predictions and expert estimates using Somers’ d, which measures differences in terms of ordinal association between predicted and observed scores and can be interpreted similarly to a correlation coefficient. Results: From the resampling analysis, compared with the nominal tree, an ordinal CT method that used a quadratic misclassification function and controlled tree size based on total misclassification cost had a slightly better predictive performance that was statistically significant for the frequency metric (Somers’ d: nominal tree = 0.61; ordinal tree = 0.63) and similar performance for the probability (nominal = 0.65; ordinal = 0.66) and intensity (nominal = 0.65; ordinal = 0.65) metrics. The best ordinal CT predicted fewer cases of large disagreement with the expert assessments (i.e. no exposure predicted for a job with high exposure and vice versa) compared with the nominal tree across all of the exposure metrics. For example, the percent of jobs with expert-assigned high intensity of exposure that the model predicted as no exposure was 29% for the nominal tree and 22% for the best ordinal tree. Conclusions: The overall agreements were similar across CT models; however, the use of ordinal models reduced the magnitude of the discrepancy when disagreements occurred. As the best performing model can vary by situation, researchers should consider evaluating multiple CT methods to maximize the predictive performance within their data. PMID:25433003
NASA Technical Reports Server (NTRS)
Han, Chia Yung; Wan, Liqun; Wee, William G.
1990-01-01
A knowledge-based interactive problem solving environment called KIPSE1 is presented. The KIPSE1 is a system built on a commercial expert system shell, the KEE system. This environment gives user capability to carry out exploratory data analysis and pattern classification tasks. A good solution often consists of a sequence of steps with a set of methods used at each step. In KIPSE1, solution is represented in the form of a decision tree and each node of the solution tree represents a partial solution to the problem. Many methodologies are provided at each node to the user such that the user can interactively select the method and data sets to test and subsequently examine the results. Otherwise, users are allowed to make decisions at various stages of problem solving to subdivide the problem into smaller subproblems such that a large problem can be handled and a better solution can be found.
Martin, Michael A; Meyricke, Ramona; O'Neill, Terry; Roberts, Steven
2006-04-20
A critical choice facing breast cancer patients is which surgical treatment--mastectomy or breast conserving surgery (BCS)--is most appropriate. Several studies have investigated factors that impact the type of surgery chosen, identifying features such as place of residence, age at diagnosis, tumor size, socio-economic and racial/ethnic elements as relevant. Such assessment of "propensity" is important in understanding issues such as a reported under-utilisation of BCS among women for whom such treatment was not contraindicated. Using Western Australian (WA) data, we further examine the factors associated with the type of surgical treatment for breast cancer using a classification tree approach. This approach deals naturally with complicated interactions between factors, and so allows flexible and interpretable models for treatment choice to be built that add to the current understanding of this complex decision process. Data was extracted from the WA Cancer Registry on women diagnosed with breast cancer in WA from 1990 to 2000. Subjects' treatment preferences were predicted from covariates using both classification trees and logistic regression. Tumor size was the primary determinant of patient choice, subjects with tumors smaller than 20 mm in diameter preferring BCS. For subjects with tumors greater than 20 mm in diameter factors such as patient age, nodal status, and tumor histology become relevant as predictors of patient choice. Classification trees perform as well as logistic regression for predicting patient choice, but are much easier to interpret for clinical use. The selected tree can inform clinicians' advice to patients.
Tree planting in the Allegheny section
Northeastern Forest Experiment Station
1961-01-01
Tree planting involves many considerations - site classification, selection of species, planting practices, and protection from fire, insects, and diseases. The information about many of these aspects of planting is scattered and fragmentary.
Wen, Tingxi; Zhang, Zhongnan
2017-01-01
Abstract In this paper, genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance and intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in the extraction of effective features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy. PMID:28489789
Wen, Tingxi; Zhang, Zhongnan
2017-05-01
In this paper, genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance and intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in the extraction of effective features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy.
An Evaluation of Risk Factors Related to Employment Outcomes for Youth with Disabilities
ERIC Educational Resources Information Center
Sima, Adam P.; Wehman, Paul H.; Chan, Fong; West, Michael D.; Leucking, Richard G.
2015-01-01
This study explores non-modifiable risk factors associated with poor post-school competitive employment outcomes for students with disabilities. A classification tree analysis was used with a sample of 2,900 students who were in the second National Longitudinal Transition Study-2 (NLTS2) up to 6 years following school exit to identify groups of…
Monitoring urban tree cover using object-based image analysis and public domain remotely sensed data
L. Monika Moskal; Diane M. Styers; Meghan Halabisky
2011-01-01
Urban forest ecosystems provide a range of social and ecological services, but due to the heterogeneity of these canopies their spatial extent is difficult to quantify and monitor. Traditional per-pixel classification methods have been used to map urban canopies, however, such techniques are not generally appropriate for assessing these highly variable landscapes....
Who Is and Who Is Not Willing to Use Online Employer-Provided Retirement Investment Advice
ERIC Educational Resources Information Center
Joo, So-Hyun; Grable, John E.; Choe, Hyuncha
2007-01-01
This study used classification tree analysis to examine who is and who is not willing to use online employer-provided retirement investment advice. Using data from the Retirement Confidence Survey (Employee Benefit Research Institute, 2004), the study focused on who was more likely to use online retirement investment advice when it was available…
Effect of foot shape on the three-dimensional position of foot bones.
Ledoux, William R; Rohr, Eric S; Ching, Randal P; Sangeorzan, Bruce J
2006-12-01
To eliminate some of the ambiguity in describing foot shape, we developed three-dimensional (3D), objective measures of foot type based on computerized tomography (CT) scans. Feet were classified via clinical examination as pes cavus (high arch), neutrally aligned (normal arch), asymptomatic pes planus (flat arch with no pain), or symptomatic pes planus (flat arch with pain). We enrolled 10 subjects of each foot type; if both feet were of the same foot type, then each foot was scanned (n=65 total). Partial weightbearing (20% body weight) CT scans were performed. We generated embedded coordinate systems for each foot bone by assuming uniform density and calculating the inertial matrix. Cardan angles were used to describe five bone-to-bone relationships, resulting in 15 angular measurements. Significant differences were found among foot types for 12 of the angles. The angles were also used to develop a classification tree analysis, which determined the correct foot type for 64 of the 65 feet. Our measure provides insight into how foot bone architecture differs between foot types. The classification tree analysis demonstrated that objective measures can be used to discriminate between feet with high, normal, and low arches. Copyright (c) 2006 Orthopaedic Research Society.
Feature Relevance Assessment of Multispectral Airborne LIDAR Data for Tree Species Classification
NASA Astrophysics Data System (ADS)
Amiri, N.; Heurich, M.; Krzystek, P.; Skidmore, A. K.
2018-04-01
The presented experiment investigates the potential of Multispectral Laser Scanning (MLS) point clouds for single tree species classification. The basic idea is to simulate a MLS sensor by combining two different Lidar sensors providing three different wavelngthes. The available data were acquired in the summer 2016 at the same date in a leaf-on condition with an average point density of 37 points/m2. For the purpose of classification, we segmented the combined 3D point clouds consisiting of three different spectral channels into 3D clusters using Normalized Cut segmentation approach. Then, we extracted four group of features from the 3D point cloud space. Once a varity of features has been extracted, we applied forward stepwise feature selection in order to reduce the number of irrelevant or redundant features. For the classification, we used multinomial logestic regression with L1 regularization. Our study is conducted using 586 ground measured single trees from 20 sample plots in the Bavarian Forest National Park, in Germany. Due to lack of reference data for some rare species, we focused on four classes of species. The results show an improvement between 4-10 pp for the tree species classification by using MLS data in comparison to a single wavelength based approach. A cross validated (15-fold) accuracy of 0.75 can be achieved when all feature sets from three different spectral channels are used. Our results cleary indicates that the use of MLS point clouds has great potential to improve detailed forest species mapping.
Texture classification of lung computed tomography images
NASA Astrophysics Data System (ADS)
Pheng, Hang See; Shamsuddin, Siti M.
2013-03-01
Current development of algorithms in computer-aided diagnosis (CAD) scheme is growing rapidly to assist the radiologist in medical image interpretation. Texture analysis of computed tomography (CT) scans is one of important preliminary stage in the computerized detection system and classification for lung cancer. Among different types of images features analysis, Haralick texture with variety of statistical measures has been used widely in image texture description. The extraction of texture feature values is essential to be used by a CAD especially in classification of the normal and abnormal tissue on the cross sectional CT images. This paper aims to compare experimental results using texture extraction and different machine leaning methods in the classification normal and abnormal tissues through lung CT images. The machine learning methods involve in this assessment are Artificial Immune Recognition System (AIRS), Naive Bayes, Decision Tree (J48) and Backpropagation Neural Network. AIRS is found to provide high accuracy (99.2%) and sensitivity (98.0%) in the assessment. For experiments and testing purpose, publicly available datasets in the Reference Image Database to Evaluate Therapy Response (RIDER) are used as study cases.
An automatic graph-based approach for artery/vein classification in retinal images.
Dashtbozorg, Behdad; Mendonça, Ana Maria; Campilho, Aurélio
2014-03-01
The classification of retinal vessels into artery/vein (A/V) is an important phase for automating the detection of vascular changes, and for the calculation of characteristic signs associated with several systemic diseases such as diabetes, hypertension, and other cardiovascular conditions. This paper presents an automatic approach for A/V classification based on the analysis of a graph extracted from the retinal vasculature. The proposed method classifies the entire vascular tree deciding on the type of each intersection point (graph nodes) and assigning one of two labels to each vessel segment (graph links). Final classification of a vessel segment as A/V is performed through the combination of the graph-based labeling results with a set of intensity features. The results of this proposed method are compared with manual labeling for three public databases. Accuracy values of 88.3%, 87.4%, and 89.8% are obtained for the images of the INSPIRE-AVR, DRIVE, and VICAVR databases, respectively. These results demonstrate that our method outperforms recent approaches for A/V classification.
Evaluating Chemical Persistence in a Multimedia Environment: ACART Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bennett, D.H.; McKone, T.E.; Kastenberg, W.E.
1999-02-01
For the thousands of chemicals continuously released into the environment, it is desirable to make prospective assessments of those likely to be persistent. Persistent chemicals are difficult to remove if adverse health or ecological effects are later discovered. A tiered approach using a classification scheme and a multimedia model for determining persistence is presented. Using specific criteria for persistence, a classification tree is developed to classify a chemical as ''persistent'' or ''non-persistent'' based on the chemical properties. In this approach, the classification is derived from the results of a standardized unit world multimedia model. Thus, the classifications are more robustmore » for multimedia pollutants than classifications using a single medium half-life. The method can be readily implemented and provides insight without requiring extensive and often unavailable data. This method can be used to classify chemicals when only a few properties are known and be used to direct further data collection. Case studies are presented to demonstrate the advantages of the approach.« less
Fuzzy support vector machine: an efficient rule-based classification technique for microarrays.
Hajiloo, Mohsen; Rabiee, Hamid R; Anooshahpour, Mahdi
2013-01-01
The abundance of gene expression microarray data has led to the development of machine learning algorithms applicable for tackling disease diagnosis, disease prognosis, and treatment selection problems. However, these algorithms often produce classifiers with weaknesses in terms of accuracy, robustness, and interpretability. This paper introduces fuzzy support vector machine which is a learning algorithm based on combination of fuzzy classifiers and kernel machines for microarray classification. Experimental results on public leukemia, prostate, and colon cancer datasets show that fuzzy support vector machine applied in combination with filter or wrapper feature selection methods develops a robust model with higher accuracy than the conventional microarray classification models such as support vector machine, artificial neural network, decision trees, k nearest neighbors, and diagonal linear discriminant analysis. Furthermore, the interpretable rule-base inferred from fuzzy support vector machine helps extracting biological knowledge from microarray data. Fuzzy support vector machine as a new classification model with high generalization power, robustness, and good interpretability seems to be a promising tool for gene expression microarray classification.
Unrealistic phylogenetic trees may improve phylogenetic footprinting.
Nettling, Martin; Treutler, Hendrik; Cerquides, Jesus; Grosse, Ivo
2017-06-01
The computational investigation of DNA binding motifs from binding sites is one of the classic tasks in bioinformatics and a prerequisite for understanding gene regulation as a whole. Due to the development of sequencing technologies and the increasing number of available genomes, approaches based on phylogenetic footprinting become increasingly attractive. Phylogenetic footprinting requires phylogenetic trees with attached substitution probabilities for quantifying the evolution of binding sites, but these trees and substitution probabilities are typically not known and cannot be estimated easily. Here, we investigate the influence of phylogenetic trees with different substitution probabilities on the classification performance of phylogenetic footprinting using synthetic and real data. For synthetic data we find that the classification performance is highest when the substitution probability used for phylogenetic footprinting is similar to that used for data generation. For real data, however, we typically find that the classification performance of phylogenetic footprinting surprisingly increases with increasing substitution probabilities and is often highest for unrealistically high substitution probabilities close to one. This finding suggests that choosing realistic model assumptions might not always yield optimal predictions in general and that choosing unrealistically high substitution probabilities close to one might actually improve the classification performance of phylogenetic footprinting. The proposed PF is implemented in JAVA and can be downloaded from https://github.com/mgledi/PhyFoo. : martin.nettling@informatik.uni-halle.de. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press.
Laufenberg, Jared S.; Clark, Joseph D.; Chandler, Richard B.
2018-01-01
Monitoring vulnerable species is critical for their conservation. Thresholds or tipping points are commonly used to indicate when populations become vulnerable to extinction and to trigger changes in conservation actions. However, quantitative methods to determine such thresholds have not been well explored. The Louisiana black bear (Ursus americanus luteolus) was removed from the list of threatened and endangered species under the U.S. Endangered Species Act in 2016 and our objectives were to determine the most appropriate parameters and thresholds for monitoring and management action. Capture mark recapture (CMR) data from 2006 to 2012 were used to estimate population parameters and variances. We used stochastic population simulations and conditional classification trees to identify demographic rates for monitoring that would be most indicative of heighted extinction risk. We then identified thresholds that would be reliable predictors of population viability. Conditional classification trees indicated that annual apparent survival rates for adult females averaged over 5 years () was the best predictor of population persistence. Specifically, population persistence was estimated to be ≥95% over 100 years when , suggesting that this statistic can be used as threshold to trigger management intervention. Our evaluation produced monitoring protocols that reliably predicted population persistence and was cost-effective. We conclude that population projections and conditional classification trees can be valuable tools for identifying extinction thresholds used in monitoring programs.
Laufenberg, Jared S; Clark, Joseph D; Chandler, Richard B
2018-01-01
Monitoring vulnerable species is critical for their conservation. Thresholds or tipping points are commonly used to indicate when populations become vulnerable to extinction and to trigger changes in conservation actions. However, quantitative methods to determine such thresholds have not been well explored. The Louisiana black bear (Ursus americanus luteolus) was removed from the list of threatened and endangered species under the U.S. Endangered Species Act in 2016 and our objectives were to determine the most appropriate parameters and thresholds for monitoring and management action. Capture mark recapture (CMR) data from 2006 to 2012 were used to estimate population parameters and variances. We used stochastic population simulations and conditional classification trees to identify demographic rates for monitoring that would be most indicative of heighted extinction risk. We then identified thresholds that would be reliable predictors of population viability. Conditional classification trees indicated that annual apparent survival rates for adult females averaged over 5 years ([Formula: see text]) was the best predictor of population persistence. Specifically, population persistence was estimated to be ≥95% over 100 years when [Formula: see text], suggesting that this statistic can be used as threshold to trigger management intervention. Our evaluation produced monitoring protocols that reliably predicted population persistence and was cost-effective. We conclude that population projections and conditional classification trees can be valuable tools for identifying extinction thresholds used in monitoring programs.
Supervised DNA Barcodes species classification: analysis, comparisons and results
2014-01-01
Background Specific fragments, coming from short portions of DNA (e.g., mitochondrial, nuclear, and plastid sequences), have been defined as DNA Barcode and can be used as markers for organisms of the main life kingdoms. Species classification with DNA Barcode sequences has been proven effective on different organisms. Indeed, specific gene regions have been identified as Barcode: COI in animals, rbcL and matK in plants, and ITS in fungi. The classification problem assigns an unknown specimen to a known species by analyzing its Barcode. This task has to be supported with reliable methods and algorithms. Methods In this work the efficacy of supervised machine learning methods to classify species with DNA Barcode sequences is shown. The Weka software suite, which includes a collection of supervised classification methods, is adopted to address the task of DNA Barcode analysis. Classifier families are tested on synthetic and empirical datasets belonging to the animal, fungus, and plant kingdoms. In particular, the function-based method Support Vector Machines (SVM), the rule-based RIPPER, the decision tree C4.5, and the Naïve Bayes method are considered. Additionally, the classification results are compared with respect to ad-hoc and well-established DNA Barcode classification methods. Results A software that converts the DNA Barcode FASTA sequences to the Weka format is released, to adapt different input formats and to allow the execution of the classification procedure. The analysis of results on synthetic and real datasets shows that SVM and Naïve Bayes outperform on average the other considered classifiers, although they do not provide a human interpretable classification model. Rule-based methods have slightly inferior classification performances, but deliver the species specific positions and nucleotide assignments. On synthetic data the supervised machine learning methods obtain superior classification performances with respect to the traditional DNA Barcode classification methods. On empirical data their classification performances are at a comparable level to the other methods. Conclusions The classification analysis shows that supervised machine learning methods are promising candidates for handling with success the DNA Barcoding species classification problem, obtaining excellent performances. To conclude, a powerful tool to perform species identification is now available to the DNA Barcoding community. PMID:24721333
2006-06-01
Hadjiiski, and N. Petrick, "Computerized nipple identification for multiple image analysis in computer-aided diagnosis," Medical Physics 31, 2871...candidates, 3 identification of suspicious objects, 4 feature extraction and analysis, and 5 FP reduc- tion by classification of normal tissue...detection of microcalcifi- cations on digitized mammograms.41 An illustration of a La- placian decomposition tree is shown on the left-hand side of Fig. 4
Neuroanatomical basis of paroxysmal sympathetic hyperactivity: A diffusion tensor imaging analysis
Hinson, Holly E.; Puybasset, Louis; Weiss, Nicolas; Perlbarg, Vincent; Benali, Habib; Galanaud, Damien; Lasarev, Mike; Stevens, Robert D.
2015-01-01
Primary objective Paroxysmal sympathetic hyperactivity (PSH) is observed in a sub-set of patients with moderate-to-severe traumatic brain injury (TBI). The neuroanatomical basis of PSH is poorly understood. It is hypothesized that PSH is linked to changes in connectivity within the central autonomic network. Research design Retrospective analysis in a sub-set of patients from a multi-centre, prospective cohort study Methods and procedures Adult patients who were <3 weeks after severe TBI were enrolled and screened for PSH using a standard definition. Patients underwent multimodal MRI, which included quantitative diffusion tensor imaging. Main outcomes and results Principal component analysis (PCA) was used to resolve the set of tracts into components. Ability to predict PSH was evaluated via area under the receiver operating characteristic (AUROC) and tree-based classification analyses. Among 102 enrolled patients, 16 met criteria for PSH. The first principle component was significantly associated (p = 0.024, AUROC = 0.867) with PSH status even after controlling for age and admission GCS. In a classification tree analysis, age, GCS and decreased FA in the splenium of the corpus callosum and in the right posterior limb of the internal capsule discriminated PSH vs no PSH with an AUROC of 0.933. Conclusions Disconnection involving the posterior corpus callosum and of the posterior limb of the internal capsule may play a role in the pathogenesis or expression of PSH. PMID:25565392
Dacia M. Meneguzzo; Greg C. Liknes; Mark D. Nelson
2013-01-01
Discrete trees and small groups of trees in nonforest settings are considered an essential resource around the world and are collectively referred to as trees outside forests (ToF). ToF provide important functions across the landscape, such as protecting soil and water resources, providing wildlife habitat, and improving farmstead energy efficiency and aesthetics....
E. M. Hornibrook
1939-01-01
A satisfactory silvicultural management of ponderosa pine stands requires a judicious selection of trees to be left in the reserve stand. The timber marker must know what type of tree has the greatest growth potentialities and what type of tree will respond but slightly upon being released. The silvicultural problem in marking therefore is one of recognizing the...
Lee, Saro; Park, Inhye
2013-09-30
Subsidence of ground caused by underground mines poses hazards to human life and property. This study analyzed the hazard to ground subsidence using factors that can affect ground subsidence and a decision tree approach in a geographic information system (GIS). The study area was Taebaek, Gangwon-do, Korea, where many abandoned underground coal mines exist. Spatial data, topography, geology, and various ground-engineering data for the subsidence area were collected and compiled in a database for mapping ground-subsidence hazard (GSH). The subsidence area was randomly split 50/50 for training and validation of the models. A data-mining classification technique was applied to the GSH mapping, and decision trees were constructed using the chi-squared automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. The frequency ratio model was also applied to the GSH mapping for comparing with probabilistic model. The resulting GSH maps were validated using area-under-the-curve (AUC) analysis with the subsidence area data that had not been used for training the model. The highest accuracy was achieved by the decision tree model using CHAID algorithm (94.01%) comparing with QUEST algorithms (90.37%) and frequency ratio model (86.70%). These accuracies are higher than previously reported results for decision tree. Decision tree methods can therefore be used efficiently for GSH analysis and might be widely used for prediction of various spatial events. Copyright © 2013. Published by Elsevier Ltd.
Automated structural classification of lipids by machine learning.
Taylor, Ryan; Miller, Ryan H; Miller, Ryan D; Porter, Michael; Dalgleish, James; Prince, John T
2015-03-01
Modern lipidomics is largely dependent upon structural ontologies because of the great diversity exhibited in the lipidome, but no automated lipid classification exists to facilitate this partitioning. The size of the putative lipidome far exceeds the number currently classified, despite a decade of work. Automated classification would benefit ongoing classification efforts by decreasing the time needed and increasing the accuracy of classification while providing classifications for mass spectral identification algorithms. We introduce a tool that automates classification into the LIPID MAPS ontology of known lipids with >95% accuracy and novel lipids with 63% accuracy. The classification is based upon simple chemical characteristics and modern machine learning algorithms. The decision trees produced are intelligible and can be used to clarify implicit assumptions about the current LIPID MAPS classification scheme. These characteristics and decision trees are made available to facilitate alternative implementations. We also discovered many hundreds of lipids that are currently misclassified in the LIPID MAPS database, strongly underscoring the need for automated classification. Source code and chemical characteristic lists as SMARTS search strings are available under an open-source license at https://www.github.com/princelab/lipid_classifier. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Multi-phenology WorldView-2 imagery improves remote sensing of savannah tree species
NASA Astrophysics Data System (ADS)
Madonsela, Sabelo; Cho, Moses Azong; Mathieu, Renaud; Mutanga, Onisimo; Ramoelo, Abel; Kaszta, Żaneta; Kerchove, Ruben Van De; Wolff, Eléonore
2017-06-01
Biodiversity mapping in African savannah is important for monitoring changes and ensuring sustainable use of ecosystem resources. Biodiversity mapping can benefit from multi-spectral instruments such as WorldView-2 with very high spatial resolution and a spectral configuration encompassing important spectral regions not previously available for vegetation mapping. This study investigated i) the benefits of the eight-band WorldView-2 (WV-2) spectral configuration for discriminating tree species in Southern African savannah and ii) if multiple-images acquired at key points of the typical phenological development of savannahs (peak productivity, transition to senescence) improve on tree species classifications. We first assessed the discriminatory power of WV-2 bands using interspecies-Spectral Angle Mapper (SAM) via Band Add-On procedure and tested the spectral capability of WorldView-2 against simulated IKONOS for tree species classification. The results from interspecies-SAM procedure identified the yellow and red bands as the most statistically significant bands (p = 0.000251 and p = 0.000039 respectively) in the discriminatory power of WV-2 during the transition from wet to dry season (April). Using Random Forest classifier, the classification scenarios investigated showed that i) the 8-bands of the WV-2 sensor achieved higher classification accuracy for the April date (transition from wet to dry season, senescence) compared to the March date (peak productivity season) ii) the WV-2 spectral configuration systematically outperformed the IKONOS sensor spectral configuration and iii) the multi-temporal approach (March and April combined) improved the discrimination of tress species and produced the highest overall accuracy results at 80.4%. Consistent with the interspecies-SAM procedure, the yellow (605 nm) band also showed a statistically significant contribution in the improved classification accuracy from WV-2. These results highlight the mapping opportunities presented by WV-2 data for monitoring the distribution status of e.g. species often harvested by local communities (e.g. Sclerocharya birrea), encroaching species, or species-specific tree losses induced by elephants.
NASA Astrophysics Data System (ADS)
Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu
2018-02-01
A hybrid machine learning approach of Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. This model is a combination of the RSS method which is known as an efficient ensemble technique and the CART which is a state of the art classifier. The Luc Yen district of Yen Bai province, a prominent landslide prone area of Viet Nam, was selected for the model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In the development of model, ten important landslide affecting factors related with geomorphology, geology and geo-environment were considered namely slope angles, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. Performance of the RSSCART model (AUC = 0.841) is the best compared with other popular landslide models namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that performance of the RSSCART is a promising method for spatial landslide prediction.
Predicting Increased Blood Pressure Using Machine Learning
Golino, Hudson Fernandes; Amaral, Liliany Souza de Brito; Duarte, Stenio Fernando Pimentel; Soares, Telma de Jesus; dos Reis, Luciana Araujo
2014-01-01
The present study investigates the prediction of increased blood pressure by body mass index (BMI), waist (WC) and hip circumference (HC), and waist hip ratio (WHR) using a machine learning technique named classification tree. Data were collected from 400 college students (56.3% women) from 16 to 63 years old. Fifteen trees were calculated in the training group for each sex, using different numbers and combinations of predictors. The result shows that for women BMI, WC, and WHR are the combination that produces the best prediction, since it has the lowest deviance (87.42), misclassification (.19), and the higher pseudo R 2 (.43). This model presented a sensitivity of 80.86% and specificity of 81.22% in the training set and, respectively, 45.65% and 65.15% in the test sample. For men BMI, WC, HC, and WHC showed the best prediction with the lowest deviance (57.25), misclassification (.16), and the higher pseudo R 2 (.46). This model had a sensitivity of 72% and specificity of 86.25% in the training set and, respectively, 58.38% and 69.70% in the test set. Finally, the result from the classification tree analysis was compared with traditional logistic regression, indicating that the former outperformed the latter in terms of predictive power. PMID:24669313
Predicting increased blood pressure using machine learning.
Golino, Hudson Fernandes; Amaral, Liliany Souza de Brito; Duarte, Stenio Fernando Pimentel; Gomes, Cristiano Mauro Assis; Soares, Telma de Jesus; Dos Reis, Luciana Araujo; Santos, Joselito
2014-01-01
The present study investigates the prediction of increased blood pressure by body mass index (BMI), waist (WC) and hip circumference (HC), and waist hip ratio (WHR) using a machine learning technique named classification tree. Data were collected from 400 college students (56.3% women) from 16 to 63 years old. Fifteen trees were calculated in the training group for each sex, using different numbers and combinations of predictors. The result shows that for women BMI, WC, and WHR are the combination that produces the best prediction, since it has the lowest deviance (87.42), misclassification (.19), and the higher pseudo R (2) (.43). This model presented a sensitivity of 80.86% and specificity of 81.22% in the training set and, respectively, 45.65% and 65.15% in the test sample. For men BMI, WC, HC, and WHC showed the best prediction with the lowest deviance (57.25), misclassification (.16), and the higher pseudo R (2) (.46). This model had a sensitivity of 72% and specificity of 86.25% in the training set and, respectively, 58.38% and 69.70% in the test set. Finally, the result from the classification tree analysis was compared with traditional logistic regression, indicating that the former outperformed the latter in terms of predictive power.
Hierarchical classification with a competitive evolutionary neural tree.
Adams, R G.; Butchart, K; Davey, N
1999-04-01
A new, dynamic, tree structured network, the Competitive Evolutionary Neural Tree (CENT) is introduced. The network is able to provide a hierarchical classification of unlabelled data sets. The main advantage that the CENT offers over other hierarchical competitive networks is its ability to self determine the number, and structure, of the competitive nodes in the network, without the need for externally set parameters. The network produces stable classificatory structures by halting its growth using locally calculated heuristics. The results of network simulations are presented over a range of data sets, including Anderson's IRIS data set. The CENT network demonstrates its ability to produce a representative hierarchical structure to classify a broad range of data sets.
Classification Model for Damage Localization in a Plate Structure
NASA Astrophysics Data System (ADS)
Janeliukstis, R.; Ruchevskis, S.; Chate, A.
2018-01-01
The present study is devoted to the problem of damage localization by means of data classification. The commercial ANSYS finite-elements program was used to make a model of a cantilevered composite plate equipped with numerous strain sensors. The plate was divided into zones, and, for data classification purposes, each of them housed several points to which a point mass of magnitude 5 and 10% of plate mass was applied. At each of these points, a numerical modal analysis was performed, from which the first few natural frequencies and strain readings were extracted. The strain data for every point were the input for a classification procedure involving k nearest neighbors and decision trees. The classification model was trained and optimized by finetuning the key parameters of both algorithms. Finally, two new query points were simulated and subjected to a classification in terms of assigning a label to one of the zones of the plate, thus localizing these points. Damage localization results were compared for both algorithms and were found to be in good agreement with the actual application positions of point load.
Differences in forest area classification based on tree tally from variable- and fixed-radius plots
David Azuma; Vicente J. Monleon
2011-01-01
In forest inventory, it is not enough to formulate a definition; it is also necessary to define the "measurement procedure." In the classification of forestland by dominant cover type, the measurement design (the plot) can affect the outcome of the classification. We present results of a simulation study comparing classification of the dominant cover type...
Object based technique for delineating and mapping 15 tree species using VHR WorldView-2 imagery
NASA Astrophysics Data System (ADS)
Mustafa, Yaseen T.; Habeeb, Hindav N.
2014-10-01
Monitoring and analyzing forests and trees are required task to manage and establish a good plan for the forest sustainability. To achieve such a task, information and data collection of the trees are requested. The fastest way and relatively low cost technique is by using satellite remote sensing. In this study, we proposed an approach to identify and map 15 tree species in the Mangish sub-district, Kurdistan Region-Iraq. Image-objects (IOs) were used as the tree species mapping unit. This is achieved using the shadow index, normalized difference vegetation index and texture measurements. Four classification methods (Maximum Likelihood, Mahalanobis Distance, Neural Network, and Spectral Angel Mapper) were used to classify IOs using selected IO features derived from WorldView-2 imagery. Results showed that overall accuracy was increased 5-8% using the Neural Network method compared with other methods with a Kappa coefficient of 69%. This technique gives reasonable results of various tree species classifications by means of applying the Neural Network method with IOs techniques on WorldView-2 imagery.
NASA Technical Reports Server (NTRS)
Rejmankova, E.; Pope, K. O.; Roberts, D. R.; Lege, M. G.; Andre, R.; Greico, J.; Alonzo, Y.
1998-01-01
Surveys of larval habitats of Anopheles vestitipennis and Anopheles punctimacula were conducted in Belize, Central America. Habitat analysis and classification resulted in delineation of eight habitat types defined by dominant life forms and hydrology. Percent cover of tall dense macrophytes, shrubs, open water, and pH were significantly different between sites with and without An. vestitipennis. For An. punctimacula, percent cover of tall dense macrophytes, trees, detritus, open water, and water depth were significantly different between larvae positive and negative sites. The discriminant function for An. vestitipennis correctly predicted the presence of larvae in 65% of sites and correctly predicted the absence of larvae in 88% of sites. The discriminant function for An. punctimacula correctly predicted 81% of sites for the presence of larvae and 45% for the absence of larvae. Canonical discriminant analysis of the three groups of habitats (An. vestitipennis positive; An. punctimacula positive; all negative) confirmed that while larval habitats of An. punctimacula are clustered in the tree dominated area, larval habitats of An. vestitipennis were found in both tree dominated and tall dense macrophyte dominated environments. The forest larval habitats of An. vestitipennis and An. punctimacula seem to be randomly distributed among different forest types. Both species tend to occur in denser forests with more detritus, shallower water, and slightly higher pH. Classification of dry season (February) SPOT multispectral satellite imagery produced 10 land cover types with the swamp forest and tall dense marsh classes being of particular interest. The accuracy assessment showed that commission errors for the tall, dense marsh and swamp forest appeared to be minor; but omission errors were significant, especially for the swamp forest (perhaps because no swamp forests are flooded in February). This means that where the classification indicates there are An. vestitipennis breeding sites, they probably do exist; but breeding sites in many locations are not identified and could be more abundant than indicated.
NASA Astrophysics Data System (ADS)
Harrison, D.; Rivard, B.; Sánchez-Azofeifa, A.
2018-04-01
Remote sensing of the environment has utilized the visible, near and short-wave infrared (IR) regions of the electromagnetic (EM) spectrum to characterize vegetation health, vigor and distribution. However, relatively little research has focused on the use of the longwave infrared (LWIR, 8.0-12.5 μm) region for studies of vegetation. In this study LWIR leaf reflectance spectra were collected in the wet seasons (May through December) of 2013 and 2014 from twenty-six tree species located in a high species diversity environment, a tropical dry forest in Costa Rica. A continuous wavelet transformation (CWT) was applied to all spectra to minimize noise and broad amplitude variations attributable to non-compositional effects. Species discrimination was then explored with Random Forest classification and accuracy improved was observed with preprocessing of reflectance spectra with continuous wavelet transformation. Species were found to share common spectral features that formed the basis for five spectral types that were corroborated with linear discriminate analysis. The source of most of the observed spectral features is attributed to cell wall or cuticle compounds (cellulose, cutin, matrix glycan, silica and oleanolic acid). Spectral types could be advantageous for the analysis of airborne hyperspectral data because cavity effects will lower the spectral contrast thus increasing the reliance of classification efforts on dominant spectral features. Spectral types specifically derived from leaf level data are expected to support the labeling of spectral classes derived from imagery. The results of this study and that of Ribeiro Da Luz (2006), Ribeiro Da Luz and Crowley (2007, 2010), Ullah et al. (2012) and Rock et al. (2016) have now illustrated success in tree species discrimination across a range of ecosystems using leaf-level spectral observations. With advances in LWIR sensors and concurrent improvements in their signal to noise, applications to large-scale species detection from airborne imagery appear feasible.
Evolving optimised decision rules for intrusion detection using particle swarm paradigm
NASA Astrophysics Data System (ADS)
Sivatha Sindhu, Siva S.; Geetha, S.; Kannan, A.
2012-12-01
The aim of this article is to construct a practical intrusion detection system (IDS) that properly analyses the statistics of network traffic pattern and classify them as normal or anomalous class. The objective of this article is to prove that the choice of effective network traffic features and a proficient machine-learning paradigm enhances the detection accuracy of IDS. In this article, a rule-based approach with a family of six decision tree classifiers, namely Decision Stump, C4.5, Naive Baye's Tree, Random Forest, Random Tree and Representative Tree model to perform the detection of anomalous network pattern is introduced. In particular, the proposed swarm optimisation-based approach selects instances that compose training set and optimised decision tree operate over this trained set producing classification rules with improved coverage, classification capability and generalisation ability. Experiment with the Knowledge Discovery and Data mining (KDD) data set which have information on traffic pattern, during normal and intrusive behaviour shows that the proposed algorithm produces optimised decision rules and outperforms other machine-learning algorithm.
Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi
2017-06-01
Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB ( p < 0.05). Identifying and training mothers at risk as well as improving prenatal care may reduce the PTB rate. We also recommend that statisticians utilize the logistic regression model for the classification of risk groups for PTB.
Thomas, Paul D; Kejariwal, Anish; Campbell, Michael J; Mi, Huaiyu; Diemer, Karen; Guo, Nan; Ladunga, Istvan; Ulitsky-Lazareva, Betty; Muruganujan, Anushya; Rabkin, Steven; Vandergriff, Jody A; Doremieux, Olivier
2003-01-01
The PANTHER database was designed for high-throughput analysis of protein sequences. One of the key features is a simplified ontology of protein function, which allows browsing of the database by biological functions. Biologist curators have associated the ontology terms with groups of protein sequences rather than individual sequences. Statistical models (Hidden Markov Models, or HMMs) are built from each of these groups. The advantage of this approach is that new sequences can be automatically classified as they become available. To ensure accurate functional classification, HMMs are constructed not only for families, but also for functionally distinct subfamilies. Multiple sequence alignments and phylogenetic trees, including curator-assigned information, are available for each family. The current version of the PANTHER database includes training sequences from all organisms in the GenBank non-redundant protein database, and the HMMs have been used to classify gene products across the entire genomes of human, and Drosophila melanogaster. The ontology terms and protein families and subfamilies, as well as Drosophila gene c;assifications, can be browsed and searched for free. Due to outstanding contractual obligations, access to human gene classifications and to protein family trees and multiple sequence alignments will temporarily require a nominal registration fee. PANTHER is publicly available on the web at http://panther.celera.com.
A new tool for post-AGB SED classification
NASA Astrophysics Data System (ADS)
Bendjoya, P.; Suarez, O.; Galluccio, L.; Michel, O.
We present the results of an unsupervised classification method applied on a set of 344 spectral energy distributions (SED) of post-AGB stars extracted from the Torun catalogue of Galactic post-AGB stars. This method aims to find a new unbiased method for post-AGB star classification based on the information contained in the IR region of the SED (fluxes, IR excess, colours). We used the data from IRAS and MSX satellites, and from the 2MASS survey. We applied a classification method based on the construction of the dataset of a minimal spanning tree (MST) with the Prim's algorithm. In order to build this tree, different metrics have been tested on both flux and color indices. Our method is able to classify the set of 344 post-AGB stars in 9 distinct groups according to their SEDs.
Lin, Yi; Jiang, Miao; Pellikka, Petri; Heiskanen, Janne
2018-01-01
Mensuration of tree growth habits is of considerable importance for understanding forest ecosystem processes and forest biophysical responses to climate changes. However, the complexity of tree crown morphology that is typically formed after many years of growth tends to render it a non-trivial task, even for the state-of-the-art 3D forest mapping technology-light detection and ranging (LiDAR). Fortunately, botanists have deduced the large structural diversity of tree forms into only a limited number of tree architecture models, which can present a-priori knowledge about tree structure, growth, and other attributes for different species. This study attempted to recruit Hallé architecture models (HAMs) into LiDAR mapping to investigate tree growth habits in structure. First, following the HAM-characterized tree structure organization rules, we run the kernel procedure of tree species classification based on the LiDAR-collected point clouds using a support vector machine classifier in the leave-one-out-for-cross-validation mode. Then, the HAM corresponding to each of the classified tree species was identified based on expert knowledge, assisted by the comparison of the LiDAR-derived feature parameters. Next, the tree growth habits in structure for each of the tree species were derived from the determined HAM. In the case of four tree species growing in the boreal environment, the tests indicated that the classification accuracy reached 85.0%, and their growth habits could be derived by qualitative and quantitative means. Overall, the strategy of recruiting conventional HAMs into LiDAR mapping for investigating tree growth habits in structure was validated, thereby paving a new way for efficiently reflecting tree growth habits and projecting forest structure dynamics.
Lin, Yi; Jiang, Miao; Pellikka, Petri; Heiskanen, Janne
2018-01-01
Mensuration of tree growth habits is of considerable importance for understanding forest ecosystem processes and forest biophysical responses to climate changes. However, the complexity of tree crown morphology that is typically formed after many years of growth tends to render it a non-trivial task, even for the state-of-the-art 3D forest mapping technology—light detection and ranging (LiDAR). Fortunately, botanists have deduced the large structural diversity of tree forms into only a limited number of tree architecture models, which can present a-priori knowledge about tree structure, growth, and other attributes for different species. This study attempted to recruit Hallé architecture models (HAMs) into LiDAR mapping to investigate tree growth habits in structure. First, following the HAM-characterized tree structure organization rules, we run the kernel procedure of tree species classification based on the LiDAR-collected point clouds using a support vector machine classifier in the leave-one-out-for-cross-validation mode. Then, the HAM corresponding to each of the classified tree species was identified based on expert knowledge, assisted by the comparison of the LiDAR-derived feature parameters. Next, the tree growth habits in structure for each of the tree species were derived from the determined HAM. In the case of four tree species growing in the boreal environment, the tests indicated that the classification accuracy reached 85.0%, and their growth habits could be derived by qualitative and quantitative means. Overall, the strategy of recruiting conventional HAMs into LiDAR mapping for investigating tree growth habits in structure was validated, thereby paving a new way for efficiently reflecting tree growth habits and projecting forest structure dynamics. PMID:29515616
Context-sensitive extraction of tree crown objects in urban areas using VHR satellite images
NASA Astrophysics Data System (ADS)
Ardila, Juan P.; Bijker, Wietske; Tolpekin, Valentyn A.; Stein, Alfred
2012-04-01
Municipalities need accurate and updated inventories of urban vegetation in order to manage green resources and estimate their return on investment in urban forestry activities. Earlier studies have shown that semi-automatic tree detection using remote sensing is a challenging task. This study aims to develop a reproducible geographic object-based image analysis (GEOBIA) methodology to locate and delineate tree crowns in urban areas using high resolution imagery. We propose a GEOBIA approach that considers the spectral, spatial and contextual characteristics of tree objects in the urban space. The study presents classification rules that exploit object features at multiple segmentation scales modifying the labeling and shape of image-objects. The GEOBIA methodology was implemented on QuickBird images acquired over the cities of Enschede and Delft (The Netherlands), resulting in an identification rate of 70% and 82% respectively. False negative errors concentrated on small trees and false positive errors in private gardens. The quality of crown boundaries was acceptable, with an overall delineation error <0.24 outside of gardens and backyards.
Building supertrees: an empirical assessment using the grass family (Poaceae).
Salamin, Nicolas; Hodkinson, Trevor R; Savolainen, Vincent
2002-02-01
Large and comprehensive phylogenetic trees are desirable for studying macroevolutionary processes and for classification purposes. Such trees can be obtained in two different ways. Either the widest possible range of taxa can be sampled and used in a phylogenetic analysis to produce a "big tree," or preexisting topologies can be used to create a supertree. Although large multigene analyses are often favored, combinable data are not always available, and supertrees offer a suitable solution. The most commonly used method of supertree reconstruction, matrix representation with parsimony (MRP), is presented here. We used a combined data set for the Poaceae to (1) assess the differences between an approach that uses combined data and one that uses different MRP modifications based on the character partitions and (2) investigate the advantages and disadvantages of these modifications. Baum and Ragan and Purvis modifications gave similar results. Incorporating bootstrap support associated with pre-existing topologies improved Baum and Ragan modification and its similarity with a combined analysis. Finally, we used the supertree reconstruction approach on 55 published phylogenies to build one of most comprehensive phylogenetic trees published for the grass family including 403 taxa and discuss its strengths and weaknesses in relation to other published hypotheses.
Gold-standard for computer-assisted morphological sperm analysis.
Chang, Violeta; Garcia, Alejandra; Hitschfeld, Nancy; Härtel, Steffen
2017-04-01
Published algorithms for classification of human sperm heads are based on relatively small image databases that are not open to the public, and thus no direct comparison is available for competing methods. We describe a gold-standard for morphological sperm analysis (SCIAN-MorphoSpermGS), a dataset of sperm head images with expert-classification labels in one of the following classes: normal, tapered, pyriform, small or amorphous. This gold-standard is for evaluating and comparing known techniques and future improvements to present approaches for classification of human sperm heads for semen analysis. Although this paper does not provide a computational tool for morphological sperm analysis, we present a set of experiments for comparing sperm head description and classification common techniques. This classification base-line is aimed to be used as a reference for future improvements to present approaches for human sperm head classification. The gold-standard provides a label for each sperm head, which is achieved by majority voting among experts. The classification base-line compares four supervised learning methods (1- Nearest Neighbor, naive Bayes, decision trees and Support Vector Machine (SVM)) and three shape-based descriptors (Hu moments, Zernike moments and Fourier descriptors), reporting the accuracy and the true positive rate for each experiment. We used Fleiss' Kappa Coefficient to evaluate the inter-expert agreement and Fisher's exact test for inter-expert variability and statistical significant differences between descriptors and learning techniques. Our results confirm the high degree of inter-expert variability in the morphological sperm analysis. Regarding the classification base line, we show that none of the standard descriptors or classification approaches is best suitable for tackling the problem of sperm head classification. We discovered that the correct classification rate was highly variable when trying to discriminate among non-normal sperm heads. By using the Fourier descriptor and SVM, we achieved the best mean correct classification: only 49%. We conclude that the SCIAN-MorphoSpermGS will provide a standard tool for evaluation of characterization and classification approaches for human sperm heads. Indeed, there is a clear need for a specific shape-based descriptor for human sperm heads and a specific classification approach to tackle the problem of high variability within subcategories of abnormal sperm cells. Copyright © 2017 Elsevier Ltd. All rights reserved.
Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin
2016-01-01
Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6–12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. Results: The prevalence of anemia was 12.60% with a range of 3.47%–40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. Conclusions: The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities. PMID:27174328
Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin
2016-05-20
In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6-12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. The prevalence of anemia was 12.60% with a range of 3.47%-40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities.
Applying machine learning classification techniques to automate sky object cataloguing
NASA Astrophysics Data System (ADS)
Fayyad, Usama M.; Doyle, Richard J.; Weir, W. Nick; Djorgovski, Stanislav
1993-08-01
We describe the application of an Artificial Intelligence machine learning techniques to the development of an automated tool for the reduction of a large scientific data set. The 2nd Mt. Palomar Northern Sky Survey is nearly completed. This survey provides comprehensive coverage of the northern celestial hemisphere in the form of photographic plates. The plates are being transformed into digitized images whose quality will probably not be surpassed in the next ten to twenty years. The images are expected to contain on the order of 107 galaxies and 108 stars. Astronomers wish to determine which of these sky objects belong to various classes of galaxies and stars. Unfortunately, the size of this data set precludes analysis in an exclusively manual fashion. Our approach is to develop a software system which integrates the functions of independently developed techniques for image processing and data classification. Digitized sky images are passed through image processing routines to identify sky objects and to extract a set of features for each object. These routines are used to help select a useful set of attributes for classifying sky objects. Then GID3 (Generalized ID3) and O-B Tree, two inductive learning techniques, learns classification decision trees from examples. These classifiers will then be applied to new data. These developmnent process is highly interactive, with astronomer input playing a vital role. Astronomers refine the feature set used to construct sky object descriptions, and evaluate the performance of the automated classification technique on new data. This paper gives an overview of the machine learning techniques with an emphasis on their general applicability, describes the details of our specific application, and reports the initial encouraging results. The results indicate that our machine learning approach is well-suited to the problem. The primary benefit of the approach is increased data reduction throughput. Another benefit is consistency of classification. The classification rules which are the product of the inductive learning techniques will form an objective, examinable basis for classifying sky objects. A final, not to be underestimated benefit is that astronomers will be freed from the tedium of an intensely visual task to pursue more challenging analysis and interpretation problems based on automatically catalogued data.
W. Henry McNab; F. Thomas Lloyd; David L. Loftis
2002-01-01
The species indicator approach to forest site classification was evaluated for 210 relatively undisturbed plots established by the USDA Forest Service Forest Inventory and Analysis uni (FIA) in western North Carolina. Plots were classified by low, medium, and high levels of productivity based on 10-year individual tree basal area increment data standardized for initial...
Operational Tree Species Mapping in a Diverse Tropical Forest with Airborne Imaging Spectroscopy.
Baldeck, Claire A; Asner, Gregory P; Martin, Robin E; Anderson, Christopher B; Knapp, David E; Kellner, James R; Wright, S Joseph
2015-01-01
Remote identification and mapping of canopy tree species can contribute valuable information towards our understanding of ecosystem biodiversity and function over large spatial scales. However, the extreme challenges posed by highly diverse, closed-canopy tropical forests have prevented automated remote species mapping of non-flowering tree crowns in these ecosystems. We set out to identify individuals of three focal canopy tree species amongst a diverse background of tree and liana species on Barro Colorado Island, Panama, using airborne imaging spectroscopy data. First, we compared two leading single-class classification methods--binary support vector machine (SVM) and biased SVM--for their performance in identifying pixels of a single focal species. From this comparison we determined that biased SVM was more precise and created a multi-species classification model by combining the three biased SVM models. This model was applied to the imagery to identify pixels belonging to the three focal species and the prediction results were then processed to create a map of focal species crown objects. Crown-level cross-validation of the training data indicated that the multi-species classification model had pixel-level producer's accuracies of 94-97% for the three focal species, and field validation of the predicted crown objects indicated that these had user's accuracies of 94-100%. Our results demonstrate the ability of high spatial and spectral resolution remote sensing to accurately detect non-flowering crowns of focal species within a diverse tropical forest. We attribute the success of our model to recent classification and mapping techniques adapted to species detection in diverse closed-canopy forests, which can pave the way for remote species mapping in a wider variety of ecosystems.
Operational Tree Species Mapping in a Diverse Tropical Forest with Airborne Imaging Spectroscopy
Baldeck, Claire A.; Asner, Gregory P.; Martin, Robin E.; Anderson, Christopher B.; Knapp, David E.; Kellner, James R.; Wright, S. Joseph
2015-01-01
Remote identification and mapping of canopy tree species can contribute valuable information towards our understanding of ecosystem biodiversity and function over large spatial scales. However, the extreme challenges posed by highly diverse, closed-canopy tropical forests have prevented automated remote species mapping of non-flowering tree crowns in these ecosystems. We set out to identify individuals of three focal canopy tree species amongst a diverse background of tree and liana species on Barro Colorado Island, Panama, using airborne imaging spectroscopy data. First, we compared two leading single-class classification methods—binary support vector machine (SVM) and biased SVM—for their performance in identifying pixels of a single focal species. From this comparison we determined that biased SVM was more precise and created a multi-species classification model by combining the three biased SVM models. This model was applied to the imagery to identify pixels belonging to the three focal species and the prediction results were then processed to create a map of focal species crown objects. Crown-level cross-validation of the training data indicated that the multi-species classification model had pixel-level producer’s accuracies of 94–97% for the three focal species, and field validation of the predicted crown objects indicated that these had user’s accuracies of 94–100%. Our results demonstrate the ability of high spatial and spectral resolution remote sensing to accurately detect non-flowering crowns of focal species within a diverse tropical forest. We attribute the success of our model to recent classification and mapping techniques adapted to species detection in diverse closed-canopy forests, which can pave the way for remote species mapping in a wider variety of ecosystems. PMID:26153693
Dessimoz, Christophe; Boeckmann, Brigitte; Roth, Alexander C J; Gonnet, Gaston H
2006-01-01
Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings.
Pengra, Bruce; Long, Jordan; Dahal, Devendra; Stehman, Stephen V.; Loveland, Thomas R.
2015-01-01
The methodology for selection, creation, and application of a global remote sensing validation dataset using high resolution commercial satellite data is presented. High resolution data are obtained for a stratified random sample of 500 primary sampling units (5 km × 5 km sample blocks), where the stratification based on Köppen climate classes is used to distribute the sample globally among biomes. The high resolution data are classified to categorical land cover maps using an analyst mediated classification workflow. Our initial application of these data is to evaluate a global 30 m Landsat-derived, continuous field tree cover product. For this application, the categorical reference classification produced at 2 m resolution is converted to percent tree cover per 30 m pixel (secondary sampling unit)for comparison to Landsat-derived estimates of tree cover. We provide example results (based on a subsample of 25 sample blocks in South America) illustrating basic analyses of agreement that can be produced from these reference data. Commercial high resolution data availability and data quality are shown to provide a viable means of validating continuous field tree cover. When completed, the reference classifications for the full sample of 500 blocks will be released for public use.
Lin, Hai-jun; Zhang, Hui-fang; Gao, Ya-qi; Li, Xia; Yang, Fan; Zhou, Yan-fei
2014-12-01
The hyperspectral reflectance of Populus euphratica, Tamarix hispida, Haloxylon ammodendron and Calligonum mongolicum in the lower reaches of Tarim River and Turpan Desert Botanical Garden was measured by using the HR-768 field-portable spectroradiometer. The method of continuum removal, first derivative reflectance and second derivative reflectance were used to deal with the original spectral data of four tree species. The method of Mahalanobis Distance was used to select the bands with significant differences in the original spectral data and transform spectral data to identify the different tree species. The progressive discrimination analyses were used to test the selective bands used to identify different tree species. The results showed that The Mahalanobis Distance method was an effective method in feature band extraction. The bands for identifying different tree species were most near-infrared bands. The recognition accuracy of four methods was 85%, 93.8%, 92.4% and 95.5% respectively. Spectrum transform could improve the recognition accuracy. The recognition accuracy of different research objects and different spectrum transform methods were different. The research provided evidence for desert tree species classification, monitoring biodiversity and the analysis of area in desert by using large scale remote sensing method.
SVM-based tree-type neural networks as a critic in adaptive critic designs for control.
Deb, Alok Kanti; Jayadeva; Gopal, Madan; Chandra, Suresh
2007-07-01
In this paper, we use the approach of adaptive critic design (ACD) for control, specifically, the action-dependent heuristic dynamic programming (ADHDP) method. A least squares support vector machine (SVM) regressor has been used for generating the control actions, while an SVM-based tree-type neural network (NN) is used as the critic. After a failure occurs, the critic and action are retrained in tandem using the failure data. Failure data is binary classification data, where the number of failure states are very few as compared to the number of no-failure states. The difficulty of conventional multilayer feedforward NNs in learning this type of classification data has been overcome by using the SVM-based tree-type NN, which due to its feature to add neurons to learn misclassified data, has the capability to learn any binary classification data without a priori choice of the number of neurons or the structure of the network. The capability of the trained controller to handle unforeseen situations is demonstrated.
Mani, Ashutosh; Rao, Marepalli; James, Kelley; Bhattacharya, Amit
2015-01-01
The purpose of this study was to explore data-driven models, based on decision trees, to develop practical and easy to use predictive models for early identification of firefighters who are likely to cross the threshold of hyperthermia during live-fire training. Predictive models were created for three consecutive live-fire training scenarios. The final predicted outcome was a categorical variable: will a firefighter cross the upper threshold of hyperthermia - Yes/No. Two tiers of models were built, one with and one without taking into account the outcome (whether a firefighter crossed hyperthermia or not) from the previous training scenario. First tier of models included age, baseline heart rate and core body temperature, body mass index, and duration of training scenario as predictors. The second tier of models included the outcome of the previous scenario in the prediction space, in addition to all the predictors from the first tier of models. Classification and regression trees were used independently for prediction. The response variable for the regression tree was the quantitative variable: core body temperature at the end of each scenario. The predicted quantitative variable from regression trees was compared to the upper threshold of hyperthermia (38°C) to predict whether a firefighter would enter hyperthermia. The performance of classification and regression tree models was satisfactory for the second (success rate = 79%) and third (success rate = 89%) training scenarios but not for the first (success rate = 43%). Data-driven models based on decision trees can be a useful tool for predicting physiological response without modeling the underlying physiological systems. Early prediction of heat stress coupled with proactive interventions, such as pre-cooling, can help reduce heat stress in firefighters.
NASA Astrophysics Data System (ADS)
Kukunda, Collins B.; Duque-Lazo, Joaquín; González-Ferreiro, Eduardo; Thaden, Hauke; Kleinn, Christoph
2018-03-01
Distinguishing tree species is relevant in many contexts of remote sensing assisted forest inventory. Accurate tree species maps support management and conservation planning, pest and disease control and biomass estimation. This study evaluated the performance of applying ensemble techniques with the goal of automatically distinguishing Pinus sylvestris L. and Pinus uncinata Mill. Ex Mirb within a 1.3 km2 mountainous area in Barcelonnette (France). Three modelling schemes were examined, based on: (1) high-density LiDAR data (160 returns m-2), (2) Worldview-2 multispectral imagery, and (3) Worldview-2 and LiDAR in combination. Variables related to the crown structure and height of individual trees were extracted from the normalized LiDAR point cloud at individual-tree level, after performing individual tree crown (ITC) delineation. Vegetation indices and the Haralick texture indices were derived from Worldview-2 images and served as independent spectral variables. Selection of the best predictor subset was done after a comparison of three variable selection procedures: (1) Random Forests with cross validation (AUCRFcv), (2) Akaike Information Criterion (AIC) and (3) Bayesian Information Criterion (BIC). To classify the species, 9 regression techniques were combined using ensemble models. Predictions were evaluated using cross validation and an independent dataset. Integration of datasets and models improved individual tree species classification (True Skills Statistic, TSS; from 0.67 to 0.81) over individual techniques and maintained strong predictive power (Relative Operating Characteristic, ROC = 0.91). Assemblage of regression models and integration of the datasets provided more reliable species distribution maps and associated tree-scale mapping uncertainties. Our study highlights the potential of model and data assemblage at improving species classifications needed in present-day forest planning and management.
NASA Astrophysics Data System (ADS)
Chenari, A.; Erfanifard, Y.; Dehghani, M.; Pourghasemi, H. R.
2017-09-01
Remotely sensed datasets offer a reliable means to precisely estimate biophysical characteristics of individual species sparsely distributed in open woodlands. Moreover, object-oriented classification has exhibited significant advantages over different classification methods for delineation of tree crowns and recognition of species in various types of ecosystems. However, it still is unclear if this widely-used classification method can have its advantages on unmanned aerial vehicle (UAV) digital images for mapping vegetation cover at single-tree levels. In this study, UAV orthoimagery was classified using object-oriented classification method for mapping a part of wild pistachio nature reserve in Zagros open woodlands, Fars Province, Iran. This research focused on recognizing two main species of the study area (i.e., wild pistachio and wild almond) and estimating their mean crown area. The orthoimage of study area was consisted of 1,076 images with spatial resolution of 3.47 cm which was georeferenced using 12 ground control points (RMSE=8 cm) gathered by real-time kinematic (RTK) method. The results showed that the UAV orthoimagery classified by object-oriented method efficiently estimated mean crown area of wild pistachios (52.09±24.67 m2) and wild almonds (3.97±1.69 m2) with no significant difference with their observed values (α=0.05). In addition, the results showed that wild pistachios (accuracy of 0.90 and precision of 0.92) and wild almonds (accuracy of 0.90 and precision of 0.89) were well recognized by image segmentation. In general, we concluded that UAV orthoimagery can efficiently produce precise biophysical data of vegetation stands at single-tree levels, which therefore is suitable for assessment and monitoring open woodlands.
NASA Astrophysics Data System (ADS)
Cross, M.
2016-12-01
An improved process for the identification of tree types from satellite imagery for tropical forests is needed for more accurate assessments of the impact of forests on the global climate. La Selva Biological Station in Costa Rica was the tropical forest area selected for this particular study. WorldView-3 imagery was utilized because of its high spatial, spectral and radiometric resolution, its availability, and its potential to differentiate species in a complex forest setting. The first-step was to establish confidence in the high spatial and high radiometric resolution imagery from WorldView-3 in delineating tree types within a complex forest setting. In achieving this goal, ASD field spectrometer data were collected of specific tree species to establish solid ground control within the study site. The spectrometer data were collected from the top of each specific tree canopy utilizing established towers located at La Selva Biological Station so as to match the near-nadir view of the WorldView-3 imagery. The ASD data was processed utilizing the spectral response functions for each of the WorldView-3 bands to convert the ASD data into a band specific reflectivity. This allowed direct comparison of the ASD spectrometer reflectance data to the WorldView-3 multispectral imagery. The WorldView-3 imagery was processed to surface reflectance using two standard atmospheric correction procedures and the proprietary DigitalGlobe Atmospheric Compensation (AComp) product. The most accurate correction process was identified through comparison to the spectrometer data collected. A series of statistical measures were then utilized to access the accuracy of the processed imagery and which imagery bands are best suited for tree type identification. From this analysis, a segmentation/classification process was performed to identify individual tree type locations within the study area. It is envisioned the results of this study will improve traditional forest classification processes, provide more accurate assessments of species density and distribution, facilitate a more accurate biomass estimate of the tropical forest which will impact the accuracy of tree carbon storage estimates, and ultimately assist in developing a better overall characterization of tropical rainforest dynamics.
Spatial and thematic assessment of object-based forest stand delineation using an OFA-matrix
NASA Astrophysics Data System (ADS)
Hernando, A.; Tiede, D.; Albrecht, F.; Lang, S.
2012-10-01
The delineation and classification of forest stands is a crucial aspect of forest management. Object-based image analysis (OBIA) can be used to produce detailed maps of forest stands from either orthophotos or very high resolution satellite imagery. However, measures are then required for evaluating and quantifying both the spatial and thematic accuracy of the OBIA output. In this paper we present an approach for delineating forest stands and a new Object Fate Analysis (OFA) matrix for accuracy assessment. A two-level object-based orthophoto analysis was first carried out to delineate stands on the Dehesa Boyal public land in central Spain (Avila Province). Two structural features were first created for use in class modelling, enabling good differentiation between stands: a relational tree cover cluster feature, and an arithmetic ratio shadow/tree feature. We then extended the OFA comparison approach with an OFA-matrix to enable concurrent validation of thematic and spatial accuracies. Its diagonal shows the proportion of spatial and thematic coincidence between a reference data and the corresponding classification. New parameters for Spatial Thematic Loyalty (STL), Spatial Thematic Loyalty Overall (STLOVERALL) and Maximal Interfering Object (MIO) are introduced to summarise the OFA-matrix accuracy assessment. A stands map generated by OBIA (classification data) was compared with a map of the same area produced from photo interpretation and field data (reference data). In our example the OFA-matrix results indicate good spatial and thematic accuracies (>65%) for all stand classes except for the shrub stands (31.8%), and a good STLOVERALL (69.8%). The OFA-matrix has therefore been shown to be a valid tool for OBIA accuracy assessment.
Erdoğan, Onur; Aydin Son, Yeşim
2014-01-01
Single Nucleotide Polymorphisms (SNPs) are the most common genomic variations where only a single nucleotide differs between individuals. Individual SNPs and SNP profiles associated with diseases can be utilized as biological markers. But there is a need to determine the SNP subsets and patients' clinical data which is informative for the diagnosis. Data mining approaches have the highest potential for extracting the knowledge from genomic datasets and selecting the representative SNPs as well as most effective and informative clinical features for the clinical diagnosis of the diseases. In this study, we have applied one of the widely used data mining classification methodology: "decision tree" for associating the SNP biomarkers and significant clinical data with the Alzheimer's disease (AD), which is the most common form of "dementia". Different tree construction parameters have been compared for the optimization, and the most accurate tree for predicting the AD is presented.
A tree classification for the selection forests of the Sierra Nevada
Duncan Dunning
1928-01-01
Individuality in man is accepted without question. In domestic animals, also, good and bad individuals are generally recognized. Even in some cultivated plants —orange trees and rubber trees— the poor producers are searched out and eliminated. Indeed, individual variability is a normal condition in all groups of organisms. Yet forest trees are...
Kaufmann, Liane; Huber, Stefan; Mayer, Daniel; Moeller, Korbinian; Marksteiner, Josef
2018-04-01
Adverse effects of heavy drinking on cognition have frequently been reported. In the present study, we systematically examined for the first time whether clinical neuropsychological assessments may be sensitive to alcohol abuse in elderly patients with suspected minor neurocognitive disorder. A total of 144 elderly with and without alcohol abuse (each group n=72; mean age 66.7 years) were selected from a patient pool of n=738 by applying propensity score matching (a statistical method allowing to match participants in experimental and control group by balancing various covariates to reduce selection bias). Accordingly, study groups were almost perfectly matched regarding age, education, gender, and Mini Mental State Examination score. Neuropsychological performance was measured using the CERAD (Consortium to Establish a Registry for Alzheimer's Disease). Classification analyses (i.e., decision tree and boosted trees models) were conducted to examine whether CERAD variables or total score contributed to group classification. Decision tree models disclosed that groups could be reliably classified based on the CERAD variables "Word List Discriminability" (tapping verbal recognition memory, 64% classification accuracy) and "Trail Making Test A" (measuring visuo-motor speed, 59% classification accuracy). Boosted tree analyses further indicated the sensitivity of "Word List Recall" (measuring free verbal recall) for discriminating elderly with versus without a history of alcohol abuse. This indicates that specific CERAD variables seem to be sensitive to alcohol-related cognitive dysfunctions in elderly patients with suspected minor neurocognitive disorder. (JINS, 2018, 24, 360-371).
An ecological classification system for the central hardwoods region: The Hoosier National Forest
James E. Van Kley; George R. Parker
1993-01-01
This study, a multifactor ecological classification system, using vegetation, soil characteristics, and physiography, was developed for the landscape of the Hoosier National Forest in Southern Indiana. Measurements of ground flora, saplings, and canopy trees from selected stands older than 80 years were subjected to TWINSPAN classification and DECORANA ordination....
Phan, Thanh G; Chen, Jian; Singhal, Shaloo; Ma, Henry; Clissold, Benjamin B; Ly, John; Beare, Richard
2018-01-01
Prognostication following hypoxic ischemic encephalopathy (brain injury) is important for clinical management. The aim of this exploratory study is to use a decision tree model to find clinical and MRI associates of severe disability and death in this condition. We evaluate clinical model and then the added value of MRI data. The inclusion criteria were as follows: age ≥17 years, cardio-respiratory arrest, and coma on admission (2003-2011). Decision tree analysis was used to find clinical [Glasgow Coma Score (GCS), features about cardiac arrest, therapeutic hypothermia, age, and sex] and MRI (infarct volume) associates of severe disability and death. We used the area under the ROC (auROC) to determine accuracy of model. There were 41 (63.7% males) patients having MRI imaging with the average age 51.5 ± 18.9 years old. The decision trees showed that infarct volume and age were important factors for discrimination between mild to moderate disability and severe disability and death at day 0 and day 2. The auROC for this model was 0.94 (95% CI 0.82-1.00). At day 7, GCS value was the only predictor; the auROC was 0.96 (95% CI 0.86-1.00). Our findings provide proof of concept for further exploration of the role of MR imaging and decision tree analysis in the early prognostication of hypoxic ischemic brain injury.
Treelink: data integration, clustering and visualization of phylogenetic trees.
Allende, Christian; Sohn, Erik; Little, Cedric
2015-12-29
Phylogenetic trees are central to a wide range of biological studies. In many of these studies, tree nodes need to be associated with a variety of attributes. For example, in studies concerned with viral relationships, tree nodes are associated with epidemiological information, such as location, age and subtype. Gene trees used in comparative genomics are usually linked with taxonomic information, such as functional annotations and events. A wide variety of tree visualization and annotation tools have been developed in the past, however none of them are intended for an integrative and comparative analysis. Treelink is a platform-independent software for linking datasets and sequence files to phylogenetic trees. The application allows an automated integration of datasets to trees for operations such as classifying a tree based on a field or showing the distribution of selected data attributes in branches and leafs. Genomic and proteonomic sequences can also be linked to the tree and extracted from internal and external nodes. A novel clustering algorithm to simplify trees and display the most divergent clades was also developed, where validation can be achieved using the data integration and classification function. Integrated geographical information allows ancestral character reconstruction for phylogeographic plotting based on parsimony and likelihood algorithms. Our software can successfully integrate phylogenetic trees with different data sources, and perform operations to differentiate and visualize those differences within a tree. File support includes the most popular formats such as newick and csv. Exporting visualizations as images, cluster outputs and genomic sequences is supported. Treelink is available as a web and desktop application at http://www.treelinkapp.com .
Phylogenetic classification of bony fishes.
Betancur-R, Ricardo; Wiley, Edward O; Arratia, Gloria; Acero, Arturo; Bailly, Nicolas; Miya, Masaki; Lecointre, Guillaume; Ortí, Guillermo
2017-07-06
Fish classifications, as those of most other taxonomic groups, are being transformed drastically as new molecular phylogenies provide support for natural groups that were unanticipated by previous studies. A brief review of the main criteria used by ichthyologists to define their classifications during the last 50 years, however, reveals slow progress towards using an explicit phylogenetic framework. Instead, the trend has been to rely, in varying degrees, on deep-rooted anatomical concepts and authority, often mixing taxa with explicit phylogenetic support with arbitrary groupings. Two leading sources in ichthyology frequently used for fish classifications (JS Nelson's volumes of Fishes of the World and W. Eschmeyer's Catalog of Fishes) fail to adopt a global phylogenetic framework despite much recent progress made towards the resolution of the fish Tree of Life. The first explicit phylogenetic classification of bony fishes was published in 2013, based on a comprehensive molecular phylogeny ( www.deepfin.org ). We here update the first version of that classification by incorporating the most recent phylogenetic results. The updated classification presented here is based on phylogenies inferred using molecular and genomic data for nearly 2000 fishes. A total of 72 orders (and 79 suborders) are recognized in this version, compared with 66 orders in version 1. The phylogeny resolves placement of 410 families, or ~80% of the total of 514 families of bony fishes currently recognized. The ordinal status of 30 percomorph families included in this study, however, remains uncertain (incertae sedis in the series Carangaria, Ovalentaria, or Eupercaria). Comments to support taxonomic decisions and comparisons with conflicting taxonomic groups proposed by others are presented. We also highlight cases were morphological support exist for the groups being classified. This version of the phylogenetic classification of bony fishes is substantially improved, providing resolution for more taxa than previous versions, based on more densely sampled phylogenetic trees. The classification presented in this study represents, unlike any other, the most up-to-date hypothesis of the Tree of Life of fishes.
Analysis of the Westland Data Set
NASA Technical Reports Server (NTRS)
Wen, Fang; Willett, Peter; Deb, Somnath
2001-01-01
The "Westland" set of empirical accelerometer helicopter data with seeded and labeled faults is analyzed with the aim of condition monitoring. The autoregressive (AR) coefficients from a simple linear model encapsulate a great deal of information in a relatively few measurements; and it has also been found that augmentation of these by harmonic and other parameters call improve classification significantly. Several techniques have been explored, among these restricted Coulomb energy (RCE) networks, learning vector quantization (LVQ), Gaussian mixture classifiers and decision trees. A problem with these approaches, and in common with many classification paradigms, is that augmentation of the feature dimension can degrade classification ability. Thus, we also introduce the Bayesian data reduction algorithm (BDRA), which imposes a Dirichlet prior oil training data and is thus able to quantify probability of error in all exact manner, such that features may be discarded or coarsened appropriately.
Aerial Images from AN Uav System: 3d Modeling and Tree Species Classification in a Park Area
NASA Astrophysics Data System (ADS)
Gini, R.; Passoni, D.; Pinto, L.; Sona, G.
2012-07-01
The use of aerial imagery acquired by Unmanned Aerial Vehicles (UAVs) is scheduled within the FoGLIE project (Fruition of Goods Landscape in Interactive Environment): it starts from the need to enhance the natural, artistic and cultural heritage, to produce a better usability of it by employing audiovisual movable systems of 3D reconstruction and to improve monitoring procedures, by using new media for integrating the fruition phase with the preservation ones. The pilot project focus on a test area, Parco Adda Nord, which encloses various goods' types (small buildings, agricultural fields and different tree species and bushes). Multispectral high resolution images were taken by two digital compact cameras: a Pentax Optio A40 for RGB photos and a Sigma DP1 modified to acquire the NIR band. Then, some tests were performed in order to analyze the UAV images' quality with both photogrammetric and photo-interpretation purposes, to validate the vector-sensor system, the image block geometry and to study the feasibility of tree species classification. Many pre-signalized Control Points were surveyed through GPS to allow accuracy analysis. Aerial Triangulations (ATs) were carried out with photogrammetric commercial software, Leica Photogrammetry Suite (LPS) and PhotoModeler, with manual or automatic selection of Tie Points, to pick out pros and cons of each package in managing non conventional aerial imagery as well as the differences in the modeling approach. Further analysis were done on the differences between the EO parameters and the corresponding data coming from the on board UAV navigation system.
Decision Tree Repository and Rule Set Based Mingjiang River Estuarine Wetlands Classifaction
NASA Astrophysics Data System (ADS)
Zhang, W.; Li, X.; Xiao, W.
2018-05-01
The increasing urbanization and industrialization have led to wetland losses in estuarine area of Mingjiang River over past three decades. There has been increasing attention given to produce wetland inventories using remote sensing and GIS technology. Due to inconsistency training site and training sample, traditionally pixel-based image classification methods can't achieve a comparable result within different organizations. Meanwhile, object-oriented image classification technique shows grate potential to solve this problem and Landsat moderate resolution remote sensing images are widely used to fulfill this requirement. Firstly, the standardized atmospheric correct, spectrally high fidelity texture feature enhancement was conducted before implementing the object-oriented wetland classification method in eCognition. Secondly, we performed the multi-scale segmentation procedure, taking the scale, hue, shape, compactness and smoothness of the image into account to get the appropriate parameters, using the top and down region merge algorithm from single pixel level, the optimal texture segmentation scale for different types of features is confirmed. Then, the segmented object is used as the classification unit to calculate the spectral information such as Mean value, Maximum value, Minimum value, Brightness value and the Normalized value. The Area, length, Tightness and the Shape rule of the image object Spatial features and texture features such as Mean, Variance and Entropy of image objects are used as classification features of training samples. Based on the reference images and the sampling points of on-the-spot investigation, typical training samples are selected uniformly and randomly for each type of ground objects. The spectral, texture and spatial characteristics of each type of feature in each feature layer corresponding to the range of values are used to create the decision tree repository. Finally, with the help of high resolution reference images, the random sampling method is used to conduct the field investigation, achieve an overall accuracy of 90.31 %, and the Kappa coefficient is 0.88. The classification method based on decision tree threshold values and rule set developed by the repository, outperforms the results obtained from the traditional methodology. Our decision tree repository and rule set based object-oriented classification technique was an effective method for producing comparable and consistency wetlands data set.
Shape classification of wear particles by image boundary analysis using machine learning algorithms
NASA Astrophysics Data System (ADS)
Yuan, Wei; Chin, K. S.; Hua, Meng; Dong, Guangneng; Wang, Chunhui
2016-05-01
The shape features of wear particles generated from wear track usually contain plenty of information about the wear states of a machinery operational condition. Techniques to quickly identify types of wear particles quickly to respond to the machine operation and prolong the machine's life appear to be lacking and are yet to be established. To bridge rapid off-line feature recognition with on-line wear mode identification, this paper presents a new radial concave deviation (RCD) method that mainly involves the use of the particle boundary signal to analyze wear particle features. Signal output from the RCDs subsequently facilitates the determination of several other feature parameters, typically relevant to the shape and size of the wear particle. Debris feature and type are identified through the use of various classification methods, such as linear discriminant analysis, quadratic discriminant analysis, naïve Bayesian method, and classification and regression tree method (CART). The average errors of the training and test via ten-fold cross validation suggest CART is a highly suitable approach for classifying and analyzing particle features. Furthermore, the results of the wear debris analysis enable the maintenance team to diagnose faults appropriately.
Hostettler, Isabel Charlotte; Muroi, Carl; Richter, Johannes Konstantin; Schmid, Josef; Neidert, Marian Christoph; Seule, Martin; Boss, Oliver; Pangalu, Athina; Germans, Menno Robbert; Keller, Emanuela
2018-01-19
OBJECTIVE The aim of this study was to create prediction models for outcome parameters by decision tree analysis based on clinical and laboratory data in patients with aneurysmal subarachnoid hemorrhage (aSAH). METHODS The database consisted of clinical and laboratory parameters of 548 patients with aSAH who were admitted to the Neurocritical Care Unit, University Hospital Zurich. To examine the model performance, the cohort was randomly divided into a derivation cohort (60% [n = 329]; training data set) and a validation cohort (40% [n = 219]; test data set). The classification and regression tree prediction algorithm was applied to predict death, functional outcome, and ventriculoperitoneal (VP) shunt dependency. Chi-square automatic interaction detection was applied to predict delayed cerebral infarction on days 1, 3, and 7. RESULTS The overall mortality was 18.4%. The accuracy of the decision tree models was good for survival on day 1 and favorable functional outcome at all time points, with a difference between the training and test data sets of < 5%. Prediction accuracy for survival on day 1 was 75.2%. The most important differentiating factor was the interleukin-6 (IL-6) level on day 1. Favorable functional outcome, defined as Glasgow Outcome Scale scores of 4 and 5, was observed in 68.6% of patients. Favorable functional outcome at all time points had a prediction accuracy of 71.1% in the training data set, with procalcitonin on day 1 being the most important differentiating factor at all time points. A total of 148 patients (27%) developed VP shunt dependency. The most important differentiating factor was hyperglycemia on admission. CONCLUSIONS The multiple variable analysis capability of decision trees enables exploration of dependent variables in the context of multiple changing influences over the course of an illness. The decision tree currently generated increases awareness of the early systemic stress response, which is seemingly pertinent for prognostication.
Classification of large-scale fundus image data sets: a cloud-computing framework.
Roychowdhury, Sohini
2016-08-01
Large medical image data sets with high dimensionality require substantial amount of computation time for data creation and data processing. This paper presents a novel generalized method that finds optimal image-based feature sets that reduce computational time complexity while maximizing overall classification accuracy for detection of diabetic retinopathy (DR). First, region-based and pixel-based features are extracted from fundus images for classification of DR lesions and vessel-like structures. Next, feature ranking strategies are used to distinguish the optimal classification feature sets. DR lesion and vessel classification accuracies are computed using the boosted decision tree and decision forest classifiers in the Microsoft Azure Machine Learning Studio platform, respectively. For images from the DIARETDB1 data set, 40 of its highest-ranked features are used to classify four DR lesion types with an average classification accuracy of 90.1% in 792 seconds. Also, for classification of red lesion regions and hemorrhages from microaneurysms, accuracies of 85% and 72% are observed, respectively. For images from STARE data set, 40 high-ranked features can classify minor blood vessels with an accuracy of 83.5% in 326 seconds. Such cloud-based fundus image analysis systems can significantly enhance the borderline classification performances in automated screening systems.
NASA Astrophysics Data System (ADS)
Dibb, S. D.; Ustin, S.; Grigsby, S.
2015-12-01
Air- and space-borne remote sensing instruments allow for rapid and precise study of the diversity of the Earth's ecosystems. After atmospheric correction and ground validation are performed, the gathered hyperspectral and topographic data can be assembled into a stack of layers for land cover classification. Data for this project were collected in multiple field campaigns, including the 2013 NSF NEON California campaign and 2015 NASA SARP campaign. Using hyperspectral and high resolution topography data, 25 discriminatory attributes were processed in Exelis' ENVI software and collected for use in a decision forest to classify the four major tree species (Blue Oak, Live Oak, California Buckeye, and Foothill Pine) at the San Joaquin Experimental Range near Fresno, CA. These attributes include 21 classic vegetation indices and a number of other spectral characteristics, such as color and albedo, and four topographic layers, including slope, aspect, elevation, and tree height. Additionally, a number of nearby terrain classes, including bare earth, asphalt, water, rock, shadow, structures, and grass were created. Fifty training pixels were used for each class. The training pixels for each tree species came from collected GPS points in the field. Ensemble bootstrap aggregation of decision trees was performed in MATLAB, and an arbitrary number of 500 trees were selected to be grown. The tree that produced the minimum out-of-bag classification error (4.65%) was selected to classify the entire scene. Classification results accurately distinguished between oak species, but was suboptimal in dense areas. The entire San Joaquin Experimental Range was mapped with an overall accuracy of 94.7% and a Kappa coefficient 0.94. Finally, the Commission and Omission percentage averages were 5.3% each. A highly accurate map of tree species at this scale supports studies on drought effects, disease, and species-specific growth traits.
Simulation of land use change in the three gorges reservoir area based on CART-CA
NASA Astrophysics Data System (ADS)
Yuan, Min
2018-05-01
This study proposes a new method to simulate spatiotemporal complex multiple land uses by using classification and regression tree algorithm (CART) based CA model. In this model, we use classification and regression tree algorithm to calculate land class conversion probability, and combine neighborhood factor, random factor to extract cellular transformation rules. The overall Kappa coefficient is 0.8014 and the overall accuracy is 0.8821 in the land dynamic simulation results of the three gorges reservoir area from 2000 to 2010, and the simulation results are satisfactory.
Rieger, Isaak; Kowarik, Ingo; Cherubini, Paolo; Cierjacks, Arne
2017-01-01
Aboveground carbon (C) sequestration in trees is important in global C dynamics, but reliable techniques for its modeling in highly productive and heterogeneous ecosystems are limited. We applied an extended dendrochronological approach to disentangle the functioning of drivers from the atmosphere (temperature, precipitation), the lithosphere (sedimentation rate), the hydrosphere (groundwater table, river water level fluctuation), the biosphere (tree characteristics), and the anthroposphere (dike construction). Carbon sequestration in aboveground biomass of riparian Quercus robur L. and Fraxinus excelsior L. was modeled (1) over time using boosted regression tree analysis (BRT) on cross-datable trees characterized by equal annual growth ring patterns and (2) across space using a subsequent classification and regression tree analysis (CART) on cross-datable and not cross-datable trees. While C sequestration of cross-datable Q. robur responded to precipitation and temperature, cross-datable F. excelsior also responded to a low Danube river water level. However, CART revealed that C sequestration over time is governed by tree height and parameters that vary over space (magnitude of fluctuation in the groundwater table, vertical distance to mean river water level, and longitudinal distance to upstream end of the study area). Thus, a uniform response to climatic drivers of aboveground C sequestration in Q. robur was only detectable in trees of an intermediate height class and in taller trees (>21.8m) on sites where the groundwater table fluctuated little (≤0.9m). The detection of climatic drivers and the river water level in F. excelsior depended on sites at lower altitudes above the mean river water level (≤2.7m) and along a less dynamic downstream section of the study area. Our approach indicates unexploited opportunities of understanding the interplay of different environmental drivers in aboveground C sequestration. Results may support species-specific and locally adapted forest management plans to increase carbon dioxide sequestration from the atmosphere in trees. Copyright © 2016 Elsevier B.V. All rights reserved.
Mi, Huaiyu; Huang, Xiaosong; Muruganujan, Anushya; Tang, Haiming; Mills, Caitlin; Kang, Diane; Thomas, Paul D
2017-01-04
The PANTHER database (Protein ANalysis THrough Evolutionary Relationships, http://pantherdb.org) contains comprehensive information on the evolution and function of protein-coding genes from 104 completely sequenced genomes. PANTHER software tools allow users to classify new protein sequences, and to analyze gene lists obtained from large-scale genomics experiments. In the past year, major improvements include a large expansion of classification information available in PANTHER, as well as significant enhancements to the analysis tools. Protein subfamily functional classifications have more than doubled due to progress of the Gene Ontology Phylogenetic Annotation Project. For human genes (as well as a few other organisms), PANTHER now also supports enrichment analysis using pathway classifications from the Reactome resource. The gene list enrichment tools include a new 'hierarchical view' of results, enabling users to leverage the structure of the classifications/ontologies; the tools also allow users to upload genetic variant data directly, rather than requiring prior conversion to a gene list. The updated coding single-nucleotide polymorphisms (SNP) scoring tool uses an improved algorithm. The hidden Markov model (HMM) search tools now use HMMER3, dramatically reducing search times and improving accuracy of E-value statistics. Finally, the PANTHER Tree-Attribute Viewer has been implemented in JavaScript, with new views for exploring protein sequence evolution. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Circum-Arctic petroleum systems identified using decision-tree chemometrics
Peters, K.E.; Ramos, L.S.; Zumberge, J.E.; Valin, Z.C.; Scotese, C.R.; Gautier, D.L.
2007-01-01
Source- and age-related biomarker and isotopic data were measured for more than 1000 crude oil samples from wells and seeps collected above approximately 55??N latitude. A unique, multitiered chemometric (multivariate statistical) decision tree was created that allowed automated classification of 31 genetically distinct circumArctic oil families based on a training set of 622 oil samples. The method, which we call decision-tree chemometrics, uses principal components analysis and multiple tiers of K-nearest neighbor and SIMCA (soft independent modeling of class analogy) models to classify and assign confidence limits for newly acquired oil samples and source rock extracts. Geochemical data for each oil sample were also used to infer the age, lithology, organic matter input, depositional environment, and identity of its source rock. These results demonstrate the value of large petroleum databases where all samples were analyzed using the same procedures and instrumentation. Copyright ?? 2007. The American Association of Petroleum Geologists. All rights reserved.
Poulos, Helen M; Camp, Ann E
2010-04-01
The abundance and distribution of species reflect how the niche requirements of species and the dynamics of populations interact with spatial and temporal variation in the environment. This study investigated the influence of geographical variation in environmental site conditions on tree dominance and diversity patterns in three topographically dissected mountain ranges in west Texas, USA, and northern Mexico. We measured tree abundance and basal area using a systematic sampling design across the forested areas of three mountain ranges and related these data to a suite of environmental parameters derived from field and digital elevation model data. We employed cluster analysis, classification and regression trees (CART), and rarefaction to identify (1) the dominant forest cover types across the three study sites and (2) environmental influences on tree distribution and diversity patterns. Elevation, topographic position, and incident solar radiation were the major influences on tree dominance and diversity. Mesic valley bottoms hosted high-diversity vegetation types, while hotter and drier mid-slopes and ridgetops supported lower tree diversity. Valley bottoms and other topographic positions shared few species, indicating high species turnover at the landscape scale. Mountain ranges with high topographic complexity also had higher species richness, suggesting that geographical variability in environmental conditions was a major influence on tree diversity. This study stressed the importance of landscape- and regional-scale topographic variability as a key factor controlling vegetation pattern and diversity in southwestern North America.
Neighbourhood-scale urban forest ecosystem classification.
Steenberg, James W N; Millward, Andrew A; Duinker, Peter N; Nowak, David J; Robinson, Pamela J
2015-11-01
Urban forests are now recognized as essential components of sustainable cities, but there remains uncertainty concerning how to stratify and classify urban landscapes into units of ecological significance at spatial scales appropriate for management. Ecosystem classification is an approach that entails quantifying the social and ecological processes that shape ecosystem conditions into logical and relatively homogeneous management units, making the potential for ecosystem-based decision support available to urban planners. The purpose of this study is to develop and propose a framework for urban forest ecosystem classification (UFEC). The multifactor framework integrates 12 ecosystem components that characterize the biophysical landscape, built environment, and human population. This framework is then applied at the neighbourhood scale in Toronto, Canada, using hierarchical cluster analysis. The analysis used 27 spatially-explicit variables to quantify the ecosystem components in Toronto. Twelve ecosystem classes were identified in this UFEC application. Across the ecosystem classes, tree canopy cover was positively related to economic wealth, especially income. However, education levels and homeownership were occasionally inconsistent with the expected positive relationship with canopy cover. Open green space and stocking had variable relationships with economic wealth and were more closely related to population density, building intensity, and land use. The UFEC can provide ecosystem-based information for greening initiatives, tree planting, and the maintenance of the existing canopy. Moreover, its use has the potential to inform the prioritization of limited municipal resources according to ecological conditions and to concerns of social equity in the access to nature and distribution of ecosystem service supply. Copyright © 2015 Elsevier Ltd. All rights reserved.
C.L. Schoch; G.-H. Sung; F. Lopez-Giraldez
2009-01-01
We present a six-gene, 420-species maximum-likelihood phylogeny of Ascomycota, the largest phylum of fungi. This analysis is the most taxonomically complete to date with species sampled from all 15 currently circumscribed classes. A number of superclass-level nodes that have previously evaded resolution and were unnamed in classifications of the fungi are resolved for...
Mapping and detecting bark beetle-caused tree mortality in the western United States
NASA Astrophysics Data System (ADS)
Meddens, Arjan J. H.
Recently, insect outbreaks across North America have dramatically increased and the forest area affected by bark beetles is similar to that affected by fire. Remote sensing offers the potential to detect insect outbreaks with high accuracy. Chapter one involved detection of insect-caused tree mortality on the tree level for a 90km2 area in northcentral Colorado. Classes of interest included green trees, multiple stages of post-insect attack tree mortality including dead trees with red needles ("red-attack") and dead trees without needles ("gray-attack"), and non-forest. The results illustrated that classification of an image with a spatial resolution similar to the area of a tree crown outperformed that from finer and coarser resolution imagery for mapping tree mortality and non-forest classes. I also demonstrated that multispectral imagery could be used to separate multiple postoutbreak attack stages (i.e., red-attack and gray-attack) from other classes in the image. In Chapter 2, I compared and improved methods for detecting bark beetle-caused tree mortality using medium-resolution satellite data. I found that overall classification accuracy was similar between single-date and multi-date classification methods. I developed regression models to predict percent red attack within a 30-m grid cell and these models explained >75% of the variance using three Landsat spectral explanatory variables. Results of the final product showed that approximately 24% of the forest within the Landsat scene was comprised of tree mortality caused by bark beetles. In Chapter 3, I developed a gridded data set with 1-km2 resolution using aerial survey data and improved estimates of tree mortality across the western US and British Columbia. In the US, I also produced an upper estimate by forcing the mortality area to match that from high-resolution imagery in Idaho, Colorado, and New Mexico. Cumulative mortality area from all bark beetles was 5.46 Mha in British Columbia in 2001-2010 and 0.47-5.37 Mha (lower and upper estimate) in the western conterminous US during 1997-2010. Improved methods for detection and mapping of insect outbreak areas will lead to improved assessments of the effects of these forest disturbances on the economy, carbon cycle (and feedback to climate change), fuel loads, hydrology and forest ecology.
Study and Ranking of Determinants of Taenia solium Infections by Classification Tree Models
Mwape, Kabemba E.; Phiri, Isaac K.; Praet, Nicolas; Dorny, Pierre; Muma, John B.; Zulu, Gideon; Speybroeck, Niko; Gabriël, Sarah
2015-01-01
Taenia solium taeniasis/cysticercosis is an important public health problem occurring mainly in developing countries. This work aimed to study the determinants of human T. solium infections in the Eastern province of Zambia and rank them in order of importance. A household (HH)-level questionnaire was administered to 680 HHs from 53 villages in two rural districts and the taeniasis and cysticercosis status determined. A classification tree model (CART) was used to define the relative importance and interactions between different predictor variables in their effect on taeniasis and cysticercosis. The Katete study area had a significantly higher taeniasis and cysticercosis prevalence than the Petauke area. The CART analysis for Katete showed that the most important determinant for cysticercosis infections was the number of HH inhabitants (6 to 10) and for taeniasis was the number of HH inhabitants > 6. The most important determinant in Petauke for cysticercosis was the age of head of household > 32 years and for taeniasis it was age < 55 years. The CART analysis showed that the most important determinant for both taeniasis and cysticercosis infections was the number of HH inhabitants (6 to 10) in Katete district and age in Petauke. The results suggest that control measures should target HHs with a high number of inhabitants and older individuals. PMID:25404073
Prediction of cadmium enrichment in reclaimed coastal soils by classification and regression tree
NASA Astrophysics Data System (ADS)
Ru, Feng; Yin, Aijing; Jin, Jiaxin; Zhang, Xiuying; Yang, Xiaohui; Zhang, Ming; Gao, Chao
2016-08-01
Reclamation of coastal land is one of the most common ways to obtain land resources in China. However, it has long been acknowledged that the artificial interference with coastal land has disadvantageous effects, such as heavy metal contamination. This study aimed to develop a prediction model for cadmium enrichment levels and assess the importance of affecting factors in typical reclaimed land in Eastern China (DFCL: Dafeng Coastal Land). Two hundred and twenty seven surficial soil/sediment samples were collected and analyzed to identify the enrichment levels of cadmium and the possible affecting factors in soils and sediments. The classification and regression tree (CART) model was applied in this study to predict cadmium enrichment levels. The prediction results showed that cadmium enrichment levels assessed by the CART model had an accuracy of 78.0%. The CART model could extract more information on factors affecting the environmental behavior of cadmium than correlation analysis. The integration of correlation analysis and the CART model showed that fertilizer application and organic carbon accumulation were the most important factors affecting soil/sediment cadmium enrichment levels, followed by particle size effects (Al2O3, TFe2O3 and SiO2), contents of Cl and S, surrounding construction areas and reclamation history.
Classification of sodium MRI data of cartilage using machine learning.
Madelin, Guillaume; Poidevin, Frederick; Makrymallis, Antonios; Regatte, Ravinder R
2015-11-01
To assess the possible utility of machine learning for classifying subjects with and subjects without osteoarthritis using sodium magnetic resonance imaging data. Theory: Support vector machine, k-nearest neighbors, naïve Bayes, discriminant analysis, linear regression, logistic regression, neural networks, decision tree, and tree bagging were tested. Sodium magnetic resonance imaging with and without fluid suppression by inversion recovery was acquired on the knee cartilage of 19 controls and 28 osteoarthritis patients. Sodium concentrations were measured in regions of interests in the knee for both acquisitions. Mean (MEAN) and standard deviation (STD) of these concentrations were measured in each regions of interest, and the minimum, maximum, and mean of these two measurements were calculated over all regions of interests for each subject. The resulting 12 variables per subject were used as predictors for classification. Either Min [STD] alone, or in combination with Mean [MEAN] or Min [MEAN], all from fluid suppressed data, were the best predictors with an accuracy >74%, mainly with linear logistic regression and linear support vector machine. Other good classifiers include discriminant analysis, linear regression, and naïve Bayes. Machine learning is a promising technique for classifying osteoarthritis patients and controls from sodium magnetic resonance imaging data. © 2014 Wiley Periodicals, Inc.
A Regional Simulation to Explore Impacts of Resource Use and Constraints
2007-03-01
mountaintops. (10) Deciduous Forest - This class is composed of forests, which contain at least 75% deciduous trees in the canopy, deciduous ... trees , pine plantations, and evergreen woodlands. (12) Mixed Forest - This class includes forests with mixed deciduous /coniferous canopies, natural...reflective surfaces. Classification of forested wetlands dominated by deciduous trees is probably more accurate than that in areas with 104
Electric Trees and Pond Creatures.
ERIC Educational Resources Information Center
Weaver, Helen; Hounshell, Paul B.
1978-01-01
Two learning activities are presented to develop observation and classification skills at the elementary level. The first is an electric box that associates tree names with leaf and bark specimens, and the second is a pond water observation and slide preparation activity. (BB)
A Pruning Neural Network Model in Credit Classification Analysis
Tang, Yajiao; Ji, Junkai; Dai, Hongwei; Yu, Yang; Todo, Yuki
2018-01-01
Nowadays, credit classification models are widely applied because they can help financial decision-makers to handle credit classification issues. Among them, artificial neural networks (ANNs) have been widely accepted as the convincing methods in the credit industry. In this paper, we propose a pruning neural network (PNN) and apply it to solve credit classification problem by adopting the well-known Australian and Japanese credit datasets. The model is inspired by synaptic nonlinearity of a dendritic tree in a biological neural model. And it is trained by an error back-propagation algorithm. The model is capable of realizing a neuronal pruning function by removing the superfluous synapses and useless dendrites and forms a tidy dendritic morphology at the end of learning. Furthermore, we utilize logic circuits (LCs) to simulate the dendritic structures successfully which makes PNN be implemented on the hardware effectively. The statistical results of our experiments have verified that PNN obtains superior performance in comparison with other classical algorithms in terms of accuracy and computational efficiency. PMID:29606961
Terrain Classification and Identification of Tree Stems Using Ground-Based Lidar
2012-12-01
hailing from North America and Eastern Asia. Stands are mixed age and very diverse, making this an appealing test site in terms of tree variety...sparse scene in Fig. 3(b) contains several deciduous trees and shrubs, but is largely open. The moderate scene, shown in Fig. 3(c), is cluttered with...numerous deciduous trees and shrubs, and significant ground cover. The remaining two data sets, dense1 and dense2 were collected at Breakheart
NASA Astrophysics Data System (ADS)
Praskievicz, S. J.; Luo, C.
2017-12-01
Classification of rivers is useful for a variety of purposes, such as generating and testing hypotheses about watershed controls on hydrology, predicting hydrologic variables for ungaged rivers, and setting goals for river management. In this research, we present a bottom-up (based on machine learning) river classification designed to investigate the underlying physical processes governing rivers' hydrologic regimes. The classification was developed for the entire state of Alabama, based on 248 United States Geological Survey (USGS) stream gages that met criteria for length and completeness of records. Five dimensionless hydrologic signatures were derived for each gage: slope of the flow duration curve (indicator of flow variability), baseflow index (ratio of baseflow to average streamflow), rising limb density (number of rising limbs per unit time), runoff ratio (ratio of long-term average streamflow to long-term average precipitation), and streamflow elasticity (sensitivity of streamflow to precipitation). We used a Bayesian clustering algorithm to classify the gages, based on the five hydrologic signatures, into distinct hydrologic regimes. We then used classification and regression trees (CART) to predict each gaged river's membership in different hydrologic regimes based on climatic and watershed variables. Using existing geospatial data, we applied the CART analysis to classify ungaged streams in Alabama, with the National Hydrography Dataset Plus (NHDPlus) catchment (average area 3 km2) as the unit of classification. The results of the classification can be used for meeting management and conservation objectives in Alabama, such as developing statewide standards for environmental instream flows. Such hydrologic classification approaches are promising for contributing to process-based understanding of river systems.
NASA Astrophysics Data System (ADS)
Borodinov, A. A.; Myasnikov, V. V.
2018-04-01
The present work is devoted to comparing the accuracy of the known qualification algorithms in the task of recognizing local objects on radar images for various image preprocessing methods. Preprocessing involves speckle noise filtering and normalization of the object orientation in the image by the method of image moments and by a method based on the Hough transform. In comparison, the following classification algorithms are used: Decision tree; Support vector machine, AdaBoost, Random forest. The principal component analysis is used to reduce the dimension. The research is carried out on the objects from the base of radar images MSTAR. The paper presents the results of the conducted studies.
Lee, I-M; Bottner-Parker, K D; Zhao, Y; Bertaccini, A; Davis, R E
2012-09-01
The pigeon pea witches'-broom phytoplasma group (16SrIX) comprises diverse strains that cause numerous diseases in leguminous trees and herbaceous crops, vegetables, a fruit, a nut tree and a forest tree. At least 14 strains have been reported worldwide. Comparative phylogenetic analyses of the highly conserved 16S rRNA gene and the moderately conserved rplV (rpl22)-rpsC (rps3) and secY genes indicated that the 16SrIX group consists of at least six distinct genetic lineages. Some of these lineages cannot be readily differentiated based on analysis of 16S rRNA gene sequences alone. The relative genetic distances among these closely related lineages were better assessed by including more variable genes [e.g. ribosomal protein (rp) and secY genes]. The present study demonstrated that virtual RFLP analyses using rp and secY gene sequences allowed unambiguous identification of such lineages. A coding system is proposed to designate each distinct rp and secY subgroup in the 16SrIX group.
Fault tree analysis of most common rolling bearing tribological failures
NASA Astrophysics Data System (ADS)
Vencl, Aleksandar; Gašić, Vlada; Stojanović, Blaža
2017-02-01
Wear as a tribological process has a major influence on the reliability and life of rolling bearings. Field examinations of bearing failures due to wear indicate possible causes and point to the necessary measurements for wear reduction or elimination. Wear itself is a very complex process initiated by the action of different mechanisms, and can be manifested by different wear types which are often related. However, the dominant type of wear can be approximately determined. The paper presents the classification of most common bearing damages according to the dominant wear type, i.e. abrasive wear, adhesive wear, surface fatigue wear, erosive wear, fretting wear and corrosive wear. The wear types are correlated with the terms used in ISO 15243 standard. Each wear type is illustrated with an appropriate photograph, and for each wear type, appropriate description of causes and manifestations is presented. Possible causes of rolling bearing failure are used for the fault tree analysis (FTA). It was performed to determine the root causes for bearing failures. The constructed fault tree diagram for rolling bearing failure can be useful tool for maintenance engineers.
James, Lachlan P; Robertson, Sam; Haff, G Gregory; Beckman, Emma M; Kelly, Vincent G
2017-03-01
To determine those performance indicators that have the greatest influence on classifying outcome at the elite level of mixed martial arts (MMA). A secondary objective was to establish the efficacy of decision tree analysis in explaining the characteristics of victory when compared to alternate statistical methods. Cross-sectional observational. Eleven raw performance indicators from male Ultimate Fighting Championship bouts (n=234) from July 2014 to December 2014 were screened for analysis. Each raw performance indicator was also converted to a rate-dependent measure to be scaled to fight duration. Further, three additional performance indicators were calculated from the dataset and included in the analysis. Cohen's d effect sizes were employed to determine the magnitude of the differences between Wins and Losses, while decision tree (chi-square automatic interaction detector (CHAID)) and discriminant function analyses (DFA) were used to classify outcome (Win and Loss). Effect size comparisons revealed differences between Wins and Losses across a number of performance indicators. Decision tree (raw: 71.8%; rate-scaled: 76.3%) and DFA (raw: 71.4%; rate-scaled 71.2%) achieved similar classification accuracies. Grappling and accuracy performance indicators were the most influential in explaining outcome. The decision tree models also revealed multiple combinations of performance indicators leading to victory. The decision tree analyses suggest that grappling activity and technique accuracy are of particular importance in achieving victory in elite-level MMA competition. The DFA results supported the importance of these performance indicators. Decision tree induction represents an intuitive and slightly more accurate approach to explaining bout outcome in this sport when compared to DFA. Copyright © 2016 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
Kumar, Ashwani; Singh, Tiratha Raj
2017-03-01
Alzheimer's disease (AD) is a progressive, incurable and terminal neurodegenerative disorder of the brain and is associated with mutations in amyloid precursor protein, presenilin 1, presenilin 2 or apolipoprotein E, but its underlying mechanisms are still not fully understood. Healthcare sector is generating a large amount of information corresponding to diagnosis, disease identification and treatment of an individual. Mining knowledge and providing scientific decision-making for the diagnosis and treatment of disease from the clinical dataset are therefore increasingly becoming necessary. The current study deals with the construction of classifiers that can be human readable as well as robust in performance for gene dataset of AD using a decision tree. Models of classification for different AD genes were generated according to Mini-Mental State Examination scores and all other vital parameters to achieve the identification of the expression level of different proteins of disorder that may possibly determine the involvement of genes in various AD pathogenesis pathways. The effectiveness of decision tree in AD diagnosis is determined by information gain with confidence value (0.96), specificity (92 %), sensitivity (98 %) and accuracy (77 %). Besides this functional gene classification using different parameters and enrichment analysis, our finding indicates that the measures of all the gene assess in single cohorts are sufficient to diagnose AD and will help in the prediction of important parameters for other relevant assessments.
An efficient ensemble learning method for gene microarray classification.
Osareh, Alireza; Shadgar, Bita
2013-01-01
The gene microarray analysis and classification have demonstrated an effective way for the effective diagnosis of diseases and cancers. However, it has been also revealed that the basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using RotBoost ensemble methodology. This method is a combination of Rotation Forest and AdaBoost techniques which in turn preserve both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of the RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results have revealed that the combination of the fast correlation-based feature selection method with ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by the conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.
A hybrid clustering and classification approach for predicting crash injury severity on rural roads.
Hasheminejad, Seyed Hessam-Allah; Zahedi, Mohsen; Hasheminejad, Seyed Mohammad Hossein
2018-03-01
As a threat for transportation system, traffic crashes have a wide range of social consequences for governments. Traffic crashes are increasing in developing countries and Iran as a developing country is not immune from this risk. There are several researches in the literature to predict traffic crash severity based on artificial neural networks (ANNs), support vector machines and decision trees. This paper attempts to investigate the crash injury severity of rural roads by using a hybrid clustering and classification approach to compare the performance of classification algorithms before and after applying the clustering. In this paper, a novel rule-based genetic algorithm (GA) is proposed to predict crash injury severity, which is evaluated by performance criteria in comparison with classification algorithms like ANN. The results obtained from analysis of 13,673 crashes (5600 property damage, 778 fatal crashes, 4690 slight injuries and 2605 severe injuries) on rural roads in Tehran Province of Iran during 2011-2013 revealed that the proposed GA method outperforms other classification algorithms based on classification metrics like precision (86%), recall (88%) and accuracy (87%). Moreover, the proposed GA method has the highest level of interpretation, is easy to understand and provides feedback to analysts.
Tree Colors: Color Schemes for Tree-Structured Data.
Tennekes, Martijn; de Jonge, Edwin
2014-12-01
We present a method to map tree structures to colors from the Hue-Chroma-Luminance color model, which is known for its well balanced perceptual properties. The Tree Colors method can be tuned with several parameters, whose effect on the resulting color schemes is discussed in detail. We provide a free and open source implementation with sensible parameter defaults. Categorical data are very common in statistical graphics, and often these categories form a classification tree. We evaluate applying Tree Colors to tree structured data with a survey on a large group of users from a national statistical institute. Our user study suggests that Tree Colors are useful, not only for improving node-link diagrams, but also for unveiling tree structure in non-hierarchical visualizations.
NASA Astrophysics Data System (ADS)
Freeman, Mary Pyott
ABSTRACT An Analysis of Tree Mortality Using High Resolution Remotely-Sensed Data for Mixed-Conifer Forests in San Diego County by Mary Pyott Freeman The montane mixed-conifer forests of San Diego County are currently experiencing extensive tree mortality, which is defined as dieback where whole stands are affected. This mortality is likely the result of the complex interaction of many variables, such as altered fire regimes, climatic conditions such as drought, as well as forest pathogens and past management strategies. Conifer tree mortality and its spatial pattern and change over time were examined in three components. In component 1, two remote sensing approaches were compared for their effectiveness in delineating dead trees, a spatial contextual approach and an OBIA (object based image analysis) approach, utilizing various dates and spatial resolutions of airborne image data. For each approach transforms and masking techniques were explored, which were found to improve classifications, and an object-based assessment approach was tested. In component 2, dead tree maps produced by the most effective techniques derived from component 1 were utilized for point pattern and vector analyses to further understand spatio-temporal changes in tree mortality for the years 1997, 2000, 2002, and 2005 for three study areas: Palomar, Volcan and Laguna mountains. Plot-based fieldwork was conducted to further assess mortality patterns. Results indicate that conifer mortality was significantly clustered, increased substantially between 2002 and 2005, and was non-random with respect to tree species and diameter class sizes. In component 3, multiple environmental variables were used in Generalized Linear Model (GLM-logistic regression) and decision tree classifier model development, revealing the importance of climate and topographic factors such as precipitation and elevation, in being able to predict areas of high risk for tree mortality. The results from this study highlight the importance of multi-scale spatial as well as temporal analyses, in order to understand mixed-conifer forest structure, dynamics, and processes of decline, which can lead to more sustainable management of forests with continued natural and anthropogenic disturbance.
Texture classification of normal tissues in computed tomography using Gabor filters
NASA Astrophysics Data System (ADS)
Dettori, Lucia; Bashir, Alia; Hasemann, Julie
2007-03-01
The research presented in this article is aimed at developing an automated imaging system for classification of normal tissues in medical images obtained from Computed Tomography (CT) scans. Texture features based on a bank of Gabor filters are used to classify the following tissues of interests: liver, spleen, kidney, aorta, trabecular bone, lung, muscle, IP fat, and SQ fat. The approach consists of three steps: convolution of the regions of interest with a bank of 32 Gabor filters (4 frequencies and 8 orientations), extraction of two Gabor texture features per filter (mean and standard deviation), and creation of a Classification and Regression Tree-based classifier that automatically identifies the various tissues. The data set used consists of approximately 1000 DIACOM images from normal chest and abdominal CT scans of five patients. The regions of interest were labeled by expert radiologists. Optimal trees were generated using two techniques: 10-fold cross-validation and splitting of the data set into a training and a testing set. In both cases, perfect classification rules were obtained provided enough images were available for training (~65%). All performance measures (sensitivity, specificity, precision, and accuracy) for all regions of interest were at 100%. This significantly improves previous results that used Wavelet, Ridgelet, and Curvelet texture features, yielding accuracy values in the 85%-98% range The Gabor filters' ability to isolate features at different frequencies and orientations allows for a multi-resolution analysis of texture essential when dealing with, at times, very subtle differences in the texture of tissues in CT scans.
A phylogeny and revised classification of Squamata, including 4161 species of lizards and snakes
2013-01-01
Background The extant squamates (>9400 known species of lizards and snakes) are one of the most diverse and conspicuous radiations of terrestrial vertebrates, but no studies have attempted to reconstruct a phylogeny for the group with large-scale taxon sampling. Such an estimate is invaluable for comparative evolutionary studies, and to address their classification. Here, we present the first large-scale phylogenetic estimate for Squamata. Results The estimated phylogeny contains 4161 species, representing all currently recognized families and subfamilies. The analysis is based on up to 12896 base pairs of sequence data per species (average = 2497 bp) from 12 genes, including seven nuclear loci (BDNF, c-mos, NT3, PDC, R35, RAG-1, and RAG-2), and five mitochondrial genes (12S, 16S, cytochrome b, ND2, and ND4). The tree provides important confirmation for recent estimates of higher-level squamate phylogeny based on molecular data (but with more limited taxon sampling), estimates that are very different from previous morphology-based hypotheses. The tree also includes many relationships that differ from previous molecular estimates and many that differ from traditional taxonomy. Conclusions We present a new large-scale phylogeny of squamate reptiles that should be a valuable resource for future comparative studies. We also present a revised classification of squamates at the family and subfamily level to bring the taxonomy more in line with the new phylogenetic hypothesis. This classification includes new, resurrected, and modified subfamilies within gymnophthalmid and scincid lizards, and boid, colubrid, and lamprophiid snakes. PMID:23627680
Iris Image Classification Based on Hierarchical Visual Codebook.
Zhenan Sun; Hui Zhang; Tieniu Tan; Jianyu Wang
2014-06-01
Iris recognition as a reliable method for personal identification has been well-studied with the objective to assign the class label of each iris image to a unique subject. In contrast, iris image classification aims to classify an iris image to an application specific category, e.g., iris liveness detection (classification of genuine and fake iris images), race classification (e.g., classification of iris images of Asian and non-Asian subjects), coarse-to-fine iris identification (classification of all iris images in the central database into multiple categories). This paper proposes a general framework for iris image classification based on texture analysis. A novel texture pattern representation method called Hierarchical Visual Codebook (HVC) is proposed to encode the texture primitives of iris images. The proposed HVC method is an integration of two existing Bag-of-Words models, namely Vocabulary Tree (VT), and Locality-constrained Linear Coding (LLC). The HVC adopts a coarse-to-fine visual coding strategy and takes advantages of both VT and LLC for accurate and sparse representation of iris texture. Extensive experimental results demonstrate that the proposed iris image classification method achieves state-of-the-art performance for iris liveness detection, race classification, and coarse-to-fine iris identification. A comprehensive fake iris image database simulating four types of iris spoof attacks is developed as the benchmark for research of iris liveness detection.
NASA Technical Reports Server (NTRS)
Basu, Saikat; Ganguly, Sangram; Michaelis, Andrew; Votava, Petr; Roy, Anshuman; Mukhopadhyay, Supratik; Nemani, Ramakrishna
2015-01-01
Tree cover delineation is a useful instrument in deriving Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) airborne imagery data. Numerous algorithms have been designed to address this problem, but most of them do not scale to these datasets, which are of the order of terabytes. In this paper, we present a semi-automated probabilistic framework for the segmentation and classification of 1-m National Agriculture Imagery Program (NAIP) for tree-cover delineation for the whole of Continental United States, using a High Performance Computing Architecture. Classification is performed using a multi-layer Feedforward Backpropagation Neural Network and segmentation is performed using a Statistical Region Merging algorithm. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field, which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by relabeling misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the whole state of California, spanning a total of 11,095 NAIP tiles covering a total geographical area of 163,696 sq. miles. The framework produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates lower than 2% for both landscapes. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR canopy height model (CHM) showed the effectiveness of our framework for generating accurate high-resolution tree-cover maps.
NASA Astrophysics Data System (ADS)
Basu, S.; Ganguly, S.; Michaelis, A.; Votava, P.; Roy, A.; Mukhopadhyay, S.; Nemani, R. R.
2015-12-01
Tree cover delineation is a useful instrument in deriving Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) airborne imagery data. Numerous algorithms have been designed to address this problem, but most of them do not scale to these datasets which are of the order of terabytes. In this paper, we present a semi-automated probabilistic framework for the segmentation and classification of 1-m National Agriculture Imagery Program (NAIP) for tree-cover delineation for the whole of Continental United States, using a High Performance Computing Architecture. Classification is performed using a multi-layer Feedforward Backpropagation Neural Network and segmentation is performed using a Statistical Region Merging algorithm. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field, which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by relabeling misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the whole state of California, spanning a total of 11,095 NAIP tiles covering a total geographical area of 163,696 sq. miles. The framework produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates lower than 2% for both landscapes. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR canopy height model (CHM) showed the effectiveness of our framework for generating accurate high-resolution tree-cover maps.
Deriving pathway maps from automated text analysis using a grammar-based approach.
Olsson, Björn; Gawronska, Barbara; Erlendsson, Björn
2006-04-01
We demonstrate how automated text analysis can be used to support the large-scale analysis of metabolic and regulatory pathways by deriving pathway maps from textual descriptions found in the scientific literature. The main assumption is that correct syntactic analysis combined with domain-specific heuristics provides a good basis for relation extraction. Our method uses an algorithm that searches through the syntactic trees produced by a parser based on a Referent Grammar formalism, identifies relations mentioned in the sentence, and classifies them with respect to their semantic class and epistemic status (facts, counterfactuals, hypotheses). The semantic categories used in the classification are based on the relation set used in KEGG (Kyoto Encyclopedia of Genes and Genomes), so that pathway maps using KEGG notation can be automatically generated. We present the current version of the relation extraction algorithm and an evaluation based on a corpus of abstracts obtained from PubMed. The results indicate that the method is able to combine a reasonable coverage with high accuracy. We found that 61% of all sentences were parsed, and 97% of the parse trees were judged to be correct. The extraction algorithm was tested on a sample of 300 parse trees and was found to produce correct extractions in 90.5% of the cases.
Mapping grass communities based on multi-temporal Landsat TM imagery and environmental variables
NASA Astrophysics Data System (ADS)
Zeng, Yuandi; Liu, Yanfang; Liu, Yaolin; de Leeuw, Jan
2007-06-01
Information on the spatial distribution of grass communities in wetland is increasingly recognized as important for effective wetland management and biological conservation. Remote sensing techniques has been proved to be an effective alternative to intensive and costly ground surveys for mapping grass community. However, the mapping accuracy of grass communities in wetland is still not preferable. The aim of this paper is to develop an effective method to map grass communities in Poyang Lake Natural Reserve. Through statistic analysis, elevation is selected as an environmental variable for its high relationship with the distribution of grass communities; NDVI stacked from images of different months was used to generate Carex community map; the image in October was used to discriminate Miscanthus and Cynodon communities. Classifications were firstly performed with maximum likelihood classifier using single date satellite image with and without elevation; then layered classifications were performed using multi-temporal satellite imagery and elevation with maximum likelihood classifier, decision tree and artificial neural network separately. The results show that environmental variables can improve the mapping accuracy; and the classification with multitemporal imagery and elevation is significantly better than that with single date image and elevation (p=0.001). Besides, maximum likelihood (a=92.71%, k=0.90) and artificial neural network (a=94.79%, k=0.93) perform significantly better than decision tree (a=86.46%, k=0.83).
Gupta, Radhey S
2016-07-01
Analyses of genome sequences, by some approaches, suggest that the widespread occurrence of horizontal gene transfers (HGTs) in prokaryotes disguises their evolutionary relationships and have led to questioning of the Darwinian model of evolution for prokaryotes. These inferences are critically examined in the light of comparative genome analysis, characteristic synapomorphies, phylogenetic trees and Darwin's views on examining evolutionary relationships. Genome sequences are enabling discovery of numerous molecular markers (synapomorphies) such as conserved signature indels (CSIs) and conserved signature proteins (CSPs), which are distinctive characteristics of different prokaryotic taxa. Based on these molecular markers, exhibiting high degree of specificity and predictive ability, numerous prokaryotic taxa of different ranks, currently identified based on the 16S rRNA gene trees, can now be reliably demarcated in molecular terms. Within all studied groups, multiple CSIs and CSPs have been identified for successive nested clades providing reliable information regarding their hierarchical relationships and these inferences are not affected by HGTs. These results strongly support Darwin's views on evolution and classification and supplement the current phylogenetic framework based on 16S rRNA in important respects. The identified molecular markers provide important means for developing novel diagnostics, therapeutics and for functional studies providing important insights regarding prokaryotic taxa. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Gretchen G. Moisen; Elizabeth A. Freeman; Jock A. Blackard; Tracey S. Frescino; Niklaus E. Zimmermann; Thomas C. Edwards
2006-01-01
Many efforts are underway to produce broad-scale forest attribute maps by modelling forest class and structure variables collected in forest inventories as functions of satellite-based and biophysical information. Typically, variants of classification and regression trees implemented in Rulequest's© See5 and Cubist (for binary and continuous responses,...
E. Freeman; G. Moisen; J. Coulston; B. Wilson
2014-01-01
Random forests (RF) and stochastic gradient boosting (SGB), both involving an ensemble of classification and regression trees, are compared for modeling tree canopy cover for the 2011 National Land Cover Database (NLCD). The objectives of this study were twofold. First, sensitivity of RF and SGB to choices in tuning parameters was explored. Second, performance of the...
Brian J. Kopper; Kier D. Klepzig; Kenneth F. Raffa
2004-01-01
Efforts to describe the complex relationships between bark beetles and the ophiostomatoid (stain) fungi they transport have largely resulted in a dichotomous classification. These symbioses have been viewed as either mutualistic (i.e., fungi help bark beetles colonize living trees by overcoming tree defenses or by providing nutrients after colonization in return for...
Portable Language-Independent Adaptive Translation from OCR. Phase 1
2009-04-01
including brute-force k-Nearest Neighbors ( kNN ), fast approximate kNN using hashed k-d trees, classification and regression trees, and locality...achieved by refinements in ground-truthing protocols. Recent algorithmic improvements to our approximate kNN classifier using hashed k-D trees allows...recent years discriminative training has been shown to outperform phonetic HMMs estimated using ML for speech recognition. Standard ML estimation
Tree species classification using within crown localization of waveform LiDAR attributes
NASA Astrophysics Data System (ADS)
Blomley, Rosmarie; Hovi, Aarne; Weinmann, Martin; Hinz, Stefan; Korpela, Ilkka; Jutzi, Boris
2017-11-01
Since forest planning is increasingly taking an ecological, diversity-oriented perspective into account, remote sensing technologies are becoming ever more important in assessing existing resources with reduced manual effort. While the light detection and ranging (LiDAR) technology provides a good basis for predictions of tree height and biomass, tree species identification based on this type of data is particularly challenging in structurally heterogeneous forests. In this paper, we analyse existing approaches with respect to the geometrical scale of feature extraction (whole tree, within crown partitions or within laser footprint) and conclude that currently features are always extracted separately from the different scales. Since multi-scale approaches however have proven successful in other applications, we aim to utilize the within-tree-crown distribution of within-footprint signal characteristics as additional features. To do so, a spin image algorithm, originally devised for the extraction of 3D surface features in object recognition, is adapted. This algorithm relies on spinning an image plane around a defined axis, e.g. the tree stem, collecting the number of LiDAR returns or mean values of returns attributes per pixel as respective values. Based on this representation, spin image features are extracted that comprise only those components of highest variability among a given set of library trees. The relative performance and the combined improvement of these spin image features with respect to non-spatial statistical metrics of the waveform (WF) attributes are evaluated for the tree species classification of Scots pine (Pinus sylvestris L.), Norway spruce (Picea abies (L.) Karst.) and Silver/Downy birch (Betula pendula Roth/Betula pubescens Ehrh.) in a boreal forest environment. This evaluation is performed for two WF LiDAR datasets that differ in footprint size, pulse density at ground, laser wavelength and pulse width. Furthermore, we evaluate the robustness of the proposed method with respect to internal parameters and tree size. The results reveal, that the consideration of the crown-internal distribution of within-footprint signal characteristics captured in spin image features improves the classification results in nearly all test cases.
NASA Technical Reports Server (NTRS)
Botkin, Daniel B.
1987-01-01
The analysis of ground-truth data from the boreal forest plots in the Superior National Forest, Minnesota, was completed. Development of statistical methods was completed for dimension analysis (equations to estimate the biomass of trees from measurements of diameter and height). The dimension-analysis equations were applied to the data obtained from ground-truth plots, to estimate the biomass. Classification and analyses of remote sensing images of the Superior National Forest were done as a test of the technique to determine forest biomass and ecological state by remote sensing. Data was archived on diskette and tape and transferred to UCSB to be used in subsequent research.
Suzanne M. Joy; R. M. Reich; Richard T. Reynolds
2003-01-01
Traditional land classification techniques for large areas that use Landsat Thematic Mapper (TM) imagery are typically limited to the fixed spatial resolution of the sensors (30m). However, the study of some ecological processes requires land cover classifications at finer spatial resolutions. We model forest vegetation types on the Kaibab National Forest (KNF) in...
The Early Detection of the Emerald Ash Borer (EAB) Using Advanced Geospacial Technologies
NASA Astrophysics Data System (ADS)
Hu, B.; Li, J.; Wang, J.; Hall, B.
2014-11-01
The objectives of this study were to exploit Light Detection And Ranging (LiDAR) and very high spatial resolution (VHR) data and their synergy with hyperspectral imagery in the early detection of the EAB presence in trees within urban areas and to develop a framework to combine information extracted from multiple data sources. To achieve these, an object-oriented framework was developed to combine information derived from available data sets to characterize ash trees. Within this framework, individual trees were first extracted and then classified into different species based on their spectral information derived from hyperspectral imagery, spatial information from VHR imagery, and for each ash tree its health state and EAB infestation stage were determined based on hyperspectral imagery. The developed framework and methods were demonstrated to be effective according to the results obtained on two study sites in the city of Toronto, Ontario Canada. The individual tree delineation method provided satisfactory results with an overall accuracy of 78 % and 19 % commission and 23 % omission errors when used on the combined very high-spatial resolution imagery and LiDAR data. In terms of the identification of ash trees, given sufficient representative training data, our classification model was able to predict tree species with above 75 % overall accuracy, and mis-classification occurred mainly between ash and maple trees. The hypothesis that a strong correlation exists between general tree stress and EAB infestation was confirmed. Vegetation indices sensitive to leaf chlorophyll content derived from hyperspectral imagery can be used to predict the EAB infestation levels for each ash tree.
NASA Astrophysics Data System (ADS)
Saran, Sameer; Sterk, Geert; Kumar, Suresh
2007-10-01
Land use/cover is an important watershed surface characteristic that affects surface runoff and erosion. Many of the available hydrological models divide the watershed into Hydrological Response Units (HRU), which are spatial units with expected similar hydrological behaviours. The division into HRU's requires good-quality spatial data on land use/cover. This paper presents different approaches to attain an optimal land use/cover map based on remote sensing imagery for a Himalayan watershed in northern India. First digital classifications using maximum likelihood classifier (MLC) and a decision tree classifier were applied. The results obtained from the decision tree were better and even improved after post classification sorting. But the obtained land use/cover map was not sufficient for the delineation of HRUs, since the agricultural land use/cover class did not discriminate between the two major crops in the area i.e. paddy and maize. Therefore we adopted a visual classification approach using optical data alone and also fused with ENVISAT ASAR data. This second step with detailed classification system resulted into better classification accuracy within the 'agricultural land' class which will be further combined with topography and soil type to derive HRU's for physically-based hydrological modelling.
Variable coupling between sap-flow and transpiration in pine trees under drought conditions
NASA Astrophysics Data System (ADS)
Preisler, Yakir; Tatarinov, Fyodor; Rohatyn, Shani; Rotenberg, Eyal; Grunzweig, Jose M.; Yakir, Dan
2016-04-01
Changes in diurnal patterns in water transport and physiological activities in response to changes in environmental conditions are important adjustments of trees to drought. The rate of sap flow (SF) in trees is expected to be in agreement with the rate of tree-scale transpiration (T) and provides a powerful measure of water transport in the soil-plant-atmosphere system. The aim of this five-years study was to investigate the temporal links between SF and T in Pinus halepensis exposed to extreme seasonal drought in the Yatir forest in Israel. We continuously measured SF (20 trees), the daily variations in stem diameter (ΔDBH, determined with high precision dendrometers; 8 trees), and ecosystem evapotranspiration (ET; eddy covariance), which were complemented with short-term campaigns of leaf-scale measurements of H2O and CO2 gas exchange, water potentials, and hydraulic conductivity. During the rainy season, tree SF was well synchronized with ecosystem ET, reaching maximum rates during midday in all trees. However, during the dry season, the daily SF trends greatly varied among trees, allowing a classification of trees into three classes: 1) Trees that remain with SF maximum at midday, 2) trees that advanced their SF peak to early morning, and 3) trees that delayed their SF peak to late afternoon hours. This classification remained valid for the entire study period (2010-2015), and strongly correlated with tree height and DBH, and to a lower degree with crown size and competition index. In the dry season, class 3 trees (large) tended to delay the timing of SF maximum to the afternoon, and to advance their maximum diurnal DBH to early morning, while class 2 trees (smaller) advanced their SF maximum to early morning and had maximum daily DBH during midday and afternoon. Leaf-scale transpiration (T), measurements showed a typical morning peak in all trees, irrespective of classification, and a secondary peak in the afternoon in large trees only. Water potential and hydraulic conductivity in larger trees recovered faster from midday depression than in smaller ones. We concluded that the observed changes in the patterns of water flow into and out of the trees reflected differences in the utilization of external and internal 'water storage'. Large trees appear to rely on sufficient internal water storage that filled up in the morning (max DBH) and supported transpiration both in the morning and the afternoon, while SF increased throughout the day to compensate for the depletion in water storage (SF maximum in the afternoon). In contrast, small trees with insufficient internal water storage must rely on soil water availability and maximize SF in the morning to support concurrent tree transpiration, achieving some internal storage only in the afternoon, when T declines and maximum daily DBH is observed. The results indicated also that trees with insufficient internal storage, as can be detected by the simultaneous SF and DBH patterns, are likely to be more vulnerable to drought-related mortality since soil water availability may not be sufficient to support transpiration and stomata opening.
A novel underwater dam crack detection and classification approach based on sonar images
Shi, Pengfei; Fan, Xinnan; Ni, Jianjun; Khan, Zubair; Li, Min
2017-01-01
Underwater dam crack detection and classification based on sonar images is a challenging task because underwater environments are complex and because cracks are quite random and diverse in nature. Furthermore, obtainable sonar images are of low resolution. To address these problems, a novel underwater dam crack detection and classification approach based on sonar imagery is proposed. First, the sonar images are divided into image blocks. Second, a clustering analysis of a 3-D feature space is used to obtain the crack fragments. Third, the crack fragments are connected using an improved tensor voting method. Fourth, a minimum spanning tree is used to obtain the crack curve. Finally, an improved evidence theory combined with fuzzy rule reasoning is proposed to classify the cracks. Experimental results show that the proposed approach is able to detect underwater dam cracks and classify them accurately and effectively under complex underwater environments. PMID:28640925
A novel underwater dam crack detection and classification approach based on sonar images.
Shi, Pengfei; Fan, Xinnan; Ni, Jianjun; Khan, Zubair; Li, Min
2017-01-01
Underwater dam crack detection and classification based on sonar images is a challenging task because underwater environments are complex and because cracks are quite random and diverse in nature. Furthermore, obtainable sonar images are of low resolution. To address these problems, a novel underwater dam crack detection and classification approach based on sonar imagery is proposed. First, the sonar images are divided into image blocks. Second, a clustering analysis of a 3-D feature space is used to obtain the crack fragments. Third, the crack fragments are connected using an improved tensor voting method. Fourth, a minimum spanning tree is used to obtain the crack curve. Finally, an improved evidence theory combined with fuzzy rule reasoning is proposed to classify the cracks. Experimental results show that the proposed approach is able to detect underwater dam cracks and classify them accurately and effectively under complex underwater environments.
Martin, Peter; Davies, Roger; Macdougall, Amy; Ritchie, Benjamin; Vostanis, Panos; Whale, Andy; Wolpert, Miranda
2017-09-01
Case-mix classification is a focus of international attention in considering how best to manage and fund services, by providing a basis for fairer comparison of resource utilization. Yet there is little evidence of the best ways to establish case mix for child and adolescent mental health services (CAMHS). To develop a case mix classification for CAMHS that is clinically meaningful and predictive of number of appointments attended and to investigate the influence of presenting problems, context and complexity factors and provider variation. We analysed 4573 completed episodes of outpatient care from 11 English CAMHS. Cluster analysis, regression trees and a conceptual classification based on clinical best practice guidelines were compared regarding their ability to predict number of appointments, using mixed effects negative binomial regression. The conceptual classification is clinically meaningful and did as well as data-driven classifications in accounting for number of appointments. There was little evidence for effects of complexity or context factors, with the possible exception of school attendance problems. Substantial variation in resource provision between providers was not explained well by case mix. The conceptually-derived classification merits further testing and development in the context of collaborative decision making.
Discriminative Hierarchical K-Means Tree for Large-Scale Image Classification.
Chen, Shizhi; Yang, Xiaodong; Tian, Yingli
2015-09-01
A key challenge in large-scale image classification is how to achieve efficiency in terms of both computation and memory without compromising classification accuracy. The learning-based classifiers achieve the state-of-the-art accuracies, but have been criticized for the computational complexity that grows linearly with the number of classes. The nonparametric nearest neighbor (NN)-based classifiers naturally handle large numbers of categories, but incur prohibitively expensive computation and memory costs. In this brief, we present a novel classification scheme, i.e., discriminative hierarchical K-means tree (D-HKTree), which combines the advantages of both learning-based and NN-based classifiers. The complexity of the D-HKTree only grows sublinearly with the number of categories, which is much better than the recent hierarchical support vector machines-based methods. The memory requirement is the order of magnitude less than the recent Naïve Bayesian NN-based approaches. The proposed D-HKTree classification scheme is evaluated on several challenging benchmark databases and achieves the state-of-the-art accuracies, while with significantly lower computation cost and memory requirement.
Apple rootstocks: history, physiology, management and breeding
USDA-ARS?s Scientific Manuscript database
For more than two millennia superior fruit tree genotypes have been grafted onto rootstocks to maintain the genetic identity of the desirable scions. Until the 20th century most fruit trees were grafted onto seedling rootstocks. Following the classification, evaluation and propagation of clonal root...
ERIC Educational Resources Information Center
Markham, Mary T.
2000-01-01
Introduces a unit on forest management in which students manage the school forest. Involves students in tree identification, determining the size or volume and height of trees, and evaluation of the forest for management decisions. Integrates mathematics, writing, and social studies with plant classification, plant reproduction, and the use of…
Building Diversified Multiple Trees for classification in high dimensional noisy biomedical data.
Li, Jiuyong; Liu, Lin; Liu, Jixue; Green, Ryan
2017-12-01
It is common that a trained classification model is applied to the operating data that is deviated from the training data because of noise. This paper will test an ensemble method, Diversified Multiple Tree (DMT), on its capability for classifying instances in a new laboratory using the classifier built on the instances of another laboratory. DMT is tested on three real world biomedical data sets from different laboratories in comparison with four benchmark ensemble methods, AdaBoost, Bagging, Random Forests, and Random Trees. Experiments have also been conducted on studying the limitation of DMT and its possible variations. Experimental results show that DMT is significantly more accurate than other benchmark ensemble classifiers on classifying new instances of a different laboratory from the laboratory where instances are used to build the classifier. This paper demonstrates that an ensemble classifier, DMT, is more robust in classifying noisy data than other widely used ensemble methods. DMT works on the data set that supports multiple simple trees.
Factor complexity of crash occurrence: An empirical demonstration using boosted regression trees.
Chung, Yi-Shih
2013-12-01
Factor complexity is a characteristic of traffic crashes. This paper proposes a novel method, namely boosted regression trees (BRT), to investigate the complex and nonlinear relationships in high-variance traffic crash data. The Taiwanese 2004-2005 single-vehicle motorcycle crash data are used to demonstrate the utility of BRT. Traditional logistic regression and classification and regression tree (CART) models are also used to compare their estimation results and external validities. Both the in-sample cross-validation and out-of-sample validation results show that an increase in tree complexity provides improved, although declining, classification performance, indicating a limited factor complexity of single-vehicle motorcycle crashes. The effects of crucial variables including geographical, time, and sociodemographic factors explain some fatal crashes. Relatively unique fatal crashes are better approximated by interactive terms, especially combinations of behavioral factors. BRT models generally provide improved transferability than conventional logistic regression and CART models. This study also discusses the implications of the results for devising safety policies. Copyright © 2012 Elsevier Ltd. All rights reserved.
A self-trained classification technique for producing 30 m percent-water maps from Landsat data
Rover, Jennifer R.; Wylie, Bruce K.; Ji, Lei
2010-01-01
Small bodies of water can be mapped with moderate-resolution satellite data using methods where water is mapped as subpixel fractions using field measurements or high-resolution images as training datasets. A new method, developed from a regression-tree technique, uses a 30 m Landsat image for training the regression tree that, in turn, is applied to the same image to map subpixel water. The self-trained method was evaluated by comparing the percent-water map with three other maps generated from established percent-water mapping methods: (1) a regression-tree model trained with a 5 m SPOT 5 image, (2) a regression-tree model based on endmembers and (3) a linear unmixing classification technique. The results suggest that subpixel water fractions can be accurately estimated when high-resolution satellite data or intensively interpreted training datasets are not available, which increases our ability to map small water bodies or small changes in lake size at a regional scale.
NASA Astrophysics Data System (ADS)
Gutierrez-Velez, V. H.; DeFries, R. S.
2011-12-01
Oil palm expansion has led to clearing of extensive forest areas in the tropics. However quantitative assessments of the magnitude of oil palm expansion to deforestation have been challenging due in large part to the limitations presented by conventional optical data sets for discriminating plantations from forests and other tree cover vegetations. Recently available information from active remote sensors has opened the possibility of using these data sources to overcome these limitations. The purpose of this analysis is to evaluate the accuracy of oil palm classification when using ALOS/PALSAR active satellite data in conjunction with Landsat information, compared to the use of Landsat data only. The analysis takes place in a focused region around the city of Pucallpa in the Ucayali province of the Peruvian Amazon for the year 2010. Oil palm plantations were separated in five categories consisting of four age classes (0-3, 3-5, 5-10 and > 10 yrs) and an additional class accounting for degraded plantations older than 15 yr. Other land covers were water bodies, unvegetated land, short and tall grass, fallow, secondary vegetation, and forest. Classifications were performed using random forests. Training points for calibration and validation consisted of 411 polygons measured in areas representative of the land covers of interest and totaled 6,367 ha. Overall classification accuracy increased from 89.9% using only Landsat data sets to 94.3% using both Landast and ALOS/PALSAR. Both user's and producer's accuracy increased in all classes when using both data sets except for producer's accuracy in short grass which decreased by 1%. The largest increase in user's accuracy was obtained in oil palm plantations older than 10 years from 62 to 80% while producer's accuracy improved the most in plantations in age class 3-5 from 63 to 80%. Results demonstrate the suitability of data from ALOS/PALSAR and other active remote sensors to improve classification of oil palm plantations in age classes and discriminate them from other land covers. Results suggest a potential for improving discrimination of other tree cover types using a combination of active and conventional optical remote sensors.
NASA Astrophysics Data System (ADS)
Li, Mengmeng; Bijker, Wietske; Stein, Alfred
2015-04-01
Two main challenges are faced when classifying urban land cover from very high resolution satellite images: obtaining an optimal image segmentation and distinguishing buildings from other man-made objects. For optimal segmentation, this work proposes a hierarchical representation of an image by means of a Binary Partition Tree (BPT) and an unsupervised evaluation of image segmentations by energy minimization. For building extraction, we apply fuzzy sets to create a fuzzy landscape of shadows which in turn involves a two-step procedure. The first step is a preliminarily image classification at a fine segmentation level to generate vegetation and shadow information. The second step models the directional relationship between building and shadow objects to extract building information at the optimal segmentation level. We conducted the experiments on two datasets of Pléiades images from Wuhan City, China. To demonstrate its performance, the proposed classification is compared at the optimal segmentation level with Maximum Likelihood Classification and Support Vector Machine classification. The results show that the proposed classification produced the highest overall accuracies and kappa coefficients, and the smallest over-classification and under-classification geometric errors. We conclude first that integrating BPT with energy minimization offers an effective means for image segmentation. Second, we conclude that the directional relationship between building and shadow objects represented by a fuzzy landscape is important for building extraction.
On the detection of pornographic digital images
NASA Astrophysics Data System (ADS)
Schettini, Raimondo; Brambilla, Carla; Cusano, Claudio; Ciocca, Gianluigi
2003-06-01
The paper addresses the problem of distinguishing between pornographic and non-pornographic photographs, for the design of semantic filters for the web. Both, decision forests of trees built according to CART (Classification And Regression Trees) methodology and Support Vectors Machines (SVM), have been used to perform the classification. The photographs are described by a set of low-level features, features that can be automatically computed simply on gray-level and color representation of the image. The database used in our experiments contained 1500 photographs, 750 of which labeled as pornographic on the basis of the independent judgement of several viewers.
Ensemble Pruning for Glaucoma Detection in an Unbalanced Data Set.
Adler, Werner; Gefeller, Olaf; Gul, Asma; Horn, Folkert K; Khan, Zardad; Lausen, Berthold
2016-12-07
Random forests are successful classifier ensemble methods consisting of typically 100 to 1000 classification trees. Ensemble pruning techniques reduce the computational cost, especially the memory demand, of random forests by reducing the number of trees without relevant loss of performance or even with increased performance of the sub-ensemble. The application to the problem of an early detection of glaucoma, a severe eye disease with low prevalence, based on topographical measurements of the eye background faces specific challenges. We examine the performance of ensemble pruning strategies for glaucoma detection in an unbalanced data situation. The data set consists of 102 topographical features of the eye background of 254 healthy controls and 55 glaucoma patients. We compare the area under the receiver operating characteristic curve (AUC), and the Brier score on the total data set, in the majority class, and in the minority class of pruned random forest ensembles obtained with strategies based on the prediction accuracy of greedily grown sub-ensembles, the uncertainty weighted accuracy, and the similarity between single trees. To validate the findings and to examine the influence of the prevalence of glaucoma in the data set, we additionally perform a simulation study with lower prevalences of glaucoma. In glaucoma classification all three pruning strategies lead to improved AUC and smaller Brier scores on the total data set with sub-ensembles as small as 30 to 80 trees compared to the classification results obtained with the full ensemble consisting of 1000 trees. In the simulation study, we were able to show that the prevalence of glaucoma is a critical factor and lower prevalence decreases the performance of our pruning strategies. The memory demand for glaucoma classification in an unbalanced data situation based on random forests could effectively be reduced by the application of pruning strategies without loss of performance in a population with increased risk of glaucoma.
Yule, Daniel L.; Adams, Jean V.; Hrabik, Thomas R.; Vinson, Mark R.; Woiak, Zebadiah; Ahrenstroff, Tyler D.
2013-01-01
Acoustic methods are used to estimate the density of pelagic fish in large lakes with results of midwater trawling used to assign species composition. Apportionment in lakes having mixed species can be challenging because only a small fraction of the water sampled acoustically is sampled with trawl gear. Here we describe a new method where single echo detections (SEDs) are assigned to species based on classification tree models developed from catch data that separate species based on fish size and the spatial habitats they occupy. During the summer of 2011, we conducted a spatially-balanced lake-wide acoustic and midwater trawl survey of Lake Superior. A total of 51 sites in four bathymetric depth strata (0–30 m, 30–100 m, 100–200 m, and >200 m) were sampled. We developed classification tree models for each stratum and found fish length was the most important variable for separating species. To apply these trees to the acoustic data, we needed to identify a target strength to length (TS-to-L) relationship appropriate for all abundant Lake Superior pelagic species. We tested performance of 7 general (i.e., multi-species) relationships derived from three published studies. The best-performing relationship was identified by comparing predicted and observed catch compositions using a second independent Lake Superior data set. Once identified, the relationship was used to predict lengths of SEDs from the lake-wide survey, and the classification tree models were used to assign each SED to a species. Exotic rainbow smelt (Osmerus mordax) were the most common species at bathymetric depths 100 m (384 million; 6.0 kt). Cisco (Coregonus artedi) were widely distributed over all strata with their population estimated at 182 million (44 kt). The apportionment method we describe should be transferable to other large lakes provided fish are not tightly aggregated, and an appropriate TS-to-L relationship for abundant pelagic fish species can be determined.
Perceived Organizational Support for Enhancing Welfare at Work: A Regression Tree Model
Giorgi, Gabriele; Dubin, David; Perez, Javier Fiz
2016-01-01
When trying to examine outcomes such as welfare and well-being, research tends to focus on main effects and take into account limited numbers of variables at a time. There are a number of techniques that may help address this problem. For example, many statistical packages available in R provide easy-to-use methods of modeling complicated analysis such as classification and tree regression (i.e., recursive partitioning). The present research illustrates the value of recursive partitioning in the prediction of perceived organizational support in a sample of more than 6000 Italian bankers. Utilizing the tree function party package in R, we estimated a regression tree model predicting perceived organizational support from a multitude of job characteristics including job demand, lack of job control, lack of supervisor support, training, etc. The resulting model appears particularly helpful in pointing out several interactions in the prediction of perceived organizational support. In particular, training is the dominant factor. Another dimension that seems to influence organizational support is reporting (perceived communication about safety and stress concerns). Results are discussed from a theoretical and methodological point of view. PMID:28082924
NASA Astrophysics Data System (ADS)
Wu, Jie; Besnehard, Quentin; Marchessoux, Cédric
2011-03-01
Clinical studies for the validation of new medical imaging devices require hundreds of images. An important step in creating and tuning the study protocol is the classification of images into "difficult" and "easy" cases. This consists of classifying the image based on features like the complexity of the background, the visibility of the disease (lesions). Therefore, an automatic medical background classification tool for mammograms would help for such clinical studies. This classification tool is based on a multi-content analysis framework (MCA) which was firstly developed to recognize image content of computer screen shots. With the implementation of new texture features and a defined breast density scale, the MCA framework is able to automatically classify digital mammograms with a satisfying accuracy. BI-RADS (Breast Imaging Reporting Data System) density scale is used for grouping the mammograms, which standardizes the mammography reporting terminology and assessment and recommendation categories. Selected features are input into a decision tree classification scheme in MCA framework, which is the so called "weak classifier" (any classifier with a global error rate below 50%). With the AdaBoost iteration algorithm, these "weak classifiers" are combined into a "strong classifier" (a classifier with a low global error rate) for classifying one category. The results of classification for one "strong classifier" show the good accuracy with the high true positive rates. For the four categories the results are: TP=90.38%, TN=67.88%, FP=32.12% and FN =9.62%.
From Curves to Trees: A Tree-like Shapes Distance Using the Elastic Shape Analysis Framework.
Mottini, A; Descombes, X; Besse, F
2015-04-01
Trees are a special type of graph that can be found in various disciplines. In the field of biomedical imaging, trees have been widely studied as they can be used to describe structures such as neurons, blood vessels and lung airways. It has been shown that the morphological characteristics of these structures can provide information on their function aiding the characterization of pathological states. Therefore, it is important to develop methods that analyze their shape and quantify differences between their structures. In this paper, we present a method for the comparison of tree-like shapes that takes into account both topological and geometrical information. This method, which is based on the Elastic Shape Analysis Framework, also computes the mean shape of a population of trees. As a first application, we have considered the comparison of axon morphology. The performance of our method has been evaluated on two sets of images. For the first set of images, we considered four different populations of neurons from different animals and brain sections from the NeuroMorpho.org open database. The second set was composed of a database of 3D confocal microscopy images of three populations of axonal trees (normal and two types of mutations) of the same type of neurons. We have calculated the inter and intra class distances between the populations and embedded the distance in a classification scheme. We have compared the performance of our method against three other state of the art algorithms, and results showed that the proposed method better distinguishes between the populations. Furthermore, we present the mean shape of each population. These shapes present a more complete picture of the morphological characteristics of each population, compared to the average value of certain predefined features.
Mitochondrial DNA haplogroup phylogeny of the dog: Proposal for a cladistic nomenclature.
Fregel, Rosa; Suárez, Nicolás M; Betancor, Eva; González, Ana M; Cabrera, Vicente M; Pestano, José
2015-05-01
Canis lupus familiaris mitochondrial DNA analysis has increased in recent years, not only for the purpose of deciphering dog domestication but also for forensic genetic studies or breed characterization. The resultant accumulation of data has increased the need for a normalized and phylogenetic-based nomenclature like those provided for human maternal lineages. Although a standardized classification has been proposed, haplotype names within clades have been assigned gradually without considering the evolutionary history of dog mtDNA. Moreover, this classification is based only on the D-loop region, proven to be insufficient for phylogenetic purposes due to its high number of recurrent mutations and the lack of relevant information present in the coding region. In this study, we design 1) a refined mtDNA cladistic nomenclature from a phylogenetic tree based on complete sequences, classifying dog maternal lineages into haplogroups defined by specific diagnostic mutations, and 2) a coding region SNP analysis that allows a more accurate classification into haplogroups when combined with D-loop sequencing, thus improving the phylogenetic information obtained in dog mitochondrial DNA studies. Copyright © 2015 Elsevier B.V. All rights reserved.
Research on Classification of Chinese Text Data Based on SVM
NASA Astrophysics Data System (ADS)
Lin, Yuan; Yu, Hongzhi; Wan, Fucheng; Xu, Tao
2017-09-01
Data Mining has important application value in today’s industry and academia. Text classification is a very important technology in data mining. At present, there are many mature algorithms for text classification. KNN, NB, AB, SVM, decision tree and other classification methods all show good classification performance. Support Vector Machine’ (SVM) classification method is a good classifier in machine learning research. This paper will study the classification effect based on the SVM method in the Chinese text data, and use the support vector machine method in the chinese text to achieve the classify chinese text, and to able to combination of academia and practical application.
NASA Astrophysics Data System (ADS)
Burkholder, Aaron
This project investigated the spectral separability of the invasive species Ailanthus altissima, commonly called tree of heaven, and four other native species. Leaves were collected from Ailanthus and four native tree species from May 13 through August 24, 2008, and spectral reflectance factor measurements were gathered for each tree using an ASD (Boulder, Colorado) FieldSpec Pro full-range spectroradiometer. The original data covered the range from 350-2500 nm, with one reflectance measurement collected per one nm wavelength. To reduce dimensionality, the measurements were resampled to the actual resolution of the spectrometer's sensors, and regions of atmospheric absorption were removed. Continuum removal was performed on the reflectance data, resulting in a second dataset. For both the reflectance and continuum removed datasets, least angle regression (LARS) and random forest classification were used to identify a single set of optimal wavelengths across all sampled dates, a set of optimal wavelengths for each date, and the dates for which Ailanthus is most separable from other species. It was found that classification accuracy varies both with dates and bands used. Contrary to expectations that early spring would provide the best separability, the lowest classification error was observed on July 22 for the reflectance data, and on May 13, July 11 and August 1 for the continuum removed data. This suggests that July and August are also potentially good months for species differentiation. Applying continuum removal in many cases reduced classification error, although not consistently. Band selection seems to be more important for reflectance data in that it results in greater improvement in classification accuracy, and LARS appears to be an effective band selection tool. The optimal spectral bands were selected from across the spectrum, often with bands from the blue (401-431 nm), NIR (1115 nm) and SWIR (1985-1995 nm), suggesting that hyperspectral sensors with broad wavelength sensitivity are important for mapping and identification of Ailanthus.
Stratification of the severity of critically ill patients with classification trees
2009-01-01
Background Development of three classification trees (CT) based on the CART (Classification and Regression Trees), CHAID (Chi-Square Automatic Interaction Detection) and C4.5 methodologies for the calculation of probability of hospital mortality; the comparison of the results with the APACHE II, SAPS II and MPM II-24 scores, and with a model based on multiple logistic regression (LR). Methods Retrospective study of 2864 patients. Random partition (70:30) into a Development Set (DS) n = 1808 and Validation Set (VS) n = 808. Their properties of discrimination are compared with the ROC curve (AUC CI 95%), Percent of correct classification (PCC CI 95%); and the calibration with the Calibration Curve and the Standardized Mortality Ratio (SMR CI 95%). Results CTs are produced with a different selection of variables and decision rules: CART (5 variables and 8 decision rules), CHAID (7 variables and 15 rules) and C4.5 (6 variables and 10 rules). The common variables were: inotropic therapy, Glasgow, age, (A-a)O2 gradient and antecedent of chronic illness. In VS: all the models achieved acceptable discrimination with AUC above 0.7. CT: CART (0.75(0.71-0.81)), CHAID (0.76(0.72-0.79)) and C4.5 (0.76(0.73-0.80)). PCC: CART (72(69-75)), CHAID (72(69-75)) and C4.5 (76(73-79)). Calibration (SMR) better in the CT: CART (1.04(0.95-1.31)), CHAID (1.06(0.97-1.15) and C4.5 (1.08(0.98-1.16)). Conclusion With different methodologies of CTs, trees are generated with different selection of variables and decision rules. The CTs are easy to interpret, and they stratify the risk of hospital mortality. The CTs should be taken into account for the classification of the prognosis of critically ill patients. PMID:20003229
Moon, Mikyung; Lee, Soo-Kyoung
2017-01-01
The purpose of this study was to use decision tree analysis to explore the factors associated with pressure ulcers (PUs) among elderly people admitted to Korean long-term care facilities. The data were extracted from the 2014 National Inpatient Sample (NIS)-data of Health Insurance Review and Assessment Service (HIRA). A MapReduce-based program was implemented to join and filter 5 tables of the NIS. The outcome predicted by the decision tree model was the prevalence of PUs as defined by the Korean Standard Classification of Disease-7 (KCD-7; code L89 * ). Using R 3.3.1, a decision tree was generated with the finalized 15,856 cases and 830 variables. The decision tree displayed 15 subgroups with 8 variables showing 0.804 accuracy, 0.820 sensitivity, and 0.787 specificity. The most significant primary predictor of PUs was length of stay less than 0.5 day. Other predictors were the presence of an infectious wound dressing, followed by having diagnoses numbering less than 3.5 and the presence of a simple dressing. Among diagnoses, "injuries to the hip and thigh" was the top predictor ranking 5th overall. Total hospital cost exceeding 2,200,000 Korean won (US $2,000) rounded out the top 7. These results support previous studies that showed length of stay, comorbidity, and total hospital cost were associated with PUs. Moreover, wound dressings were commonly used to treat PUs. They also show that machine learning, such as a decision tree, could effectively predict PUs using big data.
From Google Maps to a fine-grained catalog of street trees
NASA Astrophysics Data System (ADS)
Branson, Steve; Wegner, Jan Dirk; Hall, David; Lang, Nico; Schindler, Konrad; Perona, Pietro
2018-01-01
Up-to-date catalogs of the urban tree population are of importance for municipalities to monitor and improve quality of life in cities. Despite much research on automation of tree mapping, mainly relying on dedicated airborne LiDAR or hyperspectral campaigns, tree detection and species recognition is still mostly done manually in practice. We present a fully automated tree detection and species recognition pipeline that can process thousands of trees within a few hours using publicly available aerial and street view images of Google MapsTM. These data provide rich information from different viewpoints and at different scales from global tree shapes to bark textures. Our work-flow is built around a supervised classification that automatically learns the most discriminative features from thousands of trees and corresponding, publicly available tree inventory data. In addition, we introduce a change tracker that recognizes changes of individual trees at city-scale, which is essential to keep an urban tree inventory up-to-date. The system takes street-level images of the same tree location at two different times and classifies the type of change (e.g., tree has been removed). Drawing on recent advances in computer vision and machine learning, we apply convolutional neural networks (CNN) for all classification tasks. We propose the following pipeline: download all available panoramas and overhead images of an area of interest, detect trees per image and combine multi-view detections in a probabilistic framework, adding prior knowledge; recognize fine-grained species of detected trees. In a later, separate module, track trees over time, detect significant changes and classify the type of change. We believe this is the first work to exploit publicly available image data for city-scale street tree detection, species recognition and change tracking, exhaustively over several square kilometers, respectively many thousands of trees. Experiments in the city of Pasadena, California, USA show that we can detect >70% of the street trees, assign correct species to >80% for 40 different species, and correctly detect and classify changes in >90% of the cases.
Multi-test decision tree and its application to microarray data classification.
Czajkowski, Marcin; Grześ, Marek; Kretowski, Marek
2014-05-01
The desirable property of tools used to investigate biological data is easy to understand models and predictive decisions. Decision trees are particularly promising in this regard due to their comprehensible nature that resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees have tendency to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity. We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions. Experimental validation was performed on several real-life gene expression datasets. Comparison results with eight classifiers show that MTDT has a statistically significantly higher accuracy than popular decision tree classifiers, and it was highly competitive with ensemble learning algorithms. The proposed solution managed to outperform its baseline algorithm on 14 datasets by an average 6%. A study performed on one of the datasets showed that the discovered genes used in the MTDT classification model are supported by biological evidence in the literature. This paper introduces a new type of decision tree which is more suitable for solving biological problems. MTDTs are relatively easy to analyze and much more powerful in modeling high dimensional microarray data than their popular counterparts. Copyright © 2014 Elsevier B.V. All rights reserved.
Eric Rowell; Carl Selelstad; Lee Vierling; Lloyd Queen; Wayne Sheppard
2006-01-01
The success of a local maximum (LM) tree detection algorithm for detecting individual trees from lidar data depends on stand conditions that are often highly variable. A laser height variance and percent canopy cover (PCC) classification is used to segment the landscape by stand condition prior to stem detection. We test the performance of the LM algorithm using canopy...
Khalkhali, Hamid Reza; Lotfnezhad Afshar, Hadi; Esnaashari, Omid; Jabbari, Nasrollah
2016-01-01
Breast cancer survival has been analyzed by many standard data mining algorithms. A group of these algorithms belonged to the decision tree category. Ability of the decision tree algorithms in terms of visualizing and formulating of hidden patterns among study variables were main reasons to apply an algorithm from the decision tree category in the current study that has not studied already. The classification and regression trees (CART) was applied to a breast cancer database contained information on 569 patients in 2007-2010. The measurement of Gini impurity used for categorical target variables was utilized. The classification error that is a function of tree size was measured by 10-fold cross-validation experiments. The performance of created model was evaluated by the criteria as accuracy, sensitivity and specificity. The CART model produced a decision tree with 17 nodes, 9 of which were associated with a set of rules. The rules were meaningful clinically. They showed in the if-then format that Stage was the most important variable for predicting breast cancer survival. The scores of accuracy, sensitivity and specificity were: 80.3%, 93.5% and 53%, respectively. The current study model as the first one created by the CART was able to extract useful hidden rules from a relatively small size dataset.
A hybrid approach to select features and classify diseases based on medical data
NASA Astrophysics Data System (ADS)
AbdelLatif, Hisham; Luo, Jiawei
2018-03-01
Feature selection is popular problem in the classification of diseases in clinical medicine. Here, we developing a hybrid methodology to classify diseases, based on three medical datasets, Arrhythmia, Breast cancer, and Hepatitis datasets. This methodology called k-means ANOVA Support Vector Machine (K-ANOVA-SVM) uses K-means cluster with ANOVA statistical to preprocessing data and selection the significant features, and Support Vector Machines in the classification process. To compare and evaluate the performance, we choice three classification algorithms, decision tree Naïve Bayes, Support Vector Machines and applied the medical datasets direct to these algorithms. Our methodology was a much better classification accuracy is given of 98% in Arrhythmia datasets, 92% in Breast cancer datasets and 88% in Hepatitis datasets, Compare to use the medical data directly with decision tree Naïve Bayes, and Support Vector Machines. Also, the ROC curve and precision with (K-ANOVA-SVM) Achieved best results than other algorithms
NASA Astrophysics Data System (ADS)
Ganguly, S.; Basu, S.; Mukhopadhyay, S.; Michaelis, A.; Milesi, C.; Votava, P.; Nemani, R. R.
2013-12-01
An unresolved issue with coarse-to-medium resolution satellite-based forest carbon mapping over regional to continental scales is the high level of uncertainty in above ground biomass (AGB) estimates caused by the absence of forest cover information at a high enough spatial resolution (current spatial resolution is limited to 30-m). To put confidence in existing satellite-derived AGB density estimates, it is imperative to create continuous fields of tree cover at a sufficiently high resolution (e.g. 1-m) such that large uncertainties in forested area are reduced. The proposed work will provide means to reduce uncertainty in present satellite-derived AGB maps and Forest Inventory and Analysis (FIA) based regional estimates. Our primary objective will be to create Very High Resolution (VHR) estimates of tree cover at a spatial resolution of 1-m for the Continental United States using all available National Agriculture Imaging Program (NAIP) color-infrared imagery from 2010 till 2012. We will leverage the existing capabilities of the NASA Earth Exchange (NEX) high performance computing and storage facilities. The proposed 1-m tree cover map can be further aggregated to provide percent tree cover at any medium-to-coarse resolution spatial grid, which will aid in reducing uncertainties in AGB density estimation at the respective grid and overcome current limitations imposed by medium-to-coarse resolution land cover maps. We have implemented a scalable and computationally-efficient parallelized framework for tree-cover delineation - the core components of the algorithm [that] include a feature extraction process, a Statistical Region Merging image segmentation algorithm and a classification algorithm based on Deep Belief Network and a Feedforward Backpropagation Neural Network algorithm. An initial pilot exercise has been performed over the state of California (~11,000 scenes) to create a wall-to-wall 1-m tree cover map and the classification accuracy has been assessed. Results show an improvement in accuracy of tree-cover delineation as compared to existing forest cover maps from NLCD, especially over fragmented, heterogeneous and urban landscapes. Estimates of VHR tree cover will complement and enhance the accuracy of present remote-sensing based AGB modeling approaches and forest inventory based estimates at both national and local scales. A requisite step will be to characterize the inherent uncertainties in tree cover estimates and propagate them to estimate AGB.
Automatic Classification of Trees from Laser Scanning Point Clouds
NASA Astrophysics Data System (ADS)
Sirmacek, B.; Lindenbergh, R.
2015-08-01
Development of laser scanning technologies has promoted tree monitoring studies to a new level, as the laser scanning point clouds enable accurate 3D measurements in a fast and environmental friendly manner. In this paper, we introduce a probability matrix computation based algorithm for automatically classifying laser scanning point clouds into 'tree' and 'non-tree' classes. Our method uses the 3D coordinates of the laser scanning points as input and generates a new point cloud which holds a label for each point indicating if it belongs to the 'tree' or 'non-tree' class. To do so, a grid surface is assigned to the lowest height level of the point cloud. The grids are filled with probability values which are calculated by checking the point density above the grid. Since the tree trunk locations appear with very high values in the probability matrix, selecting the local maxima of the grid surface help to detect the tree trunks. Further points are assigned to tree trunks if they appear in the close proximity of trunks. Since heavy mathematical computations (such as point cloud organization, detailed shape 3D detection methods, graph network generation) are not required, the proposed algorithm works very fast compared to the existing methods. The tree classification results are found reliable even on point clouds of cities containing many different objects. As the most significant weakness, false detection of light poles, traffic signs and other objects close to trees cannot be prevented. Nevertheless, the experimental results on mobile and airborne laser scanning point clouds indicate the possible usage of the algorithm as an important step for tree growth observation, tree counting and similar applications. While the laser scanning point cloud is giving opportunity to classify even very small trees, accuracy of the results is reduced in the low point density areas further away than the scanning location. These advantages and disadvantages of two laser scanning point cloud sources are discussed in detail.
Julián-Ortiz, Jesus V de; Gozalbes, Rafael; Besalú, Emili
2016-01-01
The search for new drug candidates in databases is of paramount importance in pharmaceutical chemistry. The selection of molecular subsets is greatly optimized and much more promising when potential drug-like molecules are detected a priori. In this work, about one hundred thousand molecules are ranked following a new methodology: a drug/non-drug classifier constructed by a consensual set of classification trees. The classification trees arise from the stochastic generation of training sets, which in turn are used to estimate probability factors of test molecules to be drug-like compounds. Molecules were represented by Topological Quantum Similarity Indices and their Graph Theoretical counterparts. The contribution of the present paper consists of presenting an effective ranking method able to improve the probability of finding drug-like substances by using these types of molecular descriptors.
Barbosa, Rommel Melgaço; Nacano, Letícia Ramos; Freitas, Rodolfo; Batista, Bruno Lemos; Barbosa, Fernando
2014-09-01
This article aims to evaluate 2 machine learning algorithms, decision trees and naïve Bayes (NB), for egg classification (free-range eggs compared with battery eggs). The database used for the study consisted of 15 chemical elements (As, Ba, Cd, Co, Cs, Cu, Fe, Mg, Mn, Mo, Pb, Se, Sr, V, and Zn) determined in 52 eggs samples (20 free-range and 32 battery eggs) by inductively coupled plasma mass spectrometry. Our results demonstrated that decision trees and NB associated with the mineral contents of eggs provide a high level of accuracy (above 80% and 90%, respectively) for classification between free-range and battery eggs and can be used as an alternative method for adulteration evaluation. © 2014 Institute of Food Technologists®
A Climatic Classification for Citrus Winter Survival in China.
NASA Astrophysics Data System (ADS)
Shou, Bo Huang
1991-05-01
The citrus tree is susceptible to frost damage. Winter injury to citrus from freezing weather is the major meteorological problem in the northern pail of citrus growing regions in China. Based on meteorological data collected at 120 stations in southern China and on the extent of citrus freezing injury, five climatic regions for citrus winter survival in China were developed. They were: 1) no citrus tree injury. 2) light injury to mandarins (citrus reticulate) or moderate injury to oranges (citrus sinensis), 3) moderate injury to mandarins or heavy injury to oranges, 4) heavy injury to mandarins, and 5) impossible citrus tree growth. This citrus climatic classification was an attempt to provide guidelines for regulation of citrus production, to effectively utilize land and climatic resources, to chose suitable citrus varieties, and to develop methods to prevent injury by freezing.
deGraffenried, Jeff B; Shepherd, Keith D
2009-12-15
Human induced soil erosion has severe economic and environmental impacts throughout the world. It is more severe in the tropics than elsewhere and results in diminished food production and security. Kenya has limited arable land and 30 percent of the country experiences severe to very severe human induced soil degradation. The purpose of this research was to test visible near infrared diffuse reflectance spectroscopy (VNIR) as a tool for rapid assessment and benchmarking of soil condition and erosion severity class. The study was conducted in the Saiwa River watershed in the northern Rift Valley Province of western Kenya, a tropical highland area. Soil 137 Cs concentration was measured to validate spectrally derived erosion classes and establish the background levels for difference land use types. Results indicate VNIR could be used to accurately evaluate a large and diverse soil data set and predict soil erosion characteristics. Soil condition was spectrally assessed and modeled. Analysis of mean raw spectra indicated significant reflectance differences between soil erosion classes. The largest differences occurred between 1,350 and 1,950 nm with the largest separation occurring at 1,920 nm. Classification and Regression Tree (CART) analysis indicated that the spectral model had practical predictive success (72%) with Receiver Operating Characteristic (ROC) of 0.74. The change in 137 Cs concentrations supported the premise that VNIR is an effective tool for rapid screening of soil erosion condition.
Zhao, Dehua; Jiang, Hao; Yang, Tangwu; Cai, Ying; Xu, Delin; An, Shuqing
2012-03-01
Classification trees (CT) have been used successfully in the past to classify aquatic vegetation from spectral indices (SI) obtained from remotely-sensed images. However, applying CT models developed for certain image dates to other time periods within the same year or among different years can reduce the classification accuracy. In this study, we developed CT models with modified thresholds using extreme SI values (CT(m)) to improve the stability of the models when applying them to different time periods. A total of 903 ground-truth samples were obtained in September of 2009 and 2010 and classified as emergent, floating-leaf, or submerged vegetation or other cover types. Classification trees were developed for 2009 (Model-09) and 2010 (Model-10) using field samples and a combination of two images from winter and summer. Overall accuracies of these models were 92.8% and 94.9%, respectively, which confirmed the ability of CT analysis to map aquatic vegetation in Taihu Lake. However, Model-10 had only 58.9-71.6% classification accuracy and 31.1-58.3% agreement (i.e., pixels classified the same in the two maps) for aquatic vegetation when it was applied to image pairs from both a different time period in 2010 and a similar time period in 2009. We developed a method to estimate the effects of extrinsic (EF) and intrinsic (IF) factors on model uncertainty using Modis images. Results indicated that 71.1% of the instability in classification between time periods was due to EF, which might include changes in atmospheric conditions, sun-view angle and water quality. The remainder was due to IF, such as phenological and growth status differences between time periods. The modified version of Model-10 (i.e. CT(m)) performed better than traditional CT with different image dates. When applied to 2009 images, the CT(m) version of Model-10 had very similar thresholds and performance as Model-09, with overall accuracies of 92.8% and 90.5% for Model-09 and the CT(m) version of Model-10, respectively. CT(m) decreased the variability related to EF and IF and thereby improved the applicability of the models to different time periods. In both practice and theory, our results suggested that CT(m) was more stable than traditional CT models and could be used to map aquatic vegetation in time periods other than the one for which the model was developed. Copyright © 2011 Elsevier Ltd. All rights reserved.
Doyle, Suzanne R.; Donovan, Dennis M.
2014-01-01
Aims The purpose of this study was to explore the selection of predictor variables in the evaluation of drug treatment completion using an ensemble approach with classification trees. The basic methodology is reviewed and the subagging procedure of random subsampling is applied. Methods Among 234 individuals with stimulant use disorders randomized to a 12-Step facilitative intervention shown to increase stimulant use abstinence, 67.52% were classified as treatment completers. A total of 122 baseline variables were used to identify factors associated with completion. Findings The number of types of self-help activity involvement prior to treatment was the predominant predictor. Other effective predictors included better coping self-efficacy for substance use in high-risk situations, more days of prior meeting attendance, greater acceptance of the Disease model, higher confidence for not resuming use following discharge, lower ASI Drug and Alcohol composite scores, negative urine screens for cocaine or marijuana, and fewer employment problems. Conclusions The application of an ensemble subsampling regression tree method utilizes the fact that classification trees are unstable but, on average, produce an improved prediction of the completion of drug abuse treatment. The results support the notion there are early indicators of treatment completion that may allow for modification of approaches more tailored to fitting the needs of individuals and potentially provide more successful treatment engagement and improved outcomes. PMID:25134038
A cladistic analysis of Aristotle's animal groups in the Historia animalium.
von Lieven, Alexander Fürst; Humar, Marcel
2008-01-01
The Historia animalium (HA) of Aristotle contains an extraordinarily rich compilation of descriptions of animal anatomy, development, and behaviour. It is believed that Aristotle's aim in HA was to describe the correlations of characters rather than to classify or define animal groups. In order to assess if Aristotle, while organising his character correlations, referred to a pre-existing classification that underlies the descriptions in HA, we carried out a cladistic analysis according to the following procedure: by disentangeling 147 species and 40 higher taxa-designations from 157 predicates in the texts, we transcribed Aristotle's descriptions on anatomy and development of animals in books I-V of HA into a character matrix for a cladistic analysis. By analysing the distribution of characters as described in his books, we obtained a non-phylogenetic dendrogram displaying 58 monophyletic groups, 29 of which have equivalents among Aristotle's group designations. Eleven Aristotelian groupings turned out to be non-monophyletic, and six of them are inconsistent with the monophyletic groups. Twelve of 29 taxa without equivalents in Aristotle's works have equivalents in modern classifications. With this analysis we demonstate there exists a fairly consistent underlying classification in the zoological works of Aristotle. The peculiarities of Aristotle's character basis are discussed and the dendrogram is compared with a current phylogenetic tree.
Weakly Supervised Dictionary Learning
NASA Astrophysics Data System (ADS)
You, Zeyu; Raich, Raviv; Fern, Xiaoli Z.; Kim, Jinsub
2018-05-01
We present a probabilistic modeling and inference framework for discriminative analysis dictionary learning under a weak supervision setting. Dictionary learning approaches have been widely used for tasks such as low-level signal denoising and restoration as well as high-level classification tasks, which can be applied to audio and image analysis. Synthesis dictionary learning aims at jointly learning a dictionary and corresponding sparse coefficients to provide accurate data representation. This approach is useful for denoising and signal restoration, but may lead to sub-optimal classification performance. By contrast, analysis dictionary learning provides a transform that maps data to a sparse discriminative representation suitable for classification. We consider the problem of analysis dictionary learning for time-series data under a weak supervision setting in which signals are assigned with a global label instead of an instantaneous label signal. We propose a discriminative probabilistic model that incorporates both label information and sparsity constraints on the underlying latent instantaneous label signal using cardinality control. We present the expectation maximization (EM) procedure for maximum likelihood estimation (MLE) of the proposed model. To facilitate a computationally efficient E-step, we propose both a chain and a novel tree graph reformulation of the graphical model. The performance of the proposed model is demonstrated on both synthetic and real-world data.
Corkeron, Peter; Rolland, Rosalind M; Hunt, Kathleen E; Kraus, Scott D
2017-01-01
Immunoassay of hormone metabolites extracted from faecal samples of free-ranging large whales can provide biologically relevant information on reproductive state and stress responses. North Atlantic right whales ( Eubalaena glacialis Müller 1776) are an ideal model for testing the conservation value of faecal metabolites. Almost all North Atlantic right whales are individually identified, most of the population is sighted each year, and systematic survey effort extends back to 1986. North Atlantic right whales number <500 individuals and are subject to anthropogenic mortality, morbidity and other stressors, and scientific data to inform conservation planning are recognized as important. Here, we describe the use of classification trees as an alternative method of analysing multiple-hormone data sets, building on univariate models that have previously been used to describe hormone profiles of individual North Atlantic right whales of known reproductive state. Our tree correctly classified the age class, sex and reproductive state of 83% of 112 faecal samples from known individual whales. Pregnant females, lactating females and both mature and immature males were classified reliably using our model. Non-reproductive [i.e. 'resting' (not pregnant and not lactating) and immature] females proved the most unreliable to distinguish. There were three individual males that, given their age, would traditionally be considered immature but that our tree classed as mature males, possibly calling for a re-evaluation of their reproductive status. Our analysis reiterates the importance of considering the reproductive state of whales when assessing the relationship between cortisol concentrations and stress. Overall, these results confirm findings from previous univariate statistical analyses, but with a more robust multivariate approach that may prove useful for the multiple-analyte data sets that are increasingly used by conservation physiologists.
Computer-aided diagnosis of breast microcalcifications based on dual-tree complex wavelet transform.
Jian, Wushuai; Sun, Xueyan; Luo, Shuqian
2012-12-19
Digital mammography is the most reliable imaging modality for breast carcinoma diagnosis and breast micro-calcifications is regarded as one of the most important signs on imaging diagnosis. In this paper, a computer-aided diagnosis (CAD) system is presented for breast micro-calcifications based on dual-tree complex wavelet transform (DT-CWT) to facilitate radiologists like double reading. Firstly, 25 abnormal ROIs were extracted according to the center and diameter of the lesions manually and 25 normal ROIs were selected randomly. Then micro-calcifications were segmented by combining space and frequency domain techniques. We extracted three texture features based on wavelet (Haar, DB4, DT-CWT) transform. Totally 14 descriptors were introduced to define the characteristics of the suspicious micro-calcifications. Principal Component Analysis (PCA) was used to transform these descriptors to a compact and efficient vector expression. Support Vector Machine (SVM) classifier was used to classify potential micro-calcifications. Finally, we used the receiver operating characteristic (ROC) curve and free-response operating characteristic (FROC) curve to evaluate the performance of the CAD system. The results of SVM classifications based on different wavelets shows DT-CWT has a better performance. Compared with other results, DT-CWT method achieved an accuracy of 96% and 100% for the classification of normal and abnormal ROIs, and the classification of benign and malignant micro-calcifications respectively. In FROC analysis, our CAD system for clinical dataset detection achieved a sensitivity of 83.5% at a false positive per image of 1.85. Compared with general wavelets, DT-CWT could describe the features more effectively, and our CAD system had a competitive performance.
Computer-aided diagnosis of breast microcalcifications based on dual-tree complex wavelet transform
2012-01-01
Background Digital mammography is the most reliable imaging modality for breast carcinoma diagnosis and breast micro-calcifications is regarded as one of the most important signs on imaging diagnosis. In this paper, a computer-aided diagnosis (CAD) system is presented for breast micro-calcifications based on dual-tree complex wavelet transform (DT-CWT) to facilitate radiologists like double reading. Methods Firstly, 25 abnormal ROIs were extracted according to the center and diameter of the lesions manually and 25 normal ROIs were selected randomly. Then micro-calcifications were segmented by combining space and frequency domain techniques. We extracted three texture features based on wavelet (Haar, DB4, DT-CWT) transform. Totally 14 descriptors were introduced to define the characteristics of the suspicious micro-calcifications. Principal Component Analysis (PCA) was used to transform these descriptors to a compact and efficient vector expression. Support Vector Machine (SVM) classifier was used to classify potential micro-calcifications. Finally, we used the receiver operating characteristic (ROC) curve and free-response operating characteristic (FROC) curve to evaluate the performance of the CAD system. Results The results of SVM classifications based on different wavelets shows DT-CWT has a better performance. Compared with other results, DT-CWT method achieved an accuracy of 96% and 100% for the classification of normal and abnormal ROIs, and the classification of benign and malignant micro-calcifications respectively. In FROC analysis, our CAD system for clinical dataset detection achieved a sensitivity of 83.5% at a false positive per image of 1.85. Conclusions Compared with general wavelets, DT-CWT could describe the features more effectively, and our CAD system had a competitive performance. PMID:23253202
Estimation of Trees Outside Forests using IRS High Resolution data by Object Based Image Analysis
NASA Astrophysics Data System (ADS)
Pujar, G. S.; Reddy, P. M.; Reddy, C. S.; Jha, C. S.; Dadhwal, V. K.
2014-11-01
Assessment of Trees outside forests (TOF) is widely being recognized as a pivotal theme, in sustainable natural resource management, due to their role in offering variety of goods, such as timber, fruits and fodder as well as services like water, carbon, biodiversity. Forest Conservation efforts involving reduction of deforestation and degradation may have to increasingly rely on alternatives provided by TOF in catering to economic demands in forest edges. Spatial information systems involving imaging, analysis and monitoring to achieve objectives under protocols like REDD+, require incorporation of information content from areas under forest as well as trees outside forests, to aid holistic decisions. In this perspective, automation in retrieving information on area under trees, growing outside forests, using high resolution imaging is essential so that measuring and verification of extant carbon pools, are strengthened. Retrieval of this tree cover is demonstrated herewith, using object based image analysis in a forest edge of dry deciduous forests of Eastern Ghats, in Khammam district of Telangana state of India. IRS high resolution panchromatic 2.5 m data (Cartosat-1 Orthorectified) used in tandem with 5.8 m multispectral LISS IV data, discerns tree crowns and clusters at a detailed scale and hence semi-automated approach is attempted to classify TOF from a pair of image from relatively crop and cloud free season. Object based image analysis(OBIA) approach as implemented in commercial suite of e-Cognition (Ver 8.9) consists of segmentation at user defined scale followed by application of wide range of spectral, textural and object geometry based parameters for classification. Software offers innovative blend of raster and vector features that can be juxtaposed flexibly, across scales horizontally or vertically. Segmentation was carried out at multiple scales to discern first the major land covers, such as forest, water, agriculture followed by that at a finer scale, within cultivated landscape. Latter scale aimed to segregate TOF in configurations such as individual or scattered crowns, linear formations and patch TOF. As per the adopted norms in India for defining tree cover, units up to 1 ha area were considered as candidate TOF. Classification of fine scale (at 10) segments was accomplished using size, shape and texture. A customised parameter involving ratio of area of segment to its main skeleton length discerned linear formations consistently. Texture of Cartosat-1 2.5 m data was also used segregate tree cover from smoother crop patches in patch TOF category. In view of the specificity of the landscape character, continuum of cultivated area (b) and pockets of cultivation within forest (c) as well as the entire study area (a) were considered as three envelopes for evaluating the accuracy of the method. Accuracies not less than 75.1 per cent were reported in all the envelopes with a kappa accuracy of not less than 0.58. Overall accuracy of entire study area was 75.9 per cent with Kappa of 0.59 followed by 75.1 per cent ( Kappa: 0.58 ) of agricultural landscape (b). In pockets of cultivation context(c) accuracy was higher at 79.2 per cent ( Kappa: 0.64 ) possibly due to smaller population. Assessment showed that 1,791 ha of 24,140 ha studied (7.42 %) was under tree cover as per the definitions adopted. Strength of accuracy demonstrated obviously points to the potential of IRS high resolution data combination in setting up procedures to monitor the TOF in Indian context using OBIA approach so as to cater to the evolving demands of resource assessment and monitoring.
Magagna, Federico; Guglielmetti, Alessandro; Liberto, Erica; Reichenbach, Stephen E; Allegrucci, Elena; Gobino, Guido; Bicchi, Carlo; Cordero, Chiara
2017-08-02
This study investigates chemical information of volatile fractions of high-quality cocoa (Theobroma cacao L. Malvaceae) from different origins (Mexico, Ecuador, Venezuela, Columbia, Java, Trinidad, and Sao Tomè) produced for fine chocolate. This study explores the evolution of the entire pattern of volatiles in relation to cocoa processing (raw, roasted, steamed, and ground beans). Advanced chemical fingerprinting (e.g., combined untargeted and targeted fingerprinting) with comprehensive two-dimensional gas chromatography coupled with mass spectrometry allows advanced pattern recognition for classification, discrimination, and sensory-quality characterization. The entire data set is analyzed for 595 reliable two-dimensional peak regions, including 130 known analytes and 13 potent odorants. Multivariate analysis with unsupervised exploration (principal component analysis) and simple supervised discrimination methods (Fisher ratios and linear regression trees) reveal informative patterns of similarities and differences and identify characteristic compounds related to sample origin and manufacturing step.
Validation of tool mark analysis of cut costal cartilage.
Love, Jennifer C; Derrick, Sharon M; Wiersema, Jason M; Peters, Charles
2012-03-01
This study was designed to establish the potential error rate associated with the generally accepted method of tool mark analysis of cut marks in costal cartilage. Three knives with different blade types were used to make experimental cut marks in costal cartilage of pigs. Each cut surface was cast, and each cast was examined by three analysts working independently. The presence of striations, regularity of striations, and presence of a primary and secondary striation pattern were recorded for each cast. The distance between each striation was measured. The results showed that striations were not consistently impressed on the cut surface by the blade's cutting edge. Also, blade type classification by the presence or absence of striations led to a 65% misclassification rate. Use of the classification tree and cross-validation methods and inclusion of the mean interstriation distance decreased the error rate to c. 50%. © 2011 American Academy of Forensic Sciences.
Observations and Modelling of Alternative Tree Cover States of the Boreal Ecosystem
NASA Astrophysics Data System (ADS)
Abis, B.; Brovkin, V.
2017-12-01
Recently, multimodality of the tree cover distribution of the boreal forests has been detected, revealing the existence of three alternative vegetation modes. Identifying which are the regions with a potential for alternative tree cover states, and assessing which are the main factors underlying their existence, is important to project future change of natural vegetation cover and its effect on climate.Through the use of generalised additive models and phase-space analysis, we study the link between tree cover distribution and eight globally-observed environmental factors, such as rainfall, temperature, and permafrost distribution. Using a classification based on these factors, we show the location of areas with potentially alternative tree cover states under the same environmental conditions in the boreal region. Furthermore, to explain the multimodality found in the data and the asymmetry between North America and Eurasia, we study a conceptual model based on tree species competition, and use it to simulate the sensitivity of tree cover to changes in environmental factors.We find that the link between individual environmental variables and tree cover differs regionally. Nonetheless, environmental conditions uniquely determine the vegetation state among the three dominant modes in ˜95% of the cases. On the other hand, areas with potentially alternative tree cover states encompass ˜1.1 million km2, and correspond to possible transition zones with a reduced resilience to disturbances. Employing our conceptual model, we show that multimodality can be explained through competition between tree species with different adaptations to environmental factors and disturbances. Moreover, the model is able to reproduce the asymmetry in tree species distribution between Eurasia and North America. Finally, we find that changes in permafrost could be associated with bifurcation points of the model, corroborating the importance of permafrost in a changing climate.
This research examined sub-pixel land-cover classification performance for tree canopy, impervious surface, and cropland in the Laurentian Great Lakes Basin (GLB) using both timeseries MODIS (MOderate Resolution Imaging Spectroradiometer) NDVI (Normalized Difference Vegetation In...
Landenburger, L.; Lawrence, R.L.; Podruzny, S.; Schwartz, C.C.
2008-01-01
Moderate resolution satellite imagery traditionally has been thought to be inadequate for mapping vegetation at the species level. This has made comprehensive mapping of regional distributions of sensitive species, such as whitebark pine, either impractical or extremely time consuming. We sought to determine whether using a combination of moderate resolution satellite imagery (Landsat Enhanced Thematic Mapper Plus), extensive stand data collected by land management agencies for other purposes, and modern statistical classification techniques (boosted classification trees) could result in successful mapping of whitebark pine. Overall classification accuracies exceeded 90%, with similar individual class accuracies. Accuracies on a localized basis varied based on elevation. Accuracies also varied among administrative units, although we were not able to determine whether these differences related to inherent spatial variations or differences in the quality of available reference data.
Joshuva, A; Sugumaran, V
2017-03-01
Wind energy is one of the important renewable energy resources available in nature. It is one of the major resources for production of energy because of its dependability due to the development of the technology and relatively low cost. Wind energy is converted into electrical energy using rotating blades. Due to environmental conditions and large structure, the blades are subjected to various vibration forces that may cause damage to the blades. This leads to a liability in energy production and turbine shutdown. The downtime can be reduced when the blades are diagnosed continuously using structural health condition monitoring. These are considered as a pattern recognition problem which consists of three phases namely, feature extraction, feature selection, and feature classification. In this study, statistical features were extracted from vibration signals, feature selection was carried out using a J48 decision tree algorithm and feature classification was performed using best-first tree algorithm and functional trees algorithm. The better algorithm is suggested for fault diagnosis of wind turbine blade. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Zhang, Zhifei; Song, Yang; Cui, Haochen; Wu, Jayne; Schwartz, Fernando; Qi, Hairong
2017-09-01
Bucking the trend of big data, in microdevice engineering, small sample size is common, especially when the device is still at the proof-of-concept stage. The small sample size, small interclass variation, and large intraclass variation, have brought biosignal analysis new challenges. Novel representation and classification approaches need to be developed to effectively recognize targets of interests with the absence of a large training set. Moving away from the traditional signal analysis in the spatiotemporal domain, we exploit the biosignal representation in the topological domain that would reveal the intrinsic structure of point clouds generated from the biosignal. Additionally, we propose a Gaussian-based decision tree (GDT), which can efficiently classify the biosignals even when the sample size is extremely small. This study is motivated by the application of mastitis detection using low-voltage alternating current electrokinetics (ACEK) where five categories of bisignals need to be recognized with only two samples in each class. Experimental results demonstrate the robustness of the topological features as well as the advantage of GDT over some conventional classifiers in handling small dataset. Our method reduces the voltage of ACEK to a safe level and still yields high-fidelity results with a short assay time. This paper makes two distinctive contributions to the field of biosignal analysis, including performing signal processing in the topological domain and handling extremely small dataset. Currently, there have been no related works that can efficiently tackle the dilemma between avoiding electrochemical reaction and accelerating assay process using ACEK.
Using Bayesian neural networks to classify forest scenes
NASA Astrophysics Data System (ADS)
Vehtari, Aki; Heikkonen, Jukka; Lampinen, Jouko; Juujarvi, Jouni
1998-10-01
We present results that compare the performance of Bayesian learning methods for neural networks on the task of classifying forest scenes into trees and background. Classification task is demanding due to the texture richness of the trees, occlusions of the forest scene objects and diverse lighting conditions under operation. This makes it difficult to determine which are optimal image features for the classification. A natural way to proceed is to extract many different types of potentially suitable features, and to evaluate their usefulness in later processing stages. One approach to cope with large number of features is to use Bayesian methods to control the model complexity. Bayesian learning uses a prior on model parameters, combines this with evidence from a training data, and the integrates over the resulting posterior to make predictions. With this method, we can use large networks and many features without fear of overfitting. For this classification task we compare two Bayesian learning methods for multi-layer perceptron (MLP) neural networks: (1) The evidence framework of MacKay uses a Gaussian approximation to the posterior weight distribution and maximizes with respect to hyperparameters. (2) In a Markov Chain Monte Carlo (MCMC) method due to Neal, the posterior distribution of the network parameters is numerically integrated using the MCMC method. As baseline classifiers for comparison we use (3) MLP early stop committee, (4) K-nearest-neighbor and (5) Classification And Regression Tree.
Regional Estimates of Drought-Induced Tree Canopy Loss across Texas
NASA Astrophysics Data System (ADS)
Schwantes, A.; Swenson, J. J.; González-Roglich, M.; Johnson, D. M.; Domec, J. C.; Jackson, R. B.
2015-12-01
The severe drought of 2011 killed millions of trees across the state of Texas. Drought-induced tree-mortality can have significant impacts to carbon cycling, regional biophysics, and community composition. We quantified canopy cover loss across the state using remotely sensed imagery from before and after the drought at multiple scales. First, we classified ~200 orthophotos (1-m spatial resolution) from the National Agriculture Imagery Program, using a supervised maximum likelihood classification. Area of canopy cover loss in these classifications was highly correlated (R2 = 0.8) with ground estimates of canopy cover loss, measured in 74 plots across 15 different sites in Texas. These 1-m orthophoto classifications were then used to calibrate and validate coarser scale (30-m) Landsat imagery to create wall-to-wall tree canopy cover loss maps across the state of Texas. We quantified percent dead and live canopy within each pixel of Landsat to create continuous maps of dead and live tree cover, using two approaches: (1) a zero-inflated beta distribution model and (2) a random forest algorithm. Widespread canopy loss occurred across all the major natural systems of Texas, with the Edwards Plateau region most affected. In this region, on average, 10% of the forested area was lost due to the 2011 drought. We also identified climatic thresholds that controlled the spatial distribution of tree canopy loss across the state. However, surprisingly, there were many local hot spots of canopy loss, suggesting that not only climatic factors could explain the spatial patterns of canopy loss, but rather other factors related to soil, landscape, management, and stand density also likely played a role. As increases in extreme droughts are predicted to occur with climate change, it will become important to define methods that can detect associated drought-induced tree mortality across large regions. These maps could then be used (1) to quantify impacts to carbon cycling and regional biophysics, (2) to better understand the spatiotemporal dynamics of tree mortality, and (3) to calibrate and/or validate mortality algorithms in regional models.
Classification of Liss IV Imagery Using Decision Tree Methods
NASA Astrophysics Data System (ADS)
Verma, Amit Kumar; Garg, P. K.; Prasad, K. S. Hari; Dadhwal, V. K.
2016-06-01
Image classification is a compulsory step in any remote sensing research. Classification uses the spectral information represented by the digital numbers in one or more spectral bands and attempts to classify each individual pixel based on this spectral information. Crop classification is the main concern of remote sensing applications for developing sustainable agriculture system. Vegetation indices computed from satellite images gives a good indication of the presence of vegetation. It is an indicator that describes the greenness, density and health of vegetation. Texture is also an important characteristics which is used to identifying objects or region of interest is an image. This paper illustrate the use of decision tree method to classify the land in to crop land and non-crop land and to classify different crops. In this paper we evaluate the possibility of crop classification using an integrated approach methods based on texture property with different vegetation indices for single date LISS IV sensor 5.8 meter high spatial resolution data. Eleven vegetation indices (NDVI, DVI, GEMI, GNDVI, MSAVI2, NDWI, NG, NR, NNIR, OSAVI and VI green) has been generated using green, red and NIR band and then image is classified using decision tree method. The other approach is used integration of texture feature (mean, variance, kurtosis and skewness) with these vegetation indices. A comparison has been done between these two methods. The results indicate that inclusion of textural feature with vegetation indices can be effectively implemented to produce classifiedmaps with 8.33% higher accuracy for Indian satellite IRS-P6, LISS IV sensor images.
Valavanis, Ioannis; Pilalis, Eleftherios; Georgiadis, Panagiotis; Kyrtopoulos, Soterios; Chatziioannou, Aristotelis
2015-01-01
DNA methylation profiling exploits microarray technologies, thus yielding a wealth of high-volume data. Here, an intelligent framework is applied, encompassing epidemiological genome-scale DNA methylation data produced from the Illumina’s Infinium Human Methylation 450K Bead Chip platform, in an effort to correlate interesting methylation patterns with cancer predisposition and, in particular, breast cancer and B-cell lymphoma. Feature selection and classification are employed in order to select, from an initial set of ~480,000 methylation measurements at CpG sites, predictive cancer epigenetic biomarkers and assess their classification power for discriminating healthy versus cancer related classes. Feature selection exploits evolutionary algorithms or a graph-theoretic methodology which makes use of the semantics information included in the Gene Ontology (GO) tree. The selected features, corresponding to methylation of CpG sites, attained moderate-to-high classification accuracies when imported to a series of classifiers evaluated by resampling or blindfold validation. The semantics-driven selection revealed sets of CpG sites performing similarly with evolutionary selection in the classification tasks. However, gene enrichment and pathway analysis showed that it additionally provides more descriptive sets of GO terms and KEGG pathways regarding the cancer phenotypes studied here. Results support the expediency of this methodology regarding its application in epidemiological studies. PMID:27600245
2009-09-01
prior to Traditional VT pro- iv cessing. This proves to be effective and provides more robust burst detection for −3 ≤ SNR ≤ 10 dB. Performance of a...TD and WD Dimensionality . . . . . 74 4.4 Performance Sensitivity Analysis . . . . . . . . . . . . . 77 4.4.1 Effect of Burst Location Error...78 4.4.2 Effect of Dissimilar Signal SNRs . . . . . . . . . 84 4.4.3 Effect of Dissimilar Signal Types . . . . . . . . 86 V. Conclusion
O’ Sullivan, Aifric
2017-01-01
A poor quality diet may be a common risk factor for both obesity and dental problems such as caries. The aim of this paper is to use classification tree analysis (CTA) to identify predictors of dental problems in a nationally representative cohort of Irish pre-school children. CTA was used to classify variables and describe interactions between multiple variables including socio-demographics, dietary intake, health-related behaviour, body mass index (BMI) and a dental problem. Data were derived from the second (2010/2011) wave of the ‘Growing Up in Ireland’ study (GUI) infant cohort at 3 years, n = 9793. The prevalence of dental problems was 5.0% (n = 493). The CTA model showed a sensitivity of 67% and specificity of 58.5% and overall correctly classified 59% of children. Ethnicity was the most significant predictor of dental problems followed by longstanding illness or disability, mother’s BMI and household income. The highest prevalence of dental problems was among children who were obese or underweight with a longstanding illness and an overweight mother. Frequency of intake of some foods showed interactions with the target variable. Results from this research highlight the interconnectedness of weight status, dental problems and general health and reinforce the importance of adopting a common risk factor approach when dealing with prevention of these diseases. PMID:29563431
Study and ranking of determinants of Taenia solium infections by classification tree models.
Mwape, Kabemba E; Phiri, Isaac K; Praet, Nicolas; Dorny, Pierre; Muma, John B; Zulu, Gideon; Speybroeck, Niko; Gabriël, Sarah
2015-01-01
Taenia solium taeniasis/cysticercosis is an important public health problem occurring mainly in developing countries. This work aimed to study the determinants of human T. solium infections in the Eastern province of Zambia and rank them in order of importance. A household (HH)-level questionnaire was administered to 680 HHs from 53 villages in two rural districts and the taeniasis and cysticercosis status determined. A classification tree model (CART) was used to define the relative importance and interactions between different predictor variables in their effect on taeniasis and cysticercosis. The Katete study area had a significantly higher taeniasis and cysticercosis prevalence than the Petauke area. The CART analysis for Katete showed that the most important determinant for cysticercosis infections was the number of HH inhabitants (6 to 10) and for taeniasis was the number of HH inhabitants > 6. The most important determinant in Petauke for cysticercosis was the age of head of household > 32 years and for taeniasis it was age < 55 years. The CART analysis showed that the most important determinant for both taeniasis and cysticercosis infections was the number of HH inhabitants (6 to 10) in Katete district and age in Petauke. The results suggest that control measures should target HHs with a high number of inhabitants and older individuals. © The American Society of Tropical Medicine and Hygiene.
Using different classification models in wheat grading utilizing visual features
NASA Astrophysics Data System (ADS)
Basati, Zahra; Rasekh, Mansour; Abbaspour-Gilandeh, Yousef
2018-04-01
Wheat is one of the most important strategic crops in Iran and in the world. The major component that distinguishes wheat from other grains is the gluten section. In Iran, sunn pest is one of the most important factors influencing the characteristics of wheat gluten and in removing it from a balanced state. The existence of bug-damaged grains in wheat will reduce the quality and price of the product. In addition, damaged grains reduce the enrichment of wheat and the quality of bread products. In this study, after preprocessing and segmentation of images, 25 features including 9 colour features, 10 morphological features, and 6 textual statistical features were extracted so as to classify healthy and bug-damaged wheat grains of Azar cultivar of four levels of moisture content (9, 11.5, 14 and 16.5% w.b.) and two lighting colours (yellow light, the composition of yellow and white lights). Using feature selection methods in the WEKA software and the CfsSubsetEval evaluator, 11 features were chosen as inputs of artificial neural network, decision tree and discriment analysis classifiers. The results showed that the decision tree with the J.48 algorithm had the highest classification accuracy of 90.20%. This was followed by artificial neural network classifier with the topology of 11-19-2 and discrimient analysis classifier at 87.46 and 81.81%, respectively
[Object-oriented aquatic vegetation extracting approach based on visible vegetation indices.
Jing, Ran; Deng, Lei; Zhao, Wen Ji; Gong, Zhao Ning
2016-05-01
Using the estimation of scale parameters (ESP) image segmentation tool to determine the ideal image segmentation scale, the optimal segmented image was created by the multi-scale segmentation method. Based on the visible vegetation indices derived from mini-UAV imaging data, we chose a set of optimal vegetation indices from a series of visible vegetation indices, and built up a decision tree rule. A membership function was used to automatically classify the study area and an aquatic vegetation map was generated. The results showed the overall accuracy of image classification using the supervised classification was 53.7%, and the overall accuracy of object-oriented image analysis (OBIA) was 91.7%. Compared with pixel-based supervised classification method, the OBIA method improved significantly the image classification result and further increased the accuracy of extracting the aquatic vegetation. The Kappa value of supervised classification was 0.4, and the Kappa value based OBIA was 0.9. The experimental results demonstrated that using visible vegetation indices derived from the mini-UAV data and OBIA method extracting the aquatic vegetation developed in this study was feasible and could be applied in other physically similar areas.
Evaluating Support for the Current Classification of Eukaryotic Diversity
Parfrey, Laura Wegener; Barbero, Erika; Lasser, Elyse; Dunthorn, Micah; Bhattacharya, Debashish; Patterson, David J; Katz, Laura A
2006-01-01
Perspectives on the classification of eukaryotic diversity have changed rapidly in recent years, as the four eukaryotic groups within the five-kingdom classification—plants, animals, fungi, and protists—have been transformed through numerous permutations into the current system of six “supergroups.” The intent of the supergroup classification system is to unite microbial and macroscopic eukaryotes based on phylogenetic inference. This supergroup approach is increasing in popularity in the literature and is appearing in introductory biology textbooks. We evaluate the stability and support for the current six-supergroup classification of eukaryotes based on molecular genealogies. We assess three aspects of each supergroup: (1) the stability of its taxonomy, (2) the support for monophyly (single evolutionary origin) in molecular analyses targeting a supergroup, and (3) the support for monophyly when a supergroup is included as an out-group in phylogenetic studies targeting other taxa. Our analysis demonstrates that supergroup taxonomies are unstable and that support for groups varies tremendously, indicating that the current classification scheme of eukaryotes is likely premature. We highlight several trends contributing to the instability and discuss the requirements for establishing robust clades within the eukaryotic tree of life. PMID:17194223
Floral markers of strawberry tree (Arbutus unedo L.) honey.
Tuberoso, Carlo I G; Bifulco, Ersilia; Caboni, Pierluigi; Cottiglia, Filippo; Cabras, Paolo; Floris, Ignazio
2010-01-13
Strawberry tree honey, due to its characteristic bitter taste, is one of the most typical Mediterranean honeys, with Sardinia being one of the largest producers. According to specific chemical studies, homogentisic acid was identified as a possible marker of this honey. This work, based on HPLC-DAD-MS/MS analysis of strawberry tree (Arbutus unedo L.) honeys, previously selected by sensory evaluation and melissopalynological analysis, showed that, in addition to the above-mentioned acid, there were other high levels of substances useful for the botanical classification of this unifloral honey. Two of these compounds were isolated and identified as (+/-)-2-cis,4-trans-abscisic acid (c,t-ABA) and (+/-)-2-trans,4-trans-abscisic acid (t,t-ABA). A third compound, a new natural product named unedone, was characterized as an epoxidic derivative of the above-mentioned acids. Structures of c,t-ABA, t,t-ABA, and unedone were elucidated on the basis of extensive 1D and 2D NMR experiments, as well as HPLC-MS/MS and Q-TOF analysis. In selected honeys the average amounts of c,t-ABA, t,t-ABA, and unedone were 176.2+/-25.4, 162.3+/-21.1, and 32.9+/-7.1 mg/kg, respectively. Analysis of the A. unedo nectar confirmed the floral origin of these compounds found in the honey. Abscisic acids were found in other unifloral honeys but not in such high amount and with a constant ratio of about 1:1. For this reason, besides homogentisic acid, these compounds could be used as complementary markers of strawberry tree honey.
Multi-level discriminative dictionary learning with application to large scale image classification.
Shen, Li; Sun, Gang; Huang, Qingming; Wang, Shuhui; Lin, Zhouchen; Wu, Enhua
2015-10-01
The sparse coding technique has shown flexibility and capability in image representation and analysis. It is a powerful tool in many visual applications. Some recent work has shown that incorporating the properties of task (such as discrimination for classification task) into dictionary learning is effective for improving the accuracy. However, the traditional supervised dictionary learning methods suffer from high computation complexity when dealing with large number of categories, making them less satisfactory in large scale applications. In this paper, we propose a novel multi-level discriminative dictionary learning method and apply it to large scale image classification. Our method takes advantage of hierarchical category correlation to encode multi-level discriminative information. Each internal node of the category hierarchy is associated with a discriminative dictionary and a classification model. The dictionaries at different layers are learnt to capture the information of different scales. Moreover, each node at lower layers also inherits the dictionary of its parent, so that the categories at lower layers can be described with multi-scale information. The learning of dictionaries and associated classification models is jointly conducted by minimizing an overall tree loss. The experimental results on challenging data sets demonstrate that our approach achieves excellent accuracy and competitive computation cost compared with other sparse coding methods for large scale image classification.
A classification model of Hyperion image base on SAM combined decision tree
NASA Astrophysics Data System (ADS)
Wang, Zhenghai; Hu, Guangdao; Zhou, YongZhang; Liu, Xin
2009-10-01
Monitoring the Earth using imaging spectrometers has necessitated more accurate analyses and new applications to remote sensing. A very high dimensional input space requires an exponentially large amount of data to adequately and reliably represent the classes in that space. On the other hand, with increase in the input dimensionality the hypothesis space grows exponentially, which makes the classification performance highly unreliable. Traditional classification algorithms Classification of hyperspectral images is challenging. New algorithms have to be developed for hyperspectral data classification. The Spectral Angle Mapper (SAM) is a physically-based spectral classification that uses an ndimensional angle to match pixels to reference spectra. The algorithm determines the spectral similarity between two spectra by calculating the angle between the spectra, treating them as vectors in a space with dimensionality equal to the number of bands. The key and difficulty is that we should artificial defining the threshold of SAM. The classification precision depends on the rationality of the threshold of SAM. In order to resolve this problem, this paper proposes a new automatic classification model of remote sensing image using SAM combined with decision tree. It can automatic choose the appropriate threshold of SAM and improve the classify precision of SAM base on the analyze of field spectrum. The test area located in Heqing Yunnan was imaged by EO_1 Hyperion imaging spectrometer using 224 bands in visual and near infrared. The area included limestone areas, rock fields, soil and forests. The area was classified into four different vegetation and soil types. The results show that this method choose the appropriate threshold of SAM and eliminates the disturbance and influence of unwanted objects effectively, so as to improve the classification precision. Compared with the likelihood classification by field survey data, the classification precision of this model heightens 9.9%.
Speaker gender identification based on majority vote classifiers
NASA Astrophysics Data System (ADS)
Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri
2017-03-01
Speaker gender identification is considered among the most important tools in several multimedia applications namely in automatic speech recognition, interactive voice response systems and audio browsing systems. Gender identification systems performance is closely linked to the selected feature set and the employed classification model. Typical techniques are based on selecting the best performing classification method or searching optimum tuning of one classifier parameters through experimentation. In this paper, we consider a relevant and rich set of features involving pitch, MFCCs as well as other temporal and frequency-domain descriptors. Five classification models including decision tree, discriminant analysis, nave Bayes, support vector machine and k-nearest neighbor was experimented. The three best perming classifiers among the five ones will contribute by majority voting between their scores. Experimentations were performed on three different datasets spoken in three languages: English, German and Arabic in order to validate language independency of the proposed scheme. Results confirm that the presented system has reached a satisfying accuracy rate and promising classification performance thanks to the discriminating abilities and diversity of the used features combined with mid-level statistics.
Mane, Vijay Mahadeo; Jadhav, D V
2017-05-24
Diabetic retinopathy (DR) is the most common diabetic eye disease. Doctors are using various test methods to detect DR. But, the availability of test methods and requirements of domain experts pose a new challenge in the automatic detection of DR. In order to fulfill this objective, a variety of algorithms has been developed in the literature. In this paper, we propose a system consisting of a novel sparking process and a holoentropy-based decision tree for automatic classification of DR images to further improve the effectiveness. The sparking process algorithm is developed for automatic segmentation of blood vessels through the estimation of optimal threshold. The holoentropy enabled decision tree is newly developed for automatic classification of retinal images into normal or abnormal using hybrid features which preserve the disease-level patterns even more than the signal level of the feature. The effectiveness of the proposed system is analyzed using standard fundus image databases DIARETDB0 and DIARETDB1 for sensitivity, specificity and accuracy. The proposed system yields sensitivity, specificity and accuracy values of 96.72%, 97.01% and 96.45%, respectively. The experimental result reveals that the proposed technique outperforms the existing algorithms.
Can SLE classification rules be effectively applied to diagnose unclear SLE cases?
Mesa, Annia; Fernandez, Mitch; Wu, Wensong; Narasimhan, Giri; Greidinger, Eric L.; Mills, DeEtta K.
2016-01-01
Summary Objective Develop a novel classification criteria to distinguish between unclear SLE and MCTD cases. Methods A total of 205 variables from 111 SLE and 55 MCTD patients were evaluated to uncover unique molecular and clinical markers for each disease. Binomial logistic regressions (BLR) were performed on currently used SLE and MCTD classification criteria sets to obtain six reduced models with power to discriminate between unclear SLE and MCTD patients which were confirmed by Receiving Operating Characteristic (ROC) curve. Decision trees were employed to delineate novel classification rules to discriminate between unclear SLE and MCTD patients. Results SLE and MCTD patients exhibited contrasting molecular markers and clinical manifestations. Furthermore, reduced models highlighted SLE patients exhibit prevalence of skin rashes and renal disease while MCTD cases show dominance of myositis and muscle weakness. Additionally decision trees analyses revealed a novel classification rule tailored to differentiate unclear SLE and MCTD patients (Lu-vs-M) with an overall accuracy of 88%. Conclusions Validation of our novel proposed classification rule (Lu-vs-M) includes novel contrasting characteristics (calcinosis, CPK elevated and anti-IgM reactivity for U1-70K, U1A and U1C) between SLE and MCTD patients and showed a 33% improvement in distinguishing these disorders when compare to currently used classification criteria sets. Pending additional validation, our novel classification rule is a promising method to distinguish between patients with unclear SLE and MCTD diagnosis. PMID:27353506
NASA Astrophysics Data System (ADS)
Saran, Sameer; Sterk, Geert; Kumar, Suresh
2009-10-01
Land use/land cover is an important watershed surface characteristic that affects surface runoff and erosion. Many of the available hydrological models divide the watershed into Hydrological Response Units (HRU), which are spatial units with expected similar hydrological behaviours. The division into HRU's requires good-quality spatial data on land use/land cover. This paper presents different approaches to attain an optimal land use/land cover map based on remote sensing imagery for a Himalayan watershed in northern India. First digital classifications using maximum likelihood classifier (MLC) and a decision tree classifier were applied. The results obtained from the decision tree were better and even improved after post classification sorting. But the obtained land use/land cover map was not sufficient for the delineation of HRUs, since the agricultural land use/land cover class did not discriminate between the two major crops in the area i.e. paddy and maize. Subsequently the digital classification on fused data (ASAR and ASTER) were attempted to map land use/land cover classes with emphasis to delineate the paddy and maize crops but the supervised classification over fused datasets did not provide the desired accuracy and proper delineation of paddy and maize crops. Eventually, we adopted a visual classification approach on fused data. This second step with detailed classification system resulted into better classification accuracy within the 'agricultural land' class which will be further combined with topography and soil type to derive HRU's for physically-based hydrological modeling.
High- and low-level hierarchical classification algorithm based on source separation process
NASA Astrophysics Data System (ADS)
Loghmari, Mohamed Anis; Karray, Emna; Naceur, Mohamed Saber
2016-10-01
High-dimensional data applications have earned great attention in recent years. We focus on remote sensing data analysis on high-dimensional space like hyperspectral data. From a methodological viewpoint, remote sensing data analysis is not a trivial task. Its complexity is caused by many factors, such as large spectral or spatial variability as well as the curse of dimensionality. The latter describes the problem of data sparseness. In this particular ill-posed problem, a reliable classification approach requires appropriate modeling of the classification process. The proposed approach is based on a hierarchical clustering algorithm in order to deal with remote sensing data in high-dimensional space. Indeed, one obvious method to perform dimensionality reduction is to use the independent component analysis process as a preprocessing step. The first particularity of our method is the special structure of its cluster tree. Most of the hierarchical algorithms associate leaves to individual clusters, and start from a large number of individual classes equal to the number of pixels; however, in our approach, leaves are associated with the most relevant sources which are represented according to mutually independent axes to specifically represent some land covers associated with a limited number of clusters. These sources contribute to the refinement of the clustering by providing complementary rather than redundant information. The second particularity of our approach is that at each level of the cluster tree, we combine both a high-level divisive clustering and a low-level agglomerative clustering. This approach reduces the computational cost since the high-level divisive clustering is controlled by a simple Boolean operator, and optimizes the clustering results since the low-level agglomerative clustering is guided by the most relevant independent sources. Then at each new step we obtain a new finer partition that will participate in the clustering process to enhance semantic capabilities and give good identification rates.
Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%. PMID:25302338
Three-dimensional object recognition using similar triangles and decision trees
NASA Technical Reports Server (NTRS)
Spirkovska, Lilly
1993-01-01
A system, TRIDEC, that is capable of distinguishing between a set of objects despite changes in the objects' positions in the input field, their size, or their rotational orientation in 3D space is described. TRIDEC combines very simple yet effective features with the classification capabilities of inductive decision tree methods. The feature vector is a list of all similar triangles defined by connecting all combinations of three pixels in a coarse coded 127 x 127 pixel input field. The classification is accomplished by building a decision tree using the information provided from a limited number of translated, scaled, and rotated samples. Simulation results are presented which show that TRIDEC achieves 94 percent recognition accuracy in the 2D invariant object recognition domain and 98 percent recognition accuracy in the 3D invariant object recognition domain after training on only a small sample of transformed views of the objects.
Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%.
Arenal-type pyroclastic flows: A probabilistic event tree risk analysis
NASA Astrophysics Data System (ADS)
Meloy, Anthony F.
2006-09-01
A quantitative hazard-specific scenario-modelling risk analysis is performed at Arenal volcano, Costa Rica for the newly recognised Arenal-type pyroclastic flow (ATPF) phenomenon using an event tree framework. These flows are generated by the sudden depressurisation and fragmentation of an active basaltic andesite lava pool as a result of a partial collapse of the crater wall. The deposits of this type of flow include angular blocks and juvenile clasts, which are rarely found in other types of pyroclastic flow. An event tree analysis (ETA) is a useful tool and framework in which to analyse and graphically present the probabilities of the occurrence of many possible events in a complex system. Four event trees are created in the analysis, three of which are extended to investigate the varying individual risk faced by three generic representatives of the surrounding community: a resident, a worker, and a tourist. The raw numerical risk estimates determined by the ETA are converted into a set of linguistic expressions (i.e. VERY HIGH, HIGH, MODERATE etc.) using an established risk classification scale. Three individually tailored semi-quantitative risk maps are then created from a set of risk conversion tables to show how the risk varies for each individual in different areas around the volcano. In some cases, by relocating from the north to the south, the level of risk can be reduced by up to three classes. While the individual risk maps may be broadly applicable, and therefore of interest to the general community, the risk maps and associated probability values generated in the ETA are intended to be used by trained professionals and government agencies to evaluate the risk and effectively manage the long-term development of infrastructure and habitation. With the addition of fresh monitoring data, the combination of both long- and short-term event trees would provide a comprehensive and consistent method of risk analysis (both during and pre-crisis), and as such, an ETA is considered to be a valuable quantitative decision support tool.
McDonald, Daniel; Price, Morgan N; Goodrich, Julia; Nawrocki, Eric P; DeSantis, Todd Z; Probst, Alexander; Andersen, Gary L; Knight, Rob; Hugenholtz, Philip
2012-03-01
Reference phylogenies are crucial for providing a taxonomic framework for interpretation of marker gene and metagenomic surveys, which continue to reveal novel species at a remarkable rate. Greengenes is a dedicated full-length 16S rRNA gene database that provides users with a curated taxonomy based on de novo tree inference. We developed a 'taxonomy to tree' approach for transferring group names from an existing taxonomy to a tree topology, and used it to apply the Greengenes, National Center for Biotechnology Information (NCBI) and cyanoDB (Cyanobacteria only) taxonomies to a de novo tree comprising 408,315 sequences. We also incorporated explicit rank information provided by the NCBI taxonomy to group names (by prefixing rank designations) for better user orientation and classification consistency. The resulting merged taxonomy improved the classification of 75% of the sequences by one or more ranks relative to the original NCBI taxonomy with the most pronounced improvements occurring in under-classified environmental sequences. We also assessed candidate phyla (divisions) currently defined by NCBI and present recommendations for consolidation of 34 redundantly named groups. All intermediate results from the pipeline, which includes tree inference, jackknifing and transfer of a donor taxonomy to a recipient tree (tax2tree) are available for download. The improved Greengenes taxonomy should provide important infrastructure for a wide range of megasequencing projects studying ecosystems on scales ranging from our own bodies (the Human Microbiome Project) to the entire planet (the Earth Microbiome Project). The implementation of the software can be obtained from http://sourceforge.net/projects/tax2tree/.
McDonald, Daniel; Price, Morgan N; Goodrich, Julia; Nawrocki, Eric P; DeSantis, Todd Z; Probst, Alexander; Andersen, Gary L; Knight, Rob; Hugenholtz, Philip
2012-01-01
Reference phylogenies are crucial for providing a taxonomic framework for interpretation of marker gene and metagenomic surveys, which continue to reveal novel species at a remarkable rate. Greengenes is a dedicated full-length 16S rRNA gene database that provides users with a curated taxonomy based on de novo tree inference. We developed a ‘taxonomy to tree' approach for transferring group names from an existing taxonomy to a tree topology, and used it to apply the Greengenes, National Center for Biotechnology Information (NCBI) and cyanoDB (Cyanobacteria only) taxonomies to a de novo tree comprising 408 315 sequences. We also incorporated explicit rank information provided by the NCBI taxonomy to group names (by prefixing rank designations) for better user orientation and classification consistency. The resulting merged taxonomy improved the classification of 75% of the sequences by one or more ranks relative to the original NCBI taxonomy with the most pronounced improvements occurring in under-classified environmental sequences. We also assessed candidate phyla (divisions) currently defined by NCBI and present recommendations for consolidation of 34 redundantly named groups. All intermediate results from the pipeline, which includes tree inference, jackknifing and transfer of a donor taxonomy to a recipient tree (tax2tree) are available for download. The improved Greengenes taxonomy should provide important infrastructure for a wide range of megasequencing projects studying ecosystems on scales ranging from our own bodies (the Human Microbiome Project) to the entire planet (the Earth Microbiome Project). The implementation of the software can be obtained from http://sourceforge.net/projects/tax2tree/. PMID:22134646
Beating the Odds: Trees to Success in Different Countries
ERIC Educational Resources Information Center
Finch, W. Holmes; Marchant, Gregory J.
2017-01-01
A recursive partitioning model approach in the form of classification and regression trees (CART) was used with 2012 PISA data for five countries (Canada, Finland, Germany, Singapore-China, and the Unites States). The objective of the study was to determine demographic and educational variables that differentiated between low SES student that were…
USDA-ARS?s Scientific Manuscript database
Triacylglycerols (TAG) are the major molecules of energy storage in eukaryotes. TAG are packed in subcellular structures called oil bodies or lipid droplets. Oleosins (OLE) are the major proteins in plant oil bodies. Multiple isoforms of OLE are present in plants such as tung tree (Vernicia fordii),...