Soft context clustering for F0 modeling in HMM-based speech synthesis
NASA Astrophysics Data System (ADS)
Khorram, Soheil; Sameti, Hossein; King, Simon
2015-12-01
This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional `hard' decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency, which is an important factor in synthesizing natural-sounding high-quality speech. Conventionally, hard decision tree-clustered hidden Markov models (HMMs) are used, in which each model parameter is assigned to a single leaf node. However, this `divide-and-conquer' approach leads to data sparsity, with the consequence that it suffers from poor generalization, meaning that it is unable to accurately predict parameters for models of unseen contexts: the hard decision tree is a weak function approximator. To alleviate this, we propose the soft decision tree, which is a binary decision tree with soft decisions at the internal nodes. In this soft clustering method, internal nodes select both their children with certain membership degrees; therefore, each node can be viewed as a fuzzy set with a context-dependent membership function. The soft decision tree improves model generalization and provides a superior function approximator because it is able to assign each context to several overlapped leaves. In order to use such a soft decision tree to predict the parameters of the HMM output probability distribution, we derive the smoothest (maximum entropy) distribution which captures all partial first-order moments and a global second-order moment of the training samples. Employing such a soft decision tree architecture with maximum entropy distributions, a novel speech synthesis system is trained using maximum likelihood (ML) parameter re-estimation and synthesis is achieved via maximum output probability parameter generation. 
In addition, a soft decision tree construction algorithm optimizing a log-likelihood measure is developed. Both subjective and objective evaluations were conducted and indicate a considerable improvement over the conventional method.
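As a rough illustration of the soft-decision idea described in this abstract, the sketch below routes a context through a small binary tree in which every internal node passes it to both children with membership degrees that sum to one. The sigmoid gate, the `Node` class, and all numeric values are assumptions made for illustration; the paper's actual membership functions and its maximum entropy output distribution are not reproduced here.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Leaf: only `value` is set. Internal node: gate parameters and children are set.
    value: Optional[float] = None   # hypothetical leaf parameter, e.g. a mean F0 in Hz
    feature: int = 0                # index of the context feature the gate inspects
    threshold: float = 0.0          # centre of the gate
    sharpness: float = 1.0          # large values approach a hard decision
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def leaf_memberships(node, context, weight=1.0):
    """Return [(leaf_value, membership)] for every leaf below `node`."""
    if node.left is None and node.right is None:
        return [(node.value, weight)]
    # Assumed gate: a sigmoid gives the membership degree of the right child.
    z = node.sharpness * (context[node.feature] - node.threshold)
    go_right = 1.0 / (1.0 + math.exp(-z))
    return (leaf_memberships(node.left, context, weight * (1.0 - go_right))
            + leaf_memberships(node.right, context, weight * go_right))


def soft_predict(root, context):
    """Membership-weighted average of the leaf parameters."""
    return sum(value * w for value, w in leaf_memberships(root, context))


tree = Node(feature=0, threshold=0.5, sharpness=4.0,
            left=Node(value=100.0),
            right=Node(feature=1, threshold=0.0, sharpness=4.0,
                       left=Node(value=120.0), right=Node(value=140.0)))

memberships = leaf_memberships(tree, [0.5, 0.0])
prediction = soft_predict(tree, [0.5, 0.0])
```

Because each gate splits its incoming weight between two children, the leaf memberships always sum to one, so a context is shared among several overlapping leaves rather than assigned to exactly one.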
Comprehensive Decision Tree Models in Bioinformatics
Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter
2012-01-01
Purpose: Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. Methods: This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by the so-called one-button data mining approach, where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. Results: The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions: The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. 
In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics. PMID:22479449
Decision-Tree Models of Categorization Response Times, Choice Proportions, and Typicality Judgments
ERIC Educational Resources Information Center
Lafond, Daniel; Lacouture, Yves; Cohen, Andrew L.
2009-01-01
The authors present 3 decision-tree models of categorization adapted from T. Trabasso, H. Rollins, and E. Shaughnessy (1971) and use them to provide a quantitative account of categorization response times, choice proportions, and typicality judgments at the individual-participant level. In Experiment 1, the decision-tree models were fit to…
Multi-test decision tree and its application to microarray data classification.
Czajkowski, Marcin; Grześ, Marek; Kretowski, Marek
2014-05-01
A desirable property of tools used to investigate biological data is that their models and predictive decisions are easy to understand. Decision trees are particularly promising in this regard due to their comprehensible nature that resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees have a tendency to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity. We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions. Experimental validation was performed on several real-life gene expression datasets. Comparison results with eight classifiers show that MTDT has a statistically significantly higher accuracy than popular decision tree classifiers, and it was highly competitive with ensemble learning algorithms. The proposed solution managed to outperform its baseline algorithm on 14 datasets by an average of 6%. A study performed on one of the datasets showed that the discovered genes used in the MTDT classification model are supported by biological evidence in the literature. This paper introduces a new type of decision tree which is more suitable for solving biological problems. MTDTs are relatively easy to analyze and much more powerful in modeling high dimensional microarray data than their popular counterparts. Copyright © 2014 Elsevier B.V. All rights reserved.
TreePOD: Sensitivity-Aware Selection of Pareto-Optimal Decision Trees.
Muhlbacher, Thomas; Linhardt, Lorenz; Moller, Torsten; Piringer, Harald
2018-01-01
Balancing accuracy gains with other objectives such as interpretability is a key challenge when building decision trees. However, this process is difficult to automate because it involves know-how about the domain as well as the purpose of the model. This paper presents TreePOD, a new approach for sensitivity-aware model selection along trade-offs. TreePOD is based on exploring a large set of candidate trees generated by sampling the parameters of tree construction algorithms. Based on this set, visualizations of quantitative and qualitative tree aspects provide a comprehensive overview of possible tree characteristics. Along trade-offs between two objectives, TreePOD provides efficient selection guidance by focusing on Pareto-optimal tree candidates. TreePOD also conveys the sensitivities of tree characteristics on variations of selected parameters by extending the tree generation process with a full-factorial sampling. We demonstrate how TreePOD supports a variety of tasks involved in decision tree selection and describe its integration in a holistic workflow for building and selecting decision trees. For evaluation, we illustrate a case study for predicting critical power grid states, and we report qualitative feedback from domain experts in the energy sector. This feedback suggests that TreePOD enables users with and without a statistical background to identify suitable decision trees confidently and efficiently.
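The notion of Pareto-optimal tree candidates along a two-objective trade-off can be sketched as follows. This is only an illustration, not TreePOD's implementation: the two objectives (higher accuracy, fewer leaves), the dictionary layout, and the candidate values are all assumed.

```python
def pareto_front(candidates):
    """Keep candidates that no other candidate dominates.

    A candidate is dominated when another one is at least as good on both
    objectives (accuracy up, leaves down) and strictly better on one of them.
    """
    front = []
    for c in candidates:
        dominated = any(
            o["accuracy"] >= c["accuracy"] and o["leaves"] <= c["leaves"]
            and (o["accuracy"] > c["accuracy"] or o["leaves"] < c["leaves"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c["leaves"])


# Hypothetical candidate trees, e.g. from sampling construction parameters.
candidates = [
    {"name": "A", "accuracy": 0.80, "leaves": 4},
    {"name": "B", "accuracy": 0.85, "leaves": 9},
    {"name": "C", "accuracy": 0.84, "leaves": 12},  # dominated by B
    {"name": "D", "accuracy": 0.90, "leaves": 30},
    {"name": "E", "accuracy": 0.78, "leaves": 4},   # dominated by A
]

front_names = [c["name"] for c in pareto_front(candidates)]
```

Only the trees on the front need to be shown to a user choosing between accuracy and size; every other candidate is beaten on both objectives by some tree that remains.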
Lin, Fen-Fang; Wang, Ke; Yang, Ning; Yan, Shi-Guang; Zheng, Xin-Yu
2012-02-01
In this paper, main factors that affect soil quality, such as soil type, land use pattern, lithology type, topography, road, and industry type, were used to obtain the spatial distribution characteristics of regional soil quality. Mutual information theory was adopted to select the main environmental factors, and the decision tree algorithm See 5.0 was applied to predict the grade of regional soil quality. The main factors affecting regional soil quality were soil type, land use, lithology type, distance to town, distance to water area, altitude, distance to road, and distance to industrial land. The prediction accuracy of the decision tree model with the variables selected by mutual information was clearly higher than that of the model with all variables, and for the former model, whether expressed as a decision tree or as decision rules, the prediction accuracy was higher than 80%. Based on continuous and categorical data, mutual information theory integrated with a decision tree could not only reduce the number of input parameters for the decision tree algorithm, but also predict and assess regional soil quality effectively.
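Selecting factors by mutual information, as this abstract describes, can be sketched for categorical variables as below. The function names, the toy table, and its values are hypothetical; continuous factors such as altitude would first need binning, which is not shown, and the study's own selection procedure may differ.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_features(table, target):
    """Sort feature names by their mutual information with the target."""
    scores = {name: mutual_information(col, target) for name, col in table.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented example: soil type determines the grade, land use is unrelated.
table = {
    "soil_type": ["clay", "clay", "sand", "sand"],
    "land_use": ["farm", "town", "farm", "town"],
}
grade = ["high", "high", "low", "low"]

ranking = rank_features(table, grade)
```

Factors with scores near zero carry little information about the soil-quality grade and are candidates for removal before the tree is built.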
Ebrahimi, Mehregan; Ebrahimie, Esmaeil; Bull, C Michael
2015-08-01
The high number of failures is one reason why translocation is often not recommended. Considering how behavior changes during translocations may improve translocation success. To derive decision-tree models for species' translocation, we used data on the short-term responses of an endangered Australian skink in 5 simulated translocations with different release conditions. We used 4 different decision-tree algorithms (decision tree, decision-tree parallel, decision stump, and random forest) with 4 different criteria (gain ratio, information gain, gini index, and accuracy) to investigate how environmental and behavioral parameters may affect the success of a translocation. We assumed behavioral changes that increased dispersal away from a release site would reduce translocation success. The trees became more complex when we included all behavioral parameters as attributes, but these trees yielded more detailed information about why and how dispersal occurred. According to these complex trees, there were positive associations between some behavioral parameters, such as fight and dispersal, that showed there was a higher chance, for example, of dispersal among lizards that fought than among those that did not fight. Decision trees based on parameters related to release conditions were easier to understand and could be used by managers to make translocation decisions under different circumstances. © 2015 Society for Conservation Biology.
NASA Astrophysics Data System (ADS)
Estuar, Maria Regina Justina; Victorino, John Noel; Coronel, Andrei; Co, Jerelyn; Tiausas, Francis; Señires, Chiara Veronica
2017-09-01
Use of wireless sensor networks and smartphone integration design to monitor environmental parameters surrounding plantations is made possible because of readily available and affordable sensors. Providing low cost monitoring devices would be beneficial, especially to small farm owners, in a developing country like the Philippines, where agriculture covers a significant amount of the labor market. This study discusses the integration of wireless soil sensor devices and smartphones to create an application that will use multidimensional analysis to detect the presence or absence of plant disease. Specifically, soil sensors are designed to collect soil quality parameters in a sink node from which the smartphone collects data via Bluetooth. Given these, there is a need to develop a classification model on the mobile phone that will report the infection status of a soil. Though tree classification is the most appropriate approach for continuous parameter-based datasets, there is a need to determine whether tree models will yield coherent results or not. Soil sensor data that resides on the phone is modeled using several variations of decision tree, namely: decision tree (DT), best-fit (BF) decision tree, functional tree (FT), Naive Bayes (NB) decision tree, J48, J48graft and LAD tree, where decision tree approaches the problem by considering all sensor nodes as one. Results show that there are significant differences among soil sensor parameters indicating that there are variances in scores between the infected and uninfected sites. Furthermore, analysis of variance in accuracy, recall, precision and F1 measure scores from tree classification models shows homogeneity among NBTree, J48graft and J48 tree classification models.
Lee, Saro; Park, Inhye
2013-09-30
Subsidence of ground caused by underground mines poses hazards to human life and property. This study analyzed ground-subsidence hazard using factors that can affect ground subsidence and a decision tree approach in a geographic information system (GIS). The study area was Taebaek, Gangwon-do, Korea, where many abandoned underground coal mines exist. Spatial data, topography, geology, and various ground-engineering data for the subsidence area were collected and compiled in a database for mapping ground-subsidence hazard (GSH). The subsidence area was randomly split 50/50 for training and validation of the models. A data-mining classification technique was applied to the GSH mapping, and decision trees were constructed using the chi-squared automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. The frequency ratio model was also applied to the GSH mapping for comparison with a probabilistic model. The resulting GSH maps were validated using area-under-the-curve (AUC) analysis with the subsidence area data that had not been used for training the model. The highest accuracy was achieved by the decision tree model using the CHAID algorithm (94.01%), compared with the QUEST algorithm (90.37%) and the frequency ratio model (86.70%). These accuracies are higher than previously reported results for decision tree. Decision tree methods can therefore be used efficiently for GSH analysis and might be widely used for prediction of various spatial events. Copyright © 2013. Published by Elsevier Ltd.
Personalized Modeling for Prediction with Decision-Path Models
Visweswaran, Shyam; Ferreira, Antonio; Ribeiro, Guilherme A.; Oliveira, Alexandre C.; Cooper, Gregory F.
2015-01-01
Deriving predictive models in medicine typically relies on a population approach where a single model is developed from a dataset of individuals. In this paper we describe and evaluate a personalized approach in which we construct a new type of decision tree model called decision-path model that takes advantage of the particular features of a given person of interest. We introduce three personalized methods that derive personalized decision-path models. We compared the performance of these methods to that of Classification And Regression Tree (CART) that is a population decision tree to predict seven different outcomes in five medical datasets. Two of the three personalized methods performed statistically significantly better on area under the ROC curve (AUC) and Brier skill score compared to CART. The personalized approach of learning decision path models is a new approach for predictive modeling that can perform better than a population approach. PMID:26098570
Learning in data-limited multimodal scenarios: Scandent decision forests and tree-based features.
Hor, Soheil; Moradi, Mehdi
2016-12-01
Incomplete and inconsistent datasets often pose difficulties in multimodal studies. We introduce the concept of scandent decision trees to tackle these difficulties. Scandent trees are decision trees that optimally mimic the partitioning of the data determined by another decision tree, and crucially, use only a subset of the feature set. We show how scandent trees can be used to enhance the performance of decision forests trained on a small number of multimodal samples when we have access to larger datasets with vastly incomplete feature sets. Additionally, we introduce the concept of tree-based feature transforms in the decision forest paradigm. When combined with scandent trees, the tree-based feature transforms enable us to train a classifier on a rich multimodal dataset, and use it to classify samples with only a subset of features of the training data. Using this methodology, we build a model trained on MRI and PET images of the ADNI dataset, and then test it on cases with only MRI data. We show that this is significantly more effective in staging of cognitive impairments compared to a similar decision forest model trained and tested on MRI only, or one that uses other kinds of feature transform applied to the MRI data. Copyright © 2016. Published by Elsevier B.V.
MRI-based decision tree model for diagnosis of biliary atresia.
Kim, Yong Hee; Kim, Myung-Joon; Shin, Hyun Joo; Yoon, Haesung; Han, Seok Joo; Koh, Hong; Roh, Yun Ho; Lee, Mi-Jung
2018-02-23
To evaluate MRI findings and to generate a decision tree model for diagnosis of biliary atresia (BA) in infants with jaundice. We retrospectively reviewed features of MRI and ultrasonography (US) performed in infants with jaundice between January 2009 and June 2016 under approval of the institutional review board, including the maximum diameter of periportal signal change on MRI (MR triangular cord thickness, MR-TCT) or US (US-TCT), visibility of common bile duct (CBD) and abnormality of gallbladder (GB). Hepatic subcapsular flow was reviewed on Doppler US. We performed conditional inference tree analysis using MRI findings to generate a decision tree model. A total of 208 infants were included, 112 in the BA group and 96 in the non-BA group. Mean age at the time of MRI was 58.7 ± 36.6 days. Visibility of CBD, abnormality of GB and MR-TCT were good discriminators for the diagnosis of BA and the MRI-based decision tree using these findings with MR-TCT cut-off 5.1 mm showed 97.3 % sensitivity, 94.8 % specificity and 96.2 % accuracy. MRI-based decision tree model reliably differentiates BA in infants with jaundice. MRI can be an objective imaging modality for the diagnosis of BA. • MRI-based decision tree model reliably differentiates biliary atresia in neonatal cholestasis. • Common bile duct, gallbladder and periportal signal changes are the discriminators. • MRI has comparable performance to ultrasonography for diagnosis of biliary atresia.
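The reported sensitivity, specificity and accuracy can be related to the group sizes with a short calculation. The abstract gives only the group sizes (112 BA, 96 non-BA) and the percentages; the individual counts used below (109 true positives, 91 true negatives) are back-calculated assumptions that happen to reproduce those percentages, not figures stated by the authors.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                  # correctly identified BA cases
    specificity = tn / (tn + fp)                  # correctly identified non-BA cases
    accuracy = (tp + tn) / (tp + fn + tn + fp)    # all correct classifications
    return sensitivity, specificity, accuracy


# Assumed counts: 109 + 3 = 112 BA infants, 91 + 5 = 96 non-BA infants.
sens, spec, acc = diagnostic_metrics(tp=109, fn=3, tn=91, fp=5)
rounded = tuple(round(100 * v, 1) for v in (sens, spec, acc))
```

Under these assumed counts the three values round to 97.3, 94.8 and 96.2 percent, which matches the figures quoted in the abstract.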
Decision tree methods: applications for classification and prediction.
Song, Yan-Yan; Lu, Ying
2015-04-25
Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets: the training dataset is used to build a decision tree model, and the validation dataset is used to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms for developing decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
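How a single node divides a population into two branches can be sketched with the Gini impurity, the splitting criterion commonly associated with CART. This is a minimal illustration for one numeric covariate; the function names and the toy data are hypothetical, and the other algorithms named in the abstract use different criteria.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`.

    The threshold is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Invented data: one covariate and a binary target.
ages = [22, 25, 31, 47, 52, 60]
outcome = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_split(ages, outcome)
```

Applying the same search recursively to each resulting branch grows the tree; a held-out validation set can then be used to decide how many of those splits to keep.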
Chen, Hsiu-Chin; Bennett, Sean
2016-08-01
Little evidence shows the use of decision-tree algorithms in identifying predictors and analyzing their associations with pass rates for the NCLEX-RN® in associate degree nursing students. This longitudinal and retrospective cohort study investigated whether a decision-tree algorithm could be used to develop an accurate prediction model for the students' passing or failing the NCLEX-RN. This study used archived data from 453 associate degree nursing students in a selected program. The chi-squared automatic interaction detection analysis of the decision trees module was used to examine the effect of the collected predictors on passing/failing the NCLEX-RN. The actual percentage scores of Assessment Technologies Institute®'s RN Comprehensive Predictor® accurately identified students at risk of failing. The classification model correctly classified 92.7% of the students for passing. This study applied the decision-tree model to analyze a sequence database for developing a prediction model for early remediation in preparation for the NCLEX-RN. [J Nurs Educ. 2016;55(8):454-457.]. Copyright 2016, SLACK Incorporated.
Jiao, Y; Chen, R; Ke, X; Cheng, L; Chu, K; Lu, Z; Herskovits, E H
2011-01-01
Autism spectrum disorder (ASD) is a neurodevelopmental disorder, of which Asperger syndrome and high-functioning autism are subtypes. Our goal is: 1) to determine whether a diagnostic model based on single-nucleotide polymorphisms (SNPs), brain regional thickness measurements, or brain regional volume measurements can distinguish Asperger syndrome from high-functioning autism; and 2) to compare the SNP, thickness, and volume-based diagnostic models. Our study included 18 children with ASD: 13 subjects with high-functioning autism and 5 subjects with Asperger syndrome. For each child, we obtained 25 SNPs for 8 ASD-related genes; we also computed regional cortical thicknesses and volumes for 66 brain structures, based on structural magnetic resonance (MR) examination. To generate diagnostic models, we employed five machine-learning techniques: decision stump, alternating decision trees, multi-class alternating decision trees, logistic model trees, and support vector machines. For SNP-based classification, three decision-tree-based models performed better than the other two machine-learning models. The performance metrics for three decision-tree-based models were similar: decision stump was modestly better than the other two methods, with accuracy = 90%, sensitivity = 0.95 and specificity = 0.75. All thickness and volume-based diagnostic models performed poorly. The SNP-based diagnostic models were superior to those based on thickness and volume. For SNP-based classification, rs878960 in GABRB3 (gamma-aminobutyric acid A receptor, beta 3) was selected by all tree-based models. Our analysis demonstrated that SNP-based classification was more accurate than morphometry-based classification in ASD subtype classification. Also, we found that one SNP--rs878960 in GABRB3--distinguishes Asperger syndrome from high-functioning autism.
Masías, Víctor H.; Krause, Mariane; Valdés, Nelson; Pérez, J. C.; Laengle, Sigifredo
2015-01-01
Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBTree, and REPTree) are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice. PMID:25914657
Decision Tree Approach for Soil Liquefaction Assessment
Gandomi, Amir H.; Fridline, Mark M.; Roke, David A.
2013-01-01
In the current study, the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used in this study. Three decision tree techniques are used here in two different ways, considering statistical and engineering points of view, to develop decision rules. The DT results are compared to the logistic regression (LR) model. The results of this study indicate that the DTs not only successfully predict liquefaction but they can also outperform the LR model. The best DT models are interpreted and evaluated based on an engineering point of view. PMID:24489498
Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat
2015-01-01
Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p) - often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical abstract: Decision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.
Balk, Benjamin; Elder, Kelly
2000-01-01
We model the spatial distribution of snow across a mountain basin using an approach that combines binary decision tree and geostatistical techniques. In April 1997 and 1998, intensive snow surveys were conducted in the 6.9‐km² Loch Vale watershed (LVWS), Rocky Mountain National Park, Colorado. Binary decision trees were used to model the large‐scale variations in snow depth, while the small‐scale variations were modeled through kriging interpolation methods. Binary decision trees related depth to the physically based independent variables of net solar radiation, elevation, slope, and vegetation cover type. These decision tree models explained 54–65% of the observed variance in the depth measurements. The tree‐based modeled depths were then subtracted from the measured depths, and the resulting residuals were spatially distributed across LVWS through kriging techniques. The kriged estimates of the residuals were added to the tree‐based modeled depths to produce a combined depth model. The combined depth estimates explained 60–85% of the variance in the measured depths. Snow densities were mapped across LVWS using regression analysis. Snow‐covered area was determined from high‐resolution aerial photographs. Combining the modeled depths and densities with a snow cover map produced estimates of the spatial distribution of snow water equivalence (SWE). This modeling approach offers improvement over previous methods of estimating SWE distribution in mountain basins.
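The two-stage structure described in this abstract (tree-modeled depth plus spatially interpolated residuals) can be sketched as below. Inverse-distance weighting is swapped in for kriging purely to keep the example short, and the one-split "tree", the 3400 m threshold, the coordinates, and the depths are all invented for illustration.

```python
import math


def tree_depth(elevation_m):
    # Hypothetical one-split tree: deeper snow above an assumed 3400 m threshold.
    return 1.5 if elevation_m > 3400 else 0.8


def idw(x, y, points, power=2.0):
    """Inverse-distance-weighted estimate; a stand-in for kriging in this sketch."""
    num = den = 0.0
    for px, py, value in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return value  # exact at a survey point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Survey points as (x, y, elevation in m, measured depth in m); values invented.
survey = [(0.0, 0.0, 3300, 0.9), (1.0, 0.0, 3500, 1.3), (0.0, 1.0, 3450, 1.8)]

# Stage 1 residuals: measured depth minus tree-modeled depth at each survey point.
residuals = [(x, y, depth - tree_depth(elev)) for x, y, elev, depth in survey]


def combined_depth(x, y, elevation_m):
    """Stage 2: tree-modeled depth plus the interpolated residual."""
    return tree_depth(elevation_m) + idw(x, y, residuals)


at_survey_point = combined_depth(1.0, 0.0, 3500)
between_points = combined_depth(0.5, 0.5, 3400)
```

The tree captures the large-scale pattern tied to terrain variables, while the interpolated residuals restore local departures from it, so the combined estimate reproduces the measured depth at each survey point.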
Ethnographic Decision Tree Modeling: A Research Method for Counseling Psychology.
ERIC Educational Resources Information Center
Beck, Kirk A.
2005-01-01
This article describes ethnographic decision tree modeling (EDTM; C. H. Gladwin, 1989) as a mixed method design appropriate for counseling psychology research. EDTM is introduced and located within a postpositivist research paradigm. Decision theory that informs EDTM is reviewed, and the 2 phases of EDTM are highlighted. The 1st phase, model…
Park, Myonghwa; Choi, Sora; Shin, A Mi; Koo, Chul Hoi
2013-02-01
The purpose of this study was to develop a prediction model for the characteristics of older adults with depression using the decision tree method. A large dataset from the 2008 Korean Elderly Survey was used and data of 14,970 elderly people were analyzed. Target variable was depression and 53 input variables were general characteristics, family & social relationship, economic status, health status, health behavior, functional status, leisure & social activity, quality of life, and living environment. Data were analyzed by decision tree analysis, a data mining technique using SPSS Window 19.0 and Clementine 12.0 programs. The decision trees were classified into five different rules to define the characteristics of older adults with depression. Classification & Regression Tree (C&RT) showed the best prediction with an accuracy of 80.81% among data mining models. Factors in the rules were life satisfaction, nutritional status, daily activity difficulty due to pain, functional limitation for basic or instrumental daily activities, number of chronic diseases and daily activity difficulty due to disease. The different rules classified by the decision tree model in this study should contribute as baseline data for discovering informative knowledge and developing interventions tailored to these individual characteristics.
Faults Discovery By Using Mined Data
NASA Technical Reports Server (NTRS)
Lee, Charles
2005-01-01
Fault discovery in complex systems relies on model-based reasoning, fault tree analysis, rule-based inference methods, and other approaches. Model-based reasoning builds models of the system either from mathematical formulations or from experimental models. Fault tree analysis shows the possible causes of a system malfunction by enumerating the suspect components and their respective failure modes that may have induced the problem. Rule-based inference builds the model from expert knowledge. These models and methods have one thing in common: they presume some prior conditions. Complex systems often use fault trees to analyze faults. When an error occurs, fault diagnosis is performed by engineers and analysts who extensively examine all data gathered during the mission. The International Space Station (ISS) control center operates on data feedback from the system, and decisions are made based on threshold values by using fault trees. Since those decision-making tasks are safety critical and must be done promptly, the engineers who manually analyze the data face a time challenge. To automate this process, this paper presents an approach that uses decision trees to discover faults from data in real time and captures the contents of fault trees as the initial state of the trees.
Cheaib, Alissar; Badeau, Vincent; Boe, Julien; Chuine, Isabelle; Delire, Christine; Dufrêne, Eric; François, Christophe; Gritti, Emmanuel S; Legay, Myriam; Pagé, Christian; Thuiller, Wilfried; Viovy, Nicolas; Leadley, Paul
2012-06-01
Model-based projections of shifts in tree species range due to climate change are becoming an important decision support tool for forest management. However, poorly evaluated sources of uncertainty require more scrutiny before relying heavily on models for decision-making. We evaluated uncertainty arising from differences in model formulations of tree response to climate change based on a rigorous intercomparison of projections of tree distributions in France. We compared eight models ranging from niche-based to process-based models. On average, models project large range contractions of temperate tree species in lowlands due to climate change. There was substantial disagreement between models for temperate broadleaf deciduous tree species, but differences in the capacity of models to account for rising CO(2) impacts explained much of the disagreement. There was good quantitative agreement among models concerning the range contractions for Scots pine. For the dominant Mediterranean tree species, Holm oak, all models foresee substantial range expansion. © 2012 Blackwell Publishing Ltd/CNRS.
NASA Astrophysics Data System (ADS)
Rahmadani, S.; Dongoran, A.; Zarlis, M.; Zakarias
2018-03-01
This paper discusses feature selection using genetic algorithms (GA) on datasets for classification problems. The classification models used are the decision tree (DT) and Naive Bayes. We discuss how the Naive Bayes and decision tree models handle the classification problem when the dataset features are selected using GA. The performance of both models is then compared to determine whether accuracy increases. The results show an increase in accuracy when features are selected using GA. The proposed models are referred to as GADT (GA-Decision Tree) and GANB (GA-Naive Bayes). The data sets tested in this paper are taken from the UCI Machine Learning repository.
The application of a decision tree to establish the parameters associated with hypertension.
Tayefi, Maryam; Esmaeili, Habibollah; Saberi Karimian, Maryam; Amirabadi Zadeh, Alireza; Ebrahimi, Mahmoud; Safarian, Mohammad; Nematy, Mohsen; Parizadeh, Seyed Mohammad Reza; Ferns, Gordon A; Ghayour-Mobarhan, Majid
2017-02-01
Hypertension is an important risk factor for cardiovascular disease (CVD). The goal of this study was to establish the factors associated with hypertension by using a decision-tree algorithm as a supervised classification method of data mining. Data from a cross-sectional study were used in this study. A total of 9078 subjects who met the inclusion criteria were recruited. 70% of these subjects (6358 cases) were randomly allocated to the training dataset to construct the decision tree. The remaining 30% (2720 cases) were used as the testing dataset to evaluate the performance of the decision tree. Two models were evaluated in this study. In model I, age, gender, body mass index, marital status, level of education, occupation status, depression and anxiety status, physical activity level, smoking status, LDL, TG, TC, FBG, uric acid and hs-CRP were considered as input variables, and in model II, age, gender, WBC, RBC, HGB, HCT, MCV, MCH, PLT, RDW and PDW were considered as input variables. The validation of the model was assessed by constructing a receiver operating characteristic (ROC) curve. The prevalence of hypertension was 32% in our population. For decision-tree model I, the accuracy, sensitivity, specificity and area under the ROC curve (AUC) value for identifying the related risk factors of hypertension were 73%, 63%, 77% and 0.72, respectively. The corresponding values for model II were 70%, 61%, 74% and 0.68, respectively. We have developed a decision tree model to identify the risk factors associated with hypertension that may be used to develop programs for hypertension management. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
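Accuracy, sensitivity and specificity, as reported above for the testing dataset, all derive from the four cells of a binary confusion matrix. A minimal sketch, assuming labels coded as 1 (hypertensive) and 0 (not hypertensive); the function name and the label vectors are hypothetical and unrelated to the study data.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
    }

# Hypothetical held-out labels and tree predictions (illustrative only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

The sketch assumes both classes occur in `y_true`; otherwise one of the ratios has a zero denominator.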
A Decision Tree for Nonmetric Sex Assessment from the Skull.
Langley, Natalie R; Dudzik, Beatrix; Cloutier, Alesia
2018-01-01
This study uses five well-documented cranial nonmetric traits (glabella, mastoid process, mental eminence, supraorbital margin, and nuchal crest) and one additional trait (zygomatic extension) to develop a validated decision tree for sex assessment. The decision tree was built and cross-validated on a sample of 293 U.S. White individuals from the William M. Bass Donated Skeletal Collection. Ordinal scores from the six traits were analyzed using the partition modeling option in JMP Pro 12. A holdout sample of 50 skulls was used to test the model. The most accurate decision tree includes three variables: glabella, zygomatic extension, and mastoid process. This decision tree yielded 93.5% accuracy on the training sample, 94% on the cross-validated sample, and 96% on a holdout validation sample. Linear weighted kappa statistics indicate acceptable agreement among observers for these variables. Mental eminence should be avoided, and definitions and figures should be referenced carefully to score nonmetric traits. © 2017 American Academy of Forensic Sciences.
A framework for sensitivity analysis of decision trees.
Kamiński, Bogumił; Jakubczyk, Michał; Szufel, Przemysław
2018-01-01
In the paper, we consider sequential decision problems with uncertainty, represented as decision trees. Sensitivity analysis is always a crucial element of decision making and in decision trees it often focuses on probabilities. In the stochastic model considered, the user often has only limited information about the true values of probabilities. We develop a framework for performing sensitivity analysis of optimal strategies accounting for this distributional uncertainty. We design this robust optimization approach in an intuitive and not overly technical way, to make it simple to apply in daily managerial practice. The proposed framework allows for (1) analysis of the stability of the expected-value-maximizing strategy and (2) identification of strategies which are robust with respect to pessimistic/optimistic/mode-favoring perturbations of probabilities. We verify the properties of our approach in two cases: (a) probabilities in a tree are the primitives of the model and can be modified independently; (b) probabilities in a tree reflect some underlying, structural probabilities, and are interrelated. We provide a free software tool implementing the methods described.
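The simplest form of the question studied above is whether the expected-value-maximizing strategy stays the same when one probability is perturbed. The sketch below rolls back a small decision tree and checks this for a single parameter. It is not the authors' framework or software; the node layout, payoffs and the `make_tree` helper are all hypothetical.

```python
def expected_value(node):
    """Roll back a tree of payoff, chance and decision nodes."""
    kind = node["type"]
    if kind == "payoff":
        return node["value"]
    if kind == "chance":
        return sum(p * expected_value(child) for p, child in node["branches"])
    # Decision node: pick the option with the highest expected value.
    return max(expected_value(child) for _, child in node["options"])

def best_option(node):
    """Label of the expected-value-maximizing option at a decision node."""
    return max(node["options"], key=lambda opt: expected_value(opt[1]))[0]

def make_tree(p_success):
    """Hypothetical launch/hold decision with one uncertain probability."""
    return {"type": "decision", "options": [
        ("launch", {"type": "chance", "branches": [
            (p_success, {"type": "payoff", "value": 100.0}),
            (1 - p_success, {"type": "payoff", "value": -50.0}),
        ]}),
        ("hold", {"type": "payoff", "value": 10.0}),
    ]}

# Sweep the probability and record where "launch" remains optimal.
grid = [i / 20 for i in range(21)]
launch_region = [p for p in grid if best_option(make_tree(p)) == "launch"]
```

In this toy tree the two options break even at a success probability of 0.4, so the optimal strategy is stable only while the perturbed probability stays on one side of that value.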
Diagnostic classification scheme in Iranian breast cancer patients using a decision tree.
Malehi, Amal Saki
2014-01-01
The objective of this study was to determine a diagnostic classification scheme using a decision tree-based model. The study was conducted as a retrospective case-control study in Imam Khomeini hospital in Tehran from 2001 to 2009. Data, including demographic and clinical-pathological characteristics, were uniformly collected from 624 females, 312 of whom were referred with a positive diagnosis of breast cancer (cases) and 312 of whom were healthy women (controls). The decision tree was implemented to develop a diagnostic classification scheme using CART 6.0 software. The AUC (area under the curve) was measured as the overall performance of diagnostic classification of the decision tree. Five variables were identified as main risk factors of breast cancer and six subgroups as high risk. The results indicated that increasing age, low age at menarche, single and divorced status, irregular menarche pattern and family history of breast cancer are the important diagnostic factors in Iranian breast cancer patients. The sensitivity and specificity of the analysis were 66% and 86.9%, respectively. The high AUC (0.82) also showed an excellent classification and diagnostic performance of the model. The decision tree-based model appears to be suitable for identifying risk factors and high or low risk subgroups. It can also assist clinicians in making decisions, since it can identify underlying prognostic relationships and the model is explicit and easy to understand.
Modeling individual tree survival
Quang V. Cao
2016-01-01
Information provided by growth and yield models is the basis for forest managers to make decisions on how to manage their forests. Among different types of growth models, whole-stand models offer predictions at stand level, whereas individual-tree models give detailed information at tree level. The well-known logistic regression is commonly used to predict tree...
Decision tree modeling using R.
Zhang, Zhongheng
2016-08-01
In the machine learning field, the decision tree learner is powerful and easy to interpret. It employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example, I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests come from the random sampling and the restricted set of input variables to be selected. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
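One step of recursive binary partitioning can be sketched as a search for the threshold that best separates the response. Note the swapped-in criterion: the sketch minimizes the within-node sum of squared errors (a CART-style rule), not the permutation-test association measure that conditional inference trees use, and it is written in Python rather than with the R functions the article introduces. All names and data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, total_sse) for the split x <= threshold with the
    lowest total SSE, or None if x takes a single value."""
    best = None
    # The largest x is skipped: it would leave the right child empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical predictor and response with an obvious break between 3 and 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 5.0, 5.0, 9.0, 9.0, 9.0]
split = best_split(xs, ys)
```

A full tree would apply `best_split` again to each child until a stopping rule, such as a minimum node size, is met.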
Decision trees in epidemiological research.
Venkatasubramaniam, Ashwini; Wolfson, Julian; Mitchell, Nathan; Barnes, Timothy; JaKa, Meghan; French, Simone
2017-01-01
In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
Amirabadizadeh, Alireza; Nezami, Hossein; Vaughn, Michael G; Nakhaee, Samaneh; Mehrpour, Omid
2018-05-12
Substance abuse exacts considerable social and health care burdens throughout the world. The aim of this study was to create a prediction model to better identify risk factors for drug use. A prospective cross-sectional study was conducted in South Khorasan Province, Iran. Of the total of 678 eligible subjects, 70% (n: 474) were randomly selected to provide a training set for constructing decision tree and multiple logistic regression (MLR) models. The remaining 30% (n: 204) were employed in a holdout sample to test the performance of the decision tree and MLR models. Predictive performance of different models was analyzed by the receiver operating characteristic (ROC) curve using the testing set. Independent variables were selected from demographic characteristics and history of drug use. For the decision tree model, the sensitivity and specificity for identifying people at risk for drug abuse were 66% and 75%, respectively, while the MLR model was somewhat less effective at 60% and 73%. Key independent variables in the analyses included first substance experience, age at first drug use, age, place of residence, history of cigarette use, and occupational and marital status. While study findings are exploratory and lack generalizability they do suggest that the decision tree model holds promise as an effective classification approach for identifying risk factors for drug use. Convergent with prior research in Western contexts is that age of drug use initiation was a critical factor predicting a substance use disorder.
Predicting the probability of mortality of gastric cancer patients using decision tree.
Mohammadzadeh, F; Noorkojuri, H; Pourhoseingholi, M A; Saadat, S; Baghestani, A R
2015-06-01
Gastric cancer is the fourth most common cancer worldwide, which motivated us to investigate gastric cancer risk factors using statistical methods. The aim of this study was to identify the most important factors influencing the mortality of patients with gastric cancer and to introduce a classification approach based on a decision tree model for predicting the probability of mortality from this disease. Data on 216 patients with gastric cancer, who were registered in Taleghani hospital in Tehran, Iran, were analyzed. First, patients were divided into two groups: dead and alive. Then, to fit the decision tree model to our data, we randomly assigned 20% of the dataset to the test sample and used the remainder as the training sample. Finally, the validity of the model was examined with sensitivity, specificity, diagnostic accuracy and the area under the receiver operating characteristic curve. CART version 6.0 and SPSS version 19.0 were used for the analysis of the data. Diabetes, ethnicity, tobacco, tumor size, surgery, pathologic stage, age at diagnosis, exposure to chemical weapons and alcohol consumption were identified as factors affecting mortality from gastric cancer. The sensitivity, specificity and accuracy of the decision tree were 0.72, 0.75 and 0.74, respectively. These indices indicate that the decision tree model has acceptable accuracy for predicting the probability of mortality in gastric cancer patients. A simple decision tree built from the factors affecting mortality from gastric cancer may therefore help clinicians as a reliable and practical tool to predict the probability of mortality in these patients.
Building of fuzzy decision trees using ID3 algorithm
NASA Astrophysics Data System (ADS)
Begenova, S. B.; Avdeenko, T. V.
2018-05-01
Decision trees are widely used in the field of machine learning and artificial intelligence. Their popularity is due to the fact that decision trees allow graphical models and text rules to be built that are easily understood by the end user. Because of inaccurate observations and uncertainties, data collected in the environment often take an unclear form. Therefore, fuzzy decision trees are becoming popular in the field of machine learning. This article presents a method that combines the features of the two above-mentioned approaches: a graphical representation of the rule system in the form of a tree and a fuzzy representation of the data. The approach exploits the high comprehensibility of decision trees and the ability of fuzzy representation to cope with inaccurate and uncertain information. The resulting learning method is suitable for classification problems with both numerical and symbolic features. Illustrations of the solution and numerical results are given in the article.
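In fuzzy variants of ID3, class proportions at a node are typically computed from summed membership degrees rather than from crisp counts. The sketch below shows that idea with a triangular membership function. It follows a common textbook formulation and is not necessarily the exact method of the article; the fuzzy set "warm", the temperatures and the labels are hypothetical.

```python
import math

def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzy_entropy(memberships, labels):
    """Shannon entropy with class weights given by summed membership degrees."""
    total = sum(memberships)
    if total == 0:
        return 0.0
    entropy = 0.0
    for cls in set(labels):
        weight = sum(m for m, lab in zip(memberships, labels) if lab == cls)
        if weight > 0:
            p = weight / total
            entropy -= p * math.log2(p)
    return entropy

# Hypothetical data: temperature readings and a class label for each.
temps = [10, 15, 20, 25]
labels = ["no", "no", "yes", "yes"]
warm = [triangular(t, 10, 20, 30) for t in temps]  # degrees of membership in "warm"
h_warm = fuzzy_entropy(warm, labels)
```

An attribute would then be scored by the membership-weighted average of such entropies over its fuzzy sets, in the same way crisp ID3 scores an attribute by information gain.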
Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data.
Barros, Rodrigo C; Winck, Ana T; Machado, Karina S; Basgalupp, Márcio P; de Carvalho, André C P L F; Ruiz, Duncan D; de Souza, Osmar Norberto
2012-11-21
This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, specially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general-use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible-receptor.
Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data
2012-01-01
Background: This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, specially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general-use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions: We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible-receptor. PMID:23171000
Data Clustering and Evolving Fuzzy Decision Tree for Data Base Classification Problems
NASA Astrophysics Data System (ADS)
Chang, Pei-Chann; Fan, Chin-Yuan; Wang, Yen-Wen
Data base classification suffers from two well-known difficulties, i.e., high dimensionality and non-stationary variations within large historic data. This paper presents a hybrid classification model that integrates a case-based reasoning technique, a Fuzzy Decision Tree (FDT), and Genetic Algorithms (GA) to construct a decision-making system for data classification in various data base applications. The model is mainly based on the idea that the historic data base can be transformed into a smaller case base together with a group of fuzzy decision rules. As a result, the model can respond more accurately to the data currently being classified, using the inductions from these smaller case-based fuzzy decision trees. Hit rate is applied as a performance measure, and the effectiveness of the proposed model is demonstrated by experimental comparison with other approaches on different data base classification applications. The average hit rate of the proposed model is the highest among the compared approaches.
NASA Astrophysics Data System (ADS)
Gessesse, B.; Bewket, W.; Bräuning, A.
2015-11-01
Land degradation due to lack of sustainable land management practices is one of the critical challenges in many developing countries including Ethiopia. This study explores the major determinants of farm-level tree planting decisions as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree planting decisions (Chi-square = 37.29, df = 15, P<0.001). Besides, the computed significance value of the model suggests that all the considered predictor variables jointly influenced the farmers' decisions to plant trees as a land management strategy. In this regard, the findings of the study show that local land users' willingness to adopt tree growing is a function of a wide range of biophysical, institutional, socioeconomic and household-level factors; however, household size, productive labour force availability, the disparity of schooling age, level of perception of the process of deforestation and the current land tenure system have a positive and significant influence on tree growing investment decisions in the study watershed. Eventually, the processes of land use conversion and land degradation are serious, which in turn have had adverse effects on agricultural productivity, local food security and the poverty trap nexus. Hence, devising sustainable and integrated land management policy options and implementing them would enhance ecological restoration and livelihood sustainability in the study watershed.
NASA Astrophysics Data System (ADS)
Gessesse, Berhan; Bewket, Woldeamlak; Bräuning, Achim
2016-04-01
Land degradation due to lack of sustainable land management practices is one of the critical challenges in many developing countries including Ethiopia. This study explored the major determinants of farm-level tree-planting decisions as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree-planting decisions (χ2 = 37.29, df = 15, P < 0.001). Besides, the computed significant value of the model revealed that all the considered predictor variables jointly influenced the farmers' decisions to plant trees as a land management strategy. The findings of the study demonstrated that the adoption of tree-growing decisions by local land users was a function of a wide range of biophysical, institutional, socioeconomic and household-level factors. In this regard, the likelihood of household size, productive labour force availability, the disparity of schooling age, level of perception of the process of deforestation and the current land tenure system had a critical influence on tree-growing investment decisions in the study watershed. Eventually, the processes of land-use conversion and land degradation were serious, which in turn have had adverse effects on agricultural productivity, local food security and poverty trap nexus. Hence, the study recommended that devising and implementing sustainable land management policy options would enhance ecological restoration and livelihood sustainability in the study watershed.
Shi, Huilan; Jia, Junya; Li, Dong; Wei, Li; Shang, Wenya; Zheng, Zhenfeng
2018-02-09
Precise renal histopathological diagnosis guides therapy strategy in patients with lupus nephritis. Blood oxygen level dependent (BOLD) magnetic resonance imaging (MRI) is an applicable noninvasive technique in renal disease. This study was performed to explore whether BOLD MRI could contribute to the diagnosis of renal pathological patterns. Adult patients with a renal pathological diagnosis of lupus nephritis were recruited for this study. Renal biopsy tissues were assessed based on the lupus nephritis ISN/RPS 2003 classification. BOLD MRI was used to obtain the functional magnetic resonance parameter, R2* values. Several functions of R2* values were calculated and used to construct algorithmic models for renal pathological patterns. In addition, the algorithmic models were compared as to their diagnostic capability. Both histopathology and BOLD MRI were used to examine a total of twelve patients. Renal pathological patterns included five class III (including 3 as class III + V) and seven class IV (including 4 as class IV + V). Three algorithmic models, including decision tree, line discriminant, and logistic regression, were constructed to distinguish the renal pathological patterns of class III and class IV. The sensitivity of the decision tree model was better than that of the line discriminant model (71.87% vs 59.48%, P < 0.001) and inferior to that of the logistic regression model (71.87% vs 78.71%, P < 0.001). The specificity of the decision tree model was equivalent to that of the line discriminant model (63.87% vs 63.73%, P = 0.939) and higher than that of the logistic regression model (63.87% vs 38.0%, P < 0.001). The area under the ROC curve (AUROCC) of the decision tree model was greater than that of the line discriminant model (0.765 vs 0.629, P < 0.001) and the logistic regression model (0.765 vs 0.662, P < 0.001).
BOLD MRI is a useful non-invasive imaging technique for the evaluation of lupus nephritis. Decision tree models constructed using functions of R2* values may facilitate the prediction of renal pathological patterns.
Protein attributes contribute to halo-stability, bioinformatics approach
2011-01-01
Halophile proteins can tolerate high salt concentrations. Understanding halophilicity features is the first step toward engineering halostable crops. To this end, we examined protein features contributing to the halo-toleration of halophilic organisms. We compared more than 850 features for halophilic and non-halophilic proteins with various screening, clustering, decision tree, and generalized rule induction models to search for patterns that code for halo-toleration. Up to 251 protein attributes selected by various attribute weighting algorithms as important features contribute to halo-stability; from them 14 attributes selected by 90% of models and the count of hydrogen gained the highest value (1.0) in 70% of attribute weighting models, showing the importance of this attribute in feature selection modeling. The other attributes mostly were the frequencies of di-peptides. No changes were found in the numbers of groups when K-Means and TwoStep clustering modeling were performed on datasets with or without feature selection filtering. Although the depths of induced trees were not high, the accuracies of trees were higher than 94% and the frequency of hydrophobic residues pointed as the most important feature to build trees. The performance evaluation of decision tree models had the same values and the best correctness percentage recorded with the Exhaustive CHAID and CHAID models. We did not find any significant difference in the percent of correctness, performance evaluation, and mean correctness of various decision tree models with or without feature selection. For the first time, we analyzed the performance of different screening, clustering, and decision tree algorithms for discriminating halophilic and non-halophilic proteins and the results showed that amino acid composition can be used to discriminate between halo-tolerant and halo-sensitive proteins. PMID:21592393
Prediction of the compression ratio for municipal solid waste using decision tree.
Heshmati R, Ali Akbar; Mokhtari, Maryam; Shakiba Rad, Saeed
2014-01-01
The compression ratio of municipal solid waste (MSW) is an essential parameter for evaluation of waste settlement and landfill design. However, no appropriate model has been proposed to estimate the waste compression ratio so far. In this study, a decision tree method was utilized to predict the waste compression ratio (C'c). The tree was constructed using Quinlan's M5 algorithm. A reliable database retrieved from the literature was used to develop a practical model that relates C'c to waste composition and properties, including dry density, dry weight water content, and percentage of biodegradable organic waste using the decision tree method. The performance of the developed model was examined in terms of different statistical criteria, including correlation coefficient, root mean squared error, mean absolute error and mean bias error, recommended by researchers. The obtained results demonstrate that the suggested model is able to evaluate the compression ratio of MSW effectively.
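The four statistical criteria named in this abstract can be sketched as below. This is a minimal illustration, not the authors' code: the function name, the sign convention for mean bias error (predicted minus observed) and the sample values are assumptions made for the example.

```python
import math

def regression_errors(observed, predicted):
    """Return correlation coefficient, RMSE, MAE and mean bias error."""
    n = len(observed)
    # Assumed convention: residual = predicted - observed.
    residuals = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mae = sum(abs(r) for r in residuals) / n
    mbe = sum(residuals) / n  # positive means over-prediction on average

    # Pearson correlation coefficient between observed and predicted values.
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)
    return {"r": r, "rmse": rmse, "mae": mae, "mbe": mbe}

# Made-up values for illustration only; not data from the study.
observed = [0.20, 0.25, 0.30, 0.35]
predicted = [0.22, 0.24, 0.33, 0.33]
metrics = regression_errors(observed, predicted)
print(metrics)
```

Reporting the mean bias error alongside RMSE and MAE helps separate a systematic over- or under-prediction from scatter around the observed values.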
Pak, Kyoungjune; Kim, Keunyoung; Kim, Mi-Hyun; Eom, Jung Seop; Lee, Min Ki; Cho, Jeong Su; Kim, Yun Seong; Kim, Bum Soo; Kim, Seong Jang; Kim, In Joo
2018-01-01
We aimed to develop a decision tree model to improve diagnostic performance of positron emission tomography/computed tomography (PET/CT) to detect metastatic lymph nodes (LN) in non-small cell lung cancer (NSCLC). 115 patients with NSCLC were included in this study. The training dataset included 66 patients. A decision tree model was developed with 9 variables and validated with 49 patients; the variables included short and long diameters of LNs, ratio of short and long diameters, maximum standardized uptake value (SUVmax) of LN, mean Hounsfield unit, ratio of LN SUVmax and ascending aorta SUVmax (LN/AA), and ratio of LN SUVmax and superior vena cava SUVmax. A total of 301 LNs of 115 patients were evaluated in this study. Nodular calcification was applied as the initial imaging parameter, and LN SUVmax (≥3.95) was assessed as the second. For LNs with high LN SUVmax, LN/AA (≥2.92) was additionally required. Sensitivity was 50% for the training dataset and 40% for the validation dataset. However, specificity was 99.28% for the training dataset and 96.23% for the validation dataset. In conclusion, we have developed a new decision tree model for interpreting mediastinal LNs. All LNs with nodular calcification were benign, and LNs with high LN SUVmax and high LN/AA were metastatic. Further studies are needed to incorporate subjective parameters and pathologic evaluations into a decision tree model to improve the test performance of PET/CT.
Khalkhali, Hamid Reza; Lotfnezhad Afshar, Hadi; Esnaashari, Omid; Jabbari, Nasrollah
2016-01-01
Breast cancer survival has been analyzed by many standard data mining algorithms, some of which belong to the decision tree category. The ability of decision tree algorithms to visualize and formulate hidden patterns among study variables was the main reason for applying an algorithm from this category in the current study, which has not been done previously. The classification and regression trees (CART) algorithm was applied to a breast cancer database containing information on 569 patients from 2007-2010. The Gini impurity measure, which is used for categorical target variables, was utilized. The classification error, which is a function of tree size, was measured by 10-fold cross-validation experiments. The performance of the created model was evaluated by accuracy, sensitivity and specificity. The CART model produced a decision tree with 17 nodes, 9 of which were associated with a set of rules. The rules were clinically meaningful. They showed, in if-then format, that Stage was the most important variable for predicting breast cancer survival. The accuracy, sensitivity and specificity were 80.3%, 93.5% and 53%, respectively. The current study's model, the first one created by CART, was able to extract useful hidden rules from a relatively small dataset.
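The Gini impurity used as the split criterion here can be illustrated with a short sketch. The function names and the toy labels are hypothetical; this shows the standard definition (one minus the sum of squared class proportions), not the software used in the study.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def weighted_gini(left, right):
    """Size-weighted impurity of the two children of a candidate binary split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

# Toy labels for illustration only; not data from the study.
parent = ["survived", "survived", "died", "died"]
print(gini_impurity(parent))              # impurity before the split
print(weighted_gini(parent[:2], parent[2:]))  # impurity after a perfect split
```

A CART-style learner would evaluate `weighted_gini` for each candidate split and keep the one that lowers impurity the most relative to the parent node.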
A universal hybrid decision tree classifier design for human activity classification.
Chien, Chieh; Pottie, Gregory J
2012-01-01
A system that reliably classifies daily life activities can contribute to more effective and economical treatments for patients with chronic conditions or undergoing rehabilitative therapy. We propose a universal hybrid decision tree classifier for this purpose. The tree classifier can flexibly implement different decision rules at its internal nodes, and can be adapted from a population-based model when supplemented by training data for individuals. The system was tested using seven subjects each monitored by 14 triaxial accelerometers. Each subject performed fourteen different activities typical of daily life. Using leave-one-out cross validation, our decision tree produced average classification accuracies of 89.9%. In contrast, the MATLAB personalized tree classifiers using Gini's diversity index as the split criterion followed by optimally tuning the thresholds for each subject yielded 69.2%.
Prescriptive models to support decision making in genetics.
Pauker, S G; Pauker, S P
1987-01-01
Formal prescriptive models can help patients and clinicians better understand the risks and uncertainties they face and better formulate well-reasoned decisions. Using Bayes rule, the clinician can interpret pedigrees, historical data, physical findings and laboratory data, providing individualized probabilities of various diagnoses and outcomes of pregnancy. With the advent of screening programs for genetic disease, it becomes increasingly important to consider the prior probabilities of disease when interpreting an abnormal screening test result. Decision trees provide a convenient formalism for structuring diagnostic, therapeutic and reproductive decisions; such trees can also enhance communication between clinicians and patients. Utility theory provides a mechanism for patients to understand the choices they face and to communicate their attitudes about potential reproductive outcomes in a manner which encourages the integration of those attitudes into appropriate decisions. Using a decision tree, the relevant probabilities and the patients' utilities, physicians can estimate the relative worth of various medical and reproductive options by calculating the expected utility of each. By performing relevant sensitivity analyses, clinicians and patients can understand the impact of various soft data, including the patients' attitudes toward various health outcomes, on the decision making process. Formal clinical decision analytic models can provide deeper understanding and improved decision making in clinical genetics.
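The two calculations this abstract describes, updating a prior probability with Bayes' rule and ranking options by expected utility, can be sketched as follows. All names, probabilities and utilities below are hypothetical values chosen for the example; none come from the paper.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) from Bayes' rule."""
    true_pos = prior * sensitivity
    false_pos = (1.0 - prior) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)

def expected_utility(branches):
    """Expected utility of one option; branches is a list of (probability, utility)."""
    total_p = sum(p for p, _ in branches)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    return sum(p * u for p, u in branches)

# Hypothetical screening test applied to a low-prevalence condition.
p_disease = posterior(prior=0.01, sensitivity=0.90, specificity=0.95)

# Hypothetical options with utilities on a 0-1 scale elicited from the patient.
options = {
    "option_a": [(0.2, 0.0), (0.8, 1.0)],
    "option_b": [(1.0, 0.7)],
}
utilities = {name: expected_utility(b) for name, b in options.items()}
best = max(utilities, key=utilities.get)
print(p_disease, utilities, best)
```

With a 1% prior, even a fairly accurate test leaves the posterior probability well below one half, which is why the abstract stresses prior probabilities when interpreting abnormal screening results. A sensitivity analysis would repeat the calculation while varying the probabilities or utilities and check whether `best` changes.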
Vergara, Pablo M.; Soto, Gerardo E.; Rodewald, Amanda D.; Meneses, Luis O.; Pérez-Hernández, Christian G.
2016-01-01
Theoretical models predict that animals should make foraging decisions after assessing the quality of available habitat, but most models fail to consider the spatio-temporal scales at which animals perceive habitat availability. We tested three foraging strategies that explain how Magellanic woodpeckers (Campephilus magellanicus) assess the relative quality of trees: 1) Woodpeckers with local knowledge select trees based on the available trees in the immediate vicinity. 2) Woodpeckers lacking local knowledge select trees based on their availability at previously visited locations. 3) Woodpeckers using information from long-term memory select trees based on knowledge about trees available within the entire landscape. We observed foraging woodpeckers and used a Brownian Bridge Movement Model to identify trees available to woodpeckers along foraging routes. Woodpeckers selected trees with a later decay stage than available trees. Selection models indicated that preferences of Magellanic woodpeckers were based on clusters of trees near the most recently visited trees, thus suggesting that woodpeckers use visual cues from neighboring trees. In a second analysis, Cox’s proportional hazards models showed that woodpeckers used information consolidated across broader spatial scales to adjust tree residence times. Specifically, woodpeckers spent more time at trees with larger diameters and in a more advanced stage of decay than trees available along their routes. These results suggest that Magellanic woodpeckers make foraging decisions based on the relative quality of trees that they perceive and memorize information at different spatio-temporal scales. PMID:27416115
Vergara, Pablo M; Soto, Gerardo E; Moreira-Arce, Darío; Rodewald, Amanda D; Meneses, Luis O; Pérez-Hernández, Christian G
2016-01-01
Theoretical models predict that animals should make foraging decisions after assessing the quality of available habitat, but most models fail to consider the spatio-temporal scales at which animals perceive habitat availability. We tested three foraging strategies that explain how Magellanic woodpeckers (Campephilus magellanicus) assess the relative quality of trees: 1) Woodpeckers with local knowledge select trees based on the available trees in the immediate vicinity. 2) Woodpeckers lacking local knowledge select trees based on their availability at previously visited locations. 3) Woodpeckers using information from long-term memory select trees based on knowledge about trees available within the entire landscape. We observed foraging woodpeckers and used a Brownian Bridge Movement Model to identify trees available to woodpeckers along foraging routes. Woodpeckers selected trees with a later decay stage than available trees. Selection models indicated that preferences of Magellanic woodpeckers were based on clusters of trees near the most recently visited trees, thus suggesting that woodpeckers use visual cues from neighboring trees. In a second analysis, Cox's proportional hazards models showed that woodpeckers used information consolidated across broader spatial scales to adjust tree residence times. Specifically, woodpeckers spent more time at trees with larger diameters and in a more advanced stage of decay than trees available along their routes. These results suggest that Magellanic woodpeckers make foraging decisions based on the relative quality of trees that they perceive and memorize information at different spatio-temporal scales.
A dynamic fault tree model of a propulsion system
NASA Technical Reports Server (NTRS)
Xu, Hong; Dugan, Joanne Bechta; Meshkat, Leila
2006-01-01
We present a dynamic fault tree model of the benchmark propulsion system, and solve it using Galileo. Dynamic fault trees (DFT) extend traditional static fault trees with special gates to model spares and other sequence dependencies. Galileo solves DFT models using a judicious combination of automatically generated Markov and Binary Decision Diagram models. Galileo easily handles the complexities exhibited by the benchmark problem. In particular, Galileo is designed to model phased mission systems.
Structural Equation Model Trees
ERIC Educational Resources Information Center
Brandmaier, Andreas M.; von Oertzen, Timo; McArdle, John J.; Lindenberger, Ulman
2013-01-01
In the behavioral and social sciences, structural equation models (SEMs) have become widely accepted as a modeling tool for the relation between latent and observed variables. SEMs can be seen as a unification of several multivariate analysis techniques. SEM Trees combine the strengths of SEMs and the decision tree paradigm by building tree…
The application of data mining techniques to oral cancer prognosis.
Tseng, Wan-Ting; Chiang, Wei-Fan; Liu, Shyun-Yeu; Roan, Jinsheng; Lin, Chun-Nan
2015-05-01
This study adopted an integrated procedure that combines the clustering and classification features of data mining technology to determine the differences between the symptoms shown in past cases where patients died from or survived oral cancer. Two data mining tools, namely decision tree and artificial neural network, were used to analyze the historical cases of oral cancer, and their performance was compared with that of logistic regression, the popular statistical analysis tool. Both decision tree and artificial neural network models showed superiority to the traditional statistical model. However, for clinicians, the trees created by the decision tree models are easier to interpret than the artificial neural network models. Cluster analysis also showed that stage 4 patients who also possess the following four characteristics have an extremely low survival rate: pN is N2b, level of RLNM is level I-III, AJCC-T is T4, and cell mutation situation (G) is moderate.
Phan, Thanh G; Chen, Jian; Singhal, Shaloo; Ma, Henry; Clissold, Benjamin B; Ly, John; Beare, Richard
2018-01-01
Prognostication following hypoxic ischemic encephalopathy (brain injury) is important for clinical management. The aim of this exploratory study is to use a decision tree model to find clinical and MRI associates of severe disability and death in this condition. We evaluated a clinical model and then the added value of MRI data. The inclusion criteria were as follows: age ≥17 years, cardio-respiratory arrest, and coma on admission (2003-2011). Decision tree analysis was used to find clinical [Glasgow Coma Score (GCS), features about cardiac arrest, therapeutic hypothermia, age, and sex] and MRI (infarct volume) associates of severe disability and death. We used the area under the ROC (auROC) to determine the accuracy of the model. There were 41 patients (63.7% males) who had MRI, with an average age of 51.5 ± 18.9 years. The decision trees showed that infarct volume and age were important factors for discrimination between mild to moderate disability and severe disability and death at day 0 and day 2. The auROC for this model was 0.94 (95% CI 0.82-1.00). At day 7, GCS value was the only predictor; the auROC was 0.96 (95% CI 0.86-1.00). Our findings provide proof of concept for further exploration of the role of MR imaging and decision tree analysis in the early prognostication of hypoxic ischemic brain injury.
Tayefi, Maryam; Tajfard, Mohammad; Saffar, Sara; Hanachi, Parichehr; Amirabadizadeh, Ali Reza; Esmaeily, Habibollah; Taghipour, Ali; Ferns, Gordon A; Moohebati, Mohsen; Ghayour-Mobarhan, Majid
2017-04-01
Coronary heart disease (CHD) is an important public health problem globally. Algorithms incorporating the assessment of clinical biomarkers together with several established traditional risk factors can help clinicians to predict CHD and support clinical decision making with respect to interventions. Decision tree (DT) is a data mining model for extracting hidden knowledge from large databases. We aimed to establish a predictive model for coronary heart disease using a decision tree algorithm. Here we used a dataset of 2346 individuals including 1159 healthy participants and 1187 participants who had undergone coronary angiography (405 participants with negative angiography and 782 participants with positive angiography). We entered 10 of a total of 12 variables into the DT algorithm (including age, sex, FBG, TG, hs-CRP, TC, HDL, LDL, SBP and DBP). Our model could identify the associated risk factors of CHD with sensitivity, specificity and accuracy of 96%, 87% and 94%, respectively. Serum hs-CRP level was at the top of the tree in our model, followed by FBG, gender and age. Our model appears to be an accurate, specific and sensitive model for identifying the presence of CHD, but will require validation in prospective studies. Copyright © 2017 Elsevier B.V. All rights reserved.
Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine
NASA Technical Reports Server (NTRS)
Schwabacher, Mark A.; Aguilar, Robert; Figueroa, Fernando F.
2009-01-01
The goal of this work was to use data-driven methods to automatically detect and isolate faults in the J-2X rocket engine. It was decided to use decision trees, since they tend to be easier to interpret than other data-driven methods. The decision tree algorithm automatically "learns" a decision tree by performing a search through the space of possible decision trees to find one that fits the training data. The particular decision tree algorithm used is known as C4.5. Simulated J-2X data from a high-fidelity simulator developed at Pratt & Whitney Rocketdyne and known as the Detailed Real-Time Model (DRTM) was used to "train" and test the decision tree. Fifty-six DRTM simulations were performed for this purpose, with different leak sizes, different leak locations, and different times of leak onset. To make the simulations as realistic as possible, they included simulated sensor noise, and included a gradual degradation in both fuel and oxidizer turbine efficiency. A decision tree was trained using 11 of these simulations, and tested using the remaining 45 simulations. In the training phase, the C4.5 algorithm was provided with labeled examples of data from nominal operation and data including leaks in each leak location. From the data, it "learned" a decision tree that can classify unseen data as having no leak or having a leak in one of the five leak locations. In the test phase, the decision tree produced very low false alarm rates and low missed detection rates on the unseen data. It had very good fault isolation rates for three of the five simulated leak locations, but it tended to confuse the remaining two locations, perhaps because a large leak at one of these two locations can look very similar to a small leak at the other location.
Fang, H; Lu, B; Wang, X; Zheng, L; Sun, K; Cai, W
2017-08-17
This study proposed a decision tree model to screen upper urinary tract damage (UUTD) for patients with neurogenic bladder (NGB). Thirty-four NGB patients with UUTD were recruited in the case group, while 78 without UUTD were included in the control group. A decision tree method, classification and regression tree (CART), was then applied to develop the model in which UUTD was used as a dependent variable and history of urinary tract infections, bladder management, conservative treatment, and urodynamic findings were used as independent variables. The urethra function factor was found to be the primary screening information of patients and treated as the root node of the tree; Pabd max (maximum abdominal pressure, >14 cmH2O), Pves max (maximum intravesical pressure, ≤89 cmH2O), and gender (female) were also variables associated with UUTD. The accuracy of the proposed model was 84.8%, and the area under curve was 0.901 (95%CI=0.844-0.958), suggesting that the decision tree model might provide a new and convenient way to screen UUTD for NGB patients in both undeveloped and developing areas.
NASA Technical Reports Server (NTRS)
Tian, Jianhui; Porter, Adam; Zelkowitz, Marvin V.
1992-01-01
Identification of high cost modules has been viewed as one mechanism to improve overall system reliability, since such modules tend to produce more than their share of problems. A decision tree model was used to identify such modules. In this current paper, a previously developed axiomatic model of program complexity is merged with the previously developed decision tree process for an improvement in the ability to identify such modules. This improvement was tested using data from the NASA Software Engineering Laboratory.
Khosravi, Khabat; Pham, Binh Thai; Chapi, Kamran; Shirzadi, Ataollah; Shahabi, Himan; Revhaug, Inge; Prakash, Indra; Tien Bui, Dieu
2018-06-15
Floods are one of the most damaging natural hazards, causing huge loss of property, infrastructure and lives. Prediction of flash flood locations is very difficult due to sudden changes in climatic conditions and manmade factors. However, prior identification of flood susceptible areas can be done with the help of machine learning techniques for proper and timely management of flood hazards. In this study, we tested four decision tree-based machine learning models, namely Logistic Model Trees (LMT), Reduced Error Pruning Trees (REPT), Naïve Bayes Trees (NBT), and Alternating Decision Trees (ADT), for flash flood susceptibility mapping at the Haraz Watershed in the northern part of Iran. For this, a spatial database was constructed with 201 present and past flood locations and eleven flood-influencing factors, namely ground slope, altitude, curvature, Stream Power Index (SPI), Topographic Wetness Index (TWI), land use, rainfall, river density, distance from river, lithology, and Normalized Difference Vegetation Index (NDVI). Statistical evaluation measures, the Receiver Operating Characteristic (ROC) curve, and Friedman and Wilcoxon signed-rank tests were used to validate and compare the prediction capability of the models. Results show that the ADT model has the highest prediction capability for flash flood susceptibility assessment, followed by the NBT, the LMT, and the REPT, respectively. These techniques have proven successful in quickly determining flood susceptible areas. Copyright © 2018 Elsevier B.V. All rights reserved.
Batterham, Philip J; Christensen, Helen; Mackinnon, Andrew J
2009-11-22
Relative to physical health conditions such as cardiovascular disease, little is known about risk factors that predict the prevalence of depression. The present study investigates the expected effects of a reduction of these risks over time, using the decision tree method favoured in assessing cardiovascular disease risk. The PATH through Life cohort was used for the study, comprising 2,105 20-24 year olds, 2,323 40-44 year olds and 2,177 60-64 year olds sampled from the community in the Canberra region, Australia. A decision tree methodology was used to predict the presence of major depressive disorder after four years of follow-up. The decision tree was compared with a logistic regression analysis using ROC curves. The decision tree was found to distinguish and delineate a wide range of risk profiles. Previous depressive symptoms were most highly predictive of depression after four years, however, modifiable risk factors such as substance use and employment status played significant roles in assessing the risk of depression. The decision tree was found to have better sensitivity and specificity than a logistic regression using identical predictors. The decision tree method was useful in assessing the risk of major depressive disorder over four years. Application of the model to the development of a predictive tool for tailored interventions is discussed.
Evolving optimised decision rules for intrusion detection using particle swarm paradigm
NASA Astrophysics Data System (ADS)
Sivatha Sindhu, Siva S.; Geetha, S.; Kannan, A.
2012-12-01
The aim of this article is to construct a practical intrusion detection system (IDS) that properly analyses the statistics of network traffic patterns and classifies them as normal or anomalous. The objective of this article is to show that the choice of effective network traffic features and a proficient machine-learning paradigm enhances the detection accuracy of an IDS. In this article, a rule-based approach is introduced that uses a family of six decision tree classifiers, namely Decision Stump, C4.5, Naive Bayes Tree, Random Forest, Random Tree and Representative Tree model, to detect anomalous network patterns. In particular, the proposed swarm optimisation-based approach selects the instances that compose the training set, and the optimised decision trees operate over this training set, producing classification rules with improved coverage, classification capability and generalisation ability. Experiments with the Knowledge Discovery and Data mining (KDD) data set, which has information on traffic patterns during normal and intrusive behaviour, show that the proposed algorithm produces optimised decision rules and outperforms other machine-learning algorithms.
Hostettler, Isabel Charlotte; Muroi, Carl; Richter, Johannes Konstantin; Schmid, Josef; Neidert, Marian Christoph; Seule, Martin; Boss, Oliver; Pangalu, Athina; Germans, Menno Robbert; Keller, Emanuela
2018-01-19
OBJECTIVE The aim of this study was to create prediction models for outcome parameters by decision tree analysis based on clinical and laboratory data in patients with aneurysmal subarachnoid hemorrhage (aSAH). METHODS The database consisted of clinical and laboratory parameters of 548 patients with aSAH who were admitted to the Neurocritical Care Unit, University Hospital Zurich. To examine the model performance, the cohort was randomly divided into a derivation cohort (60% [n = 329]; training data set) and a validation cohort (40% [n = 219]; test data set). The classification and regression tree prediction algorithm was applied to predict death, functional outcome, and ventriculoperitoneal (VP) shunt dependency. Chi-square automatic interaction detection was applied to predict delayed cerebral infarction on days 1, 3, and 7. RESULTS The overall mortality was 18.4%. The accuracy of the decision tree models was good for survival on day 1 and favorable functional outcome at all time points, with a difference between the training and test data sets of < 5%. Prediction accuracy for survival on day 1 was 75.2%. The most important differentiating factor was the interleukin-6 (IL-6) level on day 1. Favorable functional outcome, defined as Glasgow Outcome Scale scores of 4 and 5, was observed in 68.6% of patients. Favorable functional outcome at all time points had a prediction accuracy of 71.1% in the training data set, with procalcitonin on day 1 being the most important differentiating factor at all time points. A total of 148 patients (27%) developed VP shunt dependency. The most important differentiating factor was hyperglycemia on admission. CONCLUSIONS The multiple variable analysis capability of decision trees enables exploration of dependent variables in the context of multiple changing influences over the course of an illness. 
The decision tree currently generated increases awareness of the early systemic stress response, which is seemingly pertinent for prognostication.
Modeling time-to-event (survival) data using classification tree analysis.
Linden, Ariel; Yarnold, Paul R
2017-12-01
Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. © 2017 John Wiley & Sons, Ltd.
Comparison of Taxi Time Prediction Performance Using Different Taxi Speed Decision Trees
NASA Technical Reports Server (NTRS)
Lee, Hanbong
2017-01-01
In the STBO modeler and tactical surface scheduler for ATD-2 project, taxi speed decision trees are used to calculate the unimpeded taxi times of flights taxiing on the airport surface. The initial taxi speed values in these decision trees did not show good prediction accuracy of taxi times. Using the more recent, reliable surveillance data, new taxi speed values in ramp area and movement area were computed. Before integrating these values into the STBO system, we performed test runs using live data from Charlotte airport, with different taxi speed settings: 1) initial taxi speed values and 2) new ones. Taxi time prediction performance was evaluated by comparing various metrics. The results show that the new taxi speed decision trees can calculate the unimpeded taxi-out times more accurately.
Using Decision Trees for Estimating Mode Choice of Trips in Buca-Izmir
NASA Astrophysics Data System (ADS)
Oral, L. O.; Tecim, V.
2013-05-01
Decision makers develop transportation plans and models for providing sustainable transport systems in urban areas. Mode choice is one of the stages in transportation modelling. Data mining techniques can discover factors affecting the mode choice. These techniques can be applied with a knowledge process approach. In this study, a data mining process model is applied to determine the factors affecting the mode choice with decision tree techniques by considering individual trip behaviours from household survey data collected within the Izmir Transportation Master Plan. From this perspective, the transport mode choice problem is solved for a case in the district of Buca-Izmir, Turkey, with the CRISP-DM knowledge process model.
Type 2 Diabetes Mellitus Screening and Risk Factors Using Decision Tree: Results of Data Mining.
Habibi, Shafi; Ahmadi, Maryam; Alizadeh, Somayeh
2015-03-18
The aim of this study was to examine a predictive model using features related to the diabetes type 2 risk factors. The data were obtained from a database in a diabetes control system in Tabriz, Iran. The data included all people referred for diabetes screening between 2009 and 2011. The features considered as "Inputs" were: age, sex, systolic and diastolic blood pressure, family history of diabetes, and body mass index (BMI). Moreover, we used diagnosis as "Class". We applied the "Decision Tree" technique and "J48" algorithm in the WEKA (3.6.10 version) software to develop the model. After data preprocessing and preparation, we used 22,398 records for data mining. The model precision to identify patients was 0.717. The age factor was placed in the root node of the tree as a result of higher information gain. The ROC curve indicates the model function in identification of patients and those individuals who are healthy. The curve indicates high capability of the model, especially in identification of the healthy persons. We developed a model using the decision tree for screening T2DM which did not require laboratory tests for T2DM diagnosis.
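The statement that age reached the root node because of its higher information gain can be illustrated with a small sketch of entropy and information gain. The function names and toy labels are hypothetical. Note also that the C4.5 family, to which J48 belongs, normalises this quantity into a gain ratio, so plain information gain is shown here only as the underlying idea.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(labels, groups):
    """Entropy reduction when `labels` is partitioned into `groups` by a feature."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder

# Toy screening outcomes for illustration only; not data from the study.
labels = ["pos", "pos", "neg", "neg"]
split_by_feature_a = [["pos", "pos"], ["neg", "neg"]]  # separates the classes
split_by_feature_b = [["pos", "neg"], ["pos", "neg"]]  # tells us nothing
print(information_gain(labels, split_by_feature_a))
print(information_gain(labels, split_by_feature_b))
```

The feature whose split yields the larger gain is the one a tree learner of this kind would place nearer the root.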
Ultrasonographic Diagnosis of Biliary Atresia Based on a Decision-Making Tree Model.
Lee, So Mi; Cheon, Jung-Eun; Choi, Young Hun; Kim, Woo Sun; Cho, Hyun-Hae; Cho, Hyun-Hye; Kim, In-One; You, Sun Kyoung
2015-01-01
To assess the diagnostic value of various ultrasound (US) findings and to make a decision-tree model for US diagnosis of biliary atresia (BA). From March 2008 to January 2014, the following US findings were retrospectively evaluated in 100 infants with cholestatic jaundice (BA, n = 46; non-BA, n = 54): length and morphology of the gallbladder, triangular cord thickness, hepatic artery and portal vein diameters, and visualization of the common bile duct. Logistic regression analyses were performed to determine the features that would be useful in predicting BA. Conditional inference tree analysis was used to generate a decision-making tree for classifying patients into the BA or non-BA groups. Multivariate logistic regression analysis showed that abnormal gallbladder morphology and greater triangular cord thickness were significant predictors of BA (p = 0.003 and 0.001; adjusted odds ratio: 345.6 and 65.6, respectively). In the decision-making tree using conditional inference tree analysis, gallbladder morphology and triangular cord thickness (optimal cutoff value of triangular cord thickness, 3.4 mm) were also selected as significant discriminators for differential diagnosis of BA, and gallbladder morphology was the first discriminator. The diagnostic performance of the decision-making tree was excellent, with sensitivity of 100% (46/46), specificity of 94.4% (51/54), and overall accuracy of 97% (97/100). Abnormal gallbladder morphology and greater triangular cord thickness (> 3.4 mm) were the most useful predictors of BA on US. We suggest that the gallbladder morphology should be evaluated first and that triangular cord thickness should be evaluated subsequently in cases with normal gallbladder morphology.
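The reported performance figures follow directly from the counts given in the abstract (46/46 BA cases detected, 51/54 non-BA cases correctly excluded). A minimal sketch, with a hypothetical function name, that reproduces them:

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }

# Counts implied by the abstract: 46 of 46 BA and 51 of 54 non-BA classified correctly.
m = diagnostic_metrics(tp=46, fn=0, tn=51, fp=3)
for name, value in m.items():
    print(f"{name}: {100 * value:.1f}%")
```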
Learning accurate very fast decision trees from uncertain data streams
NASA Astrophysics Data System (ADS)
Liang, Chunquan; Zhang, Yang; Shi, Peng; Hu, Zhengguo
2015-12-01
Most existing works on data stream classification assume the streaming data is precise and definite. Such an assumption, however, does not always hold in practice, since data uncertainty is ubiquitous in data stream applications due to imprecise measurement, missing values, privacy protection, etc. The goal of this paper is to learn accurate decision tree models from uncertain data streams for classification analysis. On the basis of very fast decision tree (VFDT) algorithms, we propose an algorithm for constructing an uncertain VFDT tree with classifiers at tree leaves (uVFDTc). The uVFDTc algorithm can exploit uncertain information effectively and efficiently in both the learning and the classification phases. In the learning phase, it uses Hoeffding bound theory to learn from uncertain data streams and yield fast and reasonable decision trees. In the classification phase, it uses uncertain naive Bayes (UNB) classifiers at tree leaves to improve the classification performance. Experimental results on both synthetic and real-life datasets demonstrate the strong ability of uVFDTc to classify uncertain data streams. The use of UNB at tree leaves has improved the performance of uVFDTc, especially the any-time property, the benefit of exploiting uncertain information, and the robustness against uncertainty.
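The Hoeffding bound mentioned in this abstract can be illustrated with the split test used in standard VFDT learners. This sketch covers only that standard test for precise data; how uVFDTc adapts it to uncertain data is not described in the abstract, and the function names are hypothetical.

```python
from math import log, sqrt

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a variable
    with range `value_range` lies within epsilon of its mean over n observations."""
    return sqrt(value_range ** 2 * log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

eps_1000 = hoeffding_bound(value_range=1.0, delta=1e-7, n=1000)
```

The bound shrinks as more examples arrive at a leaf, so a gap in split quality that is not yet decisive after 100 examples can become decisive after 1000.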
Ramezankhani, Azra; Pournik, Omid; Shahrabi, Jamal; Khalili, Davood; Azizi, Fereidoun; Hadaegh, Farzad
2014-09-01
The aim of this study was to create a prediction model using a data mining approach to identify individuals at low risk of incident type 2 diabetes, using the Tehran Lipid and Glucose Study (TLGS) database. For a population of 6647 people without diabetes, aged ≥20 years and followed for 12 years, a prediction model was developed using classification by the decision tree technique. Seven hundred and twenty-nine (11%) diabetes cases occurred during the follow-up. Predictor variables were selected from demographic characteristics, smoking status, medical and drug history and laboratory measures. We developed the predictive models by decision tree using 60 input variables and one output variable. The overall classification accuracy was 90.5%, with 31.1% sensitivity and 97.9% specificity; for the subjects without diabetes, precision and f-measure were 92% and 0.95, respectively. The identified variables included fasting plasma glucose, body mass index, triglycerides, mean arterial blood pressure, family history of diabetes, educational level and job status. In conclusion, decision tree analysis, using routine demographic, clinical, anthropometric and laboratory measurements, created a simple tool to predict individuals at low risk for type 2 diabetes.
Esmaily, Habibollah; Tayefi, Maryam; Doosti, Hassan; Ghayour-Mobarhan, Majid; Nezami, Hossein; Amirabadizadeh, Alireza
2018-04-24
We aimed to identify the risk factors associated with type 2 diabetes mellitus (T2DM) using a data mining approach, with decision tree and random forest techniques, in the Mashhad Stroke and Heart Atherosclerotic Disorders (MASHAD) Study program. This was a cross-sectional study. The MASHAD study started in 2010 and will continue until 2020. Two data mining tools, decision trees and random forests, were used to predict T2DM from other observed characteristics of 9528 subjects recruited from the MASHAD database. This paper compares the two models in terms of accuracy, sensitivity, specificity and the area under the ROC curve. The prevalence of T2DM was 14% among these subjects. The decision tree model had 64.9% accuracy, 64.5% sensitivity, 66.8% specificity, and an area under the ROC curve of 68.6%, while the random forest model had 71.1% accuracy, 71.3% sensitivity, 69.9% specificity, and an area under the ROC curve of 77.3%. The random forest model, when used with demographic, clinical, anthropometric and biochemical measurements, can provide a simple tool to identify risk factors associated with type 2 diabetes. Such identification can be of substantial use in shaping health policy to reduce the number of subjects with T2DM.
NASA Astrophysics Data System (ADS)
Ragettli, S.; Zhou, J.; Wang, H.; Liu, C.
2017-12-01
Flash floods in small mountain catchments are one of the most frequent causes of loss of life and property from natural hazards in China. Hydrological models can be a useful tool for anticipating these events and issuing timely warnings. Since sub-daily streamflow information is unavailable for most small basins in China, one of the main challenges is finding appropriate parameter values for simulating flash floods in ungauged catchments. In this study, we use decision tree learning to explore parameter set transferability between different catchments. For this purpose, the physically-based, semi-distributed rainfall-runoff model PRMS-OMS is set up for 35 catchments in ten Chinese provinces. Hourly data from more than 800 storm runoff events are used to calibrate the model and evaluate the performance of parameter set transfers between catchments. For each catchment, 58 catchment attributes are extracted from several data sets available for the whole of China. We then use a data mining technique (decision tree learning) to identify catchment similarities that can be related to good transfer performance. Finally, we use the splitting rules of decision trees to find suitable donor catchments for ungauged target catchments. We show that decision tree learning makes optimal use of the information content of available catchment descriptors and outperforms regionalization based on a conventional measure of physiographic-climatic similarity by 15%-20%. Similar performance can be achieved with a regionalization method based on spatial proximity, but decision trees offer flexible rules for selecting suitable donor catchments that do not rely on the vicinity of gauged catchments. This flexibility makes the method particularly suitable for implementation in sparsely gauged environments. We evaluate the probability of detecting flood events exceeding a given return period, considering measured discharge and PRMS-OMS simulated flows with regionalized parameters. Overall, the probability of detection of an event with a return period of 10 years is 62%, and 44% of all 10-year flood peaks can be detected with a timing error of 2 hours or less. These results indicate that the modeling system can provide useful information about the timing and magnitude of flood events at ungauged sites.
Automated diagnosis of coronary artery disease based on data mining and fuzzy modeling.
Tsipouras, Markos G; Exarchos, Themis P; Fotiadis, Dimitrios I; Kotsia, Anna P; Vakalis, Konstantinos V; Naka, Katerina K; Michalis, Lampros K
2008-07-01
A fuzzy rule-based decision support system (DSS) is presented for the diagnosis of coronary artery disease (CAD). The system is automatically generated from an initial annotated dataset, using a four stage methodology: 1) induction of a decision tree from the data; 2) extraction of a set of rules from the decision tree, in disjunctive normal form and formulation of a crisp model; 3) transformation of the crisp set of rules into a fuzzy model; and 4) optimization of the parameters of the fuzzy model. The dataset used for the DSS generation and evaluation consists of 199 subjects, each one characterized by 19 features, including demographic and history data, as well as laboratory examinations. Tenfold cross validation is employed, and the average sensitivity and specificity obtained is 62% and 54%, respectively, using the set of rules extracted from the decision tree (first and second stages), while the average sensitivity and specificity increase to 80% and 65%, respectively, when the fuzzification and optimization stages are used. The system offers several advantages since it is automatically generated, it provides CAD diagnosis based on easily and noninvasively acquired features, and is able to provide interpretation for the decisions made.
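Stage 3 of the methodology, turning a crisp rule into a fuzzy one, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the sigmoid membership function, the use of min as the fuzzy AND, and all names and parameter values are choices made here for the example.

```python
from math import exp

def crisp_greater(x, threshold):
    """Crisp test from a decision tree branch: 1.0 if x > threshold, else 0.0."""
    return 1.0 if x > threshold else 0.0

def fuzzy_greater(x, threshold, steepness):
    """Sigmoid membership: a smooth degree in (0, 1) to which x exceeds threshold.
    `steepness` is a tunable parameter of the kind a stage-4 optimizer could adjust."""
    return 1.0 / (1.0 + exp(-steepness * (x - threshold)))

def fuzzy_and(*degrees):
    """Conjunction of rule conditions, using min as the t-norm (an assumption)."""
    return min(degrees)

# Hypothetical rule "age > 60 AND systolic pressure > 140", evaluated for one subject.
degree = fuzzy_and(fuzzy_greater(62, 60, 0.5), fuzzy_greater(150, 140, 0.2))
```

Unlike the crisp rule, the fuzzy rule responds gradually near the thresholds, which gives the optimization stage continuous parameters to tune.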
PCA based feature reduction to improve the accuracy of decision tree c4.5 classification
NASA Astrophysics Data System (ADS)
Nasution, M. Z. F.; Sitompul, O. S.; Ramli, M.
2018-03-01
Attribute splitting is a major process in Decision Tree C4.5 classification. However, this process has no significant impact on building the decision tree in terms of removing irrelevant features. A major problem in decision tree classification is over-fitting, which results from noisy data and irrelevant features. In turn, over-fitting leads to misclassification and data imbalance. Many algorithms have been proposed to overcome misclassification and over-fitting in Decision Tree C4.5 classification. Feature reduction is one of the important issues in classification modeling; it is intended to remove irrelevant data in order to improve accuracy. A feature reduction framework is used to simplify high-dimensional data into low-dimensional data with non-correlated attributes. In this research, we propose a framework for selecting relevant and non-correlated feature subsets. We use principal component analysis (PCA) for feature reduction to perform non-correlated feature selection and the Decision Tree C4.5 algorithm for classification. In experiments on the Cervical cancer data set from the UCI repository, with 858 instances and 36 attributes, we evaluated the performance of our framework in terms of accuracy, specificity and precision. Experimental results show that the proposed framework robustly enhances classification accuracy, reaching an accuracy of 90.70%.
Cost-effectiveness Analysis with Influence Diagrams.
Arias, M; Díez, F J
2015-01-01
Cost-effectiveness analysis (CEA) is used increasingly in medicine to determine whether the health benefit of an intervention is worth the economic cost. Decision trees, the standard decision modeling technique for non-temporal domains, can only perform CEA for very small problems. To develop a method for CEA in problems involving several dozen variables. We explain how to build influence diagrams (IDs) that explicitly represent cost and effectiveness. We propose an algorithm for evaluating cost-effectiveness IDs directly, i.e., without expanding an equivalent decision tree. The evaluation of an ID returns a set of intervals for the willingness to pay - separated by cost-effectiveness thresholds - and, for each interval, the cost, the effectiveness, and the optimal intervention. The algorithm that evaluates the ID directly is in general much more efficient than the brute-force method, which is in turn more efficient than the expansion of an equivalent decision tree. Using OpenMarkov, an open-source software tool that implements this algorithm, we have been able to perform CEAs on several IDs whose equivalent decision trees contain millions of branches. IDs can perform CEA on large problems that cannot be analyzed with decision trees.
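The willingness-to-pay intervals and thresholds described in this abstract can be illustrated for the simplest case of two interventions. This sketch uses the standard incremental cost-effectiveness ratio and net monetary benefit; it is not the influence-diagram algorithm of the paper, and the intervention names and numbers are invented.

```python
def icer(cost_a, eff_a, cost_b, eff_b):
    """Incremental cost-effectiveness ratio of option B relative to option A."""
    return (cost_b - cost_a) / (eff_b - eff_a)

def optimal_intervention(wtp, options):
    """Option with the highest net monetary benefit: wtp * effectiveness - cost.
    `options` maps a name to a (cost, effectiveness) pair."""
    return max(options, key=lambda name: wtp * options[name][1] - options[name][0])

# Invented example: (cost, effectiveness in quality-adjusted life years).
options = {"no_treatment": (0.0, 10.0), "treatment": (20000.0, 11.0)}
threshold = icer(*options["no_treatment"], *options["treatment"])
below = optimal_intervention(10000.0, options)   # willingness to pay under the threshold
above = optimal_intervention(30000.0, options)   # willingness to pay over the threshold
```

The threshold splits the willingness-to-pay axis into two intervals, each with its own optimal intervention, which is the shape of result the abstract describes for the general case.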
Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
As fraudulent financial statements of enterprises become increasingly serious, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study uses logistic regression, support vector machine, and decision tree methods to construct classification models and compares them. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that experienced fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the decision tree C5.0 has the best classification performance, at 85.71%. PMID:25302338
Louis R. Iverson; Anantha M. Prasad; Stephen N. Matthews; Matthew P. Peters
2010-01-01
Climate change will likely cause impacts that are species specific and significant; modeling is critical to better understand potential changes in suitable habitat. We use empirical, abundance-based habitat models utilizing decision tree-based ensemble methods to explore potential changes of 134 tree species habitats in the eastern United States (http://www.nrs.fs.fed....
Zhao, Yang; Zheng, Wei; Zhuo, Daisy Y; Lu, Yuefeng; Ma, Xiwen; Liu, Hengchang; Zeng, Zhen; Laird, Glen
2017-10-11
Personalized medicine, or tailored therapy, has been an active and important topic in recent medical research. Many methods have been proposed in the literature for predictive biomarker detection and subgroup identification. In this article, we propose a novel decision tree-based approach applicable in randomized clinical trials. We model the prognostic effects of the biomarkers using additive regression trees and the biomarker-by-treatment effect using a single regression tree. A Bayesian approach is used to periodically revise the split variables and the split rules of the decision trees, which provides a better overall fit. A Gibbs sampler is implemented in the MCMC procedure, which updates the prognostic trees and the interaction tree separately. We use the posterior distribution of the interaction tree to construct the predictive scores of the biomarkers and to identify the subgroup where the treatment is superior to the control. Numerical simulations show that our proposed method performs well under various settings compared with existing methods. We also demonstrate an application of our method in a real clinical trial.
A multivariate decision tree analysis of biophysical factors in tropical forest fire occurrence
Rey S. Ofren; Edward Harvey
2000-01-01
A multivariate decision tree model was used to quantify the relative importance of complex hierarchical relationships between biophysical variables and the occurrence of tropical forest fires. The study site is the Huai Kha Khaeng wildlife sanctuary, a World Heritage Site in northwestern Thailand where annual fires are common and particularly destructive. Thematic...
Decision tree and PCA-based fault diagnosis of rotating machinery
NASA Astrophysics Data System (ADS)
Sun, Weixiang; Chen, Jin; Li, Jiaqing
2007-04-01
After analysing the flaws of conventional fault diagnosis methods, data mining technology is introduced to the fault diagnosis field, and a new method based on the C4.5 decision tree and principal component analysis (PCA) is proposed. In this method, PCA is used to reduce features after data collection, preprocessing and feature extraction. Then, C4.5 is trained on the samples to generate a decision tree model with diagnosis knowledge. Finally, the tree model is used for diagnosis analysis. To validate the proposed method, six kinds of running states (normal or without any defect, unbalance, rotor radial rub, oil whirl, shaft crack and a simultaneous state of unbalance and radial rub) are simulated on a Bently Rotor Kit RK4 to test the C4.5 and PCA-based method and a back-propagation neural network (BPNN). The results show that the C4.5 and PCA-based diagnosis method has higher accuracy and needs less training time than BPNN.
Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M
2014-12-01
Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out by 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, which also showed a better predictability of the functional inner regression tree model. The model built in the present study exhibits a reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed. © 2014 SETAC.
NASA Technical Reports Server (NTRS)
Lee, Charles; Alena, Richard L.; Robinson, Peter
2004-01-01
Starting from an example of ISS fault trees, we present a method for converting fault trees to decision trees. The method shows that the root cause of a fault becomes easier to visualize and that tree manipulation becomes more programmatic using available decision tree programs. The decision tree visualization for diagnostics is straightforward and easy to understand. For ISS real-time fault diagnostics, the status of the systems can be shown by passing the signals through the trees and seeing where they stop. Another advantage of decision trees is that they can learn fault patterns and predict future faults from historic data. Learning is not limited to static data sets but can also be online: by accumulating real-time data sets, the decision trees can acquire and store fault patterns and recognize them when they recur.
Space/age forestry: Implications of planting density and rotation age in SRIC management decisions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Merriam, R.A.; Phillips, V.D.; Liu, W.
1993-12-31
Short-rotation intensive-culture (SRIC) of promising tree crops is being evaluated worldwide for the production of methanol, ethanol, and electricity from renewable biomass resources. Planting density and rotation age are fundamental management decisions associated with SRIC energy plantations. Most studies of these variables have been conducted without the benefit of a unifying theory of the effects of growing space and rotation age on individual tree growth and stand level productivity. A modeling procedure based on field trials of Eucalyptus spp. is presented that evaluates the growth potential of a tree in the absence and presence of competition of neighboring trees in a stand. The results of this analysis are useful in clarifying economic implications of different growing space and rotation age decisions that tree plantation managers must make. The procedure is readily applicable to other species under consideration for SRIC plantations at any location.
Sharon Hood; Duncan Lutes
2017-01-01
Accurate prediction of fire-caused tree mortality is critical for making sound land management decisions such as developing burning prescriptions and post-fire management guidelines. To improve efforts to predict post-fire tree mortality, we developed 3-year post-fire mortality models for 12 Western conifer species - white fir (Abies concolor [Gord. &...
Rezaei-Darzi, Ehsan; Farzadfar, Farshad; Hashemi-Meshkini, Amir; Navidi, Iman; Mahmoudi, Mahmoud; Varmaghani, Mehdi; Mehdipour, Parinaz; Soudi Alamdari, Mahsa; Tayefi, Batool; Naderimagham, Shohreh; Soleymani, Fatemeh; Mesdaghinia, Alireza; Delavari, Alireza; Mohammad, Kazem
2014-12-01
This study aimed to evaluate and compare the prediction accuracy of two data mining techniques, decision tree and neural network models, in assigning diagnoses to gastrointestinal prescriptions in Iran. The study was conducted in three phases: data preparation, training, and testing. A sample was drawn from a database of 23 million pharmacy insurance claim records from 2004 to 2011; a total of 330 prescriptions were assessed and used to train and test the models simultaneously. In the training phase, the selected prescriptions were assessed separately by a physician and a pharmacist and assigned a diagnosis. To test the performance of each model, k-fold stratified cross validation was conducted in addition to measuring sensitivity and specificity. In general, the two methods had very similar accuracies. Considering the weighted average of true positive rate (sensitivity) and true negative rate (specificity), the decision tree had slightly higher accuracy in correct classification (83.3% and 96% versus 80.3% and 95.1%, respectively). However, when the weighted average of ROC area (AUC between each class and all other classes) was measured, the artificial neural network (ANN) displayed higher accuracy in predicting the diagnosis (93.8% compared with 90.6%). According to the results of this study, the artificial neural network and decision tree models have similar accuracy in assigning diagnoses to GI prescriptions.
Mudali, D; Teune, L K; Renken, R J; Leenders, K L; Roerdink, J B T M
2015-01-01
Medical imaging techniques like fluorodeoxyglucose positron emission tomography (FDG-PET) have been used to aid in the differential diagnosis of neurodegenerative brain diseases. In this study, the objective is to classify FDG-PET brain scans of subjects with Parkinsonian syndromes (Parkinson's disease, multiple system atrophy, and progressive supranuclear palsy) compared to healthy controls. The scaled subprofile model/principal component analysis (SSM/PCA) method was applied to FDG-PET brain image data to obtain covariance patterns and corresponding subject scores. The latter were used as features for supervised classification by the C4.5 decision tree method. Leave-one-out cross validation was applied to determine classifier performance. We carried out a comparison with other types of classifiers. The big advantage of decision tree classification is that the results are easy to understand by humans. A visual representation of decision trees strongly supports the interpretation process, which is very important in the context of medical diagnosis. Further improvements are suggested based on enlarging the number of the training data, enhancing the decision tree method by bagging, and adding additional features based on (f)MRI data.
ERIC Educational Resources Information Center
Chang, Ting-Cheng; Wang, Hui
2016-01-01
This paper proposes a cloud multi-criteria group decision-making model for teacher evaluation in higher education, which involves subjectivity, imprecision and fuzziness. First, selecting the appropriate evaluation index depending on the evaluation objectives, indicating a clear structural relationship between the evaluation index and…
Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin
2016-01-01
Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6–12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. Results: The prevalence of anemia was 12.60% with a range of 3.47%–40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. Conclusions: The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities. PMID:27174328
Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano
2016-07-07
Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation, is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on 100 PET scans with known true contour. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contour to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.
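The Dice similarity coefficient used here to score each segmentation method, and the idea of selecting the method with the highest predicted score, can be sketched briefly. The voxel sets, method names and predicted values below are hypothetical; in ATLAAS the predictions come from the trained decision trees.

```python
def dice(segmented, truth):
    """Dice similarity coefficient between two sets of voxel indices."""
    a, b = set(segmented), set(truth)
    if not a and not b:
        return 1.0  # two empty contours agree trivially
    return 2.0 * len(a & b) / (len(a) + len(b))

# Toy contours as sets of voxel indices.
overlap_score = dice({1, 2, 3, 4}, {3, 4, 5, 6})

# Hypothetical predicted DSC per segmentation method for one tumour.
predicted_dsc = {"method_a": 0.71, "method_b": 0.84, "method_c": 0.79}
chosen = max(predicted_dsc, key=predicted_dsc.get)
```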
A new approach to enhance the performance of decision tree for classifying gene expression data.
Hassan, Md; Kotagiri, Ramamohanarao
2013-12-20
Gene expression data classification is a challenging task due to the large dimensionality and very small number of samples. The decision tree is one of the popular machine learning approaches to address such classification problems. However, the existing decision tree algorithms use a single gene feature at each node to split the data into its child nodes and hence might suffer from poor performance, especially when classifying gene expression datasets. By using a new decision tree algorithm where each node of the tree consists of more than one gene, we enhance the classification performance of traditional decision tree classifiers. Our method selects suitable genes that are combined using a linear function to form a derived composite feature. To determine the structure of the tree we use the area under the Receiver Operating Characteristics curve (AUC). Experimental analysis demonstrates higher classification accuracy using the new decision tree compared to the other existing decision trees in literature. We experimentally compare the effect of our scheme against other well known decision tree techniques. Experiments show that our algorithm can substantially boost the classification performance of the decision tree.
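As a rough illustration of scoring a composite feature by AUC, the sketch below combines two expression values linearly and computes the AUC with the pairwise (Mann-Whitney) definition. The weights, samples and labels are hypothetical, and the abstract does not say how the authors choose genes or weights.

```python
def composite_feature(sample, weights):
    """Linear combination of the expression values in one sample."""
    return sum(w * x for w, x in zip(weights, sample))


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical data: two genes per sample, binary class labels.
samples = [(1.0, 0.2), (0.8, 0.1), (0.3, 0.9), (0.2, 0.7)]
labels = [1, 1, 0, 0]
weights = (1.0, -1.0)  # assumed weights, for illustration only

scores = [composite_feature(s, weights) for s in samples]
print(auc(scores, labels))
```

A candidate split could then be ranked by how far its AUC is from 0.5; that ranking rule is an assumption of this sketch rather than a detail stated in the abstract.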
Automated Decision Tree Classification of Corneal Shape
Twa, Michael D.; Parthasarathy, Srinivasan; Roberts, Cynthia; Mahmoud, Ashraf M.; Raasch, Thomas W.; Bullimore, Mark A.
2011-01-01
Purpose The volume and complexity of data produced during videokeratography examinations present a challenge of interpretation. As a consequence, results are often analyzed qualitatively by subjective pattern recognition or reduced to comparisons of summary indices. We describe the application of decision tree induction, an automated machine learning classification method, to discriminate between normal and keratoconic corneal shapes in an objective and quantitative way. We then compared this method with other known classification methods. Methods The corneal surface was modeled with a seventh-order Zernike polynomial for 132 normal eyes of 92 subjects and 112 eyes of 71 subjects diagnosed with keratoconus. A decision tree classifier was induced using the C4.5 algorithm, and its classification performance was compared with the modified Rabinowitz–McDonnell index, Schwiegerling’s Z3 index (Z3), Keratoconus Prediction Index (KPI), KISA%, and Cone Location and Magnitude Index using recommended classification thresholds for each method. We also evaluated the area under the receiver operator characteristic (ROC) curve for each classification method. Results Our decision tree classifier performed equal to or better than the other classifiers tested: accuracy was 92% and the area under the ROC curve was 0.97. Our decision tree classifier reduced the information needed to distinguish between normal and keratoconus eyes using four of 36 Zernike polynomial coefficients. The four surface features selected as classification attributes by the decision tree method were inferior elevation, greater sagittal depth, oblique toricity, and trefoil. Conclusions Automated decision tree classification of corneal shape through Zernike polynomials is an accurate quantitative method of classification that is interpretable and can be generated from any instrument platform capable of raw elevation data output. 
This method of pattern classification is extendable to other classification problems. PMID:16357645
Kamphuis, C; Mollenhorst, H; Heesterbeek, J A P; Hogeveen, H
2010-08-01
The objective was to develop and validate a clinical mastitis (CM) detection model by means of decision-tree induction. For farmers milking with an automatic milking system (AMS), it is desirable that the detection model has a high level of sensitivity (Se), especially for more severe cases of CM, at a very high specificity (Sp). In addition, an alert for CM should be generated preferably at the quarter milking (QM) at which the CM infection is visible for the first time. Data were collected from 9 Dutch dairy herds milking automatically during a 2.5-yr period. Data included sensor data (electrical conductivity, color, and yield) at the QM level and visual observations of quarters with CM recorded by the farmers. Visual observations of quarters with CM were combined with sensor data of the most recent automatic milking recorded for that same quarter, within a 24-h time window before the visual assessment time. Sensor data of 3.5 million QM were collected, of which 348 QM were combined with a CM observation. Data were divided into a training set, including two-thirds of all data, and a test set. Cows in the training set were not included in the test set and vice versa. A decision-tree model was trained using only clear examples of healthy (n=24,717) or diseased (n=243) QM. The model was tested on 105 QM with CM and a random sample of 50,000 QM without CM. While keeping the Se at a level comparable to that of models currently used by AMS, the decision-tree model was able to decrease the number of false-positive alerts by more than 50%. At an Sp of 99%, 40% of the CM cases were detected. Sixty-four percent of the severe CM cases were detected and only 12.5% of the CM that were scored as watery milk. The Se increased considerably from 40% to 66.7% when the time window increased from less than 24h before the CM observation, to a time window from 24h before to 24h after the CM observation. Even at very wide time windows, however, it was impossible to reach an Se of 100%. 
This indicates the inability to detect all CM cases based on sensor data alone. Sensitivity levels varied largely when the decision tree was validated per herd. This trend was confirmed when decision trees were trained using data from 8 herds and tested on data from the ninth herd. This indicates that when using the decision tree as a generic CM detection model in practice, some herds will continue having difficulties in detecting CM using mastitis alert lists, whereas others will perform well. Copyright (c) 2010 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
An object-oriented forest landscape model and its representation of tree species
Hong S. He; David J. Mladenoff; Joel Boeder
1999-01-01
LANDIS is a forest landscape model that simulates the interaction of large landscape processes and forest successional dynamics at tree species level. We discuss how object-oriented design (OOD) approaches such as modularity, abstraction and encapsulation are integrated into the design of LANDIS. We show that using OOD approaches, model decisions (often as model...
Safety validation of decision trees for hepatocellular carcinoma.
Wang, Xian-Qiang; Liu, Zhe; Lv, Wen-Ping; Luo, Ying; Yang, Guang-Yun; Li, Chong-Hui; Meng, Xiang-Fei; Liu, Yang; Xu, Ke-Sen; Dong, Jia-Hong
2015-08-21
To evaluate a different decision tree for safe liver resection and verify its efficiency. A total of 2457 patients underwent hepatic resection between January 2004 and December 2010 at the Chinese PLA General Hospital, and 634 hepatocellular carcinoma (HCC) patients were eligible for the final analyses. Post-hepatectomy liver failure (PHLF) was identified by the association of prothrombin time < 50% and serum bilirubin > 50 μmol/L (the "50-50" criteria), which were assessed at day 5 postoperatively or later. The Swiss-Clavien decision tree, Tokyo University-Makuuchi decision tree, and Chinese consensus decision tree were adopted to divide patients into two groups based on those decision trees in sequence, and the PHLF rates were recorded. The overall mortality and PHLF rate were 0.16% and 3.0%. A total of 19 patients experienced PHLF. The numbers of patients to whom the Swiss-Clavien, Tokyo University-Makuuchi, and Chinese consensus decision trees were applied were 581, 573, and 622, and the PHLF rates were 2.75%, 2.62%, and 2.73%, respectively. Significantly more cases satisfied the Chinese consensus decision tree than the Swiss-Clavien decision tree and Tokyo University-Makuuchi decision tree (P < 0.01, P < 0.01); nevertheless, the latter two shared no difference (P = 0.147). The PHLF rate exhibited no significant difference with respect to the three decision trees. The Chinese consensus decision tree expands the indications for hepatic resection for HCC patients and does not increase the PHLF rate compared to the Swiss-Clavien and Tokyo University-Makuuchi decision trees. It would be a safe and effective algorithm for hepatectomy in patients with hepatocellular carcinoma.
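The "50-50" criteria quoted above reduce to a simple conjunction, sketched here with hypothetical function and parameter names. The sketch also checks that 19 PHLF cases among 634 patients is consistent with the reported 3.0% rate.

```python
def meets_50_50(prothrombin_time_pct, bilirubin_umol_l, postoperative_day):
    """True when the "50-50" criteria for PHLF are met.

    Requires prothrombin time < 50% and serum bilirubin > 50 umol/L,
    assessed on postoperative day 5 or later.
    """
    return (postoperative_day >= 5
            and prothrombin_time_pct < 50
            and bilirubin_umol_l > 50)


def rate_percent(cases, total):
    """Percentage of cases among total, rounded to one decimal place."""
    return round(100.0 * cases / total, 1)


print(meets_50_50(45, 62, 5))  # both thresholds crossed on day 5
print(meets_50_50(45, 62, 3))  # assessed too early
print(rate_percent(19, 634))   # figures taken from the abstract
```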
Accuracy and Calibration of Computational Approaches for Inpatient Mortality Predictive Modeling.
Nakas, Christos T; Schütz, Narayan; Werners, Marcus; Leichtle, Alexander B
2016-01-01
Electronic Health Record (EHR) data can be a key resource for decision-making support in clinical practice in the "big data" era. The complete database from early 2012 to late 2015 involving hospital admissions to Inselspital Bern, the largest Swiss University Hospital, was used in this study, involving over 100,000 admissions. Age, sex, and initial laboratory test results were the features/variables of interest for each admission, the outcome being inpatient mortality. Computational decision support systems were utilized for the calculation of the risk of inpatient mortality. We assessed the recently proposed Acute Laboratory Risk of Mortality Score (ALaRMS) model, and further built generalized linear models, generalized estimating equations, artificial neural networks, and decision tree systems for the predictive modeling of the risk of inpatient mortality. The Area Under the ROC Curve (AUC) for ALaRMS marginally corresponded to the anticipated accuracy (AUC = 0.858). Penalized logistic regression methodology provided a better result (AUC = 0.872). Decision tree and neural network-based methodology provided even higher predictive performance (up to AUC = 0.912 and 0.906, respectively). Additionally, decision tree-based methods can efficiently handle Electronic Health Record (EHR) data that have a significant amount of missing records (in up to >50% of the studied features) eliminating the need for imputation in order to have complete data. In conclusion, we show that statistical learning methodology can provide superior predictive performance in comparison to existing methods and can also be production ready. Statistical modeling procedures provided unbiased, well-calibrated models that can be efficient decision support tools for predicting inpatient mortality and assigning preventive measures.
A P2P Botnet detection scheme based on decision tree and adaptive multilayer neural networks.
Alauthaman, Mohammad; Aslam, Nauman; Zhang, Li; Alasem, Rafe; Hossain, M A
2018-01-01
In recent years, Botnets have been adopted as a popular method to carry and spread many malicious codes on the Internet. These malicious codes pave the way to execute many fraudulent activities including spam mail, distributed denial-of-service attacks and click fraud. While many Botnets are set up using centralized communication architecture, the peer-to-peer (P2P) Botnets can adopt a decentralized architecture using an overlay network for exchanging command and control data, making their detection even more difficult. This work presents a method of P2P Bot detection based on an adaptive multilayer feed-forward neural network in cooperation with decision trees. A classification and regression tree is applied as a feature selection technique to select relevant features. With these features, a multilayer feed-forward neural network training model is created using a resilient back-propagation learning algorithm. A comparison of feature set selection based on the decision tree, principal component analysis and the ReliefF algorithm indicated that the neural network model with feature selection based on the decision tree has better identification accuracy along with lower rates of false positives. The usefulness of the proposed approach is demonstrated by conducting experiments on real network traffic datasets. In these experiments, an average detection rate of 99.08% with a false positive rate of 0.75% was observed.
Chen, Weisheng; Sun, Cheng; Wei, Ru; Zhang, Yanlin; Ye, Heng; Chi, Ruibin; Zhang, Yichen; Hu, Bei; Lv, Bo; Chen, Lifang; Zhang, Xiunong; Lan, Huilan; Chen, Chunbo
2016-08-31
Despite the use of prokinetic agents, the overall success rate for postpyloric placement via a self-propelled spiral nasoenteric tube is quite low. This retrospective study was conducted in the intensive care units of 11 university hospitals from 2006 to 2016 among adult patients who underwent self-propelled spiral nasoenteric tube insertion. Success was defined as postpyloric nasoenteric tube placement confirmed by abdominal x-ray scan 24 hours after tube insertion. Chi-square automatic interaction detection (CHAID), simple classification and regression trees (SimpleCart), and J48 methodologies were used to develop decision tree models, and multiple logistic regression (LR) methodology was used to develop an LR model for predicting successful postpyloric nasoenteric tube placement. The area under the receiver operating characteristic curve (AUC) was used to evaluate the performance of these models. Successful postpyloric nasoenteric tube placement was confirmed in 427 of 939 patients enrolled. For predicting successful postpyloric nasoenteric tube placement, the performance of the 3 decision trees was similar in terms of the AUCs: 0.715 for the CHAID model, 0.682 for the SimpleCart model, and 0.671 for the J48 model. The AUC of the LR model was 0.729, which outperformed the J48 model. Both the CHAID and LR models achieved an acceptable discrimination for predicting successful postpyloric nasoenteric tube placement and were useful for intensivists in the setting of self-propelled spiral nasoenteric tube insertion. © 2016 American Society for Parenteral and Enteral Nutrition.
Circum-Arctic petroleum systems identified using decision-tree chemometrics
Peters, K.E.; Ramos, L.S.; Zumberge, J.E.; Valin, Z.C.; Scotese, C.R.; Gautier, D.L.
2007-01-01
Source- and age-related biomarker and isotopic data were measured for more than 1000 crude oil samples from wells and seeps collected above approximately 55°N latitude. A unique, multitiered chemometric (multivariate statistical) decision tree was created that allowed automated classification of 31 genetically distinct circum-Arctic oil families based on a training set of 622 oil samples. The method, which we call decision-tree chemometrics, uses principal components analysis and multiple tiers of K-nearest neighbor and SIMCA (soft independent modeling of class analogy) models to classify and assign confidence limits for newly acquired oil samples and source rock extracts. Geochemical data for each oil sample were also used to infer the age, lithology, organic matter input, depositional environment, and identity of its source rock. These results demonstrate the value of large petroleum databases where all samples were analyzed using the same procedures and instrumentation. Copyright © 2007. The American Association of Petroleum Geologists. All rights reserved.
NASA Astrophysics Data System (ADS)
ShiouWei, L.
2014-12-01
Reservoirs are the most important water resources facilities in Taiwan. However, because of the steep slopes and fragile geological conditions in the mountain areas, storm events often cause serious debris flows and floods, and the floods flush large amounts of sediment into reservoirs. This flood-induced sedimentation has a great impact on reservoir life. Hence, how to operate a reservoir during flood events so as to increase the efficiency of sediment desilting without risking reservoir safety or impairing subsequent water supply is a crucial issue in Taiwan. Therefore, this study developed a novel optimization planning model for reservoir flood operation that considers flood control and sediment desilting, and proposed easy-to-use operating rules represented by decision trees. The decision tree rules consider flood mitigation, water supply and sediment desilting. The optimal planning model computes, for each flood event, the optimal reservoir release that minimizes the impact on water supply and maximizes sediment desilting without risking reservoir safety. Besides the optimal flood operation planning model, this study also proposed decision tree-based flood operating rules that were trained on the optimal reservoir releases computed for multiple synthetic flood scenarios. The synthetic flood scenarios consist of various synthetic storm events, initial reservoir storages and target storages at the end of flood operation. Comparing the results of the decision tree operation rules (DTOR) with those of historical operation for Typhoon Krosa in 2007, the DTOR removed 15.4% more sediment than historical operation, with reservoir storage only 8.38 × 10^6 m^3 less than that of historical operation. For Typhoon Jangmi in 2008, the DTOR removed 24.4% more sediment than historical operation, with reservoir storage only 7.58 × 10^6 m^3 less than that of historical operation. The results show that the proposed DTOR model can increase sediment desilting efficiency and extend reservoir life.
Structural Equation Model Trees
Brandmaier, Andreas M.; von Oertzen, Timo; McArdle, John J.; Lindenberger, Ulman
2015-01-01
In the behavioral and social sciences, structural equation models (SEMs) have become widely accepted as a modeling tool for the relation between latent and observed variables. SEMs can be seen as a unification of several multivariate analysis techniques. SEM Trees combine the strengths of SEMs and the decision tree paradigm by building tree structures that separate a data set recursively into subsets with significantly different parameter estimates in a SEM. SEM Trees provide means for finding covariates and covariate interactions that predict differences in structural parameters in observed as well as in latent space and facilitate theory-guided exploration of empirical data. We describe the methodology, discuss theoretical and practical implications, and demonstrate applications to a factor model and a linear growth curve model. PMID:22984789
Using decision trees to understand structure in missing data
Tierney, Nicholas J; Harden, Fiona A; Harden, Maurice J; Mengersen, Kerrie L
2015-01-01
Objectives Demonstrate the application of decision trees—classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs)—to understand structure in missing data. Setting Data taken from employees at 3 different industrial sites in Australia. Participants 7915 observations were included. Materials and methods The approach was evaluated using an occupational health data set comprising results of questionnaires, medical tests and environmental monitoring. Statistical methods included standard statistical tests and the ‘rpart’ and ‘gbm’ packages for CART and BRT analyses, respectively, from the statistical software ‘R’. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results CART and BRT models were effective in highlighting a missingness structure in the data, related to the type of data (medical or environmental), the site in which it was collected, the number of visits, and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured as compared to structured missingness. Discussion Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Conclusions Researchers are encouraged to use CART and BRT models to explore and understand missing data. PMID:26124509
The risk factors of laryngeal pathology in Korean adults using a decision tree model.
Byeon, Haewon
2015-01-01
The purpose of this study was to identify risk factors affecting laryngeal pathology in the Korean population and to evaluate the derived prediction model. Cross-sectional study. Data were drawn from the 2008 Korea National Health and Nutritional Examination Survey. The subjects were 3135 persons (1508 male and 2114 female) aged 19 years and older living in the community. The independent variables were age, sex, occupation, smoking, alcohol drinking, and self-reported voice problems. A decision tree analysis was done to identify risk factors for predicting a model of laryngeal pathology. The significant risk factors of laryngeal pathology were age, gender, occupation, smoking, and self-reported voice problem in decision tree model. Four significant paths were identified in the decision tree model for the prediction of laryngeal pathology. Those identified as high risk groups for laryngeal pathology included those who self-reported a voice problem, those who were males in their 50s who did not recognize a voice problem, those who were not economically active males in their 40s, and male workers aged 19 and over and under 50 or 60 and over who currently smoked. The results of this study suggest that individual risk factors, such as age, sex, occupation, health behavior, and self-reported voice problem, affect the onset of laryngeal pathology in a complex manner. Based on the results of this study, early management of the high-risk groups is needed for the prevention of laryngeal pathology. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036
ERIC Educational Resources Information Center
Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.
2014-01-01
This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the more difficult to interpret logistic regression model. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…
Theory of the decision/problem state
NASA Technical Reports Server (NTRS)
Dieterly, D. L.
1980-01-01
A theory of the decision-problem state was introduced and elaborated. Starting with the basic model of a decision-problem condition, an attempt was made to explain how a major decision-problem may consist of subsets of decision-problem conditions composing different condition sequences. In addition, the basic classical decision-tree model was modified to allow for the introduction of a series of characteristics that may be encountered in an analysis of a decision-problem state. The resulting hierarchical model reflects the unique attributes of the decision-problem state. The basic model of a decision-problem condition was used as a base to evolve a more complex model that is more representative of the decision-problem state and may be used to initiate research on decision-problem states.
Mani, Ashutosh; Rao, Marepalli; James, Kelley; Bhattacharya, Amit
2015-01-01
The purpose of this study was to explore data-driven models, based on decision trees, to develop practical and easy to use predictive models for early identification of firefighters who are likely to cross the threshold of hyperthermia during live-fire training. Predictive models were created for three consecutive live-fire training scenarios. The final predicted outcome was a categorical variable: will a firefighter cross the upper threshold of hyperthermia - Yes/No. Two tiers of models were built, one with and one without taking into account the outcome (whether a firefighter crossed hyperthermia or not) from the previous training scenario. First tier of models included age, baseline heart rate and core body temperature, body mass index, and duration of training scenario as predictors. The second tier of models included the outcome of the previous scenario in the prediction space, in addition to all the predictors from the first tier of models. Classification and regression trees were used independently for prediction. The response variable for the regression tree was the quantitative variable: core body temperature at the end of each scenario. The predicted quantitative variable from regression trees was compared to the upper threshold of hyperthermia (38°C) to predict whether a firefighter would enter hyperthermia. The performance of classification and regression tree models was satisfactory for the second (success rate = 79%) and third (success rate = 89%) training scenarios but not for the first (success rate = 43%). Data-driven models based on decision trees can be a useful tool for predicting physiological response without modeling the underlying physiological systems. Early prediction of heat stress coupled with proactive interventions, such as pre-cooling, can help reduce heat stress in firefighters.
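The step that turns a regression tree's temperature prediction into a Yes/No outcome can be sketched as follows. The 38°C threshold comes from the abstract; treating "crossing" as greater than or equal to the threshold, and all example values, are assumptions of this sketch.

```python
HYPERTHERMIA_THRESHOLD_C = 38.0  # upper threshold quoted in the abstract


def crosses_threshold(predicted_core_temp_c,
                      threshold=HYPERTHERMIA_THRESHOLD_C):
    """Classify a predicted end-of-scenario core temperature.

    Assumption: reaching the threshold counts as crossing it.
    """
    return predicted_core_temp_c >= threshold


def success_rate(predicted_flags, observed_flags):
    """Fraction of firefighters whose predicted outcome matches the observed one."""
    matches = sum(p == o for p, o in zip(predicted_flags, observed_flags))
    return matches / len(observed_flags)


# Hypothetical regression-tree outputs and observed outcomes.
predicted_temps = [37.6, 38.2, 38.0, 37.9]
observed = [False, True, True, True]

predicted = [crosses_threshold(t) for t in predicted_temps]
print(predicted)
print(success_rate(predicted, observed))
```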
Park, Ji Hyun; Kim, Hyeon-Young; Lee, Hanna; Yun, Eun Kyoung
2015-12-01
This study compares the performance of the logistic regression and decision tree analysis methods for assessing the risk factors for infection in cancer patients undergoing chemotherapy. The subjects were 732 cancer patients who were receiving chemotherapy at K university hospital in Seoul, Korea. The data were collected between March 2011 and February 2013 and were processed for descriptive analysis, logistic regression and decision tree analysis using the IBM SPSS Statistics 19 and Modeler 15.1 programs. The most common risk factors for infection in cancer patients receiving chemotherapy were identified as alkylating agents, vinca alkaloid and underlying diabetes mellitus. The logistic regression explained 66.7% of the variation in the data in terms of sensitivity and 88.9% in terms of specificity. The decision tree analysis accounted for 55.0% of the variation in the data in terms of sensitivity and 89.0% in terms of specificity. As for the overall classification accuracy, the logistic regression explained 88.0% and the decision tree analysis explained 87.2%. The logistic regression analysis showed a higher degree of sensitivity and classification accuracy. Therefore, logistic regression analysis is concluded to be the more effective and useful method for establishing an infection prediction model for patients undergoing chemotherapy. Copyright © 2015 Elsevier Ltd. All rights reserved.
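The three figures compared in this abstract (sensitivity, specificity and overall classification accuracy) all derive from a confusion matrix. The sketch below uses invented counts purely for illustration; they are not the study's data.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    tp/fn: infected patients classified correctly/incorrectly.
    tn/fp: non-infected patients classified correctly/incorrectly.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Invented counts, for illustration only.
metrics = classification_metrics(tp=40, fn=10, tn=90, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because accuracy is dominated by the larger class, two models can have nearly equal accuracy while differing substantially in sensitivity, which is the pattern the abstract reports.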
Branch: an interactive, web-based tool for testing hypotheses and developing predictive models.
Gangavarapu, Karthik; Babji, Vyshakh; Meißner, Tobias; Su, Andrew I; Good, Benjamin M
2016-07-01
Branch is a web application that provides users with the ability to interact directly with large biomedical datasets. The interaction is mediated through a collaborative graphical user interface for building and evaluating decision trees. These trees can be used to compose and test sophisticated hypotheses and to develop predictive models. Decision trees are built and evaluated based on a library of imported datasets and can be stored in a collective area for sharing and re-use. Branch is hosted at http://biobranch.org/ and the open source code is available at http://bitbucket.org/sulab/biobranch/. Contact: asu@scripps.edu or bgood@scripps.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
An Intelligent Decision Support System for Workforce Forecast
2011-01-01
ARIMA) model to forecast the demand for construction skills in Hong Kong. This model was based... Decision Trees ARIMA Rule Based Forecasting Segmentation Forecasting Regression Analysis Simulation Modeling Input-Output Models LP and NLP Markovian... data • When results are needed as a set of easily interpretable rules 4.1.4 ARIMA Auto-regressive, integrated, moving-average (ARIMA) models
Angelis, Aris; Kanavos, Panos
2017-09-01
Escalating drug prices have catalysed the generation of numerous "value frameworks" with the aim of informing payers, clinicians and patients on the assessment and appraisal process of new medicines for the purpose of coverage and treatment selection decisions. Although this is an important step towards a more inclusive Value Based Assessment (VBA) approach, aspects of these frameworks are based on weak methodologies and could potentially result in misleading recommendations or decisions. In this paper, a Multiple Criteria Decision Analysis (MCDA) methodological process, based on Multi Attribute Value Theory (MAVT), is adopted for building a multi-criteria evaluation model. A five-stage model-building process is followed, using a top-down "value-focused thinking" approach, involving literature reviews and expert consultations. A generic value tree is structured capturing decision-makers' concerns for assessing the value of new medicines in the context of Health Technology Assessment (HTA) and in alignment with decision theory. The resulting value tree (Advance Value Tree) consists of three levels of criteria (top level criteria clusters, mid-level criteria, bottom level sub-criteria or attributes) relating to five key domains that can be explicitly measured and assessed: (a) burden of disease, (b) therapeutic impact, (c) safety profile (d) innovation level and (e) socioeconomic impact. A number of MAVT modelling techniques are introduced for operationalising (i.e. estimating) the model, for scoring the alternative treatment options, assigning relative weights of importance to the criteria, and combining scores and weights. Overall, the combination of these MCDA modelling techniques for the elicitation and construction of value preferences across the generic value tree provides a new value framework (Advance Value Framework) enabling the comprehensive measurement of value in a structured and transparent way. 
Given its flexibility to meet diverse requirements and become readily adaptable across different settings, the Advance Value Framework could be offered as a decision-support tool for evaluators and payers to aid coverage and reimbursement of new medicines. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
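The step of "combining scores and weights" can be illustrated with a minimal sketch, assuming the simplest additive (weighted-sum) aggregation over the five top-level domains. The abstract does not state which aggregation rule the Advance Value Framework uses, and the weights, scores and the function name `additive_value` below are hypothetical.

```python
def additive_value(scores, weights):
    """Additive value model: V = sum_i w_i * v_i, with weights normalised to sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[c] / total * scores[c] for c in weights)


# Hypothetical relative weights for the five domains named in the abstract.
weights = {
    "burden_of_disease": 20,
    "therapeutic_impact": 40,
    "safety_profile": 20,
    "innovation_level": 10,
    "socioeconomic_impact": 10,
}

# Hypothetical partial value scores (0-100 scale) for one treatment option.
option_a = {
    "burden_of_disease": 50,
    "therapeutic_impact": 80,
    "safety_profile": 60,
    "innovation_level": 40,
    "socioeconomic_impact": 30,
}

overall_a = additive_value(option_a, weights)
```

With these made-up numbers the overall value is 0.2·50 + 0.4·80 + 0.2·60 + 0.1·40 + 0.1·30 = 61 on the same 0-100 scale.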
Moon, Mikyung; Lee, Soo-Kyoung
2017-01-01
The purpose of this study was to use decision tree analysis to explore the factors associated with pressure ulcers (PUs) among elderly people admitted to Korean long-term care facilities. The data were extracted from the 2014 National Inpatient Sample (NIS)-data of Health Insurance Review and Assessment Service (HIRA). A MapReduce-based program was implemented to join and filter 5 tables of the NIS. The outcome predicted by the decision tree model was the prevalence of PUs as defined by the Korean Standard Classification of Disease-7 (KCD-7; code L89 * ). Using R 3.3.1, a decision tree was generated with the finalized 15,856 cases and 830 variables. The decision tree displayed 15 subgroups with 8 variables showing 0.804 accuracy, 0.820 sensitivity, and 0.787 specificity. The most significant primary predictor of PUs was length of stay less than 0.5 day. Other predictors were the presence of an infectious wound dressing, followed by having diagnoses numbering less than 3.5 and the presence of a simple dressing. Among diagnoses, "injuries to the hip and thigh" was the top predictor ranking 5th overall. Total hospital cost exceeding 2,200,000 Korean won (US $2,000) rounded out the top 7. These results support previous studies that showed length of stay, comorbidity, and total hospital cost were associated with PUs. Moreover, wound dressings were commonly used to treat PUs. They also show that machine learning, such as a decision tree, could effectively predict PUs using big data.
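For readers less familiar with the three reported performance figures, the sketch below shows how accuracy, sensitivity and specificity are derived from a binary confusion matrix. The counts are hypothetical and are not the study's data; `classification_metrics` is an illustrative name.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),     # share of true positives that were detected
        "specificity": tn / (tn + fp),     # share of true negatives that were detected
    }


# Hypothetical counts for a pressure-ulcer classifier evaluated on 200 cases.
metrics = classification_metrics(tp=82, fp=21, tn=79, fn=18)
```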
Tools of the Future: How Decision Tree Analysis Will Impact Mission Planning
NASA Technical Reports Server (NTRS)
Otterstatter, Matthew R.
2005-01-01
The universe is infinitely complex; however, the human mind has a finite capacity. The multitude of possible variables, metrics, and procedures in mission planning are far too many to address exhaustively. This is unfortunate because, in general, considering more possibilities leads to more accurate and more powerful results. To compensate, we can get more insightful results by employing our greatest tool, the computer. The power of the computer will be utilized through a technology that considers every possibility, decision tree analysis. Although decision trees have been used in many other fields, this is innovative for space mission planning. Because this is a new strategy, no existing software is able to completely accommodate all of the requirements. This was determined through extensive research and testing of current technologies. It was necessary to create original software, for which a short-term model was finished this summer. The model was built into Microsoft Excel to take advantage of the familiar graphical interface for user input, computation, and viewing output. Macros were written to automate the process of tree construction, optimization, and presentation. The results are useful and promising. If this tool is successfully implemented in mission planning, our reliance on old-fashioned heuristics, an error-prone shortcut for handling complexity, will be reduced. The computer algorithms involved in decision trees will revolutionize mission planning. The planning will be faster and smarter, leading to optimized missions with the potential for more valuable data.
Decision tree analysis of factors influencing rainfall-related building damage
NASA Astrophysics Data System (ADS)
Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.
2014-04-01
Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage of small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998-2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors includes rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), building age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggests that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22-26% of variance in claim frequency, which is considerably better than results from global multiple regression models (11-18% of variance explained). 
Still, a large part of the variance in claim frequency is left unexplained, which is likely to be caused by variations in data at subdistrict scale and missing explanatory variables.
Decision-Tree Formulation With Order-1 Lateral Execution
NASA Technical Reports Server (NTRS)
James, Mark
2007-01-01
A compact symbolic formulation enables mapping of an arbitrarily complex decision tree of a certain type into a highly computationally efficient multidimensional software object. The type of decision trees to which this formulation applies is that known in the art as the Boolean class of balanced decision trees. Parallel lateral slices of an object created by means of this formulation can be executed in constant time, considerably less time than would otherwise be required. Decision trees of various forms are incorporated into almost all large software systems. A decision tree is a way of hierarchically solving a problem, proceeding through a set of true/false responses to a conclusion. By definition, a decision tree has a tree-like structure, wherein each internal node denotes a test on an attribute, each branch from an internal node represents an outcome of a test, and leaf nodes represent classes or class distributions that, in turn, represent possible conclusions. The drawback of decision trees is that execution of them can be computationally expensive (and, hence, time-consuming) because each non-leaf node must be examined to determine whether to progress deeper into a tree structure or to examine an alternative. The present formulation was conceived as an efficient means of representing a decision tree and executing it in as little time as possible. The formulation involves the use of a set of symbolic algorithms to transform a decision tree into a multi-dimensional object, the rank of which equals the number of lateral non-leaf nodes. The tree can then be executed in constant time by means of an order-one table lookup. The sequence of operations performed by the algorithms is summarized as follows: 1. Determination of whether the tree under consideration can be encoded by means of this formulation. 2. Extraction of decision variables. 3. Symbolic optimization of the decision tree to minimize its form. 4. 
Expansion and transformation of all nested conjunctive-disjunctive paths to a flattened conjunctive form composed only of equality checks when possible. If each reduced conjunctive form contains only equality checks and all of these forms use the same variables, then the decision tree can be reduced to an order-one operation through a table lookup. The speedup to order one is accomplished by distributing each decision variable over a surface of a multidimensional object by mapping the equality constant to an index
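The general idea of trading a node-by-node walk for a single table lookup can be sketched as follows. This is only an illustration of the lookup principle for a small Boolean tree, not the symbolic formulation described in the abstract; the tree, the variable names and the helpers `walk` and `flatten` are hypothetical.

```python
from itertools import product

# A node is either a leaf label (str) or a tuple
# (variable_name, subtree_if_false, subtree_if_true).
tree = (
    "power_ok",
    "abort",
    ("sensor_ok", "safe_mode", ("fuel_low", "proceed", "refuel")),
)
variables = ("power_ok", "sensor_ok", "fuel_low")


def walk(node, assignment):
    """Conventional evaluation: one test per level until a leaf is reached."""
    while not isinstance(node, str):
        var, if_false, if_true = node
        node = if_true if assignment[var] else if_false
    return node


def flatten(node, names):
    """Precompute the outcome for every combination of Boolean inputs.

    The table has one axis per decision variable, here encoded as a dict
    keyed by a tuple of booleans, so it holds 2**len(names) entries.
    """
    table = {}
    for bits in product((False, True), repeat=len(names)):
        table[bits] = walk(node, dict(zip(names, bits)))
    return table


table = flatten(tree, variables)

# Execution is now a single lookup, independent of tree depth.
outcome = table[(True, True, False)]
```

The cost of this naive version is a table that grows exponentially with the number of variables, which is presumably why the formulation first checks whether a tree can be encoded and minimizes its form before flattening.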
Awaysheh, Abdullah; Wilcke, Jeffrey; Elvinger, François; Rees, Loren; Fan, Weiguo; Zimmerman, Kurt L
2016-11-01
Inflammatory bowel disease (IBD) and alimentary lymphoma (ALA) are common gastrointestinal diseases in cats. The very similar clinical signs and histopathologic features of these diseases make the distinction between them diagnostically challenging. We tested the use of supervised machine-learning algorithms to differentiate between the 2 diseases using data generated from noninvasive diagnostic tests. Three prediction models were developed using 3 machine-learning algorithms: naive Bayes, decision trees, and artificial neural networks. The models were trained and tested on data from complete blood count (CBC) and serum chemistry (SC) results for the following 3 groups of client-owned cats: normal, inflammatory bowel disease (IBD), or alimentary lymphoma (ALA). Naive Bayes and artificial neural networks achieved higher classification accuracy (sensitivities of 70.8% and 69.2%, respectively) than the decision tree algorithm (63%, p < 0.0001). The areas under the receiver-operating characteristic curve for classifying cases into the 3 categories were 83% by naive Bayes, 79% by decision tree, and 82% by artificial neural networks. Prediction models using machine learning provided a method for distinguishing between ALA-IBD, ALA-normal, and IBD-normal. The naive Bayes and artificial neural networks classifiers used 10 and 4 of the CBC and SC variables, respectively, to outperform the C4.5 decision tree, which used 5 CBC and SC variables in classifying cats into the 3 classes. These models can provide another noninvasive diagnostic tool to assist clinicians with differentiating between IBD and ALA, and between diseased and nondiseased cats. © 2016 The Author(s).
Context-Sensitive Ethics in School Psychology
ERIC Educational Resources Information Center
Lasser, Jon; Klose, Laurie McGarry; Robillard, Rachel
2013-01-01
Ethical codes and licensing rules provide foundational guidance for practicing school psychologists, but these sources fall short in their capacity to facilitate effective decision-making. When faced with ethical dilemmas, school psychologists can turn to decision-making models, but step-wise decision trees frequently lack the situation…
Sancak, Eyup Burak; Kılınç, Muhammet Fatih; Yücebaş, Sait Can
2017-01-01
The decision on the choice of proximal ureteral stone therapy depends on many factors, and sometimes urologists have difficulty in choosing the treatment option. This study is aimed at evaluating the factors affecting the success of semirigid ureterorenoscopy (URS) using the "decision tree" method. From January 2005 to November 2015, the data of consecutive patients treated for proximal ureteral stone were retrospectively analyzed. A total of 920 patients with proximal ureteral stone treated with semirigid URS were included in the study. All statistically significant attributes were tested using the decision tree method. The model created using decision tree had a sensitivity of 0.993 and an accuracy of 0.857. While URS treatment was successful in 752 patients (81.7%), it was unsuccessful in 168 patients (18.3%). According to the decision tree method, the most important factor affecting the success of URS is whether the stone is impacted to the ureteral wall. The second most important factor affecting treatment was intramural stricture requiring dilatation if the stone is impacted, and the size of the stone if not impacted. Our study suggests that the impacted stone, intramural stricture requiring dilatation and stone size may have a significant effect on the success rate of semirigid URS for proximal ureteral stone. Further studies with population-based and longitudinal design should be conducted to confirm this finding. © 2017 S. Karger AG, Basel.
Online adaptive decision trees: pattern classification and function approximation.
Basak, Jayanta
2006-09-01
Recently we have shown that decision trees can be trained in the online adaptive (OADT) mode (Basak, 2004), leading to better generalization score. OADTs were bottlenecked by the fact that they are able to handle only two-class classification tasks with a given structure. In this article, we provide an architecture based on OADT, ExOADT, which can handle multiclass classification tasks and is able to perform function approximation. ExOADT is structurally similar to OADT extended with a regression layer. We also show that ExOADT is capable not only of adapting the local decision hyperplanes in the nonterminal nodes but also has the potential of smoothly changing the structure of the tree depending on the data samples. We provide the learning rules based on steepest gradient descent for the new model ExOADT. Experimentally we demonstrate the effectiveness of ExOADT in the pattern classification and function approximation tasks. Finally, we briefly discuss the relationship of ExOADT with other classification models.
A hybrid method for classifying cognitive states from fMRI data.
Parida, S; Dehuri, S; Cho, S-B; Cacha, L A; Poznanski, R R
2015-09-01
Functional magnetic resonance imaging (fMRI) makes it possible to detect brain activities in order to elucidate cognitive states. The complex nature of fMRI data requires understanding of the analyses applied to produce possible avenues for developing models of cognitive state classification and improving brain activity prediction. While many models of classification task of fMRI data analysis have been developed, in this paper, we present a novel hybrid technique through combining the best attributes of genetic algorithms (GAs) and ensemble decision tree technique that consistently outperforms all other methods which are being used for cognitive-state classification. Specifically, this paper illustrates the combined effort of decision-trees ensemble and GAs for feature selection through an extensive simulation study and discusses the classification performance with respect to fMRI data. We have shown that our proposed method exhibits significant reduction of the number of features with clear edge classification accuracy over ensemble of decision-trees.
The economic impact of pig-associated parasitic zoonosis in Northern Lao PDR.
Choudhury, Adnan Ali Khan; Conlan, James V; Racloz, Vanessa Nadine; Reid, Simon Andrew; Blacksell, Stuart D; Fenwick, Stanley G; Thompson, Andrew R C; Khamlome, Boualam; Vongxay, Khamphouth; Whittaker, Maxine
2013-03-01
The parasitic zoonoses human cysticercosis (Taenia solium), taeniasis (other Taenia species) and trichinellosis (Trichinella species) are endemic in the Lao People's Democratic Republic (Lao PDR). This study was designed to quantify the economic burden that pig-associated zoonotic diseases pose in Lao PDR. In particular, the analysis included estimation of the losses in the pork industry as well as losses due to human illness and lost productivity. A Markov-probability based decision-tree model was chosen to form the basis of the calculations to estimate the economic and public health impacts of taeniasis, trichinellosis and cysticercosis. Two different decision trees were run simultaneously on the model's human cohort. A third decision tree simulated the potential impacts on pig production. The human capital method was used to estimate productivity loss. The results varied significantly depending on the rate of hospitalisation due to neurocysticercosis. This study is the first systematic estimate of the economic impact of pig-associated zoonotic diseases in Lao PDR that demonstrates the significance of the diseases in that country.
Prediction of Weather Impacted Airport Capacity using Ensemble Learning
NASA Technical Reports Server (NTRS)
Wang, Yao Xun
2011-01-01
Ensemble learning with the Bagging Decision Tree (BDT) model was used to assess the impact of weather on airport capacities at selected high-demand airports in the United States. The ensemble bagging decision tree models were developed and validated using the Federal Aviation Administration (FAA) Aviation System Performance Metrics (ASPM) data and weather forecast at these airports. The study examines the performance of BDT, along with traditional single Support Vector Machines (SVM), for airport runway configuration selection and airport arrival rates (AAR) prediction during weather impacts. Testing of these models was accomplished using observed weather, weather forecast, and airport operation information at the chosen airports. The experimental results show that ensemble methods are more accurate than a single SVM classifier. The airport capacity ensemble method presented here can be used as a decision support model that supports air traffic flow management to meet the weather impacted airport capacity in order to reduce costs and increase safety.
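Bagging (bootstrap aggregating) itself is easy to sketch: train many trees on bootstrap resamples of the data and let them vote. The example below is a minimal illustration under stated assumptions, not the study's BDT implementation: it uses one-level trees (stumps) on a single feature, and the data, function names and parameters are hypothetical.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a one-level decision tree on (x, label) pairs with labels in {0, 1}.

    Tries every observed x as a threshold and keeps the split with the
    fewest training errors.
    """
    best = None
    for threshold, _ in points:
        for left_label in (0, 1):
            errors = sum(
                (left_label if x <= threshold else 1 - left_label) != y
                for x, y in points
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return lambda x: left_label if x <= threshold else 1 - left_label


def fit_bagged(points, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models


def predict(models, x):
    """Majority vote over the ensemble."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical data: x is a weather severity index, label 1 means reduced capacity.
data = [(x, 0) for x in range(10)] + [(x, 1) for x in range(20, 30)]
models = fit_bagged(data)
low = predict(models, 2)
high = predict(models, 27)
```

Because each stump sees a different resample, individual thresholds vary, and the vote averages out that variation.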
Anantha M. Prasad; Louis R. Iverson; Stephen N. Matthews; Matthew P. Peters
2016-01-01
Context. No single model can capture the complex species range dynamics under changing climates--hence the need for a combination approach that addresses management concerns. Objective. A multistage approach is illustrated to manage forested landscapes under climate change. We combine a tree species habitat model--DISTRIB II, a species colonization model--SHIFT, and...
Case-based explanation of non-case-based learning methods.
Caruana, R.; Kangarloo, H.; Dionisio, J. D.; Sinha, U.; Johnson, D.
1999-01-01
We show how to generate case-based explanations for non-case-based learning methods such as artificial neural nets or decision trees. The method uses the trained model (e.g., the neural net or the decision tree) as a distance metric to determine which cases in the training set are most similar to the case that needs to be explained. This approach is well suited to medical domains, where it is important to understand predictions made by complex machine learning models, and where training and clinical practice makes users adept at case interpretation. PMID:10566351
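One way to read "using the trained model as a distance metric" for a decision tree is to call two cases similar when they follow the same branches. The sketch below assumes that reading (the abstract does not give the exact metric) and ranks training cases by how many initial branch decisions they share with the query; the tree, the cases and all function names are hypothetical.

```python
# Internal node: (feature, threshold, left_subtree, right_subtree); leaf: a label string.
tree = ("age", 50, ("bp", 120, "low", "medium"), "high")


def path(node, case):
    """Return the sequence of branch decisions and the leaf label for a case."""
    steps = []
    while not isinstance(node, str):
        feature, threshold, left, right = node
        go_left = case[feature] <= threshold
        steps.append(go_left)
        node = left if go_left else right
    return tuple(steps), node


def shared_prefix(a, b):
    """Number of leading branch decisions two paths have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def explain(node, training_cases, query, k=2):
    """Return the k training cases whose tree paths are closest to the query's."""
    query_path, _ = path(node, query)
    return sorted(
        training_cases,
        key=lambda c: shared_prefix(path(node, c)[0], query_path),
        reverse=True,
    )[:k]


training = [
    {"id": "C", "age": 70, "bp": 130},
    {"id": "B", "age": 30, "bp": 100},
    {"id": "A", "age": 45, "bp": 140},
]
query = {"age": 40, "bp": 130}
neighbours = [c["id"] for c in explain(tree, training, query)]
```

Here case A lands in the same leaf as the query, B shares only the first split, and C diverges at the root, so A and B are returned as the explanatory cases.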
Derivative Trade Optimizing Model Utilizing GP Based on Behavioral Finance Theory
NASA Astrophysics Data System (ADS)
Matsumura, Koki; Kawamoto, Masaru
This paper proposes a new technique that builds strategy trees for derivative (option) trading investment decisions based on behavioral finance theory and optimizes them using evolutionary computation, in order to achieve high profitability. The strategy tree uses technical analysis, a statistical and experience-based technique, for the investment decision. The trading model is represented by various technical indexes, and the strategy tree is optimized by genetic programming (GP), which is one form of evolutionary computation. Moreover, this paper proposes a method that uses prospect theory, from behavioral finance, to set a psychological bias for profit and loss, and attempts to select the appropriate option strike price for higher investment efficiency. The technique produced good results, and the optimized trading strategy demonstrated the effectiveness of this trading model.
Aguirre-Junco, Angel-Ricardo; Colombet, Isabelle; Zunino, Sylvain; Jaulent, Marie-Christine; Leneveut, Laurence; Chatellier, Gilles
2004-01-01
The initial step for the computerization of guidelines is the knowledge specification from the prose text of guidelines. We describe a method of knowledge specification based on a structured and systematic analysis of text allowing detailed specification of a decision tree. We use decision tables to validate the decision algorithm and decision trees to specify and represent this algorithm, along with elementary messages of recommendation. Editing tools are also necessary to facilitate the process of validation and workflow between the expert physicians who will validate the specified knowledge and the computer scientists who will encode the specified knowledge in a guideline model. Applied to eleven different guidelines issued by an official agency, the method allows a quick and valid computerization and integration in a larger decision support system called EsPeR (Personalized Estimate of Risks). The quality of the text guidelines is however still to be developed further. The method used for computerization could help to define a framework usable at the initial step of guideline development in order to produce guidelines ready for electronic implementation.
Chen, Xiao Yu; Ma, Li Zhuang; Chu, Na; Zhou, Min; Hu, Yiyang
2013-01-01
Chronic hepatitis B (CHB) is a serious public health problem, and Traditional Chinese Medicine (TCM) plays an important role in the control and treatment for CHB. In the treatment of TCM, zheng discrimination is the most important step. In this paper, an approach based on CFS-GA (Correlation based Feature Selection and Genetic Algorithm) and C5.0 boost decision tree is used for zheng classification and progression in the TCM treatment of CHB. The CFS-GA performs better than the typical method of CFS. By CFS-GA, the acquired attribute subset is classified by C5.0 boost decision tree for TCM zheng classification of CHB, and C5.0 decision tree outperforms two typical decision trees of NBTree and REPTree on CFS-GA, CFS, and nonselection in comparison. Based on the critical indicators from C5.0 decision tree, important lab indicators in zheng progression are obtained by the method of stepwise discriminant analysis for expressing TCM zhengs in CHB, and alterations of the important indicators are also analyzed in zheng progression. In conclusion, all the three decision trees perform better on CFS-GA than on CFS and nonselection, and C5.0 decision tree outperforms the two typical decision trees both on attribute selection and nonselection.
Spatial distribution of block falls using volumetric GIS-decision-tree models
NASA Astrophysics Data System (ADS)
Abdallah, C.
2010-10-01
Block falls are considered a significant aspect of surficial instability contributing to losses in land and socio-economic aspects through their damaging effects to natural and human environments. This paper predicts and maps the geographic distribution and volumes of block falls in central Lebanon using remote sensing, geographic information systems (GIS) and decision-tree modeling (un-pruned and pruned trees). Eleven terrain parameters (lithology, proximity to fault line, karst type, soil type, distance to drainage line, elevation, slope gradient, slope aspect, slope curvature, land cover/use, and proximity to roads) were generated to statistically explain the occurrence of block falls. The latter were discriminated using SPOT4 satellite imagery, and their dimensions were determined during field surveys. The un-pruned tree model based on all considered parameters explained 86% of the variability in field block fall measurements. Once pruned, it classifies 50% in block falls' volumes by selecting just four parameters (lithology, slope gradient, soil type, and land cover/use). Both tree models (un-pruned and pruned) were converted to quantitative 1:50,000 block falls' maps with different classes, starting from Nil (no block falls) to more than 4000 m³. These maps match fairly well, with a coincidence value equal to 45%; however, both can be used to prioritize the choice of specific zones for further measurement and modeling, as well as for land-use management. The proposed tree models are relatively simple, and may also be applied to other areas (i.e. the choice of un-pruned or pruned model is related to the availability of terrain parameters in a given area).
James, Lachlan P; Robertson, Sam; Haff, G Gregory; Beckman, Emma M; Kelly, Vincent G
2017-03-01
To determine those performance indicators that have the greatest influence on classifying outcome at the elite level of mixed martial arts (MMA). A secondary objective was to establish the efficacy of decision tree analysis in explaining the characteristics of victory when compared to alternate statistical methods. Cross-sectional observational. Eleven raw performance indicators from male Ultimate Fighting Championship bouts (n=234) from July 2014 to December 2014 were screened for analysis. Each raw performance indicator was also converted to a rate-dependent measure to be scaled to fight duration. Further, three additional performance indicators were calculated from the dataset and included in the analysis. Cohen's d effect sizes were employed to determine the magnitude of the differences between Wins and Losses, while decision tree (chi-square automatic interaction detector (CHAID)) and discriminant function analyses (DFA) were used to classify outcome (Win and Loss). Effect size comparisons revealed differences between Wins and Losses across a number of performance indicators. Decision tree (raw: 71.8%; rate-scaled: 76.3%) and DFA (raw: 71.4%; rate-scaled 71.2%) achieved similar classification accuracies. Grappling and accuracy performance indicators were the most influential in explaining outcome. The decision tree models also revealed multiple combinations of performance indicators leading to victory. The decision tree analyses suggest that grappling activity and technique accuracy are of particular importance in achieving victory in elite-level MMA competition. The DFA results supported the importance of these performance indicators. Decision tree induction represents an intuitive and slightly more accurate approach to explaining bout outcome in this sport when compared to DFA. Copyright © 2016 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
2012-03-01
with each SVM discriminating between a pair of the N total speakers in the data set. The (( + 1))/2 classifiers then vote on the final...classification of a test sample. The Random Forest classifier is an ensemble classifier that votes amongst decision trees generated with each node using...Forest vote , and the effects of overtraining will be mitigated by the fact that each decision tree is overtrained differently (due to the random
VC-dimension of univariate decision trees.
Yildiz, Olcay Taner
2015-02-01
In this paper, we give and prove the lower bounds of the Vapnik-Chervonenkis (VC)-dimension of the univariate decision tree hypothesis class. The VC-dimension of the univariate decision tree depends on the VC-dimension values of its subtrees and the number of inputs. Via a search algorithm that calculates the VC-dimension of univariate decision trees exhaustively, we show that our VC-dimension bounds are tight for simple trees. To verify that the VC-dimension bounds are useful, we also use them to get VC-generalization bounds for complexity control using structural risk minimization in decision trees, i.e., pruning. Our simulation results show that structural risk minimization pruning using the VC-dimension bounds finds trees that are as accurate as those pruned using cross validation.
A template-finding algorithm and a comprehensive benchmark for homology modeling of proteins
Vallat, Brinda Kizhakke; Pillardy, Jaroslaw; Elber, Ron
2010-01-01
The first step in homology modeling is to identify a template protein for the target sequence. The template structure is used in later phases of the calculation to construct an atomically detailed model for the target. We have built from the Protein Data Bank a large-scale learning set that includes tens of millions of pair matches that can be either a true template or a false one. Discriminatory learning (learning from positive and negative examples) is employed to train a decision tree. Each branch of the tree is a mathematical programming model. The decision tree is tested on an independent set from PDB entries and on the sequences of CASP7. It provides significant enrichment of true templates (between 50-100 percent) when compared to PSI-BLAST. The model is further verified by building atomically detailed structures for each of the tentative true templates with modeller. The probability that a true match does not yield an acceptable structural model (within 6Å RMSD from the native structure), decays linearly as a function of the TM structural-alignment score. PMID:18300226
A Comparison of Four Software Programs for Implementing Decision Analytic Cost-Effectiveness Models.
Hollman, Chase; Paulden, Mike; Pechlivanoglou, Petros; McCabe, Christopher
2017-08-01
The volume and technical complexity of both academic and commercial research using decision analytic modelling has increased rapidly over the last two decades. The range of software programs used for their implementation has also increased, but it remains true that a small number of programs account for the vast majority of cost-effectiveness modelling work. We report a comparison of four software programs: TreeAge Pro, Microsoft Excel, R and MATLAB. Our focus is on software commonly used for building Markov models and decision trees to conduct cohort simulations, given their predominance in the published literature around cost-effectiveness modelling. Our comparison uses three qualitative criteria as proposed by Eddy et al.: "transparency and validation", "learning curve" and "capability". In addition, we introduce the quantitative criterion of processing speed. We also consider the cost of each program to academic users and commercial users. We rank the programs based on each of these criteria. We find that, whilst Microsoft Excel and TreeAge Pro are good programs for educational purposes and for producing the types of analyses typically required by health technology assessment agencies, the efficiency and transparency advantages of programming languages such as MATLAB and R become increasingly valuable when more complex analyses are required.
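A Markov cohort simulation of the kind these programs are used for reduces to repeatedly multiplying a state-occupancy vector by a transition matrix. The sketch below is a minimal, language-agnostic illustration written in Python; it is not taken from the comparison, and the three health states, the transition probabilities and the function name `cohort_trace` are hypothetical.

```python
def cohort_trace(initial, transition, cycles):
    """Return the state-occupancy distribution at cycle 0, 1, ..., cycles."""
    states = list(initial)
    current = dict(initial)
    trace = [dict(current)]
    for _ in range(cycles):
        nxt = {state: 0.0 for state in states}
        for src in states:
            for dst, prob in transition[src].items():
                nxt[dst] += current[src] * prob  # move a share of the cohort from src to dst
        trace.append(nxt)
        current = nxt
    return trace


# Hypothetical per-cycle transition probabilities; each row sums to 1.
transition = {
    "well": {"well": 0.80, "sick": 0.15, "dead": 0.05},
    "sick": {"sick": 0.70, "dead": 0.30},
    "dead": {"dead": 1.00},
}
initial = {"well": 1.0, "sick": 0.0, "dead": 0.0}

trace = cohort_trace(initial, transition, cycles=2)
```

After two cycles with these made-up inputs the cohort is split 0.64 / 0.225 / 0.135 across well, sick and dead. Costs and utilities would then be attached to each state and summed over cycles.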
Tanaka, Tomohiro; Voigt, Michael D
2018-03-01
Non-melanoma skin cancer (NMSC) is the most common de novo malignancy in liver transplant (LT) recipients; it behaves more aggressively and it increases mortality. We used decision tree analysis to develop a tool to stratify and quantify risk of NMSC in LT recipients. We performed Cox regression analysis to identify which predictive variables to enter into the decision tree analysis. Data were from the Organ Procurement Transplant Network (OPTN) STAR files of September 2016 (n = 102984). NMSC developed in 4556 of the 105984 recipients, a mean of 5.6 years after transplant. The 5/10/20-year rates of NMSC were 2.9/6.3/13.5%, respectively. Cox regression identified male gender, Caucasian race, age, body mass index (BMI) at LT, and sirolimus use as key predictive or protective factors for NMSC. These factors were entered into a decision tree analysis. The final tree stratified non-Caucasians as low risk (0.8%), and Caucasian males > 47 years, BMI < 40 who did not receive sirolimus, as high risk (7.3% cumulative incidence of NMSC). The predictions in the derivation set were almost identical to those in the validation set (r² = 0.971, p < 0.0001). Cumulative incidence of NMSC in low, moderate and high risk groups at 5/10/20 years was 0.5/1.2/3.3, 2.1/4.8/11.7 and 5.6/11.6/23.1% (p < 0.0001). The decision tree model accurately stratifies the risk of developing NMSC in the long term after LT.
NASA Astrophysics Data System (ADS)
Park, J.; Yoo, K.
2013-12-01
For groundwater resource conservation, it is important to accurately assess groundwater pollution sensitivity or vulnerability. In this work, we attempted to use a data mining approach to assess groundwater pollution vulnerability at a TCE (trichloroethylene) contaminated Korean industrial site. The conventional DRASTIC method failed to describe the TCE sensitivity data, showing a poor correlation with hydrogeological properties. Among the different data mining methods, such as Artificial Neural Network (ANN), Multiple Logistic Regression (MLR), Case-Based Reasoning (CBR), and Decision Tree (DT), the accuracy and consistency of the Decision Tree (DT) were the best. According to the subsequent tree analyses with the optimal DT model, the failure of the conventional DRASTIC method to fit the TCE sensitivity data may be due to the use of inaccurate weight values of hydrogeological parameters for the study site. These findings provide a proof of concept that a DT-based data mining approach can be used for prediction and rule induction of groundwater TCE sensitivity without pre-existing information on the weights of hydrogeological properties.
The Decision Tree: A Tool for Achieving Behavioral Change.
ERIC Educational Resources Information Center
Saren, Dru
1999-01-01
Presents a "Decision Tree" process for structuring team decision making and problem solving about specific student behavioral goals. The Decision Tree involves a sequence of questions/decisions that can be answered in "yes/no" terms. Questions address reasonableness of the goal, time factors, importance of the goal, responsibilities, safety,…
Using Predictive Analytics to Predict Power Outages from Severe Weather
NASA Astrophysics Data System (ADS)
Wanik, D. W.; Anagnostou, E. N.; Hartman, B.; Frediani, M. E.; Astitha, M.
2015-12-01
The distribution of reliable power is essential to businesses, public services, and our daily lives. With the growing abundance of data being collected and created by industry (i.e. outage data), government agencies (i.e. land cover), and academia (i.e. weather forecasts), we can begin to tackle problems that previously seemed too complex to solve. In this session, we will present newly developed tools to aid decision-support challenges at electric distribution utilities that must mitigate, prepare for, respond to and recover from severe weather. We will show a performance evaluation of outage predictive models built for Eversource Energy (formerly Connecticut Light & Power) for storms of all types (i.e. blizzards, thunderstorms and hurricanes) and magnitudes (from 20 to >15,000 outages). High resolution weather simulations (simulated with the Weather and Research Forecast Model) were joined with utility outage data to calibrate four types of models: a decision tree (DT), random forest (RF), boosted gradient tree (BT) and an ensemble (ENS) decision tree regression that combined predictions from DT, RF and BT. The study shows that the ENS model forced with weather, infrastructure and land cover data was superior to the other models we evaluated, especially in terms of predicting the spatial distribution of outages. This research has the potential to be used for other critical infrastructure systems (such as telecommunications, drinking water and gas distribution networks), and can be readily expanded to the entire New England region to facilitate better planning and coordination among decision-makers when severe weather strikes.
Lee, Daniel Joseph; Veneri, Diana A
2018-05-01
The most common complaint lower limb prosthesis users report is inadequacy of a proper socket fit. Adjustments to the residual limb-socket interface can be made by the prosthesis user without consultation of a clinician in many scenarios through skilled self-management. Decision trees guide prosthesis wearers through the self-management process, empowering them to rectify fit issues, or referring them to a clinician when necessary. This study examines the development and acceptability testing of patient-centered decision trees for lower limb prosthesis users. Decision trees underwent a four-stage process: literature review and expert consultation, designing, two rounds of expert panel review and revisions, and target audience testing. Fifteen lower limb prosthesis users (average age 61 years) reviewed the decision trees and completed an acceptability questionnaire. Participants reported agreement of 80% or above in five of the eight questions related to acceptability of the decision trees. Disagreement was related to the level of experience of the respondent. Decision trees were found to be easy to use, illustrate correct solutions to common issues, and have terminology consistent with that of a new prosthesis user. Some users with greater than 1.5 years of experience would not use the decision trees based on their own self-management skills. Implications for Rehabilitation: Discomfort of the residual limb-prosthetic socket interface is the most common reason for clinician visits. Prosthesis users can use decision trees to guide them through the process of obtaining a proper socket fit independently. Newer users may benefit from using the decision trees more than experienced users.
Korucu, M Kemal; Karademir, Aykan
2014-02-01
The procedure of a multi-criteria decision analysis supported by the geographic information systems was applied to the site selection process of a planning municipal solid waste management practice based on twelve different scenarios. The scenarios included two different decision tree modes and two different weighting models for three different area requirements. The suitability rankings of the suitable sites obtained from the application of the decision procedure for the scenarios were assessed by a factorial experimental design concerning the effect of some external criteria on the final decision of the site selection process. The external criteria used in the factorial experimental design were defined as "Risk perception and approval of stakeholders" and "Visibility". The effects of the presence of these criteria in the decision trees were evaluated in detail. For a quantitative expression of the differentiations observed in the suitability rankings, the ranking data were subjected to ANOVA test after a normalization process. Then the results of these tests were evaluated by Tukey test to measure the effects of external criteria on the final decision. The results of Tukey tests indicated that the involvement of the external criteria into the decision trees produced statistically meaningful differentiations in the suitability rankings. Since the external criteria could cause considerable external costs during the operation of the disposal facilities, the presence of these criteria in the decision tree in addition to the other criteria related to environmental and legislative requisites could prevent subsequent external costs in the first place.
Petrović, Jelena; Ibrić, Svetlana; Betz, Gabriele; Đurić, Zorica
2012-05-30
The main objective of the study was to develop artificial intelligence methods for optimization of drug release from matrix tablets regardless of the matrix type. Static and dynamic artificial neural networks of the same topology were developed to model dissolution profiles of different matrix tablets types (hydrophilic/lipid) using formulation composition, compression force used for tableting and tablets porosity and tensile strength as input data. Potential application of decision trees in discovering knowledge from experimental data was also investigated. Polyethylene oxide polymer and glyceryl palmitostearate were used as matrix forming materials for hydrophilic and lipid matrix tablets, respectively whereas selected model drugs were diclofenac sodium and caffeine. Matrix tablets were prepared by direct compression method and tested for in vitro dissolution profiles. Optimization of static and dynamic neural networks used for modeling of drug release was performed using Monte Carlo simulations or genetic algorithms optimizer. Decision trees were constructed following discretization of data. Calculated difference (f(1)) and similarity (f(2)) factors for predicted and experimentally obtained dissolution profiles of test matrix tablets formulations indicate that Elman dynamic neural networks as well as decision trees are capable of accurate predictions of both hydrophilic and lipid matrix tablets dissolution profiles. Elman neural networks were compared to most frequently used static network, Multi-layered perceptron, and superiority of Elman networks have been demonstrated. Developed methods allow simple, yet very precise way of drug release predictions for both hydrophilic and lipid matrix tablets having controlled drug release. Copyright © 2012 Elsevier B.V. All rights reserved.
Improving ensemble decision tree performance using Adaboost and Bagging
NASA Astrophysics Data System (ADS)
Hasan, Md. Rajib; Siraj, Fadzilah; Sainin, Mohd Shamrie
2015-12-01
Ensemble classifier systems are considered among the most promising approaches in medical data classification, and the performance of a decision tree classifier can be increased by the ensemble method, as it has been shown to be better than single classifiers. However, in an ensemble setting the performance depends on the selection of a suitable base classifier. This research employed two prominent ensemble methods, namely Adaboost and Bagging, with base classifiers such as Random Forest, Random Tree, j48, j48grafts and Logistic Model Regression (LMT) that were selected independently. The empirical study shows that the performance varies when different base classifiers are selected, and in some cases overfitting was also noted. The evidence shows that ensemble decision tree classifiers using Adaboost and Bagging improve performance on the selected medical data sets.
An automated approach to the design of decision tree classifiers
NASA Technical Reports Server (NTRS)
Argentiero, P.; Chin, R.; Beaudet, P.
1982-01-01
An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.
Climate-diameter growth relationships of black spruce and jack pine trees in boreal Ontario, Canada.
Subedi, Nirmal; Sharma, Mahadev
2013-02-01
To predict the long-term effects of climate change - global warming and changes in precipitation - on the diameter (radial) growth of jack pine (Pinus banksiana Lamb.) and black spruce (Picea mariana [Mill.] B.S.P.) trees in boreal Ontario, we modified an existing diameter growth model to include climate variables. Diameter chronologies of 927 jack pine and 1173 black spruce trees, growing in the area from 47°N to 50°N and 80°W to 92°W, were used to develop diameter growth models in a nonlinear mixed-effects approach. Our results showed that the variables long-term average of mean growing season temperature, precipitation during wettest quarter, and total precipitation during growing season were significant (alpha = 0.05) in explaining variation in diameter growth of the sample trees. Model results indicated that higher temperatures during the growing season would increase the diameter growth of jack pine trees, but decrease that of black spruce trees. More precipitation during the wettest quarter would favor the diameter growth of both species. On the other hand, a wetter growing season, which may decrease radiation inputs, increase nutrient leaching, and reduce the decomposition rate, would reduce the diameter growth of both species. Moreover, our results indicated that future (2041-2070) diameter growth rate may differ from current (1971-2000) growth rates for both species, with conditions being more favorable for jack pine than black spruce trees. Expected future changes in the growth rate of boreal trees need to be considered in forest management decisions. We recommend that knowledge of climate-growth relationships, as represented by models, be combined with learning from adaptive management to reduce the risks and uncertainties associated with forest management decisions. © 2012 Blackwell Publishing Ltd.
Pitcher, Brandon; Alaqla, Ali; Noujeim, Marcel; Wealleans, James A; Kotsakis, Georgios; Chrepa, Vanessa
2017-03-01
Cone-beam computed tomographic (CBCT) analysis allows for 3-dimensional assessment of periradicular lesions and may facilitate preoperative periapical cyst screening. The purpose of this study was to develop and assess the predictive validity of a cyst screening method based on CBCT volumetric analysis alone or combined with designated radiologic criteria. Three independent examiners evaluated 118 presurgical CBCT scans from cases that underwent apicoectomies and had an accompanying gold standard histopathological diagnosis of either a cyst or granuloma. Lesion volume, density, and specific radiologic characteristics were assessed using specialized software. Logistic regression models with histopathological diagnosis as the dependent variable were constructed for cyst prediction, and receiver operating characteristic curves were used to assess the predictive validity of the models. A conditional inference binary decision tree based on a recursive partitioning algorithm was constructed to facilitate preoperative screening. Interobserver agreement was excellent for volume and density, but it varied from poor to good for the radiologic criteria. Volume and root displacement were strong predictors for cyst screening in all analyses. The binary decision tree classifier determined that if the volume of the lesion was >247 mm³, there was 80% probability of a cyst. If volume was <247 mm³ and root displacement was present, cyst probability was 60% (78% accuracy). The good accuracy and high specificity of the decision tree classifier renders it a useful preoperative cyst screening tool that can aid in clinical decision making but not a substitute for definitive histopathological diagnosis after biopsy. Confirmatory studies are required to validate the present findings. Published by Elsevier Inc.
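The two branches reported in this abstract can be written down as a small rule function. This is only an illustrative sketch: the function name and the `None` return for the branch the abstract does not report (small lesion without root displacement) are assumptions, and the code is not a clinical tool.

```python
from typing import Optional

# Volume threshold (cubic millimetres) reported in the abstract.
VOLUME_THRESHOLD_MM3 = 247.0


def cyst_probability(volume_mm3: float, root_displacement: bool) -> Optional[float]:
    """Return the cyst probability reported for the matching branch.

    Hypothetical helper: returns None when the abstract gives no figure
    for the branch (volume below threshold, no root displacement).
    """
    if volume_mm3 > VOLUME_THRESHOLD_MM3:
        return 0.80  # large lesion: 80% probability of a cyst
    if volume_mm3 < VOLUME_THRESHOLD_MM3 and root_displacement:
        return 0.60  # small lesion with root displacement: 60%
    return None  # branch not reported in the abstract
```

For example, `cyst_probability(300.0, False)` follows the first branch and `cyst_probability(100.0, True)` follows the second.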
Creating ensembles of decision trees through sampling
Kamath, Chandrika; Cantu-Paz, Erick
2005-08-30
A system for decision tree ensembles that includes a module to read the data, a module to sort the data, a module to evaluate a potential split of the data according to some criterion using a random sample of the data, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method is based on statistical sampling techniques and includes the steps of reading the data; sorting the data; evaluating a potential split according to some criterion using a random sample of the data, splitting the data, and combining multiple decision trees in ensembles.
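The step of evaluating a potential split on a random sample of the data could look roughly like the sketch below. The Gini criterion, the midpoint candidate thresholds, and all function names are assumptions chosen for illustration; the record itself only says "some criterion".

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_score(rows, feature, threshold):
    """Weighted Gini impurity after splitting rows at x[feature] <= threshold."""
    left = [y for x, y in rows if x[feature] <= threshold]
    right = [y for x, y in rows if x[feature] > threshold]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(rows)


def best_split_on_sample(rows, feature, sample_size, rng):
    """Pick the best threshold using only a random sample of the rows."""
    sample = rng.sample(rows, min(sample_size, len(rows)))
    values = sorted({x[feature] for x, _ in sample})  # sort the sampled values
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not candidates:
        return None  # all sampled values identical: nothing to split on
    return min(candidates, key=lambda t: split_score(sample, feature, t))
```

Because each tree sees a different random sample at each split, repeated runs with different seeds yield different trees, which is what makes combining them into an ensemble worthwhile.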
Bioinformatics in proteomics: application, terminology, and pitfalls.
Wiemer, Jan C; Prokudin, Alexander
2004-01-01
Bioinformatics applies data mining, i.e., modern computer-based statistics, to biomedical data. It leverages on machine learning approaches, such as artificial neural networks, decision trees and clustering algorithms, and is ideally suited for handling huge data amounts. In this article, we review the analysis of mass spectrometry data in proteomics, starting with common pre-processing steps and using single decision trees and decision tree ensembles for classification. Special emphasis is put on the pitfall of overfitting, i.e., of generating too complex single decision trees. Finally, we discuss the pros and cons of the two different decision tree usages.
Bayesian averaging over Decision Tree models for trauma severity scoring.
Schetinin, V; Jakaite, L; Krzanowski, W
2018-01-01
Health care practitioners analyse possible risks of misleading decisions and need to estimate and quantify uncertainty in predictions. We have examined the "gold" standard of screening a patient's conditions for predicting survival probability, based on logistic regression modelling, which is used in trauma care for clinical purposes and quality audit. This methodology is based on theoretical assumptions about data and uncertainties. Models induced within such an approach have exposed a number of problems, providing unexplained fluctuation of predicted survival and low accuracy of estimating uncertainty intervals within which predictions are made. Bayesian method, which in theory is capable of providing accurate predictions and uncertainty estimates, has been adopted in our study using Decision Tree models. Our approach has been tested on a large set of patients registered in the US National Trauma Data Bank and has outperformed the standard method in terms of prediction accuracy, thereby providing practitioners with accurate estimates of the predictive posterior densities of interest that are required for making risk-aware decisions. Copyright © 2017 Elsevier B.V. All rights reserved.
Wheeler, David C.; Burstyn, Igor; Vermeulen, Roel; Yu, Kai; Shortreed, Susan M.; Pronk, Anjoeka; Stewart, Patricia A.; Colt, Joanne S.; Baris, Dalsu; Karagas, Margaret R.; Schwenn, Molly; Johnson, Alison; Silverman, Debra T.; Friesen, Melissa C.
2014-01-01
Objectives Evaluating occupational exposures in population-based case-control studies often requires exposure assessors to review each study participants' reported occupational information job-by-job to derive exposure estimates. Although such assessments likely have underlying decision rules, they usually lack transparency, are time-consuming and have uncertain reliability and validity. We aimed to identify the underlying rules to enable documentation, review, and future use of these expert-based exposure decisions. Methods Classification and regression trees (CART, predictions from a single tree) and random forests (predictions from many trees) were used to identify the underlying rules from the questionnaire responses and an expert's exposure assignments for occupational diesel exhaust exposure for several metrics: binary exposure probability and ordinal exposure probability, intensity, and frequency. Data were split into training (n=10,488 jobs), testing (n=2,247), and validation (n=2,248) data sets. Results The CART and random forest models' predictions agreed with 92–94% of the expert's binary probability assignments. For ordinal probability, intensity, and frequency metrics, the two models extracted decision rules more successfully for unexposed and highly exposed jobs (86–90% and 57–85%, respectively) than for low or medium exposed jobs (7–71%). Conclusions CART and random forest models extracted decision rules and accurately predicted an expert's exposure decisions for the majority of jobs and identified questionnaire response patterns that would require further expert review if the rules were applied to other jobs in the same or different study. This approach makes the exposure assessment process in case-control studies more transparent and creates a mechanism to efficiently replicate exposure decisions in future studies. PMID:23155187
Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi
2017-06-01
Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB ( p < 0.05). Identifying and training mothers at risk as well as improving prenatal care may reduce the PTB rate. We also recommend that statisticians utilize the logistic regression model for the classification of risk groups for PTB.
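The three indices named above can be computed from predictions as in the following sketch. The function names and the pairwise (rank-based) formulation of the area under the curve are illustrative choices, not the procedure used in the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def auc(y_true, scores):
    """Area under the ROC curve as the share of correctly ranked pairs.

    Each (positive, negative) pair counts 1 if the positive case has the
    higher score, 0.5 for a tie, and 0 otherwise.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        return float("nan")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With a rare outcome such as a 5.5% PTB rate, overall accuracy alone can look high for a model that rarely predicts the positive class, which is why sensitivity and the area under the curve are reported alongside it.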
A Decision Tree to Identify Children Affected by Prenatal Alcohol Exposure
Goh, Patrick K.; Doyle, Lauren R.; Glass, Leila; Jones, Kenneth L.; Riley, Edward P.; Coles, Claire D.; Hoyme, H. Eugene; Kable, Julie A.; May, Philip A.; Kalberg, Wendy O.; Sowell, Elizabeth R.; Wozniak, Jeffrey R.; Mattson, Sarah N.
2017-01-01
Objective To develop and validate a hierarchical decision tree model, combining neurobehavioral and physical measures, for identification of children affected by prenatal alcohol exposure even when facial dysmorphology is not present. Study design Data were collected as part of a multisite study across the United States. The model was developed after evaluating over 1000 neurobehavioral and dysmorphology variables collected from 434 children (8–16y) with prenatal alcohol exposure, with and without fetal alcohol syndrome (FAS), and non-exposed controls, with and without other clinically-relevant behavioral or cognitive concerns. The model was subsequently validated in an independent sample of 454 children in two age ranges (5–7y or 10–16y). In all analyses, the discriminatory ability of each model step was tested with logistic regression. Classification accuracies and positive and negative predictive values were calculated. Results The model consisted of variables from 4 measures (2 parent questionnaires, an IQ score, and a physical examination). Overall accuracy rates for both the development and validation samples met or exceeded our goal of 80% overall accuracy. Conclusions The decision tree model distinguished children affected by prenatal alcohol exposure from non-exposed controls, including those with other behavioral concerns or conditions. Improving identification of this population will streamline access to clinical services, including multidisciplinary evaluation and treatment. PMID:27476634
Sankari, E Siva; Manimegalai, D
2017-12-21
Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Due to large exploration of uncharacterized protein sequences in databases, traditional methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Since imbalanced datasets are taken, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree, REP (Reduced Error Pruning) tree, ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest are analysed. Among the various decision tree classifiers Random forest performs well in less time with good accuracy of 96.35%. Another inference is RUS boost decision tree classifier is able to classify one or two samples in the class with very less samples while the other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive for the classes with fewer samples. Also the performance of decision tree classifiers is compared with SVM (Support Vector Machine) and Naive Bayes classifier. Copyright © 2017 Elsevier Ltd. All rights reserved.
Using decision tree models to depict primary care physicians CRC screening decision heuristics.
Wackerbarth, Sarah B; Tarasenko, Yelena N; Curtis, Laurel A; Joyce, Jennifer M; Haist, Steven A
2007-10-01
The purpose of this study was to identify decision heuristics utilized by primary care physicians in formulating colorectal cancer screening recommendations. Qualitative research using in-depth semi-structured interviews. We interviewed 66 primary care internists and family physicians evenly drawn from academic and community practices. A majority of physicians were male, and almost all were white, non-Hispanic. Three researchers independently reviewed each transcript to determine the physician's decision criteria and developed decision trees. Final trees were developed by consensus. The constant comparative methodology was used to define the categories. Physicians were found to use 1 of 4 heuristics ("age 50," "age 50, if family history, then earlier," "age 50, if family history, then screen at age 40," or "age 50, if family history, then adjust relative to reference case") for the timing recommendation and 5 heuristics ["fecal occult blood test" (FOBT), "colonoscopy," "if not colonoscopy, then...," "FOBT and another test," and "a choice between options"] for the type decision. No connection was found between timing and screening type heuristics. We found evidence of heuristic use. Further research is needed to determine the potential impact on quality of care.
Shi, Ting-Ting; Zhang, Xiao-Bo; Guo, Lan-Ping; Huang, Lu-Qi
2017-11-01
Herbs used as raw materials for traditional Chinese medicine are usually planted in mountainous areas where the natural environment is suitable. Because mountain terrain is complex and planting plots are scattered, traditional survey methods struggle to obtain accurate planting areas. Studying methods for extracting Chinese herbal medicine planting areas from remote sensing, and thereby enabling dynamic monitoring and reserve estimation of Chinese herbal medicines, is of great significance for providing decision support for the conservation and utilization of traditional Chinese medicine resources. In this paper, taking the Panax notoginseng plots in Wenshan prefecture of Yunnan province as an example, China-made GF-1 multispectral remote sensing images with a 16 m×16 m resolution were obtained. Then, the time series that best reflect the spectral difference between P. notoginseng sheds and background objects were selected, and a decision tree model for extracting P. notoginseng plots was constructed according to the spectral characteristics of the surface features. The results showed that the remote sensing classification method based on the decision tree model could extract P. notoginseng plots in the study area effectively. The method can provide technical support for extraction of P. notoginseng plots at county level. Copyright © by the Chinese Pharmaceutical Association.
NASA Astrophysics Data System (ADS)
Książek, Judyta
2015-10-01
At present, there has been a great interest in the development of texture based image classification methods in many different areas. This study presents the results of research carried out to assess the usefulness of selected textural features for detection of asbestos-cement roofs in orthophotomap classification. Two different orthophotomaps of southern Poland (with ground resolution: 5 cm and 25 cm) were used. On both orthoimages representative samples for two classes: asbestos-cement roofing sheets and other roofing materials were selected. Estimation of texture analysis usefulness was conducted using machine learning methods based on decision trees (C5.0 algorithm). For this purpose, various sets of texture parameters were calculated in MaZda software. During the calculation of decision trees different numbers of texture parameters groups were considered. In order to obtain the best settings for decision trees models cross-validation was performed. Decision trees models with the lowest mean classification error were selected. The accuracy of the classification was held based on validation data sets, which were not used for the classification learning. For 5 cm ground resolution samples, the lowest mean classification error was 15.6%. The lowest mean classification error in the case of 25 cm ground resolution was 20.0%. The obtained results confirm potential usefulness of the texture parameter image processing for detection of asbestos-cement roofing sheets. In order to improve the accuracy another extended study should be considered in which additional textural features as well as spectral characteristics should be analyzed.
Metric Sex Determination of the Human Coxal Bone on a Virtual Sample using Decision Trees.
Savall, Frédéric; Faruch-Bilfeld, Marie; Dedouit, Fabrice; Sans, Nicolas; Rousseau, Hervé; Rougé, Daniel; Telmon, Norbert
2015-11-01
Decision trees provide an alternative to multivariate discriminant analysis, which is still the most commonly used in anthropometric studies. Our study analyzed the metric characterization of a recent virtual sample of 113 coxal bones using decision trees for sex determination. From 17 osteometric type I landmarks, a dataset was built with five classic distances traditionally reported in the literature and six new distances selected using the two-step ratio method. A ten-fold cross-validation was performed, and a decision tree was established on two subsamples (training and test sets). The decision tree established on the training set included three nodes and its application to the test set correctly classified 92% of individuals. This percentage was similar to the data of the literature. The usefulness of decision trees has been demonstrated in numerous fields. They have been already used in sex determination, body mass prediction, and ancestry estimation. This study shows another use of decision trees enabling simple and accurate sex determination. © 2015 American Academy of Forensic Sciences.
Spam comments prediction using stacking with ensemble learning
NASA Astrophysics Data System (ADS)
Mehmood, Arif; On, Byung-Won; Lee, Ingyu; Ashraf, Imran; Choi, Gyu Sang
2018-01-01
Deceptive comments about products or services mislead people in decision making. Current methods for predicting deceptive comments focus on feature design with a single trained model. Such features can capture some linguistic phenomena but rarely reveal the latent semantic meaning of the comments. We propose a prediction model on general features of documents using stacking with ensemble learning. Term Frequency/Inverse Document Frequency (TF/IDF) features are inputs to a stack of Random Forest and Gradient Boosted Trees, and the outputs of these base learners are combined by a decision tree that serves as the final model. The results show that our approach achieves an accuracy of 92.19%, which outperforms the state-of-the-art method.
ERIC Educational Resources Information Center
Raju, Dheeraj; Schumacker, Randall
2015-01-01
The study used earliest available student data from a flagship university in the southeast United States to build data mining models like logistic regression with different variable selection methods, decision trees, and neural networks to explore important student characteristics associated with retention leading to graduation. The decision tree…
Joseph Buongiorno
2001-01-01
Faustmann's formula gives the land value, or the forest value of land with trees, under deterministic assumptions regarding future stand growth and prices, over an infinite horizon. Markov decision process (MDP) models generalize Faustmann's approach by recognizing that future stand states and prices are known only as probabilistic distributions. The...
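For reference, the deterministic bare-land case is usually written in the following textbook form. This is a sketch rather than the notation of the paper itself: the symbols $p$ (stumpage price), $V(T)$ (stand volume at rotation age $T$), $c$ (regeneration cost paid at the start of each rotation) and $r$ (continuous discount rate) are assumptions introduced here.

```latex
\mathrm{LEV}
  = \sum_{k=0}^{\infty} e^{-rkT}\left(p\,V(T)\,e^{-rT} - c\right)
  = \frac{p\,V(T)\,e^{-rT} - c}{1 - e^{-rT}}
  = \frac{p\,V(T) - c\,e^{rT}}{e^{rT} - 1}
```

The sum runs over an infinite sequence of identical rotations, each contributing the same discounted net revenue, so the geometric series collapses to the closed form. An MDP model replaces the fixed $p$ and $V(T)$ with state transitions that occur with given probabilities.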
Designing efficient nitrous oxide sampling strategies in agroecosystems using simulation models
NASA Astrophysics Data System (ADS)
Saha, Debasish; Kemanian, Armen R.; Rau, Benjamin M.; Adler, Paul R.; Montes, Felipe
2017-04-01
Annual cumulative soil nitrous oxide (N2O) emissions calculated from discrete chamber-based flux measurements have unknown uncertainty. We used outputs from simulations obtained with an agroecosystem model to design sampling strategies that yield accurate cumulative N2O flux estimates with a known uncertainty level. Daily soil N2O fluxes were simulated for Ames, IA (corn-soybean rotation), College Station, TX (corn-vetch rotation), Fort Collins, CO (irrigated corn), and Pullman, WA (winter wheat), representing diverse agro-ecoregions of the United States. Fertilization source, rate, and timing were site-specific. These simulated fluxes surrogated daily measurements in the analysis. We ;sampled; the fluxes using a fixed interval (1-32 days) or a rule-based (decision tree-based) sampling method. Two types of decision trees were built: a high-input tree (HI) that included soil inorganic nitrogen (SIN) as a predictor variable, and a low-input tree (LI) that excluded SIN. Other predictor variables were identified with Random Forest. The decision trees were inverted to be used as rules for sampling a representative number of members from each terminal node. The uncertainty of the annual N2O flux estimation increased along with the fixed interval length. A 4- and 8-day fixed sampling interval was required at College Station and Ames, respectively, to yield ±20% accuracy in the flux estimate; a 12-day interval rendered the same accuracy at Fort Collins and Pullman. Both the HI and the LI rule-based methods provided the same accuracy as that of fixed interval method with up to a 60% reduction in sampling events, particularly at locations with greater temporal flux variability. For instance, at Ames, the HI rule-based and the fixed interval methods required 16 and 91 sampling events, respectively, to achieve the same absolute bias of 0.2 kg N ha-1 yr-1 in estimating cumulative N2O flux. 
These results suggest that using simulation models along with decision trees can reduce the cost and improve the accuracy of the estimations of cumulative N2O fluxes using the discrete chamber-based method.
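The fixed-interval part of this comparison can be illustrated with a minimal sketch. It assumes (this is not stated in the abstract) that days between sampling events are filled by linear interpolation and that the first and last sampled values are held constant outside the sampled range. The function names and the synthetic flux series are hypothetical and are not taken from the study.

```python
from bisect import bisect_right


def estimate_cumulative_flux(daily_flux, interval, start=0):
    """Estimate the cumulative flux from every `interval`-th daily value.

    Days between sampling events are filled by linear interpolation
    (an assumption of this sketch, not necessarily the study's method).
    """
    n = len(daily_flux)
    sampled = list(range(start, n, interval))
    total = 0.0
    for day in range(n):
        i = bisect_right(sampled, day)  # number of sampled days <= day
        if i == 0:
            value = daily_flux[sampled[0]]   # before the first sampling event
        elif i == len(sampled):
            value = daily_flux[sampled[-1]]  # on or after the last sampling event
        else:
            d0, d1 = sampled[i - 1], sampled[i]
            w = (day - d0) / (d1 - d0)
            value = (1 - w) * daily_flux[d0] + w * daily_flux[d1]
        total += value
    return total


def relative_bias(daily_flux, interval, start=0):
    """Relative error of the sampled estimate against the full daily sum."""
    true_total = sum(daily_flux)
    return (estimate_cumulative_flux(daily_flux, interval, start) - true_total) / true_total


# Hypothetical daily series: a constant background plus one short emission pulse.
flux = [1.0] * 365
for d in range(100, 110):
    flux[d] += 20.0

for interval in (1, 4, 8, 16, 32):
    print(interval, round(relative_bias(flux, interval), 3))
```

With a pulse this short, the bias depends strongly on whether a sampling day happens to fall inside the pulse, which is the kind of temporal variability the rule-based sampling is meant to target.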
Liu, Pei-Yang
2014-01-01
Metabolic syndrome (MetS) in young adults (age 20–39) is often undiagnosed. A simple screening tool using a surrogate measure might be invaluable in the early detection of MetS. Methods. A chi-squared automatic interaction detection (CHAID) decision tree analysis with waist circumference user-specified as the first level was used to detect MetS in young adults using data from the National Health and Nutrition Examination Survey (NHANES) 2009-2010 Cohort as a representative sample of the United States population (n = 745). Results. Twenty percent of the sample met the National Cholesterol Education Program Adult Treatment Panel III (NCEP) classification criteria for MetS. The user-specified CHAID model was compared to both a CHAID model with no user-specified first level and a logistic regression-based model. This analysis identified waist circumference as a strong predictor in the MetS diagnosis. The accuracy of the final model with waist circumference user-specified as the first level was 92.3%, with its ability to detect MetS at 71.8%, which outperformed the comparison models. Conclusions. Preliminary findings suggest that young adults at risk for MetS could be identified for further follow-up based on their waist circumference. Decision tree methods show promise for the development of a preliminary detection algorithm for MetS. PMID:24817904
Modelling the risk-benefit impact of H1N1 influenza vaccines.
Phillips, Lawrence D; Fasolo, Barbara; Zafiropoulous, Nikolaos; Eichler, Hans-Georg; Ehmann, Falk; Jekerle, Veronika; Kramarz, Piotr; Nicoll, Angus; Lönngren, Thomas
2013-08-01
Shortly after the H1N1 influenza virus reached pandemic status in June 2009, the benefit-risk project team at the European Medicines Agency recognized this presented a research opportunity for testing the usefulness of a decision analysis model in deliberations about approving vaccines soon based on limited data or waiting for more data. Undertaken purely as a research exercise, the model was not connected to the ongoing assessment by the European Medicines Agency, which approved the H1N1 vaccines on 25 September 2009. A decision tree model constructed initially on 1 September 2009, and slightly revised subsequently as new data were obtained, represented an end-of-September or end-of-October approval of vaccines. The model showed combinations of uncertain events, the severity of the disease and the vaccines' efficacy and safety, leading to estimates of numbers of deaths and serious disabilities. The group based their probability assessments on available information and background knowledge about vaccines and similar pandemics in the past. Weighting the numbers by their joint probabilities for all paths through the decision tree gave a weighted average for a September decision of 216 500 deaths and serious disabilities, and for a decision delayed to October of 291 547, showing that an early decision was preferable. The process of constructing the model facilitated communications among the group's members and led to new insights for several participants, while its robustness built confidence in the decision. These findings suggest that models might be helpful to regulators, as they form their preferences during the process of deliberation and debate, and more generally, for public health issues when decision makers face considerable uncertainty.
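The weighting step described above, in which each path's outcome is multiplied by its joint probability and the products are summed, amounts to rolling back a chance tree. The sketch below shows the arithmetic under stated assumptions: the tree structure, probabilities and outcome counts are entirely hypothetical and are not the values used by the project team.

```python
def expected_outcome(node):
    """Roll back a chance tree.

    A leaf is a number (e.g. deaths and serious disabilities on that path);
    a chance node is a list of (probability, child) pairs.
    """
    if isinstance(node, (int, float)):
        return float(node)
    total_p = sum(p for p, _ in node)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    # Weighting each subtree by its branch probability is equivalent to
    # weighting each leaf by the joint probability of its path.
    return sum(p * expected_outcome(child) for p, child in node)


# Hypothetical numbers for illustration only.
approve_early = [
    (0.3, [(0.9, 100_000), (0.1, 150_000)]),  # severe disease: vaccine works / fails
    (0.7, [(0.9, 20_000), (0.1, 30_000)]),    # mild disease: vaccine works / fails
]
approve_late = [
    (0.3, 140_000),  # severe disease, later protection
    (0.7, 28_000),   # mild disease, later protection
]

options = {"early": approve_early, "late": approve_late}
best = min(options, key=lambda name: expected_outcome(options[name]))
print(best)
```

The option with the smaller expected count is preferred, mirroring the comparison of the September and October decisions in the abstract.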
ERIC Educational Resources Information Center
Horner, Stacy B.; Fireman, Gary D.; Wang, Eugene W.
2010-01-01
Peer nominations and demographic information were collected from a diverse sample of 1493 elementary school participants to examine behavior (overt and relational aggression, impulsivity, and prosociality), context (peer status), and demographic characteristics (race and gender) as predictors of teacher and administrator decisions about…
Using histograms to introduce randomization in the generation of ensembles of decision trees
Kamath, Chandrika; Cantu-Paz, Erick; Littau, David
2005-02-22
A system for decision tree ensembles that includes a module to read the data, a module to create a histogram, a module to evaluate a potential split according to some criterion using the histogram, a module to select a split point randomly in an interval around the best split, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method includes the steps of reading the data; creating a histogram; evaluating a potential split according to some criterion using the histogram; selecting a split point randomly in an interval around the best split; splitting the data; and combining multiple decision trees in ensembles.
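The split-selection steps listed above can be sketched for a single numeric feature as follows. Several details are assumptions made for illustration: the abstract does not name the split criterion (Gini impurity is used here), the number of bins, or the width of the interval around the best split (one bin width, centred on the best bin edge). The function names are hypothetical.

```python
import random


def gini(counts):
    """Gini impurity of a list of per-class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def histogram_random_split(values, labels, n_bins=10, rng=None):
    """Return (best_edge, randomized_split) for one numeric feature."""
    rng = rng or random.Random()
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("feature is constant; no split possible")
    width = (hi - lo) / n_bins
    classes = sorted(set(labels))
    index = {c: k for k, c in enumerate(classes)}

    # Step 1: per-bin class counts (the histogram).
    hist = [[0] * len(classes) for _ in range(n_bins)]
    for v, y in zip(values, labels):
        b = min(int((v - lo) / width), n_bins - 1)
        hist[b][index[y]] += 1

    # Step 2: evaluate only the bin edges as candidate splits.
    total = [sum(col) for col in zip(*hist)]
    n = len(values)
    left = [0] * len(classes)
    best_edge, best_score = None, float("inf")
    for b in range(n_bins - 1):
        left = [l + h for l, h in zip(left, hist[b])]
        right = [t - l for t, l in zip(total, left)]
        score = (sum(left) * gini(left) + sum(right) * gini(right)) / n
        if score < best_score:
            best_score, best_edge = score, lo + (b + 1) * width

    # Step 3: draw the actual split point at random around the best edge.
    split = rng.uniform(best_edge - width / 2, best_edge + width / 2)
    return best_edge, split


values = [float(v) for v in range(10)]
labels = ["a"] * 5 + ["b"] * 5
print(histogram_random_split(values, labels, rng=random.Random(0)))
```

Repeating this with different random draws yields different trees from the same data, which is what allows them to be combined into an ensemble.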
Ramezankhani, Roghieh; Sajjadi, Nooshin; Nezakati Esmaeilzadeh, Roya; Jozi, Seyed Ali; Shirzadi, Mohammad Reza
2018-05-08
Cutaneous Leishmaniasis (CL) is a neglected tropical disease that continues to be a health problem in Iran. Nearly 350 million people are thought to be at risk. We investigated the impact of the environmental factors on CL incidence during the period 2007-2015 in a known endemic area for this disease in Isfahan Province, Iran. After collecting data with regard to the climatic, topographic, vegetation coverage and CL cases in the study area, a decision tree model was built using the classification and regression tree algorithm. CL data for the years 2007 until 2012 were used for model construction and the data for the years 2013 until 2015 were used for testing the model. The root mean square error and the correlation factor were used to evaluate the predictive performance of the decision tree model. We found that wind speeds less than 14 m/s, altitudes between 1234 and 1810 m above the mean sea level, vegetation coverage according to the normalized difference vegetation index (NDVI) less than 0.12, rainfall less than 1.6 mm and air temperatures higher than 30°C would correspond to a seasonal incidence of 163.28 per 100,000 persons, while if wind speed is less than 14 m/s, altitude less than 1,810 m and NDVI higher than 0.12, then the mean seasonal incidence of the disease would be 2.27 per 100,000 persons. Environmental factors were found to be important predictive variables for CL incidence and should be considered in surveillance and prevention programmes for CL control.
To Spray or Not to Spray: A Decision Analysis of Coffee Berry Borer in Hawaii
2017-01-01
Integrated pest management strategies were adopted to combat the coffee berry borer (CBB) after its arrival in Hawaii in 2010. A decision tree framework is used to model the CBB integrated pest management recommendations, for potential use by growers and to assist in developing and evaluating management strategies and policies. The model focuses on pesticide spraying (spray/no spray) as the most significant pest management decision within each period over the entire crop season. The main result from the analysis suggests the most important parameter to maximize net benefit is to ensure a low initial infestation level. A second result looks at the impact of a subsidy for the cost of pesticides and shows a typical farmer receives a positive net benefit of $947.17. Sensitivity analysis of parameters checks the robustness of the model and further confirms the importance of a low initial infestation level vis-a-vis any level of subsidy. The use of a decision tree is shown to be an effective method for understanding integrated pest management strategies and solutions. PMID:29065464
Fault trees for decision making in systems analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lambert, Howard E.
1975-10-09
The application of fault tree analysis (FTA) to system safety and reliability is presented within the framework of system safety analysis. The concepts and techniques involved in manual and automated fault tree construction are described and their differences noted. The theory of mathematical reliability pertinent to FTA is presented with emphasis on engineering applications. An outline of the quantitative reliability techniques of the Reactor Safety Study is given. Concepts of probabilistic importance are presented within the fault tree framework and applied to the areas of system design, diagnosis and simulation. The computer code IMPORTANCE ranks basic events and cut sets according to a sensitivity analysis. A useful feature of the IMPORTANCE code is that it can accept relative failure data as input. The output of the IMPORTANCE code can assist an analyst in finding weaknesses in system design and operation, suggest the most optimal course of system upgrade, and determine the optimal location of sensors within a system. A general simulation model of system failure in terms of fault tree logic is described. The model is intended for efficient diagnosis of the causes of system failure in the event of a system breakdown. It can also be used to assist an operator in making decisions under a time constraint regarding the future course of operations. The model is well suited for computer implementation. New results incorporated in the simulation model include an algorithm to generate repair checklists on the basis of fault tree logic and a one-step-ahead optimization procedure that minimizes the expected time to diagnose system failure.
Dynamic and Contextual Information in HMM Modeling for Handwritten Word Recognition.
Bianne-Bernard, Anne-Laure; Menasri, Farès; Al-Hajj Mohamad, Rami; Mokbel, Chafic; Kermorvant, Christopher; Likforman-Sulem, Laurence
2011-10-01
This study aims at building an efficient word recognition system resulting from the combination of three handwriting recognizers. The main component of this combined system is an HMM-based recognizer which considers dynamic and contextual information for a better modeling of writing units. For modeling the contextual units, a state-tying process based on decision tree clustering is introduced. Decision trees are built according to a set of expert-based questions on how characters are written. Questions are divided into global questions, yielding larger clusters, and precise questions, yielding smaller ones. Such clustering enables us to reduce the total number of models and Gaussian densities by 10. We then apply this modeling to the recognition of handwritten words. Experiments are conducted on three publicly available databases based on Latin or Arabic languages: Rimes, IAM, and OpenHart. The results obtained show that contextual information embedded with dynamic modeling significantly improves recognition.
Objective consensus from decision trees.
Putora, Paul Martin; Panje, Cedric M; Papachristofilou, Alexandros; Dal Pra, Alan; Hundsberger, Thomas; Plasswilm, Ludwig
2014-12-05
Consensus-based approaches provide an alternative to evidence-based decision making, especially in situations where high-level evidence is limited. Our aim was to demonstrate a novel source of information, objective consensus based on recommendations in decision tree format from multiple sources. Based on nine sample recommendations in decision tree format a representative analysis was performed. The most common (mode) recommendations for each eventuality (each permutation of parameters) were determined. The same procedure was applied to real clinical recommendations for primary radiotherapy for prostate cancer. Data was collected from 16 radiation oncology centres, converted into decision tree format and analyzed in order to determine the objective consensus. Based on information from multiple sources in decision tree format, treatment recommendations can be assessed for every parameter combination. An objective consensus can be determined by means of mode recommendations without compromise or confrontation among the parties. In the clinical example involving prostate cancer therapy, three parameters were used with two cut-off values each (Gleason score, PSA, T-stage) resulting in a total of 27 possible combinations per decision tree. Despite significant variations among the recommendations, a mode recommendation could be found for specific combinations of parameters. Recommendations represented as decision trees can serve as a basis for objective consensus among multiple parties.
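The mode-based consensus described above can be sketched as follows. Three parameters with two cut-off values each give three levels per parameter, hence 3 × 3 × 3 = 27 combinations. The three example trees, the level names and the abstract recommendation labels "A", "B" and "C" are hypothetical placeholders and do not represent any centre's clinical recommendations. Reporting ties as "no consensus" is an assumption of this sketch.

```python
from collections import Counter
from itertools import product

LEVELS = ("low", "mid", "high")  # two cut-offs give three levels per parameter


# Each hypothetical centre's decision tree maps a parameter combination
# (Gleason level, PSA level, T-stage level) to a recommendation label.
def tree_1(gleason, psa, t_stage):
    return "B" if gleason == "high" else "A"


def tree_2(gleason, psa, t_stage):
    return "B" if "high" in (gleason, psa, t_stage) else "A"


def tree_3(gleason, psa, t_stage):
    if t_stage == "high":
        return "C"
    return "B" if gleason == "high" else "A"


def objective_consensus(trees):
    """Mode recommendation for every parameter combination (None on a tie)."""
    consensus = {}
    for combo in product(LEVELS, repeat=3):
        votes = Counter(tree(*combo) for tree in trees)
        (top, top_n), *rest = votes.most_common()
        tied = any(n == top_n for _, n in rest)
        consensus[combo] = None if tied else top
    return consensus


consensus = objective_consensus([tree_1, tree_2, tree_3])
print(len(consensus), "combinations evaluated")
```

Because each tree is evaluated independently for every combination, the consensus is obtained without any negotiation among the sources, which is the sense in which the abstract calls it objective.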
The decision tree approach to classification
NASA Technical Reports Server (NTRS)
Wu, C.; Landgrebe, D. A.; Swain, P. H.
1975-01-01
A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.
Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin
2015-08-01
Improving the accuracy of supervised classification algorithms in biomedical applications is an active area of research. In this study, we improve the performance of Particle Swarm Optimization (PSO) combined with a C4.5 decision tree (PSO+C4.5) classifier by applying a Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of our proposed method, it is implemented on 1 microarray dataset and 5 different medical data sets obtained from UCI machine learning databases. Moreover, the results of the PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine under the kernel of Radial Basis Function, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest neighbor). A repeated five-fold cross-validation method was used to justify the performance of the classifiers. Experimental results show that our proposed method not only improves the performance of PSO+C4.5 but also obtains higher classification accuracy compared to the other classification methods.
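The repeated five-fold cross-validation used for evaluation can be illustrated with a minimal index generator. The number of repeats, the seed and the absence of stratification are assumptions of this sketch; the abstract does not specify them.

```python
import random


def repeated_kfold(n_samples, k=5, repeats=3, seed=0):
    """Yield (train_indices, test_indices) for `repeats` rounds of k-fold CV.

    Each round reshuffles the sample order, so every sample is tested
    exactly once per round but falls into different folds across rounds.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]  # k nearly equal-sized folds
        for i in range(k):
            test = folds[i]
            train = [j for m, fold in enumerate(folds) if m != i for j in fold]
            yield train, test


for train, test in repeated_kfold(23, k=5, repeats=1):
    print(len(train), len(test))
```

A classifier's accuracy would be computed on each test fold and averaged over all k × repeats splits; averaging over several reshuffled rounds reduces the dependence of the estimate on one particular partition.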
[Decision modeling for economic evaluation of health technologies].
de Soárez, Patrícia Coelho; Soares, Marta Oliveira; Novaes, Hillegonda Maria Dutilh
2014-10-01
Most economic evaluations that participate in decision-making processes for incorporation and financing of technologies of health systems use decision models to assess the costs and benefits of the compared strategies. Despite the large number of economic evaluations conducted in Brazil, there is a pressing need to conduct an in-depth methodological study of the types of decision models and their applicability in our setting. The objective of this literature review is to contribute to the knowledge and use of decision models in the national context of economic evaluations of health technologies. This article presents general definitions about models and concerns with their use; it describes the main models: decision trees, Markov chains, micro-simulation, simulation of discrete and dynamic events; it discusses the elements involved in the choice of model; and exemplifies the models addressed in national economic evaluation studies of diagnostic and therapeutic preventive technologies and health programs.
Pollution mitigation and carbon sequestration by an urban forest.
Brack, C L
2002-01-01
At the beginning of the 1900s, the Canberra plain was largely treeless. Graziers had carried out extensive clearing of the original trees since the 1820s leaving only scattered remnants and some plantings near homesteads. With the selection of Canberra as the site for the new capital of Australia, extensive tree plantings began in 1911. These trees have delivered a number of benefits, including aesthetic values and the amelioration of climatic extremes. Recently, however, it was considered that the benefits might extend to pollution mitigation and the sequestration of carbon. This paper outlines a case study of the value of the Canberra urban forest with particular reference to pollution mitigation. This study uses a tree inventory, modelling and decision support system developed to collect and use data about trees for tree asset management. The decision support system (DISMUT) was developed to assist in the management of about 400,000 trees planted in Canberra. The size of trees during the 5-year Kyoto Commitment Period was estimated using DISMUT and multiplied by estimates of value per square meter of canopy derived from available literature. The planted trees are estimated to have a combined energy reduction, pollution mitigation and carbon sequestration value of US$20-67 million during the period 2008-2012.
Decision tree and ensemble learning algorithms with their applications in bioinformatics.
Che, Dongsheng; Liu, Qi; Rasheed, Khaled; Tao, Xiuping
2011-01-01
Machine learning approaches have wide applications in bioinformatics, and decision tree is one of the successful approaches applied in this field. In this chapter, we briefly review decision tree and related ensemble algorithms and show the successful applications of such approaches on solving biological problems. We hope that by learning the algorithms of decision trees and ensemble classifiers, biologists can get the basic ideas of how machine learning algorithms work. On the other hand, by being exposed to the applications of decision trees and ensemble algorithms in bioinformatics, computer scientists can get better ideas of which bioinformatics topics they may work on in their future research directions. We aim to provide a platform to bridge the gap between biologists and computer scientists.
Bennema, S C; Molento, M B; Scholte, R G; Carvalho, O S; Pritsch, I
2017-11-01
Fascioliasis is a condition caused by the trematode Fasciola hepatica. In this paper, the spatial distribution of F. hepatica in bovines in Brazil was modelled using a decision tree approach and a logistic regression, combined with a geographic information system (GIS) query. In the decision tree and the logistic model, isothermality had the strongest influence on disease prevalence. Also, the 50-year average precipitation in the warmest quarter of the year was included as a risk factor, having a negative influence on the parasite prevalence. The risk maps developed using both techniques showed a predicted higher prevalence mainly in the South of Brazil. The prediction performance seemed to be high, but both techniques failed to reach a high accuracy in predicting the medium and high prevalence classes for the entire country. The GIS query map, based on the range of isothermality, minimum temperature of coldest month, precipitation of warmest quarter of the year, altitude and the average daily land surface temperature, showed a possibility of presence of F. hepatica in a very large area. The risk maps produced using these methods can be used to focus activities of animal and public health programmes, even on non-evaluated F. hepatica areas.
Delphin, S; Escobedo, F J; Abd-Elrahman, A; Cropper, W
2013-11-15
Information on the effect of direct drivers such as hurricanes on ecosystem services is relevant to landowners and policy makers due to predicted effects from climate change. We identified forest damage risk zones due to hurricanes and estimated the potential loss of 2 key ecosystem services: aboveground carbon storage and timber volume. Using land cover, plot-level forest inventory data, the Integrated Valuation of Ecosystem Services and Tradeoffs (InVEST) model, and a decision tree-based framework, we determined potential damage to subtropical forests from hurricanes in the Lower Suwannee River (LS) and Pensacola Bay (PB) watersheds in Florida, US. We used biophysical factors identified in previous studies as being influential in forest damage in our decision tree and hurricane wind risk maps. Results show that 31% and 0.5% of the total aboveground carbon storage in the LS and PB, respectively, was located in high forest damage risk (HR) zones. Overall 15% and 0.7% of the total timber net volume in the LS and PB, respectively, was in HR zones. This model can also be used for identifying timber salvage areas, developing ecosystem service provision and management scenarios, and assessing the effect of other drivers on ecosystem services and goods. Copyright © 2013 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Zhang, C.; Pan, X.; Zhang, S. Q.; Li, H. P.; Atkinson, P. M.
2017-09-01
Recent advances in remote sensing have witnessed a great amount of very high resolution (VHR) images acquired at sub-metre spatial resolution. These VHR remotely sensed data have posed enormous challenges in processing, analysing and classifying them effectively due to the high spatial complexity and heterogeneity. Although many computer-aided classification methods based on machine learning approaches have been developed over the past decades, most of them are developed toward pixel level spectral differentiation, e.g. Multi-Layer Perceptron (MLP), which are unable to exploit abundant spatial details within VHR images. This paper introduced a rough set model as a general framework to objectively characterize the uncertainty in CNN classification results, and further partition them into correctness and incorrectness on the map. The correct classification regions of CNN were trusted and maintained, whereas the misclassification areas were reclassified using a decision tree with both CNN and MLP. The effectiveness of the proposed rough set decision tree based MLP-CNN was tested using an urban area at Bournemouth, United Kingdom. The MLP-CNN, well capturing the complementarity between CNN and MLP through the rough set based decision tree, achieved the best classification performance both visually and numerically. Therefore, this research paves the way to achieve fully automatic and effective VHR image classification.
MODIS Snow Cover Mapping Decision Tree Technique: Snow and Cloud Discrimination
NASA Technical Reports Server (NTRS)
Riggs, George A.; Hall, Dorothy K.
2010-01-01
Accurate mapping of snow cover continues to challenge cryospheric scientists and modelers. The Moderate-Resolution Imaging Spectroradiometer (MODIS) snow data products have been used since 2000 by many investigators to map and monitor snow cover extent for various applications. Users have reported on the utility of the products and also on problems encountered. Three problems or hindrances in the use of the MODIS snow data products that have been reported in the literature are: cloud obscuration, snow/cloud confusion, and snow omission errors in thin or sparse snow cover conditions. Implementation of the MODIS snow algorithm in a decision tree technique using surface reflectance input to mitigate those problems is being investigated. The objective of this work is to use a decision tree structure for the snow algorithm. This should alleviate snow/cloud confusion and omission errors and provide a snow map with classes that convey information on how snow was detected, e.g. snow under clear sky, snow under cloud, to enable users' flexibility in interpreting and deriving a snow map. Results of a snow cover decision tree algorithm are compared to the standard MODIS snow map and found to exhibit improved ability to alleviate snow/cloud confusion in some situations allowing up to about 5% increase in mapped snow cover extent, thus accuracy, in some scenes.
A Decision Tree for Psychology Majors: Supplying Questions as Well as Answers.
ERIC Educational Resources Information Center
Poe, Retta E.
1988-01-01
Outlines the development of a psychology careers decision tree to help faculty advise students in planning their program. States that students using the decision tree may benefit by learning more about their career options and by acquiring better question-asking skills. (GEA)
Finding structure in data using multivariate tree boosting
Miller, Patrick J.; Lubke, Gitta H.; McArtor, Daniel B.; Bergeman, C. S.
2016-01-01
Technology and collaboration enable dramatic increases in the size of psychological and psychiatric data collections, but finding structure in these large data sets with many collected variables is challenging. Decision tree ensembles such as random forests (Strobl, Malley, & Tutz, 2009) are a useful tool for finding structure, but are difficult to interpret with multiple outcome variables which are often of interest in psychology. To find and interpret structure in data sets with multiple outcomes and many predictors (possibly exceeding the sample size), we introduce a multivariate extension to a decision tree ensemble method called gradient boosted regression trees (Friedman, 2001). Our extension, multivariate tree boosting, is a method for nonparametric regression that is useful for identifying important predictors, detecting predictors with nonlinear effects and interactions without specification of such effects, and for identifying predictors that cause two or more outcome variables to covary. We provide the R package ‘mvtboost’ to estimate, tune, and interpret the resulting model, which extends the implementation of univariate boosting in the R package ‘gbm’ (Ridgeway et al., 2015) to continuous, multivariate outcomes. To illustrate the approach, we analyze predictors of psychological well-being (Ryff & Keyes, 1995). Simulations verify that our approach identifies predictors with nonlinear effects and achieves high prediction accuracy, exceeding or matching the performance of (penalized) multivariate multiple regression and multivariate decision trees over a wide range of conditions. PMID:27918183
The value of decision tree analysis in planning anaesthetic care in obstetrics.
Bamber, J H; Evans, S A
2016-08-01
The use of decision tree analysis is discussed in the context of the anaesthetic and obstetric management of a young pregnant woman with joint hypermobility syndrome with a history of insensitivity to local anaesthesia and a previous difficult intubation due to a tongue tumour. The multidisciplinary clinical decision process resulted in the woman being delivered without complication by elective caesarean section under general anaesthesia after an awake fibreoptic intubation. The decision process used is reviewed and compared retrospectively to a decision tree analytical approach. The benefits and limitations of using decision tree analysis are reviewed and its application in obstetric anaesthesia is discussed. Copyright © 2016 Elsevier Ltd. All rights reserved.
Extraction of decision rules via imprecise probabilities
NASA Astrophysics Data System (ADS)
Abellán, Joaquín; López, Griselda; Garach, Laura; Castellano, Javier G.
2017-05-01
Data analysis techniques can be applied to discover important relations among features. This is the main objective of the Information Root Node Variation (IRNV) technique, a new method to extract knowledge from data via decision trees. The decision trees used by the original method were built using classic split criteria. The performance of new split criteria based on imprecise probabilities and uncertainty measures, called credal split criteria, differs significantly from the performance obtained using the classic criteria. This paper extends the IRNV method using two credal split criteria: one based on a mathematical parametric model, and the other based on a non-parametric model. The performance of the method is analyzed using a case study of traffic accident data to identify patterns related to the severity of an accident. We found that a larger number of rules is generated, significantly supplementing the information obtained using the classic split criteria.
Stacked Denoising Autoencoders Applied to Star/Galaxy Classification
NASA Astrophysics Data System (ADS)
Qin, Hao-ran; Lin, Ji-ming; Wang, Jun-yi
2017-04-01
In recent years, the deep learning algorithm, with the characteristics of strong adaptability, high accuracy, and structural complexity, has become more and more popular, but it has not yet been used in astronomy. In order to solve the problem that the star/galaxy classification accuracy is high for the bright source set, but low for the faint source set of the Sloan Digital Sky Survey (SDSS) data, we introduced a new deep learning algorithm, namely the SDA (stacked denoising autoencoder) neural network, and the dropout fine-tuning technique, which can greatly improve the robustness and antinoise performance. We randomly selected bright source sets and faint source sets from the SDSS DR12 and DR7 data with spectroscopic measurements and preprocessed them. Then, we randomly selected training sets and testing sets, without replacement, from the bright source sets and faint source sets. Finally, we used these training sets to train the SDA models of the bright sources and faint sources in the SDSS DR7 and DR12, respectively. We compared the test result of the SDA model on the DR12 testing set with the test results of the Library for Support Vector Machines (LibSVM), J48 decision tree, Logistic Model Tree (LMT), Support Vector Machine (SVM), Logistic Regression, and Decision Stump algorithm, and compared the test result of the SDA model on the DR7 testing set with the test results of six kinds of decision trees. The experiments show that the SDA has a better classification accuracy than other machine learning algorithms for the faint source sets of DR7 and DR12. In particular, when the completeness function is used as the evaluation index, the correctness rate of the SDA improves by about 15% over the decision tree algorithms for the faint source set of SDSS-DR7.
Evolutionary Algorithm Based Automated Reverse Engineering and Defect Discovery
2007-09-21
a previous application of a GP as a data mining function to evolve fuzzy decision trees symbolically [3-5], the terminal set consisted of fuzzy...of input and output information is required. In the case of fuzzy decision trees, the database represented a collection of scenarios about which the...fuzzy decision tree to be evolved would make decisions . The database also had entries created by experts representing decisions about the scenarios
Alghamdi, Manal; Al-Mallah, Mouaz; Keteyian, Steven; Brawner, Clinton; Ehrman, Jonathan; Sakr, Sherif
2017-01-01
Machine learning is becoming a popular and important approach in the field of medical research. In this study, we investigate the relative performance of various machine learning methods such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree and Random Forests for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data of 32,555 patients who are free of any known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 5-year follow-up. At the completion of the fifth year, 5,099 of those patients have developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an Ensembling-based predictive model using 13 attributes that were selected based on their clinical importance, Multiple Linear Regression, and Information Gain Ranking methods. The negative effect of the imbalance class of the constructed model was handled by Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive model classifier was improved by the Ensemble machine learning approach using the Vote method with three Decision Trees (Naïve Bayes Tree, Random Forest, and Logistic Model Tree) and achieved high accuracy of prediction (AUC = 0.92). The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data.
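The class imbalance handling mentioned above relies on SMOTE, whose core idea is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The following is a minimal sketch of that interpolation step only, assuming numeric features and Euclidean distance; it is not the implementation used in the study, and the function name, the default k and the toy data are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate `n_new` synthetic points from a list of minority-class tuples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbours of x (excluding x itself)
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        z = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment from x to z
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, z)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(minority, n_new=10, k=2)
print(len(new_points))
```

Every synthetic point lies on a line segment between two existing minority samples, so oversampling fills in the minority region rather than duplicating records. In practice it should be applied to the training data only, so that the test data contain no synthetic records.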
Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling
NASA Astrophysics Data System (ADS)
Galelli, S.; Castelletti, A.
2013-02-01
Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modeling. In this paper we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modeling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalization property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally very efficient; and (iii) allows the relative importance of the input variables to be inferred, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analyzed on two real-world case studies (Marina catchment (Singapore) and Canning River (Western Australia)) representing two different morphoclimatic contexts, in comparison with other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variables provided can be given a physically meaningful interpretation.
Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling
NASA Astrophysics Data System (ADS)
Galelli, S.; Castelletti, A.
2013-07-01
Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and (iii) allows the relative importance of the input variables to be inferred, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analysed on two real-world case studies - Marina catchment (Singapore) and Canning River (Western Australia) - representing two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variables provided can be given a physically meaningful interpretation.
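The randomized split that gives Extra-Trees their name can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes a regression setting scored by variance reduction, and the names `random_split`, `k` and `rng` are hypothetical. For each of `k` randomly chosen input variables a cut-point is drawn uniformly between the observed minimum and maximum, and the best-scoring candidate is kept.

```python
import random


def variance(values):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def random_split(X, y, k, rng):
    """Pick one split Extra-Trees style.

    X is a list of feature rows, y the targets, k the number of randomly
    selected candidate features, rng a random.Random instance.
    Returns (score, feature_index, threshold), or None if no split is possible.
    """
    n_features = len(X[0])
    candidates = rng.sample(range(n_features), min(k, n_features))
    best = None
    for j in candidates:
        column = [row[j] for row in X]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # constant feature in this node, cannot split
        threshold = rng.uniform(lo, hi)  # random cut-point, not optimised
        left = [t for row, t in zip(X, y) if row[j] < threshold]
        right = [t for row, t in zip(X, y) if row[j] >= threshold]
        if not left or not right:
            continue
        # Variance reduction achieved by the candidate split.
        within = (len(left) * variance(left) + len(right) * variance(right)) / len(y)
        score = variance(y) - within
        if best is None or score > best[0]:
            best = (score, j, threshold)
    return best
```

Because the cut-points are drawn at random rather than searched exhaustively, each node costs far less to build than in CART, which is consistent with the computational advantage the abstract reports.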
Creating ensembles of oblique decision trees with evolutionary algorithms and sampling
Cantu-Paz, Erick [Oakland, CA]; Kamath, Chandrika [Tracy, CA]
2006-06-13
A decision tree system that is part of a parallel object-oriented pattern recognition system, which in turn is part of an object oriented data mining system. A decision tree process includes the step of reading the data. If necessary, the data is sorted. A potential split of the data is evaluated according to some criterion. An initial split of the data is determined. The final split of the data is determined using evolutionary algorithms and statistical sampling techniques. The data is split. Multiple decision trees are combined in ensembles.
The decision tree classifier - Design and potential. [for Landsat-1 data]
NASA Technical Reports Server (NTRS)
Hauska, H.; Swain, P. H.
1975-01-01
A new classifier has been developed for the computerized analysis of remote sensor data. The decision tree classifier is essentially a maximum likelihood classifier using multistage decision logic. It is characterized by the fact that an unknown sample can be classified into a class using one or several decision functions in a successive manner. The classifier is applied to the analysis of data sensed by Landsat-1 over Kenosha Pass, Colorado. The classifier is illustrated by a tree diagram which for processing purposes is encoded as a string of symbols such that there is a unique one-to-one relationship between string and decision tree.
Automated rule-base creation via CLIPS-Induce
NASA Technical Reports Server (NTRS)
Murphy, Patrick M.
1994-01-01
Many CLIPS rule-bases contain one or more rule groups that perform classification. In this paper we describe CLIPS-Induce, an automated system for the creation of a CLIPS classification rule-base from a set of test cases. CLIPS-Induce consists of two components, a decision tree induction component and a CLIPS production extraction component. ID3, a popular decision tree induction algorithm, is used to induce a decision tree from the test cases. CLIPS production extraction is accomplished through a top-down traversal of the decision tree. Nodes of the tree are used to construct query rules, and branches of the tree are used to construct classification rules. The learned CLIPS productions may easily be incorporated into a large CLIPS system that performs tasks such as accessing a database or displaying information.
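The top-down traversal that turns a tree into rules can be sketched in Python. This is an illustration only, not CLIPS-Induce itself: the nested-dict tree layout, the function names and the plain IF/THEN output format are assumptions, and the output is not CLIPS syntax. The sketch emits one classification rule per root-to-leaf path and leaves out the separate query rules the abstract mentions.

```python
def extract_rules(node, conditions=()):
    """Yield (conditions, label) for every root-to-leaf path of the tree.

    An internal node is a dict {"attribute": name, "branches": {value: child}};
    a leaf is any non-dict value holding the class label (hypothetical layout).
    """
    if not isinstance(node, dict):
        yield list(conditions), node  # reached a leaf: emit the accumulated tests
        return
    attribute = node["attribute"]
    for value, child in node["branches"].items():
        yield from extract_rules(child, conditions + ((attribute, value),))


def format_rule(conditions, label):
    """Render one path as a human-readable IF/THEN rule."""
    tests = " and ".join(f"{attr} = {val}" for attr, val in conditions) or "true"
    return f"IF {tests} THEN class = {label}"


# Hypothetical toy tree used only to show the traversal.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "windy", "branches": {"yes": "stay", "no": "go"}},
        "rainy": "stay",
    },
}

for conds, label in extract_rules(tree):
    print(format_rule(conds, label))
```

Carrying the accumulated conditions down the recursion means each rule is complete on its own, which is what allows the extracted productions to be dropped into a larger rule-base.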
NASA Technical Reports Server (NTRS)
Garner, Gregory G.; Thompson, Anne M.
2013-01-01
An ensemble statistical post-processor (ESP) is developed for the National Air Quality Forecast Capability (NAQFC) to address the unique challenges of forecasting surface ozone in Baltimore, MD. Air quality and meteorological data were collected from the eight monitors that constitute the Baltimore forecast region. These data were used to build the ESP using a moving-block bootstrap, regression tree models, and extreme-value theory. The ESP was evaluated using a 10-fold cross-validation to avoid evaluation with the same data used in the development process. Results indicate that the ESP is conditionally biased, likely due to slight overfitting while training the regression tree models. When viewed from the perspective of a decision-maker, the ESP provides a wealth of additional information previously not available through the NAQFC alone. The user is provided the freedom to tailor the forecast to the decision at hand by using decision-specific probability thresholds that define a forecast for an ozone exceedance. Taking advantage of the ESP, the user not only receives an increase in value over the NAQFC, but also receives value for
Learning from examples - Generation and evaluation of decision trees for software resource analysis
NASA Technical Reports Server (NTRS)
Selby, Richard W.; Porter, Adam A.
1988-01-01
A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for software resource data analysis. The trees identify classes of objects (software modules) that had high development effort. Sixteen software systems ranging from 3,000 to 112,000 source lines were selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4,700 objects, captured information about the development effort, faults, changes, design style, and implementation style. A total of 9,600 decision trees were automatically generated and evaluated. The trees correctly identified 79.3 percent of the software modules that had high development effort or faults, and the trees generated from the best parameter combinations correctly identified 88.4 percent of the modules on the average.
Hill, Ryan M; Oosterhoff, Benjamin; Kaplow, Julie B
2017-07-01
Although a large number of risk markers for suicide ideation have been identified, little guidance has been provided to prospectively identify adolescents at risk for suicide ideation within community settings. The current study addressed this gap in the literature by utilizing classification tree analysis (CTA) to provide a decision-making model for screening adolescents at risk for suicide ideation. Participants were N = 4,799 youth (mean age = 16.15 years, SD = 1.63) who completed both Waves 1 and 2 of the National Longitudinal Study of Adolescent to Adult Health. CTA was used to generate a series of decision rules for identifying adolescents at risk for reporting suicide ideation at Wave 2. Findings revealed 3 distinct solutions with varying sensitivity and specificity for identifying adolescents who reported suicide ideation. Sensitivity of the classification trees ranged from 44.6% to 77.6%. The tree with greatest specificity and lowest sensitivity was based on a history of suicide ideation. The tree with moderate sensitivity and high specificity was based on depressive symptoms, suicide attempts or suicide among family and friends, and social support. The most sensitive but least specific tree utilized these factors and gender, ethnicity, hours of sleep, school-related factors, and future orientation. These classification trees offer community organizations options for instituting large-scale screenings for suicide ideation risk depending on the available resources and modality of services to be provided. This study provides a theoretically and empirically driven model for prospectively identifying adolescents at risk for suicide ideation and has implications for preventive interventions among at-risk youth.
Effect of altering local protein fluctuations using artificial intelligence
NASA Astrophysics Data System (ADS)
Nishiyama, Katsuhiko
2017-03-01
The fluctuations in Arg111, a significantly fluctuating residue in cathepsin K, were locally regulated by modifying Arg111 to Gly111. The binding properties of 15 dipeptides in the modified protein were analyzed by molecular simulations, and modeled as decision trees using artificial intelligence. The decision tree of the modified protein significantly differed from that of unmodified cathepsin K, and the Arg-to-Gly modification exerted a remarkable effect on the peptide binding properties. By locally regulating the fluctuations of a protein, we may greatly alter the original functions of the protein, enabling novel applications in several fields.
Karakülah, G.; Dicle, O.; Sökmen, S.; Çelikoğlu, C.C.
2015-01-01
Background: The selection of appropriate rectal cancer treatment is a complex multi-criteria decision making process, in which clinical decision support systems might be used to assist and enrich physicians’ decision making. Objective: The objective of the study was to develop a web-based clinical decision support tool for physicians in the selection of potentially beneficial treatment options for patients with rectal cancer. Methods: The updated decision model contained 8 and 10 criteria in the first and second steps respectively. The decision support model, developed in our previous study by combining the Analytic Hierarchy Process (AHP) method, which determines the priority of criteria, with a decision tree formed using these priorities, was updated and applied to data from 388 patients collected retrospectively. Later, a web-based decision support tool named corRECTreatment was developed. The compatibility of the treatment recommendations by the expert opinion and the decision support tool was examined for its consistency. Two surgeons were requested to recommend a treatment and an overall survival value for the treatment among 20 different cases that we selected and turned into a scenario among the most common and rare treatment options in the patient data set. Results: In the AHP analyses of the criteria, it was found that the matrices generated for both decision steps were consistent (consistency ratio < 0.1). Depending on the decisions of experts, the consistency value for the most frequent cases was found to be 80% for the first decision step and 100% for the second decision step. Similarly, for rare cases consistency was 50% for the first decision step and 80% for the second decision step. Conclusions: The decision model and corRECTreatment, developed by applying these on real patient data, are expected to provide potential users with decision support in rectal cancer treatment processes and facilitate them in making projections about treatment options.
PMID:25848413
Suner, A; Karakülah, G; Dicle, O; Sökmen, S; Çelikoğlu, C C
2015-01-01
The selection of appropriate rectal cancer treatment is a complex multi-criteria decision making process, in which clinical decision support systems might be used to assist and enrich physicians' decision making. The objective of the study was to develop a web-based clinical decision support tool for physicians in the selection of potentially beneficial treatment options for patients with rectal cancer. The updated decision model contained 8 and 10 criteria in the first and second steps respectively. The decision support model, developed in our previous study by combining the Analytic Hierarchy Process (AHP) method, which determines the priority of criteria, with a decision tree formed using these priorities, was updated and applied to data from 388 patients collected retrospectively. Later, a web-based decision support tool named corRECTreatment was developed. The compatibility of the treatment recommendations by the expert opinion and the decision support tool was examined for its consistency. Two surgeons were requested to recommend a treatment and an overall survival value for the treatment among 20 different cases that we selected and turned into a scenario among the most common and rare treatment options in the patient data set. In the AHP analyses of the criteria, it was found that the matrices generated for both decision steps were consistent (consistency ratio < 0.1). Depending on the decisions of experts, the consistency value for the most frequent cases was found to be 80% for the first decision step and 100% for the second decision step. Similarly, for rare cases consistency was 50% for the first decision step and 80% for the second decision step. The decision model and corRECTreatment, developed by applying these on real patient data, are expected to provide potential users with decision support in rectal cancer treatment processes and facilitate them in making projections about treatment options.
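The "consistency ratio < 0.1" check reported for the AHP matrices can be sketched in Python. This is a generic illustration of Saaty's consistency ratio, not the study's implementation. It assumes a positive reciprocal pairwise comparison matrix, estimates the principal eigenvalue by power iteration, and uses the commonly tabulated random index values; all function names are hypothetical.

```python
# Commonly cited random index (RI) values by matrix size n (Saaty's table).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def principal_eigenvalue(matrix, iterations=200):
    """Estimate the largest eigenvalue of a positive matrix by power iteration."""
    n = len(matrix)
    vector = [1.0 / n] * n  # start from a uniform vector that sums to 1
    eigenvalue = float(n)
    for _ in range(iterations):
        product = [sum(a * v for a, v in zip(row, vector)) for row in matrix]
        eigenvalue = sum(product)  # valid because the vector sums to 1
        vector = [p / eigenvalue for p in product]
    return eigenvalue


def consistency_ratio(matrix):
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    consistency_index = (principal_eigenvalue(matrix) - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# Hypothetical 3x3 comparison whose judgements are perfectly transitive.
consistent = [[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]]
print(consistency_ratio(consistent) < 0.1)
```

A matrix passes the usual acceptance test when the returned value is below 0.1, the threshold quoted in the abstract.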
Delgado-Gomez, D; Baca-Garcia, E; Aguado, D; Courtet, P; Lopez-Castroman, J
2016-12-01
Several Computerized Adaptive Tests (CATs) have been proposed to facilitate assessments in mental health. These tests are built in a standard way, disregarding useful and usually available information not included in the assessment scales that could increase the precision and utility of CATs, such as the history of suicide attempts. Using the items of a previously developed scale for suicidal risk, we compared the performance of a standard CAT and a decision tree in a decision support system to identify suicidal behavior. We included the history of past suicide attempts as a class for the separation of patients in the decision tree. The decision tree needed an average of four items to achieve an accuracy similar to that of a standard CAT with nine items. The accuracy of the decision tree, obtained after 25 cross-validations, was 81.4%. A shortened test adapted for the separation of suicidal and non-suicidal patients was developed. CATs can be very useful tools for the assessment of suicidal risk. However, standard CATs do not use all the information that is available. A decision tree can improve the precision of the assessment because it is constructed using a priori information.
Doubravsky, Karel; Dohnal, Mirko
2015-01-01
Complex decision making tasks of different natures, e.g. economics, safety engineering, ecology and biology, are based on vague, sparse, partially inconsistent and subjective knowledge. Moreover, decision making economists / engineers are usually not willing to invest too much time into the study of complex formal theories. They require decisions that can be (re)checked by human-like common sense reasoning. One important problem related to realistic decision making tasks is the incompleteness of the data sets required by the chosen decision making algorithm. This paper presents a relatively simple algorithm by which some missing III (input information items) can be generated using mainly decision tree topologies and integrated into incomplete data sets. The algorithm is based on easy-to-understand heuristics, e.g. a longer decision tree sub-path is less probable. This heuristic can solve decision problems under total ignorance, i.e. when the decision tree topology is the only information available. In practice, however, isolated information items, e.g. some vaguely known probabilities (e.g. fuzzy probabilities), are usually available. This means that a realistic problem is analysed under partial ignorance. The proposed algorithm reconciles topology-related heuristics and additional fuzzy sets using fuzzy linear programming. The case study, represented by a tree with six lotteries and one fuzzy probability, is presented in detail. PMID:26158662
Goodman, Katherine E; Lessler, Justin; Cosgrove, Sara E; Harris, Anthony D; Lautenbach, Ebbing; Han, Jennifer H; Milstone, Aaron M; Massey, Colin J; Tamma, Pranita D
2016-10-01
Timely identification of extended-spectrum β-lactamase (ESBL) bacteremia can improve clinical outcomes while minimizing unnecessary use of broad-spectrum antibiotics, including carbapenems. However, most clinical microbiology laboratories currently require at least 24 additional hours from the time of microbial genus and species identification to confirm ESBL production. Our objective was to develop a user-friendly decision tree to predict which organisms are ESBL producing, to guide appropriate antibiotic therapy. We included patients ≥18 years of age with bacteremia due to Escherichia coli or Klebsiella species from October 2008 to March 2015 at Johns Hopkins Hospital. Isolates with ceftriaxone minimum inhibitory concentrations ≥2 µg/mL underwent ESBL confirmatory testing. Recursive partitioning was used to generate a decision tree to determine the likelihood that a bacteremic patient was infected with an ESBL producer. Discrimination of the original and cross-validated models was evaluated using receiver operating characteristic curves and by calculation of C-statistics. A total of 1288 patients with bacteremia met eligibility criteria. For 194 patients (15%), bacteremia was due to a confirmed ESBL producer. The final classification tree for predicting ESBL-positive bacteremia included 5 predictors: history of ESBL colonization/infection, chronic indwelling vascular hardware, age ≥43 years, recent hospitalization in an ESBL high-burden region, and ≥6 days of antibiotic exposure in the prior 6 months. The decision tree's positive and negative predictive values were 90.8% and 91.9%, respectively. Our findings suggest that a clinical decision tree can be used to estimate a bacteremic patient's likelihood of infection with ESBL-producing bacteria. Recursive partitioning offers a practical, user-friendly approach for addressing important diagnostic questions.
Probabilistic flood damage modelling at the meso-scale
NASA Astrophysics Data System (ADS)
Kreibich, Heidi; Botto, Anna; Schröter, Kai; Merz, Bruno
2014-05-01
Decisions on flood risk management and adaptation are usually based on risk analyses. Such analyses are associated with significant uncertainty, even more so if changes in risk due to global change are expected. Although uncertainty analysis and probabilistic approaches have received increased attention in recent years, they are still not standard practice for flood risk assessments. Most damage models have in common that complex damaging processes are described by simple, deterministic approaches like stage-damage functions. Novel probabilistic, multi-variate flood damage models have been developed and validated on the micro-scale using a data-mining approach, namely bagging decision trees (Merz et al. 2013). In this presentation we show how the model BT-FLEMO (Bagging decision Tree based Flood Loss Estimation MOdel) can be applied on the meso-scale, namely on the basis of ATKIS land-use units. The model is applied in 19 municipalities which were affected during the 2002 flood by the River Mulde in Saxony, Germany. The application of BT-FLEMO provides a probability distribution of estimated damage to residential buildings per municipality. Validation is undertaken on the one hand via a comparison with eight other damage models including stage-damage functions as well as multi-variate models. On the other hand the results are compared with official damage data provided by the Saxon Relief Bank (SAB). The results show that uncertainties of damage estimation remain high. Thus, the significant advantage of this probabilistic flood loss estimation model BT-FLEMO is that it inherently provides quantitative information about the uncertainty of the prediction. Reference: Merz, B.; Kreibich, H.; Lall, U. (2013): Multi-variate flood damage assessment: a tree-based data-mining approach. NHESS, 13(1), 53-64.
NASA Technical Reports Server (NTRS)
Shiffman, Smadar
2004-01-01
Automated cloud detection and tracking is an important step in assessing global climate change via remote sensing. Cloud masks, which indicate whether individual pixels depict clouds, are included in many of the data products that are based on data acquired on board earth satellites. Many cloud-mask algorithms have the form of decision trees, which employ sequential tests that scientists designed based on empirical astrophysics studies and astrophysics simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In this study we explored the potential benefits of automatically-learned decision trees for detecting clouds from images acquired using the Advanced Very High Resolution Radiometer (AVHRR) instrument on board the NOAA-14 weather satellite of the National Oceanic and Atmospheric Administration. We constructed three decision trees for a sample of 8km-daily AVHRR data from 2000 using a decision-tree learning procedure provided within MATLAB®, and compared the accuracy of the decision trees to the accuracy of the cloud mask. We used ground observations collected by the National Aeronautics and Space Administration Clouds and the Earth's Radiant Energy Systems S'COOL project as the gold standard. For the sample data, the accuracy of automatically learned decision trees was greater than the accuracy of the cloud masks included in the AVHRR data product.
Implementation of Data Mining to Analyze Drug Cases Using C4.5 Decision Tree
NASA Astrophysics Data System (ADS)
Wahyuni, Sri
2018-03-01
Data mining is the process of finding useful information in large databases. One of the existing techniques in data mining is classification. The method used in this study was the decision tree, and the algorithm used was C4.5. The decision tree method transforms a large set of facts into a decision tree that represents rules. It is useful for exploring data, as well as for finding hidden relationships between a number of potential input variables and a target variable. A C4.5 decision tree is constructed in several stages: selecting an attribute as the root, creating a branch for each value, and dividing the cases among the branches. These stages are repeated for each branch until all the cases on the branch have the same class. The resulting decision tree yields a set of rules for each case. In this study the researcher classified data on prisoners at Labuhan Deli prison to identify the factors behind detainees committing drug offences. Applying the C4.5 algorithm produced knowledge that can serve as information for reducing drug offences. The findings showed that the most influential factor in detainees committing drug offences was the address variable.
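The attribute-selection stage described above can be sketched in Python. This is a simplified illustration of the gain ratio criterion associated with C4.5, not the study's code: it handles categorical attributes only and omits continuous thresholds, missing values and pruning. The function names and the toy rows are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Information gain of `attribute` divided by its split information."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    # Weighted entropy of the target after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, attributes, target):
    """Attribute that would be chosen as the root of the (sub)tree."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, target))


# Hypothetical toy cases: attribute "a" separates the classes, "b" does not.
rows = [
    {"a": "x", "b": "p", "label": "yes"},
    {"a": "x", "b": "q", "label": "yes"},
    {"a": "y", "b": "p", "label": "no"},
    {"a": "y", "b": "q", "label": "no"},
]
print(best_attribute(rows, ["a", "b"], "label"))
```

Repeating `best_attribute` on the cases that fall into each branch, until every branch holds a single class, gives the recursive construction the abstract outlines.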
Sugimoto, Masahiro; Takada, Masahiro; Toi, Masakazu
2014-12-09
Nomograms are a standard computational tool to predict the likelihood of an outcome using multiple available patient features. We have developed a more powerful data mining methodology, to predict axillary lymph node (AxLN) metastasis and response to neoadjuvant chemotherapy (NAC) in primary breast cancer patients. We developed websites to use these tools. The tools calculate the probability of AxLN metastasis (AxLN model) and pathological complete response to NAC (NAC model). As a calculation algorithm, we employed a decision tree-based prediction model known as the alternative decision tree (ADTree), which is an analog development of if-then type decision trees. An ensemble technique was used to combine multiple ADTree predictions, resulting in higher generalization abilities and robustness against missing values. The AxLN model was developed with training datasets (n=148) and test datasets (n=143), and validated using an independent cohort (n=174), yielding an area under the receiver operating characteristic curve (AUC) of 0.768. The NAC model was developed and validated with n=150 and n=173 datasets from a randomized controlled trial, yielding an AUC of 0.787. AxLN and NAC models require users to input up to 17 and 16 variables, respectively. These include pathological features, including human epidermal growth factor receptor 2 (HER2) status and imaging findings. Each input variable has an option of "unknown," to facilitate prediction for cases with missing values. The websites developed facilitate the use of these tools, and serve as a database for accumulating new datasets.
Mereta, Seid Tiku; Yewhalaw, Delenasaw; Boets, Pieter; Ahmed, Abdulhakim; Duchateau, Luc; Speybroeck, Niko; Vanwambeke, Sophie O; Legesse, Worku; De Meester, Luc; Goethals, Peter L M
2013-11-04
A fundamental understanding of the spatial distribution and ecology of mosquito larvae is essential for effective vector control intervention strategies. In this study, data-driven decision tree models, generalized linear models and ordination analysis were used to identify the most important biotic and abiotic factors that affect the occurrence and abundance of mosquito larvae in Southwest Ethiopia. In total, 220 samples were taken at 180 sampling locations during the years 2010 and 2012. Sampling sites were characterized based on physical, chemical and biological attributes. The predictive performance of decision tree models was evaluated based on correctly classified instances (CCI), Cohen's kappa statistic (κ) and the determination coefficient (R²). A conditional analysis was performed on the regression tree models to test the relation between key environmental and biological parameters and the abundance of mosquito larvae. The decision tree model developed for anopheline larvae showed a good model performance (CCI = 84 ± 2%, and κ = 0.66 ± 0.04), indicating that the genus has clear habitat requirements. Anopheline mosquito larvae showed a widespread distribution and especially occurred in small human-made aquatic habitats. Water temperature, canopy cover, emergent vegetation cover, and presence of predators and competitors were found to be the main variables determining the abundance and distribution of anopheline larvae. In contrast, anopheline mosquito larvae were found to be less prominently present in permanent larval habitats. This could be attributed to the high abundance and diversity of natural predators and competitors suppressing the mosquito population densities. The findings of this study suggest that targeting smaller human-made aquatic habitats could result in effective larval control of anopheline mosquitoes in the study area.
Controlling the occurrence of mosquito larvae via drainage of permanent wetlands may not be a good management strategy as it negatively affects the occurrence and abundance of mosquito predators and competitors and promotes an increase in anopheline population densities.
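The two classification scores quoted in this abstract, CCI and Cohen's kappa, can both be computed from a confusion matrix. The Python sketch below is a generic illustration rather than the authors' procedure; the function names and the 2x2 counts are hypothetical and are not data from the study.

```python
def correctly_classified(confusion):
    """Fraction of instances on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def cohens_kappa(confusion):
    """Agreement between observed and predicted classes, corrected for chance."""
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    observed = correctly_classified(confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected if predictions were independent of the true classes.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical presence/absence counts: rows are observed, columns predicted.
confusion = [[40, 10],
             [5, 45]]
print(correctly_classified(confusion), cohens_kappa(confusion))
```

Kappa is reported alongside CCI because CCI alone can look high when one class dominates, whereas kappa discounts the agreement expected by chance.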
2013-01-01
Background: A fundamental understanding of the spatial distribution and ecology of mosquito larvae is essential for effective vector control intervention strategies. In this study, data-driven decision tree models, generalized linear models and ordination analysis were used to identify the most important biotic and abiotic factors that affect the occurrence and abundance of mosquito larvae in Southwest Ethiopia. Methods: In total, 220 samples were taken at 180 sampling locations during the years 2010 and 2012. Sampling sites were characterized based on physical, chemical and biological attributes. The predictive performance of decision tree models was evaluated based on correctly classified instances (CCI), Cohen’s kappa statistic (κ) and the determination coefficient (R²). A conditional analysis was performed on the regression tree models to test the relation between key environmental and biological parameters and the abundance of mosquito larvae. Results: The decision tree model developed for anopheline larvae showed a good model performance (CCI = 84 ± 2%, and κ = 0.66 ± 0.04), indicating that the genus has clear habitat requirements. Anopheline mosquito larvae showed a widespread distribution and especially occurred in small human-made aquatic habitats. Water temperature, canopy cover, emergent vegetation cover, and presence of predators and competitors were found to be the main variables determining the abundance and distribution of anopheline larvae. In contrast, anopheline mosquito larvae were found to be less prominently present in permanent larval habitats. This could be attributed to the high abundance and diversity of natural predators and competitors suppressing the mosquito population densities. Conclusions: The findings of this study suggest that targeting smaller human-made aquatic habitats could result in effective larval control of anopheline mosquitoes in the study area.
Controlling the occurrence of mosquito larvae via drainage of permanent wetlands may not be a good management strategy as it negatively affects the occurrence and abundance of mosquito predators and competitors and promotes an increase in anopheline population densities. PMID:24499518
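The two classification criteria named in this abstract, correctly classified instances (CCI) and Cohen's kappa, can be illustrated with a minimal sketch. The function name and the presence/absence labels below are hypothetical and are not taken from the study; the sketch only shows how the two statistics relate.

```python
from collections import Counter


def cci_and_kappa(y_true, y_pred):
    """Return (CCI, Cohen's kappa) for two equal-length label sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed agreement: the fraction of correctly classified instances.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Agreement expected by chance, from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case: only one label on both sides, kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return p_o, (p_o - p_e) / (1.0 - p_e)


# Hypothetical larval presence (1) / absence (0) labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
cci, kappa = cci_and_kappa(y_true, y_pred)
```

Kappa discounts the agreement that the marginal class frequencies would produce by chance, which is why it is lower than CCI for the same predictions.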
NASA Astrophysics Data System (ADS)
Ragettli, S.; Zhou, J.; Wang, H.; Liu, C.; Guo, L.
2017-12-01
Flash floods in small mountain catchments are one of the most frequent causes of loss of life and property from natural hazards in China. Hydrological models can be a useful tool for the anticipation of these events and the issuing of timely warnings. One of the main challenges of setting up such a system is finding appropriate model parameter values for ungauged catchments. Previous studies have shown that the transfer of parameter sets from hydrologically similar gauged catchments is one of the best performing regionalization methods. However, a remaining key issue is the identification of suitable descriptors of similarity. In this study, we use decision tree learning to explore parameter set transferability in the full space of catchment descriptors. For this purpose, a semi-distributed rainfall-runoff model is set up for 35 catchments in ten Chinese provinces. Hourly runoff data from a total of 858 storm events are used to calibrate the model and to evaluate the performance of parameter set transfers between catchments. We then present a novel technique that uses the splitting rules of classification and regression trees (CART) for finding suitable donor catchments for ungauged target catchments. The ability of the model to detect flood events in assumed ungauged catchments is evaluated in a series of leave-one-out tests. We show that CART analysis increases the probability of detection of 10-year flood events in comparison to a conventional measure of physiographic-climatic similarity by up to 20%. Decision tree learning can outperform other regionalization approaches because it generates rules that optimally consider spatial proximity and physical similarity. Spatial proximity can be used as a selection criterion but is skipped in the case where no similar gauged catchments are in the vicinity. 
We conclude that the CART regionalization concept is particularly suitable for implementation in sparsely gauged and topographically complex environments where a proximity-based regionalization concept is not applicable.
Mohammed, Mohammed A.; Rudge, Gavin; Watson, Duncan; Wood, Gordon; Smith, Gary B.; Prytherch, David R.; Girling, Alan; Stevens, Andrew
2013-01-01
Background We explored the use of routine blood tests and national early warning scores (NEWS) reported within ±24 hours of admission to predict in-hospital mortality in emergency admissions, using empirical decision Tree models because they are intuitive and may ultimately be used to support clinical decision making. Methodology A retrospective analysis of adult emergency admissions to a large acute hospital during April 2009 to March 2010 in the West Midlands, England, with a full set of index blood tests results (albumin, creatinine, haemoglobin, potassium, sodium, urea, white cell count and an index NEWS undertaken within ±24 hours of admission). We developed a Tree model by randomly splitting the admissions into a training (50%) and validation dataset (50%) and assessed its accuracy using the concordance (c-) statistic. Emergency admissions (about 30%) did not have a full set of index blood tests and/or NEWS and so were not included in our analysis. Results There were 23248 emergency admissions with a full set of blood tests and NEWS with an in-hospital mortality of 5.69%. The Tree model identified age, NEWS, albumin, sodium, white cell count and urea as significant (p<0.001) predictors of death, which described 17 homogeneous subgroups of admissions with mortality ranging from 0.2% to 60%. The c-statistic for the training model was 0.864 (95%CI 0.852 to 0.87) and when applied to the testing data set this was 0.853 (95%CI 0.840 to 0.866). Conclusions An easy to interpret validated risk adjustment Tree model using blood test and NEWS taken within ±24 hours of admission provides good discrimination and offers a novel approach to risk adjustment which may potentially support clinical decision making. Given the nature of the clinical data, the results are likely to be generalisable but further research is required to investigate this promising approach. PMID:23734195
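The concordance (c-) statistic used above to assess the Tree model can be read as the probability that a randomly chosen patient who died was assigned a higher predicted risk than a randomly chosen survivor. A minimal sketch of that pairwise definition follows; the function name and the five example risks are hypothetical, not data from the study.

```python
def c_statistic(risks, died):
    """Pairwise concordance between predicted risks and a binary outcome."""
    deaths = [r for r, d in zip(risks, died) if d]
    survivors = [r for r, d in zip(risks, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for risk_dead in deaths:
        for risk_alive in survivors:
            if risk_dead > risk_alive:
                concordant += 1.0   # pair ranked correctly
            elif risk_dead == risk_alive:
                concordant += 0.5   # ties count as half
    return concordant / (len(deaths) * len(survivors))


# Hypothetical predicted mortality risks and outcomes (1 = died).
risks = [0.9, 0.6, 0.4, 0.4, 0.1]
died = [1, 0, 1, 0, 0]
c = c_statistic(risks, died)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The quadratic loop is only suitable for small illustrations; for tens of thousands of admissions a rank-based computation would be preferable.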
Decision-analytic modeling studies: An overview for clinicians using multiple myeloma as an example.
Rochau, U; Jahn, B; Qerimi, V; Burger, E A; Kurzthaler, C; Kluibenschaedl, M; Willenbacher, E; Gastl, G; Willenbacher, W; Siebert, U
2015-05-01
The purpose of this study was to provide a clinician-friendly overview of decision-analytic models evaluating different treatment strategies for multiple myeloma (MM). We performed a systematic literature search to identify studies evaluating MM treatment strategies using mathematical decision-analytic models. We included studies that were published as full-text articles in English, and assessed relevant clinical endpoints, and summarized methodological characteristics (e.g., modeling approaches, simulation techniques, health outcomes, perspectives). Eleven decision-analytic modeling studies met our inclusion criteria. Five different modeling approaches were adopted: decision-tree modeling, Markov state-transition modeling, discrete event simulation, partitioned-survival analysis and area-under-the-curve modeling. Health outcomes included survival, number-needed-to-treat, life expectancy, and quality-adjusted life years. Evaluated treatment strategies included novel agent-based combination therapies, stem cell transplantation and supportive measures. Overall, our review provides a comprehensive summary of modeling studies assessing treatment of MM and highlights decision-analytic modeling as an important tool for health policy decision making. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
ERIC Educational Resources Information Center
Kriston, Levente; Melchior, Hanne; Hergert, Anika; Bergelt, Corinna; Watzke, Birgit; Schulz, Holger; von Wolff, Alessa
2011-01-01
The aim of our study was to develop a graphical tool that can be used in addition to standard statistical criteria to support decisions on the number of classes in explorative categorical latent variable modeling for rehabilitation research. Data from two rehabilitation research projects were used. In the first study, a latent profile analysis was…
An Improved Decision Tree for Predicting a Major Product in Competing Reactions
ERIC Educational Resources Information Center
Graham, Kate J.
2014-01-01
When organic chemistry students encounter competing reactions, they are often overwhelmed by the task of evaluating multiple factors that affect the outcome of a reaction. The use of a decision tree is a useful tool to teach students to evaluate a complex situation and propose a likely outcome. Specifically, a decision tree can help students…
Decision Tree Phytoremediation
1999-12-01
aromatic hydrocarbons, and landfill leachates. Phytoremediation has been used for point and nonpoint source hazardous waste control. 1.2 Types of... Phytoremediation Prepared by Interstate Technology and Regulatory Cooperation Work Group Phytoremediation Work Team December 1999 Decision Tree...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Soner Yorgun, M.; Rood, Richard B.
An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center for Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smooth topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena due to the spectral transform method of the CAM Eulerian spectral dynamical core are prominent, and are an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, both in horizontal and vertical dimensions, have a significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study about the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly mean) analysis over local scales.
Soner Yorgun, M.; Rood, Richard B.
2016-11-11
An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center for Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smooth topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena due to the spectral transform method of the CAM Eulerian spectral dynamical core are prominent, and are an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, both in horizontal and vertical dimensions, have a significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study about the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly mean) analysis over local scales.
Macmillan, Donna S; Canipa, Steven J; Chilton, Martyn L; Williams, Richard V; Barber, Christopher G
2016-04-01
There is a pressing need for non-animal methods to predict skin sensitisation potential and a number of in chemico and in vitro assays have been designed with this in mind. However, some compounds can fall outside the applicability domain of these in chemico/in vitro assays and may not be predicted accurately. Rule-based in silico models such as Derek Nexus are expert-derived from animal and/or human data and the mechanism-based alert domain can take a number of factors into account (e.g. abiotic/biotic activation). Therefore, Derek Nexus may be able to predict for compounds outside the applicability domain of in chemico/in vitro assays. To this end, an integrated testing strategy (ITS) decision tree using Derek Nexus and a maximum of two assays (from DPRA, KeratinoSens, LuSens, h-CLAT and U-SENS) was developed. Generally, the decision tree improved upon other ITS evaluated in this study with positive and negative predictivity calculated as 86% and 81%, respectively. Our results demonstrate that an ITS using an in silico model such as Derek Nexus with a maximum of two in chemico/in vitro assays can predict the sensitising potential of a number of chemicals, including those outside the applicability domain of existing non-animal assays. Copyright © 2016 Elsevier Inc. All rights reserved.
Ben-Assuli, Ofir; Leshno, Moshe
2016-09-01
In the last decade, health providers have implemented information systems to improve accuracy in medical diagnosis and decision-making. This article evaluates the impact of an electronic health record on emergency department physicians' diagnosis and admission decisions. A decision analytic approach using a decision tree was constructed to model the admission decision process to assess the added value of medical information retrieved from the electronic health record. Using a Bayesian statistical model, this method was evaluated on two coronary artery disease scenarios. The results show that the cases of coronary artery disease were better diagnosed when the electronic health record was consulted and led to more informed admission decisions. Furthermore, the value of medical information required for a specific admission decision in emergency departments could be quantified. The findings support the notion that physicians and patient healthcare can benefit from implementing electronic health record systems in emergency departments. © The Author(s) 2015.
Nair, Shalini Rajandran; Tan, Li Kuo; Mohd Ramli, Norlisah; Lim, Shen Yang; Rahmat, Kartini; Mohd Nor, Hazman
2013-06-01
To develop a decision tree based on standard magnetic resonance imaging (MRI) and diffusion tensor imaging to differentiate multiple system atrophy (MSA) from Parkinson's disease (PD). 3-T brain MRI and DTI (diffusion tensor imaging) were performed on 26 PD and 13 MSA patients. Regions of interest (ROIs) were the putamen, substantia nigra, pons, middle cerebellar peduncles (MCP) and cerebellum. Linear, volumetry and DTI (fractional anisotropy and mean diffusivity) were measured. A three-node decision tree was formulated, with design goals being 100 % specificity at node 1, 100 % sensitivity at node 2 and highest combined sensitivity and specificity at node 3. Nine parameters (mean width, fractional anisotropy (FA) and mean diffusivity (MD) of MCP; anteroposterior diameter of pons; cerebellar FA and volume; pons and mean putamen volume; mean FA substantia nigra compacta-rostral) showed statistically significant (P < 0.05) differences between MSA and PD with mean MCP width, anteroposterior diameter of pons and mean FA MCP chosen for the decision tree. Threshold values were 14.6 mm, 21.8 mm and 0.55, respectively. Overall performance of the decision tree was 92 % sensitivity, 96 % specificity, 92 % PPV and 96 % NPV. Twelve out of 13 MSA patients were accurately classified. Formation of the decision tree using these parameters was both descriptive and predictive in differentiating between MSA and PD. • Parkinson's disease and multiple system atrophy can be distinguished on MR imaging. • Combined conventional MRI and diffusion tensor imaging improves the accuracy of diagnosis. • A decision tree is descriptive and predictive in differentiating between clinical entities. • A decision tree can reliably differentiate Parkinson's disease from multiple system atrophy.
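The reported performance figures can be checked with simple arithmetic. The abstract states that 12 of 13 MSA patients were classified correctly; the split of the 26 PD patients into 25 correct and 1 misclassified is not stated and is inferred here from the reported 96 % specificity, so it should be treated as an assumption. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard 2x2 diagnostic accuracy measures, as fractions."""
    return {
        "sensitivity": tp / (tp + fn),  # MSA patients correctly identified
        "specificity": tn / (tn + fp),  # PD patients correctly identified
        "ppv": tp / (tp + fp),          # predicted MSA that really is MSA
        "npv": tn / (tn + fn),          # predicted PD that really is PD
    }


# tp and fn come from the abstract (12 of 13 MSA patients correct);
# tn = 25 and fp = 1 are inferred from 96 % specificity in 26 PD patients.
metrics = diagnostic_metrics(tp=12, fn=1, tn=25, fp=1)
percent = {name: round(100 * value) for name, value in metrics.items()}
```

Under this assumed split the rounded values reproduce the 92 % sensitivity, 96 % specificity, 92 % PPV and 96 % NPV given in the abstract.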
Multi-agent simulation of generation expansion in electricity markets.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Botterud, A; Mahalik, M. R.; Veselka, T. D.
2007-06-01
We present a new multi-agent model of generation expansion in electricity markets. The model simulates generation investment decisions of decentralized generating companies (GenCos) interacting in a complex, multidimensional environment. A probabilistic dispatch algorithm calculates prices and profits for new candidate units in different future states of the system. Uncertainties in future load, hydropower conditions, and competitors' actions are represented in a scenario tree, and decision analysis is used to identify the optimal expansion decision for each individual GenCo. We test the model using real data for the Korea power system under different assumptions about market design, market concentration, and GenCos' assumed expectations about their competitors' investment decisions.
[Modeling in value-based medicine].
Neubauer, A S; Hirneiss, C; Kampik, A
2010-03-01
Modeling plays an important role in value-based medicine (VBM). It allows decision support by predicting potential clinical and economic consequences, frequently combining different sources of evidence. Based on relevant publications and examples focusing on ophthalmology, the key economic modeling methods are explained and definitions are given. The most frequently applied model types are decision trees, Markov models, and discrete event simulation (DES) models. Besides verifying internal validity, model validation includes comparison with other models (external validity) and ideally validation of the model's predictive properties. The existing uncertainty with any modeling should be clearly stated. This is true for economic modeling in VBM as well as when using disease risk models to support clinical decisions. In economic modeling, uni- and multivariate sensitivity analyses are usually applied; the key concepts here are tornado plots and cost-effectiveness acceptability curves. Given the existing uncertainty, modeling helps to make better informed decisions than without this additional information.
Extensions and applications of ensemble-of-trees methods in machine learning
NASA Astrophysics Data System (ADS)
Bleich, Justin
Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate its superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. 
Finally, we explore the construction of stable decision trees for forecasts of violence during probation hearings in court systems.
Application of preprocessing filtering on Decision Tree C4.5 and rough set theory
NASA Astrophysics Data System (ADS)
Chan, Joseph C. C.; Lin, Tsau Y.
2001-03-01
This paper compares two artificial intelligence methods: the Decision Tree C4.5 and Rough Set Theory on the stock market data. The Decision Tree C4.5 is reviewed with the Rough Set Theory. An enhanced window application is developed to facilitate the pre-processing filtering by introducing the feature (attribute) transformations, which allows users to input formulas and create new attributes. Also, the application produces three varieties of data set with delaying, averaging, and summation. The results prove the improvement of pre-processing by applying feature (attribute) transformations on Decision Tree C4.5. Moreover, the comparison between Decision Tree C4.5 and Rough Set Theory is based on the clarity, automation, accuracy, dimensionality, raw data, and speed, which is supported by the rules sets generated by both algorithms on three different sets of data.
NASA Astrophysics Data System (ADS)
Basye, Austin T.
A matrix element method analysis of the Standard Model Higgs boson, produced in association with two top quarks decaying to the lepton-plus-jets channel is presented. Based on 20.3 fb$^{-1}$ of $\sqrt{s} = 8$ TeV data, produced at the Large Hadron Collider and collected by the ATLAS detector, this analysis utilizes multiple advanced techniques to search for $t\bar{t}H$ signatures with a 125 GeV Higgs boson decaying to two b-quarks. After categorizing selected events based on their jet and b-tag multiplicities, signal rich regions are analyzed using the matrix element method. Resulting variables are then propagated to two parallel multivariate analyses utilizing Neural Networks and Boosted Decision Trees respectively. As no significant excess is found, an observed (expected) limit of 3.4 (2.2) times the Standard Model cross-section is determined at 95% confidence, using the CL$_s$ method, for the Neural Network analysis. For the Boosted Decision Tree analysis, an observed (expected) limit of 5.2 (2.7) times the Standard Model cross-section is determined at 95% confidence, using the CL$_s$ method. Corresponding unconstrained fits of the Higgs boson signal strength to the observed data result in the measured signal cross-section to Standard Model cross-section prediction of $\mu = 1.2 \pm 1.3$ (total) $\pm 0.7$ (stat.) for the Neural Network analysis, and $\mu = 2.9 \pm 1.4$ (total) $\pm 0.8$ (stat.) for the Boosted Decision Tree analysis.
Sung, Ki Hyuk; Chung, Chin Youb; Lee, Kyoung Min; Lee, Seung Yeol; Choi, In Ho; Cho, Tae-Joon; Yoo, Won Joon; Park, Moon Seok
2014-01-01
This study aimed to determine the best treatment modality for coronal angular deformity of the knee joint in growing children using decision analysis. A decision tree was created to evaluate 3 treatment modalities for coronal angular deformity in growing children: temporary hemiepiphysiodesis using staples, percutaneous screws, or a tension band plate. A decision analysis model was constructed containing the final outcome score, probability of metal failure, and incomplete correction of deformity. The final outcome was defined as health-related quality of life and was used as a utility in the decision tree. The probabilities associated with each case were obtained by literature review, and health-related quality of life was evaluated by a questionnaire completed by 25 pediatric orthopedic experts. Our decision analysis model favored temporary hemiepiphysiodesis using a tension band plate over temporary hemiepiphysiodesis using percutaneous screws or stapling, with utilities of 0.969, 0.957, and 0.962, respectively. One-way sensitivity analysis showed that hemiepiphysiodesis using a tension band plate was better than temporary hemiepiphysiodesis using percutaneous screws, when the overall complication rate of hemiepiphysiodesis using a tension band plate was lower than 15.7%. Two-way sensitivity analysis showed that hemiepiphysiodesis using a tension band plate was more beneficial than temporary hemiepiphysiodesis using percutaneous screws. PMID:25276801
Multivariate analysis of flow cytometric data using decision trees.
Simon, Svenja; Guthke, Reinhard; Kamradt, Thomas; Frey, Oliver
2012-01-01
Characterization of the response of the host immune system is important in understanding the bidirectional interactions between the host and microbial pathogens. For research on the host site, flow cytometry has become one of the major tools in immunology. Advances in technology and reagents now allow the simultaneous assessment of multiple markers on a single cell level, generating multidimensional data sets that require multivariate statistical analysis. We explored the explanatory power of the supervised machine learning method called "induction of decision trees" in flow cytometric data. In order to examine whether the production of a certain cytokine depends on other cytokines, datasets from intracellular staining for six cytokines with complex patterns of co-expression were analyzed by induction of decision trees. After weighting the data according to their class probabilities, we created a total of 13,392 different decision trees for each given cytokine with different parameter settings. For a more realistic estimation of the decision trees' quality, we used stratified fivefold cross validation and chose the "best" tree according to a combination of different quality criteria. While some of the decision trees reflected previously known co-expression patterns, we found that the expression of some cytokines was not only dependent on the co-expression of others per se, but was also dependent on the intensity of expression. Thus, for the first time we successfully used induction of decision trees for the analysis of high dimensional flow cytometric data and demonstrated the feasibility of this method to reveal structural patterns in such data sets.
15 CFR Supplement 1 to Part 732 - Decision Tree
Code of Federal Regulations, 2010 CFR
2010-01-01
... 15 Commerce and Foreign Trade 2 2010-01-01 2010-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000 [69 FR 5687, Feb. 6, 2004] ...
15 CFR Supplement No 1 to Part 732 - Decision Tree
Code of Federal Regulations, 2013 CFR
2013-01-01
... 15 Commerce and Foreign Trade 2 2013-01-01 2013-01-01 false Decision Tree No Supplement No 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued... THE EAR Pt. 732, Supp. 1 Supplement No 1 to Part 732—Decision Tree ER06FE04.000 [69 FR 5687, Feb. 6...
15 CFR Supplement No 1 to Part 732 - Decision Tree
Code of Federal Regulations, 2014 CFR
2014-01-01
... 15 Commerce and Foreign Trade 2 2014-01-01 2014-01-01 false Decision Tree No Supplement No 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued... THE EAR Pt. 732, Supp. 1 Supplement No 1 to Part 732—Decision Tree ER06FE04.000 [69 FR 5687, Feb. 6...
15 CFR Supplement 1 to Part 732 - Decision Tree
Code of Federal Regulations, 2012 CFR
2012-01-01
... 15 Commerce and Foreign Trade 2 2012-01-01 2012-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000 [69 FR 5687, Feb. 6, 2004] ...
15 CFR Supplement 1 to Part 732 - Decision Tree
Code of Federal Regulations, 2011 CFR
2011-01-01
... 15 Commerce and Foreign Trade 2 2011-01-01 2011-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000 [69 FR 5687, Feb. 6, 2004] ...
Improved Frame Mode Selection for AMR-WB+ Based on Decision Tree
NASA Astrophysics Data System (ADS)
Kim, Jong Kyu; Kim, Nam Soo
In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. In order to reduce computation while maintaining good performance, a decision tree classifier is adopted with the closed loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. Through an evaluation test on a database covering both speech and music materials, the proposed method is found to achieve much better mode selection accuracy than the open loop mode selection module in the AMR-WB+.
Software tool for data mining and its applications
NASA Astrophysics Data System (ADS)
Yang, Jie; Ye, Chenzhou; Chen, Nianyi
2002-03-01
A software tool for data mining is introduced, which integrates pattern recognition (PCA, Fisher, clustering, hyperenvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough set, support vector machine), and computational intelligence (neural network, genetic algorithm, fuzzy systems). It consists of nine function models: pattern recognition, decision trees, association rule, fuzzy rule, neural network, genetic algorithm, Hyper Envelop, support vector machine, visualization. The principle and knowledge representation of some function models of data mining are described. The software tool is implemented in Visual C++ under Windows 2000. Nonmonotony in data mining is dealt with by concept hierarchy and layered mining. The software tool has been satisfactorily applied to the prediction of regularities of the formation of ternary intermetallic compounds in alloy systems, and to the diagnosis of brain glioma.
Activity classification using realistic data from wearable sensors.
Pärkkä, Juha; Ermes, Miikka; Korpipää, Panu; Mäntyjärvi, Jani; Peltola, Johannes; Korhonen, Ilkka
2006-01-01
Automatic classification of everyday activities can be used for promotion of health-enhancing physical activities and a healthier lifestyle. In this paper, methods used for classification of everyday activities like walking, running, and cycling are described. The aim of the study was to find out how to recognize activities, which sensors are useful and what kind of signal processing and classification is required. A large and realistic data library of sensor data was collected. Sixteen test persons took part in the data collection, resulting in approximately 31 h of annotated, 35-channel data recorded in an everyday environment. The test persons carried a set of wearable sensors while performing several activities during the 2-h measurement session. Classification results of three classifiers are shown: custom decision tree, automatically generated decision tree, and artificial neural network. The classification accuracies using leave-one-subject-out cross validation range from 58 to 97% for custom decision tree classifier, from 56 to 97% for automatically generated decision tree, and from 22 to 96% for artificial neural network. Total classification accuracy is 82% for custom decision tree classifier, 86% for automatically generated decision tree, and 82% for artificial neural network.
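Leave-one-subject-out cross validation, as used for the accuracies above, holds out all samples from one test person at a time, so a classifier is never evaluated on a person it was trained on. The splitting step can be sketched as follows; the function name and subject labels are hypothetical and the classifier itself is left out.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical subject label for each recorded sample.
subject_ids = ["s1", "s1", "s2", "s3", "s3", "s3"]
folds = list(leave_one_subject_out(subject_ids))
```

With sixteen test persons this scheme would give sixteen folds. Splitting by subject rather than by sample avoids the optimistic bias that arises when samples from the same person appear in both the training and the test set.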
Decision analysis in clinical cardiology: When is coronary angiography required in aortic stenosis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Georgeson, S.; Meyer, K.B.; Pauker, S.G.
1990-03-15
Decision analysis offers a reproducible, explicit approach to complex clinical decisions. It consists of developing a model, typically a decision tree, that separates choices from chances and that specifies and assigns relative values to outcomes. Sensitivity analysis allows exploration of alternative assumptions. Cost-effectiveness analysis shows the relation between dollars spent and improved health outcomes achieved. In a tutorial format, this approach is applied to the decision whether to perform coronary angiography in a patient who requires aortic valve replacement for critical aortic stenosis.
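The separation of choices from chances described above is usually evaluated by "rolling back" the tree: chance nodes take the probability-weighted average of their branches, and decision nodes take the best available option. The sketch below assumes a simple dictionary representation; the node layout, option names, probabilities and utilities are all hypothetical and are not taken from the article.

```python
def rollback(node):
    """Return (expected_utility, best_option_or_None) for a tree node."""
    kind = node["type"]
    if kind == "terminal":
        return node["utility"], None
    if kind == "chance":
        total = sum(p for p, _ in node["branches"])
        if abs(total - 1.0) > 1e-9:
            raise ValueError("chance probabilities must sum to 1")
        # Probability-weighted average over the possible outcomes.
        value = sum(p * rollback(child)[0] for p, child in node["branches"])
        return value, None
    if kind == "decision":
        # Evaluate every option and keep the one with the highest value.
        values = {name: rollback(child)[0]
                  for name, child in node["options"].items()}
        best = max(values, key=values.get)
        return values[best], best
    raise ValueError(f"unknown node type: {kind}")


def leaf(utility):
    return {"type": "terminal", "utility": utility}


# Hypothetical two-option decision; all numbers are for illustration only.
tree = {
    "type": "decision",
    "options": {
        "test_first": {"type": "chance",
                       "branches": [(0.3, leaf(0.90)), (0.7, leaf(0.98))]},
        "no_test": {"type": "chance",
                    "branches": [(0.3, leaf(0.80)), (0.7, leaf(1.00))]},
    },
}
value, choice = rollback(tree)
```

A one-way sensitivity analysis in this setting amounts to repeating the rollback while varying a single probability or utility and noting where the preferred option changes.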
Identifying pollution sources and predicting urban air quality using ensemble learning methods
NASA Astrophysics Data System (ADS)
Singh, Kunwar P.; Gupta, Shikha; Rai, Premanjali
2013-12-01
In this study, principal components analysis (PCA) was performed to identify air pollution sources and tree based ensemble learning models were constructed to predict the urban air quality of Lucknow (India) using the air quality and meteorological databases pertaining to a period of five years. PCA identified vehicular emissions and fuel combustion as major air pollution sources. The air quality indices revealed unhealthy air quality during the summer and winter. Ensemble models were constructed to discriminate between the seasonal air qualities, to identify the factors responsible for discrimination, and to predict the air quality indices. Accordingly, single decision tree (SDT), decision tree forest (DTF), and decision treeboost (DTB) models were constructed and their generalization and predictive performance was evaluated in terms of several statistical parameters and compared with a conventional machine learning benchmark, support vector machines (SVM). The DT and SVM models discriminated the seasonal air quality with misclassification rates (MR) of 8.32% (SDT), 4.12% (DTF), 5.62% (DTB), and 6.18% (SVM), respectively, in complete data. The AQI and CAQI regression models yielded a correlation between measured and predicted values and root mean squared error of 0.901, 6.67 and 0.825, 9.45 (SDT); 0.951, 4.85 and 0.922, 6.56 (DTF); 0.959, 4.38 and 0.929, 6.30 (DTB); 0.890, 7.00 and 0.836, 9.16 (SVR) in complete data. The DTF and DTB models outperformed the SVM both in classification and regression, which could be attributed to the incorporation of the bagging and boosting algorithms in these models. The proposed ensemble models successfully predicted the urban ambient air quality and can be used as effective tools for its management.
An Isometric Mapping Based Co-Location Decision Tree Algorithm
NASA Astrophysics Data System (ADS)
Zhou, G.; Wei, J.; Zhou, X.; Zhang, R.; Huang, W.; Sha, H.; Chen, J.
2018-05-01
Decision tree (DT) induction has been widely used in pattern classification. However, most traditional DTs have the disadvantage that they consider only non-spatial attributes (i.e., spectral information) when classifying pixels, which can result in objects being misclassified. Therefore, some researchers have proposed a co-location decision tree (Cl-DT) method, which combines co-location and decision tree to solve the above-mentioned problems of traditional decision trees. Cl-DT overcomes the shortcomings of the existing DT algorithms, which create a node for each value of a given attribute, and has a higher accuracy than the existing decision tree approach. However, for non-linearly distributed data instances, the Euclidean distance between instances does not reflect the true positional relationship between them. In order to overcome these shortcomings, this paper proposes an isometric mapping method based on Cl-DT (called Isomap-based Cl-DT), which is a method that combines heterogeneous and Cl-DT together. Because isometric mapping methods use geodesic distances instead of Euclidean distances between non-linearly distributed instances, the true distance between instances can be reflected. The experimental results and several comparative analyses show that: (1) The extraction method of exposed carbonate rocks is of high accuracy. (2) The proposed method has many advantages, because the total number of nodes, the number of leaf nodes and the number of nodes are greatly reduced compared to Cl-DT. Therefore, the Isomap-based Cl-DT algorithm can construct a more accurate and faster decision tree.
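To illustrate why a graph-based distance can differ from the Euclidean one for non-linearly distributed instances, the sketch below estimates distances the way isometric mapping does in general: it builds a k-nearest-neighbour graph and takes shortest-path lengths through it. It is not the paper's algorithm; the function name, the U-shaped toy points, and the choice k=2 are assumptions made for illustration.

```python
import math


def graph_distances(points, k):
    """Approximate along-the-manifold distances via a k-nearest-neighbour graph."""
    n = len(points)
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        # Connect each point to its k nearest neighbours (symmetric edges).
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        for d, j in nearest:
            dist[i][j] = min(dist[i][j], d)
            dist[j][i] = min(dist[j][i], d)
    # Floyd-Warshall: shortest path lengths between all pairs of points.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][m] + dist[m][j] < dist[i][j]:
                    dist[i][j] = dist[i][m] + dist[m][j]
    return dist


# Hypothetical points sampled along a U-shaped curve.
points = [(0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
along_curve = graph_distances(points, k=2)[0][7]
straight_line = math.dist(points[0], points[7])
print(along_curve, straight_line)
```

The two ends of the U are close in a straight line but far apart along the curve, which is the situation the abstract describes.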
Wang, Ting; Li, Weiying; Zheng, Xiaofeng; Lin, Zhifen; Kong, Deyang
2014-02-01
Over the past decades, there has been an increasing number of studies about estrogenic activities of environmental pollutants on amphibians, and many determination methods have been proposed. However, these determination methods are time-consuming and expensive, and a rapid and simple method to screen and test chemicals for estrogenic activities in amphibians is therefore imperative. Herein is proposed a new decision tree formulated not only with physicochemical parameters but also with a biological parameter that was successfully used to screen estrogenic activities of chemicals on amphibians. The biological parameter, the CDOCKER interaction energy (Ebinding) between chemicals and the target proteins, was calculated based on the method of molecular docking, and it was used to revise the decision tree formulated by Hong only with physicochemical parameters for screening estrogenic activity of chemicals in rat. According to the correlation between Ebinding of rat and Xenopus laevis, a new decision tree for estrogenic activities in Xenopus laevis is finally proposed. It was then validated using 8 randomly selected chemicals to which Xenopus laevis can be frequently exposed, and the agreement between the results from the new decision tree and those from experiments is generally satisfactory. Consequently, the new decision tree can be used to screen the estrogenic activities of chemicals, and the combined use of Ebinding and classical physicochemical parameters can greatly improve Hong's decision tree. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Stonecipher, Karl; Parrish, Joseph; Stonecipher, Megan
2018-05-18
This review is intended to update and educate the reader on the currently available options for laser vision correction, more specifically, laser-assisted in-situ keratomileusis (LASIK). In addition, some related clinical outcomes data from over 1000 cases performed over a 1-year period are presented to highlight some differences between the various treatment profiles currently available, including the rapidity of visual recovery. The cases in question were performed on the basis of a decision tree to segregate patients on the basis of anatomical, topographic and aberrometry findings; the decision tree was formulated based on the data available in some of the reviewed articles. Numerous recent studies reported in the literature provide data related to the risks and benefits of LASIK; alternatives to a laser refractive procedure are also discussed. The results from these studies have been used to prepare a decision tree to assist the surgeon in choosing the best option for the patient based on the data from several standard preoperative diagnostic tests. The data presented here should aid surgeons in understanding the effects of currently available LASIK treatment profiles. Surgeons should also be able to appreciate how the findings were used to create a decision tree to help choose the most appropriate treatment profile for patients. Finally, the retrospective evaluation of clinical outcomes based on the decision tree should provide surgeons with a realistic expectation for their own outcomes should they adopt such a decision tree in their own practice.
NASA Astrophysics Data System (ADS)
McKenney, D.; Pedlar, J.
2011-12-01
Climate is one of the major influences on forests and much effort has gone into projecting the impacts of rapid climate change on forest distribution and productivity. Such efforts are premised on the notion that the current generation of Global Climate Models (GCMs) provide reasonably accurate representations of future climate. But what is the appropriate level of faith to put in these projections when making relatively fine-scale resource management decisions such as the movement of plant genetic material? In this talk we review recent outcomes of climate envelope models for North American tree species that suggest optimal climate regimes could move on average ~700km within the next 100 years. Newer generation GCMs seem to confirm these results but much uncertainty remains for practical decision-making. Despite these uncertainties, assisted migration has been suggested as a climate change adaptation tool wherein populations of trees are moved up to a few hundred kilometers north (or a few hundred meters upslope) to keep pace with the anticipated changes in optimal climate regimes. A continent-wide web based tool (SEEDWHERE) is presented, which assists in identifying appropriate translocation distances for assisted migration initiatives. We finish with some suggestions for future work on the topic of forest regeneration decisions under an evolving and uncertain future climate.
NASA Astrophysics Data System (ADS)
Attaluri, Pavan K.; Chen, Zhengxin; Weerakoon, Aruna M.; Lu, Guoqing
Multiple criteria decision making (MCDM) has significant impact in bioinformatics. In the research reported here, we explore the integration of decision tree (DT) and Hidden Markov Model (HMM) for subtype prediction of human influenza A virus. Infection with influenza viruses continues to be an important public health problem. Viral strains of subtypes H3N2 and H1N1 circulate in humans at least twice annually. The subtype detection depends mainly on the antigenic assay, which is time-consuming and not fully accurate. We have developed a Web system for accurate subtype detection of human influenza virus sequences. The preliminary experiment showed that this system is easy to use and powerful in identifying human influenza subtypes. Our next step is to examine the informative positions at the protein level and extend its current functionality to detect more subtypes. The web functions can be accessed at http://glee.ist.unomaha.edu/.
Generation of 2D Land Cover Maps for Urban Areas Using Decision Tree Classification
NASA Astrophysics Data System (ADS)
Höhle, J.
2014-09-01
A 2D land cover map can automatically and efficiently be generated from high-resolution multispectral aerial images. First, a digital surface model is produced and each cell of the elevation model is then supplemented with attributes. A decision tree classification is applied to extract map objects like buildings, roads, grassland, trees, hedges, and walls from such an "intelligent" point cloud. The decision tree is derived from training areas whose borders are digitized on top of a false-colour orthoimage. The produced 2D land cover map with six classes is subsequently refined by using image analysis techniques. The proposed methodology is described step by step. The classification, assessment, and refinement are carried out by the open source software "R"; the generation of the dense and accurate digital surface model by the "Match-T DSM" program of the Trimble Company. A practical example of a 2D land cover map generation is carried out. Images of a multispectral medium-format aerial camera covering an urban area in Switzerland are used. The assessment of the produced land cover map is based on class-wise stratified sampling where reference values of samples are determined by means of stereo-observations of false-colour stereopairs. The stratified statistical assessment of the produced land cover map with six classes and based on 91 points per class reveals a high thematic accuracy for classes "building" (99 %, 95 % CI: 95 %-100 %) and "road and parking lot" (90 %, 95 % CI: 83 %-95 %). Some other accuracy measures (overall accuracy, kappa value) and their 95 % confidence intervals are derived as well. The proposed methodology has a high potential for automation and fast processing and may be applied to other scenes and sensors.
An effective method on pornographic images realtime recognition
NASA Astrophysics Data System (ADS)
Wang, Baosong; Lv, Xueqiang; Wang, Tao; Wang, Chengrui
2013-03-01
In this paper, skin detection, texture filtering and face detection are used to extract features from an image library, and a decision tree algorithm is trained on them to create rules that serve as a decision tree classifier for distinguishing unknown images. In an experiment based on more than twenty thousand images, the precision rate reaches 76.21% when testing on 13025 pornographic images, and the elapsed time is less than 0.2 s. This experiment shows that the method has good general applicability. Among the steps mentioned above, a new skin detection model is proposed, called the irregular polygon region skin detection model, which is based on the YCbCr color space. This skin detection model can lower the false detection rate of skin detection. A new method called sequence region labeling on binary connected areas can calculate features of connected areas; it is faster and needs less memory than other recursive methods.
Liang, Shih-Hsiung; Walther, Bruno Andreas; Shieh, Bao-Sen
2017-01-01
Biological invasions have become a major threat to biodiversity, and identifying determinants underlying success at different stages of the invasion process is essential for both prevention management and testing ecological theories. To investigate variables associated with different stages of the invasion process in a local region such as Taiwan, potential problems using traditional parametric analyses include too many variables of different data types (nominal, ordinal, and interval) and a relatively small data set with too many missing values. We therefore used five decision tree models instead and compared their performance. Our dataset contains 283 exotic bird species which were transported to Taiwan; of these 283 species, 95 species escaped to the field successfully (introduction success); of these 95 introduced species, 36 species reproduced in the field of Taiwan successfully (establishment success). For each species, we collected 22 variables associated with human selectivity and species traits which may determine success during the introduction stage and establishment stage. For each decision tree model, we performed three variable treatments: (I) including all 22 variables, (II) excluding nominal variables, and (III) excluding nominal variables and replacing ordinal values with binary ones. Five performance measures were used to compare models, namely, area under the receiver operating characteristic curve (AUROC), specificity, precision, recall, and accuracy. The gradient boosting models performed best overall among the five decision tree models for both introduction and establishment success and across variable treatments. The most important variables for predicting introduction success were the bird family, the number of invaded countries, and variables associated with environmental adaptation, whereas the most important variables for predicting establishment success were the number of invaded countries and variables associated with reproduction. 
Our final optimal models achieved relatively high performance values, and we discuss differences in performance with regard to sample size and variable treatments. Our results showed that, for both the establishment model and introduction model, the number of invaded countries was the most important or second most important determinant, respectively. Therefore, we suggest that future success for introduction and establishment of exotic birds may be gauged by simply looking at previous success in invading other countries. Finally, we found that species traits related to reproduction were more important in establishment models than in introduction models; importantly, these determinants were not averaged but either minimum or maximum values of species traits. Therefore, we suggest that in addition to averaged values, reproductive potential represented by minimum and maximum values of species traits should be considered in invasion studies.
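The five performance measures named in this abstract can be computed from a confusion matrix and a set of predicted scores. The sketch below uses the standard definitions; the function names, counts and scores are hypothetical and are not the study's data. AUROC is computed here as the fraction of positive-negative pairs ranked correctly, with ties counted as one half.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and specificity from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auroc(scores, labels):
    """Area under the ROC curve via pairwise ranking of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
metrics = confusion_metrics(tp=30, fp=10, tn=50, fn=10)
area = auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
print(metrics, area)
```

With imbalanced classes, such as 36 successes among 95 species, accuracy alone can be misleading, which is one reason to report several measures side by side.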
Liang, Shih-Hsiung; Walther, Bruno Andreas
2017-01-01
Background Biological invasions have become a major threat to biodiversity, and identifying determinants underlying success at different stages of the invasion process is essential for both prevention management and testing ecological theories. To investigate variables associated with different stages of the invasion process in a local region such as Taiwan, potential problems using traditional parametric analyses include too many variables of different data types (nominal, ordinal, and interval) and a relatively small data set with too many missing values. Methods We therefore used five decision tree models instead and compared their performance. Our dataset contains 283 exotic bird species which were transported to Taiwan; of these 283 species, 95 species escaped to the field successfully (introduction success); of these 95 introduced species, 36 species reproduced in the field of Taiwan successfully (establishment success). For each species, we collected 22 variables associated with human selectivity and species traits which may determine success during the introduction stage and establishment stage. For each decision tree model, we performed three variable treatments: (I) including all 22 variables, (II) excluding nominal variables, and (III) excluding nominal variables and replacing ordinal values with binary ones. Five performance measures were used to compare models, namely, area under the receiver operating characteristic curve (AUROC), specificity, precision, recall, and accuracy. Results The gradient boosting models performed best overall among the five decision tree models for both introduction and establishment success and across variable treatments. 
The most important variables for predicting introduction success were the bird family, the number of invaded countries, and variables associated with environmental adaptation, whereas the most important variables for predicting establishment success were the number of invaded countries and variables associated with reproduction. Discussion Our final optimal models achieved relatively high performance values, and we discuss differences in performance with regard to sample size and variable treatments. Our results showed that, for both the establishment model and introduction model, the number of invaded countries was the most important or second most important determinant, respectively. Therefore, we suggest that future success for introduction and establishment of exotic birds may be gauged by simply looking at previous success in invading other countries. Finally, we found that species traits related to reproduction were more important in establishment models than in introduction models; importantly, these determinants were not averaged but either minimum or maximum values of species traits. Therefore, we suggest that in addition to averaged values, reproductive potential represented by minimum and maximum values of species traits should be considered in invasion studies. PMID:28316893
ERIC Educational Resources Information Center
Block, Stephanie D.; Foster, E. Michael; Pierce, Matthew W.; Berkoff, Molly C.; Runyan, Desmond K.
2013-01-01
In suspected child sexual abuse some professionals recommend multiple child interviews to increase the likelihood of disclosure or more details to improve decision-making and increase convictions. We modeled the yield of a policy of routinely conducting multiple child interviews and increased convictions. Our decision tree reflected the path of a…
Lo, Benjamin W Y; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A H
2016-01-01
Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56-2.45, P < 0.01). A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.
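Recursive partitioning into more homogeneous subgroups can be illustrated by a single splitting step. The sketch below searches one numeric variable for the threshold that minimises the weighted Gini impurity, a criterion commonly used in classification and regression trees; whether the study used this exact criterion is not stated in the abstract. The function names and the toy ages and outcomes are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # fraction of class 1
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split: impurity of the whole group
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            midpoint = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (midpoint, score)
    return best


# Hypothetical data: age in years and a dichotomized outcome (1 = poor outcome).
ages = [30, 40, 50, 60, 70, 80]
outcome = [0, 0, 0, 1, 1, 1]
print(best_threshold(ages, outcome))
```

A full tree repeats this search over all candidate variables and then recurses into the two resulting subgroups until a stopping rule is met.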
A survey of decision tree classifier methodology
NASA Technical Reports Server (NTRS)
Safavian, S. R.; Landgrebe, David
1991-01-01
Decision tree classifiers (DTCs) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTCs over single-stage classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.
A survey of decision tree classifier methodology
NASA Technical Reports Server (NTRS)
Safavian, S. Rasoul; Landgrebe, David
1990-01-01
Decision Tree Classifiers (DTCs) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTCs over single-stage classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.
NASA Astrophysics Data System (ADS)
Ghosh, S. M.; Behera, M. D.
2017-12-01
Forest aboveground biomass (AGB) is an important factor in preparing global policy decisions to tackle the impact of climate change. Several previous studies have concluded that remote sensing methods are more suitable for estimating forest biomass on a regional scale. Among all available remote sensing data and methods, Synthetic Aperture Radar (SAR) data in combination with decision tree based machine learning algorithms have shown better promise in estimating higher biomass values. Few studies have addressed biomass estimation in dense Indian tropical forests with high biomass density. In this study, aboveground biomass was estimated for two major tree species, Sal (Shorea robusta) and Teak (Tectona grandis), of Katerniaghat Wildlife Sanctuary, a tropical forest situated in northern India. Biomass was estimated by combining C-band SAR data from the Sentinel-1A satellite, vegetation indices produced using Sentinel-2A data, and ground inventory plots. Along with SAR backscatter values, SAR texture images were also used as input, as earlier studies had found that image texture is correlated with vegetation biomass. Decision tree based nonlinear machine learning algorithms were used in place of parametric regression models for establishing the relationship between field-measured values and remotely sensed parameters. A random forest model with a combination of vegetation indices and SAR backscatter as predictor variables shows the best result for Sal forest, with a coefficient of determination of 0.71 and an RMSE of 105.027 t/ha. In Teak forest, the best result is also found with the same combination but for a stochastic gradient boosted model, with a coefficient of determination of 0.6 and an RMSE of 79.45 t/ha. These results are mostly better than the results of other studies done for similar kinds of forests. 
This study shows that Sentinel series satellite data has exceptional capabilities in estimating dense forest AGB and machine learning algorithms are better means to do so than parametric regression models.
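The two accuracy figures quoted in this abstract, RMSE and the coefficient of determination, can be computed as in the sketch below. It uses the common residual-based definition of the coefficient of determination; the abstract does not say which definition the authors used, and the biomass values here are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical plot-level biomass values in t/ha, for illustration only.
field_measured = [100, 200, 300, 400]
model_predicted = [110, 190, 310, 390]
print(rmse(field_measured, model_predicted), r_squared(field_measured, model_predicted))
```

Because RMSE carries the units of the target, it is easiest to interpret relative to the range of measured biomass in each forest type.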
Development of a diagnostic decision tree for obstructive pulmonary diseases based on real-life data
in ’t Veen, Johannes C.C.M.; Dekhuijzen, P.N. Richard; van Heijst, Ellen; Kocks, Janwillem W.H.; Muilwijk-Kroes, Jacqueline B.; Chavannes, Niels H.; van der Molen, Thys
2016-01-01
The aim of this study was to develop and explore the diagnostic accuracy of a decision tree derived from a large real-life primary care population. Data from 9297 primary care patients (45% male, mean age 53±17 years) with suspicion of an obstructive pulmonary disease was derived from an asthma/chronic obstructive pulmonary disease (COPD) service where patients were assessed using spirometry, the Asthma Control Questionnaire, the Clinical COPD Questionnaire, history data and medication use. All patients were diagnosed through the Internet by a pulmonologist. The Chi-squared Automatic Interaction Detection method was used to build the decision tree. The tree was externally validated in another real-life primary care population (n=3215). Our tree correctly diagnosed 79% of the asthma patients, 85% of the COPD patients and 32% of the asthma–COPD overlap syndrome (ACOS) patients. External validation showed a comparable pattern (correct: asthma 78%, COPD 83%, ACOS 24%). Our decision tree is considered to be promising because it was based on real-life primary care patients with a specialist's diagnosis. In most patients the diagnosis could be correctly predicted. Predicting ACOS, however, remained a challenge. The total decision tree can be implemented in computer-assisted diagnostic systems for individual patients. A simplified version of this tree can be used in daily clinical practice as a desk tool. PMID:27730177
Using Boosting Decision Trees in Gravitational Wave Searches triggered by Gamma-ray Bursts
NASA Astrophysics Data System (ADS)
Zuraw, Sarah; LIGO Collaboration
2015-04-01
The search for gravitational wave bursts requires the ability to distinguish weak signals from background detector noise. Gravitational wave bursts are characterized by their transient nature, making them particularly difficult to detect as they are similar to non-Gaussian noise fluctuations in the detector. The Boosted Decision Tree method is a powerful machine learning algorithm which uses Multivariate Analysis techniques to explore high-dimensional data sets in order to distinguish between gravitational wave signal and background detector noise. It does so by training with known noise events and simulated gravitational wave events. The method is tested using waveform models and compared with the performance of the standard gravitational wave burst search pipeline for Gamma-ray Bursts. It is shown that the method is able to effectively distinguish between signal and background events under a variety of conditions and over multiple Gamma-ray Burst events. This example demonstrates the usefulness and robustness of the Boosted Decision Tree and Multivariate Analysis techniques as a detection method for gravitational wave bursts. LIGO, UMass, PREP, NEGAP.
Jin, Mingwu; Deng, Weishu
2018-05-15
There is a spectrum of progression from healthy control (HC) to mild cognitive impairment (MCI) without conversion to Alzheimer's disease (AD), to MCI with conversion to AD (cMCI), and to AD. This study aims to predict the different disease stages using brain structural information provided by magnetic resonance imaging (MRI) data. Neighborhood component analysis (NCA) is applied to select the most powerful features for prediction. An ensemble decision tree classifier is built to predict which group the subject belongs to. The best features and model parameters are determined by cross validation of the training data. Our results show that 16 out of a total of 429 features were selected by NCA using 240 training subjects, including MMSE score and structural measures in memory-related regions. The boosting tree model with NCA features can achieve a prediction accuracy of 56.25% on 160 test subjects. Principal component analysis (PCA) and sequential feature selection (SFS) are used for feature selection, while support vector machine (SVM) is used for classification. The boosting tree model with NCA features outperforms all other combinations of feature selection and classification methods. The results suggest that NCA is a better feature selection strategy than PCA and SFS for the data used in this study. The ensemble tree classifier with boosting is more powerful than SVM for predicting the subject group. However, more advanced feature selection and classification methods or additional measures besides structural MRI may be needed to improve the prediction performance. Copyright © 2018 Elsevier B.V. All rights reserved.
Real-Time Speech/Music Classification With a Hierarchical Oblique Decision Tree
2008-04-01
REAL-TIME SPEECH/ MUSIC CLASSIFICATION WITH A HIERARCHICAL OBLIQUE DECISION TREE Jun Wang, Qiong Wu, Haojiang Deng, Qin Yan Institute of Acoustics...time speech/ music classification with a hierarchical oblique decision tree. A set of discrimination features in frequency domain are selected...handle signals without discrimination and can not work properly in the existence of multimedia signals. This paper proposes a real-time speech/ music
Kleinhans, Sonja; Herrmann, Eva; Kohnen, Thomas; Bühren, Jens
2017-08-15
Background Iatrogenic keratectasia is one of the most dreaded complications of refractive surgery. In most cases, keratectasia develops after refractive surgery of eyes suffering from subclinical stages of keratoconus with few or no signs. Unfortunately, there has been no reliable procedure for the early detection of keratoconus. In this study, we used binary decision trees (recursive partitioning) to assess their suitability for discrimination between normal eyes and eyes with subclinical keratoconus. Patients and Methods The method of decision tree analysis was compared with discriminant analysis which has shown good results in previous studies. Input data were 32 eyes of 32 patients with newly diagnosed keratoconus in the contralateral eye and preoperative data of 10 eyes of 5 patients with keratectasia after laser in-situ keratomileusis (LASIK). The control group was made up of 245 normal eyes after LASIK and 12-month follow-up without any signs of iatrogenic keratectasia. Results Decision trees gave better accuracy and specificity than did discriminant analysis. The sensitivity of decision trees was lower than the sensitivity of discriminant analysis. Conclusion On the basis of the patient population of this study, decision trees did not prove to be superior to linear discriminant analysis for the detection of subclinical keratoconus. Georg Thieme Verlag KG Stuttgart · New York.
Chi, Chia-Fen; Tseng, Li-Kai; Jang, Yuh
2012-07-01
Many disabled individuals lack extensive knowledge about assistive technology, which could help them use computers. In 1997, Denis Anson developed a decision tree of 49 evaluative questions designed to evaluate the functional capabilities of the disabled user and choose an appropriate combination of assistive devices, from a selection of 26, that enable the individual to use a computer. In general, occupational therapists guide the disabled users through this process. They often have to go over repetitive questions in order to find an appropriate device. A disabled user may require an alphanumeric entry device, a pointing device, an output device, a performance enhancement device, or some combination of these. Therefore, the current research eliminates redundant questions and divides Anson's decision tree into multiple independent subtrees to meet the actual demand of computer users with disabilities. The modified decision tree was tested by six disabled users to prove it can determine a complete set of assistive devices with a smaller number of evaluative questions. The means to insert new categories of computer-related assistive devices was included to ensure the decision tree can be expanded and updated. The current decision tree can help the disabled users and assistive technology practitioners to find appropriate computer-related assistive devices that meet with clients' individual needs in an efficient manner.
Orlando, Lori A.; Buchanan, Adam H.; Hahn, Susan E.; Christianson, Carol A.; Powell, Karen P.; Skinner, Celette Sugg; Chesnut, Blair; Blach, Colette; Due, Barbara; Ginsburg, Geoffrey S.; Henrich, Vincent C.
2016-01-01
INTRODUCTION Family health history is a strong predictor of disease risk. To reduce the morbidity and mortality of many chronic diseases, risk-stratified evidence-based guidelines strongly encourage the collection and synthesis of family health history to guide selection of primary prevention strategies. However, the collection and synthesis of such information is not well integrated into clinical practice. To address barriers to collection and use of family health histories, the Genomedical Connection developed and validated MeTree, a Web-based, patient-facing family health history collection and clinical decision support tool. MeTree is designed for integration into primary care practices as part of the genomic medicine model for primary care. METHODS We describe the guiding principles, operational characteristics, algorithm development, and coding used to develop MeTree. Validation was performed through stakeholder cognitive interviewing, a genetic counseling pilot program, and clinical practice pilot programs in 2 community-based primary care clinics. RESULTS Stakeholder feedback resulted in changes to MeTree’s interface and changes to the phrasing of clinical decision support documents. The pilot studies resulted in the identification and correction of coding errors and the reformatting of clinical decision support documents. MeTree’s strengths in comparison with other tools are its seamless integration into clinical practice and its provision of action-oriented recommendations guided by providers’ needs. LIMITATIONS The tool was validated in a small cohort. CONCLUSION MeTree can be integrated into primary care practices to help providers collect and synthesize family health history information from patients with the goal of improving adherence to risk-stratified evidence-based guidelines. PMID:24044145
Uncertain decision tree inductive inference
NASA Astrophysics Data System (ADS)
Zarban, L.; Jafari, S.; Fakhrahmad, S. M.
2011-10-01
Induction is the process of reasoning in which general rules are formulated based on limited observations of recurring phenomenal patterns. Decision tree learning is one of the most widely used and practical inductive methods, which represents the results in a tree scheme. Various decision tree algorithms have already been proposed, such as CLS, ID3, Assistant, C4.5, REPTree and Random Tree. These algorithms suffer from some major shortcomings. In this article, after discussing the main limitations of the existing methods, we introduce a new decision tree induction algorithm, which overcomes all the problems existing in its counterparts. The new method uses bit strings and maintains important information on them. The use of bit strings and logical operations on them makes the induction process fast. The method also has several important features: it deals with inconsistencies in data, avoids overfitting and handles uncertainty. We also illustrate more advantages and the new features of the proposed method. The experimental results show the effectiveness of the method in comparison with other methods existing in the literature.
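The abstract does not describe the bit-string encoding itself. As a rough sketch under our own assumptions, one can keep one integer bitmask per attribute value and per class (bit i set when training row i matches), so that class counts inside a branch reduce to a bitwise AND followed by a population count. The function names and the toy data below are hypothetical and are not taken from the paper.

```python
from math import log2

def build_masks(column):
    """Map each distinct value to an int whose bit i is set when row i has that value."""
    masks = {}
    for i, value in enumerate(column):
        masks[value] = masks.get(value, 0) | (1 << i)
    return masks

def popcount(mask):
    """Number of set bits, i.e. number of rows selected by the mask."""
    return bin(mask).count("1")

def entropy(counts):
    """Shannon entropy (bits) of a list of class counts; zero counts are skipped."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def split_entropy(attr_masks, class_masks, n_rows):
    """Weighted class entropy after splitting on one attribute, using AND + popcount."""
    result = 0.0
    for a_mask in attr_masks.values():
        size = popcount(a_mask)
        counts = [popcount(a_mask & c_mask) for c_mask in class_masks.values()]
        result += size / n_rows * entropy(counts)
    return result

# Toy data (invented for illustration only).
outlook = ["sunny", "sunny", "rain", "rain"]
windy = ["yes", "no", "yes", "no"]
label = ["no", "no", "yes", "yes"]

class_masks = build_masks(label)
outlook_score = split_entropy(build_masks(outlook), class_masks, len(label))
windy_score = split_entropy(build_masks(windy), class_masks, len(label))
print(outlook_score, windy_score)
```

In this toy case the split on `outlook` leaves pure branches while `windy` leaves the classes fully mixed, so an induction loop of this kind would prefer `outlook`.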
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elter, M.; Schulz-Wendtland, R.; Wittenberg, T.
2007-11-15
Mammography is the most effective method for breast cancer screening available today. However, the low positive predictive value of breast biopsy resulting from mammogram interpretation leads to approximately 70% unnecessary biopsies with benign outcomes. To reduce the high number of unnecessary breast biopsies, several computer-aided diagnosis (CAD) systems have been proposed in the last several years. These systems help physicians in their decision to perform a breast biopsy on a suspicious lesion seen in a mammogram or to perform a short term follow-up examination instead. We present two novel CAD approaches that both emphasize an intelligible decision process to predict breast biopsy outcomes from BI-RADS findings. An intelligible reasoning process is an important requirement for the acceptance of CAD systems by physicians. The first approach induces a global model based on decision-tree learning. The second approach is based on case-based reasoning and applies an entropic similarity measure. We have evaluated the performance of both CAD approaches on two large publicly available mammography reference databases using receiver operating characteristic (ROC) analysis, bootstrap sampling, and the ANOVA statistical significance test. Both approaches outperform the diagnosis decisions of the physicians. Hence, both systems have the potential to reduce the number of unnecessary breast biopsies in clinical practice. A comparison of the performance of the proposed decision tree and CBR approaches with a state of the art approach based on artificial neural networks (ANN) shows that the CBR approach performs slightly better than the ANN approach, which in turn results in slightly better performance than the decision-tree approach. The differences are statistically significant (p value <0.001). On 2100 masses extracted from the DDSM database, the CBR approach, for example, resulted in an area under the ROC curve of A(z)=0.89±0.01, the decision-tree approach in A(z)=0.87±0.01, and the ANN approach in A(z)=0.88±0.01.
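For readers unfamiliar with the area under the ROC curve reported above, the following minimal sketch computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties counted as one half). This is the standard rank-based definition, not the authors' evaluation code, and the scores and labels are invented.

```python
def roc_auc(scores, labels):
    """AUC as P(score of random positive > score of random negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented classifier outputs: 1 = malignant, 0 = benign.
scores = [0.9, 0.8, 0.35, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of cases; it is used here only because it makes the definition explicit.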
Comparative Issues and Methods in Organizational Diagnosis. Report II. The Decision Tree Approach.
organizational diagnosis. The advantages and disadvantages of the decision-tree approach generally, and in this study specifically, are examined. A pre-test, using a civilian sample of 174 work groups with Survey of Organizations data, was conducted to assess various decision-tree classification criteria, in terms of their similarity to the distance function used by Bowers and Hausser (1977). The results suggested the use of a large developmental sample, which should result in more distinctly defined boundary lines between classification profiles. Also, the decision matrix
Application of a hybrid association rules/decision tree model for drought monitoring
NASA Astrophysics Data System (ADS)
Nourani, Vahid; Molajou, Amir
2017-12-01
Previous research has shown that incorporating oceanic-atmospheric climate phenomena such as Sea Surface Temperature (SST) into hydro-climatic models can provide important predictive information about hydro-climatic variability. In this paper, a hybrid application of two data mining techniques (decision tree and association rules) is proposed to discover the association between drought at the Tabriz and Kermanshah synoptic stations (located in Iran) and de-trended SSTs of the Black, Mediterranean and Red Seas. The two major steps of the proposed model were classifying the de-trended SST data and selecting the most effective groups, and extracting hidden information from the data. Decision tree techniques, which can identify good traits in a data set for classification purposes, were used for classification and for selecting the most effective groups, and association rules were employed to extract hidden predictive information from the large observed data set. To examine the accuracy of the rules, confidence and Heidke Skill Score (HSS) measures were calculated and compared for the different lag times considered. The computed measures confirm the reliable performance of the proposed hybrid data mining method in forecasting drought, and the results show a relative correlation between the Mediterranean, Black and Red Sea de-trended SSTs and drought at the Tabriz and Kermanshah synoptic stations, such that the confidence between the monthly Standardized Precipitation Index (SPI) values and the de-trended SSTs of the seas is higher than 70% and 80% for the Tabriz and Kermanshah synoptic stations, respectively.
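The two rule-quality measures named in this abstract can be written down compactly for the two-category case. The sketch below assumes a 2x2 contingency table (hits, false alarms, misses, correct negatives) and uses the usual two-category Heidke Skill Score formula; the counts are invented and the paper's own tables may use more categories.

```python
def rule_confidence(hits, false_alarms):
    """Confidence of a rule 'antecedent -> drought': fraction of firings that were correct."""
    return hits / (hits + false_alarms)

def heidke_skill_score(hits, false_alarms, misses, correct_negatives):
    """Two-category HSS: 1 is a perfect forecast, 0 is no better than chance."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    denom = (a + c) * (c + d) + (a + b) * (b + d)
    if denom == 0:
        raise ValueError("degenerate contingency table")
    return 2 * (a * d - b * c) / denom

# Invented counts for one rule at one lag time.
conf = rule_confidence(hits=30, false_alarms=10)
hss = heidke_skill_score(hits=30, false_alarms=10, misses=5, correct_negatives=55)
print(conf, round(hss, 3))
```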
Durham, Erin-Elizabeth A; Yu, Xiaxia; Harrison, Robert W
2014-12-01
Effective machine-learning handles large datasets efficiently. One key feature of handling large data is the use of databases such as MySQL. The freeware fuzzy decision tree induction tool, FDT, is a scalable supervised-classification software tool implementing fuzzy decision trees. It is based on an optimized fuzzy ID3 (FID3) algorithm. FDT 2.0 improves upon FDT 1.0 by bridging the gap between data science and data engineering: it combines a robust decisioning tool with data retention for future decisions, so that the tool does not need to be recalibrated from scratch every time a new decision is required. In this paper we briefly review the analytical capabilities of the freeware FDT tool and its major features and functionalities; examples of large biological datasets from HIV, microRNAs and sRNAs are included. This work shows how to integrate fuzzy decision algorithms with modern database technology. In addition, we show that integrating the fuzzy decision tree induction tool with database storage allows for optimal user satisfaction in today's Data Analytics world.
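To illustrate what distinguishes a fuzzy decision tree from a crisp one, the sketch below replaces a hard threshold test with a linear ramp, so a sample belongs to both branches with degrees that sum to one, and the prediction is the membership-weighted mix of the two leaf distributions. This is a generic illustration, not the FID3 algorithm or the FDT tool's interface; `soft_split`, `fuzzy_predict` and all numbers are hypothetical.

```python
def soft_split(x, lo, hi):
    """Return (left_degree, right_degree) for a fuzzy threshold ramping from lo to hi."""
    right = min(1.0, max(0.0, (x - lo) / (hi - lo)))
    return 1.0 - right, right

def fuzzy_predict(x, lo, hi, left_dist, right_dist):
    """Blend the class distributions of both leaves by the membership degrees."""
    left, right = soft_split(x, lo, hi)
    classes = set(left_dist) | set(right_dist)
    return {c: left * left_dist.get(c, 0.0) + right * right_dist.get(c, 0.0)
            for c in classes}

# A value halfway along the ramp belongs equally to both branches.
prediction = fuzzy_predict(
    30.0, lo=20.0, hi=40.0,
    left_dist={"neg": 0.9, "pos": 0.1},
    right_dist={"neg": 0.2, "pos": 0.8},
)
print(prediction)
```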
Section-Based Tree Species Identification Using Airborne LIDAR Point Cloud
NASA Astrophysics Data System (ADS)
Yao, C.; Zhang, X.; Liu, H.
2017-09-01
The application of LiDAR data in forestry initially focused on mapping forest communities, primarily for large-scale forest management and planning. With smaller-footprint and higher-sampling-density LiDAR data now available, detecting individual overstory trees, estimating crown parameters and identifying tree species have been demonstrated to be practicable. This paper proposes a section-based protocol for tree species identification, taking palm trees as an example. The section-based method detects objects through profiles taken along different directions, basically along the X-axis or Y-axis, and it improves the utilization of spatial information to generate accurate results. First, tree points are separated from manmade-object points by decision-tree-based rules, and a Crown Height Model (CHM) is created by subtracting the Digital Terrain Model (DTM) from the Digital Surface Model (DSM). Then key points are calculated and extracted to locate individual trees, from which specific tree parameters related to species information are estimated, such as crown height, crown radius, and cross point. Finally, these parameters are used to identify certain tree species. Compared with species information measured on the ground, the proportion of correctly identified trees over all plots reached up to 90.65%. The identification results demonstrate the ability to distinguish palm trees using LiDAR point clouds. Furthermore, with more prior knowledge, the section-based method enables trees to be classified into different classes.
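The CHM step described in this abstract is a cell-by-cell subtraction of two rasters. A minimal sketch on plain nested lists follows; clamping negative differences to zero is our own assumption (the abstract does not say how such cells are handled), and the elevation values are invented.

```python
def canopy_height(dsm, dtm):
    """CHM = DSM - DTM per grid cell; negative differences are clamped to 0 (assumption)."""
    return [
        [max(0.0, surface - terrain) for surface, terrain in zip(dsm_row, dtm_row)]
        for dsm_row, dtm_row in zip(dsm, dtm)
    ]

# Invented 2 x 2 rasters, elevations in metres.
dsm = [[102.0, 110.5], [101.0, 108.0]]
dtm = [[100.0, 100.5], [101.5, 100.0]]
chm = canopy_height(dsm, dtm)
print(chm)
```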
Short communication: Prediction of retention pay-off using a machine learning algorithm.
Shahinfar, Saleh; Kalantari, Afshin S; Cabrera, Victor; Weigel, Kent
2014-05-01
Replacement decisions have a major effect on dairy farm profitability. Dynamic programming (DP) has been widely studied to find the optimal replacement policies in dairy cattle. However, DP models are computationally intensive and might not be practical for daily decision making. Hence, the ability of applying machine learning on a prerun DP model to provide fast and accurate predictions of nonlinear and intercorrelated variables makes it an ideal methodology. Milk class (1 to 5), lactation number (1 to 9), month in milk (1 to 20), and month of pregnancy (0 to 9) were used to describe all cows in a herd in a DP model. Twenty-seven scenarios based on all combinations of 3 levels (base, 20% above, and 20% below) of milk production, milk price, and replacement cost were solved with the DP model, resulting in a data set of 122,716 records, each with a calculated retention pay-off (RPO). Then, a machine learning model tree algorithm was used to mimic the evaluated RPO with DP. The correlation coefficient factor was used to observe the concordance of RPO evaluated by DP and RPO predicted by the model tree. The obtained correlation coefficient was 0.991, with a corresponding value of 0.11 for relative absolute error. At least 100 instances were required per model constraint, resulting in 204 total equations (models). When these models were used for binary classification of positive and negative RPO, error rates were 1% false negatives and 9% false positives. Applying this trained model from simulated data for prediction of RPO for 102 actual replacement records from the University of Wisconsin-Madison dairy herd resulted in a 0.994 correlation with 0.10 relative absolute error rate. Overall results showed that model tree has a potential to be used in conjunction with DP to assist farmers in their replacement decisions. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
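Two pieces of arithmetic in this abstract are easy to make concrete: the 27 scenarios are the full factorial of 3 levels over 3 factors, and relative absolute error compares the model's total absolute error with that of always predicting the mean. The sketch below assumes the common definition of relative absolute error (the one used by Weka-style tools); the multipliers and factor names are our own labels for the levels the abstract describes.

```python
from itertools import product

# Multipliers for the three levels named in the abstract (base, 20% above, 20% below).
LEVELS = {"base": 1.0, "high": 1.2, "low": 0.8}
FACTORS = ("milk_production", "milk_price", "replacement_cost")

# Full factorial design: every combination of level per factor.
scenarios = [dict(zip(FACTORS, combo)) for combo in product(LEVELS, repeat=len(FACTORS))]

def relative_absolute_error(actual, predicted):
    """Sum of absolute errors divided by the error of always predicting the mean."""
    mean_actual = sum(actual) / len(actual)
    baseline = sum(abs(a - mean_actual) for a in actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / baseline

print(len(scenarios), scenarios[0])
```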
NASA Astrophysics Data System (ADS)
Changnon, David; Ritsche, Michael; Elyea, Karen; Shelton, Steve; Schramm, Kevin
2000-09-01
This paper illustrates a key lesson related to most uses of long-range climate forecast information, namely that effective weather-related decision-making requires understanding and integration of weather information with other, often complex factors. Northern Illinois University's heating plant manager and staff meteorologist, along with a group of meteorology students, worked together to assess different types of available information that could be used in an autumn natural gas purchasing decision. Weather information assessed included the impact of ENSO events on winters in northern Illinois and the Climate Prediction Center's (CPC) long-range climate outlooks. Non-weather factors, such as the cost and available supplies of natural gas prior to the heating season, contribute to the complexity of the natural gas purchase decision. A decision tree was developed and it incorporated three parts: (a) natural gas supply levels, (b) the CPC long-lead climate outlooks for the region, and (c) an ENSO model developed for DeKalb. The results were used to decide in autumn whether to lock in a price or ride the market each winter. The decision tree was tested for the period 1995-99, and returned a cost-effective decision in three of the four winters.
Satomi, Junichiro; Ghaibeh, A Ammar; Moriguchi, Hiroki; Nagahiro, Shinji
2015-07-01
The severity of clinical signs and symptoms of cranial dural arteriovenous fistulas (DAVFs) is well correlated with their pattern of venous drainage. Although the presence of cortical venous drainage can be considered a potential predictor of aggressive DAVF behaviors, such as intracranial hemorrhage or progressive neurological deficits due to venous congestion, accurate statistical analyses are currently not available. Using a decision tree data mining method, the authors aimed at clarifying the predictability of the future development of aggressive behaviors of DAVF and at identifying the main causative factors. Of 266 DAVF patients, 89 were eligible for analysis. Under observational management, 51 patients presented with intracranial hemorrhage/infarction during the follow-up period. The authors created a decision tree able to assess the risk for the development of aggressive DAVF behavior. Evaluated by 10-fold cross-validation, the decision tree's accuracy, sensitivity, and specificity were 85.28%, 88.33%, and 80.83%, respectively. The tree shows that the main factor in symptomatic patients was the presence of cortical venous drainage. In its absence, the lesion location determined the risk of a DAVF developing aggressive behavior. Decision tree analysis accurately predicts the future development of aggressive DAVF behavior.
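The 10-fold cross-validation mentioned above partitions the cases into ten folds, each used once for testing while the other nine train the tree. The sketch below only generates the index splits, using a simple deterministic interleaving; a real analysis would normally shuffle or stratify first, and nothing here reproduces the authors' software.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists; fold sizes differ by at most one."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# 89 eligible patients, as in the abstract.
splits = list(k_fold_indices(89, k=10))
print([len(test) for _, test in splits])
```

Each of the 89 cases appears in exactly one test fold, so the reported accuracy, sensitivity and specificity are aggregated over predictions made on data the tree did not see during training.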
Applied Swarm-based medicine: collecting decision trees for patterns of algorithms analysis.
Panje, Cédric M; Glatzer, Markus; von Rappard, Joscha; Rothermundt, Christian; Hundsberger, Thomas; Zumstein, Valentin; Plasswilm, Ludwig; Putora, Paul Martin
2017-08-16
The objective consensus methodology has recently been applied in consensus finding in several studies on medical decision-making among clinical experts or guidelines. The main advantages of this method are an automated analysis and comparison of treatment algorithms of the participating centers which can be performed anonymously. Based on the experience from completed consensus analyses, the main steps for the successful implementation of the objective consensus methodology were identified and discussed among the main investigators. The following steps for the successful collection and conversion of decision trees were identified and defined in detail: problem definition, population selection, draft input collection, tree conversion, criteria adaptation, problem re-evaluation, results distribution and refinement, tree finalisation, and analysis. This manuscript provides information on the main steps for successful collection of decision trees and summarizes important aspects at each point of the analysis.
Shao, Q; Rowe, R C; York, P
2007-06-01
Understanding of the cause-effect relationships between formulation ingredients, process conditions and product properties is essential for developing a quality product. However, the formulation knowledge is often hidden in experimental data and not easily interpretable. This study compares neurofuzzy logic and decision tree approaches in discovering hidden knowledge from an immediate release tablet formulation database relating formulation ingredients (silica aerogel, magnesium stearate, microcrystalline cellulose and sodium carboxymethylcellulose) and process variables (dwell time and compression force) to tablet properties (tensile strength, disintegration time, friability, capping and drug dissolution at various time intervals). Both approaches successfully generated useful knowledge in the form of either "if then" rules or decision trees. Although different strategies are employed by the two approaches in generating rules/trees, similar knowledge was discovered in most cases. However, as decision trees are not able to deal with continuous dependent variables, data discretisation procedures are generally required.
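The discretisation step mentioned at the end of this abstract can be as simple as equal-width binning of the continuous response before a classification tree is trained. The sketch below shows that one option; the study does not state which procedure was used, and the tensile-strength values are invented.

```python
def equal_width_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 using equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

tensile_strength = [0.5, 1.0, 1.5, 2.0, 2.5]  # invented values
bins = equal_width_bins(tensile_strength, n_bins=2)
print(bins)
```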
Using Unix system auditing for detecting network intrusions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Christensen, M.J.
1993-03-01
Intrusion Detection Systems (IDSs) are designed to detect actions of individuals who use computer resources without authorization as well as legitimate users who exceed their privileges. This paper describes a novel approach to IDS research, namely a decision aiding approach to intrusion detection. The introduction of a decision tree represents the logical steps necessary to distinguish and identify different types of attacks. This tool, the Intrusion Decision Aiding Tool (IDAT), utilizes IDS-based attack models and standard Unix audit data. Since attacks have certain characteristics and are based on already developed signature attack models, experienced and knowledgeable Unix system administrators know what to look for in system audit logs to determine if a system has been attacked. Others, however, are usually less able to recognize common signatures of unauthorized access. Users can traverse the tree using available audit data displayed by IDAT and general knowledge they possess to reach a conclusion regarding suspicious activity. IDAT is an easy-to-use window based application that gathers, analyzes, and displays pertinent system data according to Unix attack characteristics. IDAT offers a more practical approach and allows the user to make an informed decision regarding suspicious activity.
Parallel object-oriented decision tree system
Kamath, Chandrika [Dublin, CA]; Cantu-Paz, Erick [Oakland, CA]
2006-02-28
A data mining decision tree system that uncovers patterns, associations, anomalies, and other statistically significant structures in data by reading and displaying data files, extracting relevant features for each of the objects, and using a method of recognizing patterns among the objects based upon object features through a decision tree that reads the data, sorts the data if necessary, determines the best manner to split the data into subsets according to some criterion, and splits the data.
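The core loop this abstract describes (sort the data, find the best way to split it according to some criterion, then split) can be sketched for a single numeric feature as follows. The Gini impurity criterion is our own choice for illustration, since the abstract leaves the criterion open, and the sketch is serial rather than parallel.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one numeric feature."""
    pairs = sorted(zip(values, labels))  # sort the data by feature value
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best

threshold, impurity = best_split([4.0, 1.0, 3.0, 2.0], ["b", "a", "b", "a"])
print(threshold, impurity)
```

Recomputing the impurity from scratch at every candidate threshold keeps the sketch short; production implementations update class counts incrementally as the threshold moves.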
Generation and Termination of Binary Decision Trees for Nonparametric Multiclass Classification.
1984-10-01
OCTOBER 1984 LIDS-P-1411 GENERATION AND TERMINATION OF BINARY DECISION TREES FOR...minimizes the Bayes risk. Tree generation and termination are based on the training and test samples, respectively. I. Introduction We state
EEG feature selection method based on decision tree.
Duan, Lijuan; Ge, Hui; Ma, Wei; Miao, Jun
2015-01-01
This paper aims to solve the automated feature selection problem in brain-computer interfaces (BCI). In order to automate the feature selection process, we propose a novel EEG feature selection method based on decision trees (DT). During electroencephalogram (EEG) signal processing, a feature extraction method based on principal component analysis (PCA) was used, and the decision-tree-based selection process was performed by searching the feature space and automatically selecting optimal features. Considering that EEG signals are a series of non-linear signals, a generalized linear classifier, the support vector machine (SVM), was chosen. In order to test the validity of the proposed method, we applied the decision-tree-based EEG feature selection method to BCI Competition II dataset Ia, and the experiment showed encouraging results.
The Decision Tree for Teaching Management of Uncertainty
ERIC Educational Resources Information Center
Knaggs, Sara J.; And Others
1974-01-01
A 'decision tree' consists of an outline of the patient's symptoms and a logic for decision and action. It is felt that this approach to the decisionmaking process better facilitates each learner's application of his own level of knowledge and skills. (Author)
Predicting metabolic syndrome using decision tree and support vector machine methods.
Karimi-Alavijeh, Farzaneh; Jalili, Saeed; Sadeghi, Masoumeh
2016-05-01
Metabolic syndrome, which underlies the increased prevalence of cardiovascular disease and Type 2 diabetes, is considered a group of metabolic abnormalities including central obesity, hypertriglyceridemia, glucose intolerance, hypertension, and dyslipidemia. Recently, artificial intelligence based health-care systems have become highly regarded because of their success in diagnosis, prediction, and choice of treatment. This study employs machine learning techniques, namely decision tree and support vector machine (SVM), to predict the 7-year incidence of metabolic syndrome. This research is a practical one in which data from 2107 participants of the Isfahan Cohort Study have been utilized. The subjects without metabolic syndrome according to the ATPIII criteria were selected. The features used in this data set include: gender, age, weight, body mass index, waist circumference, waist-to-hip ratio, hip circumference, physical activity, smoking, hypertension, antihypertensive medication use, systolic blood pressure (BP), diastolic BP, fasting blood sugar, 2-hour blood glucose, triglycerides (TGs), total cholesterol, low-density lipoprotein, high density lipoprotein-cholesterol, mean corpuscular volume, and mean corpuscular hemoglobin. Metabolic syndrome was diagnosed based on the ATPIII criteria, and the decision tree and SVM methods were selected to predict it. The criteria of sensitivity, specificity and accuracy were used for validation. Sensitivity, specificity and accuracy were 0.774 (0.758), 0.74 (0.72) and 0.757 (0.739) for the SVM (decision tree) method. The results show that the SVM method outperforms the decision tree in sensitivity, specificity and accuracy. The results of the decision tree method show that TG is the most important feature in predicting metabolic syndrome. According to this study, in cases where only the final result of the decision is regarded as significant, the SVM method can be used with acceptable accuracy in medical decision making. This method has not been implemented in previous research.
Spatial modeling and classification of corneal shape.
Marsolo, Keith; Twa, Michael; Bullimore, Mark A; Parthasarathy, Srinivasan
2007-03-01
One of the most promising applications of data mining is in biomedical data used in patient diagnosis. Any method of data analysis intended to support the clinical decision-making process should meet several criteria: it should capture clinically relevant features, be computationally feasible, and provide easily interpretable results. In an initial study, we examined the feasibility of using Zernike polynomials to represent biomedical instrument data in conjunction with a decision tree classifier to distinguish between the diseased and non-diseased eyes. Here, we provide a comprehensive follow-up to that work, examining a second representation, pseudo-Zernike polynomials, to determine whether they provide any increase in classification accuracy. We compare the fidelity of both methods using residual root-mean-square (rms) error and evaluate accuracy using several classifiers: neural networks, C4.5 decision trees, Voting Feature Intervals, and Naïve Bayes. We also examine the effect of several meta-learning strategies: boosting, bagging, and Random Forests (RFs). We present results comparing accuracy as it relates to dataset and transformation resolution over a larger, more challenging, multi-class dataset. They show that classification accuracy is similar for both data transformations, but differs by classifier. We find that the Zernike polynomials provide better feature representation than the pseudo-Zernikes and that the decision trees yield the best balance of classification accuracy and interpretability.
Using data mining to predict success in a weight loss trial.
Batterham, M; Tapsell, L; Charlton, K; O'Shea, J; Thorne, R
2017-08-01
Traditional methods for predicting weight loss success use regression approaches, which make the assumption that the relationships between the independent and dependent (or logit of the dependent) variable are linear. The aim of the present study was to investigate the relationship between common demographic and early weight loss variables to predict weight loss success at 12 months without making this assumption. Data mining methods (decision trees, generalised additive models and multivariate adaptive regression splines), in addition to logistic regression, were employed to predict: (i) weight loss success (defined as ≥5%) at the end of a 12-month dietary intervention using demographic variables [body mass index (BMI), sex and age]; (ii) percentage weight loss at 1 month; and (iii) the difference between actual and predicted weight loss using an energy balance model. The methods were compared by assessing model parsimony and the area under the curve (AUC). The decision tree provided the most clinically useful model and had a good accuracy (AUC 0.720; 95% confidence interval = 0.600-0.840). Percentage weight loss at 1 month (≥0.75%) was the strongest predictor for successful weight loss. Within those individuals losing ≥0.75%, individuals with a BMI ≥27 kg m⁻² were more likely to be successful than those with a BMI between 25 and 27 kg m⁻². Data mining methods can provide a more accurate way of assessing relationships when conventional assumptions are not met. In the present study, a decision tree provided the most parsimonious model. Given that early weight loss cannot be predicted before randomisation, incorporating this information into a post randomisation trial design may give better weight loss results. © 2017 The British Dietetic Association Ltd.
Modeling the Emergent Impacts of Harvesting Acadian Forests over 100+ Years
NASA Astrophysics Data System (ADS)
Luus, K. A.; Plug, L. J.
2007-12-01
Harvesting strategies and policies for Acadian forest in Nova Scotia, Canada, presently are set using Decision Support Models (DSMs) that aim to maximize the long-term (>100y) value of forests through decisions implemented over short time horizons (5-80 years). However, DSMs typically are aspatial, lack ecological processes and do not treat erosion, so the long-term (>100y) emergent impacts of the prescribed forestry decisions on erosion and vegetation in Acadian forests remain poorly known. To better understand these impacts, we created an equation-based model that simulates the evolution of a ≥4 km² forest in time steps of 1 y and at a spatial resolution of 3 m², the footprint of a single mature tree. The model combines 1) ecological processes of recruitment, competition, and mortality; 2) geomorphic processes of hillslope erosion; 3) anthropic processes of tree harvesting, replanting, and road construction under constraints imposed by regulations and cost/benefit ratio. The model uses digital elevation models, parameters (where available), and calibration (where measurements are not available) for conditions presently found in central Cape Breton, Nova Scotia. The model is unique because it 1) deals with the impacts of harvesting on an Acadian forest; and 2) vegetation and erosion are coupled. The model was tested by comparing the species-specific biomass of long-term (40 y) forest plot data to simulated results. At the spatial scale of individual 1 ha plots, model predictions presently account for approximately 50% of observed biomass changes through time, but predictions are hampered by the effects of serendipitous "random" events such as single tree windfall. Harvesting increases the cumulative erosion over 3000 years by 240% when compared to an old growth forest and significantly suppresses the growth of Balsam Fir and Sugar Maple. 
We discuss further tests of the model, and how it might be used to investigate the long-term sustainability of the recommendations made by DSMs and to better understand the relationship between vegetation, erosion, and forest management strategies.
Machine Learning Techniques for Prediction of Early Childhood Obesity.
Dugan, T M; Mukhopadhyay, S; Carroll, A; Downs, S
2015-01-01
This paper aims to predict childhood obesity after age two, using only data collected prior to the second birthday by a clinical decision support system called CHICA. Analyses of six different machine learning methods (RandomTree, RandomForest, J48, ID3, Naïve Bayes, and Bayes) trained on CHICA data show that an accurate, sensitive model can be created. Of the methods analyzed, the ID3 model trained on the CHICA dataset provided the best overall performance, with an accuracy of 85% and a sensitivity of 89%. Additionally, the ID3 model had a positive predictive value of 84% and a negative predictive value of 88%. The structure of the tree also gives insight into the strongest predictors of future obesity in children. Many of the strongest predictors seen in the ID3 modeling of the CHICA dataset have been independently validated in the literature as correlated with obesity, thereby supporting the validity of the model. This study demonstrated that data from a production clinical decision support system can be used to build an accurate machine learning model to predict obesity in children after age two.
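The four kinds of figure quoted in this abstract (accuracy, sensitivity, positive and negative predictive value) all derive from the same 2x2 confusion matrix. The sketch below shows the standard definitions; the counts are invented for illustration and are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary-classification summary measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Invented counts: 100 children, 45 of whom became obese.
metrics = diagnostic_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)
```

Sensitivity and specificity describe the classifier itself, whereas the predictive values also depend on how common the condition is in the sample.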
ERIC Educational Resources Information Center
Chen, Gwo-Dong; Liu, Chen-Chung; Ou, Kuo-Liang; Liu, Baw-Jhiune
2000-01-01
Discusses the use of Web logs to record student behavior that can assist teachers in assessing performance and making curriculum decisions for distance learning students who are using Web-based learning systems. Adopts decision tree and data cube information processing methodologies for developing more effective pedagogical strategies. (LRW)
Steensels, M; Antler, A; Bahr, C; Berckmans, D; Maltz, E; Halachmi, I
2016-09-01
Early detection of post-calving health problems is critical for dairy operations. Separating sick cows from the herd is important, especially in robotic-milking dairy farms, where searching for a sick cow can disturb the other cows' routine. The objectives of this study were to develop and apply a behaviour- and performance-based health-detection model to post-calving cows in a robotic-milking dairy farm, with the aim of detecting sick cows based on available commercial sensors. The study was conducted in an Israeli robotic-milking dairy farm with 250 Israeli-Holstein cows. All cows were equipped with rumination- and neck-activity sensors. Milk yield, visits to the milking robot and BW were recorded in the milking robot. A decision-tree model was developed on a calibration data set (historical data of the 10 months before the study) and was validated on the new data set. The decision model generated a probability of being sick for each cow. The model was applied once a week just before the veterinarian performed the weekly routine post-calving health check. The veterinarian's diagnosis served as a binary reference for the model (healthy-sick). The overall accuracy of the model was 78%, with a specificity of 87% and a sensitivity of 69%, suggesting its practical value.
Assessing School Readiness for a Practice Arrangement Using Decision Tree Methodology.
ERIC Educational Resources Information Center
Barger, Sara E.
1998-01-01
Questions in a decision-tree address mission, faculty interest, administrative support, and practice plan as a way of assessing arrangements for nursing faculty's clinical practice. Decisions should be based on congruence between the human resource allocation and the reward systems. (SK)
Surucu, Murat; Shah, Karan K; Mescioglu, Ibrahim; Roeske, John C; Small, William; Choi, Mehee; Emami, Bahman
2016-02-01
To develop decision trees predicting for tumor volume reduction in patients with head and neck (H&N) cancer using pretreatment clinical and pathological parameters. Forty-eight patients treated with definitive concurrent chemoradiotherapy for squamous cell carcinoma of the nasopharynx, oropharynx, oral cavity, or hypopharynx were retrospectively analyzed. These patients were rescanned at a median dose of 37.8 Gy and replanned to account for anatomical changes. The percentages of gross tumor volume (GTV) change from initial to rescan computed tomography (CT; %GTVΔ) were calculated. Two decision trees were generated to correlate %GTVΔ in primary and nodal volumes with 14 characteristics including age, gender, Karnofsky performance status (KPS), site, human papilloma virus (HPV) status, tumor grade, primary tumor growth pattern (endophytic/exophytic), tumor/nodal/group stages, chemotherapy regimen, and primary, nodal, and total GTV volumes in the initial CT scan. The C4.5 Decision Tree induction algorithm was implemented. The median %GTVΔ for primary, nodal, and total GTVs was 26.8%, 43.0%, and 31.2%, respectively. Type of chemotherapy, age, primary tumor growth pattern, site, KPS, and HPV status were the most predictive parameters for primary %GTVΔ decision tree, whereas for nodal %GTVΔ, KPS, site, age, primary tumor growth pattern, initial primary GTV, and total GTV volumes were predictive. Both decision trees had an accuracy of 88%. There can be significant changes in primary and nodal tumor volumes during the course of H&N chemoradiotherapy. Considering the proposed decision trees, radiation oncologists can select patients predicted to have high %GTVΔ, who would theoretically gain the most benefit from adaptive radiotherapy, in order to better use limited clinical resources. © The Author(s) 2015.
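C4.5 chooses splits by gain ratio, that is, information gain divided by the split information of the attribute. The sketch below shows that criterion for a categorical attribute only; the attribute values and response labels are hypothetical, and the thresholding of continuous attributes and the pruning that C4.5 also performs are omitted.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute: information gain / split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class labels after splitting on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical data: tumor site and a binarised volume-change label.
response = ["high", "high", "low", "low"]
informative = ["oropharynx", "oropharynx", "oral cavity", "oral cavity"]
uninformative = ["oropharynx", "oral cavity", "oropharynx", "oral cavity"]
print(gain_ratio(informative, response), gain_ratio(uninformative, response))
```

An attribute that separates the classes perfectly scores 1 here, and one that leaves the class mix unchanged scores 0.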
Decision Modeling Framework to Minimize Arrival Delays from Ground Delay Programs
NASA Astrophysics Data System (ADS)
Mohleji, Nandita
Convective weather and other constraints create uncertainty in air transportation, leading to costly delays. A Ground Delay Program (GDP) is a strategy to mitigate these effects. Systematic decision support can increase GDP efficacy, reduce delays, and minimize direct operating costs. In this study, a decision analysis (DA) model is constructed by combining a decision tree and Bayesian belief network. Through a study of three New York region airports, the DA model demonstrates that larger GDP scopes that include more flights in the program, along with longer lead times that provide stakeholders greater notice of a pending program, trigger the fewest average arrival delays. These findings are demonstrated to result in a savings of up to $1,850 per flight. Furthermore, when convective weather is predicted, forecast weather confidences remain the same level or greater at least 70% of the time, supporting more strategic decision making. The DA model thus enables quantification of uncertainties and insights on causal relationships, providing support for future GDP decisions.
On Parallelism and the Penman Natural Language Generation System.
1988-04-01
Figure 2-2: System network with choosers & realization statements ... decision. We will give a more detailed account of ... 2: enter the current system. The chooser of the system is in charge of selection of features. The chooser is itself a decision tree with certain ... organization of a chooser is the same as a decision (discrimination) tree, and each branching point in the tree is defined by Ask operation. For example, in
NASA Astrophysics Data System (ADS)
Hamedianfar, Alireza; Shafri, Helmi Zulhaidi Mohd
2016-04-01
This paper integrates decision tree-based data mining (DM) and object-based image analysis (OBIA) to provide a transferable model for the detailed characterization of urban land-cover classes using WorldView-2 (WV-2) satellite images. Many articles have been published on OBIA in recent years based on DM for different applications. However, less attention has been paid to the generation of a transferable model for characterizing detailed urban land cover features. Three subsets of WV-2 images were used in this paper to generate transferable OBIA rule-sets. Many features were explored by using a DM algorithm, which created the classification rules as a decision tree (DT) structure from the first study area. The developed DT algorithm was applied to object-based classifications in the first study area. After this process, we validated the capability and transferability of the classification rules into second and third subsets. Detailed ground truth samples were collected to assess the classification results. The first, second, and third study areas achieved 88%, 85%, and 85% overall accuracies, respectively. Results from the investigation indicate that DM was an efficient method to provide the optimal and transferable classification rules for OBIA, which accelerates the rule-sets creation stage in the OBIA classification domain.
Chen, Jianxin; Chuo, Wenjing; Liu, Lei; Lian, Hongjian; Zheng, Lei; Wang, Yong; Xie, Hua; Luo, Liangtao; Zheng, Chenglong; Fu, Bangze; Wang, Wei
2013-01-01
Objective. To explore new diagnostic patterns for syndromes to overcome the insufficiency of obtainable macrocharacteristics and specific biomarkers. Methods. Chinese miniature swine were fitted with an Ameroid constrictor placed around the proximal left anterior descending branch. In the 4th week, macrocharacteristics, coronary angiography, echocardiography, and hemorheology indices were assessed for diagnosis. IL-1, IL-6, IL-8, IL-10, TNF-α, and hsCRP in serum were measured, and a Decision Tree was built. Results. According to the current officially issued standard, model animals matched the diagnosis of blood stasis syndrome with myocardial ischemia based on findings including >90% occlusion, attenuated left ventricular segmental motion, dark red or purple tongues, and higher blood viscosity. A significant decrease in IL-10 and increase in TNF-α were found in model animals. However, in the Decision Tree, besides IL-10 and TNF-α, IL-8 helped to increase the accuracy of classification to 86%. Conclusions. A Decision Tree built with TNF-α, IL-10, and IL-8 is helpful for the diagnosis of blood stasis syndrome in myocardial ischemia animals. Moreover, our data open a new path to the differentiation of syndromes by feature patterns consisting of multiple biomarkers, not only for animals but also for patients. We believe that it will contribute to the standardization and international application of syndromes. PMID:24371451
Fernández, M. Paulina; Norero, Aldo; Vera, Jorge R.; Pérez, Eduardo
2011-01-01
Backgrounds and Aims Functional–structural models are interesting tools to relate environmental and management conditions with forest growth. Their three-dimensional images can reveal important characteristics of wood used for industrial products. Like virtual laboratories, they can be used to evaluate relationships among species, sites and management, and to support silvicultural design and decision processes. Our aim was to develop a functional–structural model for radiata pine (Pinus radiata) given its economic importance in many countries. Methods The plant model uses the L-system language. The structure of the model is based on operational units, which obey particular rules, and execute photosynthesis, respiration and morphogenesis, according to their particular characteristics. Plant allometry is adhered to so that harmonic growth and plant development are achieved. Environmental signals for morphogenesis are used. Dynamic turnover guides the normal evolution of the tree. Monthly steps allow for detailed information of wood characteristics. The model is independent of traditional forest inventory relationships and is conceived as a mechanistic model. For model parameterization, three databases which generated new information relating to P. radiata were analysed and incorporated. Key Results Simulations under different and contrasting environmental and management conditions were run and statistically tested. The model was validated against forest inventory data for the same sites and times and against true crown architectural data. The performance of the model for 6-year-old trees was encouraging. Total height, diameter and lengths of growth units were adequately estimated. Branch diameters were slightly overestimated. Wood density values were not satisfactory, but the cyclical pattern and increase of growth rings were reasonably well modelled. 
Conclusions The model was able to reproduce the development and growth of the species based on mechanistic formulations. It may be valuable in assessing stand behaviour under different environmental and management conditions, assisting in decision-making with regard to management, and as a research tool to formulate hypotheses regarding forest tree growth and development. PMID:21987452
Probabilistic, meso-scale flood loss modelling
NASA Astrophysics Data System (ADS)
Kreibich, Heidi; Botto, Anna; Schröter, Kai; Merz, Bruno
2016-04-01
Flood risk analyses are an important basis for decisions on flood risk management and adaptation. However, such analyses are associated with significant uncertainty, even more so if changes in risk due to global change are expected. Although uncertainty analysis and probabilistic approaches have received increased attention in recent years, they are still not standard practice for flood risk assessments, and even less so for flood loss modelling. The state of the art in flood loss modelling is still the use of simple, deterministic approaches like stage-damage functions. Novel probabilistic, multi-variate flood loss models have been developed and validated on the micro-scale using a data-mining approach, namely bagging decision trees (Merz et al. 2013). In this presentation we demonstrate and evaluate the upscaling of the approach to the meso-scale, namely on the basis of land-use units. The model is applied in 19 municipalities which were affected during the 2002 flood by the River Mulde in Saxony, Germany (Botto et al. submitted). The application of bagging decision tree based loss models provides a probability distribution of estimated loss per municipality. Validation is undertaken on the one hand via a comparison with eight deterministic loss models including stage-damage functions as well as multi-variate models. On the other hand the results are compared with official loss data provided by the Saxon Relief Bank (SAB). The results show that uncertainties of loss estimation remain high. Thus, the significant advantage of this probabilistic flood loss estimation approach is that it inherently provides quantitative information about the uncertainty of the prediction. References: Merz, B.; Kreibich, H.; Lall, U. (2013): Multi-variate flood damage assessment: a tree-based data-mining approach. NHESS, 13(1), 53-64. Botto A, Kreibich H, Merz B, Schröter K (submitted) Probabilistic, multi-variable flood loss modelling on the meso-scale with BT-FLEMO. Risk Analysis.
An automated approach to the design of decision tree classifiers
NASA Technical Reports Server (NTRS)
Argentiero, P.; Chin, P.; Beaudet, P.
1980-01-01
The classification of large dimensional data sets arising from the merging of remote sensing data with more traditional forms of ancillary data is considered. Decision tree classification, a popular approach to the problem, is characterized by the property that samples are subjected to a sequence of decision rules before they are assigned to a unique class. An automated technique for effective decision tree design which relies only on a priori statistics is presented. This procedure utilizes a set of two dimensional canonical transforms and Bayes table look-up decision rules. An optimal design at each node is derived based on the associated decision table. A procedure for computing the global probability of correct classification is also provided. An example is given in which class statistics obtained from an actual LANDSAT scene are used as input to the program. The resulting decision tree design has an associated probability of correct classification of .76 compared to the theoretically optimum .79 probability of correct classification associated with a full dimensional Bayes classifier. Recommendations for future research are included.
Evaluation of Decision Trees for Cloud Detection from AVHRR Data
NASA Technical Reports Server (NTRS)
Shiffman, Smadar; Nemani, Ramakrishna
2005-01-01
Automated cloud detection and tracking is an important step in assessing changes in radiation budgets associated with global climate change via remote sensing. Data products based on satellite imagery are available to the scientific community for studying trends in the Earth's atmosphere. The data products include pixel-based cloud masks that assign cloud-cover classifications to pixels. Many cloud-mask algorithms have the form of decision trees. The decision trees employ sequential tests that scientists designed based on empirical astrophysics studies and simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In a previous study we compared automatically learned decision trees to cloud masks included in Advanced Very High Resolution Radiometer (AVHRR) data products from the year 2000. In this paper we report the replication of the study for five-year data, and for a gold standard based on surface observations performed by scientists at weather stations in the British Islands. For our sample data, the accuracy of automatically learned decision trees was greater than the accuracy of the cloud masks (p < 0.001).
Sequential decision tree using the analytic hierarchy process for decision support in rectal cancer.
Suner, Aslı; Çelikoğlu, Can Cengiz; Dicle, Oğuz; Sökmen, Selman
2012-09-01
The aim of the study is to determine the most appropriate method for construction of a sequential decision tree in the management of rectal cancer, using various patient-specific criteria and treatments such as surgery, chemotherapy, and radiotherapy. An analytic hierarchy process (AHP) was used to determine the priorities of variables. Relevant criteria used in two decision steps and their relative priorities were established by a panel of five general surgeons. Data were collected via a web-based application and analyzed using the "Expert Choice" software specifically developed for the AHP. Consistency ratios in the AHP method were calculated for each set of judgments, and the priorities of sub-criteria were determined. A sequential decision tree was constructed for the best treatment decision process, using priorities determined by the AHP method. Consistency ratios in the AHP method were calculated for each decision step, and the judgments were considered consistent. The tumor-related criterion "presence of perforation" (0.331) and the patient-surgeon-related criterion "surgeon's experience" (0.630) had the highest priority in the first decision step. In the second decision step, the tumor-related criterion "the stage of the disease" (0.230) and the patient-surgeon-related criterion "surgeon's experience" (0.281) were the paramount criteria. The results showed some variation in the ranking of criteria between the decision steps. In the second decision step, for instance, the tumor-related criterion "presence of perforation" was just the fifth. The consistency of decision support systems largely depends on the quality of the underlying decision tree. When several choices and variables have to be considered in a decision, it is very important to determine priorities. The AHP method seems to be effective for this purpose. The decision algorithm developed by this method is more realistic and will improve the quality of the decision tree. 
Copyright © 2012 Elsevier B.V. All rights reserved.
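The priorities and consistency ratios mentioned in this abstract come from pairwise comparison matrices. The sketch below shows the standard AHP calculation (principal eigenvector by power iteration, then Saaty's consistency ratio). It is not the Expert Choice implementation, and the comparison matrix is hypothetical rather than taken from the surgeons' judgments.

```python
# Saaty's random consistency index for matrices of size n.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_priorities(matrix, iterations=100):
    """Return (priority weights, consistency ratio) for a reciprocal comparison matrix."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Power iteration: multiply by the matrix, then renormalise to sum to 1.
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate the principal eigenvalue from the ratios (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return w, consistency_index / RANDOM_INDEX[n]

# Hypothetical judgments for three criteria on Saaty's 1-9 scale.
comparisons = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]
weights, ratio = ahp_priorities(comparisons)
print([round(x, 3) for x in weights], round(ratio, 3))
```

By the usual convention, a consistency ratio below 0.10 is treated as acceptable, and a set of judgments above that is sent back for revision.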
NASA Astrophysics Data System (ADS)
Akkaş, Efe; Evren Çubukçu, H.; Akin, Lutfiye; Erkut, Volkan; Yurdakul, Yasin; Karayigit, Ali Ihsan
2016-04-01
Identification of zeolite group minerals is complicated due to their similar chemical formulas and habits. Although the morphologies of various zeolite crystals can be recognized under a Scanning Electron Microscope (SEM), it is a more challenging and problematic process to identify zeolites using their mineral chemical data. SEMs integrated with energy dispersive X-ray spectrometers (EDS) provide fast and reliable chemical data of minerals. However, considering the elemental similarities of the characteristic chemical formulae of zeolite species (e.g. Clinoptilolite ((Na,K,Ca)2-3Al3(Al,Si)2Si13O36·12H2O) and Erionite ((Na2,K2,Ca)2Al4Si14O36·15H2O)), EDS data alone does not seem to be sufficient for correct identification. Furthermore, the physical properties of the specimen (e.g. roughness, electrical conductivity) and the applied analytical conditions (e.g. accelerating voltage, beam current, spot size) of the SEM-EDS should be uniform in order to obtain reliable elemental results of minerals having high alkali (Na, K) and H2O (approx. 14-18%) contents. This study, which was funded by The Scientific and Technological Research Council of Turkey (TUBITAK Project No: 113Y439), aims to construct a database as large as possible for various zeolite minerals and to develop a general prediction model for the identification of zeolite minerals using SEM-EDS data. For this purpose, an artificial neural network and a rule based decision tree algorithm were employed. Throughout the analyses, a total of 1850 chemical data were collected from four distinct zeolite species (Clinoptilolite-Heulandite, Erionite, Analcime and Mordenite) observed in various rocks (e.g. coals, pyroclastics). In order to obtain a representative training data set for each mineral, a selection procedure for reference mineral analyses was applied.
During the selection procedure, SEM-based crystal morphology data, XRD spectra, and re-calculated cationic distributions obtained by EDS were used for the characterization of the training set. Consequently, for each zeolite species 250 EDS data (as elemental intensities) were used for training and 200 ±50 analyses were tested. Finally, two prediction models were developed. The constructed models with various cross-correlation values (r) yielded an average accuracy of >91% for the best predictions using the C5.0 Decision Tree algorithm and a back propagation artificial neural network. Despite having similar accuracies, the developed models exhibit different prediction behaviors for some zeolite minerals. The results demonstrate that an artificial neural network as a nonlinear tool and a decision tree algorithm as a rule based prediction model can be employed to provide considerably efficient and reliable identification/classification of some zeolite minerals regardless of their similar elemental compositions. Keywords: mineral identification; zeolites; energy dispersive spectrometry; artificial neural networks; decision tree.
Linearly Adjustable International Portfolios
NASA Astrophysics Data System (ADS)
Fonseca, R. J.; Kuhn, D.; Rustem, B.
2010-09-01
We present an approach to multi-stage international portfolio optimization based on the imposition of a linear structure on the recourse decisions. Multiperiod decision problems are traditionally formulated as stochastic programs. Scenario tree based solutions however can become intractable as the number of stages increases. By restricting the space of decision policies to linear rules, we obtain a conservative tractable approximation to the original problem. Local asset prices and foreign exchange rates are modelled separately, which allows for a direct measure of their impact on the final portfolio value.
Equality of Shapley value and fair proportion index in phylogenetic trees.
Fuchs, Michael; Jin, Emma Yu
2015-11-01
The Shapley value and the fair proportion index of phylogenetic trees have been introduced recently for the purpose of making conservation decisions in genetics. Moreover, also very recently, Hartmann (J Math Biol 67:1163-1170, 2013) has presented data which shows that there is a strong correlation between a slightly modified version of the Shapley value (which we call the modified Shapley value) and the fair proportion index. He gave an explanation of this correlation by showing that the contribution of both indices to an edge of the tree becomes identical as the number of taxa tends to infinity. In this note, we show that the Shapley value and the fair proportion index are in fact the same. Moreover, we also consider the modified Shapley value and show that its covariance with the fair proportion index in random phylogenetic trees under the Yule-Harding model and uniform model is indeed close to one.
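The claimed equality can be checked by brute force on a small example. The sketch below assumes that the game's characteristic function is rooted phylogenetic diversity (the total length of the edges joining a set of taxa to the root), and it uses a hypothetical three-taxon tree. The fair proportion index of a taxon adds up, along its path to the root, each edge length divided by the number of taxa below that edge.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical rooted tree: child -> (parent, edge length). "root" has no entry.
TREE = {
    "u": ("root", 1), "c": ("root", 3),
    "a": ("u", 2), "b": ("u", 1),
}
LEAVES = ["a", "b", "c"]

def path_edges(leaf):
    """Edges (named by their child endpoint) on the path from leaf up to the root."""
    edges, node = [], leaf
    while node in TREE:
        edges.append(node)
        node = TREE[node][0]
    return edges

def diversity(taxa):
    """Rooted phylogenetic diversity of a set of taxa (0 for the empty set)."""
    covered = set()
    for leaf in taxa:
        covered.update(path_edges(leaf))
    return sum(TREE[e][1] for e in covered)

def fair_proportion(leaf):
    """Each edge length is shared equally among the taxa descending from that edge."""
    total = Fraction(0)
    for e in path_edges(leaf):
        below = sum(1 for x in LEAVES if e in path_edges(x))
        total += Fraction(TREE[e][1], below)
    return total

def shapley(leaf):
    """Average marginal gain in diversity over all orderings of the taxa."""
    orders = list(permutations(LEAVES))
    gain = Fraction(0)
    for order in orders:
        before = order[:order.index(leaf)]
        gain += diversity(before + (leaf,)) - diversity(before)
    return gain / len(orders)

for leaf in LEAVES:
    print(leaf, fair_proportion(leaf), shapley(leaf))
```

Enumerating all orderings is only feasible for a handful of taxa. The fair proportion index needs just one pass along each root-to-leaf path, which is what makes the equality useful in practice.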
Ontology based decision system for breast cancer diagnosis
NASA Astrophysics Data System (ADS)
Trabelsi Ben Ameur, Soumaya; Cloppet, Florence; Wendling, Laurent; Sellami, Dorra
2018-04-01
In this paper, we focus on analysis and diagnosis of breast masses inspired by expert concepts and rules. Accordingly, a Bag of Words is built based on the ontology of breast cancer diagnosis, accurately described in the Breast Imaging Reporting and Data System. To fill the gap between low level knowledge and expert concepts, a semantic annotation is developed using a machine learning tool. Then, breast masses are classified as benign or malignant according to expert rules implicitly modeled with a set of classifiers (KNN, ANN, SVM and Decision Tree). This semantic context of analysis offers a frame where we can include external factors and other meta-knowledge such as patient risk factors, as well as exploit more than one modality. Based on MRI and DECEDM modalities, our developed system achieves a recognition rate of 99.7% with the Decision Tree, an improvement of 24.7% owing to semantic analysis.
NASA Astrophysics Data System (ADS)
Sheehan, T.; Baker, B.; Degagne, R. S.
2015-12-01
With the abundance of data sources, analytical methods, and computer models, land managers are faced with the overwhelming task of making sense of a profusion of data of wildly different types. Luckily, fuzzy logic provides a method to work with different types of data using language-based propositions such as "the landscape is undisturbed," and a simple set of logic constructs. Just as many surveys allow different levels of agreement with a proposition, fuzzy logic allows values reflecting different levels of truth for a proposition. Truth levels fall within a continuum ranging from Fully True to Fully False. Hence a fuzzy logic model produces continuous results. The Environmental Evaluation Modeling System (EEMS) is a platform-independent, tree-based, fuzzy logic modeling framework. An EEMS model provides a transparent definition of an evaluation model and is commonly developed as a collaborative effort among managers, scientists, and GIS experts. Managers specify a set of evaluative propositions used to characterize the landscape. Scientists, working with managers, formulate functions that convert raw data values into truth values for the propositions and produce a logic tree to combine results into a single metric used to guide decisions. Managers, scientists, and GIS experts then work together to implement and iteratively tune the logic model and produce final results. We present examples of two successful EEMS projects that provided managers with map-based results suitable for guiding decisions: sensitivity and climate change exposure in Utah and the Colorado Plateau modeled for the Bureau of Land Management; and terrestrial ecological intactness in the Mojave and Sonoran region of southern California modeled for the Desert Renewable Energy Conservation Plan.
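The workflow described here (convert raw data to truth values, then combine them in a logic tree) can be sketched in a few lines. The sketch assumes a truth scale from -1 (Fully False) to +1 (Fully True), a linear conversion function, and min/max operators for AND/OR. The function names, thresholds, and input values are hypothetical and do not represent the EEMS API.

```python
def to_truth(value, false_at, true_at):
    """Linearly map a raw value to a truth value in [-1, +1], clamped at both ends."""
    t = (value - false_at) / (true_at - false_at)   # 0 at false_at, 1 at true_at
    return max(-1.0, min(1.0, 2.0 * t - 1.0))

def fuzzy_and(*truths):
    """A conjunction is only as true as its least true input."""
    return min(truths)

def fuzzy_or(*truths):
    """A disjunction is as true as its most true input."""
    return max(truths)

def fuzzy_not(truth):
    return -truth

# Hypothetical inputs for one map cell.
low_road_density = to_truth(1.0, false_at=4.0, true_at=0.0)   # km of road per km^2
high_natural_cover = to_truth(0.9, false_at=0.0, true_at=1.0)  # fraction of natural cover

# Proposition "the landscape is undisturbed" as the root of a two-leaf logic tree.
undisturbed = fuzzy_and(low_road_density, high_natural_cover)
print(low_road_density, high_natural_cover, undisturbed)
```

Because every node returns a value on the same continuous scale, intermediate results can be mapped and inspected as well as the final one. That is one way the transparency described in the abstract could be realised.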
Serial, parallel and hierarchical decision making in primates
Zylberberg, Ariel; Lorteije, Jeannette AM; Ouellette, Brian G; De Zeeuw, Chris I; Sigman, Mariano; Roelfsema, Pieter
2017-01-01
The study of decision-making has mainly focused on isolated decisions where choices are associated with motor actions. However, problem-solving often involves considering a hierarchy of sub-decisions. In a recent study (Lorteije et al. 2015), we reported behavioral and neuronal evidence for hierarchical decision making in a task with a small decision tree. We observed a first phase of parallel evidence integration for multiple sub-decisions, followed by a phase in which the overall strategy formed. It has been suggested that a 'flat' competition between the ultimate motor actions might also explain these results. A reanalysis of the data does not support the critical predictions of flat models. We also examined the time-course of decision making in other, related tasks and report conditions where evidence integration for successive decisions is decoupled, which excludes flat models. We conclude that the flexibility of decision-making implies that the strategies are genuinely hierarchical. DOI: http://dx.doi.org/10.7554/eLife.17331.001 PMID:28648172
RE-Powering’s Electronic Decision Tree
Developed by US EPA's RE-Powering America's Land Initiative, the RE-Powering Decision Trees tool guides interested parties through a process to screen sites for their suitability for solar photovoltaics or wind installations
Toward the Decision Tree for Inferring Requirements Maturation Types
NASA Astrophysics Data System (ADS)
Nakatani, Takako; Kondo, Narihito; Shirogane, Junko; Kaiya, Haruhiko; Hori, Shozo; Katamine, Keiichi
Requirements are elicited step by step during the requirements engineering (RE) process. However, some types of requirements are elicited completely only after the scheduled requirements elicitation process is finished. Such a situation is regarded as problematic. In our study, the difficulty of eliciting various kinds of requirements is observed by component. We refer to the components as observation targets (OTs) and introduce the term “requirements maturation,” which means when and how requirements are elicited completely in the project. Requirements maturation is discussed for physical and logical OTs. OTs viewed from a logical viewpoint are called logical OTs, e.g. quality requirements. The requirements of physical OTs, e.g., modules, components, subsystems, etc., include functional and non-functional requirements. They are influenced by their requesters' environmental changes, as well as developers' technical changes. In order to infer the requirements maturation period of each OT, we need to know how much these factors influence the OTs' requirements maturation. Based on observation of actual past projects, we defined the PRINCE (Pre Requirements Intelligence Net Consideration and Evaluation) model. It aims to guide developers in their observation of the requirements maturation of OTs. We quantitatively analyzed the actual cases with their requirements elicitation process and extracted essential factors that influence requirements maturation. The results of interviews with project managers were analyzed with WEKA, a data mining system, from which the decision tree was derived. This paper introduces the PRINCE model and the category of logical OTs to be observed. The decision tree that helps developers infer the maturation type of an OT is also described. We evaluate the tree through real projects and discuss its ability to infer the requirements maturation types.
Prediction model of critical weight loss in cancer patients during particle therapy.
Zhang, Zhihong; Zhu, Yu; Zhang, Lijuan; Wang, Ziying; Wan, Hongwei
2018-01-01
The objective of this study is to investigate the predictors of critical weight loss in cancer patients receiving particle therapy, and to build a prediction model based on these predictive factors. Patients receiving particle therapy were enrolled between June 2015 and June 2016. Body weight was measured at the start and end of particle therapy. Associations between critical weight loss (defined as >5%) during particle therapy and patients' demographic and clinical characteristics, pre-therapeutic nutrition risk screening (NRS 2002) and BMI were evaluated by logistic regression and decision tree analysis. In total, 375 cancer patients receiving particle therapy were included. Mean weight loss was 0.55 kg, and 11.5% of patients experienced critical weight loss during particle therapy. The main predictors of critical weight loss during particle therapy were head and neck tumour location, total radiation dose ≥70 Gy on the primary tumour, and without post-surgery, as indicated by both logistic regression and decision tree analysis. The prediction model that includes tumour location, total radiation dose and post-surgery had good predictive ability, with an area under the receiver operating characteristic curve of 0.79 (95% CI: 0.71-0.88) and 0.78 (95% CI: 0.69-0.86) for the decision tree and logistic regression models, respectively. Cancer patients with head and neck tumour location, total radiation dose ≥70 Gy and without post-surgery were at higher risk of critical weight loss during particle therapy, and early intensive nutrition counselling or intervention should be targeted at this population. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Nikolov, Nikolai G; Dybdahl, Marianne; Jónsdóttir, Svava Ó; Wedebye, Eva B
2014-11-01
Ionization is a key factor in hERG K(+) channel blocking, and acids and zwitterions are known to be less probable hERG blockers than bases and neutral compounds. However, a considerable number of acidic compounds block hERG, and the physico-chemical attributes which discriminate acidic blockers from acidic non-blockers have not been fully elucidated. We propose a rule for prediction of hERG blocking by acids and zwitterionic ampholytes based on thresholds for only three descriptors related to acidity, size and reactivity. The training set of 153 acids and zwitterionic ampholytes was predicted with a concordance of 91% by a decision tree based on the rule. Two external validations were performed with sets of 35 and 48 observations, respectively, both showing concordances of 91%. In addition, a global QSAR model of hERG blocking was constructed based on a large diverse training set of 1374 chemicals covering all ionization classes, externally validated showing high predictivity and compared to the decision tree. The decision tree was found to be superior for the acids and zwitterionic ampholytes classes. Copyright © 2014 Elsevier Ltd. All rights reserved.
A fuzzy decision tree for fault classification.
Zio, Enrico; Baraldi, Piero; Popescu, Irina C
2008-02-01
In plant accident management, the control room operators are required to identify the causes of the accident, based on the different patterns of evolution of the monitored process variables thereby developing. This task is often quite challenging, given the large number of process parameters monitored and the intense emotional states under which it is performed. To aid the operators, various techniques of fault classification have been engineered. An important requirement for their practical application is the physical interpretability of the relationships among the process variables underpinning the fault classification. In this view, the present work propounds a fuzzy approach to fault classification, which relies on fuzzy if-then rules inferred from the clustering of available preclassified signal data, which are then organized in a logical and transparent decision tree structure. The advantages offered by the proposed approach are precisely that a transparent fault classification model is mined out of the signal data and that the underlying physical relationships among the process variables are easily interpretable as linguistic if-then rules that can be explicitly visualized in the decision tree structure. The approach is applied to a case study regarding the classification of simulated faults in the feedwater system of a boiling water reactor.
Fast Image Texture Classification Using Decision Trees
NASA Technical Reports Server (NTRS)
Thompson, David R.
2011-01-01
Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation-hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.
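The "integral image" transform mentioned in this abstract can be illustrated with a minimal sketch. This is not the author's implementation; the function names `integral_image` and `box_sum` and the 3x3 test image are hypothetical. It only shows why box-filter features need nothing beyond integer additions and subtractions once the table is built.

```python
def integral_image(img):
    """Return the summed-area table of a 2D list of integers.

    The table carries one extra row and column of zeros so that
    box sums need no boundary checks.
    """
    rows, cols = len(img), len(img[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0  # running sum of the current row
        for c in range(cols):
            row_sum += img[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table


def box_sum(table, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1."""
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


# Hypothetical 3x3 image used only for illustration
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
table = integral_image(image)
print(box_sum(table, 0, 0, 3, 3))  # sum over the whole image
print(box_sum(table, 1, 1, 3, 3))  # sum over the lower-right 2x2 block
```

Each box sum costs four table lookups regardless of box size, which is the property that makes such features attractive for integer-only hardware.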
Assessing visual green effects of individual urban trees using airborne Lidar data.
Chen, Ziyue; Xu, Bing; Gao, Bingbo
2015-12-01
Urban trees benefit people's daily life in terms of air quality, local climate, recreation and aesthetics. Among these functions, a growing number of studies have been conducted to understand the relationship between residents' preference towards local environments and visual green effects of urban greenery. However, except for on-site photography, there are few quantitative methods to calculate green visibility, especially tree green visibility, from viewers' perspectives. To fill this research gap, a case study was conducted in the city of Cambridge, which has a diversity of tree species, sizes and shapes. Firstly, a photograph-based survey was conducted to approximate the actual value of visual green effects of individual urban trees. In addition, small footprint airborne Lidar (Light detection and ranging) data was employed to measure the size and shape of individual trees. Next, correlations between visual tree green effects and tree structural parameters were examined. Through experiments and gradual refinement, a regression model with satisfactory R2 and limited large errors is proposed. Considering the diversity of sample trees and the result of cross-validation, this model has the potential to be applied to other study sites. This research provides urban planners and decision makers with an innovative method to analyse and evaluate landscape patterns in terms of tree greenness. Copyright © 2015 Elsevier B.V. All rights reserved.
Multicriteria evaluation of simulated logging scenarios in a tropical rain forest.
Huth, Andreas; Drechsler, Martin; Köhler, Peter
2004-07-01
Forest growth models are useful tools for investigating the long-term impacts of logging. In this paper, the results of the rain forest growth model FORMIND were assessed by a multicriteria decision analysis. The main processes covered by FORMIND include tree growth, mortality, regeneration and competition. Tree growth is calculated based on a carbon balance approach. Trees compete for light and space; dying large trees fall down and create gaps in the forest. Sixty-four different logging scenarios for an initially undisturbed forest stand at Deramakot (Malaysia) were simulated. The scenarios differ regarding the logging cycle, logging method, cutting limit and logging intensity. We characterise the impacts with four criteria describing the yield, canopy opening and changes in species composition. Multicriteria decision analysis was used for the first time to evaluate the scenarios and identify the efficient ones. Our results plainly show that reduced-impact logging scenarios are more 'efficient' than the others, since in these scenarios forest damage is minimised without significantly reducing yield. Nevertheless, there is a trade-off between yield and achieving a desired ecological state of logged forest; the ecological state of the logged forests can only be improved by reducing yields and enlarging the logging cycles. Our study also demonstrates that high cutting limits or low logging intensities cannot compensate for the high level of damage caused by conventional logging techniques.
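One common way to identify "efficient" alternatives in a multicriteria analysis is to keep only the non-dominated ones. The sketch below assumes that reading; the abstract does not state the exact procedure used with FORMIND. The scenario names and scores are hypothetical illustrations, not results from the study, and every criterion is assumed to be scaled so that higher is better.

```python
def dominates(a, b):
    """True if a is at least as good as b on every criterion and better on one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def efficient_scenarios(scores):
    """Return the names of scenarios that no other scenario dominates."""
    return [
        name for name, vec in scores.items()
        if not any(dominates(other, vec)
                   for other_name, other in scores.items()
                   if other_name != name)
    ]


# Hypothetical (yield score, canopy-intactness score); higher is better
scenarios = {
    "conventional_short_cycle": (0.9, 0.2),
    "conventional_long_cycle": (0.6, 0.4),
    "reduced_impact_short_cycle": (0.9, 0.5),
    "reduced_impact_long_cycle": (0.6, 0.8),
}
print(efficient_scenarios(scenarios))
```

The surviving scenarios still trade one criterion against another, which mirrors the yield versus ecological-state trade-off described in the abstract.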
Plant Intellectual Property Transfer Mechanisms at US Universities.
ERIC Educational Resources Information Center
Price, Steven C.; Renk, Bryan Z.
2000-01-01
U.S. colleges of agriculture and technology transfer offices have historically been in conflict over the management of plant varieties. A simple model that would allow these competing systems to become integrated uses a decision tree. (Author/JOW)
Thematic and Spatial Resolutions Affect Model-Based Predictions of Tree Species Distribution
Liang, Yu; He, Hong S.; Fraser, Jacob S.; Wu, ZhiWei
2013-01-01
Subjective decisions of thematic and spatial resolutions in characterizing environmental heterogeneity may affect the characterizations of spatial pattern and the simulation of occurrence and rate of ecological processes, and in turn, model-based tree species distribution. Thus, this study quantified the importance of thematic and spatial resolutions, and their interaction in predictions of tree species distribution (quantified by species abundance). We investigated how model-predicted species abundances changed and whether tree species with different ecological traits (e.g., seed dispersal distance, competitive capacity) had different responses to varying thematic and spatial resolutions. We used the LANDIS forest landscape model to predict tree species distribution at the landscape scale and designed a series of scenarios with different thematic (different numbers of land types) and spatial resolutions combinations, and then statistically examined the differences of species abundance among these scenarios. Results showed that both thematic and spatial resolutions affected model-based predictions of species distribution, but thematic resolution had a greater effect. Species ecological traits affected the predictions. For species with moderate dispersal distance and relatively abundant seed sources, predicted abundance increased as thematic resolution increased. However, for species with long seeding distance or high shade tolerance, thematic resolution had an inverse effect on predicted abundance. When seed sources and dispersal distance were not limiting, the predicted species abundance increased with spatial resolution and vice versa. Results from this study may provide insights into the choice of thematic and spatial resolutions for model-based predictions of tree species distribution. PMID:23861828
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beckingsale, David; Gamblin, Todd
Modern performance portability frameworks provide application developers with a flexible way to determine how to run application kernels; however, they provide no guidance as to the best configuration for a given kernel. Apollo provides a model-generation framework that, when integrated with the RAJA library, uses lightweight decision tree models to select the fastest execution configuration on a per-kernel basis.
Assessment and Mapping of Forest Parcel Sizes
Brett J. Butler; Susan L. King
2005-01-01
A method for analyzing and mapping forest parcel sizes in the Northeastern United States is presented. A decision tree model was created that predicts forest parcel size from spatially explicit predictor variables: population density, State, percentage forest land cover, and road density. The model correctly predicted parcel size for 60 percent of the observations in a...
PRIA 3 Fee Determination Decision Tree
The PRIA 3 decision tree will help applicants requesting a pesticide registration or certain tolerance action to accurately identify the category of their application and the amount of the required fee before they submit the application.
Solar and Wind Site Screening Decision Trees
EPA and NREL created a decision tree to guide state and local governments and other stakeholders through a process for screening sites for their suitability for future redevelopment with solar photovoltaic (PV) energy and wind energy.
NASA Astrophysics Data System (ADS)
Wang, Jun; Chen, J. M.; Li, Manchun; Ju, Weimin
2007-06-01
As the major eligible land use activities in the Clean Development Mechanism (CDM), afforestation and reforestation offer opportunities and potential economic benefits for developing countries to participate in carbon-trade in the potential international carbon (C) sink markets. However, the design and selection of appropriate afforestation and reforestation locations in CDM are complex processes which need integrated assessment (IA) of C sequestration (CS) potential, environmental effects, and socio-economic impacts. This paper promotes the consideration of CS benefits in local land use planning and presents a GIS-based integrated assessment and spatial decision support system (IA-SDSS) to support decision-making on 'where' and 'how' to afforest. It integrates an Integrated Terrestrial Ecosystem Carbon Model (InTEC) and a GIS platform for modeling regional long-term CS potential and assessment of geo-referenced land use criteria including CS consequence, and produces ranking of plantation schemes with different tree species using the Analytic hierarchy process (AHP) method. Three land use scenarios are investigated: (i) traditional land use planning criteria without C benefits, (ii) land use for CS with low C price, and (iii) land use for CS with high price. Different scenarios and consequences will influence the weights of tree-species selection in the AHP decision process.
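The AHP step named in this abstract derives priority weights from a pairwise comparison matrix. A minimal sketch follows, using the row geometric-mean approximation of the priority vector rather than the principal-eigenvector computation; which variant the authors used is not stated. The function name `ahp_weights` and the comparison matrix are hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights via normalized row geometric means."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical comparison of three tree species: entry [i][j] states how
# strongly species i is preferred over species j (reciprocal matrix).
comparisons = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(comparisons)
print([round(w, 3) for w in weights])
```

Changing the comparison entries, for example to reflect a higher carbon price, changes the resulting weights, which is the kind of scenario sensitivity the abstract describes.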
Kumar, Ashwani; Singh, Tiratha Raj
2017-03-01
Alzheimer's disease (AD) is a progressive, incurable and terminal neurodegenerative disorder of the brain and is associated with mutations in amyloid precursor protein, presenilin 1, presenilin 2 or apolipoprotein E, but its underlying mechanisms are still not fully understood. The healthcare sector is generating a large amount of information corresponding to diagnosis, disease identification and treatment of an individual. Mining knowledge and providing scientific decision-making for the diagnosis and treatment of disease from clinical datasets are therefore becoming increasingly necessary. The current study deals with the construction of classifiers for an AD gene dataset, using a decision tree, that are human readable as well as robust in performance. Classification models for different AD genes were generated according to Mini-Mental State Examination scores and all other vital parameters to identify the expression levels of different proteins of the disorder that may determine the involvement of genes in various AD pathogenesis pathways. The effectiveness of the decision tree in AD diagnosis is determined by information gain with confidence value (0.96), specificity (92 %), sensitivity (98 %) and accuracy (77 %). Besides this functional gene classification using different parameters and enrichment analysis, our findings indicate that the measures of all the genes assessed in single cohorts are sufficient to diagnose AD and will help in the prediction of important parameters for other relevant assessments.
[Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].
Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao
2016-03-01
Leaf area index (LAI) is a dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively, and can provide a reference for monitoring tree growth and estimating yield. Red Fuji apple trees in the full fruit-bearing period were the research objects. Canopy spectral reflectance and LAI values of ninety apple trees were measured with the ASD Fieldspec3 spectrometer and LAI-2200 in thirty orchards over two consecutive years in the Qixia research area of Shandong Province. The optimal vegetation indices were selected by correlation analysis of the original spectral reflectance and vegetation indices. Models for predicting LAI were built with the multivariate regression methods of support vector machine (SVM) and random forest (RF). The new vegetation indices GNDVI527, NDVI676, RVI682, FD-NVI656 and GRVI517 and the two main existing vegetation indices, NDVI670 and NDVI705, are in accordance with LAI. In the RF regression model, the calibration set coefficient of determination C-R2 of 0.920 and the validation set coefficient of determination V-R2 of 0.889 are higher than those of the SVM regression model by 0.045 and 0.033, respectively. The root mean square error of the calibration set, C-RMSE, of 0.249 and the root mean square error of the validation set, V-RMSE, of 0.236 are lower than those of the SVM regression model by 0.054 and 0.058, respectively. The relative analysis errors of the calibration set (C-RPD) and validation set (V-RPD) reached 3.363 and 2.520, which were higher than those of the SVM regression model by 0.598 and 0.262, respectively. The slopes of the trend lines in the scatterplots of measured versus predicted values for the calibration and validation sets, C-S and V-S, are close to 1. The estimation result of the RF regression model is better than that of the SVM. The RF regression model can be used to estimate the LAI of Red Fuji apple trees in the full fruit period.
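The three evaluation statistics reported in this abstract can be computed as in the sketch below. The abstract does not give formulas, so the definitions are assumptions: R2 is taken as 1 - SS_res/SS_tot, and RPD as the sample standard deviation of the measured values divided by the RMSE, a common chemometrics convention. The measured and predicted values are hypothetical and are not data from the study.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpd(observed, predicted):
    """Assumed definition: sample standard deviation of measurements / RMSE."""
    return statistics.stdev(observed) / rmse(observed, predicted)


# Hypothetical measured and predicted LAI values
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
estimated = [1.5, 1.5, 3.0, 4.5, 4.5]
print(r_squared(measured, estimated), rmse(measured, estimated), rpd(measured, estimated))
```

Under these definitions a higher R2 and RPD and a lower RMSE all point the same way, which is how the abstract ranks RF above SVM.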
Baneshi, Mohammad Reza; Haghdoost, Ali Akbar; Zolala, Farzaneh; Nakhaee, Nouzar; Jalali, Maryam; Tabrizi, Reza; Akbari, Maryam
2017-04-01
This study aimed to assess using tree-based models the impact of different dimensions of religion and other risk factors on suicide attempts in the Islamic Republic of Iran. Three hundred patients who attempted suicide and 300 age- and sex-matched patient attendants with other types of disease who referred to Kerman Afzalipour Hospital were recruited for this study following a convenience sampling. Religiosity was assessed by the Duke University Religion Index. A tree-based model was constructed using the Gini Index as the homogeneity criterion. A complementary discrimination analysis was also applied. Variables contributing to the construction of the tree were stressful life events, mental disorder, family support, and religious belief. Strong religious belief was a protective factor for those with a low number of stressful life events and those with a high mental disorder score; 72 % of those who formed these two groups had not attempted suicide. Moreover, 63 % of those with a high number of stressful life events, strong family support, strong problem-solving skills, and a low mental disorder score were less likely to attempt suicide. The significance of four other variables, GHQ, problem-coping skills, friend support, and neuroticism, was revealed in the discrimination analysis. Religious beliefs seem to be an independent factor that can predict risk for suicidal behavior. Based on the decision tree, religious beliefs among people with a high number of stressful life events might not be a dissuading factor. Such subjects need more family support and problem-solving skills.
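The Gini index used as the homogeneity criterion in this abstract can be sketched for a single binary split. This is a generic illustration of the criterion, not the authors' model; the function names, the feature values and the outcome labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split value <= threshold."""
    n = len(values)
    best = (None, float("inf"))
    # The largest value is skipped because it would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical predictor values and binary outcomes
feature = [0, 1, 1, 2, 4, 5, 6, 7]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
print(best_threshold(feature, outcome))
```

A tree-growing procedure repeats this search over every candidate predictor at every node and keeps the split with the lowest weighted impurity.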
Bayesian Weibull tree models for survival analysis of clinico-genomic data
Clarke, Jennifer; West, Mike
2008-01-01
An important goal of research involving gene expression data for outcome prediction is to establish the ability of genomic data to define clinically relevant risk factors. Recent studies have demonstrated that microarray data can successfully cluster patients into low- and high-risk categories. However, the need exists for models which examine how genomic predictors interact with existing clinical factors and provide personalized outcome predictions. We have developed clinico-genomic tree models for survival outcomes which use recursive partitioning to subdivide the current data set into homogeneous subgroups of patients, each with a specific Weibull survival distribution. These trees can provide personalized predictive distributions of the probability of survival for individuals of interest. Our strategy is to fit multiple models; within each model we adopt a prior on the Weibull scale parameter and update this prior via Empirical Bayes whenever the sample is split at a given node. The decision to split is based on a Bayes factor criterion. The resulting trees are weighted according to their relative likelihood values and predictions are made by averaging over models. In a pilot study of survival in advanced stage ovarian cancer we demonstrate that clinical and genomic data are complementary sources of information relevant to survival, and we use the exploratory nature of the trees to identify potential genomic biomarkers worthy of further study. PMID:18618012
Predicting U.S. Army Reserve Unit Manning Using Market Demographics
2015-06-01
develops linear regression, classification tree, and logistic regression models to determine the ability of the location to support manning requirements... logistic regression model delivers predictive results that allow decision-makers to identify locations with a high probability of meeting unit...manning requirements. The recommendation of this thesis is that the USAR implement the logistic regression model.
Rajavel, Rajkumar; Thangarathinam, Mala
2015-01-01
Optimization of negotiation conflict in the cloud service negotiation framework is identified as one of the major challenging issues. This negotiation conflict occurs during the bilateral negotiation process between the participants due to misperception, aggressive behavior, and uncertain preferences and goals about their opponents. Existing research work focuses on the prerequest context of negotiation conflict optimization by grouping similar negotiation pairs using distance, binary, context-dependent, and fuzzy similarity approaches. To some extent, these approaches can maximize the success rate and minimize the communication overhead among the participants. To further optimize the success rate and communication overhead, the proposed research work introduces a novel probabilistic decision-making model for optimizing the negotiation conflict in the long-term negotiation context. This decision model formulates the problem of managing the different types of negotiation conflict that occur during the negotiation process as a multistage Markov decision problem. At each stage of the negotiation process, the proposed decision model generates the heuristic decision based on the past negotiation state information without causing any break-off among the participants. In addition, this heuristic decision using the stochastic decision tree scenario can maximize the revenue among the participants available in the cloud service negotiation framework. PMID:26543899
Hart, Carl R; Reznicek, Nathan J; Wilson, D Keith; Pettit, Chris L; Nykaza, Edward T
2016-05-01
Many outdoor sound propagation models exist, ranging from highly complex physics-based simulations to simplified engineering calculations, and more recently, highly flexible statistical learning methods. Several engineering and statistical learning models are evaluated by using a particular physics-based model, namely, a Crank-Nicolson parabolic equation (CNPE), as a benchmark. Narrowband transmission loss values predicted with the CNPE, based upon a simulated data set of meteorological, boundary, and source conditions, act as simulated observations. In the simulated data set sound propagation conditions span from downward refracting to upward refracting, for acoustically hard and soft boundaries, and low frequencies. Engineering models used in the comparisons include the ISO 9613-2 method, Harmonoise, and Nord2000 propagation models. Statistical learning methods used in the comparisons include bagged decision tree regression, random forest regression, boosting regression, and artificial neural network models. Computed skill scores are relative to sound propagation in a homogeneous atmosphere over a rigid ground. Overall skill scores for the engineering noise models are 0.6%, -7.1%, and 83.8% for the ISO 9613-2, Harmonoise, and Nord2000 models, respectively. Overall skill scores for the statistical learning models are 99.5%, 99.5%, 99.6%, and 99.6% for bagged decision tree, random forest, boosting, and artificial neural network regression models, respectively.
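A skill score relative to a reference prediction can be sketched as follows. The abstract does not state the formula, so this assumes the widely used mean-squared-error form, SS = 1 - MSE_model / MSE_reference, with the homogeneous-atmosphere, rigid-ground prediction as the reference. All transmission loss values below are hypothetical.

```python
def mean_squared_error(predicted, observed):
    """Mean of squared differences between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def skill_score(model, reference, observed):
    """Assumed form: 1 - MSE(model) / MSE(reference).

    1.0 is a perfect model, 0.0 matches the reference, and negative
    values mean the model does worse than the reference.
    """
    return 1.0 - (mean_squared_error(model, observed)
                  / mean_squared_error(reference, observed))


# Hypothetical transmission loss values in dB
observed = [60.0, 70.0, 80.0]         # benchmark values (role of the CNPE)
reference = [50.0, 60.0, 70.0]        # homogeneous atmosphere, rigid ground
good_model = [61.0, 69.0, 80.0]
poor_model = [45.0, 85.0, 95.0]
print(skill_score(good_model, reference, observed))
print(skill_score(poor_model, reference, observed))
```

Under this assumed definition, a negative score such as the -7.1% reported for one engineering model would mean it performed worse than the simple reference prediction.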
Multi-modal management of acromegaly: a value perspective.
Kimmell, Kristopher T; Weil, Robert J; Marko, Nicholas F
2015-10-01
The Acromegaly Consensus Group recently released updated guidelines for medical management of acromegaly patients. We subjected these guidelines to a cost analysis. We conducted a cost analysis of the recommendations based on published efficacy rates as well as publicly available cost data. The results were compared to findings from a previously reported comparative effectiveness analysis of acromegaly treatments. Using decision tree software, two models were created based on the Acromegaly Consensus Group's recommendations and the comparative effectiveness analysis. The decision tree for the Consensus Group's recommendations was subjected to multi-way tornado analysis to identify variables that most impacted the value analysis of the decision tree. The value analysis confirmed the Consensus Group's recommendations of somatostatin analogs as first line therapy for medical management. Our model also demonstrated significant value in using dopamine agonist agents as upfront therapy as well. Sensitivity analysis identified the cost of somatostatin analogs and growth hormone receptor antagonists as having the most significant impact on the cost effectiveness of medical therapies. Our analysis confirmed the value of surgery as first-line therapy for patients with surgically accessible lesions. Surgery provides the greatest value for management of patients with acromegaly. However, in accordance with the Acromegaly Consensus Group's recent recommendations, somatostatin analogs provide the greatest value and should be used as first-line therapy for patients who cannot be managed surgically. At present, the substantial cost is the most significant negative factor in the value of medical therapies for acromegaly.
Allergenic potential of novel foods.
Meredith, Clive
2005-11-01
Concerns have been expressed that the introduction of novel foods into the diet might lead to the development of new food allergies in consumers. Novel foods can be conveniently divided into GM and non-GM categories. Decision-tree approaches (e.g. International Life Sciences Institute-International Food Biotechnology Council and WHO/FAO) to assess the allergenic potential of GM foods were developed following the discovery, during product development, of the allergenic potential of GM soyabean expressing a gene encoding a storage protein from Brazil nut (Bertholletia excelsa). Within these decision trees considerations include: the source of the transgene; amino acid homology with known allergens; cross-reactivity with IgE from food-allergic individuals; resistance to proteolysis; prediction using animal models of food allergy. Such decision trees are under constant review as new knowledge and improved models emerge, but they provide a useful framework for the assessment of the allergenic potential of GM foods. For novel non-GM foods the assessment of allergenic potential is more subjective; some foods or food ingredients will need no assessment other than a robust protein assay to demonstrate the absence of protein. Where protein is present in the novel non-GM food, hazard and risk assessments need to be made in terms of the quantity of protein that might be consumed, the identity of individual protein components and their relationships to known food allergens. Where necessary, this assessment would extend to serum screening for potential cross-reactivities, skin-prick tests in previously-sensitised individuals and double-blind placebo-controlled food challenges.
Tree-, stand- and site-specific controls on landscape-scale patterns of transpiration
NASA Astrophysics Data System (ADS)
Hassler, Sibylle; Weiler, Markus; Blume, Theresa
2017-04-01
Transpiration is a key process in the hydrological cycle and a sound understanding and quantification of transpiration and its spatial variability is essential for management decisions as well as for improving the parameterisation of hydrological and soil-vegetation-atmosphere transfer models. For individual trees, transpiration is commonly estimated by measuring sap flow. Besides evaporative demand and water availability, tree-specific characteristics such as species, size or social status control sap flow amounts of individual trees. Within forest stands, properties such as species composition, basal area or stand density additionally affect sap flow, for example via competition mechanisms. Finally, sap flow patterns might also be influenced by landscape-scale characteristics such as geology, slope position or aspect because they affect water and energy availability; however, little is known about the dynamic interplay of these controls. We studied the relative importance of various tree-, stand- and site-specific characteristics with multiple linear regression models to explain the variability of sap velocity measurements in 61 beech and oak trees, located at 24 sites spread over a 290 km² catchment in Luxembourg. For each of 132 consecutive days of the growing season of 2014 we modelled the daily sap velocities of these 61 trees and determined the importance of the different predictors. Results indicate that a combination of tree-, stand- and site-specific factors controls sap velocity patterns in the landscape, namely tree species, tree diameter, the stand density, geology and aspect. Compared to these predictors, spatial variability of atmospheric demand and soil moisture explains only a small fraction of the variability in the daily datasets. However, the temporal dynamics of the explanatory power of the tree-specific characteristics, especially species, are correlated to the temporal dynamics of potential evaporation. Thus, transpiration estimates at the landscape scale would benefit from not only considering hydro-meteorological drivers, but also including tree, stand and site characteristics in order to improve the spatial representation of transpiration for hydrological and soil-vegetation-atmosphere transfer models.
Why do verification and validation?
Hu, Kenneth T.; Paez, Thomas L.
2016-02-19
In this discussion paper, we explore different ways to assess the value of verification and validation (V&V) of engineering models. We first present a literature review on the value of V&V and then use value chains and decision trees to show how value can be assessed from a decision maker's perspective. In this context, the value is what the decision maker is willing to pay for V&V analysis with the understanding that the V&V results are uncertain. The 2014 Sandia V&V Challenge Workshop is used to illustrate these ideas.
Risk-Based Prioritization of Research for Aviation Security Using Logic-Evolved Decision Analysis
NASA Technical Reports Server (NTRS)
Eisenhawer, S. W.; Bott, T. F.; Sorokach, M. R.; Jones, F. P.; Foggia, J. R.
2004-01-01
The National Aeronautics and Space Administration is developing advanced technologies to reduce terrorist risk for the air transportation system. Decision support tools are needed to help allocate assets to the most promising research. An approach to rank ordering technologies (using logic-evolved decision analysis), with risk reduction as the metric, is presented. The development of a spanning set of scenarios using a logic-gate tree is described. Baseline risk for these scenarios is evaluated with an approximate reasoning model. Illustrative risk and risk reduction results are presented.
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. Explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and a decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. 
Furthermore, the developed CLR models tend to improve sensitivity by more than 10% over the standard classification models, which translates to correctly labeling an additional 400 - 500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictors identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise the awareness of collecting data on additional markers and developing necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
Using CART to Identify Thresholds and Hierarchies in the Determinants of Funding Decisions.
Schilling, Chris; Mortimer, Duncan; Dalziel, Kim
2017-02-01
There is much interest in understanding decision-making processes that determine funding outcomes for health interventions. We use classification and regression trees (CART) to identify cost-effectiveness thresholds and hierarchies in the determinants of funding decisions. The hierarchical structure of CART is suited to analyzing complex conditional and nonlinear relationships. Our analysis uncovered hierarchies where interventions were grouped according to their type and objective. Cost-effectiveness thresholds varied markedly depending on which group the intervention belonged to: lifestyle-type interventions with a prevention objective had an incremental cost-effectiveness threshold of $2356, suggesting that such interventions need to be close to cost saving or dominant to be funded. For lifestyle-type interventions with a treatment objective, the threshold was much higher at $37,024. Lower down the tree, intervention attributes such as the level of patient contribution and the eligibility for government reimbursement influenced the likelihood of funding within groups of similar interventions. Comparison between our CART models and previously published results demonstrated concurrence with standard regression techniques while providing additional insights regarding the role of the funding environment and the structure of decision-maker preferences.
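The way a classification tree arrives at a threshold such as those reported above can be sketched with a single split. This is a minimal illustration, not the authors' analysis: the cost-effectiveness values and funding outcomes are invented, and `gini` and `best_threshold` are hypothetical helper names. The sketch scans candidate cut points and keeps the one with the lowest weighted Gini impurity, which is the standard CART splitting criterion for classification.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 for an empty or pure list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best single split on `values`."""
    pairs = sorted(zip(values, labels))
    best_thr, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between identical values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2.0  # midpoint candidate
        left = [lab for val, lab in pairs if val <= thr]
        right = [lab for val, lab in pairs if val > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score


# Hypothetical data: incremental cost-effectiveness ratio and funded (1) or not (0).
icer = [500, 1500, 2000, 3000, 8000, 20000]
funded = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_threshold(icer, funded)
print(threshold, impurity)
```

A full tree repeats this search within each resulting group and over every predictor, which is how group-specific thresholds and hierarchies of determinants emerge.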
Song, Xiang; Zeng, Xiaodong
2017-02-01
The climate has important influences on the distribution and structure of forest ecosystems, which may lead to vital feedback to climate change. However, much of the existing work focuses on the changes in carbon fluxes or water cycles due to climate change and/or atmospheric CO2, and few studies have considered how and to what extent climate change and CO2 influence the ecosystem structure (e.g., fractional coverage change) and the changes in the responses of ecosystems with different characteristics. In this work, two dynamic global vegetation models (DGVMs), IAP-DGVM coupled with CLM3 and CLM4-CNDV, were used to investigate the response of the forest ecosystem structure to changes in climate (temperature and precipitation) and CO2 concentration. In the temperature sensitivity tests, warming reduced the global area-averaged ecosystem gross primary production in the two models, which decreased global forest area. Furthermore, the changes in tree fractional coverage (ΔF_tree; %) from the two models were sensitive to the regional temperature and ecosystem structure, i.e., the mean annual temperature (MAT; °C) largely determined whether ΔF_tree was positive or negative, while the tree fractional coverage (F_tree; %) played a decisive role in the amplitude of ΔF_tree around the globe, and the dependence was more remarkable in IAP-DGVM. In cases with precipitation change, F_tree had a uniformly positive relationship with precipitation, especially in the transition zones of forests (30% < F_tree < 60%) for IAP-DGVM and in semiarid and arid regions for CLM4-CNDV. Moreover, ΔF_tree had a stronger dependence on F_tree than on the mean annual precipitation (MAP; mm/year). It was also demonstrated that both models captured the fertilization effects of the CO2 concentration.
2013-05-01
specifics of the correlation will be explored followed by discussion of new paradigms, the ordered event list (OEL) and the decision tree, that result from... 4.2.1 Brief Overview of the Decision Tree Paradigm; 4.2.2 OEL Explained... Figure 3. A depiction of a notional fault/activation tree.
Du, Hua Qiang; Sun, Xiao Yan; Han, Ning; Mao, Fang Jie
2017-10-01
By synergistically using the object-based image analysis (OBIA) and the classification and regression tree (CART) methods, the distribution information, the indexes (including diameter at breast height, tree height, and crown closure), and the aboveground carbon storage (AGC) of moso bamboo forest in Shanchuan Town, Anji County, Zhejiang Province were investigated. The results showed that the moso bamboo forest could be accurately delineated by integrating the multi-scale image segmentation of the OBIA technique with CART, which connected the image objects at various scales, with a producer's accuracy of 89.1%. The indexes estimated by the regression tree model, which was constructed from features extracted from the image objects, reached normal or better accuracy; the crown closure model achieved the best estimation accuracy of 67.9%. The estimation accuracy of diameter at breast height and tree height was relatively low, which is consistent with the conclusion that estimating diameter at breast height and tree height using optical remote sensing cannot achieve satisfactory results. Estimation of AGC reached relatively high accuracy, and accuracy in the high-value region exceeded 80%.
Montorsi, Francesco; Oelke, Matthias; Henneges, Carsten; Brock, Gerald; Salonia, Andrea; d'Anzeo, Gianluca; Rossi, Andrea; Mulhall, John P; Büttner, Hartwig
2016-09-01
Understanding predictors for the recovery of erectile function (EF) after nerve-sparing radical prostatectomy (nsRP) might help clinicians and patients in preoperative counseling and expectation management of EF rehabilitation strategies. To describe the effect of potential predictors on EF recovery after nsRP by post hoc decision-tree modeling of data from A Study of Tadalafil After Radical Prostatectomy (REACTT). Randomized double-blind double-dummy placebo-controlled trial in 423 men aged <68 yr with adenocarcinoma of the prostate (Gleason ≤7, normal preoperative EF) who underwent nsRP at 50 centers from nine European countries and Canada. Postsurgery 1:1:1 randomization to 9-mo double-blind treatment with tadalafil 5mg once a day (OaD), tadalafil 20mg on demand, or placebo, followed by a 6-wk drug-free-washout, and a 3-mo open-label tadalafil OaD treatment. Three decision-tree models, using the International Index of Erectile Function-Erectile Function (IIEF-EF) domain score at the end of double-blind treatment, washout, and open-label treatment as response variable. Each model evaluated the association between potential predictors: presurgery IIEF domain and IIEF single-item scores, surgical approach, nerve-sparing score (NSS), and postsurgery randomized treatment group. The first decision-tree model (n=422, intention-to-treat population) identified high presurgery sexual desire (IIEF item 12: ≥3.5 and <3.5) as the key predictor for IIEF-EF at the end of double-blind treatment (mean IIEF-EF: 14.9 and 11.1), followed by high confidence to get and maintain an erection (IIEF item 15: ≥3.5 and <3.5; IIEF-EF: 15.4 and 7.1). For patients meeting these criteria, additional non-IIEF-related predictors included robot-assisted laparoscopic surgery (yes or no; IIEF-EF: 19.3 and 12.6), quality of nerve sparing (NSS: <2.5 and ≥2.5; IIEF-EF: 14.3 and 10.5), and treatment with tadalafil OaD (yes and no; IIEF-EF: 17.6 and 14.3). 
Additional analyses after washout and open-label treatment identified high presurgery intercourse satisfaction as the key predictor. Exploratory decision-tree analyses identified high presurgery sexual desire, confidence, and intercourse satisfaction as key predictors for EF recovery. Patients meeting these criteria might benefit the most from conserving surgery and early postsurgery EF rehabilitation. Strategies for improving EF after surgery should be discussed preoperatively with all patients; this information may support expectation management for functional recovery on an individual patient level. Understanding how patient characteristics and different treatment options affect the recovery of erectile function (EF) after radical surgery for prostate cancer might help physicians select the optimal treatment for their patients. This analysis of data from a clinical trial suggested that high presurgery sexual desire, sexual confidence, and intercourse satisfaction are key factors predicting EF recovery. Patients meeting these criteria might benefit the most from conserving surgery (robot-assisted surgery, perfect nerve sparing) and postsurgery medical rehabilitation of EF. ClinicalTrials.gov, NCT01026818. Copyright © 2016. Published by Elsevier B.V.
Analysis of driver merging behavior at lane drops on freeways.
DOT National Transportation Integrated Search
2013-12-01
Lane changing assistance systems advise drivers on safe gaps for making mandatory lane changes at lane drops. In this study, such a system was developed using a Bayes classifier and a decision tree to model lane changes. Detailed vehicle trajecto...
Monte Carlo Simulation of Effective Coordination Mechanisms for e-Commerce
NASA Astrophysics Data System (ADS)
Sakas, D. P.; Vlachos, D. S.; Simos, T. E.
2008-11-01
Making decisions in a dynamic environment is considered extremely important in today's market. Decision trees, which can be used to model these systems, are not easily constructed and solved, especially in the case of infinite sets of consequences (for example, consider the case where only the mean and the variance of an outcome are known). In this work, discrete approximation and Monte Carlo techniques are used to overcome the aforementioned difficulties.
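A rough sketch of the Monte Carlo idea for a chance node described only by a mean and a variance is given below. It is not the authors' method: the normal shape of the outcome, the payoff function, the branch names and all numbers are assumptions made for illustration. Each branch is scored by averaging a payoff over random draws, and the branch with the highest estimate is selected.

```python
import random
import statistics


def expected_payoff(mean, variance, payoff, n_draws, rng):
    """Monte Carlo estimate of E[payoff(X)] with X ~ Normal(mean, variance).

    The normal shape is an assumption; the abstract only states that the
    mean and the variance of an outcome are known.
    """
    sd = variance ** 0.5
    return statistics.fmean(payoff(rng.gauss(mean, sd)) for _ in range(n_draws))


def choose_branch(branches, payoff, n_draws=20000, seed=0):
    """Estimate every branch of a decision node and return the best one."""
    rng = random.Random(seed)
    estimates = {
        name: expected_payoff(mean, variance, payoff, n_draws, rng)
        for name, (mean, variance) in branches.items()
    }
    best = max(estimates, key=estimates.get)
    return best, estimates


def limited_loss(outcome):
    """Hypothetical payoff: losses are capped at zero."""
    return max(outcome, 0.0)


# Hypothetical branches given as (mean, variance) of the raw outcome.
branches = {"launch": (10.0, 400.0), "hold": (12.0, 1.0)}
best, estimates = choose_branch(branches, limited_loss)
print(best, {name: round(value, 2) for name, value in estimates.items()})
```

With a nonlinear payoff such as this one, comparing the raw means is not enough: the riskier branch can come out ahead because its losses are truncated, which is the kind of case where sampling is useful.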
Capel, Paul D.; Wolock, David M.; Coupe, Richard H.; Roth, Jason L.
2018-01-10
Agricultural activities can affect water quality and the health of aquatic ecosystems; many water-quality issues originate with the movement of water, agricultural chemicals, and eroded soil from agricultural areas to streams and groundwater. Most agricultural activities are designed to sustain or increase crop production, while some are designed to protect soil and water resources. Numerous soil- and water-protection practices are designed to reduce the volume and velocity of runoff and increase infiltration. This report presents a conceptual framework that combines generalized concepts on the movement of water, the environmental behavior of chemicals and eroded soil, and the designed functions of various agricultural activities, as they relate to hydrology, to create attainable expectations for the protection of, with the goal of improving, water quality through changes in an agricultural activity. The framework presented uses two types of decision trees to guide decision making toward attainable expectations regarding the effectiveness of changing agricultural activities to protect and improve water quality in streams. One decision tree organizes decision making by considering the hydrologic setting and chemical behaviors, largely at the field scale. This decision tree can help determine which agricultural activities could effectively protect and improve water quality in a stream from the movement of chemicals, or sediment, from a field. The second decision tree is a chemical fate accounting tree. This decision tree helps set attainable expectations for the permanent removal of sediment, elements, and organic chemicals (such as herbicides and insecticides) through trapping or conservation tillage practices. Collectively, this conceptual framework consolidates diverse hydrologic settings, chemicals, and agricultural activities into a single, broad context that can be used to set attainable expectations for agricultural activities. 
This framework also enables better decision making for future agricultural activities as a means to reduce current, and prevent new, water-quality issues.
Career Path Suggestion using String Matching and Decision Trees
NASA Astrophysics Data System (ADS)
Nagpal, Akshay; P. Panda, Supriya
2015-05-01
High school and college graduates often struggle to choose the courses they should major in to reach their target career. In this paper, we worked on suggesting a career path to a graduate to reach his/her dream career given the current educational status. Firstly, we collected the career data of professionals and academicians from various career fields and compiled the data set by using the necessary information from the data. Further, this was used as the basis to suggest the most appropriate career path for the person given his/her current educational status. Decision trees and string matching algorithms were employed to suggest the appropriate career path for a person. Finally, an analysis of the results points to further improvements in the model.
Multi-Agent simulation of generation capacity expansion decisions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Botterud, A.; Mahalik, M.; Conzelmann, G.
2008-01-01
In this paper, we use a multi-agent simulation model, EMCAS, to analyze generation expansion in the Iberian electricity market. The expansion model simulates generation investment decisions of decentralized generating companies (GenCos) interacting in a complex, multidimensional environment. A probabilistic dispatch algorithm calculates prices and profits for new candidate units in different future states of the system. Uncertainties in future load, hydropower conditions, and competitors' actions are represented in a scenario tree, and decision analysis is used to identify the optimal expansion decision for each individual GenCo. We run the model using detailed data for the Iberian market. In a scenario analysis, we look at the impact of market design variables, such as the energy price cap and carbon emission prices. We also analyze how market concentration and GenCos' risk preferences influence the timing and choice of new generating capacity.
Vlsi implementation of flexible architecture for decision tree classification in data mining
NASA Astrophysics Data System (ADS)
Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak
2017-07-01
Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a tremendous rise in the size of the data being collected and analyzed. Classification is a central problem in data mining. Among the solutions developed for this problem, the most widely accepted is Decision Tree Classification (DTC), which gives high precision while handling very large amounts of data. This paper presents a VLSI implementation of a flexible architecture for decision tree classification in data mining using the C4.5 algorithm.
Scalable Regression Tree Learning on Hadoop using OpenPlanet
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yin, Wei; Simmhan, Yogesh; Prasanna, Viktor
As scientific and engineering domains attempt to effectively analyze the deluge of data arriving from sensors and instruments, machine learning is becoming a key data mining tool to build prediction models. Regression tree is a popular learning model that combines decision trees and linear regression to forecast numerical target variables based on a set of input features. MapReduce is well suited for addressing such data intensive learning applications, and a proprietary regression tree algorithm, PLANET, using MapReduce has been proposed earlier. In this paper, we describe an open source implementation of this algorithm, OpenPlanet, on the Hadoop framework using a hybrid approach. Further, we evaluate the performance of OpenPlanet using real-world datasets from the Smart Power Grid domain to perform energy use forecasting, and propose tuning strategies of Hadoop parameters to improve the performance of the default configuration by 75% for a training dataset of 17 million tuples on a 64-core Hadoop cluster on FutureGrid.
Using decision tree analysis to identify risk factors for relapse to smoking
Piper, Megan E.; Loh, Wei-Yin; Smith, Stevens S.; Japuntich, Sandra J.; Baker, Timothy B.
2010-01-01
This research used classification tree analysis and logistic regression models to identify risk factors related to short- and long-term abstinence. Baseline and cessation outcome data from two smoking cessation trials, conducted from 2001 to 2002, in two Midwestern urban areas, were analyzed. There were 928 participants (53.1% women, 81.8% white) with complete data. Both analyses suggest that relapse risk is produced by interactions of risk factors and that early and late cessation outcomes reflect different vulnerability factors. The results illustrate the dynamic nature of relapse risk and suggest the importance of efficient modeling of interactions in relapse prediction. PMID:20397871
Model-Based Design of Tree WSNs for Decentralized Detection.
Tantawy, Ashraf; Koutsoukos, Xenofon; Biswas, Gautam
2015-08-20
The classical decentralized detection problem of finding the optimal decision rules at the sensor and fusion center, as well as variants that introduce physical channel impairments have been studied extensively in the literature. The deployment of WSNs in decentralized detection applications brings new challenges to the field. Protocols for different communication layers have to be co-designed to optimize the detection performance. In this paper, we consider the communication network design problem for a tree WSN. We pursue a system-level approach where a complete model for the system is developed that captures the interactions between different layers, as well as different sensor quality measures. For network optimization, we propose a hierarchical optimization algorithm that lends itself to the tree structure, requiring only local network information. The proposed design approach shows superior performance over several contentionless and contention-based network design approaches.
Bariatric Outcomes and Obesity Modeling: Study Meeting
2010-09-17
to obesity. Subject terms: Bariatric Surgery, Cost Effectiveness, Surgical Outcome...EFFECTIVENESS MODEL OVERVIEW. Two parts: 1) Decision Tree and 2) Natural History Model. Results: Bariatric Surgery is cost-effective compared to no...9,300 for AGB, $10,600 for LRYGB (AGB: adjustable gastric banding; LRYGB: laparoscopic Roux-en-Y gastric bypass). A Financial Model of Bariatric Surgery for
Russell A. Parsons; William Mell; Peter McCauley
2010-01-01
Crown fire poses challenges to fire managers and can endanger fire fighters. Understanding of how fire interacts with tree crowns is essential to informed decisions about crown fire. Current operational crown fire predictions in the United States assume homogeneous crown fuels. While a new class of research fire models, which model fire behavior with computational...
Kaufmann, Liane; Huber, Stefan; Mayer, Daniel; Moeller, Korbinian; Marksteiner, Josef
2018-04-01
Adverse effects of heavy drinking on cognition have frequently been reported. In the present study, we systematically examined for the first time whether clinical neuropsychological assessments may be sensitive to alcohol abuse in elderly patients with suspected minor neurocognitive disorder. A total of 144 elderly with and without alcohol abuse (each group n=72; mean age 66.7 years) were selected from a patient pool of n=738 by applying propensity score matching (a statistical method allowing to match participants in experimental and control group by balancing various covariates to reduce selection bias). Accordingly, study groups were almost perfectly matched regarding age, education, gender, and Mini Mental State Examination score. Neuropsychological performance was measured using the CERAD (Consortium to Establish a Registry for Alzheimer's Disease). Classification analyses (i.e., decision tree and boosted trees models) were conducted to examine whether CERAD variables or total score contributed to group classification. Decision tree models disclosed that groups could be reliably classified based on the CERAD variables "Word List Discriminability" (tapping verbal recognition memory, 64% classification accuracy) and "Trail Making Test A" (measuring visuo-motor speed, 59% classification accuracy). Boosted tree analyses further indicated the sensitivity of "Word List Recall" (measuring free verbal recall) for discriminating elderly with versus without a history of alcohol abuse. This indicates that specific CERAD variables seem to be sensitive to alcohol-related cognitive dysfunctions in elderly patients with suspected minor neurocognitive disorder. (JINS, 2018, 24, 360-371).
Prediction of adverse drug reactions using decision tree modeling.
Hammann, F; Gutmann, H; Vogt, N; Helma, C; Drewe, J
2010-07-01
Drug safety is of great importance to public health. The detrimental effects of drugs not only limit their application but also cause suffering in individual patients and evoke distrust of pharmacotherapy. For the purpose of identifying drugs that could be suspected of causing adverse reactions, we present a structure-activity relationship analysis of adverse drug reactions (ADRs) in the central nervous system (CNS), liver, and kidney, and also of allergic reactions, for a broad variety of drugs (n = 507) from the Swiss drug registry. Using decision tree induction, a machine learning method, we determined the chemical, physical, and structural properties of compounds that predispose them to causing ADRs. The models had high predictive accuracies (78.9-90.2%) for allergic, renal, CNS, and hepatic ADRs. We show the feasibility of predicting complex end-organ effects using simple models that involve no expensive computations and that can be used (i) in the selection of the compound during the drug discovery stage, (ii) to understand how drugs interact with the target organ systems, and (iii) for generating alerts in postmarketing drug surveillance and pharmacovigilance.
The Utility of Decision Trees in Oncofertility Care in Japan.
Ito, Yuki; Shiraishi, Eriko; Kato, Atsuko; Haino, Takayuki; Sugimoto, Kouhei; Okamoto, Aikou; Suzuki, Nao
2017-03-01
To identify the utility and issues associated with the use of decision trees in oncofertility patient care in Japan. A total of 35 women who had been diagnosed with cancer, but had not begun anticancer treatment, were enrolled. We applied the oncofertility decision tree for women published by Gardino et al. to counsel a consecutive series of women on fertility preservation (FP) options following cancer diagnosis. Percentage of women who decided to undergo oocyte retrieval for embryo cryopreservation and the expected live-birth rate for these patients were calculated using the following equation: expected live-birth rate = pregnancy rate at each age per embryo transfer × (1 - miscarriage rate) × No. of cryopreserved embryos. Oocyte retrieval was performed for 17 patients (48.6%; mean ± standard deviation [SD] age, 36.35 ± 3.82 years). The mean ± SD number of cryopreserved embryos was 5.29 ± 4.63. The expected live-birth rate was 0.66. The expected live-birth rate with FP indicated that one in three oncofertility patients would not expect to have a live birth following oocyte retrieval and embryo cryopreservation. While the decision trees were useful as decision-making tools for women contemplating FP, in the context of the current restrictions on oocyte donation and the extremely small number of adoptions in Japan, the remaining options for fertility after cancer are limited. In order for cancer survivors to feel secure in their decisions, the decision tree may need to be adapted simultaneously with improvements to the social environment, such as greater support for adoption.
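The expected live-birth equation quoted above can be written out as a small function. This is a sketch only: the function name is hypothetical and the input values are invented, not taken from the study.

```python
def expected_live_births(pregnancy_rate, miscarriage_rate, n_embryos):
    """pregnancy rate per embryo transfer x (1 - miscarriage rate) x embryos."""
    for name, rate in (("pregnancy_rate", pregnancy_rate),
                       ("miscarriage_rate", miscarriage_rate)):
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    if n_embryos < 0:
        raise ValueError("n_embryos must be non-negative")
    return pregnancy_rate * (1.0 - miscarriage_rate) * n_embryos


# Hypothetical inputs: 25% pregnancy rate per transfer at the patient's age,
# 20% miscarriage rate, 3 cryopreserved embryos: 0.25 * (1 - 0.20) * 3.
example = expected_live_births(0.25, 0.20, 3)
print(round(example, 2))
```

Because the product scales with the number of embryos, the result behaves like an expected count and can exceed 1 when many embryos are stored.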
NASA Astrophysics Data System (ADS)
Lev, S. M.; Gallo, J.
2017-12-01
The international Arctic scientific community has identified the need for a sustained and integrated portfolio of pan-Arctic Earth-observing systems. In 2017, an international effort was undertaken to develop the first ever Value Tree framework for identifying common research and operational objectives that rely on Earth observation data derived from Earth-observing systems, sensors, surveys, networks, models, and databases to deliver societal benefits in the Arctic. A Value Tree Analysis is a common tool used to support decision making processes and is useful for defining concepts, identifying objectives, and creating a hierarchical framework of objectives. A multi-level societal benefit area value tree establishes the connection from societal benefits to the set of observation inputs that contribute to delivering those benefits. A Value Tree that relies on expert domain knowledge from Arctic and non-Arctic nations, international researchers, Indigenous knowledge holders, and other experts to develop a framework to serve as a logical and interdependent decision support tool will be presented. Value tree examples that map the contribution of Earth observations in the Arctic to achieving societal benefits will be presented in the context of the 2017 International Arctic Observations Assessment Framework. These case studies will highlight specific observing products and capability groups where investment is needed to contribute to the development of a sustained portfolio of Arctic observing systems.
Briggs, Andrew H; Ades, A E; Price, Martin J
2003-01-01
In structuring decision models of medical interventions, it is commonly recommended that only 2 branches be used for each chance node to avoid logical inconsistencies that can arise during sensitivity analyses if the branching probabilities do not sum to 1. However, information may be naturally available in an unconditional form, and structuring a tree in conditional form may complicate rather than simplify the sensitivity analysis of the unconditional probabilities. Current guidance emphasizes using probabilistic sensitivity analysis, and a method is required to provide probabilistic probabilities over multiple branches that appropriately represents uncertainty while satisfying the requirement that mutually exclusive event probabilities should sum to 1. The authors argue that the Dirichlet distribution, the multivariate equivalent of the beta distribution, is appropriate for this purpose and illustrate its use for generating a fully probabilistic transition matrix for a Markov model. Furthermore, they demonstrate that by adopting a Bayesian approach, the problem of observing zero counts for transitions of interest can be overcome.
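The Dirichlet approach described above can be sketched with the standard library alone. The transition counts, the uniform prior weight of 1 and the function names are assumptions for illustration, not values or code from the paper. Each row of a transition matrix is drawn from a Dirichlet distribution whose parameters are the observed counts plus the prior, using the standard construction of normalizing independent gamma draws.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) via normalized gammas."""
    gammas = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def sample_transition_matrix(count_rows, prior=1.0, rng=None):
    """Draw one transition matrix; each row ~ Dirichlet(counts + prior)."""
    rng = rng or random.Random()
    return [sample_dirichlet([c + prior for c in row], rng) for row in count_rows]


# Hypothetical observed transition counts; note the zero count in the first row.
counts = [
    [12, 7, 0],
    [3, 15, 5],
]
rng = random.Random(42)
matrices = [sample_transition_matrix(counts, prior=1.0, rng=rng) for _ in range(1000)]

mean_p00 = sum(m[0][0] for m in matrices) / len(matrices)
print(round(mean_p00, 3))  # should be close to the posterior mean (12 + 1) / (19 + 3)
```

Every sampled row sums to 1 by construction, so the inconsistency that arises when branches are varied independently cannot occur, and the prior keeps the branch with a zero observed count at a small positive probability.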
Simulation of California's Major Reservoirs Outflow Using Data Mining Technique
NASA Astrophysics Data System (ADS)
Yang, T.; Gao, X.; Sorooshian, S.
2014-12-01
Reservoir outflow, unlike upstream inflow, is controlled by reservoir operators, and it matters more than inflow to downstream water users. In order to simulate the complicated reservoir operation and extract the outflow decision making patterns for California's 12 major reservoirs, we build a data-driven, computer-based ("artificial intelligence") reservoir decision making tool using a classification and regression decision tree approach. This is a well-developed statistical and graphical modeling methodology in the field of data mining. A shuffled cross validation approach is also employed to extract the outflow decision making patterns and rules based on the selected decision variables (inflow amount, precipitation, timing, water type year etc.). To show the accuracy of the model, a verification study is carried out comparing the model-generated outflow decisions ("artificial intelligence" decisions) with those made by reservoir operators (human decisions). The simulation results show that the machine-generated outflow decisions are very similar to the real reservoir operators' decisions. This conclusion is based on statistical evaluations using the Nash-Sutcliffe test. The proposed model is able to detect the most influential variables and their weights when the reservoir operators make an outflow decision. While the proposed approach was first applied and tested on California's 12 major reservoirs, the method is universally adaptable to other reservoir systems.
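The Nash-Sutcliffe evaluation mentioned above is commonly computed as an efficiency coefficient: one minus the ratio of the squared simulation error to the variance of the observations around their mean. The sketch below assumes that standard definition; the outflow numbers are invented and the function name is hypothetical.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect match, 0 is no better than
    predicting the observed mean, negative values are worse than the mean."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - error / spread


# Hypothetical daily outflows: operator decisions vs. tree-model output.
operator_outflow = [10.0, 12.0, 15.0, 11.0]
model_outflow = [9.0, 12.0, 14.0, 12.0]
score = nash_sutcliffe(operator_outflow, model_outflow)
print(round(score, 3))
```

Values near 1 would support the claim that machine-generated decisions closely track the operators' decisions.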
Suchetana, Bihu; Rajagopalan, Balaji; Silverstein, JoAnn
2017-11-15
A regression tree-based diagnostic approach is developed to evaluate factors affecting US wastewater treatment plant compliance with ammonia discharge permit limits using Discharge Monthly Report (DMR) data from a sample of 106 municipal treatment plants for the period of 2004-2008. Predictor variables used to fit the regression tree are selected using random forests, and consist of the previous month's effluent ammonia, influent flow rates and plant capacity utilization. The tree models are first used to evaluate compliance with existing ammonia discharge standards at each facility and then applied assuming more stringent discharge limits, which are under consideration in many states. The model predicts that the ability to meet both current and future limits depends primarily on the previous month's treatment performance. With more stringent discharge limits, the predicted ammonia concentration relative to the discharge limit increases. In-sample validation shows that the regression trees can provide a median classification accuracy of >70%. The regression tree model is validated using ammonia discharge data from an operating wastewater treatment plant and is able to accurately predict the observed ammonia discharge category approximately 80% of the time, indicating that the regression tree model can be applied to predict compliance for individual treatment plants, providing practical guidance for utilities and regulators with an interest in controlling ammonia discharges. The proposed methodology is also used to demonstrate how to delineate reliable sources of demand and supply in a point source-to-point source nutrient credit trading scheme, as well as how planners and decision makers can set reasonable discharge limits in future. Copyright © 2017 Elsevier B.V. All rights reserved.
How trees allocate carbon for optimal growth: insight from a game-theoretic model.
Fu, Liyong; Sun, Lidan; Han, Hao; Jiang, Libo; Zhu, Sheng; Ye, Meixia; Tang, Shouzheng; Huang, Minren; Wu, Rongling
2017-02-01
How trees allocate photosynthetic products to primary height growth and secondary radial growth reflects their capacity to best use environmental resources. Despite substantial efforts to explore tree height-diameter relationship empirically and through theoretical modeling, our understanding of the biological mechanisms that govern this phenomenon is still limited. By thinking of stem woody biomass production as an ecological system of apical and lateral growth components, we implement game theory to model and discern how these two components cooperate symbiotically with each other or compete for resources to determine the size of a tree stem. This resulting allometry game theory is further embedded within a genetic mapping and association paradigm, allowing the genetic loci mediating the carbon allocation of stemwood growth to be characterized and mapped throughout the genome. Allometry game theory was validated by analyzing a mapping data of stem height and diameter growth over perennial seasons in a poplar tree. Several key quantitative trait loci were found to interpret the process and pattern of stemwood growth through regulating the ecological interactions of stem apical and lateral growth. The application of allometry game theory enables the prediction of the situations in which the cooperation, competition or altruism is an optimal decision of a tree to fully use the environmental resources it owns. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Gantner, Melisa E; Peroni, Roxana N; Morales, Juan F; Villalba, María L; Ruiz, María E; Talevi, Alan
2017-08-28
Breast Cancer Resistance Protein (BCRP) is an ATP-dependent efflux transporter linked to the multidrug resistance phenomenon in many diseases such as epilepsy and cancer and a potential source of drug interactions. For these reasons, the early identification of substrates and nonsubstrates of this transporter during the drug discovery stage is of great interest. We have developed a computational nonlinear model ensemble based on conformation-independent molecular descriptors using a combined strategy of genetic algorithms, J48 decision tree classifiers, and data fusion. The best model ensemble consists of averaging the ranking of the 12 decision trees that showed the best performance on the training set, which also demonstrated a good performance for the test set. It was experimentally validated using the ex vivo everted rat intestinal sac model. Five anticonvulsant drugs classified as nonsubstrates for BCRP by the model ensemble were experimentally evaluated, and none of them proved to be a BCRP substrate under the experimental conditions used, thus confirming the predictive ability of the model ensemble. The model ensemble reported here is a potentially valuable tool to be used as an in silico ADME filter in computer-aided drug discovery campaigns intended to overcome BCRP-mediated multidrug resistance issues and to prevent drug-drug interactions.
C-fuzzy variable-branch decision tree with storage and classification error rate constraints
NASA Astrophysics Data System (ADS)
Yang, Shiueng-Bien
2009-10-01
The C-fuzzy decision tree (CFDT), which is based on the fuzzy C-means algorithm, has recently been proposed. The CFDT is grown by selecting the nodes to be split according to their classification error rate. However, the CFDT design does not consider the time taken to classify the input vector, so the CFDT can be improved. We propose a new C-fuzzy variable-branch decision tree (CFVBDT) with storage and classification error rate constraints. The design of the CFVBDT consists of two phases: growing and pruning. The CFVBDT is grown by selecting the nodes to be split according to the classification error rate and the classification time in the decision tree. Additionally, the pruning method selects the nodes to prune based on the storage requirement and the classification time of the CFVBDT. Furthermore, the number of branches of each internal node is variable in the CFVBDT. Experimental results indicate that the proposed CFVBDT outperforms the CFDT and other methods.
A Modified Decision Tree Algorithm Based on Genetic Algorithm for Mobile User Classification Problem
Liu, Dong-sheng; Fan, Shu-jiang
2014-01-01
To offer mobile customers better service, mobile users must first be classified. To address the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take context information as a classification attribute for the mobile user, and we divide the context into public context and private context classes. We then analyze the processes and operators of the algorithm. Finally, in an experiment applying the algorithm to mobile users, we classify users into Basic service user, E-service user, Plus service user, and Total service user classes and also obtain some rules about mobile users. Compared to the C4.5 decision tree algorithm and the SVM algorithm, the proposed algorithm has higher accuracy and is simpler. PMID:24688389
Planning effectiveness may grow on fault trees.
Chow, C W; Haddad, K; Mannino, B
1991-10-01
The first step of a strategic planning process--identifying and analyzing threats and opportunities--requires subjective judgments. By using an analytical tool known as a fault tree, healthcare administrators can reduce the unreliability of subjective decision making by creating a logical structure for problem solving and decision making. A case study of 11 healthcare administrators showed that an analysis technique called prospective hindsight can add to a fault tree's ability to improve a strategic planning process.
Evapotranspiration of the urban forest at the municipal scale in Los Angeles, CA
NASA Astrophysics Data System (ADS)
Litvak, E.; Pataki, D. E.
2015-12-01
The most severe drought on record in southern California and predictions of continued water shortages make it essential to understand urban water use. However, urban evapotranspiration (ET), which is an important part of municipal water budgets, remains a major uncertainty. Urban ET is difficult to measure and model, particularly in cities with diverse plant composition. The city of Los Angeles contains more than 6 million trees, most of which are non-natives that originate from multiple geographic regions, which further complicates predictions of urban forest transpiration. Previously, we made extensive in situ measurements of tree transpiration and turfgrass ET in the greater Los Angeles area. Here, we utilize these data to systematize transpiration of different tree species based on physiological mechanisms underlying plant water relations. The resulting empirical model estimates Los Angeles urban forest ET from easy-to-collect plant characteristics and freely available environmental parameters. Plant characteristics are tree diameter, wood type (e.g. coniferous), phenological type (e.g. evergreen) and plant composition. Environmental parameters are vapor pressure deficit of the air, incoming solar radiation and reference ET (all available at http://cimis.water.ca.gov). By combining this model with existing surveys of urban trees in Los Angeles, we estimated that citywide ET of irrigated landscapes varies from 1.2 ± 0.5 mm/d in winter to 2.8 ± 1.1 mm/d in summer. On average, trees and turfgrass contributed 27% and 73% to total tree+turfgrass ET, respectively. To our knowledge, this model provides the first citywide estimates of Los Angeles ET differentiated by wood types and plant composition. These results will inform decision makers about species-specific water use by urban trees and assist with determining landscape designs that are beneficial for water conservation.
This model may also be incorporated into a regional hydrologic model to provide spatially resolved ET at the municipal scale.
Can Sap Flow Help Us to Better Understand Transpiration Patterns in Landscapes?
NASA Astrophysics Data System (ADS)
Hassler, S. K.; Weiler, M.; Blume, T.
2017-12-01
Transpiration is a key process in the hydrological cycle and a sound understanding and quantification of transpiration and its spatial variability is essential for management decisions and for improving the parameterisation of hydrological and soil-vegetation-atmosphere transfer models. At the tree scale, transpiration is commonly estimated by measuring sap flow. Besides evaporative demand and water availability, tree-specific characteristics such as species, size or social status, stand-specific characteristics such as basal area or stand density and site-specific characteristics such as geology, slope position or aspect control sap flow of individual trees. However, little is known about the relative importance or the dynamic interplay of these controls. We studied these influences with multiple linear regression models to explain the variability of sap velocity measurements in 61 beech and oak trees, located at 24 sites spread over a 290 km²-catchment in Luxembourg. For each of 132 consecutive days of the growing season of 2014 we applied linear models to the daily spatial pattern of sap velocity and determined the importance of the different predictors. By upscaling sap velocities to the tree level with the help of species-dependent empirical estimates for sapwood area we also examined patterns of sap flow as a more direct representation of transpiration. Results indicate that a combination of mainly tree- and site-specific factors controls sap velocity patterns in this landscape, namely tree species, tree diameter, geology and aspect. For sap flow, the site-specific predictors provided the largest contribution to the explained variance, however, in contrast to the sap velocity analysis, geology was more important than aspect. Spatial variability of atmospheric demand and soil moisture explained only a small fraction of the variance. 
However, the temporal dynamics of the explanatory power of the tree-specific characteristics, especially species, were correlated to the temporal dynamics of potential evaporation. We conclude that spatial representation of transpiration in models could benefit from including patterns according to tree and site characteristics.
Decision trees for the analysis of genes involved in Alzheimer's disease pathology.
Mestizo Gutiérrez, Sonia L; Herrera Rivero, Marisol; Cruz Ramírez, Nicandro; Hernández, Elena; Aranda-Abreu, Gonzalo E
2014-09-21
Alzheimer's disease (AD) is characterized by a gradual loss of memory, orientation, judgement and language. There is still no cure for this disorder. AD pathogenesis remains fairly unknown and its underlying molecular mechanisms are not yet fully understood. Several studies have shown that the abnormal accumulation of beta-amyloid and tau proteins occurs 10 to 20 years before the onset of symptoms of the disease, so it is extremely important to identify changes in the brain before the first symptoms. We used decision trees to classify 31 individuals (9 healthy controls and 22 AD patients in three different stages of disease) according to the expression of 69 genes previously reported in a meta-analysis, plus the expression levels of APP, APOE, BACE1, NCSTN, PSEN1, PSEN2 and MAPT. We also included in our analysis the MMSE (Mini-Mental State Examination) scores and number of NFT (neurofibrillary tangles). Results allowed us to generate a model of classification values for different AD stages of severity, according to MMSE scores, and to identify the expression level of protein tau that may possibly determine the onset (incipient stage) of AD. We used decision trees to model the different stages of AD (severe, moderate, incipient and control) based on the meta-analysis of gene expression levels plus MMSE and NFT scores. Both classifiers reported the variable MMSE as the most informative; however, we found that the protein tau also plays an important role in the onset of AD. Copyright © 2014 Elsevier Ltd. All rights reserved.
Heart rate time series characteristics for early detection of infections in critically ill patients.
Tambuyzer, T; Guiza, F; Boonen, E; Meersseman, P; Vervenne, H; Hansen, T K; Bjerre, M; Van den Berghe, G; Berckmans, D; Aerts, J M; Meyfroidt, G
2017-04-01
It is difficult to make a distinction between inflammation and infection. Therefore, new strategies are required to allow accurate detection of infection. Here, we hypothesize that we can distinguish infected from non-infected ICU patients based on dynamic features of serum cytokine concentrations and heart rate time series. Serum cytokine profiles and heart rate time series of 39 patients were available for this study. The serum concentrations of ten cytokines were measured using blood sampled every 10 min between 2100 and 0600 hours. Heart rate was recorded every minute. Ten metrics were used to extract features from these time series to obtain an accurate classification of infected patients. The predictive power of the metrics derived from the heart rate time series was investigated using decision tree analysis. Finally, logistic regression methods were used to examine whether classification performance improved with inclusion of features derived from the cytokine time series. The AUC of a decision tree based on two heart rate features was 0.88. The model had good calibration, with a Hosmer-Lemeshow p value of 0.09. There was no significant additional value of adding static cytokine levels or cytokine time series information to the generated decision tree model. The results suggest that heart rate is a better marker for infection than information captured by cytokine time series when the exact stage of infection is not known. The predictive value of (expensive) biomarkers should always be weighed against the routinely monitored data, and such biomarkers have to demonstrate added value.
A Signal-Detection Analysis of Fast-and-Frugal Trees
ERIC Educational Resources Information Center
Luan, Shenghua; Schooler, Lael J.; Gigerenzer, Gerd
2011-01-01
Models of decision making are distinguished by those that aim for an optimal solution in a world that is precisely specified by a set of assumptions (a so-called "small world") and those that aim for a simple but satisfactory solution in an uncertain world where the assumptions of optimization models may not be met (a so-called "large world"). Few…
Linking 3D spatial models of fuels and fire: Effects of spatial heterogeneity on fire behavior
Russell A. Parsons; William E. Mell; Peter McCauley
2011-01-01
Crownfire endangers fire fighters and can have severe ecological consequences. Prediction of fire behavior in tree crowns is essential to informed decisions in fire management. Current methods used in fire management do not address variability in crown fuels. New mechanistic physics-based fire models address convective heat transfer with computational fluid dynamics (...
NASA Astrophysics Data System (ADS)
Isaacson, B. N.; Singh, A.; Serbin, S. P.; Townsend, P. A.
2009-12-01
Rapid ecosystem invasion by the emerald ash borer (Agrilus planipennis Fairmaire) is forcing resource managers to make decisions regarding how best to manage the pest, but a detailed map of abundance of the host, ash trees of the genus Fraxinus, does not exist, frustrating fully informed management decisions. We have developed methods to map ash tree abundance across a broad spatial extent in Wisconsin using their unique phenology (late leaf-out, early leaf-fall) and the rich dataset of Landsat imagery that can be used to characterize ash senescence with respect to other deciduous species. However, across environmental gradients in Wisconsin, senescence can vary by days or even weeks such that leaf-drop within one species can temporally vary even within a single Landsat footprint. To address this issue, we used phenology products from NASA’s MODIS for North American Carbon Program (NACP) coupled with vegetation indices derived from a time series of Landsat imagery across multiple years to determine the phenological position of each Landsat pixel within a single idealized growing season. Pixels within Landsat images collected in different years were re-arranged in a phenologically-informed time series that described autumn senescence. This characterization of leaf-drop was then related to the abundance of ash trees, producing a spatially-generalizable model of moderate resolution capable of predicting ash abundance across the state using multiple Landsat scenes. Empirical models predicting ash abundance for two Landsat footprints in Wisconsin indicate model fits for ash abundance of R^2=0.65 in north-central WI, and R^2>0.70 in southeastern WI.
Some Results of Weak Anticipative Concept Applied in Simulation Based Decision Support in Enterprise
NASA Astrophysics Data System (ADS)
Kljajić, Miroljub; Kofjač, Davorin; Kljajić Borštnar, Mirjana; Škraba, Andrej
2010-11-01
Simulation models are used for decision support and learning in enterprises and in schools. Three cases of successful applications demonstrate the usefulness of weak anticipative information. Job shop scheduling production with makespan criterion presents a real case of customized flexible furniture production optimization. The genetic algorithm for job shop scheduling optimization is presented. Simulation based inventory control for products with stochastic lead time and demand describes inventory optimization for products with stochastic lead time and demand. Dynamic programming and fuzzy control algorithms reduce the total cost without producing stock-outs in most cases. The value of decision-making information based on simulation is also discussed. All cases are discussed from the optimization, modeling and learning points of view.
Dexter H. Locke; J. Morgan Grove; Michael Galvin; Jarlath P.M. ONeil-Dunne; Charles Murphy
2013-01-01
Urban Tree Canopy (UTC) Prioritizations can be both a set of geographic analysis tools and a planning process for collaborative decision-making. In this paper, we describe how UTC Prioritizations can be used as a planning process to provide decision support to multiple government agencies, civic groups and private businesses to aid in reaching a canopy target. Linkages...
New Splitting Criteria for Decision Trees in Stationary Data Streams.
Jaworski, Maciej; Duda, Piotr; Rutkowski, Leszek
2018-06-01
The most popular tools for stream data mining are based on decision trees. In the previous 15 years, all designed methods, headed by the very fast decision tree algorithm, relied on Hoeffding's inequality, and hundreds of researchers followed this scheme. Recently, we have demonstrated that although the Hoeffding decision trees are an effective tool for dealing with stream data, they are a purely heuristic procedure; for example, classical decision trees such as ID3 or CART cannot be adapted to data stream mining using Hoeffding's inequality. Therefore, there is an urgent need to develop new algorithms, which are both mathematically justified and characterized by good performance. In this paper, we address this problem by developing a family of new splitting criteria for classification in stationary data streams and investigating their probabilistic properties. The new criteria, derived using appropriate statistical tools, are based on the misclassification error and the Gini index impurity measures. The general division of splitting criteria into two types is proposed. Attributes chosen based on type-I splitting criteria guarantee, with high probability, the highest expected value of split measure. Type-II criteria ensure that the chosen attribute is the same, with high probability, as it would be chosen based on the whole infinite data stream. Moreover, in this paper, two hybrid splitting criteria are proposed, which are the combinations of single criteria based on the misclassification error and Gini index.
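For readers unfamiliar with the two impurity measures named in the abstract above, the following sketch computes the Gini index, the misclassification error, and the resulting split measure (impurity reduction) for a candidate split. It shows only the classical batch definitions; the paper's probabilistic bounds for streams are not reproduced here, and the class counts and function names are hypothetical.

```python
def gini(counts):
    """Gini index of a node given its per-class sample counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0

def misclassification_error(counts):
    """Fraction of samples that do not belong to the majority class."""
    n = sum(counts)
    return 1.0 - max(counts) / n if n else 0.0

def split_measure(parent, children, impurity):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted

# Hypothetical two-class node split into two children by one attribute.
parent = [50, 50]
children = [[40, 10], [10, 40]]
print("Gini gain:", round(split_measure(parent, children, gini), 3))
print("Error gain:", round(split_measure(parent, children, misclassification_error), 3))
```

In a streaming setting these quantities would be estimated from the samples seen so far at a leaf, which is why a bound on their estimation error is needed before committing to a split.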
Jansa, Václav
2017-01-01
Height to crown base (HCB) of a tree is an important variable often included as a predictor in various forest models that serve as the fundamental tools for decision-making in forestry. We developed spatially explicit and spatially inexplicit mixed-effects HCB models using measurements from a total of 19,404 trees of Norway spruce (Picea abies (L.) Karst.) and European beech (Fagus sylvatica L.) on the permanent sample plots that are located across the Czech Republic. Variables describing site quality, stand density or competition, and species mixing effects were included in the HCB model through dominant height (HDOM), basal area of trees larger in diameter than a subject tree (BAL, a spatially inexplicit measure) or Hegyi’s competition index (HCI, a spatially explicit measure), and basal area proportion of a species of interest (BAPOR), respectively. The parameters describing sample plot-level random effects were included in the HCB model by applying the mixed-effects modelling approach. Among several functional forms evaluated, the logistic function was found most suited to our data. The HCB model for Norway spruce was tested against data originating from different inventory designs, but the model for European beech was tested using a partitioned dataset (a part of the main dataset). The variance heteroscedasticity in the residuals was substantially reduced through inclusion of a power variance function in the HCB model. The results showed that the spatially explicit model described a significantly larger part of the HCB variation [R2adj = 0.86 (spruce), 0.85 (beech)] than its spatially inexplicit counterpart [R2adj = 0.84 (spruce), 0.83 (beech)]. The HCB increased with increasing competitive interactions described by a tree-centered competition measure (BAL or HCI) and with species mixing effects described by BAPOR.
A test of the mixed-effects HCB model with the random effects estimated using at least four trees per sample plot in the validation data confirmed that the model was precise enough for the prediction of HCB for a range of site quality, tree size, stand density, and stand structure. We therefore recommend measuring HCB on four randomly selected trees of a species of interest on each sample plot for localizing the mixed-effects model and predicting HCB of the remaining trees on the plot. Growth simulations can be made from data that lack values for either crown ratio or HCB by using the HCB models. PMID:29049391
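The two competition measures contrasted in the abstract above can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: it uses the commonly cited form of Hegyi's index (sum over neighbours of the diameter ratio divided by distance), a hypothetical fixed search radius, and a BAL expressed as raw basal area in square metres without scaling to a per-hectare value. Tree coordinates and diameters are invented.

```python
import math

def basal_area_m2(dbh_cm):
    """Cross-sectional stem area in square metres from diameter at breast height in cm."""
    return math.pi * (dbh_cm / 200.0) ** 2

def bal(trees, i):
    """Spatially inexplicit: summed basal area of plot trees thicker than tree i."""
    subject_dbh = trees[i][2]
    return sum(basal_area_m2(d) for j, (_, _, d) in enumerate(trees)
               if j != i and d > subject_dbh)

def hegyi(trees, i, radius):
    """Spatially explicit: sum of (d_j / d_i) / distance over neighbours within radius."""
    xi, yi, di = trees[i]
    total = 0.0
    for j, (x, y, d) in enumerate(trees):
        if j == i:
            continue
        dist = math.hypot(x - xi, y - yi)
        if 0.0 < dist <= radius:  # radius is an assumed neighbour-selection rule
            total += (d / di) / dist
    return total

# Hypothetical plot: (x in m, y in m, dbh in cm).
plot = [(0.0, 0.0, 20.0), (3.0, 4.0, 40.0), (0.0, 10.0, 10.0)]
print(round(bal(plot, 0), 4), round(hegyi(plot, 0, radius=10.0), 3))
```

BAL needs only the diameter list of the plot, whereas the Hegyi index also needs stem coordinates, which is the practical trade-off between the two model variants.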
Deo, Ravinesh C; Downs, Nathan; Parisi, Alfio V; Adamowski, Jan F; Quilty, John M
2017-05-01
Exposure to erythemally-effective solar ultraviolet radiation (UVR) that contributes to malignant keratinocyte cancers and associated health-risk is best mitigated through innovative decision-support systems, with global solar UV index (UVI) forecast necessary to inform real-time sun-protection behaviour recommendations. It follows that the UVI forecasting models are useful tools for such decision-making. In this study, a model for computationally-efficient data-driven forecasting of diffuse and global very short-term reactive (VSTR) (10-min lead-time) UVI, enhanced by drawing on the solar zenith angle (θs) data, was developed using an extreme learning machine (ELM) algorithm. An ELM algorithm typically serves to address complex and ill-defined forecasting problems. A UV spectroradiometer situated in Toowoomba, Australia measured daily cycles (0500-1700h) of UVI over the austral summer period. After trialling activation functions based on sine, hard limit, logarithmic and tangent sigmoid and triangular and radial basis networks for best results, an optimal ELM architecture utilising a logarithmic sigmoid equation in the hidden layer, with lagged combinations of θs as the predictor data, was developed. ELM's performance was evaluated using statistical metrics: correlation coefficient (r), Willmott's Index (WI), Nash-Sutcliffe efficiency coefficient (ENS), root mean square error (RMSE), and mean absolute error (MAE) between observed and forecasted UVI. Using these metrics, the ELM model's performance was compared to that of existing methods: multivariate adaptive regression spline (MARS), M5 Model Tree, and a semi-empirical (Pro6UV) clear sky model. Based on RMSE and MAE values, the ELM model (0.255, 0.346, respectively) outperformed the MARS (0.310, 0.438) and M5 Model Tree (0.346, 0.466) models. Concurring with these metrics, the Willmott's Index for the ELM, MARS and M5 Model Tree models were 0.966, 0.942 and 0.934, respectively.
About 57% of the ELM model's absolute errors were small in magnitude (±0.25), whereas the MARS and M5 Model Tree models generated 53% and 48% of such errors, respectively, indicating that the latter models' errors were distributed over a larger error range. In terms of peak global UVI forecasting, with half the level of error, the ELM model outperformed MARS and M5 Model Tree. A comparison of the magnitude of hourly-cumulated errors of 10-min lead time forecasts for diffuse and global UVI highlighted the ELM model's greater accuracy compared to the MARS, M5 Model Tree or Pro6UV models. This confirmed the versatility of an ELM model drawing on θs data for VSTR forecasting of UVI at near real-time horizon. When applied to the goal of enhancing expert systems, ELM-based accurate forecasts capable of reacting quickly to measured conditions can enhance real-time exposure advice for the public, mitigating the potential for solar UV-exposure-related disease. Crown Copyright © 2017. Published by Elsevier Inc. All rights reserved.
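Three of the evaluation metrics listed in the abstract above can be written down directly. The sketch below uses the usual definitions of RMSE, MAE and Willmott's index of agreement, assuming the study uses the original (squared) form of the index. The UVI values are invented for illustration.

```python
import math

def rmse(observed, forecast):
    """Root mean square error."""
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, forecast)) / len(observed))

def mae(observed, forecast):
    """Mean absolute error."""
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / len(observed)

def willmott_index(observed, forecast):
    """Willmott's index of agreement, bounded between 0 and 1 (1 is perfect agreement)."""
    mean_obs = sum(observed) / len(observed)
    numerator = sum((o - f) ** 2 for o, f in zip(observed, forecast))
    denominator = sum((abs(f - mean_obs) + abs(o - mean_obs)) ** 2
                      for o, f in zip(observed, forecast))
    return 1.0 - numerator / denominator

# Hypothetical observed and forecast UVI values at 10-min steps.
obs = [2.1, 4.3, 6.8, 9.0, 7.5]
fc = [2.4, 4.0, 7.1, 8.6, 7.9]
print(round(rmse(obs, fc), 3), round(mae(obs, fc), 3), round(willmott_index(obs, fc), 3))
```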
Catry, Filipe X.; Moreira, Francisco; Pausas, Juli G.; Fernandes, Paulo M.; Rego, Francisco; Cardillo, Enrique; Curt, Thomas
2012-01-01
Forest ecosystems where periodical tree bark harvesting is a major economic activity may be particularly vulnerable to disturbances such as fire, since debarking usually reduces tree vigour and protection against external agents. In this paper we asked how cork oak Quercus suber trees respond after wildfires and, in particular, how bark harvesting affects post-fire tree survival and resprouting. We gathered data from 22 wildfires (4585 trees) that occurred in three southern European countries (Portugal, Spain and France), covering a wide range of conditions characteristic of Q. suber ecosystems. Post-fire tree responses (tree mortality, stem mortality and crown resprouting) were examined in relation to management and ecological factors using generalized linear mixed-effects models. Results showed that bark thickness and bark harvesting are major factors affecting resistance of Q. suber to fire. Fire vulnerability was higher for trees with thin bark (young or recently debarked individuals) and decreased with increasing bark thickness until cork was 3–4 cm thick. This bark thickness corresponds to the moment when exploited trees are debarked again, meaning that exploited trees are vulnerable to fire during a longer period. Exploited trees were also more likely to be top-killed than unexploited trees, even for the same bark thickness. Additionally, vulnerability to fire increased with burn severity and with tree diameter, and was higher in trees burned in early summer or located in drier south-facing aspects. We provided tree response models useful to help estimating the impact of fire and to support management decisions. The results suggested that an appropriate management of surface fuels and changes in the bark harvesting regime (e.g. debarking coexisting trees in different years or increasing the harvesting cycle) would decrease vulnerability to fire and contribute to the conservation of cork oak ecosystems. PMID:22787521
Model-Based Design of Tree WSNs for Decentralized Detection †
Tantawy, Ashraf; Koutsoukos, Xenofon; Biswas, Gautam
2015-01-01
The classical decentralized detection problem of finding the optimal decision rules at the sensor and fusion center, as well as variants that introduce physical channel impairments have been studied extensively in the literature. The deployment of WSNs in decentralized detection applications brings new challenges to the field. Protocols for different communication layers have to be co-designed to optimize the detection performance. In this paper, we consider the communication network design problem for a tree WSN. We pursue a system-level approach where a complete model for the system is developed that captures the interactions between different layers, as well as different sensor quality measures. For network optimization, we propose a hierarchical optimization algorithm that lends itself to the tree structure, requiring only local network information. The proposed design approach shows superior performance over several contentionless and contention-based network design approaches. PMID:26307989
Stratification of the severity of critically ill patients with classification trees
2009-01-01
Background: Development of three classification trees (CT) based on the CART (Classification and Regression Trees), CHAID (Chi-Square Automatic Interaction Detection) and C4.5 methodologies for the calculation of probability of hospital mortality; the comparison of the results with the APACHE II, SAPS II and MPM II-24 scores, and with a model based on multiple logistic regression (LR). Methods: Retrospective study of 2864 patients. Random partition (70:30) into a Development Set (DS) n = 1808 and Validation Set (VS) n = 808. Their properties of discrimination are compared with the ROC curve (AUC CI 95%), Percent of correct classification (PCC CI 95%); and the calibration with the Calibration Curve and the Standardized Mortality Ratio (SMR CI 95%). Results: CTs are produced with a different selection of variables and decision rules: CART (5 variables and 8 decision rules), CHAID (7 variables and 15 rules) and C4.5 (6 variables and 10 rules). The common variables were: inotropic therapy, Glasgow, age, (A-a)O2 gradient and antecedent of chronic illness. In VS: all the models achieved acceptable discrimination with AUC above 0.7. CT: CART (0.75(0.71-0.81)), CHAID (0.76(0.72-0.79)) and C4.5 (0.76(0.73-0.80)). PCC: CART (72(69-75)), CHAID (72(69-75)) and C4.5 (76(73-79)). Calibration (SMR) better in the CT: CART (1.04(0.95-1.31)), CHAID (1.06(0.97-1.15)) and C4.5 (1.08(0.98-1.16)). Conclusion: With different methodologies of CTs, trees are generated with different selection of variables and decision rules. The CTs are easy to interpret, and they stratify the risk of hospital mortality. The CTs should be taken into account for the classification of the prognosis of critically ill patients. PMID:20003229
Interpretation of diagnostic data: 6. How to do it with more complex maths.
1983-11-15
We have now shown you how to use decision analysis in making those rare, tough diagnostic decisions that are not soluble through other, easier routes. In summary, to "use more complex maths" the following steps will be useful: Create a decision tree or map of all the pertinent courses of action and their consequences. Assign probabilities to the branches of each chance node. Assign utilities to each of the potential outcomes shown on the decision tree. Combine the probabilities and utilities for each node on the decision tree. Pick the decision that leads to the highest expected utility. Test your decision for its sensitivity to clinically sensible changes in probabilities and utilities. That concludes this series of clinical epidemiology rounds. You've come a long way from "doing it with pictures" and are now able to extract most of the diagnostic information that can be provided from signs, symptoms and laboratory investigations. We would appreciate learning whether you have found this series useful and how we can do a better job of presenting these and other elements of "the science of the art of medicine".
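The steps listed in the abstract above (build the tree, assign probabilities and utilities, combine them at each node, pick the best option, then test sensitivity) can be sketched in a few lines. This is only an illustrative sketch: the node layout, the helper names (expected_utility, best_option, example_tree) and all numbers are assumptions made for the example, not values from the article.

```python
def expected_utility(node):
    """Roll back a decision tree and return its expected utility."""
    kind = node["kind"]
    if kind == "outcome":
        return node["utility"]
    if kind == "chance":
        # Probability-weighted average over the branches of a chance node.
        return sum(p * expected_utility(child) for p, child in node["branches"])
    # Decision node: the value of the best available option.
    return max(expected_utility(child) for _, child in node["options"])


def best_option(decision):
    """Return (label, expected utility) of the best option at a decision node."""
    scored = [(label, expected_utility(child)) for label, child in decision["options"]]
    return max(scored, key=lambda pair: pair[1])


def outcome(utility):
    return {"kind": "outcome", "utility": utility}


def example_tree(p_disease):
    """Hypothetical treat-versus-wait tree; utilities are made up."""
    treat = {"kind": "chance",
             "branches": [(p_disease, outcome(0.90)), (1 - p_disease, outcome(0.95))]}
    wait = {"kind": "chance",
            "branches": [(p_disease, outcome(0.50)), (1 - p_disease, outcome(1.00))]}
    return {"kind": "decision", "options": [("treat", treat), ("wait", wait)]}


# Sensitivity analysis: see how the preferred option changes with the probability.
for p in (0.05, 0.30):
    print(p, best_option(example_tree(p)))
```

Re-running best_option over a range of clinically sensible probabilities, as in the final loop, is the simplest form of the sensitivity test the abstract describes.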
Bayesian Decision Tree for the Classification of the Mode of Motion in Single-Molecule Trajectories
Türkcan, Silvan; Masson, Jean-Baptiste
2013-01-01
Membrane proteins move in heterogeneous environments with spatially (sometimes temporally) varying friction and with biochemical interactions with various partners. It is important to reliably distinguish different modes of motion to improve our knowledge of the membrane architecture and to understand the nature of interactions between membrane proteins and their environments. Here, we present an analysis technique for single molecule tracking (SMT) trajectories that can determine the preferred model of motion that best matches observed trajectories. The method is based on Bayesian inference to calculate the posterior probability of an observed trajectory according to a certain model. Information theory criteria, such as the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and modified AIC (AICc), are used to select the preferred model. The considered group of models includes free Brownian motion, and confined motion in 2nd or 4th order potentials. We determine the best information criteria for classifying trajectories. We tested its limits through simulations matching large sets of experimental conditions and we built a decision tree. This decision tree first uses the BIC to distinguish between free Brownian motion and confined motion. In a second step, it classifies the confining potential further using the AIC. We apply the method to experimental Clostridium perfringens ε-toxin (CPT) receptor trajectories to show that these receptors are confined by a spring-like potential. An adaptation of this technique was applied on a sliding window in the temporal dimension along the trajectory. We applied this adaptation to experimental CPT trajectories that lose confinement due to disaggregation of confining domains. This new technique adds another dimension to the discussion of SMT data.
The mode of motion of a receptor might hold more biologically relevant information than the diffusion coefficient or domain size and may be a better tool to classify and compare different SMT experiments. PMID:24376584
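The two-step rule described in the abstract above (BIC to separate free from confined motion, then AIC to choose the confining potential) can be illustrated with the textbook definitions of the criteria. The sketch below is an assumption-laden illustration: the function names, the model labels and the parameter counts are hypothetical, and the log-likelihood values would come from whatever fitting procedure is used, which is not shown.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion for k fitted parameters."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC for n data points (requires n > k + 1)."""
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_likelihood, k, n):
    """Bayesian information criterion for k parameters and n data points."""
    return k * math.log(n) - 2 * log_likelihood


def classify(fits, n):
    """Two-step classification of one trajectory.

    fits maps a model name to (maximized log-likelihood, number of parameters);
    the key "free" is assumed to hold the free Brownian motion fit.
    """
    free_ll, free_k = fits["free"]
    confined = {name: fit for name, fit in fits.items() if name != "free"}
    # Step 1: BIC decides between free and confined motion (lower is better).
    best_confined_bic = min(bic(ll, k, n) for ll, k in confined.values())
    if bic(free_ll, free_k, n) <= best_confined_bic:
        return "free"
    # Step 2: AIC picks the confining potential.
    return min(confined, key=lambda name: aic(*confined[name]))
```

Lower values are better for all three criteria; BIC penalizes extra parameters more heavily than AIC once the trajectory has more than about eight points, since ln(n) then exceeds 2.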
Marty, Rémi; Roze, Stéphane; Kurth, Hannah
2012-01-01
Long-acting somatostatin receptor ligands (SRL) with product-specific formulation and means of administration are injected periodically in patients with acromegaly and neuroendocrine tumors. A simple decision-tree model aimed at comparing cost savings with ready-to-use Somatuline Autogel(®) (lanreotide) and Sandostatin LAR(®) (octreotide) for the UK, France, and Germany. The drivers of cost savings studied were the reduction of time to administer as well as a reduced baseline risk of clogging during product administration reported for Somatuline Autogel(®). The decision-tree model assumed two settings for SRL administration, ie, by either hospital-based or community-based nurses. In the case of clogging, the first dose was assumed to be lost and a second injection performed. Successful injection depended on the probability of clogging. Direct medical costs were included. A set of scenarios were run, varying the cost drivers, such as the baseline risk of clogging, SRL administration time, and percentage of patients injected during a hospital stay. Costs per successful injection were less for Somatuline Autogel(®)/Depot, ranging from Euros (EUR) 13-45, EUR 52-108, and EUR 127-151, respectively, for France, Germany, and the UK. The prices for both long-acting SRL were the same in France, and cost savings came to 100% from differences other than drug prices. For Germany and the UK, the proportion of savings due to less clogging and shorter administration time was estimated to be around 32% and 20%, respectively. Based on low and high country-specific patient cohort size estimations of individuals eligible for SRL treatment among the patient population with acromegaly and neuroendocrine tumors, annual savings were estimated to be up to EUR 2,000,000 for France, EUR 6,000,000 for Germany, and EUR 7,000,000 for the UK. 
This model suggests that increasing usage of the Somatuline device for injection of SRL might lead to substantial savings for health care providers across Europe.
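The clogging branch of the decision tree described above can be written as a one-line expectation. The sketch below assumes, for simplicity, that a clogged first attempt is always followed by exactly one successful second attempt at the same cost; the function name, parameters and any numbers passed to it are hypothetical and are not taken from the study.

```python
def cost_per_successful_injection(drug_cost, minutes, nurse_cost_per_minute, p_clog):
    """Expected direct cost of one successful injection.

    With probability p_clog the first dose is lost and a second injection
    (drug plus nurse time) is performed; the second is assumed to succeed.
    """
    one_attempt = drug_cost + minutes * nurse_cost_per_minute
    return one_attempt * (1 + p_clog)


def saving(product_a, product_b):
    """Cost difference between two products given as keyword-argument dicts."""
    return (cost_per_successful_injection(**product_b)
            - cost_per_successful_injection(**product_a))
```

With equal drug prices, as reported for France, any saving in this sketch comes entirely from the administration time and the clogging probability.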
Policy Route Map for Academic Libraries' Digital Content
ERIC Educational Resources Information Center
Koulouris, Alexandros; Kapidakis, Sarantos
2012-01-01
This paper presents a policy decision tree for digital information management in academic libraries. The decision tree is a policy guide, which offers alternative access and reproduction policy solutions according to the prevailing circumstances (for example acquisition method, copyright ownership). It refers to the digital information life cycle,…
Efforts are increasingly being made to classify the world’s wetland resources, an important ecosystem and habitat that is diminishing in abundance. There are multiple remote sensing classification methods, including a suite of nonparametric classifiers such as decision-tree...
Welsh Bilinguals' English Spelling: An Error Analysis.
ERIC Educational Resources Information Center
James, Carl; And Others
1993-01-01
The extent to which the second-language English spelling of young Welsh-English bilinguals is systematically idiosyncratic was examined from free compositions written by 10- to 11-year-old children. A model is presented of the second-language spelling process in the form of a "decision tree." (Contains 29 references.) (Author/LB)
Computational Models for Belief Revision, Group Decision-Making and Cultural Shifts
2010-10-25
"social" networks; the green numbers are pseudo-trees or artificial (non-social) constructions. The dashed blue line indicates the range of Erdos-Renyi ... non-social networks such as Erdos-Renyi random graphs or the more passive non-cognitive spreading of disease or information flow, As mentioned
Binary recursive partitioning: background, methods, and application to psychology.
Merkle, Edgar C; Shaffer, Victoria A
2011-02-01
Binary recursive partitioning (BRP) is a computationally intensive statistical method that can be used in situations where linear models are often used. Instead of imposing many assumptions to arrive at a tractable statistical model, BRP simply seeks to accurately predict a response variable based on values of predictor variables. The method outputs a decision tree depicting the predictor variables that were related to the response variable, along with the nature of the variables' relationships. No significance tests are involved, and the tree's 'goodness' is judged based on its predictive accuracy. In this paper, we describe BRP methods in a detailed manner and illustrate their use in psychological research. We also provide R code for carrying out the methods.
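The article itself supplies R code; as a language-neutral illustration of the idea, the following is a minimal Python sketch of binary recursive partitioning for a numeric response. It is not the authors' code. It assumes numeric predictors, scores splits by the summed squared error of the two child nodes, and uses hypothetical names (sse, best_split, grow) and stopping rules chosen for the example.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, target):
    """Search every predictor and midpoint threshold for the lowest child SSE."""
    best = None
    for name in (key for key in rows[0] if key != target):
        values = sorted({row[name] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [r[target] for r in rows if r[name] <= threshold]
            right = [r[target] for r in rows if r[name] > threshold]
            score = sse(left) + sse(right)
            if best is None or score < best[2]:
                best = (name, threshold, score)
    return best


def grow(rows, target, min_rows=2, max_depth=3):
    """Recursively partition rows; leaves predict the mean response."""
    mean = sum(r[target] for r in rows) / len(rows)
    if max_depth == 0 or len(rows) < 2 * min_rows:
        return {"predict": mean}
    split = best_split(rows, target)
    # Stop when no split exists or the best split does not reduce the error.
    if split is None or split[2] >= sse([r[target] for r in rows]):
        return {"predict": mean}
    name, threshold, _ = split
    left = [r for r in rows if r[name] <= threshold]
    right = [r for r in rows if r[name] > threshold]
    return {"split": (name, threshold),
            "left": grow(left, target, min_rows, max_depth - 1),
            "right": grow(right, target, min_rows, max_depth - 1)}
```

In keeping with the abstract, no significance test appears anywhere: a tree grown this way would be judged by its predictive accuracy on held-out data.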
Montorsi, Francesco; Oelke, Matthias; Henneges, Carsten; Brock, Gerald; Salonia, Andrea; d’Anzeo, Gianluca; Rossi, Andrea; Mulhall, John P.; Büttner, Hartwig
2017-01-01
Background Understanding predictors for the recovery of erectile function (EF) after nerve-sparing radical prostatectomy (nsRP) might help clinicians and patients in preoperative counseling and expectation management of EF rehabilitation strategies. Objective To describe the effect of potential predictors on EF recovery after nsRP by post hoc decision-tree modeling of data from A Study of Tadalafil After Radical Prostatectomy (REACTT). Design, setting, and participants Randomized double-blind double-dummy placebo-controlled trial in 423 men aged <68 yr with adenocarcinoma of the prostate (Gleason ≤7, normal preoperative EF) who underwent nsRP at 50 centers from nine European countries and Canada. Intervention Postsurgery 1:1:1 randomization to 9-mo double-blind treatment with tadalafil 5 mg once a day (OaD), tadalafil 20 mg on demand, or placebo, followed by a 6-wk drug-free-washout, and a 3-mo open-label tadalafil OaD treatment. Outcome measurements and statistical analysis Three decision-tree models, using the International Index of Erectile Function-Erectile Function (IIEF-EF) domain score at the end of double-blind treatment, washout, and open-label treatment as response variable. Each model evaluated the association between potential predictors: presurgery IIEF domain and IIEF single-item scores, surgical approach, nerve-sparing score (NSS), and postsurgery randomized treatment group. Results and limitations The first decision-tree model (n = 422, intention-to-treat population) identified high presurgery sexual desire (IIEF item 12: ≥3.5 and <3.5) as the key predictor for IIEF-EF at the end of double-blind treatment (mean IIEF-EF: 14.9 and 11.1), followed by high confidence to get and maintain an erection (IIEF item 15: ≥3.5 and <3.5; IIEF-EF: 15.4 and 7.1). 
For patients meeting these criteria, additional non-IIEF–related predictors included robot-assisted laparoscopic surgery (yes or no; IIEF-EF: 19.3 and 12.6), quality of nerve sparing (NSS: <2.5 and ≥2.5; IIEF-EF: 14.3 and 10.5), and treatment with tadalafil OaD (yes and no; IIEF-EF: 17.6 and 14.3). Additional analyses after washout and open-label treatment identified high presurgery intercourse satisfaction as the key predictor. Conclusions Exploratory decision-tree analyses identified high presurgery sexual desire, confidence, and intercourse satisfaction as key predictors for EF recovery. Patients meeting these criteria might benefit the most from conserving surgery and early postsurgery EF rehabilitation. Strategies for improving EF after surgery should be discussed preoperatively with all patients; this information may support expectation management for functional recovery on an individual patient level. Patient summary Understanding how patient characteristics and different treatment options affect the recovery of erectile function (EF) after radical surgery for prostate cancer might help physicians select the optimal treatment for their patients. This analysis of data from a clinical trial suggested that high presurgery sexual desire, sexual confidence, and intercourse satisfaction are key factors predicting EF recovery. Patients meeting these criteria might benefit the most from conserving surgery (robot-assisted surgery, perfect nerve sparing) and postsurgery medical rehabilitation of EF. Trial registration ClinicalTrials.gov, NCT01026818 PMID:26947602
Poulos, H M; Camp, A E
2010-02-01
Vegetation management is a critical component of rights-of-way (ROW) maintenance for preventing electrical outages and safety hazards resulting from tree contact with conductors during storms. Northeast Utility's (NU) transmission lines are a critical element of the nation's power grid; NU is therefore under scrutiny from federal agencies charged with protecting the electrical transmission infrastructure of the United States. We developed a decision support system to focus right-of-way maintenance and minimize the potential for a tree fall episode that disables transmission capacity across the state of Connecticut. We used field data on tree characteristics to develop a system for identifying hazard trees (HTs) in the field using limited equipment to manage Connecticut power line ROW. Results from this study indicated that the tree height-to-diameter ratio, total tree height, and live crown ratio were the key characteristics that differentiated potential risk trees (danger trees) from trees with a high probability of tree fall (HTs). Products from this research can be transferred to adaptive right-of-way management, and the methods we used have great potential for future application to other regions of the United States and elsewhere where tree failure can disrupt electrical power.
Fraccaro, Paolo; Nicolo, Massimo; Bonetto, Monica; Giacomini, Mauro; Weller, Peter; Traverso, Carlo Enrico; Prosperi, Mattia; OSullivan, Dympna
2015-01-27
To investigate machine learning methods, ranging from simpler interpretable techniques to complex (non-linear) "black-box" approaches, for automated diagnosis of Age-related Macular Degeneration (AMD). Data from healthy subjects and patients diagnosed with AMD or other retinal diseases were collected during routine visits via an Electronic Health Record (EHR) system. Patients' attributes included demographics and, for each eye, presence/absence of major AMD-related clinical signs (soft drusen, retinal pigment epithelium, defects/pigment mottling, depigmentation area, subretinal haemorrhage, subretinal fluid, macula thickness, macular scar, subretinal fibrosis). Interpretable techniques known as white box methods, including logistic regression and decision trees, as well as less interpretable techniques known as black box methods, such as support vector machines (SVM), random forests and AdaBoost, were used to develop models (trained and validated on unseen data) to diagnose AMD. The gold standard was confirmed diagnosis of AMD by physicians. Sensitivity, specificity and area under the receiver operating characteristic curve (AUC) were used to assess performance. Study population included 487 patients (912 eyes). In terms of AUC, random forests, logistic regression and AdaBoost showed a mean performance of (0.92), followed by SVM and decision trees (0.90). All machine learning models identified soft drusen and age as the most discriminating variables in clinicians' decision pathways to diagnose AMD. Both black-box and white box methods performed well in identifying diagnoses of AMD and their decision pathways. Machine learning models developed through the proposed approach, relying on clinical signs identified by retinal specialists, could be embedded into EHR to provide physicians with real time (interpretable) support.
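The three performance measures named in the abstract above have simple definitions that can be checked by hand. The sketch below is a generic illustration, not the study's evaluation code: labels are assumed to be coded 1 (AMD) and 0 (no AMD), and the AUC is computed through its rank interpretation, the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for a cohort of a few hundred eyes but would be replaced by a rank-based formula for large data sets.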
Hydrochemical analysis of groundwater using a tree-based model
NASA Astrophysics Data System (ADS)
Litaor, M. Iggy; Brielmann, H.; Reichmann, O.; Shenker, M.
2010-06-01
Summary Hydrochemical indices are commonly used to ascertain aquifer characteristics, salinity problems, anthropogenic inputs and resource management, among others. This study was conducted to test the applicability of a binary decision tree model to aquifer evaluation using hydrochemical indices as input. The main advantage of the tree-based model compared to other commonly used statistical procedures such as cluster and factor analyses is the ability to classify groundwater samples with assigned probability and the reduction of a large data set into a few significant variables without creating new factors. We tested the model using data sets collected from headwater springs of the Jordan River, Israel. The model evaluation consisted of several levels of complexity, from simple separation between the calcium-magnesium-bicarbonate water type of karstic aquifers to the more challenging separation of calcium-sodium-bicarbonate water type flowing through perched and regional basaltic aquifers. In all cases, the model assigned measures for goodness of fit in the form of misclassification errors and singled out the most significant variable in the analysis. The model proceeded through a sequence of partitions providing insight into different possible pathways and changing lithology. The model results were extremely useful in constraining the interpretation of geological heterogeneity and constructing a conceptual flow model for a given aquifer. The tree model clearly identified the hydrochemical indices that were excluded from the analysis, thus providing information that can lead to a decrease in the number of routinely analyzed variables and a significant reduction in laboratory cost.
Development of Interpretable Predictive Models for BPH and Prostate Cancer.
Bermejo, Pablo; Vivo, Alicia; Tárraga, Pedro J; Rodríguez-Montes, J A
2015-01-01
Traditional methods for deciding whether to recommend a patient for a prostate biopsy are based on cut-off levels of stand-alone markers such as prostate-specific antigen (PSA) or any of its derivatives. However, in the last decade we have seen the increasing use of predictive models that combine, in a non-linear manner, several predictors and are better able to predict prostate cancer (PC), but these fail to help the clinician to distinguish between PC and benign prostate hyperplasia (BPH) patients. We construct two new models that are capable of predicting both PC and BPH. An observational study was performed on 150 patients with PSA ≥3 ng/mL and age >50 years. We built a decision tree and a logistic regression model, validated with the leave-one-out methodology, in order to predict PC or BPH, or reject both. Statistical dependence with PC and BPH was found for prostate volume (P-value < 0.001), PSA (P-value < 0.001), international prostate symptom score (IPSS; P-value < 0.001), digital rectal examination (DRE; P-value < 0.001), age (P-value < 0.002), antecedents (P-value < 0.006), and meat consumption (P-value < 0.08). The two predictive models that were constructed selected a subset of these, namely, volume, PSA, DRE, and IPSS, obtaining an area under the ROC curve (AUC) between 72% and 80% for both PC and BPH prediction. PSA and volume together help to build predictive models that accurately distinguish among PC, BPH, and patients without any of these pathologies. Our decision tree and logistic regression models outperform the AUC obtained in the compared studies. Using these models as decision support, the number of unnecessary biopsies might be significantly reduced.
Sugimoto, Katsutoshi; Shiraishi, Junji; Moriyasu, Fuminori; Doi, Kunio
2009-04-01
To develop a computer-aided diagnostic (CAD) scheme for classifying focal liver lesions (FLLs) by use of physicians' subjective classification of echogenic patterns of FLLs on baseline and contrast-enhanced ultrasonography (US). A total of 137 hepatic lesions in 137 patients were evaluated with B-mode and NC100100 (Sonazoid)-enhanced pulse-inversion US; lesions included 74 hepatocellular carcinomas (HCCs) (23: well-differentiated, 36: moderately differentiated, 15: poorly differentiated HCCs), 33 liver metastases, and 30 liver hemangiomas. Three physicians evaluated single images at B-mode and arterial phases with a cine mode. Physicians were asked to classify each lesion into one of eight B-mode and one of eight enhancement patterns, but did not make a diagnosis. To classify five types of FLLs, we employed a decision tree model with four decision nodes and four artificial neural networks (ANNs). The results of the physicians' pattern classifications were used successively for four different ANNs in making decisions at each of the decision nodes in the decision tree model. The classification accuracies for the 137 FLLs were 84.8% for metastasis, 93.3% for hemangioma, and 98.6% for all HCCs. In addition, the classification accuracies for histological differentiation types of HCCs were 65.2% for well-differentiated HCC, 41.7% for moderately differentiated HCC, and 80.0% for poorly differentiated HCC. This CAD scheme has the potential to improve the diagnostic accuracy of liver lesions. However, the accuracy in the histologic differential diagnosis of HCC based on baseline and contrast-enhanced US is still limited.
Bou Kheir, Rania; Greve, Mogens H; Bøcher, Peder K; Greve, Mette B; Larsen, René; McCloy, Keith
2010-05-01
Soil organic carbon (SOC) is one of the most important carbon stocks globally and has large potential to affect global climate. Distribution patterns of SOC in Denmark constitute a nation-wide baseline for studies on soil carbon changes (with respect to Kyoto protocol). This paper predicts and maps the geographic distribution of SOC across Denmark using remote sensing (RS), geographic information systems (GISs) and decision-tree modeling (un-pruned and pruned classification trees). Seventeen parameters, i.e. parent material, soil type, landscape type, elevation, slope gradient, slope aspect, mean curvature, plan curvature, profile curvature, flow accumulation, specific catchment area, tangent slope, tangent curvature, steady-state wetness index, Normalized Difference Vegetation Index (NDVI), Normalized Difference Wetness Index (NDWI) and Soil Color Index (SCI) were generated to statistically explain SOC field measurements in the area of interest (Denmark). A large number of tree-based classification models (588) were developed using (i) all of the parameters, (ii) all Digital Elevation Model (DEM) parameters only, (iii) the primary DEM parameters only, (iv), the remote sensing (RS) indices only, (v) selected pairs of parameters, (vi) soil type, parent material and landscape type only, and (vii) the parameters having a high impact on SOC distribution in built pruned trees. The best constructed classification tree models (in the number of three) with the lowest misclassification error (ME) and the lowest number of nodes (N) as well are: (i) the tree (T1) combining all of the parameters (ME=29.5%; N=54); (ii) the tree (T2) based on the parent material, soil type and landscape type (ME=31.5%; N=14); and (iii) the tree (T3) constructed using parent material, soil type, landscape type, elevation, tangent slope and SCI (ME=30%; N=39). 
The SOC maps produced at 1:50,000 cartographic scale using these trees agree closely, with coincidence values of 90.5% (Map T1/Map T2), 95% (Map T1/Map T3) and 91% (Map T2/Map T3). The overall accuracies of these maps, compared with field observations, were estimated to be 69.54% (Map T1), 68.87% (Map T2) and 69.41% (Map T3). The proposed tree models are relatively simple and may also be applied to other areas. Copyright 2010 Elsevier Ltd. All rights reserved.
[A prediction model for internet game addiction in adolescents: using a decision tree analysis].
Kim, Ki Sook; Kim, Kyung Hee
2010-06-01
This study was designed to build a theoretical frame to provide practical help to prevent and manage adolescent internet game addiction by developing a prediction model through a comprehensive analysis of related factors. The participants were 1,318 students studying in elementary, middle, and high schools in Seoul and Gyeonggi Province, Korea. Collected data were analyzed using the SPSS program. Decision Tree Analysis using the Clementine program was applied to build an optimum and significant prediction model to predict internet game addiction related to various factors, especially parent related factors. From the data analyses, the prediction model for factors related to internet game addiction presented with 5 pathways. Causative factors included gender, type of school, siblings, economic status, religion, time spent alone, gaming place, payment to Internet café, frequency, duration, parent's ability to use internet, occupation (mother), trust (father), expectations regarding adolescent's study (mother), supervising (both parents), rearing attitude (both parents). The results suggest preventive and managerial nursing programs for specific groups by path. Use of this predictive model can expand the role of school nurses, not only in counseling addicted adolescents but also, in developing and carrying out programs with parents and approaching adolescents individually through databases and computer programming.
Automatic energy expenditure measurement for health science.
Catal, Cagatay; Akbulut, Akhan
2018-04-01
It is crucial to accurately predict human energy expenditure in any sports activity and health science application in order to investigate the impact of the activity. However, measurement of the real energy expenditure is not a trivial task and involves complex steps. The objective of this work is to improve the performance of existing estimation models of energy expenditure by using machine learning algorithms and data from different sensors, and to provide this estimation service in a cloud-based platform. In this study, we used input data such as breath rate and heart rate from three sensors. Inputs are received from a web form and sent to the web service, which applies a regression model on the Azure cloud platform. During the experiments, we assessed several machine learning models based on regression methods. Our experimental results showed that our novel model, which applies Boosted Decision Tree Regression in conjunction with the median aggregation technique, provides the best result among the other five regression algorithms. This cloud-based energy expenditure system, which uses a web service, showed that cloud computing technology is a great opportunity to develop estimation systems and that the new model, which applies Boosted Decision Tree Regression with the median aggregation, provides remarkable results. Copyright © 2018 Elsevier B.V. All rights reserved.
Estimation of Sub Hourly Glacier Albedo Values Using Artificial Intelligence Techniques
NASA Astrophysics Data System (ADS)
Moya Quiroga, Vladimir; Mano, Akira; Asaoka, Yoshihiro; Udo, Keiko; Kure, Shuichi; Mendoza, Javier
2013-04-01
Glaciers are the most important fresh water reservoirs, storing about 67% of total fresh water. Unfortunately, they are retreating and some small glaciers have already disappeared. Thus, snow glacier melt (SGM) estimation plays an important role in water resources management. Whether SGM is estimated by complete energy balance or a simplified method, albedo is an important input to most of the methods. However, this is a variable value depending on the ground surface and local conditions. The present research presents a new approach for estimating sub hourly albedo values using different artificial intelligence techniques such as artificial neural networks and decision trees along with measured and easy to obtain data. The models were developed using measured data from the Zongo-Ore station located in the Bolivian tropical glacier Zongo (68°10' W, 16°15' S). This station automatically records every 30 minutes several meteorological parameters such as incoming short wave radiation, outgoing short wave radiation, temperature or relative humidity. The ANN model used was the Multi Layer Perceptron, while the decision tree used was the M5 model. Both models were trained using the WEKA software and validated using the cross validation method. After analysing the model performances, it was concluded that the decision tree models have a better performance. The model with the best performance was then validated with measured data from the Ecuadorian tropical glacier Antizana (78°09'W, 0°28'S). The model predicts the sub hourly albedo with an overall mean absolute error of 0.103. The highest errors occur for measured albedo values higher than 0.9. Considering that this is an extreme value coincident with low measured values of incoming short wave radiation, it is reasonable to assume that such values include errors due to censored data. Assuming a maximum albedo of 0.9 improved the accuracy of the model, reducing the MAE to less than 0.1.
Considering that the model was successfully verified both in the inner tropics and the outer tropics, this model is a valuable contribution that may be used to project future scenarios in tropical glaciers. This research is developed within the GRANDE project (Glacier Retreat impact Assessment and National policy Development), financed by SATREPS from JST-JICA.
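The error measure and the 0.9 cap mentioned in the abstract above are easy to make concrete. The sketch below is an illustration only: it assumes albedo is taken as the ratio of outgoing to incoming short wave radiation (the two quantities the station records), and the helper names (albedo, cap, mean_absolute_error) and any sample values are hypothetical rather than taken from the study.

```python
def albedo(incoming_sw, outgoing_sw):
    """Measured albedo as reflected over incoming short wave radiation."""
    return outgoing_sw / incoming_sw


def cap(values, maximum=0.9):
    """Clip albedo values to an assumed physical maximum."""
    return [min(v, maximum) for v in values]


def mean_absolute_error(measured, predicted):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


# Made-up half-hourly values, only to show the effect of the cap.
measured = [0.95, 0.60, 0.80]
predicted = [0.85, 0.65, 0.80]
print(mean_absolute_error(measured, predicted))
print(mean_absolute_error(cap(measured), predicted))
```

Because the ratio is unstable when incoming radiation is very low, capping the measured values removes some of the largest apparent errors, which is consistent with the reduction in MAE the abstract reports.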
Modeling small-scale variability in the composition of goshawk habitat on the Kaibab National Forest
Suzanne M. Joy; Robin M. Reich; Richard T. Reynolds
2000-01-01
We used field data, topographical information (elevation, slope, aspect, landform), and Landsat Thematic Mapper imagery to model forest vegetative types to a 10-m resolution on the Kaibab National Forest in northern Arizona. Forest types were identified by clustering the field data and then using a decision tree based on the spectral characteristics of a Landsat image...
Shi, Guo; Zhang, Shun-xiang
2013-03-01
To synthesize relevant data and to analyze the benefit-cost ratio on strategies related to preventing the maternal-infantile transmission of hepatitis B virus infection and to explore the optimal strategy. A decision tree model was constructed according to the strategies of hepatitis B immunization and a Markov model was conducted to simulate the complex disease progress after HBV infection. Parameters in the models were drawn from meta-analysis and information was collected from field study and review of literature. Economic evaluation was performed to calculate costs, benefit, and the benefit-cost ratio. Sensitivity analysis was also conducted and a tornado graph was drawn. In view of the current six possible strategies in preventing maternal-infantile transmission of hepatitis B virus infection, a multi-stage decision tree model was constructed to screen hepatitis B surface antigen (HBsAg) or screen for HBsAg then hepatitis B e antigen (HBeAg). Dose and the number of injections of HBIG and hepatitis B vaccine were taken into consideration in the model. All the strategies were considered to be cost-saving, while the strategy of screening for HBsAg and then offering hepatitis B vaccine of 10 µg×3 for all neonates with hepatitis B immunoglobulin (HBIG) of 100 IU×1 for the neonates born to mothers who tested positive for HBsAg appeared to be the most cost-saving. In the strategies, the benefit-cost ratio of using 100 IU HBIG was similar to 200 IU HBIG, and one shot of HBIG was superior to two shots. Results from the sensitivity analysis suggested that the rates of immunization and the efficacy of the strategy in preventing maternal-infantile transmission were the main sensitive variables in the model.
The passive-active immune-prophylaxis strategy using 10 µg hepatitis B vaccine combined with 100 IU HBIG seemed to be the optimal strategy in preventing maternal-infantile transmission, while the rates of immunization and the efficacy of the strategy played the key roles in choosing the ideal strategy.
NASA Astrophysics Data System (ADS)
Jin, Shan
This dissertation concerns power system expansion planning under different market mechanisms. The thesis follows a three paper format, in which each paper emphasizes a different perspective. The first paper investigates the impact of market uncertainties on a long term centralized generation expansion planning problem. The problem is modeled as a two-stage stochastic program with uncertain fuel prices and demands, which are represented as probabilistic scenario paths in a multi-period tree. Two measurements, expected cost (EC) and Conditional Value-at-Risk (CVaR), are used to minimize, respectively, the total expected cost among scenarios and the risk of incurring high costs in unfavorable scenarios. We sample paths from the scenario tree to reduce the problem scale and determine the sufficient number of scenarios by computing confidence intervals on the objective values. The second paper studies an integrated electricity supply system including generation, transmission and fuel transportation with a restructured wholesale electricity market. This integrated system expansion problem is modeled as a bi-level program in which a centralized system expansion decision is made in the upper level and the operational decisions of multiple market participants are made in the lower level. The difficulty of solving a bi-level programming problem to global optimality is discussed and three problem relaxations obtained by reformulation are explored. The third paper solves a more realistic market-based generation and transmission expansion problem. It focuses on interactions among a centralized transmission expansion decision and decentralized generation expansion decisions. It allows each generator to make its own strategic investment and operational decisions both in response to a transmission expansion decision and in anticipation of a market price settled by an Independent System Operator (ISO) market clearing problem. 
The model poses a complicated tri-level structure including an equilibrium problem with equilibrium constraints (EPEC) sub-problem. A hybrid iterative algorithm is proposed to solve the problem efficiently and reliably.
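The two risk measures named in the first paper, expected cost (EC) and Conditional Value-at-Risk (CVaR), can be illustrated on a small discrete scenario set. The sketch below is not the dissertation's model: it assumes the widely used Rockafellar-Uryasev representation of CVaR, and the function names, scenario costs and probabilities are hypothetical.

```python
from typing import Sequence


def expected_cost(costs: Sequence[float], probs: Sequence[float]) -> float:
    """Probability-weighted average cost over all scenarios."""
    return sum(p * c for c, p in zip(costs, probs))


def cvar(costs: Sequence[float], probs: Sequence[float], alpha: float) -> float:
    """CVaR at level alpha via min over eta of eta + E[max(cost - eta, 0)] / (1 - alpha).

    Assumption: discrete scenarios with probabilities summing to 1.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")

    def objective(eta: float) -> float:
        excess = sum(p * max(c - eta, 0.0) for c, p in zip(costs, probs))
        return eta + excess / (1.0 - alpha)

    # The objective is piecewise linear and convex in eta, so a minimum
    # is attained at one of the scenario costs.
    return min(objective(c) for c in costs)


# Hypothetical scenario costs with equal probabilities.
scenario_costs = [10.0, 20.0, 30.0, 40.0]
scenario_probs = [0.25, 0.25, 0.25, 0.25]
ec = expected_cost(scenario_costs, scenario_probs)
cvar_50 = cvar(scenario_costs, scenario_probs, alpha=0.5)
```

With equally likely scenarios and alpha = 0.5, CVaR equals the average of the worst half of the costs, which is why it penalizes unfavorable scenarios more than the expected cost does.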
What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis
ERIC Educational Resources Information Center
Thomas, Emily H.; Galambos, Nora
2004-01-01
To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…
Management of precancerous cervical lesions in Iran: a cost minimizing study.
Nahvijou, Azin; Sari, Ali Akbari; Zendehdel, Kazem; Marnani, Ahmad Barati
2014-01-01
Cervical cancer is a common, preventable and manageable disease in women worldwide. This study was conducted to determine the cost of follow-up for suspicious precancerous cervical lesions within a screening program using Pap smear or HPV DNA test through a decision tree. Patient follow-up processes were determined using standard guidelines and consultation with specialists to design a decision tree model. Costs of treatment in both public and private sectors were identified according to the national tariffs in 2010 and determined based on the decision tree and provided services (visits to specialists, colposcopy, and conization) with two modalities: Pap smear and HPV DNA test. The number of patients and the mean cost of treatment in each sector were calculated. The prevalence of lesions and HPV were obtained from the literature to estimate the cost of treatment for each woman in the population. Follow-up costs were determined using seven processes for Pap smear and 11 processes for HPV DNA test. The total cost of using the Pap smear and HPV DNA processes for each woman in the population was $36.1 and $174, respectively. The follow-up process for patients with suspicious cervical lesions needs to be included in the existing screening program. Because the HPV DNA test is currently more expensive than the Pap smear, it is suggested that precancerous cervical lesions be managed with the latter test.
NASA Astrophysics Data System (ADS)
Luo, Qiu; Xin, Wu; Qiming, Xiong
2017-06-01
In vegetation information extraction from remote sensing data, phenological features and the low performance of remote sensing analysis algorithms are often not considered. To address this problem, a method for extracting vegetation information is proposed, based on EVI time series and decision-tree classification with multi-source branch similarity. Firstly, to improve the time-series stability of recognition accuracy, the seasonal features of vegetation are extracted based on the fitting span range of the time series. Secondly, decision-tree similarity is determined by adaptive selection of the path or of the probability parameter of component prediction. This similarity serves as an index to evaluate the degree of task association, to decide whether to perform migration of the multi-source decision tree, and to ensure the speed of migration. Finally, the accuracy of classification and recognition of pests and diseases reaches 87%-98% for commercial Dalbergia hainanensis forest, which is significantly better than the MODIS coverage accuracy of 80%-96% in this area. The validity of the proposed method is thereby verified.
Tree-, stand- and site-specific controls on landscape-scale patterns of transpiration
NASA Astrophysics Data System (ADS)
Kathrin Hassler, Sibylle; Weiler, Markus; Blume, Theresa
2018-01-01
Transpiration is a key process in the hydrological cycle, and a sound understanding and quantification of transpiration and its spatial variability is essential for management decisions as well as for improving the parameterisation and evaluation of hydrological and soil-vegetation-atmosphere transfer models. For individual trees, transpiration is commonly estimated by measuring sap flow. Besides evaporative demand and water availability, tree-specific characteristics such as species, size or social status control sap flow amounts of individual trees. Within forest stands, properties such as species composition, basal area or stand density additionally affect sap flow, for example via competition mechanisms. Finally, sap flow patterns might also be influenced by landscape-scale characteristics such as geology and soils, slope position or aspect because they affect water and energy availability; however, little is known about the dynamic interplay of these controls. We studied the relative importance of various tree-, stand- and site-specific characteristics with multiple linear regression models to explain the variability of sap velocity measurements in 61 beech and oak trees, located at 24 sites across a 290 km² catchment in Luxembourg. For each of 132 consecutive days of the growing season of 2014 we modelled the daily sap velocity and derived sap flow patterns of these 61 trees, and we determined the importance of the different controls. Results indicate that a combination of mainly tree- and site-specific factors controls sap velocity patterns in the landscape, namely tree species, tree diameter, geology and aspect. For sap flow we included only the stand- and site-specific predictors in the models to ensure variable independence. Of those, geology and aspect were most important. Compared to these predictors, spatial variability of atmospheric demand and soil moisture explains only a small fraction of the variability in the daily datasets. 
However, the temporal dynamics of the explanatory power of the tree-specific characteristics, especially species, are correlated to the temporal dynamics of potential evaporation. We conclude that transpiration estimates on the landscape scale would benefit from not only consideration of hydro-meteorological drivers, but also tree, stand and site characteristics in order to improve the spatial and temporal representation of transpiration for hydrological and soil-vegetation-atmosphere transfer models.
[Parameter of evidence-based medicine in health care economics].
Wasem, J; Siebert, U
1999-08-01
In view of the scarcity of resources, economic evaluations in health care, in which not only the effects but also the costs of a medical intervention are examined and an incremental cost-outcome ratio is built, are an important supplement to the program of evidence-based medicine. Outcomes of a medical intervention can be measured by clinical effectiveness, quality-adjusted life years, and monetary evaluation of benefits. As far as costs are concerned, direct medical costs, direct non-medical costs and indirect costs have to be considered in an economic evaluation. Data can be taken from primary studies or secondary analyses; meta-analysis may be adequate for synthesizing data. For the calculation of incremental cost-benefit ratios, decision-analytic models (decision tree models, Markov models) are often necessary. Methodological and ethical limits on applying the results of economic evaluations to resource allocation decisions in health care have to be taken into account: economic evaluations and the calculation of cost-outcome ratios should only support decision making but cannot replace it.
Prediction of strontium bromide laser efficiency using cluster and decision tree analysis
NASA Astrophysics Data System (ADS)
Iliev, Iliycho; Gocheva-Ilieva, Snezhana; Kulin, Chavdar
2018-01-01
The subject of investigation is a new high-powered strontium bromide (SrBr2) vapor laser emitting in a multiline region of wavelengths. The laser is an alternative to atomic strontium lasers and free electron lasers, especially at the 6.45 μm line, which is used in surgery for medical processing of biological tissues and bones with minimal damage. In this paper the experimental data from measurements of operational and output characteristics of the laser are statistically processed by means of cluster analysis and tree-based regression techniques. The aim is to extract from the available data the more important relationships and dependences that influence the increase of the overall laser efficiency. A set of cluster models is constructed and analyzed. Using different cluster methods, it is shown that the seven investigated operational characteristics (laser tube diameter, length, supplied electrical power, and others) and laser efficiency are combined in 2 clusters. From regression tree models built with the Classification and Regression Trees (CART) technique, dependences are obtained that predict the values of efficiency, and especially the maximum efficiency, with over 95% accuracy.
Kempfle, Judith S.; BuSaba, Nicholas Y.; Dobrowski, John M.; Westover, Michael B.; Bianchi, Matt T.
2017-01-01
Objectives/Hypothesis: Nasal surgery has been implicated to improve continuous positive airway pressure (CPAP) compliance in patients with obstructive sleep apnea (OSA) and nasal obstruction. However, the cost-effectiveness of nasal surgery to improve CPAP compliance is not known. We modeled the cost-effectiveness of two types of nasal surgery versus no surgery in patients with OSA and nasal obstruction undergoing CPAP therapy. Study Design: Cost-effectiveness decision tree model. Methods: We built a decision tree model to identify conditions under which nasal surgery would be cost-effective to improve CPAP adherence over the standard of care. We compared turbinate reduction and septoplasty to nonsurgical treatment over varied time horizons from a third-party payer perspective. We included variables for cost of untreated OSA, surgical cost and complications, improved compliance postoperatively, and quality of life. Results: Our study identified nasal surgery as a cost-effective strategy to improve compliance of OSA patients using CPAP across a range of plausible model assumptions regarding the cost of untreated OSA, the probability of adherence improvement, and a chronic time horizon. The relatively lower surgical cost of turbinate reduction made it more cost-effective at earlier time horizons, whereas septoplasty became cost-effective after a longer timespan. Conclusions: Across a range of plausible values in a clinically relevant decision model, nasal surgery is a cost-effective strategy to improve CPAP compliance in OSA patients with nasal obstruction. Our results suggest that OSA patients with nasal obstruction who struggle with CPAP therapy compliance should undergo evaluation for nasal surgery. PMID:27653626
Detection of fraudulent financial statements using the hybrid data mining approach.
Chen, Suduan
2016-01-01
The purpose of this study is to construct a valid and rigorous fraudulent financial statement detection model. The research objects are companies which experienced both fraudulent and non-fraudulent financial statements between the years 2002 and 2013. In the first stage, two decision tree algorithms, including the classification and regression trees (CART) and the Chi squared automatic interaction detector (CHAID) are applied in the selection of major variables. The second stage combines CART, CHAID, Bayesian belief network, support vector machine and artificial neural network in order to construct fraudulent financial statement detection models. According to the results, the detection performance of the CHAID-CART model is the most effective, with an overall accuracy of 87.97 % (the FFS detection accuracy is 92.69 %).
NASA Astrophysics Data System (ADS)
Oza, Nikunj
2012-03-01
A supervised learning task involves constructing a mapping from input data (normally described by several features) to the appropriate outputs. A set of training examples— examples with known output values—is used by a learning algorithm to generate a model. This model is intended to approximate the mapping between the inputs and outputs. This model can be used to generate predicted outputs for inputs that have not been seen before. Within supervised learning, one type of task is a classification learning task, in which each output is one or more classes to which the input belongs. For example, we may have data consisting of observations of sunspots. In a classification learning task, our goal may be to learn to classify sunspots into one of several types. Each example may correspond to one candidate sunspot with various measurements or just an image. A learning algorithm would use the supplied examples to generate a model that approximates the mapping between each supplied set of measurements and the type of sunspot. This model can then be used to classify previously unseen sunspots based on the candidate’s measurements. The generalization performance of a learned model (how closely the target outputs and the model’s predicted outputs agree for patterns that have not been presented to the learning algorithm) would provide an indication of how well the model has learned the desired mapping. More formally, a classification learning algorithm L takes a training set T as its input. The training set consists of |T| examples or instances. It is assumed that there is a probability distribution D from which all training examples are drawn independently—that is, all the training examples are independently and identically distributed (i.i.d.). 
The ith training example is of the form (x_i, y_i), where x_i is a vector of values of several features and y_i represents the class to be predicted. In the sunspot classification example given above, each training example would represent one sunspot's classification (y_i) and the corresponding set of measurements (x_i). The output of a supervised learning algorithm is a model h that approximates the unknown mapping from the inputs to the outputs. In our example, h would map from the sunspot measurements to the type of sunspot. We may have a test set S, a set of examples not used in training, that we use to test how well the model h predicts the outputs on new examples. Just as with the examples in T, the examples in S are assumed to be independent and identically distributed (i.i.d.) draws from the distribution D. We measure the error of h on the test set as the proportion of test cases that h misclassifies: $\frac{1}{|S|} \sum_{(x,y) \in S} I(h(x) \neq y)$, where I(v) is the indicator function, which returns 1 if v is true and 0 otherwise. In our sunspot classification example, we would identify additional examples of sunspots that were not used in generating the model, and use these to determine how accurate the model is, that is, the fraction of the test samples that the model classifies correctly. An example of a classification model is the decision tree shown in Figure 23.1. We will discuss the decision tree learning algorithm in more detail later; for now, we assume that, given a training set with examples of sunspots, this decision tree is derived. This can be used to classify previously unseen examples of sunspots. For example, if a new sunspot's inputs indicate that its "Group Length" is in the range 10-15, then the decision tree would classify the sunspot as being of type "E," whereas if the "Group Length" is "NULL," the "Magnetic Type" is "bipolar," and the "Penumbra" is "rudimentary," then it would be classified as type "C." 
In this chapter, we will add to the above description of classification problems. We will discuss decision trees and several other classification models. In particular, we will discuss the learning algorithms that generate these classification models, how to use them to classify new examples, and the strengths and weaknesses of these models. We will end with pointers to further reading on classification methods applied to astronomy data.
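The test-set error defined in this abstract, the fraction of examples in S that the model h misclassifies, can be sketched in a few lines. The example below is illustrative only: the dictionary keys, the test examples and their labels are hypothetical, and the toy tree encodes just the two branches of Figure 23.1 that the text describes, returning None for every other input.

```python
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, str]
Example = Tuple[Features, str]


def misclassification_rate(h: Callable[[Features], Optional[str]],
                           test_set: List[Example]) -> float:
    """Proportion of test examples (x, y) for which h(x) != y."""
    if not test_set:
        raise ValueError("test set must not be empty")
    return sum(1 for x, y in test_set if h(x) != y) / len(test_set)


def sunspot_tree(x: Features) -> Optional[str]:
    """Toy model covering only the two branches described in the text."""
    if x.get("group_length") == "10-15":
        return "E"
    if (x.get("group_length") == "NULL"
            and x.get("magnetic_type") == "bipolar"
            and x.get("penumbra") == "rudimentary"):
        return "C"
    return None  # branches not described in the abstract are left unknown


# Hypothetical test set S; the labels are made up for illustration.
S: List[Example] = [
    ({"group_length": "10-15"}, "E"),
    ({"group_length": "NULL", "magnetic_type": "bipolar",
      "penumbra": "rudimentary"}, "C"),
    ({"group_length": "10-15"}, "D"),
    ({"group_length": "10-15"}, "E"),
]
error = misclassification_rate(sunspot_tree, S)
```

The accuracy mentioned in the text is simply one minus this error.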
Otsuka, Momoka; Uchida, Yuki; Kawaguchi, Takumi; Taniguchi, Eitaro; Kawaguchi, Atsushi; Kitani, Shingo; Itou, Minoru; Oriishi, Tetsuharu; Kakuma, Tatsuyuki; Tanaka, Suiko; Yagi, Minoru; Sata, Michio
2012-10-01
Dietary habits are involved in the development of chronic inflammation; however, the impact of dietary profiles of hepatitis C virus carriers with persistently normal alanine transaminase levels (HCV-PNALT) remains unclear. The decision-tree algorithm is a data-mining statistical technique, which uncovers meaningful profiles of factors from a data collection. We aimed to investigate dietary profiles associated with HCV-PNALT using a decision-tree algorithm. Twenty-seven HCV-PNALT and 41 patients with chronic hepatitis C were enrolled in this study. Dietary habit was assessed using a validated semiquantitative food frequency questionnaire. A decision-tree algorithm was created from dietary variables, and was evaluated by area under the receiver operating characteristic curve analysis (AUROC). In multivariate analysis, fish to meat ratio, dairy product and cooking oils were identified as independent variables associated with HCV-PNALT. The decision-tree algorithm was created with two variables: a fish to meat ratio and cooking oils/ideal bodyweight. When subjects showed a fish to meat ratio of 1.24 or more, 68.8% of the subjects were HCV-PNALT. On the other hand, 11.5% of the subjects were HCV-PNALT when subjects showed a fish to meat ratio of less than 1.24 and cooking oil/ideal bodyweight of less than 0.23 g/kg. The difference in the proportion of HCV-PNALT between these groups is significant (odds ratio 16.87, 95% CI 3.40-83.67, P = 0.0005). Fivefold cross-validation of the decision-tree algorithm showed an AUROC of 0.6947 (95% CI 0.5656-0.8238, P = 0.0067). The decision-tree algorithm disclosed that fish to meat ratio and cooking oil/ideal bodyweight were associated with HCV-PNALT. © 2012 The Japan Society of Hepatology.
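The odds ratio and confidence interval quoted above can be illustrated with a standard 2x2 calculation. The abstract does not give group sizes, so the counts below are an assumption: 11 of 16 subjects (68.8%) and 3 of 26 subjects (11.5%) are back-calculated values that match the reported percentages and give an odds ratio of 253/15, about 16.87. The log-based (Woolf) interval is likewise an assumed method, not one the authors state.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio (a*d)/(b*c) with a Woolf (log-scale) confidence interval.

    a, b: cases and non-cases in group 1; c, d: cases and non-cases in group 2.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Assumed counts (not stated in the abstract):
# fish/meat ratio >= 1.24: 11 HCV-PNALT, 5 chronic hepatitis C
# low ratio and low cooking oil: 3 HCV-PNALT, 23 chronic hepatitis C
or_value, ci_low, ci_high = odds_ratio_ci(11, 5, 3, 23)
```

Under these assumed counts the interval comes out close to the reported 3.40-83.67, which suggests, but does not prove, that a calculation of this kind was used.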
Martínez-Martínez, F; Rupérez-Moreno, M J; Martínez-Sober, M; Solves-Llorens, J A; Lorente, D; Serrano-López, A J; Martínez-Sanchis, S; Monserrat, C; Martín-Guerrero, J D
2017-11-01
This work presents a data-driven method to simulate, in real-time, the biomechanical behavior of the breast tissues in some image-guided interventions such as biopsies or radiotherapy dose delivery as well as to speed up multimodal registration algorithms. Ten real breasts were used for this work. Their deformation due to the displacement of two compression plates was simulated off-line using the finite element (FE) method. Three machine learning models were trained with the data from those simulations. Then, they were used to predict in real-time the deformation of the breast tissues during the compression. The models were a decision tree and two tree-based ensemble methods (extremely randomized trees and random forest). Two different experimental setups were designed to validate and study the performance of these models under different conditions. The mean 3D Euclidean distance between nodes predicted by the models and those extracted from the FE simulations was calculated to assess the performance of the models in the validation set. The experiments proved that extremely randomized trees performed better than the other two models. The mean error committed by the three models in the prediction of the nodal displacements was under 2 mm, a threshold usually set for clinical applications. The time needed for breast compression prediction is sufficiently short to allow its use in real-time (<0.2 s). Copyright © 2017 Elsevier Ltd. All rights reserved.
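The validation metric described here, the mean 3D Euclidean distance between predicted node positions and those from the FE simulation, can be sketched as follows. The function name and the sample coordinates are hypothetical; the abstract does not describe the authors' implementation.

```python
import math
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]


def mean_nodal_error(predicted: Sequence[Point3D],
                     reference: Sequence[Point3D]) -> float:
    """Mean Euclidean distance between corresponding 3D nodes."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty node lists of equal length")
    total = sum(math.dist(p, r) for p, r in zip(predicted, reference))
    return total / len(predicted)


# Hypothetical node positions in millimetres.
model_nodes = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
fe_nodes = [(0.0, 0.0, 1.0), (1.0, 1.0, 3.0)]
error_mm = mean_nodal_error(model_nodes, fe_nodes)
clinically_acceptable = error_mm < 2.0  # 2 mm threshold quoted in the abstract
```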
NASA Astrophysics Data System (ADS)
Wang, Hongcui; Kawahara, Tatsuya
CALL (Computer Assisted Language Learning) systems using ASR (Automatic Speech Recognition) for second language learning have received increasing interest recently. However, it still remains a challenge to achieve high speech recognition performance, including accurate detection of erroneous utterances by non-native speakers. Conventionally, possible error patterns, based on linguistic knowledge, are added to the lexicon and language model, or the ASR grammar network. However, this approach easily falls in the trade-off of coverage of errors and the increase of perplexity. To solve the problem, we propose a method based on a decision tree to learn effective prediction of errors made by non-native speakers. An experimental evaluation with a number of foreign students learning Japanese shows that the proposed method can effectively generate an ASR grammar network, given a target sentence, to achieve both better coverage of errors and smaller perplexity, resulting in significant improvement in ASR accuracy.
Analysis of data mining classification by comparison of C4.5 and ID algorithms
NASA Astrophysics Data System (ADS)
Sudrajat, R.; Irianingsih, I.; Krisnawan, D.
2017-01-01
The rapid development of information technology has been triggered by its intensive use. For example, data mining is widely used in investment. Among the many techniques that can assist in investment, the method used here for classification is the decision tree. Decision trees have a variety of algorithms, such as C4.5 and ID3. The two algorithms can generate different models, with different accuracy, for similar data sets. With discrete data, the C4.5 and ID3 algorithms achieve accuracies of 87.16% and 99.83%, respectively, and the C4.5 algorithm with numerical data achieves 89.69%. With discrete data, the C4.5 and ID3 algorithms yield 520 and 598 customers, respectively, and the C4.5 algorithm with numerical data yields 546 customers. The analysis shows that both algorithms classify quite well, because the error rate is less than 15%.
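One well-known reason the two algorithms can produce different trees from the same data is their split criterion: ID3 ranks attributes by information gain, while C4.5 uses the gain ratio, which divides the gain by the entropy of the split itself. The sketch below illustrates that difference only; the attribute values and labels are hypothetical and are not taken from the study's data.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def entropy(items: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of items."""
    n = len(items)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(items).values())


def information_gain(values: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """ID3 criterion: label entropy minus weighted entropy after the split."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values: Sequence[Hashable],
               labels: Sequence[Hashable]) -> float:
    """C4.5 criterion: information gain divided by the split information."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


# Hypothetical data: a two-valued attribute and an ID-like attribute.
labels = ["yes", "yes", "no", "no"]
binary_attr = ["a", "a", "b", "b"]
id_like_attr = ["1", "2", "3", "4"]
```

Both attributes separate the classes perfectly, so their information gain is identical, but the gain ratio halves the score of the ID-like attribute because its split information is twice as large.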
NASA Astrophysics Data System (ADS)
Xu, Yan; Dong, Zhao Yang; Zhang, Rui; Wong, Kit Po
2014-02-01
Maintaining transient stability is a basic requirement for secure power system operations. Preventive control deals with modifying the system operating point to withstand probable contingencies. In this article, a decision tree (DT)-based on-line preventive control strategy is proposed for transient instability prevention of power systems. Given a stability database, a distance-based feature estimation algorithm is first applied to identify the critical generators, which are then used as features to develop a DT. By interpreting the splitting rules of the DT, preventive control is realised by formulating the rules in a standard optimal power flow model and solving it. The proposed method is transparent in its control mechanism, compatible with on-line computation, and convenient for dealing with multiple contingencies. The effectiveness and efficiency of the method have been verified on the New England 10-machine 39-bus test system.
Study on Ecological Risk Assessment of Guangxi Coastal Zone Based on 3S Technology
NASA Astrophysics Data System (ADS)
Zhong, Z.; Luo, H.; Ling, Z. Y.; Huang, Y.; Ning, W. Y.; Tang, Y. B.; Shao, G. Z.
2018-05-01
This paper takes the Guangxi coastal zone as the study area and, following land use type standards, divides the ecological landscape of the coastal zone into seven landscape types such as woodland, farmland, grassland, water, urban land and wetlands. Using TM data covering the 15 years from 2000 to 2015, the CART decision tree algorithm is applied to analyze the characteristics of each landscape type in the remote sensing images and to build decision tree rules for landscape classification and information extraction. By analyzing the evolution of the landscape pattern in the Guangxi coastal zone over nearly 15 years, the distribution characteristics and change rules can be understood. Combined with natural disaster data, landscape indices and the related risk interference degree are used to construct an ecological risk evaluation model and to obtain ecological risk assessment results for the Guangxi coastal zone.
NASA Astrophysics Data System (ADS)
Saran, Sameer; Sterk, Geert; Kumar, Suresh
2007-10-01
Land use/cover is an important watershed surface characteristic that affects surface runoff and erosion. Many of the available hydrological models divide the watershed into Hydrological Response Units (HRUs), which are spatial units with expected similar hydrological behaviours. The division into HRUs requires good-quality spatial data on land use/cover. This paper presents different approaches to attain an optimal land use/cover map based on remote sensing imagery for a Himalayan watershed in northern India. First, digital classifications using a maximum likelihood classifier (MLC) and a decision tree classifier were applied. The results obtained from the decision tree were better and improved further after post-classification sorting. However, the obtained land use/cover map was not sufficient for the delineation of HRUs, since the agricultural land use/cover class did not discriminate between the two major crops in the area, i.e. paddy and maize. Therefore we adopted a visual classification approach using optical data alone and also fused with ENVISAT ASAR data. This second step, with a detailed classification system, resulted in better classification accuracy within the 'agricultural land' class, which will be further combined with topography and soil type to derive HRUs for physically-based hydrological modelling.
On Elementary Affective Decisions: To Like Or Not to Like, That Is the Question
Jacobs, Arthur; Hofmann, Markus J.; Kinder, Annette
2016-01-01
Perhaps the most ubiquitous and basic affective decision of daily life is deciding whether we like or dislike something/somebody, or, in terms of psychological emotion theories, whether the object/subject has positive or negative valence. Indeed, people constantly make such liking decisions within a glimpse and, importantly, often without expecting any obvious benefit or knowing the exact reasons for their judgment. In this paper, we review research on such elementary affective decisions (EADs) that entail no direct overt reward with a special focus on Neurocognitive Poetics and discuss methods and models for investigating the neuronal and cognitive-affective bases of EADs to verbal materials with differing degrees of complexity. In line with evolutionary and appraisal theories of (aesthetic) emotions and data from recent neurocognitive studies, the results of a decision tree modeling approach simulating EADs to single words suggest that a main driving force behind EADs is the extent to which such high-dimensional stimuli are associated with the “basic” emotions joy/happiness and disgust. PMID:27933013
Silva, Neuza; Moreira, Helena; Canavarro, Maria Cristina; Carona, Carlos
2018-01-01
Most children and adolescents with chronic health conditions have impaired health-related quality of life and are at high risk of internalizing and externalizing problems. However, few patients present clinically significant symptoms. Using a decision-tree approach, this study aimed to identify risk profiles for psychological problems based on measures that can be easily scored and interpreted by healthcare professionals in pediatric settings. The participants were 736 children and adolescents between 8–18 years of age with asthma, epilepsy, cerebral palsy, type 1 diabetes or obesity. The children and adolescents completed self-report measures of health-related quality of life (DISABKIDS-10) and psychological problems (Strengths and Difficulties Questionnaire). Sociodemographic and clinical data were collected from their parents/physicians. Children and adolescents were classified into the normal (78.5%) or borderline/clinical range (21.5%) according to the Strengths and Difficulties Questionnaire cut-off values for psychological problems. The overall accuracy of the decision-tree model was 78.1% (sensitivity = 71.5%; specificity = 79.9%), with 4 profiles predicting 71.5% of borderline/clinical cases. The strongest predictor of psychological problems was a health-related quality of life standardized score below the threshold of 57.5 for patients with cerebral palsy, epilepsy or obesity and below 70.0 for patients with asthma or diabetes. Other significant predictors were low socio-economic status, single-parent household, medication intake and younger age. The model showed adequate validity (risk = .28, SE = .02) and accuracy (area under the Receiver Operating Characteristic curve = .84; CI = .80/.87). The identification of pediatric patients at high risk for psychological problems may contribute to a more efficient allocation of health resources, particularly with regard to their referral to specialized psychological assessment and intervention. PMID:29852026
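The reported overall accuracy follows from the sensitivity, the specificity and the share of borderline/clinical cases, as a short calculation shows. This is a consistency check on the published figures under the usual definitions, not a reproduction of the authors' analysis; the function name is hypothetical.

```python
def overall_accuracy(sensitivity: float, specificity: float,
                     prevalence: float) -> float:
    """Accuracy as the prevalence-weighted mix of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Figures quoted in the abstract: 21.5% of participants were in the
# borderline/clinical range, sensitivity 71.5%, specificity 79.9%.
accuracy = overall_accuracy(sensitivity=0.715, specificity=0.799,
                            prevalence=0.215)
```

The result rounds to the 78.1% stated in the abstract.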
Verbakel, Jan Y; Lemiengre, Marieke B; De Burghgraeve, Tine; De Sutter, An; Aertgeerts, Bert; Bullens, Dominique M A; Shinkins, Bethany; Van den Bruel, Ann; Buntinx, Frank
2015-08-07
Acute infection is the most common presentation of children in primary care, with only a few having a serious infection (eg, sepsis, meningitis, pneumonia). To avoid complications or death, early recognition and adequate referral are essential. Clinical prediction rules have the potential to improve diagnostic decision-making for rare but serious conditions. In this study, we aimed to validate a recently developed decision tree in a new but similar population. Diagnostic accuracy study validating a clinical prediction rule. Acutely ill children presenting to ambulatory care in Flanders, Belgium, consisting of general practice and paediatric assessment in outpatient clinics or the emergency department. Physicians were asked to score the decision tree in every child. The outcome of interest was hospital admission for at least 24 h with a serious infection within 5 days after initial presentation. We report the diagnostic accuracy of the decision tree in sensitivity, specificity, likelihood ratios and predictive values. In total, 8962 acute illness episodes were included, of which 283 led to admission to hospital with a serious infection. Sensitivity of the decision tree was 100% (95% CI 71.5% to 100%) at a specificity of 83.6% (95% CI 82.3% to 84.9%) in the general practitioner setting with 17% of children testing positive. In the paediatric outpatient and emergency department setting, sensitivities were below 92%, with specificities below 44.8%. In an independent validation cohort, this clinical prediction rule has been shown to be extremely sensitive in identifying children at risk of hospital admission for a serious infection in general practice, making it suitable for ruling out. NCT02024282. Published by the BMJ Publishing Group Limited.
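The likelihood ratios mentioned among the reported accuracy measures follow directly from sensitivity and specificity. The sketch below applies the standard definitions to the general practice figures quoted above; the function name is hypothetical and the resulting values are derived here, not quoted from the study.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float,
                      specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) = (sens / (1 - spec), (1 - sens) / spec)."""
    if not 0.0 < specificity < 1.0:
        raise ValueError("specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# General practice setting: sensitivity 100%, specificity 83.6%.
lr_pos, lr_neg = likelihood_ratios(sensitivity=1.0, specificity=0.836)
```

A negative likelihood ratio of zero is what makes a negative result useful for ruling out serious infection, although the wide confidence interval on the sensitivity (71.5% to 100%) means the true value may be higher.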
Fearful and Distracted in School: Predicting Bullying among Youths
ERIC Educational Resources Information Center
Brewer, Steven Lawrence, Jr.; Meckley-Brewer, Hannah; Stinson, Philip M.
2017-01-01
Bullying and aggression in schools can have a traumatic and lasting effect on the well-being of children and youths. Using data from the 2013 National Crime Victimization Survey's School Crime Supplement, this study uses a chi-square automatic interaction detection (CHAID) decision tree and logistic regression models to identify factors that…
Climate analyses to assess risks from invasive forest insects: Simple matching to advanced models
Robert C. Venette
2017-01-01
Purpose of Review. The number of invasive alien insects that adversely affect trees and forests continues to increase as do associated ecological, economic, and sociological impacts. Prevention strategies remain the most cost-effective approach to address the issue, but risk management decisions, particularly those affecting international trade,...
ERIC Educational Resources Information Center
Ben-Porath, Denise D.; Koons, Cedar R.
2005-01-01
Several studies have indicated that telephone coaching can play an important role in psychological intervention (Beebe, 2001; Burgess & Chalder, 2001; Meyersberg, 1985). Less well understood, however, is the role of telephone coaching with severe, complex, multiproblem clients, such as those diagnosed with borderline personality disorder.…
Decay fungi of oaks and associated hardwoods for western arborists
Jessie A. Glaeser; Kevin T. Smith
2010-01-01
Examination of trees for the presence and extent of decay should be part of any hazard tree assessment. Identification of the fungi responsible for the decay improves prediction of tree performance and the quality of management decisions, including tree pruning or removal. Scouting for Sudden Oak Death (SOD) in the West has drawn attention to hardwood tree species,...
Which Types of Leadership Styles Do Followers Prefer? A Decision Tree Approach
ERIC Educational Resources Information Center
Salehzadeh, Reza
2017-01-01
Purpose: The purpose of this paper is to propose a new method to find the appropriate leadership styles based on the followers' preferences using the decision tree technique. Design/methodology/approach: Statistical population includes the students of the University of Isfahan. In total, 750 questionnaires were distributed; out of which, 680…
The Americans with Disabilities Act: A Decision Tree for Social Services Administrators
ERIC Educational Resources Information Center
O'Brien, Gerald V.; Ellegood, Christina
2005-01-01
The 1990 Americans with Disabilities Act has had a profound influence on social workers and social services administrators in virtually all work settings. Because of the multiple elements of the act, however, assessing the validity of claims can be a somewhat arduous and complicated task. This article provides a "decision tree" for…
ERIC Educational Resources Information Center
Hwang, Gwo-Jen; Chu, Hui-Chun; Shih, Ju-Ling; Huang, Shu-Hsien; Tsai, Chin-Chung
2010-01-01
A context-aware ubiquitous learning environment is an authentic learning environment with personalized digital supports. While showing the potential of applying such a learning environment, researchers have also indicated the challenges of providing adaptive and dynamic support to individual students. In this paper, a decision-tree-oriented…
A decision tree approach using silvics to guide planning for forest restoration
Sharon M. Hermann; John S. Kush; John C. Gilbert
2013-01-01
We created a decision tree based on silvics of longleaf pine (Pinus palustris) and historical descriptions to develop approaches for restoration management at Horseshoe Bend National Military Park located in central Alabama. A National Park Service goal is to promote structure and composition of a forest that likely surrounded the 1814 battlefield....
ERIC Educational Resources Information Center
Thomas, Emily H.; Galambos, Nora
To investigate how students' characteristics and experiences affect satisfaction, this study used regression and decision-tree analysis with the CHAID algorithm to analyze student opinion data from a sample of 1,783 college students. A data-mining approach identifies the specific aspects of students' university experience that most influence three…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kupriyanov, M. S., E-mail: mikhail.kupriyanov@gmail.com; Shukeilo, E. Y., E-mail: eyshukeylo@gmail.com; Shichkina, J. A., E-mail: strange.y@mail.ru
2015-11-17
The technologies now used in traumatology combine mechanical, electronic, computing, and software tools. There is a growing need for mobile applications that rapidly process data received from medical devices (in particular, wearable devices) and support the formulation of management decisions. This article considers the use of a mathematical method of building decision trees to assess a patient's health condition using data from a wearable device.
NASA Astrophysics Data System (ADS)
Kupriyanov, M. S.; Shukeilo, E. Y.; Shichkina, J. A.
2015-11-01
The technologies now used in traumatology combine mechanical, electronic, computing, and software tools. There is a growing need for mobile applications that rapidly process data received from medical devices (in particular, wearable devices) and support the formulation of management decisions. This article considers the use of a mathematical method of building decision trees to assess a patient's health condition using data from a wearable device.
i-Tree: Tools to assess and manage structure, function, and value of community forests
NASA Astrophysics Data System (ADS)
Hirabayashi, S.; Nowak, D.; Endreny, T. A.; Kroll, C.; Maco, S.
2011-12-01
Trees in urban communities can mitigate many adverse effects associated with anthropogenic activities and climate change (e.g. urban heat island, greenhouse gas, air pollution, and floods). To protect environmental and human health, managers need to make informed decisions regarding urban forest management practices. Here we present the i-Tree suite of software tools (www.itreetools.org) developed by the USDA Forest Service and their cooperators. This software suite can help urban forest managers assess and manage the structure, function, and value of urban tree populations regardless of community size or technical capacity. i-Tree is a state-of-the-art, peer-reviewed Windows GUI- or Web-based software that is freely available, supported, and continuously refined by the USDA Forest Service and their cooperators. Two major features of i-Tree are 1) to analyze current canopy structures and identify potential planting spots, and 2) to estimate the environmental benefits provided by the trees, such as carbon storage and sequestration, energy conservation, air pollution removal, and storm water reduction. To cover diverse forest topologies, various tools were developed within the i-Tree suite: i-Tree Design for points (individual trees), i-Tree Streets for lines (street trees), and i-Tree Eco, Vue, and Canopy (in the order of complexity) for areas (community trees). Once the forest structure is identified with these tools, ecosystem services provided by trees can be estimated with common models and protocols, and reports in the form of texts, charts, and figures are then created for users. Since i-Tree was developed with a client/server architecture, nationwide data in the US such as location-related parameters, weather, streamflow, and air pollution data are stored in the server and retrieved to a user's computer at run-time. Freely available remote-sensed images (e.g. NLCD and Google maps) are also employed to estimate tree canopy characteristics. 
As the demand for i-Tree grows internationally, environmental databases from more countries will be coupled with the software suite. Two more i-Tree applications, i-Tree Forecast and i-Tree Landscape, are now under development. i-Tree Forecast simulates canopy structures for up to 100 years based on planting and mortality rates and adds capabilities for other i-Tree applications to estimate the benefits of future canopy scenarios. While most i-Tree applications employ a spatially lumped approach, i-Tree Landscape employs a spatially distributed approach that allows users to map changes in canopy cover and ecosystem services through time and space. These new i-Tree tools provide an advanced platform for urban managers to assess the impact of current and future urban forests. i-Tree allows managers to promote effective urban forest management and sound arboricultural practices by providing information for advocacy and planning, baseline data for making informed decisions, and standardization for comparisons with other communities.
Parker, G; McCraw, S; Hadzi-Pavlovic, D
2015-07-15
Studies suggest that differentiating melancholic from non-melancholic depressive disorders is advanced by use of illness course as well as symptom variables but, in practice, potentially differentiating variables are generally positioned as having equal value. Judging that differentiating features are more likely to vary in their signal intensity, we sought to determine the number of features required to effect differentiation and their hierarchical order. The 24-item clinician-rated Sydney Melancholia Prototype Index (SMPI-CR) was completed for 364 unipolar depressed patients. The sample was divided into two cohorts according to the recruitment period. An RPART classification tree analysis identified the most discriminating SMPI items in the development sample of 197 patients, and examined the sensitivity and specificity of the diagnostic decisions, then sought to replicate findings in a validation sample of 169 patients. Independent analyses of putative SMPI items identified only seven items as required to discriminate those with clinically-diagnosed melancholic or non-melancholic depression when the conditions were examined separately. An RPART analysis considering differentiation of melancholic and non-melancholic depression in the total samples retained five of those items in the classification tree, three of which were non-symptom items, and with 92% sensitivity and 80% specificity in the development sample. This reduced item set showed 93% sensitivity and 82% specificity in the validation sample. Our clinical judgment of melancholic or non-melancholic depression may not correspond with the clinical logic employed by other clinicians. Only five SMPI items were required to derive a succinct and efficient decision tree, comprising high sensitivity and specificity in differentiating melancholic and non-melancholic depression. 
Current study findings provide an empirical model that could enrich clinicians' approach to differentiating melancholic and non-melancholic depression.
Modelling Mediterranean agro-ecosystems by including agricultural trees in the LPJmL model
NASA Astrophysics Data System (ADS)
Fader, M.; von Bloh, W.; Shi, S.; Bondeau, A.; Cramer, W.
2015-11-01
In the Mediterranean region, climate and land use change are expected to impact on natural and agricultural ecosystems by warming, reduced rainfall, direct degradation of ecosystems and biodiversity loss. Human population growth and socioeconomic changes, notably on the eastern and southern shores, will require increases in food production and put additional pressure on agro-ecosystems and water resources. Coping with these challenges requires informed decisions that, in turn, require assessments by means of a comprehensive agro-ecosystem and hydrological model. This study presents the inclusion of 10 Mediterranean agricultural plants, mainly perennial crops, in an agro-ecosystem model (Lund-Potsdam-Jena managed Land - LPJmL): nut trees, date palms, citrus trees, orchards, olive trees, grapes, cotton, potatoes, vegetables and fodder grasses. The model was successfully tested in three model outputs: agricultural yields, irrigation requirements and soil carbon density. With the development presented in this study, LPJmL is now able to simulate in good detail and mechanistically the functioning of Mediterranean agriculture with a comprehensive representation of ecophysiological processes for all vegetation types (natural and agricultural) and in a consistent framework that produces estimates of carbon, agricultural and hydrological variables for the entire Mediterranean basin. This development paves the way for further model extensions aiming at the representation of alternative agro-ecosystems (e.g. agroforestry), and opens the door for a large number of applications in the Mediterranean region, for example assessments of the consequences of land use transitions, the influence of management practices and climate change impacts.
Modelling Mediterranean agro-ecosystems by including agricultural trees in the LPJmL model
NASA Astrophysics Data System (ADS)
Fader, M.; von Bloh, W.; Shi, S.; Bondeau, A.; Cramer, W.
2015-06-01
Climate and land use change in the Mediterranean region is expected to affect natural and agricultural ecosystems through decreases in precipitation, increases in temperature, biodiversity loss and anthropogenic degradation of natural resources. Demographic growth on the eastern and southern shores will require increases in food production and put additional pressure on agro-ecosystems and water resources. Coping with these challenges requires informed decisions that, in turn, require assessments by means of a comprehensive agro-ecosystem and hydrological model. This study presents the inclusion of 10 Mediterranean agricultural plants, mainly perennial crops, in an agro-ecosystem model (LPJmL): nut trees, date palms, citrus trees, orchards, olive trees, grapes, cotton, potatoes, vegetables and fodder grasses. The model was successfully tested in three model outputs: agricultural yields, irrigation requirements and soil carbon density. With the development presented in this study, LPJmL is now able to simulate in good detail and mechanistically the functioning of Mediterranean agriculture with a comprehensive representation of ecophysiological processes for all vegetation types (natural and agricultural) and in a consistent framework that produces estimates of carbon, agricultural and hydrological variables for the entire Mediterranean basin. This development paves the way for further model extensions aiming at the representation of alternative agro-ecosystems (e.g. agroforestry), and opens the door for a large number of applications in the Mediterranean region, for example assessments of the consequences of land use transitions, the influence of management practices and climate change impacts.
Rau, Cheng-Shyuan; Wu, Shao-Chun; Chien, Peng-Chen; Kuo, Pao-Jen; Chen, Yi-Chun; Hsieh, Hsiao-Yun; Hsieh, Ching-Hua
2017-11-22
Background: In contrast to patients with traumatic subarachnoid hemorrhage (tSAH) in the presence of other types of intracranial hemorrhage, the prognosis of patients with isolated tSAH is good. The incidence of mortality in these patients ranges from 0-2.5%. However, few data or predictive models are available for the identification of patients with a high mortality risk. In this study, we aimed to construct a model for mortality prediction using a decision tree (DT) algorithm, along with data obtained from a population-based trauma registry, in a Level 1 trauma center. Methods: Five hundred and forty-five patients with isolated tSAH, including 533 patients who survived and 12 who died, between January 2009 and December 2016, were allocated to training (n = 377) or test (n = 168) sets. Using the data on demographics and injury characteristics, as well as laboratory data of the patients, classification and regression tree (CART) analysis was performed based on the Gini impurity index, using the rpart function in the rpart package in R. Results: In this established DT model, three nodes (head Abbreviated Injury Scale (AIS) score ≤4, creatinine (Cr) <1.4 mg/dL, and age <76 years) were identified as important determinative variables in the prediction of mortality. Of the patients with isolated tSAH, 60% of those with a head AIS >4 died, as did 57% of those with an AIS score ≤4 but Cr ≥1.4 and age ≥76 years. All patients who did not meet the above-mentioned criteria survived. With all the variables in the model, the DT achieved an accuracy of 97.9% (sensitivity of 90.9% and specificity of 98.1%) and 97.7% (sensitivity of 100% and specificity of 97.7%), for the training set and test set, respectively. Conclusions: The study established a DT model with three nodes (head AIS score ≤4, Cr <1.4, and age <76 years) to predict fatal outcomes in patients with isolated tSAH. The proposed decision-making algorithm may help identify patients with a high risk of mortality.
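The three splits reported in the abstract above can be read as a simple rule. The Python sketch below is only one reading of the abstract, not the authors' fitted rpart model: the function name `high_mortality_risk` and its parameter names are hypothetical, and the thresholds are copied from the text.

```python
def high_mortality_risk(head_ais, creatinine_mg_dl, age_years):
    """Apply the three splits named in the abstract to one patient.

    Returns True for the two leaves the abstract reports as high mortality
    and False for every other combination (reported as all survivors).
    """
    if head_ais > 4:
        return True  # abstract: 60% of these patients died
    # head AIS <= 4: high risk only when both remaining splits are unfavorable
    return creatinine_mg_dl >= 1.4 and age_years >= 76  # abstract: 57% died


# Illustrative inputs only, not registry data.
print(high_mortality_risk(head_ais=5, creatinine_mg_dl=1.0, age_years=40))
print(high_mortality_risk(head_ais=4, creatinine_mg_dl=1.6, age_years=80))
print(high_mortality_risk(head_ais=4, creatinine_mg_dl=1.6, age_years=60))
```

Because the second high-risk leaf requires both conditions, the rule gives the same answer regardless of whether creatinine or age is split first, which the abstract does not specify.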
Classification tree for the assessment of sedentary lifestyle among hypertensive.
Castelo Guedes Martins, Larissa; Venícios de Oliveira Lopes, Marcos; Gomes Guedes, Nirla; Paixão de Menezes, Angélica; de Oliveira Farias, Odaleia; Alves Dos Santos, Naftale
2016-04-01
To develop a classification tree of clinical indicators for the correct prediction of the nursing diagnosis "Sedentary lifestyle" (SL) in people with high blood pressure (HTN). A cross-sectional study was conducted in an outpatient care center specializing in high blood pressure and diabetes mellitus located in northeastern Brazil. The sample consisted of 285 people between 19 and 59 years old diagnosed with high blood pressure. An interview and physical examination were used to obtain socio-demographic information, related factors, and the signs and symptoms that made up the defining characteristics of the diagnosis under study. The tree was generated using the CHAID (Chi-square Automatic Interaction Detection) algorithm. Constructing the decision tree established the interactions between clinical indicators, facilitating a probabilistic analysis of multiple situations and allowing the probability of an individual presenting a sedentary lifestyle to be quantified. The tree included the clinical indicator "Choose daily routine without exercise" as the first node. People with this indicator showed a probability of 0.88 of presenting SL. The second node was composed of the indicator "Does not perform physical activity during leisure", with a 0.99 probability of presenting SL when both indicators were present. The predictive capacity of the tree was established at 69.5%. Decision trees help nurses who care for people with HTN in decision-making by assessing the characteristics that increase the probability of the SL nursing diagnosis, optimizing the time for diagnostic inference.
Prediction of the effect of formulation on the toxicity of chemicals.
Mistry, Pritesh; Neagu, Daniel; Sanchez-Ruiz, Antonio; Trundle, Paul R; Vessey, Jonathan D; Gosling, John Paul
2017-01-01
Two approaches for the prediction of which of two vehicles will result in lower toxicity for anticancer agents are presented. Machine-learning models are developed using decision tree, random forest and partial least squares methodologies and statistical evidence is presented to demonstrate that they represent valid models. Separately, a clustering method is presented that allows the ordering of vehicles by the toxicity they show for chemically-related compounds.
Improving clinical models based on knowledge extracted from current datasets: a new approach.
Mendes, D; Paredes, S; Rocha, T; Carvalho, P; Henriques, J; Morais, J
2016-08-01
Cardiovascular diseases (CVD) are the leading cause of death in the world, and prevention is recognized as a key intervention able to counter this reality. In this context, although several models and scores are currently used in clinical practice to assess the risk of a new cardiovascular event, they present some limitations. The goal of this paper is to improve CVD risk prediction by taking into account the current models as well as information extracted from real and recent datasets. The approach is based on a decision tree scheme in order to assure the clinical interpretability of the model. An innovative optimization strategy is developed to adjust the decision tree thresholds (the rule structure is fixed) based on recent clinical datasets. A real dataset collected within the scope of the National Registry on Acute Coronary Syndromes, Portuguese Society of Cardiology, is used to validate this work. To assess the performance of the new approach, the metrics sensitivity, specificity and accuracy are used. The new approach achieves sensitivity, specificity and accuracy values of 80.52%, 74.19% and 77.27%, respectively, which represents an improvement of about 26% in relation to the accuracy of the original score.
A model of pathways to artificial superintelligence catastrophe for risk and decision analysis
NASA Astrophysics Data System (ADS)
Barrett, Anthony M.; Baum, Seth D.
2017-03-01
An artificial superintelligence (ASI) is an artificial intelligence that is significantly more intelligent than humans in all respects. Whilst ASI does not currently exist, some scholars propose that it could be created sometime in the future, and furthermore that its creation could cause a severe global catastrophe, possibly even resulting in human extinction. Given the high stakes, it is important to analyze ASI risk and factor the risk into decisions related to ASI research and development. This paper presents a graphical model of major pathways to ASI catastrophe, focusing on ASI created via recursive self-improvement. The model uses the established risk and decision analysis modelling paradigms of fault trees and influence diagrams in order to depict combinations of events and conditions that could lead to AI catastrophe, as well as intervention options that could decrease risks. The events and conditions include select aspects of the ASI itself as well as the human process of ASI research, development and management. Model structure is derived from published literature on ASI risk. The model offers a foundation for rigorous quantitative evaluation and decision-making on the long-term risk of ASI catastrophe.
Data mining application in customer relationship management for hospital inpatients.
Lee, Eun Whan
2012-09-01
This study aims to discover patients loyal to a hospital and model their medical service usage patterns. Consequently, this study proposes a data mining application in customer relationship management (CRM) for hospital inpatients. A recency, frequency, monetary (RFM) model was applied to 14,072 patients discharged from a university hospital. Cluster analysis was conducted to segment customers, and the patterns of the loyal customers' medical services usage were modeled via a decision tree. Patients were divided into two groups according to the variables of the RFM model, and the group which had significantly high frequency of medical use and expenses was defined as loyal customers, a target market. According to the decision tree, the predictive factors for loyal clients were: length of stay, certainty of selectable treatment, surgery, number of accompanying treatments, kind of patient room, and department from which they were discharged. In particular, this research showed that when a patient within the internal medicine department who did not have surgery stayed for more than 13.5 days, their probability of being classified as a loyal customer was 70.0%. To discover a hospital's loyal patients and model their medical usage patterns, the application of data mining has been suggested. This paper suggests practical use of combining segmentation, targeting, positioning (STP) strategy and the RFM model with data mining in CRM.
Data Mining Application in Customer Relationship Management for Hospital Inpatients
2012-01-01
Objectives: This study aims to discover patients loyal to a hospital and model their medical service usage patterns. Consequently, this study proposes a data mining application in customer relationship management (CRM) for hospital inpatients. Methods: A recency, frequency, monetary (RFM) model was applied to 14,072 patients discharged from a university hospital. Cluster analysis was conducted to segment customers, and the patterns of the loyal customers' medical services usage were modeled via a decision tree. Results: Patients were divided into two groups according to the variables of the RFM model, and the group which had significantly high frequency of medical use and expenses was defined as loyal customers, a target market. According to the decision tree, the predictive factors for loyal clients were: length of stay, certainty of selectable treatment, surgery, number of accompanying treatments, kind of patient room, and department from which they were discharged. In particular, this research showed that when a patient within the internal medicine department who did not have surgery stayed for more than 13.5 days, their probability of being classified as a loyal customer was 70.0%. Conclusions: To discover a hospital's loyal patients and model their medical usage patterns, the application of data mining has been suggested. This paper suggests practical use of combining segmentation, targeting, positioning (STP) strategy and the RFM model with data mining in CRM. PMID:23115740
A key for the Forest Service hardwood tree grades
Gary W. Miller; Leland F. Hanks; Harry V., Jr. Wiant
1986-01-01
A dichotomous key organizes the USDA Forest Service hardwood tree grade specifications into a stepwise procedure for those learning to grade hardwood sawtimber. The key addresses the major grade factors, tree size, surface characteristics, and allowable cull deductions in a series of paired choices that lead the user to a decision regarding tree grade.
Inferences from growing trees backwards
David W. Green; Kent A. McDonald
1997-01-01
The objective of this paper is to illustrate how longitudinal stress wave techniques can be useful in tracking the future quality of a growing tree. Monitoring the quality of selected trees in a plantation forest could provide early input to decisions on the effectiveness of management practices, or future utilization options, for trees in a plantation. There will...
Morales, Susana; Barros, Jorge; Echávarri, Orietta; García, Fabián; Osses, Alex; Moya, Claudia; Maino, María Paz; Fischman, Ronit; Núñez, Catalina; Szmulewicz, Tita; Tomicic, Alemka
2017-01-01
In efforts to develop reliable methods to detect the likelihood of impending suicidal behaviors, we have proposed the following. To gain a deeper understanding of the state of suicide risk by determining the combination of variables that distinguishes between groups with and without suicide risk. A study involving 707 patients consulting for mental health issues in three health centers in Greater Santiago, Chile. Using 345 variables, an analysis was carried out with artificial intelligence tools, Cross Industry Standard Process for Data Mining processes, and decision tree techniques. The basic algorithm was top-down, and the most suitable division produced by the tree was selected by using the lowest Gini index as a criterion and by looping it until the condition of belonging to the group with suicidal behavior was fulfilled. Four trees distinguishing the groups were obtained, of which the elements of one were analyzed in greater detail, since this tree included both clinical and personality variables. This specific tree consists of six nodes without suicide risk and eight nodes with suicide risk (tree decision 01, accuracy 0.674, precision 0.652, recall 0.678, specificity 0.670, F measure 0.665, receiver operating characteristic (ROC) area under the curve (AUC) 73.35%; tree decision 02, accuracy 0.669, precision 0.642, recall 0.694, specificity 0.647, F measure 0.667, ROC AUC 68.91%; tree decision 03, accuracy 0.681, precision 0.675, recall 0.638, specificity 0.721, F measure, 0.656, ROC AUC 65.86%; tree decision 04, accuracy 0.714, precision 0.734, recall 0.628, specificity 0.792, F measure 0.677, ROC AUC 58.85%). This study defines the interactions among a group of variables associated with suicidal ideation and behavior. By using these variables, it may be possible to create a quick and easy-to-use tool. 
As such, psychotherapeutic interventions could be designed to mitigate the impact of these variables on the emotional state of individuals, thereby reducing eventual risk of suicide. Such interventions may reinforce psychological well-being, feelings of self-worth, and reasons for living, for each individual in certain groups of patients.
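The abstract above describes a top-down tree in which each division is chosen by the lowest Gini index, and it reports an F measure alongside precision and recall. The Python sketch below illustrates both ideas on toy data. It is a generic illustration, not the authors' pipeline: the helper names (`gini`, `best_split`, `f_measure`) and the toy values are assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on one numeric variable with the lowest weighted Gini.

    Returns (threshold, weighted_gini); rows with value <= threshold go left.
    """
    n = len(values)
    best = (None, gini(labels))  # baseline: no split at all
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): one numeric score and a binary risk label.
scores = [1, 2, 3, 4, 5, 6]
risk = [0, 0, 0, 1, 1, 1]
print(best_split(scores, risk))

# Consistency check using the precision and recall reported for tree 01.
print(round(f_measure(0.652, 0.678), 3))
```

A full tree repeats `best_split` over every candidate variable and then recurses into each side until a stopping condition is met. The F measure computed from the reported precision and recall of tree 01 agrees with the 0.665 stated in the abstract.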
NASA Astrophysics Data System (ADS)
Kaur, Parneet; Singh, Sukhwinder; Garg, Sushil; Harmanpreet
2010-11-01
In this paper we study classification algorithms for a farm decision support system (DSS). Results are obtained by applying classification algorithms, i.e. Limited search, ID3, CHAID, C4.5, Improved C4.5 and One VS all Decision Tree, to a common crop data set with a specified class. The tool used to derive the results is SPINA. The graphical results obtained from the tool are compared to suggest the best technique for developing a farm Decision Support System. This analysis would help researchers design effective and fast DSS for farmers to make decisions that enhance their yield.
Tree detection in orchards from VHR satellite images using scale-space theory
NASA Astrophysics Data System (ADS)
Mahour, Milad; Tolpekin, Valentyn; Stein, Alfred
2016-10-01
This study focused on extracting reliable and detailed information from very high resolution (VHR) satellite images for the detection of individual trees in orchards. The images contain detailed information on spectral and geometrical properties of trees. Their scale level, however, is insufficient for spectral properties of individual trees, because adjacent tree canopies interlock. We modeled trees using a bell shaped spectral profile. Identifying the brightest peak was challenging due to sun illumination effects caused by differences in positions of the sun and the satellite sensor. Crown boundary detection was solved by using the NDVI from the same image. We used Gaussian scale-space methods that search for extrema in the scale-space domain. The procedures were tested on two orchards with different tree types, tree sizes and tree observation patterns in Iran. Validation was done using reference data derived from an UltraCam digital aerial photo. Local extrema of the determinant of the Hessian corresponded well to the geographical coordinates and the size of individual trees. False detections arising from a slight asymmetry of trees were distinguished from multiple detections of the same tree with different extents. Uncertainty assessment was carried out on the presence and spatial extents of individual trees. The study demonstrated how the suggested approach can be used for image segmentation for orchards with different types of trees. We concluded that Gaussian scale-space theory can be applied to extract information from VHR satellite images for individual tree detection. This may lead to improved decision making for irrigation and crop water requirement purposes in future studies.
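For reference, the textbook scale-space formulation behind this kind of blob detection can be written as follows. The abstract does not state the authors' exact normalization or notation, so the symbols here (image band $f$, scale parameter $t$, Gaussian kernel $g$) are assumptions taken from the general scale-space literature.

```latex
% Scale-space representation: the image f smoothed by a Gaussian of variance t
L(x, y; t) = g(x, y; t) * f(x, y), \qquad
g(x, y; t) = \frac{1}{2\pi t} \exp\!\left(-\frac{x^{2} + y^{2}}{2t}\right), \qquad t = \sigma^{2}

% Scale-normalized determinant of the Hessian
\det \mathcal{H}_{\mathrm{norm}} L(x, y; t) = t^{2} \left( L_{xx} L_{yy} - L_{xy}^{2} \right)
```

A local extremum of this quantity over $(x, y, t)$ gives both a position and a characteristic scale, which is consistent with the abstract's statement that the extrema corresponded to the coordinates and the size of individual trees.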
A Clinical Decision Support System for Breast Cancer Patients
NASA Astrophysics Data System (ADS)
Fernandes, Ana S.; Alves, Pedro; Jarman, Ian H.; Etchells, Terence A.; Fonseca, José M.; Lisboa, Paulo J. G.
This paper proposes a Web clinical decision support system for clinical oncologists and for breast cancer patients making prognostic assessments, using the particular characteristics of the individual patient. This system comprises three different prognostic modelling methodologies: the clinically widely used Nottingham prognostic index (NPI); the Cox regression modelling and a partial logistic artificial neural network with automatic relevance determination (PLANN-ARD). All three models yield a different prognostic index that can be analysed together in order to obtain a more accurate prognostic assessment of the patient. Missing data is incorporated in the mentioned models, a common issue in medical data that was overcome using multiple imputation techniques. Risk group assignments are also provided through a methodology based on regression trees, where Boolean rules can be obtained expressed with patient characteristics.
Uninjured trees - a meaningful guide to white-pine weevil control decisions
William E. Waters
1962-01-01
The white-pine weevil, Pissodes strobi, is a particularly insidious forest pest that can render a stand of host trees virtually worthless. It rarely, if ever, kills a tree; but the crooks, forks, and internal defects that develop in attacked trees over a period of years may reduce the merchantable volume and value of the tree at harvest age to zero. Dollar losses are...
Compensatory value of urban trees in the United States
David J. Nowak; Daniel E. Crane; John F. Dwyer
2002-01-01
Understanding the value of an urban forest can give decision makers a better foundation for urban tree management. Based on tree-valuation methods of the Council of Tree and Landscape Appraisers and field data from eight cities, total compensatory value of tree populations in U.S. cities ranges from $101 million in Jersey City, New Jersey, to $6.2 billion in New York,...
Prognostic Factors and Decision Tree for Long-term Survival in Metastatic Uveal Melanoma.
Lorenzo, Daniel; Ochoa, María; Piulats, Josep Maria; Gutiérrez, Cristina; Arias, Luis; Català, Jaum; Grau, María; Peñafiel, Judith; Cobos, Estefanía; Garcia-Bru, Pere; Rubio, Marcos Javier; Padrón-Pérez, Noel; Dias, Bruno; Pera, Joan; Caminal, Josep Maria
2017-12-04
The purpose of this study was to demonstrate the existence of a bimodal survival pattern in metastatic uveal melanoma. Secondary aims were to identify the characteristics and prognostic factors associated with long-term survival and to develop a clinical decision tree. The medical records of 99 metastatic uveal melanoma patients were retrospectively reviewed. Patients were classified as either short (≤ 12 months) or long-term survivors (> 12 months) based on a graphical interpretation of the survival curve after diagnosis of the first metastatic lesion. Ophthalmic and oncological characteristics were assessed in both groups. Of the 99 patients, 62 (62.6%) were classified as short-term survivors, and 37 (37.4%) as long-term survivors. The multivariate analysis identified the following predictors of long-term survival: age ≤ 65 years (p=0.012) and unaltered serum lactate dehydrogenase levels (p=0.018); additionally, the size (smaller vs. larger) of the largest liver metastasis showed a trend towards significance (p=0.063). Based on the variables significantly associated with long-term survival, we developed a decision tree to facilitate clinical decision-making. The findings of this study demonstrate the existence of a bimodal survival pattern in patients with metastatic uveal melanoma. The presence of certain clinical characteristics at diagnosis of distant disease is associated with long-term survival. A decision tree was developed to facilitate clinical decision-making and to counsel patients about the expected course of disease.
ERIC Educational Resources Information Center
Tansy, Michael
2009-01-01
The Emotional Disturbance Decision Tree (EDDT) is a teacher-completed norm-referenced rating scale published by Psychological Assessment Resources, Inc., in Lutz, Florida. The 156-item EDDT was developed for use as part of a broader assessment process to screen and assist in the identification of 5- to 18-year-old children for the special…
Phytotechnology Technical and Regulatory Guidance Document
2001-04-01
contaminated media is rather new. Throughout the development process of this document, we referred to the science as “phytoremediation.” Recently...the media containing contaminants, we now refer to “phytotechnologies” as the overarching terminology, while using “phytoremediation” more...publication of the ITRC document, Phytoremediation Decision Tree. The decision tree was designed to allow potential users to take basic information
Özdemir, Merve Erkınay; Telatar, Ziya; Eroğul, Osman; Tunca, Yusuf
2018-05-01
Dysmorphic syndromes present with different facial malformations. These malformations are significant for the early diagnosis of dysmorphic syndromes and contain distinctive information for face recognition. In this study we define the characteristic features of each syndrome by considering facial malformations and automatically classify Fragile X, Hurler, Prader Willi, Down, and Wolf Hirschhorn syndromes and healthy groups. Reference points are marked on the face images, and ratios between the distances separating these points are used as features. We propose a neural network based hierarchical decision tree structure to classify the syndrome types. We also implement k-nearest neighbor (k-NN) and artificial neural network (ANN) classifiers to compare their classification accuracy with that of our hierarchical decision tree. The classification accuracy is 50, 73 and 86.7% with the k-NN, ANN and hierarchical decision tree methods, respectively. The same images were then shown to a clinical expert, who achieved a recognition rate of 46.7%. We develop an efficient system that recognizes different syndrome types automatically and with high accuracy from simple, non-invasive imaging data, independently of the patient's age, sex and race. The promising results indicate that our method can be used by clinical experts for pre-diagnosis of dysmorphic syndromes.
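The feature construction described in the abstract above (ratios between distances of marked reference points, then a k-NN baseline) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names `distance_ratio_features` and `knn_predict`, the use of all pairwise distances, and the Euclidean metric are hypothetical choices, not the authors' published procedure.

```python
import math
from collections import Counter
from itertools import combinations

def distance_ratio_features(points):
    """Ratios between all pairwise landmark distances (assumed feature set).

    Ratios are unchanged when the whole face is uniformly rescaled,
    which is one reason distance ratios are attractive as features.
    """
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    return [a / b for a, b in combinations(dists, 2)]

def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Because only ratios are used, two images of the same face taken at different scales map to nearly the same feature vector, so the classifier does not need a separate size normalization step.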
Intelligent Diagnostic Assistant for Complicated Skin Diseases through C5's Algorithm.
Jeddi, Fatemeh Rangraz; Arabfard, Masoud; Kermany, Zahra Arab
2017-09-01
An intelligent diagnostic assistant can be used for the complicated diagnosis of skin diseases, which are among the most common causes of disability. The aim of this study was to design and implement a computerized intelligent diagnostic assistant for complicated skin diseases through C5's Algorithm. An applied-developmental study was carried out in 2015. The knowledge base was developed from interviews with dermatologists using questionnaires and checklists. Knowledge representation was obtained from the training data in the database using Microsoft Office Excel. Clementine software and C5's Algorithm were applied to draw the decision tree. Test accuracy was analyzed based on rules extracted using inference chains. The rules extracted from the decision tree were defined using the forward chaining inference technique and entered into the CLIPS programming environment as RULE, and the intelligent diagnostic assistant was then designed. The accuracy and error rates obtained from the decision tree in the training phase were 99.56% and 0.44%, respectively. In the test phase, the accuracy of the decision tree was 98% and the error rate was 2%. The intelligent diagnostic assistant can be used as a reliable system with high accuracy, sensitivity, specificity, and agreement.
Data mining for multiagent rules, strategies, and fuzzy decision tree structure
NASA Astrophysics Data System (ADS)
Smith, James F., III; Rhyne, Robert D., II; Fisher, Kristin
2002-03-01
A fuzzy logic based resource manager (RM) has been developed that automatically allocates electronic attack resources in real-time over many dissimilar platforms. Two different data mining algorithms have been developed to determine rules, strategies, and fuzzy decision tree structure. The first data mining algorithm uses a genetic algorithm as a data mining function and is called from an electronic game. The game allows a human expert to play against the resource manager in a simulated battlespace with each of the defending platforms being exclusively directed by the fuzzy resource manager and the attacking platforms being controlled by the human expert or operating autonomously under their own logic. This approach automates the data mining problem. The game automatically creates a database reflecting the domain expert's knowledge. It calls a data mining function, a genetic algorithm, for data mining of the database as required and allows easy evaluation of the information mined in the second step. The criterion for re-optimization is discussed as well as experimental results. Then a second data mining algorithm that uses a genetic program as a data mining function is introduced to automatically discover fuzzy decision tree structures. Finally, a fuzzy decision tree generated through this process is discussed.
Predictors of suicidal ideation in older people: a decision tree analysis.
Handley, Tonelle E; Hiles, Sarah A; Inder, Kerry J; Kay-Lambkin, Frances J; Kelly, Brian J; Lewin, Terry J; McEvoy, Mark; Peel, Roseanne; Attia, John R
2014-11-01
Suicide among older adults is a major public health issue worldwide. Although studies have identified psychological, physical, and social contributors to suicidal thoughts in older adults, few have explored the specific interactions between these factors. This article used a novel statistical approach to explore predictors of suicidal ideation in a community-based sample of older adults. Prospective cohort study. Participants aged 55-85 years were randomly selected from the Hunter Region, a large regional center in New South Wales, Australia. Baseline psychological, physical, and social factors, including psychological distress, physical functioning, and social support, were used to predict suicidal ideation at the 5-year follow-up. Classification and regression tree modeling was used to determine specific risk profiles for participants depending on their individual well-being in each of these key areas. Psychological distress was the strongest predictor, with 25% of people with high distress reporting suicidal ideation. Within high psychological distress, lower physical functioning significantly increased the likelihood of suicidal ideation, with high distress and low functioning being associated with ideation in 50% of cases. A substantial subgroup reported suicidal ideation in the absence of psychological distress; dissatisfaction with social support was the most important predictor among this group. The performance of the model was high (area under the curve: 0.81). Decision tree modeling enabled individualized "risk" profiles for suicidal ideation to be determined. Although psychological factors are important for predicting suicidal ideation, both physical and social factors significantly improved the predictive ability of the model. Assessing these factors may enhance identification of older people at risk of suicidal ideation.
A BiomeBGC-based Evaluation of Dryness Stress of Central European Forests
NASA Astrophysics Data System (ADS)
Buddenbaum, H.; Hientgen, J.; Dotzler, S.; Werner, W.; Hill, J.
2015-04-01
Dryness stress is expected to become a more common problem in central European forests due to the predicted regional climate change. Forest management has to adapt to climate change in time and think ahead several decades in decisions on which tree species to plant at which locations. The summer of 2003 was the most severe dryness event in recent times, but more periods like this are expected. Since forests on different sites react quite differently to drought conditions, we used the process-based growth model BiomeBGC and climate time series from sites all over Germany to simulate the reaction of deciduous and coniferous tree stands under different drought stress conditions. Times with exceptionally high values of water vapour pressure deficit coincided with negative modelled values of net primary production (NPP). In addition, in these warmest periods the usually positive relationship between temperature and NPP was inverted, i.e., under stress conditions, more sunlight does not lead to more photosynthesis but to stomatal closure and reduced productivity. Thus we took negative NPP as an indicator for drought stress. In most regions, 2003 was the year with the most intense stress, but the results were quite variable regionally. We used the MODIS MOD17 gross and net primary production product time series and MOD12 land cover classification to validate the spatial patterns observed in the model runs and found good agreement between modelled and observed behaviour. Thus, BiomeBGC simulations with realistic site parameterization and climate data in combination with species- and variety-specific ecophysiological constants can be used to assist in decisions on which trees to plant on a given site.
Hatz, Maximilian H M; Leidl, Reiner; Yates, Nichola A; Stollenwerk, Björn
2014-04-01
Thrombosis inhibitors can be used to treat acute coronary syndromes (ACS). However, there are various alternative treatment strategies, of which some have been compared using health economic decision models. To assess the quality of health economic decision models comparing thrombosis inhibitors in patients with ACS undergoing percutaneous coronary intervention, and to identify areas for quality improvement. The literature databases MEDLINE, EMBASE, EconLit, National Health Service Economic Evaluation Database (NHS EED), Database of Abstracts of Reviews of Effects (DARE) and Health Technology Assessment (HTA). A review of the quality of health economic decision models was conducted by two independent reviewers, using the Philips checklist. Twenty-one relevant studies were identified. Differences were apparent regarding the model type (six decision trees, four Markov models, eight combinations, three undefined models), the model structure (types of events, Markov states) and the incorporation of data (efficacy, cost and utility data). Critical issues were the absence of particular events (e.g. thrombocytopenia, stroke) and questionable usage of utility values within some studies. As we restricted our search to health economic decision models comparing thrombosis inhibitors, interesting aspects related to the quality of studies of adjacent medical areas that compared stents or procedures could have been missed. This review identified areas where recommendations are indicated regarding the quality of future ACS decision models. For example, all critical events and relevant treatment options should be included. Models also need to allow for changing event probabilities to correctly reflect ACS and to incorporate appropriate, age-specific utility values and decrements when conducting cost-utility analyses.
Machine Learning Through Signature Trees. Applications to Human Speech.
ERIC Educational Resources Information Center
White, George M.
A signature tree is a binary decision tree used to classify unknown patterns. An attempt was made to develop a computer program for manipulating signature trees as a general research tool for exploring machine learning and pattern recognition. The program was applied to the problem of speech recognition to test its effectiveness for a specific…
Predicting Tillage Patterns in the Tiffin River Watershed Using Remote Sensing Methods
NASA Astrophysics Data System (ADS)
Brooks, C.; McCarty, J. L.; Dean, D. B.; Mann, B. F.
2012-12-01
Previous research in tillage mapping has focused primarily on utilizing low to no-cost, moderate (30 m to 15 m) resolution satellite data. Successful data processing techniques published in the scientific literature have focused on extracting and/or classifying tillage patterns through manipulation of spectral bands. For instance, Daughtry et al. (2005) evaluated several spectral indices for estimating crop residue cover from satellite multispectral and hyperspectral data and for categorizing soil tillage intensity in agricultural fields. A weak to moderate relationship between Landsat Thematic Mapper (TM) indices and crop residue cover was found; similar results were reported in Minnesota. Building on the findings from the scientific literature and previous work done by MTRI in the heavily agricultural Tiffin watershed of northwest Ohio and southeast Michigan, a decision tree classifier approach (also referred to as a classification tree) was used, linking several satellite datasets to on-the-ground tillage information in order to boost classification results. This approach included five tillage indices and derived products. A decision tree methodology enabled the development of statistically optimized (i.e., minimizing misclassification rates) classification algorithms at various desired time steps: monthly, seasonal, and annual over the 2006-2010 time period. Due to their flexibility, processing speed, and availability within all major remote sensing and statistical software packages, decision trees can ingest several data inputs from multiple sensors and satellite products, selecting only the bands, band ratios, indices, and products that further reduce misclassification errors.
The project team created crop-specific tillage pattern classification trees whereby a training data set (~ 50% of available ground data) was created for production of the actual decision tree and a validation data set was set aside (~ 50% of available ground data) in order to assess the accuracy of the classification. A seasonal time step was used, optimizing a decision tree based on seasonal ground data for tillage patterns and satellite data and products for years 2006 through 2010. Annual crop type maps derived by the project team and the USDA Cropland Data Layer project were used as an input to identify the locations of corn, soybeans, wheat, etc. on a yearly basis. As previously stated, the robustness of the decision tree approach lies in its ability to combine various satellite data and products across temporal, spectral, and spatial resolutions, thereby improving the resulting classification and providing a reliable method that is not sensor-dependent. Tillage pattern classification from satellite imagery is not a simple task and has proven a challenge to previous researchers investigating this remote sensing topic. The team's decision tree method produced a practical, usable output within a focused project time period. Daughtry, C.S.T., Hunt Jr., E.R., Doraiswamy, P.C., McMurtrey III, J.E. 2005. Remote sensing the spatial distribution of crop residues. Agron. J. 97, 864-871.
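The workflow in the abstract above (hold out roughly half of the ground data for validation, then choose splits that minimize misclassification) can be illustrated with a single-split tree, often called a decision stump. This is a minimal sketch, not the project team's software: the helper names `split_half`, `fit_stump`, `predict`, and `error_rate`, the feature layout, and the tillage labels used in any example data are hypothetical.

```python
import random

def split_half(records, seed=0):
    """Shuffle a copy of the records and return (training, validation) halves."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def predict(stump, x):
    """Apply a stump (feature index, threshold, label_below, label_above)."""
    feature, threshold, below, above = stump
    return below if x[feature] <= threshold else above

def error_rate(stump, records):
    """Fraction of (features, label) records the stump misclassifies."""
    return sum(predict(stump, x) != y for x, y in records) / len(records)

def fit_stump(train):
    """Exhaustively pick the single split with the fewest training errors."""
    labels = sorted({y for _, y in train})
    best = None
    for feature in range(len(train[0][0])):
        for x, _ in train:
            for below in labels:
                for above in labels:
                    stump = (feature, x[feature], below, above)
                    errors = sum(predict(stump, xi) != yi for xi, yi in train)
                    if best is None or errors < best[0]:
                        best = (errors, stump)
    return best[1]
```

A full classification tree repeats this split search recursively on each side of the chosen threshold; the held-out half is then used only to report the error rate, never to choose splits.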
Connecting Psychological Science with Climate Change: A Persuasion and Social Influence Assignment
ERIC Educational Resources Information Center
Munro, Geoffrey D.; Behlen, Margaret M.
2017-01-01
Students often have little understanding of the role psychological science plays in informing us about the impact of human behavior when addressing climate change. We designed an assignment for a social psychology course based on Frantz and Mayer's use of the decision tree model of helping behavior to identify the psychological barriers that…
40 CFR Appendix C to Part 112 - Substantial Harm Criteria
Code of Federal Regulations, 2011 CFR
2011-07-01
...: 2 Huang, J.C. and Monastero, F.C., 1982. Review of the State-of-the-Art of Oil Pollution Models... POLLUTION PREVENTION Pt. 112, App. C Appendix C to Part 112—Substantial Harm Criteria 1.0 Introduction The flowchart provided in Attachment C-I to this appendix shows the decision tree with the criteria to identify...
40 CFR Appendix C to Part 112 - Substantial Harm Criteria
Code of Federal Regulations, 2010 CFR
2010-07-01
...: 2 Huang, J.C. and Monastero, F.C., 1982. Review of the State-of-the-Art of Oil Pollution Models... POLLUTION PREVENTION Pt. 112, App. C Appendix C to Part 112—Substantial Harm Criteria 1.0 Introduction The flowchart provided in Attachment C-I to this appendix shows the decision tree with the criteria to identify...
ERIC Educational Resources Information Center
Chen, Chih-Ming; Wang, Jung-Ying; Chen, Yong-Ting; Wu, Jhih-Hao
2016-01-01
To reduce effectively the reading anxiety of learners while reading English articles, a C4.5 decision tree, a widely used data mining technique, was used to develop a personalized reading anxiety prediction model (PRAPM) based on individual learners' reading annotation behavior in a collaborative digital reading annotation system (CDRAS). In…
Predicting the cover-up of dead branches using a simple single regressor equation
Christopher M. Oswalt; Wayne K. Clatterbuck; E.C. Burkhardt
2007-01-01
Information on the effects of branch diameter on branch occlusion is necessary for building models capable of forecasting the effect of management decisions on tree or log grade. We investigated the relationship between branch size and subsequent branch occlusion through diameter growth with special attention toward the development of a simple single regressor equation...
NASA Astrophysics Data System (ADS)
Dogon-Yaro, M. A.; Kumar, P.; Rahman, A. Abdul; Buyuksalih, G.
2016-09-01
Mapping of trees plays an important role in modern urban spatial data management, as many benefits and applications derive from this detailed, up-to-date data source. Timely and accurate acquisition of information on the condition of urban trees serves as a tool for decision makers to better appreciate urban ecosystems and their numerous values, which are critical to building up strategies for sustainable development. The conventional techniques used for extracting trees include ground surveying and interpretation of aerial photography. However, these techniques are associated with some constraints, such as labour-intensive field work and high costs, which can be overcome by means of integrated LiDAR and digital image datasets. In contrast to most studies on tree extraction, which focus on purely forested areas, this study concentrates on urban areas, which have a high structural complexity with a multitude of different objects. This paper presents a workflow for a semi-automated approach to extracting urban trees through integrated processing of airborne LiDAR point cloud and multispectral digital image datasets over the city of Istanbul, Turkey. The paper shows that the integrated datasets are a suitable technology and a viable source of information for urban tree management. In conclusion, the extracted information provides a snapshot of the location, composition and extent of trees in the study area. This is useful to city planners and other decision makers for understanding how much canopy cover exists and for identifying new planting, removal, or reforestation opportunities and the locations with the greatest need or potential to maximize benefits and return on investment. It can also help track trends or changes to the urban trees over time and inform future management decisions.
A Decision-Tree Approach to Cost Comparison of Newborn Screening Strategies for Cystic Fibrosis
Wells, Janelle; Rosenberg, Marjorie; Hoffman, Gary; Anstead, Michael
2012-01-01
OBJECTIVE: Because cystic fibrosis can be difficult to diagnose and treat early, newborn screening programs have rapidly developed nationwide but methods vary widely. We therefore investigated the costs and consequences or specific outcomes of the 2 most commonly used methods. METHODS: With available data on screening and follow-up, we used a simulation approach with decision trees to compare immunoreactive trypsinogen (IRT) screening followed by a second IRT test against an IRT/DNA analysis. By using a Monte Carlo simulation program, variation in the model parameters for counts at various nodes of the decision trees, as well as for costs, are included and applied to fictional cohorts of 100 000 newborns. The outcome measures included the numbers of newborns given a diagnosis of cystic fibrosis and costs of screening strategy at each branch and cost per newborn. RESULTS: Simulations revealed a substantial number of potential missed diagnoses for the IRT/IRT system versus IRT/DNA. Although the IRT/IRT strategy with commonly used cutoff values offers an average overall cost savings of $2.30 per newborn, a breakdown of costs by societal segments demonstrated higher out-of-pocket costs for families. Two potential system failures causing delayed diagnoses were identified relating to the screening protocols and the follow-up system. CONCLUSIONS: The IRT/IRT screening algorithm reduces the costs to laboratories and insurance companies but has more system failures. IRT/DNA offers other advantages, including fewer delayed diagnoses and lower out-of-pocket costs to families. PMID:22291119
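A Monte Carlo simulation over a screening decision tree, of the general kind the abstract above describes, can be sketched as follows. Everything numeric in this sketch is an assumption: the function names `simulate_cohort` and `monte_carlo`, the two-branch tree, and every probability or cost passed in are hypothetical placeholders, not the study's parameters or results.

```python
import random

def simulate_cohort(n, prevalence, sensitivity, false_positive_rate,
                    screen_cost, followup_cost, rng):
    """Walk each simulated newborn through a two-level screening tree.

    Branch 1: affected or not. Branch 2: screen positive or negative.
    A positive screen triggers a follow-up cost; an affected newborn
    with a negative screen counts as a missed diagnosis.
    """
    detected = missed = 0
    total_cost = 0.0
    for _ in range(n):
        total_cost += screen_cost
        affected = rng.random() < prevalence
        p_positive = sensitivity if affected else false_positive_rate
        if rng.random() < p_positive:
            total_cost += followup_cost
            if affected:
                detected += 1
        elif affected:
            missed += 1
    return {"detected": detected, "missed": missed,
            "cost_per_newborn": total_cost / n}

def monte_carlo(runs, n, sens_range, rng, **fixed):
    """Repeat the cohort simulation, redrawing sensitivity on every run
    to reflect uncertainty in that model parameter."""
    results = []
    for _ in range(runs):
        sensitivity = rng.uniform(*sens_range)
        results.append(simulate_cohort(n, sensitivity=sensitivity,
                                       rng=rng, **fixed))
    return results
```

Comparing two strategies then amounts to running `monte_carlo` once per strategy with that strategy's assumed parameters and comparing the distributions of missed diagnoses and cost per newborn across runs.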
Economic analysis of emerald ash borer (Coleoptera: Buprestidae) management options.
Vannatta, A R; Hauer, R H; Schuettpelz, N M
2012-02-01
Emerald ash borer, Agrilus planipennis (Fairmaire) (Coleoptera: Buprestidae), plays a significant role in the health and extent of management of native North American ash species in urban forests. An economic analysis of management options was performed to aid decision makers in preparing for likely future infestations. Separate ash tree population valuations were derived from the i-Tree Streets program and the Council of Tree and Landscape Appraisers (CTLA) methodology. A relative economic analysis was used to compare a control option (do-nothing approach, only removing ash trees as they die) to three distinct management options: 1) preemptive removal of all ash trees over a 5 yr period, 2) preemptive removal of all ash trees and replacement with comparable nonash trees, or 3) treating the entire population of ash trees with insecticides to minimize mortality. For each valuation and management option, an annual analysis was performed for both the remaining ash tree population and those lost to emerald ash borer. Retention of ash trees using insecticide treatments typically retained greater urban forest value, followed by doing nothing (control), which was better than preemptive removal and replacement. Preemptive removal without tree replacement, which was the least expensive management option, also provided the lowest net urban forest value over the 20-yr simulation. A "no emerald ash borer" scenario was modeled to further serve as a benchmark for each management option and provide a level of economic justification for regulatory programs aimed at slowing the movement of emerald ash borer.
NASA Technical Reports Server (NTRS)
Buntine, Wray
1994-01-01
IND computer program introduces Bayesian and Markov/maximum-likelihood (MML) methods and more-sophisticated methods of searching in growing trees. Produces more-accurate class-probability estimates important in applications like diagnosis. Provides range of features and styles with convenience for casual user, fine-tuning for advanced user or for those interested in research. Consists of four basic kinds of routines: data-manipulation, tree-generation, tree-testing, and tree-display. Written in C language.
Interpretable Categorization of Heterogeneous Time Series Data
NASA Technical Reports Server (NTRS)
Lee, Ritchie; Kochenderfer, Mykel J.; Mengshoel, Ole J.; Silbermann, Joshua
2017-01-01
We analyze data from simulated aircraft encounters to validate and inform the development of a prototype aircraft collision avoidance system. The high-dimensional and heterogeneous time series dataset is analyzed to discover properties of near mid-air collisions (NMACs) and categorize the NMAC encounters. Domain experts use these properties to better organize and understand NMAC occurrences. Existing solutions either are not capable of handling high-dimensional and heterogeneous time series datasets or do not provide explanations that are interpretable by a domain expert. The latter is critical to the acceptance and deployment of safety-critical systems. To address this gap, we propose grammar-based decision trees along with a learning algorithm. Our approach extends decision trees with a grammar framework for classifying heterogeneous time series data. A context-free grammar is used to derive decision expressions that are interpretable, application-specific, and support heterogeneous data types. In addition to classification, we show how grammar-based decision trees can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply grammar-based decision trees to a simulated aircraft encounter dataset and evaluate the performance of four variants of our learning algorithm. The best algorithm is used to analyze and categorize near mid-air collisions in the aircraft encounter dataset. We describe each discovered category in detail and discuss its relevance to aircraft collision avoidance.
NASA Astrophysics Data System (ADS)
Hao Chiang, Shou; Valdez, Miguel; Chen, Chi-Farn
2016-06-01
Forests are a very important ecosystem and natural resource for living things. Based on forest inventories, governments are able to make decisions to conserve, improve and manage forests in a sustainable way. Field work for forest investigation is difficult and time consuming because it requires intensive physical labor and incurs high costs, especially when surveying remote mountainous regions. A reliable forest inventory can provide more accurate and timely information for developing new and efficient approaches to forest management. Remote sensing technology has recently been used for forest investigation at a large scale. To produce an informative forest inventory, forest attributes, including tree species, must be considered. The aim of this study is to classify forest tree species in Erdenebulgan County, Huwsgul province, Mongolia, using the Maximum Entropy method. The study area is covered by dense forest, which accounts for almost 70% of the total territorial extent of Erdenebulgan County, and is located in a high mountain region in northern Mongolia. For this study, Landsat satellite imagery and a Digital Elevation Model (DEM) were acquired to perform tree species mapping. The forest tree species inventory map was obtained from the Forest Division of the Mongolian Ministry of Nature and Environment and was used both as training data and as ground truth for the accuracy assessment of the tree species classification. Landsat images and the DEM were processed for maximum entropy modeling, and the model was applied in two experiments. The first uses Landsat surface reflectance for tree species classification; the second incorporates terrain variables in addition to the Landsat surface reflectance. All experimental results were compared with the tree species inventory to assess the classification accuracy.
Results show that the second experiment, which uses Landsat surface reflectance coupled with terrain variables, produced better results, with a higher overall accuracy and kappa coefficient than the first experiment. The results indicate that the Maximum Entropy method is applicable to tree species classification, and that coupling satellite imagery with terrain information can improve the classification of tree species in the study area.
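The two accuracy measures named in the abstract above, overall accuracy and the kappa coefficient, are both computed from a confusion matrix of reference classes against predicted classes. The following sketch uses the standard definitions (Cohen's kappa); the function names and any matrix values used with it are illustrative and do not come from the study.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def cohen_kappa(matrix):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (overall accuracy); p_e is the
    agreement expected by chance from the row and column totals.
    """
    total = sum(sum(row) for row in matrix)
    p_o = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_o - p_e) / (1 - p_e)
```

For a two-class matrix `[[20, 5], [10, 15]]`, the overall accuracy is 35/50 = 0.7, the chance agreement is 0.5, and kappa is therefore 0.4, which shows how kappa discounts agreement that class proportions alone would produce.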
Graphic Representations as Tools for Decision Making.
ERIC Educational Resources Information Center
Howard, Judith
2001-01-01
Focuses on the use of graphic representations to enable students to improve their decision making skills in the social studies. Explores three visual aids used in assisting students with decision making: (1) the force field; (2) the decision tree; and (3) the decision making grid. (CMK)
2012-09-01
supported by the National Science Foundation (NSF) IGERT 9972762, the Army Research Institute (ARI) W91WAW07C0063, the Army Research Laboratory (ARL/CTA...prediction models in AutoMap .................................................. 144 Figure 13: Decision Tree for prediction model selection in...generated for nationally funded initiatives and made available through the Linguistic Data Consortium (LDC). An overview of these datasets is provided in
The use of economic evaluation in CAM: an introductory framework
2010-01-01
Background For CAM to feature prominently in health care decision-making there is a need to expand the evidence-base and to further incorporate economic evaluation into research priorities. In a world of scarce health care resources and an emphasis on efficiency and clinical efficacy, CAM, as indeed do all other treatments, requires rigorous evaluation to be considered in budget decision-making. Methods Economic evaluation provides the tools to measure the costs and health consequences of CAM interventions and thereby inform decision making. This article offers CAM researchers an introductory framework for understanding, undertaking and disseminating economic evaluation. The types of economic evaluation available for the study of CAM are discussed, and decision modelling is introduced as a method for economic evaluation with much potential for use in CAM. Two types of decision models are introduced, decision trees and Markov models, along with a worked example of how each method is used to examine costs and health consequences. This is followed by a discussion of how this information is used by decision makers. Conclusions Undoubtedly, economic evaluation methods form an important part of health care decision making. Without formal training it can seem a daunting task to consider economic evaluation, however, multidisciplinary teams provide an opportunity for health economists, CAM practitioners and other interested researchers, to work together to further develop the economic evaluation of CAM. PMID:21067622
The use of economic evaluation in CAM: an introductory framework.
Ford, Emily; Solomon, Daniela; Adams, Jon; Graves, Nicholas
2010-11-11
For CAM to feature prominently in health care decision-making there is a need to expand the evidence-base and to further incorporate economic evaluation into research priorities. In a world of scarce health care resources and an emphasis on efficiency and clinical efficacy, CAM, like all other treatments, requires rigorous evaluation to be considered in budget decision-making. Economic evaluation provides the tools to measure the costs and health consequences of CAM interventions and thereby inform decision making. This article offers CAM researchers an introductory framework for understanding, undertaking and disseminating economic evaluation. The types of economic evaluation available for the study of CAM are discussed, and decision modelling is introduced as a method for economic evaluation with much potential for use in CAM. Two types of decision models are introduced, decision trees and Markov models, along with a worked example of how each method is used to examine costs and health consequences. This is followed by a discussion of how this information is used by decision makers. Undoubtedly, economic evaluation methods form an important part of health care decision making. Without formal training it can seem a daunting task to consider economic evaluation; however, multidisciplinary teams provide an opportunity for health economists, CAM practitioners and other interested researchers to work together to further develop the economic evaluation of CAM.
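The decision-tree approach mentioned in the abstract above can be illustrated with a minimal sketch. It rolls a tree back to an expected cost and expected health effect per strategy and then forms an incremental cost-effectiveness ratio. The function names, tree layout, probabilities, costs and effect values are all hypothetical and are not taken from the article.

```python
def expected_value(node):
    """Roll back a decision tree; return (expected cost, expected effect)."""
    if "branches" not in node:  # terminal node carries its own cost and effect
        return node["cost"], node["effect"]
    cost = effect = 0.0
    for prob, child in node["branches"]:  # chance node: probability-weighted sum
        child_cost, child_effect = expected_value(child)
        cost += prob * child_cost
        effect += prob * child_effect
    return cost, effect


def icer(strategy, comparator):
    """Incremental cost per unit of additional health effect."""
    (c1, e1), (c0, e0) = expected_value(strategy), expected_value(comparator)
    return (c1 - c0) / (e1 - e0)


# Hypothetical strategies: each has a "responds" and a "does not respond" branch.
cam = {"branches": [(0.7, {"cost": 500.0, "effect": 0.8}),
                    (0.3, {"cost": 1500.0, "effect": 0.5})]}
usual_care = {"branches": [(0.6, {"cost": 300.0, "effect": 0.7}),
                           (0.4, {"cost": 1300.0, "effect": 0.5})]}

print(expected_value(cam), expected_value(usual_care), icer(cam, usual_care))
```

A Markov model would replace the single set of chance branches with repeated transitions between health states over time; that extension is not shown here.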
NASA Astrophysics Data System (ADS)
Coopersmith, Evan Joseph
The techniques and information employed for decision-making vary with the spatial and temporal scope of the assessment required. In modern agriculture, the farm owner or manager makes decisions on a day-to-day or even hour-to-hour basis for dozens of fields scattered over as much as a fifty-mile radius from some central location. Following precipitation events, land begins to dry. Land-owners and managers often trace serpentine paths of 150+ miles every morning to inspect the conditions of their various parcels. His or her objective lies in appropriate resource usage -- is a given tract of land dry enough to be workable at this moment or would he or she be better served waiting patiently? Longer-term, these owners and managers decide upon which seeds will grow most effectively and which crops will make their operations profitable. At even longer temporal scales, decisions are made regarding which fields must be acquired and sold and what types of equipment will be necessary in future operations. This work develops and validates algorithms for these shorter-term decisions, along with models of national climate patterns and climate changes to enable longer-term operational planning. A test site at the University of Illinois South Farms (Urbana, IL, USA) served as the primary location to validate machine learning algorithms, employing public sources of precipitation and potential evapotranspiration to model the wetting/drying process. In expanding such local decision support tools to locations on a national scale, one must recognize the heterogeneity of hydroclimatic and soil characteristics throughout the United States. Machine learning algorithms modeling the wetting/drying process must address this variability, and yet it is wholly impractical to construct a separate algorithm for every conceivable location. 
For this reason, a national hydrological classification system is presented, allowing clusters of hydroclimatic similarity to emerge naturally from annual regime curve data and facilitate the development of cluster-specific algorithms. Given the desire to enable intelligent decision-making at any location, this classification system is developed in a manner that will allow for classification anywhere in the U.S., even in an ungauged basin. Daily time series data from 428 catchments in the MOPEX database are analyzed to produce an empirical classification tree, partitioning the United States into regions of hydroclimatic similarity. In constructing a classification tree based upon 55 years of data, it is important to recognize the non-stationary nature of climate data. The shifts in climatic regimes will cause certain locations to shift their ultimate position within the classification tree, requiring decision-makers to alter land usage, farming practices, and equipment needs, and algorithms to adjust accordingly. This work adapts the classification model to address the issue of regime shifts over larger temporal scales and suggests how land-usage and farming protocol may vary from hydroclimatic shifts in decades to come. Finally, the generalizability of the hydroclimatic classification system is tested with a physically-based soil moisture model calibrated at several locations throughout the continental United States. The soil moisture model is calibrated at a given site and then applied with the same parameters at other sites within and outside the same hydroclimatic class. The model's performance deteriorates minimally if the calibration and validation locations are within the same hydroclimatic class, but deteriorates significantly if the calibration and validation sites are located in different hydroclimatic classes. 
These soil moisture estimates at the field scale are then further refined by the introduction of LiDAR elevation data, distinguishing faster-drying peaks and ridges from slower-drying valleys. The inclusion of LiDAR enabled multiple locations within the same field to be predicted accurately despite non-identical topography. This cross-application of parametric calibrations and LiDAR-driven disaggregation facilitates decision-support at locations without proximally-located soil moisture sensors.
The Effect of Defense R&D Expenditures on Military Capability and Technological Spillover
2013-03-01
ix List of Figures Page Figure 1. Decision Tree for Sectoring R&D Units...approach, often called sectoring, categorizes R&D activities by funding source, and the functional approach categorizes R&D activities by their objective...economic objectives (defense, and control and care of environment) (OECD, 2002). Figure 1 shows the decision tree for sectoring R&D units and
A review on machine learning principles for multi-view biological data integration.
Li, Yifeng; Wu, Fang-Xiang; Ngom, Alioune
2018-03-01
Driven by high-throughput sequencing techniques, modern genomic and clinical studies are in strong need of integrative machine learning models that make better use of vast volumes of heterogeneous information for the deep understanding of biological systems and the development of predictive models. How data from multiple sources (called multi-view data) are incorporated in a learning system is a key step for successful analysis. In this article, we provide a comprehensive review of omics and clinical data integration techniques, from a machine learning perspective, for various analyses such as prediction, clustering, dimension reduction and association. We shall show that Bayesian models are able to use prior information and model measurements with various distributions; tree-based methods can either build a tree with all features or collectively make a final decision based on trees learned from each view; kernel methods fuse the similarity matrices learned from individual views together for a final similarity matrix or learning model; network-based fusion methods are capable of inferring direct and indirect associations in a heterogeneous network; matrix factorization models have potential to learn interactions among features from different views; and a range of deep neural networks can be integrated in multi-modal learning for capturing the complex mechanism of biological systems.
Accurate reliability analysis method for quantum-dot cellular automata circuits
NASA Astrophysics Data System (ADS)
Cui, Huanqing; Cai, Li; Wang, Sen; Liu, Xiaoqiang; Yang, Xiaokuo
2015-10-01
The probabilistic transfer matrix (PTM) is a widely used model in circuit reliability research. However, the PTM model cannot reflect the impact of input signals on reliability, so it does not completely conform to the mechanism of the novel field-coupled nanoelectronic device known as quantum-dot cellular automata (QCA). It is therefore difficult to get accurate results when the PTM model is used to analyze the reliability of QCA circuits. To solve this problem, we present fault tree models of QCA fundamental devices according to different input signals. The binary decision diagram (BDD) is then used to quantitatively investigate the reliability of two QCA XOR gates based on the presented models. By employing the fault tree models, the impact of input signals on reliability can be identified clearly and the crucial components of a circuit can be found precisely from the importance values (IVs) of components. The method therefore contributes to the construction of reliable QCA circuits.
Ensemble stump classifiers and gene expression signatures in lung cancer.
Frey, Lewis; Edgerton, Mary; Fisher, Douglas; Levy, Shawn
2007-01-01
Microarray data sets for cancer tumor tissue generally have very few samples, each sample having thousands of probes (i.e., continuous variables). The sparsity of samples makes it difficult for machine learning techniques to discover probes relevant to the classification of tumor tissue. By combining data from different platforms (i.e., data sources), data sparsity is reduced, but this typically requires normalizing data from the different platforms, which can be non-trivial. This paper proposes a variant on the idea of ensemble learners to circumvent the need for normalization. To facilitate comprehension we build ensembles of very simple classifiers known as decision stumps--decision trees of one test each. The Ensemble Stump Classifier (ESC) identifies an mRNA signature having three probes and high accuracy for distinguishing between adenocarcinoma and squamous cell carcinoma of the lung across four data sets. In terms of accuracy, ESC outperforms a decision tree classifier on all four data sets, outperforms ensemble decision trees on three data sets, and simple stump classifiers on two data sets.
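The idea of an ensemble of decision stumps, as described in the abstract above, can be sketched as follows. This is an illustrative assumption rather than the authors' ESC algorithm: each stump thresholds a single feature, and the ensemble takes a majority vote. All names and the toy data are hypothetical.

```python
from collections import Counter


def fit_stump(values, labels):
    """Fit a one-test tree on one feature: choose the threshold and
    orientation with the fewest training errors (binary labels 0/1)."""
    best = None
    for threshold in sorted(set(values)):
        for above, below in ((1, 0), (0, 1)):
            errors = sum((above if v > threshold else below) != y
                         for v, y in zip(values, labels))
            if best is None or errors < best[0]:
                best = (errors, threshold, above, below)
    _, threshold, above, below = best
    return lambda v: above if v > threshold else below


def fit_ensemble(rows, labels):
    """Train one stump per feature (column)."""
    return [fit_stump([row[j] for row in rows], labels)
            for j in range(len(rows[0]))]


def predict(stumps, row):
    """Majority vote over the per-feature stumps."""
    votes = Counter(stump(v) for stump, v in zip(stumps, row))
    return votes.most_common(1)[0][0]


# Hypothetical toy data: three probes, two classes.
rows = [(1.0, 5.0, 0.2), (1.2, 4.8, 0.1), (3.0, 1.0, 0.9), (3.2, 1.1, 0.8)]
labels = [0, 0, 1, 1]
stumps = fit_ensemble(rows, labels)
print(predict(stumps, (1.1, 4.9, 0.15)), predict(stumps, (3.1, 1.05, 0.85)))
```

Because each stump only compares a value against a threshold on its own feature, the members of the ensemble stay individually easy to read, which matches the abstract's stated aim of facilitating comprehension.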
Aguiar, Fabio S; Almeida, Luciana L; Ruffino-Netto, Antonio; Kritski, Afranio Lineu; Mello, Fernanda Cq; Werneck, Guilherme L
2012-08-07
Tuberculosis (TB) remains a public health issue worldwide. The lack of specific clinical symptoms to diagnose TB makes the correct decision to admit patients to respiratory isolation a difficult task for the clinician. Isolation of patients without the disease is common and increases health costs. Decision models for the diagnosis of TB in patients attending hospitals can increase the quality of care and decrease costs, without the risk of hospital transmission. We present a model for predicting pulmonary TB in hospitalized patients in a high prevalence area in order to contribute to a more rational use of isolation rooms without increasing the risk of transmission. Cross sectional study of patients admitted to CFFH from March 2003 to December 2004. A classification and regression tree (CART) model was generated and validated. The area under the ROC curve (AUC), sensitivity, specificity, positive and negative predictive values were used to evaluate the performance of the model. Validation of the model was performed with a different sample of patients admitted to the same hospital from January to December 2005. We studied 290 patients admitted with clinical suspicion of TB. Diagnosis was confirmed in 26.5% of them. Pulmonary TB was present in 83.7% of the patients with TB (62.3% with positive sputum smear) and HIV/AIDS was present in 56.9% of patients. The validated CART model showed sensitivity, specificity, positive predictive value and negative predictive value of 60.00%, 76.16%, 33.33%, and 90.55%, respectively. The AUC was 79.70%. The CART model developed for these hospitalized patients with clinical suspicion of TB had fair to good predictive performance for pulmonary TB. The most important variable for prediction of TB diagnosis was chest radiograph results. 
Prospective validation is still necessary, but our model offers an alternative for deciding whether to isolate patients with clinical suspicion of TB in tertiary health facilities in countries with limited resources.
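For readers unfamiliar with the performance measures reported above, the following minimal sketch shows how sensitivity, specificity and the predictive values follow from a 2x2 confusion matrix. The counts are hypothetical and are not the study's data; the function name is an assumption for illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients correctly flagged
        "specificity": tn / (tn + fp),  # non-diseased patients correctly cleared
        "ppv": tp / (tp + fp),          # flagged patients who truly have disease
        "npv": tn / (tn + fn),          # cleared patients who truly lack disease
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = diagnostic_metrics(tp=40, fp=20, fn=10, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

A high negative predictive value combined with a modest positive predictive value, as in the abstract, is typical when the condition is present in a minority of the patients assessed.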
Dias, Cláudia Camila; Pereira Rodrigues, Pedro; Fernandes, Samuel; Portela, Francisco; Ministro, Paula; Martins, Diana; Sousa, Paula; Lago, Paula; Rosa, Isadora; Correia, Luis; Moura Santos, Paula; Magro, Fernando
2017-01-01
Crohn's disease (CD) is a chronic inflammatory bowel disease known to carry a high risk of disabling and many times requiring surgical interventions. This article describes a decision-tree based approach that defines the CD patients' risk of undergoing disabling events, surgical interventions and reoperations, based on clinical and demographic variables. This multicentric study involved 1547 CD patients retrospectively enrolled and divided into two cohorts: a derivation one (80%) and a validation one (20%). Decision trees were built upon applying the CHAIRT algorithm for the selection of variables. Three-level decision trees were built for the risk of disabling and reoperation, whereas the risk of surgery was described in a two-level one. A receiver operating characteristic (ROC) analysis was performed, and the area under the curve (AUC) was higher than 70% for all outcomes. The defined risk cut-off values show usefulness for the assessed outcomes: risk levels above 75% for disabling had an odds test positivity of 4.06 [3.50-4.71], whereas risk levels below 34% and 19% excluded surgery and reoperation with an odds test negativity of 0.15 [0.09-0.25] and 0.50 [0.24-1.01], respectively. Overall, patients with B2 or B3 phenotype had a higher proportion of disabling disease and surgery, while patients with later introduction of pharmacological therapeutic (1 months after initial surgery) had a higher proportion of reoperation. The decision-tree based approach used in this study, with demographic and clinical variables, has been shown to be a valid and useful approach to depict such risks of disabling, surgery and reoperation.
ERIC Educational Resources Information Center
Braus, Judy, Ed.
1992-01-01
Ranger Rick's NatureScope is a creative education series dedicated to inspiring in children an understanding and appreciation of the natural world while developing the skills they will need to make responsible decisions about the environment. Contents are organized into the following sections: (1) "What Makes a Tree a Tree?," including…
Naturalistic Decision Making for Power System Operators
DOE Office of Scientific and Technical Information (OSTI.GOV)
Greitzer, Frank L.; Podmore, Robin; Robinson, Marck
2010-02-01
Motivation – Investigations of large-scale outages in the North American interconnected electric system often attribute the causes to three T’s: Trees, Training and Tools. To document and understand the mental processes used by expert operators when making critical decisions, a naturalistic decision making (NDM) model was developed. Transcripts of conversations were analyzed to reveal and assess NDM-based performance criteria. Findings/Design – An item analysis indicated that the operators’ Situation Awareness Levels, mental models, and mental simulations can be mapped at different points in the training scenario. This may identify improved training methods or analytical/visualization tools. Originality/Value – This study applies, for the first time, the concepts of Recognition Primed Decision Making, Situation Awareness Levels and Cognitive Task Analysis to training of electric power system operators. Take away message – The NDM approach provides a viable framework for systematic training management to accelerate learning in simulator-based training scenarios for power system operators and teams.
Technology transfer by means of fault tree synthesis
NASA Astrophysics Data System (ADS)
Batzias, Dimitris F.
2012-12-01
Since Fault Tree Analysis (FTA) attempts to model and analyze failure processes of engineering, it forms a common technique for good industrial practice. In contrast, fault tree synthesis (FTS) refers to the methodology of constructing complex trees either from dendritic modules built ad hoc or from fault trees already used and stored in a Knowledge Base. In both cases, technology transfer takes place in a quasi-inductive mode, from partial to holistic knowledge. In this work, an algorithmic procedure, including 9 activity steps and 3 decision nodes, is developed for performing this transfer effectively when the fault under investigation occurs within one of the later stages of an industrial procedure with several stages in series. The main parts of the algorithmic procedure are: (i) the construction of a local fault tree within the corresponding production stage, where the fault has been detected, (ii) the formation of an interface made of input faults that might occur upstream, (iii) the fuzzy (to account for uncertainty) multicriteria ranking of these faults according to their significance, and (iv) the synthesis of an extended fault tree based on the construction of part (i) and on the local fault tree of the first-ranked fault in part (iii). An implementation is presented, referring to 'uneven sealing of Al anodic film', thus demonstrating the functionality of the developed methodology.
Visualizing speciation in artificial cichlid fish.
Clement, Ross
2006-01-01
The Cichlid Speciation Project (CSP) is an ALife simulation system for investigating open problems in the speciation of African cichlid fish. The CSP can be used to perform a wide range of experiments that show that speciation is a natural consequence of certain biological systems. A visualization system capable of extracting the history of speciation from low-level trace data and creating a phylogenetic tree has been implemented. Unlike previous approaches, this visualization system presents a concrete trace of speciation, rather than a summary of low-level information from which the viewer can make subjective decisions on how speciation progressed. The phylogenetic trees are a more objective visualization of speciation, and enable automated collection and summarization of the results of experiments. The visualization system is used to create a phylogenetic tree from an experiment that models sympatric speciation.
Occupancy schedules learning process through a data mining framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
D'Oca, Simona; Hong, Tianzhen
Building occupancy is a paramount factor in building energy simulations. Specifically, lighting, plug loads, HVAC equipment utilization, fresh air requirements and internal heat gain or loss greatly depend on the level of occupancy within a building. Appropriate methodologies are needed to describe and reproduce the intricate network responsible for human-building interactions. Extrapolation of patterns from big data streams is a powerful analysis technique which will allow for a better understanding of energy usage in buildings. A three-step data mining framework is applied to discover occupancy patterns in office spaces. First, a data set of 16 offices with 10-minute interval occupancy data over a two-year period is mined through a decision tree model which predicts the occupancy presence. Then a rule induction algorithm is used to learn a pruned set of rules on the results from the decision tree model. Finally, a cluster analysis is employed in order to obtain consistent patterns of occupancy schedules. Furthermore, the identified occupancy rules and schedules are representative as four archetypal working profiles that can be used as input to current building energy modeling programs, such as EnergyPlus or IDA-ICE, to investigate impact of occupant presence on design, operation and energy use in office buildings.
Rotela, Camilo H; Spinsanti, Lorena I; Lamfri, Mario A; Contigiani, Marta S; Almirón, Walter R; Scavuzzo, Carlos M
2011-11-01
In response to the first human outbreak (January-May 2005) of Saint Louis encephalitis (SLE) virus in Córdoba province, Argentina, we developed an environmental SLE virus risk map for the capital, i.e. Córdoba city. The aim was to provide a map capable of detecting macro-environmental factors associated with the spatial distribution of SLE cases, based on remotely sensed data and a geographical information system. Vegetation, soil brightness, humidity status, distances to water-bodies and areas covered by vegetation were assessed based on pre-outbreak images provided by the Landsat 5TM satellite. A strong inverse relationship between the number of humans infected by SLEV and distance to high-vigor vegetation was noted. A statistical non-hierarchic decision tree model was constructed, based on environmental variables representing the areas surrounding patient residences. From this point of view, 18% of the city could be classified as being at high risk for SLEV infection, while 34% carried a low risk, or none at all. Taking the whole 2005 epidemic into account, 80% of the cases came from areas classified by the model as medium-high or high risk. Almost 46% of the cases were registered in high-risk areas, while there were no cases (0%) in areas classified as risk free.
Occupancy schedules learning process through a data mining framework
D'Oca, Simona; Hong, Tianzhen
2014-12-17
Building occupancy is a paramount factor in building energy simulations. Specifically, lighting, plug loads, HVAC equipment utilization, fresh air requirements and internal heat gain or loss greatly depend on the level of occupancy within a building. Appropriate methodologies are needed to describe and reproduce the intricate network responsible for human-building interactions. Extrapolation of patterns from big data streams is a powerful analysis technique which will allow for a better understanding of energy usage in buildings. A three-step data mining framework is applied to discover occupancy patterns in office spaces. First, a data set of 16 offices with 10-minute interval occupancy data over a two-year period is mined through a decision tree model which predicts the occupancy presence. Then a rule induction algorithm is used to learn a pruned set of rules on the results from the decision tree model. Finally, a cluster analysis is employed in order to obtain consistent patterns of occupancy schedules. Furthermore, the identified occupancy rules and schedules are representative as four archetypal working profiles that can be used as input to current building energy modeling programs, such as EnergyPlus or IDA-ICE, to investigate impact of occupant presence on design, operation and energy use in office buildings.
Östlund, Lars; Hörnberg, Greger; DeLuca, Thomas H; Liedgren, Lars; Wikström, Peder; Zackrisson, Olle; Josefsson, Torbjörn
2015-10-01
Anthropogenic deforestation has shaped ecosystems worldwide. In subarctic ecosystems, primarily inhabited by native peoples, deforestation is generally considered to be mainly associated with the industrial period. Here we examined mechanisms underlying deforestation a thousand years ago in a high-mountain valley with settlement artifacts located in subarctic Scandinavia. Using the Heureka Forestry Decision Support System, we modeled pre-settlement conditions and effects of tree cutting on forest cover. To examine lack of regeneration and present nutrient status, we analyzed soil nitrogen. We found that tree cutting could have deforested the valley within some hundred years. Overexploitation left the soil depleted beyond the capacity of re-establishment of trees. We suggest that pre-historical deforestation has occurred also in subarctic ecosystems and that ecosystem boundaries were especially vulnerable to this process. This study improves our understanding of mechanisms behind human-induced ecosystem transformations and tree-line changes, and of the concept of wilderness in the Scandinavian mountain range.
Barbosa, Rommel Melgaço; Nacano, Letícia Ramos; Freitas, Rodolfo; Batista, Bruno Lemos; Barbosa, Fernando
2014-09-01
This article aims to evaluate 2 machine learning algorithms, decision trees and naïve Bayes (NB), for egg classification (free-range eggs compared with battery eggs). The database used for the study consisted of 15 chemical elements (As, Ba, Cd, Co, Cs, Cu, Fe, Mg, Mn, Mo, Pb, Se, Sr, V, and Zn) determined in 52 egg samples (20 free-range and 32 battery eggs) by inductively coupled plasma mass spectrometry. Our results demonstrated that decision trees and NB associated with the mineral contents of eggs provide a high level of accuracy (above 80% and 90%, respectively) for classification between free-range and battery eggs and can be used as an alternative method for adulteration evaluation. © 2014 Institute of Food Technologists®
Eskelson, Bianca N.I.; Hagar, Joan; Temesgen, Hailemariam
2012-01-01
Snags (standing dead trees) are an essential structural component of forests. Because wildlife use of snags depends on size and decay stage, snag density estimation without any information about snag quality attributes is of little value for wildlife management decision makers. Little work has been done to develop models that allow multivariate estimation of snag density by snag quality class. Using climate, topography, Landsat TM data, stand age and forest type collected for 2356 forested Forest Inventory and Analysis plots in western Washington and western Oregon, we evaluated two multivariate techniques for their abilities to estimate density of snags by three decay classes. The density of live trees and snags in three decay classes (D1: recently dead, little decay; D2: decay, without top, some branches and bark missing; D3: extensive decay, missing bark and most branches) with diameter at breast height (DBH) ≥ 12.7 cm was estimated using a nonparametric random forest nearest neighbor imputation technique (RF) and a parametric two-stage model (QPORD), for which the number of trees per hectare was estimated with a Quasipoisson model in the first stage and the probability of belonging to a tree status class (live, D1, D2, D3) was estimated with an ordinal regression model in the second stage. The presence of large snags with DBH ≥ 50 cm was predicted using a logistic regression and RF imputation. Because of the more homogenous conditions on private forest lands, snag density by decay class was predicted with higher accuracies on private forest lands than on public lands, while presence of large snags was more accurately predicted on public lands, owing to the higher prevalence of large snags on public lands. RF outperformed the QPORD model in terms of percent accurate predictions, while QPORD provided smaller root mean square errors in predicting snag density by decay class. 
The logistic regression model achieved more accurate presence/absence classification of large snags than the RF imputation approach. Adjusting the decision threshold to account for unequal size for presence and absence classes is more straightforward for the logistic regression than for the RF imputation approach. Overall, model accuracies were poor in this study, which can be attributed to the poor predictive quality of the explanatory variables and the large range of forest types and geographic conditions observed in the data.
Using real options analysis to support strategic management decisions
NASA Astrophysics Data System (ADS)
Kabaivanov, Stanimir; Markovska, Veneta; Milev, Mariyan
2013-12-01
Decision making is a complex process that requires taking into consideration multiple heterogeneous sources of uncertainty. Standard valuation and financial analysis techniques often fail to properly account for all these sources of risk as well as for all sources of additional flexibility. In this paper we explore applications of a modified binomial tree method for real options analysis (ROA) in an effort to improve the decision-making process. Typical use cases of real options are analyzed, with an elaborate study of the applications and advantages that company management can derive from them. Numeric results based on extending the simple binomial tree approach to multiple sources of uncertainty are provided to demonstrate the improvement effects on management decisions.
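As background for the abstract above, here is a minimal sketch of the simple (unmodified) binomial tree that such methods extend. It uses the standard Cox-Ross-Rubinstein parameterization with a single source of uncertainty, not the paper's modified method. The real-option framing (project value as the underlying, investment cost as the strike) and all parameter names and values are assumptions for illustration.

```python
import math


def binomial_option_value(v0, strike, rate, sigma, years, steps, american=True):
    """Value an option to invest `strike` in a project currently worth `v0`
    on a Cox-Ross-Rubinstein binomial tree (requires sigma > 0)."""
    dt = years / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    p = (math.exp(rate * dt) - down) / (up - down)  # risk-neutral up probability
    discount = math.exp(-rate * dt)

    # Payoffs at the final step; index j counts the number of up moves.
    values = [max(v0 * up**j * down**(steps - j) - strike, 0.0)
              for j in range(steps + 1)]

    # Backward induction toward the root.
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            value = discount * (p * values[j + 1] + (1 - p) * values[j])
            if american:  # flexibility to invest early
                value = max(value, v0 * up**j * down**(n - j) - strike)
            values[j] = value
    return values[0]


# Hypothetical inputs: project worth 100, investment cost 100, 2-year deferral window.
print(binomial_option_value(100.0, 100.0, rate=0.05, sigma=0.3, years=2.0, steps=50))
```

The difference between this value and the static net present value, max(v0 - strike, 0), is one way to express the worth of managerial flexibility in such an analysis.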
Knowledge Quality Functions for Rule Discovery
1994-09-01
Managers in many organizations finding themselves in the possession of large and rapidly growing databases are beginning to suspect the information in their...missing values (Smyth and Goodman, 1992, p. 303). Decision trees "tend to grow very large for realistic applications and are thus difficult to interpret...by humans" (Holsheimer, 1994, p. 42). Decision trees also grow excessively complicated in the presence of noisy databases (Dhar and Tuzhilin, 1993, p
Multiattribute Decision Modeling Techniques: A Comparative Analysis
1988-08-01
Analytic Hierarchy Process (AHP). It is structurally similar to SMART, but elicitation methods are different and there are several algorithms for...reconciliation of inconsistent judgments and for consistency checks that are not available in any of the utility procedures. The AHP has been applied...of commercially available software packages that implement the AHP algorithms. Elicitation Methods. The AHP builds heavily on value trees, which
Suzanne M. Joy; R. M. Reich; Richard T. Reynolds
2003-01-01
Traditional land classification techniques for large areas that use Landsat Thematic Mapper (TM) imagery are typically limited to the fixed spatial resolution of the sensors (30m). However, the study of some ecological processes requires land cover classifications at finer spatial resolutions. We model forest vegetation types on the Kaibab National Forest (KNF) in...
2015-06-30
7. Building Statistical Metamodels using Simulation Experimental Designs ............................................... 34 7.1. Statistical Design...system design drivers across several different domain models, our methodology uses statistical metamodeling to approximate the simulations’ behavior. A...output. We build metamodels using a number of statistical methods that include stepwise regression, boosted trees, neural nets, and bootstrap forest
2015-06-01
7. Building Statistical Metamodels using Simulation Experimental Designs ............................................... 34 7.1. Statistical Design...system design drivers across several different domain models, our methodology uses statistical metamodeling to approximate the simulations’ behavior. A...output. We build metamodels using a number of statistical methods that include stepwise regression, boosted trees, neural nets, and bootstrap forest
Central States forest management guides as applied in STEMS.
Nancy R. Walters
1988-01-01
Describes a management prescription system for Central States cover types developed for use in the Central States Stand and Tree Evaluation and Modeling System (STEMS). It includes one management guide for each of the six major cover types in the region. Each guide consists of a decision key that prescribes management, based on stand characteristics and a set of...
Identifying Characteristics of High School Dropouts: Data Mining with A Decision Tree Model
ERIC Educational Resources Information Center
Veitch, William Robert.
2004-01-01
The notion that all students should finish high school has grown throughout the last century and continues to be an important goal for all educational levels in this new century. Non-completion has been related to all sorts of social, financial, and psychological issues. Many studies have attempted to put together a process that will identify…
Evidence integration in model-based tree search
Solway, Alec; Botvinick, Matthew M.
2015-01-01
Research on the dynamics of reward-based, goal-directed decision making has largely focused on simple choice, where participants decide among a set of unitary, mutually exclusive options. Recent work suggests that the deliberation process underlying simple choice can be understood in terms of evidence integration: Noisy evidence in favor of each option accrues over time, until the evidence in favor of one option is significantly greater than the rest. However, real-life decisions often involve not one, but several steps of action, requiring a consideration of cumulative rewards and a sensitivity to recursive decision structure. We present results from two experiments that leveraged techniques previously applied to simple choice to shed light on the deliberation process underlying multistep choice. We interpret the results from these experiments in terms of a new computational model, which extends the evidence accumulation perspective to multiple steps of action. PMID:26324932
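The evidence-integration account of simple choice described above can be illustrated with a small simulation. This is a minimal sketch of a generic accumulator for a single choice, not the multistep model proposed in the paper; the function name, drift rates, noise level, and stopping threshold are all hypothetical values chosen for illustration.

```python
import random

def simulate_choice(drifts, noise_sd=0.1, lead_threshold=2.0,
                    max_steps=10_000, seed=None):
    """Accumulate noisy evidence for each option until one option leads
    the runner-up by at least lead_threshold.

    Returns (index of the chosen option, number of steps taken), or
    (None, max_steps) if no decision is reached. Needs >= 2 options.
    """
    rng = random.Random(seed)
    evidence = [0.0] * len(drifts)
    for step in range(1, max_steps + 1):
        # Each option receives its mean drift plus Gaussian noise.
        for i, drift in enumerate(drifts):
            evidence[i] += drift + rng.gauss(0.0, noise_sd)
        # Compare the current leader with the second-best option.
        ranked = sorted(range(len(evidence)),
                        key=lambda i: evidence[i], reverse=True)
        leader, runner_up = ranked[0], ranked[1]
        if evidence[leader] - evidence[runner_up] >= lead_threshold:
            return leader, step
    return None, max_steps

# Hypothetical example: option 0 has the strongest evidence on average.
winner, steps = simulate_choice([0.5, 0.0, 0.0], seed=1)
print(winner, steps)
```

Raising `lead_threshold` in this sketch trades speed for accuracy: decisions take more steps but are less likely to be driven by noise.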
Language Adaptive LVCSR Through Polyphone Decision Tree Specialization
2000-08-01
transfer models outperform monolingual ones [3], [14]. ... Since for the monolingual case the use of larger phonetic context windows has proven to ... For our experiments we developed monolingual LVCSR sys... monolingual recognizer. Since the engines are the same across the languages, differences in the
Collins, A.L.; Pulley, S.; Foster, I.D.L.; Gellis, Allen; Porto, P.; Horowitz, A.J.
2017-01-01
The growing awareness of the environmental significance of fine-grained sediment fluxes through catchment systems continues to underscore the need for reliable information on the principal sources of this material. Source estimates are difficult to obtain using traditional monitoring techniques, but sediment source fingerprinting, or tracing, procedures have emerged as a potentially valuable alternative. Despite the rapidly increasing number of studies reporting the use of sediment source fingerprinting, several key challenges and uncertainties continue to hamper consensus among the international scientific community on key components of the existing methodological procedures. Accordingly, this contribution reviews and presents recent developments for several key aspects of fingerprinting, namely: sediment source classification, catchment source and target sediment sampling, tracer selection, grain size issues, tracer conservatism, source apportionment modelling, and assessment of source predictions using artificial mixtures. Finally, a decision tree representing the current state of knowledge is presented to guide end-users in applying the fingerprinting approach.
Pathway-based predictive approaches for non-animal assessment of acute inhalation toxicity.
Clippinger, Amy J; Allen, David; Behrsing, Holger; BéruBé, Kelly A; Bolger, Michael B; Casey, Warren; DeLorme, Michael; Gaça, Marianna; Gehen, Sean C; Glover, Kyle; Hayden, Patrick; Hinderliter, Paul; Hotchkiss, Jon A; Iskandar, Anita; Keyser, Brian; Luettich, Karsta; Ma-Hock, Lan; Maione, Anna G; Makena, Patrudu; Melbourne, Jodie; Milchak, Lawrence; Ng, Sheung P; Paini, Alicia; Page, Kathryn; Patlewicz, Grace; Prieto, Pilar; Raabe, Hans; Reinke, Emily N; Roper, Clive; Rose, Jane; Sharma, Monita; Spoo, Wayne; Thorne, Peter S; Wilson, Daniel M; Jarabek, Annie M
2018-06-20
New approaches are needed to assess the effects of inhaled substances on human health. These approaches will be based on mechanisms of toxicity, an understanding of dosimetry, and the use of in silico modeling and in vitro test methods. In order to accelerate wider implementation of such approaches, development of adverse outcome pathways (AOPs) can help identify and address gaps in our understanding of relevant parameters for model input and mechanisms, and optimize non-animal approaches that can be used to investigate key events of toxicity. This paper describes the AOPs and the toolbox of in vitro and in silico models that can be used to assess the key events leading to toxicity following inhalation exposure. Because the optimal testing strategy will vary depending on the substance of interest, here we present a decision tree approach to identify an appropriate non-animal integrated testing strategy that incorporates consideration of a substance's physicochemical properties, relevant mechanisms of toxicity, and available in silico models and in vitro test methods. This decision tree can facilitate standardization of the testing approaches. Case study examples are presented to provide a basis for proof-of-concept testing to illustrate the utility of non-animal approaches to inform hazard identification and risk assessment of humans exposed to inhaled substances. Copyright © 2018 The Author(s). Published by Elsevier Ltd. All rights reserved.
Application Of Decision Tree Approach To Student Selection Model- A Case Study
NASA Astrophysics Data System (ADS)
Harwati; Sudiya, Amby
2016-01-01
The main purpose of the institution is to provide quality education to its students and to improve the quality of managerial decisions. One way to improve the quality of students is to make the selection of new students more selective. This research takes as its case the selection of new students at the Islamic University of Indonesia, Yogyakarta, Indonesia. One of the university's selection routes is an administrative screening based on prospective students' high school records, without a written test. Currently, this kind of selection does not have a standard model or criteria. Selection is done only by comparing candidates' application files, so the assessment is prone to subjectivity because there are no standard criteria that can differentiate the quality of one student from another. By applying data mining classification techniques, a selection model for new students can be built that includes standard criteria such as area of origin, school status, average score, and so on. These criteria are determined using rules derived from classifying the academic achievement (GPA) of students who entered the university through the same route in previous years. The decision tree method with the C4.5 algorithm is used here. The results show that the students given priority for admission are those who meet the following criteria: they come from the island of Java, attended a public school, majored in science, have an average score above 75, and have at least one achievement during their high school studies.
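C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the split information of the attribute. The following is a minimal sketch of that criterion for categorical attributes only; the attribute values and GPA classes in the example are hypothetical toy data, not records from the study, and the handling of continuous attributes and pruning in C4.5 is omitted.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((count / total) * log2(count / total)
                for count in Counter(values).values())

def gain_ratio(attribute_values, labels):
    """Gain ratio of splitting `labels` on a categorical attribute."""
    total = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels after the split.
    remainder = sum(len(group) / total * entropy(group)
                    for group in groups.values())
    information_gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy(attribute_values)
    return information_gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: does school status separate the GPA classes?
school_status = ["public", "public", "private", "private"]
gpa_class = ["high", "high", "low", "low"]
print(gain_ratio(school_status, gpa_class))
```

An attribute that separates the classes perfectly, as in the toy data, gets the highest possible gain ratio, while an attribute unrelated to the class gets a ratio of zero.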
Kaimakamis, Evangelos; Tsara, Venetia; Bratsas, Charalambos; Sichletidis, Lazaros; Karvounis, Charalambos; Maglaveras, Nikolaos
2016-01-01
Obstructive Sleep Apnea (OSA) is a common sleep disorder whose diagnosis requires time-consuming and costly polysomnography. Alternative methods for initial evaluation are sought. Our aim was the prediction of the Apnea-Hypopnea Index (AHI) in patients potentially suffering from OSA based on nonlinear analysis of respiratory biosignals during sleep, a method that is related to the pathophysiology of the disorder. Patients referred to a Sleep Unit (n = 135) underwent full polysomnography. Three nonlinear indices (Largest Lyapunov Exponent, Detrended Fluctuation Analysis and Approximate Entropy) extracted from two biosignals (airflow from a nasal cannula, thoracic movement) and one linear index derived from oxygen saturation provided input to a data mining application with contemporary classification algorithms for the creation of predictive models for AHI. A linear regression model presented a correlation coefficient of 0.77 in predicting AHI. With a cutoff value of AHI = 8, the sensitivity and specificity were 93% and 71.4% in discriminating between patients and normal subjects. The decision tree for discriminating between patients and normal subjects had sensitivity and specificity of 91% and 60%, respectively. Certain of the obtained nonlinear values correlated significantly with commonly accepted physiological parameters of people suffering from OSA. We developed a predictive model for the presence/severity of OSA using a simple linear equation and additional decision trees with nonlinear features extracted from 3 respiratory recordings. The accuracy of the methodology is high and the findings provide insight into the underlying pathophysiology of the syndrome. Reliable predictions of OSA are possible using linear and nonlinear indices from only 3 respiratory signals during sleep. The proposed models could lead to a better study of the pathophysiology of OSA and facilitate initial evaluation/follow-up of patients with suspected OSA using a practical low-cost methodology.
ClinicalTrials.gov NCT01161381.
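Sensitivity and specificity at a cutoff, as reported in the abstract above, can be computed from predicted values and reference labels as follows. This is a minimal sketch: the function name, the predicted AHI values, and the labels are hypothetical, and treating a prediction equal to the cutoff as positive is an assumption, since the abstract does not say how ties are handled.

```python
def sensitivity_specificity(predicted_values, has_condition, cutoff):
    """Return (sensitivity, specificity) when predictions >= cutoff are
    called positive. Assumes both classes are present in has_condition."""
    tp = fn = tn = fp = 0
    for value, positive in zip(predicted_values, has_condition):
        predicted_positive = value >= cutoff  # assumption: ties count as positive
        if positive and predicted_positive:
            tp += 1
        elif positive:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical predicted AHI values and reference diagnoses.
predicted_ahi = [3.0, 12.5, 7.9, 8.0, 20.1, 5.5, 9.2, 30.0]
is_patient = [False, True, True, True, True, False, False, True]
sens, spec = sensitivity_specificity(predicted_ahi, is_patient, cutoff=8)
print(sens, spec)
```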
Predicting Rotator Cuff Tears Using Data Mining and Bayesian Likelihood Ratios
Lu, Hsueh-Yi; Huang, Chen-Yuan; Su, Chwen-Tzeng; Lin, Chen-Chiang
2014-01-01
Objectives: Rotator cuff tear is a common cause of shoulder diseases. Correct diagnosis of rotator cuff tears can save patients from further invasive, costly and painful tests. This study used predictive data mining and Bayesian theory to improve the accuracy of diagnosing rotator cuff tears by clinical examination alone. Methods: In this retrospective study, 169 patients who had a preliminary diagnosis of rotator cuff tear on the basis of clinical evaluation followed by confirmatory MRI between 2007 and 2011 were identified. MRI was used as a reference standard to classify rotator cuff tears. The predictor variable was the clinical assessment results, which consisted of 16 attributes. This study employed 2 data mining methods (ANN and the decision tree) and a statistical method (logistic regression) to classify the rotator cuff diagnosis into “tear” and “no tear” groups. Likelihood ratio and Bayesian theory were applied to estimate the probability of rotator cuff tears based on the results of the prediction models. Results: Our proposed data mining procedures outperformed the classic statistical method. The correct classification rate, sensitivity, specificity and area under the ROC curve of predicting a rotator cuff tear were statistically better in the ANN and decision tree models compared to logistic regression. Based on likelihood ratios derived from our prediction models, Fagan's nomogram could be constructed to assess the probability that a patient has a rotator cuff tear using a pretest probability and a prediction result (tear or no tear). Conclusions: Our predictive data mining models, combined with likelihood ratios and Bayesian theory, appear to be good tools to classify rotator cuff tears as well as determine the probability of the presence of the disease to enhance diagnostic decision making for rotator cuff tears. PMID:24733553
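The calculation that Fagan's nomogram performs graphically is Bayes' theorem in odds form: post-test odds equal pretest odds multiplied by the likelihood ratio. A minimal sketch follows; the sensitivity, specificity, and pretest probability used in the example are hypothetical and are not values reported by the study.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ : how much a positive result raises the odds of disease."""
    return sensitivity / (1.0 - specificity)

def negative_likelihood_ratio(sensitivity, specificity):
    """LR- : how much a negative result lowers the odds of disease."""
    return (1.0 - sensitivity) / specificity

def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)

# Hypothetical classifier with 90% sensitivity and 80% specificity,
# applied to a patient with a 50% pretest probability of a tear.
lr_pos = positive_likelihood_ratio(0.90, 0.80)
lr_neg = negative_likelihood_ratio(0.90, 0.80)
p_after_tear = post_test_probability(0.50, lr_pos)
p_after_no_tear = post_test_probability(0.50, lr_neg)
print(p_after_tear, p_after_no_tear)
```

With these illustrative numbers, a "tear" prediction raises the probability well above the pretest value and a "no tear" prediction lowers it well below.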
NASA Astrophysics Data System (ADS)
Althuwaynee, Omar F.; Pradhan, Biswajeet; Ahmad, Noordin
2014-06-01
This article uses a methodology based on chi-squared automatic interaction detection (CHAID), a multivariate method with automatic classification capacity, to analyse large numbers of landslide conditioning factors. This new algorithm was developed to overcome the subjectivity of manually categorizing scale data of landslide conditioning factors, and to produce a rainfall-induced landslide susceptibility map for Kuala Lumpur city and surrounding areas using a geographic information system (GIS). The main objective of this article is to use the CHAID method to find the best classification fit for each conditioning factor and then combine it with logistic regression (LR). The LR model was used to find the corresponding coefficients of the best-fitting function that assesses the optimal terminal nodes. A cluster pattern of landslide locations was extracted in a previous study using the nearest neighbor index (NNI), which was then used to identify the range of clustered landslide locations. Clustered locations were used as model training data with 14 landslide conditioning factors, such as topographically derived parameters, lithology, NDVI, and land use and land cover maps. The Pearson chi-squared value was used to find the best classification fit between the dependent variable and the conditioning factors. Finally, the relationships between conditioning factors were assessed and the landslide susceptibility map (LSM) was produced. The area under the curve (AUC) was used to test the model's reliability and prediction capability with the training and validation landslide locations, respectively. This study demonstrated the efficiency and reliability of the decision tree (DT) model in landslide susceptibility mapping. It also provides a valuable scientific basis for spatial decision making in planning and urban management studies.
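The Pearson chi-squared value mentioned above measures how strongly the categories of a conditioning factor are associated with landslide occurrence. The following minimal sketch computes the statistic for a contingency table; it covers only this one ingredient of CHAID (the category merging and significance testing steps are omitted), and the counts are hypothetical.

```python
def pearson_chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a
    list of rows of observed counts. Assumes no row or column sums to zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if factor category and outcome were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical counts: rows are two slope classes,
# columns are (landslide, no landslide).
associated = [[30, 10], [10, 30]]
independent = [[20, 20], [10, 10]]
print(pearson_chi_squared(associated), pearson_chi_squared(independent))
```

A larger statistic indicates a stronger association, so a factor categorization that yields a higher value gives a better classification fit in this sense.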
Bayesian updating in a fault tree model for shipwreck risk assessment.
Landquist, H; Rosén, L; Lindhe, A; Norberg, T; Hassellöv, I-M
2017-07-15
Shipwrecks containing oil and other hazardous substances have been deteriorating on the seabeds of the world for many years and are threatening to pollute the marine environment. The status of the wrecks and the potential volume of harmful substances present in the wrecks are affected by a multitude of uncertainties. Each shipwreck poses a unique threat, the nature of which is determined by the structural status of the wreck and possible damage resulting from hazardous activities that could potentially cause a discharge. Decision support is required to ensure the efficiency of the prioritisation process and the allocation of resources required to carry out risk mitigation measures. Whilst risk assessments can provide the requisite decision support, comprehensive methods that take into account key uncertainties related to shipwrecks are limited. The aim of this paper was to develop a method for estimating the probability of discharge of hazardous substances from shipwrecks. The method is based on Bayesian updating of generic information on the hazards posed by different activities in the surroundings of the wreck, with information on site-specific and wreck-specific conditions in a fault tree model. Bayesian updating is performed using Monte Carlo simulations for estimating the probability of a discharge of hazardous substances and formal handling of intrinsic uncertainties. An example application involving two wrecks located off the Swedish coast is presented. Results show the estimated probability of opening, discharge and volume of the discharge for the two wrecks and illustrate the capability of the model to provide decision support. Together with consequence estimations of a discharge of hazardous substances, the suggested model enables comprehensive and probabilistic risk assessments of shipwrecks to be made. Copyright © 2017 Elsevier B.V. All rights reserved.
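The general idea of combining Bayesian updating with Monte Carlo simulation in a fault tree can be sketched as follows. This is an illustration under strong assumptions, not the authors' model: it swaps in a conjugate Beta-binomial update for each basic event, treats the events as independent, and combines them through a single OR gate. All priors, observation counts, and function names are hypothetical.

```python
import random

def posterior_beta(prior_alpha, prior_beta, occurrences, trials):
    """Conjugate update of a Beta prior with binomial site-specific data."""
    return prior_alpha + occurrences, prior_beta + (trials - occurrences)

def top_event_probability(posteriors, n_samples=20_000, seed=0):
    """Monte Carlo estimate of the probability that at least one
    independent basic event occurs (an OR gate)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        none_occur = 1.0
        for alpha, beta in posteriors:
            # Draw one plausible event probability from its posterior.
            none_occur *= 1.0 - rng.betavariate(alpha, beta)
        total += 1.0 - none_occur
    return total / n_samples

# Hypothetical generic priors updated with hypothetical site-specific data.
events = [
    posterior_beta(1, 99, occurrences=1, trials=50),  # e.g. one hazardous activity
    posterior_beta(2, 98, occurrences=0, trials=20),  # e.g. another activity
]
estimate = top_event_probability(events)
print(events, estimate)
```

Sampling the event probabilities rather than plugging in point estimates lets the uncertainty in each basic event carry through to the top event.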
Geoffrey H. Donovan; John Mills
2014-01-01
Many cities have policies encouraging homeowners to plant trees. For these policies to be effective, it is important to understand what motivates a homeowner's tree-planting decision. Researchers address this question by identifying variables that influence participation in a tree-planting program in Portland, Oregon, U.S. According to the study, homeowners with street...