Science.gov

Sample records for decision tree ensembles

  1. Creating ensembles of decision trees through sampling

    DOEpatents

    Kamath, Chandrika; Cantu-Paz, Erick

    2005-08-30

    A system for decision tree ensembles that includes a module to read the data, a module to sort the data, a module to evaluate a potential split of the data according to some criterion using a random sample of the data, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method is based on statistical sampling techniques and includes the steps of reading the data; sorting the data; evaluating a potential split according to some criterion using a random sample of the data, splitting the data, and combining multiple decision trees in ensembles.

  2. Creating Ensembles of Decision Trees Through Sampling

    SciTech Connect

    Kamath, C; Cantu-Paz, E

    2001-07-26

    Recent work in classification indicates that significant improvements in accuracy can be obtained by growing an ensemble of classifiers and having them vote for the most popular class. This paper focuses on ensembles of decision trees that are created with a randomized procedure based on sampling. Randomization can be introduced by using random samples of the training data (as in bagging or boosting) and running a conventional tree-building algorithm, or by randomizing the induction algorithm itself. The objective of this paper is to describe the first experiences with a novel randomized tree induction method that uses a sub-sample of instances at a node to determine the split. The empirical results show that ensembles generated using this approach yield results that are competitive in accuracy and superior in computational cost to boosting and bagging.
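The node-level sampling idea in this abstract can be illustrated with a minimal sketch. It assumes a single numeric feature and a Gini impurity criterion; the function names and the `sample_fraction` and `min_sample` parameters are hypothetical and are not taken from the paper.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_on_sample(values, labels, sample_fraction=0.1, min_sample=10, rng=None):
    """Choose a threshold for one numeric feature using only a random
    sub-sample of the instances at the node (illustrative parameters)."""
    rng = rng or random.Random(0)
    n = len(values)
    k = min(n, max(min_sample, int(sample_fraction * n)))
    idx = rng.sample(range(n), k)
    # Sort only the sampled instances, which is cheaper than sorting all n.
    pairs = sorted(((values[i], labels[i]) for i in idx), key=lambda p: p[0])
    best_threshold, best_score = None, float("inf")
    for j in range(1, k):
        lo, hi = pairs[j - 1][0], pairs[j][0]
        if lo == hi:
            continue  # no threshold separates equal values
        left = [y for _, y in pairs[:j]]
        right = [y for _, y in pairs[j:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / k
        if score < best_score:
            best_threshold, best_score = (lo + hi) / 2.0, score
    return best_threshold, best_score
```

Because each tree sees a different random sub-sample at each node, the chosen thresholds differ from tree to tree, which is one way to obtain the diversity an ensemble needs.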

  3. Creating ensembles of decision trees through sampling

    SciTech Connect

    Kamath, C; Cantu-Paz, E

    2001-02-02

    Recent work in classification indicates that significant improvements in accuracy can be obtained by growing an ensemble of classifiers and having them vote for the most popular class. This paper focuses on ensembles of decision trees that are created with a randomized procedure based on sampling. Randomization can be introduced by using random samples of the training data (as in bagging or arcing) and running a conventional tree-building algorithm, or by randomizing the induction algorithm itself. The objective of this paper is to describe our first experiences with a novel randomized tree induction method that uses a subset of samples at a node to determine the split. Our empirical results show that ensembles generated using this approach yield results that are competitive in accuracy and superior in computational cost.

  4. Improving ensemble decision tree performance using Adaboost and Bagging

    NASA Astrophysics Data System (ADS)

    Hasan, Md. Rajib; Siraj, Fadzilah; Sainin, Mohd Shamrie

    2015-12-01

    Ensemble classifier systems are considered among the most promising approaches to medical data classification, and the performance of a decision tree classifier can be increased by the ensemble method, as it is proven to be better than single classifiers. However, in an ensemble setting the performance depends on the selection of a suitable base classifier. This research employed two prominent ensembles, namely Adaboost and Bagging, with base classifiers such as Random Forest, Random Tree, j48, j48grafts and Logistic Model Regression (LMT) that have been selected independently. The empirical study shows that the performance varies when different base classifiers are selected, and in some cases overfitting was also noted. The evidence shows that ensemble decision tree classifiers using Adaboost and Bagging improve the performance on selected medical data sets.

  5. Proteomic mass spectra classification using decision tree based ensemble methods.

    PubMed

    Geurts, Pierre; Fillet, Marianne; de Seny, Dominique; Meuwis, Marie-Alice; Malaise, Michel; Merville, Marie-Paule; Wehenkel, Louis

    2005-07-15

    Modern mass spectrometry allows the determination of proteomic fingerprints of body fluids like serum, saliva or urine. These measurements can be used in many medical applications in order to diagnose the current state or predict the evolution of a disease. Recent developments in machine learning allow one to exploit such datasets, characterized by small numbers of very high-dimensional samples. We propose a systematic approach based on decision tree ensemble methods, which is used to automatically determine proteomic biomarkers and predictive models. The approach is validated on two datasets of surface-enhanced laser desorption/ionization time of flight measurements, for the diagnosis of rheumatoid arthritis and inflammatory bowel diseases. The results suggest that the methodology can handle a broad class of similar problems.

  6. Using histograms to introduce randomization in the generation of ensembles of decision trees

    DOEpatents

    Kamath, Chandrika; Cantu-Paz, Erick; Littau, David

    2005-02-22

    A system for decision tree ensembles that includes a module to read the data, a module to create a histogram, a module to evaluate a potential split according to some criterion using the histogram, a module to select a split point randomly in an interval around the best split, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method includes the steps of reading the data; creating a histogram; evaluating a potential split according to some criterion using the histogram, selecting a split point randomly in an interval around the best split, splitting the data, and combining multiple decision trees in ensembles.

  7. Creating ensembles of oblique decision trees with evolutionary algorithms and sampling

    DOEpatents

    Cantu-Paz, Erick; Kamath, Chandrika

    2006-06-13

    A decision tree system that is part of a parallel object-oriented pattern recognition system, which in turn is part of an object oriented data mining system. A decision tree process includes the step of reading the data. If necessary, the data is sorted. A potential split of the data is evaluated according to some criterion. An initial split of the data is determined. The final split of the data is determined using evolutionary algorithms and statistical sampling techniques. The data is split. Multiple decision trees are combined in ensembles.

  8. Predicting gene function using hierarchical multi-label decision tree ensembles.

    PubMed

    Schietgat, Leander; Vens, Celine; Struyf, Jan; Blockeel, Hendrik; Kocev, Dragi; Dzeroski, Saso

    2010-01-02

    S. cerevisiae, A. thaliana and M. musculus are well-studied organisms in biology and the sequencing of their genomes was completed many years ago. It is still a challenge, however, to develop methods that assign biological functions to the ORFs in these genomes automatically. Different machine learning methods have been proposed to this end, but it remains unclear which method is to be preferred in terms of predictive performance, efficiency and usability. We study the use of decision tree based models for predicting the multiple functions of ORFs. First, we describe an algorithm for learning hierarchical multi-label decision trees. These can simultaneously predict all the functions of an ORF, while respecting a given hierarchy of gene functions (such as FunCat or GO). We present new results obtained with this algorithm, showing that the trees found by it exhibit clearly better predictive performance than the trees found by previously described methods. Nevertheless, the predictive performance of individual trees is lower than that of some recently proposed statistical learning methods. We show that ensembles of such trees are more accurate than single trees and are competitive with state-of-the-art statistical learning and functional linkage methods. Moreover, the ensemble method is computationally efficient and easy to use. Our results suggest that decision tree based methods are a state-of-the-art, efficient and easy-to-use approach to ORF function prediction.

  9. A protocol for developing early warning score models from vital signs data in hospitals using ensembles of decision trees

    PubMed Central

    Xu, Michael; Tam, Benjamin; Thabane, Lehana; Fox-Robichaud, Alison

    2015-01-01

    Introduction: Multiple early warning scores (EWS) have been developed and implemented to reduce cardiac arrests on hospital wards. Case–control observational studies that generate an area under the receiver operator curve (AUROC) are the usual validation method, but investigators have also generated EWS with algorithms with no prior clinical knowledge. We present a protocol for the validation and comparison of our local Hamilton Early Warning Score (HEWS) with that generated using decision tree (DT) methods. Methods and analysis: A database of electronically recorded vital signs from 4 medical and 4 surgical wards will be used to generate DT EWS (DT-HEWS). A third EWS will be generated using ensemble-based methods. Missing data will be multiply imputed. For a relative risk reduction of 50% in our composite outcome (cardiac or respiratory arrest, unanticipated intensive care unit (ICU) admission or hospital death) with a power of 80%, we calculated a sample size of 17 151 patient days based on our cardiac arrest rates in 2012. The performance of the National EWS, DT-HEWS and the ensemble EWS will be compared using AUROC. Ethics and dissemination: Ethics approval was received from the Hamilton Integrated Research Ethics Board (#13-724-C). The vital signs and associated outcomes are stored in a database on our secure hospital server. Preliminary dissemination of this protocol was presented in abstract form at an international critical care meeting. Final results of this analysis will be used to improve on the existing HEWS and will be shared through publication and presentation at critical care meetings. PMID:26353873
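The AUROC comparison named in this protocol can be made concrete with a small sketch. It uses the standard rank interpretation of AUROC: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The function name and the 0/1 outcome encoding are assumptions for illustration, not part of the protocol.

```python
def auroc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison.

    scores   -- risk scores (higher means higher predicted risk)
    outcomes -- 1 for an event, 0 for no event (assumed encoding)
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of cases; a rank-based formulation would be preferable for a data set of the size described, but the quadratic form keeps the definition visible.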

  10. A protocol for developing early warning score models from vital signs data in hospitals using ensembles of decision trees.

    PubMed

    Xu, Michael; Tam, Benjamin; Thabane, Lehana; Fox-Robichaud, Alison

    2015-09-09

    Multiple early warning scores (EWS) have been developed and implemented to reduce cardiac arrests on hospital wards. Case-control observational studies that generate an area under the receiver operator curve (AUROC) are the usual validation method, but investigators have also generated EWS with algorithms with no prior clinical knowledge. We present a protocol for the validation and comparison of our local Hamilton Early Warning Score (HEWS) with that generated using decision tree (DT) methods. A database of electronically recorded vital signs from 4 medical and 4 surgical wards will be used to generate DT EWS (DT-HEWS). A third EWS will be generated using ensemble-based methods. Missing data will be multiply imputed. For a relative risk reduction of 50% in our composite outcome (cardiac or respiratory arrest, unanticipated intensive care unit (ICU) admission or hospital death) with a power of 80%, we calculated a sample size of 17,151 patient days based on our cardiac arrest rates in 2012. The performance of the National EWS, DT-HEWS and the ensemble EWS will be compared using AUROC. Ethics approval was received from the Hamilton Integrated Research Ethics Board (#13-724-C). The vital signs and associated outcomes are stored in a database on our secure hospital server. Preliminary dissemination of this protocol was presented in abstract form at an international critical care meeting. Final results of this analysis will be used to improve on the existing HEWS and will be shared through publication and presentation at critical care meetings. Published by the BMJ Publishing Group Limited.

  11. Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS

    NASA Astrophysics Data System (ADS)

    Tehrany, Mahyat Shafapour; Pradhan, Biswajeet; Jebur, Mustafa Neamah

    2013-11-01

    A decision tree (DT) machine learning algorithm was used to map the flood susceptible areas in Kelantan, Malaysia. We used an ensemble frequency ratio (FR) and logistic regression (LR) model in order to overcome weak points of the LR. The combined method of FR and LR was used to map the susceptible areas in Kelantan, Malaysia. Results of both methods were compared and their efficiency was assessed. The most influential conditioning factors for flooding were identified.

  12. Tree Ensembles on the Induced Discrete Space.

    PubMed

    Yildiz, Olcay Taner

    2016-05-01

    Decision trees are widely used predictive models in machine learning. Recently, K-tree was proposed, where the original discrete feature space is expanded by generating all orderings of values of k discrete attributes and these orderings are used as the new attributes in decision tree induction. Although K-tree performs significantly better than the proper one, its exponential time complexity can prohibit its use. In this brief, we propose K-forest, an extension of random forest, where a subset of features is selected randomly from the induced discrete space. Simulation results on 17 data sets show that the novel ensemble classifier has a significantly lower error rate compared with the random forest based on the original feature space.

  13. Approximate Splitting for Ensembles of Trees using Histograms

    SciTech Connect

    Kamath, C; Cantu-Paz, E; Littau, D

    2001-09-28

    Recent work in classification indicates that significant improvements in accuracy can be obtained by growing an ensemble of classifiers and having them vote for the most popular class. Implicit in many of these techniques is the concept of randomization that generates different classifiers. In this paper, they focus on ensembles of decision trees that are created using a randomized procedure based on histograms. Techniques, such as histograms, that discretize continuous variables, have long been used in classification to convert the data into a form suitable for processing and to reduce the compute time. The approach combines the ideas behind discretization through histograms and randomization in ensembles to create decision trees by randomly selecting a split point in an interval around the best bin boundary in the histogram. The experimental results with public domain data show that ensembles generated using this approach are competitive in accuracy and superior in computational cost to other ensemble techniques such as boosting and bagging.
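The histogram-based randomization described above might look roughly like the following sketch. It assumes equal-width bins, a Gini criterion evaluated only at bin boundaries, and a uniform draw in an interval around the best boundary; the names and the `n_bins` and `width_fraction` parameters are hypothetical rather than taken from the report.

```python
import random


def gini_from_counts(counts):
    """Gini impurity computed from per-class counts."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


def histogram_random_split(values, labels, n_bins=10, width_fraction=0.5, rng=None):
    """Return (split_point, best_boundary) for one numeric feature."""
    rng = rng or random.Random(0)
    lo, hi = min(values), max(values)
    if lo == hi:
        return None  # constant feature, nothing to split
    bin_width = (hi - lo) / n_bins
    classes = sorted(set(labels))
    # One pass over the data builds per-bin class counts; no sorting needed.
    hist = [[0] * len(classes) for _ in range(n_bins)]
    for v, y in zip(values, labels):
        b = min(int((v - lo) / bin_width), n_bins - 1)
        hist[b][classes.index(y)] += 1
    total = [sum(col) for col in zip(*hist)]
    left = [0] * len(classes)
    best_b, best_score = None, float("inf")
    n = len(values)
    for b in range(n_bins - 1):  # candidate boundary after bin b
        left = [l + h for l, h in zip(left, hist[b])]
        right = [t - l for t, l in zip(total, left)]
        score = (sum(left) * gini_from_counts(left)
                 + sum(right) * gini_from_counts(right)) / n
        if score < best_score:
            best_b, best_score = b, score
    boundary = lo + (best_b + 1) * bin_width
    half = width_fraction * bin_width
    # Randomization step: draw the split uniformly around the best boundary.
    return rng.uniform(boundary - half, boundary + half), boundary
```

Evaluating splits only at bin boundaries replaces a sort of the node's instances with a single counting pass, which is where a saving in compute time would come from under these assumptions.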

  14. Quantum decision tree classifier

    NASA Astrophysics Data System (ADS)

    Lu, Songfeng; Braunstein, Samuel L.

    2013-11-01

    We study the quantum version of a decision tree classifier to fill the gap between quantum computation and machine learning. The quantum entropy impurity criterion which is used to determine which node should be split is presented in the paper. By using the quantum fidelity measure between two quantum states, we cluster the training data into subclasses so that the quantum decision tree can manipulate quantum states. We also propose algorithms constructing the quantum decision tree and searching for a target class over the tree for a new quantum object.

  15. Lazy decision trees

    SciTech Connect

    Friedman, J.H.; Yun, Yeogirl; Kohavi, R.

    1996-12-31

    Lazy learning algorithms, exemplified by nearest-neighbor algorithms, do not induce a concise hypothesis from a given training set; the inductive process is delayed until a test instance is given. Algorithms for constructing decision trees, such as C4.5, ID3, and CART create a single "best" decision tree during the training phase, and this tree is then used to classify test instances. The tests at the nodes of the constructed tree are good on average, but there may be better tests for classifying a specific instance. We propose a lazy decision tree algorithm, LazyDT, that conceptually constructs the "best" decision tree for each test instance. In practice, only a path needs to be constructed, and a caching scheme makes the algorithm fast. The algorithm is robust with respect to missing values without resorting to the complicated methods usually seen in induction of decision trees. Experiments on real and artificial problems are presented.

  16. Human decision error (HUMDEE) trees

    SciTech Connect

    Ostrom, L.T.

    1993-08-01

    Graphical presentations of human actions in incident and accident sequences have been used for many years. However, for the most part, human decision making has been underrepresented in these trees. This paper presents a method of incorporating the human decision process into graphical presentations of incident/accident sequences. This presentation is in the form of logic trees. These trees are called Human Decision Error Trees or HUMDEE for short. The primary benefit of HUMDEE trees is that they graphically illustrate what else the individuals involved in the event could have done to prevent either the initiation or continuation of the event. HUMDEE trees also present the alternate paths available at the operator decision points in the incident/accident sequence. This is different from the Technique for Human Error Rate Prediction (THERP) event trees. There are many uses of these trees. They can be used for incident/accident investigations to show what other courses of action were available and for training operators. The trees also have a consequence component, so that not only the decision but also the consequence of that decision can be explored.

  17. Weighted Hybrid Decision Tree Model for Random Forest Classifier

    NASA Astrophysics Data System (ADS)

    Kulkarni, Vrushali Y.; Sinha, Pradeep K.; Petare, Manisha C.

    2016-06-01

    Random Forest is an ensemble, supervised machine learning algorithm. An ensemble generates many classifiers and combines their results by majority voting. Random forest uses a decision tree as its base classifier. In decision tree induction, an attribute split/evaluation measure is used to decide the best split at each node of the decision tree. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation among them. The work presented in this paper is related to attribute split measures and is a two-step process. First, a theoretical study of five selected split measures is done and a comparison matrix is generated to understand the pros and cons of each measure. These theoretical results are verified by performing empirical analysis. For the empirical analysis, a random forest is generated using each of the five selected split measures, chosen one at a time, i.e. random forest using information gain, random forest using gain ratio, etc. Second, based on this theoretical and empirical analysis, a new hybrid decision tree model for the random forest classifier is proposed. In this model, the individual decision trees in the Random Forest are generated using different split measures. This model is augmented by weighted voting based on the strength of each individual tree. The new approach has shown a notable increase in the accuracy of random forest.
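Weighted voting of the kind mentioned in this abstract can be sketched as follows. The abstract does not say how tree strength is measured, so the weights here are simply assumed to be given (for example, each tree's accuracy on held-out data); the function name is hypothetical.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine one predicted label per tree into a single label.

    predictions -- class label predicted by each tree
    weights     -- strength of each tree (assumed to be supplied by the caller)
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight per prediction is required")
    totals = defaultdict(float)
    for label, w in zip(predictions, weights):
        totals[label] += w
    # The label with the largest total weight wins; ties go to the
    # label that was seen first.
    return max(totals, key=totals.get)
```

With equal weights this reduces to plain majority voting, so `weighted_vote(["a", "b", "b"], [1, 1, 1])` picks `"b"`, whereas a single strong tree can outvote two weak ones when the weights differ.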

  18. Randomized Ensemble Methods for Classification Trees

    DTIC Science & Technology

    2002-09-01

    Table A-3: Example of Ecoli Data ... [data set table; columns include inputs and classes] Biopsy: 683, -, 9 continuous, 2; Diabetes: 768, -, 8 continuous, 2; Ecoli: 336, -, 7 continuous, 8; German credit: 1000 ... [results table; columns: data, size of ensembles, error estimation] Biopsy, Diabetes, Ecoli, German, Glass, Ionosphere, Liver, Sonar, Vehicle, Votes

  19. Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

    NASA Astrophysics Data System (ADS)

    Galelli, S.; Castelletti, A.

    2013-07-01

    Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and (iii) allows the relative importance of the input variables to be inferred, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analysed on two real-world case studies - Marina catchment (Singapore) and Canning River (Western Australia) - representing two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirements when adopted on large datasets. In addition, the ranking of the input variables provided can be given a physically meaningful interpretation.
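The randomized split selection that gives Extra-Trees their name can be sketched for a regression node as below. The sketch assumes the usual formulation (draw a few candidate features at random, draw one uniform cut-point per feature, keep the candidate with the lowest weighted variance); the function names and the `k_features` parameter are illustrative and not taken from this paper.

```python
import random


def variance(ys):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    n = len(ys)
    if n == 0:
        return 0.0
    mean = sum(ys) / n
    return sum((y - mean) ** 2 for y in ys) / n


def extra_trees_split(rows, targets, k_features, rng=None):
    """Return (score, feature_index, cut_point) for one node, or None
    if every sampled feature is constant."""
    rng = rng or random.Random(0)
    n_features = len(rows[0])
    candidates = rng.sample(range(n_features), min(k_features, n_features))
    n = len(rows)
    best = None
    for f in candidates:
        col = [r[f] for r in rows]
        lo, hi = min(col), max(col)
        if lo == hi:
            continue  # constant feature cannot split the node
        cut = rng.uniform(lo, hi)  # random cut-point, no search over thresholds
        left = [y for x, y in zip(col, targets) if x < cut]
        right = [y for x, y in zip(col, targets) if x >= cut]
        score = (len(left) * variance(left) + len(right) * variance(right)) / n
        if best is None or score < best[0]:
            best = (score, f, cut)
    return best
```

Skipping the exhaustive threshold search is what makes each node cheap to build, at the price of weaker individual trees that the ensemble average is expected to compensate for.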

  20. Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

    NASA Astrophysics Data System (ADS)

    Galelli, S.; Castelletti, A.

    2013-02-01

    Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modeling. In this paper we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modeling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalization property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally very efficient; and (iii) allows the relative importance of the input variables to be inferred, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analyzed on two real-world case studies (Marina catchment (Singapore) and Canning River (Western Australia)), representing two different morphoclimatic contexts, in comparison with other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirements when adopted on large datasets. In addition, the ranking of the input variables provided can be given a physically meaningful interpretation.

  1. Extensions and applications of ensemble-of-trees methods in machine learning

    NASA Astrophysics Data System (ADS)

    Bleich, Justin

    Ensemble-of-trees algorithms have emerged at the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate their superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of

  2. Reweighting with Boosted Decision Trees

    NASA Astrophysics Data System (ADS)

    Rogozhnikov, Alex

    2016-10-01

    Machine learning tools are commonly used in modern high energy physics (HEP) experiments. Different models, such as boosted decision trees (BDT) and artificial neural networks (ANN), are widely used in analyses and even in the software triggers [1]. In most cases, these are classification models used to select the “signal” events from data. Monte Carlo simulated events typically take part in training of these models. While the results of the simulation are expected to be close to real data, in practical cases there is notable disagreement between simulated and observed data. In order to use available simulation in training, corrections must be introduced to generated data. One common approach is reweighting: assigning weights to the simulated events. We present a novel method of event reweighting based on boosted decision trees. The problem of checking the quality of the reweighting step in analyses is also discussed.
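To make "assigning weights to the simulated events" concrete, here is a sketch of a simple one-dimensional binned reweighting. This is a deliberately simpler stand-in and not the boosted-decision-tree method the abstract proposes; the function name, the single variable, and the bin edges are all assumptions for illustration.

```python
def binned_weights(simulated, observed, edges):
    """Weight each simulated event by the ratio of observed to simulated
    event fractions in its bin (events outside all bins get weight 0.0)."""
    def bin_index(x):
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                return i
        return None

    n_bins = len(edges) - 1
    sim_counts = [0] * n_bins
    obs_counts = [0] * n_bins
    for x in simulated:
        i = bin_index(x)
        if i is not None:
            sim_counts[i] += 1
    for x in observed:
        i = bin_index(x)
        if i is not None:
            obs_counts[i] += 1

    weights = []
    for x in simulated:
        i = bin_index(x)
        if i is None:
            weights.append(0.0)
        else:
            obs_frac = obs_counts[i] / len(observed)
            sim_frac = sim_counts[i] / len(simulated)  # > 0 because x is in bin i
            weights.append(obs_frac / sim_frac)
    return weights
```

After reweighting, the weighted histogram of the simulated variable matches the observed one bin by bin. Binned ratios like this become sparse quickly as the number of variables grows, which is the kind of limitation a tree-based reweighting is meant to address.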

  3. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates

    PubMed Central

    Vilella, Albert J.; Severin, Jessica; Ureta-Vidal, Abel; Heng, Li; Durbin, Richard; Birney, Ewan

    2009-01-01

    We have developed a comprehensive gene orientated phylogenetic resource, EnsemblCompara GeneTrees, based on a computational pipeline to handle clustering, multiple alignment, and tree generation, including the handling of large gene families. We developed two novel non-sequence-based metrics of gene tree correctness and benchmarked a number of tree methods. The TreeBeST method from TreeFam shows the best performance in our hands. We also compared this phylogenetic approach to clustering approaches for ortholog prediction, showing a large increase in coverage using the phylogenetic approach. All data are made available in a number of formats and will be kept up to date with the Ensembl project. PMID:19029536

  4. Bayesian Evidence Framework for Decision Tree Learning

    NASA Astrophysics Data System (ADS)

    Chatpatanasiri, Ratthachat; Kijsirikul, Boonserm

    2005-11-01

    This work is primarily concerned with the problem of selecting a single decision (or classification) tree given the observed data. Although a single decision tree has a high risk of overfitting, the induced tree is easily interpreted. Researchers have invented various methods such as tree pruning or tree averaging for preventing the induced tree from overfitting (and from underfitting) the data. In this paper, instead of using those conventional approaches, we apply the Bayesian evidence framework of Gull, Skilling and MacKay to the process of selecting a decision tree. We derive a formal function to measure 'the fitness' of each decision tree given a set of observed data. Our method, in fact, is analogous to a well-known Bayesian model selection method for interpolating noisy continuous-value data. As in regression problems, given reasonable assumptions, this derived score function automatically quantifies the principle of Ockham's razor, and hence reasonably deals with the issue of the underfitting-overfitting tradeoff.

  5. Ensemble survival trees for identifying subpopulations in personalized medicine.

    PubMed

    Chen, Yu-Chuan; Chen, James J

    2016-09-01

    Recently, personalized medicine has received great attention as a way to improve safety and effectiveness in drug development. Personalized medicine aims to provide medical treatment that is tailored to the patient's characteristics such as genomic biomarkers, disease history, etc., so that the benefit of treatment can be optimized. Subpopulation identification divides patients into several different subgroups, where each subgroup corresponds to an optimal treatment. For two subgroups, traditionally the multivariate Cox proportional hazards model is fitted and used to calculate the risk score when the outcome is a survival time endpoint. The median is commonly chosen as the cutoff value to separate patients. However, using the median as the cutoff value is quite subjective and may be inappropriate in situations where data are imbalanced. Here, we propose a novel tree-based method that adopts the algorithm of relative risk trees to identify patient subgroups. After growing a relative risk tree, we apply k-means clustering to group the terminal nodes based on the averaged covariates. We adopt an ensemble Bagging method to improve the performance of a single tree, since it is well known that the performance of a single tree is quite unstable. A simulation study is conducted to compare the performance of our proposed method and the multivariate Cox model. Applications of our proposed method to two public cancer data sets are also presented for illustration.

  6. From Family Trees to Decision Trees.

    ERIC Educational Resources Information Center

    Trobian, Helen R.

    This paper is a preliminary inquiry by a non-mathematician into graphic methods of sequential planning and ways in which hierarchical analysis and tree structures can be helpful in developing interest in the use of mathematical modeling in the search for creative solutions to real-life problems. Highlights include a discussion of hierarchical…

  7. The clinical decision analysis using decision tree

    PubMed Central

    Bae, Jong-Myon

    2014-01-01

    Clinical decision analysis (CDA) has been used to overcome complexity and uncertainty in medical problems. CDA is a tool allowing decision-makers to apply evidence-based medicine to make objective clinical decisions when faced with complex situations. The usefulness and limitations of CDA, including the six steps in conducting it, were reviewed. CDA results should be applied through shared decision-making that reflects patients' values. PMID:25358466

  8. The clinical decision analysis using decision tree.

    PubMed

    Bae, Jong-Myon

    2014-01-01

    Clinical decision analysis (CDA) has been used to overcome complexity and uncertainty in medical problems. CDA is a tool allowing decision-makers to apply evidence-based medicine to make objective clinical decisions when faced with complex situations. The usefulness and limitations of CDA, including the six steps in conducting it, were reviewed. CDA results should be applied through shared decision-making that reflects patients' values.

  9. Decision Tree Technique for Particle Identification

    SciTech Connect

    Quiller, Ryan

    2003-09-05

    Particle identification based on measurements such as the Cerenkov angle, momentum, and the rate of energy loss per unit distance (-dE/dx) is fundamental to the BaBar detector for particle physics experiments. It is particularly important to separate the charged forms of kaons and pions. Currently, the Neural Net, an algorithm based on mapping input variables to an output variable using hidden variables as intermediaries, is one of the primary tools used for identification. In this study, a decision tree classification technique implemented in the computer program, CART, was investigated and compared to the Neural Net over the range of momenta, 0.25 GeV/c to 5.0 GeV/c. For a given subinterval of momentum, three decision trees were made using different sets of input variables. The sensitivity and specificity were calculated for varying kaon acceptance thresholds. This data was used to plot Receiver Operating Characteristic curves (ROC curves) to compare the performance of the classification methods. Also, input variables used in constructing the decision trees were analyzed. It was found that the Neural Net was a significant contributor to decision trees using dE/dx and the Cerenkov angle as inputs. Furthermore, the Neural Net had poorer performance than the decision tree technique, but tended to improve decision tree performance when used as an input variable. These results suggest that the decision tree technique using Neural Net input may possibly increase accuracy of particle identification in BaBar.

  10. The decision tree approach to classification

    NASA Technical Reports Server (NTRS)

    Wu, C.; Landgrebe, D. A.; Swain, P. H.

    1975-01-01

    A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.

  11. Role of Hydrological Ensemble Forecasts in Operational Decision Making

    NASA Astrophysics Data System (ADS)

    Ramaswamy, V.; Saleh, F.; Georgas, N.; Blumberg, A. F.

    2016-12-01

    Considerable importance has been placed on addressing uncertainties in hydrologic forecasts, particularly with regard to operational decision making. This work investigates the utility of short-term hydrological ensemble forecasts for operational decision making using meteorological inputs from more than 100 ensemble members from different numerical weather prediction (NWP) models. To this end, an advanced automated hydrologic framework comprising a regional-scale hydrologic model, GIS datasets and the meteorological ensemble predictions from different weather prediction facilities was implemented over the Hudson and Raritan River basins, USA. The uncertainties associated with ensemble streamflow forecasts were analysed for three different flood events classified as minor, moderate and major. This was done by visually and statistically comparing the spread, magnitude and timing of the peak of the hydrologic outputs. Results from this work demonstrate the effectiveness of different NWP models for different operational scenarios, thus providing a better understanding of the uncertainties and risks associated with decision making. In addition to providing insights into the risks associated with issuing flood alerts, this work also offers useful perspectives on operationally managing water resources.

  12. Support Vector Machine with Ensemble Tree Kernel for Relation Extraction

    PubMed Central

    Fu, Hui; Du, Zhiguo

    2016-01-01

    Relation extraction is one of the important research topics in the field of information extraction. To solve the problem of semantic variation in traditional semisupervised relation extraction algorithms, this paper proposes a novel semisupervised relation extraction algorithm based on ensemble learning (LXRE). The new algorithm mainly uses two kinds of support vector machine classifiers based on tree kernel for integration and integrates the strategy of constrained extension seed set. The new algorithm can weaken the inaccuracy of relation extraction, which is caused by the phenomenon of semantic variation. The numerical experimental research based on two benchmark data sets (PropBank and AIMed) shows that the LXRE algorithm proposed in the paper is superior to two other common relation extraction methods in four evaluation indexes (Precision, Recall, F-measure, and Accuracy). It indicates that the new algorithm has good relation extraction ability compared with others. PMID:27118966

  13. Comprehensive Decision Tree Models in Bioinformatics

    PubMed Central

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Purpose Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. Methods This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by the so-called one-button data mining approach, where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. Results The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class

  14. PRIA 3 Fee Determination Decision Tree

    EPA Pesticide Factsheets

    The PRIA 3 decision tree will help applicants requesting a pesticide registration or certain tolerance action to accurately identify the category of their application and the amount of the required fee before they submit the application.

  15. RE-Powering’s Electronic Decision Tree

    EPA Pesticide Factsheets

    Developed by US EPA's RE-Powering America's Land Initiative, the RE-Powering Decision Trees tool guides interested parties through a process to screen sites for their suitability for solar photovoltaics or wind installations.

  16. Solar and Wind Site Screening Decision Trees

    EPA Pesticide Factsheets

    EPA and NREL created a decision tree to guide state and local governments and other stakeholders through a process for screening sites for their suitability for future redevelopment with solar photovoltaic (PV) energy and wind energy.

  17. A survey of decision tree classifier methodology

    NASA Technical Reports Server (NTRS)

    Safavian, S. Rasoul; Landgrebe, David

    1990-01-01

    Decision Tree Classifiers (DTC's) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTC's is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTC's over single stage classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.

  18. A survey of decision tree classifier methodology

    NASA Technical Reports Server (NTRS)

    Safavian, S. R.; Landgrebe, David

    1991-01-01

    Decision tree classifiers (DTCs) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTCs over single-stage classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.

  19. Parallel object-oriented decision tree system

    DOEpatents

    Kamath, Chandrika; Cantu-Paz, Erick

    2006-02-28

    A data mining decision tree system that uncovers patterns, associations, anomalies, and other statistically significant structures in data by reading and displaying data files, extracting relevant features for each of the objects, and using a method of recognizing patterns among the objects based upon object features through a decision tree that reads the data, sorts the data if necessary, determines the best manner to split the data into subsets according to some criterion, and splits the data.
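
    The sort, evaluate, and split steps named in this abstract can be sketched for a single numeric feature. This is a minimal illustration under stated assumptions: the abstract only says "some criterion", so Gini impurity is a choice made for this example, and all function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (assumed splitting criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best binary split of one feature."""
    pairs = sorted(zip(values, labels))          # the "sort" step
    n = len(pairs)
    best = (None, gini(labels))                  # baseline: do not split
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                             # cannot split between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


def split(values, labels, threshold):
    """Partition records into those at or below the threshold and those above it."""
    left = [(v, y) for v, y in zip(values, labels) if v <= threshold]
    right = [(v, y) for v, y in zip(values, labels) if v > threshold]
    return left, right


# Toy data, invented for illustration only.
values = [2.0, 1.0, 4.0, 3.0]
labels = ["a", "a", "b", "b"]
threshold, impurity = best_split(values, labels)
left, right = split(values, labels, threshold)
```

    A full tree builder would apply the same two functions recursively to each resulting subset.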

  20. Speeding up Boosting decision trees training

    NASA Astrophysics Data System (ADS)

    Zheng, Chao; Wei, Zhenzhong

    2015-10-01

    Boosting decision trees are fast at test time, but their training is too slow for applications that require real-time learning. To overcome this drawback, we propose a fast decision tree training method that prunes non-effective features in advance, and, based on this method, we design a fast training algorithm for Boosting decision trees. First, we analyze the structure of each decision tree node and prove by derivation that the classification error of each node is bounded. Then, by using this error bound to prune non-effective features at an early stage, we greatly accelerate decision tree training without affecting the training results at all. Finally, the accelerated decision tree training method is integrated into the general Boosting process to form a fast Boosting decision tree training algorithm. This algorithm is not a new variant of Boosting; rather, it should be used in conjunction with existing Boosting algorithms to achieve further training acceleration. To test the algorithm's speedup and its performance when combined with other acceleration algorithms, the original AdaBoost and two typical acceleration algorithms, LazyBoost and StochasticBoost, were each combined with this algorithm to produce three fast versions, and their classification performance was tested on the Lsis face database, which contains 12788 images. Experimental results reveal that this fast algorithm can achieve a training speedup of more than a factor of two without affecting the results of the trained classifier, and can be combined with other acceleration algorithms. Key words: Boosting algorithm, decision trees, classifier training, preliminary classification error, face detection

  1. Automated critiquing of medical decision trees.

    PubMed

    Wellman, M P; Eckman, M H; Fleming, C; Marshall, S L; Sonnenberg, F A; Pauker, S G

    1989-01-01

    The authors developed a decision tree-critiquing program (called BUNYAN) that identifies potential modeling errors in medical decision trees. The program's critiques are based on the structure of a decision problem, obtained from an abstract description specifying only the basic semantic categories of the model's components. A taxonomy of node and branch types supplies the primitive building blocks for representing decision trees. BUNYAN detects potential problems in a model by matching general pattern expressions that refer to these primitives. A small set of general principles justifies critiquing rules that detect four categories of potential structural problems: impossible strategies, dominated strategies, unaccountable violations of symmetry, and omission of apparently reasonable strategies. Although critiquing based on structure alone has clear limitations, principled structural analysis constitutes the core of a methodology for reasoning about decision models.

  2. Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data

    PubMed Central

    Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.

    2016-01-01

    We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with other ensemble methods, BET requires far fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides a variable selection criterion and interpretation for each subset. We developed an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872

  3. Decision tree approach for soil liquefaction assessment.

    PubMed

    Gandomi, Amir H; Fridline, Mark M; Roke, David A

    2013-01-01

    In the current study, the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used in this study. Three decision tree techniques are used here in two different ways, considering statistical and engineering points of view, to develop decision rules. The DT results are compared to the logistic regression (LR) model. The results of this study indicate that the DTs not only successfully predict liquefaction but they can also outperform the LR model. The best DT models are interpreted and evaluated based on an engineering point of view.

  4. Decision Tree Approach for Soil Liquefaction Assessment

    PubMed Central

    Gandomi, Amir H.; Fridline, Mark M.; Roke, David A.

    2013-01-01

    In the current study, the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used in this study. Three decision tree techniques are used here in two different ways, considering statistical and engineering points of view, to develop decision rules. The DT results are compared to the logistic regression (LR) model. The results of this study indicate that the DTs not only successfully predict liquefaction but they can also outperform the LR model. The best DT models are interpreted and evaluated based on an engineering point of view. PMID:24489498

  5. Prediction of regional streamflow frequency using model tree ensembles

    NASA Astrophysics Data System (ADS)

    Schnier, Spencer; Cai, Ximing

    2014-09-01

    This study introduces a novel data-driven method called model tree ensembles (MTEs) to predict streamflow frequency statistics based on known drainage area characteristics, which yields insights into the dominant controls of regional streamflow. The database used to induce the models contains both natural and anthropogenic drainage area characteristics for 294 USGS stream gages (164 in Texas and 130 in Illinois). MTEs were used to predict complete flow duration curves (FDCs) of ungaged streams by developing 17 models corresponding to 17 points along the FDC. Model accuracy was evaluated using ten-fold cross-validation and the coefficient of determination (R²). During the validation, the gages withheld from the analysis represent ungaged watersheds. MTEs are shown to outperform global multiple-linear regression models for predictions in ungaged watersheds. The accuracy of models for low flow is enhanced by explicit consideration of variables that capture human interference in watershed hydrology (e.g., population). Human factors (e.g., population and groundwater use) appear in the regionalizations for low flows, while annual and seasonal precipitation and drainage area are important for regionalizations of all flows. The results of this study have important implications for predictions in ungaged watersheds as well as gaged watersheds subject to anthropogenically-driven hydrologic changes.
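
    The evaluation scheme named in this abstract, k-fold cross-validation scored by the coefficient of determination, can be sketched as follows. This is a minimal illustration only: a one-variable least-squares line stands in for the model tree ensemble, and the function names and toy data are assumptions made for the example.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k):
    """Yield (train, test) index lists; fold j holds indices j, j+k, j+2k, ..."""
    for j in range(k):
        test = list(range(j, n, k))
        train = [i for i in range(n) if i % k != j]
        yield train, test


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for the real model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def cross_validated_r2(xs, ys, k=10):
    """Predict every record from a model that never saw it, then score all predictions."""
    predicted = [None] * len(xs)
    for train, test in k_fold_indices(len(xs), k):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            predicted[i] = a + b * xs[i]     # withheld records play the ungaged sites
    return r_squared(ys, predicted)


# Toy data, invented for illustration only: an exactly linear relationship.
xs = [float(i) for i in range(20)]
ys = [3.0 + 2.0 * x for x in xs]
score = cross_validated_r2(xs, ys, k=10)
```

    Because each prediction comes from a model fitted without that record, the score estimates accuracy on unseen sites, not goodness of fit.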

  6. Fast Image Texture Classification Using Decision Trees

    NASA Technical Reports Server (NTRS)

    Thompson, David R.

    2011-01-01

    Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation-hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.
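
    The "integral image" transform mentioned in this abstract can be sketched with integer arithmetic only. This is a minimal illustration of the general technique, not the flight code: the function names and the toy image are assumptions made for the example.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column.

    ii[r][c] holds the sum of all pixels above and to the left of (r, c),
    so the sum over any rectangle needs only four lookups.
    """
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


# Toy 3x3 image, invented for illustration only.
img = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(img)
whole = box_sum(ii, 0, 0, 3, 3)
lower_right = box_sum(ii, 1, 1, 3, 3)
```

    Differences of such box sums give simple filter responses at constant cost per pixel, which is what makes them attractive as inputs to a per-pixel decision tree.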

  7. CUDT: a CUDA based decision tree algorithm.

    PubMed

    Lo, Win-Tsung; Chang, Yue-Shan; Sheu, Ruey-Kai; Chiu, Chun-Chieh; Yuan, Shyan-Ming

    2014-01-01

    The decision tree is one of the best-known classification methods in data mining. Much research has focused on improving decision tree performance. However, those algorithms were developed for and run on traditional distributed systems. Without the help of new technology, their latency cannot be improved when processing the huge volumes of data generated by ubiquitous sensing nodes. To improve data processing latency in large-scale data mining, we design and implement a new parallelized decision tree algorithm on CUDA (compute unified device architecture), a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We conducted many experiments to evaluate the performance of CUDT and compared it with a traditional CPU version. The results show that CUDT is 5 to 55 times faster than Weka-j48 and achieves an 18-fold speedup over SPRINT on large data sets.

  8. Automated Decision Tree Classification of Corneal Shape

    PubMed Central

    Twa, Michael D.; Parthasarathy, Srinivasan; Roberts, Cynthia; Mahmoud, Ashraf M.; Raasch, Thomas W.; Bullimore, Mark A.

    2011-01-01

    Purpose The volume and complexity of data produced during videokeratography examinations present a challenge of interpretation. As a consequence, results are often analyzed qualitatively by subjective pattern recognition or reduced to comparisons of summary indices. We describe the application of decision tree induction, an automated machine learning classification method, to discriminate between normal and keratoconic corneal shapes in an objective and quantitative way. We then compared this method with other known classification methods. Methods The corneal surface was modeled with a seventh-order Zernike polynomial for 132 normal eyes of 92 subjects and 112 eyes of 71 subjects diagnosed with keratoconus. A decision tree classifier was induced using the C4.5 algorithm, and its classification performance was compared with the modified Rabinowitz–McDonnell index, Schwiegerling’s Z3 index (Z3), Keratoconus Prediction Index (KPI), KISA%, and Cone Location and Magnitude Index using recommended classification thresholds for each method. We also evaluated the area under the receiver operator characteristic (ROC) curve for each classification method. Results Our decision tree classifier performed equal to or better than the other classifiers tested: accuracy was 92% and the area under the ROC curve was 0.97. Our decision tree classifier reduced the information needed to distinguish between normal and keratoconus eyes using four of 36 Zernike polynomial coefficients. The four surface features selected as classification attributes by the decision tree method were inferior elevation, greater sagittal depth, oblique toricity, and trefoil. Conclusions Automated decision tree classification of corneal shape through Zernike polynomials is an accurate quantitative method of classification that is interpretable and can be generated from any instrument platform capable of raw elevation data output. This method of pattern classification is extendable to other classification

  9. Algorithms for optimal dyadic decision trees

    SciTech Connect

    Hush, Don; Porter, Reid

    2009-01-01

    A new algorithm for constructing optimal dyadic decision trees was recently introduced, analyzed, and shown to be very effective for low dimensional data sets. This paper enhances and extends this algorithm by: introducing an adaptive grid search for the regularization parameter that guarantees optimal solutions for all relevant tree sizes, revising the core tree-building algorithm so that its run time is substantially smaller for most regularization parameter values on the grid, and incorporating new data structures and data pre-processing steps that provide significant run time enhancement in practice.

  10. Multi-test decision tree and its application to microarray data classification.

    PubMed

    Czajkowski, Marcin; Grześ, Marek; Kretowski, Marek

    2014-05-01

    A desirable property of tools used to investigate biological data is that they yield easy-to-understand models and predictive decisions. Decision trees are particularly promising in this regard due to their comprehensible nature that resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees have a tendency to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity. We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions. Experimental validation was performed on several real-life gene expression datasets. Comparison results with eight classifiers show that MTDT has a statistically significantly higher accuracy than popular decision tree classifiers, and it was highly competitive with ensemble learning algorithms. The proposed solution managed to outperform its baseline algorithm on 14 datasets by an average of 6%. A study performed on one of the datasets showed that the discovered genes used in the MTDT classification model are supported by biological evidence in the literature. This paper introduces a new type of decision tree which is more suitable for solving biological problems. MTDTs are relatively easy to analyze and much more powerful in modeling high dimensional microarray data than their popular counterparts. Copyright © 2014 Elsevier B.V. All rights reserved.

  11. Decision Tree Modeling for Ranking Data

    NASA Astrophysics Data System (ADS)

    Yu, Philip L. H.; Wan, Wai Ming; Lee, Paul H.

    Ranking/preference data arises from many applications in marketing, psychology, and politics. We establish a new decision tree model for the analysis of ranking data by adopting the concept of classification and regression tree. The existing splitting criteria are modified in a way that allows them to precisely measure the impurity of a set of ranking data. Two types of impurity measures for ranking data are introduced, namely g-wise and top-k measures. Theoretical results show that the new measures exhibit properties of impurity functions. In model assessment, the area under the ROC curve (AUC) is applied to evaluate the tree performance. Experiments are carried out to investigate the predictive performance of the tree model for complete and partially ranked data and promising results are obtained. Finally, a real-world application of the proposed methodology to analyze a set of political rankings data is presented.

  12. IND - THE IND DECISION TREE PACKAGE

    NASA Technical Reports Server (NTRS)

    Buntine, W.

    1994-01-01

    A common approach to supervised classification and prediction in artificial intelligence and statistical pattern recognition is the use of decision trees. A tree is "grown" from data using a recursive partitioning algorithm to create a tree which has good prediction of classes on new data. Standard algorithms are CART (by Breiman, Friedman, Olshen and Stone) and ID3 and its successor C4 (by Quinlan). As well as reimplementing parts of these algorithms and offering experimental control suites, IND also introduces Bayesian and MML methods and more sophisticated search in growing trees. These produce more accurate class probability estimates that are important in applications like diagnosis. IND is applicable to most data sets consisting of independent instances, each described by a fixed length vector of attribute values. An attribute value may be a number, one of a set of attribute specific symbols, or it may be omitted. One of the attributes is designated the "target" and IND grows trees to predict the target. Prediction can then be done on new data or the decision tree printed out for inspection. IND provides a range of features and styles with convenience for the casual user as well as fine-tuning for the advanced user or those interested in research. IND can be operated in a CART-like mode (but without regression trees, surrogate splits or multivariate splits), and in a mode like the early version of C4. Advanced features allow more extensive search, interactive control and display of tree growing, and Bayesian and MML algorithms for tree pruning and smoothing. These often produce more accurate class probability estimates at the leaves. IND also comes with a comprehensive experimental control suite. IND consists of four basic kinds of routines: data manipulation routines, tree generation routines, tree testing routines, and tree display routines. The data manipulation routines are used to partition a single large data set into smaller training and test sets.
The

  13. IND - THE IND DECISION TREE PACKAGE

    NASA Technical Reports Server (NTRS)

    Buntine, W.

    1994-01-01

    A common approach to supervised classification and prediction in artificial intelligence and statistical pattern recognition is the use of decision trees. A tree is "grown" from data using a recursive partitioning algorithm to create a tree which has good prediction of classes on new data. Standard algorithms are CART (by Breiman, Friedman, Olshen and Stone) and ID3 and its successor C4 (by Quinlan). As well as reimplementing parts of these algorithms and offering experimental control suites, IND also introduces Bayesian and MML methods and more sophisticated search in growing trees. These produce more accurate class probability estimates that are important in applications like diagnosis. IND is applicable to most data sets consisting of independent instances, each described by a fixed length vector of attribute values. An attribute value may be a number, one of a set of attribute specific symbols, or it may be omitted. One of the attributes is designated the "target" and IND grows trees to predict the target. Prediction can then be done on new data or the decision tree printed out for inspection. IND provides a range of features and styles with convenience for the casual user as well as fine-tuning for the advanced user or those interested in research. IND can be operated in a CART-like mode (but without regression trees, surrogate splits or multivariate splits), and in a mode like the early version of C4. Advanced features allow more extensive search, interactive control and display of tree growing, and Bayesian and MML algorithms for tree pruning and smoothing. These often produce more accurate class probability estimates at the leaves. IND also comes with a comprehensive experimental control suite. IND consists of four basic kinds of routines: data manipulation routines, tree generation routines, tree testing routines, and tree display routines. The data manipulation routines are used to partition a single large data set into smaller training and test sets.
The

  14. Two Trees: Migrating Fault Trees to Decision Trees for Real Time Fault Detection on International Space Station

    NASA Technical Reports Server (NTRS)

    Lee, Charles; Alena, Richard L.; Robinson, Peter

    2004-01-01

    Starting from an example of ISS fault trees, we present a method to convert fault trees to decision trees. The method shows that visualizing the root cause of a fault becomes easier and that tree manipulation becomes more programmatic through available decision tree programs. The decision tree visualization used for diagnostics is straightforward and easy to understand. For ISS real-time fault diagnostics, the status of the systems can be shown by passing the signals through the trees and seeing where they stop. Another advantage of decision trees is that they can learn fault patterns and predict future faults from historical data. Learning is not limited to static data sets but can also be done online: by accumulating real-time data sets, the decision trees can acquire and store fault patterns and recognize them when they occur.

  15. Tree Structure Generation from Ensemble Forecasts for Short-Term Reservoir Optimization

    NASA Astrophysics Data System (ADS)

    Raso, L.; Schwanenberg, D.; Van De Giesen, N.

    2012-12-01

    In short-term reservoir management, weather forecasts enable water managers to look further ahead in time and anticipate future system states. In this context, ensemble forecasts provide information about the uncertainty of the weather information. Tree-Based Model Predictive Control (TB-MPC) is an optimization scheme that embeds ensemble forecasts in Multistage Stochastic Programming. TB-MPC requires a predefined tree structure that specifies when the ensemble trajectories diverge from each other. A correct tree structure is of critical importance because it strongly affects the performance of the optimization, and existing methods do not offer satisfactory results. We present a new methodology to generate a tree structure from the trajectories of an ensemble. The method models the information flow, considering which observations will become available along the forecast horizon, at which moment, and their level of uncertainty. It places a branching point when there is enough certainty about which trajectory is actually occurring. The method is well suited for trajectories that are close to each other at the beginning of the forecasting horizon and spread out as time progresses, as ensemble forecasts typically do. The method is compared to other tree structures (two-stage stochastic programming and others) in terms of performance by an application to the short-term management of the Salto Grande hydropower reservoir on the River Uruguay along the Argentinean-Uruguayan border.

  16. An Application of Decision Tree Based on ID3

    NASA Astrophysics Data System (ADS)

    Xiaohu, Wang; Lele, Wang; Nianfeng, Li

    This article applies the classical ID3 decision tree algorithm from data mining to data from a particular site. It constructs a decision tree based on information gain and thereby derives useful purchasing-behavior rules. It also shows that decision trees have broad prospects for application in on-site sales.
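
    The information-gain criterion that ID3 uses to choose a split can be sketched as follows. This is a minimal illustration, not the article's implementation; the function names, the attribute names (`member`, `device`, `buys`), and the toy records are all hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Hypothetical toy records; not data from the article.
visits = [
    {"member": "yes", "device": "mobile", "buys": "yes"},
    {"member": "yes", "device": "desktop", "buys": "yes"},
    {"member": "no", "device": "mobile", "buys": "no"},
    {"member": "no", "device": "desktop", "buys": "no"},
]

# ID3 splits on the attribute with the largest information gain.
best = max(["member", "device"],
           key=lambda a: information_gain(visits, a, "buys"))
```

    In this toy set, `member` separates buyers from non-buyers perfectly while `device` carries no information, so `member` is selected as the root split.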

  17. Lower Bounds for Algebraic Decision Trees.

    DTIC Science & Technology

    1980-07-01

    combinatorial and geometrical problems. While motivation for these models rests primarily on their generality and conceptual simplicity, they also have the...[15]. Much less is known for general algebraic decision trees. Beyond the naive information bound, Rabin’s theorem (Rabin [8]) and the convex hull...problem (Yao [14]) are apparently the only known results. The purpose of this article is to provide a general method for establishing lower bounds for

  18. Using Decision Trees for Comparing Pattern Recognition Feature Sets

    SciTech Connect

    Proctor, D D

    2005-08-18

    Determination of the best set of features has been acknowledged as one of the most difficult tasks in the pattern recognition process. In this report, significance tests on the sort-ordered, sample-size normalized vote distribution of an ensemble of decision trees are introduced as a method of evaluating the relative quality of feature sets. Alternative functional forms for feature sets are also examined. Associated standard deviations provide the means to evaluate the effect of the number of folds, the number of classifiers per fold, and the sample size on the resulting classifications. The method is applied to a problem for which a significant portion of the training set cannot be classified unambiguously.

  19. CUDT: A CUDA Based Decision Tree Algorithm

    PubMed Central

    Sheu, Ruey-Kai; Chiu, Chun-Chieh

    2014-01-01

    The decision tree is one of the best-known classification methods in data mining. Much research has been proposed to improve the performance of decision trees. However, those algorithms are developed and run on traditional distributed systems. Without new technology, latency cannot be improved when processing the huge volumes of data generated by ubiquitous sensing nodes. To reduce data processing latency in large-scale data mining, we design and implement a new parallelized decision tree algorithm on CUDA (compute unified device architecture), a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We conducted many experiments to evaluate the performance of CUDT and compared it with a traditional CPU version. The results show that CUDT is 5 to 55 times faster than Weka-j48 and 18 times faster than SPRINT for large data sets. PMID:25140346

  20. Coherent neuronal ensembles are rapidly recruited when making a look-reach decision.

    PubMed

    Wong, Yan T; Fabiszak, Margaret M; Novikov, Yevgeny; Daw, Nathaniel D; Pesaran, Bijan

    2016-02-01

    Selecting and planning actions recruits neurons across many areas of the brain, but how ensembles of neurons work together to make decisions is unknown. Temporally coherent neural activity may provide a mechanism by which neurons coordinate their activity to make decisions. If so, neurons that are part of coherent ensembles may predict movement choices before other ensembles of neurons. We recorded neuronal activity in the lateral and medial banks of the intraparietal sulcus (IPS) of the posterior parietal cortex while monkeys made choices about where to look and reach. We decoded the activity to predict the choices. Ensembles of neurons that displayed coherent patterns of spiking activity extending across the IPS--'dual-coherent' ensembles--predicted movement choices substantially earlier than other neuronal ensembles. We propose that dual-coherent spike timing reflects interactions between groups of neurons that are important to decisions.

  1. Coherent neuronal ensembles are rapidly recruited when making a look-reach decision

    PubMed Central

    Wong, Yan T.; Fabiszak, Margaret M.; Novikov, Yevgeny; Daw, Nathaniel D.; Pesaran, Bijan

    2015-01-01

    Summary: Selecting and planning actions recruits neurons across many areas of the brain, but how ensembles of neurons work together to make decisions is unknown. Temporally-coherent neural activity may provide a mechanism by which neurons coordinate their activity in order to make decisions. If so, neurons that are part of coherent ensembles may predict movement choices before other ensembles of neurons. We recorded neuronal activity in the lateral and medial banks of the intraparietal sulcus (IPS) of the posterior parietal cortex, while monkeys made choices about where to look and reach and decoded the activity to predict the choices. Ensembles of neurons that displayed coherent patterns of spiking activity extending across the IPS, “dual coherent” ensembles, predicted movement choices substantially earlier than other neuronal ensembles. We propose that dual-coherent spike timing reflects interactions between groups of neurons that play an important role in how we make decisions. PMID:26752158

  2. The wisdom of the commons: ensemble tree classifiers for prostate cancer prognosis

    PubMed Central

    Koziol, James A.; Feng, Anne C.; Jia, Zhenyu; Wang, Yipeng; Goodison, Seven; McClelland, Michael; Mercola, Dan

    2009-01-01

    Motivation: Classification and regression trees have long been used for cancer diagnosis and prognosis. Nevertheless, instability and variable selection bias, as well as overfitting, are well-known problems of tree-based methods. In this article, we investigate whether ensemble tree classifiers can ameliorate these difficulties, using data from two recent studies of radical prostatectomy in prostate cancer. Results: Using time to progression following prostatectomy as the relevant clinical endpoint, we found that ensemble tree classifiers robustly and reproducibly identified three subgroups of patients in the two clinical datasets: non-progressors, early progressors and late progressors. Moreover, the consensus classifications were independent predictors of time to progression compared to known clinical prognostic factors. Contact: dmercola@uci.edu PMID:18628288

  3. Evaluation of a decision tree for management of chronic wounds.

    PubMed

    Melchior-MacDougall, F; Lander, J

    1995-03-01

    Nurses routinely make complex clinical decisions under conditions of uncertainty. They collect large, unwieldy data sets in the process of making these clinical decisions. To assist nurses in collecting and organizing data and in making complex clinical decisions, some nursing scholars recommend decision support systems. One such support system, a decision tree, leads the nurse from general to specific assessments and ultimately to a decision choice or outcome. In this study, a decision tree was examined for its utility in promoting accuracy in decision making for management of chronic wounds among home care nurses. Home care nurses who used the decision tree made better decisions about staging and product choices for chronic wounds. More research is necessary to discover whether decision trees for the management of chronic wounds translate into improved client outcomes.

  4. Identification of metabolic syndrome using decision tree analysis.

    PubMed

    Worachartcheewan, Apilak; Nantasenamat, Chanin; Isarankura-Na-Ayudhya, Chartchalerm; Pidetcha, Phannee; Prachayasittikul, Virapong

    2010-10-01

    This study employs a decision tree as a decision support system for rapid and automated identification of individuals with metabolic syndrome (MS) among a Thai population. Results demonstrated strong predictivity of the decision tree in classification of individuals with and without MS, displaying an overall accuracy in excess of 99%.

  5. Extracting decision rules from police accident reports through decision trees.

    PubMed

    de Oña, Juan; López, Griselda; Abellán, Joaquín

    2013-01-01

    Given the current number of road accidents, the aim of many road safety analysts is to identify the main factors that contribute to crash severity. To pinpoint those factors, this paper shows an application that applies some of the methods most commonly used to build decision trees (DTs), which have not been applied to the road safety field before. An analysis of accidents on rural highways in the province of Granada (Spain) between 2003 and 2009 (both inclusive) showed that the methods used to build DTs serve our purpose and may even be complementary. Applying these methods has enabled potentially useful decision rules to be extracted that could be used by road safety analysts. For instance, some of the rules may indicate that women, contrary to men, increase their risk of severity under bad lighting conditions. The rules could be used in road safety campaigns to mitigate specific problems. This would enable managers to implement priority actions based on a classification of accidents by types (depending on their severity). However, the primary importance of this proposal is that other databases not used here (i.e. other infrastructure, roads and countries) could be used to identify unconventional problems in a manner easy for road safety managers to understand, as decision rules.

  6. Ventriculogram segmentation using boosted decision trees

    NASA Astrophysics Data System (ADS)

    McDonald, John A.; Sheehan, Florence H.

    2004-05-01

    Left ventricular status, reflected in ejection fraction or end systolic volume, is a powerful prognostic indicator in heart disease. Quantitative analysis of these and other parameters from ventriculograms (cine X-rays of the left ventricle) is infrequently performed due to the labor required for manual segmentation. None of the many methods developed for automated segmentation has achieved clinical acceptance. We present a method for semi-automatic segmentation of ventriculograms based on a very accurate two-stage boosted decision-tree pixel classifier. The classifier determines which pixels are inside the ventricle at key ED (end-diastole) and ES (end-systole) frames. The test misclassification rate is about 1%. The classifier is semi-automatic, requiring a user to select 3 points in each frame: the endpoints of the aortic valve and the apex. The first classifier stage is 2 boosted decision trees, trained using features such as gray-level statistics (e.g. median brightness) and image geometry (e.g. coordinates relative to the 3 user-supplied points). Second stage classifiers are trained using the same features as the first, plus the output of the first stage. Border pixels are determined from the segmented images using dilation and erosion. A curve is then fit to the border pixels, minimizing a penalty function that trades off fidelity to the border pixels with smoothness. ED and ES volumes and ejection fraction are estimated from border curves using standard area-length formulas. On independent test data, the differences between automatic and manual volumes (and ejection fractions) are similar in size to the differences between two human observers.

  7. Application of portfolio theory in decision tree analysis.

    PubMed

    Galligan, D T; Ramberg, C; Curtis, C; Ferguson, J; Fetrow, J

    1991-07-01

    A general application of portfolio analysis for herd decision tree analysis is described. In the herd environment, this methodology offers a means of employing population-based decision strategies that can help the producer control economic variation in expected return from a given set of decision options. An economic decision tree model regarding the use of prostaglandin in dairy cows with undetected estrus was used to determine the expected return of the decisions to use prostaglandin and breed on a timed basis, use prostaglandin and then breed on sign of estrus, or breed on signs of estrus. The risk attributes of these decision alternatives were calculated from the decision tree, and portfolio theory was used to find the efficient decision combinations (portfolios with the highest return for a given variance). The resulting combinations of decisions could be used to control return variation.
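
    The idea of combining decision options into portfolios and keeping the efficient ones can be sketched as below. This is only an illustration under stated assumptions: the return and covariance figures are invented, the grid search over mixes is a simplification, and `portfolio_stats` and `efficient_mixes` are hypothetical helper names rather than anything from the study.

```python
from itertools import product

def portfolio_stats(weights, means, cov):
    """Expected return and variance of a weighted mix of decision options."""
    n = len(weights)
    expected = sum(w * m for w, m in zip(weights, means))
    variance = sum(weights[i] * weights[j] * cov[i][j]
                   for i in range(n) for j in range(n))
    return expected, variance

def efficient_mixes(means, cov, steps=4):
    """Grid-search mixes and keep those not dominated in (return, variance)."""
    n = len(means)
    grid = [tuple(k / steps for k in ks)
            for ks in product(range(steps + 1), repeat=n)
            if sum(ks) == steps]
    scored = [(w, *portfolio_stats(w, means, cov)) for w in grid]

    def dominated(a):
        # Another mix is at least as good on both criteria and better on one.
        return any(b[1] >= a[1] and b[2] <= a[2]
                   and (b[1] > a[1] or b[2] < a[2])
                   for b in scored)

    return [s for s in scored if not dominated(s)]

# Hypothetical expected returns and covariances for three decision options
# (illustrative numbers only, not values from the study).
means = [30.0, 25.0, 20.0]
cov = [[100.0, 20.0, 5.0],
       [20.0, 50.0, 5.0],
       [5.0, 5.0, 10.0]]
frontier = efficient_mixes(means, cov)
```

    Each entry of `frontier` is a tuple of (weights, expected return, variance); the weights give the share of the herd assigned to each decision option.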

  8. 15 CFR Supplement 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 15 Commerce and Foreign Trade 2 2010-01-01 2010-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000...

  9. 15 CFR Supplement No 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 15 Commerce and Foreign Trade 2 2014-01-01 2014-01-01 false Decision Tree No Supplement No 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued... THE EAR Pt. 732, Supp. 1 Supplement No 1 to Part 732—Decision Tree ER06FE04.000...

  10. 15 CFR Supplement 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 15 Commerce and Foreign Trade 2 2012-01-01 2012-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000...

  11. 15 CFR Supplement 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 15 Commerce and Foreign Trade 2 2011-01-01 2011-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000...

  12. 15 CFR Supplement No 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 15 Commerce and Foreign Trade 2 2013-01-01 2013-01-01 false Decision Tree No Supplement No 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued... THE EAR Pt. 732, Supp. 1 Supplement No 1 to Part 732—Decision Tree ER06FE04.000...

  13. Decision-Tree Formulation With Order-1 Lateral Execution

    NASA Technical Reports Server (NTRS)

    James, Mark

    2007-01-01

    A compact symbolic formulation enables mapping of an arbitrarily complex decision tree of a certain type into a highly computationally efficient multidimensional software object. The type of decision trees to which this formulation applies is that known in the art as the Boolean class of balanced decision trees. Parallel lateral slices of an object created by means of this formulation can be executed in constant time, considerably less time than would otherwise be required. Decision trees of various forms are incorporated into almost all large software systems. A decision tree is a way of hierarchically solving a problem, proceeding through a set of true/false responses to a conclusion. By definition, a decision tree has a tree-like structure, wherein each internal node denotes a test on an attribute, each branch from an internal node represents an outcome of a test, and leaf nodes represent classes or class distributions that, in turn, represent possible conclusions. The drawback of decision trees is that execution of them can be computationally expensive (and, hence, time-consuming) because each non-leaf node must be examined to determine whether to progress deeper into a tree structure or to examine an alternative. The present formulation was conceived as an efficient means of representing a decision tree and executing it in as little time as possible. The formulation involves the use of a set of symbolic algorithms to transform a decision tree into a multi-dimensional object, the rank of which equals the number of lateral non-leaf nodes. The tree can then be executed in constant time by means of an order-one table lookup. The sequence of operations performed by the algorithms is summarized as follows: 1. Determination of whether the tree under consideration can be encoded by means of this formulation. 2. Extraction of decision variables. 3. Symbolic optimization of the decision tree to minimize its form. 4. Expansion and transformation of all nested conjunctive
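
    The general idea of trading a tree walk for a table lookup can be sketched as follows. This is not the formulation described in the record, whose symbolic algorithms are not given here; it is a simple stand-in that assumes a small Boolean tree and precomputes every outcome. The node encoding, the function names, and the example conclusions are hypothetical.

```python
from itertools import product

# A tree node is either a leaf (any non-tuple value) or a tuple
# (variable_index, subtree_if_false, subtree_if_true).

def walk(tree, answers):
    """Conventional evaluation: one test per level of the tree."""
    while isinstance(tree, tuple):
        var, if_false, if_true = tree
        tree = if_true if answers[var] else if_false
    return tree

def flatten(tree, n_vars):
    """Precompute the conclusion for every combination of true/false answers."""
    table = [None] * (2 ** n_vars)
    for answers in product([False, True], repeat=n_vars):
        index = sum(bit << i for i, bit in enumerate(answers))
        table[index] = walk(tree, answers)
    return table

def lookup(table, answers):
    """Evaluate with a single indexed read instead of a tree walk."""
    return table[sum(bit << i for i, bit in enumerate(answers))]

# Hypothetical balanced Boolean tree over two decision variables.
tree = (0, (1, "idle", "warn"), (1, "warn", "halt"))
table = flatten(tree, n_vars=2)
```

    The table grows as 2 to the power of the number of decision variables, so this naive version only suits small trees; the indexed read replaces the branching, while packing the answers into an index still costs one step per variable.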

  14. Computational study of developing high-quality decision trees

    NASA Astrophysics Data System (ADS)

    Fu, Zhiwei

    2002-03-01

    Recently, decision tree algorithms have been widely used in data mining problems to discover valuable rules and patterns. However, scalability, accuracy and efficiency are significant concerns when dealing with large and complex data sets in practice. In this paper, we propose an innovative machine learning approach (which we call GAIT), combining a genetic algorithm, statistical sampling, and a decision tree, to develop intelligent decision trees that can alleviate some of these problems. We design our computational experiments and run GAIT on three different data sets (namely Socio-Olympic data, Westinghouse data, and FAA data) to test its performance against a standard decision tree algorithm, a neural network classifier, and a statistical discriminant technique, respectively. The computational results show that our approach substantially outperforms the standard decision tree algorithm at lower sampling levels, and achieves significantly better results with less effort than both neural network and discriminant classifiers.

  15. Peculiar spectral statistics of ensembles of trees and star-like graphs

    NASA Astrophysics Data System (ADS)

    Kovaleva, V.; Maximov, Yu; Nechaev, S.; Valba, O.

    2017-07-01

    In this paper we investigate the eigenvalue statistics of exponentially weighted ensembles of full binary trees and p-branching star graphs. We show that spectral densities of corresponding adjacency matrices demonstrate peculiar ultrametric structure inherent to sparse systems. In particular, the tails of the distribution for binary trees share the ‘Lifshitz singularity’ emerging in the one-dimensional localization, while the spectral statistics of p-branching star-like graphs is less universal, being strongly dependent on p. The hierarchical structure of spectra of adjacency matrices is interpreted as sets of resonance frequencies, that emerge in ensembles of fully branched tree-like systems, known as dendrimers. However, the relaxational spectrum is not determined by the cluster topology, but has rather the number-theoretic origin, reflecting the peculiarities of the rare-event statistics typical for one-dimensional systems with a quenched structural disorder. The similarity of spectral densities of an individual dendrimer and of an ensemble of linear chains with exponential distribution in lengths, demonstrates that dendrimers could serve as simple disorder-less toy models of one-dimensional systems with quenched disorder.

  16. Peculiar spectral statistics of ensembles of trees and star-like graphs

    DOE PAGES

    Kovaleva, V.; Maximov, Yu; Nechaev, S.; ...

    2017-07-11

    In this paper we investigate the eigenvalue statistics of exponentially weighted ensembles of full binary trees and p-branching star graphs. We show that spectral densities of corresponding adjacency matrices demonstrate peculiar ultrametric structure inherent to sparse systems. In particular, the tails of the distribution for binary trees share the ‘Lifshitz singularity’ emerging in the one-dimensional localization, while the spectral statistics of p-branching star-like graphs is less universal, being strongly dependent on p. The hierarchical structure of spectra of adjacency matrices is interpreted as sets of resonance frequencies, that emerge in ensembles of fully branched tree-like systems, known as dendrimers. However, the relaxational spectrum is not determined by the cluster topology, but has rather the number-theoretic origin, reflecting the peculiarities of the rare-event statistics typical for one-dimensional systems with a quenched structural disorder. The similarity of spectral densities of an individual dendrimer and of an ensemble of linear chains with exponential distribution in lengths, demonstrates that dendrimers could serve as simple disorder-less toy models of one-dimensional systems with quenched disorder.

  17. Decision tree methods: applications for classification and prediction

    PubMed Central

    SONG, Yan-yan; LU, Ying

    2015-01-01

    Summary: Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets: the training dataset is used to build a decision tree model, and the validation dataset to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms for developing decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure. PMID:26120265

  18. An Ensemble of Neural Networks for Stock Trading Decision Making

    NASA Astrophysics Data System (ADS)

    Chang, Pei-Chann; Liu, Chen-Hao; Fan, Chin-Yuan; Lin, Jun-Lin; Lai, Chih-Ming

    Detecting stock turning signals is an interesting problem arising in numerous financial and economic planning settings. In this paper, an Ensemble Neural Network system with Intelligent Piecewise Linear Representation for detecting stock turning points is presented. The Intelligent Piecewise Linear Representation method generates numerous stock turning signals from the historical database; the Ensemble Neural Network system is then trained on these patterns and retrieves similar stock price patterns from the historical data. The turning signals represent short-term and long-term trading signals for selling or buying stocks, and are applied to forecast future turning points in the test data. Experimental results demonstrate that the hybrid system can make a significant and constant amount of profit when compared with other approaches using stock data available in the market.

  19. Operational optimization of irrigation scheduling for citrus trees using an ensemble based data assimilation approach

    NASA Astrophysics Data System (ADS)

    Hendricks Franssen, H.; Han, X.; Martinez, F.; Jimenez, M.; Manzano, J.; Chanzy, A.; Vereecken, H.

    2013-12-01

    Data assimilation (DA) techniques, like the local ensemble transform Kalman filter (LETKF), not only offer the opportunity to update model predictions by assimilating new measurement data in real time, but also provide an improved basis for real-time (DA-based) control. This study focuses on the optimization of real-time irrigation scheduling for fields of citrus trees near Picassent (Spain). For three selected fields the irrigation was optimized with DA-based control, and for other fields irrigation was optimized on the basis of a more traditional approach where reference evapotranspiration for citrus trees was estimated using the FAO-method. The performance of the two methods is compared for the year 2013. The DA-based real-time control approach is based on ensemble predictions of soil moisture profiles, using the Community Land Model (CLM). The uncertainty in the model predictions is introduced by feeding the model with weather predictions from an ensemble prediction system (EPS) and uncertain soil hydraulic parameters. The model predictions are updated daily by assimilating soil moisture data measured by capacitance probes. The measurement data are assimilated with the help of the LETKF. The irrigation need was calculated for each of the ensemble members and averaged, and logistic constraints (hydraulics, energy costs) were taken into account for the final assignment of irrigation in space and time. For the operational scheduling based on this approach, only model states and no model parameters were updated by the model. Other, non-operational simulation experiments for the same period were carried out where (1) neither ensemble weather forecast nor DA was used (open loop), (2) only ensemble weather forecast was used, (3) only DA was used, (4) soil hydraulic parameters were also updated in data assimilation and (5) both soil hydraulic and plant-specific parameters were updated. The FAO-based and DA-based real-time irrigation control are compared in terms of soil moisture

  20. Ensemble modelling and structured decision-making to support Emergency Disease Management.

    PubMed

    Webb, Colleen T; Ferrari, Matthew; Lindström, Tom; Carpenter, Tim; Dürr, Salome; Garner, Graeme; Jewell, Chris; Stevenson, Mark; Ward, Michael P; Werkman, Marleen; Backer, Jantien; Tildesley, Michael

    2017-03-01

    Epidemiological models in animal health are commonly used as decision-support tools to understand the impact of various control actions on infection spread in susceptible populations. Different models contain different assumptions and parameterizations, and policy decisions might be improved by considering outputs from multiple models. However, a transparent decision-support framework to integrate outputs from multiple models is nascent in epidemiology. Ensemble modelling and structured decision-making integrate the outputs of multiple models, compare policy actions and support policy decision-making. We briefly review the epidemiological application of ensemble modelling and structured decision-making and illustrate the potential of these methods using foot and mouth disease (FMD) models. In case study one, we apply structured decision-making to compare five possible control actions across three FMD models and show which control actions and outbreak costs are robustly supported and which are impacted by model uncertainty. In case study two, we develop a methodology for weighting the outputs of different models and show how different weighting schemes may impact the choice of control action. Using these case studies, we broadly illustrate the potential of ensemble modelling and structured decision-making in epidemiology to provide better information for decision-making and outline necessary development of these methods for their further application.

  1. Automatic design of decision-tree algorithms with evolutionary algorithms.

    PubMed

    Barros, Rodrigo C; Basgalupp, Márcio P; de Carvalho, André C P L F; Freitas, Alex A

    2013-01-01

    This study reports the empirical analysis of a hyper-heuristic evolutionary algorithm that is capable of automatically designing top-down decision-tree induction algorithms. Top-down decision-tree algorithms are of great importance, considering their ability to provide an intuitive and accurate knowledge representation for classification problems. The automatic design of these algorithms seems timely, given the large literature accumulated over more than 40 years of research in the manual design of decision-tree induction algorithms. The proposed hyper-heuristic evolutionary algorithm, HEAD-DT, is extensively tested using 20 public UCI datasets and 10 microarray gene expression datasets. The algorithms automatically designed by HEAD-DT are compared with traditional decision-tree induction algorithms, such as C4.5 and CART. Experimental results show that HEAD-DT is capable of generating algorithms which are significantly more accurate than C4.5 and CART.

  2. Decision Trees for Prediction and Data Mining

    DTIC Science & Technology

    2005-02-10

    ironic, as research in tree-structured methods was originally motivated by the desire for an interpretable alternative to standard methods such as...multiple linear regression and neural networks. Another problem with most tree construction algorithms is that their variable selection methods are biased...software, including well-known ones such as CART (Breiman, Friedman, Olshen and Stone 1984) and M5 (Quinlan 1992). With the exception of the lesser

  3. Generating the Simple Decision Tree with Symbiotic Evolution

    NASA Astrophysics Data System (ADS)

    Otani, Noriko; Shimura, Masamichi

    In representing classification rules by decision trees, simplicity of tree structure is as important as predictive accuracy, especially with regard to human comprehensibility, memory capacity, and the time required to classify. Trees tend to become complex as their accuracy increases. This paper proposes a novel method for generating accurate and simple decision trees based on symbiotic evolution. A distinctive feature of symbiotic evolution is that two different populations are evolved in parallel through genetic algorithms. In our method, the individuals of one population are partial trees of height 1, and the individuals of the other are whole trees represented by combinations of the former individuals. Generally, overfitting to training examples prevents getting high predictive accuracy. In order to circumvent this difficulty, individuals are evaluated not only on their accuracy on training examples but also on the correct answer biased rate, which indicates the dispersion of the correct answers in the terminal nodes. Based on our method, we developed a system called SESAT for generating decision trees. Our experimental results show that SESAT compares favorably with other systems on several datasets in the UCI repository. SESAT is able to generate simpler trees than C5.0 without sacrificing predictive accuracy.

  4. RNA search with decision trees and partial covariance models.

    PubMed

    Smith, Jennifer A

    2009-01-01

    The use of partial covariance models to search for RNA family members in genomic sequence databases is explored. The partial models are formed from contiguous subranges of the overall RNA family multiple alignment columns. A binary decision-tree framework is presented for choosing the order to apply the partial models and the score thresholds on which to make the decisions. The decision trees are chosen to minimize computation time subject to the constraint that all of the training sequences are passed to the full covariance model for final evaluation. Computational intelligence methods are suggested to select the decision tree since the tree can be quite complex and there is no obvious method to build the tree in these cases. Experimental results from seven RNA families show execution times of 0.066-0.268 relative to using the full covariance model alone. Tests on the full sets of known sequences for each family show that at least 95 percent of these sequences are found for two families and 100 percent for five others. Since the full covariance model is run on all sequences accepted by the partial model decision tree, the false alarm rate is at least as low as that of the full model alone.

  5. Ethnographic Decision Tree Modeling: A Research Method for Counseling Psychology.

    ERIC Educational Resources Information Center

    Beck, Kirk A.

    2005-01-01

    This article describes ethnographic decision tree modeling (EDTM; C. H. Gladwin, 1989) as a mixed method design appropriate for counseling psychology research. EDTM is introduced and located within a postpositivist research paradigm. Decision theory that informs EDTM is reviewed, and the 2 phases of EDTM are highlighted. The 1st phase, model…

  6. Learning accurate very fast decision trees from uncertain data streams

    NASA Astrophysics Data System (ADS)

    Liang, Chunquan; Zhang, Yang; Shi, Peng; Hu, Zhengguo

    2015-12-01

    Most existing works on data stream classification assume the streaming data is precise and definite. Such assumption, however, does not always hold in practice, since data uncertainty is ubiquitous in data stream applications due to imprecise measurement, missing values, privacy protection, etc. The goal of this paper is to learn accurate decision tree models from uncertain data streams for classification analysis. On the basis of very fast decision tree (VFDT) algorithms, we proposed an algorithm for constructing an uncertain VFDT tree with classifiers at tree leaves (uVFDTc). The uVFDTc algorithm can exploit uncertain information effectively and efficiently in both the learning and the classification phases. In the learning phase, it uses Hoeffding bound theory to learn from uncertain data streams and yield fast and reasonable decision trees. In the classification phase, at tree leaves it uses uncertain naive Bayes (UNB) classifiers to improve the classification performance. Experimental results on both synthetic and real-life datasets demonstrate the strong ability of uVFDTc to classify uncertain data streams. The use of UNB at tree leaves has improved the performance of uVFDTc, especially the any-time property, the benefit of exploiting uncertain information, and the robustness against uncertainty.
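
    The abstract above states that the learning phase relies on Hoeffding bound theory. As a minimal sketch, the standard Hoeffding split test used by VFDT-style learners on precise data can be written as follows. This is not the uncertain-data variant (uVFDTc) the paper proposes, and the function names, the confidence parameter `delta`, and the example numbers are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range is within epsilon of the mean of n samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split a leaf once the observed gap between the two best attributes
    exceeds the Hoeffding bound for the n examples seen at that leaf."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Hypothetical numbers: 1000 examples at a leaf, gain range 1.0, delta 1e-7.
    print(hoeffding_bound(1.0, 1e-7, 1000))
    print(should_split(0.30, 0.15, 1.0, 1e-7, 1000))
```

    The bound shrinks as more examples arrive, which is why such learners can commit to a split after seeing only part of the stream.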

  7. Multi-Model Long-Range Ensemble Forecast for Decision Support in Hydroelectric Operations

    NASA Astrophysics Data System (ADS)

    Kunkel, M. L.; Parkinson, S.; Blestrud, D.; Holbrook, V. P.

    2014-12-01

    Idaho Power Company (IPC) is a hydroelectric-based utility serving over a million customers in southern Idaho and eastern Oregon. Hydropower makes up ~50% of our power generation and accurate predictions of streamflow and precipitation drive our long-term planning and decision support for operations. We investigate the use of a multi-model ensemble approach for mid and long-range streamflow and precipitation forecasts throughout the Snake River Basin. Forecasts are prepared using an Idaho Power developed ensemble forecasting technique for 89 locations throughout the Snake River Basin for periods of 3 to 18 months in advance. A series of multivariable linear regression, multivariable non-linear regression and multivariable Kalman filter techniques are combined in an ensemble forecast based upon two data types: historical data (streamflow, precipitation, climate indices [e.g. PDO, ENSO, AO]) and singular value decomposition derived values based upon atmospheric heights and sea surface temperatures.

  8. Decision tree-based learning to predict patient controlled analgesia consumption and readjustment.

    PubMed

    Hu, Yuh-Jyh; Ku, Tien-Hsiung; Jan, Rong-Hong; Wang, Kuochen; Tseng, Yu-Chee; Yang, Shu-Fen

    2012-11-14

    Appropriate postoperative pain management contributes to earlier mobilization, shorter hospitalization, and reduced cost. The undertreatment of pain may impede short-term recovery and have a detrimental long-term effect on health. This study focuses on Patient Controlled Analgesia (PCA), which is a delivery system for pain medication. This study proposes and demonstrates how to use machine learning and data mining techniques to predict analgesic requirements and PCA readjustment. The sample in this study included 1099 patients. Every patient was described by 280 attributes, including the class attribute. In addition to commonly studied demographic and physiological factors, this study emphasizes attributes related to PCA. We used decision tree-based learning algorithms to predict analgesic consumption and PCA control readjustment based on the first few hours of PCA medications. We also developed a nearest neighbor-based data cleaning method to alleviate the class-imbalance problem in PCA setting readjustment prediction. The prediction accuracies of total analgesic consumption (continuous dose and PCA dose) and PCA analgesic requirement (PCA dose only) by an ensemble of decision trees were 80.9% and 73.1%, respectively. Decision tree-based learning outperformed Artificial Neural Network, Support Vector Machine, Random Forest, Rotation Forest, and Naïve Bayesian classifiers in analgesic consumption prediction. The proposed data cleaning method improved the performance of every learning method in this study of PCA setting readjustment prediction. Comparative analysis identified the informative attributes from the data mining models and compared them with the correlates of analgesic requirement reported in previous works. This study presents a real-world application of data mining to anesthesiology. Unlike previous research, this study considers a wider variety of predictive factors, including PCA demands over time. We analyzed PCA patient data and conducted several

  9. Decision tree-based learning to predict patient controlled analgesia consumption and readjustment

    PubMed Central

    2012-01-01

    Background: Appropriate postoperative pain management contributes to earlier mobilization, shorter hospitalization, and reduced cost. The undertreatment of pain may impede short-term recovery and have a detrimental long-term effect on health. This study focuses on Patient Controlled Analgesia (PCA), which is a delivery system for pain medication. This study proposes and demonstrates how to use machine learning and data mining techniques to predict analgesic requirements and PCA readjustment. Methods: The sample in this study included 1099 patients. Every patient was described by 280 attributes, including the class attribute. In addition to commonly studied demographic and physiological factors, this study emphasizes attributes related to PCA. We used decision tree-based learning algorithms to predict analgesic consumption and PCA control readjustment based on the first few hours of PCA medications. We also developed a nearest neighbor-based data cleaning method to alleviate the class-imbalance problem in PCA setting readjustment prediction. Results: The prediction accuracies of total analgesic consumption (continuous dose and PCA dose) and PCA analgesic requirement (PCA dose only) by an ensemble of decision trees were 80.9% and 73.1%, respectively. Decision tree-based learning outperformed Artificial Neural Network, Support Vector Machine, Random Forest, Rotation Forest, and Naïve Bayesian classifiers in analgesic consumption prediction. The proposed data cleaning method improved the performance of every learning method in this study of PCA setting readjustment prediction. Comparative analysis identified the informative attributes from the data mining models and compared them with the correlates of analgesic requirement reported in previous works. Conclusion: This study presents a real-world application of data mining to anesthesiology. Unlike previous research, this study considers a wider variety of predictive factors, including PCA demands over time. We analyzed

  10. Comparing the decision-relevance and utility of alternative ensembles of climate projections in water management and other applications

    NASA Astrophysics Data System (ADS)

    Lempert, R. J.; Tingstad, A.

    2015-12-01

    Decisions to manage the risks of climate change hinge, among many other things, on deeply uncertain and imperfect climate projections. Improving the decision relevance and utility of climate projections requires navigating a trade-off between increasing the physical realism of the model (often by improving the spatial resolution) and increasing the representation of decision-relevant uncertainties. This talk will examine the decision-relevance and utility of alternative ensembles of climate information by comparing two decision support applications, in water management and biodiversity preservation, both in California. The climate ensembles will consist of different combinations of high and medium resolution projections from NARCCAP (North American Regional Climate Assessment Program) as well as low resolution, but more numerous, projections from the CMIP3 and CMIP5 ensembles. The decision support applications will use the same ensembles of climate projections in different contexts. Workshops with decision makers examine the extent to which the different ensembles lead to different decisions, the extent to which considering a wider range of uncertainty affects decisions, the extent to which decision makers' confidence in the projections and the decisions based on them will be sensitive to the resolution at which they are communicated and the resolution dependent skill, and how the answers to these questions vary with the water management and biodiversity contexts. This study aims to provide empirical evidence to support judgments on how best to use uncertain climate information in water management and other decision support applications.

  11. Identifying ultrasound and clinical features of breast cancer molecular subtypes by ensemble decision

    PubMed Central

    Zhang, Lei; Li, Jing; Xiao, Yun; Cui, Hao; Du, Guoqing; Wang, Ying; Li, Ziyao; Wu, Tong; Li, Xia; Tian, Jiawei

    2015-01-01

    Breast cancer is molecularly heterogeneous and categorized into four molecular subtypes: Luminal-A, Luminal-B, HER2-amplified and Triple-negative. In this study, we aimed to apply an ensemble decision approach to identify the ultrasound and clinical features related to the molecular subtypes. We collected ultrasound and clinical features from 1,000 breast cancer patients and performed immunohistochemistry on these samples. We used the ensemble decision approach to select unique features and to construct decision models. The decision model for Luminal-A subtype was constructed based on the presence of an echogenic halo and post-acoustic shadowing or indifference. The decision model for Luminal-B subtype was constructed based on the absence of an echogenic halo and vascularity. The decision model for HER2-amplified subtype was constructed based on the presence of post-acoustic enhancement, calcification, vascularity and advanced age. The model for Triple-negative subtype followed two rules. One was based on irregular shape, lobulate margin contour, the absence of calcification and hypovascularity, whereas the other was based on oval shape, hypovascularity and micro-lobulate margin contour. The accuracies of the models were 83.8%, 77.4%, 87.9% and 92.7%, respectively. We identified specific features of each molecular subtype and expanded the scope of ultrasound for making diagnoses using these decision models. PMID:26046791

  12. Reconciliation of Decision-Making Heuristics Based on Decision Trees Topologies and Incomplete Fuzzy Probabilities Sets

    PubMed Central

    Doubravsky, Karel; Dohnal, Mirko

    2015-01-01

    Complex decision making tasks of different natures, e.g. economics, safety engineering, ecology and biology, are based on vague, sparse, partially inconsistent and subjective knowledge. Moreover, decision making economists / engineers are usually not willing to invest too much time into the study of complex formal theories. They require decisions that can be (re)checked by human-like common sense reasoning. One important problem in realistic decision making tasks is that the data sets required by the chosen decision making algorithm are incomplete. This paper presents a relatively simple algorithm by which some missing III (input information items) can be generated, mainly from decision tree topologies, and integrated into incomplete data sets. The algorithm is based on easy-to-understand heuristics, e.g. a longer decision tree sub-path is less probable. Such a heuristic can solve decision problems under total ignorance, i.e. when the decision tree topology is the only information available. In practice, however, isolated information items, e.g. some vaguely known probabilities (such as fuzzy probabilities), are usually available. This means that a realistic problem is analysed under partial ignorance. The proposed algorithm reconciles topology related heuristics and additional fuzzy sets using fuzzy linear programming. The case study, represented by a tree with six lotteries and one fuzzy probability, is presented in detail. PMID:26158662

  13. Reconciliation of Decision-Making Heuristics Based on Decision Trees Topologies and Incomplete Fuzzy Probabilities Sets.

    PubMed

    Doubravsky, Karel; Dohnal, Mirko

    2015-01-01

    Complex decision making tasks of different natures, e.g. economics, safety engineering, ecology and biology, are based on vague, sparse, partially inconsistent and subjective knowledge. Moreover, decision making economists / engineers are usually not willing to invest too much time into the study of complex formal theories. They require decisions that can be (re)checked by human-like common sense reasoning. One important problem in realistic decision making tasks is that the data sets required by the chosen decision making algorithm are incomplete. This paper presents a relatively simple algorithm by which some missing III (input information items) can be generated, mainly from decision tree topologies, and integrated into incomplete data sets. The algorithm is based on easy-to-understand heuristics, e.g. a longer decision tree sub-path is less probable. Such a heuristic can solve decision problems under total ignorance, i.e. when the decision tree topology is the only information available. In practice, however, isolated information items, e.g. some vaguely known probabilities (such as fuzzy probabilities), are usually available. This means that a realistic problem is analysed under partial ignorance. The proposed algorithm reconciles topology related heuristics and additional fuzzy sets using fuzzy linear programming. The case study, represented by a tree with six lotteries and one fuzzy probability, is presented in detail.

  14. Evaluation of Decision Trees for Cloud Detection from AVHRR Data

    NASA Technical Reports Server (NTRS)

    Shiffman, Smadar; Nemani, Ramakrishna

    2005-01-01

    Automated cloud detection and tracking is an important step in assessing changes in radiation budgets associated with global climate change via remote sensing. Data products based on satellite imagery are available to the scientific community for studying trends in the Earth's atmosphere. The data products include pixel-based cloud masks that assign cloud-cover classifications to pixels. Many cloud-mask algorithms have the form of decision trees. The decision trees employ sequential tests that scientists designed based on empirical astrophysics studies and simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In a previous study we compared automatically learned decision trees to cloud masks included in Advanced Very High Resolution Radiometer (AVHRR) data products from the year 2000. In this paper we report the replication of the study for five-year data, and for a gold standard based on surface observations performed by scientists at weather stations in the British Islands. For our sample data, the accuracy of automatically learned decision trees was greater than the accuracy of the cloud masks (p < 0.001).

  15. Classification of posture and activities by using decision trees.

    PubMed

    Zhang, Ting; Tang, Wenlong; Sazonov, Edward S

    2012-01-01

    Obesity prevention and treatment, as well as healthy lifestyle recommendations, require the estimation of everyday physical activity. Monitoring posture allocations and activities with sensor systems is an effective method to achieve this goal. However, at present, most devices available rely on multiple sensors distributed on the body, which might be too obtrusive for everyday use. In this study, data was collected from a wearable shoe sensor system (SmartShoe) and a decision tree algorithm was applied for classification with high computational accuracy. The dataset was collected from 9 individual subjects performing 6 different activities--sitting, standing, walking, cycling, and stairs ascent/descent. Statistical features were calculated and classification with a decision tree classifier was performed, after which an advanced boosting algorithm was applied. The computational accuracy is as high as 98.85% without boosting, and 98.90% after boosting. Additionally, the simple tree structure provides a direct approach to simplify the feature set.

  16. Towards the assimilation of tree-ring-width records using ensemble Kalman filtering techniques

    NASA Astrophysics Data System (ADS)

    Acevedo, Walter; Reich, Sebastian; Cubasch, Ulrich

    2016-03-01

    This paper investigates the applicability of the Vaganov-Shashkin-Lite (VSL) forward model for tree-ring-width chronologies as observation operator within a proxy data assimilation (DA) setting. Based on the principle of limiting factors, VSL combines temperature and moisture time series in a nonlinear fashion to obtain simulated TRW chronologies. When used as observation operator, this modelling approach implies three compounding, challenging features: (1) time averaging, (2) "switching recording" of 2 variables and (3) bounded response windows leading to "thresholded response". We generate pseudo-TRW observations from a chaotic 2-scale dynamical system, used as a cartoon of the atmosphere-land system, and attempt to assimilate them via ensemble Kalman filtering techniques. Results within our simplified setting reveal that VSL's nonlinearities may lead to considerable loss of assimilation skill, as compared to the utilization of a time-averaged (TA) linear observation operator. In order to understand this undesired effect, we embed VSL's formulation into the framework of fuzzy logic (FL) theory, which thereby exposes multiple representations of the principle of limiting factors. DA experiments employing three alternative growth rate functions disclose a strong link between the lack of smoothness of the growth rate function and the loss of optimality in the estimate of the TA state. Accordingly, VSL's performance as observation operator can be enhanced by resorting to smoother FL representations of the principle of limiting factors. This finding fosters new interpretations of tree-ring-growth limitation processes.
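
    The contrast the abstract draws between the principle of limiting factors and smoother fuzzy-logic representations can be illustrated with a small sketch. Everything here is an assumption made for illustration: the piecewise-linear `ramp` response, the use of `min` as the limiting-factor rule, and the product as one standard (smoother) fuzzy conjunction. The abstract does not say which growth rate functions the authors actually tested.

```python
def ramp(x: float, lower: float, upper: float) -> float:
    """Bounded growth response: 0 below `lower`, 1 above `upper`, linear between.
    The thresholds are hypothetical parameters, not values from the paper."""
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    return (x - lower) / (upper - lower)


def growth_min(g_temp: float, g_moist: float) -> float:
    """Limiting-factor rule: growth follows whichever response is smaller,
    so the record 'switches' between the two variables."""
    return min(g_temp, g_moist)


def growth_product(g_temp: float, g_moist: float) -> float:
    """A smoother alternative (product conjunction): both responses
    always contribute to the simulated growth."""
    return g_temp * g_moist


if __name__ == "__main__":
    # With min, changing the non-limiting factor leaves growth unchanged.
    print(growth_min(0.4, 0.9), growth_min(0.4, 0.8))
    # With the product, the same change is still visible in the output.
    print(growth_product(0.4, 0.9), growth_product(0.4, 0.8))
```

    The `min` rule discards information about the non-limiting variable, which is one plausible reading of the "switching recording" feature the abstract lists.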

  17. Supervised learning with decision tree-based methods in computational and systems biology.

    PubMed

    Geurts, Pierre; Irrthum, Alexandre; Wehenkel, Louis

    2009-12-01

    At the intersection between artificial intelligence and statistics, supervised learning allows algorithms to automatically build predictive models from just observations of a system. During the last twenty years, supervised learning has been a tool of choice to analyze the ever-growing and increasingly complex data generated in the context of molecular biology, with successful applications in genome annotation, function prediction, or biomarker discovery. Among supervised learning methods, decision tree-based methods stand out as nonparametric methods that have the unique feature of combining interpretability, efficiency, and, when used in ensembles of trees, excellent accuracy. The goal of this paper is to provide an accessible and comprehensive introduction to this class of methods. The first part of the review is devoted to an intuitive but complete description of decision tree-based methods and a discussion of their strengths and limitations with respect to other supervised learning methods. The second part of the review provides a survey of their applications in the context of computational and systems biology.

  18. Computerized Adaptive Test vs. decision trees: Development of a support decision system to identify suicidal behavior.

    PubMed

    Delgado-Gomez, D; Baca-Garcia, E; Aguado, D; Courtet, P; Lopez-Castroman, J

    2016-12-01

    Several Computerized Adaptive Tests (CATs) have been proposed to facilitate assessments in mental health. These tests are built in a standard way, disregarding useful and usually available information not included in the assessment scales that could increase the precision and utility of CATs, such as the history of suicide attempts. Using the items of a previously developed scale for suicidal risk, we compared the performance of a standard CAT and a decision tree in a support decision system to identify suicidal behavior. We included the history of past suicide attempts as a class for the separation of patients in the decision tree. The decision tree needed an average of four items to achieve an accuracy similar to that of a standard CAT with nine items. The accuracy of the decision tree, obtained after 25 cross-validations, was 81.4%. A shortened test adapted for the separation of suicidal and non-suicidal patients was developed. CATs can be very useful tools for the assessment of suicidal risk. However, standard CATs do not use all the information that is available. A decision tree can improve the precision of the assessment since it is constructed using a priori information. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. Boosting alternating decision trees modeling of disease trait information.

    PubMed

    Liu, Kuang-Yu; Lin, Jennifer; Zhou, Xiaobo; Wong, Stephen T C

    2005-12-30

    We applied the alternating decision trees (ADTrees) method to the last 3 replicates from the Aipotu, Danacca, Karangar, and NYC populations in the Problem 2 simulated Genetic Analysis Workshop dataset. Using information from the 12 binary phenotypes and sex as input and Kofendrerd Personality Disorder disease status as the outcome of ADTrees-based classifiers, we obtained a new quantitative trait based on average prediction scores, which was then used for genome-wide quantitative trait linkage (QTL) analysis. ADTrees are machine learning methods that combine boosting and decision trees algorithms to generate smaller and easier-to-interpret classification rules. In this application, we compared four modeling strategies from the combinations of two boosting iterations (log or exponential loss functions) coupled with two choices of tree generation types (a full alternating decision tree or a classic boosting decision tree). These four different strategies were applied to the founders in each population to construct four classifiers, which were then applied to each study participant. To compute the average prediction score for each subject with a specific trait profile, this process was repeated with 10 runs of 10-fold cross validation, and standardized prediction scores obtained from the 10 runs were averaged and used in subsequent expectation-maximization Haseman-Elston QTL analyses (implemented in GENEHUNTER) with the approximately 900 SNPs in Hardy-Weinberg equilibrium provided for each population. Our QTL analyses on the basis of four models (a full alternating decision tree and a classic boosting decision tree paired with either log or exponential loss function) detected evidence for linkage (Z ≥ 1.96, p < 0.01) on chromosomes 1, 3, 5, and 9. Moreover, using average iteration and abundance scores for the 12 phenotypes and sex as their relevancy measurements, we found all relevant phenotypes for all four populations except phenotype b for the Karangar population

  20. The Utility of Decision Trees in Oncofertility Care in Japan.

    PubMed

    Ito, Yuki; Shiraishi, Eriko; Kato, Atsuko; Haino, Takayuki; Sugimoto, Kouhei; Okamoto, Aikou; Suzuki, Nao

    2017-03-01

    To identify the utility and issues associated with the use of decision trees in oncofertility patient care in Japan. A total of 35 women who had been diagnosed with cancer, but had not begun anticancer treatment, were enrolled. We applied the oncofertility decision tree for women published by Gardino et al. to counsel a consecutive series of women on fertility preservation (FP) options following cancer diagnosis. Percentage of women who decided to undergo oocyte retrieval for embryo cryopreservation and the expected live-birth rate for these patients were calculated using the following equation: expected live-birth rate = pregnancy rate at each age per embryo transfer × (1 - miscarriage rate) × No. of cryopreserved embryos. Oocyte retrieval was performed for 17 patients (48.6%; mean ± standard deviation [SD] age, 36.35 ± 3.82 years). The mean ± SD number of cryopreserved embryos was 5.29 ± 4.63. The expected live-birth rate was 0.66. The expected live-birth rate with FP indicated that one in three oncofertility patients would not expect to have a live birth following oocyte retrieval and embryo cryopreservation. While the decision trees were useful as decision-making tools for women contemplating FP, in the context of the current restrictions on oocyte donation and the extremely small number of adoptions in Japan, the remaining options for fertility after cancer are limited. In order for cancer survivors to feel secure in their decisions, the decision tree may need to be adapted simultaneously with improvements to the social environment, such as greater support for adoption.
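
    The equation quoted in the abstract above can be written as a short function. This is only a sketch of the arithmetic: the function and parameter names are illustrative, and the example inputs are hypothetical values chosen to show the calculation, not data from the study.

```python
def expected_live_birth_rate(pregnancy_rate_per_transfer: float,
                             miscarriage_rate: float,
                             n_cryopreserved_embryos: float) -> float:
    """Expected live-birth rate as defined in the abstract:
    pregnancy rate per embryo transfer x (1 - miscarriage rate) x no. of embryos."""
    for name, rate in (("pregnancy_rate_per_transfer", pregnancy_rate_per_transfer),
                       ("miscarriage_rate", miscarriage_rate)):
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    if n_cryopreserved_embryos < 0:
        raise ValueError("n_cryopreserved_embryos must be non-negative")
    return (pregnancy_rate_per_transfer
            * (1.0 - miscarriage_rate)
            * n_cryopreserved_embryos)


if __name__ == "__main__":
    # Hypothetical inputs: 20% pregnancy rate, 25% miscarriage rate, 4 embryos.
    print(expected_live_birth_rate(0.20, 0.25, 4))
```

    Because the pregnancy rate is taken "at each age", a per-patient calculation would presumably look up an age-specific rate before calling the function; that lookup is not described in the abstract and is left out here.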

  1. The limitations of decision trees and automatic learning in real world medical decision making.

    PubMed

    Kokol, P; Zorman, M; Stiglic, M M; Malčić, I

    1998-01-01

    The decision tree approach is one of the most common approaches in automatic learning and decision making. It is popular for its simplicity of construction, its efficient use in decision making and its simple representation, which is easily understood by humans. The automatic learning of decision trees and their use usually show very good results in various "theoretical" environments. The training sets are usually large enough for the learning algorithm to construct a hypothesis consistent with the underlying concept. But in real life it is often impossible to find the desired number of training objects, for various reasons. Typical examples are the inability to measure attribute values, the high cost and complexity of such measurements, and the unavailability of all attributes at the same time. There are different ways to deal with some of these problems, but in the delicate field of medical decision making, we cannot allow ourselves to make any inaccurate decisions. We measured the values of 24 attributes before and after 82 operations on children aged between 2 and 10 years. The aim was to find the dependencies between attribute values and a child's predisposition to acidemia--the decrease of the blood's pH. Our main interest was in discovering predisposition to two forms of acidosis, metabolic acidosis and respiratory acidosis, which can both have serious effects on a child's health. We decided to construct different decision trees from a set of training objects, which was complete (there were no missing attribute values), but on the other hand not large enough to avoid the effect of overfitting. A common approach to the evaluation of a decision tree is the use of a test set. In our case we decided that instead of using a test set, we would ask medical experts to take a closer look at the generated trees. They examined and evaluated the decision trees branch by branch. Their comments on the generated trees can be found in this paper. The comments show that

  2. epiDMS: Data Management and Analytics for Decision-Making From Epidemic Spread Simulation Ensembles.

    PubMed

    Liu, Sicong; Poccia, Silvestro; Candan, K Selçuk; Chowell, Gerardo; Sapino, Maria Luisa

    2016-12-01

    Carefully calibrated large-scale computational models of epidemic spread represent a powerful tool to support the decision-making process during epidemic emergencies. Epidemic models are being increasingly used for generating forecasts of the spatial-temporal progression of epidemics at different spatial scales and for assessing the likely impact of different intervention strategies. However, the management and analysis of simulation ensembles stemming from large-scale computational models pose challenges, particularly when dealing with multiple interdependent parameters, spanning multiple layers and geospatial frames, affected by complex dynamic processes operating at different resolutions. We describe and illustrate with examples a novel epidemic simulation data management system, epiDMS, that was developed to address the challenges that arise from the need to generate, search, visualize, and analyze, in a scalable manner, large volumes of epidemic simulation ensembles and observations during the progression of an epidemic. epiDMS is a publicly available system that facilitates management and analysis of large epidemic simulation ensembles. epiDMS aims to fill an important hole in decision-making during healthcare emergencies by enabling critical services with significant economic and health impact.

  3. Using Evolutionary Algorithms to Induce Oblique Decision Trees

    SciTech Connect

    Cantu-Paz, E.; Kamath, C.

    2000-01-21

    This paper illustrates the application of evolutionary algorithms (EAs) to the problem of oblique decision tree induction. The objectives are to demonstrate that EAs can find classifiers whose accuracy is competitive with other oblique tree construction methods, and that this can be accomplished in a shorter time. Experiments were performed with a (1+1) evolutionary strategy and a simple genetic algorithm on public domain and artificial data sets. The empirical results suggest that the EAs quickly find competitive classifiers, and that EAs scale up better than traditional methods to the dimensionality of the domain and the number of training instances.

  4. Towards global empirical upscaling of FLUXNET eddy covariance observations: validation of a model tree ensemble approach using a biosphere model

    NASA Astrophysics Data System (ADS)

    Jung, M.; Reichstein, M.; Bondeau, A.

    2009-05-01

    Global, spatially and temporally explicit estimates of carbon and water fluxes derived from empirical up-scaling of eddy covariance measurements would constitute a new and possibly powerful data stream to study the variability of the global terrestrial carbon and water cycle. This paper introduces and validates a machine learning approach dedicated to the upscaling of observations from the current global network of eddy covariance towers (FLUXNET). We present a new model TRee Induction ALgorithm (TRIAL) that performs hierarchical stratification of the data set into units where particular multiple regressions for a target variable hold. We propose an ensemble approach (Evolving tRees with RandOm gRowth, ERROR) where the base learning algorithm is perturbed in order to gain a diverse sequence of different model trees which evolves over time. We evaluate the efficiency of the model tree ensemble approach using an artificial data set derived from the Lund-Potsdam-Jena managed Land (LPJmL) biosphere model. We aim at reproducing global monthly gross primary production as simulated by LPJmL from 1998-2005 using only locations and months where high quality FLUXNET data exist for the training of the model trees. The model trees are trained with the LPJmL land cover and meteorological input data, climate data, and the fraction of absorbed photosynthetic active radiation simulated by LPJmL. Given that we know the "true result" in the form of global LPJmL simulations we can effectively study the performance of the model tree ensemble upscaling and associated problems of extrapolation capacity. We show that the model tree ensemble is able to explain 92% of the variability of the global LPJmL GPP simulations. The mean spatial pattern and the seasonal variability of GPP that constitute the largest sources of variance are very well reproduced (96% and 94% of variance explained respectively) while the monthly interannual anomalies which occupy much less variance are less well

  5. Sinkhole hazard assessment in Minnesota using a decision tree model

    NASA Astrophysics Data System (ADS)

    Gao, Yongli; Alexander, E. Calvin

    2008-05-01

    An understanding of what influences sinkhole formation and the ability to accurately predict sinkhole hazards is critical to environmental management efforts in the karst lands of southeastern Minnesota. Based on the distribution of distances to the nearest sinkhole, sinkhole density, bedrock geology and depth to bedrock in southeastern Minnesota and northwestern Iowa, a decision tree model has been developed to construct maps of sinkhole probability in Minnesota. The decision tree model was converted into cartographic models and implemented in ArcGIS to create a preliminary sinkhole probability map in Goodhue, Wabasha, Olmsted, Fillmore, and Mower Counties. This model quantifies bedrock geology, depth to bedrock, sinkhole density, and neighborhood effects in southeastern Minnesota but excludes potential controlling factors such as structural control, topographic settings, human activities and land-use. The sinkhole probability map needs to be verified and updated as more sinkholes are mapped and more information about sinkhole formation is obtained.

  6. Enhancement of Fast Face Detection Algorithm Based on a Cascade of Decision Trees

    NASA Astrophysics Data System (ADS)

    Khryashchev, V. V.; Lebedev, A. A.; Priorov, A. L.

    2017-05-01

    A face detection algorithm based on a cascade of ensembles of decision trees (CEDT) is presented. The new approach allows detecting faces in positions other than frontal through the use of multiple classifiers. Each classifier is trained for a specific range of head rotation angles. The results showed a high rate of productivity for CEDT on images of standard size. The algorithm increases the area under the ROC curve by 13% compared to a standard Viola-Jones face detection algorithm. The final implementation of the algorithm consists of 5 different cascades for frontal/non-frontal faces. The simulation results also show a low computational complexity of the CEDT algorithm in comparison with the standard Viola-Jones approach. This could prove important in the embedded system and mobile device industries because it can reduce the cost of hardware and make battery life longer.

  7. A modified classification tree method for personalized medicine decisions

    PubMed Central

    Tsai, Wan-Min; Zhang, Heping; Buta, Eugenia; O’Malley, Stephanie

    2015-01-01

    The tree-based methodology has been widely applied to identify predictors of health outcomes in medical studies. However, the classical tree-based approaches do not pay particular attention to treatment assignment and thus do not consider prediction in the context of treatment received. In recent years, attention has been shifting from average treatment effects to identifying moderators of treatment response, and tree-based approaches to identify subgroups of subjects with enhanced treatment responses are emerging. In this study, we extend and present modifications to one of these approaches (Zhang et al., 2010 [29]) to efficiently identify subgroups of subjects who respond more favorably to one treatment than another based on their baseline characteristics. We extend the algorithm by incorporating an automatic pruning step and propose a measure for assessment of the predictive performance of the constructed tree. We evaluate the proposed method through a simulation study and illustrate the approach using a data set from a clinical trial of treatments for alcohol dependence. This simple and efficient statistical tool can be used for developing algorithms for clinical decision making and personalized treatment for patients based on their characteristics. PMID:26770292

  8. An efficient tree classifier ensemble-based approach for pedestrian detection.

    PubMed

    Xu, Yanwu; Cao, Xianbin; Qiao, Hong

    2011-02-01

    Classification-based pedestrian detection systems (PDSs) are currently a hot research topic in the field of intelligent transportation. A PDS detects pedestrians in real time on moving vehicles. A practical PDS demands not only high detection accuracy but also high detection speed. However, most of the existing classification-based approaches mainly seek high detection accuracy, while the detection speed is not purposely optimized for practical application. At the same time, the performance, particularly the speed, is primarily tuned based on experiments without theoretical foundations, leading to a long training procedure. This paper starts with measuring and optimizing detection speed, and then a practical classification-based pedestrian detection solution with high detection speed and training speed is described. First, an extended classification/detection speed metric, named feature-per-object (fpo), is proposed to measure the detection speed independently from execution. Then, an fpo minimization model with accuracy constraints is formulated based on a tree classifier ensemble, where the minimum fpo can guarantee the highest detection speed. Finally, the minimization problem is solved efficiently by using nonlinear fitting based on radial basis function neural networks. In addition, the optimal solution is directly used to instruct classifier training; thus, the training speed could be accelerated greatly. Therefore, a rapid and accurate classification-based detection technique is proposed for the PDS. Experimental results on urban traffic videos show that the proposed method has a high detection speed with acceptable detection and false-alarm rates for onboard detection; moreover, the training procedure is also very fast.

  9. The xeroderma pigmentosum pathway: decision tree analysis of DNA quality.

    PubMed

    Naegeli, Hanspeter; Sugasawa, Kaoru

    2011-07-15

    The nucleotide excision repair (NER) system is a fundamental cellular stress response that uses only a handful of DNA binding factors, mutated in the cancer-prone syndrome xeroderma pigmentosum (XP), to detect an astounding diversity of bulky base lesions, including those induced by ultraviolet light, electrophilic chemicals, oxygen radicals and further genetic insults. Several of these XP proteins are characterized by a mediocre preference for damaged substrates over the native double helix but, intriguingly, none of them recognizes injured bases with sufficient selectivity to account for the very high precision of bulky lesion excision. Instead, substrate versatility as well as damage specificity and strand selectivity are achieved by a multistage quality control strategy whereby different subunits of the XP pathway, in succession, interrogate the DNA double helix for a distinct abnormality in its structural or dynamic parameters. Through this step-by-step filtering procedure, the XP proteins operate like a systematic decision making tool, generally known as decision tree analysis, to sort out rare damaged bases embedded in a vast excess of native DNA. The present review is focused on the mechanisms by which multiple XP subunits of the NER pathway contribute to the proposed decision tree analysis of DNA quality in eukaryotic cells.

  10. Investigating the Utility of Oblique Tree-Based Ensembles for the Classification of Hyperspectral Data

    PubMed Central

    Poona, Nitesh; van Niekerk, Adriaan; Ismail, Riyad

    2016-01-01

    Ensemble classifiers are being widely used for the classification of spectroscopic data. In this regard, the random forest (RF) ensemble has been successfully applied in an array of applications, and has proven to be robust in handling high dimensional data. More recently, several variants of the traditional RF algorithm including rotation forest (rotF) and oblique random forest (oRF) have been applied to classifying high dimensional data. In this study we compare the traditional RF, rotF, and oRF (using three different splitting rules, i.e., ridge regression, partial least squares, and support vector machine) for the classification of healthy and infected Pinus radiata seedlings using high dimensional spectroscopic data. We further test the robustness of these five ensemble classifiers to reduced spectral resolution by spectral resampling (binning) of the original spectral bands. The results showed that the three oblique random forest ensembles outperformed both the traditional RF and rotF ensembles. Additionally, the rotF ensemble proved to be the least robust of the five ensembles tested. Spectral resampling of the original bands provided mixed results. Nevertheless, the results demonstrate that using spectral resampled bands is a promising approach to classifying asymptomatic stress in Pinus radiata seedlings. PMID:27854290

  11. Classification of Subcellular Phenotype Images by Decision Templates for Classifier Ensemble

    NASA Astrophysics Data System (ADS)

    Zhang, Bailing

    2010-01-01

    Subcellular localization is a key functional characteristic of proteins. An automatic, reliable and efficient prediction system for protein subcellular localization is needed for large-scale genome analysis. The automated cell phenotype image classification problem is an interesting "bioimage informatics" application. It can be used for establishing knowledge of the spatial distribution of proteins within living cells and permits screening systems for drug discovery or for early diagnosis of a disease. In this paper, three well-known texture feature extraction methods, namely local binary patterns (LBP), Gabor filtering and the Gray Level Co-occurrence Matrix (GLCM), have been applied to cell phenotype images, and the multilayer perceptron (MLP) method has been used to classify the cell phenotype images. After classification of the extracted features, a decision-templates ensemble algorithm (DT) is used to combine base classifiers built on the different feature sets. Different texture feature sets can provide sufficient diversity among base classifiers, which is known as a necessary condition for improvement in ensemble performance. For the HeLa cells, the human classification error rate on this task is 17%, as reported in previous publications. With our method we obtain an error rate of 4.8%.

  12. Applying an Ensemble Classification Tree Approach to the Prediction of Completion of a 12-Step Facilitation Intervention with Stimulant Abusers

    PubMed Central

    Doyle, Suzanne R.; Donovan, Dennis M.

    2014-01-01

    Aims: The purpose of this study was to explore the selection of predictor variables in the evaluation of drug treatment completion using an ensemble approach with classification trees. The basic methodology is reviewed and the subagging procedure of random subsampling is applied. Methods: Among 234 individuals with stimulant use disorders randomized to a 12-Step facilitative intervention shown to increase stimulant use abstinence, 67.52% were classified as treatment completers. A total of 122 baseline variables were used to identify factors associated with completion. Findings: The number of types of self-help activity involvement prior to treatment was the predominant predictor. Other effective predictors included better coping self-efficacy for substance use in high-risk situations, more days of prior meeting attendance, greater acceptance of the Disease model, higher confidence for not resuming use following discharge, lower ASI Drug and Alcohol composite scores, negative urine screens for cocaine or marijuana, and fewer employment problems. Conclusions: The application of an ensemble subsampling regression tree method utilizes the fact that classification trees are unstable but, on average, produce an improved prediction of the completion of drug abuse treatment. The results support the notion that there are early indicators of treatment completion that may allow for modification of approaches more tailored to fitting the needs of individuals and potentially provide more successful treatment engagement and improved outcomes. PMID:25134038
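
    The subagging idea named in the record above (train each tree on a random subsample drawn without replacement, then combine by majority vote) can be sketched as follows. This is a minimal illustration, not the authors' procedure: one-split decision stumps stand in for full classification trees, and the names fit_stump, predict_stump, subagging and predict, as well as the default subsample fraction and number of trees, are assumptions made for the example.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Fit a one-split tree: pick the (feature, threshold) with the fewest errors."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must put data on both sides
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # No usable split (all feature values identical): predict the majority class.
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab


def subagging(rows, labels, n_trees=25, fraction=0.5, seed=0):
    """Train n_trees stumps, each on a subsample drawn WITHOUT replacement."""
    rng = random.Random(seed)
    m = min(len(rows), max(2, int(fraction * len(rows))))
    ensemble = []
    for _ in range(n_trees):
        idx = rng.sample(range(len(rows)), m)  # sampling without replacement
        ensemble.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return ensemble


def predict(ensemble, row):
    """Combine the stumps by majority vote."""
    votes = Counter(predict_stump(s, row) for s in ensemble)
    return votes.most_common(1)[0][0]
```

    Sampling without replacement is what distinguishes subagging from ordinary bagging, which draws bootstrap samples with replacement; individual stumps vary from subsample to subsample, and the vote averages out that instability.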

  13. Toward the Decision Tree for Inferring Requirements Maturation Types

    NASA Astrophysics Data System (ADS)

    Nakatani, Takako; Kondo, Narihito; Shirogane, Junko; Kaiya, Haruhiko; Hori, Shozo; Katamine, Keiichi

    Requirements are elicited step by step during the requirements engineering (RE) process. However, some types of requirements are elicited completely only after the scheduled requirements elicitation process has finished. Such a situation is regarded as problematic. In our study, the difficulty of eliciting various kinds of requirements is observed by component. We refer to the components as observation targets (OTs) and introduce the term “requirements maturation,” which means when and how requirements are elicited completely in the project. Requirements maturation is discussed for physical and logical OTs. OTs viewed from a logical viewpoint are called logical OTs, e.g., quality requirements. The requirements of physical OTs, e.g., modules, components, subsystems, etc., include functional and non-functional requirements. They are influenced by their requesters' environmental changes, as well as developers' technical changes. In order to infer the requirements maturation period of each OT, we need to know how much these factors influence the OTs' requirements maturation. Based on the observation of actual past projects, we defined the PRINCE (Pre Requirements Intelligence Net Consideration and Evaluation) model. It aims to guide developers in their observation of the requirements maturation of OTs. We quantitatively analyzed the actual cases with their requirements elicitation process and extracted essential factors that influence the requirements maturation. The results of interviews with project managers were analyzed with WEKA, a data mining system, from which the decision tree was derived. This paper introduces the PRINCE model and the category of logical OTs to be observed. The decision tree that helps developers infer the maturation type of an OT is also described. We evaluate the tree through real projects and discuss its ability to infer the requirements maturation types.

  14. A Novel Approach on Designing Augmented Fuzzy Cognitive Maps Using Fuzzified Decision Trees

    NASA Astrophysics Data System (ADS)

    Papageorgiou, Elpiniki I.

    This paper proposes a new methodology for designing Fuzzy Cognitive Maps using crisp decision trees that have been fuzzified. A fuzzy cognitive map (FCM) is a knowledge-based technique that works as an artificial cognitive network inheriting the main aspects of cognitive maps and artificial neural networks. Decision trees, on the other hand, are well-known intelligent techniques that extract rules from both symbolic and numeric data. Fuzzy theoretical techniques are used to fuzzify crisp decision trees in order to soften the decision boundaries at decision nodes inherent in this type of tree. Comparisons between crisp decision trees and the fuzzified decision trees suggest that the latter are significantly more robust and produce more balanced decision making. The approach proposed in this paper could incorporate any type of fuzzy decision tree. Through this methodology, new linguistic weights were determined in the FCM model, thus producing an augmented FCM tool. The framework consists of a new fuzzy algorithm to generate linguistic weights that describe the cause-effect relationships among the concepts of the FCM model, from induced fuzzy decision trees.

  15. A Theoretical Analysis of Why Hybrid Ensembles Work

    PubMed Central

    2017-01-01

    Inspired by the group decision making process, ensembles or combinations of classifiers have been found favorable in a wide variety of application domains. Some researchers propose to use the mixture of two different types of classification algorithms to create a hybrid ensemble. Why does such an ensemble work? The question remains. Following the concept of diversity, which is one of the fundamental elements of the success of ensembles, we conduct a theoretical analysis of why hybrid ensembles work, connecting using different algorithms to accuracy gain. We also conduct experiments on classification performance of hybrid ensembles of classifiers created by decision tree and naïve Bayes classification algorithms, each of which is a top data mining algorithm and often used to create non-hybrid ensembles. Therefore, through this paper, we provide a complement to the theoretical foundation of creating and using hybrid ensembles. PMID:28255296

  16. Comparative study of biodegradability prediction of chemicals using decision trees, functional trees, and logistic regression.

    PubMed

    Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M

    2014-12-01

    Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out by 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, which also showed a better predictability of the functional inner regression tree model. The model built in the present study exhibits a reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed.

  17. DECISION TREE CLASSIFIERS FOR STAR/GALAXY SEPARATION

    SciTech Connect

    Vasconcellos, E. C.; Ruiz, R. S. R.; De Carvalho, R. R.; Capelato, H. V.; Gal, R. R.; LaBarbera, F. L.; Frago Campos Velho, H.; Trevisan, M.

    2011-06-15

    We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: 14 ≤ r ≤ 21 (85.2%) and r ≥ 19 (82.1%). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT, and Ball et al. We find that our FT classifier is comparable to or better in completeness over the full magnitude range 15 ≤ r ≤ 21, with much lower contamination than all but the Ball et al. classifier. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (>80%) while simultaneously achieving low contamination (≈2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 ≤ r ≤ 21.
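
    The record above reports completeness and contamination without defining them. The sketch below uses the definitions commonly applied in star/galaxy separation, which are an assumption here and not taken from the abstract: completeness is the fraction of true galaxies that are classified as galaxies, and contamination is the fraction of objects classified as galaxies that are actually stars. The function name and label strings are illustrative.

```python
def completeness_and_contamination(true_labels, predicted_labels, target="galaxy"):
    """Return (completeness, contamination) for the target class.

    Assumed definitions (not stated in the abstract):
      completeness  = correctly classified targets / all true targets
      contamination = non-targets classified as target / all classified as target
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == target and p == target for t, p in pairs)  # galaxies found
    fn = sum(t == target and p != target for t, p in pairs)  # galaxies missed
    fp = sum(t != target and p == target for t, p in pairs)  # stars mislabeled
    completeness = tp / (tp + fn) if (tp + fn) else 0.0
    contamination = fp / (tp + fp) if (tp + fp) else 0.0
    return completeness, contamination
```

    Reporting both numbers matters because a classifier can reach high completeness trivially by labeling everything a galaxy, at the cost of high contamination.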

  18. Towards global empirical upscaling of FLUXNET eddy covariance observations: validation of a model tree ensemble approach using a biosphere model

    NASA Astrophysics Data System (ADS)

    Jung, M.; Reichstein, M.; Bondeau, A.

    2009-10-01

    Global, spatially and temporally explicit estimates of carbon and water fluxes derived from empirical up-scaling eddy covariance measurements would constitute a new and possibly powerful data stream to study the variability of the global terrestrial carbon and water cycle. This paper introduces and validates a machine learning approach dedicated to the upscaling of observations from the current global network of eddy covariance towers (FLUXNET). We present a new model TRee Induction ALgorithm (TRIAL) that performs hierarchical stratification of the data set into units where particular multiple regressions for a target variable hold. We propose an ensemble approach (Evolving tRees with RandOm gRowth, ERROR) where the base learning algorithm is perturbed in order to gain a diverse sequence of different model trees which evolves over time. We evaluate the efficiency of the model tree ensemble (MTE) approach using an artificial data set derived from the Lund-Potsdam-Jena managed Land (LPJmL) biosphere model. We aim at reproducing global monthly gross primary production as simulated by LPJmL from 1998-2005 using only locations and months where high quality FLUXNET data exist for the training of the model trees. The model trees are trained with the LPJmL land cover and meteorological input data, climate data, and the fraction of absorbed photosynthetic active radiation simulated by LPJmL. Given that we know the "true result" in the form of global LPJmL simulations we can effectively study the performance of the MTE upscaling and associated problems of extrapolation capacity. We show that MTE is able to explain 92% of the variability of the global LPJmL GPP simulations. The mean spatial pattern and the seasonal variability of GPP that constitute the largest sources of variance are very well reproduced (96% and 94% of variance explained respectively) while the monthly interannual anomalies which occupy much less variance are less well matched (41% of variance explained

  19. Career Path Suggestion using String Matching and Decision Trees

    NASA Astrophysics Data System (ADS)

    Nagpal, Akshay; P. Panda, Supriya

    2015-05-01

    High school and college graduates often struggle to choose the courses they should major in to achieve their target career. In this paper, we work on suggesting a career path for a graduate to reach his/her dream career given the current educational status. First, we collected the career data of professionals and academicians from various career fields and compiled the data set by extracting the necessary information from the data. This data set was then used as the basis to suggest the most appropriate career path for a person given his/her current educational status. Decision trees and string matching algorithms were employed to suggest the appropriate career path for a person. Finally, the results are analyzed, pointing to further improvements in the model.

  20. Probabilistic lung nodule classification with belief decision trees.

    PubMed

    Zinovev, Dmitriy; Feigenbaum, Jonathan; Furst, Jacob; Raicu, Daniela

    2011-01-01

    In reading Computed Tomography (CT) scans with potentially malignant lung nodules, radiologists make use of high level information (semantic characteristics) in their analysis. Computer-Aided Diagnostic Characterization (CADc) systems can assist radiologists by offering a "second opinion"--predicting these semantic characteristics for lung nodules. In this work, we propose a way of predicting the distribution of radiologists' opinions using a multiple-label classification algorithm based on belief decision trees using the National Cancer Institute (NCI) Lung Image Database Consortium (LIDC) dataset, which includes semantic annotations by up to four human radiologists for each one of the 914 nodules. Furthermore, we evaluate our multiple-label results using a novel distance-threshold curve technique--and, measuring the area under this curve, obtain 69% performance on the validation subset. We conclude that multiple-label classification algorithms are an appropriate method of representing the diagnoses of multiple radiologists on lung CT scans when ground truth is unavailable.

  1. Integrating Ensemble Forecasts of Precipitation and Streamflow into Decision Support for Reservoir Operations in North Central Texas

    NASA Astrophysics Data System (ADS)

    Kim, S.; Limon, R. A.; Alizadeh, B.; Seo, D. J.; Fincannon, T. J.; Winguth, A. M. E.; Brown, J.; Blaylock, L.; Lampe, M.; Philpott, A.; Bell, F.

    2016-12-01

    North Central Texas relies heavily on surface water for water supply. To meet the growing demand, large raw water suppliers, such as the Tarrant Regional Water District (TRWD), operate systems of reservoirs that are connected by extensive networks of pipelines over long distances. To ensure water supply at all times, while minimizing flooding risks and pumping cost, TRWD utilizes a suite of decision support tools. This research aims to improve the operating efficiency of the water delivery system by providing skillful ensemble precipitation and inflow forecasts that can be used to optimize the water supply operations under uncertain environmental conditions. To assess the value of medium- and long-range forecasts of precipitation and inflow to operation and management of the TRWD's reservoir-pipeline system, a set of hindcasting and verification experiments is being carried out. The hindcasting experiments use weather and climate reforecasts from the Global Ensemble Forecast System and the Climate Forecast System Version 2, and the forecasting and verification tools of the National Weather Service's Hydrologic Ensemble Forecast Service, the Community Hydrologic Prediction System, and the Ensemble Verification System. The value of ensemble forecasts will be demonstrated via the TRWD forecasting model, which uses RiverWare. We also present a multiscale bias correction procedure for post-processing the raw streamflow ensembles.

  2. The value of decision tree analysis in planning anaesthetic care in obstetrics.

    PubMed

    Bamber, J H; Evans, S A

    2016-08-01

    The use of decision tree analysis is discussed in the context of the anaesthetic and obstetric management of a young pregnant woman with joint hypermobility syndrome with a history of insensitivity to local anaesthesia and a previous difficult intubation due to a tongue tumour. The multidisciplinary clinical decision process resulted in the woman being delivered without complication by elective caesarean section under general anaesthesia after an awake fibreoptic intubation. The decision process used is reviewed and compared retrospectively to a decision tree analytical approach. The benefits and limitations of using decision tree analysis are reviewed and its application in obstetric anaesthesia is discussed.

  3. Classification of Liss IV Imagery Using Decision Tree Methods

    NASA Astrophysics Data System (ADS)

    Verma, Amit Kumar; Garg, P. K.; Prasad, K. S. Hari; Dadhwal, V. K.

    2016-06-01

    Image classification is an essential step in remote sensing research. Classification uses the spectral information represented by the digital numbers in one or more spectral bands and attempts to classify each individual pixel based on this spectral information. Crop classification is a main concern of remote sensing applications for developing sustainable agriculture systems. Vegetation indices computed from satellite images give a good indication of the presence of vegetation; they describe the greenness, density and health of vegetation. Texture is also an important characteristic used to identify objects or regions of interest in an image. This paper illustrates the use of a decision tree method to classify land into crop land and non-crop land and to classify different crops. We evaluate the possibility of crop classification using an integrated approach based on texture properties together with different vegetation indices for single-date LISS IV sensor data of 5.8 meter spatial resolution. Eleven vegetation indices (NDVI, DVI, GEMI, GNDVI, MSAVI2, NDWI, NG, NR, NNIR, OSAVI and VI green) were generated using the green, red and NIR bands, and the image was then classified using the decision tree method. The other approach integrates texture features (mean, variance, kurtosis and skewness) with these vegetation indices. The two methods are compared. The results indicate that including textural features with vegetation indices can be effectively implemented to produce classified maps with 8.33% higher accuracy for Indian satellite IRS-P6, LISS IV sensor images.
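
    Three of the vegetation indices listed in the record above have simple standard per-pixel formulas, sketched below: NDVI = (NIR - Red) / (NIR + Red), DVI = NIR - Red, and GNDVI = (NIR - Green) / (NIR + Green). The classify_pixel function is a hypothetical one-node decision tree, and its threshold of 0.3 is an illustrative assumption; the abstract does not give the split rules the authors' tree actually learned.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom


def dvi(nir, red):
    """Difference Vegetation Index for one pixel."""
    return nir - red


def gndvi(nir, green):
    """Green NDVI: same form as NDVI with the green band in place of red."""
    denom = nir + green
    return 0.0 if denom == 0 else (nir - green) / denom


def classify_pixel(nir, red, threshold=0.3):
    """Hypothetical single-split tree: the threshold is an assumed value."""
    return "crop" if ndvi(nir, red) > threshold else "non-crop"
```

    For example, a pixel with NIR reflectance 0.5 and red reflectance 0.1 has an NDVI of about 0.67 and falls on the "crop" side of this assumed split, while a pixel with equal NIR and red reflectance has an NDVI of 0.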

  4. Ensemble learning with trees and rules: supervised, semi-supervised, unsupervised

    USDA-ARS's Scientific Manuscript database

    In this article, we propose several new approaches for post-processing a large ensemble of conjunctive rules for supervised and semi-supervised learning problems. We show with various examples that for high dimensional regression problems the models constructed by post-processing the rules with ...

  5. An Improved Decision Tree for Predicting a Major Product in Competing Reactions

    ERIC Educational Resources Information Center

    Graham, Kate J.

    2014-01-01

    When organic chemistry students encounter competing reactions, they are often overwhelmed by the task of evaluating multiple factors that affect the outcome of a reaction. The use of a decision tree is a useful tool to teach students to evaluate a complex situation and propose a likely outcome. Specifically, a decision tree can help students…

  6. Decision-Tree Models of Categorization Response Times, Choice Proportions, and Typicality Judgments

    ERIC Educational Resources Information Center

    Lafond, Daniel; Lacouture, Yves; Cohen, Andrew L.

    2009-01-01

    The authors present 3 decision-tree models of categorization adapted from T. Trabasso, H. Rollins, and E. Shaughnessy (1971) and use them to provide a quantitative account of categorization response times, choice proportions, and typicality judgments at the individual-participant level. In Experiment 1, the decision-tree models were fit to…

  9. Prediction of Regional Streamflow Frequency using Model Tree Ensembles: A data-driven approach based on natural and anthropogenic drainage area characteristics

    NASA Astrophysics Data System (ADS)

    Schnier, S.; Cai, X.

    2012-12-01

    This study introduces a highly accurate data-driven method to predict streamflow frequency statistics based on known drainage area characteristics which yields insights into the dominant controls of regional streamflow. The model is enhanced by explicit consideration of human interference in local hydrology. The basic idea is to use decision trees (i.e., regression trees) to regionalize the dataset and create a model tree by fitting multi-linear equations to the leaves of the regression tree. We improve model accuracy and obtain a measure of variable importance by creating an ensemble of randomized model trees using bootstrap aggregation (i.e., bagging). The database used to induce the models is built from public domain drainage area characteristics for 715 USGS stream gages (455 in Texas and 260 in Illinois). The database includes information on natural characteristics such as precipitation, soil type and slope, as well as anthropogenic ones including land cover, human population and water use. Model accuracy was evaluated using cross-validation and several performance metrics. During the validation, the gauges that are withheld from the analysis represent ungauged watersheds. The proposed method outperforms standard regression models such as the method of residuals for predictions in ungauged watersheds. Importantly, out-of-bag variable importance combined with models for 17 points along the flow duration curve (FDC) (i.e., from 0% to 100% exceedance frequency) yields insight into the dominant controls of regional streamflow. The most discriminant variables for high flows are drainage area and seasonal precipitation. Discriminant variables for low flows are more complex and model accuracy is improved with base-flow data, which is particularly difficult to obtain for ungauged sites. Consideration of human activities, such as percent urban and water use, is also shown to improve accuracy of low flow predictions. Drainage area characteristics, especially
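    The bootstrap aggregation step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: it bags one-split regression trees with constant leaves on a single predictor, whereas the study fits multi-linear equations in the leaves. The names `fit_stump` and `bagged_predictor` are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree (constant leaves) on one numeric predictor."""
    best = None
    # Try every distinct value except the largest as a split threshold.
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: predict the overall mean
        overall = mean(ys)
        return lambda x: overall
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def bagged_predictor(xs, ys, n_trees=25, seed=0):
    """Average the predictions of stumps fitted to bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # draw with replacement
        trees.append(fit_stump([xs[i] for i in sample], [ys[i] for i in sample]))
    return lambda x: mean(tree(x) for tree in trees)
```

    Averaging over resamples smooths the hard step of any single stump, which is the variance-reduction effect that bagging relies on.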

  10. Learning from examples - Generation and evaluation of decision trees for software resource analysis

    NASA Technical Reports Server (NTRS)

    Selby, Richard W.; Porter, Adam A.

    1988-01-01

    A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for software resource data analysis. The trees identify classes of objects (software modules) that had high development effort. Sixteen software systems ranging from 3,000 to 112,000 source lines were selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4,700 objects, captured information about the development effort, faults, changes, design style, and implementation style. A total of 9,600 decision trees were automatically generated and evaluated. The trees correctly identified 79.3 percent of the software modules that had high development effort or faults, and the trees generated from the best parameter combinations correctly identified 88.4 percent of the modules on the average.

  12. Discovering Patterns in Brain Signals Using Decision Trees

    PubMed Central

    2016-01-01

    Even with emerging technologies, such as Brain-Computer Interface (BCI) systems, understanding how our brains work is a very difficult challenge. We therefore propose to use a data mining technique to help us in this task. As a case study, we analyzed the brain behaviour of blind and sighted people during a spatial activity. There is a common belief that blind people compensate for their lack of vision using the other senses. If sighted people are given an object and asked to identify it, vision will probably be the most decisive sense. If the same experiment is repeated with blind people, they will have to use other senses to identify the object. In this work, we propose a methodology that uses decision trees (DT) to investigate how the brains of blind and sighted people differ in reacting to a spatial problem. We chose the DT algorithm because it can discover patterns in the brain signal, and its output is human-interpretable. Our results show that using DT to analyze brain signals can help us to understand the brain's behaviour. PMID:27688746

  13. ArborZ: Photometric Redshifts Using Boosted Decision Trees

    NASA Astrophysics Data System (ADS)

    Gerdes, David W.; Sypniewski, Adam J.; McKay, Timothy A.; Hao, Jiangang; Weis, Matthew R.; Wechsler, Risa H.; Busha, Michael T.

    2010-06-01

    Precision photometric redshifts will be essential for extracting cosmological parameters from the next generation of wide-area imaging surveys. In this paper, we introduce a photometric redshift algorithm, ArborZ, based on the machine-learning technique of boosted decision trees. We study the algorithm using galaxies from the Sloan Digital Sky Survey (SDSS) and from mock catalogs intended to simulate both the SDSS and the upcoming Dark Energy Survey. We show that it improves upon the performance of existing algorithms. Moreover, the method naturally leads to the reconstruction of a full probability density function (PDF) for the photometric redshift of each galaxy, not merely a single "best estimate" and error, and also provides a photo-z quality figure of merit for each galaxy that can be used to reject outliers. We show that the stacked PDFs yield a more accurate reconstruction of the redshift distribution N(z). We discuss limitations of the current algorithm and ideas for future work.

  14. Molecular decision trees realized by ultrafast electronic spectroscopy

    PubMed Central

    Fresch, Barbara; Hiluf, Dawit; Collini, Elisabetta; Levine, R. D.; Remacle, F.

    2013-01-01

    The outcome of a light–matter interaction depends on both the state of matter and the state of light. It is thus a natural setting for implementing bilinear classical logic. A description of the state of a time-varying system requires measuring an (ideally complete) set of time-dependent observables. Typically, this is prohibitive, but in weak-field spectroscopy we can move toward this goal because only a finite number of levels are accessible. Recent progress in nonlinear spectroscopies means that nontrivial measurements can be implemented and thereby give rise to interesting logic schemes where the outputs are functions of the observables. Lie algebra offers a natural tool for generating the outcome of the bilinear light–matter interaction. We show how to synthesize these ideas by explicitly discussing three-photon spectroscopy of a bichromophoric molecule for which there are four accessible states. Switching logic would use the on–off occupancies of these four states as outcomes. Here, we explore the use of all 16 observables that define the time-evolving state of the bichromophoric system. The bilinear laser–system interaction with the three pulses of the setup of a 2D photon echo spectroscopy experiment can be used to generate a rich parallel logic that corresponds to the implementation of a molecular decision tree. Our simulations allow relaxation by weak coupling to the environment, which adds to the complexity of the logic operations. PMID:24043793

  15. Decision trees in selection of featured determined food quality.

    PubMed

    Dębska, B; Guzowska-Świder, B

    2011-10-31

    The determination of food quality, authenticity and the detection of adulterations are problems of increasing importance in food chemistry. Recently, chemometric classification techniques and pattern recognition analysis methods for wine and other alcoholic beverages have received great attention and have been largely used. Beer is a complex mixture of components: on one hand a volatile fraction, which is responsible for its aroma, and on the other hand, a non-volatile fraction or extract consisting of a great variety of substances with distinct characteristics. The aim of this study was to consider parameters which contribute to beer differentiation according to the quality grade. Chemical (e.g. pH, acidity, dry extract, alcohol content, CO2 content) and sensory features (e.g. bitter taste, color) were determined in 70 beer samples and used as variables in decision tree techniques. These pattern recognition techniques, applied to the dataset, extracted information useful for obtaining a satisfactory classification of beer samples according to their quality grade. Feature selection procedures indicated which features are the most discriminating for classification.

  16. Using decision trees to measure activities in people with stroke.

    PubMed

    Zhang, Ting; Fulk, George D; Tang, Wenlong; Sazonov, Edward S

    2013-01-01

    Improving community mobility is a common goal for persons with stroke. Measuring daily physical activity is helpful to determine the effectiveness of rehabilitation interventions. In our previous studies, a novel wearable shoe-based sensor system (SmartShoe) was shown to be capable of accurately classifying three major postures and activities (sitting, standing, and walking) from individuals with stroke using an Artificial Neural Network (ANN). In this study, we utilized decision tree algorithms to develop individual and group activity classification models for stroke patients. The data were acquired from 12 participants with stroke. For 3-class classification, the average accuracy was 99.1% with individual models and 91.5% with group models. Further, we extended the activities into 8 classes: sitting, standing, walking, cycling, stairs-up, stairs-down, wheel-chair-push, and wheel-chair-propel. The classification accuracy was 97.9% for individual models and 80.2% for the group model, demonstrating the feasibility of multi-class activity recognition by SmartShoe in stroke patients.

  17. Ensemble Methods

    NASA Astrophysics Data System (ADS)

    Re, Matteo; Valentini, Giorgio

    2012-03-01

    Ensemble methods are statistical and computational learning procedures reminiscent of the human social learning behavior of seeking several opinions before making any crucial decision. The idea of combining the opinions of different "experts" to obtain an overall “ensemble” decision is rooted in our culture at least from the classical age of ancient Greece, and it has been formalized during the Enlightenment with the Condorcet Jury Theorem [45], which proved that the judgment of a committee is superior to those of individuals, provided the individuals have reasonable competence. Ensembles are sets of learning machines that combine in some way their decisions, or their learning algorithms, or different views of data, or other specific characteristics to obtain more reliable and more accurate predictions in supervised and unsupervised learning problems [48,116]. A simple example is represented by the majority vote ensemble, by which the decisions of different learning machines are combined, and the class that receives the majority of “votes” (i.e., the class predicted by the majority of the learning machines) is the class predicted by the overall ensemble [158]. In the literature, a plethora of terms other than ensembles has been used, such as fusion, combination, aggregation, and committee, to indicate sets of learning machines that work together to solve a machine learning problem [19,40,56,66,99,108,123], but in this chapter we maintain the term ensemble in its widest meaning, in order to include the whole range of combination methods. Nowadays, ensemble methods represent one of the main current research lines in machine learning [48,116], and the interest of the research community on ensemble methods is witnessed by conferences and workshops specifically devoted to ensembles, first of all the multiple classifier systems (MCS) conference organized by Roli, Kittler, Windeatt, and other researchers of this area [14,62,85,149,173]. Several theories have been
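    The majority vote ensemble and the Condorcet Jury Theorem mentioned in this abstract can be illustrated with a short sketch. It assumes an odd number of voters who are independent and equally competent (the classical setting of the theorem); the function names are hypothetical and not taken from the chapter.

```python
from collections import Counter
from math import comb


def majority_vote(predictions):
    """Return the label predicted by the most ensemble members.

    Ties go to the label that appears first in the input.
    """
    counts = Counter(predictions)
    return max(counts, key=counts.get)


def committee_accuracy(n_members, p_correct):
    """Probability that a strict majority of independent voters is right.

    Each voter is assumed to be correct with the same probability p_correct.
    """
    if n_members % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    needed = n_members // 2 + 1
    return sum(
        comb(n_members, k) * p_correct ** k * (1 - p_correct) ** (n_members - k)
        for k in range(needed, n_members + 1)
    )
```

    Under these assumptions, three voters who are each right 60% of the time form a committee that is right 64.8% of the time, while three voters who are each right only 40% of the time do worse together (35.2%). This is the "reasonable competence" condition in the theorem.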

  18. Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine

    NASA Technical Reports Server (NTRS)

    Schwabacher, Mark A.; Aguilar, Robert; Figueroa, Fernando F.

    2009-01-01

    The goal of this work was to use data-driven methods to automatically detect and isolate faults in the J-2X rocket engine. It was decided to use decision trees, since they tend to be easier to interpret than other data-driven methods. The decision tree algorithm automatically "learns" a decision tree by performing a search through the space of possible decision trees to find one that fits the training data. The particular decision tree algorithm used is known as C4.5. Simulated J-2X data from a high-fidelity simulator developed at Pratt & Whitney Rocketdyne and known as the Detailed Real-Time Model (DRTM) was used to "train" and test the decision tree. Fifty-six DRTM simulations were performed for this purpose, with different leak sizes, different leak locations, and different times of leak onset. To make the simulations as realistic as possible, they included simulated sensor noise, and included a gradual degradation in both fuel and oxidizer turbine efficiency. A decision tree was trained using 11 of these simulations, and tested using the remaining 45 simulations. In the training phase, the C4.5 algorithm was provided with labeled examples of data from nominal operation and data including leaks in each leak location. From the data, it "learned" a decision tree that can classify unseen data as having no leak or having a leak in one of the five leak locations. In the test phase, the decision tree produced very low false alarm rates and low missed detection rates on the unseen data. It had very good fault isolation rates for three of the five simulated leak locations, but it tended to confuse the remaining two locations, perhaps because a large leak at one of these two locations can look very similar to a small leak at the other location.
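    C4.5 grows its tree greedily, choosing at each node the attribute test with the best gain ratio. The sketch below computes that criterion for a categorical attribute. It illustrates the general C4.5 splitting criterion only, not the J-2X study's configuration; the helper names and the toy data in the usage note are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain of splitting on a categorical attribute,
    normalized by the entropy of the split itself (split information)."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0
```

    For example, an attribute whose two values separate "leak" from "nominal" records perfectly has a gain ratio of 1.0, while an attribute that is independent of the label has a gain ratio of 0.0.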

  19. Potential of ensemble tree methods for early-season prediction of winter wheat yield from short time series of remotely sensed normalized difference vegetation index and in situ meteorological data

    NASA Astrophysics Data System (ADS)

    Heremans, Stien; Dong, Qinghan; Zhang, Beier; Bydekerke, Lieven; Van Orshoven, Jos

    2015-01-01

    We aimed at analyzing the potential of two ensemble tree machine learning methods (boosted regression trees and random forests) for (early) prediction of winter wheat yield from short time series of remotely sensed vegetation indices at low spatial resolution and of in situ meteorological data in combination with annual fertilization levels. The study area was the Huaibei Plain in eastern China, and all models were calibrated and validated for five separate prefectures. To this end, a cross-validation process was developed that integrates model meta-parameterization and simple forward feature selection. We found that the resulting models deliver early estimates that are accurate enough to support decision making in the agricultural sector and to allow their operational use for yield forecasting. To attain maximum prediction accuracy, incorporating predictors from the end of the growing season is, however, recommended.

  20. Using GEFS ensemble forecasts for decision making in reservoir management in California

    NASA Astrophysics Data System (ADS)

    Scheuerer, M.; Hamill, T.; Webb, R. S.

    2015-12-01

    Reservoirs such as Lake Mendocino in California's Russian River Basin provide flood control, water supply, recreation, and environmental stream flow regulation. Many of these reservoirs are operated by the U.S. Army Corps of Engineers (Corps) according to water control manuals that specify elevations for an upper volume of reservoir storage that must be kept available for capturing storm runoff and reducing flood risk, and a lower volume of storage that may be used for water supply. During extreme rainfall events, runoff is captured by these reservoirs and released as quickly as possible to create flood storage space for another potential storm. These flood control manuals are based on typical historical weather patterns - wet during the winter, dry otherwise - but are not informed directly by weather prediction. Alternative reservoir management approaches such as Forecast-Informed Reservoir Operations (FIRO), which seek to incorporate advances in weather prediction, are currently being explored as means to improve water supply availability while maintaining flood risk reduction and providing additional ecosystem benefits. We present results from a FIRO proof-of-concept study investigating the reliability of post-processed GEFS ensemble forecasts to predict the probability that day 6-to-10 precipitation accumulations in certain areas in California exceed a high threshold. Our results suggest that reliable forecast guidance can be provided, and the resulting probabilities could be used to inform decisions to release or hold water in the reservoirs. We illustrate the potential of these forecasts in a case study of extreme event probabilities for the Russian River Basin in California.
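    The simplest way to turn an ensemble forecast into an exceedance probability is the fraction of members above the threshold. The sketch below shows only that raw relative frequency; the study relies on statistically post-processed forecasts, which this does not reproduce. The function name and the numbers in the usage note are hypothetical.

```python
def exceedance_probability(members, threshold):
    """Fraction of ensemble members whose forecast value exceeds the threshold."""
    if not members:
        raise ValueError("ensemble is empty")
    return sum(1 for value in members if value > threshold) / len(members)
```

    With four members forecasting 10, 60, 80 and 20 mm and a threshold of 50 mm, the raw exceedance probability is 0.5. A reservoir operator could compare such a probability with a release trigger chosen in advance.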

  1. Improved Frame Mode Selection for AMR-WB+ Based on Decision Tree

    NASA Astrophysics Data System (ADS)

    Kim, Jong Kyu; Kim, Nam Soo

    In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. In order to reduce computation while maintaining good performance, decision tree classifier is adopted with the closed loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. Through an evaluation test on a database covering both speech and music materials, the proposed method is found to achieve a much better mode selection accuracy compared with the open loop mode selection module in the AMR-WB+.

  2. Detecting road maps for capacity utilization decisions by Clustering Analysis and CHAID Decision Trees.

    PubMed

    Koyuncugil, Ali Serhan; Ozgulbas, Nermin

    2010-08-01

    The aims of this study are to provide a standard CUR value, to determine the financial and organizational factors that affect capacity utilization, and to develop road maps for increasing capacity utilization. To reach these aims by an objective method, we used data mining, which discovers hidden and useful patterns in large amounts of data. Two different data mining methods were used in two stages. In the first stage, the standard value of CUR was determined by K-means Clustering Analysis. In the second stage, the CHAID Decision Tree Algorithm was used to determine the impact factors that provided the steps for the road maps. The study covered 592 Turkish Ministry of Health public hospitals and used their financial and operational data for the year 2004. Finally, two different road maps were developed and suggestions were made according to the results of the study.

  3. Using decision trees to enhance interdisciplinary team work: the case of oncofertility.

    PubMed

    Gardino, Shauna L; Jeruss, Jacqueline S; Woodruff, Teresa K

    2010-05-01

    Oncofertility, an emerging discipline at the intersection of cancer and fertility, strives to give cancer patients options when they are confronting potential infertility as a consequence of cancer treatment. Fertility preservation decisions must be made before treatment begins, adding stress to the decision-making process. Healthcare providers need to be aware of the intricacies involved in oncofertility decision making, and the often tight time line that patients face when making these decisions. Cancer patients' perspectives may also change, as the dual burden of a cancer diagnosis and potential infertility can cause great flux in emotions. A provider-facing decision tree was created to enhance patient decision-making capacities and outline the multiple potential intervention points. Decision trees, which highlight the important decision points during which providers can approach patients, can be a useful tool to help providers in counseling patients on fertility preservation.

  4. How to pose the question matters: Behavioural Economics concepts in decision making on the basis of ensemble forecasts

    NASA Astrophysics Data System (ADS)

    Alfonso, Leonardo; van Andel, Schalk Jan

    2014-05-01

    Part of recent research in ensemble and probabilistic hydro-meteorological forecasting analyses which probabilistic information is required by decision makers and how it can be most effectively visualised. This work, in addition, analyses whether decision making in flood early warning is also influenced by the way the decision question is posed. For this purpose, the decision-making game "Do probabilistic forecasts lead to better decisions?", which Ramos et al (2012) conducted at the EGU General Assembly 2012 in the city of Vienna, has been repeated with a small group and expanded. In that game, decision makers had to decide whether or not to open a flood release gate, on the basis of flood forecasts, with and without uncertainty information. A conclusion of that game was that, in the absence of uncertainty information, decision makers are compelled towards a more risk-averse attitude. In order to explore to what extent the answers were driven by the way the questions were framed, a second variant was introduced in addition to the original experiment, in which participants were asked to choose between a sure value (for either losing or winning with a given probability) and a gamble. This set-up is based on Kahneman and Tversky (1979). Results indicate that the way the questions are posed may play an important role in decision making and that Prospect Theory provides promising concepts to further understand how this works.

  5. Prediction of Weather Impacted Airport Capacity using Ensemble Learning

    NASA Technical Reports Server (NTRS)

    Wang, Yao Xun

    2011-01-01

    Ensemble learning with the Bagging Decision Tree (BDT) model was used to assess the impact of weather on airport capacities at selected high-demand airports in the United States. The ensemble bagging decision tree models were developed and validated using the Federal Aviation Administration (FAA) Aviation System Performance Metrics (ASPM) data and weather forecast at these airports. The study examines the performance of BDT, along with traditional single Support Vector Machines (SVM), for airport runway configuration selection and airport arrival rates (AAR) prediction during weather impacts. Testing of these models was accomplished using observed weather, weather forecast, and airport operation information at the chosen airports. The experimental results show that ensemble methods are more accurate than a single SVM classifier. The airport capacity ensemble method presented here can be used as a decision support model that supports air traffic flow management to meet the weather impacted airport capacity in order to reduce costs and increase safety.

  6. DeTMan: A Decision Tree Management Tool for the Web and PDA Environments

    PubMed Central

    Lindquist, Michael; Afrin, Lawrence B.; Norcross, E. Douglas

    2003-01-01

    Illness management protocols, often represented as decision trees, are used in many areas of medicine. Some clinical departments maintain numerous, often quite complex protocols. Protocol access in acute care situations can be challenging, especially when available only in hardcopy format. Access via the web and especially via personal digital assistants would be more helpful. In the absence of the prior availability of a general purpose web/PDA decision tree editor/navigator, we are developing such a tool. PMID:14728421

  7. Real-Time Speech/Music Classification With a Hierarchical Oblique Decision Tree

    DTIC Science & Technology

    2008-04-01

    REAL-TIME SPEECH/MUSIC CLASSIFICATION WITH A HIERARCHICAL OBLIQUE DECISION TREE Jun Wang, Qiong Wu, Haojiang Deng, Qin Yan Institute of Acoustics...time speech/music classification with a hierarchical oblique decision tree. A set of discrimination features in frequency domain are selected...handle signals without discrimination and can not work properly in the existence of multimedia signals. This paper proposes a real-time speech/music

  8. Metric Sex Determination of the Human Coxal Bone on a Virtual Sample using Decision Trees.

    PubMed

    Savall, Frédéric; Faruch-Bilfeld, Marie; Dedouit, Fabrice; Sans, Nicolas; Rousseau, Hervé; Rougé, Daniel; Telmon, Norbert

    2015-11-01

    Decision trees provide an alternative to multivariate discriminant analysis, which is still the most commonly used in anthropometric studies. Our study analyzed the metric characterization of a recent virtual sample of 113 coxal bones using decision trees for sex determination. From 17 osteometric type I landmarks, a dataset was built with five classic distances traditionally reported in the literature and six new distances selected using the two-step ratio method. A ten-fold cross-validation was performed, and a decision tree was established on two subsamples (training and test sets). The decision tree established on the training set included three nodes and its application to the test set correctly classified 92% of individuals. This percentage was similar to the data of the literature. The usefulness of decision trees has been demonstrated in numerous fields. They have been already used in sex determination, body mass prediction, and ancestry estimation. This study shows another use of decision trees enabling simple and accurate sex determination.
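    A ten-fold cross-validation such as the one mentioned above partitions the sample into ten disjoint test folds. The sketch below generates such index splits using only the standard library; the shuffling seed and the function name are illustrative assumptions, and the study's actual partitioning procedure is not described in the abstract.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i,
    # so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

    For a sample of 113 items, as in this study, this produces three test folds of 12 and seven of 11, and every item is used for testing exactly once.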

  9. Development of a diagnostic decision tree for obstructive pulmonary diseases based on real-life data.

    PubMed

    Metting, Esther I; In 't Veen, Johannes C C M; Dekhuijzen, P N Richard; van Heijst, Ellen; Kocks, Janwillem W H; Muilwijk-Kroes, Jacqueline B; Chavannes, Niels H; van der Molen, Thys

    2016-01-01

    The aim of this study was to develop and explore the diagnostic accuracy of a decision tree derived from a large real-life primary care population. Data from 9297 primary care patients (45% male, mean age 53±17 years) with suspicion of an obstructive pulmonary disease was derived from an asthma/chronic obstructive pulmonary disease (COPD) service where patients were assessed using spirometry, the Asthma Control Questionnaire, the Clinical COPD Questionnaire, history data and medication use. All patients were diagnosed through the Internet by a pulmonologist. The Chi-squared Automatic Interaction Detection method was used to build the decision tree. The tree was externally validated in another real-life primary care population (n=3215). Our tree correctly diagnosed 79% of the asthma patients, 85% of the COPD patients and 32% of the asthma-COPD overlap syndrome (ACOS) patients. External validation showed a comparable pattern (correct: asthma 78%, COPD 83%, ACOS 24%). Our decision tree is considered to be promising because it was based on real-life primary care patients with a specialist's diagnosis. In most patients the diagnosis could be correctly predicted. Predicting ACOS, however, remained a challenge. The total decision tree can be implemented in computer-assisted diagnostic systems for individual patients. A simplified version of this tree can be used in daily clinical practice as a desk tool.
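    CHAID chooses splits by testing the association between each candidate predictor and the outcome with a chi-squared test. The sketch below computes only the Pearson chi-squared statistic for a contingency table of predictor category by diagnosis. Full CHAID also merges categories and works with adjusted p-values, which are omitted here. The function name and the tables in the usage note are hypothetical.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip cells in empty rows or columns
                statistic += (observed - expected) ** 2 / expected
    return statistic
```

    A table such as [[20, 0], [0, 20]], where the predictor separates the two diagnoses perfectly, gives a statistic of 40.0, while [[10, 10], [10, 10]] gives 0.0. Larger values indicate stronger candidates for a split.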

  10. Application of decision tree model for the ground subsidence hazard mapping near abandoned underground coal mines.

    PubMed

    Lee, Saro; Park, Inhye

    2013-09-30

    Subsidence of ground caused by underground mines poses hazards to human life and property. This study analyzed the ground subsidence hazard using factors that can affect ground subsidence and a decision tree approach in a geographic information system (GIS). The study area was Taebaek, Gangwon-do, Korea, where many abandoned underground coal mines exist. Spatial data, topography, geology, and various ground-engineering data for the subsidence area were collected and compiled in a database for mapping ground-subsidence hazard (GSH). The subsidence area was randomly split 50/50 for training and validation of the models. A data-mining classification technique was applied to the GSH mapping, and decision trees were constructed using the chi-squared automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. The frequency ratio model was also applied to the GSH mapping for comparison with a probabilistic model. The resulting GSH maps were validated using area-under-the-curve (AUC) analysis with the subsidence area data that had not been used for training the model. The highest accuracy was achieved by the decision tree model using the CHAID algorithm (94.01%), compared with the QUEST algorithm (90.37%) and the frequency ratio model (86.70%). These accuracies are higher than previously reported results for decision trees. Decision tree methods can therefore be used efficiently for GSH analysis and might be widely used for prediction of various spatial events. Copyright © 2013. Published by Elsevier Ltd.
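    One common form of area-under-the-curve validation is the area under the ROC curve, which equals the probability that a randomly chosen positive cell receives a higher hazard score than a randomly chosen negative cell. The sketch below computes it by pairwise comparison. It assumes the ROC interpretation; the abstract does not say which curve the study used, and the function name is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels are 1 for positive cells (e.g. observed subsidence) and 0 otherwise.
    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

    A value of 1.0 means every positive cell outranks every negative cell, and 0.5 corresponds to random ranking.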

  11. Combining evolutionary algorithms with oblique decision trees to detect bent double galaxies

    SciTech Connect

    Cantu-Paz, E; Kamath, C

    2000-06-22

    Decision trees have long been popular in classification as they use simple and easy-to-understand tests at each node. Most variants of decision trees test a single attribute at a node, leading to axis-parallel trees, where the test results in a hyperplane which is parallel to one of the dimensions in the attribute space. These trees can be rather large and inaccurate in cases where the concept to be learnt is best approximated by oblique hyperplanes. In such cases, it may be more appropriate to use an oblique decision tree, where the decision at each node is a linear combination of the attributes. Oblique decision trees have not gained wide popularity in part due to the complexity of constructing good oblique splits and the tendency of existing splitting algorithms to get stuck in local minima. Several alternatives have been proposed to handle these problems including randomization in conjunction with deterministic hill climbing and the use of simulated annealing. In this paper, they use evolutionary algorithms (EAs) to determine the split. EAs are well suited for this problem because of their global search properties, their tolerance to noisy fitness evaluations, and their scalability to large dimensional search spaces. They demonstrate the technique on a practical problem from astronomy, namely, the classification of galaxies with a bent-double morphology, and describe their experiences with several split evaluation criteria.
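    The difference between axis-parallel and oblique splits, and the idea of searching for a split with an evolutionary algorithm, can be sketched as follows. This is a minimal illustration that uses a (1+1) evolution strategy and Gini impurity as stand-ins; the paper's actual evolutionary algorithms and split evaluation criteria are not specified in the abstract, and all names here are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def split_impurity(points, labels, weights, threshold):
    """Weighted Gini impurity after the oblique test w . x <= threshold.

    An axis-parallel split is the special case with a single non-zero weight.
    """
    left, right = [], []
    for x, y in zip(points, labels):
        projection = sum(w * xi for w, xi in zip(weights, x))
        (left if projection <= threshold else right).append(y)
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def evolve_split(points, labels, generations=500, sigma=0.5, seed=0):
    """(1+1) evolution strategy: mutate the hyperplane, keep it if not worse."""
    rng = random.Random(seed)
    dim = len(points[0])
    best = [rng.gauss(0, 1) for _ in range(dim + 1)]  # weights, then threshold
    best_score = split_impurity(points, labels, best[:-1], best[-1])
    for _ in range(generations):
        child = [gene + rng.gauss(0, sigma) for gene in best]
        score = split_impurity(points, labels, child[:-1], child[-1])
        if score <= best_score:
            best, best_score = child, score
    return best[:-1], best[-1], best_score
```

    On data whose classes are separated by the line x + y = 1, the oblique test with weights (1, 1) and threshold 1 yields zero impurity, whereas an axis-parallel test on either attribute alone may leave some mixing.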

  12. Improved predictive mapping of indoor radon concentrations using ensemble regression trees based on automatic clustering of geological units.

    PubMed

    Kropat, Georg; Bochud, Francois; Jaboyedoff, Michel; Laedermann, Jean-Pascal; Murith, Christophe; Palacios Gruson, Martha; Baechler, Sébastien

    2015-09-01

    According to estimations around 230 people die as a result of radon exposure in Switzerland. This public health concern makes reliable indoor radon prediction and mapping methods necessary in order to improve risk communication to the public. The aim of this study was to develop an automated method to classify lithological units according to their radon characteristics and to develop mapping and predictive tools in order to improve local radon prediction. About 240 000 indoor radon concentration (IRC) measurements in about 150 000 buildings were available for our analysis. The automated classification of lithological units was based on k-medoids clustering via pair-wise Kolmogorov distances between IRC distributions of lithological units. For IRC mapping and prediction we used random forests and Bayesian additive regression trees (BART). The automated classification groups lithological units well in terms of their IRC characteristics. Especially the IRC differences in metamorphic rocks like gneiss are well revealed by this method. The maps produced by random forests soundly represent the regional difference of IRCs in Switzerland and improve the spatial detail compared to existing approaches. We could explain 33% of the variations in IRC data with random forests. Additionally, the influence of a variable evaluated by random forests shows that building characteristics are less important predictors for IRCs than spatial/geological influences. BART could explain 29% of IRC variability and produced maps that indicate the prediction uncertainty. Ensemble regression trees are a powerful tool to model and understand the multidimensional influences on IRCs. Automatic clustering of lithological units complements this method by facilitating the interpretation of radon properties of rock types. This study provides an important element for radon risk communication. Future approaches should consider taking into account further variables like soil gas radon measurements as

  13. Ramping ensemble activity in dorsal anterior cingulate neurons during persistent commitment to a decision

    PubMed Central

    Hayden, Benjamin Y.

    2015-01-01

    We frequently need to commit to a choice to achieve our goals; however, the neural processes that keep us motivated in pursuit of delayed goals remain obscure. We examined ensemble responses of neurons in macaque dorsal anterior cingulate cortex (dACC), an area previously implicated in self-control and persistence, in a task that requires commitment to a choice to obtain a reward. After reward receipt, dACC neurons signaled reward amount with characteristic ensemble firing rate patterns; during the delay in anticipation of the reward, ensemble activity smoothly and gradually came to resemble the postreward pattern. On the subset of risky trials, in which a reward was anticipated with 50% certainty, ramping ensemble activity evolved to the pattern associated with the anticipated reward (and not with the anticipated loss) and then, on loss trials, took on an inverted form anticorrelated with the form associated with a win. These findings enrich our knowledge of reward processing in dACC and may have broader implications for our understanding of persistence and self-control. PMID:26334016

  14. [Prediction of regional soil quality based on mutual information theory integrated with decision tree algorithm].

    PubMed

    Lin, Fen-Fang; Wang, Ke; Yang, Ning; Yan, Shi-Guang; Zheng, Xin-Yu

    2012-02-01

    In this paper, some main factors such as soil type, land use pattern, lithology type, topography, road, and industry type that affect soil quality were used to precisely obtain the spatial distribution characteristics of regional soil quality, mutual information theory was adopted to select the main environmental factors, and decision tree algorithm See 5.0 was applied to predict the grade of regional soil quality. The main factors affecting regional soil quality were soil type, land use, lithology type, distance to town, distance to water area, altitude, distance to road, and distance to industrial land. The prediction accuracy of the decision tree model with the variables selected by mutual information was obviously higher than that of the model with all variables, and, for the former model, whether of decision tree or of decision rule, its prediction accuracy was all higher than 80%. Based on the continuous and categorical data, the method of mutual information theory integrated with decision tree could not only reduce the number of input parameters for decision tree algorithm, but also predict and assess regional soil quality effectively.

  15. Decision Support on the Sediments Flushing of Aimorés Dam Using Medium-Range Ensemble Forecasts

    NASA Astrophysics Data System (ADS)

    Mainardi Fan, Fernando; Schwanenberg, Dirk; Collischonn, Walter; Assis dos Reis, Alberto; Alvarado Montero, Rodolfo; Alencar Siqueira, Vinicius

    2015-04-01

    In the present study we investigate the use of medium-range streamflow forecasts in the Doce River basin (Brazil), at the reservoir of Aimorés Hydro Power Plant (HPP). During daily operations this reservoir acts as a "trap" to the sediments that originate from the upstream basin of the Doce River. This motivates a cleaning process called "pass through" to periodically remove the sediments from the reservoir. The "pass through" or "sediments flushing" process consists of a decrease of the reservoir's water level to a certain flushing level when a determined reservoir inflow threshold is forecasted. Then, the water in the approaching inflow is used to flush the sediments from the reservoir through the spillway and to recover the original reservoir storage. To be triggered, the sediments flushing operation requires an inflow larger than 3000 m³/s in a forecast horizon of 7 days. This lead-time of 7 days is far beyond the basin's concentration time (around 2 days), meaning that the forecasts for the pass through procedure depend heavily on Numerical Weather Prediction (NWP) models that generate Quantitative Precipitation Forecasts (QPF). This dependency creates an environment with a high amount of uncertainty for the operator. To support the decision making at Aimorés HPP we developed a fully operational hydrological forecasting system for the basin. The system is capable of generating ensemble streamflow forecast scenarios when driven by QPF data from meteorological Ensemble Prediction Systems (EPS). This approach allows accounting for uncertainties in the NWP at a decision making level. This system is starting to be used operationally by CEMIG and is the one shown in the present study, including a hindcasting analysis to assess the performance of the system for the specific flushing problem. The QPF data used in the hindcasting study was derived from the TIGGE (THORPEX Interactive Grand Global Ensemble) database. Among all EPS available on TIGGE, three were

  16. Bootstrap aggregating of alternating decision trees to detect sets of SNPs that associate with disease.

    PubMed

    Guy, Richard T; Santago, Peter; Langefeld, Carl D

    2012-02-01

    Complex genetic disorders are a result of a combination of genetic and nongenetic factors, all potentially interacting. Machine learning methods hold the potential to identify multilocus and environmental associations thought to drive complex genetic traits. Decision trees, a popular machine learning technique, offer a computationally low complexity algorithm capable of detecting associated sets of single nucleotide polymorphisms (SNPs) of arbitrary size, including modern genome-wide SNP scans. However, interpretation of the importance of an individual SNP within these trees can present challenges. We present a new decision tree algorithm denoted as Bagged Alternating Decision Trees (BADTrees) that is based on identifying common structural elements in a bootstrapped set of Alternating Decision Trees (ADTrees). The algorithm is order nk^2, where n is the number of SNPs considered and k is the number of SNPs in the tree constructed. Our simulation study suggests that BADTrees have higher power and lower type I error rates than ADTrees alone and comparable power with lower type I error rates compared to logistic regression. We illustrate the application of these data using simulated data as well as from the Lupus Large Association Study 1 (7,822 SNPs in 3,548 individuals). Our results suggest that BADTrees hold promise as a low computational order algorithm for detecting complex combinations of SNP and environmental factors associated with disease.
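
    The bootstrap-aggregating step mentioned above can be sketched generically as follows. This is plain bagging of one-feature decision stumps on a hypothetical toy dataset, not the BADTrees algorithm and not an alternating decision tree; all names, the data, and the seed are illustrative assumptions.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Choose the threshold on a 1-D feature with the fewest training errors.

    The stump predicts class 1 when x > threshold and class 0 otherwise.
    """
    best_threshold, best_errors = None, None
    for threshold in sorted({x for x, _ in sample}):
        errors = sum((x > threshold) != bool(y) for x, y in sample)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagged_stumps(data, n_models, seed=0):
    """Train one stump per bootstrap replicate (sampling with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(data) for _ in data]
        models.append(fit_stump(bootstrap))
    return models


def predict(models, x):
    """Majority vote over the stumps in the ensemble."""
    votes = Counter(int(x > threshold) for threshold in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (feature value, class label).
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
models = bagged_stumps(data, n_models=25)
print(predict(models, 0.05), predict(models, 0.95))
```

    Using an odd number of models avoids ties in the vote; the thresholds differ between replicates because each bootstrap sample contains a different subset of the data.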

  17. Evaluation of Decision Trees for Cloud Detection from AVHRR Data

    DTIC Science & Technology

    2005-01-01

    Australia, July 9-13, pp. 1152-1154, 2001. [12] Hansen, M., Dubayah, R., and DeFries, R. Classification trees: an alternative to traditional land... DeFries, R.S. Detection of land cover changes using MODIS 250m data. Remote Sensing of Environment, Vol. 83, No. 1-2, pp. 336–350, 2002. [15

  18. Using decision trees to characterize verbal communication during change and stuck episodes in the therapeutic process

    PubMed Central

    Masías, Víctor H.; Krause, Mariane; Valdés, Nelson; Pérez, J. C.; Laengle, Sigifredo

    2015-01-01

    Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBTree, and REPTree) are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice. PMID:25914657

  19. [Comparison of Discriminant Analysis and Decision Trees for the Detection of Subclinical Keratoconus].

    PubMed

    Kleinhans, Sonja; Herrmann, Eva; Kohnen, Thomas; Bühren, Jens

    2017-08-15

    Background: Iatrogenic keratectasia is one of the most dreaded complications of refractive surgery. In most cases, keratectasia develops after refractive surgery of eyes suffering from subclinical stages of keratoconus with few or no signs. Unfortunately, there has been no reliable procedure for the early detection of keratoconus. In this study, we used binary decision trees (recursive partitioning) to assess their suitability for discrimination between normal eyes and eyes with subclinical keratoconus. Patients and Methods: The method of decision tree analysis was compared with discriminant analysis which has shown good results in previous studies. Input data were 32 eyes of 32 patients with newly diagnosed keratoconus in the contralateral eye and preoperative data of 10 eyes of 5 patients with keratectasia after laser in-situ keratomileusis (LASIK). The control group was made up of 245 normal eyes after LASIK and 12-month follow-up without any signs of iatrogenic keratectasia. Results: Decision trees gave better accuracy and specificity than did discriminant analysis. The sensitivity of decision trees was lower than the sensitivity of discriminant analysis. Conclusion: On the basis of the patient population of this study, decision trees did not prove to be superior to linear discriminant analysis for the detection of subclinical keratoconus. Georg Thieme Verlag KG Stuttgart · New York.

  20. Pruning a decision tree for selecting computer-related assistive devices for people with disabilities.

    PubMed

    Chi, Chia-Fen; Tseng, Li-Kai; Jang, Yuh

    2012-07-01

    Many disabled individuals lack extensive knowledge about assistive technology, which could help them use computers. In 1997, Denis Anson developed a decision tree of 49 evaluative questions designed to evaluate the functional capabilities of the disabled user and choose an appropriate combination of assistive devices, from a selection of 26, that enable the individual to use a computer. In general, occupational therapists guide the disabled users through this process. They often have to go over repetitive questions in order to find an appropriate device. A disabled user may require an alphanumeric entry device, a pointing device, an output device, a performance enhancement device, or some combination of these. Therefore, the current research eliminates redundant questions and divides Anson's decision tree into multiple independent subtrees to meet the actual demand of computer users with disabilities. The modified decision tree was tested by six disabled users to prove it can determine a complete set of assistive devices with a smaller number of evaluative questions. The means to insert new categories of computer-related assistive devices was included to ensure the decision tree can be expanded and updated. The current decision tree can help the disabled users and assistive technology practitioners to find appropriate computer-related assistive devices that meet with clients' individual needs in an efficient manner.

  1. Using decision trees to characterize verbal communication during change and stuck episodes in the therapeutic process.

    PubMed

    Masías, Víctor H; Krause, Mariane; Valdés, Nelson; Pérez, J C; Laengle, Sigifredo

    2015-01-01

    Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBTree, and REPTree) are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice.

  2. Attracting Dynamics of Frontal Cortex Ensembles during Memory-Guided Decision-Making

    PubMed Central

    Seamans, Jeremy K.; Durstewitz, Daniel

    2011-01-01

    A common theoretical view is that attractor-like properties of neuronal dynamics underlie cognitive processing. However, although often proposed theoretically, direct experimental support for the convergence of neural activity to stable population patterns as a signature of attracting states has been sparse so far, especially in higher cortical areas. Combining state space reconstruction theorems and statistical learning techniques, we were able to resolve details of anterior cingulate cortex (ACC) multiple single-unit activity (MSUA) ensemble dynamics during a higher cognitive task which were not accessible previously. The approach worked by constructing high-dimensional state spaces from delays of the original single-unit firing rate variables and the interactions among them, which were then statistically analyzed using kernel methods. We observed cognitive-epoch-specific neural ensemble states in ACC which were stable across many trials (in the sense of being predictive) and depended on behavioral performance. More interestingly, attracting properties of these cognitively defined ensemble states became apparent in high-dimensional expansions of the MSUA spaces due to a proper unfolding of the neural activity flow, with properties common across different animals. These results therefore suggest that ACC networks may process different subcomponents of higher cognitive tasks by transiting among different attracting states. PMID:21625577

  3. Outsourcing the Portal: Another Branch in the Decision Tree.

    ERIC Educational Resources Information Center

    McMahon, Tim

    2000-01-01

    Discussion of the management of information resources in organizations focuses on the use of portal technologies to update intranet capabilities. Considers application outsourcing decisions, reviews benefits (including reducing costs) as well as concerns, and describes application service providers (ASPs). (LRW)

  4. Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps (LiCABEDS) and its application on modeling ligand functionality for 5HT-subtype GPCR families.

    PubMed

    Ma, Chao; Wang, Lirong; Xie, Xiang-Qun

    2011-03-28

    Advanced high-throughput screening (HTS) technologies generate great amounts of bioactivity data, and this data needs to be analyzed and interpreted with attention to understand how these small molecules affect biological systems. As such, there is an increasing demand to develop and adapt cheminformatics algorithms and tools in order to predict molecular and pharmacological properties on the basis of these large data sets. In this manuscript, we report a novel machine-learning-based ligand classification algorithm, named Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps (LiCABEDS), for data-mining and modeling of large chemical data sets to predict pharmacological properties in an efficient and accurate manner. The performance of LiCABEDS was evaluated through predicting GPCR ligand functionality (agonist or antagonist) using four different molecular fingerprints, including Maccs, FP2, Unity, and Molprint 2D fingerprints. Our studies showed that LiCABEDS outperformed two other popular techniques, classification tree and Naive Bayes classifier, on all four types of molecular fingerprints. Parameters in LiCABEDS, including the number of boosting iterations, initialization condition, and a "reject option" boundary, were thoroughly explored and discussed to demonstrate the capability of handling imbalanced data sets, as well as its robustness and flexibility. In addition, the detailed mathematical concepts and theory are also given to address the principle behind statistical prediction models. The LiCABEDS algorithm has been implemented into a user-friendly software package that is accessible online at http://www.cbligand.org/LiCABEDS/ .
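
    The general idea of boosting decision stumps over binary fingerprints can be sketched with textbook discrete AdaBoost. This is a minimal illustration under stated assumptions, not the LiCABEDS implementation: the three-bit "fingerprints", the majority-of-bits labels, and all function names are hypothetical, and features such as the reject option are not modeled.

```python
import math


def stump_output(x, feature, polarity):
    """A stump votes `polarity` when the bit is set and `-polarity` otherwise."""
    return polarity if x[feature] else -polarity


def best_stump(X, y, w):
    """Return (weighted_error, feature, polarity) of the best single-bit stump."""
    best = None
    for feature in range(len(X[0])):
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_output(xi, feature, polarity) != yi)
            if best is None or err < best[0]:
                best = (err, feature, polarity)
    return best


def adaboost(X, y, rounds):
    """Discrete AdaBoost: reweight the samples so later stumps focus on mistakes."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        err, feature, polarity = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feature, polarity))
        w = [wi * math.exp(-alpha * yi * stump_output(xi, feature, polarity))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted sum of stump votes."""
    score = sum(alpha * stump_output(x, f, p) for alpha, f, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical data: all 3-bit fingerprints, labeled +1 when at least two bits are set.
X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [1 if a + b + c >= 2 else -1 for a, b, c in X]

single = adaboost(X, y, rounds=1)
boosted = adaboost(X, y, rounds=3)
single_errors = sum(predict(single, x) != label for x, label in zip(X, y))
boosted_errors = sum(predict(boosted, x) != label for x, label in zip(X, y))
print(single_errors, boosted_errors)
```

    On this toy problem no single stump can represent the labels, while three boosted stumps together do, which is the motivation for combining weak learners.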

  5. Ensembl 2007.

    PubMed

    Hubbard, T J P; Aken, B L; Beal, K; Ballester, B; Caccamo, M; Chen, Y; Clarke, L; Coates, G; Cunningham, F; Cutts, T; Down, T; Dyer, S C; Fitzgerald, S; Fernandez-Banet, J; Graf, S; Haider, S; Hammond, M; Herrero, J; Holland, R; Howe, K; Howe, K; Johnson, N; Kahari, A; Keefe, D; Kokocinski, F; Kulesha, E; Lawson, D; Longden, I; Melsopp, C; Megy, K; Meidl, P; Ouverdin, B; Parker, A; Prlic, A; Rice, S; Rios, D; Schuster, M; Sealy, I; Severin, J; Slater, G; Smedley, D; Spudich, G; Trevanion, S; Vilella, A; Vogel, J; White, S; Wood, M; Cox, T; Curwen, V; Durbin, R; Fernandez-Suarez, X M; Flicek, P; Kasprzyk, A; Proctor, G; Searle, S; Smith, J; Ureta-Vidal, A; Birney, E

    2007-01-01

    The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of chordate genome sequences. Over the past year the number of genomes available from Ensembl has increased from 15 to 33, with the addition of sites for the mammalian genomes of elephant, rabbit, armadillo, tenrec, platypus, pig, cat, bush baby, common shrew, microbat and european hedgehog; the fish genomes of stickleback and medaka and the second example of the genomes of the sea squirt (Ciona savignyi) and the mosquito (Aedes aegypti). Some of the major features added during the year include the first complete gene sets for genomes with low-sequence coverage, the introduction of new strain variation data and the introduction of new orthology/paralog annotations based on gene trees.

  6. [Postmastectomy pain syndrome evidence based guidelines and decision trees].

    PubMed

    Labrèze, Laurent; Dixmérias-Iskandar, Florence; Monnin, Dominique; Bussières, Emmanuel; Delahaye, Evelyne; Bernard, Dominique; Lakdja, Fabrice

    2007-03-01

    A multidisciplinary expert group reviewed all available scientific data on postmastectomy pain syndrome. Seventy-six publications were retained, and thirty evidence-based recommendations for diagnosis, treatment and follow-up are listed. Few of these recommendations are classed level A. Analysis of the data makes it possible to propose a strategy based on the systematic association of drugs, kinesitherapy and psychological support. Evaluation and closer follow-up are necessary. Several decision trees are proposed.

  7. A modified decision tree algorithm based on genetic algorithm for mobile user classification problem.

    PubMed

    Liu, Dong-sheng; Fan, Shu-jiang

    2014-01-01

    To offer mobile customers better service, mobile users should first be classified. To address the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take context information as a classification attribute for the mobile user, and we classify the context into public context and private context classes. Then we analyze the processes and operators of the algorithm. Finally, we run an experiment on mobile users with the algorithm; we can classify mobile users into Basic service user, E-service user, Plus service user, and Total service user classes, and we can also derive some rules about mobile users. Compared with the C4.5 decision tree algorithm and the SVM algorithm, the algorithm proposed in this paper has higher accuracy and greater simplicity.

  8. Post-event human decision errors: operator action tree/time reliability correlation

    SciTech Connect

    Hall, R E; Fragola, J; Wreathall, J

    1982-11-01

    This report documents an interim framework for the quantification of the probability of errors of decision on the part of nuclear power plant operators after the initiation of an accident. The framework can easily be incorporated into an event tree/fault tree analysis. The method presented consists of a structure called the operator action tree and a time reliability correlation which assumes the time available for making a decision to be the dominating factor in situations requiring cognitive human response. This limited approach decreases the magnitude and complexity of the decision modeling task. Specifically, in the past, some human performance models have attempted prediction by trying to emulate sequences of human actions, or by identifying and modeling the information processing approach applicable to the task. The model developed here is directed at describing the statistical performance of a representative group of hypothetical individuals responding to generalized situations.

  9. Comparison of Taxi Time Prediction Performance Using Different Taxi Speed Decision Trees

    NASA Technical Reports Server (NTRS)

    Lee, Hanbong

    2017-01-01

    In the STBO modeler and tactical surface scheduler for the ATD-2 project, taxi speed decision trees are used to calculate the unimpeded taxi times of flights taxiing on the airport surface. The initial taxi speed values in these decision trees did not predict taxi times accurately. Using more recent, reliable surveillance data, new taxi speed values in the ramp area and movement area were computed. Before integrating these values into the STBO system, we performed test runs using live data from Charlotte airport, with different taxi speed settings: 1) initial taxi speed values and 2) new ones. Taxi time prediction performance was evaluated by comparing various metrics. The results show that the new taxi speed decision trees can calculate the unimpeded taxi-out times more accurately.

  10. A Modified Decision Tree Algorithm Based on Genetic Algorithm for Mobile User Classification Problem

    PubMed Central

    Liu, Dong-sheng; Fan, Shu-jiang

    2014-01-01

    To offer mobile customers better service, mobile users should first be classified. To address the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take context information as a classification attribute for the mobile user, and we classify the context into public context and private context classes. Then we analyze the processes and operators of the algorithm. Finally, we run an experiment on mobile users with the algorithm; we can classify mobile users into Basic service user, E-service user, Plus service user, and Total service user classes, and we can also derive some rules about mobile users. Compared with the C4.5 decision tree algorithm and the SVM algorithm, the algorithm proposed in this paper has higher accuracy and greater simplicity. PMID:24688389

  11. [Analysis of the characteristics of the older adults with depression using data mining decision tree analysis].

    PubMed

    Park, Myonghwa; Choi, Sora; Shin, A Mi; Koo, Chul Hoi

    2013-02-01

    The purpose of this study was to develop a prediction model for the characteristics of older adults with depression using the decision tree method. A large dataset from the 2008 Korean Elderly Survey was used and data of 14,970 elderly people were analyzed. Target variable was depression and 53 input variables were general characteristics, family & social relationship, economic status, health status, health behavior, functional status, leisure & social activity, quality of life, and living environment. Data were analyzed by decision tree analysis, a data mining technique using SPSS Window 19.0 and Clementine 12.0 programs. The decision trees were classified into five different rules to define the characteristics of older adults with depression. Classification & Regression Tree (C&RT) showed the best prediction with an accuracy of 80.81% among data mining models. Factors in the rules were life satisfaction, nutritional status, daily activity difficulty due to pain, functional limitation for basic or instrumental daily activities, number of chronic diseases and daily activity difficulty due to disease. The different rules classified by the decision tree model in this study should contribute as baseline data for discovering informative knowledge and developing interventions tailored to these individual characteristics.

  12. Image Change Detection via Ensemble Learning

    SciTech Connect

    Martin, Benjamin W; Vatsavai, Raju

    2013-01-01

    The concept of geographic change detection is relevant in many areas. Changes in geography can reveal much information about a particular location. For example, analysis of changes in geography can identify regions of population growth, change in land use, and potential environmental disturbance. A common way to perform change detection is to use a simple method such as differencing to detect regions of change. Though these techniques are simple, often the application of these techniques is very limited. Recently, use of machine learning methods such as neural networks for change detection has been explored with great success. In this work, we explore the use of ensemble learning methodologies for detecting changes in bitemporal synthetic aperture radar (SAR) images. Ensemble learning uses a collection of weak machine learning classifiers to create a stronger classifier which has higher accuracy than the individual classifiers in the ensemble. The strength of the ensemble lies in the fact that the individual classifiers in the ensemble create a mixture of experts in which the final classification made by the ensemble classifier is calculated from the outputs of the individual classifiers. Our methodology leverages this aspect of ensemble learning by training collections of weak decision tree based classifiers to identify regions of change in SAR images collected of a region in the Staten Island, New York area during Hurricane Sandy. Preliminary studies show that the ensemble method has approximately 11.5% higher change detection accuracy than an individual classifier.
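
    Why a vote over weak classifiers can beat any single member can be shown with a short worked calculation. The sketch below assumes that the classifiers err independently and that each is correct with the same probability; both are idealizations, and the numbers are illustrative rather than taken from the study.

```python
from math import comb


def majority_vote_accuracy(n_classifiers: int, p: float) -> float:
    """Probability that a strict majority of n independent classifiers,
    each correct with probability p, votes for the right class (n odd)."""
    k_min = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k) * p**k * (1 - p) ** (n_classifiers - k)
        for k in range(k_min, n_classifiers + 1)
    )


single = majority_vote_accuracy(1, 0.6)   # one weak classifier
three = majority_vote_accuracy(3, 0.6)    # 0.6**3 + 3 * 0.6**2 * 0.4
many = majority_vote_accuracy(25, 0.6)    # a larger ensemble
print(single, three, many)
```

    With three classifiers the vote is right with probability 0.216 + 0.432 = 0.648, already above the 0.6 of each member. Real ensemble members are correlated, so the gain in practice is smaller than this idealized bound suggests.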

  13. Image change detection via ensemble learning

    NASA Astrophysics Data System (ADS)

    Martin, Benjamin W.; Vatsavai, Ranga R.

    2013-05-01

    The concept of geographic change detection is relevant in many areas. Changes in geography can reveal much information about a particular location. For example, analysis of changes in geography can identify regions of population growth, change in land use, and potential environmental disturbance. A common way to perform change detection is to use a simple method such as differencing to detect regions of change. Though these techniques are simple, often the application of these techniques is very limited. Recently, use of machine learning methods such as neural networks for change detection has been explored with great success. In this work, we explore the use of ensemble learning methodologies for detecting changes in bitemporal synthetic aperture radar (SAR) images. Ensemble learning uses a collection of weak machine learning classifiers to create a stronger classifier which has higher accuracy than the individual classifiers in the ensemble. The strength of the ensemble lies in the fact that the individual classifiers in the ensemble create a "mixture of experts" in which the final classification made by the ensemble classifier is calculated from the outputs of the individual classifiers. Our methodology leverages this aspect of ensemble learning by training collections of weak decision tree based classifiers to identify regions of change in SAR images collected of a region in the Staten Island, New York area during Hurricane Sandy. Preliminary studies show that the ensemble method has approximately 11.5% higher change detection accuracy than an individual classifier.

  14. VLSI implementation of flexible architecture for decision tree classification in data mining

    NASA Astrophysics Data System (ADS)

    Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak

    2017-07-01

    Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a tremendous rise in the size of the data being collected and analyzed. Classification is the main difficulty faced in data mining. Among the solutions developed for this problem, the most widely accepted is Decision Tree Classification (DTC), which gives high precision while handling very large amounts of data. This paper presents a VLSI implementation of a flexible architecture for decision tree classification in data mining using the C4.5 algorithm.

  15. Dynamics of Cortical Neuronal Ensembles Transit from Decision Making to Storage for Later Report

    PubMed Central

    Ponce-Alvarez, Adrián; Nácher, Verónica; Luna, Rogelio; Riehle, Alexa

    2012-01-01

    Decisions based on sensory evaluation during single trials may depend on the collective activity of neurons distributed across brain circuits. Previous studies have deepened our understanding of how the activity of individual neurons relates to the formation of a decision and its storage for later report. However, little is known about how decision-making and decision maintenance processes evolve in single trials. We addressed this problem by studying the activity of simultaneously recorded neurons from different somatosensory and frontal lobe cortices of monkeys performing a vibrotactile discrimination task. We used the hidden Markov model to describe the spatiotemporal pattern of activity in single trials as a sequence of firing rate states. We show that the animal's decision was reliably maintained in frontal lobe activity through a selective state sequence, initiated by an abrupt state transition, during which many neurons changed their activity in a concomitant way, and for which both latency and variability depended on task difficulty. Indeed, transitions were more delayed and more variable for difficult trials compared with easy trials. In contrast, state sequences in somatosensory cortices were weakly decision related, had less variable transitions, and were not affected by the difficulty of the task. In summary, our results suggest that the decision process and its subsequent maintenance are dynamically linked by a cascade of transient events in frontal lobe cortices. PMID:22933781

  16. Dynamics of cortical neuronal ensembles transit from decision making to storage for later report.

    PubMed

    Ponce-Alvarez, Adrián; Nácher, Verónica; Luna, Rogelio; Riehle, Alexa; Romo, Ranulfo

    2012-08-29

    Decisions based on sensory evaluation during single trials may depend on the collective activity of neurons distributed across brain circuits. Previous studies have deepened our understanding of how the activity of individual neurons relates to the formation of a decision and its storage for later report. However, little is known about how decision-making and decision maintenance processes evolve in single trials. We addressed this problem by studying the activity of simultaneously recorded neurons from different somatosensory and frontal lobe cortices of monkeys performing a vibrotactile discrimination task. We used the hidden Markov model to describe the spatiotemporal pattern of activity in single trials as a sequence of firing rate states. We show that the animal's decision was reliably maintained in frontal lobe activity through a selective state sequence, initiated by an abrupt state transition, during which many neurons changed their activity in a concomitant way, and for which both latency and variability depended on task difficulty. Indeed, transitions were more delayed and more variable for difficult trials compared with easy trials. In contrast, state sequences in somatosensory cortices were weakly decision related, had less variable transitions, and were not affected by the difficulty of the task. In summary, our results suggest that the decision process and its subsequent maintenance are dynamically linked by a cascade of transient events in frontal lobe cortices.

  17. ROSE: decision trees, automatic learning and their applications in cardiac medicine.

    PubMed

    Zavrsnik, J; Kokol, P; Malčić, I; Kancler, K; Mernik, M; Bigec, M

    1995-01-01

    Computerized information systems, especially decision support systems, have acquired an increasingly important role in medical applications, particularly in those where important decisions must be made effectively and reliably. But the possibility of using computers in medical decision making is limited by many difficulties, including the complexity of conventional computer languages, methodologies, and tools. Thus a conceptually simple decision-making model with the possibility of automated learning should be used. In this paper, we introduce a cardiological knowledge-based system, based on the decision tree approach, that supports the determination of mitral valve prolapse. Prolapse is defined as the displacement of a bodily part from its normal position. The term mitral valve prolapse (PMV), therefore, implies that the mitral leaflets are displaced relative to some structure, generally taken to be the mitral annulus. The implications of PMV are: disturbed normal laminar blood flow, turbulence of the blood flow, injury of the chordae tendineae, the possibility of thrombus formation, bacterial endocarditis, and, finally, hemodynamic changes defined as mitral insufficiency and mitral regurgitation. Uncertainty persists about how it should be diagnosed and about its clinical importance. It is our deep belief that echocardiography enables a properly trained expert, armed with proper criteria, to evaluate PMV almost 100%. But, unfortunately, there are some problems associated with the use of echocardiography. With this in mind, we have decided to start a research project aimed at finding new criteria and enabling the general practitioner to evaluate PMV using conventional methods and to select potential patients from the general population. To empower doctors to perform the needed activities, we have developed a computer tool called ROSE (computeRized prOlaps Syndrome dEtermination) based on algorithms of automatic learning. This tool supports the definition of new

  18. A Com-Gis Based Decision Tree Model in Agricultural Application

    NASA Astrophysics Data System (ADS)

    Cheng, Wei; Wang, Ke; Zhang, Xiuying

    The problem of agricultural soil pollution by heavy metals has been receiving increasing attention in the last few decades. The Geostatistics module in ArcGIS, however, cannot efficiently simulate the spatial distribution of heavy metals with satisfactory accuracy when the spatial autocorrelation of the study area has been severely destroyed by human activities. In this study, the classification and regression tree (CART) has been integrated into ArcGIS using ArcObjects and Visual Basic for Applications (VBA) to predict the spatial distribution of soil heavy metal contents in a severely polluted area. This is a great improvement compared with the ordinary Kriging method in ArcGIS. The integrated approach allows for relatively easy, fast, and cost-effective estimation of spatially distributed soil heavy metal pollution.

  20. Predicting the distribution of out-of-reach biotopes with decision trees in a Swedish marine protected area.

    PubMed

    Gonzalez-Mirelis, Genoveva; Lindegarth, Mats

    2012-12-01

    Through spatially explicit predictive models, knowledge of spatial patterns of biota can be generated for out-of-reach environments, where there is a paucity of survey data. This knowledge is invaluable for conservation decisions. We used distribution modeling to predict the occurrence of benthic biotopes, or megafaunal communities of the seabed, to support the spatial planning of a marine national park. Nine biotope classes were obtained prior to modeling from multivariate species data derived from point source, underwater imagery. Five map layers relating to depth and terrain were used as predictor variables. Biotope type was predicted on a pixel-by-pixel basis, where pixel size was 15 x 15 m and total modeled area was 455 km2. To choose a suitable modeling technique we compared the performance of five common models based on recursive partitioning: two types of classification and regression trees ([1] pruned by 10-fold cross-validation and [2] pruned by minimizing complexity), random forests, conditional inference (CI) trees, and CI forests. The selected model was a CI forest (an ensemble of CI trees), a machine-learning technique whose discriminatory power (class-by-class area under the curve [AUC] ranged from 0.75 to 0.86) and classification accuracy (72%) surpassed those of the other methods tested. Conditional inference trees are virtually new to the field of ecology. The final model's overall prediction error was 28%. Model predictions were also checked against a custom-built measure of dubiousness, calculated at the polygon level. Key factors other than the choice of modeling technique include: the use of a multinomial response, accounting for the heterogeneity of observations, and spatial autocorrelation. To illustrate how the model results can be implemented in spatial planning, representation of biodiversity in the national park was described and quantified. 
    Given a goal of maximizing classification accuracy, we conclude that conditional inference trees

  1. A decision tree approach using silvics to guide planning for forest restoration

    Treesearch

    Sharon M. Hermann; John S. Kush; John C. Gilbert

    2013-01-01

    We created a decision tree based on silvics of longleaf pine (Pinus palustris) and historical descriptions to develop approaches for restoration management at Horseshoe Bend National Military Park located in central Alabama. A National Park Service goal is to promote structure and composition of a forest that likely surrounded the 1814 battlefield....

  2. A multivariate decision tree analysis of biophysical factors in tropical forest fire occurrence

    Treesearch

    Rey S. Ofren; Edward Harvey

    2000-01-01

    A multivariate decision tree model was used to quantify the relative importance of complex hierarchical relationships between biophysical variables and the occurrence of tropical forest fires. The study site is the Huai Kha Khaeng wildlife sanctuary, a World Heritage Site in northwestern Thailand where annual fires are common and particularly destructive. Thematic...

  3. Multi-Dimensional Inference and Confidential Data Protection with Decision Tree Methods

    DTIC Science & Technology

    2002-01-01

    critical information technologies today. The pressing demand for such a protection technique is partly due to the trend of information sharing between insti...

  4. The Americans with Disabilities Act: A Decision Tree for Social Services Administrators

    ERIC Educational Resources Information Center

    O'Brien, Gerald V.; Ellegood, Christina

    2005-01-01

    The 1990 Americans with Disabilities Act has had a profound influence on social workers and social services administrators in virtually all work settings. Because of the multiple elements of the act, however, assessing the validity of claims can be a somewhat arduous and complicated task. This article provides a "decision tree" for…

  5. Test Reviews: Euler, B. L. (2007). "Emotional Disturbance Decision Tree". Lutz, FL: Psychological Assessment Resources

    ERIC Educational Resources Information Center

    Tansy, Michael

    2009-01-01

    The Emotional Disturbance Decision Tree (EDDT) is a teacher-completed norm-referenced rating scale published by Psychological Assessment Resources, Inc., in Lutz, Florida. The 156-item EDDT was developed for use as part of a broader assessment process to screen and assist in the identification of 5- to 18-year-old children for the special…

  6. What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis

    ERIC Educational Resources Information Center

    Thomas, Emily H.; Galambos, Nora

    2004-01-01

    To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…

  8. Ultrasonographic Diagnosis of Biliary Atresia Based on a Decision-Making Tree Model

    PubMed Central

    Lee, So Mi; Choi, Young Hun; Kim, Woo Sun; Cho, Hyun-Hye; Kim, In-One; You, Sun Kyoung

    2015-01-01

    Objective To assess the diagnostic value of various ultrasound (US) findings and to make a decision-tree model for US diagnosis of biliary atresia (BA). Materials and Methods From March 2008 to January 2014, the following US findings were retrospectively evaluated in 100 infants with cholestatic jaundice (BA, n = 46; non-BA, n = 54): length and morphology of the gallbladder, triangular cord thickness, hepatic artery and portal vein diameters, and visualization of the common bile duct. Logistic regression analyses were performed to determine the features that would be useful in predicting BA. Conditional inference tree analysis was used to generate a decision-making tree for classifying patients into the BA or non-BA groups. Results Multivariate logistic regression analysis showed that abnormal gallbladder morphology and greater triangular cord thickness were significant predictors of BA (p = 0.003 and 0.001; adjusted odds ratio: 345.6 and 65.6, respectively). In the decision-making tree using conditional inference tree analysis, gallbladder morphology and triangular cord thickness (optimal cutoff value of triangular cord thickness, 3.4 mm) were also selected as significant discriminators for differential diagnosis of BA, and gallbladder morphology was the first discriminator. The diagnostic performance of the decision-making tree was excellent, with sensitivity of 100% (46/46), specificity of 94.4% (51/54), and overall accuracy of 97% (97/100). Conclusion Abnormal gallbladder morphology and greater triangular cord thickness (> 3.4 mm) were the most useful predictors of BA on US. We suggest that the gallbladder morphology should be evaluated first and that triangular cord thickness should be evaluated subsequently in cases with normal gallbladder morphology. PMID:26576128
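    The two-step rule and the reported performance figures can be written out as a small sketch. The leaf labels below are an assumption inferred from the abstract's wording (abnormal gallbladder morphology first, then triangular cord thickness above 3.4 mm when morphology is normal); the published tree may differ in detail. The function names are hypothetical, and the counts are the ones quoted in the abstract (46/46, 51/54, 97/100).

```python
def classify_infant(gallbladder_abnormal, triangular_cord_mm, cutoff_mm=3.4):
    """Two-step rule as read from the abstract (leaf labels are assumed)."""
    if gallbladder_abnormal:
        return "BA"
    # Normal gallbladder morphology: fall back on triangular cord thickness.
    return "BA" if triangular_cord_mm > cutoff_mm else "non-BA"


def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


if __name__ == "__main__":
    # Counts from the abstract: 46 of 46 BA and 51 of 54 non-BA infants
    # were classified correctly.
    for name, value in diagnostic_metrics(tp=46, fn=0, tn=51, fp=3).items():
        print(f"{name}: {value:.3f}")
```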

  9. Predicting metabolic syndrome using decision tree and support vector machine methods

    PubMed Central

    Karimi-Alavijeh, Farzaneh; Jalili, Saeed; Sadeghi, Masoumeh

    2016-01-01

    BACKGROUND Metabolic syndrome, which underlies the increased prevalence of cardiovascular disease and Type 2 diabetes, is considered as a group of metabolic abnormalities including central obesity, hypertriglyceridemia, glucose intolerance, hypertension, and dyslipidemia. Recently, artificial intelligence based health-care systems have become highly regarded because of their success in diagnosis, prediction, and choice of treatment. This study employs machine learning techniques to predict metabolic syndrome. METHODS This study aims to employ decision tree and support vector machine (SVM) to predict the 7-year incidence of metabolic syndrome. This research is a practical one in which data from 2107 participants of the Isfahan Cohort Study have been utilized. The subjects without metabolic syndrome according to the ATPIII criteria were selected. The features that have been used in this data set include: gender, age, weight, body mass index, waist circumference, waist-to-hip ratio, hip circumference, physical activity, smoking, hypertension, antihypertensive medication use, systolic blood pressure (BP), diastolic BP, fasting blood sugar, 2-hour blood glucose, triglycerides (TGs), total cholesterol, low-density lipoprotein, high density lipoprotein-cholesterol, mean corpuscular volume, and mean corpuscular hemoglobin. Metabolic syndrome was diagnosed based on ATPIII criteria and two methods of decision tree and SVM were selected to predict the metabolic syndrome. The criteria of sensitivity, specificity and accuracy were used for validation. RESULTS SVM and decision tree methods were examined according to the criteria of sensitivity, specificity and accuracy. Sensitivity, specificity and accuracy were 0.774 (0.758), 0.74 (0.72) and 0.757 (0.739) in SVM (decision tree) method. CONCLUSION The results show that the SVM method is more efficient than the decision tree in sensitivity, specificity and accuracy. The results of decision tree method show that the TG is the most important feature in

  10. Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.

    PubMed

    Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G

    2017-09-01

    To investigate whether the use of ensemble learning algorithms improves physical activity recognition accuracy compared to the single classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one-subject-out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
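    Of the three decision fusion methods named above, weighted majority voting is the simplest to illustrate. The sketch below is a generic version, not the study's implementation: the abstract does not say how the weights were chosen, so the weights, labels, and function name here are hypothetical.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one label per base classifier.
    weights: one non-negative weight per base classifier (hypothetical
    values here; a validation score would be one plausible choice).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Four base classifiers, in the order listed in the abstract:
    # decision tree, k nearest neighbor, SVM, neural network.
    votes = ["walk", "run", "run", "sit"]
    weights = [0.9, 0.3, 0.3, 0.2]
    print(weighted_majority_vote(votes, weights))
```

    In this example the single high-weight vote outweighs two low-weight votes for another label, which is the behavior that separates weighted from plain majority voting.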

  11. A Study of Factors that Influence First-Year Nonmusic Majors' Decisions to Participate in Music Ensembles at Small Liberal Arts Colleges in Indiana

    ERIC Educational Resources Information Center

    Faber, Ardis R.

    2010-01-01

    The purpose of this study was to investigate factors that influence first-year nonmusic majors' decisions regarding participation in music ensembles at small liberal arts colleges in Indiana. A survey questionnaire was used to gather data. The data collected was analyzed to determine significant differences between the nonmusic majors who have…

  13. The Americans with Disabilities Act: a decision tree for social services administrators.

    PubMed

    O'Brien, Gerald V; Ellegood, Christina

    2005-07-01

    The 1990 Americans with Disabilities Act has had a profound influence on social workers and social services administrators in virtually all work settings. Because of the multiple elements of the act, however, assessing the validity of claims can be a somewhat arduous and complicated task. This article provides a "decision tree" for administrators to assist with the evaluation of claims. This decision tree allows people who are considering the validity of an ADA claim to break the decision-making process into discrete steps that can be considered separately and sequentially. These steps include employee and disability status, employer knowledge of the disability, employee qualification for the job, the provision of accommodations, the adverse actions that may be included in a claim, valid employer rationales for adverse action, and the procedural elements required for a successful ADA claim. Issues that are important in each step are discussed.

  14. An expert-guided decision tree construction strategy: an application in knowledge discovery with medical databases.

    PubMed Central

    Tsai, Y. S.; King, P. H.; Higgins, M. S.; Pierce, D.; Patel, N. P.

    1997-01-01

    With the steady growth in electronic patient records and clinical medical informatics systems, the data collected for routine clinical use have been accumulating at a dramatic rate. Inter-disciplinary research that provides a new generation of computation tools in knowledge discovery and data management is in great demand. In this study, an expert-guided decision tree construction strategy is proposed to offer a user-oriented knowledge discovery environment. The strategy allows experts, based on their expertise and/or preference, to override the inductive decision tree construction process. Moreover, by reviewing decision paths, experts could focus on subsets of data that may be clues to new findings, or simply contaminated cases. PMID:9357618

  15. Minimizing the cost of translocation failure with decision-tree models that predict species' behavioral response in translocation sites.

    PubMed

    Ebrahimi, Mehregan; Ebrahimie, Esmaeil; Bull, C Michael

    2015-08-01

    The high number of failures is one reason why translocation is often not recommended. Considering how behavior changes during translocations may improve translocation success. To derive decision-tree models for species' translocation, we used data on the short-term responses of an endangered Australian skink in 5 simulated translocations with different release conditions. We used 4 different decision-tree algorithms (decision tree, decision-tree parallel, decision stump, and random forest) with 4 different criteria (gain ratio, information gain, gini index, and accuracy) to investigate how environmental and behavioral parameters may affect the success of a translocation. We assumed behavioral changes that increased dispersal away from a release site would reduce translocation success. The trees became more complex when we included all behavioral parameters as attributes, but these trees yielded more detailed information about why and how dispersal occurred. According to these complex trees, there were positive associations between some behavioral parameters, such as fight and dispersal, that showed there was a higher chance, for example, of dispersal among lizards that fought than among those that did not fight. Decision trees based on parameters related to release conditions were easier to understand and could be used by managers to make translocation decisions under different circumstances.
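    Three of the four split criteria named above (information gain, gain ratio, and Gini index) have standard textbook definitions, sketched below. This shows the formulas only and is not the software used in the study. The labels in the example are made up, and the accuracy criterion is omitted.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    remainder = sum(len(ch) / n * entropy(ch) for ch in children if ch)
    return entropy(parent) - remainder


def gain_ratio(parent, children):
    """Information gain divided by the entropy of the split sizes."""
    n = len(parent)
    split_info = -sum(len(ch) / n * log2(len(ch) / n) for ch in children if ch)
    return information_gain(parent, children) / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Made-up example: eight releases split by a hypothetical binary attribute.
    parent = ["dispersed"] * 4 + ["stayed"] * 4
    children = [["dispersed"] * 3 + ["stayed"], ["dispersed"] + ["stayed"] * 3]
    print(round(entropy(parent), 4), round(gini(parent), 4))
    print(round(information_gain(parent, children), 4))
    print(round(gain_ratio(parent, children), 4))
```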

  16. Application of decision tree algorithm for identification of rock forming minerals using energy dispersive spectrometry

    NASA Astrophysics Data System (ADS)

    Akkaş, Efe; Çubukçu, H. Evren; Artuner, Harun

    2014-05-01

    Rapid and automated mineral identification is compulsory in certain applications concerning natural rocks. Among all microscopic and spectrometric methods, energy dispersive X-ray spectrometers (EDS) integrated with scanning electron microscopes produce rapid information with reliable chemical data. Although obtaining elemental data with EDS analyses is fast and easy with the help of improving technology, it is rather challenging to perform accurate and rapid identification considering the large quantity of minerals in a rock sample with varying dimensions ranging from nanometers to centimeters. Furthermore, the physical properties of the specimen (roughness, thickness, electrical conductivity, position in the instrument, etc.) and the incident electron beam (accelerating voltage, beam current, spot size, etc.) control the produced characteristic X-rays, which in turn affect the elemental analyses. In order to minimize the effects of these physical constraints and develop an automated mineral identification system, a rule induction paradigm has been applied to energy dispersive spectral data. Decision tree classifiers divide training data sets into subclasses using generated rules or decisions and thereby produce classification or recognition associated with these data sets. A number of thin sections prepared from rock samples with suitable mineralogy have been investigated and a preliminary 12 distinct mineral groups (olivine, orthopyroxene, clinopyroxene, apatite, amphibole, plagioclase, K-feldspar, zircon, magnetite, titanomagnetite, biotite, quartz), comprised mostly of silicates and oxides, have been selected. Energy dispersive spectral data for each group, consisting of 240 reference and 200 test analyses, have been acquired under various, non-standard, physical and electrical conditions. The reference X-ray data have been used to assign the spectral distribution of elements to the specified mineral groups. Consequently, the test data have been analyzed using

  17. Tools of the Future: How Decision Tree Analysis Will Impact Mission Planning

    NASA Technical Reports Server (NTRS)

    Otterstatter, Matthew R.

    2005-01-01

    The universe is infinitely complex; however, the human mind has a finite capacity. The multitude of possible variables, metrics, and procedures in mission planning are far too many to address exhaustively. This is unfortunate because, in general, considering more possibilities leads to more accurate and more powerful results. To compensate, we can get more insightful results by employing our greatest tool, the computer. The power of the computer will be utilized through a technology that considers every possibility, decision tree analysis. Although decision trees have been used in many other fields, this is innovative for space mission planning. Because this is a new strategy, no existing software is able to completely accommodate all of the requirements. This was determined through extensive research and testing of current technologies. It was necessary to create original software, for which a short-term model was finished this summer. The model was built into Microsoft Excel to take advantage of the familiar graphical interface for user input, computation, and viewing output. Macros were written to automate the process of tree construction, optimization, and presentation. The results are useful and promising. If this tool is successfully implemented in mission planning, our reliance on old-fashioned heuristics, an error-prone shortcut for handling complexity, will be reduced. The computer algorithms involved in decision trees will revolutionize mission planning. The planning will be faster and smarter, leading to optimized missions with the potential for more valuable data.
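    The abstract does not describe how its Excel macros optimize a tree. One common technique in decision tree analysis is expected-value rollback, sketched below as an assumption about the general approach rather than a description of the NASA tool. The node layout, option names, probabilities, and payoffs are all hypothetical.

```python
def rollback(node):
    """Evaluate a decision-analysis tree by expected-value rollback.

    Outcome nodes carry a value, chance nodes average their branches by
    probability, and decision nodes take the best available option.
    """
    kind = node["type"]
    if kind == "outcome":
        return node["value"]
    if kind == "chance":
        return sum(p * rollback(child) for p, child in node["branches"])
    if kind == "decision":
        return max(rollback(child) for _, child in node["options"])
    raise ValueError(f"unknown node type: {kind}")


def outcome(value):
    """Convenience constructor for a leaf node."""
    return {"type": "outcome", "value": value}


if __name__ == "__main__":
    # Hypothetical mission choice with made-up science-return payoffs.
    tree = {
        "type": "decision",
        "options": [
            ("fly_by", {"type": "chance",
                        "branches": [(0.9, outcome(10.0)), (0.1, outcome(0.0))]}),
            ("orbit", {"type": "chance",
                       "branches": [(0.6, outcome(20.0)), (0.4, outcome(0.0))]}),
        ],
    }
    print(rollback(tree))
```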

  19. Inductive Decision Tree Analysis of the Validity Rank of Construction Parameters of Innovative Gear Pump after Tooth Root Undercutting

    NASA Astrophysics Data System (ADS)

    Deptuła, A.; Partyka, M. A.

    2017-02-01

    The article presents an innovative use of an inductive algorithm for generating a decision tree to analyze the validity rank of the construction and maintenance parameters of a gear pump with undercut teeth. It presents an alternative to the existing methods of discrete optimization for generating sets of decisions and determining the hierarchy of decision variables.

  20. Generation of Gridded Daily Weather Ensembles for Decision Support in the Argentine Pampas

    NASA Astrophysics Data System (ADS)

    Verdin, A.; Rajagopalan, B.; Kleiber, W.; Katz, R. W.; Podesta, G. P.

    2014-12-01

    We introduce a stochastic weather generator for the variables of minimum temperature, maximum temperature, and precipitation occurrence. Temperature variables are modeled in a vector autoregressive framework, conditional on precipitation occurrence. Precipitation occurrence arises via a probit model, and both temperature and occurrence are spatially correlated using spatial Gaussian processes. Additionally, local climate is included by spatially-varying model coefficients, allowing spatially-evolving relationships between variables. The method is illustrated on a network of stations in the Pampas region of Argentina where nonstationary relationships and historical spatial correlation challenge existing approaches. The covariance structure of this network of stations is then used to produce daily gridded weather scenarios which can be used to drive hydrologic models. Inclusion of other covariates such as seasonal total precipitation and global climate drivers allows the potential for decadal projections, an increasingly useful tool for decision support.
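    The probit step can be illustrated for a single site. In a probit model a day is wet when a latent Gaussian variable exceeds zero, so the wet-day probability is the standard normal CDF of the latent mean. The sketch below assumes unit-variance noise and omits the spatial Gaussian processes and the autoregressive temperature model described in the abstract. The function names and the latent means are hypothetical.

```python
import random
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def wet_probability(latent_mean):
    """P(wet day) under a probit model with unit-variance latent noise."""
    return normal_cdf(latent_mean)


def simulate_occurrence(latent_means, seed=0):
    """Draw one 0/1 precipitation occurrence per day for a single site."""
    rng = random.Random(seed)
    return [1 if mu + rng.gauss(0.0, 1.0) > 0.0 else 0 for mu in latent_means]


if __name__ == "__main__":
    means = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical daily latent means
    print([round(wet_probability(mu), 3) for mu in means])
    print(simulate_occurrence(means, seed=42))
```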

  1. Three-dimensional object recognition using similar triangles and decision trees

    NASA Technical Reports Server (NTRS)

    Spirkovska, Lilly

    1993-01-01

    A system, TRIDEC, that is capable of distinguishing between a set of objects despite changes in the objects' positions in the input field, their size, or their rotational orientation in 3D space is described. TRIDEC combines very simple yet effective features with the classification capabilities of inductive decision tree methods. The feature vector is a list of all similar triangles defined by connecting all combinations of three pixels in a coarse coded 127 x 127 pixel input field. The classification is accomplished by building a decision tree using the information provided from a limited number of translated, scaled, and rotated samples. Simulation results are presented which show that TRIDEC achieves 94 percent recognition accuracy in the 2D invariant object recognition domain and 98 percent recognition accuracy in the 3D invariant object recognition domain after training on only a small sample of transformed views of the objects.
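
    The invariance the abstract relies on can be illustrated with a short sketch: similar triangles share their interior angles, so a histogram of quantized angle pairs over all pixel triples does not change under translation, scaling, or rotation. This is only an illustration of that idea. The function names, the 10-degree bin width, and the use of the two smallest angles as the triangle signature are assumptions, not TRIDEC's actual coarse-coded encoding.

```python
import math
from collections import Counter
from itertools import combinations


def _angle_deg(p, q, r):
    """Interior angle at vertex p of triangle pqr, in degrees."""
    ux, uy = q[0] - p[0], q[1] - p[1]
    vx, vy = r[0] - p[0], r[1] - p[1]
    norm = math.sqrt(ux * ux + uy * uy) * math.sqrt(vx * vx + vy * vy)
    cos_t = (ux * vx + uy * vy) / norm
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def similar_triangle_features(points, bin_deg=10.0):
    """Count triangles by their two smallest interior angles (quantized).

    bin_deg is a hypothetical quantization step chosen for illustration.
    """
    features = Counter()
    for a, b, c in combinations(points, 3):
        # Skip degenerate (collinear) triples: zero cross product.
        cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
        if cross == 0:
            continue
        angles = sorted((_angle_deg(a, b, c), _angle_deg(b, a, c), _angle_deg(c, a, b)))
        key = (int(angles[0] // bin_deg), int(angles[1] // bin_deg))
        features[key] += 1
    return features


pixels = [(0, 0), (4, 0), (0, 3), (5, 5)]
# Scale by 2 and translate; then rotate the original by 90 degrees.
scaled_shifted = [(2 * x + 7, 2 * y - 3) for x, y in pixels]
rotated = [(-y, x) for x, y in pixels]
```

    In this sketch, the three point sets produce the same feature counts, which is the property a decision tree would then exploit as input.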

  2. An Algorithm for Anticipating Future Decision Trees from Concept-Drifting Data

    NASA Astrophysics Data System (ADS)

    Böttcher, Mirko; Spott, Martin; Kruse, Rudolf

    Concept drift is an important topic in practical data mining, since it is a reality in most business applications. Whenever a mining model is used in an application, it is already outdated because the world has changed since the model was induced. The solution is to predict the drift of a model and derive a future model based on such a prediction. One way would be to simulate future data and derive a model from it, but this is typically not feasible. Instead, we suggest predicting the values of the measures that drive model induction. In particular, we propose to predict the future values of attribute selection measures and class label distribution for the induction of decision trees. We give an example of how concept drift is reflected in the trend of these measures and show that the resulting decision trees perform considerably better than the ones produced by existing approaches.

  3. Identifying Risk and Protective Factors in Recidivist Juvenile Offenders: A Decision Tree Approach.

    PubMed

    Ortega-Campos, Elena; García-García, Juan; Gil-Fenoy, Maria José; Zaldívar-Basurto, Flor

    2016-01-01

    Research on juvenile justice aims to identify profiles of risk and protective factors in juvenile offenders. This paper presents a study of profiles of risk factors that influence young offenders toward committing sanctionable antisocial behavior (S-ASB). Decision tree analysis is used as a multivariate approach to the phenomenon of repeated sanctionable antisocial behavior in juvenile offenders in Spain. The study sample was made up of the set of juveniles who were charged in a court case in the Juvenile Court of Almeria (Spain). The period of study of recidivism was two years from the baseline. The object of study is presented, through the implementation of a decision tree. Two profiles of risk and protective factors are found. Risk factors associated with higher rates of recidivism are antisocial peers, age at baseline S-ASB, problems in school and criminality in family members.

  4. Data mining with decision trees for diagnosis of breast tumor in medical ultrasonic images.

    PubMed

    Kuo, W J; Chang, R F; Chen, D R; Lee, C C

    2001-03-01

    To increase the ability of ultrasonographic (US) technology for the differential diagnosis of solid breast tumors, we describe a novel computer-aided diagnosis (CADx) system using data mining with a decision tree for classification of breast tumors to increase the levels of diagnostic confidence and to provide an immediate second opinion for physicians. Cooperating with the texture information extracted from the region of interest (ROI) image, a decision tree model generated from the training data in a top-down, general-to-specific direction with 24 co-variance texture features is used to classify the tumors as benign or malignant. In the experiments, accuracy rates for an experienced physician and the proposed CADx are 86.67% (78/90) and 95.50% (86/90), respectively.

  5. Circum-Arctic petroleum systems identified using decision-tree chemometrics

    USGS Publications Warehouse

    Peters, K.E.; Ramos, L.S.; Zumberge, J.E.; Valin, Z.C.; Scotese, C.R.; Gautier, D.L.

    2007-01-01

    Source- and age-related biomarker and isotopic data were measured for more than 1000 crude oil samples from wells and seeps collected above approximately 55°N latitude. A unique, multitiered chemometric (multivariate statistical) decision tree was created that allowed automated classification of 31 genetically distinct circum-Arctic oil families based on a training set of 622 oil samples. The method, which we call decision-tree chemometrics, uses principal components analysis and multiple tiers of K-nearest neighbor and SIMCA (soft independent modeling of class analogy) models to classify and assign confidence limits for newly acquired oil samples and source rock extracts. Geochemical data for each oil sample were also used to infer the age, lithology, organic matter input, depositional environment, and identity of its source rock. These results demonstrate the value of large petroleum databases where all samples were analyzed using the same procedures and instrumentation. Copyright © 2007. The American Association of Petroleum Geologists. All rights reserved.

  7. Identifying Risk and Protective Factors in Recidivist Juvenile Offenders: A Decision Tree Approach

    PubMed Central

    Ortega-Campos, Elena; García-García, Juan; Gil-Fenoy, Maria José; Zaldívar-Basurto, Flor

    2016-01-01

    Research on juvenile justice aims to identify profiles of risk and protective factors in juvenile offenders. This paper presents a study of profiles of risk factors that influence young offenders toward committing sanctionable antisocial behavior (S-ASB). Decision tree analysis is used as a multivariate approach to the phenomenon of repeated sanctionable antisocial behavior in juvenile offenders in Spain. The study sample was made up of the set of juveniles who were charged in a court case in the Juvenile Court of Almeria (Spain). The period of study of recidivism was two years from the baseline. The object of study is presented, through the implementation of a decision tree. Two profiles of risk and protective factors are found. Risk factors associated with higher rates of recidivism are antisocial peers, age at baseline S-ASB, problems in school and criminality in family members. PMID:27611313

  8. The bone-grafting decision tree: a systematic methodology for achieving new bone.

    PubMed

    Smiler, Dennis; Soltan, Muna

    2006-06-01

    Successful bone grafting requires that the clinician select the optimal bone grafting material and surgical technique from among a number of alternatives. This article reviews the biology of bone growth and repair, and presents a decision-making protocol in which the clinician first evaluates the bone quality at the surgical site to determine which graft material should be used. Bone quantity is then evaluated to determine the optimal surgical technique. Choices among graft stabilization techniques are also reviewed, and cases are presented to illustrate the use of this decision tree.

  9. An application of contingent valuation and decision tree analysis to water quality improvements.

    PubMed

    Atkins, Jonathan P; Burdon, Daryl; Allen, James H

    2007-01-01

    This paper applies contingent valuation and decision tree analysis to investigate public preferences for water quality improvements, and in particular reduced eutrophication. Such preferences are important given that the development of EU water quality legislation is imposing significant costs on European economies. Results are reported of a survey undertaken of residents of Arhus County, Denmark for water quality improvements in the Randers Fjord. Results demonstrate strong public support for reduced eutrophication and identify key determinants of such support.

  10. Office of Legacy Management Decision Tree for Solar Photovoltaic Projects - 13317

    SciTech Connect

    Elmer, John; Butherus, Michael; Barr, Deborah L.

    2013-07-01

    To support consideration of renewable energy power development as a land reuse option, the DOE Office of Legacy Management (LM) and the National Renewable Energy Laboratory (NREL) established a partnership to conduct an assessment of wind and solar renewable energy resources on LM lands. From a solar capacity perspective, the larger sites in the western United States present opportunities for constructing solar photovoltaic (PV) projects. A detailed analysis and preliminary plan was developed for three large sites in New Mexico, assessing the costs, the conceptual layout of a PV system, and the electric utility interconnection process. As a result of the study, a 1,214-hectare (3,000-acre) site near Grants, New Mexico, was chosen for further study. The state incentives, utility connection process, and transmission line capacity were key factors in assessing the feasibility of the project. LM's Durango, Colorado, Disposal Site was also chosen for consideration because the uranium mill tailings disposal cell is on a hillside facing south, transmission lines cross the property, and the community was very supportive of the project. LM worked with the regulators to demonstrate that the disposal cell's long-term performance would not be impacted by the installation of a PV solar system. A number of LM-unique issues were resolved in making the site available for a private party to lease a portion of the site for a solar PV project. A lease was awarded in September 2012. Using a solar decision tree that was developed and launched by the EPA and NREL, LM has modified and expanded the decision tree structure to address the unique aspects and challenges faced by LM on its multiple sites. The LM solar decision tree covers factors such as land ownership, usable acreage, financial viability of the project, stakeholder involvement, and transmission line capacity. 
As additional sites are transferred to LM in the future, the decision tree will assist in determining whether a solar

  11. Data mining for multiagent rules, strategies, and fuzzy decision tree structure

    NASA Astrophysics Data System (ADS)

    Smith, James F., III; Rhyne, Robert D., II; Fisher, Kristin

    2002-03-01

    A fuzzy logic based resource manager (RM) has been developed that automatically allocates electronic attack resources in real-time over many dissimilar platforms. Two different data mining algorithms have been developed to determine rules, strategies, and fuzzy decision tree structure. The first data mining algorithm uses a genetic algorithm as a data mining function and is called from an electronic game. The game allows a human expert to play against the resource manager in a simulated battlespace with each of the defending platforms being exclusively directed by the fuzzy resource manager and the attacking platforms being controlled by the human expert or operating autonomously under their own logic. This approach automates the data mining problem. The game automatically creates a database reflecting the domain expert's knowledge. It calls a data mining function, a genetic algorithm, for data mining of the database as required and allows easy evaluation of the information mined in the second step. The criterion for re-optimization is discussed as well as experimental results. Then a second data mining algorithm that uses a genetic program as a data mining function is introduced to automatically discover fuzzy decision tree structures. Finally, a fuzzy decision tree generated through this process is discussed.

  12. Building Decision Trees for Characteristic Ellipsoid Method to Monitor Power System Transient Behaviors

    SciTech Connect

    Ma, Jian; Diao, Ruisheng; Makarov, Yuri V.; Etingov, Pavel V.; Zhou, Ning; Dagle, Jeffery E.

    2010-12-01

    The characteristic ellipsoid is a new method to monitor the dynamics of power systems. Decision trees (DTs) play an important role in applying the characteristic ellipsoid method to system operation and analysis. This paper presents the idea and initial results of building DTs for detecting transient dynamic events using the characteristic ellipsoid method. The objective is to determine fault types, fault locations and clearance time in the system using decision trees based on ellipsoids of system transient responses. The New England 10-machine 39-bus system is used for running dynamic simulations to generate a sufficiently large number of transient events in different system configurations. Comprehensive transient simulations considering three fault types, two fault clearance times and different fault locations were conducted in the study. Bus voltage magnitudes and monitored reactive and active power flows are recorded as the phasor measurements to calculate characteristic ellipsoids whose volume, eccentricity, center and projection of the longest axis are used as indices to build decision trees. The DT performances are tested and compared by considering different sets of PMU locations. The proposed method demonstrates that the characteristic ellipsoid method is a very efficient and promising tool to monitor power system dynamic behaviors.

  13. MODIS Snow Cover Mapping Decision Tree Technique: Snow and Cloud Discrimination

    NASA Technical Reports Server (NTRS)

    Riggs, George A.; Hall, Dorothy K.

    2010-01-01

    Accurate mapping of snow cover continues to challenge cryospheric scientists and modelers. The Moderate-Resolution Imaging Spectroradiometer (MODIS) snow data products have been used since 2000 by many investigators to map and monitor snow cover extent for various applications. Users have reported on the utility of the products and also on problems encountered. Three problems or hindrances in the use of the MODIS snow data products that have been reported in the literature are: cloud obscuration, snow/cloud confusion, and snow omission errors in thin or sparse snow cover conditions. Implementation of the MODIS snow algorithm in a decision tree technique using surface reflectance input to mitigate those problems is being investigated. The objective of this work is to use a decision tree structure for the snow algorithm. This should alleviate snow/cloud confusion and omission errors and provide a snow map with classes that convey information on how snow was detected, e.g. snow under clear sky, snow under cloud, to enable users' flexibility in interpreting and deriving a snow map. Results of a snow cover decision tree algorithm are compared to the standard MODIS snow map and found to exhibit improved ability to alleviate snow/cloud confusion in some situations allowing up to about 5% increase in mapped snow cover extent, thus accuracy, in some scenes.

  14. Decision Trees for Continuous Data and Conditional Mutual Information as a Criterion for Splitting Instances.

    PubMed

    Drakakis, Georgios; Moledina, Saadiq; Chomenidis, Charalampos; Doganis, Philip; Sarimveis, Haralambos

    2016-01-01

    Decision trees are renowned in the computational chemistry and machine learning communities for their interpretability. Their capacity and usage are somewhat limited by the fact that they normally work on categorical data. Improvements to known decision tree algorithms are usually carried out by increasing and tweaking parameters, as well as the post-processing of the class assignment. In this work we attempted to tackle both these issues. Firstly, conditional mutual information was used as the criterion for selecting the attribute on which to split instances. The algorithm performance was compared with the results of C4.5 (WEKA's J48) using default parameters and no restrictions. Two datasets were used for this purpose, DrugBank compounds for HRH1 binding prediction and Traditional Chinese Medicine formulation predicted bioactivities for therapeutic class annotation. Secondly, an automated binning method for continuous data was evaluated, namely Scott's normal reference rule, in order to allow any decision tree to easily handle continuous data. This was applied to all approved drugs in DrugBank for predicting the RDKit SLogP property, using the remaining RDKit physicochemical attributes as input.
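
    The binning step named in the abstract can be sketched as follows. Scott's normal reference rule sets the bin width to h = 3.49 * s * n^(-1/3), where s is the sample standard deviation and n the number of values. The function names and the choice of equal-width bins spanning the observed range are assumptions made for illustration and are not taken from the paper.

```python
import math
import statistics
from bisect import bisect_right


def scott_bin_edges(values):
    """Equal-width bin edges whose width follows Scott's normal reference rule."""
    xs = [float(v) for v in values]
    n = len(xs)
    lo, hi = min(xs), max(xs)
    if n < 2 or lo == hi:
        return [lo, hi]  # a single bin when no spread can be estimated
    sigma = statistics.stdev(xs)
    width = 3.49 * sigma * n ** (-1.0 / 3.0)
    n_bins = max(1, math.ceil((hi - lo) / width))
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins)] + [hi]


def discretize(values):
    """Map each continuous value to a bin label in 0..n_bins-1."""
    edges = scott_bin_edges(values)
    interior = edges[1:-1]
    labels = [bisect_right(interior, float(v)) for v in values]
    return labels, edges


labels, edges = discretize([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
```

    The resulting integer labels can be passed to any decision tree learner that expects categorical attributes.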

  15. Optimization of matrix tablets controlled drug release using Elman dynamic neural networks and decision trees.

    PubMed

    Petrović, Jelena; Ibrić, Svetlana; Betz, Gabriele; Đurić, Zorica

    2012-05-30

    The main objective of the study was to develop artificial intelligence methods for optimization of drug release from matrix tablets regardless of the matrix type. Static and dynamic artificial neural networks of the same topology were developed to model dissolution profiles of different matrix tablet types (hydrophilic/lipid) using formulation composition, compression force used for tableting, and tablet porosity and tensile strength as input data. Potential application of decision trees in discovering knowledge from experimental data was also investigated. Polyethylene oxide polymer and glyceryl palmitostearate were used as matrix forming materials for hydrophilic and lipid matrix tablets, respectively, whereas the selected model drugs were diclofenac sodium and caffeine. Matrix tablets were prepared by the direct compression method and tested for in vitro dissolution profiles. Optimization of static and dynamic neural networks used for modeling of drug release was performed using Monte Carlo simulations or a genetic algorithms optimizer. Decision trees were constructed following discretization of data. Calculated difference (f1) and similarity (f2) factors for predicted and experimentally obtained dissolution profiles of test matrix tablet formulations indicate that Elman dynamic neural networks as well as decision trees are capable of accurate predictions of both hydrophilic and lipid matrix tablet dissolution profiles. Elman neural networks were compared to the most frequently used static network, the multilayer perceptron, and the superiority of Elman networks has been demonstrated. The developed methods allow a simple, yet very precise way of predicting drug release for both hydrophilic and lipid matrix tablets having controlled drug release. Copyright © 2012 Elsevier B.V. All rights reserved.
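
    For readers unfamiliar with the difference and similarity factors mentioned above, the following sketch computes them from two dissolution profiles using the standard definitions, f1 = 100 * sum|R - T| / sum R and f2 = 50 * log10(100 / sqrt(1 + mean((R - T)^2))), where R and T are percent dissolved at matching time points. It is assumed, not confirmed by the abstract, that the authors used these standard forms. The function names and the sample profiles are hypothetical.

```python
import math


def difference_factor_f1(reference, test):
    """f1: summed absolute difference as a percentage of the reference total."""
    if len(reference) != len(test) or not reference:
        raise ValueError("profiles must be non-empty and of equal length")
    return 100.0 * sum(abs(r - t) for r, t in zip(reference, test)) / sum(reference)


def similarity_factor_f2(reference, test):
    """f2: log-transformed mean squared difference; 100 means identical profiles."""
    if len(reference) != len(test) or not reference:
        raise ValueError("profiles must be non-empty and of equal length")
    mean_sq = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + mean_sq))


# Hypothetical percent-dissolved values at four sampling times.
measured = [20.0, 40.0, 60.0, 80.0]
predicted = [25.0, 45.0, 65.0, 85.0]
f1 = difference_factor_f1(measured, predicted)
f2 = similarity_factor_f2(measured, predicted)
```

    Lower f1 and higher f2 indicate closer agreement between the predicted and measured profiles.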

  16. Automatic rule learning using decision tree for fuzzy classifier in fault diagnosis of roller bearing

    NASA Astrophysics Data System (ADS)

    Sugumaran, V.; Ramachandran, K. I.

    2007-07-01

    Roller bearing is one of the most widely used elements in rotary machines. Condition monitoring of such elements is conceived as pattern recognition problem. Pattern recognition has two main phases: feature extraction and feature classification. Statistical features like minimum value, standard error and kurtosis, etc. are widely used as features in fault diagnostics. These features are extracted from vibration signals. A rule set is formed from the extracted features and input to a fuzzy classifier. The rule set necessary for building the fuzzy classifier is obtained largely by intuition and domain knowledge. This paper presents the use of decision tree to generate the rules automatically from the feature set. The vibration signal from a piezo-electric transducer is captured for the following conditions—good bearing, bearing with inner race fault, bearing with outer race fault, and inner and outer race fault. The statistical features are extracted and good features that discriminate the different fault conditions of the bearing are selected using decision tree. The rule set for fuzzy classifier is obtained once again by using the decision tree. A fuzzy classifier is built and tested with representative data. The results are found to be encouraging.

  17. Ensembl 2017

    PubMed Central

    Aken, Bronwen L.; Achuthan, Premanand; Akanni, Wasiu; Amode, M. Ridwan; Bernsdorff, Friederike; Bhai, Jyothish; Billis, Konstantinos; Carvalho-Silva, Denise; Cummins, Carla; Clapham, Peter; Gil, Laurent; Girón, Carlos García; Gordon, Leo; Hourlier, Thibaut; Hunt, Sarah E.; Janacek, Sophie H.; Juettemann, Thomas; Keenan, Stephen; Laird, Matthew R.; Lavidas, Ilias; Maurel, Thomas; McLaren, William; Moore, Benjamin; Murphy, Daniel N.; Nag, Rishi; Newman, Victoria; Nuhn, Michael; Ong, Chuang Kee; Parker, Anne; Patricio, Mateus; Riat, Harpreet Singh; Sheppard, Daniel; Sparrow, Helen; Taylor, Kieron; Thormann, Anja; Vullo, Alessandro; Walts, Brandon; Wilder, Steven P.; Zadissa, Amonida; Kostadima, Myrto; Martin, Fergal J.; Muffato, Matthieu; Perry, Emily; Ruffier, Magali; Staines, Daniel M.; Trevanion, Stephen J.; Cunningham, Fiona; Yates, Andrew; Zerbino, Daniel R.; Flicek, Paul

    2017-01-01

    Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensure uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license. PMID:27899575

  18. Merging Multi-model CMIP5/PMIP3 Past-1000 Ensemble Simulations with Tree Ring Proxy Data by Optimal Interpolation Approach

    NASA Astrophysics Data System (ADS)

    Chen, Xin; Luo, Yong; Xing, Pei; Nie, Suping; Tian, Qinhua

    2015-04-01

    Two sets of gridded annual mean surface air temperature in past millennia over the Northern Hemisphere were constructed employing the optimal interpolation (OI) method so as to merge the tree ring proxy records with the simulations from CMIP5 (the fifth phase of the Coupled Model Intercomparison Project). Uncertainties in both the proxy reconstruction and the model simulations can be taken into account by applying the OI algorithm. For better preservation of physically coordinated features and spatial-temporal completeness of climate variability in 7 copies of model results, we perform Empirical Orthogonal Function (EOF) analysis to truncate the ensemble mean field as the first guess (background field) for OI. 681 temperature-sensitive tree-ring chronologies are collected and screened from the International Tree Ring Data Bank (ITRDB) and the Past Global Changes (PAGES-2k) project. Firstly, two methods (variance matching and linear regression) are employed to calibrate the tree ring chronologies with instrumental data (CRUTEM4v) individually. In addition, we also remove the bias of both the background field and proxy records relative to the instrumental dataset. Secondly, the time-varying background error covariance matrix (B) and the static "observation" error covariance matrix (R) are calculated for the OI frame. In our scheme, matrix B is calculated locally, and "observation" error covariances are partially considered in the R matrix (the covariance value between pairs of tree ring sites that are very close to each other is counted), which is different from the traditional assumption that the R matrix should be diagonal. Comparing our results, it turns out that regional averaged series are not sensitive to the selection of calibration method. The Quantile-Quantile plots indicate that regional climatologies based on both methods tend to agree better with the regional reconstruction of PAGES-2k in the 20th-century warming period than in the Little Ice Age (LIA). 
Lager volcanic cooling response over Asia
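
    The core of an optimal interpolation update can be sketched for a single grid point that is observed directly. In this scalar special case the analysis is the background plus a gain-weighted innovation, with gain K = B / (B + R). The abstract describes full matrices B and R, including off-diagonal terms in R, so this scalar form is a deliberate simplification; the function and parameter names are hypothetical.

```python
def oi_update(background, observation, b_var, r_var):
    """Scalar optimal interpolation: blend a model first guess with one proxy value.

    b_var is the background error variance (B) and r_var the
    "observation" error variance (R) for this single point.
    """
    if b_var < 0 or r_var < 0 or b_var + r_var == 0:
        raise ValueError("variances must be non-negative and not both zero")
    gain = b_var / (b_var + r_var)
    analysis = background + gain * (observation - background)
    analysis_var = (1.0 - gain) * b_var
    return analysis, analysis_var


# Equal error variances: the analysis lies halfway between the two inputs.
analysis, analysis_var = oi_update(0.0, 1.0, 1.0, 1.0)
```

    A larger R relative to B pulls the analysis toward the model background, and a smaller R pulls it toward the proxy value.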

  19. Transporter studies in drug development: experience to date and follow-up on decision trees from the International Transporter Consortium.

    PubMed

    Tweedie, D; Polli, J W; Berglund, E Gil; Huang, S M; Zhang, L; Poirier, A; Chu, X; Feng, B

    2013-07-01

    The International Transporter Consortium (ITC) organized a second workshop in March 2012 to expand on the themes developed during the inaugural ITC workshop held in 2008. The final session of the workshop provided perspectives from regulatory and industry-based scientists, with input from academic scientists, and focused primarily on the decision trees published from the first workshop. These decision trees have become a central part of subsequent regulatory drug-drug interaction (DDI) guidances issued over the past few years.

  20. Binary Decision Trees for Preoperative Periapical Cyst Screening Using Cone-beam Computed Tomography.

    PubMed

    Pitcher, Brandon; Alaqla, Ali; Noujeim, Marcel; Wealleans, James A; Kotsakis, Georgios; Chrepa, Vanessa

    2017-03-01

    Cone-beam computed tomographic (CBCT) analysis allows for 3-dimensional assessment of periradicular lesions and may facilitate preoperative periapical cyst screening. The purpose of this study was to develop and assess the predictive validity of a cyst screening method based on CBCT volumetric analysis alone or combined with designated radiologic criteria. Three independent examiners evaluated 118 presurgical CBCT scans from cases that underwent apicoectomies and had an accompanying gold standard histopathological diagnosis of either a cyst or granuloma. Lesion volume, density, and specific radiologic characteristics were assessed using specialized software. Logistic regression models with histopathological diagnosis as the dependent variable were constructed for cyst prediction, and receiver operating characteristic curves were used to assess the predictive validity of the models. A conditional inference binary decision tree based on a recursive partitioning algorithm was constructed to facilitate preoperative screening. Interobserver agreement was excellent for volume and density, but it varied from poor to good for the radiologic criteria. Volume and root displacement were strong predictors for cyst screening in all analyses. The binary decision tree classifier determined that if the volume of the lesion was >247 mm³, there was 80% probability of a cyst. If volume was <247 mm³ and root displacement was present, cyst probability was 60% (78% accuracy). The good accuracy and high specificity of the decision tree classifier render it a useful preoperative cyst screening tool that can aid in clinical decision making but not a substitute for definitive histopathological diagnosis after biopsy. Confirmatory studies are required to validate the present findings. Published by Elsevier Inc.
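
    The two branches reported in the abstract can be written as a small rule function. This sketch encodes only what the abstract states. The probability for the remaining branch (smaller volume without root displacement) and the handling of a volume of exactly 247 mm³ are not given, so the function returns None there; the function name is hypothetical.

```python
def cyst_probability_from_abstract(volume_mm3, root_displacement):
    """Return the cyst probability for the branches reported in the abstract.

    Returns None for cases the abstract does not report.
    """
    if volume_mm3 > 247:
        return 0.80
    if volume_mm3 < 247 and root_displacement:
        return 0.60
    return None  # branch not reported in the abstract


large_lesion = cyst_probability_from_abstract(300.0, False)
small_displaced = cyst_probability_from_abstract(120.0, True)
small_no_displacement = cyst_probability_from_abstract(120.0, False)
```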

  1. Using decision trees to manage hospital readmission risk for acute myocardial infarction, heart failure, and pneumonia.

    PubMed

    Hilbert, John P; Zasadil, Scott; Keyser, Donna J; Peele, Pamela B

    2014-12-01

    To improve healthcare quality and reduce costs, the Affordable Care Act places hospitals at financial risk for excessive readmissions associated with acute myocardial infarction (AMI), heart failure (HF), and pneumonia (PN). Although predictive analytics is increasingly looked to as a means for measuring, comparing, and managing this risk, many modeling tools require data inputs that are not readily available and/or additional resources to yield actionable information. This article demonstrates how hospitals and clinicians can use their own structured discharge data to create decision trees that produce highly transparent, clinically relevant decision rules for better managing readmission risk associated with AMI, HF, and PN. For illustrative purposes, basic decision trees are trained and tested using publicly available data from the California State Inpatient Databases and an open-source statistical package. As expected, these simple models perform less well than other more sophisticated tools, with areas under the receiver operating characteristic (ROC) curve (or AUC) of 0.612, 0.583, and 0.650, respectively, but achieve a lift of 1.5 or greater for higher-risk patients with any of the three conditions. More importantly, they are shown to offer substantial advantages in terms of transparency and interpretability, comprehensiveness, and adaptability. By enabling hospitals and clinicians to identify important factors associated with readmissions, target subgroups of patients at both high and low risk, and design and implement interventions that are appropriate to the risk levels observed, decision trees serve as an ideal application for addressing the challenge of reducing hospital readmissions.
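
    The lift figure quoted above compares the readmission rate in a model's highest-risk group with the overall rate. A minimal sketch of that calculation follows; the function name, the choice of the top 20% as the high-risk group, and the sample scores are assumptions for illustration and are not data from the article.

```python
def lift_at_top_fraction(scores, outcomes, fraction=0.1):
    """Readmission rate among the highest-scored patients divided by the overall rate."""
    if len(scores) != len(outcomes) or not scores:
        raise ValueError("scores and outcomes must be non-empty and of equal length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = len(scores)
    k = max(1, int(round(n * fraction)))
    # Rank patients from highest to lowest predicted risk.
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(outcome for _, outcome in ranked[:k]) / k
    base_rate = sum(outcomes) / n
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positive outcomes")
    return top_rate / base_rate


# Hypothetical risk scores and readmission flags (1 = readmitted).
risk = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
readmitted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lift = lift_at_top_fraction(risk, readmitted, fraction=0.2)
```

    A lift of 1.0 means the high-risk group is readmitted at the average rate, so values above 1.0 indicate that the model concentrates readmissions in the group it flags.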

  2. A fuzzy decision tree to predict phosphorus export at the catchment scale

    NASA Astrophysics Data System (ADS)

    Schärer, M.; Page, T.; Beven, K.

    2006-12-01

    Qualitative understanding of the processes controlling phosphorus (P) export from agricultural land has been significantly improved in recent years. Problems remain in predicting P losses, although legislation such as the EU Water Framework Directive requires tools that provide accurate predictions. Decision making, not only in the field of diffuse pollution, often relies on limited data. This study aims to predict annual P export from agricultural catchments using a very simple approach that concentrates on the functional behaviour of a catchment. Two simple fuzzy decision trees have been established to predict both total P filtered at 0.45 μm (TP <0.45) and particulate P (TP >0.45) export. The predictions are within range of the P export estimated from measured data using discharge-concentration rating curves. The fuzzy method is capable of identifying the catchments having high P export and reproduces the pattern of P export for wet and dry years, especially for TP <0.45. The predicted fuzzy ranges for TP >0.45 export are wide. The available data indicate that single events have a high importance for TP >0.45 export. We assume that an event-based decision tree might be the appropriate approach to constrain the uncertainties. The proposed methodology is simple. For both trees, a classification is made based on only four input variables using fuzzy rules. The rules do not depend on the estimation of numerous parameters but can easily be adapted once new information becomes available. Therefore, the fuzzy system has a high potential to be used as a decision support tool for policy makers.

  3. Cloud Detection from Satellite Imagery: A Comparison of Expert-Generated and Automatically-Generated Decision Trees

    NASA Technical Reports Server (NTRS)

    Shiffman, Smadar

    2004-01-01

    Automated cloud detection and tracking is an important step in assessing global climate change via remote sensing. Cloud masks, which indicate whether individual pixels depict clouds, are included in many of the data products that are based on data acquired on board earth satellites. Many cloud-mask algorithms have the form of decision trees, which employ sequential tests that scientists designed based on empirical astrophysics studies and astrophysics simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In this study we explored the potential benefits of automatically-learned decision trees for detecting clouds from images acquired using the Advanced Very High Resolution Radiometer (AVHRR) instrument on board the NOAA-14 weather satellite of the National Oceanic and Atmospheric Administration. We constructed three decision trees for a sample of 8km-daily AVHRR data from 2000 using a decision-tree learning procedure provided within MATLAB(R), and compared the accuracy of the decision trees to the accuracy of the cloud mask. We used ground observations collected by the National Aeronautics and Space Administration Clouds and the Earth's Radiant Energy Systems S'COOL project as the gold standard. For the sample data, the accuracy of automatically learned decision trees was greater than the accuracy of the cloud masks included in the AVHRR data product.

  4. Decision Tree based Prediction and Rule Induction for Groundwater Trichloroethene (TCE) Pollution Vulnerability

    NASA Astrophysics Data System (ADS)

    Park, J.; Yoo, K.

    2013-12-01

    For groundwater resource conservation, it is important to accurately assess groundwater pollution sensitivity or vulnerability. In this work, we attempted to use a data mining approach to assess groundwater pollution vulnerability at a TCE (trichloroethylene) contaminated Korean industrial site. The conventional DRASTIC method failed to describe the TCE sensitivity data, showing a poor correlation with hydrogeological properties. Among the different data mining methods, such as Artificial Neural Network (ANN), Multiple Logistic Regression (MLR), Case Base Reasoning (CBR), and Decision Tree (DT), the accuracy and consistency of the Decision Tree (DT) were the best. According to the subsequent tree analyses with the optimal DT model, the failure of the conventional DRASTIC method to fit the TCE sensitivity data may be due to the use of inaccurate weight values of hydrogeological parameters for the study site. These findings provide a proof of concept that a DT-based data mining approach can be used for prediction and rule induction of groundwater TCE sensitivity without pre-existing information on weights of hydrogeological properties.

  5. Decision tree and PCA-based fault diagnosis of rotating machinery

    NASA Astrophysics Data System (ADS)

    Sun, Weixiang; Chen, Jin; Li, Jiaqing

    2007-04-01

    After analysing the flaws of conventional fault diagnosis methods, data mining technology is introduced to the fault diagnosis field, and a new method based on the C4.5 decision tree and principal component analysis (PCA) is proposed. In this method, PCA is used to reduce features after data collection, preprocessing and feature extraction. Then, C4.5 is trained using the samples to generate a decision tree model with diagnosis knowledge. Finally, the tree model is used to make the diagnosis analysis. To validate the proposed method, six kinds of running states (normal or without any defect, unbalance, rotor radial rub, oil whirl, shaft crack and a simultaneous state of unbalance and radial rub) are simulated on a Bently Rotor Kit RK4 to test the C4.5 and PCA-based method and a back-propagation neural network (BPNN). The result shows that the C4.5 and PCA-based diagnosis method has higher accuracy and needs less training time than BPNN.

  6. A web-based decision support system to enhance IPM programs in Washington tree fruit.

    PubMed

    Jones, Vincent P; Brunner, Jay F; Grove, Gary G; Petit, Brad; Tangren, Gerald V; Jones, Wendy E

    2010-06-01

    Integrated pest management (IPM) decision-making has become more information intensive in Washington State tree crops in response to changes in pesticide availability, the development of new control tactics (such as mating disruption) and the development of new information on pest and natural enemy biology. The time-sensitive nature of the information means that growers must have constant access to a single source of verified information to guide management decisions. The authors developed a decision support system for Washington tree fruit growers that integrates environmental data [140 Washington State University (WSU) stations plus weather forecasts from NOAA], model predictions (ten insects, four diseases and a horticultural model), management recommendations triggered by model status and a pesticide database that provides information on non-target impacts on other pests and natural enemies. A user survey in 2008 found that the user base was providing recommendations for most of the orchards and acreage in the state, and that users estimated the value at $16 million per year. The design of the system facilitates education on a range of time-sensitive topics and will make it possible easily to incorporate other models, new management recommendations or information from new sensors as they are developed.

  7. Pešek lecture: SETI and society—decision trees

    NASA Astrophysics Data System (ADS)

    Billingham, John

    This paper presents a simplified decision tree diagram for SETI (the Search for Extraterrestrial Intelligence) and society. It deals with the series of steps and circumstances that follow from the quest for evidence of the existence of extraterrestrial civilizations, with the major goal of including those branch points which involve decisions that are societal rather than scientific or technological. Since SETI is based on science and technology these factors are also included in the decision diagram, but more in a summary fashion. A condensed list of the relevant societal disciplines is given. The most difficult decisions are those related to issues of transmitting communications from Earth to ETI. The diagram may be useful as a new way of looking at the subject of Communication with Extraterrestrial Intelligence (CETI), and how it inevitably blends into SETI. Arguments are made for vigorous pursuit of studies in all the societal, cultural and behavioral domains involved, and it is shown that many of these studies can profitably be undertaken now, before an ETI signal has been detected. Not least, it is argued that the newly emerging field of CETI and Society would benefit materially from the application of formal decision theory and analysis, and from game theory and utility theory.

  8. Applications of urban tree canopy assessment and prioritization tools: supporting collaborative decision making to achieve urban sustainability goals

    Treesearch

    Dexter H. Locke; J. Morgan Grove; Michael Galvin; Jarlath P.M. O'Neil-Dunne; Charles. Murphy

    2013-01-01

    Urban Tree Canopy (UTC) Prioritizations can be both a set of geographic analysis tools and a planning process for collaborative decision-making. In this paper, we describe how UTC Prioritizations can be used as a planning process to provide decision support to multiple government agencies, civic groups and private businesses to aid in reaching a canopy target. Linkages...

  9. A study of fuzzy logic ensemble system performance on face recognition problem

    NASA Astrophysics Data System (ADS)

    Polyakova, A.; Lipinskiy, L.

    2017-02-01

    Some problems are difficult to solve using a single intelligent information technology (IIT). An ensemble of various data mining (DM) techniques is a set of models, each of which is able to solve the problem by itself, but whose combination increases the efficiency of the system as a whole. Using IIT ensembles can improve the reliability and efficiency of the final decision, since it emphasizes the diversity of its components. A new method of designing intelligent information technology ensembles is considered in this paper. It is based on fuzzy logic and is designed to solve classification and regression problems. The ensemble consists of several data mining algorithms: artificial neural network, support vector machine and decision trees. These algorithms and their ensemble have been tested by solving face recognition problems. Principal components analysis (PCA) is used for feature selection.

  10. Improvement and analysis of ID3 algorithm in decision-making tree

    NASA Astrophysics Data System (ADS)

    Xie, Xiao-Lan; Long, Zhen; Liao, Wen-Qi

    2015-12-01

    The cooperative system under development needs spatial analysis and related data mining technology to detect subject conflict and redundancy, and the ID3 algorithm is an important data mining method. Because the logarithmic part of the traditional ID3 algorithm for decision trees is rather complicated, this paper derives a new computational formula for information gain by optimizing the logarithmic part of the algorithm. Experimental comparison and theoretical analysis show that the IID3 (Improved ID3) algorithm has higher calculation efficiency and accuracy and is thus worth popularizing.
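
    For context, the logarithmic computation this record refers to is the entropy-based information gain of standard ID3. The sketch below shows that standard formula only; the simplified IID3 formula is not given in the abstract and is not reproduced here. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Standard ID3 gain: entropy before the split minus weighted entropy after."""
    n = len(labels)
    # Group the class labels by the value each row takes for the attribute.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" not at all.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
]
labels = ["no", "no", "yes", "yes"]
gain_outlook = information_gain(rows, labels, "outlook")
gain_windy = information_gain(rows, labels, "windy")
print(gain_outlook, gain_windy)
```

    ID3 evaluates this quantity for every candidate attribute at every node and splits on the one with the largest gain, so the repeated calls to `log2` are where most of the arithmetic cost lies.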

  11. Decision Optimization of Machine Sets Taking Into Consideration Logical Tree Minimization of Design Guidelines

    NASA Astrophysics Data System (ADS)

    Deptuła, A.; Partyka, M. A.

    2014-08-01

    The method of minimization of complex partial multi-valued logical functions determines the degree of importance of construction and exploitation parameters playing the role of logical decision variables. Logical functions are taken into consideration in the issues of modelling machine sets. In multi-valued logical functions with weighting products, it is possible to use a modified Quine-McCluskey algorithm for the minimization of multi-valued functions. Taking into account weighting coefficients in the logical tree minimization reflects a physical model of the object being analysed much better.

  12. A comparison of student academic achievement using decision trees techniques: Reflection from University Malaysia Perlis

    NASA Astrophysics Data System (ADS)

    Aziz, Fatihah; Jusoh, Abd Wahab; Abu, Mohd Syafarudy

    2015-05-01

    A decision tree is one of the techniques in data mining for prediction. Using this method, hidden information can be extracted from abundant data and interpreted into useful knowledge. In this paper, the academic performance of students from 2002 to 2012 is examined for two faculties, the Faculty of Manufacturing Engineering and the Faculty of Microelectronic Engineering, at University Malaysia Perlis (UniMAP). The objectives of this study are to determine and compare the factors that affect the students' academic achievement between the two faculties. The prediction results show that five attributes are considered to be factors that influence the students' academic performance.

  13. Decision Tree Classifier for Classification of Plant and Animal Micro RNA's

    NASA Astrophysics Data System (ADS)

    Pant, Bhasker; Pant, Kumud; Pardasani, K. R.

    Gene expression is regulated by miRNAs, or micro RNAs, which can be 21-23 nucleotides in length. They are non-coding RNAs which control gene expression either by translation repression or mRNA degradation. Plants and animals both contain miRNAs, which have been classified by wet lab techniques. These techniques are highly expensive, labour intensive and time consuming. Hence, faster and more economical computational approaches are needed. In view of the above, a machine learning model has been developed for classification of plant and animal miRNAs using a decision tree classifier. The model has been tested on available data and gives results with 91% accuracy.

  14. Spatial distribution of block falls using volumetric GIS-decision-tree models

    NASA Astrophysics Data System (ADS)

    Abdallah, C.

    2010-10-01

    Block falls are considered a significant aspect of surficial instability contributing to losses in land and socio-economic aspects through their damaging effects to natural and human environments. This paper predicts and maps the geographic distribution and volumes of block falls in central Lebanon using remote sensing, geographic information systems (GIS) and decision-tree modeling (un-pruned and pruned trees). Eleven terrain parameters (lithology, proximity to fault line, karst type, soil type, distance to drainage line, elevation, slope gradient, slope aspect, slope curvature, land cover/use, and proximity to roads) were generated to statistically explain the occurrence of block falls. The latter were discriminated using SPOT4 satellite imagery, and their dimensions were determined during field surveys. The un-pruned tree model based on all considered parameters explained 86% of the variability in field block fall measurements. Once pruned, it classifies 50% in block falls' volumes by selecting just four parameters (lithology, slope gradient, soil type, and land cover/use). Both tree models (un-pruned and pruned) were converted to quantitative 1:50,000 block falls' maps with different classes, starting from Nil (no block falls) to more than 4000 m³. These maps match fairly well, with a coincidence value of 45%; however, both can be used to prioritize the choice of specific zones for further measurement and modeling, as well as for land-use management. The proposed tree models are relatively simple, and may also be applied to other areas (i.e. the choice of un-pruned or pruned model is related to the availability of terrain parameters in a given area).

  15. Decision support for mitigating the risk of tree induced transmission line failure in utility rights-of-way.

    PubMed

    Poulos, H M; Camp, A E

    2010-02-01

    Vegetation management is a critical component of rights-of-way (ROW) maintenance for preventing electrical outages and safety hazards resulting from tree contact with conductors during storms. Northeast Utility's (NU) transmission lines are a critical element of the nation's power grid; NU is therefore under scrutiny from federal agencies charged with protecting the electrical transmission infrastructure of the United States. We developed a decision support system to focus right-of-way maintenance and minimize the potential for a tree fall episode that disables transmission capacity across the state of Connecticut. We used field data on tree characteristics to develop a system for identifying hazard trees (HTs) in the field using limited equipment to manage Connecticut power line ROW. Results from this study indicated that the tree height-to-diameter ratio, total tree height, and live crown ratio were the key characteristics that differentiated potential risk trees (danger trees) from trees with a high probability of tree fall (HTs). Products from this research can be transferred to adaptive right-of-way management, and the methods we used have great potential for future application to other regions of the United States and elsewhere where tree failure can disrupt electrical power.

  16. Decision Support for Mitigating the Risk of Tree Induced Transmission Line Failure in Utility Rights-of-Way

    NASA Astrophysics Data System (ADS)

    Poulos, H. M.; Camp, A. E.

    2010-02-01

    Vegetation management is a critical component of rights-of-way (ROW) maintenance for preventing electrical outages and safety hazards resulting from tree contact with conductors during storms. Northeast Utility’s (NU) transmission lines are a critical element of the nation’s power grid; NU is therefore under scrutiny from federal agencies charged with protecting the electrical transmission infrastructure of the United States. We developed a decision support system to focus right-of-way maintenance and minimize the potential for a tree fall episode that disables transmission capacity across the state of Connecticut. We used field data on tree characteristics to develop a system for identifying hazard trees (HTs) in the field using limited equipment to manage Connecticut power line ROW. Results from this study indicated that the tree height-to-diameter ratio, total tree height, and live crown ratio were the key characteristics that differentiated potential risk trees (danger trees) from trees with a high probability of tree fall (HTs). Products from this research can be transferred to adaptive right-of-way management, and the methods we used have great potential for future application to other regions of the United States and elsewhere where tree failure can disrupt electrical power.

  17. Integrating Decision Tree and Hidden Markov Model (HMM) for Subtype Prediction of Human Influenza A Virus

    NASA Astrophysics Data System (ADS)

    Attaluri, Pavan K.; Chen, Zhengxin; Weerakoon, Aruna M.; Lu, Guoqing

    Multiple criteria decision making (MCDM) has a significant impact in bioinformatics. In the research reported here, we explore the integration of decision tree (DT) and Hidden Markov Model (HMM) for subtype prediction of human influenza A virus. Infection with influenza viruses continues to be an important public health problem. Viral strains of subtype H3N2 and H1N1 circulate in humans at least twice annually. The subtype detection depends mainly on the antigenic assay, which is time-consuming and not fully accurate. We have developed a Web system for accurate subtype detection of human influenza virus sequences. The preliminary experiment showed that this system is easy to use and powerful in identifying human influenza subtypes. Our next step is to examine the informative positions at the protein level and extend its current functionality to detect more subtypes. The web functions can be accessed at http://glee.ist.unomaha.edu/.

  18. High resolution multisensor fusion of SAR, optical and LiDAR data based on crisp vs. fuzzy and feature vs. decision ensemble systems

    NASA Astrophysics Data System (ADS)

    Bigdeli, Behnaz; Pahlavani, Parham

    2016-10-01

    Synthetic Aperture Radar (SAR) data are of high interest for different applications in remote sensing, especially land cover classification. SAR imaging is independent of solar illumination and weather conditions. It can even penetrate some of the Earth's surface materials to return information about subsurface features. However, the response of radar is more a function of geometry and structure than of the surface reflection that occurs in optical images. In addition, the backscatter of objects in the microwave range depends on the frequency of the band used, and the grey values in SAR images are different from the usual assumption of the spectral reflectance of the Earth's surface. Consequently, SAR imaging is often used as a complementary technique to traditional optical remote sensing. This study presents different ensemble systems for multisensor fusion of SAR, multispectral and LiDAR data. First, in the decision ensemble system, after extraction and selection of proper features from each data source, crisp SVM (Support Vector Machine) and fuzzy KNN (K Nearest Neighbor) are applied to each feature space. Bayesian theory is then applied to fuse the SVMs, while Decision Template (DT) and Dempster-Shafer (DS) are applied as fuzzy decision fusion methods on the KNNs. Second, in the feature ensemble system, features from all data are stacked into a cube. Classification is then performed by SVM and FKNN as crisp and fuzzy decision-making systems, respectively. A co-registered TerraSAR-X, WorldView-2 and LiDAR data set from San Francisco, USA, was available to examine the effectiveness of the proposed method. The results show that combining SAR data with data from different sensors improves classification results for most of the classes.

  19. Categorization of 77 dystrophin exons into 5 groups by a decision tree using indexes of splicing regulatory factors as decision markers

    PubMed Central

    2012-01-01

    Background: Duchenne muscular dystrophy, a fatal muscle-wasting disease, is characterized by dystrophin deficiency caused by mutations in the dystrophin gene. Skipping of a target dystrophin exon during splicing with antisense oligonucleotides is attracting much attention as the most plausible way to express dystrophin in DMD. Antisense oligonucleotides have been designed against splicing regulatory sequences such as splicing enhancer sequences of target exons. Recently, we reported that a chemical kinase inhibitor specifically enhances the skipping of mutated dystrophin exon 31, indicating the existence of exon-specific splicing regulatory systems. However, the basis for such individual regulatory systems is largely unknown. Here, we categorized the dystrophin exons in terms of their splicing regulatory factors. Results: Using a computer-based machine learning system, we first constructed a decision tree separating 77 authentic from 14 known cryptic exons using 25 indexes of splicing regulatory factors as decision markers. We evaluated the classification accuracy of a novel cryptic exon (exon 11a) identified in this study. However, the tree mislabeled exon 11a as a true exon. Therefore, we re-constructed the decision tree to separate all 15 cryptic exons. The revised decision tree categorized the 77 authentic exons into five groups. Furthermore, all nine disease-associated novel exons were successfully categorized as exons, validating the decision tree. One group, consisting of 30 exons, was characterized by a high density of exonic splicing enhancer sequences. This suggests that AOs targeting splicing enhancer sequences would efficiently induce skipping of exons belonging to this group. Conclusions: The decision tree categorized the 77 authentic exons into five groups. Our classification may help to establish the strategy for exon skipping therapy for Duchenne muscular dystrophy. PMID:22462762

  20. Termination of pregnancy for fetal abnormalities: main arguments and a decision-tree model.

    PubMed

    Kose, Semir; Altunyurt, Sabahattin; Yıldırım, Nuri; Keskinoğlu, Pembe; Çankaya, Tufan; Bora, Elçin; Erçal, Derya; Özer, Erdener

    2015-11-01

    By looking through our ethical committee cases, we demonstrate the main arguments we use for making a judgment in the face of fetal abnormalities. Our decision-making model is a simplified algorithm of the arguments and concepts we use in scientific-ethical discussion. A retrospective analysis was conducted at a single tertiary referral center of patients evaluated for fetal abnormalities from 2004 to 2014. We hypothesized that all our judgments would fit into a decision-tree model. A total of 553 fetal abnormality cases were discussed; 348 (63%) were given a termination of pregnancy (TOP) proposal. When detected <24 weeks, fetuses with chromosomal abnormality/genetic disorders (n:100) and with mental retardation risk (n:93) ended up with a TOP proposal. For incompatibility with life cases (n:111) and the multimorbidity cases (n:44), the committee suggested TOP regardless of gestational age. The highest family approval ratios were in the chromosomal abnormalities/genetic disorders group (93%), and the lowest figures were in the mental retardation risk group (80%). Continuously changing literature on prenatal and postnatal therapy options and the long-term outcome of various fetal abnormalities influences committee decisions. Theoretical high success rates and inconsistent data on the long-term prognosis of some anomaly groups resulted in heterogeneous decisions and various approval ratios. © 2015 John Wiley & Sons, Ltd.

  1. MediBoost: a Patient Stratification Tool for Interpretable Decision Making in the Era of Precision Medicine

    PubMed Central

    Valdes, Gilmer; Luna, José Marcio; Eaton, Eric; Simone, Charles B.; Ungar, Lyle H.; Solberg, Timothy D.

    2016-01-01

    Machine learning algorithms that are both interpretable and accurate are essential in applications such as medicine where errors can have a dire consequence. Unfortunately, there is currently a tradeoff between accuracy and interpretability among state-of-the-art methods. Decision trees are interpretable and are therefore used extensively throughout medicine for stratifying patients. Current decision tree algorithms, however, are consistently outperformed in accuracy by other, less-interpretable machine learning models, such as ensemble methods. We present MediBoost, a novel framework for constructing decision trees that retain interpretability while having accuracy similar to ensemble methods, and compare MediBoost’s performance to that of conventional decision trees and ensemble methods on 13 medical classification problems. MediBoost significantly outperformed current decision tree algorithms in 11 out of 13 problems, giving accuracy comparable to ensemble methods. The resulting trees are of the same type as decision trees used throughout clinical practice but have the advantage of improved accuracy. Our algorithm thus gives the best of both worlds: it grows a single, highly interpretable tree that has the high accuracy of ensemble methods. PMID:27901055

  2. MediBoost: a Patient Stratification Tool for Interpretable Decision Making in the Era of Precision Medicine

    NASA Astrophysics Data System (ADS)

    Valdes, Gilmer; Luna, José Marcio; Eaton, Eric; Simone, Charles B.; Ungar, Lyle H.; Solberg, Timothy D.

    2016-11-01

    Machine learning algorithms that are both interpretable and accurate are essential in applications such as medicine where errors can have a dire consequence. Unfortunately, there is currently a tradeoff between accuracy and interpretability among state-of-the-art methods. Decision trees are interpretable and are therefore used extensively throughout medicine for stratifying patients. Current decision tree algorithms, however, are consistently outperformed in accuracy by other, less-interpretable machine learning models, such as ensemble methods. We present MediBoost, a novel framework for constructing decision trees that retain interpretability while having accuracy similar to ensemble methods, and compare MediBoost’s performance to that of conventional decision trees and ensemble methods on 13 medical classification problems. MediBoost significantly outperformed current decision tree algorithms in 11 out of 13 problems, giving accuracy comparable to ensemble methods. The resulting trees are of the same type as decision trees used throughout clinical practice but have the advantage of improved accuracy. Our algorithm thus gives the best of both worlds: it grows a single, highly interpretable tree that has the high accuracy of ensemble methods.

  3. Diagnostic Features of Common Oral Ulcerative Lesions: An Updated Decision Tree

    PubMed Central

    Safi, Yaser

    2016-01-01

    Diagnosis of oral ulcerative lesions might be quite challenging. This narrative review article aims to introduce an updated decision tree for diagnosing oral ulcerative lesions on the basis of their diagnostic features. Various general search engines and specialized databases including PubMed, PubMed Central, Medline Plus, EBSCO, Science Direct, Scopus, Embase, and authenticated textbooks were used to find relevant topics by means of MeSH keywords such as “oral ulcer,” “stomatitis,” and “mouth diseases.” Thereafter, English-language articles published from 1983 to 2015 in both medical and dental journals, including reviews, meta-analyses, original papers, and case reports, were appraised. Upon compilation of the relevant data, oral ulcerative lesions were categorized into three major groups (acute, chronic, and recurrent ulcers) and into five subgroups (solitary acute, multiple acute, solitary chronic, multiple chronic, and solitary/multiple recurrent), based on the number and duration of lesions. In total, 29 entities were organized in the form of a decision tree in order to help clinicians establish a logical diagnosis by stepwise progression. PMID:27781066

  4. Block-Based Connected-Component Labeling Algorithm Using Binary Decision Trees

    PubMed Central

    Chang, Wan-Yu; Chiu, Chung-Cheng; Yang, Jia-Horng

    2015-01-01

    In this paper, we propose a fast labeling algorithm based on block-based concepts. Because the number of memory access points directly affects the time consumption of the labeling algorithms, the aim of the proposed algorithm is to minimize neighborhood operations. Our algorithm utilizes a block-based view and correlates a raster scan to select the necessary pixels generated by a block-based scan mask. We analyze the advantages of a sequential raster scan for the block-based scan mask, and integrate the block-connected relationships using two different procedures with binary decision trees to reduce unnecessary memory access. This greatly simplifies the pixel locations of the block-based scan mask. Furthermore, our algorithm significantly reduces the number of leaf nodes and depth levels required in the binary decision tree. We analyze the labeling performance of the proposed algorithm alongside that of other labeling algorithms using high-resolution images and foreground images. The experimental results from synthetic and real image datasets demonstrate that the proposed algorithm is faster than other methods. PMID:26393597
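
    As background for this record, the sketch below shows the classic pixel-based two-pass labeling with union-find, using 4-connectivity. It is a baseline of the kind such papers compare against, not the block-based decision-tree algorithm proposed by the authors, whose details are not given in the abstract. The function name and the test image are hypothetical.

```python
def label_components(image):
    """Two-pass, 4-connected labeling of a binary image (list of lists of 0/1).

    Returns (labels, count), where labels has the same shape as image and
    background pixels keep the label 0.
    """
    rows = len(image)
    cols = len(image[0]) if image else 0
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # parent[i] is the union-find parent of provisional label i

    def find(x):
        # Follow parents to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # First pass: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up and left:
                labels[r][c] = min(up, left)
                union(up, left)
            elif up or left:
                labels[r][c] = up or left
            else:
                parent.append(len(parent))  # new provisional label
                labels[r][c] = len(parent) - 1

    # Second pass: replace provisional labels by consecutive final labels.
    final = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = final.setdefault(root, len(final) + 1)
    return labels, len(final)


# Hypothetical 4x4 test image with three 4-connected components.
image = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
]
labeled, count = label_components(image)
print(count)
```

    Every foreground pixel here inspects its upper and left neighbours, which is the per-pixel memory access that block-based scan masks aim to reduce.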

  5. Computational prediction of blood-brain barrier permeability using decision tree induction.

    PubMed

    Suenderhauf, Claudia; Hammann, Felix; Huwyler, Jörg

    2012-08-31

    Predicting blood-brain barrier (BBB) permeability is essential to drug development, as a molecule cannot exhibit pharmacological activity within the brain parenchyma without first transiting this barrier. Understanding the process of permeation, however, is complicated by a combination of both limited passive diffusion and active transport. Our aim here was to establish predictive models for BBB drug permeation that include both active and passive transport. A database of 153 compounds was compiled using in vivo surface permeability product (logPS) values in rats as a quantitative parameter for BBB permeability. The open source Chemical Development Kit (CDK) was used to calculate physico-chemical properties and descriptors. Predictive computational models were implemented by machine learning paradigms (decision tree induction) on both descriptor sets. Models with a corrected classification rate (CCR) of 90% were established. Mechanistic insight into BBB transport was provided by an Ant Colony Optimization (ACO)-based binary classifier analysis to identify the most predictive chemical substructures. Decision trees revealed descriptors of lipophilicity (aLogP) and charge (polar surface area), which were also previously described in models of passive diffusion. However, measures of molecular geometry and connectivity were found to be related to an active drug transport component.
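
    The abstract reports a corrected classification rate (CCR) without stating its formula. The sketch below assumes one definition commonly used in QSAR work, the mean of sensitivity and specificity, which corrects plain accuracy for class imbalance; whether the authors used exactly this form is an assumption. The function name and the toy labels are hypothetical.

```python
def corrected_classification_rate(y_true, y_pred):
    """CCR assumed here as 0.5 * (TP / positives + TN / negatives)."""
    positives = sum(1 for t in y_true if t == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("CCR needs at least one example of each class")
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (true_pos / positives + true_neg / negatives)


# Hypothetical toy labels: 1 = permeable, 0 = not permeable.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
ccr = corrected_classification_rate(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"CCR = {ccr:.3f}, accuracy = {accuracy:.3f}")
```

    With unbalanced classes the two measures diverge: a model that favours the majority class can score well on accuracy while its CCR stays close to 0.5.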

  6. Using decision-tree classifier systems to extract knowledge from databases

    NASA Technical Reports Server (NTRS)

    St.clair, D. C.; Sabharwal, C. L.; Hacke, Keith; Bond, W. E.

    1990-01-01

    One difficulty in applying artificial intelligence techniques to the solution of real world problems is that the development and maintenance of many AI systems, such as those used in diagnostics, require large amounts of human resources. At the same time, databases frequently exist which contain information about the process(es) of interest. Recently, efforts to reduce development and maintenance costs of AI systems have focused on using machine learning techniques to extract knowledge from existing databases. Research is described in the area of knowledge extraction using a class of machine learning techniques called decision-tree classifier systems. Results of this research suggest ways of performing knowledge extraction which may be applied in numerous situations. In addition, a measurement called the concept strength metric (CSM) is described which can be used to determine how well the resulting decision tree can differentiate between the concepts it has learned. The CSM can be used to determine whether or not additional knowledge needs to be extracted from the database. An experiment involving real world data is presented to illustrate the concepts described.

  7. Block-Based Connected-Component Labeling Algorithm Using Binary Decision Trees.

    PubMed

    Chang, Wan-Yu; Chiu, Chung-Cheng; Yang, Jia-Horng

    2015-09-18

    In this paper, we propose a fast labeling algorithm based on block-based concepts. Because the number of memory access points directly affects the time consumption of the labeling algorithms, the aim of the proposed algorithm is to minimize neighborhood operations. Our algorithm utilizes a block-based view and correlates a raster scan to select the necessary pixels generated by a block-based scan mask. We analyze the advantages of a sequential raster scan for the block-based scan mask, and integrate the block-connected relationships using two different procedures with binary decision trees to reduce unnecessary memory access. This greatly simplifies the pixel locations of the block-based scan mask. Furthermore, our algorithm significantly reduces the number of leaf nodes and depth levels required in the binary decision tree. We analyze the labeling performance of the proposed algorithm alongside that of other labeling algorithms using high-resolution images and foreground images. The experimental results from synthetic and real image datasets demonstrate that the proposed algorithm is faster than other methods.

  8. Development of decision tree models for substrates, inhibitors, and inducers of p-glycoprotein.

    PubMed

    Hammann, Felix; Gutmann, Heike; Jecklin, Ursula; Maunz, Andreas; Helma, Christoph; Drewe, Juergen

    2009-05-01

    In silico classification of new compounds for certain properties is a useful tool to guide further experiments or compound selection. Interaction of new compounds with the efflux pump P-glycoprotein (P-gp) is an important drug property determining tissue distribution and the potential for drug-drug interactions. We present three datasets on substrate, inhibitor, and inducer activities for P-gp (n = 471) obtained from a literature search which we compared to an existing evaluation of the Prestwick Chemical Library with the calcein-AM assay (retrieved from PubMed). Additionally, we present decision tree models of these activities with predictive accuracies of 77.7 % (substrates), 86.9 % (inhibitors), and 90.3 % (inducers) using three algorithms (CHAID, CART, and C4.5). We also present decision tree models of the calcein-AM assay (79.9 %). Apart from a comprehensive dataset of P-gp interacting compounds, our study provides evidence of the efficacy of logD descriptors and of two algorithms not commonly used in pharmacological QSAR studies (CART and CHAID).

  9. Longitudinal risk profiling for suicidal thoughts and behaviours in a community cohort using decision trees.

    PubMed

    Batterham, Philip J; Christensen, Helen

    2012-12-15

    While associations between specific risk factors and subsequent suicidal thoughts or behaviours have been widely examined, there is limited understanding of the interplay between risk factors in the development of suicide risk. This study used a decision tree approach to develop individual models of suicide risk and identify the risk factors for suicidality that are important for different subpopulations. In a population cohort of 6656 Australian adults, the study examined whether measures of mental health, physical health, personality, substance use, social support, social stressors and background characteristics were associated with suicidal ideation and suicidal behaviours after four-year follow-up. Previous suicidality, anxiety symptoms, depression symptoms, neuroticism and rumination were the strongest predictors of suicidal ideation and behaviour after four years. However, divergent factors were predictive of suicidal thoughts and behaviours across the spectrum of mental health. In particular, substance use was only associated with suicidal thoughts and behaviours in those with moderate levels of anxiety or depression. Most of the measurements were based on self-report. Further research is required to assess whether changes in risk factors lead to changes in suicidality. Examining suicide risk factors using decision trees is a promising approach for developing individualised assessments of suicide risk and tailored intervention programs. Copyright © 2012 Elsevier B.V. All rights reserved.

  10. Using Boosting Decision Trees in Gravitational Wave Searches triggered by Gamma-ray Bursts

    NASA Astrophysics Data System (ADS)

    Zuraw, Sarah; LIGO Collaboration

    2015-04-01

    The search for gravitational wave bursts requires the ability to distinguish weak signals from background detector noise. Gravitational wave bursts are characterized by their transient nature, making them particularly difficult to detect as they are similar to non-Gaussian noise fluctuations in the detector. The Boosted Decision Tree method is a powerful machine learning algorithm which uses Multivariate Analysis techniques to explore high-dimensional data sets in order to distinguish between gravitational wave signal and background detector noise. It does so by training with known noise events and simulated gravitational wave events. The method is tested using waveform models and compared with the performance of the standard gravitational wave burst search pipeline for Gamma-ray Bursts. It is shown that the method is able to effectively distinguish between signal and background events under a variety of conditions and over multiple Gamma-ray Burst events. This example demonstrates the usefulness and robustness of the Boosted Decision Tree and Multivariate Analysis techniques as a detection method for gravitational wave bursts. LIGO, UMass, PREP, NEGAP.

  11. [Classification of dengue hemorrhagic fever using decision trees in the early phase of the disease].

    PubMed

    Vega Riverón, Beatriz; Sánchez Valdés, C Lizet; Cortiñas Abrahantes, C José; Castro Peraza, Osvaldo; González Rubio, C Daniel; Castro Peraza, Marta

    2012-01-01

    Dengue is a viral disease with endemic behavior. At the beginning of the illness it is not possible to know which patients will have an unfavorable evolution and develop a severe form of dengue. However, some warning symptoms and signs may be present. The objective was to apply decision tree techniques to the exploration of signs of severity in the early phase of the illness. The study sample was made up of 230 patients admitted with dengue to the "Pedro Kouri" Institute of Tropical Medicine in 2001. The variables considered for the classification were the signs, symptoms and laboratory exams on the third day of evolution of the illness. The classification and regression trees (CART) algorithm with the Gini index was applied. Different loss matrices were considered to improve the sensitivity. The CART algorithm with the best loss matrix had a sensitivity of 98.68% and a global error of 0.36. Without a loss matrix, its sensitivity reached 74% with an error of 0.25. In both cases, the most important variables were platelets and hemoglobin. The study yielded decision rules with high sensitivity and negative predictive value that are useful in clinical practice. The laboratory variables were more informative than the clinical ones for discriminating clinical forms of dengue.
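
    The Gini-based split selection that CART relies on can be illustrated with a minimal sketch. It is not the study's code: the function names, the toy platelet counts and the severity labels are hypothetical, and loss matrices are left out for brevity.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Size-weighted Gini impurity of the two children of `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Scan midpoints between sorted distinct values; return (threshold, impurity)."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    scored = [(t, split_gini(values, labels, t)) for t in candidates]
    return min(scored, key=lambda pair: pair[1])


# Hypothetical toy data (not from the study): platelet counts and severity (1 = severe).
platelets = [50, 70, 90, 150, 180, 220]
severe = [1, 1, 1, 0, 0, 0]

print(gini(severe))                       # impurity of the unsplit node
print(best_threshold(platelets, severe))  # threshold with the lowest child impurity
```

    A full CART implementation would repeat this search over every variable at every node; a loss matrix would additionally weight the misclassification of severe cases more heavily than the reverse error.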

  12. Recognition of Protozoa and Metazoa using image analysis tools, discriminant analysis, neural networks and decision trees.

    PubMed

    Ginoris, Y P; Amaral, A L; Nicolau, A; Coelho, M A Z; Ferreira, E C

    2007-07-09

    Protozoa and metazoa are considered good indicators of the treatment quality in activated sludge systems due to the fact that these organisms are fairly sensitive to physical, chemical and operational processes. Therefore, it is possible to establish close relationships between the predominance of certain species or groups of species and several operational parameters of the plant, such as the biotic indices, namely the Sludge Biotic Index (SBI). This procedure requires the identification, classification and enumeration of the different species, which is usually achieved manually implying both time and expertise availability. Digital image analysis combined with multivariate statistical techniques has proved to be a useful tool to classify and quantify organisms in an automatic and not subjective way. This work presents a semi-automatic image analysis procedure for protozoa and metazoa recognition developed in Matlab language. The obtained morphological descriptors were analyzed using discriminant analysis, neural network and decision trees multivariable statistical techniques to identify and classify each protozoan or metazoan. The obtained procedure was quite adequate for distinguishing between the non-sessile protozoa classes and also for the metazoa classes, with high values for the overall species recognition with the exception of sessile protozoa. In terms of the wastewater conditions assessment the obtained results were found to be suitable for the prediction of these conditions. Finally, the discriminant analysis and neural networks results were found to be quite similar whereas the decision trees technique was less appropriate.

  13. A decision tree for the management of exposed cervical dentin (ECD) and dentin hypersensitivity (DHS).

    PubMed

    Martens, Luc C

    2013-03-01

    Dentin hypersensitivity (DHS) is a problematic clinical entity that may become an increasing clinical problem for dentists to treat as a consequence of patients retaining their teeth throughout life and improved oral hygiene practices. The aim of this review was to develop a decision tree for the management of exposed cervical dentin (ECD) and DHS. A brief PUBMED literature search was performed on dentin hypersensitivity using "MeSH" terms, "review", and "management". In addition, some websites and local guidelines were screened. From this review, it became clear that all dentate patients should routinely be screened for ECD and DHS. In this respect, underdiagnosis of the condition will be avoided and the preventive management can be initiated early. A decision tree process and a flowchart for daily practice were designed, which should be started as soon as a patient presents with ECD or suffers from DHS. This approach takes into account the possible improved quality of life of the patient and is further based on a hierarchy of treatment options. In this respect, active management of DHS will usually involve a combination of at-home and in-office therapies. Starting with the use of desensitizing toothpastes is strongly recommended.

  14. Analysis of acid rain patterns in northeastern China using a decision tree method

    NASA Astrophysics Data System (ADS)

    Zhang, Xiuying; Jiang, Hong; Jin, Jiaxin; Xu, Xiaohua; Zhang, Qingxin

    2012-01-01

    Acid rain is a major regional-scale environmental problem in China. To control acid rain pollution and to protect the ecological environment, it is urgent to document acid rain patterns in various regions of China. Taking Liaoning Province as the study area, the present work focused on the spatial and temporal variations of acid rains in northeastern China. It presents a means for predicting the occurrence of acid rain using geographic position, terrain characteristics, routinely monitored meteorological factors and column concentrations of atmospheric SO2 and NO2. The analysis applies a decision tree approach to the foregoing observation data. Results showed that: (1) acid rain occurred at 17 stations among the 81 monitoring stations in Liaoning Province, with the frequency of acid rain from 0 to 84.38%; (2) summer had the most acid rain occurrences followed by spring and autumn, and the winter had the least; (3) the total accuracy for the simulation of precipitation pH (pH ≤ 4.5, 4.5 < pH ≤ 5.6, and pH > 5.6) was 98.04% using the decision tree method known as C5. The simulation results also indicated that the distance to coastline, elevation, wind direction, wind speed, rainfall amount, atmospheric pressure, and the precursors of acid rain all have a strong influence on the occurrence of acid rains in northeastern China.

  15. Decision trees to characterise the roles of permeability and solubility on the prediction of oral absorption.

    PubMed

    Newby, Danielle; Freitas, Alex A; Ghafourian, Taravat

    2015-01-27

    Oral absorption of compounds depends on many physiological, physiochemical and formulation factors. Two important properties that govern oral absorption are in vitro permeability and solubility, which are commonly used as indicators of human intestinal absorption. Despite this, the nature and exact characteristics of the relationship between these parameters are not well understood. In this study a large dataset of human intestinal absorption was collated along with in vitro permeability, aqueous solubility, melting point, and maximum dose for the same compounds. The dataset allowed a permeability threshold to be established objectively to predict high or low intestinal absorption. Using this permeability threshold, classification decision trees incorporating a solubility-related parameter such as experimental or predicted solubility, or the melting point based absorption potential (MPbAP), along with structural molecular descriptors were developed and validated to predict oral absorption class. The decision trees were able to determine the individual roles of permeability and solubility in oral absorption process. Poorly permeable compounds with high solubility show low intestinal absorption, whereas poorly water soluble compounds with high or low permeability may have high intestinal absorption provided that they have certain molecular characteristics such as a small polar surface or specific topology. Copyright © 2015 Elsevier Masson SAS. All rights reserved.

  16. Non-compliance with a postmastectomy radiotherapy guideline: decision tree and cause analysis.

    PubMed

    Razavi, Amir R; Gill, Hans; Ahlfeldt, Hans; Shahsavar, Nosrat

    2008-09-21

    The guideline for postmastectomy radiotherapy (PMRT), which is prescribed to reduce recurrence of breast cancer in the chest wall and improve overall survival, is not always followed. Identifying and extracting important patterns of non-compliance are crucial in maintaining the quality of care in Oncology. Analysis of 759 patients with malignant breast cancer using decision tree induction (DTI) found patterns of non-compliance with the guideline. The PMRT guideline was used to separate cases according to the recommendation to receive or not receive PMRT. The two groups of patients were analyzed separately. Resulting patterns were transformed into rules that were then compared with the reasons that were extracted by manual inspection of records for the non-compliant cases. Analyzing patients in the group who should receive PMRT according to the guideline did not result in a robust decision tree. However, classification of the other group, patients who should not receive PMRT treatment according to the guideline, resulted in a tree with nine leaves and three of them were representing non-compliance with the guideline. In a comparison between rules resulting from these three non-compliant patterns and manual inspection of patient records, the following was found: In the decision tree, presence of perigland growth is the most important variable followed by number of malignantly invaded lymph nodes and level of Progesterone receptor. DNA index, age, size of the tumor and level of Estrogen receptor are also involved but with less importance. From manual inspection of the cases, the most frequent pattern for non-compliance is age above the threshold followed by near cut-off values for risk factors and unknown reasons. Comparison of patterns of non-compliance acquired from data mining and manual inspection of patient records demonstrates that not all of the non-compliances are repetitive or important. 
    There are some overlaps between important variables acquired from manual

  17. Non-compliance with a postmastectomy radiotherapy guideline: Decision tree and cause analysis

    PubMed Central

    Razavi, Amir R; Gill, Hans; Åhlfeldt, Hans; Shahsavar, Nosrat

    2008-01-01

    Background: The guideline for postmastectomy radiotherapy (PMRT), which is prescribed to reduce recurrence of breast cancer in the chest wall and improve overall survival, is not always followed. Identifying and extracting important patterns of non-compliance are crucial in maintaining the quality of care in Oncology. Methods: Analysis of 759 patients with malignant breast cancer using decision tree induction (DTI) found patterns of non-compliance with the guideline. The PMRT guideline was used to separate cases according to the recommendation to receive or not receive PMRT. The two groups of patients were analyzed separately. Resulting patterns were transformed into rules that were then compared with the reasons that were extracted by manual inspection of records for the non-compliant cases. Results: Analyzing patients in the group who should receive PMRT according to the guideline did not result in a robust decision tree. However, classification of the other group, patients who should not receive PMRT treatment according to the guideline, resulted in a tree with nine leaves and three of them were representing non-compliance with the guideline. In a comparison between rules resulting from these three non-compliant patterns and manual inspection of patient records, the following was found: In the decision tree, presence of perigland growth is the most important variable followed by number of malignantly invaded lymph nodes and level of Progesterone receptor. DNA index, age, size of the tumor and level of Estrogen receptor are also involved but with less importance. From manual inspection of the cases, the most frequent pattern for non-compliance is age above the threshold followed by near cut-off values for risk factors and unknown reasons. Conclusion: Comparison of patterns of non-compliance acquired from data mining and manual inspection of patient records demonstrates that not all of the non-compliances are repetitive or important. There are some overlaps between

  18. Decision tree analysis of factors influencing rainfall-related building damage

    NASA Astrophysics Data System (ADS)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-04-01

    Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage of small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998-2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors includes rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), buildings age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggests that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22-26% of variance in claim frequency, which is considerably better compared to results from global multiple regression models (11-18% of variance explained).
Still, a

  19. Decision-tree analysis of factors influencing rainfall-related building structure and content damage

    NASA Astrophysics Data System (ADS)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-09-01

    Flood-damage prediction models are essential building blocks in flood risk assessments. So far, little research has been dedicated to damage from small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision-tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period 1998-2011. The databases include claims of water-related damage (for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor). Response variables being modelled are average claim size and claim frequency, per district, per day. The set of predictors includes rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision-tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), buildings age (property data only), fraction of homeowners (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size. It is recommended to investigate explanations for the failure to derive models. These require the inclusion of other explanatory factors that were not used in the present study, an investigation of the variability in average claim size at different spatial scales, and the collection of more detailed insurance data that allows one to distinguish between the

  20. Determinants of farmers' tree planting investment decision as a degraded landscape management strategy in the central highlands of Ethiopia

    NASA Astrophysics Data System (ADS)

    Gessesse, B.; Bewket, W.; Bräuning, A.

    2015-11-01

    Land degradation due to lack of sustainable land management practices is one of the critical challenges in many developing countries including Ethiopia. This study explores the major determinants of farm-level tree planting decisions as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree planting decision (Chi-square = 37.29, df = 15, P<0.001). Besides, the computed significant value of the model suggests that all the considered predictor variables jointly influenced the farmers' decision to plant trees as a land management strategy. In this regard, the findings of the study show that local land users' willingness to adopt tree growing is a function of a wide range of biophysical, institutional, socioeconomic and household-level factors. However, household size, productive labour force availability, the disparity of schooling age, level of perception of the process of deforestation and the current land tenure system have a positive and significant influence on tree growing investment decisions in the study watershed. Eventually, the processes of land use conversion and land degradation are serious, which in turn have had adverse effects on agricultural productivity, local food security and poverty trap nexus. Hence, devising sustainable and integrated land management policy options and implementing them would enhance ecological restoration and livelihood sustainability in the study watershed.

  1. Determinants of farmers' tree-planting investment decisions as a degraded landscape management strategy in the central highlands of Ethiopia

    NASA Astrophysics Data System (ADS)

    Gessesse, Berhan; Bewket, Woldeamlak; Bräuning, Achim

    2016-04-01

    Land degradation due to lack of sustainable land management practices is one of the critical challenges in many developing countries including Ethiopia. This study explored the major determinants of farm-level tree-planting decisions as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree-planting decisions (χ2 = 37.29, df = 15, P < 0.001). Besides, the computed significant value of the model revealed that all the considered predictor variables jointly influenced the farmers' decisions to plant trees as a land management strategy. The findings of the study demonstrated that the adoption of tree-growing decisions by local land users was a function of a wide range of biophysical, institutional, socioeconomic and household-level factors. In this regard, the likelihood of household size, productive labour force availability, the disparity of schooling age, level of perception of the process of deforestation and the current land tenure system had a critical influence on tree-growing investment decisions in the study watershed. Eventually, the processes of land-use conversion and land degradation were serious, which in turn have had adverse effects on agricultural productivity, local food security and poverty trap nexus. Hence, the study recommended that devising and implementing sustainable land management policy options would enhance ecological restoration and livelihood sustainability in the study watershed.

  2. Generation of 2D Land Cover Maps for Urban Areas Using Decision Tree Classification

    NASA Astrophysics Data System (ADS)

    Höhle, J.

    2014-09-01

    A 2D land cover map can automatically and efficiently be generated from high-resolution multispectral aerial images. First, a digital surface model is produced and each cell of the elevation model is then supplemented with attributes. A decision tree classification is applied to extract map objects like buildings, roads, grassland, trees, hedges, and walls from such an "intelligent" point cloud. The decision tree is derived from training areas whose borders are digitized on top of a false-colour orthoimage. The produced 2D land cover map with six classes is then refined by using image analysis techniques. The proposed methodology is described step by step. The classification, assessment, and refinement are carried out by the open source software "R"; the generation of the dense and accurate digital surface model by the "Match-T DSM" program of the Trimble Company. A practical example of a 2D land cover map generation is carried out. Images of a multispectral medium-format aerial camera covering an urban area in Switzerland are used. The assessment of the produced land cover map is based on class-wise stratified sampling where reference values of samples are determined by means of stereo-observations of false-colour stereopairs. The stratified statistical assessment of the produced land cover map with six classes and based on 91 points per class reveals a high thematic accuracy for classes "building" (99 %, 95 % CI: 95 %-100 %) and "road and parking lot" (90 %, 95 % CI: 83 %-95 %). Some other accuracy measures (overall accuracy, kappa value) and their 95 % confidence intervals are derived as well. The proposed methodology has a high potential for automation and fast processing and may be applied to other scenes and sensors.

  3. CorRECTreatment: A Web-based Decision Support Tool for Rectal Cancer Treatment that Uses the Analytic Hierarchy Process and Decision Tree

    PubMed Central

    Karakülah, G.; Dicle, O.; Sökmen, S.; Çelikoğlu, C.C.

    2015-01-01

    Summary. Background: The selection of appropriate rectal cancer treatment is a complex multi-criteria decision making process, in which clinical decision support systems might be used to assist and enrich physicians’ decision making. Objective: The objective of the study was to develop a web-based clinical decision support tool for physicians in the selection of potentially beneficial treatment options for patients with rectal cancer. Methods: The updated decision model contained 8 and 10 criteria in the first and second steps respectively. The decision support model, developed in our previous study by combining the Analytic Hierarchy Process (AHP) method, which determines the priority of criteria, with a decision tree formed using these priorities, was updated and applied to data from 388 patients collected retrospectively. Later, a web-based decision support tool named corRECTreatment was developed. The compatibility of the treatment recommendations by the expert opinion and the decision support tool was examined for its consistency. Two surgeons were requested to recommend a treatment and an overall survival value for the treatment among 20 different cases that we selected and turned into a scenario among the most common and rare treatment options in the patient data set. Results: In the AHP analyses of the criteria, it was found that the matrices, generated for both decision steps, were consistent (consistency ratio<0.1). Depending on the decisions of experts, the consistency value for the most frequent cases was found to be 80% for the first decision step and 100% for the second decision step. Similarly, for rare cases consistency was 50% for the first decision step and 80% for the second decision step. Conclusions: The decision model and corRECTreatment, developed by applying these on real patient data, are expected to provide potential users with decision support in rectal cancer treatment processes and facilitate them in making projections about treatment options
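
    The consistency check cited in this record (consistency ratio < 0.1) can be sketched as follows. This is an illustrative calculation, not the authors' implementation: the function names and example matrices are hypothetical, the principal eigenvalue is estimated by simple power iteration, and the random-index table holds the values commonly attributed to Saaty.

```python
# Commonly cited random consistency index values for matrix sizes 3 to 10.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
                7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def principal_eigenvalue(matrix, iterations=100):
    """Estimate the largest eigenvalue and priority vector by power iteration."""
    n = len(matrix)
    weights = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        lam = sum(product)  # weights sum to 1, so this estimates lambda_max
        weights = [value / lam for value in product]
    return lam, weights


def consistency_ratio(matrix):
    """CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    lam, _ = principal_eigenvalue(matrix)
    consistency_index = (lam - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# Hypothetical pairwise comparison matrices (not from the study).
consistent = [[1, 1 / 2, 1 / 4],
              [2, 1, 1 / 2],
              [4, 2, 1]]
inconsistent = [[1, 9, 1 / 9],
                [1 / 9, 1, 9],
                [9, 1 / 9, 1]]

for name, matrix in (("consistent", consistent), ("inconsistent", inconsistent)):
    cr = consistency_ratio(matrix)
    print(name, round(cr, 3), "acceptable" if cr < 0.1 else "revise judgements")
```

    The normalised weight vector returned by `principal_eigenvalue` is the criterion priority list that AHP would pass on to later steps, such as ordering the tests in a decision tree.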

  4. Trees

    ERIC Educational Resources Information Center

    Al-Khaja, Nawal

    2007-01-01

    This is a thematic lesson plan for young learners about palm trees and the importance of taking care of them. The two part lesson teaches listening, reading and speaking skills. The lesson includes parts of a tree; the modal auxiliary, can; dialogues and a role play activity.

  5. Structured Learning of Tree Potentials in CRF for Image Segmentation.

    PubMed

    Liu, Fayao; Lin, Guosheng; Qiao, Ruizhi; Shen, Chunhua

    2017-04-13

    We propose a new approach to image segmentation, which exploits the advantages of both conditional random fields (CRFs) and decision trees. In the literature, the potential functions of CRFs are mostly defined as a linear combination of some predefined parametric models, and then, methods, such as structured support vector machines, are applied to learn those linear coefficients. We instead formulate the unary and pairwise potentials as nonparametric forests--ensembles of decision trees, and learn the ensemble parameters and the trees in a unified optimization problem within the large-margin framework. In this fashion, we easily achieve nonlinear learning of potential functions on both unary and pairwise terms in CRFs. Moreover, we learn classwise decision trees for each object that appears in the image. Experimental results on several public segmentation data sets demonstrate the power of the learned nonlinear nonparametric potentials.

  6. Evaluation with Decision Trees of Efficacy and Safety of Semirigid Ureteroscopy in the Treatment of Proximal Ureteral Calculi.

    PubMed

    Sancak, Eyup Burak; Kılınç, Muhammet Fatih; Yücebaş, Sait Can

    2017-05-05

    The decision on the choice of proximal ureteral stone therapy depends on many factors, and sometimes urologists have difficulty in choosing the treatment option. This study is aimed at evaluating the factors affecting the success of semirigid ureterorenoscopy (URS) using the "decision tree" method. From January 2005 to November 2015, the data of consecutive patients treated for proximal ureteral stone were retrospectively analyzed. A total of 920 patients with proximal ureteral stone treated with semirigid URS were included in the study. All statistically significant attributes were tested using the decision tree method. The model created using decision tree had a sensitivity of 0.993 and an accuracy of 0.857. While URS treatment was successful in 752 patients (81.7%), it was unsuccessful in 168 patients (18.3%). According to the decision tree method, the most important factor affecting the success of URS is whether the stone is impacted to the ureteral wall. The second most important factor affecting treatment was intramural stricture requiring dilatation if the stone is impacted, and the size of the stone if not impacted. Our study suggests that the impacted stone, intramural stricture requiring dilatation and stone size may have a significant effect on the success rate of semirigid URS for proximal ureteral stone. Further studies with population-based and longitudinal design should be conducted to confirm this finding. © 2017 S. Karger AG, Basel.

  7. An efficient ensemble learning method for gene microarray classification.

    PubMed

    Osareh, Alireza; Shadgar, Bita

    2013-01-01

    Gene microarray analysis and classification have proven to be an effective route to the diagnosis of diseases and cancers. However, it has also been shown that basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using the RotBoost ensemble methodology. This method is a combination of the Rotation Forest and AdaBoost techniques, which in turn preserves both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results have revealed that the combination of the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.

  8. Induction of decision trees and Bayesian classification applied to diagnosis of sport injuries.

    PubMed

    Zelic, I; Kononenko, I; Lavrac, N; Vuga, V

    1997-12-01

    Machine learning techniques can be used to extract knowledge from data stored in medical databases. In our application, various machine learning algorithms were used to extract diagnostic knowledge which may be used to support the diagnosis of sport injuries. The applied methods include variants of the Assistant algorithm for top-down induction of decision trees, and variants of the Bayesian classifier. The available dataset was insufficient for reliable diagnosis of all sport injuries considered by the system. Consequently, expert-defined diagnostic rules were added and used as pre-classifiers or as generators of additional training instances for diagnoses for which only few training examples were available. Experimental results show that the classification accuracy and the explanation capability of the naive Bayesian classifier with the fuzzy discretization of numerical attributes were superior to other methods and estimated as the most appropriate for practical use.

  9. Comparative Analysis of Decision Trees with Logistic Regression in Predicting Fault-Prone Classes

    NASA Astrophysics Data System (ADS)

    Singh, Yogesh; Takkar, Arvinder Kaur; Malhotra, Ruchika

    Metrics are available for predicting fault-prone classes, which may help software organizations plan and perform testing activities. This is possible through proper allocation of resources to the fault-prone parts of the design and code of the software. Hence, the importance and usefulness of such metrics is understandable, but empirical validation of these metrics is always a great challenge. Decision Tree (DT) methods have been successfully applied for solving classification problems in many applications. This paper evaluates the capability of three DT methods and compares their performance with a statistical method in predicting fault-prone software classes using a publicly available NASA data set. The results indicate that the prediction performance of DT is generally better than that of the statistical model. However, similar studies need to be carried out in order to establish the acceptability of the DT models.

  10. Independent component analysis and decision trees for ECG holter recording de-noising.

    PubMed

    Kuzilek, Jakub; Kremen, Vaclav; Soucek, Filip; Lhotska, Lenka

    2014-01-01

    We have developed a method focusing on ECG signal de-noising using Independent component analysis (ICA). This approach combines JADE source separation and a binary decision tree for identification and subsequent removal of ECG noise. In order to test the efficiency of this method in comparison to standard filtering, a wavelet-based de-noising method was used. Freely available data from the Physionet medical data storage were evaluated. The evaluation criterion was the root mean square error (RMSE) between the original ECG and the filtered data contaminated with artificial noise. The proposed algorithm achieved comparable results for standard noises (power line interference, base line wander, EMG), but significantly better results were achieved when uncommon noise (electrode cable movement artefact) was compared.
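
    The evaluation criterion named in the record above, RMSE between the original ECG and the filtered signal, can be sketched as follows. This is a minimal illustration only; the function name and the toy sample values are assumptions and do not come from the study or from Physionet.

```python
import math


def rmse(reference, estimate):
    """Root mean square error between two equal-length signals."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    if not reference:
        raise ValueError("signals must be non-empty")
    squared = [(r - e) ** 2 for r, e in zip(reference, estimate)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy samples standing in for a clean and a de-noised ECG trace.
clean = [0.0, 1.0, 0.0, -1.0]
denoised = [0.1, 0.9, 0.0, -1.2]
print(rmse(clean, denoised))
```

    A lower value means the filtered signal is closer to the uncontaminated original, which is how two de-noising methods would be ranked under this criterion.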

  11. Multi-Output Decision Trees for Lesion Segmentation in Multiple Sclerosis.

    PubMed

    Jog, Amod; Carass, Aaron; Pham, Dzung L; Prince, Jerry L

    2015-02-01

    Multiple Sclerosis (MS) is a disease of the central nervous system in which the protective myelin sheath of the neurons is damaged. MS leads to the formation of lesions, predominantly in the white matter of the brain and the spinal cord. The number and volume of lesions visible in magnetic resonance (MR) imaging (MRI) are important criteria for diagnosing and tracking the progression of MS. Locating and delineating lesions manually requires the tedious and expensive efforts of highly trained raters. In this paper, we propose an automated algorithm to segment lesions in MR images using multi-output decision trees. We evaluated our algorithm on the publicly available MICCAI 2008 MS Lesion Segmentation Challenge training dataset of 20 subjects, and showed improved results in comparison to state-of-the-art methods. We also evaluated our algorithm on an in-house dataset of 49 subjects with a true positive rate of 0.41 and a positive predictive value 0.36.
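
    The two figures quoted at the end of the record above, true positive rate and positive predictive value, follow from counts of true positives, false positives, and false negatives. The sketch below assumes a per-element comparison of binary masks; whether the study scored voxels or whole lesions is not stated in the abstract, and the function name and toy masks are hypothetical.

```python
def tpr_ppv(predicted, truth):
    """Return (true positive rate, positive predictive value) for binary masks."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # share of true lesion elements found
    ppv = tp / (tp + fp) if tp + fp else 0.0  # share of detections that are correct
    return tpr, ppv


# Hypothetical flattened masks: 1 marks a lesion element, 0 marks background.
predicted_mask = [1, 1, 0, 0, 1]
truth_mask = [1, 0, 1, 0, 1]
print(tpr_ppv(predicted_mask, truth_mask))
```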

  12. Multi-output decision trees for lesion segmentation in multiple sclerosis

    NASA Astrophysics Data System (ADS)

    Jog, Amod; Carass, Aaron; Pham, Dzung L.; Prince, Jerry L.

    2015-03-01

    Multiple Sclerosis (MS) is a disease of the central nervous system in which the protective myelin sheath of the neurons is damaged. MS leads to the formation of lesions, predominantly in the white matter of the brain and the spinal cord. The number and volume of lesions visible in magnetic resonance (MR) imaging (MRI) are important criteria for diagnosing and tracking the progression of MS. Locating and delineating lesions manually requires the tedious and expensive efforts of highly trained raters. In this paper, we propose an automated algorithm to segment lesions in MR images using multi-output decision trees. We evaluated our algorithm on the publicly available MICCAI 2008 MS Lesion Segmentation Challenge training dataset of 20 subjects, and showed improved results in comparison to state-of-the-art methods. We also evaluated our algorithm on an in-house dataset of 49 subjects with a true positive rate of 0.41 and a positive predictive value 0.36.

  13. Decision-Tree-based data mining and rule induction for predicting and mapping soil bacterial diversity.

    PubMed

    Kim, Kangsuk; Yoo, Keunje; Ki, Dongwon; Son, Il Suh; Oh, Kyong Joo; Park, Joonhong

    2011-07-01

    Soil microbial ecology plays a significant role in global ecosystems. Nevertheless, methods of model prediction and mapping have yet to be established for soil microbial ecology. The present study was undertaken to develop an artificial-intelligence- and geographical information system (GIS)-integrated framework for predicting and mapping soil bacterial diversity using pre-existing environmental geospatial database information, and to further evaluate the applicability of soil bacterial diversity mapping for planning construction of eco-friendly roads. Using stratified random sampling, soil bacterial diversity was measured in 196 soil samples in a forest area where construction of an eco-friendly road was planned. Model accuracy, coherence analyses, and tree analysis were systematically performed, and a four-class discretized decision tree (DT) with ordinary pair-wise partitioning (OPP) was selected as the optimal model among the five DT model variants tested. GIS-based simulations of the optimal DT model with varying weights assigned to soil ecological quality showed that the inclusion of soil ecology in environmental components, which are considered in environmental impact assessment, significantly affects the spatial distributions of overall environmental quality values as well as the determination of an environmentally optimized road route. This work suggests a guideline for using systematic accuracy, coherence, and tree analyses in selecting an optimal DT model from multiple candidate model variants, and demonstrates the applicability of the OPP-improved DT integrated with GIS in rule induction for mapping bacterial diversity. These findings also have implications for the significance of soil microbial ecology in environmental impact assessment and eco-friendly construction planning.

  14. Determining Optimal Route of Hysterectomy for Benign Indications: Clinical Decision Tree Algorithm.

    PubMed

    Schmitt, Jennifer J; Carranza Leon, Daniel A; Occhino, John A; Weaver, Amy L; Dowdy, Sean C; Bakkum-Gamez, Jamie N; Pasupathy, Kalyan S; Gebhart, John B

    2017-01-01

    To evaluate practice change after initiation of a robotic surgery program using a clinical algorithm to determine the optimal surgical approach to benign hysterectomy. A retrospective postrobot cohort of benign hysterectomies (2009-2013) was identified and the expected surgical route was determined from an algorithm using vaginal access and uterine size as decision tree branches. We excluded the laparoscopic hysterectomy route. A prerobot cohort (2004-2005) was used to evaluate a practice change after the addition of robotic technology (2007). Costs were estimated. Cohorts were similar in regard to uterine size, vaginal parity, and prior laparotomy history. In the prerobot cohort (n=473), 320 hysterectomies (67.7%) were performed vaginally and 153 (32.3%) through laparotomy with 15.1% (46/305) performed abdominally when the algorithm specified vaginal hysterectomy. In the postrobot cohort (n=1,198), 672 hysterectomies (56.1%) were vaginal; 390 (32.6%) robot-assisted; and 136 (11.4%) abdominal. Of 743 procedures, 38 (5.1%) involved laparotomy and 154 (20.7%) involved robotic technique when a vaginal approach was expected. Robotic hysterectomies had longer operations (141 compared with 59 minutes, P<.001) and higher rates of surgical site infection (4.7% compared with 0.2%, P<.001) and urinary tract infection (8.1% compared with 4.1%, P=.05) but no difference in major complications (P=.27) or readmissions (P=.27) compared with vaginal hysterectomy. Algorithm conformance would have saved an estimated $800,000 in hospital costs over 5 years. When a decision tree algorithm indicated vaginal hysterectomy as the route of choice, vaginal hysterectomy was associated with shorter operative times, lower infection rate, and lower cost. Vaginal hysterectomy should be the route of choice when feasible.

  15. Decision trees for the analysis of genes involved in Alzheimer's disease pathology.

    PubMed

    Mestizo Gutiérrez, Sonia L; Herrera Rivero, Marisol; Cruz Ramírez, Nicandro; Hernández, Elena; Aranda-Abreu, Gonzalo E

    2014-09-21

    Alzheimer's disease (AD) is characterized by a gradual loss of memory, orientation, judgement and language. There is still no cure for this disorder. AD pathogenesis remains fairly unknown and its underlying molecular mechanisms are not yet fully understood. Several studies have shown that the abnormal accumulation of beta-amyloid and tau proteins occurs 10 to 20 years before the onset of symptoms of the disease, so it is extremely important to identify changes in the brain before the first symptoms. We used decision trees to classify 31 individuals (9 healthy controls and 22 AD patients in three different stages of disease) according to the expression of 69 genes previously reported in a meta-analysis, plus the expression levels of APP, APOE, BACE1, NCSTN, PSEN1, PSEN2 and MAPT. We also included in our analysis the MMSE (Mini-Mental State Examination) scores and number of NFT (neurofibrillary tangles). Results allowed us to generate a model of classification values for different AD stages of severity, according to MMSE scores, and to identify the expression level of protein tau that may possibly determine the onset (incipient stage) of AD. We used decision trees to model the different stages of AD (severe, moderate, incipient and control) based on the meta-analysis of gene expression levels plus MMSE and NFT scores. Both classifiers reported the variable MMSE as most informative; however, we found that the protein tau also plays an important role in the onset of AD. Copyright © 2014 Elsevier Ltd. All rights reserved.

  16. Decision tree structure based classification of EEG signals recorded during two dimensional cursor movement imagery.

    PubMed

    Aydemir, Onder; Kayikcioglu, Temel

    2014-05-30

    Input signals of an EEG based brain computer interface (BCI) system are naturally non-stationary, have poor signal to noise ratio, depend on physical or mental tasks and are contaminated with various artifacts such as external electromagnetic waves, electromyogram and electrooculogram. All these disadvantages have motivated researchers to substantially improve speed and accuracy of all components of the communication system between brain and a BCI output device. In this study, a fast and accurate decision tree structure (DTS) based classification method was proposed for classifying EEG data recorded during up/down/right/left computer cursor movement imagery. The data sets were acquired from three healthy human subjects aged between 24 and 29 years in two sessions on different days. The proposed decision tree structure based method was successfully applied to the present data sets and achieved 55.92%, 57.90% and 82.24% classification accuracy rate on the test data of three subjects. The results indicated that the proposed method provided 12.25% improvement over the best results of the most closely related studies although the EEG signals were collected in two different sessions about 1 week apart. The proposed method required only a training set of the subject and automatically generated a specific DTS for each new subject by determining the most appropriate feature set and classifier for each node. Additionally, with further developments of feature extraction and/or classification algorithms, any existing node can be easily replaced with a new one without breaking the whole DTS. This attribute makes the proposed method flexible. Copyright © 2014 Elsevier B.V. All rights reserved.

  17. Identification of Water Bodies in a Landsat 8 OLI Image Using a J48 Decision Tree

    PubMed Central

    Acharya, Tri Dev; Lee, Dong Ha; Yang, In Tae; Lee, Jae Kang

    2016-01-01

    Water bodies are essential to humans and other forms of life. Identification of water bodies can be useful in various ways, including estimation of water availability, demarcation of flooded regions, change detection, and so on. In past decades, Landsat satellite sensors have been used for land use classification and water body identification. Due to the introduction of a New Operational Land Imager (OLI) sensor on Landsat 8 with a high spectral resolution and improved signal-to-noise ratio, the quality of imagery sensed by Landsat 8 has improved, enabling better characterization of land cover and increased data size. Therefore, it is necessary to explore the most appropriate and practical water identification methods that take advantage of the improved image quality and use the fewest inputs based on the original OLI bands. The objective of the study is to explore the potential of a J48 decision tree (JDT) in identifying water bodies using reflectance bands from Landsat 8 OLI imagery. J48 is an open-source decision tree. The test site for the study is in the Northern Han River Basin, which is located in Gangwon province, Korea. Training data with individual bands were used to develop the JDT model and later applied to the whole study area. The performance of the model was statistically analysed using the kappa statistic and area under the curve (AUC). The results were compared with five other known water identification methods using a confusion matrix and related statistics. Almost all the methods showed high accuracy, and the JDT was successfully applied to the OLI image using only four bands, where the new additional deep blue band of OLI was found to have the third highest information gain. Thus, the JDT can be a good method for water body identification based on images with improved resolution and increased size. PMID:27420067
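
    The record above ranks OLI bands by information gain. As a rough illustration of that quantity, the sketch below computes the entropy reduction from a single threshold split on one band. It uses plain information gain on made-up reflectance values; the function names, the threshold, and the samples are assumptions, and the sketch does not reproduce the J48 implementation or the study's data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting samples at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


# Hypothetical reflectance values for one band and their land-cover labels.
band = [0.02, 0.03, 0.04, 0.20, 0.25, 0.30]
cover = ["water", "water", "water", "land", "land", "land"]
print(information_gain(band, cover, threshold=0.10))
```

    Repeating the calculation for each band and sorting the results would give a ranking of the kind the abstract describes.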

  18. Bayesian decision tree for the classification of the mode of motion in single-molecule trajectories.

    PubMed

    Türkcan, Silvan; Masson, Jean-Baptiste

    2013-01-01

    Membrane proteins move in heterogeneous environments with spatially (sometimes temporally) varying friction and with biochemical interactions with various partners. It is important to reliably distinguish different modes of motion to improve our knowledge of the membrane architecture and to understand the nature of interactions between membrane proteins and their environments. Here, we present an analysis technique for single molecule tracking (SMT) trajectories that can determine the preferred model of motion that best matches observed trajectories. The method is based on Bayesian inference to calculate the posterior probability of an observed trajectory according to a certain model. Information theory criteria, such as the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and modified AIC (AICc), are used to select the preferred model. The considered group of models includes free Brownian motion, and confined motion in 2nd or 4th order potentials. We determine the best information criteria for classifying trajectories. We tested its limits through simulations matching large sets of experimental conditions and we built a decision tree. This decision tree first uses the BIC to distinguish between free Brownian motion and confined motion. In a second step, it classifies the confining potential further using the AIC. We apply the method to experimental Clostridium perfringens ε-toxin (CPεT) receptor trajectories to show that these receptors are confined by a spring-like potential. An adaptation of this technique was applied on a sliding window in the temporal dimension along the trajectory. We applied this adaptation to experimental CPεT trajectories that lose confinement due to disaggregation of confining domains. This new technique adds another dimension to the discussion of SMT data. The mode of motion of a receptor might hold more biologically relevant information than the diffusion

  19. Bayesian Decision Tree for the Classification of the Mode of Motion in Single-Molecule Trajectories

    PubMed Central

    Türkcan, Silvan; Masson, Jean-Baptiste

    2013-01-01

    Membrane proteins move in heterogeneous environments with spatially (sometimes temporally) varying friction and with biochemical interactions with various partners. It is important to reliably distinguish different modes of motion to improve our knowledge of the membrane architecture and to understand the nature of interactions between membrane proteins and their environments. Here, we present an analysis technique for single molecule tracking (SMT) trajectories that can determine the preferred model of motion that best matches observed trajectories. The method is based on Bayesian inference to calculate the posterior probability of an observed trajectory according to a certain model. Information theory criteria, such as the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and modified AIC (AICc), are used to select the preferred model. The considered group of models includes free Brownian motion, and confined motion in 2nd or 4th order potentials. We determine the best information criteria for classifying trajectories. We tested its limits through simulations matching large sets of experimental conditions and we built a decision tree. This decision tree first uses the BIC to distinguish between free Brownian motion and confined motion. In a second step, it classifies the confining potential further using the AIC. We apply the method to experimental Clostridium perfringens ε-toxin (CPεT) receptor trajectories to show that these receptors are confined by a spring-like potential. An adaptation of this technique was applied on a sliding window in the temporal dimension along the trajectory. We applied this adaptation to experimental CPεT trajectories that lose confinement due to disaggregation of confining domains. This new technique adds another dimension to the discussion of SMT data. The mode of motion of a receptor might hold more biologically relevant information than the diffusion coefficient or domain size and may be a better tool to
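
    The two-step decision described in the record above (BIC first to separate free from confined motion, then AIC to choose the confining potential) might be sketched as below. The standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L are used; the model names, the parameter counts, the log-likelihood values, and the rule "confined if any confined model has a lower BIC than the free model" are all assumptions made for illustration, since the abstract does not give those details.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_points):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_points) - 2 * log_likelihood


def classify_motion(fits, n_points):
    """fits maps a model name to (maximised log-likelihood, parameter count)."""
    confined = ["harmonic", "quartic"]  # hypothetical names for 2nd and 4th order
    # Step 1: BIC decides between free Brownian motion and confined motion.
    free_bic = bic(*fits["free"], n_points)
    best_confined_bic = min(bic(*fits[m], n_points) for m in confined)
    if free_bic <= best_confined_bic:
        return "free"
    # Step 2: AIC picks the confining potential.
    return min(confined, key=lambda m: aic(*fits[m]))


# Hypothetical fit results for one trajectory of 200 points.
fits = {"free": (-520.0, 1), "harmonic": (-480.0, 2), "quartic": (-479.5, 3)}
print(classify_motion(fits, n_points=200))
```

    Lower criterion values are preferred in both steps; BIC penalises extra parameters more heavily than AIC once the trajectory has more than a handful of points.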

  20. A novel decision-tree method for structured continuous-label classification.

    PubMed

    Hu, Hsiao-Wei; Chen, Yen-Liang; Tang, Kwei

    2013-12-01

    Structured continuous-label classification is a variety of classification in which the label is continuous in the data, but the goal is to classify data into classes that are a set of predefined ranges and can be organized in a hierarchy. In the hierarchy, the ranges at the lower levels are more specific and inherently more difficult to predict, whereas the ranges at the upper levels are less specific and inherently easier to predict. Therefore, both prediction specificity and prediction accuracy must be considered when building a decision tree (DT) from this kind of data. This paper proposes a novel classification algorithm for learning DT classifiers from data with structured continuous labels. This approach considers the distribution of labels throughout the hierarchical structure during the construction of trees without requiring discretization in the preprocessing stage. We compared the results of the proposed method with those of the C4.5 algorithm using eight real data sets. The empirical results indicate that the proposed method outperforms the C4.5 algorithm with regard to prediction accuracy, prediction specificity, and computational complexity.

  1. Determining Cutoff Point of Ensemble Trees Based on Sample Size in Predicting Clinical Dose with DNA Microarray Data

    PubMed Central

    Karabulut, Erdem; Alpar, Celal Reha

    2016-01-01

    Background/Aim. Evaluating the success of dose prediction based on genetic or clinical data has substantially advanced recently. The aim of this study is to predict various clinical dose values from DNA gene expression datasets using data mining techniques. Materials and Methods. Eleven real gene expression datasets containing dose values were included. First, important genes for dose prediction were selected using iterative sure independence screening. Then, the performances of regression trees (RTs), support vector regression (SVR), RT bagging, SVR bagging, and RT boosting were examined. Results. The results demonstrated that a regression-based feature selection method substantially reduced the number of irrelevant genes from raw datasets. Overall, the best prediction performance in nine of 11 datasets was achieved using SVR; the second most accurate performance was provided using a gradient-boosting machine (GBM). Conclusion. Analysis of various dose values based on microarray gene expression data identified common genes found in our study and the referenced studies. According to our findings, SVR and GBM can be good predictors of dose-gene datasets. Another result of the study was to identify the sample size of n = 25 as a cutoff point for RT bagging to outperform a single RT. PMID:28096893
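
    The comparison in the record above between a single regression tree and RT bagging can be illustrated with a minimal sketch. It bags depth-one regression trees (stumps) on one feature, which is a deliberate simplification; the function names, the number of trees, the seed, and the toy data are assumptions and do not reproduce the study's models or datasets.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree by minimising squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the maximum so both sides are non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: predict the mean
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def fit_bagged(xs, ys, n_trees=50, seed=0):
    """Average stumps fitted on bootstrap resamples of the training data."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(tree(x) for tree in trees) / len(trees)


# Hypothetical expression level (x) and dose (y) pairs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
single, bagged = fit_stump(xs, ys), fit_bagged(xs, ys)
print(single(2.0), bagged(2.0))
```

    With very few samples the bootstrap resamples overlap heavily, so the averaged trees differ little from one another; this is one plausible reading of why a minimum sample size matters for bagging to help.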

  2. Using Ensemble Decisions and Active Selection to Improve Low-Cost Labeling for Multi-View Data

    NASA Technical Reports Server (NTRS)

    Rebbapragada, Umaa; Wagstaff, Kiri L.

    2011-01-01

    This paper seeks to improve low-cost labeling in terms of training set reliability (the fraction of correctly labeled training items) and test set performance for multi-view learning methods. Co-training is a popular multiview learning method that combines high-confidence example selection with low-cost (self) labeling. However, co-training with certain base learning algorithms significantly reduces training set reliability, causing an associated drop in prediction accuracy. We propose the use of ensemble labeling to improve reliability in such cases. We also discuss and show promising results on combining low-cost ensemble labeling with active (low-confidence) example selection. We unify these example selection and labeling strategies under collaborative learning, a family of techniques for multi-view learning that we are developing for distributed, sensor-network environments.

  3. Using Ensemble Decisions and Active Selection to Improve Low-Cost Labeling for Multi-View Data

    NASA Technical Reports Server (NTRS)

    Rebbapragada, Umaa; Wagstaff, Kiri L.

    2011-01-01

    This paper seeks to improve low-cost labeling in terms of training set reliability (the fraction of correctly labeled training items) and test set performance for multi-view learning methods. Co-training is a popular multiview learning method that combines high-confidence example selection with low-cost (self) labeling. However, co-training with certain base learning algorithms significantly reduces training set reliability, causing an associated drop in prediction accuracy. We propose the use of ensemble labeling to improve reliability in such cases. We also discuss and show promising results on combining low-cost ensemble labeling with active (low-confidence) example selection. We unify these example selection and labeling strategies under collaborative learning, a family of techniques for multi-view learning that we are developing for distributed, sensor-network environments.

  4. Ensemble-based analysis of Front Range severe convection on 6-7 June 2012: Forecast uncertainty and communication of weather information to Front Range decision-makers

    NASA Astrophysics Data System (ADS)

    Vincente, Vanessa

    -allowing ensemble also showed greater skill in forecasting heavy precipitation amounts in the vicinity of where they were observed during the most active convective period, particularly near urbanized areas. A total of 9 Front Range EMs were interviewed to research how they understood hazardous weather information, and how their perception of forecast uncertainty would influence their decision making following a heavy rain event. Many of the EMs use situational awareness and past experiences with major weather events to guide their emergency planning. They also highly valued their relationship with the National Weather Service to improve their understanding of weather forecasts and ask questions about the uncertainties. Most of the EMs perceived forecast uncertainty in terms of probability and with the understanding that forecasting the weather is an imprecise science. A greater likelihood of occurrence (implied by a higher probability of precipitation) corresponded to greater confidence in the forecast that an event was likely to happen. Five probabilistic forecast products were generated from the convection-allowing ensemble output to create a hypothetical warm season heavy rain event scenario. The EMs varied in which products they found most practical or least useful. Most EMs believed that there was a high probability of flooding, as illustrated by the degree of forecasted precipitation intensity. Most confirmed perceiving uncertainty in the different forecast representations, sharing the idea that there is an inherent uncertainty that follows modeled forecasts. The long-term goal of this research is to develop and add reliable probabilistic forecast products to the "toolbox" of decision-makers to help them better assess hazardous weather information and improve warning notifications and response.

  5. A data mining approach to optimize pellets manufacturing process based on a decision tree algorithm.

    PubMed

    Ronowicz, Joanna; Thommes, Markus; Kleinebudde, Peter; Krysiński, Jerzy

    2015-06-20

    The present study is focused on the thorough analysis of cause-effect relationships between pellet formulation characteristics (pellet composition as well as process parameters) and the selected quality attribute of the final product. Pellet quality was expressed by shape, using the aspect ratio value. A data matrix for chemometric analysis consisted of 224 pellet formulations prepared with eight different active pharmaceutical ingredients and several excipients, using different extrusion/spheronization process conditions. The data set contained 14 input variables (both formulation and process variables) and one output variable (pellet aspect ratio). A tree regression algorithm consistent with the Quality by Design concept was applied to obtain deeper understanding and knowledge of formulation and process parameters affecting the final pellet sphericity. A clear, interpretable set of decision rules was generated. The spheronization speed, spheronization time, number of holes and water content of extrudate were recognized as the key factors influencing pellet aspect ratio. The most spherical pellets were achieved by using a large number of holes during extrusion, a high spheronizer speed and a longer spheronization time. The described data mining approach enhances knowledge about the pelletization process and simultaneously facilitates searching for the optimal process conditions which are necessary to achieve ideal spherical pellets, resulting in good flow characteristics. This data mining approach can be taken into consideration by industrial formulation scientists to support rational decision making in the field of pellets technology. Copyright © 2015 Elsevier B.V. All rights reserved.

  6. Construction and validation of a decision tree for treating metabolic acidosis in calves with neonatal diarrhea

    PubMed Central

    2012-01-01

    Background The aim of the present prospective study was to investigate whether a decision tree based on basic clinical signs could be used to determine the treatment of metabolic acidosis in calves successfully without expensive laboratory equipment. A total of 121 calves with a diagnosis of neonatal diarrhea admitted to a veterinary teaching hospital were included in the study. The dosages of sodium bicarbonate administered followed simple guidelines based on the results of a previous retrospective analysis. Calves that were neither dehydrated nor assumed to be acidemic received an oral electrolyte solution. In cases in which intravenous correction of acidosis and/or dehydration was deemed necessary, the provided amount of sodium bicarbonate ranged from 250 to 750 mmol (depending on alterations in posture) and infusion volumes from 1 to 6.25 liters (depending on the degree of dehydration). Individual body weights of calves were disregarded. During the 24 hour study period the investigator was blinded to all laboratory findings. Results After being lifted, many calves were able to stand despite base excess levels below −20 mmol/l. Especially in those calves, metabolic acidosis was undercorrected with the provided amount of 500 mmol sodium bicarbonate, which was intended for calves standing insecurely. In 13 calves metabolic acidosis was not treated successfully as defined by an expected treatment failure or a measured base excess value below −5 mmol/l. By contrast, 24 hours after the initiation of therapy, a metabolic alkalosis was present in 55 calves (base excess levels above +5 mmol/l). However, the clinical status was not affected significantly by the metabolic alkalosis. Conclusions Assuming re-evaluation of the calf after 24 hours, the tested decision tree can be recommended for the use in field practice with minor modifications. Calves that stand insecurely and are not able to correct their position if pushed require higher doses of

  7. VR-BFDT: A variance reduction based binary fuzzy decision tree induction method for protein function prediction.

    PubMed

    Golzari, Fahimeh; Jalili, Saeed

    2015-07-21

    In the protein function prediction (PFP) problem, the goal is to predict the function of numerous well-sequenced known proteins whose function is not yet known precisely. PFP is a special and complex problem in the machine learning domain in which a protein (regarded as an instance) may have more than one function simultaneously. Furthermore, the functions (regarded as classes) are dependent and are organized in a hierarchical structure in the form of a tree or directed acyclic graph. One of the common learning methods proposed for solving this problem is decision trees; because they partition the data into sets with sharp boundaries, however, small changes in the attribute values of a new instance may wrongly change its predicted label and lead to misclassification. In this paper, a Variance Reduction based Binary Fuzzy Decision Tree (VR-BFDT) algorithm is proposed to predict the functions of proteins. This algorithm fuzzifies only the decision boundaries instead of converting the numeric attributes into fuzzy linguistic terms. It can assign multiple functions to each protein simultaneously and preserves the hierarchy consistency between functional classes. It uses label variance reduction as the splitting criterion to select the best "attribute-value" at each node of the decision tree. The experimental results show that the overall performance of the proposed algorithm is promising.
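
    The variance-reduction splitting criterion named in this abstract can be illustrated with a minimal sketch. It assumes a single numeric attribute, one scalar label per instance, and a crisp threshold; VR-BFDT itself works on hierarchical multi-label data with fuzzified boundaries, which this sketch does not attempt. The function names and toy data are hypothetical.

```python
from statistics import pvariance

def variance_reduction(values, labels, threshold):
    """Drop in label variance obtained by splitting on values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty reduces nothing
    n = len(labels)
    weighted = (len(left) / n) * pvariance(left) + (len(right) / n) * pvariance(right)
    return pvariance(labels) - weighted

def best_threshold(values, labels):
    """Try every distinct attribute value except the largest as a threshold."""
    candidates = sorted(set(values))[:-1]
    return max(candidates, key=lambda t: variance_reduction(values, labels, t))

# Made-up attribute values and binary labels, for illustration only.
values = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
chosen = best_threshold(values, labels)
```

    On this toy data the chosen threshold separates the two label groups completely, so the reduction equals the full label variance.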

  8. A Decision-Tree-Oriented Guidance Mechanism for Conducting Nature Science Observation Activities in a Context-Aware Ubiquitous Learning

    ERIC Educational Resources Information Center

    Hwang, Gwo-Jen; Chu, Hui-Chun; Shih, Ju-Ling; Huang, Shu-Hsien; Tsai, Chin-Chung

    2010-01-01

    A context-aware ubiquitous learning environment is an authentic learning environment with personalized digital supports. While showing the potential of applying such a learning environment, researchers have also indicated the challenges of providing adaptive and dynamic support to individual students. In this paper, a decision-tree-oriented…

  9. Genetic Program Based Data Mining of Fuzzy Decision Trees and Methods of Improving Convergence and Reducing Bloat

    DTIC Science & Technology

    2007-04-01

    A data mining procedure for automatic determination of fuzzy decision tree structure using a genetic program (GP) is discussed. A GP is an algorithm...that evolves other algorithms or mathematical expressions. Innovative methods for accelerating convergence of the data mining procedure and reducing...Finally, additional methods that have been used to validate the data mining algorithm are referenced.

  10. Classification of Parkinsonian Syndromes from FDG-PET Brain Data Using Decision Trees with SSM/PCA Features

    PubMed Central

    Mudali, D.; Teune, L. K.; Renken, R. J.; Leenders, K. L.; Roerdink, J. B. T. M.

    2015-01-01

    Medical imaging techniques like fluorodeoxyglucose positron emission tomography (FDG-PET) have been used to aid in the differential diagnosis of neurodegenerative brain diseases. In this study, the objective is to classify FDG-PET brain scans of subjects with Parkinsonian syndromes (Parkinson's disease, multiple system atrophy, and progressive supranuclear palsy) compared to healthy controls. The scaled subprofile model/principal component analysis (SSM/PCA) method was applied to FDG-PET brain image data to obtain covariance patterns and corresponding subject scores. The latter were used as features for supervised classification by the C4.5 decision tree method. Leave-one-out cross validation was applied to determine classifier performance. We carried out a comparison with other types of classifiers. The big advantage of decision tree classification is that the results are easy to understand by humans. A visual representation of decision trees strongly supports the interpretation process, which is very important in the context of medical diagnosis. Further improvements are suggested based on enlarging the number of the training data, enhancing the decision tree method by bagging, and adding additional features based on (f)MRI data. PMID:25918550
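
    Leave-one-out cross validation, used above to estimate classifier performance, can be sketched as follows. A nearest-centroid classifier stands in for C4.5 to keep the example self-contained; the subject scores and labels are invented for illustration and are not taken from the study.

```python
def fit_centroids(X, y):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict_centroid(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sqdist(centroids[label]))

def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        model = fit_centroids(train_X, train_y)
        if predict_centroid(model, X[i]) == y[i]:
            correct += 1
    return correct / len(X)

# Hypothetical two-component subject scores with made-up class labels.
scores = [[-2.1, 0.3], [-1.8, -0.2], [-2.4, 0.1], [1.9, 0.4], [2.2, -0.1], [1.7, 0.2]]
labels = ["control", "control", "control", "patient", "patient", "patient"]
accuracy = leave_one_out_accuracy(scores, labels)
```

    Because every sample is held out exactly once, the estimate uses all the data for testing without ever testing on a training sample, which suits small imaging cohorts.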

  11. What Satisfies Students? Mining Student-Opinion Data with Regression and Decision-Tree Analysis. AIR 2002 Forum Paper.

    ERIC Educational Resources Information Center

    Thomas, Emily H.; Galambos, Nora

    To investigate how students' characteristics and experiences affect satisfaction, this study used regression and decision-tree analysis with the CHAID algorithm to analyze student opinion data from a sample of 1,783 college students. A data-mining approach identifies the specific aspects of students' university experience that most influence three…

  12. A Decision-Tree-Oriented Guidance Mechanism for Conducting Nature Science Observation Activities in a Context-Aware Ubiquitous Learning

    ERIC Educational Resources Information Center

    Hwang, Gwo-Jen; Chu, Hui-Chun; Shih, Ju-Ling; Huang, Shu-Hsien; Tsai, Chin-Chung

    2010-01-01

    A context-aware ubiquitous learning environment is an authentic learning environment with personalized digital supports. While showing the potential of applying such a learning environment, researchers have also indicated the challenges of providing adaptive and dynamic support to individual students. In this paper, a decision-tree-oriented…

  13. Visualization of spatial decision tree for predicting hotspot occurrence in land and forest in Rokan Hilir District Riau

    NASA Astrophysics Data System (ADS)

    Primajaya, Aji; Sukaesih Sitanggang, Imas; Syaufina, Lailan

    2017-01-01

    Visualization is an important issue in data mining because it makes patterns extracted from a dataset easier to understand. This research applied the Bottom-Up Approach method to develop a visualization module for a spatial decision tree in a geographic information system. The spatial data used in this work consist of nine explanatory layers and one target layer. The explanatory layers are maximum daily temperature, daily precipitation, wind speed, distance to the nearest river, distance to the nearest road, land cover, peatland type, peatland depth, and income source. The target layer contains hotspot and non-hotspot points that occurred in 2008. The result is a visualization module for the spatial decision tree with three main features: a mapping window, an interactive window, and tree node and tabular visualization for predicting hotspot occurrence.

  14. Classification and Progression Based on CFS-GA and C5.0 Boost Decision Tree of TCM Zheng in Chronic Hepatitis B.

    PubMed

    Chen, Xiao Yu; Ma, Li Zhuang; Chu, Na; Zhou, Min; Hu, Yiyang

    2013-01-01

    Chronic hepatitis B (CHB) is a serious public health problem, and Traditional Chinese Medicine (TCM) plays an important role in the control and treatment for CHB. In the treatment of TCM, zheng discrimination is the most important step. In this paper, an approach based on CFS-GA (Correlation based Feature Selection and Genetic Algorithm) and C5.0 boost decision tree is used for zheng classification and progression in the TCM treatment of CHB. The CFS-GA performs better than the typical method of CFS. By CFS-GA, the acquired attribute subset is classified by C5.0 boost decision tree for TCM zheng classification of CHB, and C5.0 decision tree outperforms two typical decision trees of NBTree and REPTree on CFS-GA, CFS, and nonselection in comparison. Based on the critical indicators from C5.0 decision tree, important lab indicators in zheng progression are obtained by the method of stepwise discriminant analysis for expressing TCM zhengs in CHB, and alterations of the important indicators are also analyzed in zheng progression. In conclusion, all the three decision trees perform better on CFS-GA than on CFS and nonselection, and C5.0 decision tree outperforms the two typical decision trees both on attribute selection and nonselection.

  15. Applying of Decision Tree Analysis to Risk Factors Associated with Pressure Ulcers in Long-Term Care Facilities

    PubMed Central

    Moon, Mikyung

    2017-01-01

    Objectives The purpose of this study was to use decision tree analysis to explore the factors associated with pressure ulcers (PUs) among elderly people admitted to Korean long-term care facilities. Methods The data were extracted from the 2014 National Inpatient Sample (NIS)—data of Health Insurance Review and Assessment Service (HIRA). A MapReduce-based program was implemented to join and filter 5 tables of the NIS. The outcome predicted by the decision tree model was the prevalence of PUs as defined by the Korean Standard Classification of Disease-7 (KCD-7; code L89*). Using R 3.3.1, a decision tree was generated with the finalized 15,856 cases and 830 variables. Results The decision tree displayed 15 subgroups with 8 variables showing 0.804 accuracy, 0.820 sensitivity, and 0.787 specificity. The most significant primary predictor of PUs was length of stay less than 0.5 day. Other predictors were the presence of an infectious wound dressing, followed by having diagnoses numbering less than 3.5 and the presence of a simple dressing. Among diagnoses, “injuries to the hip and thigh” was the top predictor ranking 5th overall. Total hospital cost exceeding 2,200,000 Korean won (US $2,000) rounded out the top 7. Conclusions These results support previous studies that showed length of stay, comorbidity, and total hospital cost were associated with PUs. Moreover, wound dressings were commonly used to treat PUs. They also show that machine learning, such as a decision tree, could effectively predict PUs using big data. PMID:28261530
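
    The accuracy, sensitivity, and specificity reported above are all derived from a confusion matrix. A minimal sketch of that arithmetic follows; the counts are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # share of true cases that were flagged
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
    }

# Made-up counts chosen only to illustrate the calculation.
metrics = diagnostic_metrics(tp=82, fp=21, tn=79, fn=18)
```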

  16. An exploratory decision tree analysis to predict cardiovascular disease risk in African American women.

    PubMed

    Leach, Heather J; O'Connor, Daniel P; Simpson, Richard J; Rifai, Hanadi S; Mama, Scherezade K; Lee, Rebecca E

    2016-04-01

    African American (AA) women are at greater risk for cardiovascular disease (CVD) compared to White women, which can be attributed to disparities in risk factors. The built environment may contribute to improving CVD risk factors by increasing physical activity (PA). This study used recursive partitioning, a multivariate decision tree risk classification approach, to determine which built environment characteristics contributed to the classification of AA women as having 4 or more CVD risk factors at optimal levels. Recursive partitioning has the ability to detect interactions and does not have sample size limitations to detect effects. The Classification and Regression Trees (CR&T) growing method was used to group participants as having 4 or more versus 3 or fewer risk factors at optimal levels. Risk factors were smoking, body mass index (BMI), PA, healthy diet, cholesterol, glucose, and blood pressure. Built environment predictors were presence and quality of neighborhood PA resources (PARs), walkability, traffic safety, and crime. Participants (N = 30, mean age of 54.1 ± 7.5) all had at least 1 risk factor at the optimal level, none had all 7, and 66.7% had 4 or more risk factors at optimal levels. The CR&T identified participants with few, low-quality neighborhood PARs and who were older than 55 as least likely to have 4 or more CVD risk factors at optimal levels. Being younger than 55 years old and having many, high-quality neighborhood PARs may predict lower risk for CVD in AA women. Results should be used in future studies with larger sample sizes to inform logistic regression models. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
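
    Recursive partitioning of the kind described here picks, at each node, the split that most reduces class impurity. The sketch below scores one candidate split with the Gini index, a common choice for classification trees; the abstract does not state which impurity measure was used, so Gini is an assumption. The ages and outcome labels are invented.

```python
def gini(labels):
    """Gini impurity: 0 for a pure node, 0.5 for an even two-class mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_gain(feature, labels, threshold):
    """Impurity decrease from splitting on feature <= threshold."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    return gini(labels) - len(left) / n * gini(left) - len(right) / n * gini(right)

# Made-up ages; 1 means "4 or more risk factors at optimal levels".
ages = [48, 50, 52, 53, 57, 58, 60, 62]
optimal = [1, 1, 1, 1, 0, 0, 0, 0]
gain_at_55 = split_gain(ages, optimal, 55)
```

    A tree-growing routine would repeat this search over every predictor and threshold, then recurse into each child node.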

  17. Identification of the sequence determinants of protein N-terminal acetylation through a decision tree approach.

    PubMed

    Yamada, Kazunori D; Omori, Satoshi; Nishi, Hafumi; Miyagi, Masaru

    2017-06-02

    N-terminal acetylation is one of the most common protein modifications in eukaryotes and occurs co-translationally when the N-terminus of the nascent polypeptide is still attached to the ribosome. This modification has been shown to be involved in a wide range of biological phenomena such as protein half-life regulation, protein-protein and protein-membrane interactions, and protein subcellular localization. Thus, accurately predicting which proteins receive an acetyl group based on their protein sequence is expected to facilitate the functional study of this modification. As the occurrence of N-terminal acetylation strongly depends on the context of protein sequences, attempts to understand the sequence determinants of N-terminal acetylation were conducted initially by simply examining the N-terminal sequences of many acetylated and unacetylated proteins and more recently by machine learning approaches. However, a complete understanding of the sequence determinants of this modification remains to be elucidated. We obtained curated N-terminally acetylated and unacetylated sequences from the UniProt database and employed a decision tree algorithm to identify the sequence determinants of N-terminal acetylation for proteins whose initiator methionine ((i)Met) residues have been removed. The results suggested that the main determinants of N-terminal acetylation are contained within the first five residues following (i)Met and that the first and second positions are the most important discriminator for the occurrence of this phenomenon. The results also indicated the existence of position-specific preferred and inhibitory residues that determine the occurrence of N-terminal acetylation. The developed predictor software, termed NT-AcPredictor, accurately predicted the N-terminal acetylation, with an overall performance comparable or superior to those of preceding predictors incorporating machine learning algorithms. 
Our machine learning approach based on a decision tree

  18. Proposal of a Clinical Decision Tree Algorithm Using Factors Associated with Severe Dengue Infection.

    PubMed

    Tamibmaniam, Jayashamani; Hussin, Narwani; Cheah, Wee Kooi; Ng, Kee Sing; Muninathan, Prema

    2016-01-01

    WHO's new classification in 2009: dengue with or without warning signs and severe dengue, has necessitated large numbers of admissions to hospitals of dengue patients which in turn has been imposing a huge economical and physical burden on many hospitals around the globe, particularly South East Asia and Malaysia where the disease has seen a rapid surge in numbers in recent years. Lack of a simple tool to differentiate mild from life threatening infection has led to unnecessary hospitalization of dengue patients. We conducted a single-centre, retrospective study involving serologically confirmed dengue fever patients, admitted in a single ward, in Hospital Kuala Lumpur, Malaysia. Data was collected for 4 months from February to May 2014. Socio demography, co-morbidity, days of illness before admission, symptoms, warning signs, vital signs and laboratory results were all recorded. Descriptive statistics were tabulated and simple and multiple logistic regression analysis was done to determine significant risk factors associated with severe dengue. 657 patients with confirmed dengue were analysed, of which 59 (9.0%) had severe dengue. Overall, the commonest warning signs were vomiting (36.1%) and abdominal pain (32.1%). Previous co-morbidity, vomiting, diarrhoea, pleural effusion, low systolic blood pressure, high haematocrit, low albumin and high urea were found to be significant risk factors for severe dengue using simple logistic regression. However, the significant risk factors for severe dengue with multiple logistic regression were only vomiting, pleural effusion, and low systolic blood pressure. Using those 3 risk factors, we plotted an algorithm for predicting severe dengue. When compared to the classification of severe dengue based on the WHO criteria, the decision tree algorithm had a sensitivity of 0.81, specificity of 0.54, positive predictive value of 0.16 and negative predictive value of 0.96.
The decision tree algorithm proposed in this study showed high sensitivity

  19. Merger of three modeling approaches to assess potential effects of climate change on trees in the eastern United States

    Treesearch

    Louis R. Iverson; Anantha M. Prasad; Stephen N. Matthews; Matthew P. Peters

    2010-01-01

    Climate change will likely cause impacts that are species specific and significant; modeling is critical to better understand potential changes in suitable habitat. We use empirical, abundance-based habitat models utilizing decision tree-based ensemble methods to explore potential changes of 134 tree species habitats in the eastern United States (http://www.nrs.fs.fed....

  20. How to differentiate acute pelvic inflammatory disease from acute appendicitis ? A decision tree based on CT findings.

    PubMed

    El Hentour, Kim; Millet, Ingrid; Pages-Bouic, Emmanuelle; Curros-Doyon, Fernanda; Molinari, Nicolas; Taourel, Patrice

    2017-09-11

    To construct a decision tree based on CT findings to differentiate acute pelvic inflammatory disease (PID) from acute appendicitis (AA) in women with lower abdominal pain and inflammatory syndrome. This retrospective study was approved by our institutional review board and informed consent was waived. Contrast-enhanced CT studies of 109 women with acute PID and 218 age-matched women with AA were retrospectively and independently reviewed by two radiologists to identify CT findings predictive of PID or AA. Surgical and laboratory data were used for the PID and AA reference standard. Appropriate tests were performed to compare PID and AA and a CT decision tree using the classification and regression tree (CART) algorithm was generated. The median patient age was 28 years (interquartile range, 22-39 years). According to the decision tree, an appendiceal diameter ≥ 7 mm was the most discriminating criterion for differentiating acute PID and AA, followed by a left tubal diameter ≥ 10 mm, with a global accuracy of 98.2 % (95 % CI: 96-99.4). Appendiceal diameter and left tubal thickening are the most discriminating CT criteria for differentiating acute PID from AA. • Appendiceal diameter and marked left tubal thickening allow differentiating PID from AA. • PID should be considered if appendiceal diameter is < 7 mm. • Marked left tubal diameter indicates PID rather than AA when enlarged appendix. • No pathological CT findings were identified in 5 % of PID patients.
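
    The two thresholds reported above can be read as a two-node decision rule. The sketch below encodes one plausible arrangement inferred from the key points of the abstract; the published tree may contain further nodes, and the function and parameter names are hypothetical.

```python
def classify_ct(appendix_diameter_mm, left_tube_diameter_mm):
    """Return "PID" or "AA" from two CT measurements (illustrative only)."""
    # First node: a normal-calibre appendix points to PID.
    if appendix_diameter_mm < 7.0:
        return "PID"
    # Second node: with an enlarged appendix, marked left tubal
    # thickening still favours PID over appendicitis.
    if left_tube_diameter_mm >= 10.0:
        return "PID"
    return "AA"

examples = [
    classify_ct(5.0, 4.0),   # small appendix
    classify_ct(9.0, 12.0),  # enlarged appendix, thick left tube
    classify_ct(9.0, 4.0),   # enlarged appendix, normal left tube
]
```

    Writing the rule as nested conditions mirrors how a CART tree is read in practice: each measurement is tested in order until a leaf is reached.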

  1. Large unbalanced credit scoring using Lasso-logistic regression ensemble.

    PubMed

    Wang, Hong; Xu, Qingsong; Zhou, Lifeng

    2015-01-01

    Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.
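
    The two evaluation measures named above, AUC and F-measure, are well suited to unbalanced data because neither rewards simply predicting the majority class. A minimal sketch of both follows, assuming binary labels with 1 as the minority (positive) class; the scores and labels are made up.

```python
def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up unbalanced sample: six negatives, two positives.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.7, 0.6, 0.9]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
f1 = f_measure(y_true, y_pred)
area = auc(y_true, scores)
```

    The pairwise AUC loop is quadratic in sample size, so it is meant for illustration; a rank-based formula would be preferable on a large credit data set.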

  2. Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble

    PubMed Central

    Wang, Hong; Xu, Qingsong; Zhou, Lifeng

    2015-01-01

    Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988

  3. Application Of Decision Tree Approach To Student Selection Model- A Case Study

    NASA Astrophysics Data System (ADS)

    Harwati; Sudiya, Amby

    2016-01-01

    The main purpose of the institution is to provide quality education to students and to improve the quality of managerial decisions. One way to improve the quality of students is to make the selection of new students more selective. This research takes as its case the selection of new students at Islamic University of Indonesia, Yogyakarta, Indonesia. One of the university's selection routes is administrative filtering based on the high school records of prospective students, without a written test. Currently, this kind of selection does not yet have a standard model or criteria. Selection is done only by comparing candidates' application files, so assessment is likely to be subjective because there are no standard criteria that can differentiate the quality of one student from another. By applying data mining classification techniques, a selection model for new students can be built that includes standard criteria such as the area of origin, the status of the school, the average score and so on. These criteria are determined by using rules derived from classifying the academic achievement (GPA) of students in previous years who entered the university through the same route. The decision tree method with the C4.5 algorithm is used here. The results show that the students given priority for admission are those who meet the following criteria: they come from the island of Java, attended a public school, majored in science, have an average score above 75, and have at least one achievement during their high school studies.

  4. Ensembl comparative genomics resources

    PubMed Central

    Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J.; Searle, Stephen M. J.; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. PMID:26896847

  5. Ensembl comparative genomics resources.

    PubMed

    Herrero, Javier; Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J; Searle, Stephen M J; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. © The Author(s) 2016. Published by Oxford University Press.

  6. Trees

    NASA Astrophysics Data System (ADS)

    Epstein, Henri

    2016-11-01

    An algebraic formalism, developed with V. Glaser and R. Stora for the study of the generalized retarded functions of quantum field theory, is used to prove a factorization theorem which provides a complete description of the generalized retarded functions associated with any tree graph. Integrating over the variables associated to internal vertices to obtain the perturbative generalized retarded functions for interacting fields arising from such graphs is shown to be possible for a large category of space-times.

  7. Uninjured trees - a meaningful guide to white-pine weevil control decisions

    Treesearch

    William E. Waters

    1962-01-01

    The white-pine weevil, Pissodes strobi, is a particularly insidious forest pest that can render a stand of host trees virtually worthless. It rarely, if ever, kills a tree; but the crooks, forks, and internal defects that develop in attacked trees over a period of years may reduce the merchantable volume and value of the tree at harvest age to zero. Dollar losses are...

  8. An application based on the decision tree to classify the marbling of beef by hyperspectral imaging.

    PubMed

    Velásquez, Lía; Cruz-Tirado, J P; Siche, Raúl; Quevedo, Roberto

    2017-11-01

    The aim of this study was to develop a system to classify the marbling of beef using hyperspectral imaging technology. The Japanese standard classification of the degree of marbling of beef was used as the reference, and twelve standards were digitized to obtain the parameters of shape and spatial distribution of marbling for each class. A total of 35 samples of M. longissimus dorsi muscle were scanned by a hyperspectral imaging system over 400-1000 nm in reflectance mode. The wavelength of 528 nm was selected to segment the sample from the background, and 440 nm was used to classify the samples. Image processing algorithms based on the decision tree method were applied to the region of interest, obtaining a classification error of 0.08% in the building stage. The results showed that the proposed technique has great potential as a fast, non-destructive technique for classifying beef by degree of marbling. Copyright © 2017 Elsevier Ltd. All rights reserved.

  9. A Low Complexity System Based on Multiple Weighted Decision Trees for Indoor Localization

    PubMed Central

    Sánchez-Rodríguez, David; Hernández-Morera, Pablo; Quinteiro, José Ma.; Alonso-González, Itziar

    2015-01-01

    Indoor position estimation has become an attractive research topic due to growing interest in location-aware services. Nevertheless, satisfying solutions have not been found with the considerations of both accuracy and system complexity. From the perspective of lightweight mobile devices, they are extremely important characteristics, because both the processor power and energy availability are limited. Hence, an indoor localization system with high computational complexity can cause complete battery drain within a few hours. In our research, we use a data mining technique named boosting to develop a localization system based on multiple weighted decision trees to predict the device location, since it has high accuracy and low computational complexity. The localization system is built using a dataset from sensor fusion, which combines the strength of radio signals from different wireless local area network access points and device orientation information from a digital compass built-in mobile device, so that extra sensors are unnecessary. Experimental results indicate that the proposed system leads to substantial improvements on computational complexity over the widely-used traditional fingerprinting methods, and it has a better accuracy than they have. PMID:26110413

  10. A Low Complexity System Based on Multiple Weighted Decision Trees for Indoor Localization.

    PubMed

    Sánchez-Rodríguez, David; Hernández-Morera, Pablo; Quinteiro, José Ma; Alonso-González, Itziar

    2015-06-23

    Indoor position estimation has become an attractive research topic due to growing interest in location-aware services. Nevertheless, satisfactory solutions that consider both accuracy and system complexity have not been found. From the perspective of lightweight mobile devices, these are extremely important characteristics, because both processor power and energy availability are limited. Hence, an indoor localization system with high computational complexity can cause complete battery drain within a few hours. In our research, we use a data mining technique named boosting to develop a localization system based on multiple weighted decision trees to predict the device location, since it has high accuracy and low computational complexity. The localization system is built using a dataset from sensor fusion, which combines the strength of radio signals from different wireless local area network access points with device orientation information from a digital compass built into the mobile device, so that extra sensors are unnecessary. Experimental results indicate that the proposed system leads to substantial improvements in computational complexity over the widely used traditional fingerprinting methods, and that it is more accurate than they are.
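
    The boosting idea in the two records above can be illustrated with a minimal sketch. It is not the authors' system: it assumes a binary problem (room A = +1, room B = -1), one-level decision trees (stumps) as the weak learners, and made-up signal-strength readings from two hypothetical access points. The function names are illustrative only.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best one-level tree."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] <= thr else -sign for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] <= thr else -sign

def adaboost(X, y, rounds=5):
    """Discrete AdaBoost: each round reweights the samples and adds a weighted stump."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)          # avoid division by zero for a perfect stump
        if err >= 0.5:                      # weak learner no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)   # vote weight of this tree
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted vote of all trees in the ensemble."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical readings in dBm from two access points (toy data, not from the paper).
X = [(-40, -70), (-45, -65), (-50, -72), (-70, -42), (-68, -48), (-75, -40)]
y = [1, 1, 1, -1, -1, -1]
ensemble = adaboost(X, y)
```

    At prediction time only a handful of threshold comparisons and one weighted sum are needed, which is consistent with the low computational cost the abstract reports for the boosted-tree approach.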

  11. Effect of training characteristics on object classification: An application using Boosted Decision Trees

    NASA Astrophysics Data System (ADS)

    Sevilla-Noarbe, I.; Etayo-Sotos, P.

    2015-06-01

    We present an application of a particular machine-learning method (Boosted Decision Trees, BDTs using AdaBoost) to separate stars and galaxies in photometric images using their catalog characteristics. BDTs are a well established machine learning technique used for classification purposes. They have been widely used, especially in the field of particle and astroparticle physics, and we use them here in an optical astronomy application. This algorithm is able to improve on simple thresholding cuts on standard separation variables that may be affected by local effects such as blending or badly calculated background levels, or that do not include information from other bands. The improvements are shown using the Sloan Digital Sky Survey Data Release 9, with respect to the type photometric classifier. We obtain an improvement in the impurity of the galaxy sample by a factor of 2-4 for this particular dataset, adjusting for the same efficiency of the selection. Another main goal of this study is to verify the effects that different input vectors and training sets have on the classification performance, the results being of wider use to other machine learning techniques.

  12. [A prediction model for internet game addiction in adolescents: using a decision tree analysis].

    PubMed

    Kim, Ki Sook; Kim, Kyung Hee

    2010-06-01

    This study was designed to build a theoretical frame to provide practical help to prevent and manage adolescent internet game addiction by developing a prediction model through a comprehensive analysis of related factors. The participants were 1,318 students studying in elementary, middle, and high schools in Seoul and Gyeonggi Province, Korea. Collected data were analyzed using the SPSS program. Decision Tree Analysis using the Clementine program was applied to build an optimum and significant prediction model to predict internet game addiction related to various factors, especially parent related factors. From the data analyses, the prediction model for factors related to internet game addiction presented with 5 pathways. Causative factors included gender, type of school, siblings, economic status, religion, time spent alone, gaming place, payment to Internet café, frequency, duration, parent's ability to use internet, occupation (mother), trust (father), expectations regarding adolescent's study (mother), supervising (both parents), rearing attitude (both parents). The results suggest preventive and managerial nursing programs for specific groups by path. Use of this predictive model can expand the role of school nurses, not only in counseling addicted adolescents but also, in developing and carrying out programs with parents and approaching adolescents individually through databases and computer programming.

  13. Smart on-board diagnostic decision trees for quantitative aviation equipment and safety procedures validation

    NASA Astrophysics Data System (ADS)

    Ali, Ali H.; Markarian, Garik; Tarter, Alex; Kölle, Rainer

    2010-04-01

    The current trend in high-accuracy aircraft navigation systems is towards using data from one or more inertial navigation subsystems and one or more navigational reference subsystems. The enhancement in fault diagnosis and detection is achieved by computing the minimum mean square estimate of the aircraft states using, for instance, the Kalman filter method. However, this enhancement might degrade if the cause of a subsystem fault has some effect on other subsystems that are calculating the same measurement. One instance of such a case is the tragic incident of Air France Flight 447 in June 2009, where message transmissions in the last moments before the crash indicated inconsistencies in measured airspeed, as reported by Airbus. In this research, we propose the use of a mathematical aircraft model to work out the current states of the airplane and, in turn, using these states to validate the readings of the navigation equipment through a smart diagnostic decision tree network. Various simulated equipment failures have been introduced in a controlled environment to prove the concept of operation. The results showed successful detection of the failing equipment in all cases.

  14. Effective Visualization of Temporal Ensembles.

    PubMed

    Hao, Lihua; Healey, Christopher G; Bass, Steffen A

    2016-01-01

    An ensemble is a collection of related datasets, called members, built from a series of runs of a simulation or an experiment. Ensembles are large, temporal, multidimensional, and multivariate, making them difficult to analyze. Another important challenge is visualizing ensembles that vary both in space and time. Initial visualization techniques displayed ensembles with a small number of members, or presented an overview of an entire ensemble, but without potentially important details. Recently, researchers have suggested combining these two directions, allowing users to choose subsets of members to visualize. This manual selection process places the burden on the user to identify which members to explore. We first introduce a static ensemble visualization system that automatically helps users locate interesting subsets of members to visualize. We next extend the system to support analysis and visualization of temporal ensembles. We employ 3D shape comparison, cluster tree visualization, and glyph based visualization to represent different levels of detail within an ensemble. This strategy is used to provide two approaches for temporal ensemble analysis: (1) segment based ensemble analysis, to capture important shape transition time-steps, cluster groups of similar members, and identify common shape changes over time across multiple members; and (2) time-step based ensemble analysis, which assumes ensemble members are aligned in time by combining similar shapes at common time-steps. Both approaches enable users to interactively visualize and analyze a temporal ensemble from different perspectives at different levels of detail. We demonstrate our techniques on an ensemble studying matter transition from hadronic gas to quark-gluon plasma during gold-on-gold particle collisions.

  15. Identification of Potential Sources of Mercury (Hg) in Farmland Soil Using a Decision Tree Method in China

    PubMed Central

    Zhong, Taiyang; Chen, Dongmei; Zhang, Xiuying

    2016-01-01

    Identification of the sources of soil mercury (Hg) on the provincial scale is helpful for enacting effective policies to prevent further contamination and to take reclamation measures. The natural and anthropogenic sources of Hg in Chinese farmland soil and their contributions were identified based on a decision tree method. The results showed that the concentrations of Hg in parent materials were most strongly associated with the general spatial distribution pattern of Hg concentration on a provincial scale. The decision tree analysis achieved an 89.70% total accuracy in simulating the influence of human activities on the additions of Hg in farmland soil. Human activities (for example, the production of coke, application of fertilizers, discharge of wastewater, discharge of solid waste, and the production of non-ferrous metals) were the main external sources of a large amount of Hg in the farmland soil. PMID:27834884

  16. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    PubMed Central

    Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As fraudulent financial statements by enterprises become increasingly serious with each passing day, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models for comparison. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that issued fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the decision tree C5.0 has the best classification performance (85.71%). PMID:25302338

  17. A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements.

    PubMed

    Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As fraudulent financial statements by enterprises become increasingly serious with each passing day, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models for comparison. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that issued fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the decision tree C5.0 has the best classification performance (85.71%).

  18. A Framework for Learning from Distributed Data Using Sufficient Statistics and its Application to Learning Decision Trees

    PubMed Central

    Caragea, Doina; Silvescu, Adrian; Honavar, Vasant

    2009-01-01

    This paper motivates and precisely formulates the problem of learning from distributed data; describes a general strategy for transforming traditional machine learning algorithms into algorithms for learning from distributed data; demonstrates the application of this strategy to devise algorithms for decision tree induction from distributed data; and identifies the conditions under which the algorithms in the distributed setting are superior to their centralized counterparts in terms of time and communication complexity. The resulting algorithms are provably exact in that the decision tree constructed from distributed data is identical to that obtained in the centralized setting. Some natural extensions leading to algorithms for learning from heterogeneous distributed data and learning under privacy constraints are outlined. PMID:20351798
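
    The exactness claim above can be made concrete with a small sketch. It assumes horizontally partitioned data (each site holds some of the rows), information gain as the split criterion, and joint counts of (attribute value, class label) as the sufficient statistics each site reports. The data and function names are illustrative, not taken from the paper.

```python
import math
from collections import Counter

def local_counts(rows, attr):
    """Sufficient statistics sent by one site: counts of (attribute value, class label)."""
    return Counter((row[attr], row["label"]) for row in rows)

def entropy(class_counts):
    total = sum(class_counts.values())
    return -sum(c / total * math.log2(c / total) for c in class_counts.values() if c)

def information_gain(counts):
    """Information gain of splitting on the attribute, computed from counts alone."""
    by_class = Counter()
    by_value = {}
    for (value, label), c in counts.items():
        by_class[label] += c
        by_value.setdefault(value, Counter())[label] += c
    total = sum(by_class.values())
    remainder = sum(sum(cc.values()) / total * entropy(cc) for cc in by_value.values())
    return entropy(by_class) - remainder

# Toy rows held at two hypothetical sites.
site_a = [{"outlook": "sunny", "label": "no"},
          {"outlook": "sunny", "label": "no"},
          {"outlook": "rain", "label": "yes"}]
site_b = [{"outlook": "sunny", "label": "yes"},
          {"outlook": "rain", "label": "yes"},
          {"outlook": "rain", "label": "no"}]

# Distributed: only the count tables travel to the coordinator, which adds them.
merged = local_counts(site_a, "outlook") + local_counts(site_b, "outlook")
gain_distributed = information_gain(merged)

# Centralized: all rows pooled in one place.
gain_centralized = information_gain(local_counts(site_a + site_b, "outlook"))
```

    Because the split criterion depends on the data only through these counts, the coordinator ranks candidate splits exactly as a centralized learner would, while the communication cost scales with the size of the count tables rather than the number of rows.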

  19. Chi-squared Automatic Interaction Detection Decision Tree Analysis of Risk Factors for Infant Anemia in Beijing, China

    PubMed Central

    Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin

    2016-01-01

    Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6–12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. Results: The prevalence of anemia was 12.60% with a range of 3.47%–40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. Conclusions: The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities. PMID:27174328
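
    As a rough illustration of how CHAID ranks candidate predictors, the sketch below computes the Pearson chi-squared statistic for a predictor-by-outcome contingency table and picks the predictor with the strongest association. The counts are hypothetical toy numbers, not data from the study, and full CHAID additionally merges categories and uses adjusted p-values, which this sketch omits.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: predictor category; columns: outcome present / outcome absent (hypothetical counts).
candidates = {
    "predictor_a": [[10, 20], [20, 10]],
    "predictor_b": [[15, 15], [14, 16]],
}
scores = {name: chi_squared(t) for name, t in candidates.items()}
best = max(scores, key=scores.get)   # predictor chosen for the split at this node
```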

  20. The risk of disabling, surgery and reoperation in Crohn's disease - A decision tree-based approach to prognosis.

    PubMed

    Dias, Cláudia Camila; Pereira Rodrigues, Pedro; Fernandes, Samuel; Portela, Francisco; Ministro, Paula; Martins, Diana; Sousa, Paula; Lago, Paula; Rosa, Isadora; Correia, Luis; Moura Santos, Paula; Magro, Fernando

    2017-01-01

    Crohn's disease (CD) is a chronic inflammatory bowel disease known to carry a high risk of disabling and often requiring surgical interventions. This article describes a decision-tree based approach that defines the CD patients' risk of undergoing disabling events, surgical interventions and reoperations, based on clinical and demographic variables. This multicentric study involved 1547 CD patients retrospectively enrolled and divided into two cohorts: a derivation one (80%) and a validation one (20%). Decision trees were built upon applying the CHAIRT algorithm for the selection of variables. Three-level decision trees were built for the risk of disabling and reoperation, whereas the risk of surgery was described in a two-level one. A receiver operating characteristic (ROC) analysis was performed, and the area under the curve (AUC) was higher than 70% for all outcomes. The defined risk cut-off values show usefulness for the assessed outcomes: risk levels above 75% for disabling had an odds test positivity of 4.06 [3.50-4.71], whereas risk levels below 34% and 19% excluded surgery and reoperation with an odds test negativity of 0.15 [0.09-0.25] and 0.50 [0.24-1.01], respectively. Overall, patients with B2 or B3 phenotype had a higher proportion of disabling disease and surgery, while patients with later introduction of pharmacological therapeutic (1 months after initial surgery) had a higher proportion of reoperation. The decision-tree based approach used in this study, with demographic and clinical variables, has been shown to be a valid and useful approach to depict such risks of disabling, surgery and reoperation.

  1. ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography

    NASA Astrophysics Data System (ADS)

    Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

    2016-07-01

    Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method, according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation, is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on 100 PET scans with known true contours. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contour, to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.
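
    The Dice similarity coefficient used above to score each segmentation is DSC = 2|A ∩ B| / (|A| + |B|) for two voxel sets A and B. A minimal sketch follows, assuming segmentations are represented as sets of voxel coordinates. The toy contours are made up, and returning 1.0 for two empty sets is a convention chosen here, not something stated in the abstract.

```python
def dice(seg_a, seg_b):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    a, b = set(seg_a), set(seg_b)
    if not a and not b:
        return 1.0                      # assumed convention: two empty contours agree
    return 2 * len(a & b) / (len(a) + len(b))

# Toy 2D example: a 2x2 "true" contour and a segmentation that overlaps half of it.
true_contour = {(0, 0), (0, 1), (1, 0), (1, 1)}
segmentation = {(0, 1), (1, 1), (1, 2), (2, 1)}
score = dice(true_contour, segmentation)
```

    A DSC of 1 means the two contours coincide exactly and 0 means they share no voxels.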

  2. hs-CRP is strongly associated with coronary heart disease (CHD): A data mining approach using decision tree algorithm.

    PubMed

    Tayefi, Maryam; Tajfard, Mohammad; Saffar, Sara; Hanachi, Parichehr; Amirabadizadeh, Ali Reza; Esmaeily, Habibollah; Taghipour, Ali; Ferns, Gordon A; Moohebati, Mohsen; Ghayour-Mobarhan, Majid

    2017-04-01

    Coronary heart disease (CHD) is an important public health problem globally. Algorithms incorporating the assessment of clinical biomarkers together with several established traditional risk factors can help clinicians to predict CHD and support clinical decision making with respect to interventions. Decision tree (DT) is a data mining model for extracting hidden knowledge from large databases. We aimed to establish a predictive model for coronary heart disease using a decision tree algorithm. Here we used a dataset of 2346 individuals including 1159 healthy participants and 1187 participants who had undergone coronary angiography (405 participants with negative angiography and 782 participants with positive angiography). We entered 10 of a total of 12 variables into the DT algorithm (including age, sex, FBG, TG, hs-CRP, TC, HDL, LDL, SBP and DBP). Our model could identify the associated risk factors of CHD with sensitivity, specificity, and accuracy of 96%, 87%, and 94%, respectively. Serum hs-CRP level was at the top of the tree in our model, followed by FBG, gender and age. Our model appears to be an accurate, specific and sensitive model for identifying the presence of CHD, but will require validation in prospective studies. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. Identification of pests and diseases of Dalbergia hainanensis based on EVI time series and classification of decision tree

    NASA Astrophysics Data System (ADS)

    Luo, Qiu; Xin, Wu; Qiming, Xiong

    2017-06-01

    In the process of vegetation remote sensing information extraction, the problem of phenological features and low performance of remote sensing analysis algorithm is not considered. To solve this problem, the method of remote sensing vegetation information based on EVI time-series and the classification of decision-tree of multi-source branch similarity is promoted. Firstly, to improve the time-series stability of recognition accuracy, the seasonal feature of vegetation is extracted based on the fitting span range of time-series. Secondly, the decision-tree similarity is distinguished by adaptive selection path or probability parameter of component prediction. As an index, it is to evaluate the degree of task association, decide whether to perform migration of multi-source decision tree, and ensure the speed of migration. Finally, the accuracy of classification and recognition of pests and diseases can reach 87%--98% of commercial forest in Dalbergia hainanensis, which is significantly better than that of MODIS coverage accuracy of 80%--96% in this area. Therefore, the validity of the proposed method can be verified.

  4. Prediction of Severe Acute Pancreatitis Using a Decision Tree Model Based on the Revised Atlanta Classification of Acute Pancreatitis

    PubMed Central

    Zhang, Yushun; Yang, Chong; Gou, Shanmiao; Li, Yongfeng; Xiong, Jiongxin; Wu, Heshui; Wang, Chunyou

    2015-01-01

    Objective: To develop a model for the early prediction of severe acute pancreatitis based on the revised Atlanta classification of acute pancreatitis. Methods: Clinical data of 1308 patients with acute pancreatitis (AP) were included in the retrospective study. A total of 603 patients who were admitted to the hospital within 36 hours of the onset of the disease were included at last according to the inclusion criteria. The clinical data were collected within 12 hours after admission. All the patients were classified as having mild acute pancreatitis (MAP), moderately severe acute pancreatitis (MSAP) and severe acute pancreatitis (SAP) based on the revised Atlanta classification of acute pancreatitis. All the 603 patients were randomly divided into training group (402 cases) and test group (201 cases). Univariate and multiple regression analyses were used to identify the independent risk factors for the development of SAP in the training group. Then the prediction model was constructed using the decision tree method, and this model was applied to the test group to evaluate its validity. Results: The decision tree model was developed using creatinine, lactate dehydrogenase, and oxygenation index to predict SAP. The diagnostic sensitivity and specificity of SAP in the training group were 80.9% and 90.0%, respectively, and the sensitivity and specificity in the test group were 88.6% and 90.4%, respectively. Conclusions: The decision tree model based on creatinine, lactate dehydrogenase, and oxygenation index is more likely to predict the occurrence of SAP. PMID:26580397
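
    Sensitivity and specificity, the two figures reported above for the training and test groups, can be computed from true and predicted labels as in the sketch below. The label vectors are hypothetical toy values (1 = severe, 0 = not severe), not patient data from the study.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)   # share of true positives that the model flags
    specificity = tn / (tn + fp)   # share of true negatives that the model clears
    return sensitivity, specificity

# Hypothetical held-out cases: 4 severe, 6 not severe.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```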

  5. A Rough Set Decision Tree Based MLP-CNN for Very High Resolution Remotely Sensed Image Classification

    NASA Astrophysics Data System (ADS)

    Zhang, C.; Pan, X.; Zhang, S. Q.; Li, H. P.; Atkinson, P. M.

    2017-09-01

    Recent advances in remote sensing have witnessed a great amount of very high resolution (VHR) images acquired at sub-metre spatial resolution. These VHR remotely sensed data have posed enormous challenges in processing, analysing and classifying them effectively due to their high spatial complexity and heterogeneity. Although many computer-aided classification methods based on machine learning approaches have been developed over the past decades, most of them are developed toward pixel level spectral differentiation, e.g. Multi-Layer Perceptron (MLP), and are unable to exploit the abundant spatial details within VHR images. This paper introduced a rough set model as a general framework to objectively characterize the uncertainty in CNN classification results, and further partition them into correctness and incorrectness on the map. The correct classification regions of CNN were trusted and maintained, whereas the misclassification areas were reclassified using a decision tree with both CNN and MLP. The effectiveness of the proposed rough set decision tree based MLP-CNN was tested using an urban area at Bournemouth, United Kingdom. The MLP-CNN, well capturing the complementarity between CNN and MLP through the rough set based decision tree, achieved the best classification performance both visually and numerically. Therefore, this research paves the way to achieve fully automatic and effective VHR image classification.

  6. Prediction of Severe Acute Pancreatitis Using a Decision Tree Model Based on the Revised Atlanta Classification of Acute Pancreatitis.

    PubMed

    Yang, Zhiyong; Dong, Liming; Zhang, Yushun; Yang, Chong; Gou, Shanmiao; Li, Yongfeng; Xiong, Jiongxin; Wu, Heshui; Wang, Chunyou

    2015-01-01

    To develop a model for the early prediction of severe acute pancreatitis based on the revised Atlanta classification of acute pancreatitis. Clinical data of 1308 patients with acute pancreatitis (AP) were included in the retrospective study. A total of 603 patients who were admitted to the hospital within 36 hours of the onset of the disease were included at last according to the inclusion criteria. The clinical data were collected within 12 hours after admission. All the patients were classified as having mild acute pancreatitis (MAP), moderately severe acute pancreatitis (MSAP) and severe acute pancreatitis (SAP) based on the revised Atlanta classification of acute pancreatitis. All the 603 patients were randomly divided into training group (402 cases) and test group (201 cases). Univariate and multiple regression analyses were used to identify the independent risk factors for the development of SAP in the training group. Then the prediction model was constructed using the decision tree method, and this model was applied to the test group to evaluate its validity. The decision tree model was developed using creatinine, lactate dehydrogenase, and oxygenation index to predict SAP. The diagnostic sensitivity and specificity of SAP in the training group were 80.9% and 90.0%, respectively, and the sensitivity and specificity in the test group were 88.6% and 90.4%, respectively. The decision tree model based on creatinine, lactate dehydrogenase, and oxygenation index is more likely to predict the occurrence of SAP.

  7. A comparison of artificial neural net and inductive decision tree learning applied to the diagnosis of coronary artery disease

    SciTech Connect

    Silver, D.L.; Hurwitz, G.A.; Cradduck, T.D.

    1994-05-01

    A variety of artificial intelligence systems are available for applications within nuclear medicine. It is important to understand the strengths and weaknesses of these systems and the class of problems for which each is best. Two supervised machine learning systems, a back propagation neural network and an inductive decision tree, were applied to the classification of coronary artery disease given a set of diagnostic input parameters. A comparison indicates that both paradigms perform well depending upon the requirements of the user. We examined the setup complexity, learning and classification speed, training accuracy, ability to generalize to previously unseen cases, and the explanatory power of the internal representations generated by the learning systems. A database of 503 patient records composed of ten parameters was used for the analysis. The target response was a binary value of disease or no disease. The results indicate that the inductive decision tree learning system is the better choice for this class of problem. It is easier to set up, and training takes less time. It has good explanatory power since it produces a printed decision tree of the internal representation of acquired knowledge. On the other hand, the artificial neural net provides better generalization for new test cases, and has greater classification accuracy.

  8. Grassland gross carbon dioxide uptake based on an improved model tree ensemble approach considering human interventions: global estimation and covariation with climate.

    PubMed

    Liang, Wei; Lü, Yihe; Zhang, Weibin; Li, Shuai; Jin, Zhao; Ciais, Philippe; Fu, Bojie; Wang, Shuai; Yan, Jianwu; Li, Junyi; Su, Huimin

    2016-12-14

    Grassland ecosystems play a crucial role in the global carbon cycle and provide vital ecosystem services for many species. However, these low-productivity and water-limited ecosystems are sensitive and vulnerable to climate perturbations and human intervention, the latter of which is often not considered due to lack of spatial information regarding the grassland management. Here by the application of a model tree ensemble (MTE-GRASS) trained on local eddy covariance data and using as predictors gridded climate and management intensity field (grazing and cutting), we first provide an estimate of global grassland gross primary production (GPP). GPP from our study compares well (modeling efficiency NSE = 0.85 spatial; NSE between 0.69 and 0.94 interannual) with that from flux measurement. Global grassland GPP was on average 11 ± 0.31 Pg C yr(-1) and exhibited a significantly increasing trend at both annual and seasonal scales, with an annual increase of 0.023 Pg C (0.2%) from 1982 to 2011. Meanwhile, we found that at both annual and seasonal scale, the trend (except for northern summer) and interannual variability of the GPP are primarily driven by arid/semiarid ecosystems, the latter of which is due to the larger variation in precipitation. Grasslands in arid/semiarid regions have a stronger (33 g C m(-2) yr(-1) /100 mm) and faster (0- to 1-month time lag) response to precipitation than those in other regions. Although globally spatial gradients (71%) and interannual changes (51%) in GPP were mainly driven by precipitation, where most regions with arid/semiarid climate zone, temperature and radiation together shared half of GPP variability, which is mainly distributed in the high-latitude or cold regions. Our findings and the results of other studies suggest the overwhelming importance of arid/semiarid regions as a control on grassland ecosystems carbon cycle. Similarly, under the projected future climate change, grassland ecosystems in these regions

  9. A Bedside Decision Tree for Use of Saline With Endotracheal Tube Suctioning in Children.

    PubMed

    Owen, Erin B; Woods, Charles R; O'Flynn, Justine A; Boone, Megan C; Calhoun, Aaron W; Montgomery, Vicki L

    2016-02-01

    Endotracheal tube suctioning is necessary for patients receiving mechanical ventilation. Studies examining saline instillation before suctioning have demonstrated mixed results. A prospective study to evaluate whether saline instillation is associated with an increased risk of suctioning-related adverse events in patients 18 years old or younger requiring mechanical ventilation through an endotracheal tube for at least 48 hours when suctioned per protocol using a bedside decision tree. A total of 1986 suctioning episodes (1003 with saline) were recorded in 69 patients. The most common indication for use of saline was thick secretions (87% of episodes). In 586 suctioning episodes, at least 1 adverse event occurred with increased frequency in the saline group (P < .001). Normal saline was more likely to be associated with hemodynamic instability (P = .04), bronchospasm (P < .001), and oxygen desaturation (P < .001). Patient factors associated with adverse events include younger age (P < .001), a cuffed endotracheal tube (P = .001), endotracheal tube diameter of 4.0 mm or less (P < .001), respiratory or hemodynamic indication for intubation (P < .001), underlying respiratory disease (P < .001), and longer duration of mechanical ventilation (P < .001). Saline instillation (P < .001), endotracheal tube size of 4.0 mm or less (P = .03), and comorbid respiratory diseases (P = .03) were associated with an increased risk of adverse events. Saline instillation before endotracheal tube suctioning is associated with hemodynamic instability, bronchospasm, and transient hypoxemia. Saline should be used cautiously, especially in children with a small endotracheal tube and comorbid respiratory disease. ©2016 American Association of Critical-Care Nurses.

  10. The Bump Hunting by the Decision Tree with the Genetic Algorithm

    NASA Astrophysics Data System (ADS)

    Hirose, Hideo

    In difficult classification problems of the z-dimensional points into two groups giving 0-1 responses due to the messy data structure, it is more favorable to search for the denser regions for the response 1 points than to find the boundaries to separate the two groups. For such problems which can often be seen in customer databases, we have developed a bump hunting method using probabilistic and statistical methods as shown in the previous study. By specifying a pureness rate in advance, a maximum capture rate will be obtained. In finding the maximum capture rate, we have used the decision tree method combined with the genetic algorithm. Then, a trade-off curve between the pureness rate and the capture rate can be constructed. However, such a trade-off curve could be optimistic if the training data set alone is used. Therefore, we should be careful in assessing the accuracy of the tradeoff curve. Using the accuracy evaluation procedures such as the cross validation or the bootstrapped hold-out method combined with the training and test data sets, we have shown that the actually applicable trade-off curve can be obtained. We have also shown that an attainable upper bound trade-off curve can be estimated by using the extreme-value statistics because the genetic algorithm provides many local maxima of the capture rates with different initial values. We have constructed the three kinds of trade-off curves; the first is the curve obtained by using the training data; the second is the return capture rate curve obtained by using the extreme-value statistics; the last is the curve obtained by using the test data. These three are indispensable like the Trinity to comprehend the whole figure of the trade-off curve between the pureness rate and the capture rate. This paper deals with the behavior of the trade-off curve from a statistical viewpoint.
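
    The two quantities traded off in the abstract above can be illustrated with a minimal sketch. It assumes the usual reading of the terms: the pureness rate is the share of response-1 points among all points inside a candidate region, and the capture rate is the share of all response-1 points that fall inside it. The function name, the axis-aligned box and the toy data are hypothetical, and the search for good regions (decision tree plus genetic algorithm) is not shown.

```python
def pureness_and_capture(points, labels, box):
    """Return (pureness, capture) for an axis-aligned box.

    points: list of coordinate tuples
    labels: list of 0/1 responses, one per point
    box:    list of (low, high) bounds, one pair per dimension (hypothetical region)
    """
    # A point is inside the box when every coordinate lies within its bounds.
    inside = [all(lo <= x <= hi for x, (lo, hi) in zip(p, box)) for p in points]
    n_inside = sum(inside)
    ones_inside = sum(1 for flag, y in zip(inside, labels) if flag and y == 1)
    total_ones = sum(1 for y in labels if y == 1)
    pureness = ones_inside / n_inside if n_inside else 0.0
    capture = ones_inside / total_ones if total_ones else 0.0
    return pureness, capture


# Toy two-dimensional data, invented for illustration only.
points = [(0.1, 0.2), (0.4, 0.5), (0.45, 0.55), (0.5, 0.4), (0.9, 0.9), (0.8, 0.1)]
labels = [0, 1, 1, 0, 1, 0]
box = [(0.3, 0.6), (0.3, 0.6)]

pureness, capture = pureness_and_capture(points, labels, box)
print(f"pureness={pureness:.3f} capture={capture:.3f}")
```

    Shrinking the box generally raises pureness and lowers capture, which is the trade-off the curve describes.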

  11. Risk Factors Predicting Infectious Lactational Mastitis: Decision Tree Approach versus Logistic Regression Analysis.

    PubMed

    Fernández, Leónides; Mediano, Pilar; García, Ricardo; Rodríguez, Juan M; Marín, María

    2016-09-01

    Objectives: Lactational mastitis frequently leads to a premature abandonment of breastfeeding; its development has been associated with several risk factors. This study aims to use a decision tree (DT) approach to establish the main risk factors involved in mastitis and to compare its performance for predicting this condition with a stepwise logistic regression (LR) model. Methods: Data from 368 cases (breastfeeding women with mastitis) and 148 controls were collected by a questionnaire about risk factors related to medical history of mother and infant, pregnancy, delivery, postpartum, and breastfeeding practices. The performance of the DT and LR analyses was compared using the area under the receiver operating characteristic (ROC) curve. Sensitivity, specificity and accuracy of both models were calculated. Results: Cracked nipples, antibiotics and antifungal drugs during breastfeeding, infant age, breast pumps, familial history of mastitis and throat infection were significant risk factors associated with mastitis in both analyses. Bottle-feeding and milk supply were related to mastitis for certain subgroups in the DT model. The areas under the ROC curves were similar for LR and DT models (0.870 and 0.835, respectively). The LR model had better classification accuracy and sensitivity than the DT model, but the latter presented better specificity at the optimal threshold of each curve. Conclusions: The DT and LR models constitute useful and complementary analytical tools to assess the risk of lactational infectious mastitis. The DT approach identifies high-risk subpopulations that need specific mastitis prevention programs and, therefore, it could be used to make the most of public health resources.
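
    The comparison measures named in the abstract above (sensitivity, specificity, accuracy and area under the ROC curve) can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' analysis: the function names, the 0.5 threshold and the toy scores are invented, and the AUC is computed by pairwise ranking, which equals the area under the empirical ROC curve.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = case, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy


def roc_auc(y_true, scores):
    """AUC as the chance that a random case outscores a random control (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration: predicted risk scores from some model.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical threshold of 0.5

sens, spec, acc = confusion_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(sens, spec, acc, auc)
```

    Running the same two functions on the scores of each model gives a like-for-like comparison at any chosen threshold.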

  12. CART Decision-Tree Statistical Analysis and Prediction of Summer Season Maximum Surface Ozone for the Vancouver, Montreal, and Atlantic Regions of Canada.

    NASA Astrophysics Data System (ADS)

    Burrows, William R.; Benjamin, Mario; Beauchamp, Stephen; Lord, Edward R.; McCollor, Douglas; Thomson, Bruce

    1995-08-01

    reasonably well, and the rules for node splitting were found to be physically realistic. Some of the important aspects of the analyses are noted. One interesting result was that moisture content of the air plays a limiting role on the maximum surface O3 concentration that can be achieved when other factors point to occurrence of high values. The decision trees can be used to predict maximum surface O3 concentrations if the predictor variables are forecast, thus providing an inexpensive site-specific model for forecasts and climate impact analysis. An estimation of performance with independent data was conducted for the Vancouver lower Fraser River valley and Montreal regions for each of the five years 1988-92. Verification of the ensemble of forecasts in the two regions shows the technique would have reasonably good skill in forecasting surface O3 concentrations near or exceeding acceptable 1-h limits. A computer version of the technique has been provided for use in the regional forecast offices.

  13. Ensemble Models

    EPA Science Inventory

    Ensemble forecasting has been used for operational numerical weather prediction in the United States and Europe since the early 1990s. An ensemble of weather or climate forecasts is used to characterize the two main sources of uncertainty in computer models of physical systems: ...

  14. Ensemble Models

    EPA Science Inventory

    Ensemble forecasting has been used for operational numerical weather prediction in the United States and Europe since the early 1990s. An ensemble of weather or climate forecasts is used to characterize the two main sources of uncertainty in computer models of physical systems: ...

  15. Development and Validation of a Primary Care-Based Family Health History and Decision Support Program (MeTree)

    PubMed Central

    Orlando, Lori A.; Buchanan, Adam H.; Hahn, Susan E.; Christianson, Carol A.; Powell, Karen P.; Skinner, Celette Sugg; Chesnut, Blair; Blach, Colette; Due, Barbara; Ginsburg, Geoffrey S.; Henrich, Vincent C.

    2016-01-01

    INTRODUCTION Family health history is a strong predictor of disease risk. To reduce the morbidity and mortality of many chronic diseases, risk-stratified evidence-based guidelines strongly encourage the collection and synthesis of family health history to guide selection of primary prevention strategies. However, the collection and synthesis of such information is not well integrated into clinical practice. To address barriers to collection and use of family health histories, the Genomedical Connection developed and validated MeTree, a Web-based, patient-facing family health history collection and clinical decision support tool. MeTree is designed for integration into primary care practices as part of the genomic medicine model for primary care. METHODS We describe the guiding principles, operational characteristics, algorithm development, and coding used to develop MeTree. Validation was performed through stakeholder cognitive interviewing, a genetic counseling pilot program, and clinical practice pilot programs in 2 community-based primary care clinics. RESULTS Stakeholder feedback resulted in changes to MeTree’s interface and changes to the phrasing of clinical decision support documents. The pilot studies resulted in the identification and correction of coding errors and the reformatting of clinical decision support documents. MeTree’s strengths in comparison with other tools are its seamless integration into clinical practice and its provision of action-oriented recommendations guided by providers’ needs. LIMITATIONS The tool was validated in a small cohort. CONCLUSION MeTree can be integrated into primary care practices to help providers collect and synthesize family health history information from patients with the goal of improving adherence to risk-stratified evidence-based guidelines. PMID:24044145

  16. Including public-health benefits of trees in urban-forestry decision making

    Treesearch

    Geoffrey H. Donovan

    2017-01-01

    Research demonstrating the biophysical benefits of urban trees is often used to justify investments in urban forestry. Far less emphasis, however, is placed on the non-biophysical benefits such as improvements in public health. Indeed, the public-health benefits of trees may be significantly larger than the biophysical benefits, and, therefore, failure to account for...

  17. Ensembl 2015

    PubMed Central

    Cunningham, Fiona; Amode, M. Ridwan; Barrell, Daniel; Beal, Kathryn; Billis, Konstantinos; Brent, Simon; Carvalho-Silva, Denise; Clapham, Peter; Coates, Guy; Fitzgerald, Stephen; Gil, Laurent; Girón, Carlos García; Gordon, Leo; Hourlier, Thibaut; Hunt, Sarah E.; Janacek, Sophie H.; Johnson, Nathan; Juettemann, Thomas; Kähäri, Andreas K.; Keenan, Stephen; Martin, Fergal J.; Maurel, Thomas; McLaren, William; Murphy, Daniel N.; Nag, Rishi; Overduin, Bert; Parker, Anne; Patricio, Mateus; Perry, Emily; Pignatelli, Miguel; Riat, Harpreet Singh; Sheppard, Daniel; Taylor, Kieron; Thormann, Anja; Vullo, Alessandro; Wilder, Steven P.; Zadissa, Amonida; Aken, Bronwen L.; Birney, Ewan; Harrow, Jennifer; Kinsella, Rhoda; Muffato, Matthieu; Ruffier, Magali; Searle, Stephen M.J.; Spudich, Giulietta; Trevanion, Stephen J.; Yates, Andy; Zerbino, Daniel R.; Flicek, Paul

    2015-01-01

    Ensembl (http://www.ensembl.org) is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site (http://grch37.ensembl.org). Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (http://rest.ensembl.org), which allows programs written in any language to query our databases, has moved to a full service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is more accessible: it is now hosted on our GitHub organization page (https://github.com/Ensembl) under an Apache 2.0 open source license. PMID:25352552

  18. Lessons learned from Applications of a Decision Tree for Confronting Climate Change Uncertainty - the Short Term and the Long Term

    NASA Astrophysics Data System (ADS)

    Ray, P. A.; Wi, S.; Bonzanigo, L.; Taner, M. U.; Rodriguez, D.; Garcia, L.; Brown, C.

    2016-12-01

    The Decision Tree for Confronting Climate Change Uncertainty is a hierarchical, staged framework for accomplishing climate change risk management in water resources system investments. Since its development for the World Bank Water Group two years ago, the framework has been applied to pilot demonstration projects in Nepal (hydropower generation), Mexico (water supply), Kenya (multipurpose reservoir operation), and Indonesia (flood risks to dam infrastructure). An important finding of the Decision Tree demonstration projects has been the need to present the risks/opportunities of climate change to stakeholders and investors in proportion to risks/opportunities and hazards of other kinds. This presentation will provide an overview of tools and techniques used to quantify risks/opportunities to each of the project types listed above, with special attention to those found most useful for exploration of the risk space. Careful exploration of the risk/opportunity space shows that some interventions would be better taken now, whereas risks/opportunities of other types would be better instituted incrementally in order to maintain reversibility and flexibility. A number of factors contribute to the robustness/flexibility tradeoff: available capital, magnitude and imminence of potential risk/opportunity, modular (or not) character of investment, and risk aversion of the decision maker, among others. Finally, in each case, nuance was required in the translation of Decision Tree findings into actionable policy recommendations. Though the narrative of stakeholder solicitation, engagement, and ultimate partnership is unique to each case, summary lessons are available from the portfolio that can serve as a guideline to the community of climate change risk managers.

  19. The creation of a digital soil map for Cyprus using decision-tree classification techniques

    NASA Astrophysics Data System (ADS)

    Camera, Corrado; Zomeni, Zomenia; Bruggeman, Adriana; Noller, Joy; Zissimos, Andreas

    2014-05-01

    Considering the increasing threats soils are experiencing, especially in semi-arid Mediterranean environments like Cyprus (erosion, contamination, sealing and salinisation), producing a high-resolution, reliable soil map is essential for further soil conservation studies. This study aims to create a 1:50,000 soil map covering the area under the direct control of the Republic of Cyprus (5,760 km2). The study consists of two major steps. The first is the creation of a raster database of predictive variables selected according to the scorpan formula (McBratney et al., 2003). Of particular interest is the possibility of using, as soil properties, data from three older island-wide soil maps and the recently published geochemical atlas of Cyprus (Cohen et al., 2011). Ten highly characterizing elements were selected and used as predictors in the present study. For the other factors, the usual variables were used: temperature and aridity index for climate; total loss on ignition, vegetation and forestry type maps for organic matter; the DEM and related relief derivatives (slope, aspect, curvature, landscape units); bedrock, surficial geology and geomorphology (Noller, 2009) for parent material and age; and a sub-watershed map to better bound location related to parent material sources. In the second step, the digital soil map is created using the Random Forests package in R. Random Forests is a decision tree classification technique where many trees, instead of a single one, are developed and compared to increase the stability and the reliability of the prediction. The model is trained and verified on areas where a published 1:25,000 soil map obtained from field work is available, and it is then applied for predictive mapping to the other areas. Preliminary results obtained in a small area in the plain around the city of Lefkosia, where eight different soil classes are present, show very good capacities of the method. The Random Forest approach leads to reproduce soil

  20. Evaluating the Effectiveness of Science for Decision-Making: Water Managers and Tree-Ring Data in the Western United States

    NASA Astrophysics Data System (ADS)

    Rice, J. L.; Woodhouse, C.; Lukas, J.

    2008-12-01

    Current climate variability, potential impacts of climate change, and limited resources in the face of growing demand are increasingly prompting water managers in the western United States to consider and use data from climate-related research in water resource planning. Much of these data are produced by stakeholder-driven science programs, such as NOAA's Regional Integrated Science Assessments (RISAs), but there have been few efforts to evaluate the effectiveness of these science-to-application efforts. Over the past several years, researchers with the Western Water Assessment (WWA) RISA have been providing tree-ring reconstructions of streamflow to water managers in Colorado and other western states, and presenting technical workshops explaining the applications of these tree-ring data for water management and planning. Using in-depth interviews and a survey questionnaire, we have assessed the effectiveness and outcomes of these engagements, addressing (1) the factors that have prompted water managers to seek out tree-ring data, (2) how paleoclimate data have been made relevant and accessible for water resource planning, and (3) how tree-ring data and information have been utilized by water managers and other workshop participants. We also provide an assessment of challenges and opportunities that exist in the translation of climate science for decision-making, including how tree-ring data are interpreted in the context of water planning paradigms, issues of credibility and acceptance of tree-ring data, and what data needs exist in different planning environments. These findings have broader application in improving and evaluating science-policy interactions related to climate and climate change.

  1. Using decision trees to predict benthic communities within and near the German Exclusive Economic Zone (EEZ) of the North Sea.

    PubMed

    Pesch, Roland; Pehlke, Hendrik; Jerosch, Kerstin; Schröder, Winfried; Schlüter, Michael

    2008-01-01

    This article describes a concept for predicting and mapping the occurrence of benthic communities within and near the German Exclusive Economic Zone (EEZ) of the North Sea. The approach consists of two work steps: (1) geostatistical analysis of abiotic measurement data and (2) calculation of benthic provinces by means of Classification and Regression Trees (CART) and GIS techniques. From bottom water measurements on salinity, temperature, silicate and nutrients as well as from point data on grain size ranges (0-20, 20-63, 63-2,000 µm), raster maps were calculated by use of geostatistical methods. First, the autocorrelation structure was examined and modelled with the help of variogram analysis. The resulting variogram models were then used to calculate raster maps by applying ordinary kriging procedures. After intersecting these raster maps with point data on eight benthic communities, a decision tree was derived to predict the occurrence of these communities within the study area. Since such a CART tree corresponds to a hierarchically ordered set of decision rules, it was applied to the geostatistically estimated raster data to predict benthic habitats within and near the EEZ.
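
    The last step of the abstract above, applying a fitted tree as a hierarchically ordered set of decision rules to every raster cell, can be sketched as follows. Everything specific here is hypothetical: the split variables, thresholds, community labels and the 2 x 2 grid are invented for illustration and are not taken from the study, and neither the kriging nor the tree fitting is shown.

```python
# A tree node is either a leaf (a community label) or a tuple
# (feature, threshold, subtree_if_below, subtree_if_at_or_above).
# All splits and labels below are hypothetical placeholders.
TREE = (
    "salinity", 33.0,
    "community_A",
    ("mud_fraction", 0.2,
        ("temperature", 9.0, "community_C", "community_D"),
        "community_B"),
)


def predict(node, cell):
    """Walk the rule hierarchy from the root until a leaf label is reached."""
    if isinstance(node, str):
        return node
    feature, threshold, below, at_or_above = node
    branch = below if cell[feature] < threshold else at_or_above
    return predict(branch, cell)


# A tiny raster of interpolated abiotic values, one dict per cell (invented numbers).
raster = [
    [{"salinity": 31.5, "mud_fraction": 0.10, "temperature": 10.0},
     {"salinity": 34.0, "mud_fraction": 0.35, "temperature": 8.0}],
    [{"salinity": 34.5, "mud_fraction": 0.05, "temperature": 8.5},
     {"salinity": 35.0, "mud_fraction": 0.10, "temperature": 10.5}],
]

# Apply the same rules to every cell to obtain a predicted habitat map.
predicted = [[predict(TREE, cell) for cell in row] for row in raster]
for row in predicted:
    print(row)
```

    Because each cell is classified independently, the rule set transfers directly from the sampled locations to the full interpolated grid.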

  2. Cost-Effectiveness of a new Rotavirus Vaccination Program in Pakistan: a Decision Tree Model

    PubMed Central

    Patel, Hiten D.; Roberts, Eric T.; Constenla, Dagna O.

    2013-01-01

    Background: Rotavirus gastroenteritis places a significant health and economic burden on Pakistan. To determine the public health impact of a national rotavirus vaccination program, we performed a cost-effectiveness study from the perspective of the health care system. Methods: A decision tree model was developed to assess the cost-effectiveness of a national vaccination program in Pakistan. Disease and cost burden with the program were compared to the current state. Disease parameters, vaccine-related costs, and medical treatment costs were based on published epidemiological and economic data, which were specific to Pakistan when possible. An annual birth cohort of children was followed for 5 years to model the public health impact of vaccination on health-related events and costs. The cost-effectiveness was assessed and quantified in cost (2012 US$) per disability-adjusted life-year (DALY) averted and cost per death averted. Sensitivity analyses were performed to assess the robustness of the incremental cost-effectiveness ratios (ICERs). Results: The base case results showed vaccination prevented 1.2 million cases of rotavirus gastroenteritis, 93,000 outpatient visits, 43,000 hospitalizations, and 6,700 deaths by 5 years of age for an annual birth cohort scaled from 6% current coverage to DPT3 levels (85%). The medical cost savings would be US$1.4 million from hospitalizations and US$200,000 from outpatient visit costs. The vaccination program would cost US$35 million at a vaccine price of US$5.00. The ICER was US$149.50 per DALY averted or US$4,972 per death averted. Sensitivity analyses showed changes in case-fatality ratio, vaccine efficacy, and vaccine cost exerted the greatest influence on the ICER. Conclusions: Across a range of sensitivity analyses, a national rotavirus vaccination program was predicted to decrease health and economic burden due to rotavirus gastroenteritis in Pakistan by ~40%. Vaccination was highly cost-effective in this context. As

  3. Analysis of the impact of recreational trail usage for prioritising management decisions: a regression tree approach

    NASA Astrophysics Data System (ADS)

    Tomczyk, Aleksandra; Ewertowski, Marek; White, Piran; Kasprzak, Leszek

    2016-04-01

    The dual role of many Protected Natural Areas in providing benefits for both conservation and recreation poses challenges for management. Although recreation-based damage to ecosystems can occur very quickly, restoration can take many years. The protection of conservation interests at the same time as providing for recreation requires decisions to be made about how to prioritise and direct management actions. Trails are commonly used to divert visitors from the most important areas of a site, but high visitor pressure can lead to increases in trail width and a concomitant increase in soil erosion. Here we use detailed field data on condition of recreational trails in Gorce National Park, Poland, as the basis for a regression tree analysis to determine the factors influencing trail deterioration, and link specific trail impacts with environmental, use related and managerial factors. We distinguished 12 types of trails, characterised by four levels of degradation: (1) trails with an acceptable level of degradation; (2) threatened trails; (3) damaged trails; and (4) heavily damaged trails. Damaged trails were the most vulnerable of all trails and should be prioritised for appropriate conservation and restoration. We also proposed five types of monitoring of recreational trail conditions: (1) rapid inventory of negative impacts; (2) monitoring visitor numbers and variation in type of use; (3) change-oriented monitoring focusing on sections of trail which were subjected to changes in type or level of use or subjected to extreme weather events; (4) monitoring of dynamics of trail conditions; and (5) full assessment of trail conditions, to be carried out every 10-15 years. The application of the proposed framework can enhance the ability of Park managers to prioritise their trail management activities, enhancing trail conditions and visitor safety, while minimising adverse impacts on the conservation value of the ecosystem. A.M.T. was supported by the Polish Ministry of

  4. Landslide Susceptibility Mapping of Tegucigalpa, Honduras Using Artificial Neural Network, Bayesian Network and Decision Trees

    NASA Astrophysics Data System (ADS)

    Garcia Urquia, E. L.; Braun, A.; Yamagishi, H.

    2016-12-01

    Tegucigalpa, the capital city of Honduras, experiences rainfall-induced landslides on a yearly basis. The high precipitation regime and the rugged topography the city has been built in couple with the lack of a proper urban expansion plan to contribute to the occurrence of landslides during the rainy season. Thousands of inhabitants live at risk of losing their belongings due to the construction of precarious shelters in landslide-prone areas on mountainous terrains and next to the riverbanks. Therefore, the city is in need of landslide susceptibility and hazard maps to aid in the regulation of future development. Major challenges in the context of highly dynamic urbanizing areas are the overlap of natural and anthropogenic slope destabilizing factors, as well as the availability and accuracy of data. Data-driven multivariate techniques have proven to be powerful in discovering interrelations between factors, identifying important factors in large datasets, capturing non-linear problems and coping with noisy and incomplete data. This analysis focuses on the creation of a landslide susceptibility map using different methods from the field of data mining: Artificial Neural Networks (ANN), Bayesian Networks (BN) and Decision Trees (DT). The input dataset of the study contains geomorphological and hydrological factors derived from a digital elevation model with a 10 m resolution, lithological factors derived from a geological map, and anthropogenic factors, such as information on the development stage of the neighborhoods in Tegucigalpa and road density. Moreover, a landslide inventory map that was developed in 2014 through aerial photo interpretation was used as target variable in the analysis. The analysis covers an area of roughly 100 km2, while 8.95 km2 are occupied by landslides. In a first step, the dataset was explored by assessing and improving the data quality, identifying unimportant variables and finding interrelations. Then, based on a training

  5. Skill and reliability of experimental GEFS ensemble forecast guidance designed to inform decision-making in reservoir management in California

    NASA Astrophysics Data System (ADS)

    Scheuerer, Michael; Webb, Robert S.; Hamill, Thomas M.

    2017-04-01

    Many reservoirs operated by the U.S. Army Corps of Engineers (Corps) in California provide flood control as well as water supply, recreation and stream flow regulation. Operations for flood control follow seasonally specified elevations for an upper volume of reservoir storage with unused storage capacity designated for flood risk management and thus not available for water supply storage. In the flood control operation of these reservoirs, runoff is captured during rain events and then released soon after at rates that do not result in downstream flooding (typically over a 5 to 8 day period), resulting in evacuated storage space to capture runoff from the next potential storm. As part of the Forecast-Informed Reservoir Operations (FIRO) partnership to more effectively balance flood and drought risks, we developed an experimental California medium-range precipitation forecast system based on NCEP GEFS reforecasts and Climatology-Calibrated Precipitation Analysis (CCPA). We have applied this experimental forecast system to predict the probability of day 5-10 precipitation accumulations at each CCPA grid point within California to exceed certain pre-specified thresholds. Discussions with flood and water supply managers indicate that forecast guidance for the very low risk of extreme precipitation for watersheds above reservoirs can be valuable for decision making. In this study, we assess the skill and reliability of this experimental forecast system to predict low probabilities of extreme precipitation events for select watersheds during recent winter precipitation seasons. Our analysis indicates there may be sufficient reliability in forecast guidance for low probabilities of heavy precipitation events to inform decision making in reservoir management in select California river basins to manage flood risk while increasing water supply for consumptive use and ecosystem services.
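
    The idea of a threshold-exceedance probability and of checking its reliability can be illustrated with a minimal sketch. It uses the raw fraction of ensemble members above a threshold, which is a simpler stand-in and not the reforecast-calibrated method described in the abstract above. The function names, the 50 mm threshold and all numbers are hypothetical.

```python
def exceedance_probability(members, threshold):
    """Fraction of ensemble members whose accumulation exceeds the threshold."""
    return sum(1 for m in members if m > threshold) / len(members)


def observed_frequency(forecast_probs, outcomes, low, high):
    """Observed event frequency among forecasts with low <= probability < high.

    A reliable system shows an observed frequency close to the forecast
    probabilities in the bin. Returns None when the bin is empty.
    """
    selected = [o for p, o in zip(forecast_probs, outcomes) if low <= p < high]
    return sum(selected) / len(selected) if selected else None


# Hypothetical day 5-10 precipitation totals (mm) from ten ensemble members.
members = [12.0, 48.5, 30.2, 75.1, 5.0, 22.3, 51.0, 18.7, 9.9, 40.0]
prob = exceedance_probability(members, threshold=50.0)

# Hypothetical archive of past forecast probabilities and whether the event occurred.
forecast_probs = [0.05, 0.0, 0.08, 0.02, 0.6, 0.09, 0.03, 0.7, 0.01, 0.04]
outcomes = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
low_bin_freq = observed_frequency(forecast_probs, outcomes, 0.0, 0.1)

print(prob, low_bin_freq)
```

    The low-probability bin is the one of interest here, since the abstract emphasizes guidance on the very low risk of extreme precipitation.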

  6. Skill and reliability of experimental GEFS ensemble forecast guidance designed to inform decision-making in reservoir management in California

    NASA Astrophysics Data System (ADS)

    Webb, R. S.; Scheuerer, M.; Hamill, T.

    2016-12-01

    Many reservoirs operated by the U.S. Army Corps of Engineers (Corps) in California provide flood control as well as water supply, recreation and stream flow regulation. Operations for flood control follow seasonally specified elevations for an upper volume of reservoir storage with unused storage capacity designated for flood risk management and thus not available for water supply storage. In the flood control operation of these reservoirs, runoff is captured during rain events and then released soon after at rates that do not result in downstream flooding (typically over a 5 to 8 day period), resulting in evacuated storage space to capture runoff from the next potential storm. As part of the Forecast-Informed Reservoir Operations (FIRO) partnership to more effectively balance flood and drought risks, we developed an experimental California medium-range precipitation forecast system based on NCEP GEFS Reforecasts and Climatology-Calibrated Precipitation Analysis (CCPA). We have applied this experimental forecast system to predict the probability of day 5-10 precipitation accumulations at each CCPA grid point within California to exceed certain pre-specified thresholds. Discussions with flood and water supply managers indicate that forecast guidance for the very low risk of extreme precipitation for watersheds above reservoirs can be valuable for decision making. In this study, we assess the skill and reliability of this experimental forecast system to predict low probabilities of extreme precipitation events for select watersheds during recent winter precipitation seasons. Our analysis indicates there may be sufficient reliability in forecast guidance for low probabilities of heavy precipitation events to inform decision making in reservoir management in select California river basins to manage flood risk while increasing water supply for consumptive use and ecosystem services.

  7. Detection of clinical mastitis with sensor data from automatic milking systems is improved by using decision-tree induction.

    PubMed

    Kamphuis, C; Mollenhorst, H; Heesterbeek, J A P; Hogeveen, H

    2010-08-01

    The objective was to develop and validate a clinical mastitis (CM) detection model by means of decision-tree induction. For farmers milking with an automatic milking system (AMS), it is desirable that the detection model has a high level of sensitivity (Se), especially for more severe cases of CM, at a very high specificity (Sp). In addition, an alert for CM should be generated preferably at the quarter milking (QM) at which the CM infection is visible for the first time. Data were collected from 9 Dutch dairy herds milking automatically during a 2.5-yr period. Data included sensor data (electrical conductivity, color, and yield) at the QM level and visual observations of quarters with CM recorded by the farmers. Visual observations of quarters with CM were combined with sensor data of the most recent automatic milking recorded for that same quarter, within a 24-h time window before the visual assessment time. Sensor data of 3.5 million QM were collected, of which 348 QM were combined with a CM observation. Data were divided into a training set, including two-thirds of all data, and a test set. Cows in the training set were not included in the test set and vice versa. A decision-tree model was trained using only clear examples of healthy (n=24,717) or diseased (n=243) QM. The model was tested on 105 QM with CM and a random sample of 50,000 QM without CM. While keeping the Se at a level comparable to that of models currently used by AMS, the decision-tree model was able to decrease the number of false-positive alerts by more than 50%. At an Sp of 99%, 40% of the CM cases were detected. Sixty-four percent of the severe CM cases were detected and only 12.5% of the CM that were scored as watery milk. The Se increased considerably from 40% to 66.7% when the time window increased from less than 24h before the CM observation, to a time window from 24h before to 24h after the CM observation. Even at very wide time windows, however, it was impossible to reach an Se of 100

  8. Decision tree-based method for integrating gene expression, demographic, and clinical data to determine disease endotypes

    PubMed Central

    2013-01-01

    Background: Complex diseases are often difficult to diagnose, treat and study due to the multi-factorial nature of the underlying etiology. Large data sets are now widely available that can be used to define novel, mechanistically distinct disease subtypes (endotypes) in a completely data-driven manner. However, significant challenges exist with regard to how to segregate individuals into suitable subtypes of the disease and understand the distinct biological mechanisms of each when the goal is to maximize the discovery potential of these data sets. Results: A multi-step decision tree-based method is described for defining endotypes based on gene expression, clinical covariates, and disease indicators using childhood asthma as a case study. We attempted to use alternative approaches such as the Student’s t-test, single data domain clustering and the Modk-prototypes algorithm, which incorporates multiple data domains into a single analysis, but none performed as well as the novel multi-step decision tree method. This new method gave the best segregation of asthmatics and non-asthmatics, and it provides easy access to all genes and clinical covariates that distinguish the groups. Conclusions: The multi-step decision tree method described here will lead to better understanding of complex disease in general by allowing purely data-driven disease endotypes to facilitate the discovery of new mechanisms underlying these diseases. This application should be considered a complement to ongoing efforts to better define and diagnose known endotypes. When coupled with existing methods developed to determine the genetics of gene expression, these methods provide a mechanism for linking genetics and exposomics data and thereby accounting for both major determinants of disease. PMID:24188919

  9. A novel decision tree approach based on transcranial Doppler sonography to screen for blunt cervical vascular injuries.

    PubMed

    Purvis, Dianna; Aldaghlas, Tayseer; Trickey, Amber W; Rizzo, Anne; Sikdar, Siddhartha

    2013-06-01

    Early detection and treatment of blunt cervical vascular injuries prevent adverse neurologic sequelae. Current screening criteria can miss up to 22% of these injuries. The study objective was to investigate bedside transcranial Doppler sonography for detecting blunt cervical vascular injuries in trauma patients using a novel decision tree approach. This prospective pilot study was conducted at a level I trauma center. Patients undergoing computed tomographic angiography for suspected blunt cervical vascular injuries were studied with transcranial Doppler sonography. Extracranial and intracranial vasculatures were examined with a portable power M-mode transcranial Doppler unit. The middle cerebral artery mean flow velocity, pulsatility index, and their asymmetries were used to quantify flow patterns and develop an injury decision tree screening protocol. Student t tests validated associations between injuries and transcranial Doppler predictive measures. We evaluated 27 trauma patients with 13 injuries. Single vertebral artery injuries were most common (38.5%), followed by single internal carotid artery injuries (30%). Compared to patients without injuries, mean flow velocity asymmetry was higher for single internal carotid artery (P = .003) and single vertebral artery (P = .004) injuries. Similarly, pulsatility index asymmetry was higher in single internal carotid artery (P = .015) and single vertebral artery (P = .042) injuries, whereas the lowest pulsatility index was elevated for bilateral vertebral artery injuries (P = .006). The decision tree yielded 92% specificity, 93% sensitivity, and 93% correct classifications. In this pilot feasibility study, transcranial Doppler measures were significantly associated with the blunt cervical vascular injury status, suggesting that transcranial Doppler sonography might be a viable bedside screening tool for trauma. Patient-specific hemodynamic information from transcranial Doppler assessment has the potential to alter

  10. Single nucleotide polymorphism barcoding of cytochrome c oxidase I sequences for discriminating 17 species of Columbidae by decision tree algorithm.

    PubMed

    Yang, Cheng-Hong; Wu, Kuo-Chuan; Dahms, Hans-Uwe; Chuang, Li-Yeh; Chang, Hsueh-Wei

    2017-07-01

    DNA barcodes are widely used in taxonomy, systematics, species identification, food safety, and forensic science. Most conventional DNA barcode sequences contain the whole information of a given barcoding gene. Most of the sequence information does not vary and is uninformative for a given group of taxa within a monophylum. We suggest here a method that reduces the amount of noninformative nucleotides in a given barcoding sequence of a major taxon, like the prokaryotes, or eukaryotic animals, plants, or fungi. The actual differences in genetic sequences, called single nucleotide polymorphism (SNP) genotyping, provide a tool for developing a rapid, reliable, and high-throughput assay for the discrimination between known species. Here, we investigated SNPs as robust markers of genetic variation for identifying different pigeon species based on available cytochrome c oxidase I (COI) data. We propose a decision tree-based SNP barcoding (DTSB) algorithm in which SNP patterns are selected from the DNA barcoding sequence of several evolutionarily related species in order to identify a single species, with pigeons as an example. This approach can make use of any established barcoding system. As a first example, we applied DTSB to the mitochondrial COI gene information of 17 pigeon species (Columbidae, Aves) after sequence trimming and alignment. SNPs were chosen which followed the rule of decision tree and species-specific SNP barcodes. The shortest barcode of about 11 bp was then generated for discriminating 17 pigeon species using the DTSB method. This method provides a sequence alignment and tree decision approach to parsimoniously assign a unique and shortest SNP barcode for any known species of a chosen monophyletic taxon where a barcoding sequence is available.
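
    The idea of picking a small set of alignment columns that separates every species can be sketched with a simple greedy search. This is not the DTSB algorithm itself, whose details are not given in the abstract; the function name, the toy sequences, and the greedy criterion are all assumptions made for illustration, and a greedy search does not guarantee the shortest possible barcode.

```python
def greedy_snp_barcode(aligned):
    """Greedily pick alignment columns until every sequence has a unique barcode.

    `aligned` maps a species name to an aligned sequence; all sequences must
    have the same length. Returns (sorted column indices, barcode per species).
    """
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("sequences must be aligned to the same length")

    def n_groups(cols):
        # Number of distinct patterns the chosen columns produce.
        return len({tuple(aligned[n][c] for c in cols) for n in names})

    chosen = []
    while n_groups(chosen) < len(names):
        candidates = [c for c in range(length) if c not in chosen]
        best = max(candidates, key=lambda c: n_groups(chosen + [c]), default=None)
        if best is None or n_groups(chosen + [best]) == n_groups(chosen):
            raise ValueError("some sequences are identical and cannot be separated")
        chosen.append(best)

    cols = sorted(chosen)
    barcodes = {n: "".join(aligned[n][c] for c in cols) for n in names}
    return cols, barcodes


# Hypothetical toy alignment, not real COI data.
toy = {"species_A": "ACGT", "species_B": "ACGA", "species_C": "TCGA"}
cols, barcodes = greedy_snp_barcode(toy)
print(cols, barcodes)
```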

  11. Decision-tree early warning score (DTEWS) validates the design of the National Early Warning Score (NEWS).

    PubMed

    Badriyah, Tessy; Briggs, James S; Meredith, Paul; Jarvis, Stuart W; Schmidt, Paul E; Featherstone, Peter I; Prytherch, David R; Smith, Gary B

    2014-03-01

    To compare the performance of a human-generated, trial and error-optimised early warning score (EWS), i.e., National Early Warning Score (NEWS), with one generated entirely algorithmically using Decision Tree (DT) analysis. We used DT analysis to construct a decision-tree EWS (DTEWS) from a database of 198,755 vital signs observation sets collected from 35,585 consecutive, completed acute medical admissions. We evaluated the ability of DTEWS to discriminate patients at risk of cardiac arrest, unanticipated intensive care unit admission or death, each within 24h of a given vital signs observation. We compared the performance of DTEWS and NEWS using the area under the receiver-operating characteristic (AUROC) curve. The structures of DTEWS and NEWS were very similar. The AUROC (95% CI) for DTEWS for cardiac arrest, unanticipated ICU admission, death, and any of the outcomes, all within 24h, were 0.708 (0.669-0.747), 0.862 (0.852-0.872), 0.899 (0.892-0.907), and 0.877 (0.870-0.883), respectively. Values for NEWS were 0.722 (0.685-0.759) [cardiac arrest], 0.857 (0.847-0.868) [unanticipated ICU admission], 0.894 (0.887-0.902) [death], and 0.873 (0.866-0.879) [any outcome]. The decision-tree technique independently validates the composition and weightings of NEWS. The DT approach quickly provided an almost identical EWS to NEWS, although one that admittedly would benefit from fine-tuning using clinical knowledge. We believe that DT analysis could be used to quickly develop candidate models for disease-specific EWSs, which may be required in future. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
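
    For readers unfamiliar with the metric, the AUROC equals the probability that a randomly chosen observation followed by an outcome receives a higher score than a randomly chosen observation without one, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name and the toy scores are hypothetical and are not taken from the study.

```python
def auroc(scores_with_outcome, scores_without_outcome):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    if not scores_with_outcome or not scores_without_outcome:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for pos in scores_with_outcome:
        for neg in scores_without_outcome:
            if pos > neg:
                wins += 1.0      # positive ranked above negative
            elif pos == neg:
                wins += 0.5      # ties count as half a correct ranking
    return wins / (len(scores_with_outcome) * len(scores_without_outcome))


# Hypothetical early warning scores for two small groups.
print(auroc([7, 5, 9], [1, 5, 2, 0]))
```

    The pairwise form is quadratic in the number of observations, so it suits small illustrations; rank-based formulas give the same value more efficiently on data sets of the size used in the study.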

  12. Mapping mangrove forests using multi-tidal remotely-sensed data and a decision-tree-based procedure

    NASA Astrophysics Data System (ADS)

    Zhang, Xuehong; Treitz, Paul M.; Chen, Dongmei; Quan, Chang; Shi, Lixin; Li, Xinhui

    2017-10-01

    Mangrove forests grow in intertidal zones in tropical and subtropical regions and have suffered a dramatic decline globally over the past few decades. Remote sensing data, collected at various spatial resolutions, provide an effective way to map the spatial distribution of mangrove forests over time. However, the spectral signatures of mangrove forests are significantly affected by tide levels. Therefore, mangrove forests may not be accurately mapped with remote sensing data collected during a single-tidal event, especially if not acquired at low tide. This research reports how a decision-tree-based procedure was developed to map mangrove forests using multi-tidal Landsat 5 Thematic Mapper (TM) data and a Digital Elevation Model (DEM). Three indices, including the Normalized Difference Moisture Index (NDMI), the Normalized Difference Vegetation Index (NDVI) and NDVI_L·NDMI_H (the product of NDVI_L and NDMI_H; L: low tide level, H: high tide level), were used in this algorithm to differentiate mangrove forests from other land-cover and land-use types in Fangchenggang City, China. Additionally, the recent Landsat 8 OLI (Operational Land Imager) data were selected to validate the results and assess whether the methodology is reliable. The results demonstrate that short-term multi-tidal remotely-sensed data better represent the unique nearshore coastal wetland habitats of mangrove forests than single-tidal data. Furthermore, multi-tidal remotely-sensed data has led to improved accuracies using two classification approaches: i.e. decision trees and the maximum likelihood classification (MLC). Since mangrove forests are typically found at low elevations, the inclusion of elevation data in the two classification procedures was tested. Given that the decision-tree method does not assume strict data distribution parameters, it was able to optimize the application of multi-tidal and elevation data, resulting in higher classification accuracies of mangrove forests. 
When using multi
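
    The three indices named above follow the standard normalized-difference form: NDVI = (NIR - Red) / (NIR + Red) and NDMI = (NIR - SWIR1) / (NIR + SWIR1). A minimal per-pixel sketch is given below. The reflectance values are hypothetical, the function names are illustrative, and no classification thresholds are shown because the abstract does not report them.

```python
def normalized_difference(band_a, band_b):
    """Generic normalized difference (a - b) / (a + b) for one pixel."""
    total = band_a + band_b
    if total == 0:
        raise ValueError("band sum is zero; index undefined for this pixel")
    return (band_a - band_b) / total


def ndvi_low_times_ndmi_high(nir_low, red_low, nir_high, swir1_high):
    """Product of NDVI at low tide and NDMI at high tide for one pixel."""
    ndvi_low = normalized_difference(nir_low, red_low)         # vegetation, low tide
    ndmi_high = normalized_difference(nir_high, swir1_high)    # moisture, high tide
    return ndvi_low * ndmi_high


# Hypothetical surface reflectances for a single pixel in two scenes.
print(ndvi_low_times_ndmi_high(nir_low=0.40, red_low=0.10,
                               nir_high=0.30, swir1_high=0.10))
```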

  13. Multiclass Cancer Classification by Using Fuzzy Support Vector Machine and Binary Decision Tree With Gene Selection

    PubMed Central

    2005-01-01

    We investigate the problems of multiclass cancer classification with gene selection from gene expression data. Two different constructed multiclass classifiers with gene selection are proposed, which are fuzzy support vector machine (FSVM) with gene selection and binary classification tree based on SVM with gene selection. Using F test and recursive feature elimination based on SVM as gene selection methods, binary classification tree based on SVM with F test, binary classification tree based on SVM with recursive feature elimination based on SVM, and FSVM with recursive feature elimination based on SVM are tested in our experiments. To accelerate computation, preselecting the strongest genes is also used. The proposed techniques are applied to analyze breast cancer data, small round blue-cell tumors, and acute leukemia data. Compared to existing multiclass cancer classifiers and binary classification tree based on SVM with F test or binary classification tree based on SVM with recursive feature elimination based on SVM mentioned in this paper, FSVM based on recursive feature elimination based on SVM can find most important genes that affect certain types of cancer with high recognition accuracy. PMID:16046822

  14. Ensemble Tractography

    PubMed Central

    Wandell, Brian A.

    2016-01-01

    Tractography uses diffusion MRI to estimate the trajectory and cortical projection zones of white matter fascicles in the living human brain. There are many different tractography algorithms and each requires the user to set several parameters, such as curvature threshold. Choosing a single algorithm with specific parameters poses two challenges. First, different algorithms and parameter values produce different results. Second, the optimal choice of algorithm and parameter value may differ between different white matter regions or different fascicles, subjects, and acquisition parameters. We propose using ensemble methods to reduce algorithm and parameter dependencies. To do so we separate the processes of fascicle generation and evaluation. Specifically, we analyze the value of creating optimized connectomes by systematically combining candidate streamlines from an ensemble of algorithms (deterministic and probabilistic) and systematically varying parameters (curvature and stopping criterion). The ensemble approach leads to optimized connectomes that provide better cross-validated prediction error of the diffusion MRI data than optimized connectomes generated using a single-algorithm or parameter set. Furthermore, the ensemble approach produces connectomes that contain both short- and long-range fascicles, whereas single-parameter connectomes are biased towards one or the other. In summary, a systematic ensemble tractography approach can produce connectomes that are superior to standard single parameter estimates both for predicting the diffusion measurements and estimating white matter fascicles. PMID:26845558

  15. Modelling the spatial distribution of Fasciola hepatica in bovines using decision tree, logistic regression and GIS query approaches for Brazil.

    PubMed

    Bennema, S C; Molento, M B; Scholte, R G; Carvalho, O S; Pritsch, I

    2017-11-01

    Fascioliasis is a condition caused by the trematode Fasciola hepatica. In this paper, the spatial distribution of F. hepatica in bovines in Brazil was modelled using a decision tree approach and a logistic regression, combined with a geographic information system (GIS) query. In the decision tree and the logistic model, isothermality had the strongest influence on disease prevalence. Also, the 50-year average precipitation in the warmest quarter of the year was included as a risk factor, having a negative influence on the parasite prevalence. The risk maps developed using both techniques showed a predicted higher prevalence mainly in the South of Brazil. The prediction performance seemed to be high, but both techniques failed to reach a high accuracy in predicting the medium and high prevalence classes for the entire country. The GIS query map, based on the range of isothermality, minimum temperature of coldest month, precipitation of warmest quarter of the year, altitude and the average daily land surface temperature, showed a possibility of presence of F. hepatica in a very large area. The risk maps produced using these methods can be used to focus activities of animal and public health programmes, even on non-evaluated F. hepatica areas.

  16. Mapping potential carbon and timber losses from hurricanes using a decision tree and ecosystem services driver model.

    PubMed

    Delphin, S; Escobedo, F J; Abd-Elrahman, A; Cropper, W

    2013-11-15

    Information on the effect of direct drivers such as hurricanes on ecosystem services is relevant to landowners and policy makers due to predicted effects from climate change. We identified forest damage risk zones due to hurricanes and estimated the potential loss of 2 key ecosystem services: aboveground carbon storage and timber volume. Using land cover, plot-level forest inventory data, the Integrated Valuation of Ecosystem Services and Tradeoffs (InVEST) model, and a decision tree-based framework, we determined potential damage to subtropical forests from hurricanes in the Lower Suwannee River (LS) and Pensacola Bay (PB) watersheds in Florida, US. We used biophysical factors identified in previous studies as being influential in forest damage in our decision tree and hurricane wind risk maps. Results show that 31% and 0.5% of the total aboveground carbon storage in the LS and PB, respectively, was located in high forest damage risk (HR) zones. Overall, 15% and 0.7% of the total timber net volume in the LS and PB, respectively, was in HR zones. This model can also be used for identifying timber salvage areas, developing ecosystem service provision and management scenarios, and assessing the effect of other drivers on ecosystem services and goods.

  17. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections

    PubMed Central

    Kraszewska-Głomba, Barbara; Szymańska-Toczek, Zofia; Szenborn, Leszek

    2016-01-01

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the use of the C4.5 algorithm resulted in the following decision tree: viral infection if CRP≤19.1 mg/L; otherwise for cases with CRP>19.1 mg/L: bacterial infection if PCT>0.65 ng/mL, PFAPA if PCT≤0.65 ng/mL. The model was tested using a 10-fold cross validation and in an independent test cohort (n=30); the rule’s overall accuracy was 76.4% and 90%, respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context. PMID:27131024
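
    The two-level rule quoted in the abstract can be written directly as code. The thresholds below are the ones reported (CRP 19.1 mg/L, PCT 0.65 ng/mL); the function and parameter names are hypothetical, and the sketch is for illustration only, not a clinical tool.

```python
def classify_fever_episode(crp_mg_per_l, pct_ng_per_ml):
    """Apply the reported C4.5 rule to one pair of CRP and PCT measurements."""
    if crp_mg_per_l <= 19.1:
        return "viral infection"        # first split: low CRP
    if pct_ng_per_ml > 0.65:
        return "bacterial infection"    # high CRP and high PCT
    return "PFAPA"                      # high CRP and low PCT


# Hypothetical measurements, one per leaf of the tree.
for crp, pct in [(8.0, 0.10), (60.0, 2.30), (60.0, 0.20)]:
    print(crp, pct, "->", classify_fever_episode(crp, pct))
```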

  18. Lessons Learned from Applications of a Climate Change Decision Tree to Water System Projects in Kenya and Nepal

    NASA Astrophysics Data System (ADS)

    Ray, P. A.; Bonzanigo, L.; Taner, M. U.; Wi, S.; Yang, Y. C. E.; Brown, C.

    2015-12-01

    The Decision Tree Framework developed for the World Bank's Water Partnership Program provides resource-limited project planners and program managers with a cost-effective and effort-efficient, scientifically defensible, repeatable, and clear method for demonstrating the robustness of a project to climate change. At the conclusion of this process, the project planner is empowered to confidently communicate the method by which the vulnerabilities of the project have been assessed, and how the adjustments that were made (if any were necessary) improved the project's feasibility and profitability. The framework adopts a "bottom-up" approach to risk assessment that aims at a thorough understanding of a project's vulnerabilities to climate change in the context of other nonclimate uncertainties (e.g., economic, environmental, demographic, political). It helps identify projects that perform well across a wide range of potential future climate conditions, as opposed to seeking solutions that are optimal in expected conditions but fragile to conditions deviating from the expected. Lessons learned through application of the Decision Tree to case studies in Kenya and Nepal will be presented, and aspects of the framework requiring further refinement will be described.

  19. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections.

    PubMed

    Kraszewska-Głomba, Barbara; Szymańska-Toczek, Zofia; Szenborn, Leszek

    2016-03-10

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the use of the C4.5 algorithm resulted in the following decision tree: viral infection if CRP≤19.1 mg/L; otherwise for cases with CRP>19.1 mg/L: bacterial infection if PCT>0.65 ng/mL, PFAPA if PCT≤0.65 ng/mL. The model was tested using a 10-fold cross validation and in an independent test cohort (n=30); the rule's overall accuracy was 76.4% and 90%, respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context.

  20. Use of CHAID Decision Trees to Formulate Pathways for the Early Detection of Metabolic Syndrome in Young Adults

    PubMed Central

    Liu, Pei-Yang

    2014-01-01

    Metabolic syndrome (MetS) in young adults (age 20–39) is often undiagnosed. A simple screening tool using a surrogate measure might be invaluable in the early detection of MetS. Methods. A chi-squared automatic interaction detection (CHAID) decision tree analysis with waist circumference user-specified as the first level was used to detect MetS in young adults using data from the National Health and Nutrition Examination Survey (NHANES) 2009-2010 Cohort as a representative sample of the United States population (n = 745). Results. Twenty percent of the sample met the National Cholesterol Education Program Adult Treatment Panel III (NCEP) classification criteria for MetS. The user-specified CHAID model was compared to both a CHAID model with no user-specified first level and a logistic regression-based model. This analysis identified waist circumference as a strong predictor in the MetS diagnosis. The accuracy of the final model with waist circumference user-specified as the first level was 92.3%, with its ability to detect MetS at 71.8%, which outperformed comparison models. Conclusions. Preliminary findings suggest that young adults at risk for MetS could be identified for further follow-up based on their waist circumference. Decision tree methods show promise for the development of a preliminary detection algorithm for MetS. PMID:24817904
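
    CHAID chooses its splits using chi-squared tests of independence between a candidate predictor and the outcome. The sketch below computes only the Pearson chi-squared statistic for a contingency table; it omits the category merging and p-value adjustment that a full CHAID implementation performs. The function name and the counts are hypothetical and are not NHANES data.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if predictor and outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows = waist circumference category (low, high),
# columns = MetS status (absent, present).
print(chi_squared_statistic([[300, 20], [150, 80]]))
```

    A larger statistic indicates a stronger association, so a predictor whose table yields a larger value (for the same table shape) is a stronger candidate for a split. The function assumes every row and column has a nonzero total.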

  1. Use of CHAID decision trees to formulate pathways for the early detection of metabolic syndrome in young adults.

    PubMed

    Miller, Brian; Fridline, Mark; Liu, Pei-Yang; Marino, Deborah

    2014-01-01

    Metabolic syndrome (MetS) in young adults (age 20-39) is often undiagnosed. A simple screening tool using a surrogate measure might be invaluable in the early detection of MetS. Methods. A chi-squared automatic interaction detection (CHAID) decision tree analysis with waist circumference user-specified as the first level was used to detect MetS in young adults using data from the National Health and Nutrition Examination Survey (NHANES) 2009-2010 Cohort as a representative sample of the United States population (n = 745). Results. Twenty percent of the sample met the National Cholesterol Education Program Adult Treatment Panel III (NCEP) classification criteria for MetS. The user-specified CHAID model was compared to both a CHAID model with no user-specified first level and a logistic regression-based model. This analysis identified waist circumference as a strong predictor in the MetS diagnosis. The accuracy of the final model with waist circumference user-specified as the first level was 92.3%, with its ability to detect MetS at 71.8%, which outperformed comparison models. Conclusions. Preliminary findings suggest that young adults at risk for MetS could be identified for further follow-up based on their waist circumference. Decision tree methods show promise for the development of a preliminary detection algorithm for MetS.

  2. Improving Crop Classification Techniques Using Optical Remote Sensing Imagery, High-Resolution Agriculture Resource Inventory Shapefiles and Decision Trees

    NASA Astrophysics Data System (ADS)

    Melnychuk, A. L.; Berg, A. A.; Sweeney, S.

    2010-12-01

    Recognition of anthropogenic effects of land use management practices on bodies of water is important for remediating and preventing eutrophication. In the case of Lake Simcoe, Ontario, the main surrounding land use is agriculture. To better manage the nutrient flow into the lake, knowledge of the management of the agricultural land is important. For this basin, a comprehensive agricultural resource inventory is required for assessment of policy and for input into water quality management and assessment tools. Supervised decision tree classification schemes, used in many previous applications, have yielded reliable classifications in agricultural land-use systems. However, when using these classification techniques the user is confronted with numerous data sources. In this study we use a large inventory of optical satellite image products (Landsat, AWiFS, SPOT and MODIS) and ancillary data sources (temporal MODIS-NDVI product signatures, digital elevation models and soil maps) at various spatial and temporal resolutions in a decision tree classification scheme. The sensitivity of the classification accuracy to various products is assessed to identify optimal data sources for classifying crop systems.

  3. Falls in the elderly were predicted opportunistically using a decision tree and systematically using a database-driven screening tool.

    PubMed

    Rafiq, Meena; McGovern, Andrew; Jones, Simon; Harris, Kevin; Tomson, Charles; Gallagher, Hugh; de Lusignan, Simon

    2014-08-01

    To identify risk factors for falls and generate two screening tools: an opportunistic tool for use in consultation to flag at risk patients and a systematic database screening tool for comprehensive falls assessment of the practice population. This multicenter cohort study was part of the quality improvement in chronic kidney disease trial. Routine data for participants aged 65 years and above were collected from 127 general practice (GP) databases across the UK, including sociodemographic, physical, diagnostic, pharmaceutical, lifestyle factors, and records of falls or fractures over 5 years. Multilevel logistic regression analyses were performed to identify predictors. The strongest predictors were used to generate a decision tree and risk score. Of the 135,433 individuals included, 10,766 (8%) experienced a fall or fracture during follow-up. Age, female sex, previous fall, nocturia, anti-depressant use, and urinary incontinence were the strongest predictors from our risk profile (area under the receiver operating characteristics curve = 0.72). Medication for hypertension did not increase the falls risk. Females aged over 75 years and subjects with a previous fall were the highest risk groups from the decision tree. The risk profile was converted into a risk score (range -7 to 56). Using a cut-off of ≥9, sensitivity was 68%, and specificity was 60%. Our study developed opportunistic and systematic tools to predict falls without additional mobility assessments. Copyright © 2014 Elsevier Inc. All rights reserved.

  4. Comparison of decision tree-fuzzy and rough set-fuzzy methods for fault categorization of mono-block centrifugal pump

    NASA Astrophysics Data System (ADS)

    Sakthivel, N. R.; Sugumaran, V.; Nair, Binoy. B.

    2010-08-01

    Mono-block centrifugal pumps are widely used in a variety of applications. In many applications the role of the mono-block centrifugal pump is critical and condition monitoring is essential. Vibration-based continuous monitoring and analysis using machine learning approaches is gaining momentum. In particular, artificial neural networks and fuzzy logic have been employed for continuous monitoring and fault diagnosis. This paper presents the use of decision tree and rough sets to generate the rules from statistical features extracted from vibration signals under good and faulty conditions of a mono-block centrifugal pump. A fuzzy classifier is built using decision tree and rough set rules and tested using test data. The results obtained using decision tree rules and those obtained using rough set rules are compared. Finally, the accuracy of a principal component analysis based decision tree-fuzzy system is also evaluated. The study reveals that the overall classification accuracy obtained by the decision tree-fuzzy hybrid system is to some extent better than that of the rough set-fuzzy hybrid system.

  5. Refined estimation of solar energy potential on roof areas using decision trees on CityGML-data

    NASA Astrophysics Data System (ADS)

    Baumanns, K.; Löwner, M.-O.

    2009-04-01

    We present a decision tree for a refined estimation of solar energy plant potential on roof areas using the exchange format CityGML. Compared to raster datasets, CityGML data holds geometric and semantic information about buildings and roof areas in more detail. In addition to shadowing effects, ownership structures and the lifetime of roof areas can be incorporated into the valuation. Since the Renewable Energy Sources Act came into force in Germany in 2000, private homeowners and municipalities have paid increasing attention to the production of green electricity. The return on investment depends on the statutory price per Watt, the initial costs of the solar energy plant, its lifetime, and the real production of the installation. The latter depends on the radiation received by the solar energy plant and on its size. In this context the orientation and slope of the roof area are as important as building parts like chimneys or dormers that might shadow parts of the roof. Knowing the controlling factors, a decision tree can be created to support a beneficial deployment of a solar energy plant. Sufficient data also has to be available. Airborne raster datasets can only support a coarse estimation of the solar energy potential of roof areas. Because they carry no semantic information, even roof installations are hard to identify. CityGML, an Open Geospatial Consortium standard, is an interoperable exchange data format for virtual 3-dimensional cities. Based on international standards, it holds the aforementioned geometric properties as well as semantic information. In Germany many cities, e.g. Berlin, are on the way to providing CityGML datasets. Here we present a decision tree that incorporates geometric as well as semantic demands for a refined estimation of the solar energy potential on roof areas. Based on CityGML's attribute lists we consider geometries of roofs and roof installations as well as global radiation which can be derived e. g. from the European Solar

  6. The relation of student behavior, peer status, race, and gender to decisions about school discipline using CHAID decision trees and regression modeling.

    PubMed

    Horner, Stacy B; Fireman, Gary D; Wang, Eugene W

    2010-04-01

    Peer nominations and demographic information were collected from a diverse sample of 1493 elementary school participants to examine behavior (overt and relational aggression, impulsivity, and prosociality), context (peer status), and demographic characteristics (race and gender) as predictors of teacher and administrator decisions about discipline. Exploratory results using classification tree analyses indicated students nominated as average or highly overtly aggressive were more likely to be disciplined than others. Among these students, race was the most significant predictor, with African American students more likely to be disciplined than Caucasians, Hispanics, or Others. Among the students nominated as low in overt aggression, a lack of prosocial behavior was the most significant predictor. Confirmatory analysis using hierarchical logistic regression supported the exploratory results. Similarities with other biased referral patterns, proactive classroom management strategies, and culturally sensitive recommendations are discussed.

  7. Ensembl 2016

    PubMed Central

    Yates, Andrew; Akanni, Wasiu; Amode, M. Ridwan; Barrell, Daniel; Billis, Konstantinos; Carvalho-Silva, Denise; Cummins, Carla; Clapham, Peter; Fitzgerald, Stephen; Gil, Laurent; Girón, Carlos García; Gordon, Leo; Hourlier, Thibaut; Hunt, Sarah E.; Janacek, Sophie H.; Johnson, Nathan; Juettemann, Thomas; Keenan, Stephen; Lavidas, Ilias; Martin, Fergal J.; Maurel, Thomas; McLaren, William; Murphy, Daniel N.; Nag, Rishi; Nuhn, Michael; Parker, Anne; Patricio, Mateus; Pignatelli, Miguel; Rahtz, Matthew; Riat, Harpreet Singh; Sheppard, Daniel; Taylor, Kieron; Thormann, Anja; Vullo, Alessandro; Wilder, Steven P.; Zadissa, Amonida; Birney, Ewan; Harrow, Jennifer; Muffato, Matthieu; Perry, Emily; Ruffier, Magali; Spudich, Giulietta; Trevanion, Stephen J.; Cunningham, Fiona; Aken, Bronwen L.; Zerbino, Daniel R.; Flicek, Paul

    2016-01-01

    The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate the access of genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third party data. REST is now capable of larger-scale analysis and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data is made available without restriction and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license. PMID:26687719

  8. Re-Construction of Reference Population and Generating Weights by Decision Tree

    DTIC Science & Technology

    2017-07-21

    2nd edition, September 22, 2005. (8) Floyd J. Fowler, “Survey Research Methods”, 5th Edition SAGE Publications, Inc., 5 edition, September 18...on Distribution of Variables 33 Figure 4: Comparing Weights by t-Test 34 Re-construction and Weighting 5 1...Modules Feature\\Algorithm C&R Tree QUEST CHAID C5.0 Input fields (predictors) continuous, categorical, flag, nominal or ordinal continuous

  9. A decision tree algorithm for investigation of model biases related to dynamical cores and physical parameterizations

    PubMed Central

    Rood, Richard B.

    2016-01-01

    Abstract An object‐based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center of Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smooth topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small‐scale phenomena due to the spectral transform method of CAM Eulerian spectral dynamical core is prominent, and is an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, both in horizontal and vertical dimensions, have significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study about the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly mean) analysis over local scales. PMID:28239437

  10. A decision tree algorithm for investigation of model biases related to dynamical cores and physical parameterizations.

    PubMed

    Soner Yorgun, M; Rood, Richard B

    2016-12-01

    An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center of Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smooth topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena due to the spectral transform method of CAM Eulerian spectral dynamical core is prominent, and is an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, both in horizontal and vertical dimensions, have significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study about the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly mean) analysis over local scales.

  11. Using decision trees to explore the association between the length of stay and potentially avoidable readmissions: A retrospective cohort study.

    PubMed

    Alyahya, Mohammad S; Hijazi, Heba H; Alshraideh, Hussam A; Al-Nasser, Amjad D

    2017-01-13

    There is a growing concern that reduction in hospital length of stay (LOS) may raise the rate of hospital readmission. This study aims to identify the rate of avoidable 30-day readmission and to determine the association between LOS and readmission. All consecutive patient admissions to the internal medicine services (n = 5,273) at King Abdullah University Hospital in Jordan between 1 December 2012 and 31 December 2013 were analyzed. To identify avoidable readmissions, a validated computerized algorithm called SQLape was used. Multinomial logistic regression was employed first. Then, detailed analysis was performed using the Decision Trees (DTs) model, one of the most widely used data mining algorithms in Clinical Decision Support Systems (CDSS). The potentially avoidable 30-day readmission rate was 44%, and patients with longer LOS were more likely to be readmitted avoidably. However, LOS had a significant negative effect on unavoidable readmissions. The avoidable readmission rate remains unacceptably high. Because LOS potentially increases the likelihood of avoidable readmission, it is still possible to achieve a shorter LOS without increasing the readmission rate. Moreover, the way the DT model classified patient subgroups of readmissions based on patient characteristics and LOS is applicable in real clinical decisions.

  12. Ensemble learning as approach for pipeline condition assessment

    NASA Astrophysics Data System (ADS)

    Camacho-Navarro, Jhonatan; Ruiz, Magda; Villamizar, Rodolfo; Mujica, Luis; Moreno-Beltrán, Gustavo

    2017-05-01

    The algorithms commonly used for damage condition monitoring present several drawbacks related to unbalanced data, optimal training requirements, low capability to manage feature diversity and low tolerance to errors. In this work, an approach based on ensemble learning is discussed as an alternative to obtain more efficient diagnosis. The main advantage of ensemble learning is the use of several algorithms at the same time for better performance. Thus, by combining the simplest decision tree algorithms in a bagging scheme, the accuracy of damage detection is improved. The approach benefits from combining the predictions of preliminary algorithms based on regression models. The methodology is experimentally validated on a carbon steel pipe section, where added-mass conditions are studied as possible failures. Data from an active system based on piezoelectric sensors are stored and characterized through the T2 and Q statistical indexes, which then serve as the inputs to the ensemble learning. The proposed methodology allows the condition assessment and damage locations in the structure to be determined. The results of the studied cases show the feasibility of ensemble learning for detecting the occurrence of structural damage.
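    The bagging idea in this record can be illustrated with a minimal sketch. It is not the authors' implementation: the one-split "stump" learner, the function names, and the labels used in the usage note are assumptions made for illustration, and the record's T2 and Q inputs are not modeled here.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-split decision tree (stump) that minimizes training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:
        # No valid split exists: predict the majority class everywhere.
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab


def fit_bagging(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagging(stumps, row):
    """Combine the stumps by majority vote."""
    votes = Counter(predict_stump(s, row) for s in stumps)
    return votes.most_common(1)[0][0]
```

    As a usage example with made-up data, `fit_bagging([[0.1], [0.2], [0.3], [0.8], [0.9], [1.0]], ["healthy"] * 3 + ["damaged"] * 3)` returns a list of stumps that `predict_bagging` can query for a new feature vector.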

  13. Decision-tree model for predicting outcomes after out-of-hospital cardiac arrest in the emergency department

    PubMed Central

    2013-01-01

    Introduction Estimation of outcomes in patients after out-of-hospital cardiac arrest (OHCA) soon after arrival at the hospital may help clinicians guide in-hospital strategies, particularly in the emergency department. This study aimed to develop a simple and generally applicable bedside model for predicting outcomes after cardiac arrest. Methods We analyzed data for 390,226 adult patients who had undergone OHCA, from a prospectively recorded nationwide Utstein-style Japanese database for 2005 through 2009. The primary end point was survival with favorable neurologic outcome (cerebral performance category (CPC) scale, categories 1 to 2 [CPC 1 to 2]) at 1 month. The secondary end point was survival at 1 month. We developed a decision-tree prediction model by using data from a 4-year period (2005 through 2008, n = 307,896), with validation by using external data from 2009 (n = 82,330). Results Recursive partitioning analysis of the development cohort for 10 predictors indicated that the best single predictor for survival and CPC 1 to 2 was shockable initial rhythm. The next predictors for patients with shockable initial rhythm were age (<70 years) followed by witnessed arrest and age (>70 years) followed by arrest witnessed by emergency medical services (EMS) personnel. For patients with unshockable initial rhythm, the next best predictor was witnessed arrest. A simple decision-tree prediction model permitted stratification into four prediction groups: good, moderately good, poor, and absolutely poor. This model identified patient groups with a range from 1.2% to 30.2% for survival and from 0.3% to 23.2% for CPC 1 to 2 probabilities. Similar results were observed when this model was applied to the validation cohort. Conclusions On the basis of a decision-tree prediction model using four prehospital variables (shockable initial rhythm, age, witnessed arrest, and witnessed by EMS personnel), OHCA patients can be readily stratified into the four groups (good, moderately

  14. Contrasting determinants for the introduction and establishment success of exotic birds in Taiwan using decision trees models.

    PubMed

    Liang, Shih-Hsiung; Walther, Bruno Andreas; Shieh, Bao-Sen

    2017-01-01

    Biological invasions have become a major threat to biodiversity, and identifying determinants underlying success at different stages of the invasion process is essential for both prevention management and testing ecological theories. To investigate variables associated with different stages of the invasion process in a local region such as Taiwan, potential problems using traditional parametric analyses include too many variables of different data types (nominal, ordinal, and interval) and a relatively small data set with too many missing values. We therefore used five decision tree models instead and compared their performance. Our dataset contains 283 exotic bird species which were transported to Taiwan; of these 283 species, 95 species escaped to the field successfully (introduction success); of these 95 introduced species, 36 species reproduced in the field of Taiwan successfully (establishment success). For each species, we collected 22 variables associated with human selectivity and species traits which may determine success during the introduction stage and establishment stage. For each decision tree model, we performed three variable treatments: (I) including all 22 variables, (II) excluding nominal variables, and (III) excluding nominal variables and replacing ordinal values with binary ones. Five performance measures were used to compare models, namely, area under the receiver operating characteristic curve (AUROC), specificity, precision, recall, and accuracy. The gradient boosting models performed best overall among the five decision tree models for both introduction and establishment success and across variable treatments. The most important variables for predicting introduction success were the bird family, the number of invaded countries, and variables associated with environmental adaptation, whereas the most important variables for predicting establishment success were the number of invaded countries and variables associated with reproduction. 
Our
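    The five performance measures named in this record (AUROC, specificity, precision, recall, and accuracy) can be computed as in the following sketch. The function names and the 0/1 label encoding are assumptions for illustration; the study's own software is not described in the abstract. AUROC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def _ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def summary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }


def auroc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))
```

    Reporting several measures together matters when classes are unbalanced, as with 36 established species out of 95 introduced ones: accuracy alone can look good for a model that rarely predicts the minority class.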

  15. Contrasting determinants for the introduction and establishment success of exotic birds in Taiwan using decision trees models

    PubMed Central

    Liang, Shih-Hsiung; Walther, Bruno Andreas

    2017-01-01

    Background Biological invasions have become a major threat to biodiversity, and identifying determinants underlying success at different stages of the invasion process is essential for both prevention management and testing ecological theories. To investigate variables associated with different stages of the invasion process in a local region such as Taiwan, potential problems using traditional parametric analyses include too many variables of different data types (nominal, ordinal, and interval) and a relatively small data set with too many missing values. Methods We therefore used five decision tree models instead and compared their performance. Our dataset contains 283 exotic bird species which were transported to Taiwan; of these 283 species, 95 species escaped to the field successfully (introduction success); of these 95 introduced species, 36 species reproduced in the field of Taiwan successfully (establishment success). For each species, we collected 22 variables associated with human selectivity and species traits which may determine success during the introduction stage and establishment stage. For each decision tree model, we performed three variable treatments: (I) including all 22 variables, (II) excluding nominal variables, and (III) excluding nominal variables and replacing ordinal values with binary ones. Five performance measures were used to compare models, namely, area under the receiver operating characteristic curve (AUROC), specificity, precision, recall, and accuracy. Results The gradient boosting models performed best overall among the five decision tree models for both introduction and establishment success and across variable treatments. The most important variables for predicting introduction success were the bird family, the number of invaded countries, and variables associated with environmental adaptation, whereas the most important variables for predicting establishment success were the number of invaded countries and variables

  16. Application of decision trees to the analysis of soil radon data for earthquake prediction.

    PubMed

    Zmazek, B; Todorovski, L; Dzeroski, S; Vaupotic, J; Kobal, I

    2003-06-01

    Different regression methods have been used to predict radon concentration in soil gas on the basis of environmental data, i.e. barometric pressure, soil temperature, air temperature and rainfall. Analyses of the radon data from three stations in the Krsko basin, Slovenia, have shown that model trees outperform other regression methods. A model has been built which predicts radon concentration with a correlation of 0.8, provided it is influenced only by the environmental parameters. In periods with seismic activity this correlation is much lower. This decrease in predictive accuracy appears 1-7 days before earthquakes with local magnitude 0.8-3.3.
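    The observation that the correlation between predicted and measured radon drops before earthquakes suggests monitoring that correlation over a sliding window. The sketch below assumes this framing; the window length, the threshold, and the function names are hypothetical and do not come from the record, and the model-tree predictor itself is not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for constant input
    return sxy / math.sqrt(sxx * syy)


def windowed_correlation(predicted, measured, window):
    """Correlation of predicted vs. measured values in each sliding window."""
    return [pearson(predicted[i:i + window], measured[i:i + window])
            for i in range(len(predicted) - window + 1)]


def flag_low_correlation(correlations, threshold):
    """Indices of windows whose correlation falls below the threshold."""
    return [i for i, r in enumerate(correlations) if r < threshold]
```

    Windows flagged this way would only be candidates for closer inspection; whether a drop precedes seismic activity has to be checked against an earthquake catalogue.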

  17. Using decision tree analysis to identify risk factors for relapse to smoking

    PubMed Central

    Piper, Megan E.; Loh, Wei-Yin; Smith, Stevens S.; Japuntich, Sandra J.; Baker, Timothy B.

    2010-01-01

    This research used classification tree analysis and logistic regression models to identify risk factors related to short- and long-term abstinence. Baseline and cessation outcome data from two smoking cessation trials, conducted from 2001 to 2002, in two Midwestern urban areas, were analyzed. There were 928 participants (53.1% women, 81.8% white) with complete data. Both analyses suggest that relapse risk is produced by interactions of risk factors and that early and late cessation outcomes reflect different vulnerability factors. The results illustrate the dynamic nature of relapse risk and suggest the importance of efficient modeling of interactions in relapse prediction. PMID:20397871

  18. Evaluating Psychiatric Hospital Admission Decisions for Children in Foster Care: An Optimal Classification Tree Analysis

    ERIC Educational Resources Information Center

    Snowden, Jessica A.; Leon, Scott C.; Bryant, Fred B.; Lyons, John S.

    2007-01-01

    This study explored clinical and nonclinical predictors of inpatient hospital admission decisions across a sample of children in foster care over 4 years (N = 13,245). Forty-eight percent of participants were female and the mean age was 13.4 (SD = 3.5 years). Optimal data analysis (Yarnold & Soltysik, 2005) was used to construct a nonlinear…

  19. Ensembl 2014

    PubMed Central

    Flicek, Paul; Amode, M. Ridwan; Barrell, Daniel; Beal, Kathryn; Billis, Konstantinos; Brent, Simon; Carvalho-Silva, Denise; Clapham, Peter; Coates, Guy; Fitzgerald, Stephen; Gil, Laurent; Girón, Carlos García; Gordon, Leo; Hourlier, Thibaut; Hunt, Sarah; Johnson, Nathan; Juettemann, Thomas; Kähäri, Andreas K.; Keenan, Stephen; Kulesha, Eugene; Martin, Fergal J.; Maurel, Thomas; McLaren, William M.; Murphy, Daniel N.; Nag, Rishi; Overduin, Bert; Pignatelli, Miguel; Pritchard, Bethan; Pritchard, Emily; Riat, Harpreet S.; Ruffier, Magali; Sheppard, Daniel; Taylor, Kieron; Thormann, Anja; Trevanion, Stephen J.; Vullo, Alessandro; Wilder, Steven P.; Wilson, Mark; Zadissa, Amonida; Aken, Bronwen L.; Birney, Ewan; Cunningham, Fiona; Harrow, Jennifer; Herrero, Javier; Hubbard, Tim J.P.; Kinsella, Rhoda; Muffato, Matthieu; Parker, Anne; Spudich, Giulietta; Yates, Andy; Zerbino, Daniel R.; Searle, Stephen M.J.

    2014-01-01

    Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training. PMID:24316576

  20. Effective Prediction of Errors by Non-native Speakers Using Decision Tree for Speech Recognition-Based CALL System

    NASA Astrophysics Data System (ADS)

    Wang, Hongcui; Kawahara, Tatsuya

    CALL (Computer Assisted Language Learning) systems using ASR (Automatic Speech Recognition) for second language learning have received increasing interest recently. However, it still remains a challenge to achieve high speech recognition performance, including accurate detection of erroneous utterances by non-native speakers. Conventionally, possible error patterns, based on linguistic knowledge, are added to the lexicon and language model, or the ASR grammar network. However, this approach easily falls in the trade-off of coverage of errors and the increase of perplexity. To solve the problem, we propose a method based on a decision tree to learn effective prediction of errors made by non-native speakers. An experimental evaluation with a number of foreign students learning Japanese shows that the proposed method can effectively generate an ASR grammar network, given a target sentence, to achieve both better coverage of errors and smaller perplexity, resulting in significant improvement in ASR accuracy.

  1. Personalization algorithm for real-time activity recognition using PDA, wireless motion bands, and binary decision tree.

    PubMed

    Pärkkä, Juha; Cluitmans, Luc; Ermes, Miikka

    2010-09-01

    An inactive and sedentary lifestyle is a major problem in many industrialized countries today. Automatic recognition of the type of physical activity can be used to show users the distribution of their daily activities and to motivate them toward a more active lifestyle. In this study, an automatic activity-recognition system consisting of wireless motion bands and a PDA is evaluated. The system classifies raw sensor data into activity types online. It uses a decision tree classifier, which has low computational cost and low battery consumption. The classifier parameters can be personalized online by performing a short bout of an activity and by telling the system which activity is being performed. Data were collected with seven volunteers during five everyday activities: lying, sitting/standing, walking, running, and cycling. The online system can detect these activities with overall 86.6% accuracy and with 94.0% accuracy after classifier personalization.

  2. Using image processing technology combined with decision tree algorithm in laryngeal video stroboscope automatic identification of common vocal fold diseases.

    PubMed

    Jeffrey Kuo, Chung-Feng; Wang, Po-Chun; Chu, Yueng-Hsiang; Wang, Hsing-Won; Lai, Chun-Yu

    2013-10-01

    This study used the actual laryngeal video stroboscope videos taken by physicians in clinical practice as the samples for experimental analysis. The samples were dynamic vocal fold videos. Image processing technology was used to automatically capture the image of the largest glottal area from the video to obtain the physiological data of the vocal folds. In this study, an automatic vocal fold disease identification system was designed, which can obtain the physiological parameters for normal vocal folds, vocal paralysis and vocal nodules from image processing according to the pathological features. The decision tree algorithm was used as the classifier of the vocal fold diseases. The identification rate was 92.6%, and the identification rate with an image recognition improvement processing procedure after classification can be improved to 98.7%. Hence, the proposed system has value in clinical practices.

  3. Towards closed-loop deep brain stimulation: decision tree-based essential tremor patient's state classifier and tremor reappearance predictor.

    PubMed

    Shukla, Pitamber; Basu, Ishita; Tuninetti, Daniela

    2014-01-01

    Deep Brain Stimulation (DBS) is a surgical procedure to treat some progressive neurological movement disorders, such as Essential Tremor (ET), in an advanced stage. Current FDA-approved DBS systems operate open-loop, i.e., their parameters are unchanged over time. This work develops a Decision Tree (DT) based algorithm that, by using non-invasively measured surface EMG and accelerometer signals as inputs during DBS-OFF periods, classifies the ET patient's state and then predicts when tremor is about to reappear, at which point DBS is turned ON again for a fixed amount of time. The proposed algorithm achieves an overall accuracy of 93.3% and sensitivity of 97.4%, along with 2.9% false alarm rate. Also, the ratio between predicted tremor delay and the actual detected tremor delay is about 0.93, indicating that tremor prediction is very close to the instant where tremor actually reappeared.

  4. An approach for automated fault diagnosis based on a fuzzy decision tree and boundary analysis of a reconstructed phase space.

    PubMed

    Aydin, Ilhan; Karakose, Mehmet; Akin, Erhan

    2014-03-01

    Although reconstructed phase space is one of the most powerful methods for analyzing a time series, it can fail in fault diagnosis of an induction motor when the appropriate pre-processing is not performed. Therefore, a new feature extraction method based on boundary analysis in phase space is proposed for the diagnosis of induction motor faults. The proposed approach requires the measurement of one phase current signal to construct the phase space representation. Each phase space is converted into an image, and the boundary of each image is extracted by a boundary detection algorithm. A fuzzy decision tree has been designed to detect broken rotor bars and broken connector faults. The results indicate that the proposed approach has a higher recognition rate than other methods on the same dataset.
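    Phase space reconstruction from a single measured signal is commonly done by time-delay embedding, which the sketch below illustrates. The record does not state the embedding dimension or delay used, so `dim` and `tau` are assumed parameters, and the later steps (image conversion, boundary detection, fuzzy decision tree) are not shown.

```python
def delay_embed(signal, dim=2, tau=1):
    """Return delay vectors (x[i], x[i + tau], ..., x[i + (dim - 1) * tau])."""
    n_points = len(signal) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("signal is too short for this dim and tau")
    return [tuple(signal[i + k * tau] for k in range(dim))
            for i in range(n_points)]
```

    For example, `delay_embed([0, 1, 2, 3, 4], dim=2, tau=2)` pairs each sample with the one two steps later, giving three points in a two-dimensional phase space.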

  5. A decision tree-based on-line preventive control strategy for power system transient instability prevention

    NASA Astrophysics Data System (ADS)

    Xu, Yan; Dong, Zhao Yang; Zhang, Rui; Wong, Kit Po

    2014-02-01

    Maintaining transient stability is a basic requirement for secure power system operations. Preventive control deals with modifying the system operating point to withstand probable contingencies. In this article, a decision tree (DT)-based on-line preventive control strategy is proposed for transient instability prevention of power systems. Given a stability database, a distance-based feature estimation algorithm is first applied to identify the critical generators, which are then used as features to develop a DT. By interpreting the splitting rules of the DT, preventive control is realised by formulating the rules in a standard optimal power flow model and solving it. The proposed method is transparent in its control mechanism, compatible with on-line computation, and convenient for dealing with multiple contingencies. The effectiveness and efficiency of the method have been verified on the New England 10-machine 39-bus test system.

  6. Improved γ/hadron separation for the detection of faint γ-ray sources using boosted decision trees

    NASA Astrophysics Data System (ADS)

    Krause, Maria; Pueschel, Elisa; Maier, Gernot

    2017-03-01

    Imaging atmospheric Cherenkov telescopes record an enormous number of cosmic-ray background events. Suppressing these background events while retaining γ-rays is key to achieving good sensitivity to faint γ-ray sources. The differentiation between signal and background events can be accomplished using machine learning algorithms, which are already used in various fields of physics. Multivariate analyses combine several variables into a single variable that indicates the degree to which an event is γ-ray-like or cosmic-ray-like. In this paper we will focus on the use of "boosted decision trees" for γ/hadron separation. We apply the method to data from the Very Energetic Radiation Imaging Telescope Array System (VERITAS), and demonstrate an improved sensitivity compared to the VERITAS standard analysis.
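    How boosted decision trees combine several variables into a single signal-likeness score can be sketched with AdaBoost over one-split stumps. AdaBoost is chosen here only as a familiar boosting scheme; the record does not say which boosting algorithm or training setup the VERITAS analysis uses. Labels are assumed to be +1 for γ-ray-like and -1 for cosmic-ray-like events, and all names are hypothetical.

```python
import math


def fit_weighted_stump(X, y, w):
    """Return (error, feature, threshold, polarity) with least weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (polarity if row[j] > t else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best


def stump_output(stump, row):
    _, j, t, polarity = stump
    return polarity if row[j] > t else -polarity


def fit_adaboost(X, y, n_rounds=10):
    """Train stumps in sequence, re-weighting misclassified events upward."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_weighted_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        w = [wi * math.exp(-alpha * yi * stump_output(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def bdt_score(ensemble, row):
    """Weighted vote scaled to [-1, 1]; larger means more signal-like."""
    total_alpha = sum(alpha for alpha, _ in ensemble)
    if total_alpha == 0:
        return 0.0
    return sum(alpha * stump_output(stump, row)
               for alpha, stump in ensemble) / total_alpha
```

    A cut on the returned score then plays the role of the single discriminating variable described in the abstract.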

  7. A method of building of decision trees based on data from wearable device during a rehabilitation of patients with tibia fractures

    NASA Astrophysics Data System (ADS)

    Kupriyanov, M. S.; Shukeilo, E. Y.; Shichkina, J. A.

    2015-11-01

    Technologies used in traumatology today combine mechanical, electronic, computing and software tools. The development of mobile applications for rapid processing of data received from medical devices (in particular, wearable devices) and for formulating management decisions is becoming increasingly relevant. This article considers the use of a mathematical method for building decision trees to assess a patient's health condition using data from a wearable device.

  8. A method of building of decision trees based on data from wearable device during a rehabilitation of patients with tibia fractures

    SciTech Connect

    Kupriyanov, M. S.; Shukeilo, E. Y.; Shichkina, J. A.

    2015-11-17

    Technologies used in traumatology today combine mechanical, electronic, computing and software tools. The development of mobile applications for rapid processing of data received from medical devices (in particular, wearable devices) and for formulating management decisions is becoming increasingly relevant. This article considers the use of a mathematical method for building decision trees to assess a patient’s health condition using data from a wearable device.

  9. An adaptive incremental approach to constructing ensemble classifiers: Application in an information-theoretic computer-aided decision system for detection of masses in mammograms

    SciTech Connect

    Mazurowski, Maciej A.; Zurada, Jacek M.; Tourassi, Georgia D.

    2009-07-15

    Ensemble classifiers have been shown to be effective in multiple applications. In this article, the authors explore the effectiveness of ensemble classifiers in a case-based computer-aided diagnosis system for detection of masses in mammograms. They evaluate two general ways of constructing subclassifiers by resampling of the available development dataset: random division and random selection. Furthermore, they discuss the problem of selecting the ensemble size and propose two adaptive incremental techniques that automatically select the size for the problem at hand. All the techniques are evaluated with respect to a previously proposed information-theoretic CAD system (IT-CAD). The experimental results show that the examined ensemble techniques provide a statistically significant improvement (AUC=0.905±0.024) in performance as compared to the original IT-CAD system (AUC=0.865±0.029). Some of the techniques allow for a notable reduction in the total number of examples stored in the case base (to 1.3% of the original size), which, in turn, results in lower storage requirements and a shorter response time of the system. Among the methods examined in this article, the two proposed adaptive techniques are by far the most effective for this purpose. Furthermore, the authors provide some discussion and guidance for choosing the ensemble parameters.
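    The two resampling schemes named in this record can be sketched as follows. The abstract does not define them, so this sketch assumes that random division partitions the development set into disjoint subsets and that random selection draws each subset independently, so that subsets may overlap. Function and parameter names are hypothetical.

```python
import random


def random_division(cases, n_subsets, seed=0):
    """Shuffle the cases and split them into disjoint subsets."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def random_selection(cases, n_subsets, subset_size, seed=0):
    """Draw each subset independently, without replacement within a subset."""
    rng = random.Random(seed)
    pool = list(cases)
    return [rng.sample(pool, subset_size) for _ in range(n_subsets)]
```

    Under these assumptions, division uses every case exactly once across the ensemble, whereas selection lets the subset size and the ensemble size be chosen independently of each other.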

  10. Exploring the intrinsic differences among breast tumor subtypes defined using immunohistochemistry markers based on the decision tree

    PubMed Central

    Li, Yang; Tang, Xu-Qing; Bai, Zhonghu; Dai, Xiaofeng

    2016-01-01

    Exploring the intrinsic differences among breast cancer subtypes is of crucial importance for precise diagnosis and therapeutic decision-making in diseases of high heterogeneity. The subtypes defined with several layers of information are related but not consistent, especially using immunohistochemistry markers and gene expression profiling. Here, we explored the intrinsic differences among the subtypes defined by the estrogen receptor, progesterone receptor and human epidermal growth factor receptor 2 based on the decision tree. We identified 30 mRNAs and 7 miRNAs differentially expressed along the tree’s branches. The final signature panel contained 30 mRNAs, whose performance was validated using two public datasets based on 3 well-known classifiers. The network and pathway analysis were explored for feature genes, from which key molecules including FOXQ1 and SFRP1 were revealed to be densely connected with other molecules and participate in the validated metabolic pathways. Our study uncovered the differences among the four IHC-defined breast tumor subtypes at the mRNA and miRNA levels, presented a novel signature for breast tumor subtyping, and identified several key molecules potentially driving the heterogeneity of such tumors. The results help us further understand breast tumor heterogeneity and could be put to use in the clinic. PMID:27786176

  11. Forest or the trees: At what scale do elephants make foraging decisions?

    NASA Astrophysics Data System (ADS)

    Shrader, Adrian M.; Bell, Caroline; Bertolli, Liandra; Ward, David

    2012-07-01

    For herbivores, food is distributed spatially in a hierarchical manner ranging from plant parts to regions. Ultimately, utilisation of food is dependent on the scale at which herbivores make foraging decisions. A key factor that influences these decisions is body size, because selection inversely relates to body size. As a result, large animals can be less selective than small herbivores. Savanna elephants (Loxodonta africana) are the largest terrestrial herbivore. Thus, they represent a potential extreme with respect to unselective feeding. However, several studies have indicated that elephants prefer specific habitats and certain woody plant species. Thus, it is unclear at which scale elephants focus their foraging decisions. To determine this, we recorded the seasonal selection of habitats and woody plant species by elephants in the Ithala Game Reserve, South Africa. We expected that during the wet season, when both food quality and availability were high, elephants would select primarily for habitats. This, however, does not mean that they would utilise plant species within these habitats in proportion to availability, but rather would show a stronger selection for habitats compared to plants. In contrast, during the dry season when food quality and availability declined, we expected that elephants would shift and select for the remaining high quality woody species across all habitats. Consistent with our predictions, elephants selected for the larger spatial scale (i.e. habitats) during the wet season. However, elephants did not increase their selection of woody species during the dry season, but rather increased their selection of habitats relative to woody plant selection. Unlike a number of earlier studies, we found that neither palatability (i.e. crude protein, digestibility, and energy) alone nor tannin concentrations had a significant effect on the elephants' selection of woody species. However, the palatability:tannin ratio was

  12. A practical decision-tree model to predict complexity of reconstructive surgery after periocular basal cell carcinoma excision.

    PubMed

    Tan, E; Lin, F; Sheck, L; Salmon, P; Ng, S

    2017-04-01

    Periocular basal cell carcinomas (pBCC) have unpredictable growth. The authors seek to derive a decision rule for predicting surgical complexity in pBCC. This study was conducted at two centres in New Zealand from September 2010 to November 2015. Baseline demographic information and an initial assessment of operative complexity (a four-point grading scale) were collected. Assessment of operative complexity was repeated at the time of reconstruction. Univariate analysis was applied to identify the associative factors and supervised machine learning was used to determine the best predictive models to construct a clinical decision rule. A total of 156 patients and 156 periocular BCC were analysed. Univariate analysis revealed that older age, recurrent skin cancer, large tumour size, being a public patient and high complexity at pre-operative assessment were associated with high actual operative complexity. Tumour histology was not associated with more complex surgery. Machine learning analyses revealed that Naive Bayesian classifier was able to distinguish surgical complexity with an average area under the receiver operating characteristic curve (AUC) of 0.854 (95% CI: 0.762-0.946) whereas a simpler, alternating decision tree (ADT) that used only three clinical variables achieved an AUC of 0.853 (95% CI: 0.739-0.931). The ADT model was 10.1 times more likely to correctly identify a high complexity case. The three predictive variables were pre-operative assessment of complexity (high vs. low), surgical delays [early (<75 days) or delayed (≥75 days)], and tumour size [small (<14 mm), or large (≥14 mm)]. For the subgroup with large tumours but low initial assessed complexity, late surgery was associated with a 6.7-fold increase in risk of high-risk surgery. A simple, three-variable risk stratification system was able to predict the operative complexity of pBCC. © 2016 European Academy of Dermatology and Venereology.

  13. Decision-tree analysis of clinical data to aid diagnostic reasoning for equine laminitis: a cross-sectional study.

    PubMed

    Wylie, C E; Shaw, D J; Verheyen, K L P; Newton, J R

    2016-04-23

    The objective of this cross-sectional study was to compare the prevalence of selected clinical signs in laminitis cases and non-laminitic but lame controls to evaluate their capability to discriminate laminitis from other causes of lameness. Participating veterinary practitioners completed a checklist of laminitis-associated clinical signs identified by literature review. Cases were defined as horses/ponies with veterinary-diagnosed, clinically apparent laminitis; controls were horses/ponies with any lameness other than laminitis. Associations were tested by logistic regression with adjusted odds ratios (ORs) and 95% confidence intervals, with veterinary practice as an a priori fixed effect. Multivariable analysis using graphical classification tree-based statistical models linked laminitis prevalence with specific combinations of clinical signs. Data were collected for 588 cases and 201 controls. Five clinical signs had a difference in prevalence of greater than +50 per cent: 'reluctance to walk' (OR 4.4), 'short, stilted gait at walk' (OR 9.4), 'difficulty turning' (OR 16.9), 'shifting weight' (OR 17.7) and 'increased digital pulse' (OR 13.2) (all P<0.001). 'Bilateral forelimb lameness' was the best discriminator; 92 per cent of animals with this clinical sign had laminitis (OR 40.5, P<0.001). If, in addition, horses/ponies had an 'increased digital pulse', 99 per cent were identified as laminitis. 'Presence of a flat/convex sole' also significantly enhanced clinical diagnosis discrimination (OR 15.5, P<0.001). This is the first epidemiological laminitis study to use decision-tree analysis, providing the first evidence base for evaluating clinical signs to differentially diagnose laminitis from other causes of lameness. Improved evaluation of the clinical signs displayed by laminitic animals examined by first-opinion practitioners will lead to equine welfare improvements.

  14. An improved methodology for land-cover classification using artificial neural networks and a decision tree classifier

    NASA Astrophysics Data System (ADS)

    Arellano-Neri, Olimpia

    Mapping is essential for the analysis of the land and land-cover dynamics, which influence many environmental processes and properties. When creating land-cover maps it is important to minimize error, since error will propagate into later analyses based upon these land cover maps. The reliability of land cover maps derived from remotely sensed data depends upon an accurate classification. For decades, traditional statistical methods have been applied in land-cover classification with varying degrees of accuracy. One of the most significant developments in the field of land-cover classification using remotely sensed data has been the introduction of Artificial Neural Networks (ANN) procedures. In this research, Artificial Neural Networks were applied to remotely sensed data of the southwestern Ohio region for land-cover classification. Three variants on traditional ANN-based classifiers were explored here: (1) the use of a customized architecture of the neural network in terms of the input layer for each land-cover class, (2) the use of texture analysis to combine spectral information and spatial information which is essential for urban classes, and (3) the use of decision tree (DT) classification to refine the ANN classification and ultimately to achieve a more reliable land-cover thematic map. The objective of this research was to prove that a classification based on Artificial Neural Networks (ANN) and decision tree (DT) would outperform by far the National Land Cover Data (NLCD). The NLCD is a land-cover classification produced by a cooperative effort between the United States Geological Survey (USGS) and the United States Environmental Protection Agency (USEPA). In order to achieve this objective, an accuracy assessment was conducted for both NLCD classification and ANN/DT classification. Error matrices resulting from the accuracy assessments provided overall accuracy, accuracy of each class, omission errors, and commission errors for each classification. 
The

  15. The WHO classification of lymphomas: cost-effective immunohistochemistry using a deductive reasoning "decision tree" approach: part II: the decision tree approach: diffuse patterns of proliferation in lymph nodes.

    PubMed

    Taylor, Clive R

    2009-12-01

    The 2008 World Health Organization Classification of Tumors of the Haematopoietic and Lymphoid Tissues defines current standards of practice for the diagnosis and classification of malignant lymphomas and related entities. More than 50 different types of lymphomas are described. Faced with such a broad range of different lymphomas, some encountered only rarely, and a rapidly growing armamentarium of 80 or more pertinent immunohistochemical (IHC) "stains," the challenge to the pathologist is to use IHC in an efficient manner to arrive at an assured and timely diagnosis. This review uses deductive reasoning following a decision tree or dendrogram model, combining basic morphologic patterns and common IHC markers to classify node-based malignancies by the World Health Organization schema. The review is divided into 2 parts, the first addressing those lymphomas that produce a follicular or nodular pattern of lymph nodal involvement appeared in the previous issue of AIMM. The second part addresses diffuse proliferations in lymph nodes. Emphasis is given to the more common lymphomas and the more commonly available IHC "stains" for a pragmatic and practical approach that is both broadly feasible and cost-effective. By this method, an assured diagnosis may be reached in the majority of nodal lymphomas, at the same time developing a sufficiency of data to recognize those rare or atypical cases that require referral to a specialized center.

  16. Detecting subcanopy invasive plant species in tropical rainforest by integrating optical and microwave (InSAR/PolInSAR) remote sensing data, and a decision tree algorithm

    NASA Astrophysics Data System (ADS)

    Ghulam, Abduwasit; Porton, Ingrid; Freeman, Karen

    2014-02-01

    In this paper, we propose a decision tree algorithm to characterize spatial extent and spectral features of invasive plant species (i.e., guava, Madagascar cardamom, and Molucca raspberry) in tropical rainforests by integrating datasets from passive and active remote sensing sensors. The decision tree algorithm is based on a number of input variables including matching score and infeasibility images from Mixture Tuned Matched Filtering (MTMF), land-cover maps, tree height information derived from high resolution stereo imagery, polarimetric feature images, Radar Forest Degradation Index (RFDI), polarimetric and InSAR coherence and phase difference images. Spatial distributions of the study organisms are mapped using pixel-based Winner-Takes-All (WTA) algorithm, object oriented feature extraction, spectral unmixing, and compared with the newly developed decision tree approach. Our results show that the InSAR phase difference and PolInSAR HH-VV coherence images of L-band PALSAR data are the most important variables following the MTMF outputs in mapping subcanopy invasive plant species in tropical rainforest. We also show that the three types of invasive plants alone occupy about 17.6% of the Betampona Nature Reserve (BNR) while mixed forest, shrubland and grassland areas are summed to 11.9% of the reserve. This work presents the first systematic attempt to evaluate forest degradation, habitat quality and invasive plant statistics in the BNR, and provides significant insights as to management strategies for the control of invasive plants and conservation in the reserve.

  17. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    NASA Astrophysics Data System (ADS)

    Khader, A.; Rosenberg, D.; McKee, M.

    2012-12-01

    Nitrate pollution poses a health risk for infants whose freshwater drinking source is groundwater. This risk creates a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision maker and the expected outcomes from these alternatives. The alternatives include: (i) ignore the health risk of nitrate contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, pollution transport processes, and climate (Khader and McKee, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine where methemoglobinemia is the main health problem associated with the principal pollutant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not-use aquifer water, and whether people get sick from drinking contaminated water. Outcome costs include healthcare for methemoglobinemia, purchase of bottled water, and installation and maintenance of the groundwater monitoring system. At current

  18. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    NASA Astrophysics Data System (ADS)

    Khader, A. I.; Rosenberg, D. E.; McKee, M.

    2013-05-01

    Groundwater contaminated with nitrate poses a serious health risk to infants when this contaminated water is used for culinary purposes. To avoid this health risk, people need to know whether their culinary water is contaminated or not. Therefore, there is a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management options. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision-maker and the expected outcomes from these alternatives. The alternatives include (i) ignore the health risk of nitrate-contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, contaminant transport processes, and climate (Khader, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine, where methemoglobinemia (blue baby syndrome) is the main health problem associated with the principal contaminant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not use aquifer water, and whether people get sick from drinking contaminated water. 
Outcome costs
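
    The expected-cost and VOI calculation described in this record can be illustrated with a small worked example. This is a sketch only: the probabilities and costs below are invented for illustration and are not taken from the Eocene Aquifer study, and the sign convention (uninformed cost minus monitoring cost, so that a positive VOI favours monitoring) is an assumption.

```python
def expected_cost(outcomes):
    """Weighted sum of costs over (probability, cost) pairs for one alternative."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * cost for p, cost in outcomes)


# Hypothetical numbers: each alternative maps to its uncertain outcomes.
alternatives = {
    "ignore_risk":   [(0.75, 0.0), (0.25, 1200.0)],   # healthcare cost if people get sick
    "bottled_water": [(1.0, 400.0)],                  # certain purchase cost
    "monitor":       [(0.75, 150.0), (0.25, 350.0)],  # network cost plus residual costs
}

costs = {name: expected_cost(o) for name, o in alternatives.items()}

# Lowest-cost alternative available without the monitoring information.
uninformed = min(costs["ignore_risk"], costs["bottled_water"])

# Assumed sign convention: positive VOI means monitoring lowers expected cost.
voi = uninformed - costs["monitor"]
```

    In a full decision tree each `(probability, cost)` pair would itself be the product of several chained branches, such as the true concentration, the reported concentration, compliance, and illness.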

  19. Ensemble Feature Learning of Genomic Data Using Support Vector Machine.

    PubMed

    Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R; Braytee, Ali; Kennedy, Paul J

    2016-01-01

    The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention but mostly on classification not gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy which is the rationale of RFE algorithm. The rationale behind this is, building ensemble SVM models using randomly drawn bootstrap samples from the training set, will produce different feature rankings which will be subsequently aggregated as one feature ranking. As a result, the decision for elimination of features is based upon the ranking of multiple SVM models instead of choosing one particular model. Moreover, this approach will address the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that an average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over random forest based approach. The selected genes by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD) which reveals significant clusters with the selected data.
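
    The ensemble idea in this abstract (bootstrap samples, one feature ranking per model, aggregated rankings driving backward elimination) can be sketched as below. To stay self-contained, the sketch swaps in a simple stand-in scorer, the absolute difference of class means, where the paper uses SVM weights; it also draws exactly balanced bootstrap samples, whereas the abstract says "nearly balanced". All function names and defaults are hypothetical, and binary labels 0/1 are assumed.

```python
import random
from statistics import mean


def balanced_bootstrap(X, y, rng):
    """Draw, with replacement, the same number of rows from each class."""
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n = min(len(idx) for idx in by_class.values())
    picked = []
    for idx in by_class.values():
        picked += [rng.choice(idx) for _ in range(n)]
    return [X[i] for i in picked], [y[i] for i in picked]


def stand_in_scores(X, y, features):
    """Placeholder for SVM weights: |mean(class 1) - mean(class 0)| per feature."""
    scores = {}
    for f in features:
        pos = [row[f] for row, label in zip(X, y) if label == 1]
        neg = [row[f] for row, label in zip(X, y) if label == 0]
        scores[f] = abs(mean(pos) - mean(neg))
    return scores


def ensemble_rfe(X, y, n_keep, n_models=25, seed=0):
    """Backward elimination driven by rankings aggregated over bootstrap models."""
    rng = random.Random(seed)
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        rank_sum = {f: 0 for f in features}
        for _ in range(n_models):
            Xb, yb = balanced_bootstrap(X, y, rng)
            scores = stand_in_scores(Xb, yb, features)
            ordered = sorted(features, key=lambda f: scores[f], reverse=True)
            for rank, f in enumerate(ordered):
                rank_sum[f] += rank          # rank 0 is the best feature
        worst = max(features, key=lambda f: rank_sum[f])
        features.remove(worst)               # drop one feature per round
    return features
```

    Replacing `stand_in_scores` with the squared weights of a linear SVM fitted on each bootstrap sample would bring the sketch closer to the method the abstract describes.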

  20. Landslide susceptibility mapping using decision-tree based CHi-squared automatic interaction detection (CHAID) and Logistic regression (LR) integration

    NASA Astrophysics Data System (ADS)

    Althuwaynee, Omar F.; Pradhan, Biswajeet; Ahmad, Noordin

    2014-06-01

    This article uses a methodology based on chi-squared automatic interaction detection (CHAID), a multivariate method with an automatic classification capacity for analysing large numbers of landslide conditioning factors. This new algorithm was developed to overcome the subjectivity of the manual categorization of scale data of landslide conditioning factors, and to predict a rainfall-induced susceptibility map for Kuala Lumpur city and surrounding areas using a geographic information system (GIS). The main objective of this article is to use the CHAID method to find the best classification fit for each conditioning factor and then combine it with logistic regression (LR). The LR model was used to find the corresponding coefficients of the best-fitting function that assess the optimal terminal nodes. A cluster pattern of landslide locations was extracted in a previous study using the nearest neighbor index (NNI), which was then used to identify the range of clustered landslide locations. Clustered locations were used as model training data with 14 landslide conditioning factors, such as topographically derived parameters, lithology, NDVI, and land use and land cover maps. The Pearson chi-squared value was used to find the best classification fit between the dependent variable and the conditioning factors. Finally, the relationship between conditioning factors was assessed and the landslide susceptibility map (LSM) was produced. The area under the curve (AUC) was used to test the model reliability and prediction capability with the training and validation landslide locations, respectively. This study proved the efficiency and reliability of the decision tree (DT) model in landslide susceptibility mapping. It also provided a valuable scientific basis for spatial decision making in planning and urban management studies.
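
    The Pearson chi-squared value mentioned in this abstract is computed from a contingency table of a categorized conditioning factor against the dependent variable. The sketch below shows only that statistic and a comparison of two candidate categorizations; the counts are invented, and a full CHAID implementation (category merging, adjusted p-values) is not reproduced here.

```python
def pearson_chi2(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Hypothetical counts: rows are classes of one conditioning factor,
# columns are (landslide, no landslide).
candidate_a = [[30, 10], [10, 30]]   # classes separate the outcome well
candidate_b = [[22, 18], [18, 22]]   # classes barely separate the outcome

best = max([candidate_a, candidate_b], key=pearson_chi2)
```

    The categorization with the larger statistic shows the stronger association with landslide occurrence, which is the sense in which the abstract speaks of a "best classification fit".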

  1. Diagnosis of pulmonary hypertension from magnetic resonance imaging–based computational models and decision tree analysis

    PubMed Central

    Swift, Andrew J.; Capener, David; Kiely, David; Hose, Rod; Wild, Jim M.

    2016-01-01

    Abstract Accurately identifying patients with pulmonary hypertension (PH) using noninvasive methods is challenging, and right heart catheterization (RHC) is the gold standard. Magnetic resonance imaging (MRI) has been proposed as an alternative to echocardiography and RHC in the assessment of cardiac function and pulmonary hemodynamics in patients with suspected PH. The aim of this study was to assess whether machine learning using computational modeling techniques and image-based metrics of PH can improve the diagnostic accuracy of MRI in PH. Seventy-two patients with suspected PH attending a referral center underwent RHC and MRI within 48 hours. Fifty-seven patients were diagnosed with PH, and 15 had no PH. A number of functional and structural cardiac and cardiovascular markers derived from 2 mathematical models and also solely from MRI of the main pulmonary artery and heart were integrated into a classification algorithm to investigate the diagnostic utility of the combination of the individual markers. A physiological marker based on the quantification of wave reflection in the pulmonary artery was shown to perform best individually, but optimal diagnostic performance was found by the combination of several image-based markers. Classifier results, validated using leave-one-out cross validation, demonstrated that combining computation-derived metrics reflecting hemodynamic changes in the pulmonary vasculature with measurement of right ventricular morphology and function, in a decision support algorithm, provides a method to noninvasively diagnose PH with high accuracy (92%). The high diagnostic accuracy of these MRI-based model parameters may reduce the need for RHC in patients with suspected PH. PMID:27252844
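
    Leave-one-out cross validation, the validation scheme named in this abstract, holds out each patient in turn, trains on the rest, and scores the held-out prediction. The sketch below illustrates the procedure with a nearest-centroid classifier as a stand-in; the study's actual classifier and markers are not reproduced, and the function names are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier: assign x to the class with the closest mean vector."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for k, value in enumerate(row):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1

    def dist2(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    return min(sums, key=dist2)


def loocv_accuracy(X, y, predict):
    """Hold out each sample once, train on the rest, and count correct predictions."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)
```

    With only 72 patients, as in this study, leave-one-out uses all but one case for training in every round, at the cost of fitting the model once per patient.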

  2. Prediction of healthy blood with data mining classification by using Decision Tree, Naive Baysian and SVM approaches

    NASA Astrophysics Data System (ADS)

    Khalilinezhad, Mahdieh; Minaei, Behrooz; Vernazza, Gianni; Dellepiane, Silvana

    2015-03-01

    Data mining (DM) is the process of discovering knowledge from large databases. Applications of data mining in Blood Transfusion Organizations could be useful for improving the performance of blood donation services. The aim of this research is to predict the healthiness of blood donors in a Blood Transfusion Organization (BTO). For this goal, three well-known algorithms, Decision Tree C4.5, the Naïve Bayesian classifier, and Support Vector Machine, were chosen and applied to a real database of 11006 donors. Seven fields, namely sex, age, job, education, marital status, type of donor, and results of blood tests (doctors' comments and lab results about healthy or unhealthy blood donors), were selected as input to these algorithms. The results of the three algorithms were compared and an error cost analysis was performed. According to the obtained results, the best algorithm, with low error cost and high accuracy, is SVM. This research helps the BTO realize a model of blood donors in each area in order to predict whether donors' blood is healthy or unhealthy. It could be useful if used in parallel with laboratory tests to better separate unhealthy blood.

  3. Systemic inflammation and family history in relation to the prevalence of type 2 diabetes based on an alternating decision tree

    PubMed Central

    Uemura, Hirokazu; Ghaibeh, A. Ammar; Katsuura-Kamano, Sakurako; Yamaguchi, Miwa; Bahari, Tirani; Ishizu, Masashi; Moriguchi, Hiroki; Arisawa, Kokichi

    2017-01-01

    To investigate unknown patterns associated with type 2 diabetes in the Japanese population, we first used an alternating decision tree (ADTree) algorithm, a powerful classification algorithm from data mining, for the data from 1,102 subjects aged 35–69 years. On the basis of the investigated patterns, we then evaluated the associations of serum high-sensitivity C-reactive protein (hs-CRP) as a biomarker of systemic inflammation and family history of diabetes (negative, positive or unknown) with the prevalence of type 2 diabetes because their detailed associations have been scarcely reported. Elevated serum hs-CRP levels were proportionally associated with the increased prevalence of type 2 diabetes after adjusting for probable covariates, including body mass index and family history of diabetes (P for trend = 0.016). Stratified analyses revealed that elevated serum hs-CRP levels were proportionally associated with increased prevalence of diabetes in subjects without a family history of diabetes (P for trend = 0.020) but not in those with a family history or with an unknown family history of diabetes. Our study demonstrates that systemic inflammation was proportionally associated with increased prevalence of type 2 diabetes even after adjusting for body mass index, especially in subjects without a family history of diabetes. PMID:28361994

  4. Assessing and monitoring the risk of desertification in Dobrogea, Romania, using Landsat data and decision tree classifier.

    PubMed

    Vorovencii, Iosif

    2015-04-01

    The risk of the desertification of a part of Romania is increasingly evident, constituting a serious problem for the environment and the society. This article attempts to assess and monitor the risk of desertification in Dobrogea using Landsat Thematic Mapper (TM) satellite images acquired in 1987, 1994, 2000, 2007 and 2011. In order to assess the risk of desertification, we used as indicators the Modified Soil Adjustment Vegetation Index 1 (MSAVI1), the Moving Standard Deviation Index (MSDI) and the albedo, indices relating to the vegetation conditions, the landscape pattern and micrometeorology. The decision tree classifier (DTC) was also used on the basis of pre-established rules, and maps displaying six grades of desertification risk were obtained: non, very low, low, medium, high and severe. Land surface temperature (LST) was also used for the analysis. The results indicate that, according to pre-established rules for the period of 1987-2011, there are two grades of desertification risk that have an ascending trend in Dobrogea, namely very low and medium desertification. An investigation into the causes of the desertification risk revealed that high temperature is the main factor, accompanied by the destruction of forest shelterbelts and of the irrigation system and, to a smaller extent, by the fragmentation of agricultural land and the deforestation in the study area.

  5. Detecting surface coal mining areas from remote sensing imagery: an approach based on object-oriented decision trees

    NASA Astrophysics Data System (ADS)

    Zeng, Xiaoji; Liu, Zhifeng; He, Chunyang; Ma, Qun; Wu, Jianguo

    2017-01-01

    Detecting surface coal mining areas (SCMAs) using remote sensing data in a timely and an accurate manner is necessary for coal industry management and environmental assessment. We developed an approach to effectively extract SCMAs from remote sensing imagery based on object-oriented decision trees (OODT). This OODT approach involves three main steps: object-oriented segmentation, calculation of spectral characteristics, and extraction of SCMAs. The advantage of this approach lies in its effective integration of the spectral and spatial characteristics of SCMAs so as to distinguish the mining areas (i.e., the extracting areas, stripped areas, and dumping areas) from other areas that exhibit similar spectral features (e.g., bare soils and built-up areas). We implemented this method to extract SCMAs in the eastern part of Ordos City in Inner Mongolia, China. Our results had an overall accuracy of 97.07% and a kappa coefficient of 0.80. As compared with three other spectral information-based methods, our OODT approach is more accurate in quantifying the amount and spatial pattern of SCMAs in dryland regions.
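
    Overall accuracy and the kappa coefficient, the two figures reported in this abstract, are both derived from a confusion matrix. The sketch below shows the standard calculation (Cohen's kappa) on invented counts; it does not use the study's data.

```python
def overall_accuracy_and_kappa(cm):
    """cm[i][j] = number of samples of reference class i mapped to class j."""
    n = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(len(cm))) / n      # overall accuracy
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(col) for col in zip(*cm)]
    # Agreement expected by chance, from the marginal totals.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (observed - chance) / (1 - chance)
    return observed, kappa


# Hypothetical two-class matrix: (mining area, other).
balanced = [[45, 5], [5, 45]]
accuracy, kappa = overall_accuracy_and_kappa(balanced)

# Hypothetical imbalanced matrix, as when the target class covers little area.
imbalanced = [[10, 2], [1, 87]]
accuracy_imb, kappa_imb = overall_accuracy_and_kappa(imbalanced)
```

    Kappa discounts the agreement expected by chance, so when one class dominates the map it can be noticeably lower than overall accuracy.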

  6. A Decision-Tree Approach to the Assessment of Posttraumatic Stress Disorder: Engineering Empirically Rigorous and Ecologically Valid Assessment Measures

    PubMed Central

    Stewart, Regan W.; Tuerk, Peter W.; Metzger, Isha W.; Davidson, Tatiana M.; Young, John

    2017-01-01

    Structured diagnostic interviews are widely considered to be the optimal method of assessing symptoms of posttraumatic stress; however, few clinicians report using structured assessments to guide clinical practice. One commonly cited impediment to these assessment approaches is the amount of time required for test administration and interpretation. Empirically keyed methods to reduce the administration time of structured assessments may be a viable solution to increase the use of standardized and reliable diagnostic tools. Thus, the present research conducted an initial feasibility study using a sample of treatment-seeking military veterans (N = 1,517) to develop a truncated assessment protocol based on the Clinician-Administered Posttraumatic Stress Disorder (PTSD) Scale (CAPS). Decision-tree analysis was utilized to identify a subset of predictor variables among the CAPS items that were most predictive of a diagnosis of PTSD. The algorithm-driven, atheoretical sequence of questions reduced the number of items administered by more than 75% and classified the validation sample at 92% accuracy. These results demonstrated the feasibility of developing a protocol to assess PTSD in a way that imposes little assessment burden while still providing a reliable categorization. PMID:26654473

  7. Feature selection using Decision Tree and classification through Proximal Support Vector Machine for fault diagnostics of roller bearing

    NASA Astrophysics Data System (ADS)

    Sugumaran, V.; Muralidharan, V.; Ramachandran, K. I.

    2007-02-01

    Roller bearing is one of the most widely used rotary elements in a rotary machine. The nature of a roller bearing's vibration reveals its condition, and the features that show this nature must be extracted through some indirect means. Statistical parameters like kurtosis, standard deviation, maximum value, etc. form a set of features that are widely used in fault diagnostics. Often the problem is finding good features that discriminate between the different fault conditions of the bearing. Selection of good features is an important phase in pattern recognition and requires detailed domain knowledge. This paper illustrates the use of a Decision Tree that identifies the best features from a given set of samples for the purpose of classification. It uses Proximal Support Vector Machine (PSVM), which has the capability to efficiently classify the faults using statistical features. The vibration signal from a piezoelectric transducer is captured for the following conditions: good bearing, bearing with inner race fault, bearing with outer race fault, and inner and outer race fault. The statistical features are extracted therefrom and classified successfully using PSVM and SVM. The results of PSVM and SVM are compared.
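
    The statistical features named in this abstract can be computed from a sampled vibration signal as sketched below. The definitions are assumptions, since the abstract does not give them: population standard deviation is used, and kurtosis is taken as the fourth central moment divided by the squared variance (not excess kurtosis). The function name is hypothetical.

```python
from statistics import fmean, pstdev


def vibration_features(signal):
    """Return a few statistical features of a list of vibration samples."""
    mu = fmean(signal)
    sd = pstdev(signal)                       # population standard deviation
    m4 = fmean([(x - mu) ** 4 for x in signal])
    return {
        "mean": mu,
        "std": sd,
        "max": max(signal),
        "kurtosis": m4 / sd ** 4,             # undefined for a constant signal
    }
```

    A feature vector of this kind, one per recorded signal, is the sort of input from which a decision tree can pick the most discriminating features before classification.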

  8. Predicting the variability of water resources in eleven global river basins using multivariate and decision tree analysis with satellite data

    NASA Astrophysics Data System (ADS)

    Fayne, J.; Lakshmi, V.

    2016-12-01

    The increasing trend of floods and droughts over the past decade has made the study of hydrologic processes and water availability vital to our understanding of extreme hydrologic events. Extreme events result in thousands of lives lost and billions in property damage, and many of them occur in developing countries where in-situ observing networks are sparse, making forecasting and estimation of impacts nearly impossible. Eleven river basins around the globe are analyzed using satellite and modeled data from NASA to compute these patterns globally and begin to understand and predict extreme events based on fluctuations in ground water modeled by the GRACE tandem satellite. This study assesses how the water cycle variables such as precipitation, soil moisture, runoff, evapotranspiration and vegetation have changed over the past 15 years, focusing on climate systems represented by the 2007 Köppen Climate Classification. Monthly trends using GRACE Water Equivalent Thickness Anomaly, TRMM and GPM Precipitation, MODIS NDVI and ET, and GLDAS Runoff and Root Zone Soil moisture are analyzed using a combination of multivariate regression and decision tree classification at sub-basin climate level. These analyses yield predicted Water Equivalent Thickness Anomaly maps that are climate specific with a higher resolution, up to 250 meters, compared to the GRACE 100 km product.

  9. Systemic inflammation and family history in relation to the prevalence of type 2 diabetes based on an alternating decision tree.

    PubMed

    Uemura, Hirokazu; Ghaibeh, A Ammar; Katsuura-Kamano, Sakurako; Yamaguchi, Miwa; Bahari, Tirani; Ishizu, Masashi; Moriguchi, Hiroki; Arisawa, Kokichi

    2017-03-31

    To investigate unknown patterns associated with type 2 diabetes in the Japanese population, we first used an alternating decision tree (ADTree) algorithm, a powerful classification algorithm from data mining, for the data from 1,102 subjects aged 35-69 years. On the basis of the investigated patterns, we then evaluated the associations of serum high-sensitivity C-reactive protein (hs-CRP) as a biomarker of systemic inflammation and family history of diabetes (negative, positive or unknown) with the prevalence of type 2 diabetes because their detailed associations have been scarcely reported. Elevated serum hs-CRP levels were proportionally associated with the increased prevalence of type 2 diabetes after adjusting for probable covariates, including body mass index and family history of diabetes (P for trend = 0.016). Stratified analyses revealed that elevated serum hs-CRP levels were proportionally associated with increased prevalence of diabetes in subjects without a family history of diabetes (P for trend = 0.020) but not in those with a family history or with an unknown family history of diabetes. Our study demonstrates that systemic inflammation was proportionally associated with increased prevalence of type 2 diabetes even after adjusting for body mass index, especially in subjects without a family history of diabetes.

  10. Landsat-derived cropland mask for Tanzania using 2010-2013 time series and decision tree classifier methods

    NASA Astrophysics Data System (ADS)

    Justice, C. J.

    2015-12-01

    80% of Tanzania's population is involved in the agriculture sector. Despite this national dependence, agricultural reporting is minimal and monitoring efforts are in their infancy. The cropland mask developed through this study provides the framework for agricultural monitoring through informing analysis of crop conditions, dispersion, and intensity at a national scale. Tanzania is dominated by smallholder agricultural systems with an average field size of less than one hectare (Sarris et al, 2006). At this field scale, previous classifications of agricultural land in Tanzania using MODIS coarse resolution data are insufficient to inform a working monitoring system. The nation-wide cropland mask in this study was developed using composited Landsat tiles from a 2010-2013 time series. Decision tree classifier methods were used in the study with representative training areas collected for agriculture and no agriculture using appropriate indices to separate these classes (Hansen et al, 2013). Validation was done using a random sample and high resolution satellite images to compare agriculture and no agriculture samples from the study area. The techniques used in this study were successful and have the potential to be adapted for other countries, allowing targeted monitoring efforts to improve food security, market price, and inform agricultural policy.

  11. An expert system with radial basis function neural network based on decision trees for predicting sediment transport in sewers.

    PubMed

    Ebtehaj, Isa; Bonakdari, Hossein; Zaji, Amir Hossein

    2016-01-01

    In this study, an expert system with a radial basis function neural network (RBF-NN) based on decision trees (DT) is designed to predict sediment transport in sewer pipes at the limit of deposition. First, sensitivity analysis is carried out to investigate the effect of each parameter on predicting the densimetric Froude number (Fr). The results indicate that utilizing the ratio of the median particle diameter to pipe diameter (d/D), ratio of median particle diameter to hydraulic radius (d/R) and volumetric sediment concentration (C_V) as the input combination leads to the best Fr prediction. Subsequently, the new hybrid DT-RBF method is presented. The results of DT-RBF are compared with RBF and RBF-particle swarm optimization (PSO), which uses PSO for RBF training. It appears that DT-RBF is more accurate (R² = 0.934, MARE = 0.103, RMSE = 0.527, SI = 0.13, BIAS = -0.071) than the two other RBF methods. Moreover, the proposed DT-RBF model offers explicit expressions for use by practicing engineers.

  12. The use of decision trees in the classification of beach forms/patterns on IKONOS-2 data

    NASA Astrophysics Data System (ADS)

    Teodoro, A. C.; Ferreira, D.; Gonçalves, H.

    2013-10-01

    Evaluation of beach hydromorphological behaviour and its classification is highly complex. The available beach morphologic and classification models are mainly based on wave, tidal and sediment parameters. Since these parameters are usually unavailable for some regions - such as in the Portuguese coastal zone - a morphologic analysis using remotely sensed data seems to be a valid alternative. Data mining for spatial pattern recognition is the process of discovering useful information, such as patterns/forms, changes and significant structures from large amounts of data. This study focuses on the application of data mining techniques, particularly Decision Trees (DT), to an IKONOS-2 image in order to classify beach features/patterns, in a stretch of the northwest coast of Portugal. Based on the knowledge of the coastal features, five classes were defined: Sea, Suspended-Sediments, Breaking-Zone, Beachface and Beach. The dataset was randomly divided into training and validation subsets. Based on the analysis of several DT algorithms, the CART algorithm was found to be the most adequate and was thus applied. The performance of the DT algorithm was evaluated by the confusion matrix, overall accuracy, and Kappa coefficient. In the classification of beach features/patterns, the algorithm presented an overall accuracy of 98.2% and a kappa coefficient of 0.97. The DTs were compared with a neural network algorithm, and the results were in agreement. The methodology presented in this paper provides promising results and should be considered in further applications of beach forms/patterns classification.
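
The evaluation metrics named in this abstract (confusion matrix, overall accuracy, Kappa coefficient) can be illustrated with a minimal sketch. The function names and the 2x2 example matrix below are hypothetical and are not taken from the study; the kappa shown is the standard Cohen's kappa, assumed here to be the coefficient the authors mean.

```python
def overall_accuracy(cm):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement beyond what the marginals predict by chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(n)) / total
    row_totals = [sum(cm[i]) for i in range(n)]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement expected from the row and column marginals.
    expected = sum(row_totals[k] * col_totals[k] for k in range(n)) / total ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical confusion matrix (rows: reference class, columns: predicted class).
example_cm = [[45, 5],
              [10, 40]]
print(overall_accuracy(example_cm), cohen_kappa(example_cm))
```

Kappa is lower than overall accuracy whenever part of the observed agreement could have arisen by chance, which is why the two are usually reported together.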

  13. A decision-tree approach to the assessment of posttraumatic stress disorder: Engineering empirically rigorous and ecologically valid assessment measures.

    PubMed

    Stewart, Regan W; Tuerk, Peter W; Metzger, Isha W; Davidson, Tatiana M; Young, John

    2016-02-01

    Structured diagnostic interviews are widely considered to be the optimal method of assessing symptoms of posttraumatic stress; however, few clinicians report using structured assessments to guide clinical practice. One commonly cited impediment to these assessment approaches is the amount of time required for test administration and interpretation. Empirically keyed methods to reduce the administration time of structured assessments may be a viable solution to increase the use of standardized and reliable diagnostic tools. Thus, the present research conducted an initial feasibility study using a sample of treatment-seeking military veterans (N = 1,517) to develop a truncated assessment protocol based on the Clinician-Administered Posttraumatic Stress Disorder (PTSD) Scale (CAPS). Decision-tree analysis was utilized to identify a subset of predictor variables among the CAPS items that were most predictive of a diagnosis of PTSD. The algorithm-driven, atheoretical sequence of questions reduced the number of items administered by more than 75% and classified the validation sample at 92% accuracy. These results demonstrated the feasibility of developing a protocol to assess PTSD in a way that imposes little assessment burden while still providing a reliable categorization. (c) 2016 APA, all rights reserved.

  14. An Ensemble Rule Learning Approach for Automated Morphological Classification of Erythrocytes.

    PubMed

    Maity, Maitreya; Mungle, Tushar; Dhane, Dhiraj; Maiti, A K; Chakraborty, Chandan

    2017-04-01

    The analysis of pathophysiological change to erythrocytes is important for early diagnosis of anaemia. The manual assessment of pathology slides is time-consuming and complicated regarding various types of cell identification. This paper proposes an ensemble rule-based decision-making approach for morphological classification of erythrocytes. Firstly, the digital microscopic blood smear images are pre-processed for removal of spurious regions followed by colour normalisation and thresholding. The erythrocytes are segmented from the background image using the watershed algorithm. The shape features are then extracted from the segmented image to detect shape abnormality present in microscopic blood smear images. The decision about the abnormality is taken using the proposed multiple rule-based expert systems. The deciding factor is majority ensemble voting for abnormally shaped erythrocytes. Here, shape-based features are considered for nine different types of abnormal erythrocytes including normal erythrocytes. Further, the adaptive boosting algorithm is used to generate multiple decision tree models where each model tree generates an individual rule set. The supervised classification method is followed to generate rules using a C4.5 decision tree. The proposed ensemble approach is precise in detecting eight types of abnormal erythrocytes with an overall accuracy of 97.81% and weighted sensitivity of 97.33%, weighted specificity of 99.7%, and weighted precision of 98%. This approach shows the robustness of the proposed strategy for classifying erythrocytes into abnormal and normal classes. The article also indicates the approach's potential to be incorporated in point-of-care technology solutions targeting rapid clinical assistance.

  15. MSEBAG: a dynamic classifier ensemble generation based on 'minimum-sufficient ensemble' and bagging

    NASA Astrophysics Data System (ADS)

    Chen, Lei; Kamel, Mohamed S.

    2016-01-01

    In this paper, we propose a dynamic classifier system, MSEBAG, which is characterised by searching for the 'minimum-sufficient ensemble' and bagging at the ensemble level. It adopts an 'over-generation and selection' strategy and aims to achieve a good bias-variance trade-off. In the training phase, MSEBAG first searches for the 'minimum-sufficient ensemble', which maximises the in-sample fitness with the minimal number of base classifiers. Then, starting from the 'minimum-sufficient ensemble', a backward stepwise algorithm is employed to generate a collection of ensembles. The objective is to create a collection of ensembles with a descending fitness on the data, as well as a descending complexity in the structure. MSEBAG dynamically selects the ensembles from the collection for the decision aggregation. The extended adaptive aggregation (EAA) approach, a bagging-style algorithm performed at the ensemble level, is employed for this task. EAA searches for the competent ensembles using a score function, which takes into consideration both the in-sample fitness and the confidence of the statistical inference, and averages the decisions of the selected ensembles to label the test pattern. The experimental results show that the proposed MSEBAG outperforms the benchmarks on average.

  16. Advanced predictive methods for wine age prediction: Part I - A comparison study of single-block regression approaches based on variable selection, penalized regression, latent variables and tree-based ensemble methods.

    PubMed

    Rendall, Ricardo; Pereira, Ana Cristina; Reis, Marco S

    2017-08-15

    In this paper we test and compare advanced predictive approaches for estimating wine age in the context of the production of a high quality fortified wine - Madeira Wine. We consider four different data sets, namely, volatile, polyphenols, organic acids and the UV-vis spectra. Each one of these data sets contains chemical information of a different nature and presents diverse data structures, namely a different dimensionality, level of collinearity and degree of sparsity. These different aspects may imply the use of different modelling approaches in order to better explore the data set's information content, namely their predictive potential for wine age. This is because different regression methods have different prior assumptions regarding the predictors, response variable(s) and the data generating mechanism, which may or may not find good adherence to the case study under analysis. In order to cover a wide range of modelling domains, we have incorporated in this work methods belonging to four very distinct classes of approaches that cover most applications found in practice: linear regression with variable selection, penalized regression, latent variables regression and tree-based ensemble methods. We have also developed a rigorous comparison framework based on a double Monte Carlo cross-validation scheme, in order to perform the relative assessment of the performance of the various methods. Upon comparison, models built using the polyphenols and volatile composition data sets led to better wine age predictions, showing lower errors under testing conditions. Furthermore, the results obtained for the polyphenols data set suggest a more sparse structure that can be further explored in order to reduce the number of measured variables. In terms of regression methods, tree-based methods, and boosted regression trees in particular, presented the best results for the polyphenols, volatile and the organic acid data sets, suggesting a possible presence of a

  17. Development and Validation of a Computational Model Ensemble for the Early Detection of BCRP/ABCG2 Substrates during the Drug Design Stage.

    PubMed

    Gantner, Melisa E; Peroni, Roxana N; Morales, Juan F; Villalba, María L; Ruiz, María E; Talevi, Alan

    2017-08-28

    Breast Cancer Resistance Protein (BCRP) is an ATP-dependent efflux transporter linked to the multidrug resistance phenomenon in many diseases such as epilepsy and cancer and a potential source of drug interactions. For these reasons, the early identification of substrates and nonsubstrates of this transporter during the drug discovery stage is of great interest. We have developed a computational nonlinear model ensemble based on conformation-independent molecular descriptors using a combined strategy of genetic algorithms, J48 decision tree classifiers, and data fusion. The best model ensemble consists of averaging the ranking of the 12 decision trees that showed the best performance on the training set, which also demonstrated a good performance for the test set. It was experimentally validated using the ex vivo everted rat intestinal sac model. Five anticonvulsant drugs classified as nonsubstrates for BCRP by the model ensemble were experimentally evaluated, and none of them proved to be a BCRP substrate under the experimental conditions used, thus confirming the predictive ability of the model ensemble. The model ensemble reported here is a potentially valuable tool to be used as an in silico ADME filter in computer-aided drug discovery campaigns intended to overcome BCRP-mediated multidrug resistance issues and to prevent drug-drug interactions.

  18. The use of decision trees and naïve Bayes algorithms and trace element patterns for controlling the authenticity of free-range-pastured hens' eggs.

    PubMed

    Barbosa, Rommel Melgaço; Nacano, Letícia Ramos; Freitas, Rodolfo; Batista, Bruno Lemos; Barbosa, Fernando

    2014-09-01

    This article aims to evaluate 2 machine learning algorithms, decision trees and naïve Bayes (NB), for egg classification (free-range eggs compared with battery eggs). The database used for the study consisted of 15 chemical elements (As, Ba, Cd, Co, Cs, Cu, Fe, Mg, Mn, Mo, Pb, Se, Sr, V, and Zn) determined in 52 egg samples (20 free-range and 32 battery eggs) by inductively coupled plasma mass spectrometry. Our results demonstrated that decision trees and NB associated with the mineral contents of eggs provide a high level of accuracy (above 80% and 90%, respectively) for classification between free-range and battery eggs and can be used as an alternative method for adulteration evaluation. © 2014 Institute of Food Technologists®

  19. Rapid decision support tool based on novel ecosystem service variables for retrofitting of permeable pavement systems in the presence of trees.

    PubMed

    Scholz, Miklas; Uzomah, Vincent C

    2013-08-01

    The retrofitting of sustainable drainage systems (SuDS) such as permeable pavements is currently undertaken ad hoc using expert experience supported by minimal guidance based predominantly on hard engineering variables. There is a lack of practical decision support tools useful for a rapid assessment of the potential of ecosystem services when retrofitting permeable pavements in urban areas that either feature existing trees or should be planted with trees in the near future. Thus the aim of this paper is to develop an innovative rapid decision support tool based on novel ecosystem service variables for retrofitting of permeable pavement systems close to trees. This unique tool proposes the retrofitting of permeable pavements that obtained the highest ecosystem service score for a specific urban site enhanced by the presence of trees. This approach is based on a novel ecosystem service philosophy adapted to permeable pavements rather than on traditional engineering judgement associated with variables based on quick community and environment assessments. For an example case study area such as Greater Manchester, which was dominated by Sycamore and Common Lime, a comparison with the traditional approach of determining community and environment variables indicates that permeable pavements are generally a preferred SuDS option. Permeable pavements combined with urban trees received relatively high scores, because of their great potential impact in terms of water and air quality improvement, and flood control, respectively. The outcomes of this paper are likely to lead to more combined permeable pavement and tree systems in the urban landscape, which are beneficial for humans and the environment.

  20. The risk of disabling, surgery and reoperation in Crohn’s disease – A decision tree-based approach to prognosis

    PubMed Central

    Dias, Cláudia Camila; Pereira Rodrigues, Pedro; Fernandes, Samuel; Portela, Francisco; Ministro, Paula; Martins, Diana; Sousa, Paula; Lago, Paula; Rosa, Isadora; Correia, Luis; Moura Santos, Paula

    2017-01-01

    Introduction Crohn’s disease (CD) is a chronic inflammatory bowel disease known to carry a high risk of disabling and many times requiring surgical interventions. This article describes a decision-tree based approach that defines the CD patients’ risk of undergoing disabling events, surgical interventions and reoperations, based on clinical and demographic variables. Materials and methods This multicentric study involved 1547 CD patients retrospectively enrolled and divided into two cohorts: a derivation one (80%) and a validation one (20%). Decision trees were built upon applying the CHAIRT algorithm for the selection of variables. Results Three-level decision trees were built for the risk of disabling and reoperation, whereas the risk of surgery was described in a two-level one. A receiver operating characteristic (ROC) analysis was performed, and the area under the curve (AUC) was higher than 70% for all outcomes. The defined risk cut-off values show usefulness for the assessed outcomes: risk levels above 75% for disabling had an odds test positivity of 4.06 [3.50–4.71], whereas risk levels below 34% and 19% excluded surgery and reoperation with an odds test negativity of 0.15 [0.09–0.25] and 0.50 [0.24–1.01], respectively. Overall, patients with B2 or B3 phenotype had a higher proportion of disabling disease and surgery, while patients with later introduction of pharmacological therapeutic (1 months after initial surgery) had a higher proportion of reoperation. Conclusions The decision-tree based approach used in this study, with demographic and clinical variables, has been shown to be a valid and useful approach to depict such risks of disabling, surgery and reoperation. PMID:28225800

  1. Subtyping of renal cortical neoplasms in fine needle aspiration biopsies using a decision tree based on genomic alterations detected by fluorescence in situ hybridization

    PubMed Central

    Gowrishankar, Banumathy; Cahill, Lynnette; Arndt, Alexandra E; Al-Ahmadie, Hikmat; Lin, Oscar; Chadalavada, Kalyani; Chaganti, Seeta; Nanjangud, Gouri J; Murty, Vundavalli V; Chaganti, Raju S K; Reuter, Victor E; Houldsworth, Jane

    2014-01-01

    Objectives To improve the overall accuracy of diagnosis in needle biopsies of renal masses, especially small renal masses (SRMs), using fluorescence in situ hybridization (FISH), and to develop a renal cortical neoplasm classification decision tree based on genomic alterations detected by FISH. Patients and Methods Ex vivo fine needle aspiration biopsies of 122 resected renal cortical neoplasms were subjected to FISH using a series of seven-probe sets to assess gain or loss of 10 chromosomes and rearrangement of the 11q13 locus. Using specimen (nephrectomy)-histology as the ‘gold standard’, a genomic aberration-based decision tree was generated to classify specimens. The diagnostic potential of the decision tree was assessed by comparing the FISH-based classification and biopsy histology with specimen histology. Results Of the 114 biopsies diagnostic by either method, a higher diagnostic yield was achieved by FISH (92 and 96%) than histology alone (82 and 84%) in the 65 biopsies from SRMs (<4 cm) and 49 from larger masses, respectively. An optimized decision tree was constructed based on aberrations detected in eight chromosomes, by which the maximum concordance of classification achieved by FISH was 79%, irrespective of mass size. In SRMs, the overall sensitivity of diagnosis by FISH compared with histopathology was higher for benign oncocytoma, was similar for the chromophobe renal cell carcinoma subtype, and was lower for clear-cell and papillary subtypes. The diagnostic accuracy of classification of needle biopsy specimens (from SRMs) increased from 80% obtained by histology alone to 94% when combining histology and FISH. Conclusion The present study suggests that a novel FISH assay developed by us has a role to play in assisting in the yield and accuracy of diagnosis of renal cortical neoplasms in needle biopsies in particular, and can help guide the clinical management of patients with SRMs that were non-diagnostic by histology. PMID:24467611

  2. Comparison of two data mining techniques in labeling diagnosis to Iranian pharmacy claim dataset: artificial neural network (ANN) versus decision tree model.

    PubMed

    Rezaei-Darzi, Ehsan; Farzadfar, Farshad; Hashemi-Meshkini, Amir; Navidi, Iman; Mahmoudi, Mahmoud; Varmaghani, Mehdi; Mehdipour, Parinaz; Soudi Alamdari, Mahsa; Tayefi, Batool; Naderimagham, Shohreh; Soleymani, Fatemeh; Mesdaghinia, Alireza; Delavari, Alireza; Mohammad, Kazem

    2014-12-01

    This study aimed to evaluate and compare the prediction accuracy of two data mining techniques, decision tree and neural network models, in labeling diagnosis to gastrointestinal prescriptions in Iran. This study was conducted in three phases: data preparation, training phase, and testing phase. A sample from a database consisting of 23 million pharmacy insurance claim records, from 2004 to 2011, was used, in which a total of 330 prescriptions were assessed and used to train and test the models simultaneously. In the training phase, the selected prescriptions were assessed by both a physician and a pharmacist separately and assigned a diagnosis. To test the performance of each model, a k-fold stratified cross validation was conducted in addition to measuring their sensitivity and specificity. Generally, the two methods had very similar accuracies. Considering the weighted average of true positive rate (sensitivity) and true negative rate (specificity), the decision tree had slightly higher accuracy in its ability for correct classification (83.3% and 96% versus 80.3% and 95.1%, respectively). However, when the weighted average of ROC area (AUC between each class and all other classes) was measured, the ANN displayed higher accuracies in predicting the diagnosis (93.8% compared with 90.6%). According to the result of this study, artificial neural network and decision tree models show similar accuracy in labeling diagnosis to GI prescriptions.
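
The k-fold stratified cross validation mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `stratified_folds` is hypothetical, and the sketch assigns samples to folds in a fixed round-robin order, whereas a practical implementation would usually shuffle within each class first.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Split sample indices into k folds that preserve class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    # Deal each class's samples out to the folds in turn, so every fold
    # receives roughly the same share of every class.
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Hypothetical labels: 6 samples of one diagnosis, 3 of another.
labels = ["a"] * 6 + ["b"] * 3
folds = stratified_folds(labels, k=3)
for test_fold in folds:
    # Each fold serves once as the test set; the remaining folds form the training set.
    train = [i for fold in folds if fold is not test_fold for i in fold]
    print(sorted(test_fold), sorted(train))
```

Stratification matters most when classes are unbalanced, because a plain random split can leave a fold with few or no samples of a rare class.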

  3. Selecting Relevant Descriptors for Classification by Bayesian Estimates: A Comparison with Decision Trees and Support Vector Machines Approaches for Disparate Data Sets.

    PubMed

    Carbon-Mangels, Miriam; Hutter, Michael C

    2011-10-01

    Classification algorithms suffer from the curse of dimensionality, which leads to overfitting, particularly if the problem is over-determined. Therefore it is of particular interest to identify the most relevant descriptors to reduce the complexity. We applied Bayesian estimates to model the probability distribution of descriptor values used for binary classification using n-fold cross-validation. As a measure for the discriminative power of the classifiers, the symmetric form of the Kullback-Leibler divergence of their probability distributions was computed. We found that the most relevant descriptors possess a Gaussian-like distribution of their values, show the largest divergences, and therefore appear most often in the cross-validation scenario. The results were compared to those of the LASSO feature selection method applied to multiple decision trees and support vector machine approaches for data sets of substrates and nonsubstrates of three Cytochrome P450 isoenzymes, which comprise strongly unbalanced compound distributions. In contrast to decision trees and support vector machines, the performance of Bayesian estimates is less affected by unbalanced data sets. This strategy reveals those descriptors that allow a simple linear separation of the classes, whereas the superior accuracy of decision trees and support vector machines can be attributed to nonlinear separation, which in turn makes them more prone to overfitting.
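
As an illustration of the symmetric Kullback-Leibler divergence mentioned in this abstract, the sketch below computes it for two discrete distributions. It assumes the symmetric form is the sum KL(p||q) + KL(q||p); the abstract does not say which symmetrization the authors used (some definitions halve this sum), and the function names and example histograms are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute zero.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetrized divergence, assumed here to be KL(p||q) + KL(q||p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical binned descriptor distributions for two classes.
class_a = [0.5, 0.5]
class_b = [0.9, 0.1]
print(symmetric_kl(class_a, class_b))
```

A descriptor whose two class-conditional distributions give a larger value separates the classes better, which is the sense in which the divergence measures discriminative power.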

  4. Energy spectra unfolding of fast neutron sources using the group method of data handling and decision tree algorithms

    NASA Astrophysics Data System (ADS)

    Hosseini, Seyed Abolfazl; Afrakoti, Iman Esmaili Paeen

    2017-04-01

    Accurate unfolding of the energy spectrum of a neutron source gives important information about unknown neutron sources. The obtained information is useful in many areas like nuclear safeguards, nuclear nonproliferation, and homeland security. In the present study, the energy spectrum of a poly-energetic fast neutron source is reconstructed using the developed computational codes based on the Group Method of Data Handling (GMDH) and Decision Tree (DT) algorithms. The neutron pulse height distribution (neutron response function) in the considered NE-213 liquid organic scintillator has been simulated using the developed MCNPX-ESUT computational code (MCNPX-Energy engineering of Sharif University of Technology). The developed computational codes based on the GMDH and DT algorithms use some data for training, testing and validation steps. In order to prepare the required data, 4000 randomly generated energy spectra distributed over 52 bins are used. The randomly generated energy spectra and the simulated neutron pulse height distributions by MCNPX-ESUT for each energy spectrum are used as the output and input data. Since there is no need to solve the inverse problem with an ill-conditioned response matrix, the unfolded energy spectrum has the highest accuracy. The 241Am-9Be and 252Cf neutron sources are used in the validation step of the calculation. The unfolded energy spectra for the used fast neutron sources have an excellent agreement with the reference ones. Also, the accuracy of the unfolded energy spectra obtained using the GMDH is slightly better than those obtained from the DT. The results obtained in the present study have good accuracy in comparison with the previously published paper based on the logsig and tansig transfer functions.

  5. New Landsat derived cropland mask for Tanzania using 2010-2013 time series and decision tree classifier methods.

    NASA Astrophysics Data System (ADS)

    Justice, C. J.

    2016-12-01

    Eighty percent of Tanzania's population is involved in the agriculture sector. Despite this national dependence, agricultural reporting is minimal and monitoring efforts are in their infancy. The cropland mask developed through this study provides an underpinning for agricultural monitoring by informing analysis of crop conditions, dispersion, and intensity at a national scale. Tanzania is dominated by smallholder agricultural systems with an average field size of less than one hectare. At this field scale, previous classifications of agricultural land in Tanzania using MODIS coarse resolution data are insufficient to inform a working monitoring system. The nation-wide cropland mask in this study was developed using composited Landsat tiles from a 2010-2013 time-series. Decision tree classifier methods were used in the study with representative training areas collected for agriculture and no agriculture using appropriate indices to separate these classes. Validation was undertaken using a random sample and high resolution satellite images to compare agriculture and no agriculture samples from the study area. The cropland mask had high producer and user accuracy in the no agriculture class at 95.0% and 97.35% respectively. There was high producer accuracy in the agriculture class at 80.2% and moderate user accuracy at 67.9%. The principal metrics used for the classification support the theme that agriculture in Tanzania and Sub-Saharan Africa are less vegetated than surrounding areas and most similar to bare ground - emphasizing the need for improved access to inputs and irrigation to enhance productivity and smallholder livelihoods. The techniques used in this study were successful for developing a cropland mask and have the potential to be adapted for other countries, allowing targeted monitoring efforts to improve food security, market price, and inform agricultural policy.
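
Producer and user accuracy, as reported in this abstract, are per-class measures read from a confusion matrix. The sketch below uses the usual remote-sensing convention (producer accuracy divides by the reference total, user accuracy by the mapped total). The function name and the example counts are hypothetical and do not reproduce the study's validation data.

```python
def producer_user_accuracy(cm):
    """Per-class producer and user accuracy.

    cm[i][j] is the number of samples whose reference class is i and whose
    mapped (classified) class is j.
    """
    n = len(cm)
    reference_totals = [sum(cm[i]) for i in range(n)]
    mapped_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Producer accuracy: share of each reference class that was mapped correctly.
    producer = [cm[k][k] / reference_totals[k] for k in range(n)]
    # User accuracy: share of each mapped class that is correct on the ground.
    user = [cm[k][k] / mapped_totals[k] for k in range(n)]
    return producer, user


# Hypothetical counts; class 0 = agriculture, class 1 = no agriculture.
example = [[80, 20],
           [5, 95]]
producer, user = producer_user_accuracy(example)
print(producer, user)
```

Producer accuracy corresponds to omission errors and user accuracy to commission errors, so a class can score well on one and only moderately on the other.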

  6. Decision-Tree Based Model Analysis for Efficient Identification of Parameter Relations Leading to Different Signaling States

    PubMed Central

    Koch, Yvonne; Wolf, Thomas; Sorger, Peter K.; Eils, Roland; Brors, Benedikt

    2013-01-01

    In systems biology, a mathematical description of signal transduction processes is used to gain a more detailed mechanistic understanding of cellular signaling networks. Such models typically depend on a number of parameters that have different influence on the model behavior. Local sensitivity analysis is able to identify parameters that have the largest effect on signaling strength. Bifurcation analysis shows on which parameters a qualitative model response depends. Most methods for model analysis are intrinsically univariate. They typically cannot consider combinations of parameters since the search space for such analysis would be too large. This limitation is important since activation of a signaling pathway often relies on multiple rather than on single factors. Here, we present a novel method for model analysis that overcomes this limitation. As input to a model defined by a system of ordinary differential equations, we consider parameters for initial chemical species concentrations. The model is used to simulate the system response, which is then classified into pre-defined classes (e.g., active or not active). This is combined with a scan of the parameter space. Parameter sets leading to a certain system response are subjected to a decision tree algorithm, which learns conditions that lead to this response. We compare our method to two alternative multivariate approaches to model analysis: analytical solution for steady states combined with a parameter scan, and direct Lyapunov exponent (DLE) analysis. We use three previously published models including a model for EGF receptor internalization and two apoptosis models to demonstrate the power of our approach. Our method reproduces critical parameter relations previously obtained by both steady-state and DLE analysis while being more generally applicable and substantially less computationally expensive. The method can be used as a general tool to predict multivariate control strategies for pathway activation

  7. [Use the Markov-decision tree model to optimize vaccination strategies of hepatitis E among women aged 15 to 49].

    PubMed

    Chen, Z M; Ji, S B; Shi, X L; Zhao, Y Y; Zhang, X F; Jin, H

    2017-02-10

    Objective: To evaluate the cost-utility of different hepatitis E vaccination strategies in women aged 15 to 49. Methods: A Markov-decision tree model was constructed to evaluate the cost-utility of three hepatitis E virus vaccination strategies. Parameters of the models were estimated on the basis of published studies and expert experience. Both sensitivity and threshold analyses were used to evaluate the uncertainties of the model. Results: Compared with the non-vaccination group, the post-screening vaccination strategy with a rate of 100% could save 0.10 quality-adjusted life years per capita in women from the societal perspective. For vaccination after implementation of a screening program and for the strategy with the vaccination rate reaching 100%, the incremental cost-utility ratio (ICUR) of vaccination was 5 651.89 and 6 385.33 Yuan/QALY, respectively. Vaccination after implementation of a screening program showed better benefit than the 100% vaccination-rate strategy. Results from the sensitivity analysis showed that both the cost of hepatitis E vaccine and the inoculation compliance rate had significant effects. If the cost were lower than 191.56 Yuan (RMB) or the inoculation compliance rate lower than 0.23, the 100% vaccination-rate strategy was better than the post-screening vaccination strategy; otherwise the post-screening vaccination strategy was the optimal strategy. Conclusion: Post-screening vaccination for women aged 15 to 49 seemed the optimal strategy from the societal perspective, but this depends on the vaccine cost and the inoculation compliance rate.
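    The incremental cost-utility ratio reported above is conventionally the extra cost of a strategy divided by the extra QALYs it yields relative to a comparator. The sketch below illustrates that standard ratio only; the function name and the numbers are hypothetical, and the abstract does not describe the authors' actual model code.

```python
def icur(cost_new, cost_ref, qaly_new, qaly_ref):
    """Incremental cost-utility ratio: extra cost per extra QALY gained."""
    delta_qaly = qaly_new - qaly_ref
    if delta_qaly == 0:
        raise ValueError("strategies yield identical QALYs; ICUR is undefined")
    return (cost_new - cost_ref) / delta_qaly


# Hypothetical per-capita costs (Yuan) and QALYs, for illustration only.
example_icur = icur(cost_new=1500.0, cost_ref=500.0, qaly_new=2.5, qaly_ref=2.0)
```

    A lower ICUR means each additional QALY is bought more cheaply, which is the sense in which one strategy is preferred to another in the abstract.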

  8. IHC and the WHO classification of lymphomas: cost effective immunohistochemistry using a deductive reasoning "decision tree" approach.

    PubMed

    Taylor, Clive R

    2009-10-01

    The 2008 World Health Organization Classification of Tumors of the Hematopoietic and Lymphoid Tissues defines current standards of practice for the diagnosis and classification of malignant lymphomas and related entities. More than 50 different types of lymphomas are described, combining fine morphologic criteria with immunohistochemical (IHC), and sometimes molecular, findings. Faced with such a broad range of different lymphomas, some encountered only rarely, and a rapidly growing, ever changing, armamentarium of approximately 80 pertinent IHC "stains", the challenge to the pathologist is to employ IHC in an efficient manner, to arrive at an assured diagnosis as rapidly as possible. This review uses deductive reasoning, after a decision tree or dendrogram model that relies upon recognition of basic morphologic patterns for efficient selection, use and interpretation of IHC markers to classify node-based malignancies by the World Health Organization schema. The review is divided into 2 parts, the first addressing those lymphomas that produce a follicular or nodular pattern of lymph nodal involvement; the second addressing diffuse proliferations in lymph nodes. It is accepted that only specialized centers are able to apply all of the technical resources and experience necessary for definitive diagnosis of unusual cases. Emphasis therefore is given to the more common lymphomas and the more commonly available IHC "stains", for a pragmatic and practical approach that is both broadly feasible and cost effective. By this method an assured diagnosis may be reached in the majority of nodal lymphomas, at the same time developing a sufficiency of data to recognize those rare or atypical cases that require referral to a specialized center.

  9. Extracting distribution and expansion of rubber plantations from Landsat imagery using the C5.0 decision tree method

    NASA Astrophysics Data System (ADS)

    Sun, Zhongchang; Leinenkugel, Patrick; Guo, Huadong; Huang, Chong; Kuenzer, Claudia

    2017-04-01

    Natural tropical rainforests in China's Xishuangbanna region have undergone dramatic conversion to rubber plantations in recent decades, altering the region's environment and ecological systems. Therefore, it is of great importance for local environmental and ecological protection agencies to research the distribution and expansion of rubber plantations. The objective of this paper is to monitor dynamic changes of rubber plantations in China's Xishuangbanna region based on multitemporal Landsat images (acquired in 1989, 2000, and 2013) using a C5.0-based decision-tree method. A practical and semiautomatic data processing procedure for mapping rubber plantations was proposed. In particular, haze removal and deshadowing were proposed to perform atmospheric and topographic correction and reduce the effects of haze, shadow, and terrain. Our results showed that the atmospheric and topographic correction could improve the extraction accuracy of rubber plantations, especially in mountainous areas. The overall classification accuracies were 84.2%, 83.9%, and 86.5% for the Landsat images acquired in 1989, 2000, and 2013, respectively. This study also found that the Landsat-8 images could provide significant improvement in the ability to identify rubber plantations. The extracted maps showed that the selected study area underwent rapid conversion of natural and seminatural forest to rubber plantations from 1989 to 2013. The rubber plantation area increased from 2.8% in 1989 to 17.8% in 2013, while the forest/woodland area decreased from 75.6% in 1989 to 44.8% in 2013. The proposed data processing procedure is a promising approach to mapping the spatial distribution and temporal dynamics of rubber plantations on a regional scale.

  10. Ensemble Integration of Forest Disturbance Maps for the Landscape Change Monitoring System (LCMS)

    NASA Astrophysics Data System (ADS)

    Cohen, W. B.; Healey, S. P.; Yang, Z.; Zhu, Z.; Woodcock, C. E.; Kennedy, R. E.; Huang, C.; Steinwand, D.; Vogelmann, J. E.; Stehman, S. V.; Loveland, T. R.

    2014-12-01

    The recent convergence of free, high quality Landsat data and acceleration in the development of dense Landsat time series algorithms has spawned a nascent interagency effort known as the Landscape Change Monitoring System (LCMS). LCMS is being designed to map historic land cover changes associated with all major disturbance agents and land cover types in the US. Currently, five existing algorithms are being evaluated for inclusion in LCMS. The priorities of these five algorithms overlap to some degree, but each has its own strengths. This has led to the adoption of a novel approach, within LCMS, to integrate the map outputs (i.e., base learners) from these change detection algorithms using empirical ensemble models. Training data are derived from independent datasets representing disturbances such as: harvest, fire, insects, wind, and land use change. Ensemble modeling is expected to produce significant increases in predictive accuracy relative to the results of the individual base learners. The non-parametric models used in LCMS also provide a framework for matching output ensemble maps to independent sample-based statistical estimates of disturbance area. Multiple decision trees "vote" on class assignment, and it is possible to manipulate vote thresholds to ensure that ensemble maps reflect areas of disturbance derived from sources such as national-scale ground or image-based inventories. This talk will focus on results of the first ensemble integration of the base learners for six Landsat scenes distributed across the US. We will present an assessment of base learner performance across different types of disturbance against an independently derived, sample-based disturbance dataset (derived from the TimeSync Landsat time series visualization tool). The goal is to understand the contributions of each base learner to the quality of the ensemble map products. We will also demonstrate how the ensemble map products can be manipulated to match sample-based annual
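    The idea of manipulating vote thresholds so that a mapped disturbance area matches an independent estimate can be sketched as a simple search over candidate thresholds. This is an illustrative assumption about how such matching might be done, not the LCMS implementation; the function name, the per-pixel vote fractions, and the target share are all hypothetical.

```python
def threshold_for_target_area(vote_fractions, target_fraction):
    """Return (threshold, mapped_fraction) whose mapped share is closest to the target.

    vote_fractions: per-pixel share of trees voting "disturbed" (0..1).
    target_fraction: disturbed share of the area from an independent estimate.
    """
    if not vote_fractions:
        raise ValueError("no pixels supplied")
    n = len(vote_fractions)
    best = None
    for t in sorted(set(vote_fractions)):
        # A pixel is mapped as disturbed when its vote share reaches the threshold.
        mapped = sum(v >= t for v in vote_fractions) / n
        gap = abs(mapped - target_fraction)
        if best is None or gap < best[0]:
            best = (gap, t, mapped)
    return best[1], best[2]


# Hypothetical vote fractions for ten pixels and a 30% target share.
votes = [0.1, 0.2, 0.4, 0.6, 0.7, 0.9, 0.3, 0.8, 0.05, 0.55]
threshold, mapped_share = threshold_for_target_area(votes, 0.3)
```

    Raising the threshold shrinks the mapped disturbed area and lowering it grows the area, so the threshold acts as a single dial for reconciling the map with a sample-based estimate.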

  11. Assessment of the potential allergenicity of ice structuring protein type III HPLC 12 using the FAO/WHO 2001 decision tree for novel foods.

    PubMed

    Bindslev-Jensen, C; Sten, E; Earl, L K; Crevel, R W R; Bindslev-Jensen, U; Hansen, T K; Stahl Skov, P; Poulsen, L K

    2003-01-01

    The introduction of novel proteins into foods carries a risk of eliciting allergic reactions in individuals sensitive to the introduced protein. Therefore, decision trees for evaluation of the risk have been developed, the latest being proposed by WHO/FAO early in 2001. Proteins developed using modern biotechnology and derived from fish are being considered for use in food and other applications, and since allergy to fish is well established, a potential risk from such proteins to susceptible human beings exists. The overall aim of the study was to investigate the potential allergenicity of an Ice Structuring Protein (ISP) originating from an arctic fish (the ocean pout, Macrozoarces americanus) using the newly developed decision tree proposed by FAO/WHO. The methods used were those proposed by FAO/WHO including amino acid sequence analysis for sequence similarity to known allergens, methods for assessing degradability under standardised conditions, assays for detection of specific IgE against the protein (Maxisorb RAST) and histamine release from human basophils. In the present paper we describe the serum screening phase of the study and discuss the overall application of the decision tree to the assessment of the potential allergenicity of ISP Type III. In an accompanying paper [Food Chem. Toxicol. 40 (2002) 965], we detail the specific methodology used for the sequence analysis and assessment of resistance to pepsin-catalysed proteolysis of this protein. The ISP showed no sequence similarity to known allergens nor was it stable to proteolytic degradation using standardised methods. Sera from 20 patients with a well-documented clinical history of fish allergy, positive in skin prick tests to ocean pout, eel pout and eel, were used; positive IgE-binding in vitro to extracts of the same fish was confirmed. The sera also elicited histamine release in vitro in the presence of the same extracts. The ISP was negative in all cases in the same experiments. Using the

  12. Decision Tree Phytoremediation

    DTIC Science & Technology

    1999-12-01

    roots into the aboveground portions of the plants. Certain plants called hyperaccumulators absorb unusually large amounts of metals in comparison to other...radionuclides 3 - Phytoaccumulation, phytoextraction, or hyperaccumulation Metals and organic chemicals are taken up by the plant with water, or by cation...phytoremediation. This work team is continuing the efforts of previous ITRC work teams reviewing innovative technologies to remediate metals in soils

  13. The use of decision tree induction and artificial neural networks for recognizing the geochemical distribution patterns of LREE in the Choghart deposit, Central Iran

    NASA Astrophysics Data System (ADS)

    Zaremotlagh, S.; Hezarkhani, A.

    2017-04-01

    Some evidence of rare earth element (REE) concentrations is found in iron oxide-apatite (IOA) deposits located in the Central Iranian microcontinent. There are many unsolved problems about the origin and metallogenesis of IOA deposits in this district. Although it is considered that felsic magmatism and mineralization were simultaneous in the district, interaction of multi-stage hydrothermal-magmatic processes within the Early Cambrian volcano-sedimentary sequence probably caused some epigenetic mineralizations. Secondary geological processes (e.g., multi-stage mineralization, alteration, and weathering) have affected variations of major elements and possible redistribution of REE in IOA deposits. Hence, the geochemical behaviors and distribution patterns of REE are expected to be complicated in different zones of these deposits. The aim of this paper is to recognize LREE distribution patterns based on whole-rock chemical compositions and to discover their geochemical rules automatically. For this purpose, pattern recognition techniques including decision tree and neural network were applied to a high-dimensional geochemical dataset from the Choghart IOA deposit. Because some data features were irrelevant or redundant in recognizing the distribution patterns of each LREE, a greedy attribute subset selection technique was employed to select the best subset of predictors used in classification tasks. The decision trees (CART algorithm) were pruned optimally to categorize independent test data more accurately than unpruned ones. The most effective classification rules were extracted from the pruned tree to describe the meaningful relationships between the predictors and different concentrations of LREE. A feed-forward artificial neural network was also applied to reliably predict the influence of various rock compositions on the spatial distribution patterns of LREE with a better performance than the decision tree induction. The findings of this study could be

  14. Assessing the safety of co-exposure to food packaging migrants in food and water using the maximum cumulative ratio and an established decision tree.

    PubMed

    Price, Paul; Zaleski, Rosemary; Hollnagel, Heli; Ketelslegers, Hans; Han, Xianglu

    2014-01-01

    Food contact materials can release low levels of multiple chemicals (migrants) into foods and beverages, to which individuals can be exposed through food consumption. This paper investigates the potential for non-carcinogenic effects from exposure to multiple migrants using the Cefic Mixtures Ad hoc Team (MIAT) decision tree. The purpose of the assessment is to demonstrate how the decision tree can be applied to concurrent exposures to multiple migrants using either hazard or structural data on the specific components, i.e. based on the acceptable daily intake (ADI) or the threshold of toxicological concern. The tree was used to assess risks from co-exposure to migrants reported in a study on non-intentionally added substances (NIAS) eluting from food contact-grade plastic and two studies of water bottles: one on organic compounds and the other on ionic forms of various elements. The MIAT decision tree assigns co-exposures to different risk management groups (I, II, IIIA and IIIB) based on the hazard index, and the maximum cumulative ratio (MCR). The predicted co-exposures for all examples fell into Group II (low toxicological concern) and had MCR values of 1.3 and 2.4 (indicating that one or two components drove the majority of the mixture's toxicity). MCR values from the study of inorganic ions (126 mixtures) ranged from 1.1 to 3.8 for glass and from 1.1 to 5.0 for plastic containers. The MCR values indicated that a single compound drove toxicity in 58% of the mixtures. MCR values also declined with increases in the hazard index for the screening assessments of exposure (suggesting fewer substances contributed as risk potential increased). Overall, it can be concluded that the data on co-exposure to migrants evaluated in these case studies are of low toxicological concern and the safety assessment approach described in this paper was shown to be a helpful screening tool.
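    The two quantities the decision tree relies on can be computed directly from per-substance hazard quotients: the hazard index is their sum, and the maximum cumulative ratio is the hazard index divided by the largest single quotient. The sketch below shows only this arithmetic; the function name and the example quotients are hypothetical, and the group-assignment rules of the MIAT tree are deliberately not reproduced.

```python
def hazard_index_and_mcr(hazard_quotients):
    """Return (HI, MCR) for one mixture.

    hazard_quotients: exposure divided by the reference value (e.g. an ADI)
    for each substance in the mixture.
    """
    if not hazard_quotients or max(hazard_quotients) <= 0:
        raise ValueError("need at least one positive hazard quotient")
    hi = sum(hazard_quotients)        # hazard index: cumulative quotient
    mcr = hi / max(hazard_quotients)  # 1 means a single substance dominates
    return hi, mcr


# Hypothetical hazard quotients for a three-substance mixture.
hi, mcr = hazard_index_and_mcr([0.5, 0.25, 0.25])
```

    The MCR ranges from 1 (one substance accounts for all of the hazard index) up to the number of substances (all contribute equally), which is why values near 1 to 2 are read as one or two components driving the mixture.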

  15. An ensemble classification-based approach applied to retinal blood vessel segmentation.

    PubMed

    Fraz, Muhammad Moazam; Remagnino, Paolo; Hoppe, Andreas; Uyyanonvara, Bunyarit; Rudnicka, Alicja R; Owen, Christopher G; Barman, Sarah A

    2012-09-01

    This paper presents a new supervised method for segmentation of blood vessels in retinal photographs. This method uses an ensemble system of bagged and boosted decision trees and utilizes a feature vector based on the orientation analysis of gradient vector field, morphological transformation, line strength measures, and Gabor filter responses. The feature vector encodes information to handle the healthy as well as the pathological retinal image. The method is evaluated on the publicly available DRIVE and STARE databases, frequently used for this purpose and also on a new public retinal vessel reference dataset CHASE_DB1 which is a subset of retinal images of multiethnic children from the Child Heart and Health Study in England (CHASE) dataset. The performance of the ensemble system is evaluated in detail and the incurred accuracy, speed, robustness, and simplicity make the algorithm a suitable tool for automated retinal image analysis.

  16. A decision-tree-based method for reconstructing disturbance history in the Russia boreal forests over 30 years

    NASA Astrophysics Data System (ADS)

    Chen, D.; Loboda, T. V.

    2012-12-01

    The boreal forest is one of the largest biomes on Earth and carries crucial significance in numerous aspects. Located in the high latitude region of the Northern Hemisphere, it is predicted that the boreal forest is subject to the highest level of influence under the changing climate, which may impose profound impacts on the global carbon and energy budget. Of the entire boreal biome, approximately two thirds consists of the Russian boreal forest, which is also the largest forested zone in the world. Fire and logging have been the predominant disturbance types in the Russian boreal forest, which accelerate the speed of carbon release into the atmosphere. To better understand these processes, records of past disturbance are in great need. However, there has been no comprehensive and unbiased multi-decadal record of forest disturbance in this region. This paper illustrates a method for reconstructing disturbance history in the Russian boreal forests over 30 years. This method takes advantage of data from both Landsat, which has a long data record but limited spatial coverage, and the Moderate Resolution Imaging Spectroradiometer (MODIS), which has wall-to-wall spatial coverage but a limited period of observations. We developed a standardized and semi-automated approach to extract training and validation data samples from Landsat imagery. Landsat data, dating back to 1984, were used to generate maps of forest disturbance using temporal shifts in Disturbance Index through the multi-temporal stack of imagery in selected locations. The disturbed forests are attributed to logging or burning causes by means of visual examination. The Landsat-based disturbance maps are then used as reference data to train a decision tree classifier on 2003 MODIS data. This classifier utilizes multiple direct MODIS products including the BRDF-adjusted surface reflectance, a suite of vegetation indices, and land surface temperature. The algorithm also capitalizes on seasonal variability in class

  17. Measurement of single top quark production in the tau+jets channel using boosted decision trees at D0

    SciTech Connect

    Liu, Zhiyi

    2009-12-01

    The top quark is the heaviest known matter particle and plays an important role in the Standard Model of particle physics. At hadron colliders, it is possible to produce single top quarks via the weak interaction. This allows a direct measurement of the CKM matrix element $V_{tb}$ and serves as a window to new physics. The first direct measurement of single top quark production with a tau lepton in the final state (the tau+jets channel) is presented in this thesis. The measurement uses 4.8 fb$^{-1}$ of Tevatron Run II data in $p\bar{p}$ collisions at $\sqrt{s} = 1.96$ TeV acquired by the D0 experiment. After selecting a data sample and building a background model, the data and background model are in good agreement. A multivariate technique, boosted decision trees, is employed in discriminating the small single top quark signal from a large background. The expected sensitivity of the tau+jets channel in the Standard Model is 1.8 standard deviations. Using a Bayesian statistical approach, an upper limit on the cross section of single top quark production in the tau+jets channel is measured as 7.3 pb at 95% confidence level, and the cross section is measured as $3.4^{+2.0}_{-1.8}$ pb. The result of the single top quark production in the tau+jets channel is also combined with those in the electron+jets and muon+jets channels. The expected sensitivity of the electron, muon and tau combined analysis is 4.7 standard deviations, to be compared to 4.5 standard deviations in electron and muon alone. The measured cross section in the three combined final states is $\sigma(p\bar{p} \to tb + X,\, tqb + X) = 3.84^{+0.89}_{-0.83}$ pb. A lower limit on $|V_{tb}|$ is also measured in the three combined final states to be larger than 0.85 at 95% confidence level. These results are consistent with Standard Model expectations.

  18. Decision tree model for predicting long-term outcomes in children with out-of-hospital cardiac arrest: a nationwide, population-based observational study

    PubMed Central

    2014-01-01

    Introduction At hospital arrival, early prognostication for children after out-of-hospital cardiac arrest (OHCA) might help clinicians formulate strategies, particularly in the emergency department. In this study, we aimed to develop a simple and generally applicable bedside tool for predicting outcomes in children after cardiac arrest. Methods We analyzed data of 5,379 children who had undergone OHCA. The data were extracted from a prospectively recorded, nationwide, Utstein-style Japanese database. The primary endpoint was survival with favorable neurological outcome (Cerebral Performance Category (CPC) scale categories 1 and 2) at 1 month after OHCA. We developed a decision tree prediction model by using data from a 2-year period (2008 to 2009, n = 3,693), and the data were validated using external data from 2010 (n = 1,686). Results Recursive partitioning analysis for 11 predictors in the development cohort indicated that the best single predictor for CPC 1 and 2 at 1 month was the prehospital return of spontaneous circulation (ROSC). The next predictor for children with prehospital ROSC was an initial shockable rhythm. For children without prehospital ROSC, the next best predictor was a witnessed arrest. Use of a simple decision tree prediction model permitted stratification into four outcome prediction groups: good (prehospital ROSC and initial shockable rhythm), moderately good (prehospital ROSC and initial nonshockable rhythm), poor (prehospital non-ROSC and witnessed arrest) and very poor (prehospital non-ROSC and unwitnessed arrest). By using this model, we identified patient groups ranging from 0.2% to 66.2% for 1-month CPC 1 and 2 probabilities. The validated decision tree prediction model demonstrated a sensitivity of 69.7% (95% confidence interval (CI) = 58.7% to 78.9%), a specificity of 95.2% (95% CI = 94.1% to 96.2%) and an area under the receiver operating characteristic curve of 0.88 (95% CI = 0.87 to 0.90) for predicting 1-month
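    The four outcome groups described above follow from two successive yes/no questions, so the tree can be written out as a small function. The sketch below encodes only the splits stated in the abstract; the function and argument names are hypothetical, and no probabilities are attached because the abstract reports only the overall range.

```python
def outcome_group(prehospital_rosc: bool, shockable_rhythm: bool, witnessed: bool) -> str:
    """Assign one of the four prediction groups named in the abstract."""
    if prehospital_rosc:
        # With prehospital ROSC, the next split is the initial rhythm.
        return "good" if shockable_rhythm else "moderately good"
    # Without prehospital ROSC, the next split is whether the arrest was witnessed.
    return "poor" if witnessed else "very poor"


group = outcome_group(prehospital_rosc=True, shockable_rhythm=False, witnessed=True)
```

    Each branch consults only one further predictor, which is what makes the model usable as a bedside tool: the rhythm is irrelevant without prehospital ROSC, and witness status is irrelevant with it.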

  19. Can Religious Beliefs be a Protective Factor for Suicidal Behavior? A Decision Tree Analysis in a Mid-Sized City in Iran, 2013.

    PubMed

    Baneshi, Mohammad Reza; Haghdoost, Ali Akbar; Zolala, Farzaneh; Nakhaee, Nouzar; Jalali, Maryam; Tabrizi, Reza; Akbari, Maryam

    2017-04-01

    This study aimed to assess using tree-based models the impact of different dimensions of religion and other risk factors on suicide attempts in the Islamic Republic of Iran. Three hundred patients who attempted suicide and 300 age- and sex-matched patient attendants with other types of disease who referred to Kerman Afzalipour Hospital were recruited for this study following a convenience sampling. Religiosity was assessed by the Duke University Religion Index. A tree-based model was constructed using the Gini Index as the homogeneity criterion. A complementary discrimination analysis was also applied. Variables contributing to the construction of the tree were stressful life events, mental disorder, family support, and religious belief. Strong religious belief was a protective factor for those with a low number of stressful life events and those with a high mental disorder score; 72 % of those who formed these two groups had not attempted suicide. Moreover, 63 % of those with a high number of stressful life events, strong family support, strong problem-solving skills, and a low mental disorder score were less likely to attempt suicide. The significance of four other variables, GHQ, problem-coping skills, friend support, and neuroticism, was revealed in the discrimination analysis. Religious beliefs seem to be an independent factor that can predict risk for suicidal behavior. Based on the decision tree, religious beliefs among people with a high number of stressful life events might not be a dissuading factor. Such subjects need more family support and problem-solving skills.
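    The Gini index used as the homogeneity criterion measures how mixed the classes are within a node, and a candidate split is scored by the size-weighted impurity of its child nodes. The sketch below illustrates that standard calculation; the function names and the labels are hypothetical and are not data from the study.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of the two child nodes of a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)


# Hypothetical node: four cases and four controls, split on some binary variable.
parent = ["case"] * 4 + ["control"] * 4
left = ["case", "case", "case", "control"]
right = ["control", "control", "control", "case"]
impurity_drop = gini(parent) - split_impurity(left, right)
```

    A tree-growing algorithm of this kind prefers the split with the largest drop in impurity, so variables that separate the two groups most cleanly end up near the root.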

  20. Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project.

    PubMed

    Alghamdi, Manal; Al-Mallah, Mouaz; Keteyian, Steven; Brawner, Clinton; Ehrman, Jonathan; Sakr, Sherif

    2017-01-01

    Machine learning is becoming a popular and important approach in the field of medical research. In this study, we investigate the relative performance of various machine learning methods such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree and Random Forests for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data of 32,555 patients who are free of any known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 5-year follow-up. At the completion of the fifth year, 5,099 of those patients have developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an Ensembling-based predictive model using 13 attributes that were selected based on their clinical importance, Multiple Linear Regression, and Information Gain Ranking methods. The negative effect of the imbalance class of the constructed model was handled by Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive model classifier was improved by the Ensemble machine learning approach using the Vote method with three Decision Trees (Naïve Bayes Tree, Random Forest, and Logistic Model Tree) and achieved high accuracy of prediction (AUC = 0.92). The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data.
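    SMOTE counters class imbalance by creating synthetic minority samples along the line segments between a minority sample and its nearest minority neighbours. The sketch below shows that interpolation step in plain Python under simplifying assumptions (numeric features, Euclidean distance, a fixed seed); the function name and data are hypothetical, and it is not the implementation used in the study.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority points by interpolating toward near neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the chosen sample (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature minority class.
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote(minority, n_synthetic=5)
```

    Because every synthetic point lies between two real minority samples, the oversampled class fills in its own region of feature space rather than duplicating existing records.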

  1. Predicting diabetes mellitus using SMOTE and ensemble machine learning approach: The Henry Ford ExercIse Testing (FIT) project

    PubMed Central

    Alghamdi, Manal; Al-Mallah, Mouaz; Keteyian, Steven; Brawner, Clinton; Ehrman, Jonathan

    2017-01-01

    Machine learning is becoming a popular and important approach in the field of medical research. In this study, we investigate the relative performance of various machine learning methods such as Decision Tree, Naïve Bayes, Logistic Regression, Logistic Model Tree and Random Forests for predicting incident diabetes using medical records of cardiorespiratory fitness. In addition, we apply different techniques to uncover potential predictors of diabetes. This FIT project study used data of 32,555 patients who are free of any known coronary artery disease or heart failure who underwent clinician-referred exercise treadmill stress testing at Henry Ford Health Systems between 1991 and 2009 and had a complete 5-year follow-up. At the completion of the fifth year, 5,099 of those patients have developed diabetes. The dataset contained 62 attributes classified into four categories: demographic characteristics, disease history, medication use history, and stress test vital signs. We developed an Ensembling-based predictive model using 13 attributes that were selected based on their clinical importance, Multiple Linear Regression, and Information Gain Ranking methods. The negative effect of the imbalance class of the constructed model was handled by Synthetic Minority Oversampling Technique (SMOTE). The overall performance of the predictive model classifier was improved by the Ensemble machine learning approach using the Vote method with three Decision Trees (Naïve Bayes Tree, Random Forest, and Logistic Model Tree) and achieved high accuracy of prediction (AUC = 0.92). The study shows the potential of ensembling and SMOTE approaches for predicting incident diabetes using cardiorespiratory fitness data. PMID:28738059

  2. The Reliability of Classification of Terminal Nodes in GUIDE Decision Tree to Predict the Nonalcoholic Fatty Liver Disease.

    PubMed

    Birjandi, Mehdi; Ayatollahi, Seyyed Mohammad Taghi; Pourahmad, Saeedeh

    2016-01-01

    Tree structured modeling is a data mining technique used to recursively partition a dataset into relatively homogeneous subgroups in order to make more accurate predictions on generated classes. One of the classification tree induction algorithms, GUIDE, is a nonparametric method with suitable accuracy and low bias selection, which is used for predicting binary classes based on many predictors. In this tree, evaluating the accuracy of predicted classes (terminal nodes) is clinically of special importance. For this purpose, we used GUIDE classification tree in two statuses of equal and unequal misclassification cost in order to predict nonalcoholic fatty liver disease (NAFLD), considering 30 predictors. Then, to evaluate the accuracy of predicted classes by using bootstrap method, first the classification reliability in which individuals are assigned to a unique class and next the prediction probability reliability as support for that are considered.

  3. The Reliability of Classification of Terminal Nodes in GUIDE Decision Tree to Predict the Nonalcoholic Fatty Liver Disease

    PubMed Central

    Pourahmad, Saeedeh

    2016-01-01

    Tree structured modeling is a data mining technique used to recursively partition a dataset into relatively homogeneous subgroups in order to make more accurate predictions on generated classes. One of the classification tree induction algorithms, GUIDE, is a nonparametric method with suitable accuracy and low bias selection, which is used for predicting binary classes based on many predictors. In this tree, evaluating the accuracy of predicted classes (terminal nodes) is clinically of special importance. For this purpose, we used GUIDE classification tree in two statuses of equal and unequal misclassification cost in order to predict nonalcoholic fatty liver disease (NAFLD), considering 30 predictors. Then, to evaluate the accuracy of predicted classes by using bootstrap method, first the classification reliability in which individuals are assigned to a unique class and next the prediction probability reliability as support for that are considered. PMID:28053651
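
    One simple reading of bootstrap-based classification reliability for a terminal node is sketched below: resample the individuals in the node with replacement and record how often the resampled majority class matches the class the node predicts. This is an assumption-laden illustration, not the authors' procedure (the record does not specify it); the function name, the class labels and the node composition are hypothetical.

```python
import random
from collections import Counter

def bootstrap_reliability(node_labels, n_boot=1000, seed=0):
    """Fraction of bootstrap resamples of a terminal node whose unique
    majority class equals the node's predicted class (illustrative)."""
    rng = random.Random(seed)
    predicted = Counter(node_labels).most_common(1)[0][0]
    agree = 0
    for _ in range(n_boot):
        resample = [rng.choice(node_labels) for _ in node_labels]
        counts = Counter(resample)
        top = max(counts.values())
        winners = [c for c, n in counts.items() if n == top]
        # Count only resamples with a unique majority equal to the prediction.
        if winners == [predicted]:
            agree += 1
    return predicted, agree / n_boot

# Hypothetical terminal node: 14 cases and 6 non-cases.
node = ["NAFLD"] * 14 + ["no NAFLD"] * 6
predicted_class, reliability = bootstrap_reliability(node)
```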

  4. Exploring ensemble visualization

    NASA Astrophysics Data System (ADS)

    Phadke, Madhura N.; Pinto, Lifford; Alabi, Oluwafemi; Harter, Jonathan; Taylor, Russell M., II; Wu, Xunlei; Petersen, Hannah; Bass, Steffen A.; Healey, Christopher G.

    2012-01-01

    An ensemble is a collection of related datasets. Each dataset, or member, of an ensemble is normally large, multidimensional, and spatio-temporal. Ensembles are used extensively by scientists and mathematicians, for example, by executing a simulation repeatedly with slightly different input parameters and saving the results in an ensemble to see how parameter choices affect the simulation. To draw inferences from an ensemble, scientists need to compare data both within and between ensemble members. We propose two techniques to support ensemble exploration and comparison: a pairwise sequential animation method that visualizes locally neighboring members simultaneously, and a screen door tinting method that visualizes subsets of members using screen space subdivision. We demonstrate the capabilities of both techniques, first using synthetic data, then with simulation data of heavy ion collisions in high-energy physics. Results show that both techniques are capable of supporting meaningful comparisons of ensemble data.

  5. Exploring Ensemble Visualization

    PubMed Central

    Phadke, Madhura N.; Pinto, Lifford; Alabi, Femi; Harter, Jonathan; Taylor, Russell M.; Wu, Xunlei; Petersen, Hannah; Bass, Steffen A.; Healey, Christopher G.

    2012-01-01

    An ensemble is a collection of related datasets. Each dataset, or member, of an ensemble is normally large, multidimensional, and spatio-temporal. Ensembles are used extensively by scientists and mathematicians, for example, by executing a simulation repeatedly with slightly different input parameters and saving the results in an ensemble to see how parameter choices affect the simulation. To draw inferences from an ensemble, scientists need to compare data both within and between ensemble members. We propose two techniques to support ensemble exploration and comparison: a pairwise sequential animation method that visualizes locally neighboring members simultaneously, and a screen door tinting method that visualizes subsets of members using screen space subdivision. We demonstrate the capabilities of both techniques, first using synthetic data, then with simulation data of heavy ion collisions in high-energy physics. Results show that both techniques are capable of supporting meaningful comparisons of ensemble data. PMID:22347540

  6. Using a High-Resolution Ensemble Modeling Method to Inform Risk-Based Decision-Making at Taylor Park Dam, Colorado

    NASA Astrophysics Data System (ADS)

    Mueller, M.; Mahoney, K. M.; Holman, K. D.

    2015-12-01

    The Bureau of Reclamation (Reclamation) is responsible for the safety of Taylor Park Dam, located in central Colorado at an elevation of 9300 feet. A key aspect of dam safety is anticipating extreme precipitation, runoff and the associated inflow of water to the reservoir within a probabilistic framework for risk analyses. The Cooperative Institute for Research in Environmental Sciences (CIRES) has partnered with Reclamation to improve understanding and estimation of precipitation in the western United States, including the Taylor Park watershed. A significant challenge is that Taylor Park Dam is located in a relatively data-sparse region, surrounded by mountains exceeding 12,000 feet. To better estimate heavy precipitation events in this basin, a high-resolution modeling approach is used. The Weather Research and Forecasting (WRF) model is employed to simulate events that have produced observed peaks in streamflow at the location of interest. Importantly, an ensemble of model simulations are run on each event so that uncertainty bounds (i.e., forecast error) may be provided such that the model outputs may be more effectively used in Reclamation's risk assessment framework. Model estimates of precipitation (and the uncertainty thereof) are then used in rainfall runoff models to determine the probability of inflows to the reservoir for use in Reclamation's dam safety risk analyses.

  7. Acceleration of ensemble machine learning methods using many-core devices

    NASA Astrophysics Data System (ADS)

    Tamerus, A.; Washbrook, A.; Wyeth, D.

    2015-12-01

    We present a case study into the acceleration of ensemble machine learning methods using many-core devices in collaboration with Toshiba Medical Visualisation Systems Europe (TMVSE). The adoption of GPUs to execute a key algorithm in the classification of medical image data was shown to significantly reduce overall processing time. Using a representative dataset and pre-trained decision trees as input we will demonstrate how the decision forest classification method can be mapped onto the GPU data processing model. It was found that a GPU-based version of the decision forest method resulted in over 138 times speed-up over a single-threaded CPU implementation with further improvements possible. The same GPU-based software was then directly applied to a suitably formed dataset to benefit supervised learning techniques applied in High Energy Physics (HEP) with similar improvements in performance.
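
    Decision forest inference with pre-trained trees is often made GPU-friendly by storing each tree as flat arrays, so that classifying a sample is a short loop of array lookups. The sketch below shows that layout and a majority vote in plain Python on the CPU. The array layout, the tiny trees and the class labels 0 and 1 are assumptions for illustration and do not come from the study.

```python
from collections import Counter

# Each tree is stored as parallel arrays, one entry per node.
# feature[i] == -1 marks a leaf whose class label is value[i];
# otherwise go left when x[feature[i]] <= threshold[i], else right.
TREE_A = {
    "feature":   [0, -1, 1, -1, -1],
    "threshold": [0.5, 0.0, 2.0, 0.0, 0.0],
    "left":      [1, -1, 3, -1, -1],
    "right":     [2, -1, 4, -1, -1],
    "value":     [None, 0, None, 0, 1],
}
TREE_B = {
    "feature":   [1, -1, -1],
    "threshold": [1.0, 0.0, 0.0],
    "left":      [1, -1, -1],
    "right":     [2, -1, -1],
    "value":     [None, 0, 1],
}
TREE_C = {
    "feature":   [0, -1, -1],
    "threshold": [0.8, 0.0, 0.0],
    "left":      [1, -1, -1],
    "right":     [2, -1, -1],
    "value":     [None, 0, 1],
}
FOREST = [TREE_A, TREE_B, TREE_C]

def predict_tree(tree, x):
    """Walk one array-encoded tree from the root (node 0) to a leaf."""
    node = 0
    while tree["feature"][node] != -1:
        if x[tree["feature"][node]] <= tree["threshold"][node]:
            node = tree["left"][node]
        else:
            node = tree["right"][node]
    return tree["value"][node]

def predict_forest(forest, x):
    """Majority vote over the per-tree predictions."""
    votes = Counter(predict_tree(t, x) for t in forest)
    return votes.most_common(1)[0][0]

samples = [(0.2, 3.0), (0.9, 2.5), (0.6, 0.5)]
labels = [predict_forest(FOREST, x) for x in samples]
```

    Each sample is classified independently of the others, which is the property a data-parallel device can exploit by assigning samples (or sample and tree pairs) to separate threads.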

  8. Decision-tree-model identification of nitrate pollution activities in groundwater: A combination of a dual isotope approach and chemical ions.

    PubMed

    Xue, Dongmei; Pang, Fengmei; Meng, Fanqiao; Wang, Zhongliang; Wu, Wenliang

    2015-09-01

    To develop management practices for agricultural crops to protect against NO3(-) contamination in groundwater, dominant pollution activities require reliable classification. In this study, we (1) classified potential NO3(-) pollution activities via an unsupervised learning algorithm based on δ(15)N- and δ(18)O-NO3(-) and physico-chemical properties of groundwater at 55 sampling locations; and (2) determined which water quality parameters could be used to identify the sources of NO3(-) contamination via a decision tree model. When a combination of δ(15)N-, δ(18)O-NO3(-) and physico-chemical properties of groundwater was used as an input for the k-means clustering algorithm, it allowed for a reliable clustering of the 55 sampling locations into 4 corresponding agricultural activities: well irrigated agriculture (28 sampling locations), sewage irrigated agriculture (16 sampling locations), a combination of sewage irrigated agriculture, farm and industry (5 sampling locations) and a combination of well irrigated agriculture and farm (6 sampling locations). A decision tree model with 97.5% classification success was developed based on SO4(2-) and Cl(-) variables. The NO3(-) and the δ(15)N- and δ(18)O-NO3(-) variables demonstrated limitation in developing a decision tree model as multiple N sources and fractionation processes both resulted in difficulties of discriminating NO3(-) concentrations and isotopic values. Although only the SO4(2-) and Cl(-) were selected as important discriminating variables, concentration data alone could not identify the specific NO3(-) sources responsible for groundwater contamination. This is a result of comprehensive analysis. To further reduce NO3(-) contamination, an integrated approach should be set-up by combining N and O isotopes of NO3(-) with land-uses and physico-chemical properties, especially in areas with complex agricultural activities. Copyright © 2015 Elsevier B.V. All rights reserved.

  9. World Music Ensemble: Kulintang

    ERIC Educational Resources Information Center

    Beegle, Amy C.

    2012-01-01

    As instrumental world music ensembles such as steel pan, mariachi, gamelan and West African drums are becoming more the norm than the exception in North American school music programs, there are other world music ensembles just starting to gain popularity in particular parts of the United States. The kulintang ensemble, a drum and gong ensemble…

  11. Segregating the Effects of Seed Traits and Common Ancestry of Hardwood Trees on Eastern Gray Squirrel Foraging Decisions.

    PubMed

    Sundaram, Mekala; Willoughby, Janna R; Lichti, Nathanael I; Steele, Michael A; Swihart, Robert K

    2015-01-01

    The evolution of specific seed traits in scatter-hoarded tree species often has been attributed to granivore foraging behavior. However, the degree to which foraging investments and seed traits correlate with phylogenetic relationships among trees remains unexplored. We presented seeds of 23 different hardwood tree species (families Betulaceae, Fagaceae, Juglandaceae) to eastern gray squirrels (Sciurus carolinensis), and measured the time and distance travelled by squirrels that consumed or cached each seed. We estimated 11 physical and chemical seed traits for each species, and the phylogenetic relationships between the 23 hardwood trees. Variance partitioning revealed that considerable variation in foraging investment was attributable to seed traits alone (27-73%), and combined effects of seed traits and phylogeny of hardwood trees (5-55%). A phylogenetic PCA (pPCA) on seed traits and tree phylogeny resulted in 2 "global" axes of traits that were phylogenetically autocorrelated at the family and genus level and a third "local" axis in which traits were not phylogenetically autocorrelated. Collectively, these axes explained 30-76% of the variation in squirrel foraging investments. The first global pPCA axis, which produced large scores for seed species with thin shells, low lipid and high carbohydrate content, was negatively related to time to consume and cache seeds and travel distance to cache. The second global pPCA axis, which produced large scores for seeds with high protein, low tannin and low dormancy levels, was an important predictor of consumption time only. The local pPCA axis primarily reflected kernel mass. Although it explained only 12% of the variation in trait space and was not autocorrelated among phylogenetic clades, the local axis was related to all four squirrel foraging investments. Squirrel foraging behaviors are influenced by a combination of phylogenetically conserved and more evolutionarily labile seed traits that is consistent with a weak

  12. Segregating the Effects of Seed Traits and Common Ancestry of Hardwood Trees on Eastern Gray Squirrel Foraging Decisions

    PubMed Central

    Sundaram, Mekala; Willoughby, Janna R.; Lichti, Nathanael I.; Steele, Michael A.; Swihart, Robert K.

    2015-01-01

    The evolution of specific seed traits in scatter-hoarded tree species often has been attributed to granivore foraging behavior. However, the degree to which foraging investments and seed traits correlate with phylogenetic relationships among trees remains unexplored. We presented seeds of 23 different hardwood tree species (families Betulaceae, Fagaceae, Juglandaceae) to eastern gray squirrels (Sciurus carolinensis), and measured the time and distance travelled by squirrels that consumed or cached each seed. We estimated 11 physical and chemical seed traits for each species, and the phylogenetic relationships between the 23 hardwood trees. Variance partitioning revealed that considerable variation in foraging investment was attributable to seed traits alone (27–73%), and combined effects of seed traits and phylogeny of hardwood trees (5–55%). A phylogenetic PCA (pPCA) on seed traits and tree phylogeny resulted in 2 “global” axes of traits that were phylogenetically autocorrelated at the family and genus level and a third “local” axis in which traits were not phylogenetically autocorrelated. Collectively, these axes explained 30–76% of the variation in squirrel foraging investments. The first global pPCA axis, which produced large scores for seed species with thin shells, low lipid and high carbohydrate content, was negatively related to time to consume and cache seeds and travel distance to cache. The second global pPCA axis, which produced large scores for seeds with high protein, low tannin and low dormancy levels, was an important predictor of consumption time only. The local pPCA axis primarily reflected kernel mass. Although it explained only 12% of the variation in trait space and was not autocorrelated among phylogenetic clades, the local axis was related to all four squirrel foraging investments. Squirrel foraging behaviors are influenced by a combination of phylogenetically conserved and more evolutionarily labile seed traits that is

  13. Application of Decision Tree to Obtain Optimal Operation Rules for Reservoir Flood Control Considering Sediment Desilting-Case Study of Tseng Wen Reservoir

    NASA Astrophysics Data System (ADS)

    ShiouWei, L.

    2014-12-01

    Reservoirs are the most important water resources facilities in Taiwan. However, due to the steep slopes and fragile geological conditions in the mountain areas, storm events usually cause serious debris flows and floods, and the floods then flush large amounts of sediment into reservoirs. The sedimentation caused by floods has a great impact on reservoir life. Hence, how to operate a reservoir during flood events to increase the efficiency of sediment desilting without risking reservoir safety or impacting the subsequent water supply is a crucial issue in Taiwan. Therefore, this study developed a novel optimization planning model for reservoir flood operation considering flood control and sediment desilting, and proposed easy-to-use operating rules represented by decision trees. The decision tree rules consider flood mitigation, water supply and sediment desilting. The optimal planning model computes the optimal reservoir release for each flood event that minimizes the impact on water supply and maximizes sediment desilting without risking reservoir safety. Besides the optimal flood operation planning model, this study also proposed decision-tree-based flood operating rules that were trained on the multiple optimal reservoir releases for synthetic flood scenarios. The synthetic flood scenarios consist of various synthetic storm events, initial reservoir storages and target storages at the end of flood operation. Comparing the results of the decision tree operation rules (DTOR) with those of historical operation for Typhoon Krosa in 2007, the DTOR removed 15.4% more sediment than the historical operation, with reservoir storage only 8.38×10^6 m^3 less than that of the historical operation. For Typhoon Jangmi in 2008, the DTOR removed 24.4% more sediment than the historical operation, with reservoir storage only 7.58×10^6 m^3 less than that of the historical operation. The results show that the proposed DTOR model can increase the sediment desilting efficiency and extend the

  14. Ensemble global ocean forecasting

    NASA Astrophysics Data System (ADS)

    Brassington, G. B.

    2016-02-01

    A novel time-lagged ensemble system based on multiple independent cycles has been run operationally at the Australian Bureau of Meteorology for the past 3 years. Despite the use of only four cycles, the ensemble mean provided robustly higher skill and the ensemble variance was a reliable predictor of forecast errors. A spectral analysis comparing the ensemble mean with the members demonstrated the gradual increase in power of random errors with wavenumber up to a saturation length scale imposed by the resolution of the observing system. This system has been upgraded to a near-global 0.1 degree system in a new hybrid six-member ensemble configuration including a new data assimilation system, cycling pattern and initialisation. The hybrid system consists of two ensemble members per day, each with a 3 day cycle. We will outline the performance of both the deterministic and ensemble ocean forecast systems.
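
    The two ensemble statistics this abstract relies on, the ensemble mean and the ensemble variance, can be computed per grid point as in the sketch below. The four members, the three grid points, the observations and the helper `rmse` are all hypothetical values and names chosen for illustration; they are not data from the operational system.

```python
from statistics import mean, pvariance

# Hypothetical forecasts: rows are ensemble members, columns are grid points.
members = [
    [20.1, 18.4, 15.0],
    [20.5, 18.0, 15.6],
    [19.8, 18.9, 14.7],
    [20.4, 18.3, 15.5],
]
observed = [20.3, 18.2, 15.4]

# Per-grid-point ensemble mean and (population) variance across members.
ens_mean = [mean(col) for col in zip(*members)]
ens_var = [pvariance(col) for col in zip(*members)]

def rmse(forecast, truth):
    """Root-mean-square error over all grid points."""
    return (sum((f - t) ** 2 for f, t in zip(forecast, truth)) / len(truth)) ** 0.5

member_rmse = [rmse(m, observed) for m in members]
mean_rmse = rmse(ens_mean, observed)
```

    The RMSE of the ensemble mean can never exceed the average RMSE of the members, although an individual member may still beat the mean on a given case, as one does in this toy data.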

  15. A hybrid cost-sensitive ensemble for imbalanced breast thermogram classification.

    PubMed

    Krawczyk, Bartosz; Schaefer, Gerald; Woźniak, Michał

    2015-11-01

    Early recognition of breast cancer, the most commonly diagnosed form of cancer in women, is of crucial importance, given that it leads to significantly improved chances of survival. Medical thermography, which uses an infrared camera for thermal imaging, has been demonstrated as a particularly useful technique for early diagnosis, because it detects smaller tumors than the standard modality of mammography. In this paper, we analyse breast thermograms by extracting features describing bilateral symmetries between the two breast areas, and present a classification system for decision making. Clearly, the costs associated with missing a cancer case are much higher than those for mislabelling a benign case. At the same time, datasets contain significantly fewer malignant cases than benign ones. Standard classification approaches fail to consider either of these aspects. In this paper, we introduce a hybrid cost-sensitive classifier ensemble to address this challenging problem. Our approach entails a pool of cost-sensitive decision trees which assign a higher misclassification cost to the malignant class, thereby boosting its recognition rate. A genetic algorithm is employed for simultaneous feature selection and classifier fusion. As an optimisation criterion, we use a combination of misclassification cost and diversity to achieve both a high sensitivity and a heterogeneous ensemble. Furthermore, we prune our ensemble by discarding classifiers that contribute minimally to the decision making. For a challenging dataset of about 150 thermograms, our approach achieves an excellent sensitivity of 83.10%, while maintaining a high specificity of 89.44%. This not only signifies improved recognition of malignant cases, it also statistically outperforms other state-of-the-art algorithms designed for imbalanced classification, and hence provides an effective approach for analysing breast thermograms. Our proposed hybrid cost-sensitive ensemble can facilitate a highly accurate

  16. Predicting skin sensitisation using a decision tree integrated testing strategy with an in silico model and in chemico/in vitro assays.

    PubMed

    Macmillan, Donna S; Canipa, Steven J; Chilton, Martyn L; Williams, Richard V; Barber, Christopher G

    2016-04-01

    There is a pressing need for non-animal methods to predict skin sensitisation potential and a number of in chemico and in vitro assays have been designed with this in mind. However, some compounds can fall outside the applicability domain of these in chemico/in vitro assays and may not be predicted accurately. Rule-based in silico models such as Derek Nexus are expert-derived from animal and/or human data and the mechanism-based alert domain can take a number of factors into account (e.g. abiotic/biotic activation). Therefore, Derek Nexus may be able to predict for compounds outside the applicability domain of in chemico/in vitro assays. To this end, an integrated testing strategy (ITS) decision tree using Derek Nexus and a maximum of two assays (from DPRA, KeratinoSens, LuSens, h-CLAT and U-SENS) was developed. Generally, the decision tree improved upon other ITS evaluated in this study with positive and negative predictivity calculated as 86% and 81%, respectively. Our results demonstrate that an ITS using an in silico model such as Derek Nexus with a maximum of two in chemico/in vitro assays can predict the sensitising potential of a number of chemicals, including those outside the applicability domain of existing non-animal assays.
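
    The abstract reports positive and negative predictivity without defining them. Assuming they correspond to the usual positive and negative predictive values, they can be computed from a confusion matrix as sketched below. The function name and the example labels are hypothetical and are not data from the study.

```python
def predictivity(y_true, y_pred):
    """Return (positive predictivity, negative predictivity), i.e. the
    fraction of positive calls and of negative calls that are correct."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    ppv = tp / (tp + fp)  # correct positive calls / all positive calls
    npv = tn / (tn + fn)  # correct negative calls / all negative calls
    return ppv, npv

# Hypothetical labels: 1 = sensitiser, 0 = non-sensitiser.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
ppv, npv = predictivity(y_true, y_pred)
```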

  17. Soft tissue grafting to improve the attached mucosa at dental implants: A review of the literature and proposal of a decision tree.

    PubMed

    Bassetti, Mario; Kaufmann, Regula; Salvi, Giovanni E; Sculean, Anton; Bassetti, Renzo

    2015-06-01

    Scientific data and clinical observations appear to indicate that an adequate width of attached mucosa may facilitate oral hygiene procedures thus preventing peri-implant inflammation and tissue breakdown (eg, biologic complications). Consequently, in order to avoid biologic complications and improve long-term prognosis, soft tissue conditions should be carefully evaluated when implant therapy is planned. At present the necessity and time-point for soft tissue grafting (eg, prior to or during implant placement or after healing) is still controversially discussed while clinical recommendations are vague. To provide a review of the literature on the role of attached mucosa to maintain periimplant health, and to propose a decision tree which may help the clinician to select the appropriate surgical technique for increasing the width of attached mucosa. The available data indicate that ideally, soft tissue conditions should be optimized by various grafting procedures either before or during implant placement or as part of stage-two surgery. In cases, where, despite insufficient peri-implant soft tissue condition (ie, lack of attached mucosa or movements caused by buccal frena), implants have been uncovered and/or loaded, or in cases where biologic complications are already present (eg, mucositis, peri-implantitis), the treatment appears to be more difficult and less predictable. Soft tissue grafting may be important to prevent peri-implant tissue breakdown and should be considered when dental implants are placed. The presented decision tree may help the clinician to select the appropriate grafting technique.

  18. Improvement of the identification of four heavy metals in environmental samples by using predictive decision tree models coupled with a set of five bioluminescent bacteria.

    PubMed

    Jouanneau, Sulivan; Durand, Marie-José; Courcoux, Philippe; Blusseau, Thomas; Thouand, Gérald

    2011-04-01

    A primary statistical model based on the crossings between the different detection ranges of a set of five bioluminescent bacterial strains was developed to identify and quantify four metals which were at several concentrations in different mixtures: cadmium, arsenic III, mercury, and copper. Four specific decision trees based on the CHAID algorithm (CHi-squared Automatic Interaction Detector type) which compose this model were designed from a database of 576 experiments (192 different mixture conditions). A specific software, 'Metalsoft', helped us choose the best decision tree and a user-friendly way to identify the metal. To validate this innovative approach, 18 environmental samples containing a mixture of these metals were submitted to a bioassay and to standardized chemical methods. The results show on average a high correlation of 98.6% for the qualitative metal identification and 94.2% for the quantification. The results are particularly encouraging, and our model is able to provide semiquantitative information after only 60 min without pretreatments of samples.

  19. Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036

    ERIC Educational Resources Information Center

    Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.

    2014-01-01

    This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the more difficult to interpret logistic regression model. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…

  20. The Hydrologic Ensemble Prediction Experiment (HEPEX)

    NASA Astrophysics Data System (ADS)

    Wood, A. W.; Thielen, J.; Pappenberger, F.; Schaake, J. C.; Hartman, R. K.

    2012-12-01

    The Hydrologic Ensemble Prediction Experiment was established in March, 2004, at a workshop hosted by the European Center for Medium Range Weather Forecasting (ECMWF). With support from the US National Weather Service (NWS) and the European Commission (EC), the HEPEX goal was to bring the international hydrological and meteorological communities together to advance the understanding and adoption of hydrological ensemble forecasts for decision support in emergency management and water resources sectors. The strategy to meet this goal includes meetings that connect the user, forecast producer and research communities to exchange ideas, data and methods; the coordination of experiments to address specific challenges; and the formation of testbeds to facilitate shared experimentation. HEPEX has organized about a dozen international workshops, as well as sessions at scientific meetings (including AMS, AGU and EGU) and special issues of scientific journals where workshop results have been published. Today, the HEPEX mission is to demonstrate the added value of hydrological ensemble prediction systems (HEPS) for emergency management and water resources sectors to make decisions that have important consequences for economy, public health, safety, and the environment. HEPEX is now organised around six major themes that represent core elements of a hydrologic ensemble prediction enterprise: input and pre-processing, ensemble techniques, data assimilation, post-processing, verification, and communication and use in decision making. This poster presents an overview of recent and planned HEPEX activities, highlighting case studies that exemplify the focus and objectives of HEPEX.

  1. A Multi Criteria Group Decision-Making Model for Teacher Evaluation in Higher Education Based on Cloud Model and Decision Tree

    ERIC Educational Resources Information Center

    Chang, Ting-Cheng; Wang, Hui

    2016-01-01

    This paper proposes a cloud multi-criteria group decision-making model for teacher evaluation in higher education, which involves subjectivity, imprecision and fuzziness. First, selecting the appropriate evaluation index depending on the evaluation objectives, indicating a clear structural relationship between the evaluation index and…

  3. The Relation of Student Behavior, Peer Status, Race, and Gender to Decisions about School Discipline Using CHAID Decision Trees and Regression Modeling

    ERIC Educational Resources Information Center

    Horner, Stacy B.; Fireman, Gary D.; Wang, Eugene W.

    2010-01-01

    Peer nominations and demographic information were collected from a diverse sample of 1493 elementary school participants to examine behavior (overt and relational aggression, impulsivity, and prosociality), context (peer status), and demographic characteristics (race and gender) as predictors of teacher and administrator decisions about…

  4. Subspace ensembles for classification

    NASA Astrophysics Data System (ADS)

    Sun, Shiliang; Zhang, Changshui

    2007-11-01

    Ensemble learning constitutes one of the principal current directions in machine learning and data mining. In this paper, we explore subspace ensembles for classification by manipulating different feature subspaces. Commencing with the nature of ensemble efficacy, we probe into the microcosmic meaning of ensemble diversity, and propose to use region partitioning and region weighting to implement effective subspace ensembles. Individual classifiers possessing eminent performance on a partitioned region reflected by high neighborhood accuracies are deemed to contribute largely to this region, and are assigned large weights in determining the labels of instances in this area. A robust algorithm “Sena” that incarnates the mechanism is presented, which is insensitive to the number of nearest neighbors chosen to calculate neighborhood accuracies. The algorithm exhibits improved performance over the well-known ensembles of bagging, AdaBoost and random subspace. The difference of its effectivity with varying base classifiers is also investigated.

  5. Ensemble Pruning for Glaucoma Detection in an Unbalanced Data Set.

    PubMed

    Adler, Werner; Gefeller, Olaf; Gul, Asma; Horn, Folkert K; Khan, Zardad; Lausen, Berthold

    2016-12-07

    Random forests are successful classifier ensemble methods consisting of typically 100 to 1000 classification trees. Ensemble pruning techniques reduce the computational cost, especially the memory demand, of random forests by reducing the number of trees without relevant loss of performance or even with increased performance of the sub-ensemble. The application to the problem of an early detection of glaucoma, a severe eye disease with low prevalence, based on topographical measurements of the eye background faces specific challenges. We examine the performance of ensemble pruning strategies for glaucoma detection in an unbalanced data situation. The data set consists of 102 topographical features of the eye background of 254 healthy controls and 55 glaucoma patients. We compare the area under the receiver operating characteristic curve (AUC), and the Brier score on the total data set, in the majority class, and in the minority class of pruned random forest ensembles obtained with strategies based on the prediction accuracy of greedily grown sub-ensembles, the uncertainty weighted accuracy, and the similarity between single trees. To validate the findings and to examine the influence of the prevalence of glaucoma in the data set, we additionally perform a simulation study with lower prevalences of glaucoma. In glaucoma classification all three pruning strategies lead to improved AUC and smaller Brier scores on the total data set with sub-ensembles as small as 30 to 80 trees compared to the classification results obtained with the full ensemble consisting of 1000 trees. In the simulation study, we were able to show that the prevalence of glaucoma is a critical factor and lower prevalence decreases the performance of our pruning strategies. The memory demand for glaucoma classification in an unbalanced data situation based on random forests could effectively be reduced by the application of pruning strategies without loss of performance in a population with increased

  6. Under which conditions, additional monitoring data are worth gathering for improving decision making? Application of the VOI theory in the Bayesian Event Tree eruption forecasting framework

    NASA Astrophysics Data System (ADS)

    Loschetter, Annick; Rohmer, Jérémy

    2016-04-01

    Standard and new-generation monitoring observations provide important information, in almost real time, about the evolution of the volcanic system. These observations are used to update the model, contribute to a better hazard assessment, and support decision making concerning potential evacuation. The BET_EF framework (based on the Bayesian Event Tree), developed by INGV, makes it possible to integrate information from monitoring with a view to decision making. Using this framework, the objectives of the present work are i. to propose a method to assess the added value of information from monitoring (within the Value Of Information (VOI) theory); ii. to perform sensitivity analysis on the different parameters that influence the VOI from monitoring. VOI consists in assessing the possible increase in expected value provided by gathering information, for instance through monitoring. Basically, the VOI is the difference between the value with information and the value without additional information in a Cost-Benefit approach. This theory is well suited to situations that can be represented in the form of a decision tree, such as the BET_EF tool. Reference values and ranges of variation (for sensitivity analysis) were defined for input parameters, based on data from the MESIMEX exercise (performed at Vesuvio volcano in 2006). Complementary methods for sensitivity analysis were implemented: local, global using Sobol' indices, and regional using Contribution to Sample Mean and Variance plots. The results (specific to the case considered) obtained with the different techniques are in good agreement and make it possible to answer the following questions: i. Which characteristics of monitoring are important for early warning (reliability)? ii. How do experts' opinions influence the hazard assessment and thus the decision? Concerning the characteristics of monitoring, the most influential parameters are the means rather than the variances for the case considered
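
    The VOI definition in this abstract (value with information minus value without additional information) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper or from BET_EF: a two-action decision (evacuate or stay), a binary monitoring signal described by a sensitivity and a specificity, and an invented loss table. Value is expressed as a reduction in expected loss.

```python
def expected_loss(action, p_event, losses):
    """Expected loss of one action when the event has probability p_event."""
    row = losses[action]
    return p_event * row["event"] + (1.0 - p_event) * row["no_event"]


def best_loss(p_event, losses):
    """Expected loss of the best available action for a given event probability."""
    return min(expected_loss(action, p_event, losses) for action in losses)


def value_of_information(prior, sensitivity, specificity, losses):
    """VOI = expected loss without the signal minus expected loss with it.

    The monitoring signal is modelled (as an assumption) as a binary alarm:
    sensitivity = P(alarm | event), specificity = P(no alarm | no event).
    """
    p_alarm = sensitivity * prior + (1.0 - specificity) * (1.0 - prior)
    p_quiet = 1.0 - p_alarm

    # Bayes' rule: event probability after each possible signal.
    post_alarm = sensitivity * prior / p_alarm if p_alarm > 0 else 0.0
    post_quiet = (1.0 - sensitivity) * prior / p_quiet if p_quiet > 0 else 0.0

    loss_without = best_loss(prior, losses)
    loss_with = (p_alarm * best_loss(post_alarm, losses)
                 + p_quiet * best_loss(post_quiet, losses))
    return loss_without - loss_with


# Hypothetical loss table (arbitrary units): evacuating always costs 10,
# staying costs 1000 if the event occurs and nothing otherwise.
LOSSES = {
    "evacuate": {"event": 10.0, "no_event": 10.0},
    "stay": {"event": 1000.0, "no_event": 0.0},
}

voi_perfect = value_of_information(0.1, 1.0, 1.0, LOSSES)
voi_useless = value_of_information(0.1, 0.5, 0.5, LOSSES)
```

    In this toy setting a signal that leaves the posterior equal to the prior has zero value, while a perfectly reliable signal recovers the full gap between acting blindly and acting with knowledge of the outcome.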

  7. The Ensemble Canon

    NASA Technical Reports Server (NTRS)

    Mittman, David S

    2011-01-01

    Ensemble is an open architecture for the development, integration, and deployment of mission operations software. Fundamentally, it is an adaptation of the Eclipse Rich Client Platform (RCP), a widespread, stable, and supported framework for component-based application development. By capitalizing on the maturity and availability of the Eclipse RCP, Ensemble offers a low-risk, politically neutral path towards a tighter integration of operations tools. The Ensemble project is a highly successful, ongoing collaboration among NASA Centers. Since 2004, the Ensemble project has supported the development of mission operations software for NASA's Exploration Systems, Science, and Space Operations Directorates.

  8. Human Activity Recognition from Smart-Phone Sensor Data using a Multi-Class Ensemble Learning in Home Monitoring.

    PubMed

    Ghose, Soumya; Mitra, Jhimli; Karunanithi, Mohan; Dowling, Jason

    2015-01-01

    Home monitoring of chronically ill or elderly patients can reduce frequent hospitalisations and hence provide improved quality of care at a reduced cost to the community, therefore reducing the burden on the healthcare system. Activity recognition of such patients is of high importance in such a design. In this work, a system for automatic human physical activity recognition from smart-phone inertial sensor data is proposed. An ensemble of decision trees framework is adopted to train and predict the multi-class human activity system. A comparison of our proposed method with a multi-class traditional support vector machine shows significant improvement in activity recognition accuracies.

  9. Hydrological Ensemble Prediction System (HEPS)

    NASA Astrophysics Data System (ADS)

    Thielen-Del Pozo, J.; Schaake, J.; Martin, E.; Pailleux, J.; Pappenberger, F.

    2010-09-01

    Flood forecasting systems form a key part of 'preparedness' strategies for disastrous floods and provide hydrological services, civil protection authorities and the public with information about upcoming events. Provided the warning lead time is sufficiently long, adequate preparatory actions can be taken to efficiently reduce the impacts of the flooding. Following on the success of the use of ensembles for weather forecasting, the hydrological community now moves increasingly towards Hydrological Ensemble Prediction Systems (HEPS) for improved flood forecasting using operationally available NWP products as inputs. However, these products are often generated on relatively coarse scales compared to hydrologically relevant basin units and suffer from systematic biases that may have considerable impact when passed through the non-linear hydrological filters. Therefore, a better understanding of how best to produce, communicate and use hydrologic ensemble forecasts in short-, medium- and long-term prediction of hydrological processes is necessary. The "Hydrologic Ensemble Prediction Experiment" (HEPEX) is an international initiative consisting of hydrologists, meteorologists and end-users to advance probabilistic hydrologic forecast techniques for flood, drought and water management applications. Different aspects of the hydrological ensemble processor are being addressed, including • Production of useful meteorological products relevant for hydrological applications, ranging from nowcasting products to seasonal forecasts. The importance of hindcasts that are consistent with the operational weather forecasts will be discussed to support bias correction and downscaling, statistically meaningful verification of HEPS, and the development and testing of operating rules; • Need for downscaling and post-processing of weather ensembles to reduce bias before entering hydrological applications; • Hydrological model and parameter uncertainty and how to correct and

  10. A comparison of the decision tree approach and the neural-networks-based heuristic dynamic programming approach for subcircuit extraction problem

    NASA Astrophysics Data System (ADS)

    Zhang, Nian; Wunsch, Donald C., II

    2003-08-01

    The applications of non-standard logic devices are increasing fast in the industry. Many of these applications require high speed, low power, functionality and flexibility, which cannot be obtained with standard logic devices. These special logic cells can be constructed by the topology design strategy automatically or manually. However, the need arises for topology design verification. The layout versus schematic (LVS) analysis is an essential part of topology design verification, and subcircuit extraction is one of the operations in LVS testing. In this paper, we first provide an efficient decision tree approach to the graph isomorphism problem, and then apply it to the subcircuit extraction problem based on the solution to the graph isomorphism problem. To evaluate its performance, we compare it with the neural-networks-based heuristic dynamic programming algorithm (SubHDP), which is by far one of the fastest algorithms for the subcircuit extraction problem.

  11. Fast decision tree-based method to index large DNA-protein sequence databases using hybrid distributed-shared memory programming model.

    PubMed

    Jaber, Khalid Mohammad; Abdullah, Rosni; Rashid, Nur'Aini Abdul

    2014-01-01

    In recent times, the size of biological databases has increased significantly, with continuous growth in the number of users and the rate of queries, such that some databases have reached terabyte size. There is, therefore, an increasing need to access databases at the fastest rates possible. In this paper, the decision tree indexing model (PDTIM) was parallelised, using a hybrid of distributed and shared memory on a resident database, with horizontal and vertical growth through Message Passing Interface (MPI) and POSIX Thread (PThread), to accelerate the index building time. The PDTIM was implemented using 1, 2, 4 and 5 processors on 1, 2, 3 and 4 threads respectively. The results show that the hybrid technique improved the speedup, compared to a sequential version. It can be concluded from the results that the proposed PDTIM is appropriate for large data sets, in terms of index building time.

  12. Identification of Some Zeolite Group Minerals by Application of Artificial Neural Network and Decision Tree Algorithm Based on SEM-EDS Data

    NASA Astrophysics Data System (ADS)

    Akkaş, Efe; Evren Çubukçu, H.; Akin, Lutfiye; Erkut, Volkan; Yurdakul, Yasin; Karayigit, Ali Ihsan

    2016-04-01

    Identification of zeolite group minerals is complicated due to their similar chemical formulas and habits. Although the morphologies of various zeolite crystals can be recognized under a Scanning Electron Microscope (SEM), it is a more challenging and problematic process to identify zeolites using their mineral chemical data. SEMs integrated with energy dispersive X-ray spectrometers (EDS) provide fast and reliable chemical data of minerals. However, considering the elemental similarities of the characteristic chemical formulae of zeolite species (e.g. Clinoptilolite ((Na,K,Ca)2-3Al3(Al,Si)2Si13O36·12H2O) and Erionite ((Na2,K2,Ca)2Al4Si14O36·15H2O)), EDS data alone do not seem to be sufficient for correct identification. Furthermore, the physical properties of the specimen (e.g. roughness, electrical conductivity) and the applied analytical conditions (e.g. accelerating voltage, beam current, spot size) of the SEM-EDS should be uniform in order to obtain reliable elemental results for minerals having high alkali (Na, K) and H2O (approx. 14-18%) contents. This study, which was funded by The Scientific and Technological Research Council of Turkey (TUBITAK Project No: 113Y439), aims to construct a database as large as possible for various zeolite minerals and to develop a general prediction model for the identification of zeolite minerals using SEM-EDS data. For this purpose, an artificial neural network and a rule-based decision tree algorithm were employed. Throughout the analyses, a total of 1850 chemical data were collected from four distinct zeolite species (Clinoptilolite-Heulandite, Erionite, Analcime and Mordenite) observed in various rocks (e.g. coals, pyroclastics). In order to obtain a representative training data set for each mineral, a selection procedure for reference mineral analyses was applied. During the selection procedure, SEM-based crystal morphology data, XRD spectra and re-calculated cationic distribution, obtained by EDS have been used for the

  13. The Ensembl REST API: Ensembl Data for Any Language.

    PubMed

    Yates, Andrew; Beal, Kathryn; Keenan, Stephen; McLaren, William; Pignatelli, Miguel; Ritchie, Graham R S; Ruffier, Magali; Taylor, Kieron; Vullo, Alessandro; Flicek, Paul

    2015-01-01

    We present a Web service to access Ensembl data using Representational State Transfer (REST). The Ensembl REST server enables the easy retrieval of a wide range of Ensembl data by most programming languages, using standard formats such as JSON and FASTA while minimizing client work. We also introduce bindings to the popular Ensembl Variant Effect Predictor tool permitting large-scale programmatic variant analysis independent of any specific programming language. The Ensembl REST API can be accessed at http://rest.ensembl.org and source code is freely available under an Apache 2.0 license from http://github.com/Ensembl/ensembl-rest. © The Author 2014. Published by Oxford University Press.
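
    As a rough illustration of the language-independent access this abstract describes, the sketch below builds a request URL for the server address given above and parses a JSON reply using only the Python standard library. The choice of the `lookup/id` endpoint, the `content-type` query parameter, and the example stable ID are assumptions made here for illustration and should be checked against the service's own documentation; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://rest.ensembl.org"  # server address quoted in the abstract


def build_lookup_url(stable_id, content_type="application/json"):
    """Build a URL for looking up one stable ID (endpoint choice is an assumption)."""
    path = "/lookup/id/" + urllib.parse.quote(stable_id)
    query = urllib.parse.urlencode({"content-type": content_type})
    return f"{BASE_URL}{path}?{query}"


def fetch_json(url, timeout=10):
    """Fetch a URL and decode the body as JSON. Performs a network request."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Illustrative stable ID, not taken from the abstract.
example_url = build_lookup_url("ENSG00000157764")
```

    Calling `fetch_json(example_url)` would issue the request and return the decoded record as a dictionary; nothing is fetched until that call is made, so the URL-building step can be checked offline.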

  14. Clinical practice decision tree for the choice of the first disease modifying antirheumatic drug for very early rheumatoid arthritis: a 2004 proposal of the French Society of Rheumatology

    PubMed Central

    Loët, X Le; Berthelot, J M; Cantagrel, A; Combe, B; De Bandt, M; Fautrel, B; Flipo, R M; Lioté, F; Maillefert, J F; Meyer, O; Saraux, A; Wendling, D; Guillemin, F

    2006-01-01

    Objective: To elaborate a clinical practice decision tree for the choice of the first disease modifying antirheumatic drug (DMARD) for untreated rheumatoid arthritis of less than six months' duration. Methods: Four steps were employed: (1) review of published reports on DMARD efficacy against rheumatoid arthritis; (2) inventory of the information available to guide DMARD choice; (3) selection of the most pertinent information by 12 experts using a Delphi method; and (4) choice of DMARDs in 12 clinical situations defined by items selected in step 3 (28 joint disease activity score (DAS 28): ⩽3.2; >3.2 and ⩽5.1; >5.1; rheumatoid factor status (positive/negative); structural damage (with/without), that is, 3×2×2). Thus, multiplied by all the possible treatment pairs, 180 scenarios were obtained and presented to 36 experts, who ranked treatment choices according to the Thurstone pairwise method. Results: Among the 77 item