Science.gov

Sample records for decision tree ensembles

  1. Creating ensembles of decision trees through sampling

    DOEpatents

    Kamath, Chandrika; Cantu-Paz, Erick

    2005-08-30

    A system for decision tree ensembles that includes a module to read the data, a module to sort the data, a module to evaluate a potential split of the data according to some criterion using a random sample of the data, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method is based on statistical sampling techniques and includes the steps of reading the data; sorting the data; evaluating a potential split according to some criterion using a random sample of the data, splitting the data, and combining multiple decision trees in ensembles.

  2. Creating ensembles of decision trees through sampling

    SciTech Connect

    Kamath, C; Cantu-Paz, E

    2001-02-02

    Recent work in classification indicates that significant improvements in accuracy can be obtained by growing an ensemble of classifiers and having them vote for the most popular class. This paper focuses on ensembles of decision trees that are created with a randomized procedure based on sampling. Randomization can be introduced by using random samples of the training data (as in bagging or arcing) and running a conventional tree-building algorithm, or by randomizing the induction algorithm itself. The objective of this paper is to describe our first experiences with a novel randomized tree induction method that uses a subset of samples at a node to determine the split. Our empirical results show that ensembles generated using this approach yield results that are competitive in accuracy and superior in computational cost.

  3. Creating Ensembles of Decision Trees Through Sampling

    SciTech Connect

    Kamath,C; Cantu-Paz, E

    2001-07-26

    Recent work in classification indicates that significant improvements in accuracy can be obtained by growing an ensemble of classifiers and having them vote for the most popular class. This paper focuses on ensembles of decision trees that are created with a randomized procedure based on sampling. Randomization can be introduced by using random samples of the training data (as in bagging or boosting) and running a conventional tree-building algorithm, or by randomizing the induction algorithm itself. The objective of this paper is to describe the first experiences with a novel randomized tree induction method that uses a sub-sample of instances at a node to determine the split. The empirical results show that ensembles generated using this approach yield results that are competitive in accuracy and superior in computational cost to boosting and bagging.

  4. Improving ensemble decision tree performance using Adaboost and Bagging

    NASA Astrophysics Data System (ADS)

    Hasan, Md. Rajib; Siraj, Fadzilah; Sainin, Mohd Shamrie

    2015-12-01

    Ensemble classifier systems are considered as one of the most promising in medical data classification and the performance of deceision tree classifier can be increased by the ensemble method as it is proven to be better than single classifiers. However, in a ensemble settings the performance depends on the selection of suitable base classifier. This research employed two prominent esemble s namely Adaboost and Bagging with base classifiers such as Random Forest, Random Tree, j48, j48grafts and Logistic Model Regression (LMT) that have been selected independently. The empirical study shows that the performance varries when different base classifiers are selected and even some places overfitting issue also been noted. The evidence shows that ensemble decision tree classfiers using Adaboost and Bagging improves the performance of selected medical data sets.

  5. Using histograms to introduce randomization in the generation of ensembles of decision trees

    DOEpatents

    Kamath, Chandrika; Cantu-Paz, Erick; Littau, David

    2005-02-22

    A system for decision tree ensembles that includes a module to read the data, a module to create a histogram, a module to evaluate a potential split according to some criterion using the histogram, a module to select a split point randomly in an interval around the best split, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method includes the steps of reading the data; creating a histogram; evaluating a potential split according to some criterion using the histogram, selecting a split point randomly in an interval around the best split, splitting the data, and combining multiple decision trees in ensembles.

  6. Classification of Bent-Double Galaxies: Experiences with Ensembles of Decision Trees

    SciTech Connect

    Kamath, C; Cantu-Paz, E

    2002-01-08

    In earlier work, we have described our experiences with the use of decision tree classifiers to identify radio-emitting galaxies with a bent-double morphology in the FIRST astronomical survey. We now extend this work to include ensembles of decision tree classifiers, including two algorithms developed by us. These algorithms randomize the decision at each node of the tree, and because they consider fewer candidate splitting points, are faster than other methods for creating ensembles. The experiments presented in this paper with our astronomy data show that our algorithms are competitive in accuracy, but faster than other ensemble techniques such as Boosting, Bagging, and Arcx4 with different split criteria.

  7. Creating ensembles of oblique decision trees with evolutionary algorithms and sampling

    DOEpatents

    Cantu-Paz, Erick; Kamath, Chandrika

    2006-06-13

    A decision tree system that is part of a parallel object-oriented pattern recognition system, which in turn is part of an object oriented data mining system. A decision tree process includes the step of reading the data. If necessary, the data is sorted. A potential split of the data is evaluated according to some criterion. An initial split of the data is determined. The final split of the data is determined using evolutionary algorithms and statistical sampling techniques. The data is split. Multiple decision trees are combined in ensembles.

  8. Ensemble of Causal Trees

    NASA Astrophysics Data System (ADS)

    Bialas, Piotr

    2003-10-01

    We discuss the geometry of trees endowed with a causal structure using the conventional framework of equilibrium statistical mechanics. We show how this ensemble is related to popular growing network models. In particular we demonstrate that on a class of afine attachment kernels the two models are identical but they can differ substantially for other choice of weights. We show that causal trees exhibit condensation even for asymptotically linear kernels. We derive general formulae describing the degree distribution, the ancestor--descendant correlation and the probability that a randomly chosen node lives at a given geodesic distance from the root. It is shown that the Hausdorff dimension dH of the causal networks is generically infinite.

  9. A protocol for developing early warning score models from vital signs data in hospitals using ensembles of decision trees

    PubMed Central

    Xu, Michael; Tam, Benjamin; Thabane, Lehana; Fox-Robichaud, Alison

    2015-01-01

    Introduction Multiple early warning scores (EWS) have been developed and implemented to reduce cardiac arrests on hospital wards. Case–control observational studies that generate an area under the receiver operator curve (AUROC) are the usual validation method, but investigators have also generated EWS with algorithms with no prior clinical knowledge. We present a protocol for the validation and comparison of our local Hamilton Early Warning Score (HEWS) with that generated using decision tree (DT) methods. Methods and analysis A database of electronically recorded vital signs from 4 medical and 4 surgical wards will be used to generate DT EWS (DT-HEWS). A third EWS will be generated using ensemble-based methods. Missing data will be multiple imputed. For a relative risk reduction of 50% in our composite outcome (cardiac or respiratory arrest, unanticipated intensive care unit (ICU) admission or hospital death) with a power of 80%, we calculated a sample size of 17 151 patient days based on our cardiac arrest rates in 2012. The performance of the National EWS, DT-HEWS and the ensemble EWS will be compared using AUROC. Ethics and dissemination Ethics approval was received from the Hamilton Integrated Research Ethics Board (#13-724-C). The vital signs and associated outcomes are stored in a database on our secure hospital server. Preliminary dissemination of this protocol was presented in abstract form at an international critical care meeting. Final results of this analysis will be used to improve on the existing HEWS and will be shared through publication and presentation at critical care meetings. PMID:26353873

  10. Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS

    NASA Astrophysics Data System (ADS)

    Tehrany, Mahyat Shafapour; Pradhan, Biswajeet; Jebur, Mustafa Neamah

    2013-11-01

    Decision tree (DT) machine learning algorithm was used to map the flood susceptible areas in Kelantan, Malaysia.We used an ensemble frequency ratio (FR) and logistic regression (LR) model in order to overcome weak points of the LR.Combined method of FR and LR was used to map the susceptible areas in Kelantan, Malaysia.Results of both methods were compared and their efficiency was assessed.Most influencing conditioning factors on flooding were recognized.

  11. Tree Ensembles on the Induced Discrete Space.

    PubMed

    Yildiz, Olcay Taner

    2016-05-01

    Decision trees are widely used predictive models in machine learning. Recently, K -tree is proposed, where the original discrete feature space is expanded by generating all orderings of values of k discrete attributes and these orderings are used as the new attributes in decision tree induction. Although K -tree performs significantly better than the proper one, their exponential time complexity can prohibit their use. In this brief, we propose K -forest, an extension of random forest, where a subset of features is selected randomly from the induced discrete space. Simulation results on 17 data sets show that the novel ensemble classifier has significantly lower error rate compared with the random forest based on the original feature space. PMID:26011897

  12. Approximate Splitting for Ensembles of Trees using Histograms

    SciTech Connect

    Kamath, C; Cantu-Paz, E; Littau, D

    2001-09-28

    Recent work in classification indicates that significant improvements in accuracy can be obtained by growing an ensemble of classifiers and having them vote for the most popular class. Implicit in many of these techniques is the concept of randomization that generates different classifiers. In this paper, they focus on ensembles of decision trees that are created using a randomized procedure based on histograms. Techniques, such as histograms, that discretize continuous variables, have long been used in classification to convert the data into a form suitable for processing and to reduce the compute time. The approach combines the ideas behind discretization through histograms and randomization in ensembles to create decision trees by randomly selecting a split point in an interval around the best bin boundary in the histogram. The experimental results with public domain data show that ensembles generated using this approach are competitive in accuracy and superior in computational cost to other ensembles techniques such as boosting and bagging.

  13. Lazy decision trees

    SciTech Connect

    Friedman, J.H.; Yun, Yeogirl; Kohavi, R.

    1996-12-31

    Lazy learning algorithms, exemplified by nearest-neighbor algorithms, do not induce a concise hypothesis from a given training set; the inductive process is delayed until a test instance is given. Algorithms for constructing decision trees, such as C4.5, ID3, and CART create a single {open_quotes}best{close_quotes} decision tree during the training phase, and this tree is then used to classify test instances. The tests at the nodes of the constructed tree are good on average, but there may be better tests for classifying a specific instance. We propose a lazy decision tree algorithm-LazyDT-that conceptually constructs the {open_quotes}best{close_quote} decision tree for each test instance. In practice, only a path needs to be constructed, and a caching scheme makes the algorithm fast. The algorithm is robust with respect to missing values without resorting to the complicated methods usually seen in induction of decision trees. Experiments on real and artificial problems are presented.

  14. AncesTrees: ancestry estimation with randomized decision trees.

    PubMed

    Navega, David; Coelho, Catarina; Vicente, Ricardo; Ferreira, Maria Teresa; Wasterlain, Sofia; Cunha, Eugénia

    2015-09-01

    In forensic anthropology, ancestry estimation is essential in establishing the individual biological profile. The aim of this study is to present a new program--AncesTrees--developed for assessing ancestry based on metric analysis. AncesTrees relies on a machine learning ensemble algorithm, random forest, to classify the human skull. In the ensemble learning paradigm, several models are generated and co-jointly used to arrive at the final decision. The random forest algorithm creates ensembles of decision trees classifiers, a non-linear and non-parametric classification technique. The database used in AncesTrees is composed by 23 craniometric variables from 1,734 individuals, representative of six major ancestral groups and selected from the Howells' craniometric series. The program was tested in 128 adult crania from the following collections: the African slaves' skeletal collection of Valle da Gafaria; the Medical School Skull Collection and the Identified Skeletal Collection of 21st Century, both curated at the University of Coimbra. The first step of the test analysis was to perform ancestry estimation including all the ancestral groups of the database. The second stage of our test analysis was to conduct ancestry estimation including only the European and the African ancestral groups. In the first test analysis, 75% of the individuals of African ancestry and 79.2% of the individuals of European ancestry were correctly identified. The model involving only African and European ancestral groups had a better performance: 93.8% of all individuals were correctly classified. The obtained results show that AncesTrees can be a valuable tool in forensic anthropology. PMID:25053239

  15. Human decision error (HUMDEE) trees

    SciTech Connect

    Ostrom, L.T.

    1993-08-01

    Graphical presentations of human actions in incident and accident sequences have been used for many years. However, for the most part, human decision making has been underrepresented in these trees. This paper presents a method of incorporating the human decision process into graphical presentations of incident/accident sequences. This presentation is in the form of logic trees. These trees are called Human Decision Error Trees or HUMDEE for short. The primary benefit of HUMDEE trees is that they graphically illustrate what else the individuals involved in the event could have done to prevent either the initiation or continuation of the event. HUMDEE trees also present the alternate paths available at the operator decision points in the incident/accident sequence. This is different from the Technique for Human Error Rate Prediction (THERP) event trees. There are many uses of these trees. They can be used for incident/accident investigations to show what other courses of actions were available and for training operators. The trees also have a consequence component so that not only the decision can be explored, also the consequence of that decision.

  16. Decision-Tree Program

    NASA Technical Reports Server (NTRS)

    Buntine, Wray

    1994-01-01

    IND computer program introduces Bayesian and Markov/maximum-likelihood (MML) methods and more-sophisticated methods of searching in growing trees. Produces more-accurate class-probability estimates important in applications like diagnosis. Provides range of features and styles with convenience for casual user, fine-tuning for advanced user or for those interested in research. Consists of four basic kinds of routines: data-manipulation, tree-generation, tree-testing, and tree-display. Written in C language.

  17. Weighted Hybrid Decision Tree Model for Random Forest Classifier

    NASA Astrophysics Data System (ADS)

    Kulkarni, Vrushali Y.; Sinha, Pradeep K.; Petare, Manisha C.

    2016-06-01

    Random Forest is an ensemble, supervised machine learning algorithm. An ensemble generates many classifiers and combines their results by majority voting. Random forest uses decision tree as base classifier. In decision tree induction, an attribute split/evaluation measure is used to decide the best split at each node of the decision tree. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation among them. The work presented in this paper is related to attribute split measures and is a two step process: first theoretical study of the five selected split measures is done and a comparison matrix is generated to understand pros and cons of each measure. These theoretical results are verified by performing empirical analysis. For empirical analysis, random forest is generated using each of the five selected split measures, chosen one at a time. i.e. random forest using information gain, random forest using gain ratio, etc. The next step is, based on this theoretical and empirical analysis, a new approach of hybrid decision tree model for random forest classifier is proposed. In this model, individual decision tree in Random Forest is generated using different split measures. This model is augmented by weighted voting based on the strength of individual tree. The new approach has shown notable increase in the accuracy of random forest.

  18. Decision tree modeling using R.

    PubMed

    Zhang, Zhongheng

    2016-08-01

    In machine learning field, decision tree learner is powerful and easy to interpret. It employs recursive binary partitioning algorithm that splits the sample in partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. While growing a single tree is subject to small changes in the training data, random forests procedure is introduced to address this problem. The sources of diversity for random forests come from the random sampling and restricted set of input variables to be selected. Finally, I introduce R functions to perform model based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building. PMID:27570769

  19. Decision tree modeling using R

    PubMed Central

    2016-01-01

    In machine learning field, decision tree learner is powerful and easy to interpret. It employs recursive binary partitioning algorithm that splits the sample in partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. While growing a single tree is subject to small changes in the training data, random forests procedure is introduced to address this problem. The sources of diversity for random forests come from the random sampling and restricted set of input variables to be selected. Finally, I introduce R functions to perform model based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building. PMID:27570769

  20. Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

    NASA Astrophysics Data System (ADS)

    Galelli, S.; Castelletti, A.

    2013-07-01

    Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and, (iii) allows to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analysed on two real-world case studies - Marina catchment (Singapore) and Canning River (Western Australia) - representing two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparatively well to the best of the benchmarks (i.e. M5) in both the watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variable provided can be given a physically meaningful interpretation.

  1. Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

    NASA Astrophysics Data System (ADS)

    Galelli, S.; Castelletti, A.

    2013-02-01

    Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modeling. In this paper we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modeling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalization property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally very efficient; and, (iii) allows to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analyzed on two real-world case studies (Marina catchment (Singapore) and Canning River (Western Australia)) representing two different morphoclimatic contexts comparatively with other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparatively well to the best of the benchmarks (i.e. M5) in both the watersheds, while outperforming the other approaches in terms of computational requirement when adopted on large datasets. In addition, the ranking of the input variable provided can be given a physically meaningful interpretation.

  2. Extensions and applications of ensemble-of-trees methods in machine learning

    NASA Astrophysics Data System (ADS)

    Bleich, Justin

    Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate its superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of

  3. Bayesian Evidence Framework for Decision Tree Learning

    NASA Astrophysics Data System (ADS)

    Chatpatanasiri, Ratthachat; Kijsirikul, Boonserm

    2005-11-01

    This work is primary interested in the problem of, given the observed data, selecting a single decision (or classification) tree. Although a single decision tree has a high risk to be overfitted, the induced tree is easily interpreted. Researchers have invented various methods such as tree pruning or tree averaging for preventing the induced tree from overfitting (and from underfitting) the data. In this paper, instead of using those conventional approaches, we apply the Bayesian evidence framework of Gull, Skilling and Mackay to a process of selecting a decision tree. We derive a formal function to measure `the fitness' for each decision tree given a set of observed data. Our method, in fact, is analogous to a well-known Bayesian model selection method for interpolating noisy continuous-value data. As in regression problems, given reasonable assumptions, this derived score function automatically quantifies the principle of Ockham's razor, and hence reasonably deals with the issue of underfitting-overfitting tradeoff.

  4. From Family Trees to Decision Trees.

    ERIC Educational Resources Information Center

    Trobian, Helen R.

    This paper is a preliminary inquiry by a non-mathematician into graphic methods of sequential planning and ways in which hierarchical analysis and tree structures can be helpful in developing interest in the use of mathematical modeling in the search for creative solutions to real-life problems. Highlights include a discussion of hierarchical…

  5. Decision Tree Technique for Particle Identification

    SciTech Connect

    Quiller, Ryan

    2003-09-05

    Particle identification based on measurements such as the Cerenkov angle, momentum, and the rate of energy loss per unit distance (-dE/dx) is fundamental to the BaBar detector for particle physics experiments. It is particularly important to separate the charged forms of kaons and pions. Currently, the Neural Net, an algorithm based on mapping input variables to an output variable using hidden variables as intermediaries, is one of the primary tools used for identification. In this study, a decision tree classification technique implemented in the computer program, CART, was investigated and compared to the Neural Net over the range of momenta, 0.25 GeV/c to 5.0 GeV/c. For a given subinterval of momentum, three decision trees were made using different sets of input variables. The sensitivity and specificity were calculated for varying kaon acceptance thresholds. This data was used to plot Receiver Operating Characteristic curves (ROC curves) to compare the performance of the classification methods. Also, input variables used in constructing the decision trees were analyzed. It was found that the Neural Net was a significant contributor to decision trees using dE/dx and the Cerenkov angle as inputs. Furthermore, the Neural Net had poorer performance than the decision tree technique, but tended to improve decision tree performance when used as an input variable. These results suggest that the decision tree technique using Neural Net input may possibly increase accuracy of particle identification in BaBar.

  6. Classification based on full decision trees

    NASA Astrophysics Data System (ADS)

    Genrikhov, I. E.; Djukova, E. V.

    2012-04-01

    The ideas underlying a series of the authors' studies dealing with the design of classification algorithms based on full decision trees are further developed. It is shown that the decision tree construction under consideration takes into account all the features satisfying a branching criterion. Full decision trees with an entropy branching criterion are studied as applied to precedent-based pattern recognition problems with real-valued data. Recognition procedures are constructed for solving problems with incomplete data (gaps in the feature descriptions of the objects) in the case when the learning objects are nonuniformly distributed over the classes. The authors' basic results previously obtained in this area are overviewed.

  7. The decision tree approach to classification

    NASA Technical Reports Server (NTRS)

    Wu, C.; Landgrebe, D. A.; Swain, P. H.

    1975-01-01

    A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.

  8. Comprehensive Decision Tree Models in Bioinformatics

    PubMed Central

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Purpose Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. Methods This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by so called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model that is constrained exclusively by the dimensions of the produced decision tree. Results The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expected significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class

  9. A survey of decision tree classifier methodology

    NASA Technical Reports Server (NTRS)

    Safavian, S. Rasoul; Landgrebe, David

    1990-01-01

    Decision Tree Classifiers (DTC's) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps, the most important feature of DTC's is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issue. After considering potential advantages of DTC's over single stage classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.

  10. A survey of decision tree classifier methodology

    NASA Technical Reports Server (NTRS)

    Safavian, S. R.; Landgrebe, David

    1991-01-01

    Decision tree classifiers (DTCs) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTCs over single-state classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.

  11. Parallel object-oriented decision tree system

    DOEpatents

    Kamath; Chandrika , Cantu-Paz; Erick

    2006-02-28

    A data mining decision tree system that uncovers patterns, associations, anomalies, and other statistically significant structures in data by reading and displaying data files, extracting relevant features for each of the objects, and using a method of recognizing patterns among the objects based upon object features through a decision tree that reads the data, sorts the data if necessary, determines the best manner to split the data into subsets according to some criterion, and splits the data.

  12. Speeding up Boosting decision trees training

    NASA Astrophysics Data System (ADS)

    Zheng, Chao; Wei, Zhenzhong

    2015-10-01

    To overcome the drawback that Boosting decision trees perform fast speed in the test time while the training process is relatively too slow to meet the requirements of applications with real-time learning, we propose a fast decision trees training method by pruning those noneffective features in advance. And basing on this method, we also design a fast Boosting decision trees training algorithm. Firstly, we analyze the structure of each decision trees node, and prove that the classification error of each node has a bound through derivation. Then, by using the error boundary to prune non-effective features in the early stage, we greatly accelerate the decision tree training process, and would not affect the training results at all. Finally, the decision tree accelerated training method is integrated into the general Boosting process forming a fast boosting decision trees training algorithm. This algorithm is not a new variant of Boosting, on the contrary, it should be used in conjunction with existing Boosting algorithms to achieve more training acceleration. To test the algorithm's speedup performance and performance combined with other accelerated algorithms, the original AdaBoost and two typical acceleration algorithms LazyBoost and StochasticBoost were respectively used in conjunction with this algorithm into three fast versions, and their classification performance was tested by using the Lsis face database which contained 12788 images. Experimental results reveal that this fast algorithm can achieve more than double training speedup without affecting the results of the trained classifier, and can be combined with other acceleration algorithms. Key words: Boosting algorithm, decision trees, classifier training, preliminary classification error, face detection

  13. Support Vector Machine with Ensemble Tree Kernel for Relation Extraction.

    PubMed

    Liu, Xiaoyong; Fu, Hui; Du, Zhiguo

    2016-01-01

    Relation extraction is one of the important research topics in the field of information extraction research. To solve the problem of semantic variation in traditional semisupervised relation extraction algorithm, this paper proposes a novel semisupervised relation extraction algorithm based on ensemble learning (LXRE). The new algorithm mainly uses two kinds of support vector machine classifiers based on tree kernel for integration and integrates the strategy of constrained extension seed set. The new algorithm can weaken the inaccuracy of relation extraction, which is caused by the phenomenon of semantic variation. The numerical experimental research based on two benchmark data sets (PropBank and AIMed) shows that the LXRE algorithm proposed in the paper is superior to other two common relation extraction methods in four evaluation indexes (Precision, Recall, F-measure, and Accuracy). It indicates that the new algorithm has good relation extraction ability compared with others. PMID:27118966

  14. Support Vector Machine with Ensemble Tree Kernel for Relation Extraction

    PubMed Central

    Fu, Hui; Du, Zhiguo

    2016-01-01

    Relation extraction is one of the important research topics in the field of information extraction research. To solve the problem of semantic variation in traditional semisupervised relation extraction algorithm, this paper proposes a novel semisupervised relation extraction algorithm based on ensemble learning (LXRE). The new algorithm mainly uses two kinds of support vector machine classifiers based on tree kernel for integration and integrates the strategy of constrained extension seed set. The new algorithm can weaken the inaccuracy of relation extraction, which is caused by the phenomenon of semantic variation. The numerical experimental research based on two benchmark data sets (PropBank and AIMed) shows that the LXRE algorithm proposed in the paper is superior to other two common relation extraction methods in four evaluation indexes (Precision, Recall, F-measure, and Accuracy). It indicates that the new algorithm has good relation extraction ability compared with others. PMID:27118966

  15. Decision Tree Approach for Soil Liquefaction Assessment

    PubMed Central

    Gandomi, Amir H.; Fridline, Mark M.; Roke, David A.

    2013-01-01

    In the current study, the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used in this study. Three decision tree techniques are used here in two different ways, considering statistical and engineering points of view, to develop decision rules. The DT results are compared to the logistic regression (LR) model. The results of this study indicate that the DTs not only successfully predict liquefaction but they can also outperform the LR model. The best DT models are interpreted and evaluated based on an engineering point of view. PMID:24489498

  16. Fast Image Texture Classification Using Decision Trees

    NASA Technical Reports Server (NTRS)

    Thompson, David R.

    2011-01-01

    Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation- hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.

  17. Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data

    PubMed Central

    Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.

    2016-01-01

    We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with the other ensemble methods, BET requires much fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides variable selection criterion and interpretation for each subset. We developed an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872

  18. Algorithms for optimal dyadic decision trees

    SciTech Connect

    Hush, Don; Porter, Reid

    2009-01-01

    A new algorithm for constructing optimal dyadic decision trees was recently introduced, analyzed, and shown to be very effective for low dimensional data sets. This paper enhances and extends this algorithm by: introducing an adaptive grid search for the regularization parameter that guarantees optimal solutions for all relevant trees sizes, revising the core tree-building algorithm so that its run time is substantially smaller for most regularization parameter values on the grid, and incorporating new data structures and data pre-processing steps that provide significant run time enhancement in practice.

  19. Prediction of regional streamflow frequency using model tree ensembles

    NASA Astrophysics Data System (ADS)

    Schnier, Spencer; Cai, Ximing

    2014-09-01

    This study introduces a novel data-driven method called model tree ensembles (MTEs) to predict streamflow frequency statistics based on known drainage area characteristics, which yields insights into the dominant controls of regional streamflow. The database used to induce the models contains both natural and anthropogenic drainage area characteristics for 294 USGS stream gages (164 in Texas and 130 in Illinois). MTEs were used to predict complete flow duration curves (FDCs) of ungaged streams by developing 17 models corresponding to 17 points along the FDC. Model accuracy was evaluated using ten-fold cross-validation and the coefficient of determination (R2). During the validation, the gages withheld from the analysis represent ungaged watersheds. MTEs are shown to outperform global multiple-linear regression models for predictions in ungaged watersheds. The accuracy of models for low flow is enhanced by explicit consideration of variables that capture human interference in watershed hydrology (e.g., population). Human factors (e.g., population and groundwater use) appear in the regionalizations for low flows, while annual and seasonal precipitation and drainage area are important for regionalizations of all flows. The results of this study have important implications for predictions in ungaged watersheds as well as gaged watersheds subject to anthropogenically-driven hydrologic changes.

  20. Decision Tree Modeling for Ranking Data

    NASA Astrophysics Data System (ADS)

    Yu, Philip L. H.; Wan, Wai Ming; Lee, Paul H.

    Ranking/preference data arises from many applications in marketing, psychology, and politics. We establish a new decision tree model for the analysis of ranking data by adopting the concept of classification and regression tree. The existing splitting criteria are modified in a way that allows them to precisely measure the impurity of a set of ranking data. Two types of impurity measures for ranking data are introduced, namelyg-wise and top-k measures. Theoretical results show that the new measures exhibit properties of impurity functions. In model assessment, the area under the ROC curve (AUC) is applied to evaluate the tree performance. Experiments are carried out to investigate the predictive performance of the tree model for complete and partially ranked data and promising results are obtained. Finally, a real-world application of the proposed methodology to analyze a set of political rankings data is presented.

  1. IND - THE IND DECISION TREE PACKAGE

    NASA Technical Reports Server (NTRS)

    Buntine, W.

    1994-01-01

    A common approach to supervised classification and prediction in artificial intelligence and statistical pattern recognition is the use of decision trees. A tree is "grown" from data using a recursive partitioning algorithm to create a tree which has good prediction of classes on new data. Standard algorithms are CART (by Breiman Friedman, Olshen and Stone) and ID3 and its successor C4 (by Quinlan). As well as reimplementing parts of these algorithms and offering experimental control suites, IND also introduces Bayesian and MML methods and more sophisticated search in growing trees. These produce more accurate class probability estimates that are important in applications like diagnosis. IND is applicable to most data sets consisting of independent instances, each described by a fixed length vector of attribute values. An attribute value may be a number, one of a set of attribute specific symbols, or it may be omitted. One of the attributes is delegated the "target" and IND grows trees to predict the target. Prediction can then be done on new data or the decision tree printed out for inspection. IND provides a range of features and styles with convenience for the casual user as well as fine-tuning for the advanced user or those interested in research. IND can be operated in a CART-like mode (but without regression trees, surrogate splits or multivariate splits), and in a mode like the early version of C4. Advanced features allow more extensive search, interactive control and display of tree growing, and Bayesian and MML algorithms for tree pruning and smoothing. These often produce more accurate class probability estimates at the leaves. IND also comes with a comprehensive experimental control suite. IND consists of four basic kinds of routines: data manipulation routines, tree generation routines, tree testing routines, and tree display routines. The data manipulation routines are used to partition a single large data set into smaller training and test sets. The

  2. Two Trees: Migrating Fault Trees to Decision Trees for Real Time Fault Detection on International Space Station

    NASA Technical Reports Server (NTRS)

    Lee, Charles; Alena, Richard L.; Robinson, Peter

    2004-01-01

    We started from ISS fault trees example to migrate to decision trees, presented a method to convert fault trees to decision trees. The method shows that the visualizations of root cause of fault are easier and the tree manipulating becomes more programmatic via available decision tree programs. The visualization of decision trees for the diagnostic shows a format of straight forward and easy understands. For ISS real time fault diagnostic, the status of the systems could be shown by mining the signals through the trees and see where it stops at. The other advantage to use decision trees is that the trees can learn the fault patterns and predict the future fault from the historic data. The learning is not only on the static data sets but also can be online, through accumulating the real time data sets, the decision trees can gain and store faults patterns in the trees and recognize them when they come.

  3. An Application of Decision Tree Based on ID3

    NASA Astrophysics Data System (ADS)

    Xiaohu, Wang; Lele, Wang; Nianfeng, Li

    This article deals with the application of classical decision tree ID3 of the data mining in a certain site data. It constitutes a decision tree based on information gain and thus produces some useful purchasing behavior rules. It also proves that the decision tree has a wide applicable future in the sale field on site.

  4. CUDT: A CUDA Based Decision Tree Algorithm

    PubMed Central

    Sheu, Ruey-Kai; Chiu, Chun-Chieh

    2014-01-01

    Decision tree is one of the famous classification methods in data mining. Many researches have been proposed, which were focusing on improving the performance of decision tree. However, those algorithms are developed and run on traditional distributed systems. Obviously the latency could not be improved while processing huge data generated by ubiquitous sensing node in the era without new technology help. In order to improve data processing latency in huge data mining, in this paper, we design and implement a new parallelized decision tree algorithm on a CUDA (compute unified device architecture), which is a GPGPU solution provided by NVIDIA. In the proposed system, CPU is responsible for flow control while the GPU is responsible for computation. We have conducted many experiments to evaluate system performance of CUDT and made a comparison with traditional CPU version. The results show that CUDT is 5∼55 times faster than Weka-j48 and is 18 times speedup than SPRINT for large data set. PMID:25140346

  5. Using Decision Trees for Comparing Pattern Recognition Feature Sets

    SciTech Connect

    Proctor, D D

    2005-08-18

    Determination of the best set of features has been acknowledged as one of the most difficult tasks in the pattern recognition process. In this report significance tests on the sort-ordered, sample-size normalized vote distribution of an ensemble of decision trees is introduced as a method of evaluating relative quality of feature sets. Alternative functional forms for feature sets are also examined. Associated standard deviations provide the means to evaluate the effect of the number of folds, the number of classifiers per fold, and the sample size on the resulting classifications. The method is applied to a problem for which a significant portion of the training set cannot be classified unambiguously.

  6. Using attribute behavior diversity to build accurate decision tree committees for microarray data.

    PubMed

    Han, Qian; Dong, Guozhu

    2012-08-01

    DNA microarrays (gene chips), frequently used in biological and medical studies, measure the expressions of thousands of genes per sample. Using microarray data to build accurate classifiers for diseases is an important task. This paper introduces an algorithm, called Committee of Decision Trees by Attribute Behavior Diversity (CABD), to build highly accurate ensembles of decision trees for such data. Since a committee's accuracy is greatly influenced by the diversity among its member classifiers, CABD uses two new ideas to "optimize" that diversity, namely (1) the concept of attribute behavior-based similarity between attributes, and (2) the concept of attribute usage diversity among trees. The ideas are effective for microarray data, since such data have many features and behavior similarity between genes can be high. Experiments on microarray data for six cancers show that CABD outperforms previous ensemble methods significantly and outperforms SVM, and show that the diversified features used by CABD's decision tree committee can be used to improve performance of other classifiers such as SVM. CABD has potential for other high-dimensional data, and its ideas may apply to ensembles of other classifier types. PMID:22809418

  7. Quantum Decision Trees and Semidefinite Programming.

    SciTech Connect

    Barnum, Howard; Saks, M.; Szegedy, M.

    2001-01-01

    We reformulate the notion of quantum query complexity in terms of inequalities and equations for a set of positive matrices, which we view as a quantum analogue of a decision tree. Using the new formulation we show that: 1. every quantum query algorithm needs to use at most n quantum bits in addition to the query register. 2. For any function f there is an algorithm that runs in polynomial time in terms the truth table of f and (for {var_epsilon} > 0) computes the {var_epsilon}-error quantum decision tree complexity of f. 3. Using the dual of our system we can treat lower bound methods on a uniform platform, which paves the way to their future comparison. In particular we describe Ambainis's bound in our framework. 4. The output condition on quantum algorithms used by Ambainis and others is not sufficient for an algorithm to compute a function with {var_epsilon}-bounded error: we show the existence of algorithms whose final entanglement matrix satisfies the condition, but for which the value of f cannot be determined from a quantum measurement on the accessible part of the computer.

  8. Identification of metabolic syndrome using decision tree analysis.

    PubMed

    Worachartcheewan, Apilak; Nantasenamat, Chanin; Isarankura-Na-Ayudhya, Chartchalerm; Pidetcha, Phannee; Prachayasittikul, Virapong

    2010-10-01

    This study employs decision tree as a decision support system for rapid and automated identification of individuals with metabolic syndrome (MS) among a Thai population. Results demonstrated strong predictivity of the decision tree in classification of individuals with and without MS, displaying an overall accuracy in excess of 99%. PMID:20619912

  9. Safety validation of decision trees for hepatocellular carcinoma

    PubMed Central

    Wang, Xian-Qiang; Liu, Zhe; Lv, Wen-Ping; Luo, Ying; Yang, Guang-Yun; Li, Chong-Hui; Meng, Xiang-Fei; Liu, Yang; Xu, Ke-Sen; Dong, Jia-Hong

    2015-01-01

    AIM: To evaluate a different decision tree for safe liver resection and verify its efficiency. METHODS: A total of 2457 patients underwent hepatic resection between January 2004 and December 2010 at the Chinese PLA General Hospital, and 634 hepatocellular carcinoma (HCC) patients were eligible for the final analyses. Post-hepatectomy liver failure (PHLF) was identified by the association of prothrombin time < 50% and serum bilirubin > 50 μmol/L (the “50-50” criteria), which were assessed at day 5 postoperatively or later. The Swiss-Clavien decision tree, Tokyo University-Makuuchi decision tree, and Chinese consensus decision tree were adopted to divide patients into two groups based on those decision trees in sequence, and the PHLF rates were recorded. RESULTS: The overall mortality and PHLF rate were 0.16% and 3.0%. A total of 19 patients experienced PHLF. The numbers of patients to whom the Swiss-Clavien, Tokyo University-Makuuchi, and Chinese consensus decision trees were applied were 581, 573, and 622, and the PHLF rates were 2.75%, 2.62%, and 2.73%, respectively. Significantly more cases satisfied the Chinese consensus decision tree than the Swiss-Clavien decision tree and Tokyo University-Makuuchi decision tree (P < 0.01,P < 0.01); nevertheless, the latter two shared no difference (P = 0.147). The PHLF rate exhibited no significant difference with respect to the three decision trees. CONCLUSION: The Chinese consensus decision tree expands the indications for hepatic resection for HCC patients and does not increase the PHLF rate compared to the Swiss-Clavien and Tokyo University-Makuuchi decision trees. It would be a safe and effective algorithm for hepatectomy in patients with hepatocellular carcinoma. PMID:26309366

  10. Coherent neuronal ensembles are rapidly recruited when making a look-reach decision

    PubMed Central

    Wong, Yan T.; Fabiszak, Margaret M.; Novikov, Yevgeny; Daw, Nathaniel D.; Pesaran, Bijan

    2015-01-01

    Summary Selecting and planning actions recruits neurons across many areas of the brain but how ensembles of neurons work together to make decisions is unknown. Temporally-coherent neural activity may provide a mechanism by which neurons coordinate their activity in order to make decisions. If so, neurons that are part of coherent ensembles may predict movement choices before other ensembles of neurons. We recorded neuronal activity in the lateral and medial banks of the intraparietal sulcus (IPS) of the posterior parietal cortex, while monkeys made choices about where to look and reach and decoded the activity to predict the choices. Ensembles of neurons that displayed coherent patterns of spiking activity extending across the IPS, “dual coherent” ensembles, predicted movement choices substantially earlier than other neuronal ensembles. We propose that dual-coherent spike timing reflects interactions between groups of neurons that play an important role in how we make decisions. PMID:26752158

  11. Ensembles of extremely randomized trees and feature ranking for streamflow prediction

    NASA Astrophysics Data System (ADS)

    Castelletti, Andrea; Galelli, Stefano

    2010-05-01

    Accurate and reliable stream-flow predictions are an important input to water resources planning and management processes, which heavily depend upon the availability of water (e.g. river basin planning, optimal reservoir operation, irrigation system management). Hydrological processes are extremely complex, combining high non-linearity and spatial-temporal variability. The prediction of hydrological variables is therefore a challenging task, very often complicated by lack of data and/or the presence of outliers. Usually, data-driven modelling provides a good balance between model accuracy and complexity, which are ultimately critical to the adoption of optimization-based approaches. While neural networks have been widely used in hydrological modelling (e.g. Govindaraju and Rao, 2000), tree-based model is a relatively unexplored methodology (Solomatine and Dual, 2003; Solomatine and Xue, 2004; Iorgulescu and Beven, 2004; Stravs and Brilly, 2007). In this paper a new data-driven modelling approach based on Ensembles of Extremely Randomized Trees (ETs; Geurts et al., 2006) is proposed for stream-flow prediction using different hydro-meteorological predictors. By randomizing the tree construction process and merging a forest of diversified trees to predict the output, ETs alleviate the well-known poor generalization property of traditional standalone decision tress (e.g. CART), thus avoid over fitting the training data. Input to the model are selected using a tree-based feature ranking algorithm, which ranks the candidate predictors (e.g. precipitation and evaporation at different stations, linear combinations thereof) according to their contribution in explaining the variance of an underlying ETs-based model of the stream-flow process. The approach is applied in the Red river basin (Vietnam), a sub-tropical catchment characterized by extremely variable weather conditions, where strong precipitations significantly contribute to the high flow. Results shown that

  12. Accounting for Epistemic Uncertainty in PSHA: Logic Tree and Ensemble Model

    NASA Astrophysics Data System (ADS)

    Taroni, M.; Marzocchi, W.; Selva, J.

    2014-12-01

    The logic tree scheme is the probabilistic framework that has been widely used in the last decades to take into account epistemic uncertainties in probabilistic seismic hazard analysis (PSHA). Notwithstanding the vital importance for PSHA to incorporate properly the epistemic uncertainties, we argue that the use of the logic tree in a PSHA context has conceptual and practical drawbacks. Despite some of these drawbacks have been reported in the past, a careful evaluation of their impact on PSHA is still lacking. This is the goal of the present work. In brief, we show that i) PSHA practice does not meet the assumptions that stand behind the logic tree scheme; ii) the output of a logic tree is often misinterpreted and/or misleading, e.g., the use of percentiles (median included) in a logic tree scheme raises theoretical difficulties from a probabilistic point of view; iii) in case the assumptions that stand behind a logic tree are actually met, this leads to several problems in testing any PSHA model. We suggest a different strategy - based on ensemble modeling - to account for epistemic uncertainties in a more proper probabilistic framework. Finally, we show that in many PSHA practical applications, the logic tree is de facto loosely applied to build sound ensemble models.

  13. Accounting for Epistemic Uncertainty in PSHA: Logic Tree and Ensemble Model

    NASA Astrophysics Data System (ADS)

    Taroni, Matteo; Marzocchi, Warner; Selva, Jacopo

    2014-05-01

    The logic tree scheme is the probabilistic framework that has been widely used in the last decades to take into account epistemic uncertainties in probabilistic seismic hazard analysis (PSHA). Notwithstanding the vital importance for PSHA to incorporate properly the epistemic uncertainties, we argue that the use of the logic tree in a PSHA context has conceptual and practical drawbacks. Despite some of these drawbacks have been reported in the past, a careful evaluation of their impact on PSHA is still lacking. This is the goal of the present work. In brief, we show that i) PSHA practice does not meet the assumptions that stand behind the logic tree scheme; ii) the output of a logic tree is often misinterpreted and/or misleading, e.g., the use of percentiles (median included) in a logic tree scheme raises theoretical difficulties from a probabilistic point of view; iii) in case the assumptions that stand behind a logic tree are actually met, this leads to several problems in testing any PSHA model. We suggest a different strategy - based on ensemble modeling - to account for epistemic uncertainties in a more proper probabilistic framework. Finally, we show that in many PSHA practical applications, the logic tree is improperly applied to build sound ensemble models.

  14. 15 CFR Supplement 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 15 Commerce and Foreign Trade 2 2011-01-01 2011-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000...

  15. 15 CFR Supplement No 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 15 Commerce and Foreign Trade 2 2013-01-01 2013-01-01 false Decision Tree No Supplement No 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued... THE EAR Pt. 732, Supp. 1 Supplement No 1 to Part 732—Decision Tree ER06FE04.000...

  16. 15 CFR Supplement No 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 15 Commerce and Foreign Trade 2 2014-01-01 2014-01-01 false Decision Tree No Supplement No 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued... THE EAR Pt. 732, Supp. 1 Supplement No 1 to Part 732—Decision Tree ER06FE04.000...

  17. 15 CFR Supplement 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 15 Commerce and Foreign Trade 2 2010-01-01 2010-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000...

  18. 15 CFR Supplement 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 15 Commerce and Foreign Trade 2 2012-01-01 2012-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000...

  19. Decision-Tree Formulation With Order-1 Lateral Execution

    NASA Technical Reports Server (NTRS)

    James, Mark

    2007-01-01

    A compact symbolic formulation enables mapping of an arbitrarily complex decision tree of a certain type into a highly computationally efficient multidimensional software object. The type of decision trees to which this formulation applies is that known in the art as the Boolean class of balanced decision trees. Parallel lateral slices of an object created by means of this formulation can be executed in constant time considerably less time than would otherwise be required. Decision trees of various forms are incorporated into almost all large software systems. A decision tree is a way of hierarchically solving a problem, proceeding through a set of true/false responses to a conclusion. By definition, a decision tree has a tree-like structure, wherein each internal node denotes a test on an attribute, each branch from an internal node represents an outcome of a test, and leaf nodes represent classes or class distributions that, in turn represent possible conclusions. The drawback of decision trees is that execution of them can be computationally expensive (and, hence, time-consuming) because each non-leaf node must be examined to determine whether to progress deeper into a tree structure or to examine an alternative. The present formulation was conceived as an efficient means of representing a decision tree and executing it in as little time as possible. The formulation involves the use of a set of symbolic algorithms to transform a decision tree into a multi-dimensional object, the rank of which equals the number of lateral non-leaf nodes. The tree can then be executed in constant time by means of an order-one table lookup. The sequence of operations performed by the algorithms is summarized as follows: 1. Determination of whether the tree under consideration can be encoded by means of this formulation. 2. Extraction of decision variables. 3. Symbolic optimization of the decision tree to minimize its form. 4. Expansion and transformation of all nested conjunctive

  20. Computational study of developing high-quality decision trees

    NASA Astrophysics Data System (ADS)

    Fu, Zhiwei

    2002-03-01

    Recently, decision tree algorithms have been widely used in dealing with data mining problems to find out valuable rules and patterns. However, scalability, accuracy and efficiency are significant concerns regarding how to effectively deal with large and complex data sets in the implementation. In this paper, we propose an innovative machine learning approach (we call our approach GAIT), combining genetic algorithm, statistical sampling, and decision tree, to develop intelligent decision trees that can alleviate some of these problems. We design our computational experiments and run GAIT on three different data sets (namely Socio- Olympic data, Westinghouse data, and FAA data) to test its performance against standard decision tree algorithm, neural network classifier, and statistical discriminant technique, respectively. The computational results show that our approach outperforms standard decision tree algorithm profoundly at lower sampling levels, and achieves significantly better results with less effort than both neural network and discriminant classifiers.

  1. An automated approach to the design of decision tree classifiers

    NASA Technical Reports Server (NTRS)

    Argentiero, P.; Chin, R.; Beaudet, P.

    1982-01-01

    An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.

  2. Decision tree methods: applications for classification and prediction.

    PubMed

    Song, Yan-Yan; Lu, Ying

    2015-04-25

    Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets. Using the training dataset to build a decision tree model and a validation dataset to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms used to develop decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure. PMID:26120265

  3. Operational optimization of irrigation scheduling for citrus trees using an ensemble based data assimilation approach

    NASA Astrophysics Data System (ADS)

    Hendricks Franssen, H.; Han, X.; Martinez, F.; Jimenez, M.; Manzano, J.; Chanzy, A.; Vereecken, H.

    2013-12-01

    Data assimilation (DA) techniques, like the local ensemble transform Kalman filter (LETKF) not only offer the opportunity to update model predictions by assimilating new measurement data in real time, but also provide an improved basis for real-time (DA-based) control. This study focuses on the optimization of real-time irrigation scheduling for fields of citrus trees near Picassent (Spain). For three selected fields the irrigation was optimized with DA-based control, and for other fields irrigation was optimized on the basis of a more traditional approach where reference evapotranspiration for citrus trees was estimated using the FAO-method. The performance of the two methods is compared for the year 2013. The DA-based real-time control approach is based on ensemble predictions of soil moisture profiles, using the Community Land Model (CLM). The uncertainty in the model predictions is introduced by feeding the model with weather predictions from an ensemble prediction system (EPS) and uncertain soil hydraulic parameters. The model predictions are updated daily by assimilating soil moisture data measured by capacitance probes. The measurement data are assimilated with help of LETKF. The irrigation need was calculated for each of the ensemble members, averaged, and logistic constraints (hydraulics, energy costs) were taken into account for the final assigning of irrigation in space and time. For the operational scheduling based on this approach only model states and no model parameters were updated by the model. Other, non-operational simulation experiments for the same period were carried out where (1) neither ensemble weather forecast nor DA were used (open loop), (2) Only ensemble weather forecast was used, (3) Only DA was used, (4) also soil hydraulic parameters were updated in data assimilation and (5) both soil hydraulic and plant specific parameters were updated. The FAO-based and DA-based real-time irrigation control are compared in terms of soil moisture

  4. Decision tree based transient stability method -- A case study

    SciTech Connect

    Wehenkel, L.; Pavella, M. . Inst. Montefiore); Euxibie, E.; Heilbronn, B. . Direction des Etudes et Recherches)

    1994-02-01

    The decision tree transient stability method is revisited via a case study carried out on the French EHV power system. In short, the method consists of building off-line decision trees, able to subsequently assess the system transient behavior in terms of precontingency parameters (or attributes'') of it, likely to drive the stability phenomena. This case study aims at investigating practical feasibility aspects and features of the trees, at enhancing their reliability to the extent possible, and at generalizing them. Feasibility aspects encompass data base generation, candidate attributes, stability classes; tree features concern in particular complexity in terms of their size and interpretability capabilities, robustness with respect to both their building and use. Reliability is enhanced by defining and exploiting pragmatic quality measures. Generalization concerns multicontingency, instead of single-contingency trees. The results obtained show real promise for the method to meet practical needs of electric power utilities.

  5. RNA search with decision trees and partial covariance models.

    PubMed

    Smith, Jennifer A

    2009-01-01

    The use of partial covariance models to search for RNA family members in genomic sequence databases is explored. The partial models are formed from contiguous subranges of the overall RNA family multiple alignment columns. A binary decision-tree framework is presented for choosing the order to apply the partial models and the score thresholds on which to make the decisions. The decision trees are chosen to minimize computation time subject to the constraint that all of the training sequences are passed to the full covariance model for final evaluation. Computational intelligence methods are suggested to select the decision tree since the tree can be quite complex and there is no obvious method to build the tree in these cases. Experimental results from seven RNA families shows execution times of 0.066-0.268 relative to using the full covariance model alone. Tests on the full sets of known sequences for each family show that at least 95 percent of these sequences are found for two families and 100 percent for five others. Since the full covariance model is run on all sequences accepted by the partial model decision tree, the false alarm rate is at least as low as that of the full model alone. PMID:19644178

  6. Learning accurate very fast decision trees from uncertain data streams

    NASA Astrophysics Data System (ADS)

    Liang, Chunquan; Zhang, Yang; Shi, Peng; Hu, Zhengguo

    2015-12-01

    Most existing works on data stream classification assume the streaming data is precise and definite. Such assumption, however, does not always hold in practice, since data uncertainty is ubiquitous in data stream applications due to imprecise measurement, missing values, privacy protection, etc. The goal of this paper is to learn accurate decision tree models from uncertain data streams for classification analysis. On the basis of very fast decision tree (VFDT) algorithms, we proposed an algorithm for constructing an uncertain VFDT tree with classifiers at tree leaves (uVFDTc). The uVFDTc algorithm can exploit uncertain information effectively and efficiently in both the learning and the classification phases. In the learning phase, it uses Hoeffding bound theory to learn from uncertain data streams and yield fast and reasonable decision trees. In the classification phase, at tree leaves it uses uncertain naive Bayes (UNB) classifiers to improve the classification performance. Experimental results on both synthetic and real-life datasets demonstrate the strong ability of uVFDTc to classify uncertain data streams. The use of UNB at tree leaves has improved the performance of uVFDTc, especially the any-time property, the benefit of exploiting uncertain information, and the robustness against uncertainty.

  7. EEG feature selection method based on decision tree.

    PubMed

    Duan, Lijuan; Ge, Hui; Ma, Wei; Miao, Jun

    2015-01-01

    This paper aims to solve automated feature selection problem in brain computer interface (BCI). In order to automate feature selection process, we proposed a novel EEG feature selection method based on decision tree (DT). During the electroencephalogram (EEG) signal processing, a feature extraction method based on principle component analysis (PCA) was used, and the selection process based on decision tree was performed by searching the feature space and automatically selecting optimal features. Considering that EEG signals are a series of non-linear signals, a generalized linear classifier named support vector machine (SVM) was chosen. In order to test the validity of the proposed method, we applied the EEG feature selection method based on decision tree to BCI Competition II datasets Ia, and the experiment showed encouraging results. PMID:26405856

  8. Automatic sleep staging using state machine-controlled decision trees.

    PubMed

    Imtiaz, Syed Anas; Rodriguez-Villegas, Esther

    2015-01-01

    Automatic sleep staging from a reduced number of channels is desirable to save time, reduce costs and make sleep monitoring more accessible by providing home-based polysomnography. This paper introduces a novel algorithm for automatic scoring of sleep stages using a combination of small decision trees driven by a state machine. The algorithm uses two channels of EEG for feature extraction and has a state machine that selects a suitable decision tree for classification based on the prevailing sleep stage. Its performance has been evaluated using the complete dataset of 61 recordings from PhysioNet Sleep EDF Expanded database achieving an overall accuracy of 82% and 79% on training and test sets respectively. The algorithm has been developed with a very small number of decision tree nodes that are active at any given time making it suitable for use in resource-constrained wearable systems. PMID:26736278

  9. Decision tree-based learning to predict patient controlled analgesia consumption and readjustment

    PubMed Central

    2012-01-01

    Background Appropriate postoperative pain management contributes to earlier mobilization, shorter hospitalization, and reduced cost. The under treatment of pain may impede short-term recovery and have a detrimental long-term effect on health. This study focuses on Patient Controlled Analgesia (PCA), which is a delivery system for pain medication. This study proposes and demonstrates how to use machine learning and data mining techniques to predict analgesic requirements and PCA readjustment. Methods The sample in this study included 1099 patients. Every patient was described by 280 attributes, including the class attribute. In addition to commonly studied demographic and physiological factors, this study emphasizes attributes related to PCA. We used decision tree-based learning algorithms to predict analgesic consumption and PCA control readjustment based on the first few hours of PCA medications. We also developed a nearest neighbor-based data cleaning method to alleviate the class-imbalance problem in PCA setting readjustment prediction. Results The prediction accuracies of total analgesic consumption (continuous dose and PCA dose) and PCA analgesic requirement (PCA dose only) by an ensemble of decision trees were 80.9% and 73.1%, respectively. Decision tree-based learning outperformed Artificial Neural Network, Support Vector Machine, Random Forest, Rotation Forest, and Naïve Bayesian classifiers in analgesic consumption prediction. The proposed data cleaning method improved the performance of every learning method in this study of PCA setting readjustment prediction. Comparative analysis identified the informative attributes from the data mining models and compared them with the correlates of analgesic requirement reported in previous works. Conclusion This study presents a real-world application of data mining to anesthesiology. Unlike previous research, this study considers a wider variety of predictive factors, including PCA demands over time. We analyzed

  10. An automated approach to the design of decision tree classifiers

    NASA Technical Reports Server (NTRS)

    Argentiero, P.; Chin, P.; Beaudet, P.

    1980-01-01

    The classification of large dimensional data sets arising from the merging of remote sensing data with more traditional forms of ancillary data is considered. Decision tree classification, a popular approach to the problem, is characterized by the property that samples are subjected to a sequence of decision rules before they are assigned to a unique class. An automated technique for effective decision tree design which relies only on apriori statistics is presented. This procedure utilizes a set of two dimensional canonical transforms and Bayes table look-up decision rules. An optimal design at each node is derived based on the associated decision table. A procedure for computing the global probability of correct classfication is also provided. An example is given in which class statistics obtained from an actual LANDSAT scene are used as input to the program. The resulting decision tree design has an associated probability of correct classification of .76 compared to the theoretically optimum .79 probability of correct classification associated with a full dimensional Bayes classifier. Recommendations for future research are included.

  11. Visualization method and tool for interactive learning of large decision trees

    NASA Astrophysics Data System (ADS)

    Nguyen, Trong Dung; Ho, TuBao

    2002-03-01

    When learning from large datasets, decision tree induction programs often produce very large trees. How to visualize efficiently trees in the learning process, particularly large trees, is still questionable and currently requires efficient tools. This paper presents a visualization method and tool for interactive learning of large decision trees, that includes a new visualization technique called T2.5D (stands for Tress 2.5 Dimensions). After a brief discussion on requirements for tree visualizers and related work, the paper focuses on presenting developing techniques for the issues (1) how to visualize efficiently large decision trees; and (2) how to visualize decision trees in the learning process.

  12. Reconciliation of Decision-Making Heuristics Based on Decision Trees Topologies and Incomplete Fuzzy Probabilities Sets

    PubMed Central

    Doubravsky, Karel; Dohnal, Mirko

    2015-01-01

    Complex decision making tasks of different natures, e.g. economics, safety engineering, ecology and biology, are based on vague, sparse, partially inconsistent and subjective knowledge. Moreover, decision making economists / engineers are usually not willing to invest too much time into study of complex formal theories. They require such decisions which can be (re)checked by human like common sense reasoning. One important problem related to realistic decision making tasks are incomplete data sets required by the chosen decision making algorithm. This paper presents a relatively simple algorithm how some missing III (input information items) can be generated using mainly decision tree topologies and integrated into incomplete data sets. The algorithm is based on an easy to understand heuristics, e.g. a longer decision tree sub-path is less probable. This heuristic can solve decision problems under total ignorance, i.e. the decision tree topology is the only information available. But in a practice, isolated information items e.g. some vaguely known probabilities (e.g. fuzzy probabilities) are usually available. It means that a realistic problem is analysed under partial ignorance. The proposed algorithm reconciles topology related heuristics and additional fuzzy sets using fuzzy linear programming. The case study, represented by a tree with six lotteries and one fuzzy probability, is presented in details. PMID:26158662

  13. Reconciliation of Decision-Making Heuristics Based on Decision Trees Topologies and Incomplete Fuzzy Probabilities Sets.

    PubMed

    Doubravsky, Karel; Dohnal, Mirko

    2015-01-01

    Complex decision making tasks of different natures, e.g. economics, safety engineering, ecology and biology, are based on vague, sparse, partially inconsistent and subjective knowledge. Moreover, decision making economists / engineers are usually not willing to invest too much time into study of complex formal theories. They require such decisions which can be (re)checked by human like common sense reasoning. One important problem related to realistic decision making tasks are incomplete data sets required by the chosen decision making algorithm. This paper presents a relatively simple algorithm how some missing III (input information items) can be generated using mainly decision tree topologies and integrated into incomplete data sets. The algorithm is based on an easy to understand heuristics, e.g. a longer decision tree sub-path is less probable. This heuristic can solve decision problems under total ignorance, i.e. the decision tree topology is the only information available. But in a practice, isolated information items e.g. some vaguely known probabilities (e.g. fuzzy probabilities) are usually available. It means that a realistic problem is analysed under partial ignorance. The proposed algorithm reconciles topology related heuristics and additional fuzzy sets using fuzzy linear programming. The case study, represented by a tree with six lotteries and one fuzzy probability, is presented in details. PMID:26158662

  14. Evaluation of Decision Trees for Cloud Detection from AVHRR Data

    NASA Technical Reports Server (NTRS)

    Shiffman, Smadar; Nemani, Ramakrishna

    2005-01-01

    Automated cloud detection and tracking is an important step in assessing changes in radiation budgets associated with global climate change via remote sensing. Data products based on satellite imagery are available to the scientific community for studying trends in the Earth's atmosphere. The data products include pixel-based cloud masks that assign cloud-cover classifications to pixels. Many cloud-mask algorithms have the form of decision trees. The decision trees employ sequential tests that scientists designed based on empirical astrophysics studies and simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In a previous study we compared automatically learned decision trees to cloud masks included in Advanced Very High Resolution Radiometer (AVHRR) data products from the year 2000. In this paper we report the replication of the study for five-year data, and for a gold standard based on surface observations performed by scientists at weather stations in the British Islands. For our sample data, the accuracy of automatically learned decision trees was greater than the accuracy of the cloud masks p < 0.001.

  15. Taste-Guided Decisions Differentially Engage Neuronal Ensembles across Gustatory Cortices

    PubMed Central

    MacDonald, Christopher J.; Meck, Warren H.; Simon, Sidney A.; Nicolelis, Miguel A.L.

    2009-01-01

    Much remains to be understood about the differential contributions from primary and secondary sensory cortices to sensory guided decision making. To address this issue we simultaneously recorded activity from neuronal ensembles in primary (gustatory cortex – GC) and secondary gustatory (orbitofrontal cortex – OFC) cortices while rats made a taste-guided decision between two response alternatives. We found that before animals commenced a response guided by a tastant cue, GC ensembles contained more information than OFC about the response alternative about to be selected. Thereafter, while the animal’s response was underway the response selective information in ensembles from both regions increased, albeit to a greater degree in OFC. In GC, this increase depends on a representation of the taste cue guiding the animal‘s response. The increase in the OFC also depends on the taste cue guiding and other features of the response such as its spatiomotor properties and the behavioral context under which it is executed. Each of these latter features is encoded by different ensembles of OFC neurons that are recruited at specific times throughout the response selection process. These results indicate that during a taste-guided decision task both primary and secondary gustatory cortices dynamically encode different types of information. PMID:19741134

  16. Supervised learning with decision tree-based methods in computational and systems biology.

    PubMed

    Geurts, Pierre; Irrthum, Alexandre; Wehenkel, Louis

    2009-12-01

    At the intersection between artificial intelligence and statistics, supervised learning allows algorithms to automatically build predictive models from just observations of a system. During the last twenty years, supervised learning has been a tool of choice to analyze the always increasing and complexifying data generated in the context of molecular biology, with successful applications in genome annotation, function prediction, or biomarker discovery. Among supervised learning methods, decision tree-based methods stand out as non parametric methods that have the unique feature of combining interpretability, efficiency, and, when used in ensembles of trees, excellent accuracy. The goal of this paper is to provide an accessible and comprehensive introduction to this class of methods. The first part of the review is devoted to an intuitive but complete description of decision tree-based methods and a discussion of their strengths and limitations with respect to other supervised learning methods. The second part of the review provides a survey of their applications in the context of computational and systems biology. PMID:20023720

  17. Using Evolutionary Algorithms to Induce Oblique Decision Trees

    SciTech Connect

    Cantu-Paz, E.; Kamath, C.

    2000-01-21

    This paper illustrates the application of evolutionary algorithms (EAs) to the problem of oblique decision tree induction. The objectives are to demonstrate that EAs can find classifiers whose accuracy is competitive with other oblique tree construction methods, and that this can be accomplished in a shorter time. Experiments were performed with a (1+1) evolutionary strategy and a simple genetic algorithm on public domain and artificial data sets. The empirical results suggest that the EAs quickly find Competitive classifiers, and that EAs scale up better than traditional methods to the dimensionality of the domain and the number of training instances.

  18. Multi-Model Long-Range Ensemble Forecast for Decision Support in Hydroelectric Operations

    NASA Astrophysics Data System (ADS)

    Kunkel, M. L.; Parkinson, S.; Blestrud, D.; Holbrook, V. P.

    2014-12-01

    Idaho Power Company (IPC) is a hydroelectric based utility serving over a million customers in southern Idaho and eastern Oregon. Hydropower makes up ~50% of our power generation and accurate predictions of streamflow and precipitation drive our long-term planning and decision support for operations. We investigate the use of a multi-model ensemble approach for mid and long-range streamflow and precipitation forecasts throughout the Snake River Basin. Forecast are prepared using an Idaho Power developed ensemble forecasting technique for 89 locations throughout the Snake River Basin for periods of 3 to 18 months in advance. A series of multivariable linear regression, multivariable non-linear regression and multivariable Kalman filter techniques are combined in an ensemble forecast based upon two data types, historical data (streamflow, precipitation, climate indices [i.e. PDO, ENSO, AO, etc…]) and single value decomposition derived values based upon atmospheric heights and sea surface temperatures.

  19. The Decision-Identification Tree: A New NEPA Scoping Tool.

    PubMed

    Eccleston

    2000-10-01

    / No single methodology has been universally accepted for determining the appropriate scope of analysis for an environmental impact statement (EIS). Most typically, the scope of analysis is determined by first identifying actions and facilities that need to be analyzed. Once the scope of actions and facilities is identified, the scope of impacts is determined. Yet agencies sometimes complete an EIS only to discover that the analysis does not adequately support decisions that need to be made. Such discrepancies can often be traced to disconnects between scoping, the subsequent analysis, and the final decision-making process that follows. A new and markedly different approach-decision-based scoping-provides an effective methodology for improving the EIS scoping process. Decision-based scoping, in conjunction with a new tool, the decision-identification tree (DIT), places emphasis on first identifying the potential decisions that may eventually need to be made. The DIT provides a methodology for mapping alternative courses of action as a function of fundamental decision points. Once these decision points have been correctly identified, the range of actions, alternatives, and impacts can be more accurately assessed; this approach can improve the effectiveness of EIS planning, while reducing the risk of future disconnects between the EIS analysis and reaching a final decision. This approach also has applications in other planning disciplines beyond that of the EIS. PMID:10954809

  20. Towards the assimilation of tree-ring-width records using ensemble Kalman filtering techniques

    NASA Astrophysics Data System (ADS)

    Acevedo, Walter; Reich, Sebastian; Cubasch, Ulrich

    2016-03-01

    This paper investigates the applicability of the Vaganov-Shashkin-Lite (VSL) forward model for tree-ring-width chronologies as observation operator within a proxy data assimilation (DA) setting. Based on the principle of limiting factors, VSL combines temperature and moisture time series in a nonlinear fashion to obtain simulated TRW chronologies. When used as observation operator, this modelling approach implies three compounding, challenging features: (1) time averaging, (2) "switching recording" of 2 variables and (3) bounded response windows leading to "thresholded response". We generate pseudo-TRW observations from a chaotic 2-scale dynamical system, used as a cartoon of the atmosphere-land system, and attempt to assimilate them via ensemble Kalman filtering techniques. Results within our simplified setting reveal that VSL's nonlinearities may lead to considerable loss of assimilation skill, as compared to the utilization of a time-averaged (TA) linear observation operator. In order to understand this undesired effect, we embed VSL's formulation into the framework of fuzzy logic (FL) theory, which thereby exposes multiple representations of the principle of limiting factors. DA experiments employing three alternative growth rate functions disclose a strong link between the lack of smoothness of the growth rate function and the loss of optimality in the estimate of the TA state. Accordingly, VSL's performance as observation operator can be enhanced by resorting to smoother FL representations of the principle of limiting factors. This finding fosters new interpretations of tree-ring-growth limitation processes.

  1. Identifying ultrasound and clinical features of breast cancer molecular subtypes by ensemble decision.

    PubMed

    Zhang, Lei; Li, Jing; Xiao, Yun; Cui, Hao; Du, Guoqing; Wang, Ying; Li, Ziyao; Wu, Tong; Li, Xia; Tian, Jiawei

    2015-01-01

    Breast cancer is molecularly heterogeneous and categorized into four molecular subtypes: Luminal-A, Luminal-B, HER2-amplified and Triple-negative. In this study, we aimed to apply an ensemble decision approach to identify the ultrasound and clinical features related to the molecular subtypes. We collected ultrasound and clinical features from 1,000 breast cancer patients and performed immunohistochemistry on these samples. We used the ensemble decision approach to select unique features and to construct decision models. The decision model for Luminal-A subtype was constructed based on the presence of an echogenic halo and post-acoustic shadowing or indifference. The decision model for Luminal-B subtype was constructed based on the absence of an echogenic halo and vascularity. The decision model for HER2-amplified subtype was constructed based on the presence of post-acoustic enhancement, calcification, vascularity and advanced age. The model for Triple-negative subtype followed two rules. One was based on irregular shape, lobulate margin contour, the absence of calcification and hypovascularity, whereas the other was based on oval shape, hypovascularity and micro-lobulate margin contour. The accuracies of the models were 83.8%, 77.4%, 87.9% and 92.7%, respectively. We identified specific features of each molecular subtype and expanded the scope of ultrasound for making diagnoses using these decision models. PMID:26046791

  2. Comparing the decision-relevance and utility of alternative ensembles of climate projections in water management and other applications

    NASA Astrophysics Data System (ADS)

    Lempert, R. J.; Tingstad, A.

    2015-12-01

    Decisions to manage the risks of climate change hinge, among many other things, on deeply uncertain and imperfect climate projections. Improving the decision relevance and utility of climate projections requires navigating a trade-off between increasing the physical realism of the model (often by improving the spatial resolution) and increasing the representation of decision-relevant uncertainties. This talk will examine the decision-relevance and utility of alternative ensembles of climate information by comparing two decision support applications, in water management and biodiversity perseveration, both in California. The climate ensembles will consist of different combinations of high and medium resolution projections from NARCCAP (North American Regional Climate Assessment Program) as well as low resolution, but more numerous, projections from the CMIP3 and CMIP5 ensembles. The decision support applications will use the same ensembles of climate projections in different contexts. Workshops with decision makers examine the extent to which the different ensembles lead to different decisions, the extent to which considering a wider range of uncertainty affects decisions, the extent to which decision makers' confidence in the projections and the decisions based on them will be sensitive to the resolution at which they are communicated and the resolution dependent skill, and how the answers to these questions varies with the water management and biodiversity contexts. This study aims to provide empirical evidence to support judgments on how best to use uncertainty climate information in water management and other decision support applications.

  3. Decision trees for denoising in H.264/AVC video sequences

    NASA Astrophysics Data System (ADS)

    Huchet, G.; Chouinard, J.-Y.; Wang, D.; Vincent, A.

    2008-01-01

    All existing video coding standards are based on block-wise motion compensation and block-wise DCT. At high levels of quantization, block-wise motion compensation and transform produces blocking artifacts in the decoded video, a form of distortion to which the human visual system is very sensitive. The latest video coding standard, H.264/AVC, introduces a deblocking filter to reduce the blocking artifacts. However, there is still visible distortion after the filtering when compared to the original video. In this paper, we propose a non-conventional filter to further reduce the distortion and to improve the decoded picture quality. Different from conventional filters, the proposed filter is based on a machine learning algorithm (decision tree). The decision trees are used to classify the filter's inputs and select the best filter coeffcients for the inputs. Experimental results with 4 × 4 DCT indicate that using the filter holds promise in improving the quality of H.264/AVC video sequences.

  4. The xeroderma pigmentosum pathway: decision tree analysis of DNA quality.

    PubMed

    Naegeli, Hanspeter; Sugasawa, Kaoru

    2011-07-15

    The nucleotide excision repair (NER) system is a fundamental cellular stress response that uses only a handful of DNA binding factors, mutated in the cancer-prone syndrome xeroderma pigmentosum (XP), to detect an astounding diversity of bulky base lesions, including those induced by ultraviolet light, electrophilic chemicals, oxygen radicals and further genetic insults. Several of these XP proteins are characterized by a mediocre preference for damaged substrates over the native double helix but, intriguingly, none of them recognizes injured bases with sufficient selectivity to account for the very high precision of bulky lesion excision. Instead, substrate versatility as well as damage specificity and strand selectivity are achieved by a multistage quality control strategy whereby different subunits of the XP pathway, in succession, interrogate the DNA double helix for a distinct abnormality in its structural or dynamic parameters. Through this step-by-step filtering procedure, the XP proteins operate like a systematic decision making tool, generally known as decision tree analysis, to sort out rare damaged bases embedded in a vast excess of native DNA. The present review is focused on the mechanisms by which multiple XP subunits of the NER pathway contribute to the proposed decision tree analysis of DNA quality in eukaryotic cells. PMID:21684221

  5. Finding the right decision tree's induction strategy for a hard real world problem.

    PubMed

    Zorman, M; Podgorelec, V; Kokol, P; Peterson, M; Sprogar, M; Ojstersek, M

    2001-09-01

    Decision trees have been already successfully used in medicine, but as in traditional statistics, some hard real world problems can not be solved successfully using the traditional way of induction. In our experiments we tested various methods for building univariate decision trees in order to find the best induction strategy. On a hard real world problem of the Orthopaedic fracture data with 2637 cases, described by 23 attributes and a decision with three possible values, we built decision trees with four classical approaches, one hybrid approach where we combined neural networks and decision trees, and with an evolutionary approach. The results show that all approaches had problems with either accuracy, sensitivity, or decision tree size. The comparison shows that the best compromise in hard real world problem decision trees building is the evolutionary approach. PMID:11518670

  6. Topographic controls on overland flow generation in a forest - An ensemble tree approach

    NASA Astrophysics Data System (ADS)

    Loos, Martin; Elsenbeer, Helmut

    2011-10-01

    SummaryOverland flow is an important hydrological pathway in many forests of the humid tropics. Its generation is subject to topographic controls at differing spatial scales. Our objective was to identify such controls on the occurrence of overland flow in a lowland tropical rainforest. To this end, we installed 95 overland flow detectors (OFDs) in four nested subcatchments of the Lutzito catchment on Barro Colorado Island, Panama, and monitored the frequency of overland flow occurrence during 18 rainfall events at each OFD location temporal frequency. For each such location, we derived three non-digital terrain attributes and 17 digital ones, of which 15 were based on Digital Elevation Models (DEMs) of three different resolutions. These attributes then served as input into a Random Forest ensemble tree model to elucidate the importance and partial and joint dependencies of topographic controls for overland flow occurrence. Lutzito features a high median temporal frequency in overland flow occurrence of 0.421 among OFD locations. However, spatial temporal frequencies of overland flow occurrence vary strongly among these locations and the subcatchments of Lutzito catchment. This variability is best explained by (1) microtopography, (2) coarse terrain sloping and (3) various measures of distance-to-channel, with the contribution of all other terrain attributes being small. Microtopographic features such as concentrated flowlines and wash areas produce highest temporal frequencies, whereas the occurrence of overland flow drops sharply for flow distances and terrain sloping beyond certain threshold values. Our study contributes to understanding both the spatial controls on overland flow generation and the limitations of terrain attributes for the spatially explicit prediction of overland flow frequencies.

  7. An efficient tree classifier ensemble-based approach for pedestrian detection.

    PubMed

    Xu, Yanwu; Cao, Xianbin; Qiao, Hong

    2011-02-01

    Classification-based pedestrian detection systems (PDSs) are currently a hot research topic in the field of intelligent transportation. A PDS detects pedestrians in real time on moving vehicles. A practical PDS demands not only high detection accuracy but also high detection speed. However, most of the existing classification-based approaches mainly seek for high detection accuracy, while the detection speed is not purposely optimized for practical application. At the same time, the performance, particularly the speed, is primarily tuned based on experiments without theoretical foundations, leading to a long training procedure. This paper starts with measuring and optimizing detection speed, and then a practical classification-based pedestrian detection solution with high detection speed and training speed is described. First, an extended classification/detection speed metric, named feature-per-object (fpo), is proposed to measure the detection speed independently from execution. Then, an fpo minimization model with accuracy constraints is formulated based on a tree classifier ensemble, where the minimum fpo can guarantee the highest detection speed. Finally, the minimization problem is solved efficiently by using nonlinear fitting based on radial basis function neural networks. In addition, the optimal solution is directly used to instruct classifier training; thus, the training speed could be accelerated greatly. Therefore, a rapid and accurate classification-based detection technique is proposed for the PDS. Experimental results on urban traffic videos show that the proposed method has a high detection speed with an acceptable detection rate and a false-alarm rate for onboard detection; moreover, the training procedure is also very fast. PMID:20457550

  8. Decision Tree, Bagging and Random Forest methods detect TEC seismo-ionospheric anomalies around the time of the Chile, (Mw = 8.8) earthquake of 27 February 2010

    NASA Astrophysics Data System (ADS)

    Akhoondzadeh, Mehdi

    2016-06-01

    In this paper for the first time ensemble methods including Decision Tree, Bagging and Random Forest have been proposed in the field of earthquake precursors to detect GPS-TEC (Total Electron Content) seismo-ionospheric anomalies around the time and location of Chile earthquake of 27 February 2010. All of the implemented ensemble methods detected a striking anomaly in time series of TEC data, 1 day after the earthquake at 14:00 UTC. The results indicate that the proposed methods due to their performance, speed and simplicity are quite promising and deserve serious attention as a new predictor tools for seismo-ionospheric anomalies detection.

  9. Toward the Decision Tree for Inferring Requirements Maturation Types

    NASA Astrophysics Data System (ADS)

    Nakatani, Takako; Kondo, Narihito; Shirogane, Junko; Kaiya, Haruhiko; Hori, Shozo; Katamine, Keiichi

    Requirements are elicited step by step during the requirements engineering (RE) process. However, some types of requirements are elicited completely after the scheduled requirements elicitation process is finished. Such a situation is regarded as problematic situation. In our study, the difficulties of eliciting various kinds of requirements is observed by components. We refer to the components as observation targets (OTs) and introduce the word “Requirements maturation.” It means when and how requirements are elicited completely in the project. The requirements maturation is discussed on physical and logical OTs. OTs Viewed from a logical viewpoint are called logical OTs, e.g. quality requirements. The requirements of physical OTs, e.g., modules, components, subsystems, etc., includes functional and non-functional requirements. They are influenced by their requesters' environmental changes, as well as developers' technical changes. In order to infer the requirements maturation period of each OT, we need to know how much these factors influence the OTs' requirements maturation. According to the observation of actual past projects, we defined the PRINCE (Pre Requirements Intelligence Net Consideration and Evaluation) model. It aims to guide developers in their observation of the requirements maturation of OTs. We quantitatively analyzed the actual cases with their requirements elicitation process and extracted essential factors that influence the requirements maturation. The results of interviews of project managers are analyzed by WEKA, a data mining system, from which the decision tree was derived. This paper introduces the PRINCE model and the category of logical OTs to be observed. The decision tree that helps developers infer the maturation type of an OT is also described. We evaluate the tree through real projects and discuss its ability to infer the requirements maturation types.

  10. Using decision trees to understand structure in missing data

    PubMed Central

    Tierney, Nicholas J; Harden, Fiona A; Harden, Maurice J; Mengersen, Kerrie L

    2015-01-01

    Objectives Demonstrate the application of decision trees—classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs)—to understand structure in missing data. Setting Data taken from employees at 3 different industrial sites in Australia. Participants 7915 observations were included. Materials and methods The approach was evaluated using an occupational health data set comprising results of questionnaires, medical tests and environmental monitoring. Statistical methods included standard statistical tests and the ‘rpart’ and ‘gbm’ packages for CART and BRT analyses, respectively, from the statistical software ‘R’. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results CART and BRT models were effective in highlighting a missingness structure in the data, related to the type of data (medical or environmental), the site in which it was collected, the number of visits, and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured as compared to structured missingness. Discussion Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Conclusions Researchers are encouraged to use CART and BRT models to explore and understand missing data. PMID:26124509

  11. DECISION TREE CLASSIFIERS FOR STAR/GALAXY SEPARATION

    SciTech Connect

    Vasconcellos, E. C.; Ruiz, R. S. R.; De Carvalho, R. R.; Capelato, H. V.; Gal, R. R.; LaBarbera, F. L.; Frago Campos Velho, H.; Trevisan, M.

    2011-06-15

    We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: 14 {<=} r {<=} 21 (85.2%) and r {>=} 19 (82.1%). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT, and Ball et al. We find that our FT classifier is comparable to or better in completeness over the full magnitude range 15 {<=} r {<=} 21, with much lower contamination than all but the Ball et al. classifier. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (>80%) while simultaneously achieving low contamination ({approx}2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 {<=} r {<=} 21.

  12. A Novel Approach on Designing Augmented Fuzzy Cognitive Maps Using Fuzzified Decision Trees

    NASA Astrophysics Data System (ADS)

    Papageorgiou, Elpiniki I.

    This paper proposes a new methodology for designing Fuzzy Cognitive Maps using crisp decision trees that have been fuzzified. Fuzzy cognitive map is a knowledge-based technique that works as an artificial cognitive network inheriting the main aspects of cognitive maps and artificial neural networks. Decision trees, in the other hand, are well known intelligent techniques that extract rules from both symbolic and numeric data. Fuzzy theoretical techniques are used to fuzzify crisp decision trees in order to soften decision boundaries at decision nodes inherent in this type of trees. Comparisons between crisp decision trees and the fuzzified decision trees suggest that the later fuzzy tree is significantly more robust and produces a more balanced decision making. The approach proposed in this paper could incorporate any type of fuzzy decision trees. Through this methodology, new linguistic weights were determined in FCM model, thus producing augmented FCM tool. The framework is consisted of a new fuzzy algorithm to generate linguistic weights that describe the cause-effect relationships among the concepts of the FCM model, from induced fuzzy decision trees.

  13. Ethical decision-making made easier. The use of decision trees in case management.

    PubMed

    Storl, H; DuBois, B; Seline, J

    1999-01-01

    Case managers have never before faced the multitude of difficult ethical dilemmas that now confront them daily. Legal, medical, social, and ethical considerations often fly in the face of previously reliable intuitions. The importance and urgency of facing these dilemmas head-on has resulted in clear calls for action. What are the appropriate legal, ethical, and professional parameters for effective decision making? Are normatively sensitive, but also practically sensible protocols possible? In an effort to address these concerns, Alternatives for the Older Adult, Inc., Rock Island, Illinois established an ethics committee to look into possible means of resolving or dissolving commonly occurring dilemmas. As a result of year-long deliberations, the committee formulated a decision-making strategy whose central apparatus is the decision tree--a flowchart of reasonable decisions and their consequent implications. In this article, we explore the development of this approach as well as the theory that underlies it. PMID:10695172

  14. Decision-relevant early-warning thresholds for ensemble flood forecasting systems

    NASA Astrophysics Data System (ADS)

    Stephens, Liz; Pappenberger, Florian; Cloke, Hannah; Alfieri, Lorenzo

    2014-05-01

    Over and under warning of potential future floods is problematic for decision-making, and could ultimately lead to trust being lost in the forecasts. The use of ensemble flood forecasting systems for early warning therefore requires a consideration of how to determine and implement decision-relevant thresholds for flood magnitude and probability. This study uses a year's worth of hindcasts from the Global Flood Awareness System (GloFAS) to explore the sensitivity of the warning system to the choice of threshold. We use a number of different methods for choosing these thresholds, building on current approaches that use model climatologies to determine the critical flow magnitudes, to those that can provide 'first guesses' of potential impacts (through integration with global-scale inundation mapping), as well as methods that could incorporate resource limitations.

  15. Applying an Ensemble Classification Tree Approach to the Prediction of Completion of a 12-Step Facilitation Intervention with Stimulant Abusers

    PubMed Central

    Doyle, Suzanne R.; Donovan, Dennis M.

    2014-01-01

    Aims The purpose of this study was to explore the selection of predictor variables in the evaluation of drug treatment completion using an ensemble approach with classification trees. The basic methodology is reviewed and the subagging procedure of random subsampling is applied. Methods Among 234 individuals with stimulant use disorders randomized to a 12-Step facilitative intervention shown to increase stimulant use abstinence, 67.52% were classified as treatment completers. A total of 122 baseline variables were used to identify factors associated with completion. Findings The number of types of self-help activity involvement prior to treatment was the predominant predictor. Other effective predictors included better coping self-efficacy for substance use in high-risk situations, more days of prior meeting attendance, greater acceptance of the Disease model, higher confidence for not resuming use following discharge, lower ASI Drug and Alcohol composite scores, negative urine screens for cocaine or marijuana, and fewer employment problems. Conclusions The application of an ensemble subsampling regression tree method utilizes the fact that classification trees are unstable but, on average, produce an improved prediction of the completion of drug abuse treatment. The results support the notion there are early indicators of treatment completion that may allow for modification of approaches more tailored to fitting the needs of individuals and potentially provide more successful treatment engagement and improved outcomes. PMID:25134038

  16. Applying an ensemble classification tree approach to the prediction of completion of a 12-step facilitation intervention with stimulant abusers.

    PubMed

    Doyle, Suzanne R; Donovan, Dennis M

    2014-12-01

    The purpose of this study was to explore the selection of predictor variables in the evaluation of drug treatment completion using an ensemble approach with classification trees. The basic methodology is reviewed, and the subagging procedure of random subsampling is applied. Among 234 individuals with stimulant use disorders randomized to a 12-step facilitative intervention shown to increase stimulant use abstinence, 67.52% were classified as treatment completers. A total of 122 baseline variables were used to identify factors associated with completion. The number of types of self-help activity involvement prior to treatment was the predominant predictor. Other effective predictors included better coping self-efficacy for substance use in high-risk situations, more days of prior meeting attendance, greater acceptance of the Disease model, higher confidence for not resuming use following discharge, lower Addiction Severity Index (ASI) Drug and Alcohol composite scores, negative urine screens for cocaine or marijuana, and fewer employment problems. The application of an ensemble subsampling regression tree method utilizes the fact that classification trees are unstable but, on average, produce an improved prediction of the completion of drug abuse treatment. The results support the notion there are early indicators of treatment completion that may allow for modification of approaches more tailored to fitting the needs of individuals and potentially provide more successful treatment engagement and improved outcomes. PMID:25134038

  17. Ensemble empirical mode decomposition as a tool of lake sediments and tree-ring width chronologies investigation

    NASA Astrophysics Data System (ADS)

    Ovchinnikov, Dmitriy; Mordvinov, Alexandr; Kalugin, Ivan; Darin, Andrey; Myglan, Vladimir

    2014-05-01

    A method named ensemble empirical mode decomposition (EEMD) was used to analyse different paleoclimatic data such as non-varved lake sediments of the Teletskoye lake and long tree-ring width chronologies from the Altai region (Altai Mountains, South Siberia, Russia) in the late Holocene (2000 years). Core of the bottom sediments from the Teletskoe lake (Altai Mountains) were investigated using scanning X-ray fluorescent analysis method with synchrotron radiation (spatial resolution is 0.1 mm). Low-frequency signals (modes) were extracted from both paleoarchives and shown: ~ 60, ~ 100, ~ 200, ~ 300-500 and ~1000-year cycles in the Teletskoye lake; ~ 25-33, ~ 50-60, ~100- 200, ~ 300 and ~ 1000 year cycles in tree-ring width chronologies. A common 200-year cycle was found in both archives. Also EEMD method was used to analyse a solar-activity during late Holocene. The magnetic solar activity well associated with tree-ring width chronologies. Changes of the tree-ring width chronology on the millennial time scale coincide with similar changes of the solar activity in the Holocene. Stable relationships between solar activity and climate characteristics are found on 100-200 years time scales (Glaysberg and Suess cycles). The magnetic solar activity and paleotemperature changes are observed as solar-terrestrial relations on a large time scale. It is indicate that the temperature increase in the 19-20 centuries is largely due to the impact of solar activity on the Earth's climate system. Solar-terrestrial relations analysis shown common 200-year cycle in all presented paleoarchives. The study was funded by: Interdisciplinary Integration Project SB RAS # 34 and grants # 13-05-00620 from the Russian Foundation for Basic Research. Key words: ensemble empirical mode decomposition (EEMD), lake sediments, tree-ring width chronologies, solar-terrestrial relations

  18. Prediction model based on decision tree analysis for laccase mediators.

    PubMed

    Medina, Fabiola; Aguila, Sergio; Baratto, Maria Camilla; Martorana, Andrea; Basosi, Riccardo; Alderete, Joel B; Vazquez-Duhalt, Rafael

    2013-01-10

    A Structure Activity Relationship (SAR) study for laccase mediator systems was performed in order to correctly classify different natural phenolic mediators. Decision tree (DT) classification models with a set of five quantum-chemical calculated molecular descriptors were used. These descriptors included redox potential (ɛ°), ionization energy (E(i)), pK(a), enthalpy of formation of radical (Δ(f)H), and OH bond dissociation energy (D(O-H)). The rationale for selecting these descriptors is derived from the laccase-mediator mechanism. To validate the DT predictions, the kinetic constants of different compounds as laccase substrates, their ability for pesticide transformation as laccase-mediators, and radical stability were experimentally determined using Coriolopsis gallica laccase and the pesticide dichlorophen. The prediction capability of the DT model based on three proposed descriptors showed a complete agreement with the obtained experimental results. PMID:23199741

  19. Classification of Liss IV Imagery Using Decision Tree Methods

    NASA Astrophysics Data System (ADS)

    Verma, Amit Kumar; Garg, P. K.; Prasad, K. S. Hari; Dadhwal, V. K.

    2016-06-01

    Image classification is a compulsory step in any remote sensing research. Classification uses the spectral information represented by the digital numbers in one or more spectral bands and attempts to classify each individual pixel based on this spectral information. Crop classification is the main concern of remote sensing applications for developing sustainable agriculture system. Vegetation indices computed from satellite images gives a good indication of the presence of vegetation. It is an indicator that describes the greenness, density and health of vegetation. Texture is also an important characteristics which is used to identifying objects or region of interest is an image. This paper illustrate the use of decision tree method to classify the land in to crop land and non-crop land and to classify different crops. In this paper we evaluate the possibility of crop classification using an integrated approach methods based on texture property with different vegetation indices for single date LISS IV sensor 5.8 meter high spatial resolution data. Eleven vegetation indices (NDVI, DVI, GEMI, GNDVI, MSAVI2, NDWI, NG, NR, NNIR, OSAVI and VI green) has been generated using green, red and NIR band and then image is classified using decision tree method. The other approach is used integration of texture feature (mean, variance, kurtosis and skewness) with these vegetation indices. A comparison has been done between these two methods. The results indicate that inclusion of textural feature with vegetation indices can be effectively implemented to produce classifiedmaps with 8.33% higher accuracy for Indian satellite IRS-P6, LISS IV sensor images.

  20. The value of decision tree analysis in planning anaesthetic care in obstetrics.

    PubMed

    Bamber, J H; Evans, S A

    2016-08-01

    The use of decision tree analysis is discussed in the context of the anaesthetic and obstetric management of a young pregnant woman with joint hypermobility syndrome with a history of insensitivity to local anaesthesia and a previous difficult intubation due to a tongue tumour. The multidisciplinary clinical decision process resulted in the woman being delivered without complication by elective caesarean section under general anaesthesia after an awake fibreoptic intubation. The decision process used is reviewed and compared retrospectively to a decision tree analytical approach. The benefits and limitations of using decision tree analysis are reviewed and its application in obstetric anaesthesia is discussed. PMID:27026589

  1. Classification of Subcellular Phenotype Images by Decision Templates for Classifier Ensemble

    NASA Astrophysics Data System (ADS)

    Zhang, Bailing

    2010-01-01

    Subcellular localization is a key functional characteristic of proteins. An automatic, reliable and efficient prediction system for protein subcellular localization is needed for large-scale genome analysis. The automated cell phenotype image classification problem is an interesting "bioimage informatics" application. It can be used for establishing knowledge of the spatial distribution of proteins within living cells and permits to screen systems for drug discovery or for early diagnosis of a disease. In this paper, three well-known texture feature extraction methods including local binary patterns (LBP), Gabor filtering and Gray Level Coocurrence Matrix (GLCM) have been applied to cell phenotype images and the multiple layer perceptron (MLP) method has been used to classify cell phenotype image. After classification of the extracted features, decision-templates ensemble algorithm (DT) is used to combine base classifiers built on the different feature sets. Different texture feature sets can provide sufficient diversity among base classifiers, which is known as a necessary condition for improvement in ensemble performance. For the HeLa cells, the human classification error rate on this task is of 17% as reported in previous publications. We obtain with our method an error rate of 4.8%.

  2. Decision-Tree Models of Categorization Response Times, Choice Proportions, and Typicality Judgments

    ERIC Educational Resources Information Center

    Lafond, Daniel; Lacouture, Yves; Cohen, Andrew L.

    2009-01-01

    The authors present 3 decision-tree models of categorization adapted from T. Trabasso, H. Rollins, and E. Shaughnessy (1971) and use them to provide a quantitative account of categorization response times, choice proportions, and typicality judgments at the individual-participant level. In Experiment 1, the decision-tree models were fit to…

  3. An Improved Decision Tree for Predicting a Major Product in Competing Reactions

    ERIC Educational Resources Information Center

    Graham, Kate J.

    2014-01-01

    When organic chemistry students encounter competing reactions, they are often overwhelmed by the task of evaluating multiple factors that affect the outcome of a reaction. The use of a decision tree is a useful tool to teach students to evaluate a complex situation and propose a likely outcome. Specifically, a decision tree can help students…

  4. Towards global empirical upscaling of FLUXNET eddy covariance observations: validation of a model tree ensemble approach using a biosphere model

    NASA Astrophysics Data System (ADS)

    Jung, M.; Reichstein, M.; Bondeau, A.

    2009-10-01

    Global, spatially and temporally explicit estimates of carbon and water fluxes derived from empirical up-scaling eddy covariance measurements would constitute a new and possibly powerful data stream to study the variability of the global terrestrial carbon and water cycle. This paper introduces and validates a machine learning approach dedicated to the upscaling of observations from the current global network of eddy covariance towers (FLUXNET). We present a new model TRee Induction ALgorithm (TRIAL) that performs hierarchical stratification of the data set into units where particular multiple regressions for a target variable hold. We propose an ensemble approach (Evolving tRees with RandOm gRowth, ERROR) where the base learning algorithm is perturbed in order to gain a diverse sequence of different model trees which evolves over time. We evaluate the efficiency of the model tree ensemble (MTE) approach using an artificial data set derived from the Lund-Potsdam-Jena managed Land (LPJmL) biosphere model. We aim at reproducing global monthly gross primary production as simulated by LPJmL from 1998-2005 using only locations and months where high quality FLUXNET data exist for the training of the model trees. The model trees are trained with the LPJmL land cover and meteorological input data, climate data, and the fraction of absorbed photosynthetic active radiation simulated by LPJmL. Given that we know the "true result" in the form of global LPJmL simulations we can effectively study the performance of the MTE upscaling and associated problems of extrapolation capacity. We show that MTE is able to explain 92% of the variability of the global LPJmL GPP simulations. The mean spatial pattern and the seasonal variability of GPP that constitute the largest sources of variance are very well reproduced (96% and 94% of variance explained respectively) while the monthly interannual anomalies which occupy much less variance are less well matched (41% of variance explained

  5. Decision trees and decision committee applied to star/galaxy separation problem

    NASA Astrophysics Data System (ADS)

    Vasconcellos, Eduardo Charles

    Vasconcellos et al [1] study the efficiency of 13 diferente decision tree algorithms applied to photometric data in the Sloan Digital Sky Digital Survey Data Release Seven (SDSS-DR7) to perform star/galaxy separation. Each algorithm is defined by a set fo parameters which, when varied, produce diferente final classifications trees. In that work we extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. We find that Functional Tree algorithm (FT) yields the best results by the mean completeness function (galaxy true positive rate) in two magnitude intervals:14<=r<=21 (85.2%) and r>=19 (82.1%). We compare FT classification to the SDSS parametric, 2DPHOT and Ball et al (2006) classifications. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (>80%) while simultaneously achieving low contamination ( 2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 <= r <= 21. We now study the performance of a decision committee composed by FT classifiers. We will train six FT classifiers with random selected objects from the same 884,126 SDSS-DR7 objects with spectroscopic data that we use before. Both, the decision commitee and our previous single FT classifier will be applied to the new ojects from SDSS data releses eight, nine and ten. Finally we will compare peformances of both methods in this new data set. [1] Vasconcellos, E. C.; de Carvalho, R. R.; Gal, R. R.; LaBarbera, F. L.; Capelato, H. V.; Fraga Campos Velho, H.; Trevisan, M.; Ruiz, R. S. R

  6. The decision tree classifier - Design and potential. [for Landsat-1 data

    NASA Technical Reports Server (NTRS)

    Hauska, H.; Swain, P. H.

    1975-01-01

    A new classifier has been developed for the computerized analysis of remote sensor data. The decision tree classifier is essentially a maximum likelihood classifier using multistage decision logic. It is characterized by the fact that an unknown sample can be classified into a class using one or several decision functions in a successive manner. The classifier is applied to the analysis of data sensed by Landsat-1 over Kenosha Pass, Colorado. The classifier is illustrated by a tree diagram which for processing purposes is encoded as a string of symbols such that there is a unique one-to-one relationship between string and decision tree.

  7. Ensemble learning with trees and rules: supervised, semi-supervised, unsupervised

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In this article, we propose several new approaches for post processing a large ensemble of conjunctive rules for supervised and semi-supervised learning problems. We show with various examples that for high dimensional regression problems the models constructed by the post processing the rules with ...

  8. Accurate estimation of retinal vessel width using bagged decision trees and an extended multiresolution Hermite model.

    PubMed

    Lupaşcu, Carmen Alina; Tegolo, Domenico; Trucco, Emanuele

    2013-12-01

    We present an algorithm estimating the width of retinal vessels in fundus camera images. The algorithm uses a novel parametric surface model of the cross-sectional intensities of vessels, and ensembles of bagged decision trees to estimate the local width from the parameters of the best-fit surface. We report comparative tests with REVIEW, currently the public database of reference for retinal width estimation, containing 16 images with 193 annotated vessel segments and 5066 profile points annotated manually by three independent experts. Comparative tests are reported also with our own set of 378 vessel widths selected sparsely in 38 images from the Tayside Scotland diabetic retinopathy screening programme and annotated manually by two clinicians. We obtain considerably better accuracies compared to leading methods in REVIEW tests and in Tayside tests. An important advantage of our method is its stability (success rate, i.e., meaningful measurement returned, of 100% on all REVIEW data sets and on the Tayside data set) compared to a variety of methods from the literature. We also find that results depend crucially on testing data and conditions, and discuss criteria for selecting a training set yielding optimal accuracy. PMID:24001930

  9. ArborZ: Photometric Redshifts Using Boosted Decision Trees

    NASA Astrophysics Data System (ADS)

    Gerdes, David W.; Sypniewski, Adam J.; McKay, Timothy A.; Hao, Jiangang; Weis, Matthew R.; Wechsler, Risa H.; Busha, Michael T.

    2010-06-01

    Precision photometric redshifts will be essential for extracting cosmological parameters from the next generation of wide-area imaging surveys. In this paper, we introduce a photometric redshift algorithm, ArborZ, based on the machine-learning technique of boosted decision trees. We study the algorithm using galaxies from the Sloan Digital Sky Survey (SDSS) and from mock catalogs intended to simulate both the SDSS and the upcoming Dark Energy Survey. We show that it improves upon the performance of existing algorithms. Moreover, the method naturally leads to the reconstruction of a full probability density function (PDF) for the photometric redshift of each galaxy, not merely a single "best estimate" and error, and also provides a photo-z quality figure of merit for each galaxy that can be used to reject outliers. We show that the stacked PDFs yield a more accurate reconstruction of the redshift distribution N(z). We discuss limitations of the current algorithm and ideas for future work.

  10. Classification of dopamine, serotonin, and dual antagonists by decision trees.

    PubMed

    Kim, Hye-Jung; Choo, Hyunah; Cho, Yong Seo; Koh, Hun Yeong; No, Kyoung Tai; Pae, Ae Nim

    2006-04-15

    Dopamine antagonists (DA), serotonin antagonists (SA), and serotonin-dopamine dual antagonists (Dual) are being used as antipsychotics. A lot of dopamine and serotonin antagonists reveal non-selective binding affinity against these two receptors because the antagonists share structurally common features originated from conserved residues of binding site of the aminergic receptor family. Therefore, classification of dopamine and serotonin antagonists into their own receptors can be useful in the designing of selective antagonist for individual therapy of antipsychotic disorders. Data set containing 1135 dopamine antagonists (D2, D3, and D4), 1251 serotonin antagonists (5-HT1A, 5-HT2A, and 5-HT2C), and 386 serotonin-dopamine dual antagonists was collected from the MDDR database. Cerius2 descriptors were employed to develop a classification model for the 2772 compounds with antipsychotic activity. LDA (linear discriminant analysis), SIMCA (soft independent modeling of class analogy), RP (recursive partitioning), and ANN (artificial neural network) algorithms successfully classified the active class of each compound at the average 73.6% and predicted at the average 69.8%. The decision trees from RP, the best model, were generated to identify and interpret those descriptors that discriminate the active classes more easily. These classification models could be used as a virtual screening tool to predict the active class of new candidates. PMID:16387502

  11. ArborZ: PHOTOMETRIC REDSHIFTS USING BOOSTED DECISION TREES

    SciTech Connect

    Gerdes, David W.; Sypniewski, Adam J.; McKay, Timothy A.; Hao, Jiangang; Weis, Matthew R.; Wechsler, Risa H.; Busha, Michael T.

    2010-06-01

    Precision photometric redshifts will be essential for extracting cosmological parameters from the next generation of wide-area imaging surveys. In this paper, we introduce a photometric redshift algorithm, ArborZ, based on the machine-learning technique of boosted decision trees. We study the algorithm using galaxies from the Sloan Digital Sky Survey (SDSS) and from mock catalogs intended to simulate both the SDSS and the upcoming Dark Energy Survey. We show that it improves upon the performance of existing algorithms. Moreover, the method naturally leads to the reconstruction of a full probability density function (PDF) for the photometric redshift of each galaxy, not merely a single 'best estimate' and error, and also provides a photo-z quality figure of merit for each galaxy that can be used to reject outliers. We show that the stacked PDFs yield a more accurate reconstruction of the redshift distribution N(z). We discuss limitations of the current algorithm and ideas for future work.

  12. Learning from examples - Generation and evaluation of decision trees for software resource analysis

    NASA Technical Reports Server (NTRS)

    Selby, Richard W.; Porter, Adam A.

    1988-01-01

    A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for software resource data analysis. The trees identify classes of objects (software modules) that had high development effort. Sixteen software systems ranging from 3,000 to 112,000 source lines were selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4,700 objects, captured information about the development effort, faults, changes, design style, and implementation style. A total of 9,600 decision trees were automatically generated and evaluated. The trees correctly identified 79.3 percent of the software modules that had high development effort or faults, and the trees generated from the best parameter combinations correctly identified 88.4 percent of the modules on the average.

  13. Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine

    NASA Technical Reports Server (NTRS)

    Schwabacher, Mark A.; Aguilar, Robert; Figueroa, Fernando F.

    2009-01-01

    The goal of this work was to use data-driven methods to automatically detect and isolate faults in the J-2X rocket engine. It was decided to use decision trees, since they tend to be easier to interpret than other data-driven methods. The decision tree algorithm automatically "learns" a decision tree by performing a search through the space of possible decision trees to find one that fits the training data. The particular decision tree algorithm used is known as C4.5. Simulated J-2X data from a high-fidelity simulator developed at Pratt & Whitney Rocketdyne and known as the Detailed Real-Time Model (DRTM) was used to "train" and test the decision tree. Fifty-six DRTM simulations were performed for this purpose, with different leak sizes, different leak locations, and different times of leak onset. To make the simulations as realistic as possible, they included simulated sensor noise, and included a gradual degradation in both fuel and oxidizer turbine efficiency. A decision tree was trained using 11 of these simulations, and tested using the remaining 45 simulations. In the training phase, the C4.5 algorithm was provided with labeled examples of data from nominal operation and data including leaks in each leak location. From the data, it "learned" a decision tree that can classify unseen data as having no leak or having a leak in one of the five leak locations. In the test phase, the decision tree produced very low false alarm rates and low missed detection rates on the unseen data. It had very good fault isolation rates for three of the five simulated leak locations, but it tended to confuse the remaining two locations, perhaps because a large leak at one of these two locations can look very similar to a small leak at the other location.

  14. Improved Frame Mode Selection for AMR-WB+ Based on Decision Tree

    NASA Astrophysics Data System (ADS)

    Kim, Jong Kyu; Kim, Nam Soo

    In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. In order to reduce computation while maintaining good performance, decision tree classifier is adopted with the closed loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. Through an evaluation test on a database covering both speech and music materials, the proposed method is found to achieve a much better mode selection accuracy compared with the open loop mode selection module in the AMR-WB+.

  15. Supervised hashing using graph cuts and boosted decision trees.

    PubMed

    Lin, Guosheng; Shen, Chunhua; Hengel, Anton van den

    2015-11-01

    To build large-scale query-by-example image retrieval systems, embedding image features into a binary Hamming space provides great benefits. Supervised hashing aims to map the original features to compact binary codes that are able to preserve label based similarity in the binary Hamming space. Most existing approaches apply a single form of hash function, and an optimization process which is typically deeply coupled to this specific form. This tight coupling restricts the flexibility of those methods, and can result in complex optimization problems that are difficult to solve. In this work we proffer a flexible yet simple framework that is able to accommodate different types of loss functions and hash functions. The proposed framework allows a number of existing approaches to hashing to be placed in context, and simplifies the development of new problem-specific hashing methods. Our framework decomposes the hashing learning problem into two steps: binary code (hash bit) learning and hash function learning. The first step can typically be formulated as binary quadratic problems, and the second step can be accomplished by training a standard binary classifier. For solving large-scale binary code inference, we show how it is possible to ensure that the binary quadratic problems are submodular such that efficient graph cut methods may be used. To achieve efficiency as well as efficacy on large-scale high-dimensional data, we propose to use boosted decision trees as the hash functions, which are nonlinear, highly descriptive, and are very fast to train and evaluate. Experiments demonstrate that the proposed method significantly outperforms most state-of-the-art methods, especially on high-dimensional data. PMID:26440270

  16. Prediction of Regional Streamflow Frequency using Model Tree Ensembles: A data-driven approach based on natural and anthropogenic drainage area characteristics

    NASA Astrophysics Data System (ADS)

    Schnier, S.; Cai, X.

    2012-12-01

    This study introduces a highly accurate data-driven method to predict streamflow frequency statistics based on known drainage area characteristics which yields insights into the dominant controls of regional streamflow. The model is enhanced by explicit consideration of human interference in local hydrology. The basic idea is to use decision trees (i.e., regression trees) to regionalize the dataset and create a model tree by fitting multi-linear equations to the leaves of the regression tree. We improve model accuracy and obtain a measure of variable importance by creating an ensemble of randomized model trees using bootstrap aggregation (i.e., bagging). The database used to induce the models is built from public domain drainage area characteristics for 715 USGS stream gages (455 in Texas and 260 in Illinois). The database includes information on natural characteristics such as precipitation, soil type and slope, as well as anthropogenic ones including land cover, human population and water use. Model accuracy was evaluated using cross-validation and several performance metrics. During the validation, the gauges that are withheld from the analysis represent ungauged watersheds. The proposed method outperforms standard regression models such as the method of residuals for predictions in ungauged watersheds. Importantly, out-of-bag variable importance combined with models for 17 points along the flow duration curve (FDC) (i.e., from 0% to 100% exceedance frequency) yields insight into the dominant controls of regional streamflow. The most discriminant variables for high flows are drainage area and seasonal precipitation. Discriminant variables for low flows are more complex and model accuracy is improved with base-flow data, which is particularly difficult to obtain for ungauged sites. Consideration of human activities, such as percent urban and water use, is also shown to improve accuracy of low flow predictions. Drainage area characteristics, especially

  17. Application of preprocessing filtering on Decision Tree C4.5 and rough set theory

    NASA Astrophysics Data System (ADS)

    Chan, Joseph C. C.; Lin, Tsau Y.

    2001-03-01

    This paper compares two artificial intelligence methods: the Decision Tree C4.5 and Rough Set Theory on the stock market data. The Decision Tree C4.5 is reviewed with the Rough Set Theory. An enhanced window application is developed to facilitate the pre-processing filtering by introducing the feature (attribute) transformations, which allows users to input formulas and create new attributes. Also, the application produces three varieties of data set with delaying, averaging, and summation. The results prove the improvement of pre-processing by applying feature (attribute) transformations on Decision Tree C4.5. Moreover, the comparison between Decision Tree C4.5 and Rough Set Theory is based on the clarity, automation, accuracy, dimensionality, raw data, and speed, which is supported by the rules sets generated by both algorithms on three different sets of data.

  18. A Decision Tree Approach to the Interpretation of Multivariate Statistical Techniques.

    ERIC Educational Resources Information Center

    Fok, Lillian Y.; And Others

    1995-01-01

    Discusses the nature, power, and limitations of four multivariate techniques: factor analysis, multiple analysis of variance, multiple regression, and multiple discriminant analysis. Shows how decision trees assist in interpreting results. (SK)

  19. Aneurysmal subarachnoid hemorrhage prognostic decision-making algorithm using classification and regression tree analysis

    PubMed Central

    Lo, Benjamin W. Y.; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R. Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A. H.

    2016-01-01

    Background: Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). Methods: The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Results: Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56–2.45, P < 0.01). Conclusions: A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH. PMID:27512607

  20. Application of decision tree model for the ground subsidence hazard mapping near abandoned underground coal mines.

    PubMed

    Lee, Saro; Park, Inhye

    2013-09-30

    Subsidence of ground caused by underground mines poses hazards to human life and property. This study analyzed the hazard to ground subsidence using factors that can affect ground subsidence and a decision tree approach in a geographic information system (GIS). The study area was Taebaek, Gangwon-do, Korea, where many abandoned underground coal mines exist. Spatial data, topography, geology, and various ground-engineering data for the subsidence area were collected and compiled in a database for mapping ground-subsidence hazard (GSH). The subsidence area was randomly split 50/50 for training and validation of the models. A data-mining classification technique was applied to the GSH mapping, and decision trees were constructed using the chi-squared automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. The frequency ratio model was also applied to the GSH mapping for comparing with probabilistic model. The resulting GSH maps were validated using area-under-the-curve (AUC) analysis with the subsidence area data that had not been used for training the model. The highest accuracy was achieved by the decision tree model using CHAID algorithm (94.01%) comparing with QUEST algorithms (90.37%) and frequency ratio model (86.70%). These accuracies are higher than previously reported results for decision tree. Decision tree methods can therefore be used efficiently for GSH analysis and might be widely used for prediction of various spatial events. PMID:23702378

  1. Combining evolutionary algorithms with oblique decision trees to detect bent double galaxies

    SciTech Connect

    Cantu-Paz, E; Kamath, C

    2000-06-22

    Decision trees have long been popular in classification as they use simple and easy-to-understand tests at each node. Most variants of decision trees test a single attribute at a node, leading to axis-parallel trees, where the test results in a hyperplane which is parallel to one of the dimensions in the attribute space. These trees can be rather large and inaccurate in cases where the concept to be learnt is best approximated by oblique hyperplanes. In such cases, it may be more appropriate to use an oblique decision tree, where the decision at each node is a linear combination of the attributes. Oblique decision trees have not gained wide popularity in part due to the complexity of constructing good oblique splits and the tendency of existing splitting algorithms to get stuck in local minima. Several alternatives have been proposed to handle these problems including randomization in conjunction with deterministic hill climbing and the use of simulated annealing. In this paper, they use evolutionary algorithms (EAs) to determine the split. EAs are well suited for this problem because of their global search properties, their tolerance to noisy fitness evaluations, and their scalability to large dimensional search spaces. They demonstrate the technique on a practical problem from astronomy, namely, the classification of galaxies with a bent-double morphology, and describe their experiences with several split evaluation criteria.

  2. Using GEFS ensemble forecasts for decision making in reservoir management in California

    NASA Astrophysics Data System (ADS)

    Scheuerer, M.; Hamill, T.; Webb, R. S.

    2015-12-01

    Reservoirs such as Lake Mendocino in California's Russian River Basin provide flood control, water supply, recreation, and environmental stream flow regulation. Many of these reservoirs are operated by the U.S. Army Corps of Engineers (Corps) according to water control manuals that specify elevations for an upper volume of reservoir storage that must be kept available for capturing storm runoff and reducing flood risk, and a lower volume of storage that may be used for water supply. During extreme rainfall events, runoff is captured by these reservoirs and released as quickly as possible to create flood storage space for another potential storm. These flood control manuals are based on typical historical weather patterns - wet during the winter, dry otherwise - but are not informed directly by weather prediction. Alternative reservoir management approaches such as Forecast-Informed Reservoir Operations (FIRO), which seek to incorporate advances in weather prediction, are currently being explored as means to improve water supply availability while maintaining flood risk reduction and providing additional ecosystem benefits.We present results from a FIRO proof-of-concept study investigating the reliability of post-processed GEFS ensemble forecasts to predict the probability that day 6-to-10 precipitation accumulations in certain areas in California exceed a high threshold. Our results suggest that reliable forecast guidance can be provided, and the resulting probabilities could be used to inform decisions to release or hold water in the reservoirs. We illustrate the potential of these forecasts in a case study of extreme event probabilities for the Russian River Basin in California.

  3. Production of diagnostic rules from a neurotologic database with decision trees.

    PubMed

    Kentala, E; Viikki, K; Pyykkö, I; Juhola, M

    2000-02-01

    A decision tree is an artificial intelligence program that is adaptive and is closely related to a neural network, but can handle missing or nondecisive data in decision-making. Data on patients with Meniere's disease, vestibular schwannoma, traumatic vertigo, sudden deafness, benign paroxysmal positional vertigo, and vestibular neuritis were retrieved from the database of the otoneurologic expert system ONE for the development and testing of the accuracy of decision trees in the diagnostic workup. Decision trees were constructed separately for each disease. The accuracies of the best decision trees were 94%, 95%, 99%, 99%, 100%, and 100% for the respective diseases. The most important questions concerned the presence of vertigo, hearing loss, and tinnitus; duration of vertigo; frequency of vertigo attacks; severity of rotational vertigo; onset and type of hearing loss; and occurrence of head injury in relation to the timing of onset of vertigo. Meniere's disease was the most difficult to classify correctly. The validity and structure of the decision trees are easily comprehended and can be used outside the expert system. PMID:10685569

  4. Ensemble Methods

    NASA Astrophysics Data System (ADS)

    Re, Matteo; Valentini, Giorgio

    2012-03-01

    Ensemble methods are statistical and computational learning procedures reminiscent of the human social learning behavior of seeking several opinions before making any crucial decision. The idea of combining the opinions of different "experts" to obtain an overall “ensemble” decision is rooted in our culture at least from the classical age of ancient Greece, and it has been formalized during the Enlightenment with the Condorcet Jury Theorem[45]), which proved that the judgment of a committee is superior to those of individuals, provided the individuals have reasonable competence. Ensembles are sets of learning machines that combine in some way their decisions, or their learning algorithms, or different views of data, or other specific characteristics to obtain more reliable and more accurate predictions in supervised and unsupervised learning problems [48,116]. A simple example is represented by the majority vote ensemble, by which the decisions of different learning machines are combined, and the class that receives the majority of “votes” (i.e., the class predicted by the majority of the learning machines) is the class predicted by the overall ensemble [158]. In the literature, a plethora of terms other than ensembles has been used, such as fusion, combination, aggregation, and committee, to indicate sets of learning machines that work together to solve a machine learning problem [19,40,56,66,99,108,123], but in this chapter we maintain the term ensemble in its widest meaning, in order to include the whole range of combination methods. Nowadays, ensemble methods represent one of the main current research lines in machine learning [48,116], and the interest of the research community on ensemble methods is witnessed by conferences and workshops specifically devoted to ensembles, first of all the multiple classifier systems (MCS) conference organized by Roli, Kittler, Windeatt, and other researchers of this area [14,62,85,149,173]. Several theories have been

  5. Ensemble Methods

    NASA Astrophysics Data System (ADS)

    Re, Matteo; Valentini, Giorgio

    2012-03-01

    Ensemble methods are statistical and computational learning procedures reminiscent of the human social learning behavior of seeking several opinions before making any crucial decision. The idea of combining the opinions of different "experts" to obtain an overall “ensemble” decision is rooted in our culture at least from the classical age of ancient Greece, and it has been formalized during the Enlightenment with the Condorcet Jury Theorem[45]), which proved that the judgment of a committee is superior to those of individuals, provided the individuals have reasonable competence. Ensembles are sets of learning machines that combine in some way their decisions, or their learning algorithms, or different views of data, or other specific characteristics to obtain more reliable and more accurate predictions in supervised and unsupervised learning problems [48,116]. A simple example is represented by the majority vote ensemble, by which the decisions of different learning machines are combined, and the class that receives the majority of “votes” (i.e., the class predicted by the majority of the learning machines) is the class predicted by the overall ensemble [158]. In the literature, a plethora of terms other than ensembles has been used, such as fusion, combination, aggregation, and committee, to indicate sets of learning machines that work together to solve a machine learning problem [19,40,56,66,99,108,123], but in this chapter we maintain the term ensemble in its widest meaning, in order to include the whole range of combination methods. Nowadays, ensemble methods represent one of the main current research lines in machine learning [48,116], and the interest of the research community on ensemble methods is witnessed by conferences and workshops specifically devoted to ensembles, first of all the multiple classifier systems (MCS) conference organized by Roli, Kittler, Windeatt, and other researchers of this area [14,62,85,149,173]. Several theories have been

  6. Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data

    PubMed Central

    2012-01-01

    Background This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, specially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general-use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. Results The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible-receptor. PMID:23171000

  7. Use of a decision tree to select the mud system for the Oso field, Nigeria

    SciTech Connect

    Dear, S.F. III; Beasley, R.D.; Barr, K.P.

    1995-10-01

    Far too often, the basis for selection of a mud system is the ``latest, greatest`` technology or personal preference rather than sound cost-effective analysis. The use of risk-vs.-cost decision analysis improves mud selection and makes it a proper business decision. Several mud systems usually are available to drill and well and, with good decision analysis, the cost-effectiveness of each alternative becomes apparent. This paper describes how the drilling team used structured decision analysis to evaluate and select the best mud system for the project. First, Monte Carlo simulations forecast the range of possible results with each alternative. The simulations provide most-likely values for the variables in the decision tree, including reasonable ranges for sensitivity analyses. This paper presents and discusses the simulations, the decision tree, and the sensitivity analyses.

  8. How to pose the question matters: Behavioural Economics concepts in decision making on the basis of ensemble forecasts

    NASA Astrophysics Data System (ADS)

    Alfonso, Leonardo; van Andel, Schalk Jan

    2014-05-01

    Part of recent research in ensemble and probabilistic hydro-meteorological forecasting analyses which probabilistic information is required by decision makers and how it can be most effectively visualised. This work, in addition, analyses if decision making in flood early warning is also influenced by the way the decision question is posed. For this purpose, the decision-making game "Do probabilistic forecasts lead to better decisions?", which Ramos et al (2012) conducted at the EGU General Assembly 2012 in the city of Vienna, has been repeated with a small group and expanded. In that game decision makers had to decide whether or not to open a flood release gate, on the basis of flood forecasts, with and without uncertainty information. A conclusion of that game was that, in the absence of uncertainty information, decision makers are compelled towards a more risk-averse attitude. In order to explore to what extent the answers were driven by the way the questions were framed, in addition to the original experiment, a second variant was introduced where participants were asked to choose between a sure value (for either loosing or winning with a giving probability) and a gamble. This set-up is based on Kahneman and Tversky (1979). Results indicate that the way how the questions are posed may play an important role in decision making and that Prospect Theory provides promising concepts to further understand how this works.

  9. Combining evolutionary algorithms with oblique decision trees to detect bent-double galaxies

    NASA Astrophysics Data System (ADS)

    Cantu-Paz, Erick; Kamath, Chandrika

    2000-10-01

    Decision tress have long been popular in classification as they use simple and easy-to-understand tests at each node. Most variants of decision trees test a single attribute at a node, leading to axis- parallel trees, where the test results in a hyperplane which is parallel to one of the dimensions in the attribute space. These trees can be rather large and inaccurate in cases where the concept to be learned is best approximated by oblique hyperplanes. In such cases, it may be more appropriate to use an oblique decision tree, where the decision at each node is a linear combination of the attributes. Oblique decision trees have not gained wide popularity in part due to the complexity of constructing good oblique splits and the tendency of existing splitting algorithms to get stuck in local minima. Several alternatives have been proposed to handle these problems including randomization in conjunction wiht deterministic hill-climbing and the use of simulated annealing. In this paper, we use evolutionary algorithms (EAs) to determine the split. EAs are well suited for this problem because of their global search properties, their tolerance to noisy fitness evaluations, and their scalability to large dimensional search spaces. We demonstrate our technique on a synthetic data set, and then we apply it to a practical problem from astronomy, namely, the classification of galaxies with a bent-double morphology. In addition, we describe our experiences with several split evaluation criteria. Our results suggest that, in some cases, the evolutionary approach is faster and more accurate than existing oblique decision tree algorithms. However, for our astronomical data, the accuracy is not significantly different than the axis-parallel trees.

  10. Prediction of Weather Impacted Airport Capacity using Ensemble Learning

    NASA Technical Reports Server (NTRS)

    Wang, Yao Xun

    2011-01-01

    Ensemble learning with the Bagging Decision Tree (BDT) model was used to assess the impact of weather on airport capacities at selected high-demand airports in the United States. The ensemble bagging decision tree models were developed and validated using the Federal Aviation Administration (FAA) Aviation System Performance Metrics (ASPM) data and weather forecast at these airports. The study examines the performance of BDT, along with traditional single Support Vector Machines (SVM), for airport runway configuration selection and airport arrival rates (AAR) prediction during weather impacts. Testing of these models was accomplished using observed weather, weather forecast, and airport operation information at the chosen airports. The experimental results show that ensemble methods are more accurate than a single SVM classifier. The airport capacity ensemble method presented here can be used as a decision support model that supports air traffic flow management to meet the weather impacted airport capacity in order to reduce costs and increase safety.

  11. Using decision trees to characterize verbal communication during change and stuck episodes in the therapeutic process

    PubMed Central

    Masías, Víctor H.; Krause, Mariane; Valdés, Nelson; Pérez, J. C.; Laengle, Sigifredo

    2015-01-01

    Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBTree, and REPTree) are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice. PMID:25914657

  12. Pruning a decision tree for selecting computer-related assistive devices for people with disabilities.

    PubMed

    Chi, Chia-Fen; Tseng, Li-Kai; Jang, Yuh

    2012-07-01

    Many disabled individuals lack extensive knowledge about assistive technology, which could help them use computers. In 1997, Denis Anson developed a decision tree of 49 evaluative questions designed to evaluate the functional capabilities of the disabled user and choose an appropriate combination of assistive devices, from a selection of 26, that enable the individual to use a computer. In general, occupational therapists guide the disabled users through this process. They often have to go over repetitive questions in order to find an appropriate device. A disabled user may require an alphanumeric entry device, a pointing device, an output device, a performance enhancement device, or some combination of these. Therefore, the current research eliminates redundant questions and divides Anson's decision tree into multiple independent subtrees to meet the actual demand of computer users with disabilities. The modified decision tree was tested by six disabled users to prove it can determine a complete set of assistive devices with a smaller number of evaluative questions. The means to insert new categories of computer-related assistive devices was included to ensure the decision tree can be expanded and updated. The current decision tree can help the disabled users and assistive technology practitioners to find appropriate computer-related assistive devices that meet with clients' individual needs in an efficient manner. PMID:22552588

  13. Cloud detection based on decision tree over Tibetan Plateau with MODIS data

    NASA Astrophysics Data System (ADS)

    Xu, Lina; Niu, Ruiqing; Fang, Shenghui; Dong, Yanfang

    2013-10-01

    Snow cover area is a very critical parameter for hydrologic cycle of the Earth. Furthermore, it will be a key factor for the effect of the climate change. An unbelievable situation in mapping snow cover is the existence of clouds. Clouds can easily be found in any image from satellite, because clouds are bright and white in the visible wavelengths. But it is not the case when there is snow or ice in the background. It is similar spectral appearance of snow and clouds. Many cloud decision methods are built on decision trees. The decision trees were designed based on empirical studies and simulations. In this paper a classification trees were used to build the decision tree. And then with a great deal repeating scenes coming from the same area the cloud pixel can be replaced by "its" real surface types, such as snow pixel or vegetation or water. The effect of the cloud can be distinguished in the short wave infrared. The results show that most cloud coverage being removed. A validation was carried out for all subsequent steps. It led to the removal of all remaining cloud cover. The results show that the decision tree method performed satisfied.

  14. Cloud Detection Based on Decision Tree Over Tibetan Plateau with Modis Data

    NASA Astrophysics Data System (ADS)

    Xu, L.; Fang, S.; Niu, R.; Li, J.

    2012-07-01

    Snow cover area is a very critical parameter for hydrologic cycle of the Earth. Furthermore, it will be a key factor for the effect of the climate change. An unbelievable situation in mapping snow cover is the existence of clouds. Clouds can easily be found in any image from satellite, because clouds are bright and white in the visible wavelengths. But it is not the case when there is snow or ice in the background. It is similar spectral appearance of snow and clouds. Many cloud decision methods are built on decision trees. The decision trees were designed based on empirical studies and simulations. In this paper a classification trees were used to build the decision tree. And then with a great deal repeating scenes coming from the same area the cloud pixel can be replaced by "its" real surface types, such as snow pixel or vegetation or water. The effect of the cloud can be distinguished in the short wave infrared. The results show that most cloud coverage being removed. A validation was carried out for all subsequent steps. It led to the removal of all remaining cloud cover. The results show that the decision tree method performed satisfied.

  15. Outsourcing the Portal: Another Branch in the Decision Tree.

    ERIC Educational Resources Information Center

    McMahon, Tim

    2000-01-01

    Discussion of the management of information resources in organizations focuses on the use of portal technologies to update intranet capabilities. Considers application outsourcing decisions, reviews benefits (including reducing costs) as well as concerns, and describes application service providers (ASPs). (LRW)

  16. A modified decision tree algorithm based on genetic algorithm for mobile user classification problem.

    PubMed

    Liu, Dong-sheng; Fan, Shu-jiang

    2014-01-01

    In order to offer mobile customers better service, we should classify the mobile user firstly. Aimed at the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduced genetic algorithm to optimize the results of the decision tree algorithm. We also take the context information as a classification attributes for the mobile user and we classify the context into public context and private context classes. Then we analyze the processes and operators of the algorithm. At last, we make an experiment on the mobile user with the algorithm, we can classify the mobile user into Basic service user, E-service user, Plus service user, and Total service user classes and we can also get some rules about the mobile user. Compared to C4.5 decision tree algorithm and SVM algorithm, the algorithm we proposed in this paper has higher accuracy and more simplicity. PMID:24688389

  17. A Modified Decision Tree Algorithm Based on Genetic Algorithm for Mobile User Classification Problem

    PubMed Central

    Liu, Dong-sheng; Fan, Shu-jiang

    2014-01-01

    In order to offer mobile customers better service, we should classify the mobile user firstly. Aimed at the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduced genetic algorithm to optimize the results of the decision tree algorithm. We also take the context information as a classification attributes for the mobile user and we classify the context into public context and private context classes. Then we analyze the processes and operators of the algorithm. At last, we make an experiment on the mobile user with the algorithm, we can classify the mobile user into Basic service user, E-service user, Plus service user, and Total service user classes and we can also get some rules about the mobile user. Compared to C4.5 decision tree algorithm and SVM algorithm, the algorithm we proposed in this paper has higher accuracy and more simplicity. PMID:24688389

  18. Post-event human decision errors: operator action tree/time reliability correlation

    SciTech Connect

    Hall, R E; Fragola, J; Wreathall, J

    1982-11-01

    This report documents an interim framework for the quantification of the probability of errors of decision on the part of nuclear power plant operators after the initiation of an accident. The framework can easily be incorporated into an event tree/fault tree analysis. The method presented consists of a structure called the operator action tree and a time reliability correlation which assumes the time available for making a decision to be the dominating factor in situations requiring cognitive human response. This limited approach decreases the magnitude and complexity of the decision modeling task. Specifically, in the past, some human performance models have attempted prediction by trying to emulate sequences of human actions, or by identifying and modeling the information processing approach applicable to the task. The model developed here is directed at describing the statistical performance of a representative group of hypothetical individuals responding to generalized situations.

  19. The decision - identification tree: A new EIS scoping tool

    SciTech Connect

    Eccleston, C.H.

    1997-04-02

    No single methodology has been developed or universally accepted for determining the scope of an Environmental Impact Statement (EIS). Most typically, the scope is determined by first identifying actions and facilities to be analyzed. Yet, agencies sometimes complete an EIS, only to discover that the scope does not adequately address decisions that need to be made. Such discrepancies can often be traced to disconnects between the scoping process and the actual decision making that follows. A new tool, for use in a value engineering setting, provides an effective methodology for improving the EIS scoping process. Application of this tool is not limited to National Environmental Policy Act (NEPA) scoping efforts. This tool, could in fact, be used to map potential decision points for a range of diverse planning applications and exercises.

  20. Decision tree for the binding of dipeptides to the thermally fluctuating surface of cathepsin K

    NASA Astrophysics Data System (ADS)

    Nishiyama, Katsuhiko

    2016-03-01

    The behavior of 15 dipeptides on thermally fluctuating cathepsin K was investigated by molecular dynamics and docking simulations. Four dipeptides were distributed on sites near the active center, and the variations were small. Eleven dipeptides were distributed on sites far from the active center, and the variations were large for nine dipeptides and very large for the other two. The decision tree was constructed using genetic programming, and it accurately classified the 15 dipeptides. The decision tree would accurately estimate the behavior of various peptides, and should significantly contribute to the design of useful peptides.

  1. Ramping ensemble activity in dorsal anterior cingulate neurons during persistent commitment to a decision.

    PubMed

    Blanchard, Tommy C; Strait, Caleb E; Hayden, Benjamin Y

    2015-10-01

    We frequently need to commit to a choice to achieve our goals; however, the neural processes that keep us motivated in pursuit of delayed goals remain obscure. We examined ensemble responses of neurons in macaque dorsal anterior cingulate cortex (dACC), an area previously implicated in self-control and persistence, in a task that requires commitment to a choice to obtain a reward. After reward receipt, dACC neurons signaled reward amount with characteristic ensemble firing rate patterns; during the delay in anticipation of the reward, ensemble activity smoothly and gradually came to resemble the postreward pattern. On the subset of risky trials, in which a reward was anticipated with 50% certainty, ramping ensemble activity evolved to the pattern associated with the anticipated reward (and not with the anticipated loss) and then, on loss trials, took on an inverted form anticorrelated with the form associated with a win. These findings enrich our knowledge of reward processing in dACC and may have broader implications for our understanding of persistence and self-control. PMID:26334016

  2. Decision Support on the Sediments Flushing of Aimorés Dam Using Medium-Range Ensemble Forecasts

    NASA Astrophysics Data System (ADS)

    Mainardi Fan, Fernando; Schwanenberg, Dirk; Collischonn, Walter; Assis dos Reis, Alberto; Alvarado Montero, Rodolfo; Alencar Siqueira, Vinicius

    2015-04-01

    In the present study we investigate the use of medium-range streamflow forecasts in the Doce River basin (Brazil), at the reservoir of Aimorés Hydro Power Plant (HPP). During daily operations this reservoir acts as a "trap" to the sediments that originate from the upstream basin of the Doce River. This motivates a cleaning process called "pass through" to periodically remove the sediments from the reservoir. The "pass through" or "sediments flushing" process consists of a decrease of the reservoir's water level to a certain flushing level when a determined reservoir inflow threshold is forecasted. Then, the water in the approaching inflow is used to flush the sediments from the reservoir through the spillway and to recover the original reservoir storage. To be triggered, the sediments flushing operation requires an inflow larger than 3000m³/s in a forecast horizon of 7 days. This lead-time of 7 days is far beyond the basin's concentration time (around 2 days), meaning that the forecasts for the pass through procedure highly depends on Numerical Weather Predictions (NWP) models that generate Quantitative Precipitation Forecasts (QPF). This dependency creates an environment with a high amount of uncertainty to the operator. To support the decision making at Aimorés HPP we developed a fully operational hydrological forecasting system to the basin. The system is capable of generating ensemble streamflow forecasts scenarios when driven by QPF data from meteorological Ensemble Prediction Systems (EPS). This approach allows accounting for uncertainties in the NWP at a decision making level. This system is starting to be used operationally by CEMIG and is the one shown in the present study, including a hindcasting analysis to assess the performance of the system for the specific flushing problem. The QPF data used in the hindcasting study was derived from the TIGGE (THORPEX Interactive Grand Global Ensemble) database. Among all EPS available on TIGGE, three were

  3. Fuzzy decision trees for planning and autonomous control of a coordinated team of UAVs

    NASA Astrophysics Data System (ADS)

    Smith, James F., III; Nguyen, ThanhVu H.

    2007-04-01

    A fuzzy logic resource manager that enables a collection of unmanned aerial vehicles (UAVs) to automatically cooperate to make meteorological measurements will be discussed. Once in flight no human intervention is required. Planning and real-time control algorithms determine the optimal trajectory and points each UAV will sample, while taking into account the UAVs' risk, risk tolerance, reliability, mission priority, fuel limitations, mission cost, and related uncertainties. The control algorithm permits newly obtained information about weather and other events to be introduced to allow the UAVs to be more effective. The approach is illustrated by a discussion of the fuzzy decision tree for UAV path assignment and related simulation. The different fuzzy membership functions on the tree are described in mathematical detail. The different methods by which this tree is obtained are summarized including a method based on using a genetic program as a data mining function. A second fuzzy decision tree that allows the UAVs to automatically collaborate without human intervention is discussed. This tree permits three different types of collaborative behavior between the UAVs. Simulations illustrating how the tree allows the different types of collaboration to be automated are provided. Simulations also show the ability of the control algorithm to allow UAVs to effectively cooperate to increase the UAV team's likelihood of success.

  4. Test Reviews: Euler, B. L. (2007). "Emotional Disturbance Decision Tree". Lutz, FL: Psychological Assessment Resources

    ERIC Educational Resources Information Center

    Tansy, Michael

    2009-01-01

    The Emotional Disturbance Decision Tree (EDDT) is a teacher-completed norm-referenced rating scale published by Psychological Assessment Resources, Inc., in Lutz, Florida. The 156-item EDDT was developed for use as part of a broader assessment process to screen and assist in the identification of 5- to 18-year-old children for the special…

  5. What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis

    ERIC Educational Resources Information Center

    Thomas, Emily H.; Galambos, Nora

    2004-01-01

    To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…

  6. Classification and concentration estimation of explosive precursors using nanowires sensor array and decision tree learning

    NASA Astrophysics Data System (ADS)

    Cho, Junghwan; Li, Xiaopeng; Gu, Zhiyong; Kurup, Pradeep

    2011-09-01

    This paper aims to classify and estimate concentrations of explosive precursors using a nanowire sensor array and decision tree learning algorithm. The nanowire sensor array consists of tin oxide sensors with four different additives, platinum (Pt), copper (Cu), indium (In), and nickel (Ni). The nanowire sensor array was tested using the vapors from four explosives precursors, acetone, nitrobenzene, nitrotoluene, and octane with 10 different concentration levels each. A pattern recognition technique based on decision tree learning was applied to classify the explosive precursors and estimate their concentration. Classification and regression tree (CART) analysis was used for classification. The CART was also utilized for the purpose of structure identification in Sugeno fuzzy inference system (FIS) for estimating the concentration of the precursors. Two CARTs were trained and their testing results were investigated.

  7. Data-Mining-Based Coronary Heart Disease Risk Prediction Model Using Fuzzy Logic and Decision Tree

    PubMed Central

    Kim, Jaekwon; Lee, Jongsik

    2015-01-01

    Objectives The importance of the prediction of coronary heart disease (CHD) has been recognized in Korea; however, few studies have been conducted in this area. Therefore, it is necessary to develop a method for the prediction and classification of CHD in Koreans. Methods A model for CHD prediction must be designed according to rule-based guidelines. In this study, a fuzzy logic and decision tree (classification and regression tree [CART])-driven CHD prediction model was developed for Koreans. Datasets derived from the Korean National Health and Nutrition Examination Survey VI (KNHANES-VI) were utilized to generate the proposed model. Results The rules were generated using a decision tree technique, and fuzzy logic was applied to overcome problems associated with uncertainty in CHD prediction. Conclusions The accuracy and receiver operating characteristic (ROC) curve values of the propose systems were 69.51% and 0.594, proving that the proposed methods were more efficient than other models. PMID:26279953

  8. An expert-guided decision tree construction strategy: an application in knowledge discovery with medical databases.

    PubMed Central

    Tsai, Y. S.; King, P. H.; Higgins, M. S.; Pierce, D.; Patel, N. P.

    1997-01-01

    With the steady growth in electronic patient records and clinical medical informatics systems, the data collected for routine clinical use have been accumulating at a dramatic rate. Inter-disciplinary research provides a new generation of computation tools in knowledge discovery and data management is in great demand. In this study, an expert-guided decision tree construction strategy is proposed to offer an user-oriented knowledge discovery environment. The strategy allows experts, based on their expertise and/or preference, to override inductive decision tree construction process. Moreover, by reviewing decision paths, experts could focus on subsets of data that may be clues to new findings, or simply contaminated cases. PMID:9357618

  9. Ensembl 2007.

    PubMed

    Hubbard, T J P; Aken, B L; Beal, K; Ballester, B; Caccamo, M; Chen, Y; Clarke, L; Coates, G; Cunningham, F; Cutts, T; Down, T; Dyer, S C; Fitzgerald, S; Fernandez-Banet, J; Graf, S; Haider, S; Hammond, M; Herrero, J; Holland, R; Howe, K; Howe, K; Johnson, N; Kahari, A; Keefe, D; Kokocinski, F; Kulesha, E; Lawson, D; Longden, I; Melsopp, C; Megy, K; Meidl, P; Ouverdin, B; Parker, A; Prlic, A; Rice, S; Rios, D; Schuster, M; Sealy, I; Severin, J; Slater, G; Smedley, D; Spudich, G; Trevanion, S; Vilella, A; Vogel, J; White, S; Wood, M; Cox, T; Curwen, V; Durbin, R; Fernandez-Suarez, X M; Flicek, P; Kasprzyk, A; Proctor, G; Searle, S; Smith, J; Ureta-Vidal, A; Birney, E

    2007-01-01

    The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of chordate genome sequences. Over the past year the number of genomes available from Ensembl has increased from 15 to 33, with the addition of sites for the mammalian genomes of elephant, rabbit, armadillo, tenrec, platypus, pig, cat, bush baby, common shrew, microbat and european hedgehog; the fish genomes of stickleback and medaka and the second example of the genomes of the sea squirt (Ciona savignyi) and the mosquito (Aedes aegypti). Some of the major features added during the year include the first complete gene sets for genomes with low-sequence coverage, the introduction of new strain variation data and the introduction of new orthology/paralog annotations based on gene trees. PMID:17148474

  10. Minimizing the cost of translocation failure with decision-tree models that predict species' behavioral response in translocation sites.

    PubMed

    Ebrahimi, Mehregan; Ebrahimie, Esmaeil; Bull, C Michael

    2015-08-01

    The high number of failures is one reason why translocation is often not recommended. Considering how behavior changes during translocations may improve translocation success. To derive decision-tree models for species' translocation, we used data on the short-term responses of an endangered Australian skink in 5 simulated translocations with different release conditions. We used 4 different decision-tree algorithms (decision tree, decision-tree parallel, decision stump, and random forest) with 4 different criteria (gain ratio, information gain, gini index, and accuracy) to investigate how environmental and behavioral parameters may affect the success of a translocation. We assumed behavioral changes that increased dispersal away from a release site would reduce translocation success. The trees became more complex when we included all behavioral parameters as attributes, but these trees yielded more detailed information about why and how dispersal occurred. According to these complex trees, there were positive associations between some behavioral parameters, such as fight and dispersal, that showed there was a higher chance, for example, of dispersal among lizards that fought than among those that did not fight. Decision trees based on parameters related to release conditions were easier to understand and could be used by managers to make translocation decisions under different circumstances. PMID:25737134

  11. Application of decision tree algorithm for identification of rock forming minerals using energy dispersive spectrometry

    NASA Astrophysics Data System (ADS)

    Akkaş, Efe; Çubukçu, H. Evren; Artuner, Harun

    2014-05-01

    Rapid and automated mineral identification is compulsory in certain applications concerning natural rocks. Among all microscopic and spectrometric methods, energy dispersive X-ray spectrometers (EDS) integrated with scanning electron microscopes produce rapid information with reliable chemical data. Although obtaining elemental data with EDS analyses is fast and easy by the help of improving technology, it is rather challenging to perform accurate and rapid identification considering the large quantity of minerals in a rock sample with varying dimensions ranging between nanometer to centimeter. Furthermore, the physical properties of the specimen (roughness, thickness, electrical conductivity, position in the instrument etc.) and the incident electron beam (accelerating voltage, beam current, spot size etc.) control the produced characteristic X-ray, which in turn affect the elemental analyses. In order to minimize the effects of these physical constraints and develop an automated mineral identification system, a rule induction paradigm has been applied to energy dispersive spectral data. Decision tree classifiers divide training data sets into subclasses using generated rules or decisions and thereby it produces classification or recognition associated with these data sets. A number of thinsections prepared from rock samples with suitable mineralogy have been investigated and a preliminary 12 distinct mineral groups (olivine, orthopyroxene, clinopyroxene, apatite, amphibole, plagioclase, K- feldspar, zircon, magnetite, titanomagnetite, biotite, quartz), comprised mostly of silicates and oxides, have been selected. Energy dispersive spectral data for each group, consisting of 240 reference and 200 test analyses, have been acquired under various, non-standard, physical and electrical conditions. The reference X-Ray data have been used to assign the spectral distribution of elements to the specified mineral groups. Consequently, the test data have been analyzed using

  12. Tools of the Future: How Decision Tree Analysis Will Impact Mission Planning

    NASA Technical Reports Server (NTRS)

    Otterstatter, Matthew R.

    2005-01-01

    The universe is infinitely complex; however, the human mind has a finite capacity. The multitude of possible variables, metrics, and procedures in mission planning are far too many to address exhaustively. This is unfortunate because, in general, considering more possibilities leads to more accurate and more powerful results. To compensate, we can get more insightful results by employing our greatest tool, the computer. The power of the computer will be utilized through a technology that considers every possibility, decision tree analysis. Although decision trees have been used in many other fields, this is innovative for space mission planning. Because this is a new strategy, no existing software is able to completely accommodate all of the requirements. This was determined through extensive research and testing of current technologies. It was necessary to create original software, for which a short-term model was finished this summer. The model was built into Microsoft Excel to take advantage of the familiar graphical interface for user input, computation, and viewing output. Macros were written to automate the process of tree construction, optimization, and presentation. The results are useful and promising. If this tool is successfully implemented in mission planning, our reliance on old-fashioned heuristics, an error-prone shortcut for handling complexity, will be reduced. The computer algorithms involved in decision trees will revolutionize mission planning. The planning will be faster and smarter, leading to optimized missions with the potential for more valuable data.

  13. Image Change Detection via Ensemble Learning

    SciTech Connect

    Martin, Benjamin W; Vatsavai, Raju

    2013-01-01

    The concept of geographic change detection is relevant in many areas. Changes in geography can reveal much information about a particular location. For example, analysis of changes in geography can identify regions of population growth, change in land use, and potential environmental disturbance. A common way to perform change detection is to use a simple method such as differencing to detect regions of change. Though these techniques are simple, often the application of these techniques is very limited. Recently, use of machine learning methods such as neural networks for change detection has been explored with great success. In this work, we explore the use of ensemble learning methodologies for detecting changes in bitemporal synthetic aperture radar (SAR) images. Ensemble learning uses a collection of weak machine learning classifiers to create a stronger classifier which has higher accuracy than the individual classifiers in the ensemble. The strength of the ensemble lies in the fact that the individual classifiers in the ensemble create a mixture of experts in which the final classification made by the ensemble classifier is calculated from the outputs of the individual classifiers. Our methodology leverages this aspect of ensemble learning by training collections of weak decision tree based classifiers to identify regions of change in SAR images collected of a region in the Staten Island, New York area during Hurricane Sandy. Preliminary studies show that the ensemble method has approximately 11.5% higher change detection accuracy than an individual classifier.

  14. Confident Surgical Decision Making in Temporal Lobe Epilepsy by Heterogeneous Classifier Ensembles

    PubMed Central

    Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Jafari-Khouzani, Kourosh; Elisevich, Kost; Fotouhi, Farshad

    2015-01-01

    In medical domains with low tolerance for invalid predictions, classification confidence is highly important and traditional performance measures such as overall accuracy cannot provide adequate insight into classifications reliability. In this paper, a confident-prediction rate (CPR) which measures the upper limit of confident predictions has been proposed based on receiver operating characteristic (ROC) curves. It has been shown that heterogeneous ensemble of classifiers improves this measure. This ensemble approach has been applied to lateralization of focal epileptogenicity in temporal lobe epilepsy (TLE) and prediction of surgical outcomes. A goal of this study is to reduce extraoperative electrocorticography (eECoG) requirement which is the practice of using electrodes placed directly on the exposed surface of the brain. We have shown that such goal is achievable with application of data mining techniques. Furthermore, all TLE surgical operations do not result in complete relief from seizures and it is not always possible for human experts to identify such unsuccessful cases prior to surgery. This study demonstrates the capability of data mining techniques in prediction of undesirable outcome for a portion of such cases. PMID:26609547

  15. Validating a decision tree for serious infection: diagnostic accuracy in acutely ill children in ambulatory care

    PubMed Central

    Verbakel, Jan Y; Lemiengre, Marieke B; De Burghgraeve, Tine; De Sutter, An; Aertgeerts, Bert; Bullens, Dominique M A; Shinkins, Bethany; Van den Bruel, Ann; Buntinx, Frank

    2015-01-01

    Objective Acute infection is the most common presentation of children in primary care with only few having a serious infection (eg, sepsis, meningitis, pneumonia). To avoid complications or death, early recognition and adequate referral are essential. Clinical prediction rules have the potential to improve diagnostic decision-making for rare but serious conditions. In this study, we aimed to validate a recently developed decision tree in a new but similar population. Design Diagnostic accuracy study validating a clinical prediction rule. Setting and participants Acutely ill children presenting to ambulatory care in Flanders, Belgium, consisting of general practice and paediatric assessment in outpatient clinics or the emergency department. Intervention Physicians were asked to score the decision tree in every child. Primary outcome measures The outcome of interest was hospital admission for at least 24 h with a serious infection within 5 days after initial presentation. We report the diagnostic accuracy of the decision tree in sensitivity, specificity, likelihood ratios and predictive values. Results In total, 8962 acute illness episodes were included, of which 283 lead to admission to hospital with a serious infection. Sensitivity of the decision tree was 100% (95% CI 71.5% to 100%) at a specificity of 83.6% (95% CI 82.3% to 84.9%) in the general practitioner setting with 17% of children testing positive. In the paediatric outpatient and emergency department setting, sensitivities were below 92%, with specificities below 44.8%. Conclusions In an independent validation cohort, this clinical prediction rule has shown to be extremely sensitive to identify children at risk of hospital admission for a serious infection in general practice, making it suitable for ruling out. Trial registration number NCT02024282. PMID:26254472

  16. Using Boosted Decision Trees to Separate Signal and Background in B to XsGamma Decays

    SciTech Connect

    Barber, James; /Massachusetts U., Amherst /SLAC

    2006-09-27

    The measurement of the branching fraction of the flavor changing neutral current B {yields} X{sub s}{gamma} transition can be used to expose physics outside the Standard Model. In order to make a precise measurement of this inclusive branching fraction, it is necessary to be able to effectively separate signal and background in the data. In order to achieve better separation, an algorithm based on Boosted Decision Trees (BDTs) is implemented. Using Monte Carlo simulated events, ''forests'' of trees were trained and tested with different sets of parameters. This parameter space was studied with the goal of maximizing the figure of merit, Q, the measure of separation quality used in this analysis. It is found that the use of 1000 trees, with 100 values tested for each variable at each node, and 50 events required for a node to continue separating give the highest figure of merit, Q = 18.37.

  17. Decision tree approach for classification of remotely sensed satellite data using open source support

    NASA Astrophysics Data System (ADS)

    Sharma, Richa; Ghosh, Aniruddha; Joshi, P. K.

    2013-10-01

    In this study, an attempt has been made to develop a decision tree classification (DTC) algorithm for classification of remotely sensed satellite data (Landsat TM) using open source support. The decision tree is constructed by recursively partitioning the spectral distribution of the training dataset using WEKA, open source data mining software. The classified image is compared with the image classified using classical ISODATA clustering and Maximum Likelihood Classifier (MLC) algorithms. Classification result based on DTC method provided better visual depiction than results produced by ISODATA clustering or by MLC algorithms. The overall accuracy was found to be 90% (kappa = 0.88) using the DTC, 76.67% (kappa = 0.72) using the Maximum Likelihood and 57.5% (kappa = 0.49) using ISODATA clustering method. Based on the overall accuracy and kappa statistics, DTC was found to be more preferred classification approach than others.

  18. Circum-Arctic petroleum systems identified using decision-tree chemometrics

    USGS Publications Warehouse

    Peters, K.E.; Ramos, L.S.; Zumberge, J.E.; Valin, Z.C.; Scotese, C.R.; Gautier, D.L.

    2007-01-01

    Source- and age-related biomarker and isotopic data were measured for more than 1000 crude oil samples from wells and seeps collected above approximately 55??N latitude. A unique, multitiered chemometric (multivariate statistical) decision tree was created that allowed automated classification of 31 genetically distinct circumArctic oil families based on a training set of 622 oil samples. The method, which we call decision-tree chemometrics, uses principal components analysis and multiple tiers of K-nearest neighbor and SIMCA (soft independent modeling of class analogy) models to classify and assign confidence limits for newly acquired oil samples and source rock extracts. Geochemical data for each oil sample were also used to infer the age, lithology, organic matter input, depositional environment, and identity of its source rock. These results demonstrate the value of large petroleum databases where all samples were analyzed using the same procedures and instrumentation. Copyright ?? 2007. The American Association of Petroleum Geologists. All rights reserved.

  19. Three-dimensional object recognition using similar triangles and decision trees

    NASA Technical Reports Server (NTRS)

    Spirkovska, Lilly

    1993-01-01

    A system, TRIDEC, that is capable of distinguishing between a set of objects despite changes in the objects' positions in the input field, their size, or their rotational orientation in 3D space is described. TRIDEC combines very simple yet effective features with the classification capabilities of inductive decision tree methods. The feature vector is a list of all similar triangles defined by connecting all combinations of three pixels in a coarse coded 127 x 127 pixel input field. The classification is accomplished by building a decision tree using the information provided from a limited number of translated, scaled, and rotated samples. Simulation results are presented which show that TRIDEC achieves 94 percent recognition accuracy in the 2D invariant object recognition domain and 98 percent recognition accuracy in the 3D invariant object recognition domain after training on only a small sample of transformed views of the objects.

  20. Identifying Risk and Protective Factors in Recidivist Juvenile Offenders: A Decision Tree Approach.

    PubMed

    Ortega-Campos, Elena; García-García, Juan; Gil-Fenoy, Maria José; Zaldívar-Basurto, Flor

    2016-01-01

    Research on juvenile justice aims to identify profiles of risk and protective factors in juvenile offenders. This paper presents a study of profiles of risk factors that influence young offenders toward committing sanctionable antisocial behavior (S-ASB). Decision tree analysis is used as a multivariate approach to the phenomenon of repeated sanctionable antisocial behavior in juvenile offenders in Spain. The study sample was made up of the set of juveniles who were charged in a court case in the Juvenile Court of Almeria (Spain). The period of study of recidivism was two years from the baseline. The object of study is presented, through the implementation of a decision tree. Two profiles of risk and protective factors are found. Risk factors associated with higher rates of recidivism are antisocial peers, age at baseline S-ASB, problems in school and criminality in family members. PMID:27611313

  1. Using Decision Trees for Estimating Mode Choice of Trips in Buca-Izmir

    NASA Astrophysics Data System (ADS)

    Oral, L. O.; Tecim, V.

    2013-05-01

    Decision makers develop transportation plans and models for providing sustainable transport systems in urban areas. Mode Choice is one of the stages in transportation modelling. Data mining techniques can discover factors affecting the mode choice. These techniques can be applied with knowledge process approach. In this study a data mining process model is applied to determine the factors affecting the mode choice with decision trees techniques by considering individual trip behaviours from household survey data collected within Izmir Transportation Master Plan. From this perspective transport mode choice problem is solved on a case in district of Buca-Izmir, Turkey with CRISP-DM knowledge process model.

  2. Office of Legacy Management Decision Tree for Solar Photovoltaic Projects - 13317

    SciTech Connect

    Elmer, John; Butherus, Michael; Barr, Deborah L.

    2013-07-01

    To support consideration of renewable energy power development as a land reuse option, the DOE Office of Legacy Management (LM) and the National Renewable Energy Laboratory (NREL) established a partnership to conduct an assessment of wind and solar renewable energy resources on LM lands. From a solar capacity perspective, the larger sites in the western United States present opportunities for constructing solar photovoltaic (PV) projects. A detailed analysis and preliminary plan was developed for three large sites in New Mexico, assessing the costs, the conceptual layout of a PV system, and the electric utility interconnection process. As a result of the study, a 1,214-hectare (3,000-acre) site near Grants, New Mexico, was chosen for further study. The state incentives, utility connection process, and transmission line capacity were key factors in assessing the feasibility of the project. LM's Durango, Colorado, Disposal Site was also chosen for consideration because the uranium mill tailings disposal cell is on a hillside facing south, transmission lines cross the property, and the community was very supportive of the project. LM worked with the regulators to demonstrate that the disposal cell's long-term performance would not be impacted by the installation of a PV solar system. A number of LM-unique issues were resolved in making the site available for a private party to lease a portion of the site for a solar PV project. A lease was awarded in September 2012. Using a solar decision tree that was developed and launched by the EPA and NREL, LM has modified and expanded the decision tree structure to address the unique aspects and challenges faced by LM on its multiple sites. The LM solar decision tree covers factors such as land ownership, usable acreage, financial viability of the project, stakeholder involvement, and transmission line capacity. As additional sites are transferred to LM in the future, the decision tree will assist in determining whether a solar

  3. MODIS Snow Cover Mapping Decision Tree Technique: Snow and Cloud Discrimination

    NASA Technical Reports Server (NTRS)

    Riggs, George A.; Hall, Dorothy K.

    2010-01-01

    Accurate mapping of snow cover continues to challenge cryospheric scientists and modelers. The Moderate-Resolution Imaging Spectroradiometer (MODIS) snow data products have been used since 2000 by many investigators to map and monitor snow cover extent for various applications. Users have reported on the utility of the products and also on problems encountered. Three problems or hindrances in the use of the MODIS snow data products that have been reported in the literature are: cloud obscuration, snow/cloud confusion, and snow omission errors in thin or sparse snow cover conditions. Implementation of the MODIS snow algorithm in a decision tree technique using surface reflectance input to mitigate those problems is being investigated. The objective of this work is to use a decision tree structure for the snow algorithm. This should alleviate snow/cloud confusion and omission errors and provide a snow map with classes that convey information on how snow was detected, e.g. snow under clear sky, snow tinder cloud, to enable users' flexibility in interpreting and deriving a snow map. Results of a snow cover decision tree algorithm are compared to the standard MODIS snow map and found to exhibit improved ability to alleviate snow/cloud confusion in some situations allowing up to about 5% increase in mapped snow cover extent, thus accuracy, in some scenes.

  4. Data mining for multiagent rules, strategies, and fuzzy decision tree structure

    NASA Astrophysics Data System (ADS)

    Smith, James F., III; Rhyne, Robert D., II; Fisher, Kristin

    2002-03-01

    A fuzzy logic based resource manager (RM) has been developed that automatically allocates electronic attack resources in real-time over many dissimilar platforms. Two different data mining algorithms have been developed to determine rules, strategies, and fuzzy decision tree structure. The first data mining algorithm uses a genetic algorithm as a data mining function and is called from an electronic game. The game allows a human expert to play against the resource manager in a simulated battlespace with each of the defending platforms being exclusively directed by the fuzzy resource manager and the attacking platforms being controlled by the human expert or operating autonomously under their own logic. This approach automates the data mining problem. The game automatically creates a database reflecting the domain expert's knowledge. It calls a data mining function, a genetic algorithm, for data mining of the database as required and allows easy evaluation of the information mined in the second step. The criterion for re- optimization is discussed as well as experimental results. Then a second data mining algorithm that uses a genetic program as a data mining function is introduced to automatically discover fuzzy decision tree structures. Finally, a fuzzy decision tree generated through this process is discussed.

  5. Flood-type classification in mountainous catchments using crisp and fuzzy decision trees

    NASA Astrophysics Data System (ADS)

    Sikorska, Anna E.; Viviroli, Daniel; Seibert, Jan

    2015-10-01

    Floods are governed by largely varying processes and thus exhibit various behaviors. Classification of flood events into flood types and the determination of their respective frequency is therefore important for a better understanding and prediction of floods. This study presents a flood classification for identifying flood patterns at a catchment scale by means of a fuzzy decision tree. Hence, events are represented as a spectrum of six main possible flood types that are attributed with their degree of acceptance. Considered types are flash, short rainfall, long rainfall, snow-melt, rainfall on snow and, in high alpine catchments, glacier-melt floods. The fuzzy decision tree also makes it possible to acknowledge the uncertainty present in the identification of flood processes and thus allows for more reliable flood class estimates than using a crisp decision tree, which identifies one flood type per event. Based on the data set in nine Swiss mountainous catchments, it was demonstrated that this approach is less sensitive to uncertainties in the classification attributes than the classical crisp approach. These results show that the fuzzy approach bears additional potential for analyses of flood patterns at a catchment scale and thereby it provides more realistic representation of flood processes.

  6. Building Decision Trees for Characteristic Ellipsoid Method to Monitor Power System Transient Behaviors

    SciTech Connect

    Ma, Jian; Diao, Ruisheng; Makarov, Yuri V.; Etingov, Pavel V.; Zhou, Ning; Dagle, Jeffery E.

    2010-12-01

    The characteristic ellipsoid is a new method to monitor the dynamics of power systems. Decision trees (DTs) play an important role in applying the characteristic ellipsoid method to system operation and analysis. This paper presents the idea and initial results of building DTs for detecting transient dynamic events using the characteristic ellipsoid method. The objective is to determine fault types, fault locations and clearance time in the system using decision trees based on ellipsoids of system transient responses. The New England 10-machine 39-bus system is used for running dynamic simulations to generate a sufficiently large number of transient events in different system configurations. Comprehensive transient simulations considering three fault types, two fault clearance times and different fault locations were conducted in the study. Bus voltage magnitudes and monitored reactive and active power flows are recorded as the phasor measurements to calculate characteristic ellipsoids whose volume, eccentricity, center and projection of the longest axis are used as indices to build decision trees. The DT performances are tested and compared by considering different sets of PMU locations. The proposed method demonstrates that the characteristic ellipsoid method is a very efficient and promising tool to monitor power system dynamic behaviors.

  7. Gene selection for cancer identification: a decision tree model empowered by particle swarm optimization algorithm

    PubMed Central

    2014-01-01

    Background In the application of microarray data, how to select a small number of informative genes from thousands of genes that may contribute to the occurrence of cancers is an important issue. Many researchers use various computational intelligence methods to analyzed gene expression data. Results To achieve efficient gene selection from thousands of candidate genes that can contribute in identifying cancers, this study aims at developing a novel method utilizing particle swarm optimization combined with a decision tree as the classifier. This study also compares the performance of our proposed method with other well-known benchmark classification methods (support vector machine, self-organizing map, back propagation neural network, C4.5 decision tree, Naive Bayes, CART decision tree, and artificial immune recognition system) and conducts experiments on 11 gene expression cancer datasets. Conclusion Based on statistical analysis, our proposed method outperforms other popular classifiers for all test datasets, and is compatible to SVM for certain specific datasets. Further, the housekeeping genes with various expression patterns and tissue-specific genes are identified. These genes provide a high discrimination power on cancer classification. PMID:24555567

  8. A Study of Factors that Influence First-Year Nonmusic Majors' Decisions to Participate in Music Ensembles at Small Liberal Arts Colleges in Indiana

    ERIC Educational Resources Information Center

    Faber, Ardis R.

    2010-01-01

    The purpose of this study was to investigate factors that influence first-year nonmusic majors' decisions regarding participation in music ensembles at small liberal arts colleges in Indiana. A survey questionnaire was used to gather data. The data collected was analyzed to determine significant differences between the nonmusic majors who have…

  9. A decision tree for selecting the most cost-effective waste disposal strategy in foodservice operations.

    PubMed

    Wie, Seunghee; Shanklin, Carol W; Lee, Kyung-Eun

    2003-04-01

    The purposes of this study were to determine costs of disposal strategies for wastes generated in foodservice operations and to develop a decision tree to determine the most cost-effective disposal strategy for foodservice operations. Four cases, including the central food processing center (CFPC) in a school district, a continuing-care retirement center (CCRC), a university dining center (UDC), and a commercial chain restaurant (CCR), were studied to determine the most cost-effective disposal strategy. Annual costs for the current and projected strategies were determined for each case. Results of waste characterization studies and stopwatch studies, interviews with foodservice directors, and water flow and electrical requirements from manufacturers' specifications were used to determine cost incurred. The annual percentage increases for labor, fees, and services were used to reflect an inflated economic condition for the ensuing 10 years of the study period. The Net Present Worth method was used to compare costs of strategies, and the multiparameter sensitivity analysis was conducted to examine the tolerance of the chosen strategy. The most cost-effective strategy differed among foodservice operations because of the composition of food and packaging wastes, the quantity of recyclable materials, the waste-hauling charges, labor costs, start-up costs, and inflation rate. For example, the use of a garbage disposal for food waste and landfills and recycling for packaging waste were the most cost-effective strategies for the CCRC. A decision tree was developed to illustrate the decision-making process that occurs when conducting cost analysis and subsequent decisions. Dietetics practitioners can use the decision tree when evaluating the results of the cost analysis. PMID:12669011

  10. Bonsai Trees in Your Head: How the Pavlovian System Sculpts Goal-Directed Choices by Pruning Decision Trees

    PubMed Central

    O'Nions, Elizabeth; Sheridan, Luke; Dayan, Peter; Roiser, Jonathan P.

    2012-01-01

    When planning a series of actions, it is usually infeasible to consider all potential future sequences; instead, one must prune the decision tree. Provably optimal pruning is, however, still computationally ruinous and the specific approximations humans employ remain unknown. We designed a new sequential reinforcement-based task and showed that human subjects adopted a simple pruning strategy: during mental evaluation of a sequence of choices, they curtailed any further evaluation of a sequence as soon as they encountered a large loss. This pruning strategy was Pavlovian: it was reflexively evoked by large losses and persisted even when overwhelmingly counterproductive. It was also evident above and beyond loss aversion. We found that the tendency towards Pavlovian pruning was selectively predicted by the degree to which subjects exhibited sub-clinical mood disturbance, in accordance with theories that ascribe Pavlovian behavioural inhibition, via serotonin, a role in mood disorders. We conclude that Pavlovian behavioural inhibition shapes highly flexible, goal-directed choices in a manner that may be important for theories of decision-making in mood disorders. PMID:22412360

  11. Socioeconomic determinants of menarche in rural Polish girls using the decision trees method.

    PubMed

    Matusik, Stanisław; Laska-Mierzejewska, Teresa; Chrzanowska, Maria

    2011-05-01

    The aim of this study was to assess the usefulness of the decision trees method as a research method of multidimensional associations between menarche and socioeconomic variables. The article is based on data collected from the rural area of Choszczno in the West Pomerania district of Poland between 1987 and 2001. Girls were asked about the appearance of first menstruation (a yes/no method). The average menarchal age was estimated by the probit analysis method, using second grade polynomials. The socioeconomic status of the girls' families was determined using five qualitative variables: fathers' and mothers' educational level, source of income, household appliances and the number of children in a family. For classification based on five socioeconomic variables, one of the most effective algorithms CART (Classification and Regression Trees) was used. In 2001 the menarchal age in 66% of examined girls was properly classified, while a higher efficiency of 70% was obtained for girls examined in 1987. The decision trees method enabled the definition of the hierarchy of socioeconomic variables influencing girls' biological development level. The strongest discriminatory power was attributed to the number of children in a family, and the mother's and then father's educational level. Using this method it is possible to detect differences in strength of socioeconomic variables associated with girls' pubescence before 1987 and after 2001 during the transformation of the economic and political systems in Poland. However, the decision trees method is infrequently applied in social sciences and constitutes a novelty; this article proves its usefulness in examining relations between biological processes and a population's living conditions. PMID:21211091

  12. Using decision trees to manage hospital readmission risk for acute myocardial infarction, heart failure, and pneumonia.

    PubMed

    Hilbert, John P; Zasadil, Scott; Keyser, Donna J; Peele, Pamela B

    2014-12-01

    To improve healthcare quality and reduce costs, the Affordable Care Act places hospitals at financial risk for excessive readmissions associated with acute myocardial infarction (AMI), heart failure (HF), and pneumonia (PN). Although predictive analytics is increasingly looked to as a means for measuring, comparing, and managing this risk, many modeling tools require data inputs that are not readily available and/or additional resources to yield actionable information. This article demonstrates how hospitals and clinicians can use their own structured discharge data to create decision trees that produce highly transparent, clinically relevant decision rules for better managing readmission risk associated with AMI, HF, and PN. For illustrative purposes, basic decision trees are trained and tested using publically available data from the California State Inpatient Databases and an open-source statistical package. As expected, these simple models perform less well than other more sophisticated tools, with areas under the receiver operating characteristic (ROC) curve (or AUC) of 0.612, 0.583, and 0.650, respectively, but achieve a lift of at least 1.5 or greater for higher-risk patients with any of the three conditions. More importantly, they are shown to offer substantial advantages in terms of transparency and interpretability, comprehensiveness, and adaptability. By enabling hospitals and clinicians to identify important factors associated with readmissions, target subgroups of patients at both high and low risk, and design and implement interventions that are appropriate to the risk levels observed, decision trees serve as an ideal application for addressing the challenge of reducing hospital readmissions. PMID:25160603

  13. Cloud Detection from Satellite Imagery: A Comparison of Expert-Generated and Automatically-Generated Decision Trees

    NASA Technical Reports Server (NTRS)

    Shiffman, Smadar

    2004-01-01

    Automated cloud detection and tracking is an important step in assessing global climate change via remote sensing. Cloud masks, which indicate whether individual pixels depict clouds, are included in many of the data products that are based on data acquired on- board earth satellites. Many cloud-mask algorithms have the form of decision trees, which employ sequential tests that scientists designed based on empirical astrophysics studies and astrophysics simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In this study we explored the potential benefits of automatically-learned decision trees for detecting clouds from images acquired using the Advanced Very High Resolution Radiometer (AVHRR) instrument on board the NOAA-14 weather satellite of the National Oceanic and Atmospheric Administration. We constructed three decision trees for a sample of 8km-daily AVHRR data from 2000 using a decision-tree learning procedure provided within MATLAB(R), and compared the accuracy of the decision trees to the accuracy of the cloud mask. We used ground observations collected by the National Aeronautics and Space Administration Clouds and the Earth s Radiant Energy Systems S COOL project as the gold standard. For the sample data, the accuracy of automatically learned decision trees was greater than the accuracy of the cloud masks included in the AVHRR data product.

  14. Decision Tree based Prediction and Rule Induction for Groundwater Trichloroethene (TCE) Pollution Vulnerability

    NASA Astrophysics Data System (ADS)

    Park, J.; Yoo, K.

    2013-12-01

    For groundwater resource conservation, it is important to accurately assess groundwater pollution sensitivity or vulnerability. In this work, we attempted to use data mining approach to assess groundwater pollution vulnerability in a TCE (trichloroethylene) contaminated Korean industrial site. The conventional DRASTIC method failed to describe TCE sensitivity data with a poor correlation with hydrogeological properties. Among the different data mining methods such as Artificial Neural Network (ANN), Multiple Logistic Regression (MLR), Case Base Reasoning (CBR), and Decision Tree (DT), the accuracy and consistency of Decision Tree (DT) was the best. According to the following tree analyses with the optimal DT model, the failure of the conventional DRASTIC method in fitting with TCE sensitivity data may be due to the use of inaccurate weight values of hydrogeological parameters for the study site. These findings provide a proof of concept that DT based data mining approach can be used in predicting and rule induction of groundwater TCE sensitivity without pre-existing information on weights of hydrogeological properties.

  15. Merging Multi-model CMIP5/PMIP3 Past-1000 Ensemble Simulations with Tree Ring Proxy Data by Optimal Interpolation Approach

    NASA Astrophysics Data System (ADS)

    Chen, Xin; Luo, Yong; Xing, Pei; Nie, Suping; Tian, Qinhua

    2015-04-01

    Two sets of gridded annual mean surface air temperature in past millennia over the Northern Hemisphere was constructed employing optimal interpolation (OI) method so as to merge the tree ring proxy records with the simulations from CMIP5 (the fifth phase of the Climate Model Intercomparison Project). Both the uncertainties in proxy reconstruction and model simulations can be taken into account applying OI algorithm. For better preservation of physical coordinated features and spatial-temporal completeness of climate variability in 7 copies of model results, we perform the Empirical Orthogonal Functions (EOF) analysis to truncate the ensemble mean field as the first guess (background field) for OI. 681 temperature sensitive tree-ring chronologies are collected and screened from International Tree Ring Data Bank (ITRDB) and Past Global Changes (PAGES-2k) project. Firstly, two methods (variance matching and linear regression) are employed to calibrate the tree ring chronologies with instrumental data (CRUTEM4v) individually. In addition, we also remove the bias of both the background field and proxy records relative to instrumental dataset. Secondly, time-varying background error covariance matrix (B) and static "observation" error covariance matrix (R) are calculated for OI frame. In our scheme, matrix B was calculated locally, and "observation" error covariance are partially considered in R matrix (the covariance value between the pairs of tree ring sites that are very close to each other would be counted), which is different from the traditional assumption that R matrix should be diagonal. Comparing our results, it turns out that regional averaged series are not sensitive to the selection for calibration methods. The Quantile-Quantile plots indicate regional climatologies based on both methods are tend to be more agreeable with regional reconstruction of PAGES-2k in 20th century warming period than in little ice age (LIA). Lager volcanic cooling response over Asia

  16. Application of Decision Tree Algorithm for classification and identification of natural minerals using SEM-EDS

    NASA Astrophysics Data System (ADS)

    Akkaş, Efe; Akin, Lutfiye; Evren Çubukçu, H.; Artuner, Harun

    2015-07-01

    A mineral is a natural, homogeneous solid with a definite chemical composition and a highly ordered atomic arrangement. Recently, fast and accurate mineral identification/classification became a necessity. Energy Dispersive X-ray Spectrometers integrated with Scanning Electron Microscopes (SEM) are used to obtain rapid and reliable elemental analysis or chemical characterization of a solid. However, mineral identification is challenging since there is wide range of spectral dataset for natural minerals. The more mineralogical data acquired, time required for classification procedures increases. Moreover, applied instrumental conditions on a SEM-EDS differ for various applications, affecting the produced X-ray patterns even for the same mineral. This study aims to test whether C5.0 Decision Tree is a rapid and reliable method algorithm for classification and identification of various natural magmatic minerals. Ten distinct mineral groups (olivine, orthopyroxene, clinopyroxene, apatite, amphibole, plagioclase, K-feldspar, zircon, magnetite, biotite) from different igneous rocks have been analyzed on SEM-EDS. 4601 elemental X-ray intensity data have been collected under various instrumental conditions. 2400 elemental data have been used to train and the remaining 2201 data have been tested to identify the minerals. The vast majority of the test data have been classified accurately. Additionally, high accuracy has been reached on the minerals with similar chemical composition, such as olivine ((Mg,Fe)2[SiO4]) and orthopyroxene ((Mg,Fe)2[SiO6]). Furthermore, two members from amphibole group (magnesiohastingsite, tschermakite) and two from clinopyroxene group (diopside, hedenbergite) have been accurately identified by the Decision Tree Algorithm. These results demonstrate that C5.0 Decision Tree Algorithm is an efficient method for mineral group classification and the identification of mineral members.

  17. Improving medical diagnosis reliability using Boosted C5.0 decision tree empowered by Particle Swarm Optimization.

    PubMed

    Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin

    2015-08-01

    Improving accuracy of supervised classification algorithms in biomedical applications is one of active area of research. In this study, we improve the performance of Particle Swarm Optimization (PSO) combined with C4.5 decision tree (PSO+C4.5) classifier by applying Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of our proposed method, it is implemented on 1 microarray dataset and 5 different medical data sets obtained from UCI machine learning databases. Moreover, the results of PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine under the kernel of Radial Basis Function, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest neighbor). Repeated five-fold cross-validation method was used to justify the performance of classifiers. Experimental results show that our proposed method not only improve the performance of PSO+C4.5 but also obtains higher classification accuracy compared to the other classification methods. PMID:26737960

  18. Decision Tree Classifier for Classification of Plant and Animal Micro RNA's

    NASA Astrophysics Data System (ADS)

    Pant, Bhasker; Pant, Kumud; Pardasani, K. R.

    Gene expression is regulated by miRNAs or micro RNAs which can be 21-23 nucleotide in length. They are non coding RNAs which control gene expression either by translation repression or mRNA degradation. Plants and animals both contain miRNAs which have been classified by wet lab techniques. These techniques are highly expensive, labour intensive and time consuming. Hence faster and economical computational approaches are needed. In view of above a machine learning model has been developed for classification of plant and animal miRNAs using decision tree classifier. The model has been tested on available data and it gives results with 91% accuracy.

  19. Improvement and analysis of ID3 algorithm in decision-making tree

    NASA Astrophysics Data System (ADS)

    Xie, Xiao-Lan; Long, Zhen; Liao, Wen-Qi

    2015-12-01

    For the cooperative system under development, it needs to use the spatial analysis and relative technology concerning data mining in order to carry out the detection of the subject conflict and redundancy, while the ID3 algorithm is an important data mining. Due to the traditional ID3 algorithm in the decision-making tree towards the log part is rather complicated, this paper obtained a new computational formula of information gain through the optimization of algorithm of the log part. During the experiment contrast and theoretical analysis, it is found that IID3 (Improved ID3 Algorithm) algorithm owns higher calculation efficiency and accuracy and thus worth popularizing.

  20. A comparison of student academic achievement using decision trees techniques: Reflection from University Malaysia Perlis

    NASA Astrophysics Data System (ADS)

    Aziz, Fatihah; Jusoh, Abd Wahab; Abu, Mohd Syafarudy

    2015-05-01

    A decision tree is one of the techniques in data mining for prediction. Using this method, hidden information from abundant of data can be taken out and interpret the information into useful knowledge. In this paper the academic performance of the student will be examined from 2002 to 2012 from two faculties; Faculty of Manufacturing Engineering and Faculty of Microelectronic Engineering in University Malaysia Perlis (UniMAP). The objectives of this study are to determine and compare the factors that affect the students' academic achievement between the two faculties. The prediction results show there are five attributes that have been considered as factors that influence the students' academic performance.

  1. Are decision trees a feasible knowledge representation to guide extraction of critical information from randomized controlled trial reports?

    PubMed Central

    Chung, Grace Y; Coiera, Enrico

    2008-01-01

    Background This paper proposes the use of decision trees as the basis for automatically extracting information from published randomized controlled trial (RCT) reports. An exploratory analysis of RCT abstracts is undertaken to investigate the feasibility of using decision trees as a semantic structure. Quality-of-paper measures are also examined. Methods A subset of 455 abstracts (randomly selected from a set of 7620 retrieved from Medline from 1998 – 2006) are examined for the quality of RCT reporting, the identifiability of RCTs from abstracts, and the completeness and complexity of RCT abstracts with respect to key decision tree elements. Abstracts were manually assigned to 6 sub-groups distinguishing whether they were primary RCTs versus other design types. For primary RCT studies, we analyzed and annotated the reporting of intervention comparison, population assignment and outcome values. To measure completeness, the frequencies by which complete intervention, population and outcome information are reported in abstracts were measured. A qualitative examination of the reporting language was conducted. Results Decision tree elements are manually identifiable in the majority of primary RCT abstracts. 73.8% of a random subset was primary studies with a single population assigned to two or more interventions. 68% of these primary RCT abstracts were structured. 63% contained pharmaceutical interventions. 84% reported the total number of study subjects. In a subset of 21 abstracts examined, 71% reported numerical outcome values. Conclusion The manual identifiability of decision tree elements in the abstract suggests that decision trees could be a suitable construct to guide machine summarisation of RCTs. The presence of decision tree elements could also act as an indicator for RCT report quality in terms of completeness and uniformity. PMID:18957129

  2. Integrating Decision Tree and Hidden Markov Model (HMM) for Subtype Prediction of Human Influenza A Virus

    NASA Astrophysics Data System (ADS)

    Attaluri, Pavan K.; Chen, Zhengxin; Weerakoon, Aruna M.; Lu, Guoqing

    Multiple criteria decision making (MCDM) has significant impact in bioinformatics. In the research reported here, we explore the integration of decision tree (DT) and Hidden Markov Model (HMM) for subtype prediction of human influenza A virus. Infection with influenza viruses continues to be an important public health problem. Viral strains of subtype H3N2 and H1N1 circulates in humans at least twice annually. The subtype detection depends mainly on the antigenic assay, which is time-consuming and not fully accurate. We have developed a Web system for accurate subtype detection of human influenza virus sequences. The preliminary experiment showed that this system is easy-to-use and powerful in identifying human influenza subtypes. Our next step is to examine the informative positions at the protein level and extend its current functionality to detect more subtypes. The web functions can be accessed at http://glee.ist.unomaha.edu/.

  3. Trees

    ERIC Educational Resources Information Center

    Al-Khaja, Nawal

    2007-01-01

    This is a thematic lesson plan for young learners about palm trees and the importance of taking care of them. The two part lesson teaches listening, reading and speaking skills. The lesson includes parts of a tree; the modal auxiliary, can; dialogues and a role play activity.

  4. Genetic algorithm-based neural fuzzy decision tree for mixed scheduling in ATM networks.

    PubMed

    Lin, Chin-Teng; Chung, I-Fang; Pu, Her-Chang; Lee', Tsern-Huei; Chang, Jyh-Yeong

    2002-01-01

    Future broadband integrated services networks based on asynchronous transfer mode (ATM) technology are expected to support multiple types of multimedia information with diverse statistical characteristics and quality of service (QoS) requirements. To meet these requirements, efficient scheduling methods are important for traffic control in ATM networks. Among general scheduling schemes, the rate monotonic algorithm is simple enough to be used in high-speed networks, but does not attain the high system utilization of the deadline driven algorithm. However, the deadline driven scheme is computationally complex and hard to implement in hardware. The mixed scheduling algorithm is a combination of the rate monotonic algorithm and the deadline driven algorithm; thus it can provide most of the benefits of these two algorithms. In this paper, we use the mixed scheduling algorithm to achieve high system utilization under the hardware constraint. Because there is no analytic method for schedulability testing of mixed scheduling, we propose a genetic algorithm-based neural fuzzy decision tree (GANFDT) to realize it in a real-time environment. The GANFDT combines a GA and a neural fuzzy network into a binary classification tree. This approach also exploits the power of the classification tree. Simulation results show that the GANFDT provides an efficient way of carrying out mixed scheduling in ATM networks. PMID:18244889

  5. Decision-tree analysis of factors influencing rainfall-related building structure and content damage

    NASA Astrophysics Data System (ADS)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-09-01

    Flood-damage prediction models are essential building blocks in flood risk assessments. So far, little research has been dedicated to damage from small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision-tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period 1998-2011. The databases include claims of water-related damage (for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor). Response variables being modelled are average claim size and claim frequency, per district, per day. The set of predictors include rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision-tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), buildings age (property data only), a fraction of homeowners (content data only), a and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size. It is recommended to investigate explanations for the failure to derive models. These require the inclusion of other explanatory factors that were not used in the present study, an investigation of the variability in average claim size at different spatial scales, and the collection of more detailed insurance data that allows one to distinguish between the

  6. Decision tree analysis of factors influencing rainfall-related building damage

    NASA Astrophysics Data System (ADS)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-04-01

    Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage of small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998-2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors include rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), buildings age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggest that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22-26% of variance in claim frequency, which is considerably better compared to results from global multiple regression models (11-18% of variance explained). Still, a

  7. Prediction of microRNA target genes using an efficient genetic algorithm-based decision tree

    PubMed Central

    Rabiee-Ghahfarrokhi, Behzad; Rafiei, Fariba; Niknafs, Ali Akbar; Zamani, Behzad

    2015-01-01

    MicroRNAs (miRNAs) are small, non-coding RNA molecules that regulate gene expression in almost all plants and animals. They play an important role in key processes, such as proliferation, apoptosis, and pathogen–host interactions. Nevertheless, the mechanisms by which miRNAs act are not fully understood. The first step toward unraveling the function of a particular miRNA is the identification of its direct targets. This step has shown to be quite challenging in animals primarily because of incomplete complementarities between miRNA and target mRNAs. In recent years, the use of machine-learning techniques has greatly increased the prediction of miRNA targets, avoiding the need for costly and time-consuming experiments to achieve miRNA targets experimentally. Among the most important machine-learning algorithms are decision trees, which classify data based on extracted rules. In the present work, we used a genetic algorithm in combination with C4.5 decision tree for prediction of miRNA targets. We applied our proposed method to a validated human datasets. We nearly achieved 93.9% accuracy of classification, which could be related to the selection of best rules. PMID:26649272

  8. Using decision-tree classifier systems to extract knowledge from databases

    NASA Technical Reports Server (NTRS)

    St.clair, D. C.; Sabharwal, C. L.; Hacke, Keith; Bond, W. E.

    1990-01-01

    One difficulty in applying artificial intelligence techniques to the solution of real world problems is that the development and maintenance of many AI systems, such as those used in diagnostics, require large amounts of human resources. At the same time, databases frequently exist which contain information about the process(es) of interest. Recently, efforts to reduce development and maintenance costs of AI systems have focused on using machine learning techniques to extract knowledge from existing databases. Research is described in the area of knowledge extraction using a class of machine learning techniques called decision-tree classifier systems. Results of this research suggest ways of performing knowledge extraction which may be applied in numerous situations. In addition, a measurement called the concept strength metric (CSM) is described which can be used to determine how well the resulting decision tree can differentiate between the concepts it has learned. The CSM can be used to determine whether or not additional knowledge needs to be extracted from the database. An experiment involving real world data is presented to illustrate the concepts described.

  9. Snow event classification with a 2D video disdrometer - A decision tree approach

    NASA Astrophysics Data System (ADS)

    Bernauer, F.; Hürkamp, K.; Rühm, W.; Tschiersch, J.

    2016-05-01

    Snowfall classification according to crystal type or degree of riming of the snowflakes is import for many atmospheric processes, e.g. wet deposition of aerosol particles. 2D video disdrometers (2DVD) have recently proved their capability to measure microphysical parameters of snowfall. The present work has the aim of classifying snowfall according to microphysical properties of single hydrometeors (e.g. shape and fall velocity) measured by means of a 2DVD. The constraints for the shape and velocity parameters which are used in a decision tree for classification of the 2DVD measurements, are derived from detailed on-site observations, combining automatic 2DVD classification with visual inspection. The developed decision tree algorithm subdivides the detected events into three classes of dominating crystal type (single crystals, complex crystals and pellets) and three classes of dominating degree of riming (weak, moderate and strong). The classification results for the crystal type were validated with an independent data set proving the unambiguousness of the classification. In addition, for three long-term events, good agreement of the classification results with independently measured maximum dimension of snowflakes, snowflake bulk density and surrounding temperature was found. The developed classification algorithm is applicable for wind speeds below 5.0 m s -1 and has the advantage of being easily implemented by other users.

  10. Block-Based Connected-Component Labeling Algorithm Using Binary Decision Trees

    PubMed Central

    Chang, Wan-Yu; Chiu, Chung-Cheng; Yang, Jia-Horng

    2015-01-01

    In this paper, we propose a fast labeling algorithm based on block-based concepts. Because the number of memory access points directly affects the time consumption of the labeling algorithms, the aim of the proposed algorithm is to minimize neighborhood operations. Our algorithm utilizes a block-based view and correlates a raster scan to select the necessary pixels generated by a block-based scan mask. We analyze the advantages of a sequential raster scan for the block-based scan mask, and integrate the block-connected relationships using two different procedures with binary decision trees to reduce unnecessary memory access. This greatly simplifies the pixel locations of the block-based scan mask. Furthermore, our algorithm significantly reduces the number of leaf nodes and depth levels required in the binary decision tree. We analyze the labeling performance of the proposed algorithm alongside that of other labeling algorithms using high-resolution images and foreground images. The experimental results from synthetic and real image datasets demonstrate that the proposed algorithm is faster than other methods. PMID:26393597

  11. Type 2 Diabetes Mellitus Screening and Risk Factors Using Decision Tree: Results of Data Mining

    PubMed Central

    Habibi, Shafi; Ahmadi, Maryam; Alizadeh, Somayeh

    2015-01-01

    Objectives: The aim of this study was to examine a predictive model using features related to the diabetes type 2 risk factors. Methods: The data were obtained from a database in a diabetes control system in Tabriz, Iran. The data included all people referred for diabetes screening between 2009 and 2011. The features considered as “Inputs” were: age, sex, systolic and diastolic blood pressure, family history of diabetes, and body mass index (BMI). Moreover, we used diagnosis as “Class”. We applied the “Decision Tree” technique and “J48” algorithm in the WEKA (3.6.10 version) software to develop the model. Results: After data preprocessing and preparation, we used 22,398 records for data mining. The model precision to identify patients was 0.717. The age factor was placed in the root node of the tree as a result of higher information gain. The ROC curve indicates the model function in identification of patients and those individuals who are healthy. The curve indicates high capability of the model, especially in identification of the healthy persons. Conclusions: We developed a model using the decision tree for screening T2DM which did not require laboratory tests for T2DM diagnosis. PMID:26156928

  12. Prediction of Antimicrobial Activity of Synthetic Peptides by a Decision Tree Model

    PubMed Central

    Lira, Felipe; Perez, Pedro S.; Baranauskas, José A.

    2013-01-01

    Antimicrobial resistance is a persistent problem in the public health sphere. However, recent attempts to find effective substitutes to combat infections have been directed at identifying natural antimicrobial peptides in order to circumvent resistance to commercial antibiotics. This study describes the development of synthetic peptides with antimicrobial activity, created in silico by site-directed mutation modeling using wild-type peptides as scaffolds for these mutations. Fragments of antimicrobial peptides were used for modeling with molecular modeling computational tools. To analyze these peptides, a decision tree model, which indicated the action range of peptides on the types of microorganisms on which they can exercise biological activity, was created. The decision tree model was processed using physicochemistry properties from known antimicrobial peptides available at the Antimicrobial Peptide Database (APD). The two most promising peptides were synthesized, and antimicrobial assays showed inhibitory activity against Gram-positive and Gram-negative bacteria. Colossomin C and colossomin D were the most inhibitory peptides at 5 μg/ml against Staphylococcus aureus and Escherichia coli. The methods described in this work and the results obtained are useful for the identification and development of new compounds with antimicrobial activity through the use of computational tools. PMID:23455341

  13. Analysis of acid rain patterns in northeastern China using a decision tree method

    NASA Astrophysics Data System (ADS)

    Zhang, Xiuying; Jiang, Hong; Jin, Jiaxin; Xu, Xiaohua; Zhang, Qingxin

    2012-01-01

    Acid rain is a major regional-scale environmental problem in China. To control acid rain pollution and to protect the ecological environment, it is urgent to document acid rain patterns in various regions of China. Taking Liaoning Province as the study area, the present work focused on the spatial and temporal variations of acid rains in northeastern China. It presents a means for predicting the occurrence of acid rain using geographic position, terrain characteristics, routinely monitored meteorological factors and column concentrations of atmospheric SO 2 and NO 2. The analysis applies a decision tree approach to the foregoing observation data. Results showed that: (1) acid rain occurred at 17 stations among the 81 monitoring stations in Liaoning Province, with the frequency of acid rain from 0 to 84.38%; (2) summer had the most acid rain occurrences followed by spring and autumn, and the winter had the least; (3) the total accuracy for the simulation of precipitation pH (pH ≤ 4.5, 4.5 < pH ≤ 5.6, and pH > 5.6) was 98.04% using the decision tree method known as C5. The simulation results also indicated that the distance to coastline, elevation, wind direction, wind speed, rainfall amount, atmospheric pressure, and the precursors of acid rain all have a strong influence on the occurrence of acid rains in northeastern China.

  14. Cardiovascular Dysautonomias Diagnosis Using Crisp and Fuzzy Decision Tree: A Comparative Study.

    PubMed

    Kadi, Ilham; Idri, Ali

    2016-01-01

    Decision trees (DTs) are one of the most popular techniques for learning classification systems, especially when it comes to learning from discrete examples. In real world, many data occurred in a fuzzy form. Hence a DT must be able to deal with such fuzzy data. In fact, integrating fuzzy logic when dealing with imprecise and uncertain data allows reducing uncertainty and providing the ability to model fine knowledge details. In this paper, a fuzzy decision tree (FDT) algorithm was applied on a dataset extracted from the ANS (Autonomic Nervous System) unit of the Moroccan university hospital Avicenne. This unit is specialized on performing several dynamic tests to diagnose patients with autonomic disorder and suggest them the appropriate treatment. A set of fuzzy classifiers were generated using FID 3.4. The error rates of the generated FDTs were calculated to measure their performances. Moreover, a comparison between the error rates obtained using crisp and FDTs was carried out and has proved that the results of FDTs were better than those obtained using crisp DTs. PMID:27139378

  15. Generation of 2D Land Cover Maps for Urban Areas Using Decision Tree Classification

    NASA Astrophysics Data System (ADS)

    Höhle, J.

    2014-09-01

    A 2D land cover map can automatically and efficiently be generated from high-resolution multispectral aerial images. First, a digital surface model is produced and each cell of the elevation model is then supplemented with attributes. A decision tree classification is applied to extract map objects like buildings, roads, grassland, trees, hedges, and walls from such an "intelligent" point cloud. The decision tree is derived from training areas which borders are digitized on top of a false-colour orthoimage. The produced 2D land cover map with six classes is then subsequently refined by using image analysis techniques. The proposed methodology is described step by step. The classification, assessment, and refinement is carried out by the open source software "R"; the generation of the dense and accurate digital surface model by the "Match-T DSM" program of the Trimble Company. A practical example of a 2D land cover map generation is carried out. Images of a multispectral medium-format aerial camera covering an urban area in Switzerland are used. The assessment of the produced land cover map is based on class-wise stratified sampling where reference values of samples are determined by means of stereo-observations of false-colour stereopairs. The stratified statistical assessment of the produced land cover map with six classes and based on 91 points per class reveals a high thematic accuracy for classes "building" (99 %, 95 % CI: 95 %-100 %) and "road and parking lot" (90 %, 95 % CI: 83 %-95 %). Some other accuracy measures (overall accuracy, kappa value) and their 95 % confidence intervals are derived as well. The proposed methodology has a high potential for automation and fast processing and may be applied to other scenes and sensors.

  16. Determinants of farmers' tree planting investment decision as a degraded landscape management strategy in the central highlands of Ethiopia

    NASA Astrophysics Data System (ADS)

    Gessesse, B.; Bewket, W.; Bräuning, A.

    2015-11-01

    Land degradation due to lack of sustainable land management practices are one of the critical challenges in many developing countries including Ethiopia. This study explores the major determinants of farm level tree planting decision as a land management strategy in a typical framing and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and binary logistic regression model. The model significantly predicted farmers' tree planting decision (Chi-square = 37.29, df = 15, P<0.001). Besides, the computed significant value of the model suggests that all the considered predictor variables jointly influenced the farmers' decision to plant trees as a land management strategy. In this regard, the finding of the study show that local land-users' willingness to adopt tree growing decision is a function of a wide range of biophysical, institutional, socioeconomic and household level factors, however, the likelihood of household size, productive labour force availability, the disparity of schooling age, level of perception of the process of deforestation and the current land tenure system have positively and significantly influence on tree growing investment decisions in the study watershed. Eventually, the processes of land use conversion and land degradation are serious which in turn have had adverse effects on agricultural productivity, local food security and poverty trap nexus. Hence, devising sustainable and integrated land management policy options and implementing them would enhance ecological restoration and livelihood sustainability in the study watershed.

  17. Determinants of farmers' tree-planting investment decisions as a degraded landscape management strategy in the central highlands of Ethiopia

    NASA Astrophysics Data System (ADS)

    Gessesse, Berhan; Bewket, Woldeamlak; Bräuning, Achim

    2016-04-01

    Land degradation due to lack of sustainable land management practices is one of the critical challenges in many developing countries including Ethiopia. This study explored the major determinants of farm-level tree-planting decisions as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree-planting decisions (χ2 = 37.29, df = 15, P < 0.001). Besides, the computed significant value of the model revealed that all the considered predictor variables jointly influenced the farmers' decisions to plant trees as a land management strategy. The findings of the study demonstrated that the adoption of tree-growing decisions by local land users was a function of a wide range of biophysical, institutional, socioeconomic and household-level factors. In this regard, the likelihood of household size, productive labour force availability, the disparity of schooling age, level of perception of the process of deforestation and the current land tenure system had a critical influence on tree-growing investment decisions in the study watershed. Eventually, the processes of land-use conversion and land degradation were serious, which in turn have had adverse effects on agricultural productivity, local food security and poverty trap nexus. Hence, the study recommended that devising and implementing sustainable land management policy options would enhance ecological restoration and livelihood sustainability in the study watershed.

  18. Multi-output decision trees for lesion segmentation in multiple sclerosis

    NASA Astrophysics Data System (ADS)

    Jog, Amod; Carass, Aaron; Pham, Dzung L.; Prince, Jerry L.

    2015-03-01

    Multiple Sclerosis (MS) is a disease of the central nervous system in which the protective myelin sheath of the neurons is damaged. MS leads to the formation of lesions, predominantly in the white matter of the brain and the spinal cord. The number and volume of lesions visible in magnetic resonance (MR) imaging (MRI) are important criteria for diagnosing and tracking the progression of MS. Locating and delineating lesions manually requires the tedious and expensive efforts of highly trained raters. In this paper, we propose an automated algorithm to segment lesions in MR images using multi-output decision trees. We evaluated our algorithm on the publicly available MICCAI 2008 MS Lesion Segmentation Challenge training dataset of 20 subjects, and showed improved results in comparison to state-of-the-art methods. We also evaluated our algorithm on an in-house dataset of 49 subjects with a true positive rate of 0.41 and a positive predictive value 0.36.

  19. Decision-Tree-based data mining and rule induction for predicting and mapping soil bacterial diversity.

    PubMed

    Kim, Kangsuk; Yoo, Keunje; Ki, Dongwon; Son, Il Suh; Oh, Kyong Joo; Park, Joonhong

    2011-07-01

    Soilmicrobial ecology plays a significant role in global ecosystems. Nevertheless, methods of model prediction and mapping have yet to be established for soil microbial ecology. The present study was undertaken to develop an artificial-intelligence- and geographical information system (GIS)-integrated framework for predicting and mapping soil bacterial diversity using pre-existing environmental geospatial database information, and to further evaluate the applicability of soil bacterial diversity mapping for planning construction of eco-friendly roads. Using a stratified random sampling, soil bacterial diversity was measured in 196 soil samples in a forest area where construction of an eco-friendly road was planned. Model accuracy, coherence analyses, and tree analysis were systematically performed, and four-class discretized decision tree (DT) with ordinary pair-wise partitioning (OPP) was selected as the optimal model among tested five DT model variants. GIS-based simulations of the optimal DT model with varying weights assigned to soil ecological quality showed that the inclusion of soil ecology in environmental components, which are considered in environmental impact assessment, significantly affects the spatial distributions of overall environmental quality values as well as the determination of an environmentally optimized road route. This work suggests a guideline to use systematic accuracy, coherence, and tree analyses in selecting an optimal DT model from multiple candidate model variants, and demonstrates the applicability of the OPP-improved DT integrated with GIS in rule induction for mapping bacterial diversity. These findings also provide implication on the significance of soil microbial ecology in environmental impact assessment and eco-friendly construction planning. PMID:21072585

  20. Identification of Water Bodies in a Landsat 8 OLI Image Using a J48 Decision Tree

    PubMed Central

    Acharya, Tri Dev; Lee, Dong Ha; Yang, In Tae; Lee, Jae Kang

    2016-01-01

    Water bodies are essential to humans and other forms of life. Identification of water bodies can be useful in various ways, including estimation of water availability, demarcation of flooded regions, change detection, and so on. In past decades, Landsat satellite sensors have been used for land use classification and water body identification. Due to the introduction of a New Operational Land Imager (OLI) sensor on Landsat 8 with a high spectral resolution and improved signal-to-noise ratio, the quality of imagery sensed by Landsat 8 has improved, enabling better characterization of land cover and increased data size. Therefore, it is necessary to explore the most appropriate and practical water identification methods that take advantage of the improved image quality and use the fewest inputs based on the original OLI bands. The objective of the study is to explore the potential of a J48 decision tree (JDT) in identifying water bodies using reflectance bands from Landsat 8 OLI imagery. J48 is an open-source decision tree. The test site for the study is in the Northern Han River Basin, which is located in Gangwon province, Korea. Training data with individual bands were used to develop the JDT model and later applied to the whole study area. The performance of the model was statistically analysed using the kappa statistic and area under the curve (AUC). The results were compared with five other known water identification methods using a confusion matrix and related statistics. Almost all the methods showed high accuracy, and the JDT was successfully applied to the OLI image using only four bands, where the new additional deep blue band of OLI was found to have the third highest information gain. Thus, the JDT can be a good method for water body identification based on images with improved resolution and increased size. PMID:27420067

  1. Bayesian decision tree for the classification of the mode of motion in single-molecule trajectories.

    PubMed

    Türkcan, Silvan; Masson, Jean-Baptiste

    2013-01-01

    Membrane proteins move in heterogeneous environments with spatially (sometimes temporally) varying friction and with biochemical interactions with various partners. It is important to reliably distinguish different modes of motion to improve our knowledge of the membrane architecture and to understand the nature of interactions between membrane proteins and their environments. Here, we present an analysis technique for single molecule tracking (SMT) trajectories that can determine the preferred model of motion that best matches observed trajectories. The method is based on Bayesian inference to calculate the posteriori probability of an observed trajectory according to a certain model. Information theory criteria, such as the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and modified AIC (AICc), are used to select the preferred model. The considered group of models includes free Brownian motion, and confined motion in 2nd or 4th order potentials. We determine the best information criteria for classifying trajectories. We tested its limits through simulations matching large sets of experimental conditions and we built a decision tree. This decision tree first uses the BIC to distinguish between free Brownian motion and confined motion. In a second step, it classifies the confining potential further using the AIC. We apply the method to experimental Clostridium Perfingens [Formula: see text]-toxin (CP[Formula: see text]T) receptor trajectories to show that these receptors are confined by a spring-like potential. An adaptation of this technique was applied on a sliding window in the temporal dimension along the trajectory. We applied this adaptation to experimental CP[Formula: see text]T trajectories that lose confinement due to disaggregation of confining domains. This new technique adds another dimension to the discussion of SMT data. The mode of motion of a receptor might hold more biologically relevant information than the diffusion

  2. Identification of Water Bodies in a Landsat 8 OLI Image Using a J48 Decision Tree.

    PubMed

    Acharya, Tri Dev; Lee, Dong Ha; Yang, In Tae; Lee, Jae Kang

    2016-01-01

    Water bodies are essential to humans and other forms of life. Identification of water bodies can be useful in various ways, including estimation of water availability, demarcation of flooded regions, change detection, and so on. In past decades, Landsat satellite sensors have been used for land use classification and water body identification. Due to the introduction of a New Operational Land Imager (OLI) sensor on Landsat 8 with a high spectral resolution and improved signal-to-noise ratio, the quality of imagery sensed by Landsat 8 has improved, enabling better characterization of land cover and increased data size. Therefore, it is necessary to explore the most appropriate and practical water identification methods that take advantage of the improved image quality and use the fewest inputs based on the original OLI bands. The objective of the study is to explore the potential of a J48 decision tree (JDT) in identifying water bodies using reflectance bands from Landsat 8 OLI imagery. J48 is an open-source decision tree. The test site for the study is in the Northern Han River Basin, which is located in Gangwon province, Korea. Training data with individual bands were used to develop the JDT model and later applied to the whole study area. The performance of the model was statistically analysed using the kappa statistic and area under the curve (AUC). The results were compared with five other known water identification methods using a confusion matrix and related statistics. Almost all the methods showed high accuracy, and the JDT was successfully applied to the OLI image using only four bands, where the new additional deep blue band of OLI was found to have the third highest information gain. Thus, the JDT can be a good method for water body identification based on images with improved resolution and increased size. PMID:27420067

  3. A data mining approach to optimize pellets manufacturing process based on a decision tree algorithm.

    PubMed

    Ronowicz, Joanna; Thommes, Markus; Kleinebudde, Peter; Krysiński, Jerzy

    2015-06-20

    The present study is focused on the thorough analysis of cause-effect relationships between pellet formulation characteristics (pellet composition as well as process parameters) and the selected quality attribute of the final product. The shape using the aspect ratio value expressed the quality of pellets. A data matrix for chemometric analysis consisted of 224 pellet formulations performed by means of eight different active pharmaceutical ingredients and several various excipients, using different extrusion/spheronization process conditions. The data set contained 14 input variables (both formulation and process variables) and one output variable (pellet aspect ratio). A tree regression algorithm consistent with the Quality by Design concept was applied to obtain deeper understanding and knowledge of formulation and process parameters affecting the final pellet sphericity. The clear interpretable set of decision rules were generated. The spehronization speed, spheronization time, number of holes and water content of extrudate have been recognized as the key factors influencing pellet aspect ratio. The most spherical pellets were achieved by using a large number of holes during extrusion, a high spheronizer speed and longer time of spheronization. The described data mining approach enhances knowledge about pelletization process and simultaneously facilitates searching for the optimal process conditions which are necessary to achieve ideal spherical pellets, resulting in good flow characteristics. This data mining approach can be taken into consideration by industrial formulation scientists to support rational decision making in the field of pellets technology. PMID:25835791

  4. IntelliHealth: A medical decision support application using a novel weighted multi-layer classifier ensemble framework.

    PubMed

    Bashir, Saba; Qamar, Usman; Khan, Farhan Hassan

    2016-02-01

    Accuracy plays a vital role in the medical field as it concerns with the life of an individual. Extensive research has been conducted on disease classification and prediction using machine learning techniques. However, there is no agreement on which classifier produces the best results. A specific classifier may be better than others for a specific dataset, but another classifier could perform better for some other dataset. Ensemble of classifiers has been proved to be an effective way to improve classification accuracy. In this research we present an ensemble framework with multi-layer classification using enhanced bagging and optimized weighting. The proposed model called "HM-BagMoov" overcomes the limitations of conventional performance bottlenecks by utilizing an ensemble of seven heterogeneous classifiers. The framework is evaluated on five different heart disease datasets, four breast cancer datasets, two diabetes datasets, two liver disease datasets and one hepatitis dataset obtained from public repositories. The analysis of the results show that ensemble framework achieved the highest accuracy, sensitivity and F-Measure when compared with individual classifiers for all the diseases. In addition to this, the ensemble framework also achieved the highest accuracy when compared with the state of the art techniques. An application named "IntelliHealth" is also developed based on proposed model that may be used by hospitals/doctors for diagnostic advice. PMID:26703093

  5. What Satisfies Students? Mining Student-Opinion Data with Regression and Decision-Tree Analysis. AIR 2002 Forum Paper.

    ERIC Educational Resources Information Center

    Thomas, Emily H.; Galambos, Nora

    To investigate how students' characteristics and experiences affect satisfaction, this study used regression and decision-tree analysis with the CHAID algorithm to analyze student opinion data from a sample of 1,783 college students. A data-mining approach identifies the specific aspects of students' university experience that most influence three…

  6. A Decision-Tree-Oriented Guidance Mechanism for Conducting Nature Science Observation Activities in a Context-Aware Ubiquitous Learning

    ERIC Educational Resources Information Center

    Hwang, Gwo-Jen; Chu, Hui-Chun; Shih, Ju-Ling; Huang, Shu-Hsien; Tsai, Chin-Chung

    2010-01-01

    A context-aware ubiquitous learning environment is an authentic learning environment with personalized digital supports. While showing the potential of applying such a learning environment, researchers have also indicated the challenges of providing adaptive and dynamic support to individual students. In this paper, a decision-tree-oriented…

  7. Model-independent evaluation of tumor markers and a logistic-tree approach to diagnostic decision support.

    PubMed

    Ni, Weizeng; Huang, Samuel H; Su, Qiang; Shi, Jinghua

    2014-01-01

    Sensitivity and specificity of using individual tumor markers hardly meet the clinical requirement. This challenge gave rise to many efforts, e.g., combing multiple tumor markers and employing machine learning algorithms. However, results from different studies are often inconsistent, which are partially attributed to the use of different evaluation criteria. Also, the wide use of model-dependent validation leads to high possibility of data overfitting when complex models are used for diagnosis. We propose two model-independent criteria, namely, area under the curve (AUC) and Relief to evaluate the diagnostic values of individual and multiple tumor markers, respectively. For diagnostic decision support, we propose the use of logistic-tree which combines decision tree and logistic regression. Application on a colorectal cancer dataset shows that the proposed evaluation criteria produce results that are consistent with current knowledge. Furthermore, the simple and highly interpretable logistic-tree has diagnostic performance that is competitive with other complex models. PMID:25516124

  8. Proposal of a Clinical Decision Tree Algorithm Using Factors Associated with Severe Dengue Infection

    PubMed Central

    Hussin, Narwani; Cheah, Wee Kooi; Ng, Kee Sing; Muninathan, Prema

    2016-01-01

    Background WHO’s new classification in 2009: dengue with or without warning signs and severe dengue, has necessitated large numbers of admissions to hospitals of dengue patients which in turn has been imposing a huge economical and physical burden on many hospitals around the globe, particularly South East Asia and Malaysia where the disease has seen a rapid surge in numbers in recent years. Lack of a simple tool to differentiate mild from life threatening infection has led to unnecessary hospitalization of dengue patients. Methods We conducted a single-centre, retrospective study involving serologically confirmed dengue fever patients, admitted in a single ward, in Hospital Kuala Lumpur, Malaysia. Data was collected for 4 months from February to May 2014. Socio demography, co-morbidity, days of illness before admission, symptoms, warning signs, vital signs and laboratory result were all recorded. Descriptive statistics was tabulated and simple and multiple logistic regression analysis was done to determine significant risk factors associated with severe dengue. Results 657 patients with confirmed dengue were analysed, of which 59 (9.0%) had severe dengue. Overall, the commonest warning sign were vomiting (36.1%) and abdominal pain (32.1%). Previous co-morbid, vomiting, diarrhoea, pleural effusion, low systolic blood pressure, high haematocrit, low albumin and high urea were found as significant risk factors for severe dengue using simple logistic regression. However the significant risk factors for severe dengue with multiple logistic regressions were only vomiting, pleural effusion, and low systolic blood pressure. Using those 3 risk factors, we plotted an algorithm for predicting severe dengue. When compared to the classification of severe dengue based on the WHO criteria, the decision tree algorithm had a sensitivity of 0.81, specificity of 0.54, positive predictive value of 0.16 and negative predictive of 0.96. Conclusion The decision tree algorithm proposed

  9. Ensembl 2012.

    PubMed

    Flicek, Paul; Amode, M Ridwan; Barrell, Daniel; Beal, Kathryn; Brent, Simon; Carvalho-Silva, Denise; Clapham, Peter; Coates, Guy; Fairley, Susan; Fitzgerald, Stephen; Gil, Laurent; Gordon, Leo; Hendrix, Maurice; Hourlier, Thibaut; Johnson, Nathan; Kähäri, Andreas K; Keefe, Damian; Keenan, Stephen; Kinsella, Rhoda; Komorowska, Monika; Koscielny, Gautier; Kulesha, Eugene; Larsson, Pontus; Longden, Ian; McLaren, William; Muffato, Matthieu; Overduin, Bert; Pignatelli, Miguel; Pritchard, Bethan; Riat, Harpreet Singh; Ritchie, Graham R S; Ruffier, Magali; Schuster, Michael; Sobral, Daniel; Tang, Y Amy; Taylor, Kieron; Trevanion, Stephen; Vandrovcova, Jana; White, Simon; Wilson, Mark; Wilder, Steven P; Aken, Bronwen L; Birney, Ewan; Cunningham, Fiona; Dunham, Ian; Durbin, Richard; Fernández-Suarez, Xosé M; Harrow, Jennifer; Herrero, Javier; Hubbard, Tim J P; Parker, Anne; Proctor, Glenn; Spudich, Giulietta; Vogel, Jan; Yates, Andy; Zadissa, Amonida; Searle, Stephen M J

    2012-01-01

    The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project. PMID:22086963

  10. Application Of Decision Tree Approach To Student Selection Model- A Case Study

    NASA Astrophysics Data System (ADS)

    Harwati; Sudiya, Amby

    2016-01-01

    The main purpose of the institution is to provide quality education to the students and to improve the quality of managerial decisions. One of the ways to improve the quality of students is to arrange the selection of new students with a more selective. This research takes the case in the selection of new students at Islamic University of Indonesia, Yogyakarta, Indonesia. One of the university's selection is through filtering administrative selection based on the records of prospective students at the high school without paper testing. Currently, that kind of selection does not yet has a standard model and criteria. Selection is only done by comparing candidate application file, so the subjectivity of assessment is very possible to happen because of the lack standard criteria that can differentiate the quality of students from one another. By applying data mining techniques classification, can be built a model selection for new students which includes criteria to certain standards such as the area of origin, the status of the school, the average value and so on. These criteria are determined by using rules that appear based on the classification of the academic achievement (GPA) of the students in previous years who entered the university through the same way. The decision tree method with C4.5 algorithm is used here. The results show that students are given priority for admission is that meet the following criteria: came from the island of Java, public school, majoring in science, an average value above 75, and have at least one achievement during their study in high school.

  11. An Efficient Ensemble Learning Method for Gene Microarray Classification

    PubMed Central

    Shadgar, Bita

    2013-01-01

    The gene microarray analysis and classification have demonstrated an effective way for the effective diagnosis of diseases and cancers. However, it has been also revealed that the basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using RotBoost ensemble methodology. This method is a combination of Rotation Forest and AdaBoost techniques which in turn preserve both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of the RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results have revealed that the combination of the fast correlation-based feature selection method with ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by the conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost. PMID:24024194

  12. A Low Complexity System Based on Multiple Weighted Decision Trees for Indoor Localization

    PubMed Central

    Sánchez-Rodríguez, David; Hernández-Morera, Pablo; Quinteiro, José Ma.; Alonso-González, Itziar

    2015-01-01

    Indoor position estimation has become an attractive research topic due to growing interest in location-aware services. Nevertheless, satisfying solutions have not been found with the considerations of both accuracy and system complexity. From the perspective of lightweight mobile devices, they are extremely important characteristics, because both the processor power and energy availability are limited. Hence, an indoor localization system with high computational complexity can cause complete battery drain within a few hours. In our research, we use a data mining technique named boosting to develop a localization system based on multiple weighted decision trees to predict the device location, since it has high accuracy and low computational complexity. The localization system is built using a dataset from sensor fusion, which combines the strength of radio signals from different wireless local area network access points and device orientation information from a digital compass built-in mobile device, so that extra sensors are unnecessary. Experimental results indicate that the proposed system leads to substantial improvements on computational complexity over the widely-used traditional fingerprinting methods, and it has a better accuracy than they have. PMID:26110413

  13. Interactive change detection based on dissimilarity image and decision tree classification

    NASA Astrophysics Data System (ADS)

    Wang, Yan; Crouzil, Alain; Puel, Jean-Baptiste

    2015-02-01

    Our study mainly focus on detecting changed regions in two images of the same scene taken by digital cameras at different times. The images taken by digital cameras generally provide less information than multi-channel remote sensing images. Moreover, the application-dependent insignificant changes, such as shadows or clouds, may cause the failure of the classical methods based on image differences. The machine learning approach seems to be promising, but the lack of a sufficient volume of training data for photographic landscape observatories discards a lot of methods. So we investigate in this work the interactive learning approach and provide a discriminative model that is a 16-dimensional feature space comprising the textural appearance and contextual information. Dissimilarity measures in different neighborhood sizes are used to detect the difference within the neighborhood of an image pair. To detect changes between two images, the user designates change and non-change samples (pixel sets) in the images using a selection tool. This data is used to train a classifier using decision tree training method which is then applied to all the other pixels of the image pair. The experiments have proved the potential of the proposed approach.

  14. Tailored approach in inguinal hernia repair - decision tree based on the guidelines.

    PubMed

    Köckerling, Ferdinand; Schug-Pass, Christine

    2014-01-01

    The endoscopic procedures TEP and TAPP and the open techniques Lichtenstein, Plug and Patch, and PHS currently represent the gold standard in inguinal hernia repair recommended in the guidelines of the European Hernia Society, the International Endohernia Society, and the European Association of Endoscopic Surgery. Eighty-two percent of experienced hernia surgeons use the "tailored approach," the differentiated use of the several inguinal hernia repair techniques depending on the findings of the patient, trying to minimize the risks. The following differential therapeutic situations must be distinguished in inguinal hernia repair: unilateral in men, unilateral in women, bilateral, scrotal, after previous pelvic and lower abdominal surgery, no general anesthesia possible, recurrence, and emergency surgery. Evidence-based guidelines and consensus conferences of experts give recommendations for the best approach in the individual situation of a patient. This review tries to summarize the recommendations of the various guidelines and to transfer them into a practical decision tree for the daily work of surgeons performing inguinal hernia repair. PMID:25593944

  15. A Low Complexity System Based on Multiple Weighted Decision Trees for Indoor Localization.

    PubMed

    Sánchez-Rodríguez, David; Hernández-Morera, Pablo; Quinteiro, José Ma; Alonso-González, Itziar

    2015-01-01

    Indoor position estimation has become an attractive research topic due to growing interest in location-aware services. Nevertheless, satisfying solutions have not been found with the considerations of both accuracy and system complexity. From the perspective of lightweight mobile devices, they are extremely important characteristics, because both the processor power and energy availability are limited. Hence, an indoor localization system with high computational complexity can cause complete battery drain within a few hours. In our research, we use a data mining technique named boosting to develop a localization system based on multiple weighted decision trees to predict the device location, since it has high accuracy and low computational complexity. The localization system is built using a dataset from sensor fusion, which combines the strength of radio signals from different wireless local area network access points and device orientation information from a digital compass built-in mobile device, so that extra sensors are unnecessary. Experimental results indicate that the proposed system leads to substantial improvements on computational complexity over the widely-used traditional fingerprinting methods, and it has a better accuracy than they have. PMID:26110413

  16. Effect of training characteristics on object classification: An application using Boosted Decision Trees

    NASA Astrophysics Data System (ADS)

    Sevilla-Noarbe, I.; Etayo-Sotos, P.

    2015-06-01

    We present an application of a particular machine-learning method (Boosted Decision Trees, BDTs using AdaBoost) to separate stars and galaxies in photometric images using their catalog characteristics. BDTs are a well established machine learning technique used for classification purposes. They have been widely used specially in the field of particle and astroparticle physics, and we use them here in an optical astronomy application. This algorithm is able to improve from simple thresholding cuts on standard separation variables that may be affected by local effects such as blending, badly calculated background levels or which do not include information in other bands. The improvements are shown using the Sloan Digital Sky Survey Data Release 9, with respect to the type photometric classifier. We obtain an improvement in the impurity of the galaxy sample of a factor 2-4 for this particular dataset, adjusting for the same efficiency of the selection. Another main goal of this study is to verify the effects that different input vectors and training sets have on the classification performance, the results being of wider use to other machine learning techniques.

  17. Object classification in images for Epo doping control based on fuzzy decision trees

    NASA Astrophysics Data System (ADS)

    Bajla, Ivan; Hollander, Igor; Heiss, Dorothea; Granec, Reinhard; Minichmayr, Markus

    2005-02-01

    Erythropoietin (Epo) is a hormone which can be misused as a doping substance. Its detection involves analysis of images containing specific objects (bands), whose position and intensity are critical for doping positivity. Within a research project of the World Anti-Doping Agency (WADA) we are implementing the GASepo software that should serve for Epo testing in doping control laboratories world-wide. For identification of the bands we have developed a segmentation procedure based on a sequence of filters and edge detectors. Whereas all true bands are properly segmented, the procedure generates a relatively high number of false positives (artefacts). To separate these artefacts we suggested a post-segmentation supervised classification using real-valued geometrical measures of objects. The method is based on the ID3 (Ross Quinlan's) rule generation method, where fuzzy representation is used for linking the linguistic terms to quantitative data. The fuzzy modification of the ID3 method provides a framework that generates fuzzy decision trees, as well as fuzzy sets for input data. Using the MLTTM software (Machine Learning Framework) we have generated a set of fuzzy rules explicitly describing bands and artefacts. The method eliminated most of the artefacts. The contribution includes a comparison of the obtained misclassification errors to the errors produced by some other statistical classification methods.

  18. Smart on-board diagnostic decision trees for quantitative aviation equipment and safety procedures validation

    NASA Astrophysics Data System (ADS)

    Ali, Ali H.; Markarian, Garik; Tarter, Alex; Kölle, Rainer

    2010-04-01

    The current trend in high-accuracy aircraft navigation systems is towards using data from one or more inertial navigation subsystem and one or more navigational reference subsystems. The enhancement in fault diagnosis and detection is achieved via computing the minimum mean square estimate of the aircraft states using, for instance, Kalman filter method. However, this enhancement might degrade if the cause of a subsystem fault has some effect on other subsystems that are calculating the same measurement. One instance of such case is the tragic incident of Air France Flight 447 in June, 2009 where message transmissions in the last moment before the crash indicated inconsistencies in measured airspeed as reported by Airbus. In this research, we propose the use of mathematical aircraft model to work out the current states of the airplane and in turn, using these states to validate the readings of the navigation equipment throughout smart diagnostic decision tree network. Various simulated equipment failures have been introduced in a controlled environment to proof the concept of operation. The results have showed successful detection of the failing equipment in all cases.

  19. Using Ensemble Decisions and Active Selection to Improve Low-Cost Labeling for Multi-View Data

    NASA Technical Reports Server (NTRS)

    Rebbapragada, Umaa; Wagstaff, Kiri L.

    2011-01-01

    This paper seeks to improve low-cost labeling in terms of training set reliability (the fraction of correctly labeled training items) and test set performance for multi-view learning methods. Co-training is a popular multiview learning method that combines high-confidence example selection with low-cost (self) labeling. However, co-training with certain base learning algorithms significantly reduces training set reliability, causing an associated drop in prediction accuracy. We propose the use of ensemble labeling to improve reliability in such cases. We also discuss and show promising results on combining low-cost ensemble labeling with active (low-confidence) example selection. We unify these example selection and labeling strategies under collaborative learning, a family of techniques for multi-view learning that we are developing for distributed, sensor-network environments.

  20. Ensemble-based analysis of Front Range severe convection on 6-7 June 2012: Forecast uncertainty and communication of weather information to Front Range decision-makers

    NASA Astrophysics Data System (ADS)

    Vincente, Vanessa

    -allowing ensemble also showed greater skill in forecasting heavy precipitation amounts in the vicinity of where they were observed during the most active convective period, particularly near urbanized areas. A total of 9 Front Range EMs were interviewed to research how they understood hazardous weather information, and how their perception of forecast uncertainty would influence their decision making following a heavy rain event. Many of the EMs use situational awareness and past experiences with major weather events to guide their emergency planning. They also highly valued their relationship with the National Weather Service to improve their understanding of weather forecasts and ask questions about the uncertainties. Most of the EMs perceived forecast uncertainty in terms of probability and with the understanding that forecasting the weather is an imprecise science. The greater the likelihood of occurrence (implied by a higher probability of precipitation) showed greater confidence in the forecast that an event was likely to happen. Five probabilistic forecast products were generated from the convection-allowing ensemble output to generate a hypothetical warm season heavy rain event scenario. Responses varied between the EMs in which products they found most practical or least useful. Most EMs believed that there was a high probability for flooding, as illustrated by the degree of forecasted precipitation intensity. Most confirmed perceiving uncertainty in the different forecast representations, sharing the idea that there is an inherent uncertainty that follows modeled forecasts. The long-term goal of this research is to develop and add reliable probabilistic forecast products to the "toolbox" of decision-makers to help them better assess hazardous weather information and improve warning notifications and response.

  1. A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements.

    PubMed

    Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%. PMID:25302338

  2. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    PubMed Central

    Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As the fraudulent financial statement of an enterprise is increasingly serious with each passing day, establishing a valid forecasting fraudulent financial statement model of an enterprise has become an important question for academic research and financial practice. After screening the important variables using the stepwise regression, the study also matches the logistic regression, support vector machine, and decision tree to construct the classification models to make a comparison. The study adopts financial and nonfinancial variables to assist in establishment of the forecasting fraudulent financial statement model. Research objects are the companies to which the fraudulent and nonfraudulent financial statement happened between years 1998 to 2012. The findings are that financial and nonfinancial information are effectively used to distinguish the fraudulent financial statement, and decision tree C5.0 has the best classification effect 85.71%. PMID:25302338

  3. Ant colony optimisation of decision tree and contingency table models for the discovery of gene-gene interactions.

    PubMed

    Sapin, Emmanuel; Keedwell, Ed; Frayling, Tim

    2015-12-01

    In this study, ant colony optimisation (ACO) algorithm is used to derive near-optimal interactions between a number of single nucleotide polymorphisms (SNPs). This approach is used to discover small numbers of SNPs that are combined into a decision tree or contingency table model. The ACO algorithm is shown to be very robust as it is proven to be able to find results that are discriminatory from a statistical perspective with logical interactions, decision tree and contingency table models for various numbers of SNPs considered in the interaction. A large number of the SNPs discovered here have been already identified in large genome-wide association studies to be related to type II diabetes in the literature, lending additional confidence to the results. PMID:26577156

  4. Chi-squared Automatic Interaction Detection Decision Tree Analysis of Risk Factors for Infant Anemia in Beijing, China

    PubMed Central

    Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin

    2016-01-01

    Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6–12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. Results: The prevalence of anemia was 12.60% with a range of 3.47%–40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. Conclusions: The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities. PMID:27174328

  5. ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography

    NASA Astrophysics Data System (ADS)

    Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

    2016-07-01

    Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method, according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on a 100 PET scans with known true contour. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contour, to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.

  6. ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography.

    PubMed

    Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

    2016-07-01

    Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method, according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on a 100 PET scans with known true contour. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contour, to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology. PMID:27273293

  7. Evaluation of the potential allergenicity of the enzyme microbial transglutaminase using the 2001 FAO/WHO Decision Tree.

    PubMed

    Pedersen, Mona H; Hansen, Tine K; Sten, Eva; Seguro, Katsuya; Ohtsuka, Tomoko; Morita, Akiko; Bindslev-Jensen, Carsten; Poulsen, Lars K

    2004-11-01

    All novel proteins must be assessed for their potential allergenicity before they are introduced into the food market. One method to achieve this is the 2001 FAO/WHO Decision Tree recommended for evaluation of proteins from genetically modified organisms (GMOs). It was the aim of this study to investigate the allergenicity of microbial transglutaminase (m-TG) from Streptoverticillium mobaraense. Amino acid sequence similarity to known allergens, pepsin resistance, and detection of protein binding to specific serum immunoglobulin E (IgE) (RAST) have been evaluated as recommended by the decision tree. Allergenicity in the source material was thought unlikely, since no IgE-mediated allergy to any bacteria has been reported. m-TG is fully degraded after 5 min of pepsin treatment. A database search showed that the enzyme has no homology with known allergens, down to a match of six contiguous amino acids, which meets the requirements of the decision tree. However, there is a match at the five contiguous amino acid level to the major codfish allergen Gad c1. The potential cross reactivity between m-TG and Gad c1 was investigated in RAST using sera from 25 documented cod-allergic patients and an extract of raw codfish. No binding between patient IgE and m-TG was observed. It can be concluded that no safety concerns with regard to the allergenic potential of m-TG were identified. PMID:15508178

  8. Decision tree supported substructure prediction of metabolites from GC-MS profiles.

    PubMed

    Hummel, Jan; Strehmel, Nadine; Selbig, Joachim; Walther, Dirk; Kopka, Joachim

    2010-06-01

    Gas chromatography coupled to mass spectrometry (GC-MS) is one of the most widespread routine technologies applied to the large scale screening and discovery of novel metabolic biomarkers. However, currently the majority of mass spectral tags (MSTs) remains unidentified due to the lack of authenticated pure reference substances required for compound identification by GC-MS. Here, we accessed the information on reference compounds stored in the Golm Metabolome Database (GMD) to apply supervised machine learning approaches to the classification and identification of unidentified MSTs without relying on library searches. Non-annotated MSTs with mass spectral and retention index (RI) information together with data of already identified metabolites and reference substances have been archived in the GMD. Structural feature extraction was applied to sub-divide the metabolite space contained in the GMD and to define the prediction target classes. Decision tree (DT)-based prediction of the most frequent substructures based on mass spectral features and RI information is demonstrated to result in highly sensitive and specific detections of sub-structures contained in the compounds. The underlying set of DTs can be inspected by the user and are made available for batch processing via SOAP (Simple Object Access Protocol)-based web services. The GMD mass spectral library with the integrated DTs is freely accessible for non-commercial use at http://gmd.mpimp-golm.mpg.de/. All matching and structure search functionalities are available as SOAP-based web services. A XML + HTTP interface, which follows Representational State Transfer (REST) principles, facilitates read-only access to data base entities. PMID:20526350

  9. Accurate and interpretable nanoSAR models from genetic programming-based decision tree construction approaches.

    PubMed

    Oksel, Ceyda; Winkler, David A; Ma, Cai Y; Wilkins, Terry; Wang, Xue Z

    2016-09-01

    The number of engineered nanomaterials (ENMs) being exploited commercially is growing rapidly, due to the novel properties they exhibit. Clearly, it is important to understand and minimize any risks to health or the environment posed by the presence of ENMs. Data-driven models that decode the relationships between the biological activities of ENMs and their physicochemical characteristics provide an attractive means of maximizing the value of scarce and expensive experimental data. Although such structure-activity relationship (SAR) methods have become very useful tools for modelling nanotoxicity endpoints (nanoSAR), they have limited robustness and predictivity and, most importantly, interpretation of the models they generate is often very difficult. New computational modelling tools or new ways of using existing tools are required to model the relatively sparse and sometimes lower quality data on the biological effects of ENMs. The most commonly used SAR modelling methods work best with large datasets, are not particularly good at feature selection, can be relatively opaque to interpretation, and may not account for nonlinearity in the structure-property relationships. To overcome these limitations, we describe the application of a novel algorithm, a genetic programming-based decision tree construction tool (GPTree) to nanoSAR modelling. We demonstrate the use of GPTree in the construction of accurate and interpretable nanoSAR models by applying it to four diverse literature datasets. We describe the algorithm and compare model results across the four studies. We show that GPTree generates models with accuracies equivalent to or superior to those of prior modelling studies on the same datasets. GPTree is a robust, automatic method for generation of accurate nanoSAR models with important advantages that it works with small datasets, automatically selects descriptors, and provides significantly improved interpretability of models. PMID:26956430

  10. Large unbalanced credit scoring using Lasso-logistic regression ensemble.

    PubMed

    Wang, Hong; Xu, Qingsong; Zhou, Lifeng

    2015-01-01

    Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988

  11. Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble

    PubMed Central

    Wang, Hong; Xu, Qingsong; Zhou, Lifeng

    2015-01-01

    Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988

  12. Ensembl comparative genomics resources.

    PubMed

    Herrero, Javier; Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J; Searle, Stephen M J; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. PMID:26896847

  13. Ensembl comparative genomics resources

    PubMed Central

    Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J.; Searle, Stephen M. J.; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. PMID:26896847

  14. The creation of a digital soil map for Cyprus using decision-tree classification techniques

    NASA Astrophysics Data System (ADS)

    Camera, Corrado; Zomeni, Zomenia; Bruggeman, Adriana; Noller, Joy; Zissimos, Andreas

    2014-05-01

    Considering the increasing threats soil are experiencing especially in semi-arid, Mediterranean environments like Cyprus (erosion, contamination, sealing and salinisation), producing a high resolution, reliable soil map is essential for further soil conservation studies. This study aims to create a 1:50.000 soil map covering the area under the direct control of the Republic of Cyprus (5.760 km2). The study consists of two major steps. The first is the creation of a raster database of predictive variables selected according to the scorpan formula (McBratney et al., 2003). It is of particular interest the possibility of using, as soil properties, data coming from three older island-wide soil maps and the recently published geochemical atlas of Cyprus (Cohen et al., 2011). Ten highly characterizing elements were selected and used as predictors in the present study. For the other factors usual variables were used: temperature and aridity index for climate; total loss on ignition, vegetation and forestry types maps for organic matter; the DEM and related relief derivatives (slope, aspect, curvature, landscape units); bedrock, surficial geology and geomorphology (Noller, 2009) for parent material and age; and a sub-watershed map to better bound location related to parent material sources. In the second step, the digital soil map is created using the Random Forests package in R. Random Forests is a decision tree classification technique where many trees, instead of a single one, are developed and compared to increase the stability and the reliability of the prediction. The model is trained and verified on areas where a 1:25.000 published soil maps obtained from field work is available and then it is applied for predictive mapping to the other areas. Preliminary results obtained in a small area in the plain around the city of Lefkosia, where eight different soil classes are present, show very good capacities of the method. The Ramdom Forest approach leads to reproduce soil

  15. Genetic programming based ensemble system for microarray data classification.

    PubMed

    Liu, Kun-Hong; Tong, Muchenxuan; Xie, Shu-Tong; Yee Ng, Vincent To

    2015-01-01

    Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved. PMID:25810748

  16. Analysis of the impact of recreational trail usage for prioritising management decisions: a regression tree approach

    NASA Astrophysics Data System (ADS)

    Tomczyk, Aleksandra; Ewertowski, Marek; White, Piran; Kasprzak, Leszek

    2016-04-01

    The dual role of many Protected Natural Areas in providing benefits for both conservation and recreation poses challenges for management. Although recreation-based damage to ecosystems can occur very quickly, restoration can take many years. The protection of conservation interests at the same as providing for recreation requires decisions to be made about how to prioritise and direct management actions. Trails are commonly used to divert visitors from the most important areas of a site, but high visitor pressure can lead to increases in trail width and a concomitant increase in soil erosion. Here we use detailed field data on condition of recreational trails in Gorce National Park, Poland, as the basis for a regression tree analysis to determine the factors influencing trail deterioration, and link specific trail impacts with environmental, use related and managerial factors. We distinguished 12 types of trails, characterised by four levels of degradation: (1) trails with an acceptable level of degradation; (2) threatened trails; (3) damaged trails; and (4) heavily damaged trails. Damaged trails were the most vulnerable of all trails and should be prioritised for appropriate conservation and restoration. We also proposed five types of monitoring of recreational trail conditions: (1) rapid inventory of negative impacts; (2) monitoring visitor numbers and variation in type of use; (3) change-oriented monitoring focusing on sections of trail which were subjected to changes in type or level of use or subjected to extreme weather events; (4) monitoring of dynamics of trail conditions; and (5) full assessment of trail conditions, to be carried out every 10-15 years. The application of the proposed framework can enhance the ability of Park managers to prioritise their trail management activities, enhancing trail conditions and visitor safety, while minimising adverse impacts on the conservation value of the ecosystem. A.M.T. was supported by the Polish Ministry of

  17. Genetic program based data mining of fuzzy decision trees and methods of improving convergence and reducing bloat

    NASA Astrophysics Data System (ADS)

    Smith, James F., III; Nguyen, ThanhVu H.

    2007-04-01

    A data mining procedure for automatic determination of fuzzy decision tree structure using a genetic program (GP) is discussed. A GP is an algorithm that evolves other algorithms or mathematical expressions. Innovative methods for accelerating convergence of the data mining procedure and reducing bloat are given. In genetic programming, bloat refers to excessive tree growth. It has been observed that the trees in the evolving GP population will grow by a factor of three every 50 generations. When evolving mathematical expressions much of the bloat is due to the expressions not being in algebraically simplest form. So a bloat reduction method based on automated computer algebra has been introduced. The effectiveness of this procedure is discussed. Also, rules based on fuzzy logic have been introduced into the GP to accelerate convergence, reduce bloat and produce a solution more readily understood by the human user. These rules are discussed as well as other techniques for convergence improvement and bloat control. Comparisons between trees created using a genetic program and those constructed solely by interviewing experts are made. A new co-evolutionary method that improves the control logic evolved by the GP by having a genetic algorithm evolve pathological scenarios is discussed. The effect on the control logic is considered. Finally, additional methods that have been used to validate the data mining algorithm are referenced.

  18. Effective Visualization of Temporal Ensembles.

    PubMed

    Hao, Lihua; Healey, Christopher G; Bass, Steffen A

    2016-01-01

    An ensemble is a collection of related datasets, called members, built from a series of runs of a simulation or an experiment. Ensembles are large, temporal, multidimensional, and multivariate, making them difficult to analyze. Another important challenge is visualizing ensembles that vary both in space and time. Initial visualization techniques displayed ensembles with a small number of members, or presented an overview of an entire ensemble, but without potentially important details. Recently, researchers have suggested combining these two directions, allowing users to choose subsets of members to visualization. This manual selection process places the burden on the user to identify which members to explore. We first introduce a static ensemble visualization system that automatically helps users locate interesting subsets of members to visualize. We next extend the system to support analysis and visualization of temporal ensembles. We employ 3D shape comparison, cluster tree visualization, and glyph based visualization to represent different levels of detail within an ensemble. This strategy is used to provide two approaches for temporal ensemble analysis: (1) segment based ensemble analysis, to capture important shape transition time-steps, clusters groups of similar members, and identify common shape changes over time across multiple members; and (2) time-step based ensemble analysis, which assumes ensemble members are aligned in time by combining similar shapes at common time-steps. Both approaches enable users to interactively visualize and analyze a temporal ensemble from different perspectives at different levels of detail. We demonstrate our techniques on an ensemble studying matter transition from hadronic gas to quark-gluon plasma during gold-on-gold particle collisions. PMID:26529728

  19. Decision tree-based method for integrating gene expression, demographic, and clinical data to determine disease endotypes

    PubMed Central

    2013-01-01

    Background Complex diseases are often difficult to diagnose, treat and study due to the multi-factorial nature of the underlying etiology. Large data sets are now widely available that can be used to define novel, mechanistically distinct disease subtypes (endotypes) in a completely data-driven manner. However, significant challenges exist with regard to how to segregate individuals into suitable subtypes of the disease and understand the distinct biological mechanisms of each when the goal is to maximize the discovery potential of these data sets. Results A multi-step decision tree-based method is described for defining endotypes based on gene expression, clinical covariates, and disease indicators using childhood asthma as a case study. We attempted to use alternative approaches such as the Student’s t-test, single data domain clustering and the Modk-prototypes algorithm, which incorporates multiple data domains into a single analysis and none performed as well as the novel multi-step decision tree method. This new method gave the best segregation of asthmatics and non-asthmatics, and it provides easy access to all genes and clinical covariates that distinguish the groups. Conclusions The multi-step decision tree method described here will lead to better understanding of complex disease in general by allowing purely data-driven disease endotypes to facilitate the discovery of new mechanisms underlying these diseases. This application should be considered a complement to ongoing efforts to better define and diagnose known endotypes. When coupled with existing methods developed to determine the genetics of gene expression, these methods provide a mechanism for linking genetics and exposomics data and thereby accounting for both major determinants of disease. PMID:24188919

  20. Predicting Lung Radiotherapy-Induced Pneumonitis Using a Model Combining Parametric Lyman Probit With Nonparametric Decision Trees

    SciTech Connect

    Das, Shiva K. . E-mail: shiva.das@duke.edu; Zhou Sumin; Zhang, Junan; Yin, F.-F.; Dewhirst, Mark W.; Marks, Lawrence B.

    2007-07-15

    Purpose: To develop and test a model to predict for lung radiation-induced Grade 2+ pneumonitis. Methods and Materials: The model was built from a database of 234 lung cancer patients treated with radiotherapy (RT), of whom 43 were diagnosed with pneumonitis. The model augmented the predictive capability of the parametric dose-based Lyman normal tissue complication probability (LNTCP) metric by combining it with weighted nonparametric decision trees that use dose and nondose inputs. The decision trees were sequentially added to the model using a 'boosting' process that enhances the accuracy of prediction. The model's predictive capability was estimated by 10-fold cross-validation. To facilitate dissemination, the cross-validation result was used to extract a simplified approximation to the complicated model architecture created by boosting. Application of the simplified model is demonstrated in two example cases. Results: The area under the model receiver operating characteristics curve for cross-validation was 0.72, a significant improvement over the LNTCP area of 0.63 (p = 0.005). The simplified model used the following variables to output a measure of injury: LNTCP, gender, histologic type, chemotherapy schedule, and treatment schedule. For a given patient RT plan, injury prediction was highest for the combination of pre-RT chemotherapy, once-daily treatment, female gender and lowest for the combination of no pre-RT chemotherapy and nonsquamous cell histologic type. Application of the simplified model to the example cases revealed that injury prediction for a given treatment plan can range from very low to very high, depending on the settings of the nondose variables. Conclusions: Radiation pneumonitis prediction was significantly enhanced by decision trees that added the influence of nondose factors to the LNTCP formulation.

  1. Ensembl 2013

    PubMed Central

    Flicek, Paul; Ahmed, Ikhlak; Amode, M. Ridwan; Barrell, Daniel; Beal, Kathryn; Brent, Simon; Carvalho-Silva, Denise; Clapham, Peter; Coates, Guy; Fairley, Susan; Fitzgerald, Stephen; Gil, Laurent; García-Girón, Carlos; Gordon, Leo; Hourlier, Thibaut; Hunt, Sarah; Juettemann, Thomas; Kähäri, Andreas K.; Keenan, Stephen; Komorowska, Monika; Kulesha, Eugene; Longden, Ian; Maurel, Thomas; McLaren, William M.; Muffato, Matthieu; Nag, Rishi; Overduin, Bert; Pignatelli, Miguel; Pritchard, Bethan; Pritchard, Emily; Riat, Harpreet Singh; Ritchie, Graham R. S.; Ruffier, Magali; Schuster, Michael; Sheppard, Daniel; Sobral, Daniel; Taylor, Kieron; Thormann, Anja; Trevanion, Stephen; White, Simon; Wilder, Steven P.; Aken, Bronwen L.; Birney, Ewan; Cunningham, Fiona; Dunham, Ian; Harrow, Jennifer; Herrero, Javier; Hubbard, Tim J. P.; Johnson, Nathan; Kinsella, Rhoda; Parker, Anne; Spudich, Giulietta; Yates, Andy; Zadissa, Amonida; Searle, Stephen M. J.

    2013-01-01

    The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidenced-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces. PMID:23203987

  2. Ensemble Models

    EPA Science Inventory

    Ensemble forecasting has been used for operational numerical weather prediction in the United States and Europe since the early 1990s. An ensemble of weather or climate forecasts is used to characterize the two main sources of uncertainty in computer models of physical systems: ...

  3. Ensembl 2005

    PubMed Central

    Hubbard, T.; Andrews, D.; Caccamo, M.; Cameron, G.; Chen, Y.; Clamp, M.; Clarke, L.; Coates, G.; Cox, T.; Cunningham, F.; Curwen, V.; Cutts, T.; Down, T.; Durbin, R.; Fernandez-Suarez, X. M.; Gilbert, J.; Hammond, M.; Herrero, J.; Hotz, H.; Howe, K.; Iyer, V.; Jekosch, K.; Kahari, A.; Kasprzyk, A.; Keefe, D.; Keenan, S.; Kokocinsci, F.; London, D.; Longden, I.; McVicker, G.; Melsopp, C.; Meidl, P.; Potter, S.; Proctor, G.; Rae, M.; Rios, D.; Schuster, M.; Searle, S.; Severin, J.; Slater, G.; Smedley, D.; Smith, J.; Spooner, W.; Stabenau, A.; Stalker, J.; Storey, R.; Trevanion, S.; Ureta-Vidal, A.; Vogel, J.; White, S.; Woodwark, C.; Birney, E.

    2005-01-01

    The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of large genome sequences. Over the last year the number of genomes available from the Ensembl site has increased by 7 to 16, with the addition of the six vertebrate genomes of chimpanzee, dog, cow, chicken, tetraodon and frog and the insect genome of honeybee. The majority have been annotated automatically using the Ensembl gene build system, showing its flexibility to reliably annotate a wide variety of genomes. With the increased number of vertebrate genomes, the comparative analysis provided to users has been greatly improved, with new website interfaces allowing annotation of different genomes to be directly compared. The Ensembl software system is being increasingly widely reused in different projects showing the benefits of a completely open approach to software development and distribution. PMID:15608235

  4. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections.

    PubMed

    Kraszewska-Głomba, Barbara; Szymańska-Toczek, Zofia; Szenborn, Leszek

    2016-01-01

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the use of the C4.5 algorithm resulted in the following decision tree: viral infection if CRP≤19.1 mg/L; otherwise for cases with CRP>19.1 mg/L: bacterial infection if PCT>0.65ng/mL, PFAPA if PCT≤0.65 ng/mL. The model was tested using a 10-fold cross validation and in an independent test cohort (n=30), the rule's overall accuracy was 76.4% and 90% respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context. PMID:27131024

  5. Unified framework for triaxial accelerometer-based fall event detection and classification using cumulants and hierarchical decision tree classifier.

    PubMed

    Kambhampati, Satya Samyukta; Singh, Vishal; Manikandan, M Sabarimalai; Ramkumar, Barathram

    2015-08-01

    In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from the acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers in effectively detecting and classifying different types of fall and non-fall events. It was discovered that the first level of the proposed hierarchical decision tree algorithm implements fall detection using fifth-order cumulants and support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve optimal detection and classification rates of above 95%, as well as the lowest false alarm rate of 1.03%. PMID:26609414

  6. Improving Crop Classification Techniques Using Optical Remote Sensing Imagery, High-Resolution Agriculture Resource Inventory Shapefiles and Decision Trees

    NASA Astrophysics Data System (ADS)

    Melnychuk, A. L.; Berg, A. A.; Sweeney, S.

    2010-12-01

    Recognition of anthropogenic effects of land use management practices on bodies of water is important for remediating and preventing eutrophication. In the case of Lake Simcoe, Ontario the main surrounding landuse is agriculture. To better manage the nutrient flow into the lake, knowledge of the management of the agricultural land is important. For this basin, a comprehensive agricultural resource inventory is required for assessment of policy and for input into water quality management and assessment tools. Supervised decision tree classification schemes, used in many previous applications, have yielded reliable classifications in agricultural land-use systems. However, when using these classification techniques the user is confronted with numerous data sources. In this study we use a large inventory of optical satellite image products (Landsat, AWiFS, SPOT and MODIS) and ancillary data sources (temporal MODIS-NDVI product signatures, digital elevation models and soil maps) at various spatial and temporal resolutions in a decision tree classification scheme. The sensitivity of the classification accuracy to various products is assessed to identify optimal data sources for classifying crop systems.

  7. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections

    PubMed Central

    Kraszewska-Głomba, Barbara; Szymańska-Toczek, Zofia; Szenborn, Leszek

    2016-01-01

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the use of the C4.5 algorithm resulted in the following decision tree: viral infection if CRP≤19.1 mg/L; otherwise for cases with CRP>19.1 mg/L: bacterial infection if PCT>0.65ng/mL, PFAPA if PCT≤0.65 ng/mL. The model was tested using a 10-fold cross validation and in an independent test cohort (n=30), the rule’s overall accuracy was 76.4% and 90% respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context. PMID:27131024

  8. Unified framework for triaxial accelerometer-based fall event detection and classification using cumulants and hierarchical decision tree classifier

    PubMed Central

    Kambhampati, Satya Samyukta; Singh, Vishal; Ramkumar, Barathram

    2015-01-01

    In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from the acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers in effectively detecting and classifying different types of fall and non-fall events. It was discovered that the first level of the proposed hierarchical decision tree algorithm implements fall detection using fifth-order cumulants and support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve optimal detection and classification rates of above 95%, as well as the lowest false alarm rate of 1.03%. PMID:26609414

  9. Lessons Learned from Applications of a Climate Change Decision Tree toWater System Projects in Kenya and Nepal

    NASA Astrophysics Data System (ADS)

    Ray, P. A.; Bonzanigo, L.; Taner, M. U.; Wi, S.; Yang, Y. C. E.; Brown, C.

    2015-12-01

    The Decision Tree Framework developed for the World Bank's Water Partnership Program provides resource-limited project planners and program managers with a cost-effective and effort-efficient, scientifically defensible, repeatable, and clear method for demonstrating the robustness of a project to climate change. At the conclusion of this process, the project planner is empowered to confidently communicate the method by which the vulnerabilities of the project have been assessed, and how the adjustments that were made (if any were necessary) improved the project's feasibility and profitability. The framework adopts a "bottom-up" approach to risk assessment that aims at a thorough understanding of a project's vulnerabilities to climate change in the context of other nonclimate uncertainties (e.g., economic, environmental, demographic, political). It helps identify projects that perform well across a wide range of potential future climate conditions, as opposed to seeking solutions that are optimal in expected conditions but fragile to conditions deviating from the expected. Lessons learned through application of the Decision Tree to case studies in Kenya and Nepal will be presented, and aspects of the framework requiring further refinement will be described.

  10. Accurate Prediction of Advanced Liver Fibrosis Using the Decision Tree Learning Algorithm in Chronic Hepatitis C Egyptian Patients.

    PubMed

    Hashem, Somaya; Esmat, Gamal; Elakel, Wafaa; Habashy, Shahira; Abdel Raouf, Safaa; Darweesh, Samar; Soliman, Mohamad; Elhefnawi, Mohamed; El-Adawy, Mohamed; ElHefnawi, Mahmoud

    2016-01-01

    Background/Aim. Respectively with the prevalence of chronic hepatitis C in the world, using noninvasive methods as an alternative method in staging chronic liver diseases for avoiding the drawbacks of biopsy is significantly increasing. The aim of this study is to combine the serum biomarkers and clinical information to develop a classification model that can predict advanced liver fibrosis. Methods. 39,567 patients with chronic hepatitis C were included and randomly divided into two separate sets. Liver fibrosis was assessed via METAVIR score; patients were categorized as mild to moderate (F0-F2) or advanced (F3-F4) fibrosis stages. Two models were developed using alternating decision tree algorithm. Model 1 uses six parameters, while model 2 uses four, which are similar to FIB-4 features except alpha-fetoprotein instead of alanine aminotransferase. Sensitivity and receiver operating characteristic curve were performed to evaluate the performance of the proposed models. Results. The best model achieved 86.2% negative predictive value and 0.78 ROC with 84.8% accuracy which is better than FIB-4. Conclusions. The risk of advanced liver fibrosis, due to chronic hepatitis C, could be predicted with high accuracy using decision tree learning algorithm that could be used to reduce the need to assess the liver biopsy. PMID:26880886