Science.gov

Sample records for decision tree ensembles

  1. Creating ensembles of decision trees through sampling

    DOEpatents

    Kamath, Chandrika; Cantu-Paz, Erick

    2005-08-30

    A system for decision tree ensembles that includes a module to read the data, a module to sort the data, a module to evaluate a potential split of the data according to some criterion using a random sample of the data, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method is based on statistical sampling techniques and includes the steps of reading the data; sorting the data; evaluating a potential split according to some criterion using a random sample of the data; splitting the data; and combining multiple decision trees in ensembles.
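The sampling idea in this patent can be illustrated with a minimal sketch: candidate splits are scored on a random subsample rather than the full data, and several such trees (here, single-split stumps) vote. All function names and the toy data are our own; the patent does not prescribe an implementation.

```python
import random
from collections import Counter

def gini(labels):
    # Gini impurity of a list of class labels
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split_sampled(xs, ys, sample_frac=0.5, rng=random):
    # Score candidate thresholds on a random subsample of the data only,
    # i.e. the sampling-based split evaluation described above.
    idx = rng.sample(range(len(xs)), max(2, int(sample_frac * len(xs))))
    sample = sorted((xs[i], ys[i]) for i in idx)
    best_t, best_score = None, float("inf")
    for j in range(1, len(sample)):
        t = (sample[j - 1][0] + sample[j][0]) / 2
        left = [y for x, y in sample if x <= t]
        right = [y for x, y in sample if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(sample)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

def stump_predict(t, xs, ys, x):
    # Predict the majority training label on the same side of t as x
    side = [y for xi, y in zip(xs, ys) if (xi <= t) == (x <= t)]
    return Counter(side).most_common(1)[0][0]

random.seed(0)
xs = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
# Ensemble: each stump's split comes from a different random sample;
# predictions are combined by majority vote.
thresholds = [best_split_sampled(xs, ys) for _ in range(5)]
votes = [stump_predict(t, xs, ys, 1.25) for t in thresholds]
print(Counter(votes).most_common(1)[0][0])  # → 1
```

Because only a fraction of the rows is examined per candidate split, the cost of evaluating a node drops roughly in proportion to the sampling fraction, which is the efficiency argument the related papers below make.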

  2. Creating Ensembles of Decision Trees Through Sampling

    SciTech Connect

    Kamath,C; Cantu-Paz, E

    2001-07-26

    Recent work in classification indicates that significant improvements in accuracy can be obtained by growing an ensemble of classifiers and having them vote for the most popular class. This paper focuses on ensembles of decision trees that are created with a randomized procedure based on sampling. Randomization can be introduced by using random samples of the training data (as in bagging or boosting) and running a conventional tree-building algorithm, or by randomizing the induction algorithm itself. The objective of this paper is to describe the first experiences with a novel randomized tree induction method that uses a sub-sample of instances at a node to determine the split. The empirical results show that ensembles generated using this approach yield results that are competitive in accuracy and superior in computational cost to boosting and bagging.

  3. Creating ensembles of decision trees through sampling

    SciTech Connect

    Kamath, C; Cantu-Paz, E

    2001-02-02

    Recent work in classification indicates that significant improvements in accuracy can be obtained by growing an ensemble of classifiers and having them vote for the most popular class. This paper focuses on ensembles of decision trees that are created with a randomized procedure based on sampling. Randomization can be introduced by using random samples of the training data (as in bagging or arcing) and running a conventional tree-building algorithm, or by randomizing the induction algorithm itself. The objective of this paper is to describe our first experiences with a novel randomized tree induction method that uses a subset of samples at a node to determine the split. Our empirical results show that ensembles generated using this approach yield results that are competitive in accuracy and superior in computational cost.

  4. Improving ensemble decision tree performance using Adaboost and Bagging

    NASA Astrophysics Data System (ADS)

    Hasan, Md. Rajib; Siraj, Fadzilah; Sainin, Mohd Shamrie

    2015-12-01

    Ensemble classifier systems are considered among the most promising approaches in medical data classification, and the performance of a decision tree classifier can be increased by ensemble methods, as these are proven to be better than single classifiers. However, in an ensemble setting the performance depends on the selection of a suitable base classifier. This research employed two prominent ensemble methods, namely Adaboost and Bagging, with base classifiers such as Random Forest, Random Tree, J48, J48graft and Logistic Model Tree (LMT), each selected independently. The empirical study shows that the performance varies when different base classifiers are selected, and in some cases an overfitting issue was also noted. The evidence shows that ensemble decision tree classifiers using Adaboost and Bagging improve the performance on the selected medical data sets.
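As a rough illustration of the boosting side of this study, here is a minimal AdaBoost over one-dimensional decision stumps. The toy data, the stump pool, and all helper names are hypothetical, not taken from the paper.

```python
import math

def stump(threshold, sign):
    # Weak learner: predicts `sign` if x > threshold, else -sign
    return lambda x: sign if x > threshold else -sign

def adaboost(xs, ys, candidates, rounds=3):
    # ys in {-1, +1}; candidates is a pool of weak learners
    n = len(xs)
    w = [1.0 / n] * n              # per-example weights
    ensemble = []                  # (alpha, learner) pairs
    for _ in range(rounds):
        # pick the weak learner with the lowest weighted error
        errs = [sum(wi for wi, x, y in zip(w, xs, ys) if h(x) != y)
                for h in candidates]
        e, h = min(zip(errs, candidates), key=lambda p: p[0])
        e = max(e, 1e-10)          # avoid log(0) for a perfect learner
        alpha = 0.5 * math.log((1 - e) / e)
        ensemble.append((alpha, h))
        # up-weight misclassified examples, then renormalize
        w = [wi * math.exp(-alpha * y * h(x)) for wi, x, y in zip(w, xs, ys)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    return 1 if sum(a * h(x) for a, h in ensemble) > 0 else -1

xs = [1, 2, 3, 4, 5, 6]
ys = [-1, -1, -1, 1, 1, 1]
cands = [stump(t + 0.5, s) for t in range(6) for s in (1, -1)]
model = adaboost(xs, ys, cands)
print([predict(model, x) for x in xs])  # → [-1, -1, -1, 1, 1, 1]
```

Bagging differs only in the source of diversity: instead of reweighting examples, each base classifier is trained on a bootstrap resample and the votes are unweighted.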

  5. Using histograms to introduce randomization in the generation of ensembles of decision trees

    DOEpatents

    Kamath, Chandrika; Cantu-Paz, Erick; Littau, David

    2005-02-22

    A system for decision tree ensembles that includes a module to read the data, a module to create a histogram, a module to evaluate a potential split according to some criterion using the histogram, a module to select a split point randomly in an interval around the best split, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method includes the steps of reading the data; creating a histogram; evaluating a potential split according to some criterion using the histogram, selecting a split point randomly in an interval around the best split, splitting the data, and combining multiple decision trees in ensembles.

  6. Classification of Bent-Double Galaxies: Experiences with Ensembles of Decision Trees

    SciTech Connect

    Kamath, C; Cantu-Paz, E

    2002-01-08

    In earlier work, we have described our experiences with the use of decision tree classifiers to identify radio-emitting galaxies with a bent-double morphology in the FIRST astronomical survey. We now extend this work to include ensembles of decision tree classifiers, including two algorithms developed by us. These algorithms randomize the decision at each node of the tree, and because they consider fewer candidate splitting points, are faster than other methods for creating ensembles. The experiments presented in this paper with our astronomy data show that our algorithms are competitive in accuracy, but faster than other ensemble techniques such as Boosting, Bagging, and Arcx4 with different split criteria.

  7. Creating ensembles of oblique decision trees with evolutionary algorithms and sampling

    DOEpatents

    Cantu-Paz, Erick; Kamath, Chandrika

    2006-06-13

    A decision tree system that is part of a parallel object-oriented pattern recognition system, which in turn is part of an object oriented data mining system. A decision tree process includes the step of reading the data. If necessary, the data is sorted. A potential split of the data is evaluated according to some criterion. An initial split of the data is determined. The final split of the data is determined using evolutionary algorithms and statistical sampling techniques. The data is split. Multiple decision trees are combined in ensembles.

  8. Ensemble of Causal Trees

    NASA Astrophysics Data System (ADS)

    Bialas, Piotr

    2003-10-01

    We discuss the geometry of trees endowed with a causal structure using the conventional framework of equilibrium statistical mechanics. We show how this ensemble is related to popular growing network models. In particular, we demonstrate that for a class of affine attachment kernels the two models are identical, but that they can differ substantially for other choices of weights. We show that causal trees exhibit condensation even for asymptotically linear kernels. We derive general formulae describing the degree distribution, the ancestor-descendant correlation, and the probability that a randomly chosen node lives at a given geodesic distance from the root. It is shown that the Hausdorff dimension dH of the causal networks is generically infinite.

  9. A protocol for developing early warning score models from vital signs data in hospitals using ensembles of decision trees

    PubMed Central

    Xu, Michael; Tam, Benjamin; Thabane, Lehana; Fox-Robichaud, Alison

    2015-01-01

    Introduction Multiple early warning scores (EWS) have been developed and implemented to reduce cardiac arrests on hospital wards. Case–control observational studies that generate an area under the receiver operator curve (AUROC) are the usual validation method, but investigators have also generated EWS with algorithms with no prior clinical knowledge. We present a protocol for the validation and comparison of our local Hamilton Early Warning Score (HEWS) with that generated using decision tree (DT) methods. Methods and analysis A database of electronically recorded vital signs from 4 medical and 4 surgical wards will be used to generate DT EWS (DT-HEWS). A third EWS will be generated using ensemble-based methods. Missing data will be multiply imputed. For a relative risk reduction of 50% in our composite outcome (cardiac or respiratory arrest, unanticipated intensive care unit (ICU) admission or hospital death) with a power of 80%, we calculated a sample size of 17 151 patient days based on our cardiac arrest rates in 2012. The performance of the National EWS, DT-HEWS and the ensemble EWS will be compared using AUROC. Ethics and dissemination Ethics approval was received from the Hamilton Integrated Research Ethics Board (#13-724-C). The vital signs and associated outcomes are stored in a database on our secure hospital server. Preliminary dissemination of this protocol was presented in abstract form at an international critical care meeting. Final results of this analysis will be used to improve on the existing HEWS and will be shared through publication and presentation at critical care meetings. PMID:26353873
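The AUROC comparison at the heart of this protocol can be computed directly from the rank-sum identity; a small sketch with made-up scores (not data from the study):

```python
def auroc(scores, labels):
    # AUROC via the rank-sum (Mann-Whitney) identity: the probability
    # that a randomly chosen positive case outscores a randomly chosen
    # negative case, counting ties as one half.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# hypothetical early-warning scores for six patient-days; 1 = composite outcome
scores = [7, 5, 6, 2, 1, 3]
labels = [1, 1, 0, 0, 0, 0]
print(auroc(scores, labels))  # → 0.875
```

An AUROC of 0.5 is chance performance and 1.0 is perfect discrimination, so comparing the National EWS, DT-HEWS and the ensemble EWS reduces to comparing three such numbers on the same outcomes.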

  10. Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS

    NASA Astrophysics Data System (ADS)

    Tehrany, Mahyat Shafapour; Pradhan, Biswajeet; Jebur, Mustafa Neamah

    2013-11-01

    A decision tree (DT) machine learning algorithm was used to map the flood-susceptible areas in Kelantan, Malaysia. We used an ensemble frequency ratio (FR) and logistic regression (LR) model in order to overcome the weak points of LR. The combined method of FR and LR was used to map the susceptible areas in Kelantan, Malaysia. Results of both methods were compared and their efficiency was assessed. The most influential conditioning factors on flooding were identified.

  11. Tree Ensembles on the Induced Discrete Space.

    PubMed

    Yildiz, Olcay Taner

    2016-05-01

    Decision trees are widely used predictive models in machine learning. Recently, K-tree was proposed, where the original discrete feature space is expanded by generating all orderings of values of k discrete attributes, and these orderings are used as the new attributes in decision tree induction. Although K-tree performs significantly better than the proper one, its exponential time complexity can prohibit its use. In this brief, we propose K-forest, an extension of random forest, where a subset of features is selected randomly from the induced discrete space. Simulation results on 17 data sets show that the novel ensemble classifier has a significantly lower error rate compared with the random forest based on the original feature space. PMID:26011897

  12. Approximate Splitting for Ensembles of Trees using Histograms

    SciTech Connect

    Kamath, C; Cantu-Paz, E; Littau, D

    2001-09-28

    Recent work in classification indicates that significant improvements in accuracy can be obtained by growing an ensemble of classifiers and having them vote for the most popular class. Implicit in many of these techniques is the concept of randomization that generates different classifiers. In this paper, we focus on ensembles of decision trees that are created using a randomized procedure based on histograms. Techniques such as histograms, which discretize continuous variables, have long been used in classification to convert the data into a form suitable for processing and to reduce the compute time. The approach combines the ideas behind discretization through histograms and randomization in ensembles to create decision trees by randomly selecting a split point in an interval around the best bin boundary in the histogram. The experimental results with public domain data show that ensembles generated using this approach are competitive in accuracy and superior in computational cost to other ensemble techniques such as boosting and bagging.

  13. Lazy decision trees

    SciTech Connect

    Friedman, J.H.; Yun, Yeogirl; Kohavi, R.

    1996-12-31

    Lazy learning algorithms, exemplified by nearest-neighbor algorithms, do not induce a concise hypothesis from a given training set; the inductive process is delayed until a test instance is given. Algorithms for constructing decision trees, such as C4.5, ID3, and CART, create a single "best" decision tree during the training phase, and this tree is then used to classify test instances. The tests at the nodes of the constructed tree are good on average, but there may be better tests for classifying a specific instance. We propose a lazy decision tree algorithm, LazyDT, that conceptually constructs the "best" decision tree for each test instance. In practice, only a path needs to be constructed, and a caching scheme makes the algorithm fast. The algorithm is robust with respect to missing values without resorting to the complicated methods usually seen in induction of decision trees. Experiments on real and artificial problems are presented.
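A minimal sketch of the LazyDT idea with categorical features: only the path the test instance would follow is built, choosing at each step the feature whose value in the test instance most reduces entropy. Names and toy data are ours, and the tie-breaking, caching, and missing-value handling of the real algorithm are omitted.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a non-empty list of class labels
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def lazy_classify(train, labels, test, features):
    # Build only the path the test instance would follow: at each step,
    # pick the feature whose test-instance value best reduces entropy
    # among the training instances matching the path so far.
    rows = list(range(len(train)))
    remaining = list(features)
    while remaining and len(set(labels[i] for i in rows)) > 1:
        def gain(f):
            match = [i for i in rows if train[i][f] == test[f]]
            if not match:
                return -1.0
            return (entropy([labels[i] for i in rows])
                    - entropy([labels[i] for i in match]))
        f = max(remaining, key=gain)
        remaining.remove(f)
        match = [i for i in rows if train[i][f] == test[f]]
        if match:
            rows = match
    return Counter(labels[i] for i in rows).most_common(1)[0][0]

# toy data: feature 0 = outlook, feature 1 = windy
train = [("sunny", "yes"), ("sunny", "no"), ("rain", "yes"), ("rain", "no")]
labels = ["stay", "go", "stay", "stay"]
print(lazy_classify(train, labels, ("sunny", "no"), features=[0, 1]))  # → go
```

Note that a different test instance induces a different path, which is exactly why the "best" tree here is conceptual rather than materialized.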

  14. AncesTrees: ancestry estimation with randomized decision trees.

    PubMed

    Navega, David; Coelho, Catarina; Vicente, Ricardo; Ferreira, Maria Teresa; Wasterlain, Sofia; Cunha, Eugénia

    2015-09-01

    In forensic anthropology, ancestry estimation is essential in establishing the individual biological profile. The aim of this study is to present a new program, AncesTrees, developed for assessing ancestry based on metric analysis. AncesTrees relies on a machine learning ensemble algorithm, random forest, to classify the human skull. In the ensemble learning paradigm, several models are generated and jointly used to arrive at the final decision. The random forest algorithm creates ensembles of decision tree classifiers, a non-linear and non-parametric classification technique. The database used in AncesTrees is composed of 23 craniometric variables from 1,734 individuals, representative of six major ancestral groups and selected from the Howells' craniometric series. The program was tested in 128 adult crania from the following collections: the African slaves' skeletal collection of Valle da Gafaria; the Medical School Skull Collection and the Identified Skeletal Collection of 21st Century, both curated at the University of Coimbra. The first step of the test analysis was to perform ancestry estimation including all the ancestral groups of the database. The second stage of our test analysis was to conduct ancestry estimation including only the European and the African ancestral groups. In the first test analysis, 75% of the individuals of African ancestry and 79.2% of the individuals of European ancestry were correctly identified. The model involving only African and European ancestral groups had a better performance: 93.8% of all individuals were correctly classified. The obtained results show that AncesTrees can be a valuable tool in forensic anthropology. PMID:25053239

  15. Human decision error (HUMDEE) trees

    SciTech Connect

    Ostrom, L.T.

    1993-08-01

    Graphical presentations of human actions in incident and accident sequences have been used for many years. However, for the most part, human decision making has been underrepresented in these trees. This paper presents a method of incorporating the human decision process into graphical presentations of incident/accident sequences, in the form of logic trees called Human Decision Error Trees, or HUMDEE for short. The primary benefit of HUMDEE trees is that they graphically illustrate what else the individuals involved in the event could have done to prevent either the initiation or continuation of the event. HUMDEE trees also present the alternate paths available at the operator decision points in the incident/accident sequence. This is different from the Technique for Human Error Rate Prediction (THERP) event trees. There are many uses of these trees. They can be used for incident/accident investigations to show what other courses of action were available and for training operators. The trees also have a consequence component, so that not only the decision but also its consequences can be explored.

  16. Decision-Tree Program

    NASA Technical Reports Server (NTRS)

    Buntine, Wray

    1994-01-01

    IND computer program introduces Bayesian and Markov/maximum-likelihood (MML) methods and more-sophisticated methods of searching in growing trees. Produces more-accurate class-probability estimates important in applications like diagnosis. Provides range of features and styles with convenience for casual user, fine-tuning for advanced user or for those interested in research. Consists of four basic kinds of routines: data-manipulation, tree-generation, tree-testing, and tree-display. Written in C language.

  17. Weighted Hybrid Decision Tree Model for Random Forest Classifier

    NASA Astrophysics Data System (ADS)

    Kulkarni, Vrushali Y.; Sinha, Pradeep K.; Petare, Manisha C.

    2016-06-01

    Random Forest is an ensemble, supervised machine learning algorithm. An ensemble generates many classifiers and combines their results by majority voting. Random forest uses the decision tree as its base classifier. In decision tree induction, an attribute split/evaluation measure is used to decide the best split at each node of the decision tree. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation among them. The work presented in this paper is related to attribute split measures and is a two-step process: first, a theoretical study of the five selected split measures is done and a comparison matrix is generated to understand the pros and cons of each measure. These theoretical results are then verified by empirical analysis, in which a random forest is generated using each of the five selected split measures, chosen one at a time (i.e., a random forest using information gain, a random forest using gain ratio, and so on). Next, based on this theoretical and empirical analysis, a new hybrid decision tree model for the random forest classifier is proposed. In this model, the individual decision trees in the Random Forest are generated using different split measures. The model is augmented by weighted voting based on the strength of the individual trees. The new approach has shown a notable increase in the accuracy of the random forest.
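The weighted-voting step of the proposed model can be sketched in a few lines; the predictions and strength weights below are hypothetical stand-ins for per-tree estimates such as out-of-bag accuracy.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    # Combine per-tree class votes, weighting each vote by that tree's
    # estimated strength, and return the class with the largest tally.
    tally = defaultdict(float)
    for p, w in zip(predictions, weights):
        tally[p] += w
    return max(tally, key=tally.get)

# hypothetical: five trees grown with different split measures
preds   = ["A", "B", "B", "A", "A"]       # each tree's class vote
weights = [0.9, 0.6, 0.55, 0.7, 0.65]     # each tree's estimated strength
print(weighted_vote(preds, weights))  # → A
```

With uniform weights this reduces to the plain majority vote of a standard random forest; the hybrid model's change is precisely the non-uniform weights.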

  18. Decision tree modeling using R

    PubMed Central

    2016-01-01

    In the machine learning field, the decision tree learner is powerful and easy to interpret. It employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single grown tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests come from the random sampling and the restricted set of input variables to be selected. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building. PMID:27570769

  19. Decision tree modeling using R.

    PubMed

    Zhang, Zhongheng

    2016-08-01

    In the machine learning field, the decision tree learner is powerful and easy to interpret. It employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single grown tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests come from the random sampling and the restricted set of input variables to be selected. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building. PMID:27570769

  20. Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

    NASA Astrophysics Data System (ADS)

    Galelli, S.; Castelletti, A.

    2013-07-01

    Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modelling. In this paper, we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modelling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalisation property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally efficient; and (iii) allows one to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analysed on two real-world case studies, the Marina catchment (Singapore) and the Canning River (Western Australia), representing two different morphoclimatic contexts. The evaluation is performed against other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirements when adopted on large datasets. In addition, the ranking of the input variables provided can be given a physically meaningful interpretation.
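The node-splitting rule that distinguishes Extra-Trees can be sketched as follows: for each of k randomly chosen features, draw a single uniform cut-point instead of searching all thresholds, and keep the best-scoring (feature, cut-point) pair. This is an illustrative regression version with our own toy data, not the authors' code.

```python
import random

def extra_tree_split(X, y, k=2, rng=random):
    # Extra-Trees node split: for k randomly chosen features, draw ONE
    # cut-point uniformly between that feature's min and max (instead of
    # searching all thresholds), then keep the best-scoring pair by
    # weighted variance reduction.
    n_features = len(X[0])
    def variance(ix):
        if not ix:
            return 0.0
        m = sum(y[i] for i in ix) / len(ix)
        return sum((y[i] - m) ** 2 for i in ix) / len(ix)
    best = None
    for f in rng.sample(range(n_features), k):
        vals = [row[f] for row in X]
        cut = rng.uniform(min(vals), max(vals))
        left = [i for i, row in enumerate(X) if row[f] <= cut]
        right = [i for i in range(len(X)) if i not in left]
        score = len(left) * variance(left) + len(right) * variance(right)
        if best is None or score < best[0]:
            best = (score, f, cut)
    return best[1], best[2]   # chosen feature and cut-point

random.seed(2)
X = [[0.1, 5.0], [0.2, 4.0], [0.9, 5.5], [1.0, 4.5]]
y = [1.0, 1.1, 3.0, 3.2]     # responds to feature 0 only
f, cut = extra_tree_split(X, y)
print(f, cut)
```

Because no threshold search is performed, each node costs a single pass over the data per sampled feature, which is where the computational advantage reported above comes from.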

  1. Assessing the predictive capability of randomized tree-based ensembles in streamflow modelling

    NASA Astrophysics Data System (ADS)

    Galelli, S.; Castelletti, A.

    2013-02-01

    Combining randomization methods with ensemble prediction is emerging as an effective option to balance accuracy and computational efficiency in data-driven modeling. In this paper we investigate the prediction capability of extremely randomized trees (Extra-Trees), in terms of accuracy, explanation ability and computational efficiency, in a streamflow modeling exercise. Extra-Trees are a totally randomized tree-based ensemble method that (i) alleviates the poor generalization property and tendency to overfitting of traditional standalone decision trees (e.g. CART); (ii) is computationally very efficient; and (iii) allows one to infer the relative importance of the input variables, which might help in the ex-post physical interpretation of the model. The Extra-Trees potential is analyzed on two real-world case studies (the Marina catchment (Singapore) and the Canning River (Western Australia)) representing two different morphoclimatic contexts, comparatively with other tree-based methods (CART and M5) and parametric data-driven approaches (ANNs and multiple linear regression). Results show that Extra-Trees perform comparably to the best of the benchmarks (i.e. M5) in both watersheds, while outperforming the other approaches in terms of computational requirements when adopted on large datasets. In addition, the ranking of the input variables provided can be given a physically meaningful interpretation.

  2. Extensions and applications of ensemble-of-trees methods in machine learning

    NASA Astrophysics Data System (ADS)

    Bleich, Justin

    Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First, we consider the performance of RF and SGB more broadly and demonstrate their superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of

  3. Bayesian Evidence Framework for Decision Tree Learning

    NASA Astrophysics Data System (ADS)

    Chatpatanasiri, Ratthachat; Kijsirikul, Boonserm

    2005-11-01

    This work is primarily interested in the problem of selecting a single decision (or classification) tree given the observed data. Although a single decision tree has a high risk of being overfitted, the induced tree is easily interpreted. Researchers have invented various methods, such as tree pruning or tree averaging, for preventing the induced tree from overfitting (and from underfitting) the data. In this paper, instead of using those conventional approaches, we apply the Bayesian evidence framework of Gull, Skilling and MacKay to the process of selecting a decision tree. We derive a formal function to measure `the fitness' of each decision tree given a set of observed data. Our method, in fact, is analogous to a well-known Bayesian model selection method for interpolating noisy continuous-value data. As in regression problems, given reasonable assumptions, this derived score function automatically quantifies the principle of Ockham's razor, and hence reasonably deals with the issue of the underfitting-overfitting tradeoff.

  4. From Family Trees to Decision Trees.

    ERIC Educational Resources Information Center

    Trobian, Helen R.

    This paper is a preliminary inquiry by a non-mathematician into graphic methods of sequential planning and ways in which hierarchical analysis and tree structures can be helpful in developing interest in the use of mathematical modeling in the search for creative solutions to real-life problems. Highlights include a discussion of hierarchical…

  5. Decision Tree Technique for Particle Identification

    SciTech Connect

    Quiller, Ryan

    2003-09-05

    Particle identification based on measurements such as the Cerenkov angle, momentum, and the rate of energy loss per unit distance (-dE/dx) is fundamental to the BaBar detector for particle physics experiments. It is particularly important to separate the charged forms of kaons and pions. Currently, the Neural Net, an algorithm based on mapping input variables to an output variable using hidden variables as intermediaries, is one of the primary tools used for identification. In this study, a decision tree classification technique implemented in the computer program, CART, was investigated and compared to the Neural Net over the range of momenta, 0.25 GeV/c to 5.0 GeV/c. For a given subinterval of momentum, three decision trees were made using different sets of input variables. The sensitivity and specificity were calculated for varying kaon acceptance thresholds. This data was used to plot Receiver Operating Characteristic curves (ROC curves) to compare the performance of the classification methods. Also, input variables used in constructing the decision trees were analyzed. It was found that the Neural Net was a significant contributor to decision trees using dE/dx and the Cerenkov angle as inputs. Furthermore, the Neural Net had poorer performance than the decision tree technique, but tended to improve decision tree performance when used as an input variable. These results suggest that the decision tree technique using Neural Net input may possibly increase accuracy of particle identification in BaBar.

  6. The decision tree approach to classification

    NASA Technical Reports Server (NTRS)

    Wu, C.; Landgrebe, D. A.; Swain, P. H.

    1975-01-01

    A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.

  7. Classification based on full decision trees

    NASA Astrophysics Data System (ADS)

    Genrikhov, I. E.; Djukova, E. V.

    2012-04-01

    The ideas underlying a series of the authors' studies dealing with the design of classification algorithms based on full decision trees are further developed. It is shown that the decision tree construction under consideration takes into account all the features satisfying a branching criterion. Full decision trees with an entropy branching criterion are studied as applied to precedent-based pattern recognition problems with real-valued data. Recognition procedures are constructed for solving problems with incomplete data (gaps in the feature descriptions of the objects) in the case when the learning objects are nonuniformly distributed over the classes. The authors' basic results previously obtained in this area are overviewed.

  8. Comprehensive Decision Tree Models in Bioinformatics

    PubMed Central

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Purpose Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. Methods This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. Results The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class

  9. A survey of decision tree classifier methodology

    NASA Technical Reports Server (NTRS)

    Safavian, S. Rasoul; Landgrebe, David

    1990-01-01

    Decision Tree Classifiers (DTC's) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTC's is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTC's over single stage classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.

  10. A survey of decision tree classifier methodology

    NASA Technical Reports Server (NTRS)

    Safavian, S. R.; Landgrebe, David

    1991-01-01

    Decision tree classifiers (DTCs) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTCs over single-stage classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.
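    The split-selection step these surveys discuss, choosing a test at each internal node, can be sketched as follows. This is a minimal illustration, not any specific method from the survey: it uses Gini impurity (one of several criteria covered) on a toy one-dimensional dataset invented for the example.

```python
# Minimal sketch: pick the threshold on one feature that minimizes the
# weighted Gini impurity of the two child nodes. Toy data only.

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    """Return (threshold, weighted impurity) of the best binary split."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue  # degenerate split, skip
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if w < best[1]:
            best = (t, w)
    return best

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]
t, imp = best_split(xs, ys)  # the data is perfectly separable at 3.0
```

    Real DTC designs differ mainly in which impurity criterion they use at this step and in how they search over candidate features and thresholds.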

  11. Parallel object-oriented decision tree system

    DOEpatents

    Kamath, Chandrika; Cantu-Paz, Erick

    2006-02-28

    A data mining decision tree system that uncovers patterns, associations, anomalies, and other statistically significant structures in data by reading and displaying data files, extracting relevant features for each of the objects, and using a method of recognizing patterns among the objects based upon object features through a decision tree that reads the data, sorts the data if necessary, determines the best manner to split the data into subsets according to some criterion, and splits the data.

  12. Speeding up Boosting decision trees training

    NASA Astrophysics Data System (ADS)

    Zheng, Chao; Wei, Zhenzhong

    2015-10-01

    Boosting decision trees are fast at test time, but their training is too slow to meet the requirements of applications with real-time learning. To overcome this drawback, we propose a fast decision tree training method that prunes ineffective features in advance, and based on this method we design a fast Boosting decision tree training algorithm. First, we analyze the structure of each decision tree node and prove by derivation that the classification error of each node is bounded. Then, by using this error bound to prune ineffective features at an early stage, we greatly accelerate the decision tree training process without affecting the training results at all. Finally, the accelerated tree training method is integrated into the general Boosting process, forming a fast Boosting decision tree training algorithm. This algorithm is not a new variant of Boosting; on the contrary, it should be used in conjunction with existing Boosting algorithms to achieve further training acceleration. To test the algorithm's speedup and its performance when combined with other acceleration algorithms, the original AdaBoost and two typical acceleration algorithms, LazyBoost and StochasticBoost, were each combined with this algorithm into three fast versions, and their classification performance was tested on the Lsis face database, which contains 12788 images. Experimental results reveal that this fast algorithm can achieve more than a twofold training speedup without affecting the results of the trained classifier, and that it can be combined with other acceleration algorithms. Keywords: Boosting algorithm, decision trees, classifier training, preliminary classification error, face detection
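    For context, the "general Boosting process" that this method accelerates can be sketched as plain AdaBoost with decision stumps. This is a generic textbook sketch on made-up one-dimensional data, not the paper's pruned algorithm; the feature-pruning step itself is not reproduced here.

```python
import math

# Generic AdaBoost with 1-D decision stumps (illustrative sketch only).

def stump_predict(threshold, sign, x):
    return sign if x > threshold else -sign

def train_adaboost(xs, ys, rounds=5):
    n = len(xs)
    w = [1.0 / n] * n            # sample weights, uniform at the start
    model = []                   # list of (alpha, threshold, sign)
    for _ in range(rounds):
        best = None              # (weighted error, threshold, sign)
        for t in xs:
            for sign in (1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if stump_predict(t, sign, x) != y)
                if best is None or err < best[0]:
                    best = (err, t, sign)
        err, t, sign = best
        err = max(err, 1e-12)    # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, sign))
        # re-weight: increase weight on misclassified samples
        w = [wi * math.exp(-alpha * y * stump_predict(t, sign, x))
             for wi, x, y in zip(w, xs, ys)]
        s = sum(w)
        w = [wi / s for wi in w]
    return model

def predict(model, x):
    score = sum(a * stump_predict(t, s, x) for a, t, s in model)
    return 1 if score > 0 else -1

xs = [0.5, 1.0, 1.5, 4.0, 5.0, 6.0]
ys = [-1, -1, -1, 1, 1, 1]
model = train_adaboost(xs, ys)
preds = [predict(model, x) for x in xs]
```

    The inner double loop over candidate thresholds is exactly the cost the paper attacks: its error bound lets non-competitive features be discarded before that loop runs.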

  13. Support Vector Machine with Ensemble Tree Kernel for Relation Extraction.

    PubMed

    Liu, Xiaoyong; Fu, Hui; Du, Zhiguo

    2016-01-01

    Relation extraction is one of the important research topics in the field of information extraction. To solve the problem of semantic variation in traditional semisupervised relation extraction algorithms, this paper proposes a novel semisupervised relation extraction algorithm based on ensemble learning (LXRE). The new algorithm integrates two kinds of tree-kernel support vector machine classifiers and adopts a strategy of constrained extension of the seed set. It can thereby weaken the inaccuracy of relation extraction caused by semantic variation. Numerical experiments on two benchmark data sets (PropBank and AIMed) show that the proposed LXRE algorithm is superior to two other common relation extraction methods on four evaluation indexes (Precision, Recall, F-measure, and Accuracy), indicating that the new algorithm has good relation extraction ability compared with others. PMID:27118966

  14. Support Vector Machine with Ensemble Tree Kernel for Relation Extraction

    PubMed Central

    Fu, Hui; Du, Zhiguo

    2016-01-01

    Relation extraction is one of the important research topics in the field of information extraction. To solve the problem of semantic variation in traditional semisupervised relation extraction algorithms, this paper proposes a novel semisupervised relation extraction algorithm based on ensemble learning (LXRE). The new algorithm integrates two kinds of tree-kernel support vector machine classifiers and adopts a strategy of constrained extension of the seed set. It can thereby weaken the inaccuracy of relation extraction caused by semantic variation. Numerical experiments on two benchmark data sets (PropBank and AIMed) show that the proposed LXRE algorithm is superior to two other common relation extraction methods on four evaluation indexes (Precision, Recall, F-measure, and Accuracy), indicating that the new algorithm has good relation extraction ability compared with others. PMID:27118966

  15. Decision Tree Approach for Soil Liquefaction Assessment

    PubMed Central

    Gandomi, Amir H.; Fridline, Mark M.; Roke, David A.

    2013-01-01

    In the current study, the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used in this study. Three decision tree techniques are used here in two different ways, considering statistical and engineering points of view, to develop decision rules. The DT results are compared to the logistic regression (LR) model. The results of this study indicate that the DTs not only successfully predict liquefaction but they can also outperform the LR model. The best DT models are interpreted and evaluated based on an engineering point of view. PMID:24489498

  16. Fast Image Texture Classification Using Decision Trees

    NASA Technical Reports Server (NTRS)

    Thompson, David R.

    2011-01-01

    Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation-hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.
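    The "integral image" transform mentioned above is a standard summed-area table: each cell holds the sum of all pixels above and to the left, so any rectangular region sum costs four lookups and three additions, all in integer arithmetic. A minimal sketch on a toy 3x3 image:

```python
# Summed-area table ("integral image") with a zero-padded border row
# and column, so rectangle sums need no edge special-casing.

def integral_image(img):
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        for c in range(w):
            ii[r + 1][c + 1] = (img[r][c] + ii[r][c + 1]
                                + ii[r + 1][c] - ii[r][c])
    return ii

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom][left:right] in O(1): four lookups."""
    return (ii[bottom][right] - ii[top][right]
            - ii[bottom][left] + ii[top][left])

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
total = rect_sum(ii, 0, 0, 3, 3)   # sum of the whole image: 45
center = rect_sum(ii, 1, 1, 2, 2)  # just the middle pixel: 5
```

    Box-filter features built this way are what make the per-pixel decision-tree tests cheap enough for FPGA implementation.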

  17. Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data

    PubMed Central

    Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.

    2016-01-01

    We propose a novel “tree-averaging” model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with other ensemble methods, BET requires far fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides a variable selection criterion and an interpretation for each subset. We develop an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872

  18. Algorithms for optimal dyadic decision trees

    SciTech Connect

    Hush, Don; Porter, Reid

    2009-01-01

    A new algorithm for constructing optimal dyadic decision trees was recently introduced, analyzed, and shown to be very effective for low dimensional data sets. This paper enhances and extends this algorithm by: introducing an adaptive grid search for the regularization parameter that guarantees optimal solutions for all relevant tree sizes, revising the core tree-building algorithm so that its run time is substantially smaller for most regularization parameter values on the grid, and incorporating new data structures and data pre-processing steps that provide significant run time enhancement in practice.

  19. Prediction of regional streamflow frequency using model tree ensembles

    NASA Astrophysics Data System (ADS)

    Schnier, Spencer; Cai, Ximing

    2014-09-01

    This study introduces a novel data-driven method called model tree ensembles (MTEs) to predict streamflow frequency statistics based on known drainage area characteristics, which yields insights into the dominant controls of regional streamflow. The database used to induce the models contains both natural and anthropogenic drainage area characteristics for 294 USGS stream gages (164 in Texas and 130 in Illinois). MTEs were used to predict complete flow duration curves (FDCs) of ungaged streams by developing 17 models corresponding to 17 points along the FDC. Model accuracy was evaluated using ten-fold cross-validation and the coefficient of determination (R2). During the validation, the gages withheld from the analysis represent ungaged watersheds. MTEs are shown to outperform global multiple-linear regression models for predictions in ungaged watersheds. The accuracy of models for low flow is enhanced by explicit consideration of variables that capture human interference in watershed hydrology (e.g., population). Human factors (e.g., population and groundwater use) appear in the regionalizations for low flows, while annual and seasonal precipitation and drainage area are important for regionalizations of all flows. The results of this study have important implications for predictions in ungaged watersheds as well as gaged watersheds subject to anthropogenically-driven hydrologic changes.

  20. Decision Tree Modeling for Ranking Data

    NASA Astrophysics Data System (ADS)

    Yu, Philip L. H.; Wan, Wai Ming; Lee, Paul H.

    Ranking/preference data arises from many applications in marketing, psychology, and politics. We establish a new decision tree model for the analysis of ranking data by adopting the concept of classification and regression tree. The existing splitting criteria are modified in a way that allows them to precisely measure the impurity of a set of ranking data. Two types of impurity measures for ranking data are introduced, namely g-wise and top-k measures. Theoretical results show that the new measures exhibit properties of impurity functions. In model assessment, the area under the ROC curve (AUC) is applied to evaluate the tree performance. Experiments are carried out to investigate the predictive performance of the tree model for complete and partially ranked data and promising results are obtained. Finally, a real-world application of the proposed methodology to analyze a set of political rankings data is presented.
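    To give a flavor of what an impurity measure for rankings looks like, here is a heavily hedged sketch. It does not reproduce the paper's g-wise or top-k formulas; it is a hypothetical top-1 measure that applies Gini impurity to the distribution of items each ranking places first, on invented data.

```python
# Hypothetical top-1 impurity for a set of rankings (illustration only,
# not the paper's exact g-wise/top-k measures).

def top1_impurity(rankings):
    """Gini impurity over which item each ranking puts in first place."""
    n = len(rankings)
    counts = {}
    for r in rankings:
        counts[r[0]] = counts.get(r[0], 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

pure = [("A", "B", "C"), ("A", "C", "B")]   # both rank A first
mixed = [("A", "B", "C"), ("B", "A", "C")]  # disagree on first place
p1 = top1_impurity(pure)    # 0.0: the node is pure
p2 = top1_impurity(mixed)   # 0.5: maximal disagreement for two items
```

    A splitting criterion for ranking data then chooses the split that most reduces such an impurity, exactly as class-label Gini does in ordinary classification trees.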

  1. IND - THE IND DECISION TREE PACKAGE

    NASA Technical Reports Server (NTRS)

    Buntine, W.

    1994-01-01

    A common approach to supervised classification and prediction in artificial intelligence and statistical pattern recognition is the use of decision trees. A tree is "grown" from data using a recursive partitioning algorithm to create a tree which has good prediction of classes on new data. Standard algorithms are CART (by Breiman, Friedman, Olshen and Stone) and ID3 and its successor C4 (by Quinlan). As well as reimplementing parts of these algorithms and offering experimental control suites, IND also introduces Bayesian and MML methods and more sophisticated search in growing trees. These produce more accurate class probability estimates that are important in applications like diagnosis. IND is applicable to most data sets consisting of independent instances, each described by a fixed length vector of attribute values. An attribute value may be a number, one of a set of attribute specific symbols, or it may be omitted. One of the attributes is designated the "target" and IND grows trees to predict the target. Prediction can then be done on new data or the decision tree printed out for inspection. IND provides a range of features and styles with convenience for the casual user as well as fine-tuning for the advanced user or those interested in research. IND can be operated in a CART-like mode (but without regression trees, surrogate splits or multivariate splits), and in a mode like the early version of C4. Advanced features allow more extensive search, interactive control and display of tree growing, and Bayesian and MML algorithms for tree pruning and smoothing. These often produce more accurate class probability estimates at the leaves. IND also comes with a comprehensive experimental control suite. IND consists of four basic kinds of routines: data manipulation routines, tree generation routines, tree testing routines, and tree display routines. The data manipulation routines are used to partition a single large data set into smaller training and test sets.
The

  2. Two Trees: Migrating Fault Trees to Decision Trees for Real Time Fault Detection on International Space Station

    NASA Technical Reports Server (NTRS)

    Lee, Charles; Alena, Richard L.; Robinson, Peter

    2004-01-01

    Starting from an ISS fault tree example, we present a method for migrating fault trees to decision trees. The method shows that visualizing the root cause of a fault becomes easier, and that manipulating the tree becomes more programmatic via available decision tree programs. The visualization of decision trees for diagnostics is straightforward and easy to understand. For real-time ISS fault diagnostics, the status of the systems can be shown by running the signals through the trees and seeing where they stop. Another advantage of using decision trees is that the trees can learn fault patterns and predict future faults from historic data. The learning works not only on static data sets but also online: by accumulating real-time data sets, the decision trees can gain and store fault patterns in the trees and recognize them when they recur.

  3. An Application of Decision Tree Based on ID3

    NASA Astrophysics Data System (ADS)

    Xiaohu, Wang; Lele, Wang; Nianfeng, Li

    This article deals with the application of the classical decision tree algorithm ID3, from data mining, to data from a certain website. It constructs a decision tree based on information gain and thus produces some useful purchasing-behavior rules. It also demonstrates that the decision tree has wide applicability in the field of online sales.
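    ID3's splitting criterion, information gain, is the reduction in entropy achieved by partitioning the samples on an attribute. A minimal sketch follows; the "viewed" attribute and the buy/skip labels are invented for illustration, not taken from the article's site data.

```python
import math

# Information gain for a categorical attribute, as used by ID3.

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting."""
    n = len(labels)
    partitions = {}
    for row, y in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

# Hypothetical attribute: did the visitor view a product page?
rows = [{"viewed": "yes"}, {"viewed": "yes"},
        {"viewed": "no"}, {"viewed": "no"}]
buys = ["buy", "buy", "skip", "skip"]
gain = information_gain(rows, buys, "viewed")  # 1.0 bit: a perfect split
```

    ID3 picks the attribute with the highest gain at each node and recurses on the resulting partitions.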

  4. CUDT: A CUDA Based Decision Tree Algorithm

    PubMed Central

    Sheu, Ruey-Kai; Chiu, Chun-Chieh

    2014-01-01

    The decision tree is one of the best-known classification methods in data mining, and much research has focused on improving its performance. However, those algorithms were developed to run on traditional distributed systems, and without help from new technology their latency cannot be improved when processing the huge data generated by ubiquitous sensing nodes. To improve data-processing latency in huge data mining, in this paper we design and implement a new parallelized decision tree algorithm on CUDA (compute unified device architecture), a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We have conducted many experiments to evaluate the performance of CUDT and compared it with a traditional CPU version. The results show that CUDT is 5∼55 times faster than Weka-j48 and achieves an 18-fold speedup over SPRINT for large data sets. PMID:25140346

  5. Using Decision Trees for Comparing Pattern Recognition Feature Sets

    SciTech Connect

    Proctor, D D

    2005-08-18

    Determination of the best set of features has been acknowledged as one of the most difficult tasks in the pattern recognition process. In this report, significance tests on the sort-ordered, sample-size normalized vote distribution of an ensemble of decision trees are introduced as a method of evaluating the relative quality of feature sets. Alternative functional forms for feature sets are also examined. Associated standard deviations provide the means to evaluate the effect of the number of folds, the number of classifiers per fold, and the sample size on the resulting classifications. The method is applied to a problem for which a significant portion of the training set cannot be classified unambiguously.
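    To make the object under test concrete, here is a hedged sketch of a vote distribution for an ensemble, not the report's exact procedure: for each sample, the fraction of member classifiers voting for class 1 is computed and the fractions are sorted, giving a distribution that two feature sets could then be compared on.

```python
# Illustrative sketch of a sort-ordered ensemble vote distribution.
# The votes below are invented; real values would come from an
# ensemble of decision trees classifying held-out samples.

def vote_fractions(ensemble_votes):
    """ensemble_votes[i] is the list of member votes for sample i."""
    fracs = [sum(1 for v in votes if v == 1) / len(votes)
             for votes in ensemble_votes]
    return sorted(fracs)

votes = [[1, 1, 1, 0],   # strong agreement on class 1
         [1, 0, 0, 0],   # weak support for class 1
         [1, 1, 0, 0]]   # ambiguous sample
dist = vote_fractions(votes)  # [0.25, 0.5, 0.75]
```

    Samples near 0.5 in such a distribution are the ambiguous ones the report highlights, where the training set itself cannot be classified unambiguously.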

  6. Using attribute behavior diversity to build accurate decision tree committees for microarray data.

    PubMed

    Han, Qian; Dong, Guozhu

    2012-08-01

    DNA microarrays (gene chips), frequently used in biological and medical studies, measure the expressions of thousands of genes per sample. Using microarray data to build accurate classifiers for diseases is an important task. This paper introduces an algorithm, called Committee of Decision Trees by Attribute Behavior Diversity (CABD), to build highly accurate ensembles of decision trees for such data. Since a committee's accuracy is greatly influenced by the diversity among its member classifiers, CABD uses two new ideas to "optimize" that diversity, namely (1) the concept of attribute behavior-based similarity between attributes, and (2) the concept of attribute usage diversity among trees. The ideas are effective for microarray data, since such data have many features and behavior similarity between genes can be high. Experiments on microarray data for six cancers show that CABD outperforms previous ensemble methods significantly and outperforms SVM, and show that the diversified features used by CABD's decision tree committee can be used to improve performance of other classifiers such as SVM. CABD has potential for other high-dimensional data, and its ideas may apply to ensembles of other classifier types. PMID:22809418

  7. Quantum Decision Trees and Semidefinite Programming.

    SciTech Connect

    Barnum, Howard; Saks, M.; Szegedy, M.

    2001-01-01

    We reformulate the notion of quantum query complexity in terms of inequalities and equations for a set of positive matrices, which we view as a quantum analogue of a decision tree. Using the new formulation we show that: 1. Every quantum query algorithm needs to use at most n quantum bits in addition to the query register. 2. For any function f there is an algorithm that runs in time polynomial in the truth table of f and (for ε > 0) computes the ε-error quantum decision tree complexity of f. 3. Using the dual of our system we can treat lower bound methods on a uniform platform, which paves the way to their future comparison. In particular we describe Ambainis's bound in our framework. 4. The output condition on quantum algorithms used by Ambainis and others is not sufficient for an algorithm to compute a function with ε-bounded error: we show the existence of algorithms whose final entanglement matrix satisfies the condition, but for which the value of f cannot be determined from a quantum measurement on the accessible part of the computer.

  8. Identification of metabolic syndrome using decision tree analysis.

    PubMed

    Worachartcheewan, Apilak; Nantasenamat, Chanin; Isarankura-Na-Ayudhya, Chartchalerm; Pidetcha, Phannee; Prachayasittikul, Virapong

    2010-10-01

    This study employs decision tree as a decision support system for rapid and automated identification of individuals with metabolic syndrome (MS) among a Thai population. Results demonstrated strong predictivity of the decision tree in classification of individuals with and without MS, displaying an overall accuracy in excess of 99%. PMID:20619912

  9. Safety validation of decision trees for hepatocellular carcinoma

    PubMed Central

    Wang, Xian-Qiang; Liu, Zhe; Lv, Wen-Ping; Luo, Ying; Yang, Guang-Yun; Li, Chong-Hui; Meng, Xiang-Fei; Liu, Yang; Xu, Ke-Sen; Dong, Jia-Hong

    2015-01-01

    AIM: To evaluate a different decision tree for safe liver resection and verify its efficiency. METHODS: A total of 2457 patients underwent hepatic resection between January 2004 and December 2010 at the Chinese PLA General Hospital, and 634 hepatocellular carcinoma (HCC) patients were eligible for the final analyses. Post-hepatectomy liver failure (PHLF) was identified by the association of prothrombin time < 50% and serum bilirubin > 50 μmol/L (the “50-50” criteria), which were assessed at day 5 postoperatively or later. The Swiss-Clavien decision tree, Tokyo University-Makuuchi decision tree, and Chinese consensus decision tree were adopted to divide patients into two groups based on those decision trees in sequence, and the PHLF rates were recorded. RESULTS: The overall mortality and PHLF rate were 0.16% and 3.0%. A total of 19 patients experienced PHLF. The numbers of patients to whom the Swiss-Clavien, Tokyo University-Makuuchi, and Chinese consensus decision trees were applied were 581, 573, and 622, and the PHLF rates were 2.75%, 2.62%, and 2.73%, respectively. Significantly more cases satisfied the Chinese consensus decision tree than the Swiss-Clavien decision tree and Tokyo University-Makuuchi decision tree (P < 0.01, P < 0.01); nevertheless, the latter two showed no significant difference (P = 0.147). The PHLF rate exhibited no significant difference with respect to the three decision trees. CONCLUSION: The Chinese consensus decision tree expands the indications for hepatic resection for HCC patients and does not increase the PHLF rate compared to the Swiss-Clavien and Tokyo University-Makuuchi decision trees. It would be a safe and effective algorithm for hepatectomy in patients with hepatocellular carcinoma. PMID:26309366

  10. Coherent neuronal ensembles are rapidly recruited when making a look-reach decision

    PubMed Central

    Wong, Yan T.; Fabiszak, Margaret M.; Novikov, Yevgeny; Daw, Nathaniel D.; Pesaran, Bijan

    2015-01-01

    Summary Selecting and planning actions recruits neurons across many areas of the brain but how ensembles of neurons work together to make decisions is unknown. Temporally-coherent neural activity may provide a mechanism by which neurons coordinate their activity in order to make decisions. If so, neurons that are part of coherent ensembles may predict movement choices before other ensembles of neurons. We recorded neuronal activity in the lateral and medial banks of the intraparietal sulcus (IPS) of the posterior parietal cortex, while monkeys made choices about where to look and reach and decoded the activity to predict the choices. Ensembles of neurons that displayed coherent patterns of spiking activity extending across the IPS, “dual coherent” ensembles, predicted movement choices substantially earlier than other neuronal ensembles. We propose that dual-coherent spike timing reflects interactions between groups of neurons that play an important role in how we make decisions. PMID:26752158

  11. Ensembles of extremely randomized trees and feature ranking for streamflow prediction

    NASA Astrophysics Data System (ADS)

    Castelletti, Andrea; Galelli, Stefano

    2010-05-01

    Accurate and reliable stream-flow predictions are an important input to water resources planning and management processes, which heavily depend upon the availability of water (e.g. river basin planning, optimal reservoir operation, irrigation system management). Hydrological processes are extremely complex, combining high non-linearity and spatial-temporal variability. The prediction of hydrological variables is therefore a challenging task, very often complicated by lack of data and/or the presence of outliers. Usually, data-driven modelling provides a good balance between model accuracy and complexity, which are ultimately critical to the adoption of optimization-based approaches. While neural networks have been widely used in hydrological modelling (e.g. Govindaraju and Rao, 2000), tree-based models are a relatively unexplored methodology (Solomatine and Dulal, 2003; Solomatine and Xue, 2004; Iorgulescu and Beven, 2004; Stravs and Brilly, 2007). In this paper a new data-driven modelling approach based on Ensembles of Extremely Randomized Trees (ETs; Geurts et al., 2006) is proposed for stream-flow prediction using different hydro-meteorological predictors. By randomizing the tree construction process and merging a forest of diversified trees to predict the output, ETs alleviate the well-known poor generalization of traditional standalone decision trees (e.g. CART), thus avoiding overfitting the training data. Inputs to the model are selected using a tree-based feature ranking algorithm, which ranks the candidate predictors (e.g. precipitation and evaporation at different stations, and linear combinations thereof) according to their contribution in explaining the variance of an underlying ETs-based model of the stream-flow process. The approach is applied in the Red river basin (Vietnam), a sub-tropical catchment characterized by extremely variable weather conditions, where strong precipitation significantly contributes to the high flows. Results show that
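    The randomization that distinguishes Extremely Randomized Trees (Geurts et al., 2006) from CART-style trees is that cut-points are drawn at random rather than optimized: at each node, a random threshold is drawn for each candidate feature and the best of those random splits is kept. A minimal single-feature sketch on toy data:

```python
import random

# Sketch of one Extra-Trees node split: a random cut-point on a
# feature, instead of the exhaustive threshold search in CART.
# (A full implementation would draw one random cut per candidate
# feature and keep the best by an impurity score.)

def random_split(xs, ys, rng):
    lo, hi = min(xs), max(xs)
    t = rng.uniform(lo, hi)  # random cut-point in the feature's range
    left = [y for x, y in zip(xs, ys) if x <= t]
    right = [y for x, y in zip(xs, ys) if x > t]
    return t, left, right

rng = random.Random(0)       # fixed seed so the sketch is repeatable
xs = [0.1, 0.2, 0.9, 1.0]
ys = [0, 0, 1, 1]
t, left, right = random_split(xs, ys, rng)
# every sample lands on exactly one side of the cut
sizes_ok = len(left) + len(right) == len(xs)
```

    Averaging many such weakly optimized trees trades a little per-tree accuracy for much lower variance, which is the generalization benefit the abstract describes.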

  12. Accounting for Epistemic Uncertainty in PSHA: Logic Tree and Ensemble Model

    NASA Astrophysics Data System (ADS)

    Taroni, M.; Marzocchi, W.; Selva, J.

    2014-12-01

    The logic tree scheme is the probabilistic framework that has been widely used in the last decades to take into account epistemic uncertainties in probabilistic seismic hazard analysis (PSHA). Notwithstanding the vital importance for PSHA of incorporating epistemic uncertainties properly, we argue that the use of the logic tree in a PSHA context has conceptual and practical drawbacks. Although some of these drawbacks have been reported in the past, a careful evaluation of their impact on PSHA is still lacking. This is the goal of the present work. In brief, we show that i) PSHA practice does not meet the assumptions that stand behind the logic tree scheme; ii) the output of a logic tree is often misinterpreted and/or misleading, e.g., the use of percentiles (median included) in a logic tree scheme raises theoretical difficulties from a probabilistic point of view; iii) even in the case where the assumptions that stand behind a logic tree are actually met, this leads to several problems in testing any PSHA model. We suggest a different strategy - based on ensemble modeling - to account for epistemic uncertainties in a more proper probabilistic framework. Finally, we show that in many PSHA practical applications, the logic tree is de facto loosely applied to build sound ensemble models.
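    The ensemble-modeling alternative can be sketched in its simplest form: treat each model's hazard curve as a sample from a parent distribution and form the weighted mixture, rather than reading percentiles off logic tree branches. The curves and weights below are invented for illustration, not real PSHA output.

```python
# Hedged sketch: pointwise weighted mixture of per-model hazard curves.

def ensemble_curve(curves, weights):
    """Weighted average of exceedance-probability curves, point by point."""
    total = sum(weights)
    return [sum(w * c[i] for w, c in zip(weights, curves)) / total
            for i in range(len(curves[0]))]

curves = [[0.9, 0.5, 0.1],   # model A: P(exceedance) at 3 ground-motion levels
          [0.8, 0.4, 0.2]]   # model B
weights = [0.75, 0.25]       # illustrative credibility weights
mixed = ensemble_curve(curves, weights)  # [0.875, 0.475, 0.125]
```

    The point of the ensemble view is that this mixture (and its spread) is a well-defined probabilistic object, which is what makes the resulting hazard model testable.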

  13. Accounting for Epistemic Uncertainty in PSHA: Logic Tree and Ensemble Model

    NASA Astrophysics Data System (ADS)

    Taroni, Matteo; Marzocchi, Warner; Selva, Jacopo

    2014-05-01

    The logic tree scheme is the probabilistic framework that has been widely used in the last decades to take into account epistemic uncertainties in probabilistic seismic hazard analysis (PSHA). Notwithstanding the vital importance for PSHA of incorporating epistemic uncertainties properly, we argue that the use of the logic tree in a PSHA context has conceptual and practical drawbacks. Although some of these drawbacks have been reported in the past, a careful evaluation of their impact on PSHA is still lacking. This is the goal of the present work. In brief, we show that i) PSHA practice does not meet the assumptions that stand behind the logic tree scheme; ii) the output of a logic tree is often misinterpreted and/or misleading, e.g., the use of percentiles (median included) in a logic tree scheme raises theoretical difficulties from a probabilistic point of view; iii) even in the case where the assumptions that stand behind a logic tree are actually met, this leads to several problems in testing any PSHA model. We suggest a different strategy - based on ensemble modeling - to account for epistemic uncertainties in a more proper probabilistic framework. Finally, we show that in many PSHA practical applications, the logic tree is improperly applied to build sound ensemble models.

  14. 15 CFR Supplement 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... 15 Commerce and Foreign Trade 2 2011-01-01 2011-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000...

  15. 15 CFR Supplement No 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... 15 Commerce and Foreign Trade 2 2013-01-01 2013-01-01 false Decision Tree No Supplement No 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued... THE EAR Pt. 732, Supp. 1 Supplement No 1 to Part 732—Decision Tree ER06FE04.000...

  16. 15 CFR Supplement No 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... 15 Commerce and Foreign Trade 2 2014-01-01 2014-01-01 false Decision Tree No Supplement No 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued... THE EAR Pt. 732, Supp. 1 Supplement No 1 to Part 732—Decision Tree ER06FE04.000...

  17. 15 CFR Supplement 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... 15 Commerce and Foreign Trade 2 2010-01-01 2010-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000...

  18. 15 CFR Supplement 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... 15 Commerce and Foreign Trade 2 2012-01-01 2012-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000...

  19. Decision-Tree Formulation With Order-1 Lateral Execution

    NASA Technical Reports Server (NTRS)

    James, Mark

    2007-01-01

    A compact symbolic formulation enables mapping of an arbitrarily complex decision tree of a certain type into a highly computationally efficient multidimensional software object. The type of decision trees to which this formulation applies is that known in the art as the Boolean class of balanced decision trees. Parallel lateral slices of an object created by means of this formulation can be executed in constant time, considerably less than would otherwise be required. Decision trees of various forms are incorporated into almost all large software systems. A decision tree is a way of hierarchically solving a problem, proceeding through a set of true/false responses to a conclusion. By definition, a decision tree has a tree-like structure, wherein each internal node denotes a test on an attribute, each branch from an internal node represents an outcome of a test, and leaf nodes represent classes or class distributions that, in turn, represent possible conclusions. The drawback of decision trees is that executing them can be computationally expensive (and, hence, time-consuming) because each non-leaf node must be examined to determine whether to progress deeper into a tree structure or to examine an alternative. The present formulation was conceived as an efficient means of representing a decision tree and executing it in as little time as possible. The formulation involves the use of a set of symbolic algorithms to transform a decision tree into a multi-dimensional object, the rank of which equals the number of lateral non-leaf nodes. The tree can then be executed in constant time by means of an order-one table lookup. The sequence of operations performed by the algorithms is summarized as follows: 1. Determination of whether the tree under consideration can be encoded by means of this formulation. 2. Extraction of decision variables. 3. Symbolic optimization of the decision tree to minimize its form. 4. Expansion and transformation of all nested conjunctive
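The core idea of order-one execution can be sketched very simply: enumerate the Boolean decision variables once, precompute the leaf reached for every combination, and replace node-by-node traversal with a single table lookup. The three-variable tree below is hypothetical, not the paper's formulation:

```python
# Sketch: compiling a small Boolean decision tree into an O(1) lookup table.
# The tree and its 3 boolean decision variables are hypothetical.

def tree(a, b, c):
    # an ordinary nested if/else decision tree (executed node by node)
    if a:
        return "X" if b else "Y"
    else:
        return "Y" if c else "Z"

# enumerate every combination of decision variables once, ahead of time
table = {}
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            table[(a << 2) | (b << 1) | c] = tree(a, b, c)

def classify(a, b, c):
    # order-1 execution: a single table lookup, no node-by-node traversal
    return table[(a << 2) | (b << 1) | c]

print(classify(1, 0, 1))  # same answer as tree(1, 0, 1)
```

The table grows as 2**n in the number of decision variables, which is why the symbolic optimization step that minimizes the tree's form matters before encoding.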

  20. Computational study of developing high-quality decision trees

    NASA Astrophysics Data System (ADS)

    Fu, Zhiwei

    2002-03-01

    Recently, decision tree algorithms have been widely used in dealing with data mining problems to find out valuable rules and patterns. However, scalability, accuracy and efficiency are significant concerns regarding how to effectively deal with large and complex data sets in the implementation. In this paper, we propose an innovative machine learning approach (we call our approach GAIT), combining a genetic algorithm, statistical sampling, and decision trees, to develop intelligent decision trees that can alleviate some of these problems. We design our computational experiments and run GAIT on three different data sets (namely Socio-Olympic data, Westinghouse data, and FAA data) to test its performance against a standard decision tree algorithm, a neural network classifier, and a statistical discriminant technique, respectively. The computational results show that our approach substantially outperforms the standard decision tree algorithm at lower sampling levels, and achieves significantly better results with less effort than both the neural network and discriminant classifiers.

  1. An automated approach to the design of decision tree classifiers

    NASA Technical Reports Server (NTRS)

    Argentiero, P.; Chin, R.; Beaudet, P.

    1982-01-01

    An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.
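The global probability of correct classification under the stated independence assumption reduces to a product of per-node probabilities along a root-to-leaf path. A minimal sketch with hypothetical node accuracies:

```python
# Sketch: assuming the decision rules at the nodes are statistically
# independent, the probability of a correct classification along a path is
# the product of per-node probabilities. The values below are hypothetical.
from functools import reduce

node_accuracies = [0.95, 0.90, 0.92]   # P(correct decision) at each node
p_correct = reduce(lambda p, q: p * q, node_accuracies, 1.0)
print(round(p_correct, 4))
```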

  2. Decision tree methods: applications for classification and prediction.

    PubMed

    Song, Yan-Yan; Lu, Ying

    2015-04-25

    Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets: the training dataset is used to build a decision tree model, and the validation dataset to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms for developing decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure. PMID:26120265
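The training/validation recipe described above can be sketched end to end with a deliberately tiny stand-in for CART: grow a 1-D threshold tree at several candidate depths on the training data and keep the depth that scores best on the held-out validation data. The data and candidate depths are hypothetical:

```python
# Sketch: choosing tree size with a validation set. A minimal 1-D threshold
# tree stands in for CART; the data and candidate depths are hypothetical.

def majority(ys):
    return max(set(ys), key=ys.count)

def grow(xs, ys, depth):
    """Return a predictor function for a depth-limited 1-D threshold tree."""
    if depth == 0 or len(set(ys)) == 1:
        label = majority(ys)
        return lambda x: label
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [(x, y) for x, y in zip(xs, ys) if x <= t]
        right = [(x, y) for x, y in zip(xs, ys) if x > t]
        err = sum(y != majority([b for _, b in left]) for _, y in left) + \
              sum(y != majority([b for _, b in right]) for _, y in right)
        if best is None or err < best[0]:
            best = (err, t, left, right)
    if best is None:                      # no usable split: make a leaf
        label = majority(ys)
        return lambda x: label
    _, t, left, right = best
    lsub = grow([x for x, _ in left], [y for _, y in left], depth - 1)
    rsub = grow([x for x, _ in right], [y for _, y in right], depth - 1)
    return lambda x: lsub(x) if x <= t else rsub(x)

train_x, train_y = [1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]
val_x, val_y = [1.5, 3.5, 5.5], [0, 1, 1]

scores = {}
for d in (1, 2, 3):                       # candidate tree sizes
    predict = grow(train_x, train_y, d)
    scores[d] = sum(predict(x) == y for x, y in zip(val_x, val_y))
best_depth = max(scores, key=scores.get)  # smallest depth wins ties first
print(best_depth, scores)
```

On this toy data a depth-1 tree already separates the classes, so validation selects the smallest adequate tree, which is exactly the pruning intuition.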

  3. Operational optimization of irrigation scheduling for citrus trees using an ensemble based data assimilation approach

    NASA Astrophysics Data System (ADS)

    Hendricks Franssen, H.; Han, X.; Martinez, F.; Jimenez, M.; Manzano, J.; Chanzy, A.; Vereecken, H.

    2013-12-01

    Data assimilation (DA) techniques, like the local ensemble transform Kalman filter (LETKF), not only offer the opportunity to update model predictions by assimilating new measurement data in real time, but also provide an improved basis for real-time (DA-based) control. This study focuses on the optimization of real-time irrigation scheduling for fields of citrus trees near Picassent (Spain). For three selected fields the irrigation was optimized with DA-based control, and for other fields irrigation was optimized on the basis of a more traditional approach in which reference evapotranspiration for citrus trees was estimated using the FAO method. The performance of the two methods is compared for the year 2013. The DA-based real-time control approach is based on ensemble predictions of soil moisture profiles, using the Community Land Model (CLM). The uncertainty in the model predictions is introduced by feeding the model with weather predictions from an ensemble prediction system (EPS) and uncertain soil hydraulic parameters. The model predictions are updated daily by assimilating soil moisture data measured by capacitance probes. The measurement data are assimilated with the help of the LETKF. The irrigation need was calculated for each of the ensemble members and averaged, and logistic constraints (hydraulics, energy costs) were taken into account in the final assignment of irrigation in space and time. For the operational scheduling based on this approach, only model states and no model parameters were updated. Other, non-operational simulation experiments for the same period were carried out in which (1) neither ensemble weather forecasts nor DA were used (open loop), (2) only ensemble weather forecasts were used, (3) only DA was used, (4) soil hydraulic parameters were also updated in data assimilation, and (5) both soil hydraulic and plant-specific parameters were updated. The FAO-based and DA-based real-time irrigation control are compared in terms of soil moisture
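The analysis step at the heart of this kind of scheme can be illustrated with a scalar stochastic ensemble Kalman update (a simpler relative of the LETKF used in the study). The ensemble values and error variances below are hypothetical:

```python
# Sketch: one scalar ensemble Kalman analysis step, pulling modelled soil
# moisture toward a capacitance-probe reading. This is the simple stochastic
# EnKF form, not the LETKF itself; all numbers are hypothetical.
import random

random.seed(0)
forecast = [0.22, 0.26, 0.24, 0.28]   # ensemble of forecast soil moisture
obs, obs_var = 0.30, 0.0004           # probe measurement and its error variance

mean_f = sum(forecast) / len(forecast)
var_f = sum((x - mean_f) ** 2 for x in forecast) / (len(forecast) - 1)
gain = var_f / (var_f + obs_var)      # Kalman gain

# update each member toward a perturbed observation
analysis = [x + gain * (obs + random.gauss(0.0, obs_var ** 0.5) - x)
            for x in forecast]
mean_a = sum(analysis) / len(analysis)
print(round(mean_f, 3), round(mean_a, 3))
```

The analysis mean lies between the forecast mean and the observation, weighted by the gain; the irrigation need would then be computed per member and averaged, as the abstract describes.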

  4. Decision tree based transient stability method -- A case study

    SciTech Connect

    Wehenkel, L.; Pavella, M. . Inst. Montefiore); Euxibie, E.; Heilbronn, B. . Direction des Etudes et Recherches)

    1994-02-01

    The decision tree transient stability method is revisited via a case study carried out on the French EHV power system. In short, the method consists of building off-line decision trees, able to subsequently assess the system transient behavior in terms of its precontingency parameters (or "attributes") likely to drive the stability phenomena. This case study aims at investigating practical feasibility aspects and features of the trees, at enhancing their reliability to the extent possible, and at generalizing them. Feasibility aspects encompass data base generation, candidate attributes, and stability classes; tree features concern, in particular, complexity in terms of size and interpretability, and robustness with respect to both their building and use. Reliability is enhanced by defining and exploiting pragmatic quality measures. Generalization concerns multicontingency, instead of single-contingency, trees. The results obtained show real promise for the method to meet practical needs of electric power utilities.

  5. RNA search with decision trees and partial covariance models.

    PubMed

    Smith, Jennifer A

    2009-01-01

    The use of partial covariance models to search for RNA family members in genomic sequence databases is explored. The partial models are formed from contiguous subranges of the overall RNA family multiple alignment columns. A binary decision-tree framework is presented for choosing the order to apply the partial models and the score thresholds on which to make the decisions. The decision trees are chosen to minimize computation time subject to the constraint that all of the training sequences are passed to the full covariance model for final evaluation. Computational intelligence methods are suggested to select the decision tree since the tree can be quite complex and there is no obvious method to build the tree in these cases. Experimental results from seven RNA families show execution times of 0.066-0.268 relative to using the full covariance model alone. Tests on the full sets of known sequences for each family show that at least 95 percent of these sequences are found for two families and 100 percent for five others. Since the full covariance model is run on all sequences accepted by the partial model decision tree, the false alarm rate is at least as low as that of the full model alone. PMID:19644178
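The prescreening structure described above, a cheap partial score filtering candidates before an expensive full evaluation, can be sketched with toy stand-ins. GC content plays the role of the partial model and a slightly richer score plays the full covariance model; both scoring functions and the threshold are hypothetical:

```python
# Sketch: a two-stage screen in the spirit of partial covariance models.
# A cheap partial score filters candidates; only survivors reach the
# expensive full model. Scores and threshold are hypothetical stand-ins.

def partial_score(seq):          # cheap stand-in: GC content of the window
    return sum(seq.count(b) for b in "GC") / len(seq)

def full_score(seq):             # expensive stand-in for the full model
    return partial_score(seq) + 0.1 * seq.count("A") / len(seq)

threshold = 0.5
candidates = ["GGCC", "ATAT", "GCAT", "GGGA"]

# stage 1: cheap filter; stage 2: full evaluation of survivors only
survivors = [s for s in candidates if partial_score(s) >= threshold]
hits = {s: round(full_score(s), 3) for s in survivors}
print(survivors, hits)
```

Because every sequence accepted by stage 1 is rescored by the full model, the false-alarm behavior of the cascade is bounded by the full model's, which is the property the abstract emphasizes.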

  6. Learning accurate very fast decision trees from uncertain data streams

    NASA Astrophysics Data System (ADS)

    Liang, Chunquan; Zhang, Yang; Shi, Peng; Hu, Zhengguo

    2015-12-01

    Most existing works on data stream classification assume the streaming data is precise and definite. Such assumption, however, does not always hold in practice, since data uncertainty is ubiquitous in data stream applications due to imprecise measurement, missing values, privacy protection, etc. The goal of this paper is to learn accurate decision tree models from uncertain data streams for classification analysis. On the basis of very fast decision tree (VFDT) algorithms, we proposed an algorithm for constructing an uncertain VFDT tree with classifiers at tree leaves (uVFDTc). The uVFDTc algorithm can exploit uncertain information effectively and efficiently in both the learning and the classification phases. In the learning phase, it uses Hoeffding bound theory to learn from uncertain data streams and yield fast and reasonable decision trees. In the classification phase, at tree leaves it uses uncertain naive Bayes (UNB) classifiers to improve the classification performance. Experimental results on both synthetic and real-life datasets demonstrate the strong ability of uVFDTc to classify uncertain data streams. The use of UNB at tree leaves has improved the performance of uVFDTc, especially the any-time property, the benefit of exploiting uncertain information, and the robustness against uncertainty.
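The Hoeffding bound invoked by VFDT-style learners makes the split decision concrete: commit to the best attribute once its observed lead over the runner-up exceeds the bound. The gain values below are illustrative:

```python
# Sketch: the Hoeffding bound used by VFDT-style learners to decide when
# enough stream examples have been seen to commit to a split.
import math

def hoeffding_bound(value_range, delta, n):
    """With probability 1 - delta, the observed mean of n examples is within
    this bound of the true mean of a variable with the given range."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# commit once the best attribute's gain lead over the runner-up exceeds
# the bound (gain values are illustrative)
best_gain, second_gain = 0.30, 0.18
eps = hoeffding_bound(value_range=1.0, delta=1e-7, n=1000)
print(round(eps, 4), best_gain - second_gain > eps)
```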

  7. EEG feature selection method based on decision tree.

    PubMed

    Duan, Lijuan; Ge, Hui; Ma, Wei; Miao, Jun

    2015-01-01

    This paper aims to solve the automated feature selection problem in brain computer interfaces (BCI). In order to automate the feature selection process, we proposed a novel EEG feature selection method based on decision trees (DT). During electroencephalogram (EEG) signal processing, a feature extraction method based on principal component analysis (PCA) was used, and the selection process based on a decision tree was performed by searching the feature space and automatically selecting optimal features. Considering that EEG signals are a series of non-linear signals, a generalized linear classifier named support vector machine (SVM) was chosen. In order to test the validity of the proposed method, we applied the EEG feature selection method based on decision trees to BCI Competition II dataset Ia, and the experiment showed encouraging results. PMID:26405856

  8. Automatic sleep staging using state machine-controlled decision trees.

    PubMed

    Imtiaz, Syed Anas; Rodriguez-Villegas, Esther

    2015-01-01

    Automatic sleep staging from a reduced number of channels is desirable to save time, reduce costs and make sleep monitoring more accessible by providing home-based polysomnography. This paper introduces a novel algorithm for automatic scoring of sleep stages using a combination of small decision trees driven by a state machine. The algorithm uses two channels of EEG for feature extraction and has a state machine that selects a suitable decision tree for classification based on the prevailing sleep stage. Its performance has been evaluated using the complete dataset of 61 recordings from PhysioNet Sleep EDF Expanded database achieving an overall accuracy of 82% and 79% on training and test sets respectively. The algorithm has been developed with a very small number of decision tree nodes that are active at any given time making it suitable for use in resource-constrained wearable systems. PMID:26736278
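The control structure described above, a state machine whose current state selects which small decision tree scores the next epoch, can be sketched directly. The stages, feature names, and toy decision rules below are hypothetical, not the paper's trained trees:

```python
# Sketch: a state machine that picks a small per-stage classifier. The stages,
# feature names, and thresholds below are hypothetical toy rules.

def tree_after_wake(f):   # small tree used while the prevailing stage is Wake
    return "N1" if f["alpha_power"] < 0.3 else "Wake"

def tree_after_n1(f):     # small tree used while the prevailing stage is N1
    return "N2" if f["spindle"] else "N1"

def tree_after_n2(f):
    return "N3" if f["delta_power"] > 0.6 else "N2"

TREES = {"Wake": tree_after_wake, "N1": tree_after_n1,
         "N2": tree_after_n2, "N3": lambda f: "N3"}

def score_epochs(epochs, start="Wake"):
    stage, out = start, []
    for f in epochs:
        stage = TREES[stage](f)   # state selects the tree; tree updates state
        out.append(stage)
    return out

epochs = [{"alpha_power": 0.8}, {"alpha_power": 0.1},
          {"spindle": True}, {"delta_power": 0.7}]
print(score_epochs(epochs))
```

Only one small tree is active per epoch, which is what keeps the number of live decision nodes small enough for resource-constrained wearables.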

  9. Decision tree-based learning to predict patient controlled analgesia consumption and readjustment

    PubMed Central

    2012-01-01

    Background Appropriate postoperative pain management contributes to earlier mobilization, shorter hospitalization, and reduced cost. The undertreatment of pain may impede short-term recovery and have a detrimental long-term effect on health. This study focuses on Patient Controlled Analgesia (PCA), which is a delivery system for pain medication. This study proposes and demonstrates how to use machine learning and data mining techniques to predict analgesic requirements and PCA readjustment. Methods The sample in this study included 1099 patients. Every patient was described by 280 attributes, including the class attribute. In addition to commonly studied demographic and physiological factors, this study emphasizes attributes related to PCA. We used decision tree-based learning algorithms to predict analgesic consumption and PCA control readjustment based on the first few hours of PCA medications. We also developed a nearest neighbor-based data cleaning method to alleviate the class-imbalance problem in PCA setting readjustment prediction. Results The prediction accuracies of total analgesic consumption (continuous dose and PCA dose) and PCA analgesic requirement (PCA dose only) by an ensemble of decision trees were 80.9% and 73.1%, respectively. Decision tree-based learning outperformed Artificial Neural Network, Support Vector Machine, Random Forest, Rotation Forest, and Naïve Bayesian classifiers in analgesic consumption prediction. The proposed data cleaning method improved the performance of every learning method in this study of PCA setting readjustment prediction. Comparative analysis identified the informative attributes from the data mining models and compared them with the correlates of analgesic requirement reported in previous works. Conclusion This study presents a real-world application of data mining to anesthesiology. Unlike previous research, this study considers a wider variety of predictive factors, including PCA demands over time. We analyzed

  10. An automated approach to the design of decision tree classifiers

    NASA Technical Reports Server (NTRS)

    Argentiero, P.; Chin, P.; Beaudet, P.

    1980-01-01

    The classification of large dimensional data sets arising from the merging of remote sensing data with more traditional forms of ancillary data is considered. Decision tree classification, a popular approach to the problem, is characterized by the property that samples are subjected to a sequence of decision rules before they are assigned to a unique class. An automated technique for effective decision tree design which relies only on a priori statistics is presented. This procedure utilizes a set of two dimensional canonical transforms and Bayes table look-up decision rules. An optimal design at each node is derived based on the associated decision table. A procedure for computing the global probability of correct classification is also provided. An example is given in which class statistics obtained from an actual LANDSAT scene are used as input to the program. The resulting decision tree design has an associated probability of correct classification of .76 compared to the theoretically optimum .79 probability of correct classification associated with a full dimensional Bayes classifier. Recommendations for future research are included.

  11. Visualization method and tool for interactive learning of large decision trees

    NASA Astrophysics Data System (ADS)

    Nguyen, Trong Dung; Ho, TuBao

    2002-03-01

    When learning from large datasets, decision tree induction programs often produce very large trees. How to visualize trees efficiently during the learning process, particularly large trees, remains an open question and requires efficient tools. This paper presents a visualization method and tool for interactive learning of large decision trees, which includes a new visualization technique called T2.5D (which stands for Trees 2.5 Dimensions). After a brief discussion of requirements for tree visualizers and related work, the paper focuses on techniques developed for two issues: (1) how to visualize large decision trees efficiently; and (2) how to visualize decision trees during the learning process.

  12. Reconciliation of Decision-Making Heuristics Based on Decision Trees Topologies and Incomplete Fuzzy Probabilities Sets

    PubMed Central

    Doubravsky, Karel; Dohnal, Mirko

    2015-01-01

    Complex decision making tasks of different natures, e.g. economics, safety engineering, ecology and biology, are based on vague, sparse, partially inconsistent and subjective knowledge. Moreover, decision making economists / engineers are usually not willing to invest too much time into the study of complex formal theories. They require decisions which can be (re)checked by human-like common sense reasoning. One important problem related to realistic decision making tasks is the incomplete data sets required by the chosen decision making algorithm. This paper presents a relatively simple algorithm showing how some missing III (input information items) can be generated using mainly decision tree topologies and integrated into incomplete data sets. The algorithm is based on easy-to-understand heuristics, e.g. that a longer decision tree sub-path is less probable. This heuristic can solve decision problems under total ignorance, i.e. when the decision tree topology is the only information available. In practice, however, isolated information items, e.g. some vaguely known probabilities (e.g. fuzzy probabilities), are usually available. This means that a realistic problem is analysed under partial ignorance. The proposed algorithm reconciles topology-related heuristics and additional fuzzy sets using fuzzy linear programming. The case study, represented by a tree with six lotteries and one fuzzy probability, is presented in detail. PMID:26158662
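The central heuristic, that a longer decision-tree sub-path is less probable, can be sketched as assigning each leaf a weight inversely related to its path length and normalizing. The leaves and path lengths below are hypothetical, and this simple inverse-length weighting is only one plausible reading of the heuristic:

```python
# Sketch of the heuristic "a longer decision-tree sub-path is less probable":
# weight each leaf inversely to its path length, then normalize into
# probabilities. Leaves and path lengths are hypothetical.

path_lengths = {"leaf_A": 1, "leaf_B": 2, "leaf_C": 4}

weights = {leaf: 1.0 / n for leaf, n in path_lengths.items()}
total = sum(weights.values())
probs = {leaf: w / total for leaf, w in weights.items()}
print({leaf: round(p, 3) for leaf, p in probs.items()})
```

In the paper's setting these heuristic values would then be reconciled with any known fuzzy probabilities via fuzzy linear programming rather than used directly.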

  13. Reconciliation of Decision-Making Heuristics Based on Decision Trees Topologies and Incomplete Fuzzy Probabilities Sets.

    PubMed

    Doubravsky, Karel; Dohnal, Mirko

    2015-01-01

    Complex decision making tasks of different natures, e.g. economics, safety engineering, ecology and biology, are based on vague, sparse, partially inconsistent and subjective knowledge. Moreover, decision making economists / engineers are usually not willing to invest too much time into the study of complex formal theories. They require decisions which can be (re)checked by human-like common sense reasoning. One important problem related to realistic decision making tasks is the incomplete data sets required by the chosen decision making algorithm. This paper presents a relatively simple algorithm showing how some missing III (input information items) can be generated using mainly decision tree topologies and integrated into incomplete data sets. The algorithm is based on easy-to-understand heuristics, e.g. that a longer decision tree sub-path is less probable. This heuristic can solve decision problems under total ignorance, i.e. when the decision tree topology is the only information available. In practice, however, isolated information items, e.g. some vaguely known probabilities (e.g. fuzzy probabilities), are usually available. This means that a realistic problem is analysed under partial ignorance. The proposed algorithm reconciles topology-related heuristics and additional fuzzy sets using fuzzy linear programming. The case study, represented by a tree with six lotteries and one fuzzy probability, is presented in detail. PMID:26158662

  14. Evaluation of Decision Trees for Cloud Detection from AVHRR Data

    NASA Technical Reports Server (NTRS)

    Shiffman, Smadar; Nemani, Ramakrishna

    2005-01-01

    Automated cloud detection and tracking is an important step in assessing changes in radiation budgets associated with global climate change via remote sensing. Data products based on satellite imagery are available to the scientific community for studying trends in the Earth's atmosphere. The data products include pixel-based cloud masks that assign cloud-cover classifications to pixels. Many cloud-mask algorithms have the form of decision trees. The decision trees employ sequential tests that scientists designed based on empirical astrophysics studies and simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In a previous study we compared automatically learned decision trees to cloud masks included in Advanced Very High Resolution Radiometer (AVHRR) data products from the year 2000. In this paper we report the replication of the study for five years of data, and for a gold standard based on surface observations performed by scientists at weather stations in the British Isles. For our sample data, the accuracy of automatically learned decision trees was greater than the accuracy of the cloud masks (p < 0.001).

  15. Taste-Guided Decisions Differentially Engage Neuronal Ensembles across Gustatory Cortices

    PubMed Central

    MacDonald, Christopher J.; Meck, Warren H.; Simon, Sidney A.; Nicolelis, Miguel A.L.

    2009-01-01

    Much remains to be understood about the differential contributions from primary and secondary sensory cortices to sensory-guided decision making. To address this issue we simultaneously recorded activity from neuronal ensembles in primary (gustatory cortex – GC) and secondary gustatory (orbitofrontal cortex – OFC) cortices while rats made a taste-guided decision between two response alternatives. We found that before animals commenced a response guided by a tastant cue, GC ensembles contained more information than OFC about the response alternative about to be selected. Thereafter, while the animal's response was underway, the response-selective information in ensembles from both regions increased, albeit to a greater degree in OFC. In GC, this increase depends on a representation of the taste cue guiding the animal's response. The increase in the OFC also depends on the guiding taste cue and on other features of the response, such as its spatiomotor properties and the behavioral context under which it is executed. Each of these latter features is encoded by different ensembles of OFC neurons that are recruited at specific times throughout the response selection process. These results indicate that during a taste-guided decision task both primary and secondary gustatory cortices dynamically encode different types of information. PMID:19741134

  16. Supervised learning with decision tree-based methods in computational and systems biology.

    PubMed

    Geurts, Pierre; Irrthum, Alexandre; Wehenkel, Louis

    2009-12-01

    At the intersection between artificial intelligence and statistics, supervised learning allows algorithms to automatically build predictive models from just observations of a system. During the last twenty years, supervised learning has been a tool of choice to analyze the ever-increasing and increasingly complex data generated in the context of molecular biology, with successful applications in genome annotation, function prediction, or biomarker discovery. Among supervised learning methods, decision tree-based methods stand out as non-parametric methods that have the unique feature of combining interpretability, efficiency, and, when used in ensembles of trees, excellent accuracy. The goal of this paper is to provide an accessible and comprehensive introduction to this class of methods. The first part of the review is devoted to an intuitive but complete description of decision tree-based methods and a discussion of their strengths and limitations with respect to other supervised learning methods. The second part of the review provides a survey of their applications in the context of computational and systems biology. PMID:20023720

  17. Using Evolutionary Algorithms to Induce Oblique Decision Trees

    SciTech Connect

    Cantu-Paz, E.; Kamath, C.

    2000-01-21

    This paper illustrates the application of evolutionary algorithms (EAs) to the problem of oblique decision tree induction. The objectives are to demonstrate that EAs can find classifiers whose accuracy is competitive with other oblique tree construction methods, and that this can be accomplished in a shorter time. Experiments were performed with a (1+1) evolutionary strategy and a simple genetic algorithm on public domain and artificial data sets. The empirical results suggest that the EAs quickly find competitive classifiers, and that EAs scale up better than traditional methods to the dimensionality of the domain and the number of training instances.
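A (1+1) evolution strategy for an oblique split can be sketched in a few lines: mutate a single hyperplane with Gaussian noise and keep the child whenever it classifies at least as well. The toy data, mutation scale, and iteration budget below are illustrative choices, not the paper's experimental setup:

```python
# Sketch: a (1+1) evolution strategy searching for an oblique split
# w[0]*x + w[1]*y + b > 0. Toy data, mutation scale, and budget are
# illustrative choices only.
import random

random.seed(1)
pts = [((1, 1), 0), ((2, 1), 0), ((1, 2), 0),
       ((4, 4), 1), ((5, 4), 1), ((4, 5), 1)]

def accuracy(w, b):
    return sum((w[0] * x + w[1] * y + b > 0) == bool(c)
               for (x, y), c in pts) / len(pts)

w, b = [0.0, 0.0], 0.0
best = accuracy(w, b)
for _ in range(200):
    # mutate the single parent; keep the child only if it is at least as good
    cw = [wi + random.gauss(0.0, 0.5) for wi in w]
    cb = b + random.gauss(0.0, 0.5)
    acc = accuracy(cw, cb)
    if acc >= best:
        w, b, best = cw, cb, acc
print(best)
```

Accepting equal-fitness children lets the search drift across accuracy plateaus, a standard choice for (1+1) strategies on discrete fitness landscapes like split accuracy.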

  18. Multi-Model Long-Range Ensemble Forecast for Decision Support in Hydroelectric Operations

    NASA Astrophysics Data System (ADS)

    Kunkel, M. L.; Parkinson, S.; Blestrud, D.; Holbrook, V. P.

    2014-12-01

    Idaho Power Company (IPC) is a hydroelectric-based utility serving over a million customers in southern Idaho and eastern Oregon. Hydropower makes up ~50% of our power generation, and accurate predictions of streamflow and precipitation drive our long-term planning and decision support for operations. We investigate the use of a multi-model ensemble approach for mid- and long-range streamflow and precipitation forecasts throughout the Snake River Basin. Forecasts are prepared using an Idaho Power-developed ensemble forecasting technique for 89 locations throughout the Snake River Basin for periods of 3 to 18 months in advance. A series of multivariable linear regression, multivariable non-linear regression and multivariable Kalman filter techniques are combined in an ensemble forecast based upon two data types: historical data (streamflow, precipitation, climate indices [e.g. PDO, ENSO, AO, etc…]) and singular value decomposition derived values based upon atmospheric heights and sea surface temperatures.
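The multi-model combination step can be sketched as a skill-weighted average of the member techniques' forecasts, with the member spread as a crude uncertainty measure. The member names, forecast values, and weights below are hypothetical, not IPC's actual configuration:

```python
# Sketch: combining several statistical streamflow forecasts into one
# ensemble value. Member forecasts (cubic meters per second) and the
# skill weights are hypothetical.

members = {
    "linear_regression":    420.0,
    "nonlinear_regression": 455.0,
    "kalman_filter":        440.0,
}
skill_weights = {"linear_regression": 0.3,
                 "nonlinear_regression": 0.3,
                 "kalman_filter": 0.4}

ensemble_forecast = sum(members[m] * skill_weights[m] for m in members)
spread = max(members.values()) - min(members.values())  # crude uncertainty
print(ensemble_forecast, spread)
```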

  19. The Decision-Identification Tree: A New NEPA Scoping Tool.

    PubMed

    Eccleston

    2000-10-01

    No single methodology has been universally accepted for determining the appropriate scope of analysis for an environmental impact statement (EIS). Most typically, the scope of analysis is determined by first identifying actions and facilities that need to be analyzed. Once the scope of actions and facilities is identified, the scope of impacts is determined. Yet agencies sometimes complete an EIS only to discover that the analysis does not adequately support decisions that need to be made. Such discrepancies can often be traced to disconnects between scoping, the subsequent analysis, and the final decision-making process that follows. A new and markedly different approach, decision-based scoping, provides an effective methodology for improving the EIS scoping process. Decision-based scoping, in conjunction with a new tool, the decision-identification tree (DIT), places emphasis on first identifying the potential decisions that may eventually need to be made. The DIT provides a methodology for mapping alternative courses of action as a function of fundamental decision points. Once these decision points have been correctly identified, the range of actions, alternatives, and impacts can be more accurately assessed; this approach can improve the effectiveness of EIS planning, while reducing the risk of future disconnects between the EIS analysis and reaching a final decision. This approach also has applications in other planning disciplines beyond that of the EIS. PMID:10954809

  20. Identifying ultrasound and clinical features of breast cancer molecular subtypes by ensemble decision.

    PubMed

    Zhang, Lei; Li, Jing; Xiao, Yun; Cui, Hao; Du, Guoqing; Wang, Ying; Li, Ziyao; Wu, Tong; Li, Xia; Tian, Jiawei

    2015-01-01

    Breast cancer is molecularly heterogeneous and categorized into four molecular subtypes: Luminal-A, Luminal-B, HER2-amplified and Triple-negative. In this study, we aimed to apply an ensemble decision approach to identify the ultrasound and clinical features related to the molecular subtypes. We collected ultrasound and clinical features from 1,000 breast cancer patients and performed immunohistochemistry on these samples. We used the ensemble decision approach to select unique features and to construct decision models. The decision model for Luminal-A subtype was constructed based on the presence of an echogenic halo and post-acoustic shadowing or indifference. The decision model for Luminal-B subtype was constructed based on the absence of an echogenic halo and vascularity. The decision model for HER2-amplified subtype was constructed based on the presence of post-acoustic enhancement, calcification, vascularity and advanced age. The model for Triple-negative subtype followed two rules. One was based on irregular shape, lobulate margin contour, the absence of calcification and hypovascularity, whereas the other was based on oval shape, hypovascularity and micro-lobulate margin contour. The accuracies of the models were 83.8%, 77.4%, 87.9% and 92.7%, respectively. We identified specific features of each molecular subtype and expanded the scope of ultrasound for making diagnoses using these decision models. PMID:26046791

  1. Towards the assimilation of tree-ring-width records using ensemble Kalman filtering techniques

    NASA Astrophysics Data System (ADS)

    Acevedo, Walter; Reich, Sebastian; Cubasch, Ulrich

    2016-03-01

    This paper investigates the applicability of the Vaganov-Shashkin-Lite (VSL) forward model for tree-ring-width chronologies as observation operator within a proxy data assimilation (DA) setting. Based on the principle of limiting factors, VSL combines temperature and moisture time series in a nonlinear fashion to obtain simulated TRW chronologies. When used as observation operator, this modelling approach implies three compounding, challenging features: (1) time averaging, (2) "switching recording" of 2 variables and (3) bounded response windows leading to "thresholded response". We generate pseudo-TRW observations from a chaotic 2-scale dynamical system, used as a cartoon of the atmosphere-land system, and attempt to assimilate them via ensemble Kalman filtering techniques. Results within our simplified setting reveal that VSL's nonlinearities may lead to considerable loss of assimilation skill, as compared to the utilization of a time-averaged (TA) linear observation operator. In order to understand this undesired effect, we embed VSL's formulation into the framework of fuzzy logic (FL) theory, which thereby exposes multiple representations of the principle of limiting factors. DA experiments employing three alternative growth rate functions disclose a strong link between the lack of smoothness of the growth rate function and the loss of optimality in the estimate of the TA state. Accordingly, VSL's performance as observation operator can be enhanced by resorting to smoother FL representations of the principle of limiting factors. This finding fosters new interpretations of tree-ring-growth limitation processes.

  2. Comparing the decision-relevance and utility of alternative ensembles of climate projections in water management and other applications

    NASA Astrophysics Data System (ADS)

    Lempert, R. J.; Tingstad, A.

    2015-12-01

    Decisions to manage the risks of climate change hinge, among many other things, on deeply uncertain and imperfect climate projections. Improving the decision relevance and utility of climate projections requires navigating a trade-off between increasing the physical realism of the model (often by improving the spatial resolution) and increasing the representation of decision-relevant uncertainties. This talk will examine the decision relevance and utility of alternative ensembles of climate information by comparing two decision support applications, in water management and biodiversity preservation, both in California. The climate ensembles will consist of different combinations of high and medium resolution projections from NARCCAP (North American Regional Climate Assessment Program) as well as low resolution, but more numerous, projections from the CMIP3 and CMIP5 ensembles. The decision support applications will use the same ensembles of climate projections in different contexts. Workshops with decision makers examine the extent to which the different ensembles lead to different decisions, the extent to which considering a wider range of uncertainty affects decisions, the extent to which decision makers' confidence in the projections and the decisions based on them is sensitive to the resolution at which they are communicated and the resolution-dependent skill, and how the answers to these questions vary between the water management and biodiversity contexts. This study aims to provide empirical evidence to support judgments on how best to use uncertain climate information in water management and other decision support applications.

  3. Decision trees for denoising in H.264/AVC video sequences

    NASA Astrophysics Data System (ADS)

    Huchet, G.; Chouinard, J.-Y.; Wang, D.; Vincent, A.

    2008-01-01

    All existing video coding standards are based on block-wise motion compensation and block-wise DCT. At high levels of quantization, block-wise motion compensation and transform produce blocking artifacts in the decoded video, a form of distortion to which the human visual system is very sensitive. The latest video coding standard, H.264/AVC, introduces a deblocking filter to reduce the blocking artifacts. However, there is still visible distortion after the filtering when compared to the original video. In this paper, we propose a non-conventional filter to further reduce the distortion and to improve the decoded picture quality. Different from conventional filters, the proposed filter is based on a machine learning algorithm (decision tree). The decision trees are used to classify the filter's inputs and select the best filter coefficients for the inputs. Experimental results with 4 × 4 DCT indicate that using the filter holds promise in improving the quality of H.264/AVC video sequences.

  4. The xeroderma pigmentosum pathway: decision tree analysis of DNA quality.

    PubMed

    Naegeli, Hanspeter; Sugasawa, Kaoru

    2011-07-15

    The nucleotide excision repair (NER) system is a fundamental cellular stress response that uses only a handful of DNA binding factors, mutated in the cancer-prone syndrome xeroderma pigmentosum (XP), to detect an astounding diversity of bulky base lesions, including those induced by ultraviolet light, electrophilic chemicals, oxygen radicals and further genetic insults. Several of these XP proteins are characterized by a mediocre preference for damaged substrates over the native double helix but, intriguingly, none of them recognizes injured bases with sufficient selectivity to account for the very high precision of bulky lesion excision. Instead, substrate versatility as well as damage specificity and strand selectivity are achieved by a multistage quality control strategy whereby different subunits of the XP pathway, in succession, interrogate the DNA double helix for a distinct abnormality in its structural or dynamic parameters. Through this step-by-step filtering procedure, the XP proteins operate like a systematic decision making tool, generally known as decision tree analysis, to sort out rare damaged bases embedded in a vast excess of native DNA. The present review is focused on the mechanisms by which multiple XP subunits of the NER pathway contribute to the proposed decision tree analysis of DNA quality in eukaryotic cells. PMID:21684221

  5. Finding the right decision tree's induction strategy for a hard real world problem.

    PubMed

    Zorman, M; Podgorelec, V; Kokol, P; Peterson, M; Sprogar, M; Ojstersek, M

    2001-09-01

    Decision trees have already been used successfully in medicine, but as in traditional statistics, some hard real-world problems cannot be solved successfully using the traditional way of induction. In our experiments we tested various methods for building univariate decision trees in order to find the best induction strategy. On a hard real-world problem, the orthopaedic fracture data set of 2637 cases described by 23 attributes and a decision with three possible values, we built decision trees with four classical approaches, one hybrid approach combining neural networks and decision trees, and an evolutionary approach. The results show that all approaches had problems with either accuracy, sensitivity, or decision tree size. The comparison shows that the best compromise for building decision trees on hard real-world problems is the evolutionary approach. PMID:11518670

  6. An efficient tree classifier ensemble-based approach for pedestrian detection.

    PubMed

    Xu, Yanwu; Cao, Xianbin; Qiao, Hong

    2011-02-01

    Classification-based pedestrian detection systems (PDSs) are currently a hot research topic in the field of intelligent transportation. A PDS detects pedestrians in real time on moving vehicles. A practical PDS demands not only high detection accuracy but also high detection speed. However, most of the existing classification-based approaches mainly seek high detection accuracy, while the detection speed is not purposely optimized for practical application. At the same time, the performance, particularly the speed, is primarily tuned based on experiments without theoretical foundations, leading to a long training procedure. This paper starts with measuring and optimizing detection speed, and then a practical classification-based pedestrian detection solution with high detection speed and training speed is described. First, an extended classification/detection speed metric, named feature-per-object (fpo), is proposed to measure the detection speed independently from execution. Then, an fpo minimization model with accuracy constraints is formulated based on a tree classifier ensemble, where the minimum fpo can guarantee the highest detection speed. Finally, the minimization problem is solved efficiently by using nonlinear fitting based on radial basis function neural networks. In addition, the optimal solution is directly used to instruct classifier training; thus, the training speed could be accelerated greatly. Therefore, a rapid and accurate classification-based detection technique is proposed for the PDS. Experimental results on urban traffic videos show that the proposed method has a high detection speed with an acceptable detection rate and a false-alarm rate for onboard detection; moreover, the training procedure is also very fast. PMID:20457550

  7. Topographic controls on overland flow generation in a forest - An ensemble tree approach

    NASA Astrophysics Data System (ADS)

    Loos, Martin; Elsenbeer, Helmut

    2011-10-01

    Overland flow is an important hydrological pathway in many forests of the humid tropics. Its generation is subject to topographic controls at differing spatial scales. Our objective was to identify such controls on the occurrence of overland flow in a lowland tropical rainforest. To this end, we installed 95 overland flow detectors (OFDs) in four nested subcatchments of the Lutzito catchment on Barro Colorado Island, Panama, and monitored the frequency of overland flow occurrence at each OFD location during 18 rainfall events (temporal frequency). For each such location, we derived three non-digital terrain attributes and 17 digital ones, of which 15 were based on Digital Elevation Models (DEMs) of three different resolutions. These attributes then served as input into a Random Forest ensemble tree model to elucidate the importance and partial and joint dependencies of topographic controls for overland flow occurrence. Lutzito features a high median temporal frequency of overland flow occurrence of 0.421 among OFD locations. However, these temporal frequencies vary strongly among locations and among the subcatchments of the Lutzito catchment. This variability is best explained by (1) microtopography, (2) coarse terrain sloping and (3) various measures of distance-to-channel, with the contribution of all other terrain attributes being small. Microtopographic features such as concentrated flowlines and wash areas produce the highest temporal frequencies, whereas the occurrence of overland flow drops sharply for flow distances and terrain sloping beyond certain threshold values. Our study contributes to understanding both the spatial controls on overland flow generation and the limitations of terrain attributes for the spatially explicit prediction of overland flow frequencies.
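The ensemble-tree approach used in the abstract above (bag many trees on resampled data, vote, and read variable importance off the ensemble) can be sketched in miniature. The sketch below is not the authors' Random Forest model: it bags single-split decision stumps on bootstrap samples of synthetic data, with feature 0 standing in for something like distance-to-channel and feature 1 for a non-informative attribute, and counts how often each feature is selected as a crude analogue of variable importance.

```python
import random

def fit_stump(X, y):
    """Find the single (feature, threshold) split minimizing misclassification."""
    best = None  # (error, feature, threshold, left_label, right_label)
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            ll = max(set(left), key=left.count)    # majority label, left branch
            rl = max(set(right), key=right.count)  # majority label, right branch
            err = sum(v != ll for v in left) + sum(v != rl for v in right)
            if best is None or err < best[0]:
                best = (err, f, t, ll, rl)
    return best[1:]

def predict_stump(stump, row):
    f, t, ll, rl = stump
    return ll if row[f] <= t else rl

def bagged_stumps(X, y, n_trees=25, seed=0):
    """Fit stumps on bootstrap samples; count how often each feature is chosen."""
    rng = random.Random(seed)
    n = len(X)
    stumps, importance = [], [0] * len(X[0])
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        s = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        stumps.append(s)
        importance[s[0]] += 1
    def predict(row):
        votes = [predict_stump(s, row) for s in stumps]
        return max(set(votes), key=votes.count)     # majority vote
    return predict, importance

# Synthetic data: overland flow occurs (1) only where feature 0 is small.
X = [[d, s] for d in range(10) for s in range(3)]
y = [1 if d < 4 else 0 for d, s in X]
predict, importance = bagged_stumps(X, y)
```

Because only feature 0 carries signal here, every stump selects it; with real terrain attributes the importance counts spread across correlated predictors, which is one reason partial dependence is examined alongside importance.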

  8. Decision Tree, Bagging and Random Forest methods detect TEC seismo-ionospheric anomalies around the time of the Chile, (Mw = 8.8) earthquake of 27 February 2010

    NASA Astrophysics Data System (ADS)

    Akhoondzadeh, Mehdi

    2016-06-01

    In this paper, for the first time, ensemble methods including Decision Tree, Bagging and Random Forest have been proposed in the field of earthquake precursors to detect GPS-TEC (Total Electron Content) seismo-ionospheric anomalies around the time and location of the Chile earthquake of 27 February 2010. All of the implemented ensemble methods detected a striking anomaly in the time series of TEC data, 1 day after the earthquake at 14:00 UTC. The results indicate that the proposed methods, owing to their performance, speed and simplicity, are quite promising and deserve serious attention as new predictive tools for seismo-ionospheric anomaly detection.

  9. Using decision trees to understand structure in missing data

    PubMed Central

    Tierney, Nicholas J; Harden, Fiona A; Harden, Maurice J; Mengersen, Kerrie L

    2015-01-01

    Objectives Demonstrate the application of decision trees—classification and regression trees (CARTs), and their cousins, boosted regression trees (BRTs)—to understand structure in missing data. Setting Data taken from employees at 3 different industrial sites in Australia. Participants 7915 observations were included. Materials and methods The approach was evaluated using an occupational health data set comprising results of questionnaires, medical tests and environmental monitoring. Statistical methods included standard statistical tests and the ‘rpart’ and ‘gbm’ packages for CART and BRT analyses, respectively, from the statistical software ‘R’. A simulation study was conducted to explore the capability of decision tree models in describing data with missingness artificially introduced. Results CART and BRT models were effective in highlighting a missingness structure in the data, related to the type of data (medical or environmental), the site in which it was collected, the number of visits, and the presence of extreme values. The simulation study revealed that CART models were able to identify variables and values responsible for inducing missingness. There was greater variation in variable importance for unstructured as compared to structured missingness. Discussion Both CART and BRT models were effective in describing structural missingness in data. CART models may be preferred over BRT models for exploratory analysis of missing data, and selecting variables important for predicting missingness. BRT models can show how values of other variables influence missingness, which may prove useful for researchers. Conclusions Researchers are encouraged to use CART and BRT models to explore and understand missing data. PMID:26124509
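The core trick in this abstract, recoding each variable as missing/observed and letting a tree discover which variables predict missingness, can be illustrated in miniature. The sketch below uses a pure-Python one-split search in place of the paper's R 'rpart'/'gbm' models; the records and variable names are hypothetical toy data in which a measurement goes missing only at one site.

```python
# Toy records: "measure" is missing exactly when site == "B" (structured missingness).
records = (
    [{"site": "A", "visits": v, "measure": 1.0} for v in range(20)]
    + [{"site": "B", "visits": v, "measure": None} for v in range(20)]
)

# Target: indicator of missingness, as in the CART-on-missingness approach.
y = [int(r["measure"] is None) for r in records]

def misclassification(branches):
    """Errors made by predicting the majority missingness class in each branch."""
    err = 0
    for branch in branches:
        if branch:
            majority = max(set(branch), key=branch.count)
            err += sum(v != majority for v in branch)
    return err

# Exhaustive one-split search over candidate variables and values.
best = None
for var in ("site", "visits"):
    for val in {r[var] for r in records}:
        left = [y[i] for i, r in enumerate(records) if r[var] == val]
        right = [y[i] for i, r in enumerate(records) if r[var] != val]
        e = misclassification([left, right])
        if best is None or e < best[0]:
            best = (e, var, val)

error, var, val = best
# The split search recovers the structure: missingness is driven by `site`.
```

With real data the tree would recurse on each branch; the single split is enough to show how structured missingness surfaces as a high-importance predictor.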

  10. Toward the Decision Tree for Inferring Requirements Maturation Types

    NASA Astrophysics Data System (ADS)

    Nakatani, Takako; Kondo, Narihito; Shirogane, Junko; Kaiya, Haruhiko; Hori, Shozo; Katamine, Keiichi

    Requirements are elicited step by step during the requirements engineering (RE) process. However, some types of requirements are elicited completely only after the scheduled requirements elicitation process is finished. Such a situation is regarded as problematic. In our study, the difficulty of eliciting various kinds of requirements is observed per component. We refer to these components as observation targets (OTs) and introduce the term "requirements maturation," which denotes when and how requirements are elicited completely in the project. Requirements maturation is discussed for physical and logical OTs. OTs viewed from a logical viewpoint are called logical OTs, e.g. quality requirements. The requirements of physical OTs, e.g., modules, components, subsystems, etc., include functional and non-functional requirements. They are influenced by their requesters' environmental changes, as well as developers' technical changes. In order to infer the requirements maturation period of each OT, we need to know how much these factors influence the OTs' requirements maturation. Based on the observation of actual past projects, we defined the PRINCE (Pre Requirements Intelligence Net Consideration and Evaluation) model. It aims to guide developers in their observation of the requirements maturation of OTs. We quantitatively analyzed actual cases with their requirements elicitation processes and extracted essential factors that influence requirements maturation. The results of interviews with project managers were analyzed by WEKA, a data mining system, from which the decision tree was derived. This paper introduces the PRINCE model and the category of logical OTs to be observed. The decision tree that helps developers infer the maturation type of an OT is also described. We evaluate the tree on real projects and discuss its ability to infer requirements maturation types.

  11. DECISION TREE CLASSIFIERS FOR STAR/GALAXY SEPARATION

    SciTech Connect

    Vasconcellos, E. C.; Ruiz, R. S. R.; De Carvalho, R. R.; Capelato, H. V.; Gal, R. R.; LaBarbera, F. L.; Frago Campos Velho, H.; Trevisan, M.

    2011-06-15

    We study the star/galaxy classification efficiency of 13 different decision tree algorithms applied to photometric objects in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7). Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. We extensively explore the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. The efficiency of star-galaxy separation is measured using the completeness function. We find that the Functional Tree algorithm (FT) yields the best results as measured by the mean completeness in two magnitude intervals: 14 ≤ r ≤ 21 (85.2%) and r ≥ 19 (82.1%). We compare the performance of the tree generated with the optimal FT configuration to the classifications provided by the SDSS parametric classifier, 2DPHOT, and Ball et al. We find that our FT classifier is comparable to or better in completeness over the full magnitude range 15 ≤ r ≤ 21, with much lower contamination than all but the Ball et al. classifier. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (>80%) while simultaneously achieving low contamination (~2.5%). We also examine the SDSS parametric classifier (psfMag - modelMag) to see if the dividing line between stars and galaxies can be adjusted to improve the classifier. We find that currently stars in close pairs are often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we apply our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 ≤ r ≤ 21.
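The two evaluation quantities used throughout this abstract, completeness and contamination, can be pinned down with a small helper. This is a generic sketch on made-up labels, not the paper's per-magnitude-bin completeness function.

```python
def completeness_contamination(y_true, y_pred, positive="galaxy"):
    """Completeness: fraction of true `positive` objects that are recovered.
    Contamination: fraction of objects classified as `positive` that are wrong."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    completeness = tp / (tp + fn) if (tp + fn) else 0.0
    contamination = fp / (tp + fp) if (tp + fp) else 0.0
    return completeness, contamination

# Hypothetical labels: 12 objects, one missed galaxy, one star mistaken for a galaxy.
y_true = ["galaxy"] * 8 + ["star"] * 4
y_pred = ["galaxy"] * 7 + ["star"] + ["star"] * 3 + ["galaxy"]
comp, cont = completeness_contamination(y_true, y_pred)
```

In the paper these quantities are computed per magnitude bin, which is what makes the faint-end (r > 19) comparison meaningful.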

  12. A Novel Approach on Designing Augmented Fuzzy Cognitive Maps Using Fuzzified Decision Trees

    NASA Astrophysics Data System (ADS)

    Papageorgiou, Elpiniki I.

    This paper proposes a new methodology for designing Fuzzy Cognitive Maps using crisp decision trees that have been fuzzified. A fuzzy cognitive map is a knowledge-based technique that works as an artificial cognitive network inheriting the main aspects of cognitive maps and artificial neural networks. Decision trees, on the other hand, are well-known intelligent techniques that extract rules from both symbolic and numeric data. Fuzzy theoretical techniques are used to fuzzify crisp decision trees in order to soften decision boundaries at the decision nodes inherent in this type of tree. Comparisons between crisp decision trees and fuzzified decision trees suggest that the latter are significantly more robust and produce more balanced decision making. The approach proposed in this paper could incorporate any type of fuzzy decision tree. Through this methodology, new linguistic weights were determined in the FCM model, thus producing an augmented FCM tool. The framework consists of a new fuzzy algorithm that generates, from induced fuzzy decision trees, linguistic weights describing the cause-effect relationships among the concepts of the FCM model.

  13. Ethical decision-making made easier. The use of decision trees in case management.

    PubMed

    Storl, H; DuBois, B; Seline, J

    1999-01-01

    Case managers have never before faced the multitude of difficult ethical dilemmas that now confront them daily. Legal, medical, social, and ethical considerations often fly in the face of previously reliable intuitions. The importance and urgency of facing these dilemmas head-on has resulted in clear calls for action. What are the appropriate legal, ethical, and professional parameters for effective decision making? Are normatively sensitive, but also practically sensible, protocols possible? In an effort to address these concerns, Alternatives for the Older Adult, Inc., Rock Island, Illinois, established an ethics committee to look into possible means of resolving or dissolving commonly occurring dilemmas. As a result of year-long deliberations, the committee formulated a decision-making strategy whose central apparatus is the decision tree, a flowchart of reasonable decisions and their consequent implications. In this article, we explore the development of this approach as well as the theory that underlies it. PMID:10695172

  14. Decision-relevant early-warning thresholds for ensemble flood forecasting systems

    NASA Astrophysics Data System (ADS)

    Stephens, Liz; Pappenberger, Florian; Cloke, Hannah; Alfieri, Lorenzo

    2014-05-01

    Both over-warning and under-warning of potential future floods are problematic for decision-making, and could ultimately lead to trust being lost in the forecasts. The use of ensemble flood forecasting systems for early warning therefore requires a consideration of how to determine and implement decision-relevant thresholds for flood magnitude and probability. This study uses a year's worth of hindcasts from the Global Flood Awareness System (GloFAS) to explore the sensitivity of the warning system to the choice of threshold. We use a number of different methods for choosing these thresholds, ranging from current approaches that use model climatologies to determine the critical flow magnitudes, to methods that can provide 'first guesses' of potential impacts (through integration with global-scale inundation mapping), as well as methods that could incorporate resource limitations.
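The threshold logic described in this abstract reduces, at its simplest, to a pair of thresholds: a critical flow magnitude and a required exceedance probability across the ensemble. The toy below is a hedged illustration, not GloFAS code; the discharge values and thresholds are hypothetical.

```python
def flood_warning(ensemble_flows, critical_flow, prob_threshold):
    """Issue a warning when enough ensemble members exceed the critical flow.

    ensemble_flows: forecast discharge from each ensemble member (m^3/s)
    critical_flow:  flood-magnitude threshold, e.g. a high quantile of the
                    model climatology (value here is made up)
    prob_threshold: required exceedance probability before warning
    """
    exceedances = sum(q > critical_flow for q in ensemble_flows)
    probability = exceedances / len(ensemble_flows)
    return probability, probability >= prob_threshold

# 10 hypothetical ensemble members; 6 of them exceed 500 m^3/s.
flows = [320, 480, 510, 640, 700, 505, 450, 530, 610, 390]
prob, warn = flood_warning(flows, critical_flow=500, prob_threshold=0.5)
```

The study's point is that both thresholds are free choices: lowering either one trades missed floods for false alarms, which is why decision-relevant (rather than purely statistical) settings matter.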

  15. Applying an Ensemble Classification Tree Approach to the Prediction of Completion of a 12-Step Facilitation Intervention with Stimulant Abusers

    PubMed Central

    Doyle, Suzanne R.; Donovan, Dennis M.

    2014-01-01

    Aims: The purpose of this study was to explore the selection of predictor variables in the evaluation of drug treatment completion using an ensemble approach with classification trees. The basic methodology is reviewed and the subagging procedure of random subsampling is applied. Methods: Among 234 individuals with stimulant use disorders randomized to a 12-Step facilitative intervention shown to increase stimulant use abstinence, 67.52% were classified as treatment completers. A total of 122 baseline variables were used to identify factors associated with completion. Findings: The number of types of self-help activity involvement prior to treatment was the predominant predictor. Other effective predictors included better coping self-efficacy for substance use in high-risk situations, more days of prior meeting attendance, greater acceptance of the Disease model, higher confidence for not resuming use following discharge, lower ASI Drug and Alcohol composite scores, negative urine screens for cocaine or marijuana, and fewer employment problems. Conclusions: The application of an ensemble subsampling regression tree method utilizes the fact that classification trees are unstable but, on average, produce an improved prediction of the completion of drug abuse treatment. The results support the notion there are early indicators of treatment completion that may allow for modification of approaches more tailored to fitting the needs of individuals and potentially provide more successful treatment engagement and improved outcomes. PMID:25134038

  16. Applying an ensemble classification tree approach to the prediction of completion of a 12-step facilitation intervention with stimulant abusers.

    PubMed

    Doyle, Suzanne R; Donovan, Dennis M

    2014-12-01

    The purpose of this study was to explore the selection of predictor variables in the evaluation of drug treatment completion using an ensemble approach with classification trees. The basic methodology is reviewed, and the subagging procedure of random subsampling is applied. Among 234 individuals with stimulant use disorders randomized to a 12-step facilitative intervention shown to increase stimulant use abstinence, 67.52% were classified as treatment completers. A total of 122 baseline variables were used to identify factors associated with completion. The number of types of self-help activity involvement prior to treatment was the predominant predictor. Other effective predictors included better coping self-efficacy for substance use in high-risk situations, more days of prior meeting attendance, greater acceptance of the Disease model, higher confidence for not resuming use following discharge, lower Addiction Severity Index (ASI) Drug and Alcohol composite scores, negative urine screens for cocaine or marijuana, and fewer employment problems. The application of an ensemble subsampling regression tree method utilizes the fact that classification trees are unstable but, on average, produce an improved prediction of the completion of drug abuse treatment. The results support the notion there are early indicators of treatment completion that may allow for modification of approaches more tailored to fitting the needs of individuals and potentially provide more successful treatment engagement and improved outcomes. PMID:25134038
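Subagging, the random-subsampling variant of bagging applied in this abstract, draws subsamples without replacement and averages the resulting unstable base learners. The sketch below uses a deliberately simple one-threshold base rule on a single hypothetical predictor (standing in for the count of pre-treatment self-help activities); it illustrates the resampling scheme, not the paper's full 122-variable classification-tree model.

```python
import random

def fit_rule(xs, ys):
    """Base learner: choose the threshold t minimizing errors of 'predict 1 if x > t'."""
    best = None  # (error, threshold)
    for t in sorted(set(xs)):
        err = sum((1 if x > t else 0) != y for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, t)
    return best[1]

def subagging(xs, ys, m, n_learners=51, seed=1):
    """Fit the base rule on n_learners subsamples of size m, drawn WITHOUT replacement."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    thresholds = []
    for _ in range(n_learners):
        sub = rng.sample(idx, m)  # subsample, no replacement (this is the 'sub' in subagging)
        thresholds.append(fit_rule([xs[i] for i in sub], [ys[i] for i in sub]))
    def predict(x):
        votes = sum(1 if x > t else 0 for t in thresholds)
        return int(votes > n_learners / 2)  # average the unstable base rules
    return predict

# Hypothetical data: completion (1) becomes likely above ~2 self-help activities.
xs = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
predict = subagging(xs, ys, m=8)
```

Each subsample yields a slightly different threshold; averaging them is what stabilizes the notoriously unstable single-tree fit that the abstract refers to.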

  17. Ensemble empirical mode decomposition as a tool of lake sediments and tree-ring width chronologies investigation

    NASA Astrophysics Data System (ADS)

    Ovchinnikov, Dmitriy; Mordvinov, Alexandr; Kalugin, Ivan; Darin, Andrey; Myglan, Vladimir

    2014-05-01

    A method named ensemble empirical mode decomposition (EEMD) was used to analyse different paleoclimatic data, namely non-varved lake sediments of the Teletskoye lake and long tree-ring width chronologies from the Altai region (Altai Mountains, South Siberia, Russia), over the late Holocene (2000 years). A core of the bottom sediments from the Teletskoye lake (Altai Mountains) was investigated using the scanning X-ray fluorescence analysis method with synchrotron radiation (spatial resolution 0.1 mm). Low-frequency signals (modes) were extracted from both paleoarchives: ~60, ~100, ~200, ~300-500 and ~1000-year cycles in the Teletskoye lake; ~25-33, ~50-60, ~100-200, ~300 and ~1000-year cycles in the tree-ring width chronologies. A common 200-year cycle was found in both archives. The EEMD method was also used to analyse solar activity during the late Holocene. The magnetic solar activity is well associated with the tree-ring width chronologies. Changes of the tree-ring width chronology on the millennial time scale coincide with similar changes of the solar activity in the Holocene. Stable relationships between solar activity and climate characteristics are found on 100-200-year time scales (Gleissberg and Suess cycles). The magnetic solar activity and paleotemperature changes are observed as solar-terrestrial relations on a large time scale. This indicates that the temperature increase in the 19th-20th centuries is largely due to the impact of solar activity on the Earth's climate system. The solar-terrestrial relations analysis showed a common 200-year cycle in all presented paleoarchives. The study was funded by: Interdisciplinary Integration Project SB RAS # 34 and grants # 13-05-00620 from the Russian Foundation for Basic Research. Key words: ensemble empirical mode decomposition (EEMD), lake sediments, tree-ring width chronologies, solar-terrestrial relations

  18. Prediction model based on decision tree analysis for laccase mediators.

    PubMed

    Medina, Fabiola; Aguila, Sergio; Baratto, Maria Camilla; Martorana, Andrea; Basosi, Riccardo; Alderete, Joel B; Vazquez-Duhalt, Rafael

    2013-01-10

    A Structure Activity Relationship (SAR) study for laccase mediator systems was performed in order to correctly classify different natural phenolic mediators. Decision tree (DT) classification models with a set of five quantum-chemical calculated molecular descriptors were used. These descriptors included redox potential (ɛ°), ionization energy (E(i)), pK(a), enthalpy of formation of radical (Δ(f)H), and OH bond dissociation energy (D(O-H)). The rationale for selecting these descriptors is derived from the laccase-mediator mechanism. To validate the DT predictions, the kinetic constants of different compounds as laccase substrates, their ability for pesticide transformation as laccase-mediators, and radical stability were experimentally determined using Coriolopsis gallica laccase and the pesticide dichlorophen. The prediction capability of the DT model based on three proposed descriptors showed a complete agreement with the obtained experimental results. PMID:23199741

  19. Classification of Liss IV Imagery Using Decision Tree Methods

    NASA Astrophysics Data System (ADS)

    Verma, Amit Kumar; Garg, P. K.; Prasad, K. S. Hari; Dadhwal, V. K.

    2016-06-01

    Image classification is a compulsory step in any remote sensing research. Classification uses the spectral information represented by the digital numbers in one or more spectral bands and attempts to classify each individual pixel based on this spectral information. Crop classification is the main concern of remote sensing applications for developing sustainable agriculture systems. Vegetation indices computed from satellite images give a good indication of the presence of vegetation: they describe the greenness, density and health of vegetation. Texture is also an important characteristic used to identify objects or regions of interest in an image. This paper illustrates the use of the decision tree method to classify land into crop land and non-crop land and to classify different crops. We evaluate the possibility of crop classification using an integrated approach based on texture properties and different vegetation indices for single-date LISS IV sensor data of 5.8 m high spatial resolution. Eleven vegetation indices (NDVI, DVI, GEMI, GNDVI, MSAVI2, NDWI, NG, NR, NNIR, OSAVI and VI green) were generated using the green, red and NIR bands, and the image was then classified using the decision tree method. The other approach integrates texture features (mean, variance, kurtosis and skewness) with these vegetation indices. A comparison was made between the two methods. The results indicate that including textural features with vegetation indices can be effectively implemented to produce classified maps with 8.33% higher accuracy for Indian satellite IRS-P6, LISS IV sensor images.
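Two of the indices listed in the abstract above are simple band combinations: NDVI is the normalized difference (NIR - Red) / (NIR + Red) and DVI is the plain difference NIR - Red. A minimal sketch (the reflectance values below are hypothetical, not from the paper's LISS IV scenes):

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red) if (nir + red) != 0 else 0.0

def dvi(nir, red):
    """Difference Vegetation Index: NIR - Red."""
    return nir - red

# Hypothetical surface reflectances for two pixels.
veg = ndvi(nir=0.50, red=0.10)   # dense crop: high NDVI
soil = ndvi(nir=0.30, red=0.25)  # bare soil: NDVI near zero
```

Per-pixel index values like these, optionally stacked with texture statistics, form the feature vectors that the decision tree then splits on.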

  20. The value of decision tree analysis in planning anaesthetic care in obstetrics.

    PubMed

    Bamber, J H; Evans, S A

    2016-08-01

    The use of decision tree analysis is discussed in the context of the anaesthetic and obstetric management of a young pregnant woman with joint hypermobility syndrome with a history of insensitivity to local anaesthesia and a previous difficult intubation due to a tongue tumour. The multidisciplinary clinical decision process resulted in the woman being delivered without complication by elective caesarean section under general anaesthesia after an awake fibreoptic intubation. The decision process used is reviewed and compared retrospectively to a decision tree analytical approach. The benefits and limitations of using decision tree analysis are reviewed and its application in obstetric anaesthesia is discussed. PMID:27026589

  1. Classification of Subcellular Phenotype Images by Decision Templates for Classifier Ensemble

    NASA Astrophysics Data System (ADS)

    Zhang, Bailing

    2010-01-01

    Subcellular localization is a key functional characteristic of proteins. An automatic, reliable and efficient prediction system for protein subcellular localization is needed for large-scale genome analysis. The automated cell phenotype image classification problem is an interesting "bioimage informatics" application. It can be used to establish knowledge of the spatial distribution of proteins within living cells, and it enables screening systems for drug discovery or for early diagnosis of a disease. In this paper, three well-known texture feature extraction methods, local binary patterns (LBP), Gabor filtering and the Gray Level Co-occurrence Matrix (GLCM), are applied to cell phenotype images, and a multiple layer perceptron (MLP) is used to classify each feature set. The decision-templates ensemble algorithm (DT) is then used to combine the base classifiers built on the different feature sets. Different texture feature sets can provide sufficient diversity among base classifiers, which is known as a necessary condition for improvement in ensemble performance. For the HeLa cells, the human classification error rate on this task is 17%, as reported in previous publications. With our method we obtain an error rate of 4.8%.
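The decision-templates combiner can be sketched in a few lines: the template of class c is the mean "decision profile" (the stacked soft outputs of all base classifiers) over training samples of class c, and a new sample is assigned to the nearest template. The synthetic profiles below stand in for the LBP/Gabor/GLCM base classifiers, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# A decision profile is an L x C matrix: soft outputs of L base classifiers
# over C classes (synthetic stand-ins for the paper's MLP base classifiers).
L_clf, C = 3, 4

def profiles(true_class, n):
    p = rng.uniform(0, 0.3, size=(n, L_clf, C))
    p[:, :, true_class] += 0.7               # base classifiers favour the true class
    return p / p.sum(axis=2, keepdims=True)  # normalise each row to probabilities

train = [profiles(c, 50) for c in range(C)]

# Decision template of class c = mean decision profile over its training samples.
templates = np.stack([t.mean(axis=0) for t in train])

def classify(profile):
    # Assign to the class whose template is nearest in squared Euclidean distance.
    d = ((templates - profile) ** 2).sum(axis=(1, 2))
    return int(d.argmin())

test_profile = profiles(2, 1)[0]
```

Because the templates average over many profiles, the combiner is robust to individual base classifiers being noisy, which is the diversity argument made in the abstract.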

  2. An Improved Decision Tree for Predicting a Major Product in Competing Reactions

    ERIC Educational Resources Information Center

    Graham, Kate J.

    2014-01-01

    When organic chemistry students encounter competing reactions, they are often overwhelmed by the task of evaluating multiple factors that affect the outcome of a reaction. The use of a decision tree is a useful tool to teach students to evaluate a complex situation and propose a likely outcome. Specifically, a decision tree can help students…

  3. Decision-Tree Models of Categorization Response Times, Choice Proportions, and Typicality Judgments

    ERIC Educational Resources Information Center

    Lafond, Daniel; Lacouture, Yves; Cohen, Andrew L.

    2009-01-01

    The authors present 3 decision-tree models of categorization adapted from T. Trabasso, H. Rollins, and E. Shaughnessy (1971) and use them to provide a quantitative account of categorization response times, choice proportions, and typicality judgments at the individual-participant level. In Experiment 1, the decision-tree models were fit to…

  4. Towards global empirical upscaling of FLUXNET eddy covariance observations: validation of a model tree ensemble approach using a biosphere model

    NASA Astrophysics Data System (ADS)

    Jung, M.; Reichstein, M.; Bondeau, A.

    2009-10-01

    Global, spatially and temporally explicit estimates of carbon and water fluxes derived by empirically up-scaling eddy covariance measurements would constitute a new and possibly powerful data stream to study the variability of the global terrestrial carbon and water cycle. This paper introduces and validates a machine learning approach dedicated to the upscaling of observations from the current global network of eddy covariance towers (FLUXNET). We present a new model TRee Induction ALgorithm (TRIAL) that performs hierarchical stratification of the data set into units where particular multiple regressions for a target variable hold. We propose an ensemble approach (Evolving tRees with RandOm gRowth, ERROR) where the base learning algorithm is perturbed in order to gain a diverse sequence of different model trees which evolves over time. We evaluate the efficiency of the model tree ensemble (MTE) approach using an artificial data set derived from the Lund-Potsdam-Jena managed Land (LPJmL) biosphere model. We aim to reproduce global monthly gross primary production as simulated by LPJmL from 1998 to 2005, using only locations and months where high quality FLUXNET data exist for the training of the model trees. The model trees are trained with the LPJmL land cover and meteorological input data, climate data, and the fraction of absorbed photosynthetic active radiation simulated by LPJmL. Given that we know the "true result" in the form of global LPJmL simulations, we can effectively study the performance of the MTE upscaling and associated problems of extrapolation capacity. We show that MTE is able to explain 92% of the variability of the global LPJmL GPP simulations. The mean spatial pattern and the seasonal variability of GPP, which constitute the largest sources of variance, are very well reproduced (96% and 94% of variance explained, respectively), while the monthly interannual anomalies, which occupy much less variance, are less well matched (41% of variance explained).
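A model tree differs from an ordinary regression tree in that each leaf holds a multiple regression rather than a constant, and an ensemble is obtained by perturbing the trees (here via bootstrap resampling, a simpler stand-in for the paper's ERROR scheme). The depth-1 "stump" model tree, the piecewise-linear toy data, and the bagging loop below are all illustrative assumptions, not TRIAL/ERROR itself.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy piecewise-linear response, a stand-in for GPP vs. a climate driver.
x = rng.uniform(0, 10, 300)
y = np.where(x < 5, 2 * x, 20 - x) + rng.normal(0, 0.2, 300)

def fit_stump_model_tree(x, y):
    """Depth-1 model tree: pick the split whose two linear leaf fits minimise SSE."""
    best = None
    for s in np.quantile(x, np.linspace(0.1, 0.9, 17)):
        sse, fits = 0.0, []
        for m in (x < s, x >= s):
            A = np.column_stack([x[m], np.ones(m.sum())])
            coef, *_ = np.linalg.lstsq(A, y[m], rcond=None)
            sse += ((A @ coef - y[m]) ** 2).sum()
            fits.append(coef)
        if best is None or sse < best[0]:
            best = (sse, s, fits)
    _, s, (cl, cr) = best
    return lambda q: np.where(q < s, cl[0] * q + cl[1], cr[0] * q + cr[1])

# Ensemble of model trees, each trained on a bootstrap resample.
trees = []
for _ in range(10):
    idx = rng.integers(0, len(x), len(x))
    trees.append(fit_stump_model_tree(x[idx], y[idx]))

predict = lambda q: np.mean([t(q) for t in trees], axis=0)
```

Each tree recovers the breakpoint near x = 5 and a linear model on each side; averaging the ensemble smooths out the bootstrap-to-bootstrap variation in the split location.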

  5. Decision trees and decision committee applied to star/galaxy separation problem

    NASA Astrophysics Data System (ADS)

    Vasconcellos, Eduardo Charles

    Vasconcellos et al. [1] studied the efficiency of 13 different decision tree algorithms applied to photometric data in the Sloan Digital Sky Survey Data Release Seven (SDSS-DR7) to perform star/galaxy separation. Each algorithm is defined by a set of parameters which, when varied, produce different final classification trees. In that work we extensively explored the parameter space of each algorithm, using the set of 884,126 SDSS objects with spectroscopic data as the training set. We found that the Functional Tree (FT) algorithm yields the best results by the mean completeness function (galaxy true positive rate) in two magnitude intervals: 14 <= r <= 21 (85.2%) and r >= 19 (82.1%). We compared the FT classification to the SDSS parametric, 2DPHOT and Ball et al. (2006) classifications. At the faintest magnitudes (r > 19), our classifier is the only one that maintains high completeness (>80%) while simultaneously achieving low contamination (~2.5%). We also examined the SDSS parametric classifier (psfMag - modelMag) to see whether the dividing line between stars and galaxies can be adjusted to improve the classifier. We found that stars in close pairs are currently often misclassified as galaxies, and suggest a new cut to improve the classifier. Finally, we applied our FT classifier to separate stars from galaxies in the full set of 69,545,326 SDSS photometric objects in the magnitude range 14 <= r <= 21. We now study the performance of a decision committee composed of FT classifiers. We will train six FT classifiers with randomly selected objects from the same 884,126 SDSS-DR7 objects with spectroscopic data used before. Both the decision committee and our previous single FT classifier will be applied to the new objects from SDSS data releases eight, nine and ten. Finally, we will compare the performances of both methods on this new data set. [1] Vasconcellos, E. C.; de Carvalho, R. R.; Gal, R. R.; LaBarbera, F. L.; Capelato, H. V.; Fraga Campos Velho, H.; Trevisan, M.; Ruiz, R. S. R
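The committee idea (six classifiers trained on random subsets, combined by majority vote) can be sketched as below. Functional Trees are not available in scikit-learn, so plain `DecisionTreeClassifier` trees and a synthetic two-class data set stand in for FT and the SDSS photometry; this is an illustration of the voting scheme only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class features, a stand-in for SDSS colours/magnitudes.
X, y = make_classification(n_samples=2000, n_features=5, n_informative=3,
                           random_state=0)
Xtr, ytr, Xte, yte = X[:1500], y[:1500], X[1500:], y[1500:]

rng = np.random.default_rng(0)

# Committee of six trees, each trained on a random subset of the training set.
committee = []
for _ in range(6):
    idx = rng.choice(len(Xtr), size=1000, replace=False)
    committee.append(DecisionTreeClassifier(random_state=0).fit(Xtr[idx], ytr[idx]))

# Combine by majority vote over the six members.
votes = np.stack([m.predict(Xte) for m in committee])
majority = (votes.mean(axis=0) >= 0.5).astype(int)
acc = (majority == yte).mean()
```

The random subsets give the members slightly different views of the data, which is what makes their vote more stable than any single tree.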

  6. The decision tree classifier - Design and potential. [for Landsat-1 data

    NASA Technical Reports Server (NTRS)

    Hauska, H.; Swain, P. H.

    1975-01-01

    A new classifier has been developed for the computerized analysis of remote sensor data. The decision tree classifier is essentially a maximum likelihood classifier using multistage decision logic. It is characterized by the fact that an unknown sample can be classified into a class using one or several decision functions in a successive manner. The classifier is applied to the analysis of data sensed by Landsat-1 over Kenosha Pass, Colorado. The classifier is illustrated by a tree diagram which for processing purposes is encoded as a string of symbols such that there is a unique one-to-one relationship between string and decision tree.

  7. Ensemble learning with trees and rules: supervised, semi-supervised, unsupervised

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In this article, we propose several new approaches for post processing a large ensemble of conjunctive rules for supervised and semi-supervised learning problems. We show with various examples that for high dimensional regression problems the models constructed by the post processing the rules with ...

  8. Accurate estimation of retinal vessel width using bagged decision trees and an extended multiresolution Hermite model.

    PubMed

    Lupaşcu, Carmen Alina; Tegolo, Domenico; Trucco, Emanuele

    2013-12-01

    We present an algorithm estimating the width of retinal vessels in fundus camera images. The algorithm uses a novel parametric surface model of the cross-sectional intensities of vessels, and ensembles of bagged decision trees to estimate the local width from the parameters of the best-fit surface. We report comparative tests with REVIEW, currently the public database of reference for retinal width estimation, containing 16 images with 193 annotated vessel segments and 5066 profile points annotated manually by three independent experts. Comparative tests are reported also with our own set of 378 vessel widths selected sparsely in 38 images from the Tayside Scotland diabetic retinopathy screening programme and annotated manually by two clinicians. We obtain considerably better accuracies compared to leading methods in REVIEW tests and in Tayside tests. An important advantage of our method is its stability (success rate, i.e., meaningful measurement returned, of 100% on all REVIEW data sets and on the Tayside data set) compared to a variety of methods from the literature. We also find that results depend crucially on testing data and conditions, and discuss criteria for selecting a training set yielding optimal accuracy. PMID:24001930

  9. Classification of dopamine, serotonin, and dual antagonists by decision trees.

    PubMed

    Kim, Hye-Jung; Choo, Hyunah; Cho, Yong Seo; Koh, Hun Yeong; No, Kyoung Tai; Pae, Ae Nim

    2006-04-15

    Dopamine antagonists (DA), serotonin antagonists (SA), and serotonin-dopamine dual antagonists (Dual) are used as antipsychotics. Many dopamine and serotonin antagonists show non-selective binding affinity for these two receptors because the antagonists share common structural features originating from conserved binding-site residues of the aminergic receptor family. Therefore, classifying dopamine and serotonin antagonists by their own receptors can be useful in designing selective antagonists for individual therapy of antipsychotic disorders. A data set containing 1135 dopamine antagonists (D2, D3, and D4), 1251 serotonin antagonists (5-HT1A, 5-HT2A, and 5-HT2C), and 386 serotonin-dopamine dual antagonists was collected from the MDDR database. Cerius2 descriptors were employed to develop a classification model for the 2772 compounds with antipsychotic activity. LDA (linear discriminant analysis), SIMCA (soft independent modeling of class analogy), RP (recursive partitioning), and ANN (artificial neural network) algorithms classified the active class of each compound with an average accuracy of 73.6% and predicted it with an average accuracy of 69.8%. The decision trees from RP, the best model, were generated to identify and interpret the descriptors that discriminate the active classes most easily. These classification models could be used as a virtual screening tool to predict the active class of new candidates. PMID:16387502

  10. ArborZ: PHOTOMETRIC REDSHIFTS USING BOOSTED DECISION TREES

    SciTech Connect

    Gerdes, David W.; Sypniewski, Adam J.; McKay, Timothy A.; Hao, Jiangang; Weis, Matthew R.; Wechsler, Risa H.; Busha, Michael T.

    2010-06-01

    Precision photometric redshifts will be essential for extracting cosmological parameters from the next generation of wide-area imaging surveys. In this paper, we introduce a photometric redshift algorithm, ArborZ, based on the machine-learning technique of boosted decision trees. We study the algorithm using galaxies from the Sloan Digital Sky Survey (SDSS) and from mock catalogs intended to simulate both the SDSS and the upcoming Dark Energy Survey. We show that it improves upon the performance of existing algorithms. Moreover, the method naturally leads to the reconstruction of a full probability density function (PDF) for the photometric redshift of each galaxy, not merely a single 'best estimate' and error, and also provides a photo-z quality figure of merit for each galaxy that can be used to reject outliers. We show that the stacked PDFs yield a more accurate reconstruction of the redshift distribution N(z). We discuss limitations of the current algorithm and ideas for future work.
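One way to obtain a full PDF from boosted trees, in the spirit of the binned approach described here (though not the ArborZ code itself), is to discretise redshift into bins and treat photo-z estimation as classification, so that `predict_proba` yields a per-galaxy PDF and summing PDFs over a sample stacks them into an estimate of N(z). The toy "colours" and bin edges below are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(3)

# Toy catalogue: redshift loosely encoded in two synthetic "colours".
n = 1500
z = rng.uniform(0.0, 1.0, n)
X = np.column_stack([z + rng.normal(0, 0.1, n), 2 * z + rng.normal(0, 0.1, n)])

# Discretise redshift into 5 bins; boosted trees classify galaxies into bins.
edges = np.linspace(0, 1, 6)
ybin = np.clip(np.digitize(z, edges) - 1, 0, 4)

clf = GradientBoostingClassifier(n_estimators=50, max_depth=2,
                                 random_state=0).fit(X[:1000], ybin[:1000])

pdfs = clf.predict_proba(X[1000:])   # one normalised PDF per test galaxy
n_of_z = pdfs.sum(axis=0)            # stacked PDFs estimate the redshift distribution N(z)
```

Stacking the PDFs rather than histogramming point estimates is what lets the method propagate per-galaxy redshift uncertainty into N(z).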

  11. ArborZ: Photometric Redshifts Using Boosted Decision Trees

    NASA Astrophysics Data System (ADS)

    Gerdes, David W.; Sypniewski, Adam J.; McKay, Timothy A.; Hao, Jiangang; Weis, Matthew R.; Wechsler, Risa H.; Busha, Michael T.

    2010-06-01

    Precision photometric redshifts will be essential for extracting cosmological parameters from the next generation of wide-area imaging surveys. In this paper, we introduce a photometric redshift algorithm, ArborZ, based on the machine-learning technique of boosted decision trees. We study the algorithm using galaxies from the Sloan Digital Sky Survey (SDSS) and from mock catalogs intended to simulate both the SDSS and the upcoming Dark Energy Survey. We show that it improves upon the performance of existing algorithms. Moreover, the method naturally leads to the reconstruction of a full probability density function (PDF) for the photometric redshift of each galaxy, not merely a single "best estimate" and error, and also provides a photo-z quality figure of merit for each galaxy that can be used to reject outliers. We show that the stacked PDFs yield a more accurate reconstruction of the redshift distribution N(z). We discuss limitations of the current algorithm and ideas for future work.

  12. Learning from examples - Generation and evaluation of decision trees for software resource analysis

    NASA Technical Reports Server (NTRS)

    Selby, Richard W.; Porter, Adam A.

    1988-01-01

    A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for software resource data analysis. The trees identify classes of objects (software modules) that had high development effort. Sixteen software systems ranging from 3,000 to 112,000 source lines were selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4,700 objects, captured information about the development effort, faults, changes, design style, and implementation style. A total of 9,600 decision trees were automatically generated and evaluated. The trees correctly identified 79.3 percent of the software modules that had high development effort or faults, and the trees generated from the best parameter combinations correctly identified 88.4 percent of the modules on the average.

  13. Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine

    NASA Technical Reports Server (NTRS)

    Schwabacher, Mark A.; Aguilar, Robert; Figueroa, Fernando F.

    2009-01-01

    The goal of this work was to use data-driven methods to automatically detect and isolate faults in the J-2X rocket engine. It was decided to use decision trees, since they tend to be easier to interpret than other data-driven methods. The decision tree algorithm automatically "learns" a decision tree by performing a search through the space of possible decision trees to find one that fits the training data. The particular decision tree algorithm used is known as C4.5. Simulated J-2X data from a high-fidelity simulator developed at Pratt & Whitney Rocketdyne and known as the Detailed Real-Time Model (DRTM) was used to "train" and test the decision tree. Fifty-six DRTM simulations were performed for this purpose, with different leak sizes, different leak locations, and different times of leak onset. To make the simulations as realistic as possible, they included simulated sensor noise, and included a gradual degradation in both fuel and oxidizer turbine efficiency. A decision tree was trained using 11 of these simulations, and tested using the remaining 45 simulations. In the training phase, the C4.5 algorithm was provided with labeled examples of data from nominal operation and data including leaks in each leak location. From the data, it "learned" a decision tree that can classify unseen data as having no leak or having a leak in one of the five leak locations. In the test phase, the decision tree produced very low false alarm rates and low missed detection rates on the unseen data. It had very good fault isolation rates for three of the five simulated leak locations, but it tended to confuse the remaining two locations, perhaps because a large leak at one of these two locations can look very similar to a small leak at the other location.
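A C4.5-style workflow (train on labeled nominal and leak runs, then classify unseen data by leak location) can be sketched with scikit-learn, whose `criterion="entropy"` gives information-gain splits in the spirit of C4.5, though it is a CART relative rather than an exact C4.5 implementation. The synthetic "sensor" channels and leak effects below are invented for illustration, not DRTM outputs.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)

# Synthetic runs: 4 sensor channels; a leak shifts one channel's readings.
def runs(shift_ch, n=100):
    X = rng.normal(0, 1, size=(n, 4))
    if shift_ch is not None:
        X[:, shift_ch] += 3.0          # hypothetical leak signature on that channel
    return X

X = np.vstack([runs(None), runs(0), runs(1)])
y = np.array([0] * 100 + [1] * 100 + [2] * 100)  # 0 = nominal, 1/2 = leak locations

# Entropy criterion = information-gain splits, as in C4.5.
clf = DecisionTreeClassifier(criterion="entropy", max_depth=4,
                             random_state=0).fit(X, y)
acc = clf.score(X, y)
```

The learned tree is human-readable (one threshold test per node), which is the interpretability advantage the abstract cites for choosing decision trees over other data-driven methods.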

  14. Improved Frame Mode Selection for AMR-WB+ Based on Decision Tree

    NASA Astrophysics Data System (ADS)

    Kim, Jong Kyu; Kim, Nam Soo

    In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. In order to reduce computation while maintaining good performance, a decision tree classifier is adopted with the closed-loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. Through an evaluation test on a database covering both speech and music material, the proposed method is found to achieve much better mode selection accuracy than the open-loop mode selection module in AMR-WB+.
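Controlling tree size by pruning can be illustrated with scikit-learn's cost-complexity pruning (a different pruning scheme than whatever the letter uses, so treat this purely as an analogy): raising `ccp_alpha` collapses subtrees whose impurity reduction does not justify their size, trading a little training accuracy for a much smaller memory footprint.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the speech/music mode-selection features.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Cost-complexity pruning: subtrees are removed when their impurity
# reduction is below ccp_alpha, bounding the tree's size.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X, y)

sizes = (full.tree_.node_count, pruned.tree_.node_count)
```
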

  15. Supervised hashing using graph cuts and boosted decision trees.

    PubMed

    Lin, Guosheng; Shen, Chunhua; Hengel, Anton van den

    2015-11-01

    To build large-scale query-by-example image retrieval systems, embedding image features into a binary Hamming space provides great benefits. Supervised hashing aims to map the original features to compact binary codes that are able to preserve label based similarity in the binary Hamming space. Most existing approaches apply a single form of hash function, and an optimization process which is typically deeply coupled to this specific form. This tight coupling restricts the flexibility of those methods, and can result in complex optimization problems that are difficult to solve. In this work we proffer a flexible yet simple framework that is able to accommodate different types of loss functions and hash functions. The proposed framework allows a number of existing approaches to hashing to be placed in context, and simplifies the development of new problem-specific hashing methods. Our framework decomposes the hashing learning problem into two steps: binary code (hash bit) learning and hash function learning. The first step can typically be formulated as binary quadratic problems, and the second step can be accomplished by training a standard binary classifier. For solving large-scale binary code inference, we show how it is possible to ensure that the binary quadratic problems are submodular such that efficient graph cut methods may be used. To achieve efficiency as well as efficacy on large-scale high-dimensional data, we propose to use boosted decision trees as the hash functions, which are nonlinear, highly descriptive, and are very fast to train and evaluate. Experiments demonstrate that the proposed method significantly outperforms most state-of-the-art methods, especially on high-dimensional data. PMID:26440270

  16. Prediction of Regional Streamflow Frequency using Model Tree Ensembles: A data-driven approach based on natural and anthropogenic drainage area characteristics

    NASA Astrophysics Data System (ADS)

    Schnier, S.; Cai, X.

    2012-12-01

    This study introduces a highly accurate data-driven method to predict streamflow frequency statistics based on known drainage area characteristics which yields insights into the dominant controls of regional streamflow. The model is enhanced by explicit consideration of human interference in local hydrology. The basic idea is to use decision trees (i.e., regression trees) to regionalize the dataset and create a model tree by fitting multi-linear equations to the leaves of the regression tree. We improve model accuracy and obtain a measure of variable importance by creating an ensemble of randomized model trees using bootstrap aggregation (i.e., bagging). The database used to induce the models is built from public domain drainage area characteristics for 715 USGS stream gages (455 in Texas and 260 in Illinois). The database includes information on natural characteristics such as precipitation, soil type and slope, as well as anthropogenic ones including land cover, human population and water use. Model accuracy was evaluated using cross-validation and several performance metrics. During the validation, the gauges that are withheld from the analysis represent ungauged watersheds. The proposed method outperforms standard regression models such as the method of residuals for predictions in ungauged watersheds. Importantly, out-of-bag variable importance combined with models for 17 points along the flow duration curve (FDC) (i.e., from 0% to 100% exceedance frequency) yields insight into the dominant controls of regional streamflow. The most discriminant variables for high flows are drainage area and seasonal precipitation. Discriminant variables for low flows are more complex and model accuracy is improved with base-flow data, which is particularly difficult to obtain for ungauged sites. Consideration of human activities, such as percent urban and water use, is also shown to improve accuracy of low flow predictions. Drainage area characteristics, especially

  17. Application of preprocessing filtering on Decision Tree C4.5 and rough set theory

    NASA Astrophysics Data System (ADS)

    Chan, Joseph C. C.; Lin, Tsau Y.

    2001-03-01

    This paper compares two artificial intelligence methods, the Decision Tree C4.5 and Rough Set Theory, on stock market data. The Decision Tree C4.5 is reviewed alongside Rough Set Theory. An enhanced window application is developed to facilitate pre-processing filtering by introducing feature (attribute) transformations, which allow users to input formulas and create new attributes. The application also produces three varieties of data set, with delaying, averaging, and summation. The results demonstrate that pre-processing with feature (attribute) transformations improves Decision Tree C4.5. Moreover, the comparison between Decision Tree C4.5 and Rough Set Theory is based on clarity, automation, accuracy, dimensionality, raw data, and speed, and is supported by the rule sets generated by both algorithms on three different sets of data.

  18. A Decision Tree Approach to the Interpretation of Multivariate Statistical Techniques.

    ERIC Educational Resources Information Center

    Fok, Lillian Y.; And Others

    1995-01-01

    Discusses the nature, power, and limitations of four multivariate techniques: factor analysis, multiple analysis of variance, multiple regression, and multiple discriminant analysis. Shows how decision trees assist in interpreting results. (SK)

  19. Application of decision tree model for the ground subsidence hazard mapping near abandoned underground coal mines.

    PubMed

    Lee, Saro; Park, Inhye

    2013-09-30

    Subsidence of ground caused by underground mines poses hazards to human life and property. This study analyzed the hazard of ground subsidence using factors that can affect ground subsidence and a decision tree approach in a geographic information system (GIS). The study area was Taebaek, Gangwon-do, Korea, where many abandoned underground coal mines exist. Spatial data, topography, geology, and various ground-engineering data for the subsidence area were collected and compiled in a database for mapping ground-subsidence hazard (GSH). The subsidence area was randomly split 50/50 for training and validation of the models. A data-mining classification technique was applied to the GSH mapping, and decision trees were constructed using the chi-squared automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. The frequency ratio model was also applied to the GSH mapping for comparison with the probabilistic model. The resulting GSH maps were validated using area-under-the-curve (AUC) analysis with the subsidence area data that had not been used for training the model. The highest accuracy was achieved by the decision tree model using the CHAID algorithm (94.01%), compared with the QUEST algorithm (90.37%) and the frequency ratio model (86.70%). These accuracies are higher than previously reported results for decision trees. Decision tree methods can therefore be used efficiently for GSH analysis and might be widely used for prediction of various spatial events. PMID:23702378

  20. Combining evolutionary algorithms with oblique decision trees to detect bent double galaxies

    SciTech Connect

    Cantu-Paz, E; Kamath, C

    2000-06-22

    Decision trees have long been popular in classification as they use simple and easy-to-understand tests at each node. Most variants of decision trees test a single attribute at a node, leading to axis-parallel trees, where the test results in a hyperplane which is parallel to one of the dimensions in the attribute space. These trees can be rather large and inaccurate in cases where the concept to be learnt is best approximated by oblique hyperplanes. In such cases, it may be more appropriate to use an oblique decision tree, where the decision at each node is a linear combination of the attributes. Oblique decision trees have not gained wide popularity in part due to the complexity of constructing good oblique splits and the tendency of existing splitting algorithms to get stuck in local minima. Several alternatives have been proposed to handle these problems including randomization in conjunction with deterministic hill climbing and the use of simulated annealing. In this paper, they use evolutionary algorithms (EAs) to determine the split. EAs are well suited for this problem because of their global search properties, their tolerance to noisy fitness evaluations, and their scalability to large dimensional search spaces. They demonstrate the technique on a practical problem from astronomy, namely, the classification of galaxies with a bent-double morphology, and describe their experiences with several split evaluation criteria.
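As a toy sketch of the idea (not the paper's evolutionary algorithm or its split criteria), a simple (1+λ) evolution strategy can search for an oblique split, i.e. a test of the form w·x < t on a linear combination of attributes, by minimising the size-weighted Gini impurity of the two children. The data, mutation scheme, and impurity measure below are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Two classes separated by the oblique boundary x0 + x1 = 1: no single
# axis-parallel test captures it, but one oblique test can.
X = rng.uniform(0, 1, size=(400, 2))
y = (X[:, 0] + X[:, 1] > 1).astype(int)

def gini(labels):
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels, minlength=2) / len(labels)
    return 1 - (p ** 2).sum()

def fitness(w):
    # Oblique test w[0]*x0 + w[1]*x1 < w[2]; lower weighted impurity is better.
    left = X @ w[:2] < w[2]
    return left.mean() * gini(y[left]) + (~left).mean() * gini(y[~left])

# (1+lambda)-style evolution strategy over the three split coefficients.
best = rng.normal(size=3)
for _ in range(200):
    children = best + rng.normal(0, 0.1, size=(10, 3))
    cand = min(children, key=fitness)
    if fitness(cand) <= fitness(best):
        best = cand
```

On this data the best axis-parallel split leaves a weighted Gini impurity of roughly 0.375, so an evolved oblique split scoring well below that demonstrates the advantage the abstract describes.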

  1. Aneurysmal subarachnoid hemorrhage prognostic decision-making algorithm using classification and regression tree analysis

    PubMed Central

    Lo, Benjamin W. Y.; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R. Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A. H.

    2016-01-01

    Background: Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). Methods: The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Results: Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56–2.45, P < 0.01). Conclusions: A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH. PMID:27512607

  2. Using GEFS ensemble forecasts for decision making in reservoir management in California

    NASA Astrophysics Data System (ADS)

    Scheuerer, M.; Hamill, T.; Webb, R. S.

    2015-12-01

    Reservoirs such as Lake Mendocino in California's Russian River Basin provide flood control, water supply, recreation, and environmental stream flow regulation. Many of these reservoirs are operated by the U.S. Army Corps of Engineers (Corps) according to water control manuals that specify elevations for an upper volume of reservoir storage that must be kept available for capturing storm runoff and reducing flood risk, and a lower volume of storage that may be used for water supply. During extreme rainfall events, runoff is captured by these reservoirs and released as quickly as possible to create flood storage space for another potential storm. These flood control manuals are based on typical historical weather patterns - wet during the winter, dry otherwise - but are not informed directly by weather prediction. Alternative reservoir management approaches such as Forecast-Informed Reservoir Operations (FIRO), which seek to incorporate advances in weather prediction, are currently being explored as means to improve water supply availability while maintaining flood risk reduction and providing additional ecosystem benefits.We present results from a FIRO proof-of-concept study investigating the reliability of post-processed GEFS ensemble forecasts to predict the probability that day 6-to-10 precipitation accumulations in certain areas in California exceed a high threshold. Our results suggest that reliable forecast guidance can be provided, and the resulting probabilities could be used to inform decisions to release or hold water in the reservoirs. We illustrate the potential of these forecasts in a case study of extreme event probabilities for the Russian River Basin in California.

  3. Production of diagnostic rules from a neurotologic database with decision trees.

    PubMed

    Kentala, E; Viikki, K; Pyykkö, I; Juhola, M

    2000-02-01

    A decision tree is an artificial intelligence program that is adaptive and is closely related to a neural network, but can handle missing or nondecisive data in decision-making. Data on patients with Meniere's disease, vestibular schwannoma, traumatic vertigo, sudden deafness, benign paroxysmal positional vertigo, and vestibular neuritis were retrieved from the database of the otoneurologic expert system ONE for the development and testing of the accuracy of decision trees in the diagnostic workup. Decision trees were constructed separately for each disease. The accuracies of the best decision trees were 94%, 95%, 99%, 99%, 100%, and 100% for the respective diseases. The most important questions concerned the presence of vertigo, hearing loss, and tinnitus; duration of vertigo; frequency of vertigo attacks; severity of rotational vertigo; onset and type of hearing loss; and occurrence of head injury in relation to the timing of onset of vertigo. Meniere's disease was the most difficult to classify correctly. The validity and structure of the decision trees are easily comprehended and can be used outside the expert system. PMID:10685569

  4. Ensemble Methods

    NASA Astrophysics Data System (ADS)

    Re, Matteo; Valentini, Giorgio

    2012-03-01

Ensemble methods are statistical and computational learning procedures reminiscent of the human social learning behavior of seeking several opinions before making any crucial decision. The idea of combining the opinions of different "experts" to obtain an overall “ensemble” decision is rooted in our culture at least since the classical age of ancient Greece, and it was formalized during the Enlightenment with the Condorcet Jury Theorem [45], which proved that the judgment of a committee is superior to those of individuals, provided the individuals have reasonable competence. Ensembles are sets of learning machines that combine in some way their decisions, or their learning algorithms, or different views of data, or other specific characteristics to obtain more reliable and more accurate predictions in supervised and unsupervised learning problems [48,116]. A simple example is represented by the majority vote ensemble, by which the decisions of different learning machines are combined, and the class that receives the majority of “votes” (i.e., the class predicted by the majority of the learning machines) is the class predicted by the overall ensemble [158]. In the literature, a plethora of terms other than ensembles has been used, such as fusion, combination, aggregation, and committee, to indicate sets of learning machines that work together to solve a machine learning problem [19,40,56,66,99,108,123], but in this chapter we maintain the term ensemble in its widest meaning, in order to include the whole range of combination methods. Nowadays, ensemble methods represent one of the main current research lines in machine learning [48,116], and the interest of the research community on ensemble methods is witnessed by conferences and workshops specifically devoted to ensembles, first of all the multiple classifier systems (MCS) conference organized by Roli, Kittler, Windeatt, and other researchers of this area [14,62,85,149,173]. Several theories have been
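
The majority-vote ensemble described above can be sketched in a few lines of Python. The three "experts" below are illustrative stand-ins for trained learning machines, not models from the chapter:

```python
# Minimal sketch of a majority-vote ensemble: each classifier casts a
# vote and the class with the most votes wins. The three "experts" are
# illustrative hand-written rules standing in for trained models.
from collections import Counter

def majority_vote(classifiers, x):
    """Return the class predicted by the majority of the classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Three weak "experts" for deciding whether a number is large (class 1).
experts = [
    lambda x: 1 if x > 10 else 0,
    lambda x: 1 if x > 12 else 0,
    lambda x: 1 if x > 8 else 0,
]

print(majority_vote(experts, 11))  # two of three experts vote 1
print(majority_vote(experts, 9))   # two of three experts vote 0
```

Even this toy shows the Condorcet intuition: the combined decision can be right where an individual expert (here, the one with threshold 12) is wrong.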

  6. Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data

    PubMed Central

    2012-01-01

Background This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, especially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. Results The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for predicting the free energy of binding of a drug candidate with a flexible receptor. PMID:23171000

  7. Use of a decision tree to select the mud system for the Oso field, Nigeria

    SciTech Connect

    Dear, S.F. III; Beasley, R.D.; Barr, K.P.

    1995-10-01

Far too often, the basis for selection of a mud system is the "latest, greatest" technology or personal preference rather than sound cost-effective analysis. The use of risk-vs.-cost decision analysis improves mud selection and makes it a proper business decision. Several mud systems usually are available to drill a well and, with good decision analysis, the cost-effectiveness of each alternative becomes apparent. This paper describes how the drilling team used structured decision analysis to evaluate and select the best mud system for the project. First, Monte Carlo simulations forecast the range of possible results with each alternative. The simulations provide most-likely values for the variables in the decision tree, including reasonable ranges for sensitivity analyses. This paper presents and discusses the simulations, the decision tree, and the sensitivity analyses.

  8. How to pose the question matters: Behavioural Economics concepts in decision making on the basis of ensemble forecasts

    NASA Astrophysics Data System (ADS)

    Alfonso, Leonardo; van Andel, Schalk Jan

    2014-05-01

Part of recent research in ensemble and probabilistic hydro-meteorological forecasting analyses which probabilistic information is required by decision makers and how it can be most effectively visualised. This work, in addition, analyses whether decision making in flood early warning is also influenced by the way the decision question is posed. For this purpose, the decision-making game "Do probabilistic forecasts lead to better decisions?", which Ramos et al (2012) conducted at the EGU General Assembly 2012 in Vienna, has been repeated with a small group and expanded. In that game, decision makers had to decide whether or not to open a flood release gate on the basis of flood forecasts, with and without uncertainty information. A conclusion of that game was that, in the absence of uncertainty information, decision makers are compelled towards a more risk-averse attitude. In order to explore to what extent the answers were driven by the way the questions were framed, a second variant was introduced in addition to the original experiment, in which participants were asked to choose between a sure value (for either losing or winning with a given probability) and a gamble. This set-up is based on Kahneman and Tversky (1979). Results indicate that the way the questions are posed may play an important role in decision making and that Prospect Theory provides promising concepts for further understanding how this works.

  9. Combining evolutionary algorithms with oblique decision trees to detect bent-double galaxies

    NASA Astrophysics Data System (ADS)

    Cantu-Paz, Erick; Kamath, Chandrika

    2000-10-01

Decision trees have long been popular in classification as they use simple and easy-to-understand tests at each node. Most variants of decision trees test a single attribute at a node, leading to axis-parallel trees, where the test results in a hyperplane which is parallel to one of the dimensions in the attribute space. These trees can be rather large and inaccurate in cases where the concept to be learned is best approximated by oblique hyperplanes. In such cases, it may be more appropriate to use an oblique decision tree, where the decision at each node is a linear combination of the attributes. Oblique decision trees have not gained wide popularity in part due to the complexity of constructing good oblique splits and the tendency of existing splitting algorithms to get stuck in local minima. Several alternatives have been proposed to handle these problems including randomization in conjunction with deterministic hill-climbing and the use of simulated annealing. In this paper, we use evolutionary algorithms (EAs) to determine the split. EAs are well suited for this problem because of their global search properties, their tolerance to noisy fitness evaluations, and their scalability to large dimensional search spaces. We demonstrate our technique on a synthetic data set, and then we apply it to a practical problem from astronomy, namely, the classification of galaxies with a bent-double morphology. In addition, we describe our experiences with several split evaluation criteria. Our results suggest that, in some cases, the evolutionary approach is faster and more accurate than existing oblique decision tree algorithms. However, for our astronomical data, the accuracy is not significantly different than the axis-parallel trees.
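
The distinction between axis-parallel and oblique splits can be made concrete with a small Python sketch. The data, the weight vector of the oblique test, and the use of Gini impurity as the splitting criterion are all illustrative assumptions, not details from the paper:

```python
# Comparing an axis-parallel split (one attribute) with an oblique split
# (a linear combination of attributes) on toy 2-D data whose classes are
# separated by the oblique boundary x + y = 1. Data are illustrative.

def gini(labels):
    """Gini impurity of a list of binary class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = labels.count(1) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def split_impurity(points, labels, test):
    """Weighted Gini impurity after routing points by a boolean test."""
    left = [y for p, y in zip(points, labels) if test(p)]
    right = [y for p, y in zip(points, labels) if not test(p)]
    n = len(labels)
    return (len(left) * gini(left) + len(right) * gini(right)) / n

points = [(0.1, 0.1), (0.4, 0.4), (0.3, 0.6), (0.2, 0.9), (0.9, 0.2), (0.8, 0.8)]
labels = [0, 0, 0, 1, 1, 1]

axis_parallel = lambda p: p[0] <= 0.5       # test a single attribute
oblique = lambda p: p[0] + p[1] <= 1.0      # linear combination of attributes

print(split_impurity(points, labels, axis_parallel))
print(split_impurity(points, labels, oblique))  # 0.0: the oblique split is pure
```

On this data no single-attribute threshold separates the classes, while one oblique hyperplane does, which is exactly the situation where oblique trees pay off.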

  10. Using decision trees to characterize verbal communication during change and stuck episodes in the therapeutic process

    PubMed Central

    Masías, Víctor H.; Krause, Mariane; Valdés, Nelson; Pérez, J. C.; Laengle, Sigifredo

    2015-01-01

    Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBTree, and REPTree) are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice. PMID:25914657

  11. Cloud detection based on decision tree over Tibetan Plateau with MODIS data

    NASA Astrophysics Data System (ADS)

    Xu, Lina; Niu, Ruiqing; Fang, Shenghui; Dong, Yanfang

    2013-10-01

Snow cover area is a critical parameter in the Earth's hydrologic cycle and a key factor in studies of climate change. A major obstacle in mapping snow cover is the presence of clouds. Clouds are easy to find in satellite imagery because they appear bright and white at visible wavelengths, but this is not the case when snow or ice is in the background, since snow and clouds have a similar spectral appearance. Many cloud-detection methods are built on decision trees designed from empirical studies and simulations. In this paper, classification trees were used to build the decision tree. Then, using a large number of repeated scenes of the same area, cloud pixels were replaced with their true surface types, such as snow, vegetation, or water; the effect of cloud can be distinguished in the short-wave infrared. The results show that most cloud cover was removed. A validation was carried out for all subsequent steps, leading to the removal of all remaining cloud cover. The results show that the decision tree method performed satisfactorily.

  12. Cloud Detection Based on Decision Tree Over Tibetan Plateau with Modis Data

    NASA Astrophysics Data System (ADS)

    Xu, L.; Fang, S.; Niu, R.; Li, J.

    2012-07-01

Snow cover area is a critical parameter in the Earth's hydrologic cycle and a key factor in studies of climate change. A major obstacle in mapping snow cover is the presence of clouds. Clouds are easy to find in satellite imagery because they appear bright and white at visible wavelengths, but this is not the case when snow or ice is in the background, since snow and clouds have a similar spectral appearance. Many cloud-detection methods are built on decision trees designed from empirical studies and simulations. In this paper, classification trees were used to build the decision tree. Then, using a large number of repeated scenes of the same area, cloud pixels were replaced with their true surface types, such as snow, vegetation, or water; the effect of cloud can be distinguished in the short-wave infrared. The results show that most cloud cover was removed. A validation was carried out for all subsequent steps, leading to the removal of all remaining cloud cover. The results show that the decision tree method performed satisfactorily.

  13. Pruning a decision tree for selecting computer-related assistive devices for people with disabilities.

    PubMed

    Chi, Chia-Fen; Tseng, Li-Kai; Jang, Yuh

    2012-07-01

Many disabled individuals lack extensive knowledge about assistive technology, which could help them use computers. In 1997, Denis Anson developed a decision tree of 49 evaluative questions designed to evaluate the functional capabilities of the disabled user and choose an appropriate combination of assistive devices, from a selection of 26, that enable the individual to use a computer. In general, occupational therapists guide the disabled users through this process. They often have to go over repetitive questions in order to find an appropriate device. A disabled user may require an alphanumeric entry device, a pointing device, an output device, a performance enhancement device, or some combination of these. Therefore, the current research eliminates redundant questions and divides Anson's decision tree into multiple independent subtrees to meet the actual demand of computer users with disabilities. The modified decision tree was tested by six disabled users to verify that it can determine a complete set of assistive devices with a smaller number of evaluative questions. A means to insert new categories of computer-related assistive devices was included to ensure the decision tree can be expanded and updated. The current decision tree can help disabled users and assistive technology practitioners find appropriate computer-related assistive devices that meet clients' individual needs in an efficient manner. PMID:22552588

  14. Prediction of Weather Impacted Airport Capacity using Ensemble Learning

    NASA Technical Reports Server (NTRS)

    Wang, Yao Xun

    2011-01-01

Ensemble learning with the Bagging Decision Tree (BDT) model was used to assess the impact of weather on airport capacities at selected high-demand airports in the United States. The ensemble bagging decision tree models were developed and validated using the Federal Aviation Administration (FAA) Aviation System Performance Metrics (ASPM) data and weather forecasts at these airports. The study examines the performance of BDT, along with traditional single Support Vector Machines (SVM), for airport runway configuration selection and airport arrival rate (AAR) prediction during weather impacts. Testing of these models was accomplished using observed weather, weather forecasts, and airport operation information at the chosen airports. The experimental results show that ensemble methods are more accurate than a single SVM classifier. The airport capacity ensemble method presented here can be used as a decision support model that supports air traffic flow management under weather-impacted airport capacity in order to reduce costs and increase safety.
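
Bagging, the ensemble technique underlying the BDT model above, can be sketched as follows in Python. For brevity the base learner is a one-level decision stump rather than a full tree, and the data are invented:

```python
# Sketch of bagging (bootstrap aggregating): train each base learner on a
# bootstrap resample of the data, then combine them by majority vote.
# The base learner here is a one-level "decision stump" on a 1-D feature,
# a deliberately simple stand-in for the decision trees used in the study.
import random
from collections import Counter

def train_stump(xs, ys):
    """Pick the threshold/direction that minimizes training errors."""
    best = None
    for t in xs:
        for sign in (1, -1):
            pred = [1 if sign * (x - t) > 0 else 0 for x in xs]
            errors = sum(p != y for p, y in zip(pred, ys))
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    _, t, sign = best
    return lambda x: 1 if sign * (x - t) > 0 else 0

def bagged_ensemble(xs, ys, n_models=25, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]        # bootstrap sample
        models.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    def predict(x):                                       # majority vote
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]
    return predict

xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]   # illustrative 1-D feature
ys = [0, 0, 0, 1, 1, 1]
predict = bagged_ensemble(xs, ys)
print(predict(0.5), predict(8.5))
```

Because each stump sees a different resample, their individual thresholds vary, and averaging them out by voting is what gives the ensemble its accuracy edge over a single classifier.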

  15. Outsourcing the Portal: Another Branch in the Decision Tree.

    ERIC Educational Resources Information Center

    McMahon, Tim

    2000-01-01

    Discussion of the management of information resources in organizations focuses on the use of portal technologies to update intranet capabilities. Considers application outsourcing decisions, reviews benefits (including reducing costs) as well as concerns, and describes application service providers (ASPs). (LRW)

  16. Post-event human decision errors: operator action tree/time reliability correlation

    SciTech Connect

    Hall, R E; Fragola, J; Wreathall, J

    1982-11-01

    This report documents an interim framework for the quantification of the probability of errors of decision on the part of nuclear power plant operators after the initiation of an accident. The framework can easily be incorporated into an event tree/fault tree analysis. The method presented consists of a structure called the operator action tree and a time reliability correlation which assumes the time available for making a decision to be the dominating factor in situations requiring cognitive human response. This limited approach decreases the magnitude and complexity of the decision modeling task. Specifically, in the past, some human performance models have attempted prediction by trying to emulate sequences of human actions, or by identifying and modeling the information processing approach applicable to the task. The model developed here is directed at describing the statistical performance of a representative group of hypothetical individuals responding to generalized situations.

  17. A modified decision tree algorithm based on genetic algorithm for mobile user classification problem.

    PubMed

    Liu, Dong-sheng; Fan, Shu-jiang

    2014-01-01

In order to offer mobile customers better service, mobile users should first be classified. Addressing the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take context information as a classification attribute for the mobile user, dividing context into public and private classes. We then analyze the processes and operators of the algorithm. Finally, we conduct an experiment on mobile user data with the algorithm, classifying mobile users into Basic service, E-service, Plus service, and Total service classes and deriving rules about the mobile users. Compared to the C4.5 decision tree algorithm and the SVM algorithm, the algorithm proposed in this paper has higher accuracy and greater simplicity. PMID:24688389
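
The general idea of pairing a genetic algorithm with a decision boundary can be illustrated with a toy Python sketch that evolves a single split threshold to maximize classification accuracy. The encoding, operators, and data below are illustrative assumptions and are far simpler than the paper's algorithm:

```python
# Toy genetic algorithm evolving one decision threshold on 1-D data.
# Selection keeps the fittest half, crossover averages two parents, and
# mutation adds Gaussian noise. All choices are illustrative, not the
# paper's operators.
import random

def accuracy(threshold, xs, ys):
    return sum((x > threshold) == bool(y) for x, y in zip(xs, ys)) / len(xs)

def evolve_threshold(xs, ys, pop_size=20, generations=30, seed=1):
    rng = random.Random(seed)
    lo, hi = min(xs), max(xs)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: accuracy(t, xs, ys), reverse=True)
        parents = pop[: pop_size // 2]                    # selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2                           # crossover
            child += rng.gauss(0.0, 0.1 * (hi - lo))      # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda t: accuracy(t, xs, ys))

xs = [0.5, 1.1, 1.9, 4.2, 5.0, 5.8]   # illustrative feature values
ys = [0, 0, 0, 1, 1, 1]
best = evolve_threshold(xs, ys)
print(accuracy(best, xs, ys))  # a perfect split lies anywhere in [1.9, 4.2)
```

In the paper's hybrid scheme the GA searches over a much richer space (whole trees rather than one threshold), but the selection/crossover/mutation loop is the same in spirit.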

  18. A Modified Decision Tree Algorithm Based on Genetic Algorithm for Mobile User Classification Problem

    PubMed Central

    Liu, Dong-sheng; Fan, Shu-jiang

    2014-01-01

In order to offer mobile customers better service, mobile users should first be classified. Addressing the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take context information as a classification attribute for the mobile user, dividing context into public and private classes. We then analyze the processes and operators of the algorithm. Finally, we conduct an experiment on mobile user data with the algorithm, classifying mobile users into Basic service, E-service, Plus service, and Total service classes and deriving rules about the mobile users. Compared to the C4.5 decision tree algorithm and the SVM algorithm, the algorithm proposed in this paper has higher accuracy and greater simplicity. PMID:24688389

  19. The decision - identification tree: A new EIS scoping tool

    SciTech Connect

    Eccleston, C.H.

    1997-04-02

No single methodology has been developed or universally accepted for determining the scope of an Environmental Impact Statement (EIS). Most typically, the scope is determined by first identifying actions and facilities to be analyzed. Yet, agencies sometimes complete an EIS, only to discover that the scope does not adequately address decisions that need to be made. Such discrepancies can often be traced to disconnects between the scoping process and the actual decision making that follows. A new tool, for use in a value engineering setting, provides an effective methodology for improving the EIS scoping process. Application of this tool is not limited to National Environmental Policy Act (NEPA) scoping efforts. This tool could, in fact, be used to map potential decision points for a range of diverse planning applications and exercises.

  20. Decision tree for the binding of dipeptides to the thermally fluctuating surface of cathepsin K

    NASA Astrophysics Data System (ADS)

    Nishiyama, Katsuhiko

    2016-03-01

    The behavior of 15 dipeptides on thermally fluctuating cathepsin K was investigated by molecular dynamics and docking simulations. Four dipeptides were distributed on sites near the active center, and the variations were small. Eleven dipeptides were distributed on sites far from the active center, and the variations were large for nine dipeptides and very large for the other two. The decision tree was constructed using genetic programming, and it accurately classified the 15 dipeptides. The decision tree would accurately estimate the behavior of various peptides, and should significantly contribute to the design of useful peptides.

  1. Ramping ensemble activity in dorsal anterior cingulate neurons during persistent commitment to a decision.

    PubMed

    Blanchard, Tommy C; Strait, Caleb E; Hayden, Benjamin Y

    2015-10-01

    We frequently need to commit to a choice to achieve our goals; however, the neural processes that keep us motivated in pursuit of delayed goals remain obscure. We examined ensemble responses of neurons in macaque dorsal anterior cingulate cortex (dACC), an area previously implicated in self-control and persistence, in a task that requires commitment to a choice to obtain a reward. After reward receipt, dACC neurons signaled reward amount with characteristic ensemble firing rate patterns; during the delay in anticipation of the reward, ensemble activity smoothly and gradually came to resemble the postreward pattern. On the subset of risky trials, in which a reward was anticipated with 50% certainty, ramping ensemble activity evolved to the pattern associated with the anticipated reward (and not with the anticipated loss) and then, on loss trials, took on an inverted form anticorrelated with the form associated with a win. These findings enrich our knowledge of reward processing in dACC and may have broader implications for our understanding of persistence and self-control. PMID:26334016

  2. Decision Support on the Sediments Flushing of Aimorés Dam Using Medium-Range Ensemble Forecasts

    NASA Astrophysics Data System (ADS)

    Mainardi Fan, Fernando; Schwanenberg, Dirk; Collischonn, Walter; Assis dos Reis, Alberto; Alvarado Montero, Rodolfo; Alencar Siqueira, Vinicius

    2015-04-01

In the present study we investigate the use of medium-range streamflow forecasts in the Doce River basin (Brazil), at the reservoir of the Aimorés Hydro Power Plant (HPP). During daily operations this reservoir acts as a "trap" for the sediments that originate from the upstream basin of the Doce River. This motivates a cleaning process called "pass through" to periodically remove the sediments from the reservoir. The "pass through" or "sediment flushing" process consists of a decrease of the reservoir's water level to a certain flushing level when a determined reservoir inflow threshold is forecasted. Then, the water in the approaching inflow is used to flush the sediments from the reservoir through the spillway and to recover the original reservoir storage. To be triggered, the sediment flushing operation requires an inflow larger than 3000 m³/s in a forecast horizon of 7 days. This lead time of 7 days is far beyond the basin's time of concentration (around 2 days), meaning that the forecasts for the pass-through procedure depend heavily on Numerical Weather Prediction (NWP) models that generate Quantitative Precipitation Forecasts (QPF). This dependency creates an environment with a high amount of uncertainty for the operator. To support decision making at Aimorés HPP we developed a fully operational hydrological forecasting system for the basin. The system is capable of generating ensemble streamflow forecast scenarios when driven by QPF data from meteorological Ensemble Prediction Systems (EPS). This approach allows accounting for uncertainties in the NWP at the decision-making level. This system is entering operational use at CEMIG and is the one presented here, including a hindcast analysis to assess the performance of the system for the specific flushing problem. The QPF data used in the hindcast study were derived from the TIGGE (THORPEX Interactive Grand Global Ensemble) database. Among all EPS available on TIGGE, three were

  3. Fuzzy decision trees for planning and autonomous control of a coordinated team of UAVs

    NASA Astrophysics Data System (ADS)

    Smith, James F., III; Nguyen, ThanhVu H.

    2007-04-01

    A fuzzy logic resource manager that enables a collection of unmanned aerial vehicles (UAVs) to automatically cooperate to make meteorological measurements will be discussed. Once in flight no human intervention is required. Planning and real-time control algorithms determine the optimal trajectory and points each UAV will sample, while taking into account the UAVs' risk, risk tolerance, reliability, mission priority, fuel limitations, mission cost, and related uncertainties. The control algorithm permits newly obtained information about weather and other events to be introduced to allow the UAVs to be more effective. The approach is illustrated by a discussion of the fuzzy decision tree for UAV path assignment and related simulation. The different fuzzy membership functions on the tree are described in mathematical detail. The different methods by which this tree is obtained are summarized including a method based on using a genetic program as a data mining function. A second fuzzy decision tree that allows the UAVs to automatically collaborate without human intervention is discussed. This tree permits three different types of collaborative behavior between the UAVs. Simulations illustrating how the tree allows the different types of collaboration to be automated are provided. Simulations also show the ability of the control algorithm to allow UAVs to effectively cooperate to increase the UAV team's likelihood of success.

  4. What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis

    ERIC Educational Resources Information Center

    Thomas, Emily H.; Galambos, Nora

    2004-01-01

    To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…

  5. Test Reviews: Euler, B. L. (2007). "Emotional Disturbance Decision Tree". Lutz, FL: Psychological Assessment Resources

    ERIC Educational Resources Information Center

    Tansy, Michael

    2009-01-01

    The Emotional Disturbance Decision Tree (EDDT) is a teacher-completed norm-referenced rating scale published by Psychological Assessment Resources, Inc., in Lutz, Florida. The 156-item EDDT was developed for use as part of a broader assessment process to screen and assist in the identification of 5- to 18-year-old children for the special…

  6. Data-Mining-Based Coronary Heart Disease Risk Prediction Model Using Fuzzy Logic and Decision Tree

    PubMed Central

    Kim, Jaekwon; Lee, Jongsik

    2015-01-01

Objectives The importance of the prediction of coronary heart disease (CHD) has been recognized in Korea; however, few studies have been conducted in this area. Therefore, it is necessary to develop a method for the prediction and classification of CHD in Koreans. Methods A model for CHD prediction must be designed according to rule-based guidelines. In this study, a fuzzy logic and decision tree (classification and regression tree [CART])-driven CHD prediction model was developed for Koreans. Datasets derived from the Korean National Health and Nutrition Examination Survey VI (KNHANES-VI) were utilized to generate the proposed model. Results The rules were generated using a decision tree technique, and fuzzy logic was applied to overcome problems associated with uncertainty in CHD prediction. Conclusions The accuracy and receiver operating characteristic (ROC) curve values of the proposed system were 69.51% and 0.594, indicating that the proposed method was more efficient than other models. PMID:26279953
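
The fuzzy-logic ingredient of such a model can be sketched in Python: triangular membership functions map a crisp measurement to degrees of membership in linguistic categories, which the generated rules then combine. The variable and breakpoints below are illustrative, not taken from the paper:

```python
# Sketch of fuzzification with triangular membership functions, the
# standard way fuzzy rule systems handle uncertainty near category
# boundaries. The variable (systolic blood pressure) and its breakpoints
# are illustrative inventions, not the study's fuzzy sets.

def triangular(x, a, b, c):
    """Triangular membership function peaking at b, zero outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Illustrative fuzzy sets for systolic blood pressure (mmHg).
def fuzzify_sbp(x):
    return {
        "normal": triangular(x, 80, 110, 140),
        "elevated": triangular(x, 110, 140, 170),
        "high": triangular(x, 140, 170, 200),
    }

print(fuzzify_sbp(125))  # partly "normal", partly "elevated"
```

A reading of 125 belongs 0.5 to "normal" and 0.5 to "elevated", so rules fire partially rather than flipping at a hard threshold, which is how fuzzy logic softens the crisp splits of a CART-style tree.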

  7. Classification and concentration estimation of explosive precursors using nanowires sensor array and decision tree learning

    NASA Astrophysics Data System (ADS)

    Cho, Junghwan; Li, Xiaopeng; Gu, Zhiyong; Kurup, Pradeep

    2011-09-01

    This paper aims to classify and estimate concentrations of explosive precursors using a nanowire sensor array and decision tree learning algorithm. The nanowire sensor array consists of tin oxide sensors with four different additives, platinum (Pt), copper (Cu), indium (In), and nickel (Ni). The nanowire sensor array was tested using the vapors from four explosives precursors, acetone, nitrobenzene, nitrotoluene, and octane with 10 different concentration levels each. A pattern recognition technique based on decision tree learning was applied to classify the explosive precursors and estimate their concentration. Classification and regression tree (CART) analysis was used for classification. The CART was also utilized for the purpose of structure identification in Sugeno fuzzy inference system (FIS) for estimating the concentration of the precursors. Two CARTs were trained and their testing results were investigated.

  8. An expert-guided decision tree construction strategy: an application in knowledge discovery with medical databases.

    PubMed Central

    Tsai, Y. S.; King, P. H.; Higgins, M. S.; Pierce, D.; Patel, N. P.

    1997-01-01

    With the steady growth in electronic patient records and clinical medical informatics systems, the data collected for routine clinical use have been accumulating at a dramatic rate. A new generation of computational tools for knowledge discovery and data management, arising from interdisciplinary research, is in great demand. In this study, an expert-guided decision tree construction strategy is proposed to offer a user-oriented knowledge discovery environment. The strategy allows experts, based on their expertise and/or preference, to override the inductive decision tree construction process. Moreover, by reviewing decision paths, experts can focus on subsets of data that may be clues to new findings, or simply contaminated cases. PMID:9357618

  9. Ensembl 2007.

    PubMed

    Hubbard, T J P; Aken, B L; Beal, K; Ballester, B; Caccamo, M; Chen, Y; Clarke, L; Coates, G; Cunningham, F; Cutts, T; Down, T; Dyer, S C; Fitzgerald, S; Fernandez-Banet, J; Graf, S; Haider, S; Hammond, M; Herrero, J; Holland, R; Howe, K; Howe, K; Johnson, N; Kahari, A; Keefe, D; Kokocinski, F; Kulesha, E; Lawson, D; Longden, I; Melsopp, C; Megy, K; Meidl, P; Ouverdin, B; Parker, A; Prlic, A; Rice, S; Rios, D; Schuster, M; Sealy, I; Severin, J; Slater, G; Smedley, D; Spudich, G; Trevanion, S; Vilella, A; Vogel, J; White, S; Wood, M; Cox, T; Curwen, V; Durbin, R; Fernandez-Suarez, X M; Flicek, P; Kasprzyk, A; Proctor, G; Searle, S; Smith, J; Ureta-Vidal, A; Birney, E

    2007-01-01

    The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of chordate genome sequences. Over the past year the number of genomes available from Ensembl has increased from 15 to 33, with the addition of sites for the mammalian genomes of elephant, rabbit, armadillo, tenrec, platypus, pig, cat, bush baby, common shrew, microbat and European hedgehog; the fish genomes of stickleback and medaka; and second examples of the genomes of a sea squirt (Ciona savignyi) and a mosquito (Aedes aegypti). Some of the major features added during the year include the first complete gene sets for genomes with low-sequence coverage, the introduction of new strain variation data and the introduction of new orthology/paralog annotations based on gene trees. PMID:17148474

  10. Minimizing the cost of translocation failure with decision-tree models that predict species' behavioral response in translocation sites.

    PubMed

    Ebrahimi, Mehregan; Ebrahimie, Esmaeil; Bull, C Michael

    2015-08-01

    The high number of failures is one reason why translocation is often not recommended. Considering how behavior changes during translocations may improve translocation success. To derive decision-tree models for species' translocation, we used data on the short-term responses of an endangered Australian skink in 5 simulated translocations with different release conditions. We used 4 different decision-tree algorithms (decision tree, decision-tree parallel, decision stump, and random forest) with 4 different criteria (gain ratio, information gain, Gini index, and accuracy) to investigate how environmental and behavioral parameters may affect the success of a translocation. We assumed behavioral changes that increased dispersal away from a release site would reduce translocation success. The trees became more complex when we included all behavioral parameters as attributes, but these trees yielded more detailed information about why and how dispersal occurred. According to these complex trees, there were positive associations between some behavioral parameters, such as fight and dispersal, that showed there was a higher chance, for example, of dispersal among lizards that fought than among those that did not fight. Decision trees based on parameters related to release conditions were easier to understand and could be used by managers to make translocation decisions under different circumstances. PMID:25737134

  11. Application of decision tree algorithm for identification of rock forming minerals using energy dispersive spectrometry

    NASA Astrophysics Data System (ADS)

    Akkaş, Efe; Çubukçu, H. Evren; Artuner, Harun

    2014-05-01

    Rapid and automated mineral identification is compulsory in certain applications concerning natural rocks. Among microscopic and spectrometric methods, energy dispersive X-ray spectrometers (EDS) integrated with scanning electron microscopes produce rapid information with reliable chemical data. Although obtaining elemental data with EDS analyses is fast and easy with the help of improving technology, it is rather challenging to perform accurate and rapid identification given the large quantity of minerals in a rock sample, with dimensions varying from nanometers to centimeters. Furthermore, the physical properties of the specimen (roughness, thickness, electrical conductivity, position in the instrument, etc.) and of the incident electron beam (accelerating voltage, beam current, spot size, etc.) control the produced characteristic X-rays, which in turn affect the elemental analyses. In order to minimize the effects of these physical constraints and develop an automated mineral identification system, a rule induction paradigm has been applied to energy dispersive spectral data. Decision tree classifiers divide training data sets into subclasses using generated rules or decisions, thereby producing classifications associated with these data sets. A number of thin sections prepared from rock samples with suitable mineralogy have been investigated, and a preliminary set of 12 distinct mineral groups (olivine, orthopyroxene, clinopyroxene, apatite, amphibole, plagioclase, K-feldspar, zircon, magnetite, titanomagnetite, biotite, quartz), composed mostly of silicates and oxides, has been selected. Energy dispersive spectral data for each group, consisting of 240 reference and 200 test analyses, have been acquired under various, non-standard, physical and electrical conditions. The reference X-ray data have been used to assign the spectral distribution of elements to the specified mineral groups. 
Consequently, the test data have been analyzed using

  12. Tools of the Future: How Decision Tree Analysis Will Impact Mission Planning

    NASA Technical Reports Server (NTRS)

    Otterstatter, Matthew R.

    2005-01-01

    The universe is infinitely complex; however, the human mind has a finite capacity. The multitude of possible variables, metrics, and procedures in mission planning are far too many to address exhaustively. This is unfortunate because, in general, considering more possibilities leads to more accurate and more powerful results. To compensate, we can get more insightful results by employing our greatest tool, the computer. The power of the computer will be utilized through a technology that considers every possibility, decision tree analysis. Although decision trees have been used in many other fields, this is innovative for space mission planning. Because this is a new strategy, no existing software is able to completely accommodate all of the requirements. This was determined through extensive research and testing of current technologies. It was necessary to create original software, for which a short-term model was finished this summer. The model was built into Microsoft Excel to take advantage of the familiar graphical interface for user input, computation, and viewing output. Macros were written to automate the process of tree construction, optimization, and presentation. The results are useful and promising. If this tool is successfully implemented in mission planning, our reliance on old-fashioned heuristics, an error-prone shortcut for handling complexity, will be reduced. The computer algorithms involved in decision trees will revolutionize mission planning. The planning will be faster and smarter, leading to optimized missions with the potential for more valuable data.

  13. Confident Surgical Decision Making in Temporal Lobe Epilepsy by Heterogeneous Classifier Ensembles

    PubMed Central

    Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Jafari-Khouzani, Kourosh; Elisevich, Kost; Fotouhi, Farshad

    2015-01-01

    In medical domains with low tolerance for invalid predictions, classification confidence is highly important, and traditional performance measures such as overall accuracy cannot provide adequate insight into classification reliability. In this paper, a confident-prediction rate (CPR), which measures the upper limit of confident predictions, has been proposed based on receiver operating characteristic (ROC) curves. It has been shown that a heterogeneous ensemble of classifiers improves this measure. This ensemble approach has been applied to lateralization of focal epileptogenicity in temporal lobe epilepsy (TLE) and prediction of surgical outcomes. A goal of this study is to reduce the requirement for extraoperative electrocorticography (eECoG), the practice of recording from electrodes placed directly on the exposed surface of the brain. We have shown that this goal is achievable with the application of data mining techniques. Furthermore, not all TLE surgical operations result in complete relief from seizures, and it is not always possible for human experts to identify such unsuccessful cases prior to surgery. This study demonstrates the capability of data mining techniques to predict undesirable outcomes for a portion of such cases. PMID:26609547

  14. Image Change Detection via Ensemble Learning

    SciTech Connect

    Martin, Benjamin W; Vatsavai, Raju

    2013-01-01

    The concept of geographic change detection is relevant in many areas. Changes in geography can reveal much information about a particular location. For example, analysis of such changes can identify regions of population growth, change in land use, and potential environmental disturbance. A common way to perform change detection is to use a simple method such as differencing to detect regions of change. Though these techniques are simple, their application is often very limited. Recently, the use of machine learning methods such as neural networks for change detection has been explored with great success. In this work, we explore the use of ensemble learning methodologies for detecting changes in bitemporal synthetic aperture radar (SAR) images. Ensemble learning uses a collection of weak machine learning classifiers to create a stronger classifier with higher accuracy than the individual classifiers in the ensemble. The individual classifiers form a mixture of experts, and the final classification made by the ensemble is calculated from their outputs. Our methodology leverages this aspect of ensemble learning by training collections of weak decision-tree-based classifiers to identify regions of change in SAR images of a region in the Staten Island, New York, area collected during Hurricane Sandy. Preliminary studies show that the ensemble method has approximately 11.5% higher change detection accuracy than an individual classifier.
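
The bagged-vote idea described above, weak learners trained on bootstrap resamples whose predictions are combined by majority vote, can be sketched in plain Python. The decision stump, data, and parameters below are illustrative, not the SAR classifier from the study:

```python
import random
from collections import Counter

def stump_fit(xs, ys):
    """Fit a one-split decision stump on a 1-D feature: try every
    threshold and both label orientations, keep the fewest errors."""
    best = None
    for t in set(xs):
        for lo, hi in ((0, 1), (1, 0)):
            errs = sum(y != (hi if x > t else lo) for x, y in zip(xs, ys))
            if best is None or errs < best[0]:
                best = (errs, t, lo, hi)
    _, t, lo, hi = best
    return lambda x: hi if x > t else lo

def bagged_ensemble(xs, ys, n_trees=25, seed=0):
    """Bagging: train each stump on a bootstrap resample of the
    training set, then classify new points by majority vote."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]
        stumps.append(stump_fit([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: Counter(s(x) for s in stumps).most_common(1)[0][0]
```

On separable toy data such as `bagged_ensemble([1, 2, 3, 4, 10, 11, 12, 13], [0, 0, 0, 0, 1, 1, 1, 1])`, the voted classifier labels low values 0 and high values 1.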

  15. Validating a decision tree for serious infection: diagnostic accuracy in acutely ill children in ambulatory care

    PubMed Central

    Verbakel, Jan Y; Lemiengre, Marieke B; De Burghgraeve, Tine; De Sutter, An; Aertgeerts, Bert; Bullens, Dominique M A; Shinkins, Bethany; Van den Bruel, Ann; Buntinx, Frank

    2015-01-01

    Objective Acute infection is the most common presentation of children in primary care, with only a few having a serious infection (eg, sepsis, meningitis, pneumonia). To avoid complications or death, early recognition and adequate referral are essential. Clinical prediction rules have the potential to improve diagnostic decision-making for rare but serious conditions. In this study, we aimed to validate a recently developed decision tree in a new but similar population. Design Diagnostic accuracy study validating a clinical prediction rule. Setting and participants Acutely ill children presenting to ambulatory care in Flanders, Belgium, consisting of general practice and paediatric assessment in outpatient clinics or the emergency department. Intervention Physicians were asked to score the decision tree in every child. Primary outcome measures The outcome of interest was hospital admission for at least 24 h with a serious infection within 5 days after initial presentation. We report the diagnostic accuracy of the decision tree in sensitivity, specificity, likelihood ratios and predictive values. Results In total, 8962 acute illness episodes were included, of which 283 led to admission to hospital with a serious infection. Sensitivity of the decision tree was 100% (95% CI 71.5% to 100%) at a specificity of 83.6% (95% CI 82.3% to 84.9%) in the general practitioner setting, with 17% of children testing positive. In the paediatric outpatient and emergency department settings, sensitivities were below 92%, with specificities below 44.8%. Conclusions In an independent validation cohort, this clinical prediction rule has been shown to be extremely sensitive in identifying children at risk of hospital admission for a serious infection in general practice, making it suitable for ruling out. Trial registration number NCT02024282. PMID:26254472
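
The accuracy measures reported above all follow from a 2x2 table of test results against outcomes. A minimal sketch, with made-up counts purely for illustration (not the study's data):

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Standard 2x2-table measures for a diagnostic rule:
    tp/fp/fn/tn are true/false positive/negative counts."""
    sens = tp / (tp + fn)   # sensitivity: P(test+ | disease)
    spec = tn / (tn + fp)   # specificity: P(test- | no disease)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_plus": sens / (1 - spec),   # positive likelihood ratio
        "lr_minus": (1 - sens) / spec,  # negative likelihood ratio
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }
```

For example, `diagnostic_accuracy(tp=90, fp=20, fn=10, tn=80)` gives a sensitivity of 0.9, a specificity of 0.8, and a positive likelihood ratio of 4.5.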

  16. Using Boosted Decision Trees to Separate Signal and Background in B to XsGamma Decays

    SciTech Connect

    Barber, James; /Massachusetts U., Amherst /SLAC

    2006-09-27

    The measurement of the branching fraction of the flavor changing neutral current B → Xsγ transition can be used to expose physics outside the Standard Model. In order to make a precise measurement of this inclusive branching fraction, it is necessary to be able to effectively separate signal and background in the data. In order to achieve better separation, an algorithm based on Boosted Decision Trees (BDTs) is implemented. Using Monte Carlo simulated events, "forests" of trees were trained and tested with different sets of parameters. This parameter space was studied with the goal of maximizing the figure of merit, Q, the measure of separation quality used in this analysis. It is found that the use of 1000 trees, with 100 values tested for each variable at each node, and 50 events required for a node to continue separating give the highest figure of merit, Q = 18.37.
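
Boosting of the kind used here reweights training events so that each new tree concentrates on the previous trees' mistakes. A minimal AdaBoost sketch with one-dimensional threshold stumps follows; the data and parameters are illustrative, not the analysis code:

```python
import math

def adaboost_stumps(xs, ys, rounds=10):
    """AdaBoost with 1-D threshold stumps (labels in {-1, +1}):
    each round fits the stump with the lowest weighted error, then
    upweights misclassified samples so later stumps focus on them."""
    n = len(xs)
    w = [1.0 / n] * n
    model = []  # list of (alpha, threshold, sign)
    for _ in range(rounds):
        best = None
        for t in set(xs):
            for sign in (1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if (sign if x > t else -sign) != y)
                if best is None or err < best[0]:
                    best = (err, t, sign)
        err, t, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, t, sign))
        # upweight misclassified samples, then renormalize
        w = [wi * math.exp(-alpha * y * (sign if x > t else -sign))
             for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
    def predict(x):
        s = sum(a * (sg if x > t else -sg) for a, t, sg in model)
        return 1 if s > 0 else -1
    return predict
```

The weighted vote `sum(alpha_m * h_m(x))` is the "forest" output; in the analysis above the analogous score is cut on to maximize the figure of merit Q.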

  17. Identifying Risk and Protective Factors in Recidivist Juvenile Offenders: A Decision Tree Approach.

    PubMed

    Ortega-Campos, Elena; García-García, Juan; Gil-Fenoy, Maria José; Zaldívar-Basurto, Flor

    2016-01-01

    Research on juvenile justice aims to identify profiles of risk and protective factors in juvenile offenders. This paper presents a study of profiles of risk factors that influence young offenders toward committing sanctionable antisocial behavior (S-ASB). Decision tree analysis is used as a multivariate approach to the phenomenon of repeated sanctionable antisocial behavior in juvenile offenders in Spain. The study sample was made up of the set of juveniles who were charged in a court case in the Juvenile Court of Almeria (Spain). The period of study of recidivism was two years from the baseline. The object of study is presented, through the implementation of a decision tree. Two profiles of risk and protective factors are found. Risk factors associated with higher rates of recidivism are antisocial peers, age at baseline S-ASB, problems in school and criminality in family members. PMID:27611313

  18. Circum-Arctic petroleum systems identified using decision-tree chemometrics

    USGS Publications Warehouse

    Peters, K.E.; Ramos, L.S.; Zumberge, J.E.; Valin, Z.C.; Scotese, C.R.; Gautier, D.L.

    2007-01-01

    Source- and age-related biomarker and isotopic data were measured for more than 1000 crude oil samples from wells and seeps collected above approximately 55°N latitude. A unique, multitiered chemometric (multivariate statistical) decision tree was created that allowed automated classification of 31 genetically distinct circum-Arctic oil families based on a training set of 622 oil samples. The method, which we call decision-tree chemometrics, uses principal components analysis and multiple tiers of K-nearest neighbor and SIMCA (soft independent modeling of class analogy) models to classify and assign confidence limits for newly acquired oil samples and source rock extracts. Geochemical data for each oil sample were also used to infer the age, lithology, organic matter input, depositional environment, and identity of its source rock. These results demonstrate the value of large petroleum databases where all samples were analyzed using the same procedures and instrumentation. Copyright © 2007. The American Association of Petroleum Geologists. All rights reserved.

  19. Three-dimensional object recognition using similar triangles and decision trees

    NASA Technical Reports Server (NTRS)

    Spirkovska, Lilly

    1993-01-01

    A system, TRIDEC, that is capable of distinguishing between a set of objects despite changes in the objects' positions in the input field, their size, or their rotational orientation in 3D space is described. TRIDEC combines very simple yet effective features with the classification capabilities of inductive decision tree methods. The feature vector is a list of all similar triangles defined by connecting all combinations of three pixels in a coarse coded 127 x 127 pixel input field. The classification is accomplished by building a decision tree using the information provided from a limited number of translated, scaled, and rotated samples. Simulation results are presented which show that TRIDEC achieves 94 percent recognition accuracy in the 2D invariant object recognition domain and 98 percent recognition accuracy in the 3D invariant object recognition domain after training on only a small sample of transformed views of the objects.

  20. Decision tree approach for classification of remotely sensed satellite data using open source support

    NASA Astrophysics Data System (ADS)

    Sharma, Richa; Ghosh, Aniruddha; Joshi, P. K.

    2013-10-01

    In this study, an attempt has been made to develop a decision tree classification (DTC) algorithm for classification of remotely sensed satellite data (Landsat TM) using open source support. The decision tree is constructed by recursively partitioning the spectral distribution of the training dataset using WEKA, open source data mining software. The classified image is compared with images classified using classical ISODATA clustering and Maximum Likelihood Classifier (MLC) algorithms. The classification result based on the DTC method provided a better visual depiction than the results produced by the ISODATA clustering or MLC algorithms. The overall accuracy was found to be 90% (kappa = 0.88) using the DTC, 76.67% (kappa = 0.72) using the Maximum Likelihood and 57.5% (kappa = 0.49) using the ISODATA clustering method. Based on the overall accuracy and kappa statistics, DTC was found to be the preferred classification approach.
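
The kappa statistics quoted above correct raw accuracy for the agreement expected by chance. A minimal sketch of Cohen's kappa from a confusion matrix; the matrix in the example is made up for illustration, not the Landsat result:

```python
def cohens_kappa(matrix):
    """Cohen's kappa from a square confusion matrix
    (rows = reference classes, columns = predicted classes)."""
    n = sum(sum(row) for row in matrix)
    # observed agreement: fraction on the diagonal
    po = sum(matrix[i][i] for i in range(len(matrix))) / n
    # chance agreement: sum over classes of row_total * col_total / n^2
    pe = sum(sum(row) * sum(m[j] for m in matrix)
             for j, row in enumerate(matrix)) / n ** 2
    return (po - pe) / (1 - pe)
```

For the two-class matrix `[[45, 5], [15, 35]]`, observed agreement is 0.8, chance agreement is 0.5, and kappa is 0.6.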

  1. Using Decision Trees for Estimating Mode Choice of Trips in Buca-Izmir

    NASA Astrophysics Data System (ADS)

    Oral, L. O.; Tecim, V.

    2013-05-01

    Decision makers develop transportation plans and models for providing sustainable transport systems in urban areas. Mode choice is one of the stages in transportation modelling. Data mining techniques can discover the factors affecting mode choice, and these techniques can be applied with a knowledge process approach. In this study a data mining process model is applied to determine the factors affecting mode choice with decision tree techniques, considering individual trip behaviours from household survey data collected within the Izmir Transportation Master Plan. From this perspective, the transport mode choice problem is solved for a case study in the district of Buca, Izmir, Turkey, with the CRISP-DM knowledge process model.

  2. Office of Legacy Management Decision Tree for Solar Photovoltaic Projects - 13317

    SciTech Connect

    Elmer, John; Butherus, Michael; Barr, Deborah L.

    2013-07-01

    To support consideration of renewable energy power development as a land reuse option, the DOE Office of Legacy Management (LM) and the National Renewable Energy Laboratory (NREL) established a partnership to conduct an assessment of wind and solar renewable energy resources on LM lands. From a solar capacity perspective, the larger sites in the western United States present opportunities for constructing solar photovoltaic (PV) projects. A detailed analysis and preliminary plan was developed for three large sites in New Mexico, assessing the costs, the conceptual layout of a PV system, and the electric utility interconnection process. As a result of the study, a 1,214-hectare (3,000-acre) site near Grants, New Mexico, was chosen for further study. The state incentives, utility connection process, and transmission line capacity were key factors in assessing the feasibility of the project. LM's Durango, Colorado, Disposal Site was also chosen for consideration because the uranium mill tailings disposal cell is on a hillside facing south, transmission lines cross the property, and the community was very supportive of the project. LM worked with the regulators to demonstrate that the disposal cell's long-term performance would not be impacted by the installation of a PV solar system. A number of LM-unique issues were resolved in making the site available for a private party to lease a portion of the site for a solar PV project. A lease was awarded in September 2012. Using a solar decision tree that was developed and launched by the EPA and NREL, LM has modified and expanded the decision tree structure to address the unique aspects and challenges faced by LM on its multiple sites. The LM solar decision tree covers factors such as land ownership, usable acreage, financial viability of the project, stakeholder involvement, and transmission line capacity. 
As additional sites are transferred to LM in the future, the decision tree will assist in determining whether a solar

  3. Flood-type classification in mountainous catchments using crisp and fuzzy decision trees

    NASA Astrophysics Data System (ADS)

    Sikorska, Anna E.; Viviroli, Daniel; Seibert, Jan

    2015-10-01

    Floods are governed by largely varying processes and thus exhibit various behaviors. Classification of flood events into flood types and the determination of their respective frequency is therefore important for a better understanding and prediction of floods. This study presents a flood classification for identifying flood patterns at a catchment scale by means of a fuzzy decision tree. Hence, events are represented as a spectrum of six main possible flood types, each attributed with its degree of acceptance. The types considered are flash, short rainfall, long rainfall, snow-melt, rainfall-on-snow and, in high alpine catchments, glacier-melt floods. The fuzzy decision tree also makes it possible to acknowledge the uncertainty present in the identification of flood processes and thus allows for more reliable flood class estimates than a crisp decision tree, which identifies one flood type per event. Based on a data set from nine Swiss mountainous catchments, it was demonstrated that this approach is less sensitive to uncertainties in the classification attributes than the classical crisp approach. These results show that the fuzzy approach bears additional potential for analyses of flood patterns at a catchment scale and thereby provides a more realistic representation of flood processes.
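
The contrast between fuzzy and crisp classification can be made concrete with membership functions. The sketch below is purely illustrative: the attribute (event duration), the three flood types, and the triangular membership parameters are invented for the example and are not the study's actual classification attributes:

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership: 0 outside (a, c), peak 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical membership functions over one attribute
# (event duration in hours) for three illustrative flood types.
TYPES = {
    "flash":      (0, 2, 6),
    "short_rain": (4, 10, 24),
    "long_rain":  (18, 48, 96),
}

def fuzzy_classify(duration):
    """Fuzzy output: a degree of acceptance for every type."""
    return {name: triangular(duration, *abc) for name, abc in TYPES.items()}

def crisp_classify(duration):
    """Crisp output: keep only the single best-matching type."""
    degrees = fuzzy_classify(duration)
    return max(degrees, key=degrees.get)
```

A 5-hour event gets partial membership in both "flash" and "short_rain" under the fuzzy view, while the crisp view discards everything except the winner.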

  4. Data mining for multiagent rules, strategies, and fuzzy decision tree structure

    NASA Astrophysics Data System (ADS)

    Smith, James F., III; Rhyne, Robert D., II; Fisher, Kristin

    2002-03-01

    A fuzzy logic based resource manager (RM) has been developed that automatically allocates electronic attack resources in real-time over many dissimilar platforms. Two different data mining algorithms have been developed to determine rules, strategies, and fuzzy decision tree structure. The first data mining algorithm uses a genetic algorithm as a data mining function and is called from an electronic game. The game allows a human expert to play against the resource manager in a simulated battlespace, with each of the defending platforms being exclusively directed by the fuzzy resource manager and the attacking platforms being controlled by the human expert or operating autonomously under their own logic. This approach automates the data mining problem. The game automatically creates a database reflecting the domain expert's knowledge. It calls a data mining function, a genetic algorithm, for data mining of the database as required and allows easy evaluation of the information mined in the second step. The criterion for re-optimization is discussed as well as experimental results. Then a second data mining algorithm that uses a genetic program as a data mining function is introduced to automatically discover fuzzy decision tree structures. Finally, a fuzzy decision tree generated through this process is discussed.

  5. Building Decision Trees for Characteristic Ellipsoid Method to Monitor Power System Transient Behaviors

    SciTech Connect

    Ma, Jian; Diao, Ruisheng; Makarov, Yuri V.; Etingov, Pavel V.; Zhou, Ning; Dagle, Jeffery E.

    2010-12-01

    The characteristic ellipsoid is a new method to monitor the dynamics of power systems. Decision trees (DTs) play an important role in applying the characteristic ellipsoid method to system operation and analysis. This paper presents the idea and initial results of building DTs for detecting transient dynamic events using the characteristic ellipsoid method. The objective is to determine fault types, fault locations and clearance time in the system using decision trees based on ellipsoids of system transient responses. The New England 10-machine 39-bus system is used for running dynamic simulations to generate a sufficiently large number of transient events in different system configurations. Comprehensive transient simulations considering three fault types, two fault clearance times and different fault locations were conducted in the study. Bus voltage magnitudes and monitored reactive and active power flows are recorded as the phasor measurements to calculate characteristic ellipsoids whose volume, eccentricity, center and projection of the longest axis are used as indices to build decision trees. The DT performances are tested and compared by considering different sets of PMU locations. The proposed method demonstrates that the characteristic ellipsoid method is a very efficient and promising tool to monitor power system dynamic behaviors.

  6. Gene selection for cancer identification: a decision tree model empowered by particle swarm optimization algorithm

    PubMed Central

    2014-01-01

    Background In the application of microarray data, how to select a small number of informative genes, from thousands that may contribute to the occurrence of cancers, is an important issue. Many researchers use various computational intelligence methods to analyze gene expression data. Results To achieve efficient gene selection from thousands of candidate genes that can contribute to identifying cancers, this study develops a novel method utilizing particle swarm optimization combined with a decision tree as the classifier. This study also compares the performance of the proposed method with other well-known benchmark classification methods (support vector machine, self-organizing map, back propagation neural network, C4.5 decision tree, Naive Bayes, CART decision tree, and artificial immune recognition system) on 11 gene expression cancer datasets. Conclusion Based on statistical analysis, the proposed method outperforms the other popular classifiers on all test datasets and is comparable to SVM on certain specific datasets. Further, housekeeping genes with various expression patterns and tissue-specific genes are identified. These genes provide high discrimination power for cancer classification. PMID:24555567

  7. MODIS Snow Cover Mapping Decision Tree Technique: Snow and Cloud Discrimination

    NASA Technical Reports Server (NTRS)

    Riggs, George A.; Hall, Dorothy K.

    2010-01-01

    Accurate mapping of snow cover continues to challenge cryospheric scientists and modelers. The Moderate-Resolution Imaging Spectroradiometer (MODIS) snow data products have been used since 2000 by many investigators to map and monitor snow cover extent for various applications. Users have reported on the utility of the products and also on problems encountered. Three problems or hindrances in the use of the MODIS snow data products that have been reported in the literature are: cloud obscuration, snow/cloud confusion, and snow omission errors in thin or sparse snow cover conditions. Implementation of the MODIS snow algorithm in a decision tree technique using surface reflectance input to mitigate those problems is being investigated. The objective of this work is to use a decision tree structure for the snow algorithm. This should alleviate snow/cloud confusion and omission errors and provide a snow map with classes that convey information on how snow was detected, e.g. snow under clear sky, snow under cloud, to enable users' flexibility in interpreting and deriving a snow map. Results of a snow cover decision tree algorithm are compared to the standard MODIS snow map and found to exhibit improved ability to alleviate snow/cloud confusion in some situations, allowing up to about a 5% increase in mapped snow cover extent, and thus accuracy, in some scenes.

  8. A Study of Factors that Influence First-Year Nonmusic Majors' Decisions to Participate in Music Ensembles at Small Liberal Arts Colleges in Indiana

    ERIC Educational Resources Information Center

    Faber, Ardis R.

    2010-01-01

    The purpose of this study was to investigate factors that influence first-year nonmusic majors' decisions regarding participation in music ensembles at small liberal arts colleges in Indiana. A survey questionnaire was used to gather data. The data collected was analyzed to determine significant differences between the nonmusic majors who have…

  9. A decision tree for selecting the most cost-effective waste disposal strategy in foodservice operations.

    PubMed

    Wie, Seunghee; Shanklin, Carol W; Lee, Kyung-Eun

    2003-04-01

    The purposes of this study were to determine costs of disposal strategies for wastes generated in foodservice operations and to develop a decision tree to determine the most cost-effective disposal strategy for foodservice operations. Four cases, including the central food processing center (CFPC) in a school district, a continuing-care retirement center (CCRC), a university dining center (UDC), and a commercial chain restaurant (CCR), were studied to determine the most cost-effective disposal strategy. Annual costs for the current and projected strategies were determined for each case. Results of waste characterization studies and stopwatch studies, interviews with foodservice directors, and water flow and electrical requirements from manufacturers' specifications were used to determine cost incurred. The annual percentage increases for labor, fees, and services were used to reflect an inflated economic condition for the ensuing 10 years of the study period. The Net Present Worth method was used to compare costs of strategies, and the multiparameter sensitivity analysis was conducted to examine the tolerance of the chosen strategy. The most cost-effective strategy differed among foodservice operations because of the composition of food and packaging wastes, the quantity of recyclable materials, the waste-hauling charges, labor costs, start-up costs, and inflation rate. For example, the use of a garbage disposal for food waste and landfills and recycling for packaging waste were the most cost-effective strategies for the CCRC. A decision tree was developed to illustrate the decision-making process that occurs when conducting cost analysis and subsequent decisions. Dietetics practitioners can use the decision tree when evaluating the results of the cost analysis. PMID:12669011

  10. Bonsai Trees in Your Head: How the Pavlovian System Sculpts Goal-Directed Choices by Pruning Decision Trees

    PubMed Central

    O'Nions, Elizabeth; Sheridan, Luke; Dayan, Peter; Roiser, Jonathan P.

    2012-01-01

    When planning a series of actions, it is usually infeasible to consider all potential future sequences; instead, one must prune the decision tree. Provably optimal pruning is, however, still computationally ruinous and the specific approximations humans employ remain unknown. We designed a new sequential reinforcement-based task and showed that human subjects adopted a simple pruning strategy: during mental evaluation of a sequence of choices, they curtailed any further evaluation of a sequence as soon as they encountered a large loss. This pruning strategy was Pavlovian: it was reflexively evoked by large losses and persisted even when overwhelmingly counterproductive. It was also evident above and beyond loss aversion. We found that the tendency towards Pavlovian pruning was selectively predicted by the degree to which subjects exhibited sub-clinical mood disturbance, in accordance with theories that ascribe Pavlovian behavioural inhibition, via serotonin, a role in mood disorders. We conclude that Pavlovian behavioural inhibition shapes highly flexible, goal-directed choices in a manner that may be important for theories of decision-making in mood disorders. PMID:22412360
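The pruning strategy described above can be sketched as a depth-first evaluation of an action tree that reflexively abandons any branch passing through a large loss. The rewards, threshold, and action names below are invented for illustration, not taken from the study's task.

```python
# Sketch of "Pavlovian" pruning: enumerate action sequences depth-first, but
# curtail evaluation of any branch as soon as a step's loss exceeds a threshold.
def best_sequence(rewards, depth, prune_below=-70):
    """rewards: dict mapping action -> immediate reward (same at every depth
    in this toy). Returns (best total value, best sequence) under pruning."""
    def search(d):
        if d == 0:
            return 0, []
        best_val, best_seq = float("-inf"), None
        for action, r in rewards.items():
            if r <= prune_below:          # reflexively abandon after a big loss
                continue
            val, seq = search(d - 1)
            if r + val > best_val:
                best_val, best_seq = r + val, [action] + seq
        return best_val, best_seq
    return search(depth)

# The path through the -100 loss is never evaluated, mirroring how subjects
# curtailed evaluation of sequences containing a large loss.
print(best_sequence({"safe": 5, "risky": -100}, depth=3))
# → (15, ['safe', 'safe', 'safe'])
```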

  11. Socioeconomic determinants of menarche in rural Polish girls using the decision trees method.

    PubMed

    Matusik, Stanisław; Laska-Mierzejewska, Teresa; Chrzanowska, Maria

    2011-05-01

    The aim of this study was to assess the usefulness of the decision trees method as a research method of multidimensional associations between menarche and socioeconomic variables. The article is based on data collected from the rural area of Choszczno in the West Pomerania district of Poland between 1987 and 2001. Girls were asked about the appearance of first menstruation (a yes/no method). The average menarchal age was estimated by the probit analysis method, using second grade polynomials. The socioeconomic status of the girls' families was determined using five qualitative variables: fathers' and mothers' educational level, source of income, household appliances and the number of children in a family. For classification based on five socioeconomic variables, one of the most effective algorithms CART (Classification and Regression Trees) was used. In 2001 the menarchal age in 66% of examined girls was properly classified, while a higher efficiency of 70% was obtained for girls examined in 1987. The decision trees method enabled the definition of the hierarchy of socioeconomic variables influencing girls' biological development level. The strongest discriminatory power was attributed to the number of children in a family, and the mother's and then father's educational level. Using this method it is possible to detect differences in strength of socioeconomic variables associated with girls' pubescence before 1987 and after 2001 during the transformation of the economic and political systems in Poland. However, the decision trees method is infrequently applied in social sciences and constitutes a novelty; this article proves its usefulness in examining relations between biological processes and a population's living conditions. PMID:21211091

  12. Using decision trees to manage hospital readmission risk for acute myocardial infarction, heart failure, and pneumonia.

    PubMed

    Hilbert, John P; Zasadil, Scott; Keyser, Donna J; Peele, Pamela B

    2014-12-01

    To improve healthcare quality and reduce costs, the Affordable Care Act places hospitals at financial risk for excessive readmissions associated with acute myocardial infarction (AMI), heart failure (HF), and pneumonia (PN). Although predictive analytics is increasingly looked to as a means for measuring, comparing, and managing this risk, many modeling tools require data inputs that are not readily available and/or additional resources to yield actionable information. This article demonstrates how hospitals and clinicians can use their own structured discharge data to create decision trees that produce highly transparent, clinically relevant decision rules for better managing readmission risk associated with AMI, HF, and PN. For illustrative purposes, basic decision trees are trained and tested using publicly available data from the California State Inpatient Databases and an open-source statistical package. As expected, these simple models perform less well than other more sophisticated tools, with areas under the receiver operating characteristic (ROC) curve (AUC) of 0.612, 0.583, and 0.650, respectively, but they achieve a lift of at least 1.5 for higher-risk patients with any of the three conditions. More importantly, they are shown to offer substantial advantages in terms of transparency, interpretability, comprehensiveness, and adaptability. By enabling hospitals and clinicians to identify important factors associated with readmissions, target subgroups of patients at both high and low risk, and design and implement interventions appropriate to the observed risk levels, decision trees serve as an ideal application for addressing the challenge of reducing hospital readmissions. PMID:25160603
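A basic version of this workflow, training a shallow tree on discharge-style features and then reporting AUC and top-decile lift, can be sketched with scikit-learn. The features and synthetic outcome below are invented stand-ins; the study used the California State Inpatient Databases.

```python
# Sketch: shallow decision tree on synthetic "discharge-like" data, reporting
# AUC and the lift for the highest-risk decile. Data are invented for illustration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 4))   # stand-ins for age, length of stay, comorbidities, prior admits
p = 1 / (1 + np.exp(-(0.8 * X[:, 0] + 0.6 * X[:, 3])))
y = (rng.random(n) < p).astype(int)            # synthetic readmission outcome

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

scores = tree.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, scores)

# Lift for the top 10% by predicted risk: readmission rate in that group / base rate.
cut = np.quantile(scores, 0.9)
lift = y_te[scores >= cut].mean() / y_te.mean()
print(f"AUC={auc:.3f}, top-decile lift={lift:.2f}")
```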

  13. Cloud Detection from Satellite Imagery: A Comparison of Expert-Generated and Automatically-Generated Decision Trees

    NASA Technical Reports Server (NTRS)

    Shiffman, Smadar

    2004-01-01

    Automated cloud detection and tracking is an important step in assessing global climate change via remote sensing. Cloud masks, which indicate whether individual pixels depict clouds, are included in many of the data products that are based on data acquired on board Earth satellites. Many cloud-mask algorithms have the form of decision trees, which employ sequential tests that scientists designed based on empirical astrophysics studies and astrophysics simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In this study we explored the potential benefits of automatically learned decision trees for detecting clouds from images acquired using the Advanced Very High Resolution Radiometer (AVHRR) instrument on board the NOAA-14 weather satellite of the National Oceanic and Atmospheric Administration. We constructed three decision trees for a sample of 8km-daily AVHRR data from 2000 using a decision-tree learning procedure provided within MATLAB(R), and compared the accuracy of the decision trees to the accuracy of the cloud mask. We used ground observations collected by the National Aeronautics and Space Administration's Clouds and the Earth's Radiant Energy System S'COOL project as the gold standard. For the sample data, the accuracy of the automatically learned decision trees was greater than the accuracy of the cloud masks included in the AVHRR data product.

  14. Decision Tree based Prediction and Rule Induction for Groundwater Trichloroethene (TCE) Pollution Vulnerability

    NASA Astrophysics Data System (ADS)

    Park, J.; Yoo, K.

    2013-12-01

    For groundwater resource conservation, it is important to accurately assess groundwater pollution sensitivity or vulnerability. In this work, we attempted to use a data mining approach to assess groundwater pollution vulnerability in a TCE (trichloroethylene)-contaminated Korean industrial site. The conventional DRASTIC method failed to describe the TCE sensitivity data, showing a poor correlation with hydrogeological properties. Among the data mining methods tested, Artificial Neural Network (ANN), Multiple Logistic Regression (MLR), Case-Based Reasoning (CBR), and Decision Tree (DT), the Decision Tree showed the best accuracy and consistency. According to tree analysis with the optimal DT model, the failure of the conventional DRASTIC method to fit the TCE sensitivity data may be due to the use of inaccurate weight values for the hydrogeological parameters of the study site. These findings provide a proof of concept that a DT-based data mining approach can be used for prediction and rule induction of groundwater TCE sensitivity without pre-existing information on the weights of hydrogeological properties.

  15. Merging Multi-model CMIP5/PMIP3 Past-1000 Ensemble Simulations with Tree Ring Proxy Data by Optimal Interpolation Approach

    NASA Astrophysics Data System (ADS)

    Chen, Xin; Luo, Yong; Xing, Pei; Nie, Suping; Tian, Qinhua

    2015-04-01

    Two sets of gridded annual mean surface air temperature over the Northern Hemisphere for the past millennium were constructed employing the optimal interpolation (OI) method, so as to merge tree ring proxy records with simulations from CMIP5 (the fifth phase of the Coupled Model Intercomparison Project). Both the uncertainties in the proxy reconstruction and in the model simulations can be taken into account by the OI algorithm. For better preservation of physically coordinated features and the spatial-temporal completeness of climate variability in the 7 copies of model results, we perform Empirical Orthogonal Function (EOF) analysis to truncate the ensemble mean field as the first guess (background field) for OI. 681 temperature-sensitive tree-ring chronologies were collected and screened from the International Tree Ring Data Bank (ITRDB) and the Past Global Changes (PAGES-2k) project. Firstly, two methods (variance matching and linear regression) are employed to calibrate the tree ring chronologies with instrumental data (CRUTEM4v) individually. In addition, we also remove the bias of both the background field and the proxy records relative to the instrumental dataset. Secondly, a time-varying background error covariance matrix (B) and a static "observation" error covariance matrix (R) are calculated for the OI frame. In our scheme, the matrix B was calculated locally, and "observation" error covariances are partially considered in the R matrix (the covariance between pairs of tree ring sites that are very close to each other is counted), which differs from the traditional assumption that the R matrix should be diagonal. Comparison of the results shows that regional averaged series are not sensitive to the choice of calibration method. The Quantile-Quantile plots indicate that regional climatologies based on both methods tend to agree better with the regional reconstruction of PAGES-2k in the 20th century warming period than in the Little Ice Age (LIA). Larger volcanic cooling response over Asia
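The OI analysis step used to merge the background field with proxy observations can be sketched on a toy 1-D grid. The matrices B, R, and H below are invented toy examples, not the study's covariances; the small off-diagonal term in R loosely mirrors the paper's non-diagonal treatment of nearby tree-ring sites.

```python
# Minimal optimal-interpolation (OI) update: x_a = x_b + K (y - H x_b),
# with gain K = B H^T (H B H^T + R)^-1. All quantities are toy values.
import numpy as np

n_grid, n_obs = 10, 3
x_b = np.zeros(n_grid)                      # background (e.g. model ensemble mean)
obs = np.array([1.0, 0.5, -0.3])            # proxy-derived temperatures
H = np.zeros((n_obs, n_grid))               # observation operator: pick grid cells
H[0, 2] = H[1, 5] = H[2, 8] = 1.0

# Background error covariance with spatial correlation; observation error
# covariance with a small off-diagonal term between nearby proxy sites.
dist = np.abs(np.subtract.outer(np.arange(n_grid), np.arange(n_grid)))
B = np.exp(-dist / 2.0)
R = 0.25 * np.eye(n_obs) + 0.05 * (np.ones((n_obs, n_obs)) - np.eye(n_obs))

K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # OI gain
x_a = x_b + K @ (obs - H @ x_b)                # analysis field
print(x_a)
```

Because the background error variance (1.0) exceeds the observation error variance (0.25), the analysis at an observed grid cell is pulled most of the way toward the observation.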

  16. Application of Decision Tree Algorithm for classification and identification of natural minerals using SEM-EDS

    NASA Astrophysics Data System (ADS)

    Akkaş, Efe; Akin, Lutfiye; Evren Çubukçu, H.; Artuner, Harun

    2015-07-01

    A mineral is a natural, homogeneous solid with a definite chemical composition and a highly ordered atomic arrangement. Fast and accurate mineral identification/classification has recently become a necessity. Energy Dispersive X-ray Spectrometers integrated with Scanning Electron Microscopes (SEM-EDS) are used to obtain rapid and reliable elemental analyses or chemical characterization of a solid. However, mineral identification is challenging because of the wide range of spectral data for natural minerals, and as more mineralogical data are acquired, the time required for classification increases. Moreover, the instrumental conditions applied in SEM-EDS differ across applications, affecting the produced X-ray patterns even for the same mineral. This study aims to test whether the C5.0 Decision Tree is a rapid and reliable algorithm for classification and identification of various natural magmatic minerals. Ten distinct mineral groups (olivine, orthopyroxene, clinopyroxene, apatite, amphibole, plagioclase, K-feldspar, zircon, magnetite, biotite) from different igneous rocks were analyzed by SEM-EDS. 4601 elemental X-ray intensity data were collected under various instrumental conditions; 2400 were used for training and the remaining 2201 were used to test mineral identification. The vast majority of the test data were classified accurately. High accuracy was reached even for minerals with similar chemical compositions, such as olivine ((Mg,Fe)2[SiO4]) and orthopyroxene ((Mg,Fe)2[Si2O6]). Furthermore, two members of the amphibole group (magnesiohastingsite, tschermakite) and two of the clinopyroxene group (diopside, hedenbergite) were accurately identified by the Decision Tree Algorithm. These results demonstrate that the C5.0 Decision Tree Algorithm is an efficient method for mineral group classification and the identification of mineral members.

  17. Improving medical diagnosis reliability using Boosted C5.0 decision tree empowered by Particle Swarm Optimization.

    PubMed

    Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin

    2015-08-01

    Improving the accuracy of supervised classification algorithms in biomedical applications is an active area of research. In this study, we improve the performance of the Particle Swarm Optimization (PSO) combined with C4.5 decision tree (PSO+C4.5) classifier by applying a Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of the proposed method, it was implemented on one microarray dataset and five different medical datasets obtained from the UCI machine learning databases. Moreover, the results of the PSO + Boosted C5.0 implementation were compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine with a Radial Basis Function kernel, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes, and Weighted K-Nearest Neighbor). A repeated five-fold cross-validation method was used to assess the performance of the classifiers. Experimental results show that the proposed method not only improves on PSO+C4.5 but also obtains higher classification accuracy than the other classification methods. PMID:26737960
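The repeated five-fold evaluation protocol can be sketched with scikit-learn. As an assumption, AdaBoost over shallow trees stands in for Boosted C5.0 (C5.0 itself is a proprietary C4.5 successor not available in scikit-learn), and the data are synthetic, not the UCI medical sets.

```python
# Sketch: repeated five-fold cross-validation of a boosted tree ensemble.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 6))
y = (X[:, 0] - X[:, 2] > 0).astype(int)   # synthetic diagnosis label

clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=50, random_state=0)
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
acc = cross_val_score(clf, X, y, cv=cv).mean()
print(f"mean accuracy over 5x5-fold CV: {acc:.3f}")
```

Repeating the five-fold split with different shuffles, as the paper does, reduces the variance of the accuracy estimate compared with a single split.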

  18. Decision Tree Classifier for Classification of Plant and Animal Micro RNA's

    NASA Astrophysics Data System (ADS)

    Pant, Bhasker; Pant, Kumud; Pardasani, K. R.

    Gene expression is regulated by miRNAs (micro RNAs), which can be 21-23 nucleotides in length. They are non-coding RNAs that control gene expression either by translational repression or mRNA degradation. Both plants and animals contain miRNAs, which have been classified by wet-lab techniques. These techniques are highly expensive, labour intensive and time consuming; hence, faster and more economical computational approaches are needed. In view of the above, a machine learning model has been developed for the classification of plant and animal miRNAs using a decision tree classifier. The model has been tested on available data and gives results with 91% accuracy.

  19. Improvement and analysis of ID3 algorithm in decision-making tree

    NASA Astrophysics Data System (ADS)

    Xie, Xiao-Lan; Long, Zhen; Liao, Wen-Qi

    2015-12-01

    The cooperative system under development needs to use spatial analysis and related data mining technology to detect subject conflict and redundancy, and ID3 is an important data mining algorithm. Because the logarithmic part of the traditional ID3 decision-tree algorithm is rather complex to compute, this paper derives a new computational formula for information gain by optimizing that logarithmic part. Experimental comparison and theoretical analysis show that the IID3 (Improved ID3) algorithm achieves higher computational efficiency and accuracy and is thus worth popularizing.
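The information-gain quantity that ID3 maximizes at each split (entropy of the class labels minus the weighted entropy after partitioning on an attribute) can be computed directly. The tiny weather-style dataset below is invented for illustration.

```python
# Information gain as used by ID3: H(labels) - sum_v (|S_v|/|S|) * H(S_v).
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    # Partition examples by the value of one attribute, then compare entropies.
    parts = {}
    for row, lab in zip(rows, labels):
        parts.setdefault(row[attr_index], []).append(lab)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in parts.values())
    return entropy(labels) - remainder

rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "hot")]
labels = ["no", "no", "yes", "yes"]
print(information_gain(rows, labels, 0))   # attribute 0 splits classes perfectly
# → 1.0
```

ID3 chooses the attribute with the largest gain at each node; here attribute 0 (gain 1.0 bit) would be preferred over attribute 1 (gain 0.0).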

  20. A comparison of student academic achievement using decision trees techniques: Reflection from University Malaysia Perlis

    NASA Astrophysics Data System (ADS)

    Aziz, Fatihah; Jusoh, Abd Wahab; Abu, Mohd Syafarudy

    2015-05-01

    A decision tree is one of the data mining techniques used for prediction. With this method, hidden information can be extracted from an abundance of data and interpreted as useful knowledge. In this paper, the academic performance of students from 2002 to 2012 is examined for two faculties of University Malaysia Perlis (UniMAP): the Faculty of Manufacturing Engineering and the Faculty of Microelectronic Engineering. The objectives of this study are to determine and compare the factors that affect students' academic achievement in the two faculties. The prediction results show that five attributes can be considered factors that influence the students' academic performance.

  1. Are decision trees a feasible knowledge representation to guide extraction of critical information from randomized controlled trial reports?

    PubMed Central

    Chung, Grace Y; Coiera, Enrico

    2008-01-01

    Background This paper proposes the use of decision trees as the basis for automatically extracting information from published randomized controlled trial (RCT) reports. An exploratory analysis of RCT abstracts is undertaken to investigate the feasibility of using decision trees as a semantic structure. Quality-of-paper measures are also examined. Methods A subset of 455 abstracts (randomly selected from a set of 7620 retrieved from Medline from 1998-2006) are examined for the quality of RCT reporting, the identifiability of RCTs from abstracts, and the completeness and complexity of RCT abstracts with respect to key decision tree elements. Abstracts were manually assigned to 6 sub-groups distinguishing whether they were primary RCTs versus other design types. For primary RCT studies, we analyzed and annotated the reporting of intervention comparison, population assignment and outcome values. To measure completeness, the frequencies with which complete intervention, population and outcome information are reported in abstracts were measured. A qualitative examination of the reporting language was conducted. Results Decision tree elements are manually identifiable in the majority of primary RCT abstracts. 73.8% of a random subset were primary studies with a single population assigned to two or more interventions. 68% of these primary RCT abstracts were structured. 63% contained pharmaceutical interventions. 84% reported the total number of study subjects. In a subset of 21 abstracts examined, 71% reported numerical outcome values. Conclusion The manual identifiability of decision tree elements in the abstract suggests that decision trees could be a suitable construct to guide machine summarisation of RCTs. The presence of decision tree elements could also act as an indicator of RCT report quality in terms of completeness and uniformity. PMID:18957129

  2. Integrating Decision Tree and Hidden Markov Model (HMM) for Subtype Prediction of Human Influenza A Virus

    NASA Astrophysics Data System (ADS)

    Attaluri, Pavan K.; Chen, Zhengxin; Weerakoon, Aruna M.; Lu, Guoqing

    Multiple criteria decision making (MCDM) has a significant impact in bioinformatics. In the research reported here, we explore the integration of a decision tree (DT) and a Hidden Markov Model (HMM) for subtype prediction of human influenza A virus. Infection with influenza viruses continues to be an important public health problem. Viral strains of subtypes H3N2 and H1N1 circulate in humans at least twice annually. Subtype detection depends mainly on antigenic assays, which are time-consuming and not fully accurate. We have developed a Web system for accurate subtype detection of human influenza virus sequences. A preliminary experiment showed that this system is easy to use and powerful in identifying human influenza subtypes. Our next step is to examine the informative positions at the protein level and extend the current functionality to detect more subtypes. The web functions can be accessed at http://glee.ist.unomaha.edu/.

  3. Trees

    ERIC Educational Resources Information Center

    Al-Khaja, Nawal

    2007-01-01

    This is a thematic lesson plan for young learners about palm trees and the importance of taking care of them. The two part lesson teaches listening, reading and speaking skills. The lesson includes parts of a tree; the modal auxiliary, can; dialogues and a role play activity.

  4. Genetic algorithm-based neural fuzzy decision tree for mixed scheduling in ATM networks.

    PubMed

    Lin, Chin-Teng; Chung, I-Fang; Pu, Her-Chang; Lee, Tsern-Huei; Chang, Jyh-Yeong

    2002-01-01

    Future broadband integrated services networks based on asynchronous transfer mode (ATM) technology are expected to support multiple types of multimedia information with diverse statistical characteristics and quality of service (QoS) requirements. To meet these requirements, efficient scheduling methods are important for traffic control in ATM networks. Among general scheduling schemes, the rate monotonic algorithm is simple enough to be used in high-speed networks, but does not attain the high system utilization of the deadline driven algorithm. However, the deadline driven scheme is computationally complex and hard to implement in hardware. The mixed scheduling algorithm is a combination of the rate monotonic algorithm and the deadline driven algorithm; thus it can provide most of the benefits of these two algorithms. In this paper, we use the mixed scheduling algorithm to achieve high system utilization under the hardware constraint. Because there is no analytic method for schedulability testing of mixed scheduling, we propose a genetic algorithm-based neural fuzzy decision tree (GANFDT) to realize it in a real-time environment. The GANFDT combines a GA and a neural fuzzy network into a binary classification tree. This approach also exploits the power of the classification tree. Simulation results show that the GANFDT provides an efficient way of carrying out mixed scheduling in ATM networks. PMID:18244889
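For the rate-monotonic half of the mixed scheme there is a classic analytic sufficient test, the Liu-Layland utilization bound; the task set below is invented. (As the abstract notes, no such closed-form test exists for mixed scheduling itself, which is why a learned classifier like the GANFDT is needed there.)

```python
# Rate-monotonic schedulability (sufficient condition): a set of n periodic
# tasks is schedulable if total utilization <= n * (2^(1/n) - 1).
def rm_schedulable(tasks):
    """tasks: list of (execution_time, period) pairs."""
    n = len(tasks)
    utilization = sum(c / p for c, p in tasks)
    return utilization <= n * (2 ** (1 / n) - 1)

print(rm_schedulable([(1, 4), (1, 5), (2, 10)]))   # U = 0.65 vs bound ≈ 0.78
# → True
```

The test is only sufficient: a task set exceeding the bound may still be schedulable, which is one reason deadline-driven scheduling achieves higher utilization.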

  5. Decision-tree analysis of factors influencing rainfall-related building structure and content damage

    NASA Astrophysics Data System (ADS)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-09-01

    Flood-damage prediction models are essential building blocks in flood risk assessments. So far, little research has been dedicated to damage from small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision-tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period 1998-2011. The databases include claims of water-related damage (for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor). Response variables being modelled are average claim size and claim frequency, per district, per day. The set of predictors include rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision-tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), building age (property data only), fraction of homeowners (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size. It is recommended to investigate explanations for this failure; such investigations require the inclusion of other explanatory factors that were not used in the present study, an investigation of the variability in average claim size at different spatial scales, and the collection of more detailed insurance data that allows one to distinguish between the

  6. Decision tree analysis of factors influencing rainfall-related building damage

    NASA Astrophysics Data System (ADS)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-04-01

    Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage of small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998-2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors include rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), building age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggests that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22-26% of variance in claim frequency, which is considerably better compared to results from global multiple regression models (11-18% of variance explained). Still, a

  7. Block-Based Connected-Component Labeling Algorithm Using Binary Decision Trees

    PubMed Central

    Chang, Wan-Yu; Chiu, Chung-Cheng; Yang, Jia-Horng

    2015-01-01

    In this paper, we propose a fast labeling algorithm based on block-based concepts. Because the number of memory access points directly affects the time consumption of the labeling algorithms, the aim of the proposed algorithm is to minimize neighborhood operations. Our algorithm utilizes a block-based view and correlates a raster scan to select the necessary pixels generated by a block-based scan mask. We analyze the advantages of a sequential raster scan for the block-based scan mask, and integrate the block-connected relationships using two different procedures with binary decision trees to reduce unnecessary memory access. This greatly simplifies the pixel locations of the block-based scan mask. Furthermore, our algorithm significantly reduces the number of leaf nodes and depth levels required in the binary decision tree. We analyze the labeling performance of the proposed algorithm alongside that of other labeling algorithms using high-resolution images and foreground images. The experimental results from synthetic and real image datasets demonstrate that the proposed algorithm is faster than other methods. PMID:26393597
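For contrast with the optimized block-based method, a minimal pixel-based baseline (two-pass labeling with 4-connectivity and union-find) is sketched below; it performs exactly the redundant neighborhood accesses that the paper's block-based scan mask and decision trees are designed to avoid.

```python
# Minimal two-pass connected-component labeling (4-connectivity, union-find).
# This is a baseline sketch, not the block-based algorithm described above.
def label_components(img):
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]                          # union-find parents; index 0 unused
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    nxt = 1
    for y in range(h):                    # first pass: provisional labels
        for x in range(w):
            if not img[y][x]:
                continue
            up = labels[y - 1][x] if y else 0
            left = labels[y][x - 1] if x else 0
            if not up and not left:       # new component
                parent.append(nxt)
                labels[y][x] = nxt
                nxt += 1
            else:
                labels[y][x] = min(l for l in (up, left) if l)
                if up and left:           # record label equivalence
                    ru, rl = find(up), find(left)
                    parent[max(ru, rl)] = min(ru, rl)
    for y in range(h):                    # second pass: resolve equivalences
        for x in range(w):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
    return labels

img = [[1, 1, 0],
       [0, 1, 0],
       [1, 0, 1]]
result = label_components(img)            # three separate components
```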

  8. Type 2 Diabetes Mellitus Screening and Risk Factors Using Decision Tree: Results of Data Mining

    PubMed Central

    Habibi, Shafi; Ahmadi, Maryam; Alizadeh, Somayeh

    2015-01-01

    Objectives: The aim of this study was to examine a predictive model using features related to type 2 diabetes risk factors. Methods: The data were obtained from a database in a diabetes control system in Tabriz, Iran, and included all people referred for diabetes screening between 2009 and 2011. The features considered as “Inputs” were: age, sex, systolic and diastolic blood pressure, family history of diabetes, and body mass index (BMI). We used diagnosis as the “Class”. We applied the “Decision Tree” technique and the “J48” algorithm in the WEKA (version 3.6.10) software to develop the model. Results: After data preprocessing and preparation, we used 22,398 records for data mining. The model's precision in identifying patients was 0.717. The age factor was placed at the root node of the tree as a result of its higher information gain. The ROC curve indicates how well the model distinguishes patients from healthy individuals, and shows high capability, especially in identifying healthy persons. Conclusions: We developed a decision-tree model for screening T2DM that does not require laboratory tests for T2DM diagnosis. PMID:26156928
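A J48-style model (entropy splits, as in C4.5) can be sketched with scikit-learn as a stand-in for the WEKA workflow. The records, value ranges, and risk rule below are invented, not the Tabriz screening data.

```python
# Sketch: entropy-criterion decision tree on invented screening-style records,
# then inspect which feature the learned tree places at the root.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 2000
age = rng.integers(20, 80, n)
bmi = rng.normal(27, 5, n)
sys_bp = rng.normal(125, 15, n)
family = rng.integers(0, 2, n)             # family history of diabetes (0/1)
# Invented risk rule in which age dominates (loosely mirroring the paper's
# root-node finding); the synthetic label adds noise on top of it.
risk = 0.05 * (age - 45) + 0.1 * (bmi - 27) + 0.5 * family
y = (risk + rng.normal(0, 1, n) > 0.5).astype(int)

X = np.column_stack([age, bmi, sys_bp, family])
clf = DecisionTreeClassifier(criterion="entropy", max_depth=4).fit(X, y)
root_feature = ["age", "bmi", "sys_bp", "family"][clf.tree_.feature[0]]
print("root split on:", root_feature)
```

With entropy splits, the root attribute is the one with the highest information gain on the training data, which is how age ended up at the root in the study.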

  9. Prediction of microRNA target genes using an efficient genetic algorithm-based decision tree

    PubMed Central

    Rabiee-Ghahfarrokhi, Behzad; Rafiei, Fariba; Niknafs, Ali Akbar; Zamani, Behzad

    2015-01-01

    MicroRNAs (miRNAs) are small, non-coding RNA molecules that regulate gene expression in almost all plants and animals. They play an important role in key processes such as proliferation, apoptosis, and pathogen–host interactions. Nevertheless, the mechanisms by which miRNAs act are not fully understood. The first step toward unraveling the function of a particular miRNA is the identification of its direct targets, a step that has proven quite challenging in animals, primarily because of incomplete complementarity between miRNAs and target mRNAs. In recent years, the use of machine-learning techniques has greatly improved the prediction of miRNA targets, avoiding the need for costly and time-consuming experiments to identify miRNA targets experimentally. Among the most important machine-learning algorithms are decision trees, which classify data based on extracted rules. In the present work, we used a genetic algorithm in combination with a C4.5 decision tree for the prediction of miRNA targets. We applied our proposed method to a validated human dataset and achieved approximately 93.9% classification accuracy, which can be attributed to the selection of the best rules. PMID:26649272
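The genetic-algorithm-plus-decision-tree idea can be sketched as a genetic search over feature subsets, each scored by a tree's cross-validated accuracy. This is a loose illustration on synthetic data, with scikit-learn's entropy tree standing in for C4.5 and a selection-and-mutation-only GA; none of these settings come from the paper.

```python
# Toy GA over feature-subset masks, with a decision tree's CV accuracy as fitness.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n, d = 300, 8
X = rng.normal(size=(n, d))
y = (X[:, 0] + X[:, 3] > 0).astype(int)        # only features 0 and 3 matter

def fitness(mask):
    if not mask.any():
        return 0.0
    clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.integers(0, 2, (10, d)).astype(bool)  # population of feature masks
for _ in range(15):
    scores = np.array([fitness(m) for m in pop])
    best = pop[np.argsort(scores)[-5:]]         # keep the top half (elitism)
    children = best.copy()
    flips = rng.random(children.shape) < 0.1    # mutate children by bit flips
    pop = np.vstack([best, children ^ flips])

best_mask = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best_mask))
```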

  10. Using decision-tree classifier systems to extract knowledge from databases

    NASA Technical Reports Server (NTRS)

    St.clair, D. C.; Sabharwal, C. L.; Hacke, Keith; Bond, W. E.

    1990-01-01

    One difficulty in applying artificial intelligence techniques to the solution of real world problems is that the development and maintenance of many AI systems, such as those used in diagnostics, require large amounts of human resources. At the same time, databases frequently exist which contain information about the process(es) of interest. Recently, efforts to reduce development and maintenance costs of AI systems have focused on using machine learning techniques to extract knowledge from existing databases. Research is described in the area of knowledge extraction using a class of machine learning techniques called decision-tree classifier systems. Results of this research suggest ways of performing knowledge extraction which may be applied in numerous situations. In addition, a measurement called the concept strength metric (CSM) is described which can be used to determine how well the resulting decision tree can differentiate between the concepts it has learned. The CSM can be used to determine whether or not additional knowledge needs to be extracted from the database. An experiment involving real world data is presented to illustrate the concepts described.

  11. Snow event classification with a 2D video disdrometer - A decision tree approach

    NASA Astrophysics Data System (ADS)

    Bernauer, F.; Hürkamp, K.; Rühm, W.; Tschiersch, J.

    2016-05-01

    Snowfall classification according to crystal type or degree of riming of the snowflakes is important for many atmospheric processes, e.g. wet deposition of aerosol particles. 2D video disdrometers (2DVD) have recently proved their capability to measure microphysical parameters of snowfall. The present work aims to classify snowfall according to microphysical properties of single hydrometeors (e.g. shape and fall velocity) measured by means of a 2DVD. The constraints on the shape and velocity parameters, which are used in a decision tree for classification of the 2DVD measurements, are derived from detailed on-site observations combining automatic 2DVD classification with visual inspection. The developed decision tree algorithm subdivides the detected events into three classes of dominating crystal type (single crystals, complex crystals and pellets) and three classes of dominating degree of riming (weak, moderate and strong). The classification results for the crystal type were validated with an independent data set, demonstrating the unambiguity of the classification. In addition, for three long-term events, good agreement of the classification results with independently measured maximum dimension of snowflakes, snowflake bulk density and ambient temperature was found. The developed classification algorithm is applicable for wind speeds below 5.0 m s⁻¹ and has the advantage of being easily implemented by other users.

  12. Cardiovascular Dysautonomias Diagnosis Using Crisp and Fuzzy Decision Tree: A Comparative Study.

    PubMed

    Kadi, Ilham; Idri, Ali

    2016-01-01

    Decision trees (DTs) are one of the most popular techniques for learning classification systems, especially when learning from discrete examples. In the real world, much data occurs in fuzzy form, so a DT must be able to deal with such fuzzy data. In fact, integrating fuzzy logic when dealing with imprecise and uncertain data reduces uncertainty and provides the ability to model fine knowledge details. In this paper, a fuzzy decision tree (FDT) algorithm was applied to a dataset extracted from the ANS (Autonomic Nervous System) unit of the Moroccan university hospital Avicenne. This unit specializes in performing several dynamic tests to diagnose patients with autonomic disorders and suggest appropriate treatment. A set of fuzzy classifiers was generated using FID 3.4. The error rates of the generated FDTs were calculated to measure their performance. Moreover, a comparison between the error rates obtained using crisp DTs and FDTs was carried out and showed that the results of FDTs were better than those obtained using crisp DTs. PMID:27139378
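
    As a minimal illustration of the fuzzy-input idea (the membership shape and the attribute are hypothetical; FID 3.4 builds the actual fuzzy trees), a triangular membership function maps a crisp measurement to a degree of membership in a linguistic term:

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership: 0 at a, rising to 1 at peak b, falling to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy set "normal heart rate" peaking at 75 bpm:
print(triangular(70, 50, 75, 100))  # → 0.8
```

    A fuzzy decision tree propagates such partial memberships down several branches at once, instead of committing to a single crisp threshold.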

  13. Prediction of Antimicrobial Activity of Synthetic Peptides by a Decision Tree Model

    PubMed Central

    Lira, Felipe; Perez, Pedro S.; Baranauskas, José A.

    2013-01-01

    Antimicrobial resistance is a persistent problem in the public health sphere. Recent attempts to find effective substitutes to combat infections have been directed at identifying natural antimicrobial peptides in order to circumvent resistance to commercial antibiotics. This study describes the development of synthetic peptides with antimicrobial activity, created in silico by site-directed mutation modeling using wild-type peptides as scaffolds for the mutations. Fragments of antimicrobial peptides were used for modeling with molecular modeling computational tools. To analyze these peptides, a decision tree model was created that indicated the range of microorganism types on which each peptide can exert biological activity. The decision tree model was built using physicochemical properties of known antimicrobial peptides available in the Antimicrobial Peptide Database (APD). The two most promising peptides were synthesized, and antimicrobial assays showed inhibitory activity against Gram-positive and Gram-negative bacteria. Colossomin C and colossomin D were the most inhibitory peptides at 5 μg/ml against Staphylococcus aureus and Escherichia coli. The methods described in this work and the results obtained are useful for the identification and development of new compounds with antimicrobial activity through the use of computational tools. PMID:23455341

  14. Analysis of acid rain patterns in northeastern China using a decision tree method

    NASA Astrophysics Data System (ADS)

    Zhang, Xiuying; Jiang, Hong; Jin, Jiaxin; Xu, Xiaohua; Zhang, Qingxin

    2012-01-01

    Acid rain is a major regional-scale environmental problem in China. To control acid rain pollution and to protect the ecological environment, it is urgent to document acid rain patterns in the various regions of China. Taking Liaoning Province as the study area, the present work focused on the spatial and temporal variations of acid rain in northeastern China. It presents a means of predicting the occurrence of acid rain using geographic position, terrain characteristics, routinely monitored meteorological factors and column concentrations of atmospheric SO2 and NO2. The analysis applies a decision tree approach to the foregoing observation data. Results showed that: (1) acid rain occurred at 17 of the 81 monitoring stations in Liaoning Province, with the frequency of acid rain ranging from 0 to 84.38%; (2) summer had the most acid rain occurrences, followed by spring and autumn, and winter had the least; (3) the total accuracy for the simulation of precipitation pH (pH ≤ 4.5, 4.5 < pH ≤ 5.6, and pH > 5.6) was 98.04% using the decision tree method known as C5. The simulation results also indicated that the distance to the coastline, elevation, wind direction, wind speed, rainfall amount, atmospheric pressure, and the precursors of acid rain all have a strong influence on the occurrence of acid rain in northeastern China.
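
    The three pH classes used in the study form a one-dimensional decision rule; a minimal sketch (the class labels are my paraphrase, not the paper's exact terminology):

```python
def ph_class(ph):
    """Bucket precipitation pH into the three classes from the study:
    pH <= 4.5, 4.5 < pH <= 5.6, and pH > 5.6."""
    if ph <= 4.5:
        return "acid rain"
    if ph <= 5.6:
        return "weak acid rain"
    return "non-acid rain"

print([ph_class(p) for p in (4.2, 5.0, 6.3)])
```

    The C5 tree in the paper predicts this class from predictors such as distance to coastline, elevation, wind, rainfall amount, pressure, and SO2/NO2 column concentrations.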

  15. Generation of 2D Land Cover Maps for Urban Areas Using Decision Tree Classification

    NASA Astrophysics Data System (ADS)

    Höhle, J.

    2014-09-01

    A 2D land cover map can be generated automatically and efficiently from high-resolution multispectral aerial images. First, a digital surface model is produced, and each cell of the elevation model is then supplemented with attributes. A decision tree classification is applied to extract map objects such as buildings, roads, grassland, trees, hedges, and walls from such an "intelligent" point cloud. The decision tree is derived from training areas whose borders are digitized on top of a false-colour orthoimage. The produced 2D land cover map with six classes is subsequently refined by using image analysis techniques. The proposed methodology is described step by step. The classification, assessment, and refinement are carried out with the open-source software "R"; the dense and accurate digital surface model is generated with the "Match-T DSM" program of the Trimble Company. A practical example of 2D land cover map generation is carried out, using images of a multispectral medium-format aerial camera covering an urban area in Switzerland. The assessment of the produced land cover map is based on class-wise stratified sampling, where reference values of samples are determined by means of stereo-observations of false-colour stereopairs. The stratified statistical assessment of the produced land cover map with six classes, based on 91 points per class, reveals a high thematic accuracy for the classes "building" (99 %, 95 % CI: 95 %-100 %) and "road and parking lot" (90 %, 95 % CI: 83 %-95 %). Some other accuracy measures (overall accuracy, kappa value) and their 95 % confidence intervals are derived as well. The proposed methodology has a high potential for automation and fast processing and may be applied to other scenes and sensors.
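
    A decision tree over per-cell attributes such as height above ground and a spectral index might look like the following toy example (attributes and thresholds are hypothetical; the paper derives its tree from the digitized training areas):

```python
def classify_cell(height_above_ground, ndvi):
    """Toy two-attribute decision tree for 2D land cover classes."""
    if height_above_ground > 2.5:                      # elevated objects
        return "building" if ndvi < 0.3 else "tree"
    return "road" if ndvi < 0.3 else "grassland"       # ground-level objects

print(classify_cell(8.0, 0.1))  # → building
print(classify_cell(0.1, 0.6))  # → grassland
```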

  16. Determinants of farmers' tree-planting investment decisions as a degraded landscape management strategy in the central highlands of Ethiopia

    NASA Astrophysics Data System (ADS)

    Gessesse, Berhan; Bewket, Woldeamlak; Bräuning, Achim

    2016-04-01

    Land degradation due to lack of sustainable land management practices is one of the critical challenges in many developing countries, including Ethiopia. This study explored the major determinants of farm-level tree-planting decisions as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree-planting decisions (χ2 = 37.29, df = 15, P < 0.001), indicating that the considered predictor variables jointly influenced farmers' decisions to plant trees as a land management strategy. The findings of the study demonstrated that the adoption of tree-growing decisions by local land users was a function of a wide range of biophysical, institutional, socioeconomic and household-level factors. In particular, household size, productive labour force availability, the disparity of schooling age, level of perception of the process of deforestation and the current land tenure system had a critical influence on tree-growing investment decisions in the study watershed. Meanwhile, the processes of land-use conversion and land degradation were serious, which in turn has had adverse effects on agricultural productivity, local food security and the poverty trap nexus. Hence, the study recommended that devising and implementing sustainable land management policy options would enhance ecological restoration and livelihood sustainability in the study watershed.

  17. Determinants of farmers' tree planting investment decision as a degraded landscape management strategy in the central highlands of Ethiopia

    NASA Astrophysics Data System (ADS)

    Gessesse, B.; Bewket, W.; Bräuning, A.

    2015-11-01

    Land degradation due to lack of sustainable land management practices is one of the critical challenges in many developing countries, including Ethiopia. This study explores the major determinants of farm-level tree-planting decisions as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree-planting decisions (Chi-square = 37.29, df = 15, P < 0.001), suggesting that the considered predictor variables jointly influenced farmers' decisions to plant trees as a land management strategy. The findings of the study show that local land users' willingness to adopt tree growing is a function of a wide range of biophysical, institutional, socioeconomic and household-level factors; in particular, household size, productive labour force availability, the disparity of schooling age, level of perception of the process of deforestation and the current land tenure system had a positive and significant influence on tree-growing investment decisions in the study watershed. Meanwhile, the processes of land-use conversion and land degradation are serious, which in turn have had adverse effects on agricultural productivity, local food security and the poverty trap nexus. Hence, devising and implementing sustainable and integrated land management policy options would enhance ecological restoration and livelihood sustainability in the study watershed.

  18. Multi-output decision trees for lesion segmentation in multiple sclerosis

    NASA Astrophysics Data System (ADS)

    Jog, Amod; Carass, Aaron; Pham, Dzung L.; Prince, Jerry L.

    2015-03-01

    Multiple Sclerosis (MS) is a disease of the central nervous system in which the protective myelin sheath of the neurons is damaged. MS leads to the formation of lesions, predominantly in the white matter of the brain and the spinal cord. The number and volume of lesions visible in magnetic resonance imaging (MRI) are important criteria for diagnosing and tracking the progression of MS. Locating and delineating lesions manually requires the tedious and expensive efforts of highly trained raters. In this paper, we propose an automated algorithm to segment lesions in MR images using multi-output decision trees. We evaluated our algorithm on the publicly available MICCAI 2008 MS Lesion Segmentation Challenge training dataset of 20 subjects and showed improved results in comparison to state-of-the-art methods. We also evaluated our algorithm on an in-house dataset of 49 subjects, with a true positive rate of 0.41 and a positive predictive value of 0.36.

  19. Decision-Tree-based data mining and rule induction for predicting and mapping soil bacterial diversity.

    PubMed

    Kim, Kangsuk; Yoo, Keunje; Ki, Dongwon; Son, Il Suh; Oh, Kyong Joo; Park, Joonhong

    2011-07-01

    Soil microbial ecology plays a significant role in global ecosystems. Nevertheless, methods of model prediction and mapping have yet to be established for soil microbial ecology. The present study was undertaken to develop an artificial-intelligence- and geographical information system (GIS)-integrated framework for predicting and mapping soil bacterial diversity using pre-existing environmental geospatial database information, and to further evaluate the applicability of soil bacterial diversity mapping for planning the construction of eco-friendly roads. Using stratified random sampling, soil bacterial diversity was measured in 196 soil samples in a forest area where construction of an eco-friendly road was planned. Model accuracy, coherence analyses, and tree analysis were systematically performed, and a four-class discretized decision tree (DT) with ordinary pair-wise partitioning (OPP) was selected as the optimal model among the five tested DT model variants. GIS-based simulations of the optimal DT model with varying weights assigned to soil ecological quality showed that the inclusion of soil ecology among the environmental components considered in environmental impact assessment significantly affects the spatial distributions of overall environmental quality values as well as the determination of an environmentally optimized road route. This work suggests a guideline for using systematic accuracy, coherence, and tree analyses to select an optimal DT model from multiple candidate model variants, and demonstrates the applicability of the OPP-improved DT integrated with GIS in rule induction for mapping bacterial diversity. These findings also provide implications regarding the significance of soil microbial ecology in environmental impact assessment and eco-friendly construction planning. PMID:21072585

  20. Identification of Water Bodies in a Landsat 8 OLI Image Using a J48 Decision Tree

    PubMed Central

    Acharya, Tri Dev; Lee, Dong Ha; Yang, In Tae; Lee, Jae Kang

    2016-01-01

    Water bodies are essential to humans and other forms of life. Identification of water bodies can be useful in various ways, including estimation of water availability, demarcation of flooded regions, change detection, and so on. In past decades, Landsat satellite sensors have been used for land use classification and water body identification. Due to the introduction of the new Operational Land Imager (OLI) sensor on Landsat 8, with a high spectral resolution and improved signal-to-noise ratio, the quality of imagery sensed by Landsat 8 has improved, enabling better characterization of land cover and increasing data size. Therefore, it is necessary to explore the most appropriate and practical water identification methods that take advantage of the improved image quality and use the fewest inputs based on the original OLI bands. The objective of the study is to explore the potential of a J48 decision tree (JDT) in identifying water bodies using reflectance bands from Landsat 8 OLI imagery. J48 is an open-source decision tree. The test site for the study is in the Northern Han River Basin, which is located in Gangwon province, Korea. Training data with individual bands were used to develop the JDT model, which was later applied to the whole study area. The performance of the model was statistically analysed using the kappa statistic and area under the curve (AUC). The results were compared with five other known water identification methods using a confusion matrix and related statistics. Almost all the methods showed high accuracy, and the JDT was successfully applied to the OLI image using only four bands, where the new additional deep blue band of OLI was found to have the third-highest information gain. Thus, the JDT can be a good method for water body identification based on images with improved resolution and increased size. PMID:27420067
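
    The information-gain ranking mentioned for the OLI bands can be sketched as follows (band values and the threshold are made up for illustration; J48 searches candidate thresholds automatically):

```python
import math

def entropy(ys):
    """Shannon entropy of a label sequence, in bits."""
    n = len(ys)
    return -sum((ys.count(c) / n) * math.log2(ys.count(c) / n) for c in set(ys))

def info_gain(xs, ys, threshold):
    """Information gain of splitting numeric feature xs at a threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    parts = [p for p in (left, right) if p]
    return entropy(ys) - sum(len(p) / n * entropy(p) for p in parts)

# Hypothetical reflectances for one band over water and land pixels:
band = [0.02, 0.03, 0.04, 0.21, 0.25, 0.30]
label = ["water", "water", "water", "land", "land", "land"]
print(info_gain(band, label, 0.1))  # → 1.0
```

    Ranking each band by its best achievable gain is how a band such as OLI's deep blue can be found to carry the third-highest information gain.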

  1. Bayesian decision tree for the classification of the mode of motion in single-molecule trajectories.

    PubMed

    Türkcan, Silvan; Masson, Jean-Baptiste

    2013-01-01

    Membrane proteins move in heterogeneous environments with spatially (and sometimes temporally) varying friction and with biochemical interactions with various partners. It is important to reliably distinguish different modes of motion to improve our knowledge of membrane architecture and to understand the nature of interactions between membrane proteins and their environments. Here, we present an analysis technique for single-molecule tracking (SMT) trajectories that can determine the model of motion that best matches observed trajectories. The method is based on Bayesian inference to calculate the posterior probability of an observed trajectory under a given model. Information theory criteria, such as the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and modified AIC (AICc), are used to select the preferred model. The considered group of models includes free Brownian motion and confined motion in 2nd- or 4th-order potentials. We determine the best information criteria for classifying trajectories, test their limits through simulations matching large sets of experimental conditions, and build a decision tree. This decision tree first uses the BIC to distinguish between free Brownian motion and confined motion; in a second step, it classifies the confining potential further using the AIC. We apply the method to experimental Clostridium perfringens [Formula: see text]-toxin (CP[Formula: see text]T) receptor trajectories to show that these receptors are confined by a spring-like potential. An adaptation of this technique was applied in a sliding window along the temporal dimension of the trajectory. We applied this adaptation to experimental CP[Formula: see text]T trajectories that lose confinement due to disaggregation of confining domains. This new technique adds another dimension to the discussion of SMT data. 
The mode of motion of a receptor might hold more biologically relevant information than the diffusion
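
    The information criteria at the heart of this decision tree are simple penalized log-likelihoods; a sketch with made-up fit values (the real log-likelihoods come from Bayesian inference over each motion model):

```python
import math

def aic(log_l, k):
    """Akaike information criterion for a fit with k free parameters."""
    return 2 * k - 2 * log_l

def aicc(log_l, k, n):
    """AIC with the small-sample correction for n observations."""
    return aic(log_l, k) + 2 * k * (k + 1) / (n - k - 1)

def bic(log_l, k, n):
    """Bayesian information criterion; penalizes parameters more strongly as n grows."""
    return k * math.log(n) - 2 * log_l

# Hypothetical fits of one trajectory (n = 100 displacements):
models = {
    "free Brownian": {"log_l": -120.0, "k": 1},
    "2nd-order potential": {"log_l": -112.0, "k": 3},
    "4th-order potential": {"log_l": -111.5, "k": 5},
}
n = 100
scores = {name: bic(m["log_l"], m["k"], n) for name, m in models.items()}
print(min(scores, key=scores.get))  # the lowest criterion value wins
```

    The paper's decision tree applies the BIC first (Brownian vs. confined) and the AIC afterwards (2nd- vs. 4th-order potential).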

  2. Identification of Water Bodies in a Landsat 8 OLI Image Using a J48 Decision Tree.

    PubMed

    Acharya, Tri Dev; Lee, Dong Ha; Yang, In Tae; Lee, Jae Kang

    2016-01-01

    Water bodies are essential to humans and other forms of life. Identification of water bodies can be useful in various ways, including estimation of water availability, demarcation of flooded regions, change detection, and so on. In past decades, Landsat satellite sensors have been used for land use classification and water body identification. Due to the introduction of the new Operational Land Imager (OLI) sensor on Landsat 8, with a high spectral resolution and improved signal-to-noise ratio, the quality of imagery sensed by Landsat 8 has improved, enabling better characterization of land cover and increasing data size. Therefore, it is necessary to explore the most appropriate and practical water identification methods that take advantage of the improved image quality and use the fewest inputs based on the original OLI bands. The objective of the study is to explore the potential of a J48 decision tree (JDT) in identifying water bodies using reflectance bands from Landsat 8 OLI imagery. J48 is an open-source decision tree. The test site for the study is in the Northern Han River Basin, which is located in Gangwon province, Korea. Training data with individual bands were used to develop the JDT model, which was later applied to the whole study area. The performance of the model was statistically analysed using the kappa statistic and area under the curve (AUC). The results were compared with five other known water identification methods using a confusion matrix and related statistics. Almost all the methods showed high accuracy, and the JDT was successfully applied to the OLI image using only four bands, where the new additional deep blue band of OLI was found to have the third-highest information gain. Thus, the JDT can be a good method for water body identification based on images with improved resolution and increased size. PMID:27420067

  3. A data mining approach to optimize pellets manufacturing process based on a decision tree algorithm.

    PubMed

    Ronowicz, Joanna; Thommes, Markus; Kleinebudde, Peter; Krysiński, Jerzy

    2015-06-20

    The present study focuses on a thorough analysis of the cause-effect relationships between pellet formulation characteristics (pellet composition as well as process parameters) and a selected quality attribute of the final product. The quality of the pellets was expressed by their shape, quantified by the aspect ratio. A data matrix for chemometric analysis consisted of 224 pellet formulations prepared with eight different active pharmaceutical ingredients and several various excipients, using different extrusion/spheronization process conditions. The data set contained 14 input variables (both formulation and process variables) and one output variable (pellet aspect ratio). A tree regression algorithm consistent with the Quality by Design concept was applied to obtain deeper understanding and knowledge of the formulation and process parameters affecting final pellet sphericity. A clear, interpretable set of decision rules was generated. The spheronization speed, spheronization time, number of holes and water content of the extrudate were recognized as the key factors influencing pellet aspect ratio. The most spherical pellets were achieved by using a large number of holes during extrusion, a high spheronizer speed and a longer spheronization time. The described data mining approach enhances knowledge about the pelletization process and simultaneously facilitates the search for the optimal process conditions necessary to achieve ideally spherical pellets, resulting in good flow characteristics. This data mining approach can be taken into consideration by industrial formulation scientists to support rational decision making in the field of pellet technology. PMID:25835791

  4. IntelliHealth: A medical decision support application using a novel weighted multi-layer classifier ensemble framework.

    PubMed

    Bashir, Saba; Qamar, Usman; Khan, Farhan Hassan

    2016-02-01

    Accuracy plays a vital role in the medical field, as it concerns the life of an individual. Extensive research has been conducted on disease classification and prediction using machine learning techniques. However, there is no agreement on which classifier produces the best results: a specific classifier may be better than others for one dataset, while another classifier could perform better on some other dataset. Ensembles of classifiers have proven to be an effective way to improve classification accuracy. In this research we present an ensemble framework with multi-layer classification using enhanced bagging and optimized weighting. The proposed model, called "HM-BagMoov", overcomes the limitations of conventional performance bottlenecks by utilizing an ensemble of seven heterogeneous classifiers. The framework is evaluated on five different heart disease datasets, four breast cancer datasets, two diabetes datasets, two liver disease datasets and one hepatitis dataset obtained from public repositories. The analysis of the results shows that the ensemble framework achieved the highest accuracy, sensitivity and F-measure when compared with individual classifiers for all the diseases, and also achieved the highest accuracy when compared with state-of-the-art techniques. An application named "IntelliHealth" was also developed based on the proposed model, which may be used by hospitals/doctors for diagnostic advice. PMID:26703093
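
    The optimized-weighting idea can be illustrated by a weighted majority vote over heterogeneous base classifiers (a generic sketch, not the HM-BagMoov implementation):

```python
def weighted_vote(predictions, weights):
    """Combine class predictions from several classifiers by weighted vote.

    predictions: one predicted label per classifier;
    weights: matching classifier weights (e.g. validation accuracies).
    """
    totals = {}
    for label, w in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + w
    return max(totals, key=totals.get)

# Three hypothetical base classifiers weighted by validation accuracy:
print(weighted_vote(["disease", "healthy", "disease"], [0.92, 0.85, 0.78]))  # → disease
```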

  5. A Decision-Tree-Oriented Guidance Mechanism for Conducting Nature Science Observation Activities in a Context-Aware Ubiquitous Learning

    ERIC Educational Resources Information Center

    Hwang, Gwo-Jen; Chu, Hui-Chun; Shih, Ju-Ling; Huang, Shu-Hsien; Tsai, Chin-Chung

    2010-01-01

    A context-aware ubiquitous learning environment is an authentic learning environment with personalized digital supports. While showing the potential of applying such a learning environment, researchers have also indicated the challenges of providing adaptive and dynamic support to individual students. In this paper, a decision-tree-oriented…

  6. What Satisfies Students? Mining Student-Opinion Data with Regression and Decision-Tree Analysis. AIR 2002 Forum Paper.

    ERIC Educational Resources Information Center

    Thomas, Emily H.; Galambos, Nora

    To investigate how students' characteristics and experiences affect satisfaction, this study used regression and decision-tree analysis with the CHAID algorithm to analyze student opinion data from a sample of 1,783 college students. A data-mining approach identifies the specific aspects of students' university experience that most influence three…

  7. Model-independent evaluation of tumor markers and a logistic-tree approach to diagnostic decision support.

    PubMed

    Ni, Weizeng; Huang, Samuel H; Su, Qiang; Shi, Jinghua

    2014-01-01

    The sensitivity and specificity of individual tumor markers hardly meet clinical requirements. This challenge has given rise to many efforts, e.g., combining multiple tumor markers and employing machine learning algorithms. However, results from different studies are often inconsistent, which is partially attributed to the use of different evaluation criteria. Also, the wide use of model-dependent validation leads to a high possibility of data overfitting when complex models are used for diagnosis. We propose two model-independent criteria, namely area under the curve (AUC) and Relief, to evaluate the diagnostic value of individual and multiple tumor markers, respectively. For diagnostic decision support, we propose the use of the logistic-tree, which combines decision trees and logistic regression. Application to a colorectal cancer dataset shows that the proposed evaluation criteria produce results that are consistent with current knowledge. Furthermore, the simple and highly interpretable logistic-tree has diagnostic performance that is competitive with other, more complex models. PMID:25516124
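
    The model-independent AUC criterion has a direct rank-statistic form: it equals the probability that a randomly chosen positive case scores above a randomly chosen negative one. A minimal sketch with made-up marker values:

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney U statistic (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical tumor-marker levels in diseased vs. healthy samples:
print(auc([3.1, 2.4, 1.9, 2.8], [1.2, 2.0, 1.5, 1.8]))  # → 0.9375
```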

  8. Proposal of a Clinical Decision Tree Algorithm Using Factors Associated with Severe Dengue Infection

    PubMed Central

    Hussin, Narwani; Cheah, Wee Kooi; Ng, Kee Sing; Muninathan, Prema

    2016-01-01

    Background WHO’s new classification in 2009 (dengue with or without warning signs, and severe dengue) has necessitated large numbers of hospital admissions of dengue patients, which in turn has imposed a huge economic and physical burden on many hospitals around the globe, particularly in South East Asia and Malaysia, where the disease has seen a rapid surge in numbers in recent years. The lack of a simple tool to differentiate mild from life-threatening infection has led to unnecessary hospitalization of dengue patients. Methods We conducted a single-centre, retrospective study involving serologically confirmed dengue fever patients admitted to a single ward in Hospital Kuala Lumpur, Malaysia. Data were collected for 4 months from February to May 2014. Sociodemography, co-morbidity, days of illness before admission, symptoms, warning signs, vital signs and laboratory results were all recorded. Descriptive statistics were tabulated, and simple and multiple logistic regression analyses were performed to determine significant risk factors associated with severe dengue. Results 657 patients with confirmed dengue were analysed, of whom 59 (9.0%) had severe dengue. Overall, the commonest warning signs were vomiting (36.1%) and abdominal pain (32.1%). Pre-existing co-morbidity, vomiting, diarrhoea, pleural effusion, low systolic blood pressure, high haematocrit, low albumin and high urea were found to be significant risk factors for severe dengue using simple logistic regression. However, with multiple logistic regression, the only significant risk factors for severe dengue were vomiting, pleural effusion, and low systolic blood pressure. Using those 3 risk factors, we plotted an algorithm for predicting severe dengue. When compared to the classification of severe dengue based on the WHO criteria, the decision tree algorithm had a sensitivity of 0.81, a specificity of 0.54, a positive predictive value of 0.16 and a negative predictive value of 0.96. 
Conclusion The decision tree algorithm proposed
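
    The reported performance figures follow from standard confusion-matrix arithmetic; a sketch with purely illustrative counts (the abstract reports the resulting metrics, not the underlying confusion matrix):

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positives among all severe cases
        "specificity": tn / (tn + fp),   # true negatives among all non-severe cases
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Illustrative counts chosen to mirror the reported sensitivity and specificity:
m = diagnostic_metrics(tp=81, fp=46, fn=19, tn=54)
print(round(m["sensitivity"], 2), round(m["specificity"], 2))  # → 0.81 0.54
```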

  9. Ensembl 2012.

    PubMed

    Flicek, Paul; Amode, M Ridwan; Barrell, Daniel; Beal, Kathryn; Brent, Simon; Carvalho-Silva, Denise; Clapham, Peter; Coates, Guy; Fairley, Susan; Fitzgerald, Stephen; Gil, Laurent; Gordon, Leo; Hendrix, Maurice; Hourlier, Thibaut; Johnson, Nathan; Kähäri, Andreas K; Keefe, Damian; Keenan, Stephen; Kinsella, Rhoda; Komorowska, Monika; Koscielny, Gautier; Kulesha, Eugene; Larsson, Pontus; Longden, Ian; McLaren, William; Muffato, Matthieu; Overduin, Bert; Pignatelli, Miguel; Pritchard, Bethan; Riat, Harpreet Singh; Ritchie, Graham R S; Ruffier, Magali; Schuster, Michael; Sobral, Daniel; Tang, Y Amy; Taylor, Kieron; Trevanion, Stephen; Vandrovcova, Jana; White, Simon; Wilson, Mark; Wilder, Steven P; Aken, Bronwen L; Birney, Ewan; Cunningham, Fiona; Dunham, Ian; Durbin, Richard; Fernández-Suarez, Xosé M; Harrow, Jennifer; Herrero, Javier; Hubbard, Tim J P; Parker, Anne; Proctor, Glenn; Spudich, Giulietta; Vogel, Jan; Yates, Andy; Zadissa, Amonida; Searle, Stephen M J

    2012-01-01

    The Ensembl project (http://www.ensembl.org) provides genome resources for chordate genomes with a particular focus on human genome data as well as data for key model organisms such as mouse, rat and zebrafish. Five additional species were added in the last year including gibbon (Nomascus leucogenys) and Tasmanian devil (Sarcophilus harrisii) bringing the total number of supported species to 61 as of Ensembl release 64 (September 2011). Of these, 55 species appear on the main Ensembl website and six species are provided on the Ensembl preview site (Pre!Ensembl; http://pre.ensembl.org) with preliminary support. The past year has also seen improvements across the project. PMID:22086963

  10. Application Of Decision Tree Approach To Student Selection Model- A Case Study

    NASA Astrophysics Data System (ADS)

    Harwati; Sudiya, Amby

    2016-01-01

    The main purpose of the institution is to provide quality education to students and to improve the quality of managerial decisions. One way to improve the quality of students is to make the selection of new students more selective. This research takes as its case the selection of new students at the Islamic University of Indonesia, Yogyakarta, Indonesia. One of the university's selection routes is administrative filtering based on the records of prospective students at high school, without a written test. Currently, that kind of selection has no standard model or criteria. Selection is done only by comparing candidates' application files, so subjective assessment is very likely because of the lack of standard criteria that can differentiate the quality of one student from another. By applying data mining classification techniques, a selection model for new students can be built that includes criteria with certain standards, such as the area of origin, the status of the school, the average grade and so on. These criteria are determined using rules that emerge from classifying the academic achievement (GPA) of students in previous years who entered the university through the same route. The decision tree method with the C4.5 algorithm is used here. The results show that students given priority for admission are those who meet the following criteria: came from the island of Java, attended a public school, majored in science, had an average grade above 75, and had at least one achievement during their study in high school.

  11. An Efficient Ensemble Learning Method for Gene Microarray Classification

    PubMed Central

    Shadgar, Bita

    2013-01-01

    Gene microarray analysis and classification have proved an effective way to diagnose diseases and cancers. However, it has also been revealed that basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using the RotBoost ensemble methodology. This method combines the Rotation Forest and AdaBoost techniques, and thereby preserves both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, five different feature selection algorithms are considered. To assess the efficiency of RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results have revealed that the combination of the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost. PMID:24024194

  12. A Low Complexity System Based on Multiple Weighted Decision Trees for Indoor Localization.

    PubMed

    Sánchez-Rodríguez, David; Hernández-Morera, Pablo; Quinteiro, José Ma; Alonso-González, Itziar

    2015-01-01

    Indoor position estimation has become an attractive research topic due to growing interest in location-aware services. Nevertheless, no solution has been found that satisfies both accuracy and system-complexity requirements. From the perspective of lightweight mobile devices, these are extremely important characteristics, because both processor power and energy availability are limited. Hence, an indoor localization system with high computational complexity can drain the battery completely within a few hours. In our research, we use a data mining technique named boosting to develop a localization system based on multiple weighted decision trees to predict the device location, since this approach offers high accuracy and low computational complexity. The localization system is built using a dataset from sensor fusion, which combines the strengths of radio signals from different wireless local area network access points with device orientation information from a digital compass built into the mobile device, so that extra sensors are unnecessary. Experimental results indicate that the proposed system achieves substantial improvements in computational complexity over the widely used traditional fingerprinting methods, as well as better accuracy. PMID:26110413

  13. Effect of training characteristics on object classification: An application using Boosted Decision Trees

    NASA Astrophysics Data System (ADS)

    Sevilla-Noarbe, I.; Etayo-Sotos, P.

    2015-06-01

    We present an application of a particular machine-learning method (Boosted Decision Trees, BDTs, using AdaBoost) to separate stars and galaxies in photometric images using their catalog characteristics. BDTs are a well-established machine learning technique used for classification purposes. They have been widely used, especially in the field of particle and astroparticle physics, and we use them here in an optical astronomy application. This algorithm is able to improve on simple thresholding cuts on standard separation variables, which may be affected by local effects such as blending or badly calculated background levels, or which do not include information from other bands. The improvements are shown using the Sloan Digital Sky Survey Data Release 9, with respect to the type photometric classifier. We obtain a factor of 2-4 improvement in the impurity of the galaxy sample for this particular dataset, adjusting for the same selection efficiency. Another main goal of this study is to assess the effects that different input vectors and training sets have on classification performance, results that are of wider use to other machine learning techniques.
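    The boosting scheme behind BDTs can be sketched from scratch. The toy example below implements discrete AdaBoost with depth-1 threshold "stumps" on a hypothetical 1-D two-class dataset (invented numbers, not survey data): each round picks the stump with the lowest weighted error, then re-weights the examples it misclassified.

```python
import math

def stump_predict(x, threshold, polarity):
    """Depth-1 tree: predict +1 on one side of the threshold, -1 on the other."""
    return polarity if x < threshold else -polarity

def train_adaboost(xs, ys, rounds=5):
    n = len(xs)
    w = [1.0 / n] * n                      # uniform example weights
    ensemble = []                          # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        # Exhaustively pick the stump with the lowest weighted error.
        best = None
        for t in sorted(set(xs)):
            for pol in (+1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if stump_predict(x, t, pol) != y)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        err = max(err, 1e-10)              # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # Re-weight: boost misclassified examples, then renormalise.
        w = [wi * math.exp(-alpha * y * stump_predict(x, t, pol))
             for wi, x, y in zip(w, xs, ys)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy 1-D two-class data that no single threshold cut can separate.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, 1]
model = train_adaboost(xs, ys, rounds=5)
print([predict(model, x) for x in xs] == ys)  # True: the weighted vote fits it
```

    This illustrates the point made in the abstract: each individual cut is a crude thresholding rule, but the weighted vote of several such cuts carves out the interval that no single cut can.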

  14. Object classification in images for Epo doping control based on fuzzy decision trees

    NASA Astrophysics Data System (ADS)

    Bajla, Ivan; Hollander, Igor; Heiss, Dorothea; Granec, Reinhard; Minichmayr, Markus

    2005-02-01

    Erythropoietin (Epo) is a hormone which can be misused as a doping substance. Its detection involves analysis of images containing specific objects (bands), whose position and intensity are critical for doping positivity. Within a research project of the World Anti-Doping Agency (WADA) we are implementing the GASepo software that should serve for Epo testing in doping control laboratories world-wide. For identification of the bands we have developed a segmentation procedure based on a sequence of filters and edge detectors. Whereas all true bands are properly segmented, the procedure generates a relatively high number of false positives (artefacts). To separate these artefacts we proposed a post-segmentation supervised classification using real-valued geometrical measures of the objects. The method is based on Ross Quinlan's ID3 rule-generation method, where a fuzzy representation is used to link linguistic terms to quantitative data. The fuzzy modification of the ID3 method provides a framework that generates fuzzy decision trees, as well as fuzzy sets for the input data. Using the MLTTM software (Machine Learning Framework) we have generated a set of fuzzy rules explicitly describing bands and artefacts. The method eliminated most of the artefacts. The contribution includes a comparison of the obtained misclassification errors with the errors produced by some other statistical classification methods.

  15. A Low Complexity System Based on Multiple Weighted Decision Trees for Indoor Localization

    PubMed Central

    Sánchez-Rodríguez, David; Hernández-Morera, Pablo; Quinteiro, José Ma.; Alonso-González, Itziar

    2015-01-01

    Indoor position estimation has become an attractive research topic due to growing interest in location-aware services. Nevertheless, no solution has been found that satisfies both accuracy and system-complexity requirements. From the perspective of lightweight mobile devices, these are extremely important characteristics, because both processor power and energy availability are limited. Hence, an indoor localization system with high computational complexity can drain the battery completely within a few hours. In our research, we use a data mining technique named boosting to develop a localization system based on multiple weighted decision trees to predict the device location, since this approach offers high accuracy and low computational complexity. The localization system is built using a dataset from sensor fusion, which combines the strengths of radio signals from different wireless local area network access points with device orientation information from a digital compass built into the mobile device, so that extra sensors are unnecessary. Experimental results indicate that the proposed system achieves substantial improvements in computational complexity over the widely used traditional fingerprinting methods, as well as better accuracy. PMID:26110413

  16. Interactive change detection based on dissimilarity image and decision tree classification

    NASA Astrophysics Data System (ADS)

    Wang, Yan; Crouzil, Alain; Puel, Jean-Baptiste

    2015-02-01

    Our study mainly focuses on detecting changed regions in two images of the same scene taken by digital cameras at different times. Images taken by digital cameras generally provide less information than multi-channel remote sensing images. Moreover, application-dependent insignificant changes, such as shadows or clouds, may cause the failure of classical methods based on image differences. The machine learning approach seems promising, but the lack of a sufficient volume of training data for photographic landscape observatories rules out many methods. We therefore investigate in this work the interactive learning approach and provide a discriminative model, a 16-dimensional feature space comprising textural appearance and contextual information. Dissimilarity measures over different neighborhood sizes are used to detect differences within the neighborhood of an image pair. To detect changes between two images, the user designates change and non-change samples (pixel sets) in the images using a selection tool. These data are used to train a classifier with a decision tree training method, which is then applied to all the other pixels of the image pair. The experiments have demonstrated the potential of the proposed approach.

  17. Tailored approach in inguinal hernia repair - decision tree based on the guidelines.

    PubMed

    Köckerling, Ferdinand; Schug-Pass, Christine

    2014-01-01

    The endoscopic procedures TEP and TAPP and the open techniques Lichtenstein, Plug and Patch, and PHS currently represent the gold standard in inguinal hernia repair recommended in the guidelines of the European Hernia Society, the International Endohernia Society, and the European Association of Endoscopic Surgery. Eighty-two percent of experienced hernia surgeons use the "tailored approach," the differentiated use of the several inguinal hernia repair techniques depending on the patient's findings, in an effort to minimize risks. The following differential therapeutic situations must be distinguished in inguinal hernia repair: unilateral in men, unilateral in women, bilateral, scrotal, after previous pelvic and lower abdominal surgery, no general anesthesia possible, recurrence, and emergency surgery. Evidence-based guidelines and consensus conferences of experts give recommendations for the best approach in the individual situation of a patient. This review attempts to summarize the recommendations of the various guidelines and to translate them into a practical decision tree for the daily work of surgeons performing inguinal hernia repair. PMID:25593944

  18. Smart on-board diagnostic decision trees for quantitative aviation equipment and safety procedures validation

    NASA Astrophysics Data System (ADS)

    Ali, Ali H.; Markarian, Garik; Tarter, Alex; Kölle, Rainer

    2010-04-01

    The current trend in high-accuracy aircraft navigation systems is towards using data from one or more inertial navigation subsystems and one or more navigational reference subsystems. The enhancement in fault diagnosis and detection is achieved by computing the minimum mean square estimate of the aircraft states using, for instance, the Kalman filter method. However, this enhancement might degrade if the cause of a subsystem fault has some effect on other subsystems that are calculating the same measurement. One instance of such a case is the tragic incident of Air France Flight 447 in June 2009, where message transmissions in the last moments before the crash indicated inconsistencies in measured airspeed, as reported by Airbus. In this research, we propose the use of a mathematical aircraft model to work out the current states of the airplane and, in turn, to use these states to validate the readings of the navigation equipment through a smart diagnostic decision tree network. Various simulated equipment failures were introduced in a controlled environment to prove the concept of operation. The results showed successful detection of the failing equipment in all cases.

  19. Using Ensemble Decisions and Active Selection to Improve Low-Cost Labeling for Multi-View Data

    NASA Technical Reports Server (NTRS)

    Rebbapragada, Umaa; Wagstaff, Kiri L.

    2011-01-01

    This paper seeks to improve low-cost labeling in terms of training set reliability (the fraction of correctly labeled training items) and test set performance for multi-view learning methods. Co-training is a popular multi-view learning method that combines high-confidence example selection with low-cost (self) labeling. However, co-training with certain base learning algorithms significantly reduces training set reliability, causing an associated drop in prediction accuracy. We propose the use of ensemble labeling to improve reliability in such cases. We also discuss and show promising results on combining low-cost ensemble labeling with active (low-confidence) example selection. We unify these example selection and labeling strategies under collaborative learning, a family of techniques for multi-view learning that we are developing for distributed, sensor-network environments.
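    A minimal sketch of the ensemble-labeling idea, assuming a simple majority vote with an agreement threshold (the threshold and voting scheme here are illustrative, not necessarily the paper's): items with strong voter agreement are self-labeled, while contentious, low-confidence items are exactly the candidates for active selection.

```python
from collections import Counter

def ensemble_label(votes, min_agreement=0.8):
    """Return (label, confident) for one unlabeled item given the votes of an
    ensemble of classifiers; label the item only when at least `min_agreement`
    of the voters agree, otherwise flag it for active (manual) labeling."""
    label, count = Counter(votes).most_common(1)[0]
    confident = count / len(votes) >= min_agreement
    return label, confident

# Five hypothetical view-specific classifiers voting on three items:
items = [
    ["a", "a", "a", "a", "b"],   # strong agreement  -> auto-label
    ["a", "b", "a", "b", "b"],   # split vote        -> send to active selection
    ["b", "b", "b", "b", "b"],   # unanimous         -> auto-label
]
decisions = [ensemble_label(v) for v in items]
print(decisions)  # [('a', True), ('b', False), ('b', True)]
```

    Replacing single-learner self-labeling with this kind of vote is what raises training set reliability: an item enters the training set only when the ensemble, not one classifier, is confident.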

  20. Ensemble-based analysis of Front Range severe convection on 6-7 June 2012: Forecast uncertainty and communication of weather information to Front Range decision-makers

    NASA Astrophysics Data System (ADS)

    Vincente, Vanessa

    The convection-allowing ensemble also showed greater skill in forecasting heavy precipitation amounts in the vicinity of where they were observed during the most active convective period, particularly near urbanized areas. A total of nine Front Range emergency managers (EMs) were interviewed to research how they understood hazardous weather information, and how their perception of forecast uncertainty would influence their decision making following a heavy rain event. Many of the EMs use situational awareness and past experiences with major weather events to guide their emergency planning. They also highly valued their relationship with the National Weather Service, which improves their understanding of weather forecasts and lets them ask questions about the uncertainties. Most of the EMs perceived forecast uncertainty in terms of probability and with the understanding that forecasting the weather is an imprecise science. A greater likelihood of occurrence (implied by a higher probability of precipitation) corresponded to greater confidence that an event was likely to happen. Five probabilistic forecast products were generated from the convection-allowing ensemble output to create a hypothetical warm-season heavy rain event scenario. Responses varied among the EMs as to which products they found most practical or least useful. Most EMs believed that there was a high probability of flooding, as illustrated by the degree of forecasted precipitation intensity. Most confirmed perceiving uncertainty in the different forecast representations, sharing the idea that there is an inherent uncertainty in modeled forecasts. The long-term goal of this research is to develop and add reliable probabilistic forecast products to the "toolbox" of decision-makers to help them better assess hazardous weather information and improve warning notifications and response.

  1. A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements.

    PubMed

    Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As fraudulent financial statements become an increasingly serious problem, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models for comparison. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that issued fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the C5.0 decision tree has the best classification accuracy, 85.71%. PMID:25302338

  2. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    PubMed Central

    Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As fraudulent financial statements become an increasingly serious problem, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models for comparison. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that issued fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the C5.0 decision tree has the best classification accuracy, 85.71%. PMID:25302338

  3. Ant colony optimisation of decision tree and contingency table models for the discovery of gene-gene interactions.

    PubMed

    Sapin, Emmanuel; Keedwell, Ed; Frayling, Tim

    2015-12-01

    In this study, an ant colony optimisation (ACO) algorithm is used to derive near-optimal interactions between a number of single nucleotide polymorphisms (SNPs). This approach is used to discover small numbers of SNPs that are combined into a decision tree or contingency table model. The ACO algorithm is shown to be very robust, as it is able to find statistically discriminatory results with logical-interaction, decision tree, and contingency table models for various numbers of SNPs in the interaction. A large number of the SNPs discovered here have already been identified as related to type II diabetes in large genome-wide association studies in the literature, lending additional confidence to the results. PMID:26577156

  4. ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography

    NASA Astrophysics Data System (ADS)

    Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

    2016-07-01

    Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation, is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on 100 PET scans with known true contours. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), from the tumour volume, tumour peak-to-background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated on 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with a known true contour to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.

  5. ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography.

    PubMed

    Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

    2016-07-01

    Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation, is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on 100 PET scans with known true contours. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), from the tumour volume, tumour peak-to-background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated on 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with a known true contour to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology. PMID:27273293

  6. Chi-squared Automatic Interaction Detection Decision Tree Analysis of Risk Factors for Infant Anemia in Beijing, China

    PubMed Central

    Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin

    2016-01-01

    Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of the population in recent years, available information on infant anemia is inconclusive in large cities of China, especially concerning comparisons between native residents and the floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6–12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. Results: The prevalence of anemia was 12.60%, with a range of 3.47%–40.00% across subgroup characteristics. The CHAID decision tree model demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by the logistic regression model, namely maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, the CHAID decision tree analysis also identified a fourth risk factor, the maternal educational level, with higher overall classification accuracy and a larger area under the receiver operating characteristic curve. Conclusions: The infant anemic status in a metropolis is complex and should be carefully considered by basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of populations with great heterogeneity. Risk factors identified by this study might be meaningful for the early detection and prompt treatment of infant anemia in large cities. PMID:27174328
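    CHAID grows its tree by choosing, at each node, the predictor with the strongest chi-squared association with the outcome (the full algorithm also merges predictor categories and applies Bonferroni-adjusted p-values). A minimal sketch of that core split criterion, using invented counts rather than the study's data:

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a 2x2 contingency table
    [[a, b], [c, d]] of predictor level vs. outcome."""
    (a, b), (c, d) = table
    n = a + b + c + d
    expected = [[(a + b) * (a + c) / n, (a + b) * (b + d) / n],
                [(c + d) * (a + c) / n, (c + d) * (b + d) / n]]
    return sum((obs - exp) ** 2 / exp
               for row, erow in zip(table, expected)
               for obs, exp in zip(row, erow))

# Hypothetical counts: rows = predictor present/absent,
# columns = anemic / not anemic (the numbers are illustrative only).
maternal_anemia = [[30, 70], [96, 904]]    # strongly associated
season_of_birth = [[60, 440], [66, 434]]   # weakly associated
stats = {"maternal_anemia": chi_squared(maternal_anemia),
         "season_of_birth": chi_squared(season_of_birth)}
best = max(stats, key=stats.get)
print(best)  # maternal_anemia -- it yields the larger statistic
```

    The predictor with the largest (significant) statistic becomes the split; CHAID then repeats the test within each resulting subgroup, which is how the multilevel interactions mentioned above emerge.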

  7. Evaluation of the potential allergenicity of the enzyme microbial transglutaminase using the 2001 FAO/WHO Decision Tree.

    PubMed

    Pedersen, Mona H; Hansen, Tine K; Sten, Eva; Seguro, Katsuya; Ohtsuka, Tomoko; Morita, Akiko; Bindslev-Jensen, Carsten; Poulsen, Lars K

    2004-11-01

    All novel proteins must be assessed for their potential allergenicity before they are introduced into the food market. One method to achieve this is the 2001 FAO/WHO Decision Tree recommended for evaluation of proteins from genetically modified organisms (GMOs). It was the aim of this study to investigate the allergenicity of microbial transglutaminase (m-TG) from Streptoverticillium mobaraense. Amino acid sequence similarity to known allergens, pepsin resistance, and detection of protein binding to specific serum immunoglobulin E (IgE) (RAST) have been evaluated as recommended by the decision tree. Allergenicity in the source material was thought unlikely, since no IgE-mediated allergy to any bacteria has been reported. m-TG is fully degraded after 5 min of pepsin treatment. A database search showed that the enzyme has no homology with known allergens, down to a match of six contiguous amino acids, which meets the requirements of the decision tree. However, there is a match at the five contiguous amino acid level to the major codfish allergen Gad c1. The potential cross reactivity between m-TG and Gad c1 was investigated in RAST using sera from 25 documented cod-allergic patients and an extract of raw codfish. No binding between patient IgE and m-TG was observed. It can be concluded that no safety concerns with regard to the allergenic potential of m-TG were identified. PMID:15508178
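    The contiguous-amino-acid criterion applied above is a sliding-window (k-mer) comparison between the novel protein and known allergens. A sketch with invented toy sequences (not the real m-TG or Gad c1) that, like the study's case, share a five-residue window but no six-residue window:

```python
def shared_windows(query, allergen, k):
    """Length-k substrings that `query` shares with an allergen sequence
    (the contiguous-residue matching criterion of the decision tree)."""
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    allergen_kmers = {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
    return query_kmers & allergen_kmers

# Toy fragments (illustrative only, not the real m-TG or Gad c1 sequences).
enzyme = "MSKAELVFRAGDERW"
allergen = "TTYRAGDEKLMPQSV"

print(shared_windows(enzyme, allergen, 6))  # set(): passes the 6-residue test
print(shared_windows(enzyme, allergen, 5))  # {'RAGDE'}: a hit at 5 residues
```

    In practice the query would be screened against every sequence in an allergen database; a hit at the required window length triggers further testing, such as the serum IgE binding assay described above.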

  8. Decision tree supported substructure prediction of metabolites from GC-MS profiles.

    PubMed

    Hummel, Jan; Strehmel, Nadine; Selbig, Joachim; Walther, Dirk; Kopka, Joachim

    2010-06-01

    Gas chromatography coupled to mass spectrometry (GC-MS) is one of the most widespread routine technologies applied to the large scale screening and discovery of novel metabolic biomarkers. However, currently the majority of mass spectral tags (MSTs) remains unidentified due to the lack of authenticated pure reference substances required for compound identification by GC-MS. Here, we accessed the information on reference compounds stored in the Golm Metabolome Database (GMD) to apply supervised machine learning approaches to the classification and identification of unidentified MSTs without relying on library searches. Non-annotated MSTs with mass spectral and retention index (RI) information together with data of already identified metabolites and reference substances have been archived in the GMD. Structural feature extraction was applied to sub-divide the metabolite space contained in the GMD and to define the prediction target classes. Decision tree (DT)-based prediction of the most frequent substructures based on mass spectral features and RI information is demonstrated to result in highly sensitive and specific detections of sub-structures contained in the compounds. The underlying set of DTs can be inspected by the user and are made available for batch processing via SOAP (Simple Object Access Protocol)-based web services. The GMD mass spectral library with the integrated DTs is freely accessible for non-commercial use at http://gmd.mpimp-golm.mpg.de/. All matching and structure search functionalities are available as SOAP-based web services. A XML + HTTP interface, which follows Representational State Transfer (REST) principles, facilitates read-only access to data base entities. PMID:20526350

  9. Accurate and interpretable nanoSAR models from genetic programming-based decision tree construction approaches.

    PubMed

    Oksel, Ceyda; Winkler, David A; Ma, Cai Y; Wilkins, Terry; Wang, Xue Z

    2016-09-01

    The number of engineered nanomaterials (ENMs) being exploited commercially is growing rapidly, due to the novel properties they exhibit. Clearly, it is important to understand and minimize any risks to health or the environment posed by the presence of ENMs. Data-driven models that decode the relationships between the biological activities of ENMs and their physicochemical characteristics provide an attractive means of maximizing the value of scarce and expensive experimental data. Although such structure-activity relationship (SAR) methods have become very useful tools for modelling nanotoxicity endpoints (nanoSAR), they have limited robustness and predictivity and, most importantly, the models they generate are often very difficult to interpret. New computational modelling tools, or new ways of using existing tools, are required to model the relatively sparse and sometimes lower quality data on the biological effects of ENMs. The most commonly used SAR modelling methods work best with large datasets, are not particularly good at feature selection, can be relatively opaque to interpretation, and may not account for nonlinearity in the structure-property relationships. To overcome these limitations, we describe the application of a novel algorithm, a genetic programming-based decision tree construction tool (GPTree), to nanoSAR modelling. We demonstrate the use of GPTree in the construction of accurate and interpretable nanoSAR models by applying it to four diverse literature datasets. We describe the algorithm and compare model results across the four studies. We show that GPTree generates models with accuracies equivalent or superior to those of prior modelling studies on the same datasets. GPTree is a robust, automatic method for generating accurate nanoSAR models, with the important advantages that it works with small datasets, automatically selects descriptors, and provides significantly improved interpretability of models. PMID:26956430

  10. Large unbalanced credit scoring using Lasso-logistic regression ensemble.

    PubMed

    Wang, Hong; Xu, Qingsong; Zhou, Lifeng

    2015-01-01

    Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988
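    The balance-then-bag structure described above can be sketched with a balanced bootstrap and a majority vote. In the sketch below, a nearest-centroid rule on a single feature stands in for the paper's Lasso-logistic base learner, and all data and names are hypothetical:

```python
import random

def balanced_bootstrap(data, n_bags, rng):
    """Each bag pairs ALL minority-class rows with an equal-sized random
    sample of majority-class rows, balancing the class ratio per bag."""
    pos = [r for r in data if r[-1] == 1]          # minority (defaults)
    neg = [r for r in data if r[-1] == 0]          # majority (good loans)
    return [pos + rng.sample(neg, len(pos)) for _ in range(n_bags)]

def centroid_classifier(bag):
    """Stand-in base learner (the paper uses Lasso-logistic regression):
    classify by the nearer class centroid of the single feature."""
    mean = lambda xs: sum(xs) / len(xs)
    c1 = mean([x for x, y in bag if y == 1])
    c0 = mean([x for x, y in bag if y == 0])
    return lambda x: 1 if abs(x - c1) < abs(x - c0) else 0

def ensemble_predict(classifiers, x):
    votes = sum(clf(x) for clf in classifiers)
    return 1 if votes * 2 > len(classifiers) else 0

rng = random.Random(0)
# Hypothetical 1-feature credit data: (debt ratio, default flag), 1:5 imbalance.
data = ([(0.8 + 0.02 * i, 1) for i in range(10)] +     # defaulters
        [(0.2 + 0.01 * i, 0) for i in range(50)])      # non-defaulters
bags = balanced_bootstrap(data, n_bags=11, rng=rng)
clfs = [centroid_classifier(bag) for bag in bags]
print(ensemble_predict(clfs, 0.9), ensemble_predict(clfs, 0.25))  # 1 0
```

    Training each base learner on a balanced bag keeps any single model from simply predicting the majority class, while the vote across differently sampled bags recovers the information in the full majority-class data.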

  11. Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble

    PubMed Central

    Wang, Hong; Xu, Qingsong; Zhou, Lifeng

    2015-01-01

    Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988

  12. Ensembl comparative genomics resources

    PubMed Central

Herrero, Javier; Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J.; Searle, Stephen M. J.; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. PMID:26896847

  13. Ensembl comparative genomics resources.

    PubMed

    Herrero, Javier; Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J; Searle, Stephen M J; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org. PMID:26896847

  14. The creation of a digital soil map for Cyprus using decision-tree classification techniques

    NASA Astrophysics Data System (ADS)

    Camera, Corrado; Zomeni, Zomenia; Bruggeman, Adriana; Noller, Joy; Zissimos, Andreas

    2014-05-01

    Considering the increasing threats soils are experiencing, especially in semi-arid, Mediterranean environments like Cyprus (erosion, contamination, sealing and salinisation), producing a high-resolution, reliable soil map is essential for further soil conservation studies. This study aims to create a 1:50,000 soil map covering the area under the direct control of the Republic of Cyprus (5,760 km2). The study consists of two major steps. The first is the creation of a raster database of predictive variables selected according to the scorpan formula (McBratney et al., 2003). Of particular interest is the possibility of using, as soil properties, data from three older island-wide soil maps and the recently published geochemical atlas of Cyprus (Cohen et al., 2011). Ten highly characterizing elements were selected and used as predictors in the present study. For the other factors, commonly used variables were chosen: temperature and aridity index for climate; total loss on ignition, vegetation and forestry type maps for organic matter; the DEM and related relief derivatives (slope, aspect, curvature, landscape units); bedrock, surficial geology and geomorphology (Noller, 2009) for parent material and age; and a sub-watershed map to better constrain location with respect to parent material sources. In the second step, the digital soil map is created using the Random Forests package in R. Random Forests is a decision tree classification technique in which many trees, instead of a single one, are developed and compared to increase the stability and the reliability of the prediction. The model is trained and verified on areas where a published 1:25,000 soil map obtained from field work is available, and then applied for predictive mapping to the other areas. Preliminary results obtained in a small area in the plain around the city of Lefkosia, where eight different soil classes are present, show that the method performs very well.
The Random Forest approach leads to reproduce soil
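The Random Forests idea this abstract relies on — many trees, each grown on a bootstrap sample with a random subset of predictors, voting on the class — can be sketched as follows. This is a toy illustration with one-split "stumps" standing in for full trees, not the R randomForest package the study used; all names and data are hypothetical.

```python
import random
from collections import Counter

def fit_stump(X, y, feat_ids):
    """Best one-feature threshold split (by misclassification count) over a
    random subset of features; a one-level stand-in for a full CART tree."""
    best = None
    for j in feat_ids:
        for t in sorted({x[j] for x in X}):
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            if not left or not right:
                continue
            lmaj = Counter(left).most_common(1)[0][0]
            rmaj = Counter(right).most_common(1)[0][0]
            err = sum(v != lmaj for v in left) + sum(v != rmaj for v in right)
            if best is None or err < best[0]:
                best = (err, j, t, lmaj, rmaj)
    if best is None:                       # no valid split: constant-leaf tree
        maj = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), maj, maj)
    return best[1:]                        # (feature, threshold, left_label, right_label)

def fit_forest(X, y, n_trees=25, seed=1):
    """Bootstrap the training rows and restrict each tree to a random subset
    of the predictors: the two randomizations that define Random Forests."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = max(1, int(d ** 0.5))              # predictors considered per tree
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        feats = rng.sample(range(d), m)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def forest_predict(forest, xi):
    """Majority vote over the trees."""
    votes = Counter(llab if xi[j] <= t else rlab for j, t, llab, rlab in forest)
    return votes.most_common(1)[0][0]
```

In the study's setting, each row would be a raster cell whose columns are the scorpan predictors (climate, relief, geochemistry, parent material), and the labels the soil classes of the training map.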

  15. Genetic programming based ensemble system for microarray data classification.

    PubMed

    Liu, Kun-Hong; Tong, Muchenxuan; Xie, Shu-Tong; Yee Ng, Vincent To

    2015-01-01

    Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved. PMID:25810748
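The Min, Max, and Average combination operators named in this abstract can be illustrated on per-classifier class-probability vectors. This is a hedged fragment of the combination step only; GPES itself evolves whole ensembles with genetic programming, which this sketch does not attempt, and the function name is illustrative.

```python
def combine(prob_vectors, op):
    """Combine per-classifier class-probability vectors with Min, Max or
    Average, then pick the class with the highest combined score."""
    ops = {
        "min": min,
        "max": max,
        "avg": lambda col: sum(col) / len(col),
    }
    n_classes = len(prob_vectors[0])
    # one score per class, aggregated down the column of base-classifier outputs
    scores = [ops[op]([pv[c] for pv in prob_vectors]) for c in range(n_classes)]
    return scores.index(max(scores))
```

For example, with base-tree outputs `[[0.8, 0.2], [0.6, 0.4]]`, the "avg" operator scores the classes 0.7 vs. 0.3 and predicts class 0.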

  16. Analysis of the impact of recreational trail usage for prioritising management decisions: a regression tree approach

    NASA Astrophysics Data System (ADS)

    Tomczyk, Aleksandra; Ewertowski, Marek; White, Piran; Kasprzak, Leszek

    2016-04-01

    The dual role of many Protected Natural Areas in providing benefits for both conservation and recreation poses challenges for management. Although recreation-based damage to ecosystems can occur very quickly, restoration can take many years. Protecting conservation interests at the same time as providing for recreation requires decisions to be made about how to prioritise and direct management actions. Trails are commonly used to divert visitors from the most important areas of a site, but high visitor pressure can lead to increases in trail width and a concomitant increase in soil erosion. Here we use detailed field data on the condition of recreational trails in Gorce National Park, Poland, as the basis for a regression tree analysis to determine the factors influencing trail deterioration, and link specific trail impacts with environmental, use-related and managerial factors. We distinguished 12 types of trails, characterised by four levels of degradation: (1) trails with an acceptable level of degradation; (2) threatened trails; (3) damaged trails; and (4) heavily damaged trails. Damaged trails were the most vulnerable of all trails and should be prioritised for appropriate conservation and restoration. We also proposed five types of monitoring of recreational trail conditions: (1) rapid inventory of negative impacts; (2) monitoring of visitor numbers and variation in type of use; (3) change-oriented monitoring focusing on sections of trail subjected to changes in type or level of use or to extreme weather events; (4) monitoring of the dynamics of trail conditions; and (5) full assessment of trail conditions, to be carried out every 10-15 years. The application of the proposed framework can enhance the ability of Park managers to prioritise their trail management activities, enhancing trail conditions and visitor safety, while minimising adverse impacts on the conservation value of the ecosystem. A.M.T. was supported by the Polish Ministry of
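The core of a regression tree analysis like the one above is choosing splits that maximise variance reduction in the response (here, a trail-degradation measure). A minimal single-split sketch, with hypothetical names and toy data, under the standard CART criterion:

```python
def best_split(X, y):
    """Single split that maximises variance reduction: the CART criterion
    for regression trees, applied once at the root."""
    def ssq(v):
        # sum of squared deviations from the mean (total impurity of a node)
        if not v:
            return 0.0
        m = sum(v) / len(v)
        return sum((x - m) ** 2 for x in v)

    best = (None, None, ssq(y))           # (feature, threshold, impurity)
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X})[:-1]:
            left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
            right = [yi for xi, yi in zip(X, y) if xi[j] > t]
            imp = ssq(left) + ssq(right)
            if imp < best[2]:
                best = (j, t, imp)
    return best[:2]
```

A full tree recurses on each side of the split; with predictors such as visitor numbers and slope, and trail width as the response, the first split picked this way is the strongest single factor behind deterioration.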

  17. Genetic program based data mining of fuzzy decision trees and methods of improving convergence and reducing bloat

    NASA Astrophysics Data System (ADS)

    Smith, James F., III; Nguyen, ThanhVu H.

    2007-04-01

    A data mining procedure for automatic determination of fuzzy decision tree structure using a genetic program (GP) is discussed. A GP is an algorithm that evolves other algorithms or mathematical expressions. Innovative methods for accelerating convergence of the data mining procedure and reducing bloat are given. In genetic programming, bloat refers to excessive tree growth. It has been observed that the trees in the evolving GP population will grow by a factor of three every 50 generations. When evolving mathematical expressions much of the bloat is due to the expressions not being in algebraically simplest form. So a bloat reduction method based on automated computer algebra has been introduced. The effectiveness of this procedure is discussed. Also, rules based on fuzzy logic have been introduced into the GP to accelerate convergence, reduce bloat and produce a solution more readily understood by the human user. These rules are discussed as well as other techniques for convergence improvement and bloat control. Comparisons between trees created using a genetic program and those constructed solely by interviewing experts are made. A new co-evolutionary method that improves the control logic evolved by the GP by having a genetic algorithm evolve pathological scenarios is discussed. The effect on the control logic is considered. Finally, additional methods that have been used to validate the data mining algorithm are referenced.
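The algebraic bloat-reduction step described above — rewriting evolved expressions into simpler equivalent forms — can be illustrated with a tiny rewrite system over expression trees. This is a stand-in for the automated computer algebra the paper uses; the tuple encoding and the rule set are illustrative only.

```python
# Expressions are nested tuples ("+", a, b) / ("*", a, b); variables are
# strings; constants are ints or floats.
def simplify(e):
    """Bottom-up rewriting with identity rules (a+0 -> a, a*1 -> a, a*0 -> 0)
    and constant folding, shrinking bloated GP expression trees."""
    if not isinstance(e, tuple):
        return e
    op, a, b = e[0], simplify(e[1]), simplify(e[2])
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a + b if op == "+" else a * b      # constant folding
    for x, other in ((a, b), (b, a)):             # rules are symmetric
        if op == "+" and other == 0:
            return x
        if op == "*" and other == 1:
            return x
        if op == "*" and other == 0:
            return 0
    return (op, a, b)
```

A bloated individual such as `("*", ("+", "x", 0), 1)` collapses to just `"x"`, shrinking the tree without changing the function it computes; this is the sense in which algebraic simplification counters bloat.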

  18. Effective Visualization of Temporal Ensembles.

    PubMed

    Hao, Lihua; Healey, Christopher G; Bass, Steffen A

    2016-01-01

    An ensemble is a collection of related datasets, called members, built from a series of runs of a simulation or an experiment. Ensembles are large, temporal, multidimensional, and multivariate, making them difficult to analyze. Another important challenge is visualizing ensembles that vary both in space and time. Initial visualization techniques displayed ensembles with a small number of members, or presented an overview of an entire ensemble, but without potentially important details. Recently, researchers have suggested combining these two directions, allowing users to choose subsets of members to visualize. This manual selection process places the burden on the user to identify which members to explore. We first introduce a static ensemble visualization system that automatically helps users locate interesting subsets of members to visualize. We next extend the system to support analysis and visualization of temporal ensembles. We employ 3D shape comparison, cluster tree visualization, and glyph-based visualization to represent different levels of detail within an ensemble. This strategy is used to provide two approaches for temporal ensemble analysis: (1) segment-based ensemble analysis, to capture important shape-transition time-steps, cluster groups of similar members, and identify common shape changes over time across multiple members; and (2) time-step-based ensemble analysis, which aligns ensemble members in time by combining similar shapes at common time-steps. Both approaches enable users to interactively visualize and analyze a temporal ensemble from different perspectives at different levels of detail. We demonstrate our techniques on an ensemble studying matter transition from hadronic gas to quark-gluon plasma during gold-on-gold particle collisions. PMID:26529728

  19. Predicting Lung Radiotherapy-Induced Pneumonitis Using a Model Combining Parametric Lyman Probit With Nonparametric Decision Trees

    SciTech Connect

    Das, Shiva K. . E-mail: shiva.das@duke.edu; Zhou Sumin; Zhang, Junan; Yin, F.-F.; Dewhirst, Mark W.; Marks, Lawrence B.

    2007-07-15

    Purpose: To develop and test a model to predict for lung radiation-induced Grade 2+ pneumonitis. Methods and Materials: The model was built from a database of 234 lung cancer patients treated with radiotherapy (RT), of whom 43 were diagnosed with pneumonitis. The model augmented the predictive capability of the parametric dose-based Lyman normal tissue complication probability (LNTCP) metric by combining it with weighted nonparametric decision trees that use dose and nondose inputs. The decision trees were sequentially added to the model using a 'boosting' process that enhances the accuracy of prediction. The model's predictive capability was estimated by 10-fold cross-validation. To facilitate dissemination, the cross-validation result was used to extract a simplified approximation to the complicated model architecture created by boosting. Application of the simplified model is demonstrated in two example cases. Results: The area under the model receiver operating characteristics curve for cross-validation was 0.72, a significant improvement over the LNTCP area of 0.63 (p = 0.005). The simplified model used the following variables to output a measure of injury: LNTCP, gender, histologic type, chemotherapy schedule, and treatment schedule. For a given patient RT plan, injury prediction was highest for the combination of pre-RT chemotherapy, once-daily treatment, and female gender, and lowest for the combination of no pre-RT chemotherapy and nonsquamous cell histologic type. Application of the simplified model to the example cases revealed that injury prediction for a given treatment plan can range from very low to very high, depending on the settings of the nondose variables. Conclusions: Radiation pneumonitis prediction was significantly enhanced by decision trees that added the influence of nondose factors to the LNTCP formulation.
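The boosting process this abstract describes — sequentially adding weighted decision trees that focus on previously misclassified cases — can be sketched as a minimal AdaBoost over decision stumps. This illustrates the general technique, not the authors' model; the features, labels, and names below are hypothetical (feature 0 could stand for an LNTCP-like dose score, feature 1 for a nondose flag such as chemotherapy schedule).

```python
import math

def stump_predict(stump, x):
    """Axis-aligned decision stump: sign if x[j] <= t, else -sign."""
    j, t, sign = stump
    return sign if x[j] <= t else -sign

def fit_adaboost(X, y, rounds=5):
    """Minimal AdaBoost with decision stumps; labels are +1/-1.
    Each round re-weights the cases the current committee gets wrong."""
    n = len(X)
    w = [1.0 / n] * n
    committee = []
    for _ in range(rounds):
        # exhaustive search for the stump with the lowest weighted error
        best, best_err = None, float("inf")
        for j in range(len(X[0])):
            for t in {x[j] for x in X}:
                for sign in (1, -1):
                    err = sum(wi for wi, xi, yi in zip(w, X, y)
                              if stump_predict((j, t, sign), xi) != yi)
                    if err < best_err:
                        best, best_err = (j, t, sign), err
        best_err = max(best_err, 1e-9)            # avoid log(0) on a perfect stump
        alpha = 0.5 * math.log((1 - best_err) / best_err)
        committee.append((alpha, best))
        # up-weight the mistakes, then renormalise
        w = [wi * math.exp(-alpha * yi * stump_predict(best, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return committee

def boosted_predict(committee, x):
    """Weighted vote of all stumps added so far."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in committee)
    return 1 if score >= 0 else -1
```

The weighted committee plays the role of the paper's boosted tree model; extracting a few dominant stumps from such a committee is one way to arrive at a simplified, human-readable approximation like the one the authors report.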

  20. Decision tree-based method for integrating gene expression, demographic, and clinical data to determine disease endotypes

    PubMed Central

    2013-01-01

    Background Complex diseases are often difficult to diagnose, treat and study due to the multi-factorial nature of the underlying etiology. Large data sets are now widely available that can be used to define novel, mechanistically distinct disease subtypes (endotypes) in a completely data-driven manner. However, significant challenges exist with regard to how to segregate individuals into suitable subtypes of the disease and understand the distinct biological mechanisms of each when the goal is to maximize the discovery potential of these data sets. Results A multi-step decision tree-based method is described for defining endotypes based on gene expression, clinical covariates, and disease indicators using childhood asthma as a case study. We attempted to use alternative approaches such as the Student’s t-test, single data domain clustering and the Modk-prototypes algorithm, which incorporates multiple data domains into a single analysis and none performed as well as the novel multi-step decision tree method. This new method gave the best segregation of asthmatics and non-asthmatics, and it provides easy access to all genes and clinical covariates that distinguish the groups. Conclusions The multi-step decision tree method described here will lead to better understanding of complex disease in general by allowing purely data-driven disease endotypes to facilitate the discovery of new mechanisms underlying these diseases. This application should be considered a complement to ongoing efforts to better define and diagnose known endotypes. When coupled with existing methods developed to determine the genetics of gene expression, these methods provide a mechanism for linking genetics and exposomics data and thereby accounting for both major determinants of disease. PMID:24188919

  1. Ensembl 2013

    PubMed Central

    Flicek, Paul; Ahmed, Ikhlak; Amode, M. Ridwan; Barrell, Daniel; Beal, Kathryn; Brent, Simon; Carvalho-Silva, Denise; Clapham, Peter; Coates, Guy; Fairley, Susan; Fitzgerald, Stephen; Gil, Laurent; García-Girón, Carlos; Gordon, Leo; Hourlier, Thibaut; Hunt, Sarah; Juettemann, Thomas; Kähäri, Andreas K.; Keenan, Stephen; Komorowska, Monika; Kulesha, Eugene; Longden, Ian; Maurel, Thomas; McLaren, William M.; Muffato, Matthieu; Nag, Rishi; Overduin, Bert; Pignatelli, Miguel; Pritchard, Bethan; Pritchard, Emily; Riat, Harpreet Singh; Ritchie, Graham R. S.; Ruffier, Magali; Schuster, Michael; Sheppard, Daniel; Sobral, Daniel; Taylor, Kieron; Thormann, Anja; Trevanion, Stephen; White, Simon; Wilder, Steven P.; Aken, Bronwen L.; Birney, Ewan; Cunningham, Fiona; Dunham, Ian; Harrow, Jennifer; Herrero, Javier; Hubbard, Tim J. P.; Johnson, Nathan; Kinsella, Rhoda; Parker, Anne; Spudich, Giulietta; Yates, Andy; Zadissa, Amonida; Searle, Stephen M. J.

    2013-01-01

    The Ensembl project (http://www.ensembl.org) provides genome information for sequenced chordate genomes with a particular focus on human, mouse, zebrafish and rat. Our resources include evidenced-based gene sets for all supported species; large-scale whole genome multiple species alignments across vertebrates and clade-specific alignments for eutherian mammals, primates, birds and fish; variation data resources for 17 species and regulation annotations based on ENCODE and other data sets. Ensembl data are accessible through the genome browser at http://www.ensembl.org and through other tools and programmatic interfaces. PMID:23203987

  2. Ensemble Models

    EPA Science Inventory

    Ensemble forecasting has been used for operational numerical weather prediction in the United States and Europe since the early 1990s. An ensemble of weather or climate forecasts is used to characterize the two main sources of uncertainty in computer models of physical systems: ...

  3. Ensembl 2005

    PubMed Central

    Hubbard, T.; Andrews, D.; Caccamo, M.; Cameron, G.; Chen, Y.; Clamp, M.; Clarke, L.; Coates, G.; Cox, T.; Cunningham, F.; Curwen, V.; Cutts, T.; Down, T.; Durbin, R.; Fernandez-Suarez, X. M.; Gilbert, J.; Hammond, M.; Herrero, J.; Hotz, H.; Howe, K.; Iyer, V.; Jekosch, K.; Kahari, A.; Kasprzyk, A.; Keefe, D.; Keenan, S.; Kokocinsci, F.; London, D.; Longden, I.; McVicker, G.; Melsopp, C.; Meidl, P.; Potter, S.; Proctor, G.; Rae, M.; Rios, D.; Schuster, M.; Searle, S.; Severin, J.; Slater, G.; Smedley, D.; Smith, J.; Spooner, W.; Stabenau, A.; Stalker, J.; Storey, R.; Trevanion, S.; Ureta-Vidal, A.; Vogel, J.; White, S.; Woodwark, C.; Birney, E.

    2005-01-01

    The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of large genome sequences. Over the last year the number of genomes available from the Ensembl site has increased by 7 to 16, with the addition of the six vertebrate genomes of chimpanzee, dog, cow, chicken, tetraodon and frog and the insect genome of honeybee. The majority have been annotated automatically using the Ensembl gene build system, showing its flexibility to reliably annotate a wide variety of genomes. With the increased number of vertebrate genomes, the comparative analysis provided to users has been greatly improved, with new website interfaces allowing annotation of different genomes to be directly compared. The Ensembl software system is being increasingly widely reused in different projects showing the benefits of a completely open approach to software development and distribution. PMID:15608235

  4. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections

    PubMed Central

    Kraszewska-Głomba, Barbara; Szymańska-Toczek, Zofia; Szenborn, Leszek

    2016-01-01

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the use of the C4.5 algorithm resulted in the following decision tree: viral infection if CRP≤19.1 mg/L; otherwise, for cases with CRP>19.1 mg/L: bacterial infection if PCT>0.65 ng/mL, PFAPA if PCT≤0.65 ng/mL. The model was tested using a 10-fold cross validation and in an independent test cohort (n=30); the rule’s overall accuracy was 76.4% and 90%, respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context. PMID:27131024
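The published decision tree is simple enough to state directly in code, with the thresholds exactly as reported in the abstract (the function name is illustrative; this is a transcription of the reported rule, not a validated diagnostic tool):

```python
def triage(crp_mg_l, pct_ng_ml):
    """C4.5-derived rule from the abstract: CRP threshold 19.1 mg/L,
    then PCT threshold 0.65 ng/mL."""
    if crp_mg_l <= 19.1:
        return "viral infection"
    return "bacterial infection" if pct_ng_ml > 0.65 else "PFAPA"
```

For example, a child with CRP of 30 mg/L and PCT of 0.3 ng/mL falls in the PFAPA leaf of the tree.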

  5. Unified framework for triaxial accelerometer-based fall event detection and classification using cumulants and hierarchical decision tree classifier

    PubMed Central

    Kambhampati, Satya Samyukta; Singh, Vishal; Manikandan, M. Sabarimalai; Ramkumar, Barathram

    2015-01-01

    In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from the acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers in effectively detecting and classifying different types of fall and non-fall events. It was discovered that the first level of the proposed hierarchical decision tree algorithm implements fall detection using fifth-order cumulants and support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve optimal detection and classification rates of above 95%, as well as the lowest false alarm rate of 1.03%. PMID:26609414

  6. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections.

    PubMed

    Kraszewska-Głomba, Barbara; Szymańska-Toczek, Zofia; Szenborn, Leszek

    2016-01-01

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the use of the C4.5 algorithm resulted in the following decision tree: viral infection if CRP≤19.1 mg/L; otherwise, for cases with CRP>19.1 mg/L: bacterial infection if PCT>0.65 ng/mL, PFAPA if PCT≤0.65 ng/mL. The model was tested using a 10-fold cross validation and in an independent test cohort (n=30); the rule's overall accuracy was 76.4% and 90%, respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context. PMID:27131024

  7. Unified framework for triaxial accelerometer-based fall event detection and classification using cumulants and hierarchical decision tree classifier.

    PubMed

    Kambhampati, Satya Samyukta; Singh, Vishal; Manikandan, M Sabarimalai; Ramkumar, Barathram

    2015-08-01

    In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from the acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers in effectively detecting and classifying different types of fall and non-fall events. It was discovered that the first level of the proposed hierarchical decision tree algorithm implements fall detection using fifth-order cumulants and support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve optimal detection and classification rates of above 95%, as well as the lowest false alarm rate of 1.03%. PMID:26609414

  8. Improving Crop Classification Techniques Using Optical Remote Sensing Imagery, High-Resolution Agriculture Resource Inventory Shapefiles and Decision Trees

    NASA Astrophysics Data System (ADS)

    Melnychuk, A. L.; Berg, A. A.; Sweeney, S.

    2010-12-01

    Recognition of anthropogenic effects of land use management practices on bodies of water is important for remediating and preventing eutrophication. In the case of Lake Simcoe, Ontario the main surrounding landuse is agriculture. To better manage the nutrient flow into the lake, knowledge of the management of the agricultural land is important. For this basin, a comprehensive agricultural resource inventory is required for assessment of policy and for input into water quality management and assessment tools. Supervised decision tree classification schemes, used in many previous applications, have yielded reliable classifications in agricultural land-use systems. However, when using these classification techniques the user is confronted with numerous data sources. In this study we use a large inventory of optical satellite image products (Landsat, AWiFS, SPOT and MODIS) and ancillary data sources (temporal MODIS-NDVI product signatures, digital elevation models and soil maps) at various spatial and temporal resolutions in a decision tree classification scheme. The sensitivity of the classification accuracy to various products is assessed to identify optimal data sources for classifying crop systems.

  9. Accurate Prediction of Advanced Liver Fibrosis Using the Decision Tree Learning Algorithm in Chronic Hepatitis C Egyptian Patients

    PubMed Central

    Hashem, Somaya; Esmat, Gamal; Elakel, Wafaa; Habashy, Shahira; Abdel Raouf, Safaa; Darweesh, Samar; Soliman, Mohamad; Elhefnawi, Mohamed; El-Adawy, Mohamed; ElHefnawi, Mahmoud

    2016-01-01

    Background/Aim. In step with the worldwide prevalence of chronic hepatitis C, the use of noninvasive methods for staging chronic liver diseases, as an alternative that avoids the drawbacks of biopsy, is increasing significantly. The aim of this study is to combine serum biomarkers and clinical information to develop a classification model that can predict advanced liver fibrosis. Methods. 39,567 patients with chronic hepatitis C were included and randomly divided into two separate sets. Liver fibrosis was assessed via METAVIR score; patients were categorized as mild to moderate (F0–F2) or advanced (F3-F4) fibrosis stages. Two models were developed using the alternating decision tree algorithm. Model 1 uses six parameters, while model 2 uses four, which are similar to the FIB-4 features except with alpha-fetoprotein instead of alanine aminotransferase. Sensitivity and receiver operating characteristic curve analyses were performed to evaluate the performance of the proposed models. Results. The best model achieved an 86.2% negative predictive value and a 0.78 area under the ROC curve with 84.8% accuracy, which is better than FIB-4. Conclusions. The risk of advanced liver fibrosis due to chronic hepatitis C could be predicted with high accuracy using the decision tree learning algorithm, which could reduce the need for liver biopsy. PMID:26880886

  10. Accurate Prediction of Advanced Liver Fibrosis Using the Decision Tree Learning Algorithm in Chronic Hepatitis C Egyptian Patients.

    PubMed

    Hashem, Somaya; Esmat, Gamal; Elakel, Wafaa; Habashy, Shahira; Abdel Raouf, Safaa; Darweesh, Samar; Soliman, Mohamad; Elhefnawi, Mohamed; El-Adawy, Mohamed; ElHefnawi, Mahmoud

    2016-01-01

    Background/Aim. In step with the worldwide prevalence of chronic hepatitis C, the use of noninvasive methods for staging chronic liver diseases, as an alternative that avoids the drawbacks of biopsy, is increasing significantly. The aim of this study is to combine serum biomarkers and clinical information to develop a classification model that can predict advanced liver fibrosis. Methods. 39,567 patients with chronic hepatitis C were included and randomly divided into two separate sets. Liver fibrosis was assessed via METAVIR score; patients were categorized as mild to moderate (F0-F2) or advanced (F3-F4) fibrosis stages. Two models were developed using the alternating decision tree algorithm. Model 1 uses six parameters, while model 2 uses four, which are similar to the FIB-4 features except with alpha-fetoprotein instead of alanine aminotransferase. Sensitivity and receiver operating characteristic curve analyses were performed to evaluate the performance of the proposed models. Results. The best model achieved an 86.2% negative predictive value and a 0.78 area under the ROC curve with 84.8% accuracy, which is better than FIB-4. Conclusions. The risk of advanced liver fibrosis due to chronic hepatitis C could be predicted with high accuracy using the decision tree learning algorithm, which could reduce the need for liver biopsy. PMID:26880886

  11. Lessons Learned from Applications of a Climate Change Decision Tree to Water System Projects in Kenya and Nepal

    NASA Astrophysics Data System (ADS)

    Ray, P. A.; Bonzanigo, L.; Taner, M. U.; Wi, S.; Yang, Y. C. E.; Brown, C.

    2015-12-01

    The Decision Tree Framework developed for the World Bank's Water Partnership Program provides resource-limited project planners and program managers with a cost-effective and effort-efficient, scientifically defensible, repeatable, and clear method for demonstrating the robustness of a project to climate change. At the conclusion of this process, the project planner is empowered to confidently communicate the method by which the vulnerabilities of the project have been assessed, and how the adjustments that were made (if any were necessary) improved the project's feasibility and profitability. The framework adopts a "bottom-up" approach to risk assessment that aims at a thorough understanding of a project's vulnerabilities to climate change in the context of other nonclimate uncertainties (e.g., economic, environmental, demographic, political). It helps identify projects that perform well across a wide range of potential future climate conditions, as opposed to seeking solutions that are optimal in expected conditions but fragile to conditions deviating from the expected. Lessons learned through application of the Decision Tree to case studies in Kenya and Nepal will be presented, and aspects of the framework requiring further refinement will be described.

  12. Refined estimation of solar energy potential on roof areas using decision trees on CityGML-data

    NASA Astrophysics Data System (ADS)

    Baumanns, K.; Löwner, M.-O.

    2009-04-01

    We present a decision tree for refined estimation of the solar energy plant potential of roof areas using the exchange format CityGML. Compared to raster datasets, CityGML data holds geometric and semantic information about buildings and roof areas in more detail. In addition to shadowing effects, ownership structures and the lifetime of roof areas can be incorporated into the valuation. Since the Renewable Energy Sources Act came into force in Germany in 2000, private house owners and municipalities have paid increasing attention to the production of green electricity. Here the return on investment depends on the statutory price per watt, the initial cost of the solar energy plant, its lifetime, and the real production of the installation. The latter depends on the radiation the installation receives and on its size. In this context the exposition and slope of the roof area are as important as building parts like chimneys or dormers that might shadow parts of the roof. Knowing the controlling factors, a decision tree can be created to support the beneficial deployment of a solar energy plant, provided sufficient data is available. Airborne raster datasets can only support a coarse estimation of the solar energy potential of roof areas: since they carry no semantic information, even roof installations are hard to identify. CityGML, an Open Geospatial Consortium standard, is an interoperable exchange data format for virtual 3-dimensional cities. Based on international standards, it holds the aforementioned geometric properties as well as semantic information. In Germany many cities, e.g. Berlin, are on the way to providing CityGML datasets. Here we present a decision tree that incorporates geometric as well as semantic demands for a refined estimation of the solar energy potential of roof areas. Based on CityGML's attribute lists we consider geometries of roofs and roof installations as well as global radiation, which can be derived e.g. from the European Solar

  13. Decisions for Others Become Less Impulsive the Further Away They Are on the Family Tree

    PubMed Central

    Ziegler, Fenja V.; Tunney, Richard J.

    2012-01-01

    Background People tend to prefer a smaller immediate reward to a larger but delayed reward. Although this discounting of future rewards is often associated with impulsivity, it is not necessarily irrational. Instead it has been suggested that it reflects the decision maker’s greater interest in the ‘me now’ than the ‘me in 10 years’, such that the concern for our future self is about the same as for someone else who is close to us. Methodology/Principal Findings To investigate this we used a delay-discounting task to compare discount functions for choices that people would make for themselves against decisions that they think that other people should make, e.g. to accept $500 now or $1000 next week. The psychological distance of the hypothetical beneficiaries was manipulated in terms of the genetic coefficient of relatedness ranging from zero (e.g. a stranger, or unrelated close friend), .125 (e.g. a cousin), .25 (e.g. a nephew or niece), to .5 (parent or sibling). Conclusions/Significance The observed discount functions were steeper (i.e. more impulsive) for choices in which the decision-maker was the beneficiary than for all other beneficiaries. Impulsiveness of decisions declined systematically with the distance of the beneficiary from the decision-maker. The data are discussed with reference to the impulsivity and interpersonal empathy gaps in decision-making. PMID:23209580
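
Delay-discounting data of the kind described here are commonly modeled with Mazur's hyperbolic function V = A / (1 + kD), where a larger k means steeper (more impulsive) discounting. A sketch under that assumption (the k values are invented for illustration; the paper does not report them):

```python
def discounted_value(amount, delay_days, k):
    """Hyperbolic discounting (Mazur): V = A / (1 + k*D)."""
    return amount / (1 + k * delay_days)

# Illustrative only: a steeper curve (higher k) when deciding for oneself
# than for increasingly distant beneficiaries, as the paper's pattern suggests.
for beneficiary, k in [("self", 0.10), ("sibling", 0.05), ("stranger", 0.02)]:
    v = discounted_value(1000, delay_days=7, k=k)
    print(f"{beneficiary}: $1000 in a week is worth ~${v:.2f} now")
```

With these invented parameters, the $1000-next-week option is devalued most for the "self" curve, mirroring the steeper observed discount functions for self-benefiting choices.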

  14. Ensembl 2015

    PubMed Central

    Cunningham, Fiona; Amode, M. Ridwan; Barrell, Daniel; Beal, Kathryn; Billis, Konstantinos; Brent, Simon; Carvalho-Silva, Denise; Clapham, Peter; Coates, Guy; Fitzgerald, Stephen; Gil, Laurent; Girón, Carlos García; Gordon, Leo; Hourlier, Thibaut; Hunt, Sarah E.; Janacek, Sophie H.; Johnson, Nathan; Juettemann, Thomas; Kähäri, Andreas K.; Keenan, Stephen; Martin, Fergal J.; Maurel, Thomas; McLaren, William; Murphy, Daniel N.; Nag, Rishi; Overduin, Bert; Parker, Anne; Patricio, Mateus; Perry, Emily; Pignatelli, Miguel; Riat, Harpreet Singh; Sheppard, Daniel; Taylor, Kieron; Thormann, Anja; Vullo, Alessandro; Wilder, Steven P.; Zadissa, Amonida; Aken, Bronwen L.; Birney, Ewan; Harrow, Jennifer; Kinsella, Rhoda; Muffato, Matthieu; Ruffier, Magali; Searle, Stephen M.J.; Spudich, Giulietta; Trevanion, Stephen J.; Yates, Andy; Zerbino, Daniel R.; Flicek, Paul

    2015-01-01

    Ensembl (http://www.ensembl.org) is a genomic interpretation system providing the most up-to-date annotations, querying tools and access methods for chordates and key model organisms. This year we released updated annotation (gene models, comparative genomics, regulatory regions and variation) on the new human assembly, GRCh38, although we continue to support researchers using the GRCh37.p13 assembly through a dedicated site (http://grch37.ensembl.org). Our Regulatory Build has been revamped to identify regulatory regions of interest and to efficiently highlight their activity across disparate epigenetic data sets. A number of new interfaces allow users to perform large-scale comparisons of their data against our annotations. The REST server (http://rest.ensembl.org), which allows programs written in any language to query our databases, has moved to a full service alongside our upgraded website tools. Our online Variant Effect Predictor tool has been updated to process more variants and calculate summary statistics. Lastly, the WiggleTools package enables users to summarize large collections of data sets and view them as single tracks in Ensembl. The Ensembl code base itself is more accessible: it is now hosted on our GitHub organization page (https://github.com/Ensembl) under an Apache 2.0 open source license. PMID:25352552

  15. Construction the model on the breast cancer survival analysis use support vector machine, logistic regression and decision tree.

    PubMed

    Chao, Cheng-Min; Yu, Ya-Wen; Cheng, Bor-Wen; Kuo, Yao-Lung

    2014-10-01

    The aim of the paper is to use data mining technology to establish a classification of breast cancer survival patterns, and to offer a treatment decision-making reference regarding the survival of women diagnosed with breast cancer in Taiwan. We studied patients with breast cancer in a specific hospital in Central Taiwan to obtain 1,340 data sets. We employed a support vector machine, logistic regression, and a C5.0 decision tree to construct classification models of breast cancer patients' survival rates, and used a 10-fold cross-validation approach to validate the models. The results show that the constructed classification models yielded an average accuracy rate of more than 90%, and that the SVM provided the best method for constructing the three-category classification system for survival mode. The experiment shows that the three methods used to create the classification system achieved high accuracy, predicted the survival of women diagnosed with breast cancer more accurately, and could be used as a reference when creating a medical decision-making framework. PMID:25119239
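
The 10-fold cross-validation procedure the authors use can be sketched in plain Python, with a toy threshold classifier standing in for the SVM, logistic regression, and C5.0 models (all data below is synthetic):

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, fit, predict, k=10):
    """Mean accuracy over k train/test splits."""
    folds = k_fold_indices(len(xs), k)
    accs = []
    for fold in folds:
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        hits = sum(predict(model, xs[i]) == ys[i] for i in fold)
        accs.append(hits / len(fold))
    return sum(accs) / len(folds)

# Toy stand-in classifier: threshold at the mean of the positive examples.
def fit(train_x, train_y):
    pos = [x for x, y in zip(train_x, train_y) if y == 1]
    return sum(pos) / len(pos) if pos else 0.5

def predict(threshold, x):
    return int(x >= threshold)

rng = random.Random(1)
xs = [rng.uniform(0, 1) for _ in range(100)]
ys = [int(x > 0.5) for x in xs]
acc = cross_validate(xs, ys, fit, predict)
print(f"10-fold CV mean accuracy: {acc:.2f}")
```

In practice one would plug in real learners (e.g. scikit-learn estimators) for `fit`/`predict`; the point here is only the fold bookkeeping that the paper's validation relies on.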

  16. Generalization of the Viola-Jones method as a decision tree of strong classifiers for real-time object recognition in video stream

    NASA Astrophysics Data System (ADS)

    Minkina, A.; Nikolaev, D.; Usilin, S.; Kozyrev, V.

    2015-02-01

    In this paper, we present a new modification of Viola-Jones complex classifiers. We describe a complex classifier in the form of a decision tree and provide a method for training such classifiers. The performance impact of the tree structure is analyzed, and the precision and performance of the presented method are compared with those of the classical cascade. Various tree architectures are studied experimentally. The task of detecting vehicle wheels in images obtained from an automatic vehicle classification system is taken as an example.

  17. Evaluating Psychiatric Hospital Admission Decisions for Children in Foster Care: An Optimal Classification Tree Analysis

    ERIC Educational Resources Information Center

    Snowden, Jessica A.; Leon, Scott C.; Bryant, Fred B.; Lyons, John S.

    2007-01-01

    This study explored clinical and nonclinical predictors of inpatient hospital admission decisions across a sample of children in foster care over 4 years (N = 13,245). Forty-eight percent of participants were female and the mean age was 13.4 (SD = 3.5 years). Optimal data analysis (Yarnold & Soltysik, 2005) was used to construct a nonlinear…

  18. Discovering Decision Trees in the Curriculum Jungle: A Chronicle of Group Groping.

    ERIC Educational Resources Information Center

    Helburn, Nicholas

    Additional insight into the High School Geography Project (HSGP) is provided by this retrospective view of the critical decisions which influenced its nature and scope. A commitment was made to materials at the expense of teacher education and other changes in the educational system. Successive choices focused on a complete but frugal package of…

  19. An approach for automated fault diagnosis based on a fuzzy decision tree and boundary analysis of a reconstructed phase space.

    PubMed

    Aydin, Ilhan; Karakose, Mehmet; Akin, Erhan

    2014-03-01

    Although reconstructed phase space is one of the most powerful methods for analyzing a time series, it can fail in fault diagnosis of an induction motor when the appropriate pre-processing is not performed. Therefore, a new boundary-analysis-based feature extraction method in phase space is proposed for the diagnosis of induction motor faults. The proposed approach requires the measurement of one phase current signal to construct the phase space representation. Each phase space is converted into an image, and the boundary of each image is extracted by a boundary detection algorithm. A fuzzy decision tree has been designed to detect broken rotor bar and broken connector faults. The results indicate that the proposed approach has a higher recognition rate than other methods on the same dataset. PMID:24296116

  20. Effective Prediction of Errors by Non-native Speakers Using Decision Tree for Speech Recognition-Based CALL System

    NASA Astrophysics Data System (ADS)

    Wang, Hongcui; Kawahara, Tatsuya

    CALL (Computer Assisted Language Learning) systems using ASR (Automatic Speech Recognition) for second language learning have received increasing interest recently. However, it still remains a challenge to achieve high speech recognition performance, including accurate detection of erroneous utterances by non-native speakers. Conventionally, possible error patterns, based on linguistic knowledge, are added to the lexicon and language model, or the ASR grammar network. However, this approach easily runs into a trade-off between error coverage and increased perplexity. To solve the problem, we propose a method based on a decision tree to learn effective prediction of errors made by non-native speakers. An experimental evaluation with a number of foreign students learning Japanese shows that the proposed method can effectively generate an ASR grammar network, given a target sentence, that achieves both better coverage of errors and smaller perplexity, resulting in a significant improvement in ASR accuracy.

  1. A decision tree-based on-line preventive control strategy for power system transient instability prevention

    NASA Astrophysics Data System (ADS)

    Xu, Yan; Dong, Zhao Yang; Zhang, Rui; Wong, Kit Po

    2014-02-01

    Maintaining transient stability is a basic requirement for secure power system operations. Preventive control deals with modifying the system operating point to withstand probable contingencies. In this article, a decision tree (DT)-based on-line preventive control strategy is proposed for transient instability prevention of power systems. Given a stability database, a distance-based feature estimation algorithm is first applied to identify the critical generators, which are then used as features to develop a DT. By interpreting the splitting rules of the DT, preventive control is realised by formulating the rules in a standard optimal power flow model and solving it. The proposed method is transparent in its control mechanism, compatible with on-line computation and convenient for dealing with multiple contingencies. The effectiveness and efficiency of the method have been verified on the New England 10-machine 39-bus test system.
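
The splitting rules that this method interprets are, in standard tree induction such as CART, chosen to maximize impurity reduction. A minimal sketch of selecting a split threshold by Gini impurity (the feature values and stability labels below are invented for illustration, not from the power-system study):

```python
def gini(labels):
    """Gini impurity of a binary label list: 2p(1-p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_split(values, labels):
    """Return (threshold, weighted impurity) minimising Gini after the split."""
    best = (None, float("inf"))
    n = len(values)
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical feature (e.g. a critical generator's loading) and instability labels:
x = [0.2, 0.4, 0.5, 0.8, 0.9, 1.0]
y = [0, 0, 0, 1, 1, 1]
print(best_split(x, y))  # → (0.5, 0.0)
```

A rule like "feature <= 0.5" recovered this way is exactly the kind of splitting rule that the paper then re-expresses as a constraint in the optimal power flow model.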

  2. Novel benzofuroxan derivatives against multidrug-resistant Staphylococcus aureus strains: design using Topliss' decision tree, synthesis and biological assay.

    PubMed

    Jorge, Salomão Dória; Palace-Berl, Fanny; Masunari, Andrea; Cechinel, Cléber André; Ishii, Marina; Pasqualoto, Kerly Fernanda Mesquita; Tavares, Leoberto Costa

    2011-08-15

    The aim of this study was the design of a set of benzofuroxan derivatives as antimicrobial agents exploring the physicochemical properties of the related substituents. Topliss' decision tree approach was applied to select the substituent groups. Hierarchical cluster analysis was also performed to emphasize natural clusters and patterns. The compounds were obtained using two synthetic approaches for reducing the synthetic steps as well as improving the yield. The minimal inhibitory concentration method was employed to evaluate the activity against multidrug-resistant Staphylococcus aureus strains. The most active compound was 4-nitro-3-(trifluoromethyl)[N'-(benzofuroxan-5-yl)methylene]benzhydrazide (MIC range 12.7-11.4 μg/mL), pointing out that the antimicrobial activity was indeed influenced by the hydrophobic and electron-withdrawing property of the substituent groups 3-CF(3) and 4-NO(2), respectively. PMID:21757359

  3. A method of building of decision trees based on data from wearable device during a rehabilitation of patients with tibia fractures

    NASA Astrophysics Data System (ADS)

    Kupriyanov, M. S.; Shukeilo, E. Y.; Shichkina, J. A.

    2015-11-01

    Technologies used in traumatology today are a combination of mechanical, electronic, computational and programming tools. The relevance of developing mobile applications for rapid processing of data received from medical devices (in particular, wearable devices), and for formulating management decisions, is increasing. This article considers the use of a mathematical method of building decision trees to assess a patient's health condition using data from a wearable device.

  4. A method of building of decision trees based on data from wearable device during a rehabilitation of patients with tibia fractures

    SciTech Connect

    Kupriyanov, M. S. Shukeilo, E. Y. Shichkina, J. A.

    2015-11-17

    Technologies used in traumatology today are a combination of mechanical, electronic, computational and programming tools. The relevance of developing mobile applications for rapid processing of data received from medical devices (in particular, wearable devices), and for formulating management decisions, is increasing. This article considers the use of a mathematical method of building decision trees to assess a patient's health condition using data from a wearable device.

  5. Ensemble Tractography

    PubMed Central

    Wandell, Brian A.

    2016-01-01

    Tractography uses diffusion MRI to estimate the trajectory and cortical projection zones of white matter fascicles in the living human brain. There are many different tractography algorithms and each requires the user to set several parameters, such as curvature threshold. Choosing a single algorithm with specific parameters poses two challenges. First, different algorithms and parameter values produce different results. Second, the optimal choice of algorithm and parameter value may differ between different white matter regions or different fascicles, subjects, and acquisition parameters. We propose using ensemble methods to reduce algorithm and parameter dependencies. To do so we separate the processes of fascicle generation and evaluation. Specifically, we analyze the value of creating optimized connectomes by systematically combining candidate streamlines from an ensemble of algorithms (deterministic and probabilistic) and systematically varying parameters (curvature and stopping criterion). The ensemble approach leads to optimized connectomes that provide better cross-validated prediction error of the diffusion MRI data than optimized connectomes generated using a single-algorithm or parameter set. Furthermore, the ensemble approach produces connectomes that contain both short- and long-range fascicles, whereas single-parameter connectomes are biased towards one or the other. In summary, a systematic ensemble tractography approach can produce connectomes that are superior to standard single parameter estimates both for predicting the diffusion measurements and estimating white matter fascicles. PMID:26845558

  6. Forest or the trees: At what scale do elephants make foraging decisions?

    NASA Astrophysics Data System (ADS)

    Shrader, Adrian M.; Bell, Caroline; Bertolli, Liandra; Ward, David

    2012-07-01

    For herbivores, food is distributed spatially in a hierarchical manner ranging from plant parts to regions. Ultimately, utilisation of food is dependent on the scale at which herbivores make foraging decisions. A key factor that influences these decisions is body size, because selection inversely relates to body size. As a result, large animals can be less selective than small herbivores. Savanna elephants (Loxodonta africana) are the largest terrestrial herbivore. Thus, they represent a potential extreme with respect to unselective feeding. However, several studies have indicated that elephants prefer specific habitats and certain woody plant species. Thus, it is unclear at which scale elephants focus their foraging decisions. To determine this, we recorded the seasonal selection of habitats and woody plant species by elephants in the Ithala Game Reserve, South Africa. We expected that during the wet season, when both food quality and availability were high, elephants would select primarily for habitats. This, however, does not mean that they would utilise plant species within these habitats in proportion to availability, but rather would show a stronger selection for habitats compared to plants. In contrast, during the dry season when food quality and availability declined, we expected that elephants would shift and select for the remaining high quality woody species across all habitats. Consistent with our predictions, elephants selected for the larger spatial scale (i.e. habitats) during the wet season. However, elephants did not increase their selection of woody species during the dry season, but rather increased their selection of habitats relative to woody plant selection. Unlike a number of earlier studies, we found that neither palatability (i.e. crude protein, digestibility, and energy) alone nor tannin concentrations had a significant effect in determining the elephants' selection of woody species. However, the palatability:tannin ratio was

  7. Treatment of envenomation by Echis coloratus (mid-east saw scaled viper): a decision tree.

    PubMed

    Gilon, D; Shalev, O; Benbassat, J

    1989-01-01

    Envenomation by Echis coloratus causes a transient hemostatic failure. Systemic symptoms, hypotension and evident bleeding are rare, with only one reported fatality. In this paper, we examine the decision to treat victims of Echis coloratus by a specific horse antiserum. The decision model considers the mortality of treated and untreated envenomation, and the side effects of antiserum treatment: fatal anaphylaxis, serum sickness and increased risk of death after a possible repeated exposure to horse antiserum in the future. The results of the analysis are not sensitive to variations in the probability of side effects of antiserum treatment. They are sensitive to variations in the risk of bleeding after envenomation, in the degree of reduction of this risk by antiserum treatment and in the risk of dying after an event of bleeding. Prompt administration of antiserum appears to be the treatment of choice if it reduces the risk of bleeding from 23.6% to 20.3% and if 1.6% or more of the bleeding events are fatal. We conclude that presently available data support antiserum treatment of victims of Echis coloratus who present with hemostatic failure, even though the advantage imparted by this treatment appears to be small. PMID:2683230
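
The break-even arithmetic in this abstract can be checked directly. A minimal expected-value sketch using the abstract's threshold figures (it deliberately ignores the side-effect terms, anaphylaxis and serum sickness, that the full decision model also weighs):

```python
# Threshold figures from the abstract: antiserum is worthwhile if it cuts the
# bleeding risk from 23.6% to 20.3% and at least 1.6% of bleeding events are fatal.
p_bleed_untreated = 0.236
p_bleed_treated = 0.203
p_death_given_bleed = 0.016

mortality_untreated = p_bleed_untreated * p_death_given_bleed
mortality_treated = p_bleed_treated * p_death_given_bleed
gain = mortality_untreated - mortality_treated
print(f"absolute mortality reduction: {gain:.6f}")  # about 53 deaths per 100,000
```

At these threshold values the absolute benefit is on the order of 5 in 10,000, which is consistent with the abstract's conclusion that the advantage of antiserum treatment is real but small.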

  8. Ensembl 2016

    PubMed Central

    Yates, Andrew; Akanni, Wasiu; Amode, M. Ridwan; Barrell, Daniel; Billis, Konstantinos; Carvalho-Silva, Denise; Cummins, Carla; Clapham, Peter; Fitzgerald, Stephen; Gil, Laurent; Girón, Carlos García; Gordon, Leo; Hourlier, Thibaut; Hunt, Sarah E.; Janacek, Sophie H.; Johnson, Nathan; Juettemann, Thomas; Keenan, Stephen; Lavidas, Ilias; Martin, Fergal J.; Maurel, Thomas; McLaren, William; Murphy, Daniel N.; Nag, Rishi; Nuhn, Michael; Parker, Anne; Patricio, Mateus; Pignatelli, Miguel; Rahtz, Matthew; Riat, Harpreet Singh; Sheppard, Daniel; Taylor, Kieron; Thormann, Anja; Vullo, Alessandro; Wilder, Steven P.; Zadissa, Amonida; Birney, Ewan; Harrow, Jennifer; Muffato, Matthieu; Perry, Emily; Ruffier, Magali; Spudich, Giulietta; Trevanion, Stephen J.; Cunningham, Fiona; Aken, Bronwen L.; Zerbino, Daniel R.; Flicek, Paul

    2016-01-01

    The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate the access of genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third party data. REST is now capable of larger-scale analysis and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data is made available without restriction and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license. PMID:26687719

  9. Ensembl 2016.

    PubMed

    Yates, Andrew; Akanni, Wasiu; Amode, M Ridwan; Barrell, Daniel; Billis, Konstantinos; Carvalho-Silva, Denise; Cummins, Carla; Clapham, Peter; Fitzgerald, Stephen; Gil, Laurent; Girón, Carlos García; Gordon, Leo; Hourlier, Thibaut; Hunt, Sarah E; Janacek, Sophie H; Johnson, Nathan; Juettemann, Thomas; Keenan, Stephen; Lavidas, Ilias; Martin, Fergal J; Maurel, Thomas; McLaren, William; Murphy, Daniel N; Nag, Rishi; Nuhn, Michael; Parker, Anne; Patricio, Mateus; Pignatelli, Miguel; Rahtz, Matthew; Riat, Harpreet Singh; Sheppard, Daniel; Taylor, Kieron; Thormann, Anja; Vullo, Alessandro; Wilder, Steven P; Zadissa, Amonida; Birney, Ewan; Harrow, Jennifer; Muffato, Matthieu; Perry, Emily; Ruffier, Magali; Spudich, Giulietta; Trevanion, Stephen J; Cunningham, Fiona; Aken, Bronwen L; Zerbino, Daniel R; Flicek, Paul

    2016-01-01

    The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate the access of genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third party data. REST is now capable of larger-scale analysis and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data is made available without restriction and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license. PMID:26687719

  10. Decision-tree analysis of clinical data to aid diagnostic reasoning for equine laminitis: a cross-sectional study.

    PubMed

    Wylie, C E; Shaw, D J; Verheyen, K L P; Newton, J R

    2016-04-23

    The objective of this cross-sectional study was to compare the prevalence of selected clinical signs in laminitis cases and non-laminitic but lame controls to evaluate their capability to discriminate laminitis from other causes of lameness. Participating veterinary practitioners completed a checklist of laminitis-associated clinical signs identified by literature review. Cases were defined as horses/ponies with veterinary-diagnosed, clinically apparent laminitis; controls were horses/ponies with any lameness other than laminitis. Associations were tested by logistic regression with adjusted odds ratios (ORs) and 95% confidence intervals, with veterinary practice as an a priori fixed effect. Multivariable analysis using graphical classification tree-based statistical models linked laminitis prevalence with specific combinations of clinical signs. Data were collected for 588 cases and 201 controls. Five clinical signs had a difference in prevalence of greater than +50 per cent: 'reluctance to walk' (OR 4.4), 'short, stilted gait at walk' (OR 9.4), 'difficulty turning' (OR 16.9), 'shifting weight' (OR 17.7) and 'increased digital pulse' (OR 13.2) (all P<0.001). 'Bilateral forelimb lameness' was the best discriminator; 92 per cent of animals with this clinical sign had laminitis (OR 40.5, P<0.001). If, in addition, horses/ponies had an 'increased digital pulse', 99 per cent were identified as laminitis. 'Presence of a flat/convex sole' also significantly enhanced clinical diagnosis discrimination (OR 15.5, P<0.001). This is the first epidemiological laminitis study to use decision-tree analysis, providing the first evidence base for evaluating clinical signs to differentially diagnose laminitis from other causes of lameness. Improved evaluation of the clinical signs displayed by laminitic animals examined by first-opinion practitioners will lead to equine welfare improvements. PMID:26969668
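
The odds ratios above come from logistic regression, but the underlying 2x2 odds-ratio arithmetic is easy to illustrate. A sketch with invented counts (the totals match the study's 588 cases and 201 controls, but the cell counts themselves are hypothetical):

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a = sign+ cases, b = sign+ controls,
    c = sign- cases, d = sign- controls."""
    return (a * d) / (b * c)

# Hypothetical counts for one clinical sign:
a, b = 400, 20    # sign present: laminitis cases, lame controls
c, d = 188, 181   # sign absent:  laminitis cases, lame controls
print(round(odds_ratio(a, b, c, d), 1))  # → 19.3
```

The study's reported ORs (e.g. 40.5 for 'bilateral forelimb lameness') were additionally adjusted for veterinary practice as a fixed effect, which this raw 2x2 calculation does not do.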

  11. Detecting subcanopy invasive plant species in tropical rainforest by integrating optical and microwave (InSAR/PolInSAR) remote sensing data, and a decision tree algorithm

    NASA Astrophysics Data System (ADS)

    Ghulam, Abduwasit; Porton, Ingrid; Freeman, Karen

    2014-02-01

    In this paper, we propose a decision tree algorithm to characterize spatial extent and spectral features of invasive plant species (i.e., guava, Madagascar cardamom, and Molucca raspberry) in tropical rainforests by integrating datasets from passive and active remote sensing sensors. The decision tree algorithm is based on a number of input variables including matching score and infeasibility images from Mixture Tuned Matched Filtering (MTMF), land-cover maps, tree height information derived from high resolution stereo imagery, polarimetric feature images, Radar Forest Degradation Index (RFDI), polarimetric and InSAR coherence and phase difference images. Spatial distributions of the study organisms are mapped using a pixel-based Winner-Takes-All (WTA) algorithm, object oriented feature extraction, spectral unmixing, and compared with the newly developed decision tree approach. Our results show that the InSAR phase difference and PolInSAR HH-VV coherence images of L-band PALSAR data are the most important variables following the MTMF outputs in mapping subcanopy invasive plant species in tropical rainforest. We also show that the three types of invasive plants alone occupy about 17.6% of the Betampona Nature Reserve (BNR) while mixed forest, shrubland and grassland areas sum to 11.9% of the reserve. This work presents the first systematic attempt to evaluate forest degradation, habitat quality and invasive plant statistics in the BNR, and provides significant insights as to management strategies for the control of invasive plants and conservation in the reserve.

  12. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    NASA Astrophysics Data System (ADS)

    Khader, A.; Rosenberg, D.; McKee, M.

    2012-12-01

    Nitrate pollution poses a health risk for infants whose freshwater drinking source is groundwater. This risk creates a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision maker and the expected outcomes from these alternatives. The alternatives include: (i) ignore the health risk of nitrate contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, pollution transport processes, and climate (Khader and McKee, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine where methemoglobinemia is the main health problem associated with the principal pollutant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not-use aquifer water, and whether people get sick from drinking contaminated water. Outcome costs include healthcare for methemoglobinemia, purchase of bottled water, and installation and maintenance of the groundwater monitoring system. At current

  13. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    NASA Astrophysics Data System (ADS)

    Khader, A. I.; Rosenberg, D. E.; McKee, M.

    2013-05-01

    Groundwater contaminated with nitrate poses a serious health risk to infants when this contaminated water is used for culinary purposes. To avoid this health risk, people need to know whether their culinary water is contaminated or not. Therefore, there is a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management options. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision-maker and the expected outcomes from these alternatives. The alternatives include (i) ignore the health risk of nitrate-contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, contaminant transport processes, and climate (Khader, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine, where methemoglobinemia (blue baby syndrome) is the main health problem associated with the principal contaminant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not use aquifer water, and whether people get sick from drinking contaminated water. 
Outcome costs
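The expected-cost arithmetic behind such a decision tree is simple to sketch. The following is a minimal illustration with entirely hypothetical costs and probabilities (none of the figures come from the Eocene Aquifer study): each alternative's expected cost is the probability-weighted sum of its outcome costs, and the VOI is the gap between the cheapest uninformed alternative and the monitoring alternative.

```python
# Hedged sketch of the VOI calculation; all numbers are hypothetical.

def expected_cost(outcomes):
    """Expected cost = weighted sum of outcome costs by their probabilities."""
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9
    return sum(p * c for p, c in outcomes)

# Alternative (i): ignore the risk -> chance of illness and healthcare cost
ignore = expected_cost([(0.30, 5000.0),   # someone gets sick
                        (0.70, 0.0)])     # no illness
# Alternative (ii): switch to bottled water -> certain purchase cost
bottled = expected_cost([(1.0, 1200.0)])
# Alternative (iii): monitoring network -> install/maintain cost plus a
# residual illness risk when reports are imperfect or advice is ignored
monitor = expected_cost([(0.05, 800.0 + 5000.0),  # missed contamination
                         (0.95, 800.0)])          # correct report, no illness

voi = min(ignore, bottled) - monitor  # VOI vs. cheapest uninformed option
```

With these made-up numbers, the monitoring network is worth implementing whenever `voi` comes out positive.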

  14. Diagnosis of pulmonary hypertension from magnetic resonance imaging–based computational models and decision tree analysis

    PubMed Central

    Swift, Andrew J.; Capener, David; Kiely, David; Hose, Rod; Wild, Jim M.

    2016-01-01

    Accurately identifying patients with pulmonary hypertension (PH) using noninvasive methods is challenging, and right heart catheterization (RHC) is the gold standard. Magnetic resonance imaging (MRI) has been proposed as an alternative to echocardiography and RHC in the assessment of cardiac function and pulmonary hemodynamics in patients with suspected PH. The aim of this study was to assess whether machine learning using computational modeling techniques and image-based metrics of PH can improve the diagnostic accuracy of MRI in PH. Seventy-two patients with suspected PH attending a referral center underwent RHC and MRI within 48 hours. Fifty-seven patients were diagnosed with PH, and 15 had no PH. A number of functional and structural cardiac and cardiovascular markers derived from 2 mathematical models and also solely from MRI of the main pulmonary artery and heart were integrated into a classification algorithm to investigate the diagnostic utility of the combination of the individual markers. A physiological marker based on the quantification of wave reflection in the pulmonary artery was shown to perform best individually, but optimal diagnostic performance was found by the combination of several image-based markers. Classifier results, validated using leave-one-out cross validation, demonstrated that combining computation-derived metrics reflecting hemodynamic changes in the pulmonary vasculature with measurement of right ventricular morphology and function, in a decision support algorithm, provides a method to noninvasively diagnose PH with high accuracy (92%). The high diagnostic accuracy of these MRI-based model parameters may reduce the need for RHC in patients with suspected PH. PMID:27252844

  15. Diagnosis of pulmonary hypertension from magnetic resonance imaging-based computational models and decision tree analysis.

    PubMed

    Lungu, Angela; Swift, Andrew J; Capener, David; Kiely, David; Hose, Rod; Wild, Jim M

    2016-06-01

    Accurately identifying patients with pulmonary hypertension (PH) using noninvasive methods is challenging, and right heart catheterization (RHC) is the gold standard. Magnetic resonance imaging (MRI) has been proposed as an alternative to echocardiography and RHC in the assessment of cardiac function and pulmonary hemodynamics in patients with suspected PH. The aim of this study was to assess whether machine learning using computational modeling techniques and image-based metrics of PH can improve the diagnostic accuracy of MRI in PH. Seventy-two patients with suspected PH attending a referral center underwent RHC and MRI within 48 hours. Fifty-seven patients were diagnosed with PH, and 15 had no PH. A number of functional and structural cardiac and cardiovascular markers derived from 2 mathematical models and also solely from MRI of the main pulmonary artery and heart were integrated into a classification algorithm to investigate the diagnostic utility of the combination of the individual markers. A physiological marker based on the quantification of wave reflection in the pulmonary artery was shown to perform best individually, but optimal diagnostic performance was found by the combination of several image-based markers. Classifier results, validated using leave-one-out cross validation, demonstrated that combining computation-derived metrics reflecting hemodynamic changes in the pulmonary vasculature with measurement of right ventricular morphology and function, in a decision support algorithm, provides a method to noninvasively diagnose PH with high accuracy (92%). The high diagnostic accuracy of these MRI-based model parameters may reduce the need for RHC in patients with suspected PH. PMID:27252844

  16. An adaptive incremental approach to constructing ensemble classifiers: Application in an information-theoretic computer-aided decision system for detection of masses in mammograms

    SciTech Connect

    Mazurowski, Maciej A.; Zurada, Jacek M.; Tourassi, Georgia D.

    2009-07-15

    Ensemble classifiers have been shown to be efficient in multiple applications. In this article, the authors explore the effectiveness of ensemble classifiers in a case-based computer-aided diagnosis system for detection of masses in mammograms. They evaluate two general ways of constructing subclassifiers by resampling of the available development dataset: random division and random selection. Furthermore, they discuss the problem of selecting the ensemble size and propose two adaptive incremental techniques that automatically select the size for the problem at hand. All the techniques are evaluated with respect to a previously proposed information-theoretic CAD system (IT-CAD). The experimental results show that the examined ensemble techniques provide a statistically significant improvement (AUC = 0.905 ± 0.024) in performance as compared to the original IT-CAD system (AUC = 0.865 ± 0.029). Some of the techniques allow for a notable reduction in the total number of examples stored in the case base (to 1.3% of the original size), which, in turn, results in lower storage requirements and a shorter response time of the system. Among the methods examined in this article, the two proposed adaptive techniques are by far the most effective for this purpose. Furthermore, the authors provide some discussion and guidance for choosing the ensemble parameters.
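The adaptive incremental idea — train base classifiers on resampled data and keep adding them until a validation criterion stops improving — can be sketched as follows. This is a minimal stand-in, not the authors' IT-CAD algorithm: the 1-D threshold ("stump") learner, the toy data, and the patience-based stopping rule are all assumptions for illustration.

```python
import random

# Hedged sketch: grow an ensemble on bootstrap resamples ("random
# selection") and stop once validation accuracy has not improved for
# `patience` consecutive additions.

def train_stump(sample):
    # 1-D threshold classifier: predict 1 if x >= best threshold
    best = (None, -1.0)
    for t, _ in sample:
        acc = sum((x >= t) == y for x, y in sample) / len(sample)
        if acc > best[1]:
            best = (t, acc)
    thr = best[0]
    return lambda x: int(x >= thr)

def vote(ensemble, x):
    # simple majority vote over the base classifiers
    return int(sum(clf(x) for clf in ensemble) >= len(ensemble) / 2)

def accuracy(ensemble, data):
    return sum(vote(ensemble, x) == y for x, y in data) / len(data)

def grow_ensemble(train, val, patience=3, max_size=25, seed=0):
    rng = random.Random(seed)
    ensemble, best_acc, stale = [], 0.0, 0
    while len(ensemble) < max_size and stale < patience:
        boot = [rng.choice(train) for _ in train]      # bootstrap resample
        ensemble.append(train_stump(boot))
        acc = accuracy(ensemble, val)
        if acc > best_acc:
            best_acc, stale = acc, 0
        else:
            stale += 1
    return ensemble, best_acc

# Toy 1-D data: label is 1 when x >= 0.5
data = [(x / 10, int(x >= 5)) for x in range(10)]
ens, acc = grow_ensemble(data, data)
```

The ensemble size is thus chosen adaptively for the problem at hand rather than fixed in advance, which is the essential point of the abstract's proposal.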

  17. Ensembl 2014

    PubMed Central

    Flicek, Paul; Amode, M. Ridwan; Barrell, Daniel; Beal, Kathryn; Billis, Konstantinos; Brent, Simon; Carvalho-Silva, Denise; Clapham, Peter; Coates, Guy; Fitzgerald, Stephen; Gil, Laurent; Girón, Carlos García; Gordon, Leo; Hourlier, Thibaut; Hunt, Sarah; Johnson, Nathan; Juettemann, Thomas; Kähäri, Andreas K.; Keenan, Stephen; Kulesha, Eugene; Martin, Fergal J.; Maurel, Thomas; McLaren, William M.; Murphy, Daniel N.; Nag, Rishi; Overduin, Bert; Pignatelli, Miguel; Pritchard, Bethan; Pritchard, Emily; Riat, Harpreet S.; Ruffier, Magali; Sheppard, Daniel; Taylor, Kieron; Thormann, Anja; Trevanion, Stephen J.; Vullo, Alessandro; Wilder, Steven P.; Wilson, Mark; Zadissa, Amonida; Aken, Bronwen L.; Birney, Ewan; Cunningham, Fiona; Harrow, Jennifer; Herrero, Javier; Hubbard, Tim J.P.; Kinsella, Rhoda; Muffato, Matthieu; Parker, Anne; Spudich, Giulietta; Yates, Andy; Zerbino, Daniel R.; Searle, Stephen M.J.

    2014-01-01

    Ensembl (http://www.ensembl.org) creates tools and data resources to facilitate genomic analysis in chordate species with an emphasis on human, major vertebrate model organisms and farm animals. Over the past year we have increased the number of species that we support to 77 and expanded our genome browser with a new scrollable overview and improved variation and phenotype views. We also report updates to our core datasets and improvements to our gene homology relationships from the addition of new species. Our REST service has been extended with additional support for comparative genomics and ontology information. Finally, we provide updated information about our methods for data access and resources for user training. PMID:24316576

  18. Prediction of healthy blood with data mining classification by using Decision Tree, Naive Bayesian and SVM approaches

    NASA Astrophysics Data System (ADS)

    Khalilinezhad, Mahdieh; Minaei, Behrooz; Vernazza, Gianni; Dellepiane, Silvana

    2015-03-01

    Data mining (DM) is the process of discovering knowledge from large databases. Applications of data mining in blood transfusion organizations could be useful for improving the performance of blood donation services. The aim of this research is the prediction of the healthiness of blood donors in a Blood Transfusion Organization (BTO). For this goal, three well-known algorithms, the Decision Tree C4.5, the Naïve Bayesian classifier, and the Support Vector Machine, have been chosen and applied to a real database of 11006 donors. Seven fields, namely sex, age, job, education, marital status, type of donor, and results of blood tests (doctors' comments and lab results about healthy or unhealthy blood donors), have been selected as input to these algorithms. The results of the three algorithms have been compared and an error cost analysis has been performed. According to this research and the obtained results, the best algorithm with low error cost and high accuracy is SVM. This research helps the BTO to build a model of blood donors in each area in order to predict whether donors' blood is healthy or unhealthy. This research could be useful if used in parallel with laboratory tests to better identify unhealthy blood.

  19. Clinical elements that predict outcome after traumatic brain injury: a prospective multicenter recursive partitioning (decision-tree) analysis.

    PubMed

    Brown, Allen W; Malec, James F; McClelland, Robyn L; Diehl, Nancy N; Englander, Jeffrey; Cifu, David X

    2005-10-01

    Traumatic brain injury (TBI) often presents clinicians with a complex combination of clinical elements that can confound treatment and make outcome prediction challenging. Predictive models have commonly used acute physiological variables and gross clinical measures to predict mortality and basic outcome endpoints. The primary goal of this study was to consider all clinical elements available concerning a survivor of TBI admitted for inpatient rehabilitation, and to identify those factors that predict disability, need for supervision, and productive activity one year after injury. The Traumatic Brain Injury Model Systems (TBIMS) database was used for decision tree analysis using recursive partitioning (n = 3463). Outcome measures included the Functional Independence Measure, the Disability Rating Scale, the Supervision Rating Scale, and a measure of productive activity. Predictor variables included all physical examination elements, measures of injury severity (initial Glasgow Coma Scale score, duration of post-traumatic amnesia [PTA], length of coma, CT scan pathology), gender, age, and years of education. The duration of PTA, age, and most elements of the physical examination were predictive of early disability. The duration of PTA alone was selected to predict late disability and independent living. The duration of PTA, age, sitting balance, and limb strength were selected to predict productive activity at 1 year. The duration of PTA was the best predictor of outcome selected in this model for all endpoints, and elements of the physical examination provided additional predictive value. Valid and reliable measures of PTA and physical impairment after TBI are important for accurate outcome prediction. PMID:16238482

  20. Assessing and monitoring the risk of desertification in Dobrogea, Romania, using Landsat data and decision tree classifier.

    PubMed

    Vorovencii, Iosif

    2015-04-01

    The risk of desertification in part of Romania is increasingly evident, constituting a serious problem for the environment and society. This article attempts to assess and monitor the risk of desertification in Dobrogea using Landsat Thematic Mapper (TM) satellite images acquired in 1987, 1994, 2000, 2007 and 2011. In order to assess the risk of desertification, we used as indicators the Modified Soil Adjusted Vegetation Index 1 (MSAVI1), the Moving Standard Deviation Index (MSDI) and the albedo, indices relating to the vegetation conditions, the landscape pattern and micrometeorology. The decision tree classifier (DTC) was also used on the basis of pre-established rules, and maps displaying six grades of desertification risk were obtained: none, very low, low, medium, high and severe. Land surface temperature (LST) was also used for the analysis. The results indicate that, according to pre-established rules for the period of 1987-2011, there are two grades of desertification risk that have an ascending trend in Dobrogea, namely very low and medium desertification. An investigation into the causes of the desertification risk revealed that high temperature is the main factor, accompanied by the destruction of forest shelterbelts and of the irrigation system and, to a smaller extent, by the fragmentation of agricultural land and the deforestation in the study area. PMID:25800368

  1. Landsat-derived cropland mask for Tanzania using 2010-2013 time series and decision tree classifier methods

    NASA Astrophysics Data System (ADS)

    Justice, C. J.

    2015-12-01

    80% of Tanzania's population is involved in the agriculture sector. Despite this national dependence, agricultural reporting is minimal and monitoring efforts are in their infancy. The cropland mask developed through this study provides the framework for agricultural monitoring by informing analysis of crop conditions, dispersion, and intensity at a national scale. Tanzania is dominated by smallholder agricultural systems with an average field size of less than one hectare (Sarris et al, 2006). At this field scale, previous classifications of agricultural land in Tanzania using coarse-resolution MODIS data are insufficient to inform a working monitoring system. The nation-wide cropland mask in this study was developed using composited Landsat tiles from a 2010-2013 time series. Decision tree classifier methods were used in the study, with representative training areas collected for agriculture and non-agriculture and appropriate indices used to separate these classes (Hansen et al, 2013). Validation was done using random samples and high-resolution satellite images to compare agriculture and non-agriculture samples from the study area. The techniques used in this study were successful and have the potential to be adapted for other countries, allowing targeted monitoring efforts to improve food security, inform market prices, and guide agricultural policy.

  2. An expert system with radial basis function neural network based on decision trees for predicting sediment transport in sewers.

    PubMed

    Ebtehaj, Isa; Bonakdari, Hossein; Zaji, Amir Hossein

    2016-01-01

    In this study, an expert system with a radial basis function neural network (RBF-NN) based on decision trees (DT) is designed to predict sediment transport in sewer pipes at the limit of deposition. First, sensitivity analysis is carried out to investigate the effect of each parameter on predicting the densimetric Froude number (Fr). The results indicate that utilizing the ratio of the median particle diameter to pipe diameter (d/D), the ratio of median particle diameter to hydraulic radius (d/R) and the volumetric sediment concentration (Cv) as the input combination leads to the best Fr prediction. Subsequently, the new hybrid DT-RBF method is presented. The results of DT-RBF are compared with RBF and RBF-particle swarm optimization (PSO), which uses PSO for RBF training. It appears that DT-RBF is more accurate (R² = 0.934, MARE = 0.103, RMSE = 0.527, SI = 0.13, BIAS = -0.071) than the two other RBF methods. Moreover, the proposed DT-RBF model offers explicit expressions for use by practicing engineers. PMID:27386995
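For reference, the standard error metrics quoted in abstracts like the one above can be computed as follows; the observed and predicted values here are hypothetical, and SI and BIAS are omitted for brevity.

```python
import math

# Hedged sketch of the goodness-of-fit metrics R², MARE and RMSE.
# The data points are illustrative, not from the cited study.

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mare(obs, pred):
    """Mean absolute relative error."""
    return sum(abs(o - p) / o for o, p in zip(obs, pred)) / len(obs)

def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1 - ss_res / ss_tot

obs = [2.0, 4.0, 6.0, 8.0]
pred = [2.2, 3.9, 6.3, 7.8]
```

Lower MARE and RMSE and an R² closer to 1 indicate a better fit, which is how the DT-RBF comparison above should be read.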

  3. Rejecting Non-MIP-Like Tracks using Boosted Decision Trees with the T2K Pi-Zero Subdetector

    NASA Astrophysics Data System (ADS)

    Hogan, Matthew; Schwehr, Jacklyn; Cherdack, Daniel; Wilson, Robert; T2K Collaboration

    2016-03-01

    Tokai-to-Kamioka (T2K) is a long-baseline neutrino experiment with a narrow band energy spectrum peaked at 600 MeV. The Pi-Zero detector (PØD) is a plastic scintillator-based detector located in the off-axis near detector complex 280 meters from the beam origin. It is designed to constrain neutral-current induced π0 production background at the far detector using the water target which is interleaved between scintillator layers. A PØD-based measurement of charged-current (CC) single charged pion (1π+) production on water is being developed which will have expanded phase space coverage as compared to the previous analysis. The signal channel for this analysis, which for T2K is dominated by Δ production, is defined as events that produce a single muon, single charged pion, and any number of nucleons in the final state. The analysis will employ machine learning algorithms to enhance CC1π+ selection by studying topological observables that characterize signal well. Important observables for this analysis are those that discriminate a minimum ionizing particle (MIP) like a muon or pion from a proton at the T2K energies. This work describes the development of a discriminator using Boosted Decision Trees to reject non-MIP-like PØD tracks.

  4. Model-Based Design of a Decision Tree for Treating HER2+ Cancers Based on Genetic and Protein Biomarkers

    PubMed Central

    Kirouac, DC; Lahdenranta, J; Du, J; Yarar, D; Onsum, MD; Nielsen, UB; McDonagh, CF

    2015-01-01

    Human cancers are incredibly diverse with regard to molecular aberrations, dependence on oncogenic signaling pathways, and responses to pharmacological intervention. We wished to assess how cellular dependence on the canonical PI3K vs. MAPK pathways within HER2+ cancers affects responses to combinations of targeted therapies, and biomarkers predictive of their activity. Through an integrative analysis of mechanistic model simulations and in vitro cell line profiling, we designed a six-arm decision tree to stratify treatment of HER2+ cancers using combinations of targeted agents. Activating mutations in the PI3K and MAPK pathways (PIK3CA and KRAS), and expression of the HER3 ligand heregulin determined sensitivity to combinations of inhibitors against HER2 (lapatinib), HER3 (MM-111), AKT (MK-2206), and MEK (GSK-1120212; trametinib), in addition to the standard of care trastuzumab (Herceptin). The strategy used to identify effective combinations and predictive biomarkers in HER2-expressing tumors may be more broadly extendable to other human cancers. PMID:26225238

  5. Application of artificial neural network, fuzzy logic and decision tree algorithms for modelling of streamflow at Kasol in India.

    PubMed

    Senthil Kumar, A R; Goyal, Manish Kumar; Ojha, C S P; Singh, R D; Swamee, P K

    2013-01-01

    The prediction of streamflow is required in many activities associated with the planning and operation of the components of a water resources system. Soft computing techniques have proven to be an efficient alternative to traditional methods for modelling qualitative and quantitative water resource variables such as streamflow. The focus of this paper is to present the development of models using multiple linear regression (MLR), artificial neural network (ANN), fuzzy logic and decision tree algorithms such as M5 and REPTree for predicting the streamflow at Kasol, located upstream of the Bhakra reservoir in the Sutlej basin in northern India. The input vector to the various models using different algorithms was derived considering statistical properties of the time series such as the auto-correlation function, partial auto-correlation function and cross-correlation function. It was found that the REPTree model performed well compared to the other soft computing techniques such as MLR, ANN, fuzzy logic, and M5P investigated in this study, and the results of the REPTree model indicate that the entire range of streamflow values was simulated fairly well. The performance of the naïve persistence model was compared with the other models, and the requirement for developing the naïve persistence model was also analysed using the persistence index. PMID:24355836

  6. Rapid decision support tool based on novel ecosystem service variables for retrofitting of permeable pavement systems in the presence of trees.

    PubMed

    Scholz, Miklas; Uzomah, Vincent C

    2013-08-01

    The retrofitting of sustainable drainage systems (SuDS) such as permeable pavements is currently undertaken ad hoc using expert experience supported by minimal guidance based predominantly on hard engineering variables. There is a lack of practical decision support tools useful for a rapid assessment of the potential of ecosystem services when retrofitting permeable pavements in urban areas that either feature existing trees or should be planted with trees in the near future. Thus the aim of this paper is to develop an innovative rapid decision support tool based on novel ecosystem service variables for retrofitting of permeable pavement systems close to trees. This unique tool proposes the retrofitting of permeable pavements that obtained the highest ecosystem service score for a specific urban site enhanced by the presence of trees. This approach is based on a novel ecosystem service philosophy adapted to permeable pavements rather than on traditional engineering judgement associated with variables based on quick community and environment assessments. For an example case study area such as Greater Manchester, which was dominated by Sycamore and Common Lime, a comparison with the traditional approach of determining community and environment variables indicates that permeable pavements are generally a preferred SuDS option. Permeable pavements combined with urban trees received relatively high scores, because of their great potential impact in terms of water and air quality improvement, and flood control, respectively. The outcomes of this paper are likely to lead to more combined permeable pavement and tree systems in the urban landscape, which are beneficial for humans and the environment. PMID:23697848

  7. Decision-tree and rule-induction approach to integration of remotely sensed and GIS data in mapping vegetation in disturbed or hilly environments

    NASA Astrophysics Data System (ADS)

    Lees, Brian G.; Ritman, Kim

    1991-11-01

    The integration of Landsat TM and environmental GIS data sets using artificial intelligence rule-induction and decision-tree analysis is shown to facilitate the production of vegetation maps with both floristic and structural information. This technique is particularly suited to vegetation mapping in disturbed or hilly environments that are unsuited to either conventional remote sensing methods or GIS modeling using environmental data bases.

  8. Utilizing home health care electronic health records for telehomecare patients with heart failure: a decision tree approach to detect associations with rehospitalizations

    PubMed Central

    Kang, Youjeong; McHugh, Matthew D; Chittams, Jesse; Bowles, Kathryn H.

    2016-01-01

    Heart failure is a complex condition with a significant impact on patients' lives. A few studies have identified risk factors associated with rehospitalization among telehomecare patients with heart failure using logistic regression or survival analysis models. To date there are no published studies that have used data mining techniques to detect associations with rehospitalizations among telehomecare patients with heart failure. This study is a secondary analysis of the home health care electronic medical record called the Outcome and Assessment Information Set (OASIS-C) for 552 telemonitored heart failure patients. Bivariate analyses using SAS™ and a decision tree technique using the Waikato Environment for Knowledge Analysis (WEKA) were used. From the decision tree technique, the presence of skin issue(s) was identified as the top predictor of rehospitalization that could be identified during the start-of-care assessment, followed by the patient's living situation, the patient's overall health status, severe pain experiences, frequency of activity-limiting pain, and total number of anticipated therapy visits combined. Examining risk factors for rehospitalization from the OASIS-C database using a decision tree approach among a cohort of telehomecare patients provided a broad understanding of the characteristics of patients who are appropriate for the use of telehomecare or who need additional support. PMID:26848645

  9. Ensemble Feature Learning of Genomic Data Using Support Vector Machine

    PubMed Central

    Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R.; Braytee, Ali; Kennedy, Paul J.

    2016-01-01

    The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest, which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention, but mostly for classification, not gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) method for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy that is the rationale of the RFE algorithm. The rationale is that building ensemble SVM models on randomly drawn bootstrap samples from the training set will produce different feature rankings, which are subsequently aggregated into one feature ranking. As a result, the decision to eliminate a feature is based upon the rankings of multiple SVM models instead of one particular model. Moreover, this approach addresses the problem of imbalanced datasets by constructing a nearly balanced bootstrap sample. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that on average 9% better accuracy is achieved by ESVM-RFE over SVM-RFE, and 5% over the random forest based approach. The genes selected by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD), which reveals significant clusters in the selected data. PMID:27304923
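The core loop — rank features on many bootstrap samples, aggregate the rankings, and eliminate the weakest feature each round — can be sketched as below. A simple class-mean-difference score stands in for the SVM weight magnitudes (so this is not the authors' ESVM-RFE), and the stratified resampling mimics the "nearly balanced bootstrap sample" idea; the data and all names are illustrative.

```python
import random

# Hedged sketch of ensemble recursive feature elimination: aggregate a
# per-feature relevance score over bootstrap samples, then drop the
# lowest-scoring feature, repeating until n_keep features remain.

def feature_scores(rows, labels, features):
    """|mean(class 1) - mean(class 0)| per feature (SVM-weight stand-in)."""
    scores = {}
    for f in features:
        c0 = [r[f] for r, y in zip(rows, labels) if y == 0]
        c1 = [r[f] for r, y in zip(rows, labels) if y == 1]
        scores[f] = abs(sum(c1) / len(c1) - sum(c0) / len(c0))
    return scores

def ensemble_rfe(rows, labels, n_keep=1, n_boot=20, seed=0):
    rng = random.Random(seed)
    # stratified bootstrap indices keep the sample nearly class-balanced
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    features = list(range(len(rows[0])))
    while len(features) > n_keep:
        total = {f: 0.0 for f in features}
        for _ in range(n_boot):  # aggregate scores over bootstrap samples
            idx = [rng.choice(pos) for _ in pos] + [rng.choice(neg) for _ in neg]
            s = feature_scores([rows[i] for i in idx],
                               [labels[i] for i in idx], features)
            for f in features:
                total[f] += s[f]
        features.remove(min(features, key=total.get))  # drop weakest feature
    return features

# Toy data: feature 0 separates the classes, features 1-2 are noise
rows = [(0.0, 0.4, 0.6), (0.0, 0.5, 0.4), (1.0, 0.5, 0.5), (1.0, 0.6, 0.5)]
labels = [0, 0, 1, 1]
selected = ensemble_rfe(rows, labels)
```

Because elimination decisions rest on scores pooled across many resamples rather than one fitted model, a single unlucky sample cannot discard an informative feature, which is the point of the ensemble strategy.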

  10. Assessing the safety of co-exposure to food packaging migrants in food and water using the maximum cumulative ratio and an established decision tree.

    PubMed

    Price, Paul; Zaleski, Rosemary; Hollnagel, Heli; Ketelslegers, Hans; Han, Xianglu

    2014-01-01

    Food contact materials can release low levels of multiple chemicals (migrants) into foods and beverages, to which individuals can be exposed through food consumption. This paper investigates the potential for non-carcinogenic effects from exposure to multiple migrants using the Cefic Mixtures Ad hoc Team (MIAT) decision tree. The purpose of the assessment is to demonstrate how the decision tree can be applied to concurrent exposures to multiple migrants using either hazard or structural data on the specific components, i.e. based on the acceptable daily intake (ADI) or the threshold of toxicological concern. The tree was used to assess risks from co-exposure to migrants reported in a study on non-intentionally added substances (NIAS) eluting from food contact-grade plastic and two studies of water bottles: one on organic compounds and the other on ionic forms of various elements. The MIAT decision tree assigns co-exposures to different risk management groups (I, II, IIIA and IIIB) based on the hazard index, and the maximum cumulative ratio (MCR). The predicted co-exposures for all examples fell into Group II (low toxicological concern) and had MCR values of 1.3 and 2.4 (indicating that one or two components drove the majority of the mixture's toxicity). MCR values from the study of inorganic ions (126 mixtures) ranged from 1.1 to 3.8 for glass and from 1.1 to 5.0 for plastic containers. The MCR values indicated that a single compound drove toxicity in 58% of the mixtures. MCR values also declined with increases in the hazard index for the screening assessments of exposure (suggesting fewer substances contributed as risk potential increased). Overall, it can be concluded that the data on co-exposure to migrants evaluated in these case studies are of low toxicological concern and the safety assessment approach described in this paper was shown to be a helpful screening tool. PMID:24320041
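The two screening quantities used above are straightforward to compute: the hazard index is the sum of per-substance hazard quotients (exposure divided by the ADI), and the MCR is the hazard index divided by the largest single hazard quotient. The sketch below uses hypothetical migrant exposures and ADIs, and the group labels are simplified illustrations rather than the published MIAT thresholds.

```python
# Hedged sketch of hazard index (HI) and maximum cumulative ratio (MCR)
# screening; all exposures, ADIs and names are hypothetical.

def hazard_quotients(exposures, limits):
    """HQ_i = exposure_i / acceptable daily intake_i."""
    return {k: exposures[k] / limits[k] for k in exposures}

def hazard_index(hq):
    return sum(hq.values())

def mcr(hq):
    """MCR = HI / max HQ; ranges from 1 (one driver) up to n components."""
    return hazard_index(hq) / max(hq.values())

def miat_group(hi, mcr_value):
    # Simplified assignment loosely inspired by the MIAT groups, for
    # illustration only (not the published decision-tree thresholds)
    if hi < 1 and mcr_value <= 2:
        return "II (low concern, single dominant component)"
    if hi < 1:
        return "II (low concern, multiple components)"
    return "III (further assessment needed)"

exp = {"migrant_A": 0.02, "migrant_B": 0.01, "migrant_C": 0.001}  # mg/kg/day
adi = {"migrant_A": 0.1, "migrant_B": 0.5, "migrant_C": 0.05}
hq = hazard_quotients(exp, adi)
group = miat_group(hazard_index(hq), mcr(hq))
```

With these made-up numbers HI = 0.24 and MCR = 1.2, i.e. one migrant drives most of the mixture's toxicity, mirroring the low-MCR pattern reported in the abstract.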

  11. Measurement of single top quark production in the tau+jets channel using boosted decision trees at D0

    SciTech Connect

    Liu, Zhiyi

    2009-12-01

    The top quark is the heaviest known matter particle and plays an important role in the Standard Model of particle physics. At hadron colliders, it is possible to produce single top quarks via the weak interaction. This allows a direct measurement of the CKM matrix element |Vtb| and serves as a window to new physics. The first direct measurement of single top quark production with a tau lepton in the final state (the tau+jets channel) is presented in this thesis. The measurement uses 4.8 fb⁻¹ of Tevatron Run II data from pp̄ collisions at √s = 1.96 TeV acquired by the D0 experiment. After selecting a data sample and building a background model, the data and background model are in good agreement. A multivariate technique, boosted decision trees, is employed to discriminate the small single top quark signal from a large background. The expected sensitivity of the tau+jets channel in the Standard Model is 1.8 standard deviations. Using a Bayesian statistical approach, an upper limit on the cross section of single top quark production in the tau+jets channel is measured as 7.3 pb at 95% confidence level, and the cross section is measured as 3.4 +2.0/−1.8 pb. The result for single top quark production in the tau+jets channel is also combined with those in the electron+jets and muon+jets channels. The expected sensitivity of the combined electron, muon and tau analysis is 4.7 standard deviations, compared to 4.5 standard deviations for electron and muon alone. The measured cross section in the three combined final states is σ(pp̄ → tb + X, tqb + X) = 3.84 +0.89/−0.83 pb. A lower limit on |Vtb| is also measured in the three combined final states to be larger than 0.85 at 95% confidence level. These results are consistent with Standard Model expectations.

  12. MSEBAG: a dynamic classifier ensemble generation based on `minimum-sufficient ensemble' and bagging

    NASA Astrophysics Data System (ADS)

    Chen, Lei; Kamel, Mohamed S.

    2016-01-01

    In this paper, we propose a dynamic classifier system, MSEBAG, which is characterised by searching for the 'minimum-sufficient ensemble' and bagging at the ensemble level. It adopts an 'over-generation and selection' strategy and aims to achieve a good bias-variance trade-off. In the training phase, MSEBAG first searches for the 'minimum-sufficient ensemble', which maximises the in-sample fitness with the minimal number of base classifiers. Then, starting from the 'minimum-sufficient ensemble', a backward stepwise algorithm is employed to generate a collection of ensembles. The objective is to create a collection of ensembles with a descending fitness on the data, as well as a descending complexity in the structure. MSEBAG dynamically selects the ensembles from the collection for the decision aggregation. The extended adaptive aggregation (EAA) approach, a bagging-style algorithm performed at the ensemble level, is employed for this task. EAA searches for the competent ensembles using a score function, which takes into consideration both the in-sample fitness and the confidence of the statistical inference, and averages the decisions of the selected ensembles to label the test pattern. The experimental results show that the proposed MSEBAG outperforms the benchmarks on average.
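The "minimum-sufficient ensemble" idea above, the smallest set of base classifiers whose combined decision attains the best in-sample fitness, can be sketched with majority-vote accuracy as the fitness and an exhaustive search standing in for the paper's stepwise algorithm (feasible only for a handful of classifiers; illustration, not MSEBAG itself):

```python
from itertools import combinations

def majority_accuracy(preds, labels):
    # Accuracy of an unweighted majority vote over +1/-1 prediction vectors.
    correct = 0
    for i, y in enumerate(labels):
        vote = sum(p[i] for p in preds)
        correct += (1 if vote > 0 else -1) == y
    return correct / len(labels)

def minimum_sufficient_ensemble(all_preds, labels):
    """Smallest subset of base classifiers whose majority vote reaches the
    best attainable in-sample accuracy (exhaustive search for clarity)."""
    best = max(majority_accuracy(c, labels)
               for r in range(1, len(all_preds) + 1)
               for c in combinations(all_preds, r))
    for r in range(1, len(all_preds) + 1):          # smallest subsets first
        for c in combinations(all_preds, r):
            if majority_accuracy(c, labels) == best:
                return list(c)
```

Starting from this subset, a backward-stepwise pass would then grow the collection of ensembles with descending fitness and complexity.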

  13. A decision tree approach for the application of drug metabolism and kinetic studies to in vivo and in vitro toxicological and pharmacological testing.

    PubMed

    Bach, P H; Bridges, J W

    1985-01-01

    The integration of toxicological and other biological findings with information on drug metabolism and pharmacokinetics is often very important for rational decision making in safety evaluation programmes. This goal is unlikely to be achieved by conducting a routine package of inflexibly defined drug metabolism and pharmacokinetic test protocols for each new chemical. Rather, an intelligent selection of experiments based on the known properties of a chemical is required. A series of decision trees are proposed which serve as an aide memoire in the choice of appropriate drug metabolism and pharmacokinetic experiments. These decision trees cover the physicochemical properties of a chemical, data on animal and human pharmacology and toxicology, and environmental information relevant to possible contamination. In many cases, drug metabolism and pharmacokinetic factors are an important prerequisite to the design of in vitro tests that are relevant to the in vivo situation. A scheme is provided to assist the identification of appropriate conditions for the in vitro testing of individual chemicals. PMID:3868347

  14. Segregating the Effects of Seed Traits and Common Ancestry of Hardwood Trees on Eastern Gray Squirrel Foraging Decisions

    PubMed Central

    Sundaram, Mekala; Willoughby, Janna R.; Lichti, Nathanael I.; Steele, Michael A.; Swihart, Robert K.

    2015-01-01

    The evolution of specific seed traits in scatter-hoarded tree species often has been attributed to granivore foraging behavior. However, the degree to which foraging investments and seed traits correlate with phylogenetic relationships among trees remains unexplored. We presented seeds of 23 different hardwood tree species (families Betulaceae, Fagaceae, Juglandaceae) to eastern gray squirrels (Sciurus carolinensis), and measured the time and distance travelled by squirrels that consumed or cached each seed. We estimated 11 physical and chemical seed traits for each species, and the phylogenetic relationships between the 23 hardwood trees. Variance partitioning revealed that considerable variation in foraging investment was attributable to seed traits alone (27–73%), and combined effects of seed traits and phylogeny of hardwood trees (5–55%). A phylogenetic PCA (pPCA) on seed traits and tree phylogeny resulted in 2 “global” axes of traits that were phylogenetically autocorrelated at the family and genus level and a third “local” axis in which traits were not phylogenetically autocorrelated. Collectively, these axes explained 30–76% of the variation in squirrel foraging investments. The first global pPCA axis, which produced large scores for seed species with thin shells, low lipid and high carbohydrate content, was negatively related to time to consume and cache seeds and travel distance to cache. The second global pPCA axis, which produced large scores for seeds with high protein, low tannin and low dormancy levels, was an important predictor of consumption time only. The local pPCA axis primarily reflected kernel mass. Although it explained only 12% of the variation in trait space and was not autocorrelated among phylogenetic clades, the local axis was related to all four squirrel foraging investments. Squirrel foraging behaviors are influenced by a combination of phylogenetically conserved and more evolutionarily labile seed traits that is

  15. Segregating the Effects of Seed Traits and Common Ancestry of Hardwood Trees on Eastern Gray Squirrel Foraging Decisions.

    PubMed

    Sundaram, Mekala; Willoughby, Janna R; Lichti, Nathanael I; Steele, Michael A; Swihart, Robert K

    2015-01-01

    The evolution of specific seed traits in scatter-hoarded tree species often has been attributed to granivore foraging behavior. However, the degree to which foraging investments and seed traits correlate with phylogenetic relationships among trees remains unexplored. We presented seeds of 23 different hardwood tree species (families Betulaceae, Fagaceae, Juglandaceae) to eastern gray squirrels (Sciurus carolinensis), and measured the time and distance travelled by squirrels that consumed or cached each seed. We estimated 11 physical and chemical seed traits for each species, and the phylogenetic relationships between the 23 hardwood trees. Variance partitioning revealed that considerable variation in foraging investment was attributable to seed traits alone (27-73%), and combined effects of seed traits and phylogeny of hardwood trees (5-55%). A phylogenetic PCA (pPCA) on seed traits and tree phylogeny resulted in 2 "global" axes of traits that were phylogenetically autocorrelated at the family and genus level and a third "local" axis in which traits were not phylogenetically autocorrelated. Collectively, these axes explained 30-76% of the variation in squirrel foraging investments. The first global pPCA axis, which produced large scores for seed species with thin shells, low lipid and high carbohydrate content, was negatively related to time to consume and cache seeds and travel distance to cache. The second global pPCA axis, which produced large scores for seeds with high protein, low tannin and low dormancy levels, was an important predictor of consumption time only. The local pPCA axis primarily reflected kernel mass. Although it explained only 12% of the variation in trait space and was not autocorrelated among phylogenetic clades, the local axis was related to all four squirrel foraging investments. Squirrel foraging behaviors are influenced by a combination of phylogenetically conserved and more evolutionarily labile seed traits that is consistent with a weak

  16. Decision-tree-model identification of nitrate pollution activities in groundwater: A combination of a dual isotope approach and chemical ions

    NASA Astrophysics Data System (ADS)

    Xue, Dongmei; Pang, Fengmei; Meng, Fanqiao; Wang, Zhongliang; Wu, Wenliang

    2015-09-01

    To develop management practices for agricultural crops to protect against NO3- contamination in groundwater, dominant pollution activities require reliable classification. In this study, we (1) classified potential NO3- pollution activities via an unsupervised learning algorithm based on δ15N- and δ18O-NO3- and physico-chemical properties of groundwater at 55 sampling locations; and (2) determined which water quality parameters could be used to identify the sources of NO3- contamination via a decision tree model. When a combination of δ15N-, δ18O-NO3- and physico-chemical properties of groundwater was used as an input for the k-means clustering algorithm, it allowed for a reliable clustering of the 55 sampling locations into 4 corresponding agricultural activities: well irrigated agriculture (28 sampling locations), sewage irrigated agriculture (16 sampling locations), a combination of sewage irrigated agriculture, farm and industry (5 sampling locations) and a combination of well irrigated agriculture and farm (6 sampling locations). A decision tree model with 97.5% classification success was developed based on the SO42- and Cl- variables. The NO3- and the δ15N- and δ18O-NO3- variables demonstrated limitations in developing a decision tree model, as multiple N sources and fractionation processes both resulted in difficulties in discriminating NO3- concentrations and isotopic values. Although only SO42- and Cl- were selected as important discriminating variables, concentration data alone could not identify the specific NO3- sources responsible for groundwater contamination. This is a result of the comprehensive analysis. To further reduce NO3- contamination, an integrated approach should be set up by combining N and O isotopes of NO3- with land-uses and physico-chemical properties, especially in areas with complex agricultural activities.
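A CART-style split search of the kind that would surface sulphate and chloride as discriminating variables can be sketched with a Gini-impurity criterion. The ion concentrations and class labels below are purely hypothetical, not values from the study:

```python
def gini(labels):
    # Gini impurity of a list of class labels.
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Find the (variable, threshold) pair minimising weighted Gini impurity.
    rows: list of dicts of measured ions, e.g. {'SO4': ..., 'Cl': ...}."""
    best = None
    for var in rows[0]:
        values = sorted({r[var] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2          # candidate threshold between two values
            left = [y for r, y in zip(rows, labels) if r[var] <= t]
            right = [y for r, y in zip(rows, labels) if r[var] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, var, t)
    return best[1], best[2]
```

Applied recursively to each side of the chosen split, this yields the kind of tree reported above.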

  17. Decision-tree-model identification of nitrate pollution activities in groundwater: A combination of a dual isotope approach and chemical ions.

    PubMed

    Xue, Dongmei; Pang, Fengmei; Meng, Fanqiao; Wang, Zhongliang; Wu, Wenliang

    2015-09-01

    To develop management practices for agricultural crops to protect against NO3(-) contamination in groundwater, dominant pollution activities require reliable classification. In this study, we (1) classified potential NO3(-) pollution activities via an unsupervised learning algorithm based on δ(15)N- and δ(18)O-NO3(-) and physico-chemical properties of groundwater at 55 sampling locations; and (2) determined which water quality parameters could be used to identify the sources of NO3(-) contamination via a decision tree model. When a combination of δ(15)N-, δ(18)O-NO3(-) and physico-chemical properties of groundwater was used as an input for the k-means clustering algorithm, it allowed for a reliable clustering of the 55 sampling locations into 4 corresponding agricultural activities: well irrigated agriculture (28 sampling locations), sewage irrigated agriculture (16 sampling locations), a combination of sewage irrigated agriculture, farm and industry (5 sampling locations) and a combination of well irrigated agriculture and farm (6 sampling locations). A decision tree model with 97.5% classification success was developed based on SO4(2-) and Cl(-) variables. The NO3(-) and the δ(15)N- and δ(18)O-NO3(-) variables demonstrated limitation in developing a decision tree model as multiple N sources and fractionation processes both resulted in difficulties of discriminating NO3(-) concentrations and isotopic values. Although only the SO4(2-) and Cl(-) were selected as important discriminating variables, concentration data alone could not identify the specific NO3(-) sources responsible for groundwater contamination. This is a result of comprehensive analysis. To further reduce NO3(-) contamination, an integrated approach should be set-up by combining N and O isotopes of NO3(-) with land-uses and physico-chemical properties, especially in areas with complex agricultural activities. PMID:26231989

  18. Application of Decision Tree to Obtain Optimal Operation Rules for Reservoir Flood Control Considering Sediment Desilting-Case Study of Tseng Wen Reservoir

    NASA Astrophysics Data System (ADS)

    ShiouWei, L.

    2014-12-01

    Reservoirs are the most important water resources facilities in Taiwan. However, due to the steep slopes and fragile geological conditions in the mountain areas, storm events usually cause serious debris flows and floods, and these floods then flush large amounts of sediment into reservoirs. The sedimentation caused by floods greatly shortens reservoir life. Hence, how to operate a reservoir during flood events to increase the efficiency of sediment desilting without risking reservoir safety or impacting the subsequent water supply is a crucial issue in Taiwan. Therefore, this study developed a novel optimization planning model for reservoir flood operation that considers flood control and sediment desilting, and proposed easy-to-use operating rules represented by decision trees. The decision tree rules consider flood mitigation, water supply and sediment desilting. The optimal planning model computes, for each flood event, the optimal reservoir release that minimizes the water supply impact and maximizes sediment desilting without risking reservoir safety. Besides the optimal flood operation planning model, this study also proposed decision-tree-based flood operating rules trained on the multiple optimal reservoir releases for synthetic flood scenarios. The synthetic flood scenarios consist of various synthetic storm events, the reservoir's initial storage and target storages at the end of flood operation. Comparing the results of the decision tree operation rules (DTOR) with those of the historical operation for Typhoon Krosa in 2007, the DTOR removed 15.4% more sediment than the historical operation, with reservoir storage only 8.38×10^6 m^3 less than that of the historical operation. For Typhoon Jangmi in 2008, the DTOR removed 24.4% more sediment than the historical operation, with reservoir storage only 7.58×10^6 m^3 less. The results show that the proposed DTOR model can increase the sediment desilting efficiency and extend the

  19. Ensemble Integration of Forest Disturbance Maps for the Landscape Change Monitoring System (LCMS)

    NASA Astrophysics Data System (ADS)

    Cohen, W. B.; Healey, S. P.; Yang, Z.; Zhu, Z.; Woodcock, C. E.; Kennedy, R. E.; Huang, C.; Steinwand, D.; Vogelmann, J. E.; Stehman, S. V.; Loveland, T. R.

    2014-12-01

    The recent convergence of free, high quality Landsat data and acceleration in the development of dense Landsat time series algorithms has spawned a nascent interagency effort known as the Landscape Change Monitoring System (LCMS). LCMS is being designed to map historic land cover changes associated with all major disturbance agents and land cover types in the US. Currently, five existing algorithms are being evaluated for inclusion in LCMS. The priorities of these five algorithms overlap to some degree, but each has its own strengths. This has led to the adoption of a novel approach, within LCMS, to integrate the map outputs (i.e., base learners) from these change detection algorithms using empirical ensemble models. Training data are derived from independent datasets representing disturbances such as: harvest, fire, insects, wind, and land use change. Ensemble modeling is expected to produce significant increases in predictive accuracy relative to the results of the individual base learners. The non-parametric models used in LCMS also provide a framework for matching output ensemble maps to independent sample-based statistical estimates of disturbance area. Multiple decision trees "vote" on class assignment, and it is possible to manipulate vote thresholds to ensure that ensemble maps reflect areas of disturbance derived from sources such as national-scale ground or image-based inventories. This talk will focus on results of the first ensemble integration of the base learners for six Landsat scenes distributed across the US. We will present an assessment of base learner performance across different types of disturbance against an independently derived, sample-based disturbance dataset (derived from the TimeSync Landsat time series visualization tool). The goal is to understand the contributions of each base learner to the quality of the ensemble map products. We will also demonstrate how the ensemble map products can be manipulated to match sample-based annual
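The vote-threshold manipulation described above reduces, per pixel, to counting base-learner votes and comparing them with an adjustable cutoff; lowering the cutoff inflates the mapped disturbance area, which is how the ensemble map can be tuned toward an independent inventory estimate. A minimal sketch over binary disturbance maps (not the LCMS code itself):

```python
def ensemble_disturbance_map(base_maps, threshold):
    """base_maps: equal-length 0/1 'disturbed' maps, one per change-detection
    algorithm (base learner). A pixel is disturbed in the ensemble map when
    at least `threshold` base learners vote for it."""
    votes = [sum(m[i] for m in base_maps) for i in range(len(base_maps[0]))]
    return [1 if v >= threshold else 0 for v in votes]
```

With three base learners, a threshold of 1 gives the union of detections and a threshold of 3 their intersection; intermediate values trade omission against commission error.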

  20. Ensemble Exigent Forecasting of Critical Weather Events

    NASA Astrophysics Data System (ADS)

    Hoffman, R. N.; Gombos, D.

    2011-12-01

    To improve the forecasting of and society's preparedness for "worst-case" weather damage scenarios, we have developed ensemble exigent analysis. Exigent analysis determines worst-case scenarios and associated probability quantiles from the joint spatial properties of multivariate damaging weather events. Using the ensemble-estimated forecast covariance, we (1) identify the forecast exigent analysis perturbation (ExAP) and (2) find the contemporaneous and antecedent meteorological conditions that are most likely to coexist with or to evolve into the ExAP at the forecast time. Here we focus on the first objective, the ExAP identification problem. The ExAP is the perturbation with respect to the ensemble mean at the forecast time that maximizes the damage in the subspace of the ensemble with respect to a user-defined damage metric (i.e. maximizes the sum of the damage perturbation over the domain of interest) and to a user-specified ensemble probability quantile (EPQ) defined in terms of the Mahalanobis distance of the perturbation to the ensemble mean. Making use of a universal relationship (for Gaussian ensembles) between the quantile of the damage functional and the EPQ, we explain the ExAP using topological arguments. Then, we formally define the ExAP by making use of the ensemble-estimated covariance of the damage ensemble in a Lagrangian minimization technique according to an exigent analysis theorem. Two case studies with varying complexities and expected accuracies are used to illustrate ensemble exigent analysis. The first case study employs the gridded forecast number of heating degree days (HDD) to analyze forecast heating demand over a large portion of the United States for a cold event on 9 January 2010. The second case uses ensemble forecasts of 2-meter temperature and estimates of the spatial distribution of citrus trees to define the damage functional as the percentage of Florida citrus trees damaged by the 11 January 2010 Florida freeze event. The ExAP of this

  1. Using Decision Trees to Examine Relationships between Inter-Annual Vegetation Variability, Topographic Attributes, and Climate Signals

    NASA Astrophysics Data System (ADS)

    White, A. B.; Kumar, P.

    2003-12-01

    The objective of this research is to develop KDD (knowledge discovery in databases) techniques for spatio-temporal geo-data, and use these techniques to examine inter-annual vegetation health signals. The underlying hypothesis of the research is that the signatures of inter-annual variability of climate on vegetation dynamics as represented by the statistical descriptors of vegetation index variations depend upon a variety of attributes related to the topography, hydrology, physiography, and climate. NDVI (normalized difference vegetation index) is enlisted to represent vegetation health, and relationships between this index and topographic attributes such as elevation, slope, aspect, compound topographic index (CTI), and the proximity to a stream are analyzed. Several scientific questions related to the identification and characterization of the inter-annual variability ensue as a consequence of our hypothesis. Investigations were performed using 13 years of 1-km resolution NDVI data from the AVHRR instrument on NOAA's POES (polar-orbiting operational environmental satellite) over the continental U.S. Various temporal change indices were used in order to identify anomalous inter-annual behavior in the NDVI index, including maximum absolute and relative deviations from the 13-year mean and positive and negative persistence indices (after Zhou et al., 2001). The KDD technique used in this research is the decision tree, which falls under the classification and prediction division of data mining techniques. The algorithm is similar to C4.5 and ID3, but can handle continuous input and output values without binning and is optimized to determine the minimum error. Future work will incorporate clustering algorithms (both distance and density-based) and association rule algorithms (constraint-based) adapted for spatial-temporal data. Investigations will also be performed at smaller spatial scales, integrating higher resolution data. Throughout the growing season

  2. Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036

    ERIC Educational Resources Information Center

    Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.

    2014-01-01

    This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the more difficult to interpret logistic regression model. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…

  3. Predicting skin sensitisation using a decision tree integrated testing strategy with an in silico model and in chemico/in vitro assays.

    PubMed

    Macmillan, Donna S; Canipa, Steven J; Chilton, Martyn L; Williams, Richard V; Barber, Christopher G

    2016-04-01

    There is a pressing need for non-animal methods to predict skin sensitisation potential and a number of in chemico and in vitro assays have been designed with this in mind. However, some compounds can fall outside the applicability domain of these in chemico/in vitro assays and may not be predicted accurately. Rule-based in silico models such as Derek Nexus are expert-derived from animal and/or human data and the mechanism-based alert domain can take a number of factors into account (e.g. abiotic/biotic activation). Therefore, Derek Nexus may be able to predict for compounds outside the applicability domain of in chemico/in vitro assays. To this end, an integrated testing strategy (ITS) decision tree using Derek Nexus and a maximum of two assays (from DPRA, KeratinoSens, LuSens, h-CLAT and U-SENS) was developed. Generally, the decision tree improved upon other ITS evaluated in this study with positive and negative predictivity calculated as 86% and 81%, respectively. Our results demonstrate that an ITS using an in silico model such as Derek Nexus with a maximum of two in chemico/in vitro assays can predict the sensitising potential of a number of chemicals, including those outside the applicability domain of existing non-animal assays. PMID:26796566
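The reported predictivities are standard confusion-matrix ratios. The counts below are hypothetical, chosen only to illustrate the arithmetic behind figures such as 86% positive and 81% negative predictivity; they are not the study's actual tallies:

```python
def its_performance(tp, fp, tn, fn):
    """Confusion-matrix summary for a binary (sensitiser / non-sensitiser)
    integrated testing strategy."""
    return {
        "positive_predictivity": tp / (tp + fp),  # PPV: fraction of positive calls that are right
        "negative_predictivity": tn / (tn + fn),  # NPV: fraction of negative calls that are right
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

Note that, unlike sensitivity and specificity, the predictivities shift with the prevalence of sensitisers in the evaluation set.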

  4. Improvement of the identification of four heavy metals in environmental samples by using predictive decision tree models coupled with a set of five bioluminescent bacteria.

    PubMed

    Jouanneau, Sulivan; Durand, Marie-José; Courcoux, Philippe; Blusseau, Thomas; Thouand, Gérald

    2011-04-01

    A primary statistical model based on the crossings between the different detection ranges of a set of five bioluminescent bacterial strains was developed to identify and quantify four metals present at several concentrations in different mixtures: cadmium, arsenic III, mercury, and copper. Four specific decision trees based on the CHAID algorithm (Chi-squared Automatic Interaction Detector type) which compose this model were designed from a database of 576 experiments (192 different mixture conditions). A specific software package, 'Metalsoft', helped us choose the best decision tree and provided a user-friendly way to identify the metal. To validate this innovative approach, 18 environmental samples containing a mixture of these metals were submitted to a bioassay and to standardized chemical methods. The results show on average a high correlation of 98.6% for the qualitative metal identification and 94.2% for the quantification. The results are particularly encouraging, and our model is able to provide semiquantitative information after only 60 min without pretreatment of samples. PMID:21355529

  5. An ensemble classification-based approach applied to retinal blood vessel segmentation.

    PubMed

    Fraz, Muhammad Moazam; Remagnino, Paolo; Hoppe, Andreas; Uyyanonvara, Bunyarit; Rudnicka, Alicja R; Owen, Christopher G; Barman, Sarah A

    2012-09-01

    This paper presents a new supervised method for segmentation of blood vessels in retinal photographs. This method uses an ensemble system of bagged and boosted decision trees and utilizes a feature vector based on the orientation analysis of gradient vector field, morphological transformation, line strength measures, and Gabor filter responses. The feature vector encodes information to handle the healthy as well as the pathological retinal image. The method is evaluated on the publicly available DRIVE and STARE databases, frequently used for this purpose and also on a new public retinal vessel reference dataset CHASE_DB1 which is a subset of retinal images of multiethnic children from the Child Heart and Health Study in England (CHASE) dataset. The performance of the ensemble system is evaluated in detail and the incurred accuracy, speed, robustness, and simplicity make the algorithm a suitable tool for automated retinal image analysis. PMID:22736688

  6. A Multi Criteria Group Decision-Making Model for Teacher Evaluation in Higher Education Based on Cloud Model and Decision Tree

    ERIC Educational Resources Information Center

    Chang, Ting-Cheng; Wang, Hui

    2016-01-01

    This paper proposes a cloud multi-criteria group decision-making model for teacher evaluation in higher education which is involving subjectivity, imprecision and fuzziness. First, selecting the appropriate evaluation index depending on the evaluation objectives, indicating a clear structural relationship between the evaluation index and…

  7. The Relation of Student Behavior, Peer Status, Race, and Gender to Decisions about School Discipline Using CHAID Decision Trees and Regression Modeling

    ERIC Educational Resources Information Center

    Horner, Stacy B.; Fireman, Gary D.; Wang, Eugene W.

    2010-01-01

    Peer nominations and demographic information were collected from a diverse sample of 1493 elementary school participants to examine behavior (overt and relational aggression, impulsivity, and prosociality), context (peer status), and demographic characteristics (race and gender) as predictors of teacher and administrator decisions about…

  8. Acceleration of ensemble machine learning methods using many-core devices

    NASA Astrophysics Data System (ADS)

    Tamerus, A.; Washbrook, A.; Wyeth, D.

    2015-12-01

    We present a case study into the acceleration of ensemble machine learning methods using many-core devices in collaboration with Toshiba Medical Visualisation Systems Europe (TMVSE). The adoption of GPUs to execute a key algorithm in the classification of medical image data was shown to significantly reduce overall processing time. Using a representative dataset and pre-trained decision trees as input we will demonstrate how the decision forest classification method can be mapped onto the GPU data processing model. It was found that a GPU-based version of the decision forest method resulted in over 138 times speed-up over a single-threaded CPU implementation with further improvements possible. The same GPU-based software was then directly applied to a suitably formed dataset to benefit supervised learning techniques applied in High Energy Physics (HEP) with similar improvements in performance.
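A key step in mapping decision forests onto GPUs is storing each tree as a flat, pointer-free array so that traversal is pure index arithmetic over contiguous memory. The sketch below illustrates that layout in plain Python; it is an assumption about the general technique, not TMVSE's implementation:

```python
def predict_flat_tree(nodes, features):
    """Traverse one decision tree stored as a flat array. Internal nodes are
    (feature_index, threshold, left_child, right_child); leaves are
    ('leaf', class_label). No pointers -- children are array indices, the
    representation that transfers directly to GPU buffers."""
    i = 0
    while nodes[i][0] != 'leaf':
        feat, thresh, left, right = nodes[i]
        i = left if features[feat] <= thresh else right
    return nodes[i][1]
```

On a GPU, each thread runs this loop for one sample (or one tree), which is why the decision forest method parallelises so well.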

  9. Under which conditions, additional monitoring data are worth gathering for improving decision making? Application of the VOI theory in the Bayesian Event Tree eruption forecasting framework

    NASA Astrophysics Data System (ADS)

    Loschetter, Annick; Rohmer, Jérémy

    2016-04-01

    Standard and new generations of monitoring observations provide, in almost real-time, important information about the evolution of the volcanic system. These observations are used to update the model and contribute to a better hazard assessment and to support decision making concerning potential evacuation. The framework BET_EF (based on a Bayesian Event Tree) developed by INGV enables dealing with the integration of information from monitoring with the prospect of decision making. Using this framework, the objectives of the present work are i. to propose a method to assess the added value of information (within the Value Of Information (VOI) theory) from monitoring; ii. to perform sensitivity analysis on the different parameters that influence the VOI from monitoring. VOI consists in assessing the possible increase in expected value provided by gathering information, for instance through monitoring. Basically, the VOI is the difference between the value with information and the value without additional information in a Cost-Benefit approach. This theory is well suited to deal with situations that can be represented in the form of a decision tree such as the BET_EF tool. Reference values and ranges of variation (for sensitivity analysis) were defined for input parameters, based on data from the MESIMEX exercise (performed at Vesuvio volcano in 2006). Complementary methods for sensitivity analyses were implemented: local, global using Sobol' indices and regional using Contribution to Sample Mean and Variance plots. The results (specific to the case considered) obtained with the different techniques are in good agreement and enable answering the following questions: i. Which characteristics of monitoring are important for early warning (reliability)? ii. How do experts' opinions influence the hazard assessment and thus the decision? Concerning the characteristics of monitoring, the most influential parameters are the means rather than the variances for the case considered

  10. Exploring Ensemble Visualization

    PubMed Central

    Phadke, Madhura N.; Pinto, Lifford; Alabi, Femi; Harter, Jonathan; Taylor, Russell M.; Wu, Xunlei; Petersen, Hannah; Bass, Steffen A.; Healey, Christopher G.

    2012-01-01

    An ensemble is a collection of related datasets. Each dataset, or member, of an ensemble is normally large, multidimensional, and spatio-temporal. Ensembles are used extensively by scientists and mathematicians, for example, by executing a simulation repeatedly with slightly different input parameters and saving the results in an ensemble to see how parameter choices affect the simulation. To draw inferences from an ensemble, scientists need to compare data both within and between ensemble members. We propose two techniques to support ensemble exploration and comparison: a pairwise sequential animation method that visualizes locally neighboring members simultaneously, and a screen door tinting method that visualizes subsets of members using screen space subdivision. We demonstrate the capabilities of both techniques, first using synthetic data, then with simulation data of heavy ion collisions in high-energy physics. Results show that both techniques are capable of supporting meaningful comparisons of ensemble data. PMID:22347540

  11. An Ensemble Learning Based Framework for Traditional Chinese Medicine Data Analysis with ICD-10 Labels

    PubMed Central

    Zhang, Gang; Huang, Yonghui; Zhong, Ling; Ou, Shanxing; Zhang, Yi; Li, Ziping

    2015-01-01

    Objective. This study aims to establish a model to analyze the clinical experience of TCM veteran doctors. We propose an ensemble learning based framework to analyze clinical records with ICD-10 label information for effective diagnosis and acupoints recommendation. Methods. We propose an ensemble learning framework for the analysis task. A set of base learners composed of decision tree (DT) and support vector machine (SVM) classifiers are trained by bootstrapping the training dataset. The base learners are sorted by accuracy and diversity through the nondominated sort (NDS) algorithm and combined through a deep ensemble learning strategy. Results. We evaluate the proposed method with comparison to two currently successful methods on a clinical diagnosis dataset with manually labeled ICD-10 information. ICD-10 label annotation and acupoints recommendation are evaluated for the three methods. The proposed method achieves an accuracy rate of 88.2% ± 2.8% measured by zero-one loss for the first evaluation session and 79.6% ± 3.6% measured by Hamming loss, which are superior to the other two methods. Conclusion. The proposed ensemble model can effectively model the implied knowledge and experience in historic clinical data records. The computational cost of training a set of base learners is relatively low. PMID:26504897
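The bootstrapping step that generates the base learners' training sets can be sketched directly. The clinical records below are invented placeholders, not data from the study:

```python
import random

def bootstrap_sets(data, n_learners, rng):
    """One bootstrap training set (sampled with replacement, same size as the
    original) per base learner, as in bagging: each DT or SVM base learner is
    then fitted on its own resampled set."""
    return [[rng.choice(data) for _ in data] for _ in range(n_learners)]

# Invented placeholder records: (symptom text, ICD-10 label).
records = [("cough", "R05"), ("back pain", "M54"), ("insomnia", "G47")]
training_sets = bootstrap_sets(records, 4, random.Random(0))
```

Because each resampled set omits roughly a third of the original records, the resulting base learners differ enough for the subsequent accuracy/diversity sort to be meaningful.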

  12. World Music Ensemble: Kulintang

    ERIC Educational Resources Information Center

    Beegle, Amy C.

    2012-01-01

    As instrumental world music ensembles such as steel pan, mariachi, gamelan and West African drums are becoming more the norm than the exception in North American school music programs, there are other world music ensembles just starting to gain popularity in particular parts of the United States. The kulintang ensemble, a drum and gong ensemble…

  13. Fast decision tree-based method to index large DNA-protein sequence databases using hybrid distributed-shared memory programming model.

    PubMed

    Jaber, Khalid Mohammad; Abdullah, Rosni; Rashid, Nur'Aini Abdul

    2014-01-01

In recent times, the size of biological databases has increased significantly, with continuous growth in the number of users and the rate of queries, such that some databases have reached terabyte size. There is, therefore, an increasing need to access databases at the fastest rates possible. In this paper, the decision tree indexing model (PDTIM) was parallelised using a hybrid of distributed and shared memory on a resident database, with horizontal and vertical growth through the Message Passing Interface (MPI) and POSIX Threads (PThread), to accelerate the index building time. The PDTIM was implemented using 1, 2, 4 and 5 processors on 1, 2, 3 and 4 threads respectively. The results show that the hybrid technique improved the speedup compared to a sequential version. It can be concluded from the results that the proposed PDTIM is appropriate for large data sets in terms of index building time. PMID:24794073
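The shared-memory half of such a hybrid scheme can be sketched with a thread pool that builds partial k-mer indexes and merges them. Python threads stand in for PThreads here, the distributed MPI layer (which would first partition records across nodes) is omitted, and all names and data are illustrative, not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def build_partial_index(records, k=3):
    """Map each k-mer of each sequence back to the record ids containing it."""
    index = {}
    for rec_id, seq in records:
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(rec_id)
    return index

def parallel_index(records, workers=4, k=3):
    """Shared-memory layer: split records among threads, merge sub-indexes."""
    chunks = [records[i::workers] for i in range(workers)]
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(lambda c: build_partial_index(c, k), chunks):
            for kmer, ids in part.items():
                merged.setdefault(kmer, set()).update(ids)
    return merged

idx = parallel_index([(1, "ACGTAC"), (2, "CGTACG")])
print(sorted(idx["CGT"]))  # → [1, 2]
```

Because each worker builds its own dictionary and merging happens in the main thread, no locking is needed; a real index over terabyte-scale data would use a tree structure on disk rather than an in-memory dict.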

  14. Identification of Some Zeolite Group Minerals by Application of Artificial Neural Network and Decision Tree Algorithm Based on SEM-EDS Data

    NASA Astrophysics Data System (ADS)

    Akkaş, Efe; Evren Çubukçu, H.; Akin, Lutfiye; Erkut, Volkan; Yurdakul, Yasin; Karayigit, Ali Ihsan

    2016-04-01

Identification of zeolite group minerals is complicated due to their similar chemical formulas and habits. Although the morphologies of various zeolite crystals can be recognized under a Scanning Electron Microscope (SEM), it is a relatively more challenging and problematic process to identify zeolites using their mineral chemical data. SEMs integrated with energy dispersive X-ray spectrometers (EDS) provide fast and reliable chemical data of minerals. However, considering the elemental similarities of the characteristic chemical formulae of zeolite species (e.g. Clinoptilolite, (Na,K,Ca)2-3Al3(Al,Si)2Si13O36·12H2O, and Erionite, (Na2,K2,Ca)2Al4Si14O36·15H2O), EDS data alone do not seem sufficient for correct identification. Furthermore, the physical properties of the specimen (e.g. roughness, electrical conductivity) and the applied analytical conditions of the SEM-EDS (e.g. accelerating voltage, beam current, spot size) should be uniform in order to obtain reliable elemental results for minerals having high alkali (Na, K) and H2O (approx. 14-18%) contents. This study, which was funded by The Scientific and Technological Research Council of Turkey (TUBITAK Project No: 113Y439), aims to construct a database as large as possible for various zeolite minerals and to develop a general prediction model for the identification of zeolite minerals using SEM-EDS data. For this purpose, an artificial neural network and a rule-based decision tree algorithm were employed. Throughout the analyses, a total of 1850 chemical data points were collected from four distinct zeolite species (Clinoptilolite-Heulandite, Erionite, Analcime and Mordenite) observed in various rocks (e.g. coals, pyroclastics). In order to obtain a representative training data set for each mineral, a selection procedure for reference mineral analyses was applied. During the selection procedure, SEM-based crystal morphology data, XRD spectra and re-calculated cationic distributions obtained by EDS have been used for

  15. Robust Machine Learning Applied to Astronomical Data Sets. I. Star-Galaxy Classification of the Sloan Digital Sky Survey DR3 Using Decision Trees

    NASA Astrophysics Data System (ADS)

    Ball, Nicholas M.; Brunner, Robert J.; Myers, Adam D.; Tcheng, David

    2006-10-01

We provide classifications for all 143 million nonrepeat photometric objects in the Third Data Release of the SDSS using decision trees trained on 477,068 objects with SDSS spectroscopic data. We demonstrate that these star/galaxy classifications are expected to be reliable for approximately 22 million objects with r ≲ 20. The general machine learning environment Data-to-Knowledge and supercomputing resources enabled extensive investigation of the decision tree parameter space. This work presents the first public release of objects classified in this way for an entire SDSS data release. The objects are classified as either galaxy, star, or nsng (neither star nor galaxy), with an associated probability for each class. To demonstrate how to effectively make use of these classifications, we perform several important tests. First, we detail selection criteria within the probability space defined by the three classes to extract samples of stars and galaxies to a given completeness and efficiency. Second, we investigate the efficacy of the classifications and the effect of extrapolating from the spectroscopic regime by performing blind tests on objects in the SDSS, 2dFGRS, and 2QZ surveys. Given the photometric limits of our spectroscopic training data, we effectively begin to extrapolate past our star-galaxy training set at r ~ 18. By comparing the number counts of our training sample with the classified sources, however, we find that our efficiencies appear to remain robust to r ~ 20. As a result, we expect our classifications to be accurate for 900,000 galaxies and 6.7 million stars and remain robust via extrapolation for a total of 8.0 million galaxies and 13.9 million stars.
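Selecting a sample to a given completeness and efficiency from per-class probabilities reduces to a probability cut plus two ratios. A minimal sketch with a hypothetical four-object catalog (the field names `p_galaxy` and `truth` are illustrative, not the SDSS catalog schema):

```python
def select_galaxies(objects, threshold=0.9):
    """Keep objects whose galaxy probability exceeds the cut."""
    return [o for o in objects if o["p_galaxy"] > threshold]

def completeness_efficiency(selected, all_objects):
    """Completeness: fraction of true galaxies recovered.
    Efficiency: fraction of the selected sample that is truly galaxies."""
    true_gal = [o for o in all_objects if o["truth"] == "galaxy"]
    sel_gal = [o for o in selected if o["truth"] == "galaxy"]
    return len(sel_gal) / len(true_gal), len(sel_gal) / len(selected)

catalog = [
    {"p_galaxy": 0.95, "truth": "galaxy"},
    {"p_galaxy": 0.92, "truth": "star"},    # contaminant above the cut
    {"p_galaxy": 0.97, "truth": "galaxy"},
    {"p_galaxy": 0.40, "truth": "galaxy"},  # true galaxy lost below the cut
]
sel = select_galaxies(catalog)
comp, eff = completeness_efficiency(sel, catalog)
print(round(comp, 2), round(eff, 2))  # → 0.67 0.67
```

Raising the threshold trades completeness for efficiency; the paper's blind tests against 2dFGRS and 2QZ play the role of the `truth` labels here.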

  16. An efficient algorithm for finding optimal gain-ratio multiple-split tests on hierarchical attributes in decision tree learning

    SciTech Connect

    Almuallim, H.; Akiba, Yasuhiro; Kaneda, Shigeo

    1996-12-31

Given a set of training examples S and a tree-structured attribute x, the goal in this work is to find a multiple-split test defined on x that maximizes Quinlan's gain-ratio measure. The number of possible such multiple-split tests grows exponentially in the size of the hierarchy associated with the attribute. It is, therefore, impractical to enumerate and evaluate all these tests in order to choose the best one. We introduce an efficient algorithm for solving this problem that guarantees maximizing the gain-ratio over all possible tests. For a training set of m examples and an attribute hierarchy of height d, our algorithm runs in time proportional to dm, which makes it efficient enough for practical use.
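Quinlan's gain-ratio measure, which the algorithm maximizes, is the information gain of a split normalized by its split information. A minimal sketch for a single multiway split on flat attribute values (the hierarchical-attribute machinery of the paper is not reproduced; data and names are illustrative):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(examples):
    """examples: list of (attribute_value, class_label) pairs.
    Gain ratio = information gain of the split / split information,
    as in Quinlan's C4.5."""
    labels = [y for _, y in examples]
    n = len(examples)
    groups = {}
    for v, y in examples:
        groups.setdefault(v, []).append(y)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum((len(g) / n) * log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info else 0.0

# A perfectly informative binary attribute has gain ratio 1.0
ex = [("a", 0), ("a", 0), ("b", 1), ("b", 1)]
print(gain_ratio(ex))  # → 1.0
```

The split-information denominator penalizes tests with many small branches, which is exactly why maximizing gain ratio over all multiway tests on a hierarchy is non-trivial.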

  17. Triticeae Resources in Ensembl Plants

    PubMed Central

    Bolser, Dan M.; Kerhornou, Arnaud; Walts, Brandon; Kersey, Paul

    2015-01-01

    Recent developments in DNA sequencing have enabled the large and complex genomes of many crop species to be determined for the first time, even those previously intractable due to their polyploid nature. Indeed, over the course of the last 2 years, the genome sequences of several commercially important cereals, notably barley and bread wheat, have become available, as well as those of related wild species. While still incomplete, comparison with other, more completely assembled species suggests that coverage of genic regions is likely to be high. Ensembl Plants (http://plants.ensembl.org) is an integrative resource organizing, analyzing and visualizing genome-scale information for important crop and model plants. Available data include reference genome sequence, variant loci, gene models and functional annotation. For variant loci, individual and population genotypes, linkage information and, where available, phenotypic information are shown. Comparative analyses are performed on DNA and protein sequence alignments. The resulting genome alignments and gene trees, representing the implied evolutionary history of the gene family, are made available for visualization and analysis. Driven by the case of bread wheat, specific extensions to the analysis pipelines and web interface have recently been developed to support polyploid genomes. Data in Ensembl Plants is accessible through a genome browser incorporating various specialist interfaces for different data types, and through a variety of additional methods for programmatic access and data mining. These interfaces are consistent with those offered through the Ensembl interface for the genomes of non-plant species, including those of plant pathogens, pests and pollinators, facilitating the study of the plant in its environment. PMID:25432969

  18. Triticeae resources in Ensembl Plants.

    PubMed

    Bolser, Dan M; Kerhornou, Arnaud; Walts, Brandon; Kersey, Paul

    2015-01-01

    Recent developments in DNA sequencing have enabled the large and complex genomes of many crop species to be determined for the first time, even those previously intractable due to their polyploid nature. Indeed, over the course of the last 2 years, the genome sequences of several commercially important cereals, notably barley and bread wheat, have become available, as well as those of related wild species. While still incomplete, comparison with other, more completely assembled species suggests that coverage of genic regions is likely to be high. Ensembl Plants (http://plants.ensembl.org) is an integrative resource organizing, analyzing and visualizing genome-scale information for important crop and model plants. Available data include reference genome sequence, variant loci, gene models and functional annotation. For variant loci, individual and population genotypes, linkage information and, where available, phenotypic information are shown. Comparative analyses are performed on DNA and protein sequence alignments. The resulting genome alignments and gene trees, representing the implied evolutionary history of the gene family, are made available for visualization and analysis. Driven by the case of bread wheat, specific extensions to the analysis pipelines and web interface have recently been developed to support polyploid genomes. Data in Ensembl Plants is accessible through a genome browser incorporating various specialist interfaces for different data types, and through a variety of additional methods for programmatic access and data mining. These interfaces are consistent with those offered through the Ensembl interface for the genomes of non-plant species, including those of plant pathogens, pests and pollinators, facilitating the study of the plant in its environment. PMID:25432969

  19. A decision-tree model to detect post-calving diseases based on rumination, activity, milk yield, BW and voluntary visits to the milking robot.

    PubMed

    Steensels, M; Antler, A; Bahr, C; Berckmans, D; Maltz, E; Halachmi, I

    2016-09-01

    Early detection of post-calving health problems is critical for dairy operations. Separating sick cows from the herd is important, especially in robotic-milking dairy farms, where searching for a sick cow can disturb the other cows' routine. The objectives of this study were to develop and apply a behaviour- and performance-based health-detection model to post-calving cows in a robotic-milking dairy farm, with the aim of detecting sick cows based on available commercial sensors. The study was conducted in an Israeli robotic-milking dairy farm with 250 Israeli-Holstein cows. All cows were equipped with rumination- and neck-activity sensors. Milk yield, visits to the milking robot and BW were recorded in the milking robot. A decision-tree model was developed on a calibration data set (historical data of the 10 months before the study) and was validated on the new data set. The decision model generated a probability of being sick for each cow. The model was applied once a week just before the veterinarian performed the weekly routine post-calving health check. The veterinarian's diagnosis served as a binary reference for the model (healthy-sick). The overall accuracy of the model was 78%, with a specificity of 87% and a sensitivity of 69%, suggesting its practical value. PMID:27221983
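The reported accuracy, specificity and sensitivity follow directly from confusion-matrix counts against the veterinarian's binary reference. A short sketch, using hypothetical counts chosen only to reproduce the quoted rates (the study's actual herd counts are not given here):

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # sick cows correctly flagged
    specificity = tn / (tn + fp)          # healthy cows correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity

# Hypothetical counts: 100 sick and 100 healthy cow-checks
acc, sens, spec = diagnostic_metrics(tp=69, fn=31, tn=87, fp=13)
print(acc, sens, spec)  # → 0.78 0.69 0.87
```

The asymmetry (specificity above sensitivity) matters operationally: the model rarely pulls a healthy cow out of the robotic-milking routine, at the cost of missing some sick ones.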

  20. The Hydrologic Ensemble Prediction Experiment (HEPEX)

    NASA Astrophysics Data System (ADS)

    Wood, A. W.; Thielen, J.; Pappenberger, F.; Schaake, J. C.; Hartman, R. K.

    2012-12-01

The Hydrologic Ensemble Prediction Experiment was established in March 2004, at a workshop hosted by the European Centre for Medium-Range Weather Forecasts (ECMWF). With support from the US National Weather Service (NWS) and the European Commission (EC), the HEPEX goal was to bring the international hydrological and meteorological communities together to advance the understanding and adoption of hydrological ensemble forecasts for decision support in the emergency management and water resources sectors. The strategy to meet this goal includes meetings that connect the user, forecast producer and research communities to exchange ideas, data and methods; the coordination of experiments to address specific challenges; and the formation of testbeds to facilitate shared experimentation. HEPEX has organized about a dozen international workshops, as well as sessions at scientific meetings (including AMS, AGU and EGU) and special issues of scientific journals where workshop results have been published. Today, the HEPEX mission is to demonstrate the added value of hydrological ensemble prediction systems (HEPS) for the emergency management and water resources sectors in making decisions that have important consequences for the economy, public health, safety, and the environment. HEPEX is now organized around six major themes that represent core elements of a hydrologic ensemble prediction enterprise: input and pre-processing, ensemble techniques, data assimilation, post-processing, verification, and communication and use in decision making. This poster presents an overview of recent and planned HEPEX activities, highlighting case studies that exemplify the focus and objectives of HEPEX.

  1. Using Decision Tree Analysis to Understand Foundation Science Student Performance. Insight Gained at One South African University

    NASA Astrophysics Data System (ADS)

    Kirby, Nicola Frances; Dempster, Edith Roslyn

    2014-11-01

    The Foundation Programme of the Centre for Science Access at the University of KwaZulu-Natal, South Africa provides access to tertiary science studies to educationally disadvantaged students who do not meet formal faculty entrance requirements. The low number of students proceeding from the programme into mainstream is of concern, particularly given the national imperative to increase participation and levels of performance in tertiary-level science. An attempt was made to understand foundation student performance in a campus of this university, with the view to identifying challenges and opportunities for remediation in the curriculum and processes of selection into the programme. A classification and regression tree analysis was used to identify which variables best described student performance. The explanatory variables included biographical and school-history data, performance in selection tests, and socio-economic data pertaining to their year in the programme. The results illustrate the prognostic reliability of the model used to select students, raise concerns about the inefficiency of school performance indicators as a measure of students' academic potential in the Foundation Programme, and highlight the importance of accommodation arrangements and financial support for student success in their access year.

  2. The Ensemble Canon

    NASA Technical Reports Server (NTRS)

Mittman, David S.

    2011-01-01

    Ensemble is an open architecture for the development, integration, and deployment of mission operations software. Fundamentally, it is an adaptation of the Eclipse Rich Client Platform (RCP), a widespread, stable, and supported framework for component-based application development. By capitalizing on the maturity and availability of the Eclipse RCP, Ensemble offers a low-risk, politically neutral path towards a tighter integration of operations tools. The Ensemble project is a highly successful, ongoing collaboration among NASA Centers. Since 2004, the Ensemble project has supported the development of mission operations software for NASA's Exploration Systems, Science, and Space Operations Directorates.

  3. Trees Are Terrific!

    ERIC Educational Resources Information Center

    Braus, Judy, Ed.

    1992-01-01

    Ranger Rick's NatureScope is a creative education series dedicated to inspiring in children an understanding and appreciation of the natural world while developing the skills they will need to make responsible decisions about the environment. Contents are organized into the following sections: (1) "What Makes a Tree a Tree?," including information…

  4. Structural Equation Model Trees

    ERIC Educational Resources Information Center

    Brandmaier, Andreas M.; von Oertzen, Timo; McArdle, John J.; Lindenberger, Ulman

    2013-01-01

    In the behavioral and social sciences, structural equation models (SEMs) have become widely accepted as a modeling tool for the relation between latent and observed variables. SEMs can be seen as a unification of several multivariate analysis techniques. SEM Trees combine the strengths of SEMs and the decision tree paradigm by building tree…

  5. Land cover and forest formation distributions for St. Kitts, Nevis, St. Eustatius, Grenada and Barbados from decision tree classification of cloud-cleared satellite imagery

    USGS Publications Warehouse

    Helmer, E.H.; Kennaway, T.A.; Pedreros, D.H.; Clark, M.L.; Marcano-Vega, H.; Tieszen, L.L.; Ruzycki, T.R.; Schill, S.R.; Carrington, C.M.S.

    2008-01-01

Satellite image-based mapping of tropical forests is vital to conservation planning. Standard methods for automated image classification, however, limit classification detail in complex tropical landscapes. In this study, we test an approach to Landsat image interpretation on four islands of the Lesser Antilles, including Grenada and St. Kitts, Nevis and St. Eustatius, testing a more detailed classification than earlier work in the latter three islands. Second, we estimate the extents of land cover and protected forest by formation for five islands and ask how land cover has changed over the second half of the 20th century. The image interpretation approach combines image mosaics and ancillary geographic data, classifying the resulting set of raster data with decision tree software. Cloud-free image mosaics for one or two seasons were created by applying regression tree normalization to scene dates that could fill cloudy areas in a base scene. Such mosaics are also known as cloud-filled, cloud-minimized or cloud-cleared imagery, mosaics, or composites. The approach accurately distinguished several classes that more standard methods would confuse; the seamless mosaics aided reference data collection; and the multiseason imagery allowed us to separate drought-deciduous forests and woodlands from semi-deciduous ones. Cultivated land areas declined 60 to 100% from about 1945 to 2000 on several islands, while forest cover increased 50 to 950%. This trend will likely continue where sugar cane cultivation has dominated. As on the island of Puerto Rico, most higher-elevation forest formations are protected in formal or informal reserves. Also similarly, lowland forests, which are the drier forest types on these islands, are not well represented in reserves. Former cultivated lands in lowland areas could provide lands for new reserves of drier forest types. The land-use history of these islands may provide insight for planners in countries currently considering

  6. Hydrological Ensemble Prediction System (HEPS)

    NASA Astrophysics Data System (ADS)

    Thielen-Del Pozo, J.; Schaake, J.; Martin, E.; Pailleux, J.; Pappenberger, F.

    2010-09-01

Flood forecasting systems form a key part of 'preparedness' strategies for disastrous floods and provide hydrological services, civil protection authorities and the public with information on upcoming events. Provided the warning leadtime is sufficiently long, adequate preparatory actions can be taken to efficiently reduce the impacts of the flooding. Following the success of ensembles in weather forecasting, the hydrological community is now moving increasingly towards Hydrological Ensemble Prediction Systems (HEPS) for improved flood forecasting, using operationally available NWP products as inputs. However, these products are often generated on relatively coarse scales compared to hydrologically relevant basin units and suffer systematic biases that may have considerable impact when passed through the non-linear hydrological filters. Therefore, a better understanding of how best to produce, communicate and use hydrologic ensemble forecasts in short-, medium- and long-term prediction of hydrological processes is necessary. The Hydrologic Ensemble Prediction Experiment (HEPEX) is an international initiative consisting of hydrologists, meteorologists and end-users working to advance probabilistic hydrologic forecast techniques for flood, drought and water management applications. Different aspects of the hydrological ensemble processor are being addressed, including: • Production of useful meteorological products relevant for hydrological applications, ranging from nowcasting products to seasonal forecasts. The importance of hindcasts that are consistent with the operational weather forecasts will be discussed, to support bias correction and downscaling, statistically meaningful verification of HEPS, and the development and testing of operating rules; • The need for downscaling and post-processing of weather ensembles to reduce bias before entering hydrological applications; • Hydrological model and parameter uncertainty and how to correct and

  7. Development of a Decision Support Tree Approach for Mapping Urban Vegetation Cover From Hyperspectral Imagery and GIS: the case of Athens, Greece

    NASA Astrophysics Data System (ADS)

    Georgopoulou, Iro; Petropoulos, George P.; Kalivas, Dionissios P.

    2013-04-01

Urban vegetation represents one of the main factors directly influencing human life. Consequently, extracting information on its spatial distribution is of crucial importance to ensure, among other things, sustainable urban planning and successful environmental management. To this end, remote sensing and Geographical Information Systems (GIS) technology has been demonstrated to be a very promising, viable solution. In comparison to multispectral systems, the use of hyperspectral imagery in particular dramatically enhances our ability to accurately identify different targets on the Earth's surface. In our study, a decision tree-based classification method is presented for mapping urban vegetation cover from hyperspectral imagery. The ability of the proposed method is demonstrated using as a case study the city of Athens, Greece, for which satellite hyperspectral imagery from the Hyperion sensor has been acquired. Hyperion collects spectral data in 242 spectral bands, from the visible to middle-infrared regions of the electromagnetic spectrum, at a spatial resolution of 30 meters. Validation of our proposed method is carried out in a GIS environment based on error matrix statistics, using as reference very high resolution imagery acquired nearly concurrently with Hyperion at our study region, supported by field visits conducted in the studied area. Additionally, the urban vegetation cover maps derived from our proposed technique are compared against analogous results obtained from other classification methods traditionally used in mapping urban vegetation cover. Our results confirmed the ability of our approach combined with Hyperion imagery to extract urban vegetation cover for the case of a densely-populated city with complex urban features, such as Athens. Our findings can potentially offer significant information at the local scale as regards the presence of open green spaces in the urban environment, since such information is vital for successful infrastructure development, urban

  8. Streamflow Ensemble Generation using Climate Forecasts

    NASA Astrophysics Data System (ADS)

    Watkins, D. W.; O'Connell, S.; Wei, W.; Nykanen, D.; Mahmoud, M.

    2002-12-01

Although significant progress has been made in understanding the correlation between large-scale atmospheric circulation patterns and regional streamflow anomalies, there is a general perception that seasonal climate forecasts are not being used to the fullest extent possible for optimal water resources management. Possible contributing factors are limited knowledge and understanding of climate processes and prediction capabilities, noise in climate signals and inaccuracies in forecasts, and hesitancy on the part of water managers to apply new information or methods that could expose them to greater liability. This work involves a decision support model based on streamflow ensembles developed for the Lower Colorado River Authority in Central Texas. Predictive skill is added to ensemble forecasts that are based on climatology by conditioning the ensembles on observable climate indicators, including streamflow (persistence), soil moisture, land surface temperatures, and large-scale recurrent patterns such as the El Niño-Southern Oscillation, Pacific Decadal Oscillation, and the North Atlantic Oscillation. A Bayesian procedure for updating ensemble probabilities is outlined, and various skill scores are reviewed for evaluating forecast performance. Verification of the ensemble forecasts using a resampling procedure indicates a small but potentially significant improvement in forecast skill that could be exploited in seasonal water management decisions. The ultimate goal of this work will be explicit incorporation of climate forecasts in reservoir operating rules and estimation of the value of the forecasts.
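A Bayesian update of ensemble-member probabilities is, at its core, one application of Bayes' rule per member: posterior weight ∝ prior weight × likelihood of the observed indicator under that member. A minimal sketch with hypothetical likelihood values (in practice these would come from the observed climate indicators, e.g. an ENSO index):

```python
def update_ensemble_weights(priors, likelihoods):
    """Bayes' rule per ensemble member: posterior ∝ prior × likelihood,
    normalized so the weights again sum to one."""
    post = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(post)
    return [p / total for p in post]

# Three climatology traces, equally likely a priori; a hypothetical observed
# indicator makes the middle (wet) trace twice as likely as the others.
priors = [1/3, 1/3, 1/3]
likelihoods = [0.2, 0.4, 0.2]
print(update_ensemble_weights(priors, likelihoods))  # ≈ [0.25, 0.5, 0.25]
```

Reweighted traces can then feed the decision support model directly, which is how climatology-based ensembles gain predictive skill without regenerating the traces themselves.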

  9. Ensembl regulation resources.

    PubMed

    Zerbino, Daniel R; Johnson, Nathan; Juetteman, Thomas; Sheppard, Dan; Wilder, Steven P; Lavidas, Ilias; Nuhn, Michael; Perry, Emily; Raffaillac-Desfosses, Quentin; Sobral, Daniel; Keefe, Damian; Gräf, Stefan; Ahmed, Ikhlak; Kinsella, Rhoda; Pritchard, Bethan; Brent, Simon; Amode, Ridwan; Parker, Anne; Trevanion, Steven; Birney, Ewan; Dunham, Ian; Flicek, Paul

    2016-01-01

    New experimental techniques in epigenomics allow researchers to assay a diversity of highly dynamic features such as histone marks, DNA modifications or chromatin structure. The study of their fluctuations should provide insights into gene expression regulation, cell differentiation and disease. The Ensembl project collects and maintains the Ensembl regulation data resources on epigenetic marks, transcription factor binding and DNA methylation for human and mouse, as well as microarray probe mappings and annotations for a variety of chordate genomes. From this data, we produce a functional annotation of the regulatory elements along the human and mouse genomes with plans to expand to other species as data becomes available. Starting from well-studied cell lines, we will progressively expand our library of measurements to a greater variety of samples. Ensembl's regulation resources provide a central and easy-to-query repository for reference epigenomes. As with all Ensembl data, it is freely available at http://www.ensembl.org, from the Perl and REST APIs and from the public Ensembl MySQL database server at ensembldb.ensembl.org. Database URL: http://www.ensembl.org. PMID:26888907

  10. Ensemble habitat mapping of invasive plant species

    USGS Publications Warehouse

    Stohlgren, T.J.; Ma, P.; Kumar, S.; Rocca, M.; Morisette, J.T.; Jarnevich, C.S.; Benson, N.

    2010-01-01

    Ensemble species distribution models combine the strengths of several species environmental matching models, while minimizing the weakness of any one model. Ensemble models may be particularly useful in risk analysis of recently arrived, harmful invasive species because species may not yet have spread to all suitable habitats, leaving species-environment relationships difficult to determine. We tested five individual models (logistic regression, boosted regression trees, random forest, multivariate adaptive regression splines (MARS), and maximum entropy model or Maxent) and ensemble modeling for selected nonnative plant species in Yellowstone and Grand Teton National Parks, Wyoming; Sequoia and Kings Canyon National Parks, California, and areas of interior Alaska. The models are based on field data provided by the park staffs, combined with topographic, climatic, and vegetation predictors derived from satellite data. For the four invasive plant species tested, ensemble models were the only models that ranked in the top three models for both field validation and test data. Ensemble models may be more robust than individual species-environment matching models for risk analysis. ?? 2010 Society for Risk Analysis.
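The combination step underlying such an ensemble model can be sketched as a weighted average of the member models' habitat-suitability surfaces. The model names and per-cell scores below are purely illustrative stand-ins for the five fitted models:

```python
def ensemble_suitability(member_predictions, weights=None):
    """Weighted average of habitat-suitability maps from several models.
    member_predictions: list of equal-length per-cell suitability lists."""
    n = len(member_predictions)
    weights = weights or [1.0 / n] * n  # default: equal weighting
    cells = len(member_predictions[0])
    return [sum(w * m[i] for w, m in zip(weights, member_predictions))
            for i in range(cells)]

# Three hypothetical models scoring four grid cells
glm    = [0.9, 0.2, 0.6, 0.1]
brt    = [0.8, 0.3, 0.5, 0.2]
maxent = [0.7, 0.1, 0.7, 0.3]
print(ensemble_suitability([glm, brt, maxent]))  # ≈ [0.8, 0.2, 0.6, 0.2]
```

Weights could instead reflect each model's validation skill (e.g. AUC), so that weaker members contribute less, which is one common way ensembles "minimize the weakness of any one model."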

  11. Ensemble habitat mapping of invasive plant species.

    PubMed

    Stohlgren, Thomas J; Ma, Peter; Kumar, Sunil; Rocca, Monique; Morisette, Jeffrey T; Jarnevich, Catherine S; Benson, Nate

    2010-02-01

    Ensemble species distribution models combine the strengths of several species environmental matching models, while minimizing the weakness of any one model. Ensemble models may be particularly useful in risk analysis of recently arrived, harmful invasive species because species may not yet have spread to all suitable habitats, leaving species-environment relationships difficult to determine. We tested five individual models (logistic regression, boosted regression trees, random forest, multivariate adaptive regression splines (MARS), and maximum entropy model or Maxent) and ensemble modeling for selected nonnative plant species in Yellowstone and Grand Teton National Parks, Wyoming; Sequoia and Kings Canyon National Parks, California, and areas of interior Alaska. The models are based on field data provided by the park staffs, combined with topographic, climatic, and vegetation predictors derived from satellite data. For the four invasive plant species tested, ensemble models were the only models that ranked in the top three models for both field validation and test data. Ensemble models may be more robust than individual species-environment matching models for risk analysis. PMID:20136746

  12. The Performance Analysis of the Map-Aided Fuzzy Decision Tree Based on the Pedestrian Dead Reckoning Algorithm in an Indoor Environment

    PubMed Central

    Chiang, Kai-Wei; Liao, Jhen-Kai; Tsai, Guang-Je; Chang, Hsiu-Wen

    2015-01-01

Hardware sensors embedded in a smartphone allow the device to become an excellent mobile navigator. A smartphone is ideal for this task because its great international popularity has driven increases in processing power, and because most of the necessary infrastructure is already in place. However, using a smartphone for indoor pedestrian navigation can be problematic due to the low accuracy of sensors, the imprecise predictability of pedestrian motion, and the inaccessibility of the Global Navigation Satellite System (GNSS) in some indoor environments. Pedestrian Dead Reckoning (PDR) is one of the most common technologies used for pedestrian navigation, but in its present form, various errors tend to accumulate. This study introduces a fuzzy decision tree (FDT) aided by map information to improve the accuracy and stability of PDR with less dependency on infrastructure. First, the map is quickly surveyed by the Indoor Mobile Mapping System (IMMS). Next, Bluetooth beacons are deployed to enable initialization from any position. Finally, the map-aided FDT can estimate navigation solutions in real time. The experiments were conducted in different fields using a variety of smartphones and users in order to verify stability. The baseline PDR system demonstrates low stability in each case without pre-calibration and post-processing, but the proposed low-complexity FDT algorithm shows good stability and accuracy under the same conditions. PMID:26729114
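The dead-reckoning core that the map-aided FDT corrects is a simple per-step position update along the current heading. A minimal sketch with a fixed, hypothetical step length and heading (in the paper both would come from the phone's inertial sensors, and accumulated drift is what the fuzzy decision tree and map constraints then correct):

```python
from math import sin, cos, radians

def pdr_step(x, y, heading_deg, step_length=0.7):
    """Dead-reckon one step: advance the position along the heading.
    Heading is measured clockwise from north (the y axis)."""
    h = radians(heading_deg)
    return x + step_length * sin(h), y + step_length * cos(h)

# Walk three steps due east (heading 90°) from the origin
pos = (0.0, 0.0)
for _ in range(3):
    pos = pdr_step(*pos, heading_deg=90)
print(round(pos[0], 2), round(pos[1], 2))  # → 2.1 0.0
```

Because each step adds its own heading and step-length error, the position error grows without bound, which is why PDR alone is unstable and needs map or beacon corrections.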

  13. The Performance Analysis of the Map-Aided Fuzzy Decision Tree Based on the Pedestrian Dead Reckoning Algorithm in an Indoor Environment.

    PubMed

    Chiang, Kai-Wei; Liao, Jhen-Kai; Tsai, Guang-Je; Chang, Hsiu-Wen

    2015-01-01

    Hardware sensors embedded in a smartphone allow the device to become an excellent mobile navigator. A smartphone is well suited to this task because its widespread popularity has driven steady increases in computing power and because most of the necessary infrastructure is already in place. However, using a smartphone for indoor pedestrian navigation can be problematic due to the low accuracy of its sensors, the imprecise predictability of pedestrian motion, and the inaccessibility of the Global Navigation Satellite System (GNSS) in some indoor environments. Pedestrian Dead Reckoning (PDR) is one of the most common technologies used for pedestrian navigation, but in its present form, various errors tend to accumulate. This study introduces a fuzzy decision tree (FDT) aided by map information to improve the accuracy and stability of PDR with less dependency on infrastructure. First, the map is quickly surveyed by the Indoor Mobile Mapping System (IMMS). Next, Bluetooth beacons are deployed to enable initialization from any position. Finally, the map-aided FDT can estimate navigation solutions in real time. The experiments were conducted in different fields using a variety of smartphones and users in order to verify stability. The baseline PDR system used for comparison shows low stability in every case without pre-calibration and post-processing, whereas the proposed low-complexity FDT algorithm shows good stability and accuracy under the same conditions. PMID:26729114

  14. The use of a decision tree based on the rabies diagnosis scenario, to assist the implementation of alternatives to laboratory animals.

    PubMed

    Bones, Vanessa C; Molento, Carla Forte Maiolino

    2016-05-01

    Brazilian federal legislation makes the use of alternatives mandatory, when there are validated methods to replace the use of laboratory animals. The objective of this paper is to introduce a novel decision tree (DT)-based approach, which can be used to assist the replacement of laboratory animal procedures in Brazil. This project is based on a previous analysis of the rabies diagnosis scenario, in which we identified certain barriers that hinder replacement, such as: a) the perceived higher costs of alternative methods; b) the availability of staff qualified in these methods; c) resistance to change by laboratory staff; d) regulatory obstacles, including incompatibilities between the Federal Environmental Crimes Act and specific norms and working practices relating to the use of laboratory animals; and e) the lack of government incentives. The DT represents a highly promising means to overcome these reported barriers to the replacement of laboratory animal use in Brazil. It provides guidance to address the main obstacles, and, followed step-by-step, would lead to the implementation of validated alternative methods (VAMs), or their development when such alternatives do not exist. The DT appears suitable for application to laboratory animal use scenarios where alternative methods already exist, such as in the case of rabies diagnosis, and could contribute to increase compliance with the Three Rs principles in science and with the current legal requirements in Brazil. PMID:27256454

  15. Assessing the safety of cosmetic chemicals: Consideration of a flux decision tree to predict dermally delivered systemic dose for comparison with oral TTC (Threshold of Toxicological Concern).

    PubMed

    Williams, Faith M; Rothe, Helga; Barrett, Gordon; Chiodini, Alessandro; Whyte, Jacqueline; Cronin, Mark T D; Monteiro-Riviere, Nancy A; Plautz, James; Roper, Clive; Westerhout, Joost; Yang, Chihae; Guy, Richard H

    2016-04-01

    Threshold of Toxicological Concern (TTC) aids assessment of human health risks from exposure to low levels of chemicals when toxicity data are limited. The objective here was to explore the potential refinement of exposure for applying the oral TTC to chemicals found in cosmetic products, for which there are limited dermal absorption data. A decision tree was constructed to estimate the dermally absorbed amount of chemical, based on typical skin exposure scenarios. Dermal absorption was calculated using an established predictive algorithm to derive the maximum skin flux adjusted to the actual 'dose' applied. The predicted systemic availability (assuming no local metabolism), can then be ranked against the oral TTC for the relevant structural class. The predictive approach has been evaluated by deriving the experimental/prediction ratio for systemic availability for 22 cosmetic chemical exposure scenarios. These emphasise that estimation of skin penetration may be challenging for penetration enhancing formulations, short application times with incomplete rinse-off, or significant metabolism. While there were a few exceptions, the experiment-to-prediction ratios mostly fell within a factor of 10 of the ideal value of 1. It can be concluded therefore, that the approach is fit-for-purpose when used as a screening and prioritisation tool. PMID:26825378

  16. Ensemble learning prediction of protein-protein interactions using proteins functional annotations.

    PubMed

    Saha, Indrajit; Zubek, Julian; Klingström, Tomas; Forsberg, Simon; Wikander, Johan; Kierczak, Marcin; Maulik, Ujjwal; Plewczynski, Dariusz

    2014-04-01

    Protein-protein interactions are important for the majority of biological processes. A significant number of computational methods have been developed to predict protein-protein interactions using protein sequence, structural and genomic data. Vast amounts of experimental data are publicly available on the Internet, but they are scattered across numerous databases. This fact motivated us to create and evaluate new high-throughput datasets of interacting proteins. We extracted interaction data from the DIP, MINT, BioGRID and IntAct databases. Then we constructed descriptive features for machine learning purposes based on data from Gene Ontology and DOMINE. Thereafter, four well-established machine learning methods (Support Vector Machine, Random Forest, Decision Tree and Naïve Bayes) were used on these datasets to build an Ensemble Learning method based on majority voting. In a cross-validation experiment, sensitivity exceeded 80% and classification/prediction accuracy reached 90% for the Ensemble Learning method. We extended the experiment to a bigger and more realistic dataset, maintaining sensitivity over 70%. These results confirmed that our datasets are suitable for performing PPI prediction and that the Ensemble Learning method is well suited for this task. Both the processed PPI datasets and the software are available at . PMID:24469380
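    A minimal sketch of the majority-voting setup the abstract describes: the same four base classifiers combined by hard voting. Synthetic features stand in for the Gene Ontology/DOMINE-derived descriptors used in the study, so this shows the mechanism, not the authors' pipeline.

```python
# Sketch: hard-voting ensemble of SVM, Random Forest, Decision Tree and
# Naive Bayes, evaluated by cross-validation (illustrative data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=20, random_state=1)

vote = VotingClassifier(
    estimators=[
        ("svm", SVC()),
        ("rf", RandomForestClassifier(random_state=1)),
        ("dt", DecisionTreeClassifier(random_state=1)),
        ("nb", GaussianNB()),
    ],
    voting="hard",  # each member casts one vote; majority wins
)
scores = cross_val_score(vote, X, y, cv=5)
print(scores.mean())
```

    With `voting="soft"` the members' predicted probabilities would be averaged instead, which often helps when all members are well calibrated.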

  17. Ensembl regulation resources

    PubMed Central

    Zerbino, Daniel R.; Johnson, Nathan; Juetteman, Thomas; Sheppard, Dan; Wilder, Steven P.; Lavidas, Ilias; Nuhn, Michael; Perry, Emily; Raffaillac-Desfosses, Quentin; Sobral, Daniel; Keefe, Damian; Gräf, Stefan; Ahmed, Ikhlak; Kinsella, Rhoda; Pritchard, Bethan; Brent, Simon; Amode, Ridwan; Parker, Anne; Trevanion, Steven; Birney, Ewan; Dunham, Ian; Flicek, Paul

    2016-01-01

    New experimental techniques in epigenomics allow researchers to assay a diversity of highly dynamic features such as histone marks, DNA modifications or chromatin structure. The study of their fluctuations should provide insights into gene expression regulation, cell differentiation and disease. The Ensembl project collects and maintains the Ensembl regulation data resources on epigenetic marks, transcription factor binding and DNA methylation for human and mouse, as well as microarray probe mappings and annotations for a variety of chordate genomes. From this data, we produce a functional annotation of the regulatory elements along the human and mouse genomes with plans to expand to other species as data becomes available. Starting from well-studied cell lines, we will progressively expand our library of measurements to a greater variety of samples. Ensembl’s regulation resources provide a central and easy-to-query repository for reference epigenomes. As with all Ensembl data, it is freely available at http://www.ensembl.org, from the Perl and REST APIs and from the public Ensembl MySQL database server at ensembldb.ensembl.org. Database URL: http://www.ensembl.org PMID:26888907

  18. The ensembl regulatory build.

    PubMed

    Zerbino, Daniel R; Wilder, Steven P; Johnson, Nathan; Juettemann, Thomas; Flicek, Paul R

    2015-01-01

    Most genomic variants associated with phenotypic traits or disease do not fall within gene coding regions, but in regulatory regions, rendering their interpretation difficult. We collected public data on epigenetic marks and transcription factor binding in human cell types and used it to construct an intuitive summary of regulatory regions in the human genome. We verified it against independent assays for sensitivity. The Ensembl Regulatory Build will be progressively enriched when more data is made available. It is freely available on the Ensembl browser, from the Ensembl Regulation MySQL database server and in a dedicated track hub. PMID:25887522

  19. Ensemble Data Mining Methods

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj C.

    2004-01-01

    Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve better prediction accuracy than any of the individual models could on their own. The basic goal when designing an ensemble is the same as when establishing a committee of people: each member of the committee should be as competent as possible, but the members should be complementary to one another. If the members are not complementary, i.e., if they always agree, then the committee is unnecessary: any one member is sufficient. If the members are complementary, then when one or a few members make an error, the probability is high that the remaining members can correct this error. Research in ensemble methods has largely revolved around designing ensembles consisting of competent yet complementary models.
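    The "competent yet complementary" idea can be illustrated with bagging, one classical way of inducing complementary members: each tree sees a different bootstrap sample, so the trees disagree on individual cases while remaining individually competent. This is a generic toy example, not tied to any specific system from the report.

```python
# Toy committee: a single decision tree vs. a bag of trees, each trained on a
# different bootstrap sample (bagging). Disagreement among competent members
# lets the committee's vote correct individual errors.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=2)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=2)

single = DecisionTreeClassifier(random_state=2).fit(Xtr, ytr)
# Default base estimator of BaggingClassifier is a decision tree
committee = BaggingClassifier(n_estimators=25, random_state=2).fit(Xtr, ytr)

print(single.score(Xte, yte), committee.score(Xte, yte))
```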

  20. Hydrologic Ensemble Prediction: Challenges and Opportunities

    NASA Astrophysics Data System (ADS)

    Schaake, J.; Bradley, A.

    2005-12-01

    Ensemble forecast techniques are beginning to be used for hydrological prediction by operational hydrological services throughout the world. These techniques are attractive because they allow the effects of a wide range of sources of uncertainty on hydrological forecasts to be accounted for. Not only does ensemble prediction in hydrology offer a general approach to probabilistic prediction, it offers a significant new approach to improving hydrological forecast accuracy as well. But there are many scientific challenges that must be overcome to provide users with high-quality hydrologic ensemble forecasts. A new international project, the Hydrologic Ensemble Prediction Experiment (HEPEX), was started last year to organize the scientific community to meet these challenges. Its main objective is to bring the international hydrological community together with the meteorological community to demonstrate how to produce reliable hydrological ensemble forecasts for decision-making, for the benefit of public health and safety, the economy and the environment. Topics that will be addressed by the HEPEX scientific community include techniques for using weather and climate information in hydrologic prediction systems, new methods in hydrologic prediction, data assimilation issues in hydrology and hydrometeorology, verification and correction of ensemble weather and hydrologic forecasts, and better quantification of uncertainty in hydrological prediction. As a pathway for addressing these topics, HEPEX will set up demonstration test bed projects and compile data sets for the intercomparison of coupled systems for atmospheric and hydrologic forecasting, and their assessment for meeting end users' needs for decision-making. Test bed projects have been proposed in North and South America, Europe, and Asia, and have a focus ranging from short-range flood forecasting to seasonal predictions for water supply. For example, within the United States, ongoing activities in seasonal prediction as part of the GEWEX

  1. Decision tree analysis as a supplementary tool to enhance histomorphological differentiation when distinguishing human from non-human cranial bone in both burnt and unburnt states: A feasibility study.

    PubMed

    Simmons, T; Goodburn, B; Singhrao, S K

    2016-01-01

    This feasibility study was undertaken to describe and record the histological characteristics of burnt and unburnt cranial bone fragments from human and non-human bones. Reference series of fully mineralized, transverse sections of cranial bone, from all variables and specimen states, were prepared by manual cutting and semi-automated grinding and polishing methods. A photomicrograph catalogue reflecting differences in burnt and unburnt bone from human and non-humans was recorded and qualitative analysis was performed using an established classification system based on primary bone characteristics. The histomorphology associated with human and non-human samples was, for the main part, preserved following burning at high temperature. Clearly, fibro-lamellar complex tissue subtypes, such as plexiform or laminar primary bone, were only present in non-human bones. A decision tree analysis based on histological features provided a definitive identification key for distinguishing human from non-human bone, with an accuracy of 100%. The decision tree for samples where burning was unknown was 96% accurate, and multi-step classification to taxon was possible with 100% accuracy. The results of this feasibility study strongly suggest that histology remains a viable alternative technique if fragments of cranial bone require forensic examination in both burnt and unburnt states. The decision tree analysis may provide an additional but vital tool to enhance data interpretation. Further studies are needed to assess variation in histomorphology taking into account other cranial bones, ontogeny, species and burning conditions. PMID:26130749
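    The identification-key role of the decision tree in this study can be sketched with a fitted tree printed as human-readable rules. The feature names and data below are invented for illustration (the study's actual histological variables differ); the only grounded assumption is the abstract's statement that plexiform/laminar primary bone occurs only in non-human samples.

```python
# Hedged sketch: a small decision tree over toy histology features, printed
# as a readable identification key. Labels: 1 = human, 0 = non-human.
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy encoding: [plexiform_present, osteon_density] -- plexiform bone is
# taken to indicate non-human, per the abstract.
X = [[1, 2], [1, 3], [0, 7], [0, 8], [1, 1], [0, 9]]
y = [0, 0, 1, 1, 0, 1]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["plexiform_present", "osteon_density"]))
```

    On these toy data the tree splits on `plexiform_present` alone, mirroring how a single decisive histological feature can anchor the key.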

  2. Gene expression profiling of breast cancer survivability by pooled cDNA microarray analysis using logistic regression, artificial neural networks and decision trees

    PubMed Central

    2013-01-01

    Background Microarray technology can acquire information about thousands of genes simultaneously. We analyzed published breast cancer microarray databases to predict five-year recurrence and compared the performance of three data mining algorithms of artificial neural networks (ANN), decision trees (DT) and logistic regression (LR) and two composite models of DT-ANN and DT-LR. From the collection of microarray datasets in the Gene Expression Omnibus, four breast cancer datasets were pooled for predicting five-year breast cancer relapse. After data compilation, 757 subjects, 5 clinical variables and 13,452 genetic variables were aggregated. The bootstrap method, Mann–Whitney U test and 20-fold cross-validation were performed to investigate candidate genes with the 100 most-significant p-values. The predictive powers of the DT, LR and ANN models were assessed using accuracy and the area under the ROC curve. The associated genes were evaluated using Cox regression. Results The DT models exhibited the lowest predictive power and the poorest extrapolation when applied to the test samples. The ANN models displayed the best predictive power and showed the best extrapolation. The 21 most-associated genes, as determined by integration of each model, were analyzed using Cox regression with a 3.53-fold (95% CI: 2.24-5.58) increased risk of breast cancer five-year recurrence… Conclusions The 21 selected genes can predict breast cancer recurrence. Among these genes, CCNB1, PLK1 and TOP2A are in the cell cycle G2/M DNA damage checkpoint pathway. Oncologists can offer this genetic information to patients when interpreting gene expression profiles for breast cancer recurrence. PMID:23506640

  3. The use of the decision tree technique and image cytometry to characterize aggressiveness in World Health Organization (WHO) grade II superficial transitional cell carcinomas of the bladder.

    PubMed

    Decaestecker, C; van Velthoven, R; Petein, M; Janssen, T; Salmon, I; Pasteels, J L; van Ham, P; Schulman, C; Kiss, R

    1996-03-01

    The aggressiveness of human bladder tumours can be assessed by means of various classification systems, including the one proposed by the World Health Organization (WHO). According to the WHO classification, three levels of malignancy are identified as grades I (low), II (intermediate), and III (high). This classification system operates satisfactorily for two of the three grades in forecasting clinical progression, most grade I tumours being associated with good prognoses and most grade III with bad. In contrast, the grade II group is very heterogeneous in terms of their clinical behaviour. The present study used two computer-assisted methods to investigate whether it is possible to sub-classify grade II tumours: computer-assisted microscope analysis (image cytometry) of Feulgen-stained nuclei and the Decision Tree Technique. This latter technique belongs to the Supervised Learning Algorithm and enables an objective assessment to be made of the diagnostic value associated with a given parameter. The combined use of these two methods in a series of 292 superficial transitional cell carcinomas shows that it is possible to identify one subgroup of grade II tumours which behave clinically like grade I tumours and a second subgroup which behaves clinically like grade III tumours. Of the nine ploidy-related parameters computed by means of image cytometry [the DNA index (DI), DNA histogram type (DHT), and the percentages of diploid, hyperdiploid, triploid, hypertriploid, tetraploid, hypertetraploid, and polyploid cell nuclei], it was the percentage of hyperdiploid and hypertetraploid cell nuclei which enabled identification, rather than conventional parameters such as the DI or the DHT. PMID:8778332

  4. Protein Target Quantification Decision Tree

    PubMed Central

    Kim, Jong Won; You, Jinsam

    2013-01-01

    The utility of mass spectrometry-(MS-) based proteomic platforms and their clinical applications have become an emerging field in proteomics in recent years. Owing to its selectivity and sensitivity, MS has become a key technological platform in proteomic research. Using this platform, a large number of potential biomarker candidates for specific diseases have been reported. However, due to lack of validation, none has been approved for use in clinical settings by the Food and Drug Administration (FDA). Successful candidate verification and validation will facilitate the development of potential biomarkers, leading to better strategies for disease diagnostics, prognostics, and treatment. With the recent new developments in mass spectrometers, high sensitivity, high resolution, and high mass accuracy can be achieved. This greatly enhances the capabilities of protein biomarker validation. In this paper, we describe and discuss recent developments and applications of targeted proteomics methods for biomarker validation. PMID:23401774

  5. The Hydrologic Ensemble Prediction Experiment (HEPEX)

    NASA Astrophysics Data System (ADS)

    Wood, Andy; Wetterhall, Fredrik; Ramos, Maria-Helena

    2015-04-01

    The Hydrologic Ensemble Prediction Experiment was established in March, 2004, at a workshop hosted by the European Center for Medium Range Weather Forecasting (ECMWF), and co-sponsored by the US National Weather Service (NWS) and the European Commission (EC). The HEPEX goal was to bring the international hydrological and meteorological communities together to advance the understanding and adoption of hydrological ensemble forecasts for decision support. HEPEX pursues this goal through research efforts and practical implementations involving six core elements of a hydrologic ensemble prediction enterprise: input and pre-processing, ensemble techniques, data assimilation, post-processing, verification, and communication and use in decision making. HEPEX has grown through meetings that connect the user, forecast producer and research communities to exchange ideas, data and methods; the coordination of experiments to address specific challenges; and the formation of testbeds to facilitate shared experimentation. In the last decade, HEPEX has organized over a dozen international workshops, as well as sessions at scientific meetings (including AMS, AGU and EGU) and special issues of scientific journals where workshop results have been published. Through these interactions and an active online blog (www.hepex.org), HEPEX has built a strong and active community of nearly 400 researchers & practitioners around the world. This poster presents an overview of recent and planned HEPEX activities, highlighting case studies that exemplify the focus and objectives of HEPEX.

  6. Ensemble Statistical Post-Processing of the National Air Quality Forecast Capability: Enhancing Ozone Forecasts in Baltimore, Maryland

    NASA Technical Reports Server (NTRS)

    Garner, Gregory G.; Thompson, Anne M.

    2013-01-01

    An ensemble statistical post-processor (ESP) is developed for the National Air Quality Forecast Capability (NAQFC) to address the unique challenges of forecasting surface ozone in Baltimore, MD. Air quality and meteorological data were collected from the eight monitors that constitute the Baltimore forecast region. These data were used to build the ESP using a moving-block bootstrap, regression tree models, and extreme-value theory. The ESP was evaluated using a 10-fold cross-validation to avoid evaluation with the same data used in the development process. Results indicate that the ESP is conditionally biased, likely due to slight overfitting while training the regression tree models. When viewed from the perspective of a decision-maker, the ESP provides a wealth of additional information previously not available through the NAQFC alone. The user is provided the freedom to tailor the forecast to the decision at hand by using decision-specific probability thresholds that define a forecast for an ozone exceedance. Taking advantage of the ESP, the user not only receives an increase in value over the NAQFC, but also receives value for costly decisions that the NAQFC could not provide alone.

  7. Online breakage detection of multitooth tools using classifier ensembles for imbalanced data

    NASA Astrophysics Data System (ADS)

    Bustillo, Andrés; Rodríguez, Juan J.

    2014-12-01

    Cutting tool breakage detection is an important task, due to its economic impact on mass production lines in the automobile industry. This task presents a central limitation: real data-sets are extremely imbalanced, because breakage occurs in very few cases compared with normal operation of the cutting process. In this paper, we present an analysis of different data-mining techniques applied to the detection of insert breakage in multitooth tools. The analysis uses only one experimental variable: the electrical power consumption of the tool drive. This restriction reflects real industrial conditions more accurately than other physical variables, such as acoustic or vibration signals, which are not so easily measured. Many efforts have been made to design a method that is able to identify breakages with a high degree of reliability within a short period of time. The solution is based on classifier ensembles for imbalanced data-sets. Classifier ensembles are combinations of classifiers, which in many situations are more accurate than individual classifiers. Six different base classifiers are tested: Decision Trees, Rules, Naïve Bayes, Nearest Neighbour, Multilayer Perceptrons and Logistic Regression. Three different balancing strategies are tested with each of the classifier ensembles and compared to their performance on the original data-set: Synthetic Minority Over-sampling Technique (SMOTE), undersampling, and a combination of SMOTE and undersampling. To identify the most suitable data-mining solution, Receiver Operating Characteristic (ROC) graphs and recall-precision graphs are generated and discussed. The performance of logistic regression ensembles on the data-set balanced using the combination of SMOTE and undersampling turned out to be the most suitable technique. Finally, a comparison using industrial performance measures is presented, which concludes that this technique is also more suited to this industrial problem than the other techniques presented in this study.
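    One of the balancing strategies mentioned above, undersampling, can be sketched generically: each ensemble member trains on all minority (breakage) cases plus an equal-sized random sample of the majority class, and the members vote. The data, class ratio and classifier below are illustrative assumptions, not the authors' SMOTE + undersampling pipeline.

```python
# Sketch: undersampling-based ensemble for an imbalanced breakage-detection
# problem (illustrative data; logistic regression members, majority vote).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Imbalanced toy data: 950 "normal" cuts, 50 "breakage" events
X = np.vstack([rng.normal(0.0, 1.0, (950, 4)), rng.normal(1.5, 1.0, (50, 4))])
y = np.array([0] * 950 + [1] * 50)

minority = np.where(y == 1)[0]
majority = np.where(y == 0)[0]

members = []
for _ in range(11):
    # Each member sees all minority cases + an equal-sized majority sample
    sub = rng.choice(majority, size=len(minority), replace=False)
    idx = np.concatenate([minority, sub])
    members.append(LogisticRegression().fit(X[idx], y[idx]))

votes = np.mean([m.predict(X) for m in members], axis=0)
pred = (votes >= 0.5).astype(int)
recall = (pred[y == 1] == 1).mean()  # sensitivity on the rare class
print(recall)
```

    Balancing each member's training set this way trades a few false alarms for much higher recall on the rare breakage class, which is the relevant trade-off in this application.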

  8. Ensemble statistical post-processing of the National Air Quality Forecast Capability: Enhancing ozone forecasts in Baltimore, Maryland

    NASA Astrophysics Data System (ADS)

    Garner, Gregory G.; Thompson, Anne M.

    2013-12-01

    An ensemble statistical post-processor (ESP) is developed for the National Air Quality Forecast Capability (NAQFC) to address the unique challenges of forecasting surface ozone in Baltimore, MD. Air quality and meteorological data were collected from the eight monitors that constitute the Baltimore forecast region. These data were used to build the ESP using a moving-block bootstrap, regression tree models, and extreme-value theory. The ESP was evaluated using a 10-fold cross-validation to avoid evaluation with the same data used in the development process. Results indicate that the ESP is conditionally biased, likely due to slight overfitting while training the regression tree models. When viewed from the perspective of a decision-maker, the ESP provides a wealth of additional information previously not available through the NAQFC alone. The user is provided the freedom to tailor the forecast to the decision at hand by using decision-specific probability thresholds that define a forecast for an ozone exceedance. Taking advantage of the ESP, the user not only receives an increase in value over the NAQFC, but also receives value for costly decisions that the NAQFC couldn't provide alone.
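    The decision-specific probability threshold idea above reduces to a one-liner: each user converts the probabilistic ozone forecast into exceedance warnings with a threshold matched to their own cost/loss ratio. The numbers below are illustrative, not NAQFC output.

```python
# Sketch: turning a probabilistic exceedance forecast into user-specific
# warnings via a decision threshold (classical cost/loss rule).
import numpy as np

forecast_prob = np.array([0.10, 0.35, 0.55, 0.80, 0.95])  # P(ozone exceedance)

def warn(probs, cost_loss_ratio):
    # Warn whenever the forecast probability exceeds the user's
    # cost/loss ratio -- the optimal threshold for a binary decision.
    return probs >= cost_loss_ratio

print(warn(forecast_prob, 0.2))  # cautious user (cheap protection): warn often
print(warn(forecast_prob, 0.7))  # tolerant user (costly protection): warn rarely
```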

  9. Input Decimated Ensembles

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Oza, Nikunj C.; Clancy, Daniel (Technical Monitor)

    2001-01-01

    Using an ensemble of classifiers instead of a single classifier has been shown to improve generalization performance in many pattern recognition problems. However, the extent of such improvement depends greatly on the amount of correlation among the errors of the base classifiers. Therefore, reducing those correlations while keeping the classifiers' performance levels high is an important area of research. In this article, we explore input decimation (ID), a method which selects feature subsets for their ability to discriminate among the classes and uses them to decouple the base classifiers. We provide a summary of the theoretical benefits of correlation reduction, along with results of our method on two underwater sonar data sets, three benchmarks from the Proben1/UCI repositories, and two synthetic data sets. The results indicate that input decimated ensembles (IDEs) outperform ensembles whose base classifiers use all the input features; randomly selected subsets of features; and features created using principal components analysis, on a wide range of domains.
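    A minimal sketch of the input-decimation idea: each base classifier is trained on a different feature subset so the members' errors decorrelate, and predictions are averaged. The subset-selection rule here (per-feature correlation with the label, rotated across members) is a simplification of the authors' class-discrimination criterion.

```python
# Sketch: feature-subset ("input decimated") ensemble of decision trees.
# Subset choice is a simplified stand-in for the paper's criterion.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                           random_state=3)

# Rank features by absolute correlation with the class label
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
order = np.argsort(corr)[::-1]

members, subsets = [], []
for k in range(5):
    cols = order[[k % 12, (k + 3) % 12, (k + 6) % 12, (k + 9) % 12]]  # varied 4-feature subsets
    subsets.append(cols)
    members.append(DecisionTreeClassifier(random_state=k).fit(X[:, cols], y))

votes = np.mean([m.predict(X[:, c]) for m, c in zip(members, subsets)], axis=0)
pred = (votes >= 0.5).astype(int)
acc = (pred == y).mean()
print(acc)
```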

  10. Matlab Cluster Ensemble Toolbox

    SciTech Connect

    Sapio, Vincent De; Kegelmeyer, Philip

    2009-04-27

    This is a Matlab toolbox for investigating the application of cluster ensembles to data classification, with the objective of improving the accuracy and/or speed of clustering. The toolbox divides the cluster ensemble problem into four areas, providing functionality for each. These include, (1) synthetic data generation, (2) clustering to generate individual data partitions and similarity matrices, (3) consensus function generation and final clustering to generate ensemble data partitioning, and (4) implementation of accuracy metrics. With regard to data generation, Gaussian data of arbitrary dimension can be generated. The kcenters algorithm can then be used to generate individual data partitions by either, (a) subsampling the data and clustering each subsample, or by (b) randomly initializing the algorithm and generating a clustering for each initialization. In either case an overall similarity matrix can be computed using a consensus function operating on the individual similarity matrices. A final clustering can be performed and performance metrics are provided for evaluation purposes.
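    The toolbox's four-stage workflow (data generation, individual partitions, consensus function, final clustering) can be sketched in Python with a co-association consensus matrix; this is a generic illustration of the approach, not a port of the toolbox's kcenters-based implementation.

```python
# Sketch of a cluster-ensemble pipeline: several k-means runs with random
# initialisations build a co-association (consensus) matrix, and a final
# clustering step partitions the consensus.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=120, centers=3, random_state=4)
n = len(X)

# Stages 2-3: individual partitions -> co-association similarity matrix
coassoc = np.zeros((n, n))
for seed in range(10):
    labels = KMeans(n_clusters=3, n_init=1, random_state=seed).fit_predict(X)
    coassoc += (labels[:, None] == labels[None, :])
coassoc /= 10.0  # fraction of runs in which each pair was co-clustered

# Final clustering on the consensus matrix rows
final = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(coassoc)
print(len(set(final)))
```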

  11. Learning classification trees

    NASA Technical Reports Server (NTRS)

    Buntine, Wray

    1991-01-01

    Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. How a tree learning algorithm can be derived from Bayesian decision theory is outlined. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule turns out to be similar to Quinlan's information gain splitting rule, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, Quinlan's C4 and Breiman et al. Cart show the full Bayesian algorithm is consistently as good, or more accurate than these other approaches though at a computational price.

  12. Top Quark Produced Through the Electroweak Force: Discovery Using the Matrix Element Analysis and Search for Heavy Gauge Bosons Using Boosted Decision Trees

    SciTech Connect

    Pangilinan, Monica

    2010-05-01

    The top quark produced through the electroweak channel provides a direct measurement of the Vtb element in the CKM matrix, which can be viewed as a transition rate of a top quark to a bottom quark. This production channel of the top quark is also sensitive to different theories beyond the Standard Model, such as heavy charged gauge bosons termed W'. This thesis measures the cross section of the electroweak-produced top quark using a technique based on the matrix elements of the processes under consideration. The technique is applied to 2.3 fb-1 of data from the D0 detector. From a comparison of the matrix element discriminants between data and the signal and background model using Bayesian statistics, we measure the cross section of the top quark produced through the electroweak mechanism σ(p$\bar{p}$ → tb + X, tqb + X) = 4.30$^{+0.98}_{-1.20}$ pb. The measured result corresponds to a 4.9σ Gaussian-equivalent significance. By combining this analysis with other analyses based on the Bayesian Neural Network (BNN) and Boosted Decision Tree (BDT) methods, the measured cross section is 3.94 ± 0.88 pb with a significance of 5.0σ, resulting in the discovery of electroweak-produced top quarks. Using this measured cross section and constraining |Vtb| < 1, the 95% confidence level (C.L.) lower limit is |Vtb| > 0.78. Additionally, a search is made for the production of W' using the same samples from the electroweak-produced top quark. An analysis based on the BDT method is used to separate the signal from expected backgrounds. No significant excess is found and 95% C.L. upper limits on the production cross section are set for W' with masses within 600-950 GeV. For four general models of W' boson production using the decay channel W' → t$\bar{b}$, the lower mass limits are the following: M(W'L with SM couplings) > 840 GeV; M(W'R) > 880 GeV or 890 GeV if the right-handed neutrino is

  13. A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS

    NASA Astrophysics Data System (ADS)

    Pradhan, Biswajeet

    2013-02-01

    The purpose of the present study is to compare the prediction performance of three different approaches, namely decision tree (DT), support vector machine (SVM) and adaptive neuro-fuzzy inference system (ANFIS), for landslide susceptibility mapping in the Penang Hill area, Malaysia. The input parameters for the landslide susceptibility assessments were obtained from various sources. First, landslide locations were identified from aerial photographs and field surveys, and an inventory of 113 landslide locations was constructed. The study area contains 340,608 pixels, of which 8403 pixels contain landslides. The landslide inventory was randomly partitioned into two subsets: (1) part 1, containing 50% (4000 landslide grid cells), was used in the training phase of the models; (2) part 2, the remaining 50% (4000 landslide grid cells), was used to validate the three models and confirm their accuracy. The digitally processed images of the input parameters were combined in GIS. Finally, landslide susceptibility maps were produced, and the performances were assessed and discussed. A total of fifteen landslide susceptibility maps were produced using the DT, SVM and ANFIS based models, and the resultant maps were validated using the landslide locations. Prediction performance of these maps was checked with receiver operating characteristic (ROC) analysis, using both success rate and prediction rate curves. The validation results showed that the area under the ROC curve for the fifteen models produced using DT, SVM and ANFIS varied from 0.8204 to 0.9421 for the success rate curves and from 0.7580 to 0.8307 for the prediction rate curves. Moreover, the prediction curves revealed that model 5 of DT has slightly higher prediction performance (83.07), whereas the success rate showed that model 5 of ANFIS has better prediction (94.21) capability among all models. The results of this study showed that landslide susceptibility mapping in the Penang Hill area using the three approaches (e
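
    The ROC validation used above reduces to the area under the curve, which can be computed directly from scores and binary labels via the Mann-Whitney rank formulation. A minimal sketch follows; the susceptibility scores and labels are hypothetical, not the study's data.

```python
# AUC via the Mann-Whitney U statistic: the probability that a randomly
# chosen positive (landslide) pixel is scored above a random negative one.
def roc_auc(labels, scores):
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0       # average of 1-based ranks i+1..j (ties)
        for k in range(i, j):
            if pairs[k][1] == 1:
                rank_sum_pos += avg_rank
        i = j
    n_pos = sum(labels)
    n_neg = n - n_pos
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

# hypothetical susceptibility scores: positives tend to score higher
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(round(roc_auc(labels, scores), 3))   # 8 of 9 positive/negative pairs ranked correctly
```

A success rate curve uses training cells as the positives and a prediction rate curve uses held-out cells; the AUC computation itself is identical.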

  14. The Potential Impact of Improving Appropriate Treatment for Fever on Malaria and Non-Malarial Febrile Illness Management in Under-5s: A Decision-Tree Modelling Approach

    PubMed Central

    Rao, V. Bhargavi; Schellenberg, David; Ghani, Azra C.

    2013-01-01

    Background As international funding for malaria programmes plateaus, limited resources must be rationally managed for malaria and non-malarial febrile illnesses (NMFI). Given widespread unnecessary treatment of NMFI with first-line antimalarial Artemisinin Combination Therapies (ACTs), our aim was to estimate the effect of health-systems factors on rates of appropriate treatment for fever and on use of ACTs. Methods A decision-tree tool was developed to investigate the impact of improving aspects of the fever care-pathway and also to evaluate the impact in Tanzania of the revised WHO malaria guidelines advocating diagnostic-led management. Results Model outputs using baseline parameters suggest 49% malaria cases attending a clinic would receive ACTs (95% Uncertainty Interval:40.6–59.2%) but that 44% (95% UI:35–54.8%) NMFI cases would also receive ACTs. Provision of 100% ACT stock predicted a 28.9% increase in malaria cases treated with ACT, but also an increase in overtreatment of NMFI, with 70% NMFI cases (95% UI:56.4–79.2%) projected to receive ACTs, and thus an overall 13% reduction (95% UI:5–21.6%) in correct management of febrile cases. Modelling increased availability or use of diagnostics had little effect on malaria management outputs, but may significantly reduce NMFI overtreatment. The model predicts the early rollout of revised WHO guidelines in Tanzania may have led to a 35% decrease (95% UI:31.2–39.8%) in NMFI overtreatment, but also a 19.5% reduction (95% UI:11–27.2%) in malaria cases receiving ACTs, due to a potential fourfold decrease in cases that were untested or tested false-negative (42.5% vs. 8.9%) and so untreated. Discussion Modelling multi-pronged intervention strategies proved most effective to improve malaria treatment without increasing NMFI overtreatment. As malaria transmission declines, health system interventions must be guided by whether the management priority is an increase in malaria cases receiving ACTs (reducing the
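
    A decision-tree model of this kind rolls probabilities up the care pathway: multiply the branch probabilities along each path and sum the paths ending in the outcome of interest. The sketch below illustrates the mechanics only; every parameter value is a hypothetical placeholder, not a figure from the paper.

```python
# Toy decision-tree roll-up for a fever care pathway: the probability that a
# malaria case ends up receiving an ACT, given attendance, testing, test
# sensitivity, drug stock, and presumptive treatment of untested patients.
# All numbers are invented for illustration.
def p_act_for_malaria(p_attend, p_tested, sensitivity, p_stock, p_treat_untested):
    tested_path = p_attend * p_tested * sensitivity * p_stock
    untested_path = p_attend * (1 - p_tested) * p_treat_untested * p_stock
    return tested_path + untested_path

base = p_act_for_malaria(0.8, 0.6, 0.95, 0.7, 0.9)
full_stock = p_act_for_malaria(0.8, 0.6, 0.95, 1.0, 0.9)   # 100% ACT stock scenario
print(round(base, 3), round(full_stock, 3))
```

Scenario analysis (such as the 100% stock intervention above) is then a matter of changing one branch probability and re-evaluating the tree; the same tree evaluated for NMFI cases yields the overtreatment rate.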

  15. Imprinting and recalling cortical ensembles.

    PubMed

    Carrillo-Reid, Luis; Yang, Weijian; Bando, Yuki; Peterka, Darcy S; Yuste, Rafael

    2016-08-12

    Neuronal ensembles are coactive groups of neurons that may represent building blocks of cortical circuits. These ensembles could be formed by Hebbian plasticity, whereby synapses between coactive neurons are strengthened. Here we report that repetitive activation with two-photon optogenetics of neuronal populations from ensembles in the visual cortex of awake mice builds neuronal ensembles that recur spontaneously after being imprinted and do not disrupt preexisting ones. Moreover, imprinted ensembles can be recalled by single-cell stimulation and remain coactive on consecutive days. Our results demonstrate the persistent reconfiguration of cortical circuits by two-photon optogenetics into neuronal ensembles that can perform pattern completion. PMID:27516599

  16. Classroom Management for Ensembles.

    ERIC Educational Resources Information Center

    Bauer, William I.

    2001-01-01

    Discusses topics essential to good classroom management for ensemble music teachers. Explores the importance of planning and preparation, good teaching practice within the classroom, and using an effective discipline plan to deal with any behavior problems in the classroom. Includes a bibliography of further resources. (CMK)

  17. Protective Garment Ensemble

    NASA Technical Reports Server (NTRS)

    Wakefield, M. E.

    1982-01-01

    Protective garment ensemble with internally-mounted environmental- control unit contains its own air supply. Alternatively, a remote-environmental control unit or an air line is attached at the umbilical quick disconnect. Unit uses liquid air that is vaporized to provide both breathing air and cooling. Totally enclosed garment protects against toxic substances.

  18. Ensemble Bayesian forecasting system Part I: Theory and algorithms

    NASA Astrophysics Data System (ADS)

    Herr, Henry D.; Krzysztofowicz, Roman

    2015-05-01

    The ensemble Bayesian forecasting system (EBFS), whose theory was published in 2001, is developed for the purpose of quantifying the total uncertainty about a discrete-time, continuous-state, non-stationary stochastic process such as a time series of stages, discharges, or volumes at a river gauge. The EBFS is built of three components: an input ensemble forecaster (IEF), which simulates the uncertainty associated with random inputs; a deterministic hydrologic model (of any complexity), which simulates physical processes within a river basin; and a hydrologic uncertainty processor (HUP), which simulates the hydrologic uncertainty (an aggregate of all uncertainties except input). It works as a Monte Carlo simulator: an ensemble of time series of inputs (e.g., precipitation amounts) generated by the IEF is transformed deterministically through a hydrologic model into an ensemble of time series of outputs, which is next transformed stochastically by the HUP into an ensemble of time series of predictands (e.g., river stages). Previous research indicated that in order to attain an acceptable sampling error, the ensemble size must be on the order of hundreds (for probabilistic river stage forecasts and probabilistic flood forecasts) or even thousands (for probabilistic stage transition forecasts). The computing time needed to run the hydrologic model this many times renders the straightforward simulations operationally infeasible. This motivates the development of the ensemble Bayesian forecasting system with randomization (EBFSR), which takes full advantage of the analytic meta-Gaussian HUP and generates multiple ensemble members after each run of the hydrologic model; this auxiliary randomization reduces the required size of the meteorological input ensemble and makes it operationally feasible to generate a Bayesian ensemble forecast of large size. Such a forecast quantifies the total uncertainty, is well calibrated against the prior (climatic) distribution of

  19. Lung Cancer Survival Prediction using Ensemble Data Mining on SEER Data

    DOE PAGESBeta

    Agrawal, Ankit; Misra, Sanchit; Narayanan, Ramanathan; Polepeddi, Lalith; Choudhary, Alok

    2012-01-01

    We analyze the lung cancer data available from the SEER program with the aim of developing accurate survival prediction models for lung cancer. Carefully designed preprocessing steps resulted in removal/modification/splitting of several attributes, and 2 of the 11 derived attributes were found to have significant predictive power. Several supervised classification methods were used on the preprocessed data along with various data mining optimizations and validations. In our experiments, ensemble voting of five decision tree based classifiers and meta-classifiers was found to result in the best prediction performance in terms of accuracy and area under the ROC curve. We have developed an on-line lung cancer outcome calculator for estimating the risk of mortality after 6 months, 9 months, 1 year, 2 years and 5 years of diagnosis, for which a smaller non-redundant subset of 13 attributes was carefully selected using attribute selection techniques, while trying to retain the predictive power of the original set of attributes. Further, ensemble voting models were also created for predicting conditional survival outcome for lung cancer (estimating risk of mortality after 5 years of diagnosis, given that the patient has already survived for a period of time), and included in the calculator. The on-line lung cancer outcome calculator developed as a result of this study is available at http://info.eecs.northwestern.edu:8080/LungCancerOutcomeCalculator/.
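
    The core mechanism here, majority voting over several tree-based classifiers, can be sketched in a few lines. The toy below uses bootstrap-sampled decision stumps (depth-1 trees) on synthetic binary data as a stand-in for the paper's five-classifier ensemble; the data and stump count are illustrative assumptions.

```python
import random

# Minimal voting ensemble of decision stumps, each fit on a bootstrap sample.
def fit_stump(X, y):
    best = None
    for f in range(len(X[0])):                         # try every feature
        for t in sorted(set(row[f] for row in X)):     # and every threshold
            for left_label in (0, 1):
                preds = [left_label if row[f] <= t else 1 - left_label for row in X]
                acc = sum(p == yi for p, yi in zip(preds, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, f, t, left_label)
    _, f, t, left_label = best
    return lambda row: left_label if row[f] <= t else 1 - left_label

def fit_ensemble(X, y, n_trees=5, seed=0):
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]       # bootstrap resample
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: int(sum(s(row) for s in stumps) > n_trees / 2)

# synthetic "survival" data: outcome mostly driven by feature 0
X = [[i, i % 3] for i in range(20)]
y = [int(i >= 10) for i in range(20)]
vote = fit_ensemble(X, y)
print(sum(vote(row) == yi for row, yi in zip(X, y)) / len(y))
```

Real survival ensembles replace the stumps with full decision trees and meta-classifiers, but the voting step is the same.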

  20. The Protein Ensemble Database.

    PubMed

    Varadi, Mihaly; Tompa, Peter

    2015-01-01

    The scientific community's major conceptual notion of structural biology has recently shifted in emphasis from the classical structure-function paradigm due to the emergence of intrinsically disordered proteins (IDPs). As opposed to their folded cousins, these proteins are defined by the lack of a stable 3D fold and a high degree of inherent structural heterogeneity that is closely tied to their function. Due to their flexible nature, solution techniques such as small-angle X-ray scattering (SAXS), nuclear magnetic resonance (NMR) spectroscopy and fluorescence resonance energy transfer (FRET) are particularly well-suited for characterizing their biophysical properties. Computationally derived structural ensembles based on such experimental measurements provide models of the conformational sampling displayed by these proteins, and they may offer valuable insights into the functional consequences of inherent flexibility. The Protein Ensemble Database (http://pedb.vib.be) is the first openly accessible, manually curated online resource storing the ensemble models, protocols used during the calculation procedure, and underlying primary experimental data derived from SAXS and/or NMR measurements. By making this previously inaccessible data freely available to researchers, this novel resource is expected to promote the development of more advanced modelling methodologies, facilitate the design of standardized calculation protocols, and consequently lead to a better understanding of how function arises from the disordered state. PMID:26387108

  1. Matlab Cluster Ensemble Toolbox

    Energy Science and Technology Software Center (ESTSC)

    2009-04-27

    This is a Matlab toolbox for investigating the application of cluster ensembles to data classification, with the objective of improving the accuracy and/or speed of clustering. The toolbox divides the cluster ensemble problem into four areas, providing functionality for each. These include: (1) synthetic data generation, (2) clustering to generate individual data partitions and similarity matrices, (3) consensus function generation and final clustering to generate ensemble data partitioning, and (4) implementation of accuracy metrics. With regard to data generation, Gaussian data of arbitrary dimension can be generated. The k-centers algorithm can then be used to generate individual data partitions by either, (a) subsampling the data and clustering each subsample, or by (b) randomly initializing the algorithm and generating a clustering for each initialization. In either case an overall similarity matrix can be computed using a consensus function operating on the individual similarity matrices. A final clustering can be performed and performance metrics are provided for evaluation purposes.
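
    The consensus step described above is commonly realized with a co-association matrix. A minimal sketch (in Python rather than Matlab, and not the toolbox's actual code): count how often each pair of points is co-clustered across the base partitions, then form final clusters by linking pairs that co-occur in at least half of them.

```python
# Sketch of a consensus function: build a co-association matrix from several
# base partitions, then cluster by connected components of the thresholded graph.
def consensus_clusters(partitions, threshold=0.5):
    n = len(partitions[0])
    m = len(partitions)
    # co-association similarity: fraction of partitions grouping i with j
    sim = [[sum(p[i] == p[j] for p in partitions) / m for j in range(n)]
           for i in range(n)]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] == -1:
            stack = [i]
            labels[i] = cluster
            while stack:              # flood-fill one connected component
                u = stack.pop()
                for v in range(n):
                    if labels[v] == -1 and sim[u][v] >= threshold:
                        labels[v] = cluster
                        stack.append(v)
            cluster += 1
    return labels

# three base partitions of six points; the first two agree, the third is noisy
partitions = [[0, 0, 0, 1, 1, 1],
              [2, 2, 2, 3, 3, 3],
              [0, 1, 0, 1, 0, 1]]
print(consensus_clusters(partitions))   # recovers the two-cluster structure
```

Note that the base partitions may use arbitrary label names; only co-membership matters, which is why the consensus operates on the similarity matrix rather than the labels directly.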

  2. Structural Equation Model Trees

    PubMed Central

    Brandmaier, Andreas M.; von Oertzen, Timo; McArdle, John J.; Lindenberger, Ulman

    2015-01-01

    In the behavioral and social sciences, structural equation models (SEMs) have become widely accepted as a modeling tool for the relation between latent and observed variables. SEMs can be seen as a unification of several multivariate analysis techniques. SEM Trees combine the strengths of SEMs and the decision tree paradigm by building tree structures that separate a data set recursively into subsets with significantly different parameter estimates in a SEM. SEM Trees provide means for finding covariates and covariate interactions that predict differences in structural parameters in observed as well as in latent space and facilitate theory-guided exploration of empirical data. We describe the methodology, discuss theoretical and practical implications, and demonstrate applications to a factor model and a linear growth curve model. PMID:22984789

  3. Tree Scanning

    PubMed Central

    Templeton, Alan R.; Maxwell, Taylor; Posada, David; Stengård, Jari H.; Boerwinkle, Eric; Sing, Charles F.

    2005-01-01

    We use evolutionary trees of haplotypes to study phenotypic associations by exhaustively examining all possible biallelic partitions of the tree, a technique we call tree scanning. If the first scan detects significant associations, additional rounds of tree scanning are used to partition the tree into three or more allelic classes. Two worked examples are presented. The first is a reanalysis of associations between haplotypes at the Alcohol Dehydrogenase locus in Drosophila melanogaster that was previously analyzed using a nested clade analysis, a more complicated technique for using haplotype trees to detect phenotypic associations. Tree scanning and the nested clade analysis yield the same inferences when permutation testing is used with both approaches. The second example is an analysis of associations between variation in various lipid traits and genetic variation at the Apolipoprotein E (APOE) gene in three human populations. Tree scanning successfully identified phenotypic associations expected from previous analyses. Tree scanning for the most part detected more associations and provided a better biological interpretative framework than single SNP analyses. We also show how prior information can be incorporated into the tree scan by starting with the traditional three electrophoretic alleles at APOE. Tree scanning detected genetically determined phenotypic heterogeneity within all three electrophoretic allelic classes. Overall, tree scanning is a simple, powerful, and flexible method for using haplotype trees to detect phenotype/genotype associations at candidate loci. PMID:15371364
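
    The core of the first scan is mechanical: each edge of the haplotype tree defines a biallelic bipartition, and every bipartition is scored for phenotypic association. The toy below scores splits by the gap in mean phenotype between the two sides; a real tree scan would use a permutation test for significance. The tree, phenotypes, and scoring statistic are illustrative assumptions.

```python
# Toy tree scan: every edge of a haplotype tree splits the haplotypes into two
# allelic classes; score each split by the gap in mean phenotype.
def subtree(children, root):
    out, stack = [], [root]
    while stack:
        u = stack.pop()
        out.append(u)
        stack.extend(children.get(u, []))
    return out

def tree_scan(parent, phenotype):
    n = len(parent)
    children = {}
    for v, p in enumerate(parent):
        if p is not None:
            children.setdefault(p, []).append(v)
    best = None
    for v, p in enumerate(parent):
        if p is None:
            continue                          # the root has no edge above it
        side_a = set(subtree(children, v))    # cutting edge (p, v)
        a = [phenotype[i] for i in side_a]
        b = [phenotype[i] for i in range(n) if i not in side_a]
        gap = abs(sum(a) / len(a) - sum(b) / len(b))
        if best is None or gap > best[0]:
            best = (gap, v)
    return best

# five haplotypes in a chain-like tree; phenotype jumps between nodes 2 and 3
parent = [None, 0, 1, 2, 3]
phenotype = [1.0, 1.1, 0.9, 5.0, 5.2]
gap, edge_child = tree_scan(parent, phenotype)
print(edge_child, round(gap, 2))
```

Subsequent rounds, as in the paper, would fix the best edge and rescan the remaining edges to partition the tree into three or more allelic classes.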

  4. Ensemble forecasting of major solar flares: First results

    NASA Astrophysics Data System (ADS)

    Guerra, J. A.; Pulkkinen, A.; Uritsky, V. M.

    2015-10-01

    We present the results from the first ensemble prediction model for major solar flares (M and X classes). The primary aim of this investigation is to explore the construction of an ensemble for an initial prototyping of this new concept. Using the probabilistic forecasts from three models hosted at the Community Coordinated Modeling Center (NASA-GSFC) and the NOAA forecasts, we developed an ensemble forecast by linearly combining the flaring probabilities from all four methods. Performance-based combination weights were calculated using a Monte Carlo-type algorithm that applies a decision threshold Pth to the combined probabilities and maximizes the Heidke Skill Score (HSS). Using the data for 13 recent solar active regions between years 2012 and 2014, we found that linear combination methods can improve the overall probabilistic prediction as well as the categorical prediction for certain values of the decision threshold. Combination weights vary with the applied threshold, and none of the tested individual forecasting models seems to provide more accurate predictions than the others for all values of Pth. According to the maximum values of HSS, performance-based weights calculated by averaging over the sample performed similarly to an equally weighted model. The values of Pth for which the ensemble forecast performs best are 25% for M-class flares and 15% for X-class flares. When the human-adjusted probabilities from NOAA are excluded from the ensemble, the ensemble performance in terms of the Heidke score is reduced.
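
    The threshold-selection step can be sketched directly: combine member probabilities linearly, binarize at a candidate Pth, and score the resulting yes/no forecasts with the Heidke Skill Score. The weights, probabilities, and observed events below are made up for illustration, not taken from the paper.

```python
# Heidke Skill Score from the 2x2 contingency table of binary forecasts.
def heidke(forecast, observed):
    a = sum(f and o for f, o in zip(forecast, observed))            # hits
    b = sum(f and not o for f, o in zip(forecast, observed))        # false alarms
    c = sum((not f) and o for f, o in zip(forecast, observed))      # misses
    d = sum((not f) and (not o) for f, o in zip(forecast, observed))
    denom = (a + c) * (c + d) + (a + b) * (b + d)
    return 2.0 * (a * d - b * c) / denom if denom else 0.0

# Linearly combine member probabilities, then scan candidate thresholds for
# the Pth that maximizes HSS.
def best_threshold(member_probs, weights, observed, thresholds):
    combined = [sum(w * p[i] for w, p in zip(weights, member_probs))
                for i in range(len(observed))]
    scored = [(heidke([p >= t for p in combined], observed), t) for t in thresholds]
    return max(scored)

member_probs = [[0.7, 0.2, 0.6, 0.1], [0.5, 0.3, 0.8, 0.2]]   # two toy models
observed = [True, False, True, False]
hss, p_th = best_threshold(member_probs, [0.5, 0.5], observed,
                           [0.15, 0.25, 0.35, 0.45])
print(p_th, round(hss, 2))
```

In the paper the weights themselves are also tuned (by a Monte Carlo search over weight combinations); the sketch fixes equal weights and optimizes only the threshold.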

  5. Application of a Multi-Scheme Ensemble Prediction System and an Ensemble Classification Method to Streamflow Forecasting

    NASA Astrophysics Data System (ADS)

    Pahlow, M.; Moehrlen, C.; Joergensen, J.; Hundecha, Y.

    2007-12-01

    Europe has experienced a number of unusually long-lasting and intense rainfall events in the last decade, resulting in severe floods in most European countries. Ensemble forecasts emerged as a valuable resource to provide decision makers in case of emergency with adequate information to protect downstream areas. However, forecasts should not only provide a best guess of the state of the stream network, but also an estimate of the range of possible outcomes. Ensemble forecast techniques are a suitable tool to obtain the required information. Furthermore a wide range of uncertainty that may impact hydrological forecasts can be accounted for using an ensemble of forecasts. The forecasting system used in this study is based on a multi-scheme ensemble prediction method and forecasts the meteorological uncertainty on synoptic scales as well as the resulting forecast error in weather derived products. Statistical methods are used to directly transform raw weather output to derived products and thereby utilize the statistical capabilities of each ensemble forecast. The forecasting system MS-EPS (Multi-Scheme Ensemble Prediction System) used in this study is a limited area ensemble prediction system using 75 different numerical weather prediction (NWP) model parameterisations. These individual 'schemes' each differ in their formulation of the fast meteorological processes. The MS-EPS forecasts are used as input for a hydrological model (HBV) to generate an ensemble of streamflow forecasts. Determining the most probable forecast from an ensemble of forecasts requires suitable statistical tools. They must enable a forecaster to interpret the model output, to condense the information and to provide the desired product. For this purpose, a probabilistic multi-trend filter (pmt-filter) for statistical post-processing of the hydrological ensemble forecasts is used in this study. An application of the forecasting system is shown for a watershed located in the eastern part of

  6. Integrated approach using data mining-based decision tree and object-based image analysis for high-resolution urban mapping of WorldView-2 satellite sensor data

    NASA Astrophysics Data System (ADS)

    Hamedianfar, Alireza; Shafri, Helmi Zulhaidi Mohd

    2016-04-01

    This paper integrates decision tree-based data mining (DM) and object-based image analysis (OBIA) to provide a transferable model for the detailed characterization of urban land-cover classes using WorldView-2 (WV-2) satellite images. Many articles have been published on OBIA in recent years based on DM for different applications. However, less attention has been paid to the generation of a transferable model for characterizing detailed urban land cover features. Three subsets of WV-2 images were used in this paper to generate transferable OBIA rule-sets. Many features were explored by using a DM algorithm, which created the classification rules as a decision tree (DT) structure from the first study area. The developed DT algorithm was applied to object-based classifications in the first study area. After this process, we validated the capability and transferability of the classification rules on the second and third subsets. Detailed ground truth samples were collected to assess the classification results. The first, second, and third study areas achieved 88%, 85%, and 85% overall accuracies, respectively. Results from the investigation indicate that DM was an efficient method to provide the optimal and transferable classification rules for OBIA, which accelerates the rule-sets creation stage in the OBIA classification domain.

  7. Tree Lifecycle.

    ERIC Educational Resources Information Center

    Nature Study, 1998

    1998-01-01

    Presents a Project Learning Tree (PLT) activity that has students investigate and compare the lifecycle of a tree to other living things and the tree's role in the ecosystem. Includes background material as well as step-by-step instructions, variation and enrichment ideas, assessment opportunities, and student worksheets. (SJR)

  8. Solving tridiagonal systems on ensemble architectures

    SciTech Connect

    Johnsson, S.L.

    1987-05-01

    Elimination methods for the concurrent solution of tridiagonal systems on linear and 2-dimensional arrays, complete binary trees, shuffle-exchange and perfect shuffle networks, and boolean cubes are devised and analyzed. The methods can be obtained by symmetric permutations of some rows and columns, and amount to cyclic reduction or a combination of Gaussian elimination and cyclic reduction (GECR). The ensembles have only local storage and no global control. Synchronization is accomplished via message passing to neighboring processors. The parallel arithmetic complexity of GECR for N equations on a K-processor ensemble is O(N/K + log2 K), and the communication complexity is O(K) for the linear array, O(√K) for the 2-dimensional mesh, and O(log2 K) for the networks of diameter O(log2 K). The maximum speed-up for the linear array is attained at K ≈ (N/α)^(1/2) and for the 2-dimensional mesh at K ≈ (N/2α)^(2/3), where α = (the time to communicate one floating-point number)/(the time for a floating-point arithmetic operation). For the binary tree the maximum speed-up is attained at K = N, and for the perfect shuffle and boolean k-cube networks K = N/(1 + α) yields the maximum speed-up. The minimum time complexity is of order O(N^(1/2)) for the linear array, of order O(N^(1/3)) for the mesh, and of order O(log2 N) for the binary tree, the shuffle-exchange, the perfect shuffle, and the boolean k-cube.
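
    The cyclic reduction at the heart of GECR is easiest to see serially: each level eliminates the odd-indexed unknowns, halving the system, and the paper's contribution is distributing these levels across the processor ensemble. A serial sketch for n = 2^k − 1 unknowns (sub-, main-, and super-diagonals a, b, c and right-hand side d):

```python
# Serial cyclic reduction for a tridiagonal system of size n = 2^k - 1.
def cyclic_reduction(a, b, c, d):
    n = len(b)
    a, b, c, d = a[:], b[:], c[:], d[:]
    h = 1
    while h < n:                                  # forward reduction levels
        for i in range(2 * h - 1, n, 2 * h):      # eliminate via rows i-h, i+h
            alpha = a[i] / b[i - h]
            beta = c[i] / b[i + h] if i + h < n else 0.0
            b[i] -= alpha * c[i - h] + (beta * a[i + h] if i + h < n else 0.0)
            d[i] -= alpha * d[i - h] + (beta * d[i + h] if i + h < n else 0.0)
            a[i] = -alpha * a[i - h]
            c[i] = -beta * c[i + h] if i + h < n else 0.0
        h *= 2
    x = [0.0] * n
    h = (n + 1) // 2
    while h >= 1:                                 # back substitution levels
        for i in range(h - 1, n, 2 * h):
            left = x[i - h] if i - h >= 0 else 0.0
            right = x[i + h] if i + h < n else 0.0
            x[i] = (d[i] - a[i] * left - c[i] * right) / b[i]
        h //= 2
    return x

# -x_{i-1} + 2 x_i - x_{i+1} = d_i with exact solution x = 1, 2, ..., 7
n = 7
a = [0.0] + [-1.0] * (n - 1)
b = [2.0] * n
c = [-1.0] * (n - 1) + [0.0]
d = [0.0] * (n - 1) + [8.0]
print([round(v, 6) for v in cyclic_reduction(a, b, c, d)])
```

Each forward level touches only rows 2h apart, so on a parallel ensemble the inner loop becomes one local elimination per processor plus nearest-neighbor exchanges, giving the O(N/K + log2 K) arithmetic complexity quoted above.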

  9. Seeing the forest through the trees: improving decision making on the Iowa gambling task by shifting focus from short- to long-term outcomes

    PubMed Central

    Buelow, Melissa T.; Okdie, Bradley M.; Blaine, Amber L.

    2013-01-01

    Introduction: The present study sought to examine two methods by which to improve decision making on the Iowa Gambling Task (IGT): inducing a negative mood and providing additional learning trials. Method: In the first study, 194 undergraduate students [74 male; mean age = 19.44 (SD = 3.69)] were randomly assigned to view a series of pictures to induce a positive, negative, or neutral mood immediately prior to the IGT. In the second study, 276 undergraduate students [111 male; mean age = 19.18 (SD = 2.58)] completed a delay discounting task and back-to-back administrations of the IGT. Results: Participants in an induced negative mood selected more from Deck C during the final trials than those in an induced positive mood. Providing additional learning trials resulted in better decision making: participants shifted their focus from the frequency of immediate gains/losses (i.e., a preference for Decks B and D) to long-term outcomes (i.e., a preference for Deck D). In addition, disadvantageous decision making on the additional learning trials was associated with larger delay discounting (i.e., a preference for more immediate but smaller rewards). Conclusions: The present results indicate that decision making is affected by negative mood state, and that decision making can be improved by increasing the number of learning trials. In addition, the current results provide evidence of a relationship between performance on the IGT and on a separate measure of decision making, the delay discounting task. Moreover, the present results indicate that improved decision making on the IGT can be attributed to shifting focus toward long-term outcomes, as evidenced by increased selections from advantageous decks as well as correlations between the IGT and delay discounting task. Implications for the assessment of decision making using the IGT are discussed. PMID:24151485

  10. Multilevel ensemble Kalman filtering

    DOE PAGESBeta

    Hoel, Hakon; Law, Kody J. H.; Tempone, Raul

    2016-06-14

    This study embeds a multilevel Monte Carlo sampling strategy into the Monte Carlo step of the ensemble Kalman filter (EnKF) in the setting of finite dimensional signal evolution and noisy discrete-time observations. The signal dynamics is assumed to be governed by a stochastic differential equation (SDE), and a hierarchy of time grids is introduced for multilevel numerical integration of that SDE. Finally, the resulting multilevel EnKF is proved to asymptotically outperform EnKF in terms of computational cost versus approximation accuracy. The theoretical results are illustrated numerically.
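
    The single-level building block that the multilevel scheme accelerates is the stochastic EnKF analysis step. A scalar-state sketch with identity observation operator (illustrative only, not the paper's multilevel construction):

```python
import random, statistics

# One stochastic EnKF analysis step for a scalar state: shrink each forecast
# member toward a perturbed observation, with gain set by the sample variance.
def enkf_update(ensemble, y_obs, obs_var, rng):
    p = statistics.variance(ensemble)         # forecast (sample) variance
    k = p / (p + obs_var)                     # Kalman gain for H = identity
    return [x + k * (y_obs + rng.gauss(0.0, obs_var ** 0.5) - x)
            for x in ensemble]

rng = random.Random(1)
forecast = [rng.gauss(0.0, 1.0) for _ in range(500)]   # prior: mean 0, var 1
analysis = enkf_update(forecast, 2.0, 1.0, rng)        # observe y = 2, var 1
print(statistics.mean(analysis), statistics.variance(analysis))
```

With equal prior and observation variances the analysis mean lands near the midpoint of prior mean and observation, and the ensemble spread contracts; the multilevel variant of the paper replaces the single forecast ensemble with a hierarchy of coupled ensembles at different SDE time-step resolutions.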

  11. Multinomial logistic regression ensembles.

    PubMed

    Lee, Kyewon; Ahn, Hongshik; Moon, Hojin; Kodell, Ralph L; Chen, James J

    2013-05-01

    This article proposes a method for multiclass classification problems using ensembles of multinomial logistic regression models. A multinomial logit model is used as a base classifier in ensembles from random partitions of predictors. The multinomial logit model can be applied to each mutually exclusive subset of the feature space without variable selection. By combining multiple models the proposed method can handle a huge database without a constraint needed for analyzing high-dimensional data, and the random partition can improve the prediction accuracy by reducing the correlation among base classifiers. The proposed method is implemented using R, and the performance including overall prediction accuracy, sensitivity, and specificity for each category is evaluated on two real data sets and simulation data sets. To investigate the quality of prediction in terms of sensitivity and specificity, the area under the receiver operating characteristic (ROC) curve (AUC) is also examined. The performance of the proposed model is compared to a single multinomial logit model and it shows a substantial improvement in overall prediction accuracy. The proposed method is also compared with other classification methods such as the random forest, support vector machines, and random multinomial logit model. PMID:23611203
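
    The scheme described above can be sketched end to end: randomly partition the predictors into disjoint subsets, fit a multinomial logit model on each subset, and average the class probabilities across models. The toy gradient-descent fit, synthetic data, and hyperparameters below are all illustrative assumptions, not the article's implementation.

```python
import math, random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Toy full-batch gradient-descent fit of a multinomial logit (softmax) model.
def fit_multinomial_logit(X, y, n_classes, epochs=300, lr=0.2):
    n_feat = len(X[0])
    w = [[0.0] * (n_feat + 1) for _ in range(n_classes)]        # + bias term
    for _ in range(epochs):
        grad = [[0.0] * (n_feat + 1) for _ in range(n_classes)]
        for row, yi in zip(X, y):
            xb = row + [1.0]
            p = softmax([sum(wk * xk for wk, xk in zip(w[c], xb))
                         for c in range(n_classes)])
            for c in range(n_classes):
                err = p[c] - (1.0 if c == yi else 0.0)
                for j, xj in enumerate(xb):
                    grad[c][j] += err * xj
        for c in range(n_classes):
            for j in range(n_feat + 1):
                w[c][j] -= lr * grad[c][j] / len(X)
    return w

# Ensemble: one multinomial logit per random disjoint feature subset,
# predictions made by averaging class probabilities over the models.
def ensemble_predict(X, y, n_classes, n_parts=2, seed=0):
    feats = list(range(len(X[0])))
    random.Random(seed).shuffle(feats)
    parts = [feats[i::n_parts] for i in range(n_parts)]         # disjoint subsets
    models = [(part, fit_multinomial_logit([[r[f] for f in part] for r in X],
                                           y, n_classes)) for part in parts]
    preds = []
    for row in X:
        probs = [0.0] * n_classes
        for part, w in models:
            xb = [row[f] for f in part] + [1.0]
            p = softmax([sum(wk * xk for wk, xk in zip(w[c], xb))
                         for c in range(n_classes)])
            probs = [a + b / len(models) for a, b in zip(probs, p)]
        preds.append(probs.index(max(probs)))
    return preds

# three well-separated classes in four informative features
rng = random.Random(42)
centers = [[0, 0, 0, 0], [2, 2, 2, 2], [0, 2, 0, 2]]
X = [[c + rng.gauss(0, 0.4) for c in centers[k]] for k in range(3) for _ in range(20)]
y = [k for k in range(3) for _ in range(20)]
preds = ensemble_predict(X, y, n_classes=3)
print(sum(p == t for p, t in zip(preds, y)) / len(y))
```

Because the subsets are disjoint, each base model sees few predictors (no variable selection needed) and the models are only weakly correlated, which is the source of the ensemble's accuracy gain.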

  12. Decimated Input Ensembles for Improved Generalization

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Oza, Nikunj C.; Norvig, Peter (Technical Monitor)

    1999-01-01

    Recently, many researchers have demonstrated that using classifier ensembles (e.g., averaging the outputs of multiple classifiers before reaching a classification decision) leads to improved performance for many difficult generalization problems. However, in many domains there are serious impediments to such "turnkey" classification accuracy improvements. Most notable among these is the deleterious effect of highly correlated classifiers on the ensemble performance. One particular solution to this problem is generating "new" training sets by sampling the original one. However, with a finite number of patterns, this causes a reduction in the training patterns each classifier sees, often resulting in considerably worsened generalization performance (particularly for high dimensional data domains) for each individual classifier. Generally, this drop in the accuracy of the individual classifier performance more than offsets any potential gains due to combining, unless diversity among classifiers is actively promoted. In this work, we introduce a method that: (1) reduces the correlation among the classifiers; (2) reduces the dimensionality of the data, thus lessening the impact of the "curse of dimensionality"; and (3) improves the classification performance of the ensemble.

  13. Hierarchical Ensemble Methods for Protein Function Prediction

    PubMed Central

    2014-01-01

    Protein function prediction is a complex multiclass multilabel classification problem, characterized by multiple issues such as the incompleteness of the available annotations, the integration of multiple sources of high dimensional biomolecular data, the unbalance of several functional classes, and the difficulty of univocally determining negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies motivate the development of hierarchy-aware prediction methods that showed significantly better performances than hierarchical-unaware “flat” prediction methods. In this paper, we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. According to this general approach, a separate learning machine is trained to learn a specific functional term and then the resulting predictions are assembled in a “consensus” ensemble decision, taking into account the hierarchical relationships between classes. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Open problems of this exciting research area of computational biology are finally considered, outlining novel perspectives for future research. PMID:25937954

  14. Density of states for Gaussian unitary ensemble, Gaussian orthogonal ensemble, and interpolating ensembles through supersymmetric approach

    SciTech Connect

    Shamis, Mira

    2013-11-15

    We use the supersymmetric formalism to derive an integral formula for the density of states of the Gaussian Orthogonal Ensemble, and then apply saddle-point analysis to give a new derivation of the 1/N-correction to Wigner's law. This extends the work of Disertori on the Gaussian Unitary Ensemble. We also apply our method to the interpolating ensembles of Mehta–Pandey.

  15. Improving Climate Projections Using "Intelligent" Ensembles

    NASA Technical Reports Server (NTRS)

    Baker, Noel C.; Taylor, Patrick C.

    2015-01-01

    Recent changes in the climate system have led to growing concern, especially in communities which are highly vulnerable to resource shortages and weather extremes. There is an urgent need for better climate information to develop solutions and strategies for adapting to a changing climate. Climate models provide excellent tools for studying the current state of climate and making future projections. However, these models are subject to biases created by structural uncertainties. Performance metrics-or the systematic determination of model biases-succinctly quantify aspects of climate model behavior. Efforts to standardize climate model experiments and collect simulation data-such as the Coupled Model Intercomparison Project (CMIP)-provide the means to directly compare and assess model performance. Performance metrics have been used to show that some models reproduce present-day climate better than others. Simulation data from multiple models are often used to add value to projections by creating a consensus projection from the model ensemble, in which each model is given an equal weight. It has been shown that the ensemble mean generally outperforms any single model. It is possible to use unequal weights to produce ensemble means, in which models are weighted based on performance (called "intelligent" ensembles). Can performance metrics be used to improve climate projections? Previous work introduced a framework for comparing the utility of model performance metrics, showing that the best metrics are related to the variance of top-of-atmosphere outgoing longwave radiation. These metrics improve present-day climate simulations of Earth's energy budget using the "intelligent" ensemble method. The current project identifies several approaches for testing whether performance metrics can be applied to future simulations to create "intelligent" ensemble-mean climate projections. It is shown that certain performance metrics test key climate processes in the models, and
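
    The "intelligent" ensemble mean described above is a performance-weighted average. A minimal sketch contrasting it with the equal-weight consensus; the model errors and projections are invented for illustration, and inverse-error weighting is just one plausible choice of weighting scheme.

```python
# Weighted ensemble mean: weight each model's projection by the inverse of its
# present-day performance metric, versus equal weighting. All values invented.
def weighted_mean(values, weights):
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total

present_day_error = [0.2, 0.5, 1.0]    # hypothetical metric per model (lower = better)
projection = [3.1, 2.4, 1.8]           # each model's future projection

equal = weighted_mean(projection, [1.0] * 3)
skill = weighted_mean(projection, [1.0 / e for e in present_day_error])
print(equal, skill)
```

The "intelligent" mean is pulled toward the best-performing model's projection; whether that improves future projections is exactly the open question the abstract poses.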

  16. Class Evolution Tree: A Graphical Tool to Support Decisions on the Number of Classes in Exploratory Categorical Latent Variable Modeling for Rehabilitation Research

    ERIC Educational Resources Information Center

    Kriston, Levente; Melchior, Hanne; Hergert, Anika; Bergelt, Corinna; Watzke, Birgit; Schulz, Holger; von Wolff, Alessa

    2011-01-01

    The aim of our study was to develop a graphical tool that can be used in addition to standard statistical criteria to support decisions on the number of classes in explorative categorical latent variable modeling for rehabilitation research. Data from two rehabilitation research projects were used. In the first study, a latent profile analysis was…

  17. Representative Ensembles in Statistical Mechanics

    NASA Astrophysics Data System (ADS)

    Yukalov, V. I.

    The notion of representative statistical ensembles, correctly representing statistical systems, is strictly formulated. This notion allows for a proper description of statistical systems, avoiding inconsistencies in theory. As an illustration, a Bose-condensed system is considered. It is shown that a self-consistent treatment of the latter, using a representative ensemble, always yields a conserving and gapless theory.

  18. The Importance of Bass Ensemble.

    ERIC Educational Resources Information Center

    Bitz, Michael

    1997-01-01

    States that bass players should be allowed to play chamber music because it is an essential component to all string students' musical development. Expounds that bassists can successfully enjoy chamber music through participation in a bass ensemble. Gives suggestions on how to form a bass ensemble and on the repertoire of music. (CMK)

  19. The Behavioral Relevance of Cortical Neural Ensemble Responses Emerges Suddenly.

    PubMed

    Sadacca, Brian F; Mukherjee, Narendra; Vladusich, Tony; Li, Jennifer X; Katz, Donald B; Miller, Paul

    2016-01-20

    Whereas many laboratory-studied decisions involve a highly trained animal identifying an ambiguous stimulus, many naturalistic decisions do not. Consumption decisions, for instance, involve determining whether to eject or consume an already identified stimulus in the mouth and are decisions that can be made without training. By standard analyses, rodent cortical single-neuron taste responses come to predict such consumption decisions across the 500 ms preceding the consumption or rejection itself; decision-related firing emerges well after stimulus identification. Analyzing single-trial ensemble activity using hidden Markov models, we show these decision-related cortical responses to be part of a reliable sequence of states (each defined by the firing rates within the ensemble) separated by brief state-to-state transitions, the latencies of which vary widely between trials. When we aligned data to the onset of the (late-appearing) state that dominates during the time period in which single-neuron firing is correlated to taste palatability, the apparent ramp in stimulus-aligned choice-related firing was shown to be a much more precipitous coherent jump. This jump in choice-related firing resembled a step function more than it did the output of a standard (ramping) decision-making model, and provided a robust prediction of decision latency in single trials. Together, these results demonstrate that activity related to naturalistic consumption decisions emerges nearly instantaneously in cortical ensembles. Significance statement: This paper provides a description of how the brain makes evaluative decisions. The majority of work on the neurobiology of decision making deals with "what is it?" decisions; out of this work has emerged a model whereby neurons accumulate information about the stimulus in the form of slowly increasing firing rates and reach a decision when those firing rates reach a threshold. Here, we study a different kind of more naturalistic decision
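    State sequences in a hidden Markov model of the kind used in this analysis are typically decoded with the Viterbi algorithm. A self-contained toy sketch follows; the two-state parameters are invented for illustration and are not fitted to any neural data:

```python
import numpy as np

def viterbi(obs, log_pi, log_A, log_B):
    """Most likely hidden-state path for a discrete-emission HMM.
    obs: observation indices; log_pi: initial log-probs (S,);
    log_A: transition log-probs (S, S); log_B: emission log-probs (S, V)."""
    S = log_pi.shape[0]
    T = len(obs)
    delta = np.zeros((T, S))          # best log-prob of any path ending in each state
    psi = np.zeros((T, S), dtype=int) # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # (from, to)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):    # backtrack
        path[t] = psi[t + 1, path[t + 1]]
    return path

# Toy two-state model: sticky states with brief transitions, as in the abstract.
log_pi = np.log([0.9, 0.1])
log_A = np.log([[0.95, 0.05],
                [0.05, 0.95]])
log_B = np.log([[0.9, 0.1],
                [0.1, 0.9]])
obs = [0, 0, 0, 1, 1, 1]
print(viterbi(obs, log_pi, log_A, log_B))  # → [0 0 0 1 1 1]
```

    The sticky diagonal of the transition matrix is what produces long dwell times separated by abrupt state-to-state jumps, mirroring the sudden transitions reported above.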

  20. A multi-model ensemble approach to seabed mapping

    NASA Astrophysics Data System (ADS)

    Diesing, Markus; Stephens, David

    2015-06-01

    Seabed habitat mapping based on swath acoustic data and ground-truth samples is an emergent and active marine science discipline. Significant progress could be achieved by transferring techniques and approaches that have been successfully developed and employed in such fields as terrestrial land cover mapping. One such promising approach is the multiple classifier system, which aims at improving classification performance by combining the outputs of several classifiers. Here we present results of a multi-model ensemble applied to multibeam acoustic data covering more than 5000 km² of seabed in the North Sea with the aim of deriving accurate spatial predictions of seabed substrate. A suite of six machine learning classifiers (k-Nearest Neighbour, Support Vector Machine, Classification Tree, Random Forest, Neural Network and Naïve Bayes) was trained with ground-truth sample data classified into seabed substrate classes, and their prediction accuracy was assessed with an independent set of samples. The three and five best-performing models were combined into classifier ensembles. Both ensembles led to increased prediction accuracy as compared to the best-performing single classifier. The improvements were, however, not statistically significant at the 5% level. Although the three-model ensemble did not perform significantly better than its individual component models, we noticed that the five-model ensemble did perform significantly better than three of the five component models. A classifier ensemble might therefore be an effective strategy to improve classification performance. Another advantage is that the agreement in predicted substrate class between the individual models of the ensemble could be used as a measure of confidence. We propose a simple and spatially explicit measure of confidence that is based on model agreement and prediction accuracy.
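    The majority-vote combination and agreement-based confidence described in this abstract can be sketched as follows. The classifier outputs and class labels are invented stand-ins, not the study's data:

```python
import numpy as np

# Predicted substrate class per pixel from three hypothetical classifiers.
preds = np.array([
    ["sand", "mud", "gravel", "sand"],   # classifier 1
    ["sand", "mud", "sand",   "sand"],   # classifier 2
    ["mud",  "mud", "gravel", "sand"],   # classifier 3
])

ensemble, confidence = [], []
for pixel in preds.T:                    # iterate over pixels (columns)
    labels, counts = np.unique(pixel, return_counts=True)
    ensemble.append(str(labels[counts.argmax()]))
    # Model agreement as a simple confidence measure: the fraction of
    # classifiers voting for the winning class.
    confidence.append(float(counts.max()) / len(pixel))

print(ensemble)    # majority-vote class per pixel
print(confidence)  # agreement-based confidence per pixel
```

    Pixels where all models agree get confidence 1.0; split votes flag locations where the substrate prediction should be treated cautiously, in the spirit of the spatially explicit confidence measure proposed above.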

  1. Using high-resolution topography and hyperspectral data to classify tree species at the San Joaquin Experimental Range

    NASA Astrophysics Data System (ADS)

    Dibb, S. D.; Ustin, S.; Grigsby, S.

    2015-12-01

    Air- and space-borne remote sensing instruments allow for rapid and precise study of the diversity of the Earth's ecosystems. After atmospheric correction and ground validation are performed, the gathered hyperspectral and topographic data can be assembled into a stack of layers for land cover classification. Data for this project were collected in multiple field campaigns, including the 2013 NSF NEON California campaign and 2015 NASA SARP campaign. Using hyperspectral and high resolution topography data, 25 discriminatory attributes were processed in Exelis' ENVI software and collected for use in a decision forest to classify the four major tree species (Blue Oak, Live Oak, California Buckeye, and Foothill Pine) at the San Joaquin Experimental Range near Fresno, CA. These attributes include 21 classic vegetation indices and a number of other spectral characteristics, such as color and albedo, and four topographic layers, including slope, aspect, elevation, and tree height. Additionally, a number of nearby terrain classes, including bare earth, asphalt, water, rock, shadow, structures, and grass, were created. Fifty training pixels were used for each class. The training pixels for each tree species came from GPS points collected in the field. Ensemble bootstrap aggregation of decision trees was performed in MATLAB, with 500 trees (an arbitrarily chosen number) grown. The tree that produced the minimum out-of-bag classification error (4.65%) was selected to classify the entire scene. Classification results accurately distinguished between oak species but were suboptimal in dense areas. The entire San Joaquin Experimental Range was mapped with an overall accuracy of 94.7% and a Kappa coefficient of 0.94. Finally, the average commission and omission percentages were 5.3% each. A highly accurate map of tree species at this scale supports studies on drought effects, disease, and species-specific growth traits.
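    The bootstrap-aggregation step described above was run in MATLAB; a rough scikit-learn equivalent on synthetic data might look like the following. The dataset, class counts, and parameter choices are illustrative stand-ins for the hyperspectral layers:

```python
# Sketch of bagged decision trees with an out-of-bag (OOB) error estimate.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 25 attributes, 4 "species" classes, 50 samples per class.
X, y = make_classification(n_samples=200, n_features=25, n_informative=10,
                           n_classes=4, random_state=0)

bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500,
                        oob_score=True, random_state=0)
bag.fit(X, y)

# OOB error estimates generalization accuracy without a held-out set,
# analogous to the 4.65% out-of-bag error reported in the abstract.
print(f"OOB error: {1 - bag.oob_score_:.3f}")
```

    Note that standard bagging classifies with the vote of all 500 trees rather than a single selected tree; the OOB error is a property of the whole ensemble.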

  2. A GLM Post-processor to Adjust Ensemble Forecast Traces

    NASA Astrophysics Data System (ADS)

    Thiemann, M.; Day, G. N.; Schaake, J. C.; Draijer, S.; Wang, L.

    2011-12-01

    The skill of hydrologic ensemble forecasts has improved in recent years through a better understanding of climate variability, better climate forecasts and new data assimilation techniques. These forecasts have been used extensively for probabilistic water supply forecasting, and interest is growing in using them for operational decision making. Hydrologic ensemble forecast members typically have inherent biases in flow timing and volume caused by (1) structural errors in the models used, (2) systematic errors in the data used to calibrate those models, (3) uncertain initial hydrologic conditions, and (4) uncertainties in the forcing datasets. Furthermore, hydrologic models have often not been developed for operational decision points, and ensemble forecasts are thus not always available where needed. A statistical post-processor can be used to address these issues. The post-processor should (1) correct for systematic biases in flow timing and volume, (2) preserve the skill of the available raw forecasts, (3) preserve spatial and temporal correlation as well as the uncertainty in the forecasted flow data, (4) produce adjusted forecast ensembles that represent the variability of the observed hydrograph to be predicted, and (5) preserve individual forecast traces as equally likely. The post-processor should also allow for the translation of available ensemble forecasts to hydrologically similar locations where forecasts are not available. This paper introduces an ensemble post-processor (EPP) developed in support of New York City water supply operations. The EPP employs a general linear model (GLM) to (1) adjust available ensemble forecast traces and (2) create new ensembles for (nearby) locations where only historic flow observations are available. The EPP is calibrated by developing daily and aggregated statistical relationships from historical flow observations and model simulations. 
These are then used in operation to obtain the conditional probability density
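    The core idea of a trace-adjusting post-processor can be sketched with a simple linear stand-in for the GLM: fit a relationship between historical simulations and observations, then apply it to every raw forecast trace. All data and coefficients below are synthetic, not from the EPP:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical calibration data: historical simulated vs. observed flows.
sim_hist = rng.gamma(shape=2.0, scale=50.0, size=300)
obs_hist = 0.8 * sim_hist + 15.0 + rng.normal(0, 10, 300)

# Fit a simple linear model obs = a * sim + b (a stand-in for the GLM).
a, b = np.polyfit(sim_hist, obs_hist, 1)

# Adjust each raw forecast trace with the fitted relationship. Applying the
# same mapping to every member preserves the traces as equally likely.
raw_ensemble = rng.gamma(shape=2.0, scale=50.0, size=(10, 30))  # 10 traces x 30 days
adjusted = a * raw_ensemble + b
print(adjusted.shape)
```

    A full GLM post-processor would condition on forecast aggregates and model the conditional probability density rather than a single deterministic mapping, but the calibrate-then-adjust structure is the same.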

  3. The Behavioral Relevance of Cortical Neural Ensemble Responses Emerges Suddenly

    PubMed Central

    Sadacca, Brian F.; Mukherjee, Narendra; Vladusich, Tony; Li, Jennifer X.

    2016-01-01

    Whereas many laboratory-studied decisions involve a highly trained animal identifying an ambiguous stimulus, many naturalistic decisions do not. Consumption decisions, for instance, involve determining whether to eject or consume an already identified stimulus in the mouth and are decisions that can be made without training. By standard analyses, rodent cortical single-neuron taste responses come to predict such consumption decisions across the 500 ms preceding the consumption or rejection itself; decision-related firing emerges well after stimulus identification. Analyzing single-trial ensemble activity using hidden Markov models, we show these decision-related cortical responses to be part of a reliable sequence of states (each defined by the firing rates within the ensemble) separated by brief state-to-state transitions, the latencies of which vary widely between trials. When we aligned data to the onset of the (late-appearing) state that dominates during the time period in which single-neuron firing is correlated to taste palatability, the apparent ramp in stimulus-aligned choice-related firing was shown to be a much more precipitous coherent jump. This jump in choice-related firing resembled a step function more than it did the output of a standard (ramping) decision-making model, and provided a robust prediction of decision latency in single trials. Together, these results demonstrate that activity related to naturalistic consumption decisions emerges nearly instantaneously in cortical ensembles. SIGNIFICANCE STATEMENT This paper provides a description of how the brain makes evaluative decisions. The majority of work on the neurobiology of decision making deals with “what is it?” decisions; out of this work has emerged a model whereby neurons accumulate information about the stimulus in the form of slowly increasing firing rates and reach a decision when those firing rates reach a threshold. Here, we study a different kind of more naturalistic decision

  4. Hydrologic ensemble hindcasting and verification in the U.S. National Weather Service

    NASA Astrophysics Data System (ADS)

    Demargne, Julie; Liu, Yuqiong; Brown, James; Seo, Dong-Jun; Wu, Limin; Weerts, Albrecht; Werner, Micha

    2010-05-01

    Quantifying the predictive uncertainty in hydrologic forecasts is one of the most pressing needs in operational hydrologic forecasting, to support risk-based decision making for a wide range of applications (e.g. flood risk management, water supply management, streamflow regulation, and recreation planning). Towards this goal, the Office of Hydrologic Development of the National Oceanic and Atmospheric Administration (NOAA) National Weather Service (NWS), in collaboration with the NWS River Forecast Centers, Deltares and other partners, has been developing the Experimental Ensemble Forecast System (XEFS). The XEFS includes the Ensemble Pre-Processor, the Ensemble Streamflow Prediction subsystem, the Ensemble Post-Processor, the Hydrologic Model Output Statistics streamflow ensemble processor, as well as the Ensemble Verification System for assessing the quality of the probabilistic forecasts generated therein. It is currently being integrated into the NWS's Community Hydrologic Prediction System (CHPS), which builds on the service-oriented architecture of the Delft FEWS Flood Early Warning System. The CHPS-XEFS also provides ensemble hindcasting capabilities to retroactively apply the newly developed ensemble forecasting approaches, and produce large samples of ensemble hindcasts that are necessary for verification. The verification results based on these hindcasts may be used to evaluate the benefits of new or improved ensemble forecasting approaches. Additionally these can be used to analyze the various sources of uncertainty and error in the forecasting system, as well as guide targeted improvements. Hindcasts may also be required by sophisticated forecast users to calibrate their decision support system, and could help operational forecasters identify historical analogue forecasts to make informed decisions in real-time. In this paper, we describe our hindcasting procedures using CHPS-XEFS, present verification results of ensemble hindcasts generated therein
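    Verifying ensemble hindcasts against observations commonly involves probabilistic scores such as the continuous ranked probability score (CRPS). Below is a minimal sketch of a common empirical CRPS estimator; the member values and observation are invented, and this is not the Ensemble Verification System's code:

```python
import numpy as np

def crps_ensemble(members, obs):
    """Empirical CRPS for one forecast: E|X - y| - 0.5 E|X - X'|.
    Lower is better; zero for a perfect deterministic forecast."""
    members = np.asarray(members, dtype=float)
    term1 = np.abs(members - obs).mean()
    term2 = np.abs(members[:, None] - members[None, :]).mean()
    return term1 - 0.5 * term2

# Hypothetical hindcast: 5 ensemble members vs. one observed flow value.
print(crps_ensemble([95, 100, 105, 110, 120], 102.0))
```

    Averaging this score over a large hindcast archive is what makes the long retroactive ensemble runs described above so valuable for verification.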

  5. Talking Trees

    ERIC Educational Resources Information Center

    Tolman, Marvin

    2005-01-01

    Students love outdoor activities and will love them even more when they build confidence in their tree identification and measurement skills. Through these activities, students will learn to identify the major characteristics of trees and discover how the pace--a nonstandard measuring unit--can be used to estimate not only distances but also the…

  6. Tree Amigos.

    ERIC Educational Resources Information Center

    Center for Environmental Study, Grand Rapids, MI.

    Tree Amigos is a special cross-cultural program that uses trees as a common bond to bring the people of the Americas together in unique partnerships to preserve and protect the shared global environment. It is a tangible program that embodies the philosophy that individuals, acting together, can make a difference. This resource book contains…

  7. Efficient Gene Tree Correction Guided by Genome Evolution

    PubMed Central

    Lafond, Manuel; Seguin, Jonathan; Boussau, Bastien; Guéguen, Laurent; El-Mabrouk, Nadia; Tannier, Eric

    2016-01-01

    Motivations Gene trees inferred solely from multiple alignments of homologous sequences often contain weakly supported and uncertain branches. Information for their full resolution may lie in the dependency between gene families and their genomic context. Integrative methods, using species tree information in addition to sequence information, often rely on a computationally intensive tree space search which forecloses an application to large genomic databases. Results We propose a new method, called ProfileNJ, that takes a gene tree with statistical supports on its branches, and corrects its weakly supported parts by using a combination of information from a species tree and a distance matrix. Its low running time enabled us to use it on the whole Ensembl Compara database, for which we propose an alternative, arguably more plausible set of gene trees. This allowed us to perform a genome-wide analysis of duplication and loss patterns on the history of 63 eukaryote species, and predict ancestral gene content and order for all ancestors along the phylogeny. Availability A web interface called RefineTree, including ProfileNJ as well as other gene tree correction methods, which we also test on the Ensembl gene families, is available at: http://www-ens.iro.umontreal.ca/~adbit/polytomysolver.html. The code of ProfileNJ as well as the set of gene trees corrected by ProfileNJ from Ensembl Compara version 73 families are also made available. PMID:27513924

  8. The Polyanalytic Ginibre Ensembles

    NASA Astrophysics Data System (ADS)

    Haimi, Antti; Hedenmalm, Haakan

    2013-10-01

    For integers n, q = 1, 2, 3, …, let Pol(n,q) denote the complex-linear space of polynomials in z and z̄, of degree ≤ n-1 in z and of degree ≤ q-1 in z̄. We supply Pol(n,q) with an inner product structure; the resulting Hilbert space is denoted by Pol(m,n,q). Here, it is assumed that m is a positive real. We let K(m,n,q) denote the reproducing kernel of Pol(m,n,q), and study the associated determinantal process in the limit as m, n → +∞ while n = m + O(1); the number q, the degree of polyanalyticity, is kept fixed. We call these processes polyanalytic Ginibre ensembles, because they generalize the Ginibre ensemble, the eigenvalue process of random (normal) matrices with Gaussian weight. There is a physical interpretation in terms of a system of free fermions in a uniform magnetic field such that a fixed number of the first Landau levels have been filled. We consider local blow-ups of the polyanalytic Ginibre ensembles around points in the spectral droplet, which is here the closed unit disk. We obtain asymptotics for the blow-up process, using a blow-up to characteristic distance m^(-1/2); the typical distance is the same both for interior and for boundary points. This amounts to obtaining the asymptotic behavior of the generating kernel K(m,n,q). Following Ameur et al. (Commun. Pure Appl. Math. 63(12):1533-1584, 2010), the asymptotics of K(m,n,q) are rather conveniently expressed in terms of the Berezin measure and density [equation not available: see full text]. For interior points |z| < 1, the Berezin measure converges to δ_z in the weak-star sense, where δ_z denotes the unit point mass at z. Moreover, if we blow up to the scale m^(-1/2) around z, we get convergence to a measure which is Gaussian for q = 1, but exhibits more complicated Fresnel zone behavior for q > 1. In contrast, for exterior points |z| > 1, the Berezin measure converges instead to the harmonic measure at z with respect to the exterior disk. For boundary points, |z| = 1, the Berezin measure converges to the unit

  9. Multilevel Ensemble Transform Particle Filtering

    NASA Astrophysics Data System (ADS)

    Gregory, Alastair; Cotter, Colin; Reich, Sebastian

    2016-04-01

    This presentation extends the Multilevel Monte Carlo variance reduction technique to nonlinear filtering. In particular, Multilevel Monte Carlo is applied to a certain variant of the particle filter, the Ensemble Transform Particle Filter (ETPF). A key aspect is the use of optimal transport methods to re-establish correlation between coarse and fine ensembles after resampling; this controls the variance of the estimator. Numerical examples present a proof of concept of the effectiveness of the proposed method, demonstrating significant computational cost reductions (relative to the single-level ETPF counterpart) in the propagation of ensembles.
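    The multilevel idea underlying this work can be illustrated outside the particle-filter setting with a two-level Monte Carlo sketch for a toy SDE. The equation, step counts, and sample sizes below are invented for illustration; the key point is that coupling coarse and fine paths keeps the correction variance small:

```python
import numpy as np

rng = np.random.default_rng(1)

def coupled_pair(n, fine_steps):
    """Euler-Maruyama for dX = -X dt + dW on [0, 1], X0 = 1, on a fine grid
    and a coarse grid (half the steps) driven by the SAME noise increments."""
    dt = 1.0 / fine_steps
    xf = np.ones(n)
    xc = np.ones(n)
    for _ in range(0, fine_steps, 2):
        dw1 = np.sqrt(dt) * rng.normal(size=n)
        dw2 = np.sqrt(dt) * rng.normal(size=n)
        xf += -xf * dt + dw1
        xf += -xf * dt + dw2
        xc += -xc * (2 * dt) + (dw1 + dw2)   # coarse step uses summed noise
    return xf, xc

# Level 0: many cheap samples at the coarse resolution (8 steps).
x0, _ = coupled_pair(40000, fine_steps=8)
level0 = x0.mean()

# Level 1 correction: fewer coupled samples of (16-step minus 8-step);
# the coupling is what keeps Var(fine - coarse) small, which MLMC exploits.
xf, xc = coupled_pair(4000, fine_steps=16)
correction = (xf - xc).mean()

mlmc_estimate = level0 + correction
print(mlmc_estimate)  # close to E[X(1)] = e^{-1} ≈ 0.37, up to discretization bias
```

    The ETPF's contribution, per the abstract, is using optimal transport to re-establish this coarse-fine coupling after resampling, where naive resampling would destroy it.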

  10. The Ensembl gene annotation system.

    PubMed

    Aken, Bronwen L; Ayling, Sarah; Barrell, Daniel; Clarke, Laura; Curwen, Valery; Fairley, Susan; Fernandez Banet, Julio; Billis, Konstantinos; García Girón, Carlos; Hourlier, Thibaut; Howe, Kevin; Kähäri, Andreas; Kokocinski, Felix; Martin, Fergal J; Murphy, Daniel N; Nag, Rishi; Ruffier, Magali; Schuster, Michael; Tang, Y Amy; Vogel, Jan-Hinnerk; White, Simon; Zadissa, Amonida; Flicek, Paul; Searle, Stephen M J

    2016-01-01

    The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe the annotation process in detail. Database URL: http://www.ensembl.org/index.html. PMID:27337980

  11. The Ensembl gene annotation system

    PubMed Central

    Aken, Bronwen L.; Ayling, Sarah; Barrell, Daniel; Clarke, Laura; Curwen, Valery; Fairley, Susan; Fernandez Banet, Julio; Billis, Konstantinos; García Girón, Carlos; Hourlier, Thibaut; Howe, Kevin; Kähäri, Andreas; Kokocinski, Felix; Martin, Fergal J.; Murphy, Daniel N.; Nag, Rishi; Ruffier, Magali; Schuster, Michael; Tang, Y. Amy; Vogel, Jan-Hinnerk; White, Simon; Zadissa, Amonida; Flicek, Paul

    2016-01-01

    The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe the annotation process in detail. Database URL: http://www.ensembl.org/index.html PMID:27337980

  12. Comparing climate change impacts on crops in Belgium based on CMIP3 and EU-ENSEMBLES multi-model ensembles

    NASA Astrophysics Data System (ADS)

    Vanuytrecht, E.; Raes, D.; Willems, P.; Semenov, M.

    2012-04-01

    Global Circulation Models (GCMs) are sophisticated tools to study the future evolution of the climate. Yet, the coarse scale of GCMs of hundreds of kilometers raises questions about the suitability for agricultural impact assessments. These assessments are often made at field level and require consideration of interactions at sub-GCM grid scale (e.g., elevation-dependent climatic changes). Regional climate models (RCMs) were developed to provide climate projections at a spatial scale of 25-50 km for limited regions, e.g. Europe (Giorgi and Mearns, 1991). Climate projections from GCMs or RCMs are available as multi-model ensembles. These ensembles are based on large data sets of simulations produced by modelling groups worldwide, who performed a set of coordinated climate experiments in which climate models were run for a common set of experiments and various emissions scenarios (Knutti et al., 2010). The use of multi-model ensembles in climate change studies is an important step in quantifying uncertainty in impact predictions, which will underpin more informed decisions for adaptation and mitigation to changing climate (Semenov and Stratonovitch, 2010). The objective of our study was to evaluate the effect of the spatial scale of climate projections on climate change impacts for cereals in Belgium. Climate scenarios were based on two multi-model ensembles, one comprising 15 GCMs of the Coupled Model Intercomparison Project phase 3 (CMIP3; Meehl et al., 2007) with spatial resolution of 200-300 km, the other comprising 9 RCMs of the EU-ENSEMBLES project (van der Linden and Mitchell, 2009) with spatial resolution of 25 km. To be useful for agricultural impact assessments, the projections of GCMs and RCMs were downscaled to the field level. Long series (240 cropping seasons) of local-scale climate scenarios were generated by the LARS-WG weather generator (Semenov et al., 2010) via statistical inference. Crop growth and development were simulated with the Aqua

  13. Trial encoding algorithms ensemble.

    PubMed

    Cheng, Lipin Bill; Yeh, Ren Jye

    2013-01-01

    This paper proposes trial algorithms for some basic components in cryptography and lossless bit compression. The symmetric encryption is accomplished by mixing up randomizations and scrambling, with hashing of the key playing an essential role. The digital signature is adapted from the Hill cipher, with the verification key matrices incorporating un-invertible parts to hide the signature matrix. The hash is a straight running summation (addition chain) of data bytes plus some randomization. One simplified version can serve as a burst-error-correcting code. The lossless bit compressor is Shannon-Fano coding, which is less optimal than the later Huffman and arithmetic coding but can be conveniently implemented without the use of a tree structure and improved with byte concatenation. PMID:27057475
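    Shannon-Fano coding can indeed be implemented without an explicit tree, by recursively splitting the frequency-sorted symbol list. The sketch below uses a classic textbook frequency table, not data from the paper:

```python
def shannon_fano(freqs):
    """Assign binary codes by recursively splitting symbols (sorted by
    frequency, descending) into two groups of near-equal total weight."""
    symbols = sorted(freqs, key=freqs.get, reverse=True)
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) < 2:
            return
        total = sum(freqs[s] for s in group)
        acc, cut = 0, 1
        for i, s in enumerate(group[:-1]):
            acc += freqs[s]
            if acc >= total / 2:     # first prefix reaching half the weight
                cut = i + 1
                break
        for s in group[:cut]:
            codes[s] += "0"
        for s in group[cut:]:
            codes[s] += "1"
        split(group[:cut])
        split(group[cut:])

    split(symbols)
    return codes

print(shannon_fano({"a": 15, "b": 7, "c": 6, "d": 6, "e": 5}))
```

    The result is a valid prefix code, though (as the abstract notes) not always optimal: unlike Huffman coding, the greedy top-down split can give a rarer symbol a shorter code than a more common one.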

  14. Independent component ensemble of EEG for brain-computer interface.

    PubMed

    Chuang, Chun-Hsiang; Ko, Li-Wei; Lin, Yuan-Pin; Jung, Tzyy-Ping; Lin, Chin-Teng

    2014-03-01

    Recently, successful applications of independent component analysis (ICA) to electroencephalographic (EEG) signals have yielded tremendous insights into brain processes that underlie human cognition. Many studies have further established the feasibility of using independent processes to elucidate human cognitive states. However, various technical problems arise in the building of an online brain-computer interface (BCI). These include the lack of an automatic procedure for selecting independent components of interest (ICi) and the potential risk of not obtaining a desired ICi. Therefore, this study proposes an ICi-ensemble method that uses multiple classifiers with ICA processing to improve upon existing algorithms. The mechanisms used in this ensemble system include: 1) automatic ICi selection; 2) extraction of features of the resultant ICi; 3) the construction of parallel pipelines for effectively training multiple classifiers; and 4) a simple process that combines the multiple decisions. The proposed ICi-ensemble is demonstrated in a typical BCI application, which is the monitoring of participants' cognitive states in a realistic sustained-attention driving task. The results reveal that the proposed ICi-ensemble outperformed the previous single-ICi method by ~7% (91.6% versus 84.3%) in cognitive state classification. Additionally, the proposed ICi-ensemble method, which characterizes the EEG dynamics of multiple brain areas, favors the application of BCI in natural environments. PMID:24608683

  15. Gradient Flow and Scale Setting on MILC HISQ Ensembles

    DOE PAGES Beta

    Bazavov, A.; Bernard, C.; Brown, N.; Komijani, J.; DeTar, C.; Foley, J.; Levkova, L.; Gottlieb, Steven; Heller, U. M.; Laiho, J.; et al

    2016-05-25

    We report on a scale determination with gradient-flow techniques on the Nf = 2+1+1 HISQ ensembles generated by the MILC collaboration. The ensembles include four lattice spacings, ranging from approximately 0.15 to 0.06 fm, and both physical and unphysical values of the quark masses. The scales √t0/a and w0/a and their tree-level improvements, √t0,imp and w0,imp, are computed on each ensemble using Symanzik flow and the cloverleaf definition of the energy density E. Using a combination of continuum chiral perturbation theory and a Taylor-series ansatz for the lattice-spacing and strong-coupling dependence, the results are simultaneously extrapolated to the continuum and interpolated to physical quark masses. We also determine the scales √t0 = 0.1416(+8/-5) fm and w0 = 0.1717(+12/-11) fm, where the errors are sums, in quadrature, of statistical and all systematic errors. The precisions of w0 and √t0 are comparable to or better than those of the best previous estimates. We also find the continuum mass dependence of w0, which will be useful for estimating the scales of other ensembles. Furthermore, we estimate the integrated autocorrelation length of the energy density. For long flow times, this autocorrelation length appears to be comparable to or smaller than that of the topological charge.

  16. Reduction of predictive uncertainty in estimating irrigation water requirement through multi-model ensembles and ensemble averaging

    NASA Astrophysics Data System (ADS)

    Multsch, S.; Exbrayat, J.-F.; Kirby, M.; Viney, N. R.; Frede, H.-G.; Breuer, L.

    2015-04-01

    Irrigation agriculture plays an increasingly important role in food supply. Many evapotranspiration models are used today to estimate the water demand for irrigation. They consider different stages of crop growth by empirical crop coefficients to adapt evapotranspiration throughout the vegetation period. We investigate the importance of the model structural versus model parametric uncertainty for irrigation simulations by considering six evapotranspiration models and five crop coefficient sets to estimate irrigation water requirements for growing wheat in the Murray-Darling Basin, Australia. The study is carried out using the spatial decision support system SPARE:WATER. We find that structural model uncertainty among reference ET is far more important than model parametric uncertainty introduced by crop coefficients. These crop coefficients are used to estimate irrigation water requirement following the single crop coefficient approach. Using the reliability ensemble averaging (REA) technique, we are able to reduce the overall predictive model uncertainty by more than 10%. The exceedance probability curve of irrigation water requirements shows that a certain threshold, e.g. an irrigation water limit due to water right of 400 mm, would be less frequently exceeded in case of the REA ensemble average (45%) in comparison to the equally weighted ensemble average (66%). We conclude that multi-model ensemble predictions and sophisticated model averaging techniques are helpful in predicting irrigation demand and provide relevant information for decision making.
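    The comparison above between an equally weighted ensemble average and a reliability-weighted (REA-style) average, and the exceedance frequency of a 400 mm water-right limit, can be sketched as follows. The irrigation requirements and weights are randomly generated for illustration, not derived from REA skill criteria or the study's data:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical irrigation water requirement (mm): 30 model/parameter
# combinations over 50 growing seasons.
iwr = rng.normal(loc=380, scale=60, size=(30, 50))

# Illustrative reliability weights (stand-ins for REA-derived weights);
# uniform weights would recover the equally weighted ensemble average.
weights = rng.dirichlet(np.ones(30))

equal_avg = iwr.mean(axis=0)   # equally weighted ensemble average per season
rea_avg = weights @ iwr        # reliability-weighted average per season

# Frequency with which the 400 mm water-right limit would be exceeded.
print("equal-weight exceedance:", (equal_avg > 400).mean())
print("weighted exceedance   :", (rea_avg > 400).mean())
```

    The exceedance probability is simply the fraction of seasons above the threshold; the abstract's finding is that the REA-weighted average crosses the limit less often (45% vs. 66%) than the equally weighted one.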

  17. Reduction of predictive uncertainty in estimating irrigation water requirement through multi-model ensembles and ensemble averaging

    NASA Astrophysics Data System (ADS)

    Multsch, S.; Exbrayat, J.-F.; Kirby, M.; Viney, N. R.; Frede, H.-G.; Breuer, L.

    2014-11-01

    Irrigation agriculture plays an increasingly important role in food supply. Many evapotranspiration models are used today to estimate the water demand for irrigation. They consider different stages of crop growth by empirical crop coefficients to adapt evapotranspiration throughout the vegetation period. We investigate the importance of the model structural vs. model parametric uncertainty for irrigation simulations by considering six evapotranspiration models and five crop coefficient sets to estimate irrigation water requirements for growing wheat in the Murray-Darling Basin, Australia. The study is carried out using the spatial decision support system SPARE:WATER. We find that structural model uncertainty is far more important than model parametric uncertainty to estimate irrigation water requirement. Using the Reliability Ensemble Averaging (REA) technique, we are able to reduce the overall predictive model uncertainty by more than 10%. The exceedance probability curve of irrigation water requirements shows that a certain threshold, e.g. an irrigation water limit due to water right of 400 mm, would be less frequently exceeded in case of the REA ensemble average (45%) in comparison to the equally weighted ensemble average (66%). We conclude that multi-model ensemble predictions and sophisticated model averaging techniques are helpful in predicting irrigation demand and provide relevant information for decision making.

  18. CME Ensemble Forecasting - A Primer

    NASA Astrophysics Data System (ADS)

    Pizzo, V. J.; de Koning, C. A.; Cash, M. D.; Millward, G. H.; Biesecker, D. A.; Codrescu, M.; Puga, L.; Odstrcil, D.

    2014-12-01

    SWPC has been evaluating various approaches for ensemble forecasting of Earth-directed CMEs. We have developed the software infrastructure needed to support broad-ranging CME ensemble modeling, including composing, interpreting, and making intelligent use of ensemble simulations. The first step is to determine whether the physics of the interplanetary propagation of CMEs is better described as chaotic (like terrestrial weather) or deterministic (as in tsunami propagation). This is important, since different ensemble strategies are to be pursued under the two scenarios. We present the findings of a comprehensive study of CME ensembles in uniform and structured backgrounds that reveals systematic relationships between input cone parameters and ambient flow states and resulting transit times and velocity/density amplitudes at Earth. These results clearly indicate that the propagation of single CMEs to 1 AU is a deterministic process. Thus, the accuracy with which one can forecast the gross properties (such as arrival time) of CMEs at 1 AU is determined primarily by the accuracy of the inputs. This is no tautology - it means specifically that efforts to improve forecast accuracy should focus upon obtaining better inputs, as opposed to developing better propagation models. In a companion paper (deKoning et al., this conference), we compare in situ solar wind data with forecast events in the SWPC operational archive to show how the qualitative and quantitative findings presented here are entirely consistent with the observations and may lead to improved forecasts of arrival time at Earth.

  19. Ensemble algorithms in reinforcement learning.

    PubMed

    Wiering, Marco A; van Hasselt, Hado

    2008-08-01

    This paper describes several ensemble methods that combine multiple different reinforcement learning (RL) algorithms in a single agent. The aim is to enhance learning speed and final performance by combining the chosen actions or action probabilities of different RL algorithms. We designed and implemented four different ensemble methods combining the following five different RL algorithms: Q-learning, Sarsa, actor-critic (AC), QV-learning, and AC learning automaton. The intuitively designed ensemble methods, namely, majority voting (MV), rank voting, Boltzmann multiplication (BM), and Boltzmann addition, combine the policies derived from the value functions of the different RL algorithms, in contrast to previous work where ensemble methods have been used in RL for representing and learning a single value function. We show experiments on five maze problems of varying complexity; the first problem is simple, but the other four maze tasks are of a dynamic or partially observable nature. The results indicate that the BM and MV ensembles significantly outperform the single RL algorithms. PMID:18632380
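    Two of the combination rules named above, majority voting (MV) and Boltzmann multiplication (BM), can be sketched for a single state with three actions. The per-algorithm preference values and the temperature are invented for illustration:

```python
import numpy as np

prefs = np.array([          # rows: 3 RL algorithms, cols: action preferences in one state
    [1.0, 0.5, 0.2],
    [0.2, 0.9, 0.1],
    [0.8, 0.7, 0.3],
])

# Majority voting: each algorithm votes for its greedy action.
votes = np.bincount(prefs.argmax(axis=1), minlength=prefs.shape[1])
mv_action = int(votes.argmax())

# Boltzmann multiplication: multiply the per-algorithm Boltzmann (softmax) policies.
tau = 1.0
boltz = np.exp(prefs / tau)
boltz /= boltz.sum(axis=1, keepdims=True)   # each row is now a softmax policy
bm_policy = boltz.prod(axis=0)
bm_policy /= bm_policy.sum()
bm_action = int(bm_policy.argmax())
print(mv_action, bm_action, bm_policy.round(3))
```

Note that the two rules can disagree: MV only counts greedy choices, while BM rewards actions that every algorithm considers reasonably good.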

  20. Macroscale intraspecific variation and environmental heterogeneity: analysis of cold and warm zone abundance, mortality, and regeneration distributions of four eastern US tree species.

    PubMed

    Prasad, Anantha M

    2015-11-01

    I test for macroscale intraspecific variation of abundance, mortality, and regeneration of four eastern US tree species (Tsuga canadensis, Betula lenta, Liriodendron tulipifera, and Quercus prinus) by splitting them into three climatic zones based on plant hardiness zones (PHZs). The primary goals of the analysis are to assess the differences in environmental heterogeneity and demographic responses among climatic zones, map regional species groups based on decision tree rules, and evaluate univariate and multivariate patterns of species demography with respect to environmental variables. I use the Forest Inventory Analysis (FIA) data to derive abundance, mortality, and regeneration indices and split the range into three climatic zones based on USDA PHZs: (1) cold adapted, leading region; (2) middle, well-adapted region; and (3) warm adapted, trailing region. I employ decision tree ensemble methods to assess the importance of environmental predictors on the abundance of the species between the cold and warm zones and map zonal variations in species groups. Multivariate regression trees are used to simultaneously explore abundance, mortality, and regeneration in tandem to assess species vulnerability. Analyses point to the relative importance of climate in the warm adapted, trailing zone (especially moisture) compared to the cold adapted, leading zone. Higher mortality and lower regeneration patterns in the warm trailing zone point to its vulnerability to growing season temperature and precipitation changes that could figure more prominently in the future. This study highlights the need to account for intraspecific variation of demography in order to understand environmental heterogeneity and differential adaptation. It provides a methodology for assessing the vulnerability of tree species by delineating climatic zones based on easily available PHZ data, and FIA derived abundance, mortality, and regeneration indices as a proxy for overall growth and fitness.
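    The kind of decision-tree-ensemble importance analysis the abstract describes can be sketched with a random forest on synthetic data. The predictor names and the dependence of abundance on moisture are hypothetical stand-ins; the FIA-derived indices themselves are not reproduced here:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 400
moisture = rng.uniform(0, 1, n)      # hypothetical environmental predictors
temperature = rng.uniform(0, 1, n)
soil = rng.uniform(0, 1, n)
# synthetic abundance index driven mostly by moisture, as in a "warm zone"
abundance = 3.0 * moisture + 0.5 * temperature + rng.normal(0, 0.1, n)

X = np.column_stack([moisture, temperature, soil])
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, abundance)
importances = rf.feature_importances_
print(dict(zip(["moisture", "temperature", "soil"], importances.round(3))))
```

The ensemble's impurity-based importances recover the dominance of the moisture variable, which is the pattern the study reports for the warm adapted, trailing zone.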

  1. Advancing monthly streamflow prediction accuracy of CART models using ensemble learning paradigms

    NASA Astrophysics Data System (ADS)

    Erdal, Halil Ibrahim; Karakurt, Onur

    2013-01-01

    Streamflow forecasting is one of the most important steps in water resources planning and management. Ensemble techniques such as bagging, boosting and stacking have gained popularity in hydrological forecasting in recent years. The study investigates the potential of two ensemble learning paradigms (bagging and stochastic gradient boosting) for building classification and regression tree (CART) ensembles that advance streamflow prediction accuracy. The study initially investigates the use of classification and regression trees for monthly streamflow forecasting, employing a support vector regression (SVR) model as the benchmark. The analytic results indicate that CART outperforms SVR in both the training and testing phases. Although the results of the CART model in the training phase are considerable, its testing-phase performance is not. Thus, to optimize the prediction accuracy of CART for monthly streamflow forecasting, we incorporate bagging and stochastic gradient boosting, which are rooted in the same philosophy of advancing the prediction accuracy of weak learners. The results show that the bagged regression tree (BRT) and stochastic gradient boosted regression tree (GBRT) models achieve more satisfactory monthly streamflow forecasting performance than the CART and SVR models. Overall, it is found that ensemble learning paradigms can remarkably advance the prediction accuracy of CART models in monthly streamflow forecasting.
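    The comparison the abstract describes, a single CART against bagged and stochastic gradient boosted tree ensembles, can be sketched with scikit-learn on synthetic data (not the study's streamflow series):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, (500, 3))                       # synthetic predictors
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.3, 500)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

models = {
    "CART": DecisionTreeRegressor(random_state=0),
    "bagged": BaggingRegressor(DecisionTreeRegressor(), n_estimators=100,
                               random_state=0),
    # subsample < 1 is what makes the boosting "stochastic"
    "boosted": GradientBoostingRegressor(subsample=0.5, random_state=0),
}
mses = {name: mean_squared_error(yte, m.fit(Xtr, ytr).predict(Xte))
        for name, m in models.items()}
print(mses)
```

On noisy data a single full-depth tree overfits, so both ensembles typically reduce the test error, which is the qualitative result the paper reports.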

  2. Doubly robust survival trees.

    PubMed

    Steingrimsson, Jon Arni; Diao, Liqun; Molinaro, Annette M; Strawderman, Robert L

    2016-09-10

    Estimating a patient's mortality risk is important in making treatment decisions. Survival trees are a useful tool and employ recursive partitioning to separate patients into different risk groups. Existing 'loss based' recursive partitioning procedures that would be used in the absence of censoring have previously been extended to the setting of right censored outcomes using inverse probability censoring weighted estimators of loss functions. In this paper, we propose new 'doubly robust' extensions of these loss estimators motivated by semiparametric efficiency theory for missing data that better utilize available data. Simulations and a data analysis demonstrate strong performance of the doubly robust survival trees compared with previously used methods. Copyright © 2016 John Wiley & Sons, Ltd. PMID:27037609
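    The inverse probability censoring weighting (IPCW) that the doubly robust estimators improve upon can be sketched on toy survival data: an uncensored subject observed at time T receives weight 1/G(T-), where G is the Kaplan-Meier estimate of the censoring survival function, and censored subjects receive weight 0. The data below are invented:

```python
import numpy as np

T = np.array([2.0, 3.0, 3.0, 5.0, 7.0, 8.0])   # observed times
delta = np.array([1, 0, 1, 1, 0, 1])           # 1 = event, 0 = censored

def km_censoring_survival(T, delta, t):
    """Kaplan-Meier estimate of P(censoring time > t); censorings are the 'events'."""
    surv = 1.0
    for u in np.unique(T[delta == 0]):         # sorted censoring times
        if u > t:
            break
        at_risk = np.sum(T >= u)
        d = np.sum((T == u) & (delta == 0))
        surv *= 1.0 - d / at_risk
    return surv

# evaluate G just before the observed time, so a subject's own censoring doesn't count
weights = np.array([d / km_censoring_survival(T, delta, t - 1e-8)
                    for t, d in zip(T, delta)])
print(weights)
```

Later events get inflated weights to compensate for censored subjects dropping out; the doubly robust estimators in the paper augment these weights with a model for the conditional event distribution.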

  3. Application Bayesian Model Averaging method for ensemble system for Poland

    NASA Astrophysics Data System (ADS)

    Guzikowski, Jakub; Czerwinska, Agnieszka

    2014-05-01

    The aim of the project is to evaluate methods for generating numerical ensemble weather predictions using meteorological data from the Weather Research & Forecasting (WRF) model and calibrating these data by means of the Bayesian Model Averaging (WRF BMA) approach. We construct high-resolution short-range ensemble forecasts using temperature data generated by nine WRF model configurations. The WRF models have 35 vertical levels and a 2.5 km x 2.5 km horizontal resolution. The main emphasis is that the ensemble members use different parameterizations of the physical phenomena occurring in the boundary layer. To calibrate the ensemble forecast we use the Bayesian Model Averaging (BMA) approach. The BMA predictive probability density function (PDF) is a weighted average of the predictive PDFs associated with each individual ensemble member, with weights that reflect the member's relative skill. As a test we chose a case with a heat wave and convective weather conditions over Poland from 23 July to 1 August 2013. From 23 July to 29 July 2013, temperature oscillated around 30 degrees Celsius at many meteorological stations and new temperature records were set. During this time an increase in patients hospitalized with cardiovascular problems was registered. On 29 July 2013 an advection of moist tropical air masses was recorded over Poland, causing a strong convection event with a mesoscale convective system (MCS). The MCS caused local flooding, damage to transport infrastructure, destroyed buildings and trees, and posed a direct threat to life. A comparison of the meteorological data from the ensemble system with the data recorded at 74 weather stations located in Poland is made. We prepare a set of model-observation pairs. The data obtained from the single ensemble members and the median from the WRF BMA system are then evaluated on the basis of the deterministic statistical errors root mean square error (RMSE) and mean absolute error (MAE).
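    The BMA predictive PDF described above, a weighted average of member PDFs, can be sketched with Gaussian member distributions. The forecasts, weights, and spread below are invented; in practice the weights and spread are estimated from training pairs, typically by EM:

```python
import numpy as np
from math import erf, sqrt

members = np.array([29.5, 30.2, 31.0])   # member temperature forecasts, deg C (invented)
weights = np.array([0.5, 0.3, 0.2])      # BMA weights, reflecting relative skill
sigma = 0.8                              # common member spread (assumed)

def norm_cdf(x, mu, s):
    # normal CDF via the error function
    return 0.5 * (1.0 + erf((x - mu) / (s * sqrt(2.0))))

def bma_cdf(x):
    # BMA predictive CDF: weighted average of the member CDFs
    return float(np.sum(weights * np.array([norm_cdf(x, m, sigma) for m in members])))

p_exceed = 1.0 - bma_cdf(30.0)   # probability of exceeding 30 deg C
print(p_exceed)
```

The mixture turns a set of point forecasts into a full predictive distribution, from which threshold probabilities like the one above can be read off directly.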

  4. Greenhouse trees

    SciTech Connect

    Hanover, J.W.; Hart, J.W.

    1980-05-09

    Michigan State University has been conducting research on growth control of woody plants with emphasis on commercial plantations. The objective was to develop the optimum levels for the major factors that affect tree seedling growth and development so that high quality plants can be produced for a specific use. This article describes the accelerated-optimal-growth (AOG) concept, describes precautions to take in its application, and shows ways to maximize the potential of AOG for producing ornamental trees. Factors considered were container growing system; protective culture including light, temperature, mineral nutrients, water, carbon dioxide, growth regulators, mycorrhizae, growing media, competition, and pests; size of seedlings; and acclimation. 1 table. (DP)

  5. Modeling of stage-discharge relationship for Gharraf River, southern Iraq using backpropagation artificial neural networks, M5 decision trees, and Takagi-Sugeno inference system technique: a comparative study

    NASA Astrophysics Data System (ADS)

    Al-Abadi, Alaa M.

    2014-12-01

    The potential of three different data-driven techniques, namely multilayer perceptron backpropagation artificial neural networks (MLP), the M5 decision tree model, and the Takagi-Sugeno (TS) inference system, for mimicking the stage-discharge relationship of the Gharraf River system, southern Iraq, is investigated and discussed in this study. The study used the available stage and discharge data to predict discharge using different combinations of stage, antecedent stages, and antecedent discharge values. The models' results were compared using the root mean squared error (RMSE) and coefficient of determination (R2) error statistics. The results of the comparison in the testing stage reveal that the M5 and Takagi-Sugeno techniques have certain advantages over the multilayer perceptron artificial neural network for modeling the stage-discharge relationship. Although the performance of the TS inference system was very close to that of the M5 model in terms of R2, the M5 method has the lowest RMSE (8.10 m3/s). The study implies that both the M5 and TS inference systems are promising tools for identifying the stage-discharge relationship in the study area.

  6. Audubon Tree Study Program.

    ERIC Educational Resources Information Center

    National Audubon Society, New York, NY.

    Included are an illustrated student reader, "The Story of Trees," a leaders' guide, and a large tree chart with 37 colored pictures. The student reader reviews several aspects of trees: a definition of a tree; where and how trees grow; flowers, pollination and seed production; how trees make their food; how to recognize trees; seasonal changes;…

  7. Visualizing phylogenetic trees using TreeView.

    PubMed

    Page, Roderic D M

    2002-08-01

    TreeView provides a simple way to view the phylogenetic trees produced by a range of programs, such as PAUP*, PHYLIP, TREE-PUZZLE, and ClustalX. While some phylogenetic programs (such as the Macintosh version of PAUP*) have excellent tree printing facilities, many programs do not have the ability to generate publication quality trees. TreeView addresses this need. The program can read and write a range of tree file formats, display trees in a variety of styles, print trees, and save the tree as a graphic file. Protocols in this unit cover both displaying and printing a tree. Support protocols describe how to download and install TreeView, and how to display bootstrap values in trees generated by ClustalX and PAUP*. PMID:18792942

  8. Using ensembles in water management: forecasting dry and wet episodes

    NASA Astrophysics Data System (ADS)

    van het Schip-Haverkamp, Tessa; van den Berg, Wim; van de Beek, Remco

    2015-04-01

    Extreme weather situations such as droughts and extensive precipitation are becoming more frequent, which makes accurate short- and long-term weather forecasts increasingly important. Ensembles can provide a solution in terms of scenario forecasts. MeteoGroup uses ensembles in a new forecasting technique which presents a number of weather scenarios for a dynamical water management project, called Water-Rijk, in which water storage and water retention play a large role. The Water-Rijk is part of Park Lingezegen, which is located between Arnhem and Nijmegen in the Netherlands. In collaboration with the University of Wageningen, Alterra and Eijkelkamp, a forecasting system is developed for this area which can provide water boards with a number of weather and hydrology scenarios in order to assist in the decision whether or not water retention or water storage is necessary in the near future. In order to forecast drought and extensive precipitation, the difference 'precipitation minus evaporation' is used as a measure of drought in the weather forecasts. In case of an upcoming drought this difference takes larger negative values; in case of a wet episode it is positive. The Makkink potential evaporation is used, which gives the most accurate potential evaporation values during the summer, when evaporation plays an important role in the availability of surface water. Scenarios are determined by reducing the large number of forecasts in the ensemble to a number of averaged members, each with its own likelihood of occurrence. For the Water-Rijk project five scenario forecasts are calculated: extreme dry, dry, normal, wet and extreme wet. These scenarios are constructed for two forecasting periods, each using its own ensemble technique: up to 48 hours ahead and up to 15 days ahead. The 48-hour forecast uses an ensemble constructed from forecasts of multiple high-resolution regional models: UKMO's Euro4 model, the ECMWF model, WRF and
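    The scenario-reduction step, collapsing the ensemble into five weighted scenarios, might be sketched as a quantile grouping of the members' cumulative 'precipitation minus evaporation' totals. The member values and quantile boundaries below are assumptions for illustration, not the operational method:

```python
import numpy as np

rng = np.random.default_rng(1)
p_minus_e = rng.normal(-5.0, 12.0, 51)    # 51 members: cumulative P - E, mm (invented)

labels = ["extreme dry", "dry", "normal", "wet", "extreme wet"]
bins = np.quantile(p_minus_e, [0.1, 0.3, 0.7, 0.9])   # assumed scenario boundaries
idx = np.digitize(p_minus_e, bins)                    # assign each member to a scenario
scenarios = {lab: {"mean": p_minus_e[idx == k].mean(),
                   "likelihood": float(np.mean(idx == k))}
             for k, lab in enumerate(labels)}
print(scenarios)
```

Each scenario is an averaged member with a likelihood of occurrence, and the likelihoods sum to one, which is the form a water board can act on.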

  9. The ensemble nature of allostery

    PubMed Central

    Motlagh, Hesam N.; Wrabl, James O.; Li, Jing; Hilser, Vincent J.

    2014-01-01

    Allostery is the process by which biological macromolecules (mostly proteins) transmit the effect of binding at one site to another, often distal, functional site, allowing for regulation of activity. Recent experimental observations demonstrating that allostery can be facilitated by dynamic and intrinsically disordered proteins have resulted in a new paradigm for understanding allosteric mechanisms, which focuses on the conformational ensemble and the statistical nature of the interactions responsible for the transmission of information. Analysis of allosteric ensembles reveals a rich spectrum of regulatory strategies, as well as a framework to unify the description of allosteric mechanisms from different systems. PMID:24740064

  10. The Ensembl Variant Effect Predictor.

    PubMed

    McLaren, William; Gil, Laurent; Hunt, Sarah E; Riat, Harpreet Singh; Ritchie, Graham R S; Thormann, Anja; Flicek, Paul; Cunningham, Fiona

    2016-01-01

    The Ensembl Variant Effect Predictor is a powerful toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It provides access to an extensive collection of genomic annotation, with a variety of interfaces to suit different requirements, and simple options for configuring and extending analysis. It is open source, free to use, and supports full reproducibility of results. The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs. PMID:27268795

  11. BgN-Score and BsN-Score: Bagging and boosting based ensemble neural networks scoring functions for accurate binding affinity prediction of protein-ligand complexes

    PubMed Central

    2015-01-01

    Background Accurately predicting the binding affinities of large sets of protein-ligand complexes is a key challenge in computational biomolecular science, with applications in drug discovery, chemical biology, and structural biology. Since a scoring function (SF) is used to score, rank, and identify drug leads, the fidelity with which it predicts the affinity of a ligand candidate for a protein's binding site has a significant bearing on the accuracy of virtual screening. Despite intense efforts in developing conventional SFs, which are either force-field based, knowledge-based, or empirical, their limited predictive power has been a major roadblock toward cost-effective drug discovery. Therefore, in this work, we present novel SFs employing a large ensemble of neural networks (NN) in conjunction with a diverse set of physicochemical and geometrical features characterizing protein-ligand complexes to predict binding affinity. Results We assess the scoring accuracies of two new ensemble NN SFs based on bagging (BgN-Score) and boosting (BsN-Score), as well as those of conventional SFs in the context of the 2007 PDBbind benchmark that encompasses a diverse set of high-quality protein families. We find that BgN-Score and BsN-Score have more than 25% better Pearson's correlation coefficient (0.804 and 0.816 vs. 0.644) between predicted and measured binding affinities compared to that achieved by a state-of-the-art conventional SF. In addition, these ensemble NN SFs are also at least 19% more accurate (0.804 and 0.816 vs. 0.675) than SFs based on a single neural network that has been traditionally used in drug discovery applications. We further find that ensemble models based on NNs surpass SFs based on the decision-tree ensemble technique Random Forests. Conclusions Ensemble neural networks SFs, BgN-Score and BsN-Score, are the most accurate in predicting binding affinity of protein-ligand complexes among the considered SFs. 
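    The bagging-based ensemble idea behind BgN-Score can be sketched with scikit-learn neural networks on synthetic features. The data, network size, and ensemble size are invented and much smaller than in the paper; Pearson's r is the accuracy measure the authors report:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))                          # stand-ins for complex features
y = X @ rng.normal(size=8) + rng.normal(0, 0.5, 300)   # stand-in binding affinity
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

def fit_member(seed):
    # bootstrap sample + an independently seeded network, as in bagging
    idx = np.random.default_rng(seed).integers(0, len(Xtr), len(Xtr))
    net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=seed)
    return net.fit(Xtr[idx], ytr[idx])

members = [fit_member(s) for s in range(10)]
ens_pred = np.mean([m.predict(Xte) for m in members], axis=0)
r_ens = np.corrcoef(ens_pred, yte)[0, 1]   # Pearson's r between predicted and "measured"
print(r_ens)
```

Averaging the bootstrap-trained networks reduces the variance of any single network, which is the mechanism the paper credits for the gain over single-NN scoring functions.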

  12. Numerical weather prediction model tuning via ensemble prediction system

    NASA Astrophysics Data System (ADS)

    Jarvinen, H.; Laine, M.; Ollinaho, P.; Solonen, A.; Haario, H.

    2011-12-01

    This paper discusses a novel approach to tuning the predictive skill of numerical weather prediction (NWP) models. NWP models contain tunable parameters which appear in parameterization schemes of sub-grid scale physical processes. Currently, numerical values of these parameters are specified manually. In a recent dual manuscript (QJRMS, revised) we developed a new concept and method for on-line estimation of the NWP model parameters. The EPPES ("Ensemble prediction and parameter estimation system") method requires only minimal changes to the existing operational ensemble prediction infrastructure and seems very cost-effective because practically no new computations are introduced. The approach provides an algorithmic decision-making tool for model parameter optimization in operational NWP. In EPPES, statistical inference about the NWP model tunable parameters is made by (i) generating each member of the ensemble of predictions using different model parameter values, drawn from a proposal distribution, and (ii) feeding back the relative merits of the parameter values to the proposal distribution, based on evaluation of a suitable likelihood function against verifying observations. In the presentation, the method is first illustrated in low-order numerical tests using a stochastic version of the Lorenz-95 model, which effectively emulates the principal features of ensemble prediction systems. The EPPES method correctly detects the unknown and wrongly specified parameter values, and leads to improved forecast skill. Second, results with an ensemble prediction system based on an atmospheric general circulation model show that the NWP model tuning capacity of EPPES scales up to realistic models and ensemble prediction systems. Finally, a global top-end NWP model tuning exercise with preliminary results is presented.
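    The loop described in (i)-(ii) can be caricatured with a scalar toy "model". The importance-weighted proposal update below is only an illustration of the idea, not the published EPPES algorithm, and all numbers are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
true_param = 1.8                      # the value the toy "model" should recover

def forecast_error(theta):
    # stand-in for verifying a forecast made with parameter theta against observations
    return (theta - true_param) ** 2 + rng.normal(0.0, 0.01)

mu, sigma = 0.0, 2.0                  # initial proposal distribution
for cycle in range(30):               # one cycle per forecast window
    thetas = rng.normal(mu, sigma, 30)            # 30-member ensemble, one draw each
    errs = np.array([forecast_error(t) for t in thetas])
    w = np.exp(-errs / errs.mean())               # likelihood-like relative merits
    w /= w.sum()
    mu = float(np.sum(w * thetas))                # feed merits back into the proposal
    sigma = max(0.1, float(np.sqrt(np.sum(w * (thetas - mu) ** 2))))
print(mu, sigma)
```

The proposal mean drifts toward the well-verifying parameter value and the proposal spread contracts, which is the qualitative behavior EPPES exhibits in the Lorenz-95 tests.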

  13. The assisted prediction modelling frame with hybridisation and ensemble for business risk forecasting and an implementation

    NASA Astrophysics Data System (ADS)

    Li, Hui; Hong, Lu-Yao; Zhou, Qing; Yu, Hai-Jie

    2015-08-01

    The business failure of numerous companies results in financial crises. The high social costs associated with such crises have led people to search for effective tools for business risk prediction, among which the support vector machine is very effective. Several modelling means, including single-technique modelling, hybrid modelling, and ensemble modelling, have been suggested for forecasting business risk with support vector machines. However, the existing literature seldom focuses on a general modelling frame for business risk prediction, and seldom investigates performance differences among different modelling means. We reviewed research on forecasting business risk with support vector machines, proposed the general assisted prediction modelling frame with hybridisation and ensemble (APMF-WHAE), and finally investigated the use of principal components analysis, support vector machines, random sampling, and group decision under the general frame in forecasting business risk. Under the APMF-WHAE frame with the support vector machine as the base predictive model, four specific predictive models were produced, namely, a pure support vector machine, a hybrid support vector machine involving principal components analysis, a support vector machine ensemble involving random sampling and group decision, and an ensemble of hybrid support vector machines using group decision to integrate various hybrid support vector machines on variables produced from principal components analysis and samples from random sampling. The experimental results indicate that the hybrid support vector machine and the ensemble of hybrid support vector machines produced better performance than the pure support vector machine and the support vector machine ensemble.
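    Two of the four variants, the pure SVM and the hybrid PCA + SVM, can be sketched with scikit-learn on a synthetic dataset standing in for the business-risk data (feature counts and sample size are invented):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# synthetic stand-in for financial-ratio data: 30 correlated predictors, 2 classes
X, y = make_classification(n_samples=400, n_features=30, n_informative=5,
                           n_redundant=20, random_state=0)

pure_svm = SVC()                                     # pure SVM variant
hybrid = make_pipeline(PCA(n_components=5), SVC())   # hybrid: PCA reduction + SVM

acc_pure = cross_val_score(pure_svm, X, y, cv=5).mean()
acc_hybrid = cross_val_score(hybrid, X, y, cv=5).mean()
print(acc_pure, acc_hybrid)
```

The ensemble variants in the paper would additionally train several such models on random samples and combine their outputs by group decision (e.g. majority vote).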

  14. An Effective and Novel Neural Network Ensemble for Shift Pattern Detection in Control Charts

    PubMed Central

    Barghash, Mahmoud

    2015-01-01

    Pattern recognition in control charts is critical to striking a balance between discovering faults as early as possible and reducing the number of false alarms. This work is devoted to designing a multistage neural network ensemble that achieves this balance, reducing rework and scrap without reducing productivity. The ensemble under focus is composed of a series of neural network stages and a series of decision points. Initially, this work compared using multiple decision points versus a single decision point on the performance of the ANN, which showed that multiple decision points are highly preferable to a single decision point. This work also tested the effect of population percentages on the ANN and used this to optimize the ANN's performance. It also used optimized and non-optimized ANNs in an ensemble and showed that using non-optimized ANNs may reduce the performance of the ensemble. The ensemble that used only optimized ANNs improved performance over individual ANNs and the three-sigma rule. In that respect, using the designed ensemble can help reduce the number of false stops and increase productivity. It can also be used to discover even small shifts in the mean as early as possible. PMID:26339235
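    The three-sigma baseline that the ensemble is compared against can be illustrated on a synthetic control chart with a small induced mean shift (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(0)
in_control = rng.normal(10.0, 1.0, 200)     # in-control process: mean 10, sigma 1
shifted = rng.normal(11.5, 1.0, 200)        # 1.5-sigma mean shift

mu, sigma = in_control.mean(), in_control.std()
ucl, lcl = mu + 3 * sigma, mu - 3 * sigma   # three-sigma control limits

false_alarm_rate = np.mean((in_control > ucl) | (in_control < lcl))
detection_rate = np.mean((shifted > ucl) | (shifted < lcl))
print(false_alarm_rate, detection_rate)
```

The rule rarely false-alarms but also misses most points from the shifted process, which is why pattern-recognition approaches such as the paper's ensemble are attractive for detecting small shifts early.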

  15. Contour Boxplots: A Method for Characterizing Uncertainty in Feature Sets from Simulation Ensembles

    NASA Astrophysics Data System (ADS)

    Whitaker, Ross; Mirzargar, Mahsa; Kirby, Robert

    2014-05-01

    Researchers, analysts and decision makers are interested not only in understanding their data but also in understanding the uncertainty present in the data. With the increase in the complexity and dimensionality of data, visualization has become an integral and essential part of data analysis, and uncertainty visualization techniques are specifically designed to facilitate the communication of uncertain information. Among the various uncertainty visualization techniques, ensemble visualization is of great interest in applications, as modeling is often unable to capture the true behavior of the phenomenon under study; hence, ensembles are used to convey the uncertainty of the model output. Deriving robust statistical information and visualizing the variability present in an ensemble is a challenging task, especially if the quantities of interest are features of the data such as isocontours. The contour boxplot, a generalization of the conventional univariate boxplot, was proposed as an ensemble visualization scheme to study the variability between ensemble members of isocontours while preserving the main features shared among the members. The contour boxplot provides descriptive information about the ensemble based on order statistics of the members, such as the most representative ensemble member (the median) and potential outliers. The non-parametric nature and robustness of order statistics make the contour boxplot an advantageous approach for presenting and studying uncertainty among an ensemble of contours in applications ranging from weather forecasting to geoscience.
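    The order statistic underlying the contour boxplot can be sketched in one dimension with (modified) band depth: the deepest member plays the role of the median, and shallow members are outlier candidates. The ensemble of curves below is a synthetic stand-in for isocontours:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 50)
members = np.array([np.sin(2 * np.pi * x) + rng.normal(0, 0.2)   # shifted copies
                    for _ in range(9)])
members[0] += 2.0                                                # make one member an outlier

def modified_band_depth(fns):
    n = len(fns)
    depth = np.zeros(n)
    for i in range(n):
        for j in range(n):
            for k in range(j + 1, n):
                lo = np.minimum(fns[j], fns[k])
                hi = np.maximum(fns[j], fns[k])
                # fraction of the domain where member i lies inside the band of (j, k)
                depth[i] += np.mean((fns[i] >= lo) & (fns[i] <= hi))
    return depth / (n * (n - 1) / 2)

d = modified_band_depth(members)
median_member = int(d.argmax())
print(d.round(2), median_member)
```

The outlying member receives the lowest depth while a central member receives the highest, which is exactly the ordering the contour boxplot visualizes (median, 50% band, fences, outliers).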

  16. An evaluation of the Canadian global meteorological ensemble prediction system for short-term hydrological forecasting

    NASA Astrophysics Data System (ADS)

    Velázquez, J. A.; Petit, T.; Lavoie, A.; Boucher, M.-A.; Turcotte, R.; Fortin, V.; Anctil, F.

    2009-11-01

    Hydrological forecasting consists of the assessment of future streamflow. Current deterministic forecasts do not give any information concerning their uncertainty, which might be limiting in a decision-making process. Ensemble forecasts are expected to fill this gap. In July 2007, the Meteorological Service of Canada improved its ensemble prediction system, which has been operational since 1998. It uses the GEM model to generate a 20-member ensemble on a 100 km grid at mid-latitudes. This improved system is used for the first time for hydrological ensemble predictions. Five watersheds in Quebec (Canada) are studied: Chaudière, Châteauguay, Du Nord, Kénogami and Du Lièvre. An interesting 17-day rainfall event in October 2007 has been selected. Forecasts are produced at a 3 h time step for a 3-day forecast horizon. The deterministic forecast is also available and is compared with the ensemble ones. In order to correct the bias of the ensemble, an updating procedure has been applied to the output data. Results showed that ensemble forecasts are more skilful than the deterministic ones, as measured by the Continuous Ranked Probability Score (CRPS), especially for 72 h forecasts. However, the hydrological ensemble forecasts are under-dispersed: a situation that improves with increasing length of the prediction horizon. We conjecture that this is due in part to the fact that uncertainty in the initial conditions of the hydrological model is not taken into account.
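    The CRPS used in the evaluation can be computed for an empirical ensemble via the identity CRPS = E|X - y| - 0.5 E|X - X'|. The member values and observation below are invented; the degenerate one-member case shows how the CRPS reduces to absolute error for a deterministic forecast, which is what makes the ensemble-versus-deterministic comparison fair:

```python
import numpy as np

ensemble = np.array([12., 15., 14., 13., 18., 16., 15., 14., 13., 17.,
                     15., 14., 16., 13., 15., 14., 17., 16., 15., 14.])  # 20 members, m3/s
obs = 15.2                                                               # verifying value

def crps(ens, y):
    # CRPS = E|X - y| - 0.5 E|X - X'| for the empirical ensemble distribution
    term1 = np.mean(np.abs(ens - y))
    term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))
    return term1 - term2

print(crps(ensemble, obs))
# a one-member "ensemble" (a deterministic forecast) reduces to absolute error
print(crps(np.array([14.0]), obs))
```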

  17. An evaluation of the Canadian global meteorological ensemble prediction system for short-term hydrological forecasting

    NASA Astrophysics Data System (ADS)

    Velázquez, J. A.; Petit, T.; Lavoie, A.; Boucher, M.-A.; Turcotte, R.; Fortin, V.; Anctil, F.

    2009-07-01

    Hydrological forecasting consists of the assessment of future streamflow. Current deterministic forecasts do not give any information concerning their uncertainty, which might be limiting in a decision-making process. Ensemble forecasts are expected to fill this gap. In July 2007, the Meteorological Service of Canada improved its ensemble prediction system, which has been operational since 1998. It uses the GEM model to generate a 20-member ensemble on a 100 km grid at mid-latitudes. This improved system is used for the first time for hydrological ensemble predictions. Five watersheds in Quebec (Canada) are studied: Chaudière, Châteauguay, Du Nord, Kénogami and Du Lièvre. An interesting 17-day rainfall event in October 2007 has been selected. Forecasts are produced at a 3 h time step for a 3-day forecast horizon. The deterministic forecast is also available and is compared with the ensemble ones. In order to correct the bias of the ensemble, an updating procedure has been applied to the output data. Results showed that ensemble forecasts are more skilful than the deterministic ones, as measured by the Continuous Ranked Probability Score (CRPS), especially for 72 h forecasts. However, the hydrological ensemble forecasts are under-dispersed: a situation that improves with increasing length of the prediction horizon. We conjecture that this is due in part to the fact that uncertainty in the initial conditions of the hydrological model is not taken into account.

  18. African Drum and Steel Pan Ensembles.

    ERIC Educational Resources Information Center

    Sunkett, Mark E.

    2000-01-01

    Discusses how to develop both African drum and steel pan ensembles providing information on teacher preparation, instrument choice, beginning the ensemble, and lesson planning. Includes additional information for the drum ensembles. Lists references and instructional materials, sources of drums and pans, and common note layout/range for steel pan…

  19. Ensembl genomes 2016: more genomes, more complexity

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent...

  20. Technical Tree Climbing.

    ERIC Educational Resources Information Center

    Jenkins, Peter

    Tree climbing offers a safe, inexpensive adventure sport that can be performed almost anywhere. Using standard procedures practiced in tree surgery or rock climbing, almost any tree can be climbed. Tree climbing provides challenge and adventure as well as a vigorous upper-body workout. Tree Climbers International classifies trees using a system…

  1. Short-term optimal operation of water systems using ensemble forecasts

    NASA Astrophysics Data System (ADS)

    Raso, L.; Schwanenberg, D.; van de Giesen, N. C.; van Overloop, P. J.

    2014-09-01

    Short-term water system operation can be realized using Model Predictive Control (MPC). MPC is a method for the operational management of complex dynamic systems. Applied to open water systems, MPC provides integrated, optimal, and proactive management when forecasts are available. Notwithstanding these properties, if forecast uncertainty is not properly taken into account, system performance can deteriorate critically. An ensemble forecast is a way to represent short-term forecast uncertainty: it is a set of possible future trajectories of a meteorological or hydrological system. The growing availability and accuracy of ensemble forecasts raise the question of how to use them for operational management. The theoretical innovation presented here is the use of ensemble forecasts for optimal operation. Specifically, we introduce a tree-based approach, which we call Tree-Based Model Predictive Control (TB-MPC). In TB-MPC, a tree is used to set up a multistage stochastic program, which finds a different optimal strategy for each branch and enhances adaptivity to forecast uncertainty. Adaptivity reduces the sensitivity to wrong forecasts and improves operational performance. TB-MPC is applied to the operational management of the Salto Grande reservoir, located on the border between Argentina and Uruguay, and compared to other methods.
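    The tree idea above can be illustrated with a toy two-stage problem (all numbers are hypothetical, not from the Salto Grande study): the ensemble members agree on the near-term inflow (the tree trunk) and branch afterwards; the first-stage release is shared across branches, while the recourse release adapts per branch. A brute-force sketch:

```python
import numpy as np

# Toy TB-MPC: shared first-stage release u1, per-branch recourse u2.
s0, target = 10.0, 10.0
q1 = 3.0                              # near-term inflow, common to all members
q2_branches = np.array([1.0, 5.0])    # inflow scenarios after the branching point
probs = np.array([0.5, 0.5])

releases = np.linspace(0.0, 6.0, 121)  # candidate release grid

def branch_cost(s1, q2, u2):
    """Deviation-from-target cost plus a small release penalty."""
    s2 = s1 + q2 - u2
    return (s2 - target) ** 2 + 0.1 * u2 ** 2

best_u1, best_cost = None, np.inf
for u1 in releases:
    s1 = s0 + q1 - u1
    exp_cost = 0.1 * u1 ** 2
    # optimal recourse per branch, then expectation over branches
    for q2, p in zip(q2_branches, probs):
        exp_cost += p * min(branch_cost(s1, q2, u2) for u2 in releases)
    if exp_cost < best_cost:
        best_u1, best_cost = u1, exp_cost
```

    The shared release settles between the branch-wise ideal releases, showing how the tree hedges the first-stage decision against the scenarios that have not yet separated.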

  2. Translating Ensemble Weather Forecasts into Probabilistic User-Relevant Information

    NASA Astrophysics Data System (ADS)

    Steiner, Matthias; Sharman, Robert; Hopson, Thomas; Liu, Yubao; Chapman, Michael

    2010-05-01

    Weather-related decisions increasingly rely on probabilistic information as a means of assessing the risk of one potential outcome over another. Ensemble forecasting is one of the key approaches to capturing the uncertainty of weather forecasts. Moreover, in the future, decision makers will rely on tools that fully integrate weather information into the decision-making process. Through these decision support tools, weather information will be translated into impact information. This presentation will highlight the translation of gridded ensemble weather forecasts into probabilistic user-relevant information. Examples will be discussed that relate to the management of air traffic, noise and pollution dispersion, missile trajectory prediction, water resources and flooding, wind energy production, and road maintenance. The primary take-home message from these examples is that weather forecasts have to be tailored with a specific user perspective in mind, rather than a "one size fits all" approach in which a standard forecast product gets thrown over the fence and the user has to figure out what to do with it.

  3. Coupled ensemble flow line advection and analysis.

    PubMed

    Guo, Hanqi; Yuan, Xiaoru; Huang, Jian; Zhu, Xiaomin

    2013-12-01

    Ensemble run simulations are becoming increasingly widespread. In this work, we couple particle advection with pathline analysis to visualize and reveal the differences among the flow fields of ensemble runs. Our method first constructs a variation field using a Lagrangian-based distance metric. The variation field characterizes the variation between the vector fields of the ensemble runs by extracting and visualizing the variation of pathlines within the ensemble. Parallelism in a MapReduce style is leveraged to handle data processing and computing at scale. Using our prototype system, we demonstrate how scientists can effectively explore and investigate differences within ensemble simulations. PMID:24051840
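    The variation-field idea can be sketched in a few lines of numpy (a simplified stand-in for the paper's metric, with toy analytic vector fields): advect each seed point through every member's field, then score the seed by the mean pairwise Lagrangian distance between the resulting pathlines.

```python
import numpy as np

def advect(seed, velocity, steps=50, dt=0.05):
    """Forward-Euler pathline of one seed point through a velocity field."""
    path = [np.array(seed, dtype=float)]
    for _ in range(steps):
        path.append(path[-1] + dt * velocity(path[-1]))
    return np.array(path)

# Two toy ensemble members: rotation fields with slightly different rates.
members = [lambda p, w=w: w * np.array([-p[1], p[0]]) for w in (1.0, 1.2)]

def variation(seed):
    """Mean pairwise Lagrangian distance between members' pathlines."""
    paths = [advect(seed, v) for v in members]
    n = len(paths)
    dists = [np.mean(np.linalg.norm(paths[i] - paths[j], axis=1))
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

# Seeds farther from the rotation centre diverge more between members.
print(variation((0.1, 0.0)), variation((1.0, 0.0)))
```

    Evaluating `variation` over a seed grid yields the scalar variation field that the paper visualizes to highlight where ensemble members disagree.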

  4. State Ensembles and Quantum Entropy

    NASA Astrophysics Data System (ADS)

    Kak, Subhash

    2016-06-01

    This paper considers quantum communication involving an ensemble of states. Apart from the von Neumann entropy, it considers other measures one of which may be useful in obtaining information about an unknown pure state and another that may be useful in quantum games. It is shown that under certain conditions in a two-party quantum game, the receiver of the states can increase the entropy by adding another pure state.

  5. Statistical Ensemble of Large Eddy Simulations

    NASA Technical Reports Server (NTRS)

    Carati, Daniele; Rogers, Michael M.; Wray, Alan A.; Mansour, Nagi N. (Technical Monitor)

    2001-01-01

    A statistical ensemble of large eddy simulations (LES) is run simultaneously for the same flow. The information provided by the different large-scale velocity fields is used to propose an ensemble-averaged version of the dynamic model. This produces local model parameters that depend only on the statistical properties of the flow. An important property of the ensemble-averaged dynamic procedure is that it does not require any spatial averaging and can thus be used in fully inhomogeneous flows. Also, the ensemble of LES's provides statistics of the large-scale velocity that can be used for building new models for the subgrid-scale stress tensor. The ensemble-averaged dynamic procedure has been implemented with various models for three flows: decaying isotropic turbulence, forced isotropic turbulence, and the time-developing plane wake. It is found that the results are almost independent of the number of LES's in the statistical ensemble provided that the ensemble contains at least 16 realizations.

  6. Heat fluctuations and initial ensembles.

    PubMed

    Kim, Kwangmoo; Kwon, Chulan; Park, Hyunggyu

    2014-09-01

    Time-integrated quantities such as work and heat increase incessantly in time during nonequilibrium processes near steady states. In the long-time limit, the average values of work and heat become asymptotically equivalent to each other, since they differ only by a finite energy change on average. However, the fluctuation theorem (FT) for the heat is found not to hold with the equilibrium initial ensemble, while the FT for the work holds. This reveals an intriguing effect of everlasting initial memory stored in rare events. We revisit the problem of a Brownian particle in a harmonic potential dragged with a constant velocity, in contact with a thermal reservoir. The heat and work fluctuations are investigated with initial Boltzmann ensembles at temperatures generally different from the reservoir temperature. We find that, in the infinite-time limit, the FT for the work is fully recovered for arbitrary initial temperatures, while the heat fluctuations deviate significantly from the FT characteristics except in the infinite initial-temperature limit (a uniform initial ensemble). Furthermore, we succeed in calculating finite-time corrections to the heat and work distributions analytically, using the modified saddle-point integral method recently developed by us. Interestingly, we find noncommutativity between the infinite-time limit and the infinite-initial-temperature limit for the probability distribution function (PDF) of the heat. PMID:25314405
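    The steady-state fluctuation theorem that the work is shown to satisfy takes the generic large-deviation form (standard notation with reservoir temperature T; the paper's specific conventions may differ):

```latex
\lim_{\tau \to \infty} \frac{1}{\tau}
\ln \frac{P_\tau(W_\tau = w\tau)}{P_\tau(W_\tau = -w\tau)} = \frac{w}{k_B T}
```

    The abstract's point is that the analogous relation for the heat fails for the equilibrium initial ensemble, except in the infinite initial-temperature limit.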

  7. Ensemble learning incorporating uncertain registration.

    PubMed

    Simpson, Ivor J A; Woolrich, Mark W; Andersson, Jesper L R; Groves, Adrian R; Schnabel, Julia A

    2013-04-01

    This paper proposes a novel approach for improving the accuracy of statistical prediction methods in spatially normalized analysis. This is achieved by incorporating registration uncertainty into an ensemble learning scheme. A probabilistic registration method is used to estimate a distribution of probable mappings between subject and atlas space. This allows the estimation of the distribution of spatially normalized feature data, e.g., grey matter probability maps. From this distribution, samples are drawn for use as training examples. This allows the creation of multiple predictors, which are subsequently combined using an ensemble learning approach. Furthermore, extra testing samples can be generated to measure the uncertainty of prediction. This is applied to separating subjects with Alzheimer's disease from normal controls using a linear support vector machine on a region of interest in magnetic resonance images of the brain. We show that our proposed method leads to an improvement in discrimination using voxel-based morphometry and deformation tensor-based morphometry over bootstrap aggregating, a common ensemble learning framework. The proposed approach also generates more reasonable soft-classification predictions than bootstrap aggregating. We expect that this approach could be applied to other statistical prediction tasks where registration is important. PMID:23288332

  8. Dimensionality Reduction Through Classifier Ensembles

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj C.; Tumer, Kagan; Norwig, Peter (Technical Monitor)

    1999-01-01

    In data mining, one often needs to analyze datasets with a very large number of attributes. Performing machine learning directly on such datasets is often impractical because of extensive run times, excessive complexity of the fitted model (often leading to overfitting), and the well-known "curse of dimensionality." In practice, to avoid such problems, feature selection and/or extraction are often used to reduce data dimensionality prior to the learning step. However, existing feature selection/extraction algorithms either evaluate features by their effectiveness across the entire dataset or simply disregard class information altogether (e.g., principal component analysis). Furthermore, feature extraction algorithms such as principal component analysis create new features that are often meaningless to human users. In this article, we present input decimation, a method that provides "feature subsets" selected for their ability to discriminate among the classes. These features are subsequently used in ensembles of classifiers, yielding results superior to single classifiers, ensembles that use the full set of features, and ensembles based on principal component analysis, on both real and synthetic datasets.
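    A minimal numpy sketch of the input-decimation idea (the correlation criterion, nearest-centroid base learners, and synthetic data are illustrative stand-ins, not the paper's exact method): rank features by their correlation with the class label, give each ensemble member a different decimated feature subset, and combine by majority vote.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 2 informative features out of 10, two classes.
n = 400
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 10))
X[:, 0] += 2.0 * y            # informative
X[:, 1] -= 2.0 * y            # informative; features 2..9 are noise

def correlation_rank(X, y):
    """Rank features by |corr(feature, label)| -- the decimation criterion."""
    c = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return np.argsort(c)[::-1]

class Centroid:
    """Nearest-centroid classifier as a simple base learner."""
    def fit(self, X, y):
        self.mu = {k: X[y == k].mean(axis=0) for k in (0, 1)}
        return self
    def predict(self, X):
        d0 = np.linalg.norm(X - self.mu[0], axis=1)
        d1 = np.linalg.norm(X - self.mu[1], axis=1)
        return (d1 < d0).astype(int)

order = correlation_rank(X, y)
# Each ensemble member sees a different decimated feature subset.
subsets = [order[:2], order[:4], order[:6]]
members = [Centroid().fit(X[:, s], y) for s in subsets]
votes = np.array([m.predict(X[:, s]) for m, s in zip(members, subsets)])
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)
accuracy = (ensemble_pred == y).mean()
```

    Here the ranking correctly surfaces the two informative features, and the decimated members vote their way to high training accuracy despite the noise dimensions.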

  9. Using Bayesian Belief Networks and event trees for volcanic hazard assessment and decision support : reconstruction of past eruptions of La Soufrière volcano, Guadeloupe and retrospective analysis of 1975-77 unrest.

    NASA Astrophysics Data System (ADS)

    Komorowski, Jean-Christophe; Hincks, Thea; Sparks, Steve; Aspinall, Willy; Legendre, Yoann; Boudon, Georges

    2013-04-01

    the contemporary volcanological narrative, and demonstrates that a formal evidential case could have been made to support the authorities' concerns and decision to evacuate. Revisiting the circumstances of the 1976 crisis highlights many contemporary challenges of decision-making under conditions of volcanological uncertainty. We suggest the BBN concept is a suitable framework for marshalling multiple observations, model results and interpretations - and all associated uncertainties - in a methodical manner. Base-rate eruption probabilities for Guadeloupe can be updated now with a new chronology of activity suggesting that 10 major explosive phases and 9 dome-forming phases occurred in the last 9150 years, associated with ≥ 8 flank-collapses and ≥ 6-7 high-energy pyroclastic density currents (blasts). Eruptive recurrence, magnitude and intensity place quantitative constraints on La Soufrière's event tree to elaborate credible scenarios. The current unrest offers an opportunity to update the BBN model and explore the uncertainty on inferences about the system's internal state. This probabilistic formalism would provoke key questions relating to unrest evolution: 1) is the unrest hydrothermal or magmatic? 2) what controls dyke/intrusion arrest and hence failed-magmatic eruptions like 1976? 3) what conditions could lead to significant pressurization with potential for explosive activity and edifice instability, and what monitoring signs might be manifest?

  10. Hydrologic ensemble prediction experiment focuses on reliable forecasts

    NASA Astrophysics Data System (ADS)

    Franz, Kristie; Ajami, Newsha; Schaake, John; Buizza, Roberto

    The Hydrologic Ensemble Prediction Experiment (HEPEX), an effort involving meteorological and hydrological scientists from research, operational, and user communities around the globe, is building a research project focused on advancing probabilistic hydrologic forecasting. HEPEX was launched in March 2004 at a meeting hosted by the European Centre for Medium-Range Weather Forecasts (ECMWF) in Reading, United Kingdom (http://www.ecmwf.int/newsevents/meetings/workshops/2004/HEPEX/). The goal of HEPEX is “to bring the international hydrological and meteorological communities together to demonstrate how to produce reliable hydrological ensemble forecasts that can be used with confidence by the emergency management and water resources sectors to make decisions that have important consequences for the economy, public health, and safety.”

  11. Ensemble Learning Approaches to Predicting Complications of Blood Transfusion

    PubMed Central

    Murphree, Dennis; Ngufor, Che; Upadhyaya, Sudhindra; Madde, Nagesh; Clifford, Leanne; Kor, Daryl J.; Pathak, Jyotishman

    2016-01-01

    Of the 21 million blood components transfused in the United States during 2011, approximately 1 in 414 resulted in complication [1]. Two complications in particular, transfusion-related acute lung injury (TRALI) and transfusion-associated circulatory overload (TACO), are especially concerning. These two alone accounted for 62% of reported transfusion-related fatalities in 2013 [2]. We have previously developed a set of machine learning base models for predicting the likelihood of these adverse reactions, with a goal towards better informing the clinician prior to a transfusion decision. Here we describe recent work incorporating ensemble learning approaches to predicting TACO/TRALI. In particular we describe combining base models via majority voting, stacking of model sets with varying diversity, as well as a resampling/boosting combination algorithm called RUSBoost. We find that while the performance of many models is very good, the ensemble models do not yield significantly better performance in terms of AUC. PMID:26737958
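    The majority-voting combination described above can be sketched in numpy (synthetic data and one-feature threshold "stumps" are stand-ins for the clinical base models; no claim is made about the paper's actual models or data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic screening task: 3 weakly informative features, binary outcome.
n = 600
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, 3)) + 0.8 * y[:, None]

class Stump:
    """One-feature threshold classifier -- a stand-in for a base model."""
    def __init__(self, j):
        self.j = j
    def fit(self, X, y):
        # threshold halfway between the class-conditional means
        self.t = 0.5 * (X[y == 0, self.j].mean() + X[y == 1, self.j].mean())
        return self
    def predict(self, X):
        return (X[:, self.j] > self.t).astype(int)

models = [Stump(j).fit(X, y) for j in range(3)]
preds = np.array([m.predict(X) for m in models])   # (3, n) matrix of votes
majority = (preds.sum(axis=0) >= 2).astype(int)    # at least 2 of 3 agree

single_acc = max((p == y).mean() for p in preds)
ensemble_acc = (majority == y).mean()
```

    Note that, as the abstract reports for TACO/TRALI, such a vote is not guaranteed to beat the best base model on every metric; the sketch only shows the mechanics of the combination.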

  12. Ensemble learning approaches to predicting complications of blood transfusion.

    PubMed

    Murphree, Dennis; Ngufor, Che; Upadhyaya, Sudhindra; Madde, Nagesh; Clifford, Leanne; Kor, Daryl J; Pathak, Jyotishman

    2015-08-01

    Of the 21 million blood components transfused in the United States during 2011, approximately 1 in 414 resulted in complication [1]. Two complications in particular, transfusion-related acute lung injury (TRALI) and transfusion-associated circulatory overload (TACO), are especially concerning. These two alone accounted for 62% of reported transfusion-related fatalities in 2013 [2]. We have previously developed a set of machine learning base models for predicting the likelihood of these adverse reactions, with a goal towards better informing the clinician prior to a transfusion decision. Here we describe recent work incorporating ensemble learning approaches to predicting TACO/TRALI. In particular we describe combining base models via majority voting, stacking of model sets with varying diversity, as well as a resampling/boosting combination algorithm called RUSBoost. We find that while the performance of many models is very good, the ensemble models do not yield significantly better performance in terms of AUC. PMID:26737958

  13. SRNL PARTICIPATION IN THE MULTI-SCALE ENSEMBLE EXERCISES

    SciTech Connect

    Buckley, R

    2007-10-29

    Consequence assessment during emergency response often requires atmospheric transport and dispersion modeling to guide decision making. A statistical analysis of the ensemble of results from several models is a useful way of estimating the uncertainty for a given forecast. ENSEMBLE is a European Union program that utilizes an internet-based system to ingest transport results from numerous modeling agencies. A recent set of exercises required output on three distinct spatial and temporal scales. The Savannah River National Laboratory (SRNL) uses a regional prognostic model nested within a larger-scale synoptic model to generate the meteorological conditions which are in turn used in a Lagrangian particle dispersion model. A discussion of SRNL participation in these exercises is given, with particular emphasis on requirements for provision of results in a timely manner with regard to the various spatial scales.

  14. Gradient flow and scale setting on MILC HISQ ensembles

    NASA Astrophysics Data System (ADS)

    Bazavov, A.; Bernard, C.; Brown, N.; Komijani, J.; DeTar, C.; Foley, J.; Levkova, L.; Gottlieb, Steven; Heller, U. M.; Laiho, J.; Sugar, R. L.; Toussaint, D.; Van de Water, R. S.; MILC Collaboration

    2016-05-01

    We report on a scale determination with gradient-flow techniques on the Nf=2 +1 +1 highly improved staggered quark ensembles generated by the MILC Collaboration. The ensembles include four lattice spacings, ranging from approximately 0.15 to 0.06 fm, and both physical and unphysical values of the quark masses. The scales √{t0}/a and w0/a and their tree-level improvements, √{t0 ,imp} and w0 ,imp, are computed on each ensemble using Symanzik flow and the cloverleaf definition of the energy density E. Using a combination of continuum chiral-perturbation theory and a Taylor-series ansatz for the lattice-spacing and strong-coupling dependence, the results are simultaneously extrapolated to the continuum and interpolated to physical quark masses. We determine the scales √{t0 }=0.1416 (+8/-5) fm and w0=0.1714 (+15/-12) fm, where the errors are sums, in quadrature, of statistical and all systematic errors. The precision of w0 and √{t0} is comparable to or better than that of the best previous estimates. We then find the continuum mass dependence of √{t0} and w0, which will be useful for estimating the scales of new ensembles. We also estimate the integrated autocorrelation length of ⟨E (t )⟩. For long flow times, the autocorrelation length of ⟨E ⟩ appears to be comparable to that of the topological charge.
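    For reference, the gradient-flow scales discussed above are conventionally defined through the flowed energy density ⟨E(t)⟩ (these are the standard definitions, not quoted from this record):

```latex
t^2 \langle E(t) \rangle \big|_{t = t_0} = 0.3,
\qquad
t \frac{d}{dt} \left[ t^2 \langle E(t) \rangle \right] \Big|_{t = w_0^2} = 0.3
```

    Both conditions pick out a flow time at which the dimensionless combination reaches a fixed reference value, which is what makes √t0 and w0 usable as lattice scales.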

  15. Gradient Flow and Scale Setting on MILC HISQ Ensembles

    SciTech Connect

    Bazavov, A.; Bernard, C.; Brown, N.; Komijani, J.; DeTar, C.; Foley, J.; Levkova, L.; Gottlieb, Steven; Heller, U. M.; Laiho, J.; Sugar, R. L.; Toussaint, D.; Van de Water, R. S.

    2015-03-25

    We report on a scale determination with gradient-flow techniques on the Nf = 2 + 1 + 1 HISQ ensembles generated by the MILC collaboration. The ensembles include four lattice spacings, ranging from approximately 0.15 to 0.06 fm, and both physical and unphysical values of the quark masses. The scales √t0/a and w0/a and their tree-level improvements, √t0,imp and w0,imp, are computed on each ensemble using Symanzik flow and the cloverleaf definition of the energy density E. Using a combination of continuum chiral perturbation theory and a Taylor-series ansatz for the lattice-spacing and strong-coupling dependence, the results are simultaneously extrapolated to the continuum and interpolated to physical quark masses. We also determine the scales √t0 = 0.1416(+8/-5) fm and w0 = 0.1717(+12/-11) fm, where the errors are sums, in quadrature, of statistical and all systematic errors. The precision of w0 and √t0 is comparable to or better than that of the best previous estimates. We also find the continuum mass-dependence of w0 that will be useful for estimating the scales of other ensembles. Furthermore, we estimate the integrated autocorrelation length of ⟨E(t)⟩. For long flow times, the autocorrelation length of ⟨E⟩ appears to be comparable to or smaller than that of the topological charge.

  16. Navigating a Path Toward Operational, Short-term, Ensemble Based, Probablistic Streamflow Forecasts

    NASA Astrophysics Data System (ADS)

    Hartman, R. K.; Schaake, J.

    2004-12-01

    The National Weather Service (NWS) has federal responsibility for issuing public flood warnings in the United States. Additionally, the NWS has been engaged in longer range water resources forecasts for many years, particularly in the Western U.S. In the past twenty years, longer range forecasts have increasingly incorporated ensemble techniques. Ensemble techniques are attractive because they allow a great deal of flexibility, both temporally and in content. This technique also provides for the influence of additional forcings (e.g., ENSO), through either pre- or post-processing techniques. More recently, attention has turned to the use of ensemble techniques in the short-term streamflow forecasting process. While considerably more difficult, the development of reliable short-term probabilistic streamflow forecasts has clear application and value for many NWS customers and partners. During flood episodes, expensive mitigation actions are initiated or withheld and critical reservoir management decisions are made in the absence of uncertainty and risk information. Limited emergency services resources and the optimal use of water resources facilities necessitate the development of a risk-based decision-making process. The development of reliable short-term probabilistic streamflow forecasts is an essential ingredient in the decision-making process. This paper addresses the utility of short-term ensemble streamflow forecasts and the considerations that must be addressed as techniques and operational capabilities are developed. Verification and validation information are discussed from both a scientific and customer perspective. Education and training related to the interpretation and use of ensemble products are also addressed.

  17. A Modified Ensemble Framework for Drought Estimation

    NASA Astrophysics Data System (ADS)

    Alobaidi, M. H.; Marpu, P. R.; Ouarda, T.

    2014-12-01

    Drought estimation at ungauged sites is a difficult task due to various challenges such as scale and the limited availability of information about hydrologic neighborhoods. Ensemble regression has recently been utilized in modeling various hydrologic systems and has shown advantages over classical regression approaches in such studies. A challenging task in ensemble modeling is the proper training of the ensemble's individual learners and the ensemble combiners. In this work, an ensemble framework is proposed to enhance the generalization ability of the sub-ensemble models and their combiner. Information mixtures between the subsamples are introduced; this measure is applied to both the ensemble members and the ensemble combiners. Controlled degrees of homogeneity are then induced in the proposed model via a two-stage resampling algorithm. Artificial neural networks (ANNs) were used as ensemble members, in addition to different ensemble integration plans. The model provided superior results when compared to previous models applied to the case study in this work. The root mean squared error (RMSE) in the testing phase for the drought quantiles improved by 67%-76%. The bias error (BIAS) also showed 61%-95% improvement.

  18. The Tree Worker's Manual.

    ERIC Educational Resources Information Center

    Smithyman, S. J.

    This manual is designed to prepare students for entry-level positions as tree care professionals. Addressed in the individual chapters of the guide are the following topics: the tree service industry; clothing, equipment, and tools; tree workers; basic tree anatomy; techniques of pruning; procedures for climbing and working in the tree; aerial…

  19. Developing Climate-Informed Ensemble Streamflow Forecasts over the Colorado River Basin

    NASA Astrophysics Data System (ADS)

    Miller, W. P.; Lhotak, J.; Werner, K.; Stokes, M.

    2014-12-01

    As climate change is realized, the assumption of hydrometeorologic stationarity embedded within many hydrologic models is no longer valid over the Colorado River Basin. As such, resource managers have begun to request more information to support decisions, specifically with regards to the incorporation of climate change information and operational risk. To this end, ensemble methodologies have become increasingly popular among the scientific and forecasting communities, and resource managers have begun to incorporate this information into decision support tools and operational models. Over the Colorado River Basin, reservoir operations are determined, in large part, by forecasts issued by the Colorado Basin River Forecast Center (CBRFC). The CBRFC produces both single-value and ensemble forecasts for use by resource managers in their operational decision-making process. These ensemble forecasts are currently driven by a combination of daily updating model states used as initial conditions and weather forecasts, plus historical meteorological information used to generate forecasts under the assumption that past hydroclimatological conditions are representative of future hydroclimatology. Recent efforts have produced updated bias-corrected and spatially downscaled projections of future climate over the Colorado River Basin. In this study, the historical climatology used as input to the CBRFC forecast model is adjusted to represent these updated projections of future climate. Ensemble streamflow forecasts reflecting the impacts of climate change are then developed. These forecasts are subsequently compared to non-informed ensemble streamflow forecasts to evaluate the changing range of streamflow forecasts and risk over the Colorado River Basin. Ensemble forecasts may be compared through the use of a reservoir operations planning model, providing resource managers with ensemble information regarding changing

  20. NIMEFI: Gene Regulatory Network Inference using Multiple Ensemble Feature Importance Algorithms

    PubMed Central

    Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan

    2014-01-01

    One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As a second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.
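    The GENIE3-style decomposition described above can be sketched with scikit-learn (assumed available; the toy three-gene network is illustrative): each gene in turn is the regression target, all other genes are predictors, and tree-ensemble feature importances are read off as putative edge weights.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy expression matrix: gene 1 is driven by gene 0; gene 2 is noise.
n_samples, n_genes = 200, 3
E = rng.normal(size=(n_samples, n_genes))
E[:, 1] = 2.0 * E[:, 0] + 0.1 * rng.normal(size=n_samples)

def genie3_scores(E, n_estimators=100):
    """Edge score matrix S[i, j]: importance of regulator i for target j."""
    n_genes = E.shape[1]
    S = np.zeros((n_genes, n_genes))
    for j in range(n_genes):
        regulators = [i for i in range(n_genes) if i != j]
        rf = RandomForestRegressor(n_estimators=n_estimators, random_state=0)
        rf.fit(E[:, regulators], E[:, j])
        S[regulators, j] = rf.feature_importances_
    return S

S = genie3_scores(E)
# The regulatory link 0 -> 1 should dominate the importances for target 1.
```

    NIMEFI's generalization replaces the random forest with any ranker (elastic net, SVR, symbolic regression) wrapped in subsampling, then rank-averages the resulting score matrices.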

  1. Ensemble modeling of CME propagation

    NASA Astrophysics Data System (ADS)

    Lee, C. O.; Arge, C. N.; Henney, C. J.; Odstrcil, D.; Millward, G. H.; Pizzo, V. J.

    2014-12-01

    The Wang-Sheeley-Arge (WSA)-Enlil cone modeling system is used for making routine arrival time forecasts of the Earth-directed "halo" coronal mass ejections (CMEs), since they typically produce the most geoeffective events. A major objective of this work is to better understand the sensitivity of the WSA-Enlil modeling results to input model parameters and how these parameters contribute to the overall model uncertainty and performance. We present ensemble modeling results for a simple halo CME event that occurred on 15 February 2011 and a succession of three halo CME events that occurred on 2-4 August 2011. During this period the Solar TErrestrial RElations Observatory (STEREO) A and B spacecraft viewed the CMEs over the solar limb, thereby providing more reliable constraints on the initial CME geometries during the manual cone fitting process. To investigate the sensitivity of the modeled CME arrival times to small variations in the input cone properties, for each CME event we create an ensemble of numerical simulations based on multiple sets of cone parameters. We find that the accuracy of the modeled arrival times not only depends on the initial input CME geometry, but also on the reliable specification of the background solar wind, which is driven by the input maps of the photospheric magnetic field. As part of the modeling ensemble, we simulate the CME events using the traditional daily updated maps as well as those that are produced by the Air Force data Assimilative Photospheric flux Transport (ADAPT) model, which provide a more instantaneous snapshot of the photospheric field distribution. For the August 2011 events, in particular, we find that the accuracy in the arrival time predictions also depends on whether the cone parameters for all three CMEs are specified in a single WSA-Enlil simulation. The inclusion/exclusion of one or two of the preceding CMEs affects the solar wind conditions through which the succeeding CME propagates.

  2. Soil texture reclassification by an ensemble model

    NASA Astrophysics Data System (ADS)

    Cisty, Milan; Hlavcova, Kamila

    2015-04-01

    a prerequisite for solving some subsequent task, this bias is propagated to the subsequent modelling or other work. Therefore, for the sake of achieving more general and precise outputs while solving such tasks, the authors of the present paper propose a hybrid approach, which has the potential to obtain improved results. Although the authors continue to recommend the use of the mentioned parametric PSD models in the proposed methodology, the final prediction is made by an ensemble machine learning algorithm based on regression trees, the so-called Random Forest algorithm, which is built on top of the outputs of such models, which serve as ensemble members. An improvement in precision was demonstrated, and it is documented in the paper that the ensemble model worked better than any of its constituents. References: Nemes, A., Wosten, J.H.M., Lilly, A., Voshaar, J.H.O.: Evaluation of different procedures to interpolate particle-size distributions to achieve compatibility within soil databases. Geoderma 90, 187-202 (1999); Hwang, S.: Effect of texture on the performance of soil particle-size distribution models. Geoderma 123, 363-371 (2004); Botula, Y.D., Cornelis, W.M., Baert, G., Mafuka, P., Van Ranst, E.: Particle size distribution models for soils of the humid tropics. J Soils Sediments 13, 686-698 (2013).

  3. Measuring social interaction in music ensembles.

    PubMed

    Volpe, Gualtiero; D'Ausilio, Alessandro; Badino, Leonardo; Camurri, Antonio; Fadiga, Luciano

    2016-05-01

    Music ensembles are an ideal test-bed for quantitative analysis of social interaction. Music is an inherently social activity, and music ensembles offer a broad variety of scenarios which are particularly suitable for investigation. Small ensembles, such as string quartets, are deemed a significant example of self-managed teams, where all musicians contribute equally to a task. In bigger ensembles, such as orchestras, the relationship between a leader (the conductor) and a group of followers (the musicians) clearly emerges. This paper presents an overview of recent research on social interaction in music ensembles with a particular focus on (i) studies from cognitive neuroscience; and (ii) studies adopting a computational approach for carrying out automatic quantitative analysis of ensemble music performances. PMID:27069054

  4. A Localized Ensemble Kalman Smoother

    NASA Technical Reports Server (NTRS)

    Butala, Mark D.

    2012-01-01

    Numerous geophysical inverse problems prove difficult because the available measurements are indirectly related to the underlying unknown dynamic state, and the physics governing the system may involve imperfect models or unobserved parameters. Data assimilation addresses these difficulties by combining the measurements and physical knowledge. The main challenge in such problems is usually their high dimensionality, for which standard statistical methods prove computationally intractable. This paper develops a new high-dimensional Monte Carlo approach, the localized ensemble Kalman smoother, and addresses its theoretical convergence.
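A minimal sketch of a single ensemble Kalman analysis step may help fix ideas. This is the plain stochastic EnKF update for a directly observed scalar state (H = 1), not the localized smoother developed in the paper; all numbers are illustrative.

```python
import random

random.seed(0)

def enkf_update(ensemble, y_obs, obs_var):
    """One stochastic ensemble Kalman analysis step for a scalar state."""
    n = len(ensemble)
    mean = sum(ensemble) / n
    var = sum((x - mean) ** 2 for x in ensemble) / (n - 1)
    gain = var / (var + obs_var)  # scalar Kalman gain
    # perturb the observation independently for each member (stochastic EnKF)
    return [x + gain * (y_obs + random.gauss(0.0, obs_var ** 0.5) - x)
            for x in ensemble]

prior = [random.gauss(5.0, 2.0) for _ in range(200)]  # forecast ensemble
prior_mean = sum(prior) / len(prior)

post = enkf_update(prior, y_obs=3.0, obs_var=0.5)
post_mean = sum(post) / len(post)
# the analysis mean moves from the prior mean (~5) toward the observation (3)
```

A smoother additionally propagates the same gain-weighted increments backward to earlier state estimates, and localization tapers the gain between weakly related state components; both are omitted here.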

  5. Ensemble averaging of acoustic data

    NASA Technical Reports Server (NTRS)

    Stefanski, P. K.

    1982-01-01

    A computer program called Ensemble Averaging of Acoustic Data is documented. The program samples analog data, analyzes the data, and displays them in the time and frequency domains. Hard copies of the displays are the program's output. The documentation includes a description of the program and detailed user instructions. This software was developed for use on the Ames 40- by 80-Foot Wind Tunnel's Dynamic Analysis System, consisting of a PDP-11/45 computer, two RK05 disk drives, a Tektronix 611 keyboard/display terminal, an FPE-4 Fourier Processing Element, and an analog-to-digital converter.

  6. Potential and limitations of ensemble docking.

    PubMed

    Korb, Oliver; Olsson, Tjelvar S G; Bowden, Simon J; Hall, Richard J; Verdonk, Marcel L; Liebeschuetz, John W; Cole, Jason C

    2012-05-25

    A major problem in structure-based virtual screening applications is the appropriate selection of a single or even multiple protein structures to be used in the virtual screening process. A priori, it is unknown which protein structure(s) will perform best in a virtual screening experiment. We investigated the performance of ensemble docking, as a function of ensemble size, for eight targets of pharmaceutical interest. Starting from single protein structure docking results, up to 500,000 combinations of protein structures were generated for each ensemble size, and, for each ensemble, pose prediction and virtual screening results were derived. Comparing single- to multiple-protein-structure results suggests improvements when moving from the performance of the worst and the average over all single protein structures to the performance of the worst and the average over all protein ensembles of size two or greater, respectively. We identified several key factors affecting ensemble docking performance, including the sampling accuracy of the docking algorithm, the choice of the scoring function, and the similarity of database ligands to the cocrystallized ligands of ligand-bound protein structures in an ensemble. Due to these factors, the prospective selection of optimum ensembles is a challenging task, as shown by a reassessment of published ensemble selection protocols. PMID:22482774

  7. Multi-Model Ensemble Wake Vortex Prediction

    NASA Technical Reports Server (NTRS)

    Koerner, Stephan; Holzaepfel, Frank; Ahmad, Nash'at N.

    2015-01-01

    Several multi-model ensemble methods are investigated for predicting wake vortex transport and decay. This study is a joint effort between National Aeronautics and Space Administration and Deutsches Zentrum fuer Luft- und Raumfahrt to develop a multi-model ensemble capability using their wake models. An overview of different multi-model ensemble methods and their feasibility for wake applications is presented. The methods include Reliability Ensemble Averaging, Bayesian Model Averaging, and Monte Carlo Simulations. The methodologies are evaluated using data from wake vortex field experiments.
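As a rough illustration of skill-weighted multi-model averaging in the spirit of Reliability Ensemble Averaging, the sketch below weights each wake model inversely to its historical error; the model names, past errors, and predictions are invented for illustration, not taken from the study.

```python
# Skill-weighted consensus: models with smaller past error get larger weight.
# All values are hypothetical (e.g., vortex descent distance in meters).
past_abs_err = {"model_A": 0.8, "model_B": 0.4, "model_C": 1.6}
predictions = {"model_A": 21.0, "model_B": 24.0, "model_C": 18.0}

weights = {m: 1.0 / e for m, e in past_abs_err.items()}  # inverse-error weights
total = sum(weights.values())
consensus = sum(weights[m] * predictions[m] for m in predictions) / total
# the consensus is pulled toward the historically best model (B)
```

Bayesian Model Averaging refines this idea by treating the weights as posterior model probabilities and producing a full predictive distribution rather than a single weighted value.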

  8. The ENSEMBLES Statistical Downscaling Portal

    NASA Astrophysics Data System (ADS)

    Cofino, Antonio S.; San-Martín, Daniel; Gutiérrez, Jose M.

    2010-05-01

    The demand for high-resolution seasonal and ACC predictions is continuously increasing due to the multiple end-user applications in a variety of sectors (hydrology, agronomy, energy, etc.) which require regional meteorological inputs. To fill the gap between the coarse-resolution grids used by global weather models and the regional needs of applications, a number of statistical downscaling techniques have been proposed. Statistical downscaling is a complex multi-disciplinary problem which requires a cascade of different scientific tools to access and process different sources of data, from GCM outputs to local observations, and to run complex statistical algorithms. Thus, an end-to-end approach is needed in order to link the outputs of the ensemble prediction systems to a range of impact applications. To accomplish this task in an interactive and user-friendly form, a Web portal has been developed within the European ENSEMBLES project, integrating the necessary tools and providing the appropriate technology for distributed data access and computing. In this form, users can obtain their downscaled data, testing and validating different statistical methods (from the categories of weather typing, regression, or weather generators) in a transparent form, without worrying about the details of the downscaling techniques or the data formats and access.

  9. Probabilistic Description of Stellar Ensembles

    NASA Astrophysics Data System (ADS)

    Cerviño, Miguel

    I describe the modeling of stellar ensembles in terms of probability distributions. This modeling is primarily characterized by the number of stars included in the considered resolution element, whatever its physical (stellar cluster) or artificial (pixel/IFU) nature. It provides a solution to the direct problem of characterizing probabilistically the observables of stellar ensembles as a function of their physical properties. In addition, this characterization implies that intensive properties (like color indices) are intrinsically biased observables, although the bias decreases when the number of stars in the resolution element increases. In the case of a low number of stars in the resolution element (N < 10^5), the distributions of intensive and extensive observables follow nontrivial probability distributions. Such a situation can be computed by means of Monte Carlo simulations where data mining techniques would be applied. Regarding the inverse problem of obtaining physical parameters from observational data, I show how some of the scatter in the data provides valuable physical information, since it is related to the system size (and the number of stars in the resolution element). However, making use of such information requires following iterative procedures in the data analysis.

  10. Visualizing ensembles in structural biology.

    PubMed

    Melvin, Ryan L; Salsbury, Freddie R

    2016-06-01

    Displaying a single representative conformation of a biopolymer rather than an ensemble of states mistakenly conveys a static nature rather than the actual dynamic personality of biopolymers. However, there are few apparent options due to the fixed nature of print media. Here we suggest a standardized methodology for visually indicating the distribution width, standard deviation and uncertainty of ensembles of states with little loss of the visual simplicity of displaying a single representative conformation. Of particular note is that the visualization method employed clearly distinguishes between isotropic and anisotropic motion of polymer subunits. We also apply this method to ligand binding, suggesting a way to indicate the expected error in many high throughput docking programs when visualizing the structural spread of the output. We provide several examples in the context of nucleic acids and proteins with particular insights gained via this method. Such examples include investigating a therapeutic polymer of FdUMP (5-fluoro-2-deoxyuridine-5-O-monophosphate) - a topoisomerase-1 (Top1), apoptosis-inducing poison - and nucleotide-binding proteins responsible for ATP hydrolysis from Bacillus subtilis. We also discuss how these methods can be extended to any macromolecular data set with an underlying distribution, including experimental data such as NMR structures. PMID:27179343

  11. Forecast of iceberg ensemble drift

    SciTech Connect

    El-Tahan, M.S.; El-Tahan, H.W.; Venkatesh, S.

    1983-05-01

    The objectives of the study are to gain a better understanding of the characteristics of iceberg motion and the factors controlling iceberg drift, and to develop an iceberg ensemble drift forecast system to be operated by the Canadian Atmospheric Environment Service. An extensive review of field and theoretical studies on iceberg behaviour, and the factors controlling iceberg motion has been carried out. Long term and short term behaviour of icebergs are critically examined. A quantitative assessment of the effects of the factors controlling iceberg motion is presented. The study indicated that wind and currents are the primary driving forces. Coriolis Force and ocean surface slope also have significant effects. As for waves, only the higher waves have a significant effect. Iceberg drift is also affected by iceberg size characteristics. Based on the findings of the study a comprehensive computerized forecast system to predict the drift of iceberg ensembles off Canada's east coast has been designed. The expected accuracy of the forecast system is discussed and recommendations are made for future improvements to the system.

  12. Residue-level global and local ensemble-ensemble comparisons of protein domains.

    PubMed

    Clark, Sarah A; Tronrud, Dale E; Karplus, P Andrew

    2015-09-01

    Many methods of protein structure generation such as NMR-based solution structure determination and template-based modeling do not produce a single model, but an ensemble of models consistent with the available information. Current strategies for comparing ensembles lose information because they use only a single representative structure. Here, we describe the ENSEMBLATOR and its novel strategy to directly compare two ensembles containing the same atoms to identify significant global and local backbone differences between them on per-atom and per-residue levels, respectively. The ENSEMBLATOR has four components: eePREP (ee for ensemble-ensemble), which selects atoms common to all models; eeCORE, which identifies atoms belonging to a cutoff-distance dependent common core; eeGLOBAL, which globally superimposes all models using the defined core atoms and calculates for each atom the two intraensemble variations, the interensemble variation, and the closest approach of members of the two ensembles; and eeLOCAL, which performs a local overlay of each dipeptide and, using a novel measure of local backbone similarity, reports the same four variations as eeGLOBAL. The combination of eeGLOBAL and eeLOCAL analyses identifies the most significant differences between ensembles. We illustrate the ENSEMBLATOR's capabilities by showing how using it to analyze NMR ensembles and to compare NMR ensembles with crystal structures provides novel insights compared to published studies. One of these studies leads us to suggest that a "consistency check" of NMR-derived ensembles may be a useful analysis step for NMR-based structure determinations in general. The ENSEMBLATOR 1.0 is available as a first generation tool to carry out ensemble-ensemble comparisons. PMID:26032515
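The per-atom quantities reported by eeGLOBAL can be approximated in a toy form: the spread of an atom's position within one ensemble, and the closest approach between the two ensembles for that atom. The 1-D "coordinates" below are stand-ins for superimposed 3-D atomic positions, and the functions are a generic illustration, not the ENSEMBLATOR's actual implementation.

```python
# Each inner list is one model; each column is one (already superimposed) atom.
ensemble_a = [[1.00, 2.00], [1.10, 2.05], [0.95, 1.95]]  # models x atoms
ensemble_b = [[1.40, 2.02], [1.50, 1.98]]

def spread(models, atom):
    """Intra-ensemble standard deviation of one atom's coordinate."""
    vals = [m[atom] for m in models]
    mean = sum(vals) / len(vals)
    return (sum((v - mean) ** 2 for v in vals) / len(vals)) ** 0.5

def closest_approach(a, b, atom):
    """Smallest inter-ensemble distance for one atom over all model pairs."""
    return min(abs(ma[atom] - mb[atom]) for ma in a for mb in b)

# Atom 0: the ensembles stay farther apart than their internal spread,
# flagging a significant difference. Atom 1: the ensembles overlap.
```

Flagging an atom as "significantly different" when its closest approach exceeds the intra-ensemble variation mirrors the logic of comparing inter- to intra-ensemble variation described in the abstract.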

  13. A pruned ensemble classifier for effective breast thermogram analysis.

    PubMed

    Krawczyk, Bartosz; Schaefer, Gerald

    2013-01-01

    Thermal infrared imaging has been shown to be useful for diagnosing breast cancer, since it is able to detect small tumors and hence can lead to earlier diagnosis. In this paper, we present a computer-aided diagnosis approach for analysing breast thermograms. We extract image features that describe bilateral differences of the breast regions in the thermogram, and then feed these features to an ensemble classifier. For the classification, we present an extension to the Under-Sampling Balanced Ensemble (USBE) algorithm. USBE addresses the problem of imbalanced class distribution that is common in medical decision making by training different classifiers on different subspaces, where each subspace is created so as to resemble a balanced classification problem. To combine the individual classifiers, we use a neural fuser based on discriminants and apply a classifier selection procedure based on a pairwise double-fault diversity measure to discard irrelevant and similar classifiers. We demonstrate that our approach works well, and that it statistically outperforms various other ensemble approaches including the original USBE algorithm. PMID:24111386
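The under-sampling idea behind USBE can be sketched as follows: carve the majority class into minority-sized chunks, train one classifier per balanced chunk, and vote. The one-feature decision stump and the synthetic data are stand-ins for the paper's classifiers and thermogram features.

```python
import random

random.seed(1)

# Synthetic imbalanced data: 90 majority (label 0) vs 30 minority (label 1).
majority = [(random.gauss(0.0, 1.0), 0) for _ in range(90)]
minority = [(random.gauss(3.0, 1.0), 1) for _ in range(30)]

# Carve the majority class into chunks the size of the minority class,
# so each chunk + minority forms a balanced training set.
chunk_size = len(minority)
chunks = [majority[i:i + chunk_size] for i in range(0, len(majority), chunk_size)]

def train_stump(data):
    """Midpoint between class means as a one-feature decision stump."""
    m0 = sum(x for x, y in data if y == 0) / sum(1 for _, y in data if y == 0)
    m1 = sum(x for x, y in data if y == 1) / sum(1 for _, y in data if y == 1)
    threshold = (m0 + m1) / 2
    return lambda x: int(x > threshold)

ensemble = [train_stump(chunk + minority) for chunk in chunks]

def predict(x):
    votes = sum(clf(x) for clf in ensemble)
    return int(votes > len(ensemble) / 2)  # majority vote
```

The paper's pruning step would go one stage further, discarding members of `ensemble` that a pairwise diversity measure finds redundant before fusing the rest.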

  14. Hydrologic Ensemble Forecasts for Flash Flood Warnings at Ungauged Locations

    NASA Astrophysics Data System (ADS)

    Demargne, Julie; Javelle, Pierre; Organde, Didier; Ramos, Maria-Helena

    2013-04-01

    Development of operational flash flood warning systems is one of the challenges in operational hydrology: flash floods are devastating but difficult to monitor and predict due to their nature. To provide flash flood warnings for ungauged basins, Météo-France and Irstea (formerly Cemagref) have developed a discharge-threshold flood warning system called AIGA, which combines radar-gauge rainfall grids with a simplified distributed rainfall-runoff model run every 15 minutes at a 1-km² resolution. Operational since 2005 in the southern part of France, the AIGA system produces, every 15 minutes, a map of the river network with a color chart indicating the range of the estimated return period of the ongoing flood event. To increase forecast lead time and quantify the forcing input uncertainty, the rainfall-runoff distributed model ingests the 11 precipitation ensemble members from the PEARP ensemble prediction system of Météo-France. Performance of the experimental probabilistic precipitation and flow forecasts is evaluated using a variety of ensemble verification metrics (e.g., Continuous Ranked Probability Skill Score, Relative Operating Characteristic score) for different French basins. We also discuss planned enhancements and challenges to assess other sources of hydrologic uncertainty and effectively communicate the uncertainty information to forecasters for better risk-based decision making.
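The Continuous Ranked Probability Score underlying the skill score mentioned above has a convenient closed form for a finite ensemble, CRPS = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the ensemble and y is the observation. A minimal sketch with invented ensemble values:

```python
def crps(members, obs):
    """CRPS of a raw ensemble via the energy-score identity."""
    n = len(members)
    term1 = sum(abs(x - obs) for x in members) / n
    term2 = sum(abs(a - b) for a in members for b in members) / (2 * n * n)
    return term1 - term2

obs = 10.0
sharp = [9.8, 10.0, 10.2]   # ensemble tightly centred on the observation
broad = [5.0, 10.0, 15.0]   # same mean, much wider spread
# the sharp, well-centred ensemble scores better (lower CRPS)
```

The skill score form (CRPSS) then compares this value against the CRPS of a reference forecast such as climatology.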

  15. Joys of Community Ensemble Playing: The Case of the Happy Roll Elastic Ensemble in Taiwan

    ERIC Educational Resources Information Center

    Hsieh, Yuan-Mei; Kao, Kai-Chi

    2012-01-01

    The Happy Roll Elastic Ensemble (HREE) is a community music ensemble supported by Tainan Culture Centre in Taiwan. With enjoyment and friendship as its primary goals, it aims to facilitate the joys of ensemble playing and the spirit of social networking. This article highlights the key aspects of HREE's development in its first two years…

  16. Ensemble flood forecasting on the Tocantins River - Brazil

    NASA Astrophysics Data System (ADS)

    Fan, Fernando; Collischonn, Walter; Jiménez, Karena; Sorribas, Mino; Buarque, Diogo; Siqueira, Vinicius

    2014-05-01

    The Tocantins River basin is located in the northern region of Brazil and has about 300,000 km² of drainage area upstream of its confluence with the Araguaia River, its major tributary. The Tocantins River is intensely used for hydropower production, with seven major dams, including Tucuruí, the world's fourth largest in terms of installed capacity. In this context, the use of hydrological streamflow forecasts in this basin is very useful to support the decision-making process for reservoir operation, and can produce benefits by reducing damages from floods, increasing dam safety and upgrading efficiency in power generation. The occurrence of floods along the Tocantins River is a relatively frequent event; one recent example is the year 2012, when a large flood occurred in the Tocantins River with discharge peaks exceeding 16,000 m³/s, causing damage to cities located along the river. After this flooding event, a hydrological forecasting system was developed and has been operationally in use since mid-2012 in order to assist the decision making of dam operation along the river basin. The forecasting system is based on the MGB-IPH model, a large-scale distributed hydrological model, and initially used only telemetric data as observed information and deterministic rainfall forecasts from the Brazilian Meteorological Forecasting Centre (CPTEC) with 7-day lead time as input. 
Since August 2013 the system has been updated and now works with two new features: (i) a technique for merging TRMM real-time satellite precipitation estimates with gauged information, applied to reduce the uncertainty due to the lack of observed information over a portion of the basin, since the total number of rain gauges available is scarce compared to the total basin area; (ii) rainfall ensemble forecasts with 16-day lead time provided by the Global Ensemble Forecast System (GEFS), from the 2nd Generation of NOAA Global Ensemble Reforecast Data Set, maintained by the National Center for

  17. Ensemble stream flow predictions, a way towards better hydrological forecasting

    NASA Astrophysics Data System (ADS)

    Edlund, C.

    2009-04-01

    The hydrological forecasting division at SMHI has been using hydrological EPS and hydrological probability forecasts operationally for several years. The inputs to the hydrological model HBV are the EPS forecasts from ECMWF. From the ensemble, non-exceedance probabilities are estimated, and a final correction of the ensemble spread, based on evaluation, is applied. Ensemble stream flow predictions are made for about 80 indicator basins in Sweden where there is a real-time discharge gauge. The EPS runs are updated daily against the latest observed discharge. Flood probability maps for exceeding a certain threshold, i.e. a certain warning level, are produced automatically once a day. The flood probabilistic forecasts are based on an HBV model application (called HBV-Sv, HBV Sweden) that covers the whole country and consists of 1001 subbasins with an average size between 200 and 700 km². Probability computations for exceeding a certain warning level are made for each of these 1001 subbasins. Statistical flood levels have been calculated for each river sub-basin. Hydrological probability forecasts should be seen as an early warning product that can give better support in decision making to end-user communities, for instance Civil Protection Offices and County Administrative Boards, within flood risk management. The main limitations of probability forecasts are, on the one hand, difficulties in catching small-scale rain (mainly due to the resolution of meteorological models) and, on the other hand, that the hydrological model cannot be updated against observations in all subbasins. The benefits of working with probabilities consist, first of all, of a new approach to working with flood risk management and scenarios. A probability forecast can give an early indication for Civil Protection that "something is going to happen" and help gain time in preparing aid operations. 
The ensemble stream flow prediction at SMHI is integrated with the national forecasting system and the products
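For a raw ensemble, the warning-level probability product described above reduces to counting members beyond a threshold. The discharge values and warning level below are invented for illustration; operational systems would additionally apply the spread correction mentioned in the abstract.

```python
# Hypothetical EPS-driven discharge forecasts (m^3/s) for one subbasin
# and one lead time, one value per ensemble member.
members = [310.0, 280.0, 355.0, 400.0, 295.0, 330.0, 370.0, 260.0, 345.0, 390.0]
warning_level = 340.0  # hypothetical statistical flood level for this subbasin

# Fraction of members exceeding the warning level = exceedance probability.
p_exceed = sum(1 for q in members if q > warning_level) / len(members)
print(p_exceed)  # 5 of 10 members exceed 340.0, so 0.5
```

Mapping such probabilities per subbasin, once a day, gives exactly the kind of early-warning product the abstract describes.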

  18. Tree Tectonics

    NASA Astrophysics Data System (ADS)

    Vogt, Peter R.

    2004-09-01

    Nature often replicates her processes at different scales of space and time in differing media. Here a tree-trunk cross section I am preparing for a dendrochronological display at the Battle Creek Cypress Swamp Nature Sanctuary (Calvert County, Maryland) dried and cracked in a way that replicates practically all the planform features found along the Mid-Oceanic Ridge (see Figure 1). The left-lateral offset of saw marks, contrasting with the right-lateral ``rift'' offset, even illustrates the distinction between transcurrent (strike-slip) and transform faults, the latter only recognized as a geologic feature, by J. Tuzo Wilson, in 1965. However, wood cracking is but one of many examples of natural processes that replicate one or several elements of lithospheric plate tectonics. Many of these examples occur in everyday venues and thus make great teaching aids, ``teachable'' from primary school to university levels. Plate tectonics, the dominant process of Earth geology, also occurs in miniature on the surface of some lava lakes, and as ``ice plate tectonics'' on our frozen seas and lakes. Ice tectonics also happens at larger spatial and temporal scales on the Jovian moons Europa and perhaps Ganymede. Tabletop plate tectonics, in which a molten-paraffin ``asthenosphere'' is surfaced by a skin of congealing wax ``plates,'' first replicated Mid-Oceanic Ridge type seafloor spreading more than three decades ago. A seismologist (J. Brune, personal communication, 2004) discovered wax plate tectonics by casually and serendipitously pulling a stick across a container of molten wax his wife and daughters had used in making candles. Brune and his student D. Oldenburg followed up and mirabile dictu published the results in Science (178, 301-304).

  19. GACEM: Genetic Algorithm Based Classifier Ensemble in a Multi-sensor System

    PubMed Central

    Xu, Rongwu; He, Lin

    2008-01-01

    Multi-sensor systems (MSS) have been increasingly applied in pattern classification, while searching for the optimal classification framework is still an open problem. The development of the classifier ensemble seems to provide a promising solution. The classifier ensemble is a learning paradigm where many classifiers are jointly used to solve a problem, which has been proven an effective method for enhancing classification ability. In this paper, by introducing the concepts of Meta-feature (MF) and Trans-function (TF) for describing the relationship between the nature and the measurement of the observed phenomenon, classification in a multi-sensor system can be unified in the classifier ensemble framework. Then an approach called Genetic Algorithm based Classifier Ensemble in Multi-sensor system (GACEM) is presented, where a genetic algorithm is utilized to optimize both the selection of feature subsets and the decision combination simultaneously. GACEM first trains a number of classifiers based on different combinations of feature vectors and then selects those classifiers whose weight is higher than a pre-set threshold to make up the ensemble. An empirical study shows that, compared with conventional feature-level voting and decision-level voting, GACEM not only achieves better and more robust performance but also simplifies the system markedly.
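A toy version of the GACEM search can be sketched with a small genetic algorithm that evolves bitmasks selecting features for a nearest-mean classifier. The data, fitness function, and GA settings are invented stand-ins for the paper's multi-sensor setup and its joint optimization of feature subsets and decision combination.

```python
import random

random.seed(2)

def make_point(label):
    # features 0 and 1 are informative; features 2 and 3 are pure noise
    base = [label * 2.0, label * -2.0, 0.0, 0.0]
    return [b + random.gauss(0, 1.0 if i < 2 else 5.0)
            for i, b in enumerate(base)], label

data = [make_point(label) for label in (0, 1) for _ in range(40)]

def accuracy(mask):
    """Fitness: training accuracy of a nearest-mean classifier on the
    features switched on by the bitmask."""
    feats = [i for i, bit in enumerate(mask) if bit]
    if not feats:
        return 0.0
    means = {}
    for label in (0, 1):
        pts = [x for x, y in data if y == label]
        means[label] = [sum(p[i] for p in pts) / len(pts) for i in feats]
    def classify(x):
        return min((sum((x[i] - m[j]) ** 2 for j, i in enumerate(feats)), label)
                   for label, m in means.items())[1]
    return sum(classify(x) == y for x, y in data) / len(data)

def evolve(generations=20, pop_size=12):
    pop = [[random.randint(0, 1) for _ in range(4)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=accuracy, reverse=True)
        parents = pop[:pop_size // 2]          # elitism: keep the best half
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = random.sample(parents, 2)
            cut = random.randint(1, 3)
            child = a[:cut] + b[cut:]          # one-point crossover
            if random.random() < 0.2:          # bit-flip mutation
                i = random.randint(0, 3)
                child[i] = 1 - child[i]
            children.append(child)
        pop = parents + children
    return max(pop, key=accuracy)

best = evolve()
# the search should favour masks that keep the informative features
# and drop the noisy ones
```

GACEM's actual chromosome also encodes the decision-combination weights; here only the feature-selection half of that idea is illustrated.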

  20. Using an online game to evaluate effective methods of communicating ensemble model output to different audiences

    NASA Astrophysics Data System (ADS)

    Stephens, E. M.; Mylne, K.; Spiegelhalter, D.

    2011-12-01

    Effective communication of probabilistic forecasts for weather and climate applications is vital for improved understanding and decision making by the public and other end-users. Probabilistic predictions are frequently produced for uses such as hurricane warnings or climate change impact assessments, usually using ensemble prediction systems, but limited research has been undertaken to explore the best methods of communicating this information. The communication of forecasts produced by ensemble prediction systems is one of the major challenges facing meteorologists today. Most, if not all, ensemble output is not currently communicated to the general public, leading to a widening gap between what information is computed and what is provided. A lack of public understanding and the difficulty of presenting such complex probabilistic information are two reasons often cited for not communicating ensemble weather forecasts to the public. Using an online game we explore these issues by evaluating the ability of participants to make decisions using a number of different methods of presenting probabilistic temperature and rainfall predictions. Participants are segmented demographically to better understand how outcomes vary between audiences of different backgrounds and levels of expertise. The insights gained from this work on day-to-day weather forecasts have implications for effective communication across the wider ensemble modeling community.

  1. Layered Ensemble Architecture for Time Series Forecasting.

    PubMed

    Rahman, Md Mustafizur; Islam, Md Monirul; Murase, Kazuyuki; Yao, Xin

    2016-01-01

    Time series forecasting (TSF) has been widely used in many application areas such as science, engineering, and finance. The phenomena generating time series are usually unknown, and the information available for forecasting is limited to the past values of the series. It is, therefore, necessary to use an appropriate number of past values, termed the lag, for forecasting. This paper proposes a layered ensemble architecture (LEA) for TSF problems. Our LEA consists of two layers, each of which uses an ensemble of multilayer perceptron (MLP) networks. While the first ensemble layer tries to find an appropriate lag, the second ensemble layer employs the obtained lag for forecasting. Unlike most previous work on TSF, the proposed architecture considers both accuracy and diversity of the individual networks in constructing an ensemble. LEA trains different networks in the ensemble by using different training sets with an aim of maintaining diversity among the networks. However, it uses the appropriate lag and combines the best trained networks to construct the ensemble. This indicates LEA's emphasis on the accuracy of the networks. The proposed architecture has been tested extensively on time series data from the NN3 and NN5 forecasting competitions. It has also been tested on several standard benchmark time series data sets. In terms of forecasting accuracy, our experimental results clearly reveal that LEA is better than other ensemble and non-ensemble methods. PMID:25751882
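The first-layer task of LEA, finding an appropriate lag, can be caricatured with a held-out validation search. The mean-of-last-lag predictor and the period-4 toy series below are invented stand-ins for the paper's MLP ensembles and competition data.

```python
# Period-4 toy series: 1, 2, 3, 4, 1, 2, 3, 4, ...
series = [((i % 4) + 1) * 1.0 for i in range(40)]

def forecast(history, lag):
    """Naive predictor: mean of the last `lag` values."""
    return sum(history[-lag:]) / lag

def validation_error(series, lag, holdout=10):
    """Mean squared one-step error over the last `holdout` points."""
    errs = [(forecast(series[:t], lag) - series[t]) ** 2
            for t in range(len(series) - holdout, len(series))]
    return sum(errs) / len(errs)

# First-layer idea: pick the candidate lag with the best held-out error.
best_lag = min(range(1, 9), key=lambda k: validation_error(series, k))
print(best_lag)  # a full period (4) minimizes the error for this series
```

In LEA proper, both the lag scoring and the final forecast are done by trained MLP ensembles rather than this closed-form stand-in, but the select-by-validation structure is the same.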

  2. Fine-Tuning Your Ensemble's Jazz Style.

    ERIC Educational Resources Information Center

    Garcia, Antonio J.

    1991-01-01

    Proposes instructional strategies for directors of jazz groups, including guidelines for developing of skills necessary for good performance. Includes effective methods for positive changes in ensemble style. Addresses jazz group problems such as beat, tempo, staying in tune, wind power, and solo/ensemble lines. Discusses percussionists, bassists,…

  3. Visual stimuli recruit intrinsically generated cortical ensembles

    PubMed Central

    Miller, Jae-eun Kang; Ayzenshtat, Inbal; Carrillo-Reid, Luis; Yuste, Rafael

    2014-01-01

    The cortical microcircuit is built with recurrent excitatory connections, and it has long been suggested that the purpose of this design is to enable intrinsically driven reverberating activity. To understand the dynamics of neocortical intrinsic activity better, we performed two-photon calcium imaging of populations of neurons from the primary visual cortex of awake mice during visual stimulation and spontaneous activity. In both conditions, cortical activity is dominated by coactive groups of neurons, forming ensembles whose activation cannot be explained by the independent firing properties of their contributing neurons, considered in isolation. Moreover, individual neurons flexibly join multiple ensembles, vastly expanding the encoding potential of the circuit. Intriguingly, the same coactive ensembles can repeat spontaneously and in response to visual stimuli, indicating that stimulus-evoked responses arise from activating these intrinsic building blocks. Although the spatial properties of stimulus-driven and spontaneous ensembles are similar, spontaneous ensembles are active at random intervals, whereas visually evoked ensembles are time-locked to stimuli. We conclude that neuronal ensembles, built by the coactivation of flexible groups of neurons, are emergent functional units of cortical activity and propose that visual stimuli recruit intrinsically generated ensembles to represent visual attributes. PMID:25201983

  4. Applications of Bayesian Procrustes shape analysis to ensemble radar reflectivity nowcast verification

    NASA Astrophysics Data System (ADS)

    Fox, Neil I.; Micheas, Athanasios C.; Peng, Yuqiang

    2016-07-01

    This paper introduces the use of Bayesian full Procrustes shape analysis in object-oriented meteorological applications. In particular, the Procrustes methodology is used to generate mean forecast precipitation fields from a set of ensemble forecasts. This approach has advantages over other ensemble averaging techniques in that it can produce a forecast that retains the morphological features of the precipitation structures and present the range of forecast outcomes represented by the ensemble. The production of the ensemble mean avoids the problems of smoothing that result from simple pixel or cell averaging, while producing credible sets that retain information on ensemble spread. Also in this paper, the full Bayesian Procrustes scheme is used as an object verification tool for precipitation forecasts. This is an extension of a previously presented Procrustes shape analysis based verification approach into a full Bayesian format designed to handle the verification of precipitation forecasts that match objects from an ensemble of forecast fields to a single truth image. The methodology is tested on radar reflectivity nowcasts produced in the Warning Decision Support System - Integrated Information (WDSS-II) by varying parameters in the K-means cluster tracking scheme.
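Classical (non-Bayesian) Procrustes alignment, which underlies the scheme above, can be sketched for 2-D point sets: centre each member, rotate it onto a reference with the closed-form optimal 2-D rotation, and average pointwise. This is a generic illustration, not the paper's full Bayesian formulation, and the shapes are toy stand-ins for precipitation objects.

```python
import math

def centre(shape):
    """Translate a list of (x, y) points so its centroid is the origin."""
    cx = sum(x for x, _ in shape) / len(shape)
    cy = sum(y for _, y in shape) / len(shape)
    return [(x - cx, y - cy) for x, y in shape]

def align(shape, ref):
    """Rotate `shape` onto `ref` with the closed-form optimal 2-D rotation."""
    num = sum(yr * xs - xr * ys for (xs, ys), (xr, yr) in zip(shape, ref))
    den = sum(xr * xs + yr * ys for (xs, ys), (xr, yr) in zip(shape, ref))
    th = math.atan2(num, den)
    c, s = math.cos(th), math.sin(th)
    return [(c * x - s * y, s * x + c * y) for x, y in shape]

def procrustes_mean(shapes):
    """Align every member to the first shape, then average pointwise."""
    ref = centre(shapes[0])
    aligned = [align(centre(s), ref) for s in shapes]
    return [(sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
            for pts in zip(*aligned)]

square = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
rotated = [(0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (1.0, 0.0)]  # same shape, turned 90 degrees
mean_shape = procrustes_mean([square, rotated])
# alignment removes the rotation, so the mean recovers the original square
```

Because the members are aligned before averaging, the mean keeps the outline of the shape instead of smearing it, which is exactly the advantage over pixel averaging claimed in the abstract.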

  5. SIMULATION OF THE ICELAND VOLCANIC ERUPTION OF APRIL 2010 USING THE ENSEMBLE SYSTEM

    SciTech Connect

    Buckley, R.

    2011-05-10

    The Eyjafjallajokull volcanic eruption in Iceland in April 2010 disrupted transportation in Europe, which ultimately affected travel plans for many on a global basis. The Volcanic Ash Advisory Centre (VAAC) is responsible for providing guidance to the aviation industry on the transport of volcanic ash clouds. There are nine such centers located globally, and the London branch (headed by the United Kingdom Meteorological Office, or UKMet) was responsible for modeling the Iceland volcano. The guidance provided by the VAAC created some controversy due to the burdensome travel restrictions and the uncertainty involved in the prediction of ash transport. The Iceland volcanic eruption provides a useful exercise of the European ENSEMBLE program, coordinated by the Joint Research Centre (JRC) in Ispra, Italy. ENSEMBLE, a decision support system for emergency response, uses transport model results from a variety of countries in an effort to better understand the uncertainty involved with a given accident scenario. Model results in the form of airborne concentration and surface deposition are required from each member of the ensemble in a prescribed format that may then be uploaded to a website for manipulation. The Savannah River National Laboratory (SRNL) has been the lone regular United States participant throughout the 10-year existence of ENSEMBLE. For the Iceland volcano, four separate source term estimates have been provided to ENSEMBLE participants. This paper focuses on only one of those source terms. The SRNL results in relation to other modeling agencies' results, along with useful information obtained using an ensemble of transport results, will be discussed.

  6. Characterizing Ensembles of Superconducting Qubits

    NASA Astrophysics Data System (ADS)

    Sears, Adam; Birenbaum, Jeff; Hover, David; Rosenberg, Danna; Weber, Steven; Yoder, Jonilyn L.; Kerman, Jamie; Gustavsson, Simon; Kamal, Archana; Yan, Fei; Oliver, William

    We investigate ensembles of up to 48 superconducting qubits embedded within a superconducting cavity. Such arrays of qubits have been proposed for the experimental study of Ising Hamiltonians, and efficient methods to characterize and calibrate these types of systems are still under development. Here we leverage high qubit coherence (> 70 μs) to characterize individual devices as well as qubit-qubit interactions, utilizing the common resonator mode for a joint readout. This research was funded by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA) under Air Force Contract No. FA8721-05-C-0002. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, or the US Government.

  7. The Needs of Trees

    ERIC Educational Resources Information Center

    Boyd, Amy E.; Cooper, Jim

    2004-01-01

    Tree rings can be used not only to look at plant growth, but also to make connections between plant growth and resource availability. In this lesson, students in 2nd-4th grades use role-play to become familiar with basic requirements of trees and how availability of those resources is related to tree ring sizes and tree growth. These concepts can…

  8. Rydberg ensemble based CNOT^N gates using STIRAP

    NASA Astrophysics Data System (ADS)

    Gujarati, Tanvi; Duan, Luming

    2016-05-01

    Schemes for implementation of CNOT gates in atomic ensembles are important for realization of quantum computing. We present here a theoretical scheme of a CNOT^N gate with an ensemble of three-level atoms in the lambda configuration and a single two-level control atom. We work in the regime of Rydberg blockade for the ensemble atoms due to excitation of the Rydberg control atom. It is shown that using STIRAP, atoms from one ground state of the ensemble can be adiabatically transferred to the other ground state, depending on the state of the control atom. A thorough analysis of adiabatic conditions for this scheme and the influence of the radiative decay is provided. We show that the CNOT^N process is immune to the decay rate of the excited level in ensemble atoms. This work is supported by the ARL, the IARPA LogiQ program, and the AFOSR MURI program.

  9. ENCORE: Software for Quantitative Ensemble Comparison

    PubMed Central

    Tiberti, Matteo; Papaleo, Elena; Bengtsen, Tone; Boomsma, Wouter; Lindorff-Larsen, Kresten

    2015-01-01

    There is increasing evidence that protein dynamics and conformational changes can play an important role in modulating biological function. As a result, experimental and computational methods are being developed, often synergistically, to study the dynamical heterogeneity of a protein or other macromolecules in solution. Thus, methods such as molecular dynamics simulations or ensemble refinement approaches have provided conformational ensembles that can be used to understand protein function and biophysics. These developments have in turn created a need for algorithms and software that can be used to compare structural ensembles in the same way as the root-mean-square-deviation is often used to compare static structures. Although a few such approaches have been proposed, these can be difficult to implement efficiently, hindering broader application and further development. Here, we present an easily accessible software toolkit, called ENCORE, which can be used to compare conformational ensembles generated either from simulations alone or synergistically with experiments. ENCORE implements three previously described methods for ensemble comparison, each of which can be used to quantify the similarity between conformational ensembles by estimating the overlap between the probability distributions that underlie them. We demonstrate the kinds of insights that can be obtained by providing examples of three typical use-cases: comparing ensembles generated with different molecular force fields, assessing convergence in molecular simulations, and calculating differences and similarities in structural ensembles refined with various sources of experimental data. We also demonstrate efficient computational scaling for typical analyses, and robustness against both the size and sampling of the ensembles. ENCORE is freely available and extendable, integrates with the established MDAnalysis software package, reads ensemble data in many common formats, and can work with large

  10. ENCORE: Software for Quantitative Ensemble Comparison.

    PubMed

    Tiberti, Matteo; Papaleo, Elena; Bengtsen, Tone; Boomsma, Wouter; Lindorff-Larsen, Kresten

    2015-10-01

    There is increasing evidence that protein dynamics and conformational changes can play an important role in modulating biological function. As a result, experimental and computational methods are being developed, often synergistically, to study the dynamical heterogeneity of a protein or other macromolecules in solution. Thus, methods such as molecular dynamics simulations or ensemble refinement approaches have provided conformational ensembles that can be used to understand protein function and biophysics. These developments have in turn created a need for algorithms and software that can be used to compare structural ensembles in the same way as the root-mean-square-deviation is often used to compare static structures. Although a few such approaches have been proposed, these can be difficult to implement efficiently, hindering broader application and further development. Here, we present an easily accessible software toolkit, called ENCORE, which can be used to compare conformational ensembles generated either from simulations alone or synergistically with experiments. ENCORE implements three previously described methods for ensemble comparison, each of which can be used to quantify the similarity between conformational ensembles by estimating the overlap between the probability distributions that underlie them. We demonstrate the kinds of insights that can be obtained by providing examples of three typical use-cases: comparing ensembles generated with different molecular force fields, assessing convergence in molecular simulations, and calculating differences and similarities in structural ensembles refined with various sources of experimental data. We also demonstrate efficient computational scaling for typical analyses, and robustness against both the size and sampling of the ensembles. ENCORE is freely available and extendable, integrates with the established MDAnalysis software package, reads ensemble data in many common formats, and can work with large

  11. Medium Range Ensembles Flood Forecasts for Community Level Applications

    NASA Astrophysics Data System (ADS)

    Fakhruddin, S.; Kawasaki, A.; Babel, M. S.; AIT

    2013-05-01

    Early warning is a key element of disaster risk reduction. In recent decades, there have been major advances in medium-range and seasonal forecasting. These could provide a great opportunity to improve early warning systems and advisories for early action in strategic and long-term planning, resulting in increasing emphasis on proactive rather than reactive management of the adverse consequences of flood events. This can also be very helpful for the agricultural sector by providing a diversity of options to farmers (e.g., changing cropping patterns or planting timing). An experimental medium-range (1-10 day) flood forecasting model has been developed for Bangladesh which provides a 51-member ensemble of discharge forecasts at lead times of one to ten days with significant persistence and high certainty. This could help communities (e.g., farmers) with gain/loss estimation as well as crop savings. This paper describes the application of ensemble probabilistic flood forecasts at the community level for differential decision making focused on agriculture. The framework allows users to interactively specify the objectives and criteria that are germane to a particular situation, and to obtain the possible management options and the exogenous influences that should be taken into account before planning and decision making. Risk and vulnerability assessment was conducted through community consultation. The forecast lead-time requirements, users' needs, and the impacts and management options for the crop, livestock and fisheries sectors were identified through focus group discussions, informal interviews and a questionnaire survey.

  12. Modeling Dynamic Systems with Efficient Ensembles of Process-Based Models

    PubMed Central

    Simidjievski, Nikola; Todorovski, Ljupčo; Džeroski, Sašo

    2016-01-01

    Ensembles are a well-established machine learning paradigm, leading to accurate and robust models, predominantly applied to predictive modeling tasks. Ensemble models comprise a finite set of diverse predictive models whose combined output is expected to yield improved predictive performance compared to an individual model. In this paper, we propose a new method for learning ensembles of process-based models of dynamic systems. The process-based modeling paradigm employs domain-specific knowledge to automatically learn models of dynamic systems from time-series observational data. Previous work has shown that ensembles based on sampling observational data (i.e., bagging and boosting) significantly improve the predictive performance of process-based models. However, this improvement comes at the cost of a substantial increase in the computational time needed for learning. To address this problem, the paper proposes a method that aims at efficiently learning ensembles of process-based models while maintaining their accurate long-term predictive performance. This is achieved by constructing ensembles by sampling domain-specific knowledge instead of sampling data. We apply the proposed method to a set of automated predictive modeling problems in three lake ecosystems, using a library of process-based knowledge for modeling population dynamics, and evaluate its performance. The experimental results identify the optimal design decisions regarding the learning algorithm. The results also show that the proposed ensembles yield significantly more accurate predictions of population dynamics than individual process-based models. Finally, while their predictive performance is comparable to that of ensembles obtained with the state-of-the-art methods of bagging and boosting, they are substantially more efficient. PMID:27078633

  13. Modeling Dynamic Systems with Efficient Ensembles of Process-Based Models.

    PubMed

    Simidjievski, Nikola; Todorovski, Ljupčo; Džeroski, Sašo

    2016-01-01

    Ensembles are a well-established machine learning paradigm, leading to accurate and robust models, predominantly applied to predictive modeling tasks. Ensemble models comprise a finite set of diverse predictive models whose combined output is expected to yield improved predictive performance compared to an individual model. In this paper, we propose a new method for learning ensembles of process-based models of dynamic systems. The process-based modeling paradigm employs domain-specific knowledge to automatically learn models of dynamic systems from time-series observational data. Previous work has shown that ensembles based on sampling observational data (i.e., bagging and boosting) significantly improve the predictive performance of process-based models. However, this improvement comes at the cost of a substantial increase in the computational time needed for learning. To address this problem, the paper proposes a method that aims at efficiently learning ensembles of process-based models while maintaining their accurate long-term predictive performance. This is achieved by constructing ensembles by sampling domain-specific knowledge instead of sampling data. We apply the proposed method to a set of automated predictive modeling problems in three lake ecosystems, using a library of process-based knowledge for modeling population dynamics, and evaluate its performance. The experimental results identify the optimal design decisions regarding the learning algorithm. The results also show that the proposed ensembles yield significantly more accurate predictions of population dynamics than individual process-based models. Finally, while their predictive performance is comparable to that of ensembles obtained with the state-of-the-art methods of bagging and boosting, they are substantially more efficient. PMID:27078633

  14. A dynamic fault tree model of a propulsion system

    NASA Technical Reports Server (NTRS)

    Xu, Hong; Dugan, Joanne Bechta; Meshkat, Leila

    2006-01-01

    We present a dynamic fault tree model of the benchmark propulsion system, and solve it using Galileo. Dynamic fault trees (DFT) extend traditional static fault trees with special gates to model spares and other sequence dependencies. Galileo solves DFT models using a judicious combination of automatically generated Markov and Binary Decision Diagram models. Galileo easily handles the complexities exhibited by the benchmark problem. In particular, Galileo is designed to model phased mission systems.

  15. Hybrid Data Assimilation without Ensemble Filtering

    NASA Technical Reports Server (NTRS)

    Todling, Ricardo; Akkraoui, Amal El

    2014-01-01

    The Global Modeling and Assimilation Office is preparing to upgrade its three-dimensional variational system to a hybrid approach in which the ensemble is generated using a square-root ensemble Kalman filter (EnKF) and the variational problem is solved using the Grid-point Statistical Interpolation system. As in most EnKF applications, we found it necessary to employ a combination of multiplicative and additive inflations to compensate for sampling and modeling errors, respectively, and to maintain the small-member ensemble solution close to the variational solution; we also found it necessary to re-center the members of the ensemble about the variational analysis. During tuning of the filter we found re-centering and additive inflation to play a considerably larger role than expected, particularly in a dual-resolution context when the variational analysis is run at higher resolution than the ensemble. This led us to consider a hybrid strategy in which the members of the ensemble are generated by simply converting the variational analysis to the resolution of the ensemble and applying additive inflation, thus bypassing the EnKF. Comparisons of this so-called filter-free hybrid procedure with an EnKF-based hybrid procedure and a control non-hybrid, traditional scheme show both hybrid strategies to provide equally significant improvement over the control; more interestingly, the filter-free procedure was found to give qualitatively similar results to the EnKF-based procedure.
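The re-centering and additive-inflation steps described in this record can be sketched as follows; the function name, list-based state vectors, and the Gaussian noise model are illustrative assumptions (the actual GSI/EnKF machinery is far more involved).

```python
import random

def recenter_and_inflate(members, analysis, noise_std=0.0, seed=0):
    """Shift ensemble members so their mean equals `analysis` (re-centering),
    then optionally add Gaussian perturbations (additive inflation).
    `members` is a list of state vectors (lists of floats)."""
    rng = random.Random(seed)
    n = len(members)
    dim = len(analysis)
    mean = [sum(m[j] for m in members) / n for j in range(dim)]
    out = []
    for m in members:
        shifted = [m[j] - mean[j] + analysis[j] for j in range(dim)]
        if noise_std > 0:
            shifted = [x + rng.gauss(0.0, noise_std) for x in shifted]
        out.append(shifted)
    return out
```

With `noise_std` at zero this is pure re-centering; the filter-free strategy in the record amounts to generating all members this way from the (down-converted) variational analysis, skipping the EnKF update entirely.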

  16. The impact of initial spread calibration on the RELO ensemble and its application to Lagrangian dynamics

    NASA Astrophysics Data System (ADS)

    Wei, M.; Jacobs, G.; Rowley, C.; Barron, C. N.; Hogan, P.; Spence, P.; Smedstad, O. M.; Martin, P.; Muscarella, P.; Coelho, E.

    2013-09-01

    A number of real-time ocean model forecasts were carried out successfully at Naval Research Laboratory (NRL) to provide modeling support and numerical guidance to the CARTHE GLAD at-sea experiment during summer 2012. Two RELO ensembles and three single models using NCOM and HYCOM with different resolutions were run. A calibrated ensemble system with enhanced spread and reliability was developed to better support this experiment. The calibrated ensemble is found to outperform the un-calibrated ensemble in forecasting accuracy, skill, and reliability for all the variables and observation spaces evaluated. The metrics used in this paper include RMS error, anomaly correlation, PECA, Brier score, spread reliability, and Talagrand rank histogram. It is also found that even the un-calibrated ensemble outperforms the single forecast from the model with the same resolution. The advantages of the ensembles are further extended to the Lagrangian framework. In contrast to a single model forecast, the RELO ensemble provides not only the most likely Lagrangian trajectory for a particle in the ocean, but also an uncertainty estimate that directly reflects the complicated ocean dynamics, which is valuable for decision makers. The examples show that the calibrated ensemble with more reliability can capture trajectories in different, even opposite, directions, which would be missed by the un-calibrated ensemble. The ensembles are applied to compute the repelling and attracting Lagrangian coherent structures (LCSs), and the uncertainties of the LCSs, which are hard to obtain from a single model forecast, are estimated. It is found that the spatial scales of the LCSs depend on the model resolution. The model with the highest resolution produces the finest small-scale LCS structures, while the model with the lowest resolution generates only large-scale LCSs. The repelling and attracting LCSs are found to intersect at many locations and create complex mesoscale eddies.
The fluid

  17. Evaluating reliability and resolution of ensemble forecasts using information theory

    NASA Astrophysics Data System (ADS)

    Weijs, Steven; van de Giesen, Nick

    2010-05-01

    Ensemble forecasts are increasingly popular for the communication of uncertainty towards the public and decision makers. Ideally, an ensemble forecast reflects both the uncertainty and the information in a forecast, which means that the spread in the ensemble should accurately represent the true uncertainty. For ensembles to be useful, they should be probabilistic, as probability is the language to precisely describe an incomplete state of knowledge, which is typical of forecasts. Information theory provides the ideal tools to deal with uncertainty and information in forecasts. Essential to the use and development of models and forecasts are ways to evaluate their quality. Without a proper definition of what is good, it is impossible to improve forecasts. In contrast to forecast value, which is user dependent, forecast quality, which is defined as the correspondence between forecasts and observations, can be objectively defined, given the question that is asked. The evaluation of forecast quality is known as forecast verification. Numerous techniques for forecast verification have been developed over the past decades. The Brier score (BS) and the derived Ranked Probability Score (RPS) are among the most widely used scores for measuring forecast quality. Both of these scores can be split into three additive components: uncertainty, reliability and resolution. While the first component, uncertainty, just depends on the inherent variability in the forecasted event, the latter two measure different aspects of the quality of forecasts themselves. Resolution measures the difference between the conditional probabilities and the marginal probabilities of occurrence. The third component, reliability, measures the conditional bias in the probability estimates, hence unreliability would be a better name. In this work, we argue that information theory should be adopted as the correct framework for measuring quality of probabilistic ensemble forecasts. We use the information
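The three-component split mentioned in this record is Murphy's decomposition of the Brier score, BS = REL - RES + UNC. A minimal sketch (binning forecasts by their exact value, so the identity holds exactly; the function name is illustrative):

```python
from collections import defaultdict

def brier_decomposition(forecasts, outcomes):
    """Murphy decomposition of the Brier score: BS = REL - RES + UNC.
    `forecasts` are probabilities in [0, 1]; `outcomes` are 0/1 events."""
    n = len(forecasts)
    bs = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / n
    obar = sum(outcomes) / n          # climatological base rate
    unc = obar * (1 - obar)           # uncertainty: variability of the event
    bins = defaultdict(list)          # group outcomes by forecast value
    for p, o in zip(forecasts, outcomes):
        bins[p].append(o)
    # reliability: conditional bias of the probability estimates
    rel = sum(len(os) * (p - sum(os) / len(os)) ** 2
              for p, os in bins.items()) / n
    # resolution: spread of conditional frequencies around the base rate
    res = sum(len(os) * (sum(os) / len(os) - obar) ** 2
              for p, os in bins.items()) / n
    return bs, rel, res, unc
```

A perfectly reliable forecast has REL = 0; a forecast that always issues the climatological probability has RES = 0, leaving BS equal to the irreducible UNC term.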

  18. Fault tree handbook

    SciTech Connect

    Haasl, D.F.; Roberts, N.H.; Vesely, W.E.; Goldberg, F.F.

    1981-01-01

    This handbook describes a methodology for reliability analysis of complex systems such as those which comprise the engineered safety features of nuclear power generating stations. After an initial overview of the available system analysis approaches, the handbook focuses on a description of the deductive method known as fault tree analysis. The following aspects of fault tree analysis are covered: basic concepts for fault tree analysis; basic elements of a fault tree; fault tree construction; probability, statistics, and Boolean algebra for the fault tree analyst; qualitative and quantitative fault tree evaluation techniques; and computer codes for fault tree evaluation. Also discussed are several example problems illustrating the basic concepts of fault tree construction and evaluation.

  19. An Ensemble Weighting Approach for Dendroclimatology: Drought Reconstructions for the Northeastern Tibetan Plateau

    PubMed Central

    Fang, Keyan; Wilmking, Martin; Davi, Nicole; Zhou, Feifei; Liu, Changzhi

    2014-01-01

    Traditional detrending methods assign equal mean value to all tree-ring series for chronology development, even though mean annual growth changes across time periods. We find that the strength of a tree-ring model can be improved by giving more weight to tree-ring series that have a stronger climate signal and less weight to series that have a weaker signal. We thus present an ensemble weighting method to mitigate these potential biases and to more accurately extract the climate signals in dendroclimatology studies. This new method has been used to develop the first annual precipitation reconstruction (previous August to current July) at Songmingyan Mountain and to recalculate the tree-ring chronology from the Shenge site in the Dulan area of the northeastern Tibetan Plateau (TP), a marginal area of the Asian summer monsoon. The ensemble weighting method explains 31.7% of the instrumental variance for the reconstruction at Songmingyan Mountain and 57.3% of the instrumental variance in the Dulan area, which are higher than those obtained using traditional methods. We focus on the newly introduced reconstruction at Songmingyan Mountain, which shows extremely dry (wet) epochs from 1862–1874, 1914–1933 and 1991–1999 (1882–1905). These dry/wet epochs were also found in the marginal areas of the summer monsoon and the Indian subcontinent, indicating linkages between regional hydroclimate changes and the Indian summer monsoon. PMID:24497967
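The weighting idea in this record, more weight to series with a stronger climate signal, can be illustrated as a correlation-weighted mean. This is a schematic sketch, not the authors' exact scheme; flooring negative correlations at zero and the function names are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def weighted_chronology(series, climate):
    """Average detrended ring-width series year by year, weighting each
    series by its (positive) correlation with the instrumental record."""
    w = [max(pearson(s, climate), 0.0) for s in series]
    total = sum(w)
    years = len(series[0])
    return [sum(wi * s[t] for wi, s in zip(w, series)) / total
            for t in range(years)]
```

With equal weights this reduces to the traditional unweighted chronology; the gain comes from down-weighting series whose growth is dominated by non-climatic noise.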

  20. An ensemble weighting approach for dendroclimatology: drought reconstructions for the northeastern Tibetan Plateau.

    PubMed

    Fang, Keyan; Wilmking, Martin; Davi, Nicole; Zhou, Feifei; Liu, Changzhi

    2014-01-01

    Traditional detrending methods assign equal mean value to all tree-ring series for chronology development, even though mean annual growth changes across time periods. We find that the strength of a tree-ring model can be improved by giving more weight to tree-ring series that have a stronger climate signal and less weight to series that have a weaker signal. We thus present an ensemble weighting method to mitigate these potential biases and to more accurately extract the climate signals in dendroclimatology studies. This new method has been used to develop the first annual precipitation reconstruction (previous August to current July) at Songmingyan Mountain and to recalculate the tree-ring chronology from the Shenge site in the Dulan area of the northeastern Tibetan Plateau (TP), a marginal area of the Asian summer monsoon. The ensemble weighting method explains 31.7% of the instrumental variance for the reconstruction at Songmingyan Mountain and 57.3% of the instrumental variance in the Dulan area, which are higher than those obtained using traditional methods. We focus on the newly introduced reconstruction at Songmingyan Mountain, which shows extremely dry (wet) epochs from 1862-1874, 1914-1933 and 1991-1999 (1882-1905). These dry/wet epochs were also found in the marginal areas of the summer monsoon and the Indian subcontinent, indicating linkages between regional hydroclimate changes and the Indian summer monsoon. PMID:24497967

  1. In silico prediction of toxicity of non-congeneric industrial chemicals using ensemble learning based modeling approaches

    SciTech Connect

    Singh, Kunwar P.; Gupta, Shikha

    2014-03-15

    Ensemble-learning-based decision treeboost (DTB) and decision tree forest (DTF) models are introduced in order to establish a quantitative structure–toxicity relationship (QSTR) for the prediction of the toxicity of 1450 diverse chemicals. Eight non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals was evaluated using the Tanimoto similarity index. DTB and DTF models, supplemented by stochastic gradient boosting and bagging algorithms, were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was drawn to the prediction ability and robustness of the models, investigated both in external and 10-fold cross-validation processes. On the complete data, optimal DTB and DTF models rendered accuracies of 98.90% and 98.83% in two-category and 98.14% and 98.14% in four-category toxicity classifications. Both models further yielded classification accuracies of 100% on external toxicity data for T. pyriformis. The constructed regression models (DTB and DTF) using five descriptors yielded correlation coefficients (R²) of 0.945 and 0.944 between the measured and predicted toxicities, with mean squared errors (MSEs) of 0.059 and 0.064 on the complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R² and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest wide applicability of the inter-species models in predicting the toxicity of new chemicals for regulatory purposes. These approaches provide a useful strategy and robust tools for screening the ecotoxicological risk or environmental hazard potential of chemicals. - Graphical abstract: Importance of input variables in DTB and DTF classification models for (a) two-category, and (b) four-category toxicity intervals in T. pyriformis data. Generalization and predictive abilities of the
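As a minimal illustration of the kind of tree-ensemble learning behind DTB and DTF (bootstrap-sampled trees voting on a class), here is a sketch using one-split decision stumps in place of full trees; all names and details are illustrative, not the authors' implementation.

```python
import random

def train_stump(data):
    """Fit a one-split decision stump on (feature_vector, label) pairs:
    try every feature/threshold/direction and keep the split with the
    fewest training errors."""
    best = None
    n_features = len(data[0][0])
    for j in range(n_features):
        for thr in sorted({x[j] for x, _ in data}):
            for sign in (1, -1):
                errs = sum(1 for x, y in data
                           if (1 if sign * x[j] > sign * thr else 0) != y)
                if best is None or errs < best[0]:
                    best = (errs, j, thr, sign)
    _, j, thr, sign = best
    return lambda x: 1 if sign * x[j] > sign * thr else 0

def bagged_ensemble(data, n_trees=25, seed=0):
    """Bagging: each stump is trained on a bootstrap sample of the data;
    the ensemble prediction is a majority vote."""
    rng = random.Random(seed)
    stumps = [train_stump([rng.choice(data) for _ in data])
              for _ in range(n_trees)]
    return lambda x: 1 if sum(s(x) for s in stumps) * 2 > len(stumps) else 0
```

Boosting (as in DTB) differs in that trees are fitted sequentially, each one weighted toward the examples its predecessors got wrong, rather than on independent bootstrap samples.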

  2. Towards reliable seasonal ensemble streamflow forecasts for ephemeral rivers

    NASA Astrophysics Data System (ADS)

    Bennett, James; Wang, Qj; Li, Ming; Robertson, David

    2016-04-01

    Despite their inherently variable nature, ephemeral rivers are an important water resource in many dry regions. Water managers are likely to benefit considerably from even mildly skilful ensemble forecasts of streamflow in ephemeral rivers. As with any ensemble forecast, forecast uncertainty - i.e., the spread of the ensemble - must be reliably quantified to allow users of the forecasts to make well-founded decisions. Correctly quantifying uncertainty in ephemeral rivers is particularly challenging because of the high incidence of zero flows, which are difficult to handle with conventional statistical techniques. Here we apply a seasonal streamflow forecasting system, the model for generating Forecast Guided Stochastic Scenarios (FoGSS), to 26 Australian ephemeral rivers. FoGSS uses post-processed ensemble rainfall forecasts from a coupled ocean-atmosphere prediction system to force an initialised monthly rainfall-runoff model, and then applies a staged hydrological error model to describe and propagate hydrological uncertainty in the forecast. FoGSS produces 12-month streamflow forecasts; as forecast skill declines with lead time, the forecasts are designed to transition seamlessly to stochastic scenarios. The ensemble rainfall forecasts used in FoGSS are known to be unbiased and reliable, and we concentrate here on the hydrological error model. The FoGSS error model has several features that make it well suited to forecasting ephemeral rivers. First, FoGSS models the error after data is transformed with a log-sinh transformation. The log-sinh transformation is able to normalise even highly skewed data and homogenise its variance, allowing us to assume that errors are Gaussian. Second, FoGSS handles zero values using data censoring. Data censoring allows streamflow in ephemeral rivers to be treated as a continuous variable, rather than having to model the occurrence and distribution of non-zero values separately. 
This greatly simplifies parameter
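A minimal sketch of the log-sinh transform and censored back-transform described in this record; the parameter values `a` and `b`, the zero-flow threshold, and the function names are illustrative assumptions, not FoGSS's fitted values or API.

```python
import math

A, B = 0.1, 0.5  # illustrative transform parameters, not fitted values

def log_sinh(q, a=A, b=B):
    """Forward log-sinh transform: z = log(sinh(a + b*q)) / b.
    Normalises skewed flow data so errors can be treated as Gaussian."""
    return math.log(math.sinh(a + b * q)) / b

def inv_log_sinh(z, a=A, b=B):
    """Exact inverse of the forward transform."""
    return (math.asinh(math.exp(b * z)) - a) / b

def back_transform(z, zero_threshold=None, a=A, b=B):
    """Back-transform a value sampled in transformed space. With data
    censoring, anything at or below the transform of zero flow is mapped
    to an exact zero flow, so zeros need no separate occurrence model."""
    if zero_threshold is None:
        zero_threshold = log_sinh(0.0, a, b)
    return 0.0 if z <= zero_threshold else inv_log_sinh(z, a, b)
```

The censoring step is what lets the Gaussian error model in transformed space assign genuine probability mass to zero flows instead of arbitrarily small positive ones.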

  3. Developing planning hydrologic ensembles that reflect combined paleoclimate and projected climate information sets

    NASA Astrophysics Data System (ADS)

    Prairie, J. R.; Brekke, L.; Pruitt, T.; Rajagopalan, B.; Woodhouse, C.

    2008-12-01

    Historically, Reclamation has performed probabilistic analysis to assess risk and reliability considering only the instrumental record. Understanding that the assumption of a future similar to the relatively short instrumental past is losing credibility for long-term planning, Reclamation has conducted recent studies in the Colorado River Basin involving methods that relate water supply assumptions to a blend of the instrumental record with tree-ring based reconstructed flow information. In addition, Reclamation has conducted studies in California that relate projected climate information to natural runoff change and adjusted water supply assumptions for long-term simulation of Central Valley Project and State Water Project operations. Both methods provide means to estimate probabilities and risks in water management that do not consider only the relatively short instrumental record. Motivated by both of these efforts, Reclamation is exploring a method that relates blended tree-ring based reconstructed flow information and projected hydroclimate information to ensemble water supply assumptions suitable for long-term planning. The presentation will focus on method application and results in the Missouri River Basin above Toston, Montana. The method builds on a recently published nonparametric method that resamples the flow magnitudes of the instrumental record conditioned on the hydrologic "state" sequences from tree-ring based reconstructions. In this application, magnitudes from the instrumental record are replaced by magnitudes from runoff simulations consistent with climate projections. The resultant hydrologic ensemble is then compared to the ensemble consistent with only projected climate information and runoff projections, to explore the advantages and disadvantages of conditioning the climate projections on paleoclimate information. This will be accomplished by comparison of ensemble descriptive statistics and probabilities of drought and surplus events.

  4. Categorizing Ideas about Trees: A Tree of Trees

    PubMed Central

    Fisler, Marie; Lecointre, Guillaume

    2013-01-01

    The aim of this study is to explore whether matrices and MP trees used to produce systematic categories of organisms could be useful to produce categories of ideas in the history of science. We study the history of the use of trees in systematics to represent the diversity of life from 1766 to 1991. We apply to those ideas a method inspired by the coding of homologous parts of organisms. We discretize conceptual parts of ideas, writings and drawings about trees contained in 41 main writings; we detect shared parts among authors and code them into a 91-character matrix and use a tree representation to show who shares what with whom. In other words, we propose a hierarchical representation of the shared ideas about trees among authors: this produces a “tree of trees.” Then, we categorize schools of tree-representations. Classical schools like “cladists” and “pheneticists” are recovered but others are not: “gradists” are separated into two blocks, one of them being called here “grade theoreticians.” We propose new interesting categories like the “buffonian school,” the “metaphoricians,” and those using “strictly genealogical classifications.” We consider that networks are not useful to represent shared ideas at the present step of the study. A cladogram is made to show who shares what with whom, as well as the heterobathmy and homoplasy of characters. The present cladogram does not model processes of transmission of ideas about trees; here it is mostly used to test for proximity of ideas of the same age and for categorization. PMID:23950877

  5. Teaching the Tools of Pharmaceutical Care Decision-Analysis.

    ERIC Educational Resources Information Center

    Rittenhouse, Brian E.

    1994-01-01

    A method of decision-analysis in pharmaceutical care that integrates epidemiology and economics is presented, including an example illustrating both the deceptive nature of medical decision making and the power of decision analysis. Principles in determining both general and specific probabilities of interest and use of decision trees for…
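
    The core mechanic of the decision analysis taught here, folding back a decision tree by weighting outcomes with their probabilities, can be sketched briefly. This is an invented illustration: the two treatment options, cure probabilities, and utility values below are hypothetical, not from the cited article.

```python
# Fold back a chance node of a decision tree: each branch is a
# (probability, payoff) pair, and the node's value is the expectation.
def expected_value(branches):
    assert abs(sum(p for p, _ in branches) - 1.0) < 1e-9
    return sum(p * v for p, v in branches)

# Hypothetical comparison: drug A cures more often; drug B fails more
# often but its failures are milder (utility 0.5 vs 0.2).
drug_a = expected_value([(0.85, 1.0), (0.15, 0.2)])
drug_b = expected_value([(0.70, 1.0), (0.30, 0.5)])
best = "A" if drug_a > drug_b else "B"
```

    Sensitivity analysis in this framework amounts to re-running the fold-back while varying the probabilities or utilities.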

  6. Foraging Behaviour in Magellanic Woodpeckers Is Consistent with a Multi-Scale Assessment of Tree Quality

    PubMed Central

    Vergara, Pablo M.; Soto, Gerardo E.; Rodewald, Amanda D.; Meneses, Luis O.; Pérez-Hernández, Christian G.

    2016-01-01

    Theoretical models predict that animals should make foraging decisions after assessing the quality of available habitat, but most models fail to consider the spatio-temporal scales at which animals perceive habitat availability. We tested three foraging strategies that explain how Magellanic woodpeckers (Campephilus magellanicus) assess the relative quality of trees: 1) Woodpeckers with local knowledge select trees based on the available trees in the immediate vicinity. 2) Woodpeckers lacking local knowledge select trees based on their availability at previously visited locations. 3) Woodpeckers using information from long-term memory select trees based on knowledge about trees available within the entire landscape. We observed foraging woodpeckers and used a Brownian Bridge Movement Model to identify trees available to woodpeckers along foraging routes. Woodpeckers selected trees with a later decay stage than available trees. Selection models indicated that preferences of Magellanic woodpeckers were based on clusters of trees near the most recently visited trees, thus suggesting that woodpeckers use visual cues from neighboring trees. In a second analysis, Cox’s proportional hazards models showed that woodpeckers used information consolidated across broader spatial scales to adjust tree residence times. Specifically, woodpeckers spent more time at trees with larger diameters and in a more advanced stage of decay than trees available along their routes. These results suggest that Magellanic woodpeckers make foraging decisions based on the relative quality of trees that they perceive and memorize information at different spatio-temporal scales. PMID:27416115

  7. Foraging Behaviour in Magellanic Woodpeckers Is Consistent with a Multi-Scale Assessment of Tree Quality.

    PubMed

    Vergara, Pablo M; Soto, Gerardo E; Moreira-Arce, Darío; Rodewald, Amanda D; Meneses, Luis O; Pérez-Hernández, Christian G

    2016-01-01

    Theoretical models predict that animals should make foraging decisions after assessing the quality of available habitat, but most models fail to consider the spatio-temporal scales at which animals perceive habitat availability. We tested three foraging strategies that explain how Magellanic woodpeckers (Campephilus magellanicus) assess the relative quality of trees: 1) Woodpeckers with local knowledge select trees based on the available trees in the immediate vicinity. 2) Woodpeckers lacking local knowledge select trees based on their availability at previously visited locations. 3) Woodpeckers using information from long-term memory select trees based on knowledge about trees available within the entire landscape. We observed foraging woodpeckers and used a Brownian Bridge Movement Model to identify trees available to woodpeckers along foraging routes. Woodpeckers selected trees with a later decay stage than available trees. Selection models indicated that preferences of Magellanic woodpeckers were based on clusters of trees near the most recently visited trees, thus suggesting that woodpeckers use visual cues from neighboring trees. In a second analysis, Cox's proportional hazards models showed that woodpeckers used information consolidated across broader spatial scales to adjust tree residence times. Specifically, woodpeckers spent more time at trees with larger diameters and in a more advanced stage of decay than trees available along their routes. These results suggest that Magellanic woodpeckers make foraging decisions based on the relative quality of trees that they perceive and memorize information at different spatio-temporal scales. PMID:27416115
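
    The selection models in this record score a tree relative to the trees available around it. The sketch below is a hedged, invented illustration of that relative-quality idea in a conditional-logit form (commonly used in resource-selection analysis); the coefficient and decay-stage values are assumptions, not estimates from the study.

```python
import math

# Probability of selecting each locally available tree, increasing with
# its decay stage relative to the alternatives (conditional-logit form).
# beta is an invented selection coefficient.
def selection_probs(decay_stages, beta=1.2):
    scores = [math.exp(beta * d) for d in decay_stages]
    total = sum(scores)
    return [s / total for s in scores]

# Three available trees with decay stages 1 (sound) .. 4 (well decayed):
probs = selection_probs([1, 3, 4])
```

    Because the scores are normalized within the available set, the same tree can be attractive in one neighborhood and unattractive in another, which is the multi-scale point the abstract makes.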

  8. Cooperative effects of neuronal ensembles.

    PubMed

    Rose, G; Siebler, M

    1995-01-01

    Electrophysiological properties of neurons as the basic cellular elements of the central nervous system and their synaptic connections are well characterized down to a molecular level. However, the behavior of complex noisy networks formed by these constituents usually cannot simply be derived from the knowledge of their microscopic parameters. As a consequence, cooperative phenomena based on the interaction of neurons were postulated. This is a report on a study of global network spike activity as a function of synaptic interaction. We performed experiments in dissociated cultured hippocampal neurons and, for comparison, simulations of a mathematical model closely related to electrophysiology. Numeric analyses revealed that at a critical level of synaptic connectivity the firing behavior undergoes a phase transition. This cooperative effect depends crucially on the interaction of numerous cells and cannot be attributed to the spike threshold of individual neurons. In the experiment a drastic increase in the firing level was observed upon increase of synaptic efficacy by lowering of the extracellular magnesium concentration, which is compatible with our theoretical predictions. This "on-off" phenomenon demonstrates that even in small neuronal ensembles collective behavior can emerge which is not explained by the characteristics of single neurons. PMID:8542966
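
    The qualitative "on-off" effect described here can be reproduced with a toy model. This is not the paper's model: the network below is an invented sketch of threshold units on a random graph, where sweeping the connection probability shows sustained activity appearing abruptly once connectivity is dense enough.

```python
import random

def steady_activity(n=200, p=0.05, threshold=3, steps=50, seed=1):
    """Fraction of units still firing after iterating a threshold rule:
    a unit fires if at least `threshold` of its inputs fired last step."""
    rng = random.Random(seed)
    # Random directed graph: links[i] lists the units feeding unit i.
    links = [[j for j in range(n) if j != i and rng.random() < p]
             for i in range(n)]
    active = set(rng.sample(range(n), n // 4))   # random initial firing
    for _ in range(steps):
        active = {i for i in range(n)
                  if sum(j in active for j in links[i]) >= threshold}
        if not active:
            break
    return len(active) / n

low = steady_activity(p=0.01)   # sparse coupling: activity typically dies
high = steady_activity(p=0.10)  # dense coupling: activity is sustained
```

    No single unit's threshold changed between the two runs; only the connectivity did, which mirrors the collective nature of the transition.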

  9. Decision Making in Assessment and Early Intervention Planning.

    ERIC Educational Resources Information Center

    Crais, Elizabeth R.; Roberts, Joanne E.

    1991-01-01

    This article presents a series of decision trees to help in planning assessment and intervention with handicapped children between three months and five years of age. A series of assessment questions leads to suggestions for intervention. Steps in using the decision trees are given and a case example is presented. (Author/DB)

  10. The best ensembles of RCMs for climate change projections in Ukraine

    NASA Astrophysics Data System (ADS)

    Krakovska, Svitlana; Gnatiuk, Natalia; Palamarchuk, Liudmyla; Shedemenko, Iryna

    2013-04-01

    ensemble of 10 RCMs all errors were minimal; specifically, an average areal absolute error was only -0.07 °C. The same methodology was applied to multi-year monthly precipitation amounts, but in this case correlation coefficients should be among the most decisive parameters. The other two are RMSD and standard deviation, and Taylor diagrams are most useful for precipitation verification in this case. Unfortunately, just 8 RCMs in the period 1961-1990 and 5 RCMs in the period 1991-2010 ha
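
    The verification statistics named in this record, correlation, RMSD, and standard deviation, are exactly the quantities a Taylor diagram relates. The sketch below computes them for one invented RCM series against invented observations and checks Taylor's identity linking centred RMSD to the other two; the monthly values are assumptions for illustration only.

```python
import math

# Invented 12-month series: observations and one RCM simulation.
obs = [32.0, 28.0, 35.0, 44.0, 60.0, 78.0, 90.0, 82.0, 55.0, 47.0, 40.0, 36.0]
rcm = [30.0, 25.0, 38.0, 49.0, 66.0, 70.0, 95.0, 80.0, 50.0, 45.0, 42.0, 33.0]

def mean(x):
    return sum(x) / len(x)

def stdev(x):
    m = mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))

def corr(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (stdev(x) * stdev(y))

def centred_rmsd(x, y):
    """RMS difference after removing each series' mean (pattern error)."""
    mx, my = mean(x), mean(y)
    return math.sqrt(mean([((a - mx) - (b - my)) ** 2 for a, b in zip(x, y)]))

# Taylor's relation: E'^2 = sd_m^2 + sd_o^2 - 2 * sd_m * sd_o * r
lhs = centred_rmsd(rcm, obs) ** 2
rhs = (stdev(rcm) ** 2 + stdev(obs) ** 2
       - 2 * stdev(rcm) * stdev(obs) * corr(rcm, obs))
assert abs(lhs - rhs) < 1e-6
```

    Because the identity ties the three statistics together, a single point on a Taylor diagram summarizes all of them for each RCM, which is why the diagram is convenient for ranking ensemble members.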