Science.gov

Sample records for decision tree ensembles

  1. Creating ensembles of decision trees through sampling

    DOEpatents

    Kamath, Chandrika; Cantu-Paz, Erick

    2005-08-30

    A system for decision tree ensembles that includes a module to read the data, a module to sort the data, a module to evaluate a potential split of the data according to some criterion using a random sample of the data, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method is based on statistical sampling techniques and includes the steps of reading the data, sorting the data, evaluating a potential split according to some criterion using a random sample of the data, splitting the data, and combining multiple decision trees in ensembles.
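
    The distinctive step above is scoring a candidate split on a random subsample of a node's records instead of all of them. Below is a minimal sketch of that idea, not the patented implementation; the Gini criterion, the 10% sample fraction, and the midpoint thresholds are assumptions for illustration.

    ```python
    import numpy as np

    def gini(y):
        """Gini impurity of a label vector."""
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def best_split_sampled(X, y, feature, sample_frac=0.1, rng=None):
        """Score candidate thresholds for one feature on a random subsample
        of the node's rows, as in the sampling-based evaluation step."""
        rng = np.random.default_rng(rng)
        n = len(y)
        idx = rng.choice(n, size=max(2, int(sample_frac * n)), replace=False)
        xs, ys = X[idx, feature], y[idx]
        order = np.argsort(xs)
        xs, ys = xs[order], ys[order]
        best_score, best_thr = np.inf, None
        for i in range(1, len(xs)):
            if xs[i] == xs[i - 1]:
                continue
            left, right = ys[:i], ys[i:]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
            if score < best_score:
                best_score, best_thr = score, (xs[i - 1] + xs[i]) / 2
        return best_thr, best_score
    ```

    Because only the split evaluation is subsampled, each tree in the ensemble sees a different sequence of near-optimal splits, which is one source of the diversity the ensemble vote relies on.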

  2. Creating Ensembles of Decision Trees Through Sampling

    SciTech Connect

    Kamath,C; Cantu-Paz, E

    2001-07-26

    Recent work in classification indicates that significant improvements in accuracy can be obtained by growing an ensemble of classifiers and having them vote for the most popular class. This paper focuses on ensembles of decision trees that are created with a randomized procedure based on sampling. Randomization can be introduced by using random samples of the training data (as in bagging or boosting) and running a conventional tree-building algorithm, or by randomizing the induction algorithm itself. The objective of this paper is to describe the first experiences with a novel randomized tree induction method that uses a sub-sample of instances at a node to determine the split. The empirical results show that ensembles generated using this approach yield results that are competitive in accuracy and superior in computational cost to boosting and bagging.

  3. Using histograms to introduce randomization in the generation of ensembles of decision trees

    DOEpatents

    Kamath, Chandrika; Cantu-Paz, Erick; Littau, David

    2005-02-22

    A system for decision tree ensembles that includes a module to read the data, a module to create a histogram, a module to evaluate a potential split according to some criterion using the histogram, a module to select a split point randomly in an interval around the best split, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method includes the steps of reading the data, creating a histogram, evaluating a potential split according to some criterion using the histogram, selecting a split point randomly in an interval around the best split, splitting the data, and combining multiple decision trees in ensembles.
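
    A compact sketch of the histogram-based randomization: score each interior bin boundary, then draw the actual split point uniformly from the interval around the best boundary. The bin count, the one-bin-wide randomization interval, and the Gini criterion are assumptions for illustration, not the patented procedure.

    ```python
    import numpy as np

    def gini(labels):
        """Gini impurity of a label vector."""
        _, c = np.unique(labels, return_counts=True)
        p = c / c.sum()
        return 1.0 - np.sum(p ** 2)

    def histogram_random_split(x, y, n_bins=32, rng=None):
        """Choose a split point randomly in an interval around the best
        histogram bin boundary (a sketch of the idea, not the patented code)."""
        rng = np.random.default_rng(rng)
        _, edges = np.histogram(x, bins=n_bins)
        best_score, best_edge = np.inf, None
        for e in edges[1:-1]:                      # interior boundaries only
            left, right = y[x <= e], y[x > e]
            if len(left) == 0 or len(right) == 0:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best_score, best_edge = score, e
        if best_edge is None:                      # degenerate node: no valid split
            return float(np.median(x))
        width = edges[1] - edges[0]                # one-bin randomization interval
        return rng.uniform(best_edge - width, best_edge + width)
    ```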

  4. Creating ensembles of oblique decision trees with evolutionary algorithms and sampling

    DOEpatents

    Cantu-Paz, Erick; Kamath, Chandrika

    2006-06-13

    A decision tree system that is part of a parallel object-oriented pattern recognition system, which in turn is part of an object-oriented data mining system. A decision tree process includes the step of reading the data. If necessary, the data is sorted. A potential split of the data is evaluated according to some criterion. An initial split of the data is determined. The final split of the data is determined using evolutionary algorithms and statistical sampling techniques. The data is split. Multiple decision trees are combined in ensembles.

  5. Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS

    NASA Astrophysics Data System (ADS)

    Tehrany, Mahyat Shafapour; Pradhan, Biswajeet; Jebur, Mustafa Neamah

    2013-11-01

    The decision tree (DT) machine learning algorithm was used to map flood-susceptible areas in Kelantan, Malaysia. An ensemble of frequency ratio (FR) and logistic regression (LR) models was used to overcome the weak points of LR. The combined FR and LR method was applied to map the susceptible areas in Kelantan, the results of both methods were compared, and their efficiency was assessed. The conditioning factors with the greatest influence on flooding were identified.

  6. Tree Ensembles on the Induced Discrete Space.

    PubMed

    Yildiz, Olcay Taner

    2016-05-01

    Decision trees are widely used predictive models in machine learning. Recently, the K-tree was proposed, in which the original discrete feature space is expanded by generating all orderings of the values of k discrete attributes, and these orderings are used as the new attributes in decision tree induction. Although the K-tree performs significantly better than the ordinary tree, its exponential time complexity can prohibit its use. In this brief, we propose K-forest, an extension of random forest, where a subset of features is selected randomly from the induced discrete space. Simulation results on 17 data sets show that the novel ensemble classifier has a significantly lower error rate compared with the random forest based on the original feature space.

  7. Approximate Splitting for Ensembles of Trees using Histograms

    SciTech Connect

    Kamath, C; Cantu-Paz, E; Littau, D

    2001-09-28

    Recent work in classification indicates that significant improvements in accuracy can be obtained by growing an ensemble of classifiers and having them vote for the most popular class. Implicit in many of these techniques is the concept of randomization, which generates different classifiers. In this paper, we focus on ensembles of decision trees that are created using a randomized procedure based on histograms. Techniques such as histograms, which discretize continuous variables, have long been used in classification to convert the data into a form suitable for processing and to reduce the compute time. The approach combines the ideas behind discretization through histograms and randomization in ensembles to create decision trees by randomly selecting a split point in an interval around the best bin boundary in the histogram. The experimental results with public domain data show that ensembles generated using this approach are competitive in accuracy and superior in computational cost to other ensemble techniques such as boosting and bagging.

  8. Quantum decision tree classifier

    NASA Astrophysics Data System (ADS)

    Lu, Songfeng; Braunstein, Samuel L.

    2013-11-01

    We study the quantum version of a decision tree classifier to fill the gap between quantum computation and machine learning. The quantum entropy impurity criterion which is used to determine which node should be split is presented in the paper. By using the quantum fidelity measure between two quantum states, we cluster the training data into subclasses so that the quantum decision tree can manipulate quantum states. We also propose algorithms constructing the quantum decision tree and searching for a target class over the tree for a new quantum object.

  9. Lazy decision trees

    SciTech Connect

    Friedman, J.H.; Yun, Yeogirl; Kohavi, R.

    1996-12-31

    Lazy learning algorithms, exemplified by nearest-neighbor algorithms, do not induce a concise hypothesis from a given training set; the inductive process is delayed until a test instance is given. Algorithms for constructing decision trees, such as C4.5, ID3, and CART create a single "best" decision tree during the training phase, and this tree is then used to classify test instances. The tests at the nodes of the constructed tree are good on average, but there may be better tests for classifying a specific instance. We propose a lazy decision tree algorithm, LazyDT, that conceptually constructs the "best" decision tree for each test instance. In practice, only a path needs to be constructed, and a caching scheme makes the algorithm fast. The algorithm is robust with respect to missing values without resorting to the complicated methods usually seen in induction of decision trees. Experiments on real and artificial problems are presented.
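
    A toy rendition of the lazy idea: grow only the path relevant to one test instance, choosing at each step the feature whose test, taken in the direction the instance would follow, most purifies the remaining training subset. This simplification assumes integer-coded categorical features and omits LazyDT's caching and missing-value handling.

    ```python
    import numpy as np

    def lazy_path_classify(X_train, y_train, x_test, max_depth=10):
        """Classify one instance by growing only its path (LazyDT-style sketch)."""
        def entropy(y):
            _, c = np.unique(y, return_counts=True)
            p = c / c.sum()
            return -np.sum(p * np.log2(p))
        X, y = X_train.copy(), y_train.copy()
        used = set()
        for _ in range(max_depth):
            if len(np.unique(y)) == 1:
                break                         # node is pure: stop descending
            best_gain, best_f = 0.0, None
            for f in range(X.shape[1]):
                if f in used:
                    continue
                mask = X[:, f] == x_test[f]   # the branch this instance would take
                if 0 < mask.sum() < len(y):
                    gain = entropy(y) - entropy(y[mask])
                    if gain > best_gain:
                        best_gain, best_f = gain, f
            if best_f is None:
                break
            mask = X[:, best_f] == x_test[best_f]
            X, y = X[mask], y[mask]
            used.add(best_f)
        vals, counts = np.unique(y, return_counts=True)
        return vals[np.argmax(counts)]        # majority class at the path's end
    ```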

  10. Weighted Hybrid Decision Tree Model for Random Forest Classifier

    NASA Astrophysics Data System (ADS)

    Kulkarni, Vrushali Y.; Sinha, Pradeep K.; Petare, Manisha C.

    2016-06-01

    Random Forest is an ensemble, supervised machine learning algorithm. An ensemble generates many classifiers and combines their results by majority voting. Random forest uses the decision tree as its base classifier. In decision tree induction, an attribute split/evaluation measure is used to decide the best split at each node of the decision tree. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation among them. The work presented in this paper is related to attribute split measures and is a two-step process: first, a theoretical study of the five selected split measures is done and a comparison matrix is generated to understand the pros and cons of each measure. These theoretical results are then verified by empirical analysis, in which a random forest is generated using each of the five selected split measures, chosen one at a time (e.g., random forest using information gain, random forest using gain ratio, and so on). Based on this theoretical and empirical analysis, a new hybrid decision tree model for the random forest classifier is proposed. In this model, the individual decision trees in the random forest are generated using different split measures, and the model is augmented by weighted voting based on the strength of each individual tree. The new approach has shown a notable increase in the accuracy of the random forest.
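
    The hybrid-forest idea can be approximated with scikit-learn, which exposes only two distinct impurity measures ('gini' and 'entropy') rather than the paper's five. In this sketch, trees grown on bootstrap samples alternate criteria and vote with weights given by their out-of-bag accuracy, a stand-in for the paper's tree-strength weighting.

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def hybrid_forest(X, y, n_trees=50, rng=None):
        """Random-forest variant: per-tree split criterion + OOB-weighted votes."""
        rng = np.random.default_rng(rng)
        n = len(y)
        classes = np.unique(y)
        trees, weights = [], []
        for i in range(n_trees):
            crit = ["gini", "entropy"][i % 2]          # alternate split measures
            boot = rng.choice(n, size=n, replace=True)
            oob = np.setdiff1d(np.arange(n), boot)      # out-of-bag rows
            t = DecisionTreeClassifier(criterion=crit, max_features="sqrt",
                                       random_state=int(rng.integers(1 << 31)))
            t.fit(X[boot], y[boot])
            w = t.score(X[oob], y[oob]) if len(oob) else 1.0  # tree strength
            trees.append(t)
            weights.append(w)
        w_arr = np.array(weights)
        def predict(Xq):
            votes = np.array([t.predict(Xq) for t in trees])  # (n_trees, n_q)
            scores = np.array([[np.sum(w_arr * (votes[:, j] == c))
                                for c in classes] for j in range(Xq.shape[0])])
            return classes[np.argmax(scores, axis=1)]
        return predict
    ```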

  11. Reweighting with Boosted Decision Trees

    NASA Astrophysics Data System (ADS)

    Rogozhnikov, Alex

    2016-10-01

    Machine learning tools are commonly used in modern high energy physics (HEP) experiments. Different models, such as boosted decision trees (BDT) and artificial neural networks (ANN), are widely used in analyses and even in the software triggers [1]. In most cases, these are classification models used to select the “signal” events from data. Monte Carlo simulated events typically take part in training of these models. While the results of the simulation are expected to be close to real data, in practical cases there is notable disagreement between simulated and observed data. In order to use available simulation in training, corrections must be introduced to generated data. One common approach is reweighting — assigning weights to the simulated events. We present a novel method of event reweighting based on boosted decision trees. The problem of checking the quality of reweighting step in analyses is also discussed.
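
    One standard way to implement event reweighting with a classifier (a simpler cousin of the gradient-boosted reweighter the paper introduces, which is available in the author's hep_ml package) is the density-ratio trick: train a boosted classifier to separate real from simulated events and weight each simulated event by p/(1-p). The feature arrays below are placeholders.

    ```python
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    def bdt_reweight(sim, real, n_estimators=100):
        """Weight simulated events so their feature distribution matches data.
        Density-ratio trick: w = P(real | x) / P(sim | x)."""
        X = np.vstack([sim, real])
        z = np.concatenate([np.zeros(len(sim)), np.ones(len(real))])
        clf = GradientBoostingClassifier(n_estimators=n_estimators, max_depth=3)
        clf.fit(X, z)
        p = clf.predict_proba(sim)[:, 1]          # P(event is real | features)
        w = p / np.clip(1.0 - p, 1e-6, None)      # density ratio as weight
        return w * len(sim) / w.sum()             # renormalize to original yield

    # Toy usage with placeholder 2D feature distributions.
    rng = np.random.default_rng(0)
    sim = rng.normal(0.0, 1.0, size=(5000, 2))    # simulated (slightly off)
    real = rng.normal(0.2, 1.1, size=(5000, 2))   # observed data
    weights = bdt_reweight(sim, real)
    ```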

  12. Extensions and applications of ensemble-of-trees methods in machine learning

    NASA Astrophysics Data System (ADS)

    Bleich, Justin

    Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First, we consider the performance of RF and SGB more broadly and demonstrate their superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of

  13. EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates

    PubMed Central

    Vilella, Albert J.; Severin, Jessica; Ureta-Vidal, Abel; Heng, Li; Durbin, Richard; Birney, Ewan

    2009-01-01

    We have developed a comprehensive gene orientated phylogenetic resource, EnsemblCompara GeneTrees, based on a computational pipeline to handle clustering, multiple alignment, and tree generation, including the handling of large gene families. We developed two novel non-sequence-based metrics of gene tree correctness and benchmarked a number of tree methods. The TreeBeST method from TreeFam shows the best performance in our hands. We also compared this phylogenetic approach to clustering approaches for ortholog prediction, showing a large increase in coverage using the phylogenetic approach. All data are made available in a number of formats and will be kept up to date with the Ensembl project. PMID:19029536

  14. Bayesian Evidence Framework for Decision Tree Learning

    NASA Astrophysics Data System (ADS)

    Chatpatanasiri, Ratthachat; Kijsirikul, Boonserm

    2005-11-01

    This work is primarily interested in the problem of selecting a single decision (or classification) tree given the observed data. Although a single decision tree has a high risk of being overfitted, the induced tree is easily interpreted. Researchers have invented various methods, such as tree pruning or tree averaging, for preventing the induced tree from overfitting (and from underfitting) the data. In this paper, instead of using those conventional approaches, we apply the Bayesian evidence framework of Gull, Skilling and MacKay to the process of selecting a decision tree. We derive a formal function to measure 'the fitness' of each decision tree given a set of observed data. Our method, in fact, is analogous to a well-known Bayesian model selection method for interpolating noisy continuous-value data. As in regression problems, given reasonable assumptions, this derived score function automatically quantifies the principle of Ockham's razor, and hence reasonably deals with the underfitting-overfitting tradeoff.

  15. Ensemble survival trees for identifying subpopulations in personalized medicine.

    PubMed

    Chen, Yu-Chuan; Chen, James J

    2016-09-01

    Recently, personalized medicine has received great attention as a means to improve safety and effectiveness in drug development. Personalized medicine aims to provide medical treatment that is tailored to the patient's characteristics, such as genomic biomarkers and disease history, so that the benefit of treatment can be optimized. Subpopulation identification divides patients into several subgroups, where each subgroup corresponds to an optimal treatment. For two subgroups, the multivariate Cox proportional hazards model is traditionally fitted and used to calculate the risk score when the outcome is a survival time endpoint. The median is commonly chosen as the cutoff value to separate patients. However, using the median as the cutoff value is quite subjective and may sometimes be inappropriate in situations where data are imbalanced. Here, we propose a novel tree-based method that adopts the algorithm of relative risk trees to identify patient subgroups. After growing a relative risk tree, we apply k-means clustering to group the terminal nodes based on the averaged covariates. We adopt an ensemble bagging method to improve on the performance of a single tree, since it is well known that the performance of a single tree is quite unstable. A simulation study is conducted to compare the performance of our proposed method with that of the multivariate Cox model, and applications of the proposed method to two public cancer data sets are presented for illustration.

  16. The clinical decision analysis using decision tree.

    PubMed

    Bae, Jong-Myon

    2014-01-01

    Clinical decision analysis (CDA) has been used to overcome complexity and uncertainty in medical problems. CDA is a tool that allows decision-makers to apply evidence-based medicine to make objective clinical decisions when faced with complex situations. The usefulness and limitations of CDA, including the six steps in conducting it, are reviewed. The application of CDA results should take place through shared decision-making that reflects the patient's values.
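
    The arithmetic at the heart of a CDA decision tree is "rollback": chance nodes average their branches' utilities weighted by probability, and decision nodes take the best branch. A small illustration with hypothetical utilities and probabilities (not from the paper):

    ```python
    def rollback(node):
        """Expected-utility rollback of a clinical decision tree.
        Decision nodes pick the maximum; chance nodes take the weighted mean."""
        if "utility" in node:                        # leaf
            return node["utility"]
        if node["type"] == "chance":
            return sum(p * rollback(child) for p, child in node["branches"])
        if node["type"] == "decision":
            return max(rollback(child) for _, child in node["branches"])

    # Hypothetical example: operate vs. medical management.
    tree = {"type": "decision", "branches": [
        ("operate", {"type": "chance", "branches": [
            (0.95, {"utility": 0.9}),     # surgery succeeds
            (0.05, {"utility": 0.0})]}),  # operative mortality
        ("medicate", {"type": "chance", "branches": [
            (0.60, {"utility": 0.8}),
            (0.40, {"utility": 0.3})]})]}
    print(rollback(tree))   # operate: 0.855 vs medicate: 0.60 -> 0.855
    ```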

  17. The clinical decision analysis using decision tree

    PubMed Central

    Bae, Jong-Myon

    2014-01-01

    Clinical decision analysis (CDA) has been used to overcome complexity and uncertainty in medical problems. CDA is a tool that allows decision-makers to apply evidence-based medicine to make objective clinical decisions when faced with complex situations. The usefulness and limitations of CDA, including the six steps in conducting it, are reviewed. The application of CDA results should take place through shared decision-making that reflects the patient's values. PMID:25358466

  18. The decision tree approach to classification

    NASA Technical Reports Server (NTRS)

    Wu, C.; Landgrebe, D. A.; Swain, P. H.

    1975-01-01

    A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.

  19. Comprehensive Decision Tree Models in Bioinformatics

    PubMed Central

    Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter

    2012-01-01

    Purpose: Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models, where knowledge extraction and explanation of the reasoning behind the classification model are possible. Methods: This paper presents an extension to an existing machine learning environment and a study on the visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach, where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. Results: The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex, visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in the bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions: The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from the usually more complex models built using the default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class
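
    In scikit-learn terms (an approximation, since the authors extend a different machine learning environment), tuning a tree only by its visual dimensions amounts to capping depth and leaf count without consulting any accuracy measure, and then checking accuracy afterwards:

    ```python
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)   # stand-in for a bioinformatics dataset

    default_tree = DecisionTreeClassifier(random_state=0)
    # Constrain only the tree's visual dimensions (depth, number of leaves);
    # no accuracy measure is consulted while choosing these limits.
    small_tree = DecisionTreeClassifier(max_depth=4, max_leaf_nodes=12,
                                        random_state=0)

    for name, clf in [("default", default_tree), ("size-constrained", small_tree)]:
        score = cross_val_score(clf, X, y, cv=10).mean()
        print(f"{name}: {score:.3f}")
    ```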

  20. A survey of decision tree classifier methodology

    NASA Technical Reports Server (NTRS)

    Safavian, S. Rasoul; Landgrebe, David

    1990-01-01

    Decision Tree Classifiers (DTC's) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTC's is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTC's over single stage classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.

  1. A survey of decision tree classifier methodology

    NASA Technical Reports Server (NTRS)

    Safavian, S. R.; Landgrebe, David

    1991-01-01

    Decision tree classifiers (DTCs) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTCs over single-stage classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.

  2. PRIA 3 Fee Determination Decision Tree

    EPA Pesticide Factsheets

    The PRIA 3 decision tree will help applicants requesting a pesticide registration or certain tolerance action to accurately identify the category of their application and the amount of the required fee before they submit the application.

  3. RE-Powering’s Electronic Decision Tree

    EPA Pesticide Factsheets

    Developed by US EPA's RE-Powering America's Land Initiative, the RE-Powering Decision Trees tool guides interested parties through a process to screen sites for their suitability for solar photovoltaic or wind installations.

  4. Solar and Wind Site Screening Decision Trees

    EPA Pesticide Factsheets

    EPA and NREL created a decision tree to guide state and local governments and other stakeholders through a process for screening sites for their suitability for future redevelopment with solar photovoltaic (PV) energy and wind energy.

  5. Parallel object-oriented decision tree system

    SciTech Connect

    Kamath, Chandrika; Cantu-Paz, Erick

    2006-02-28

    A data mining decision tree system that uncovers patterns, associations, anomalies, and other statistically significant structures in data by reading and displaying data files, extracting relevant features for each of the objects, and using a method of recognizing patterns among the objects based upon object features through a decision tree that reads the data, sorts the data if necessary, determines the best manner to split the data into subsets according to some criterion, and splits the data.

  6. Support Vector Machine with Ensemble Tree Kernel for Relation Extraction

    PubMed Central

    Fu, Hui; Du, Zhiguo

    2016-01-01

    Relation extraction is one of the important research topics in the field of information extraction. To solve the problem of semantic variation in traditional semisupervised relation extraction algorithms, this paper proposes a novel semisupervised relation extraction algorithm based on ensemble learning (LXRE). The new algorithm mainly integrates two kinds of support vector machine classifiers based on tree kernels and adopts a strategy of constrained seed set extension. The new algorithm can weaken the inaccuracy of relation extraction caused by the phenomenon of semantic variation. Numerical experiments based on two benchmark data sets (PropBank and AIMed) show that the proposed LXRE algorithm is superior to two other common relation extraction methods on four evaluation indexes (Precision, Recall, F-measure, and Accuracy), indicating that the new algorithm has good relation extraction ability compared with others. PMID:27118966

  7. Automated critiquing of medical decision trees.

    PubMed

    Wellman, M P; Eckman, M H; Fleming, C; Marshall, S L; Sonnenberg, F A; Pauker, S G

    1989-01-01

    The authors developed a decision tree-critiquing program (called BUNYAN) that identifies potential modeling errors in medical decision trees. The program's critiques are based on the structure of a decision problem, obtained from an abstract description specifying only the basic semantic categories of the model's components. A taxonomy of node and branch types supplies the primitive building blocks for representing decision trees. BUNYAN detects potential problems in a model by matching general pattern expressions that refer to these primitives. A small set of general principles justifies critiquing rules that detect four categories of potential structural problems: impossible strategies, dominated strategies, unaccountable violations of symmetry, and omission of apparently reasonable strategies. Although critiquing based on structure alone has clear limitations, principled structural analysis constitutes the core of a methodology for reasoning about decision models.

  8. Decision Tree Approach for Soil Liquefaction Assessment

    PubMed Central

    Gandomi, Amir H.; Fridline, Mark M.; Roke, David A.

    2013-01-01

    In the current study, the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used in this study. Three decision tree techniques are used here in two different ways, considering statistical and engineering points of view, to develop decision rules. The DT results are compared to the logistic regression (LR) model. The results of this study indicate that the DTs not only successfully predict liquefaction but they can also outperform the LR model. The best DT models are interpreted and evaluated based on an engineering point of view. PMID:24489498

  9. Decision tree approach for soil liquefaction assessment.

    PubMed

    Gandomi, Amir H; Fridline, Mark M; Roke, David A

    2013-01-01

    In the current study, the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used in this study. Three decision tree techniques are used here in two different ways, considering statistical and engineering points of view, to develop decision rules. The DT results are compared to the logistic regression (LR) model. The results of this study indicate that the DTs not only successfully predict liquefaction but they can also outperform the LR model. The best DT models are interpreted and evaluated based on an engineering point of view.

  10. Fast Image Texture Classification Using Decision Trees

    NASA Technical Reports Server (NTRS)

    Thompson, David R.

    2011-01-01

    Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation-hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.
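
    The "integral image" transform is what keeps the features integer-friendly: any box sum costs four lookups and three additions regardless of box size. A minimal sketch of the feature step feeding a decision tree; the window sizes and the random training labels are placeholders.

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def integral_image(img):
        """Summed-area table: ii[r, c] = sum of img[:r, :c]."""
        return np.pad(img, ((1, 0), (1, 0))).cumsum(0).cumsum(1)

    def box_mean(ii, r, c, h, w):
        """Mean of the h x w box at (r, c) via four integral-image lookups."""
        s = ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]
        return s / (h * w)

    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(64, 64)).astype(np.int64)
    ii = integral_image(img)

    # Per-pixel features: box means at several scales (interior pixels only).
    feats = np.array([[box_mean(ii, r, c, s, s) for s in (2, 4, 8)]
                      for r in range(48) for c in range(48)])
    labels = rng.integers(0, 2, size=len(feats))   # placeholder texture labels
    DecisionTreeClassifier(max_depth=8).fit(feats, labels)
    ```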

  11. Bayesian Ensemble Trees (BET) for Clustering and Prediction in Heterogeneous Data

    PubMed Central

    Duan, Leo L.; Clancy, John P.; Szczesniak, Rhonda D.

    2016-01-01

    We propose a novel "tree-averaging" model that utilizes the ensemble of classification and regression trees (CART). Each constituent tree is estimated with a subset of similar data. We treat this grouping of subsets as Bayesian Ensemble Trees (BET) and model them as a Dirichlet process. We show that BET determines the optimal number of trees by adapting to the data heterogeneity. Compared with the other ensemble methods, BET requires many fewer trees and shows equivalent prediction accuracy using weighted averaging. Moreover, each tree in BET provides a variable selection criterion and interpretation for each subset. We developed an efficient estimating procedure with improved estimation strategies in both CART and mixture models. We demonstrate these advantages of BET with simulations and illustrate the approach with a real-world data example involving regression of lung function measurements obtained from patients with cystic fibrosis. Supplemental materials are available online. PMID:27524872

  12. CUDT: a CUDA based decision tree algorithm.

    PubMed

    Lo, Win-Tsung; Chang, Yue-Shan; Sheu, Ruey-Kai; Chiu, Chun-Chieh; Yuan, Shyan-Ming

    2014-01-01

    The decision tree is one of the most famous classification methods in data mining. Many studies have been proposed focusing on improving the performance of decision trees; however, those algorithms were developed to run on traditional distributed systems, where the latency of processing the huge data generated by ubiquitous sensing nodes cannot be improved without the help of new technology. In order to improve data processing latency in huge data mining, in this paper we design and implement a new parallelized decision tree algorithm on CUDA (compute unified device architecture), a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We have conducted many experiments to evaluate the system performance of CUDT and made a comparison with the traditional CPU version. The results show that CUDT is 5-55 times faster than Weka-j48 and 18 times faster than SPRINT for large data sets.

  13. Algorithms for optimal dyadic decision trees

    SciTech Connect

    Hush, Don; Porter, Reid

    2009-01-01

    A new algorithm for constructing optimal dyadic decision trees was recently introduced, analyzed, and shown to be very effective for low dimensional data sets. This paper enhances and extends this algorithm by: introducing an adaptive grid search for the regularization parameter that guarantees optimal solutions for all relevant tree sizes, revising the core tree-building algorithm so that its run time is substantially smaller for most regularization parameter values on the grid, and incorporating new data structures and data pre-processing steps that provide significant run time enhancement in practice.

  14. Prediction of regional streamflow frequency using model tree ensembles

    NASA Astrophysics Data System (ADS)

    Schnier, Spencer; Cai, Ximing

    2014-09-01

    This study introduces a novel data-driven method called model tree ensembles (MTEs) to predict streamflow frequency statistics based on known drainage area characteristics, which yields insights into the dominant controls of regional streamflow. The database used to induce the models contains both natural and anthropogenic drainage area characteristics for 294 USGS stream gages (164 in Texas and 130 in Illinois). MTEs were used to predict complete flow duration curves (FDCs) of ungaged streams by developing 17 models corresponding to 17 points along the FDC. Model accuracy was evaluated using ten-fold cross-validation and the coefficient of determination (R2). During the validation, the gages withheld from the analysis represent ungaged watersheds. MTEs are shown to outperform global multiple-linear regression models for predictions in ungaged watersheds. The accuracy of models for low flow is enhanced by explicit consideration of variables that capture human interference in watershed hydrology (e.g., population). Human factors (e.g., population and groundwater use) appear in the regionalizations for low flows, while annual and seasonal precipitation and drainage area are important for regionalizations of all flows. The results of this study have important implications for predictions in ungaged watersheds as well as gaged watersheds subject to anthropogenically-driven hydrologic changes.
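
    The regionalization scheme trains one model per point on the flow duration curve. M5-style model trees are not available in scikit-learn, so this sketch substitutes random forests; the attribute matrix and the three FDC points (standing in for the paper's 17) are placeholders.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    # Hypothetical drainage-area attributes: area, precipitation, population, ...
    X = rng.random((294, 6))
    fdc = np.sort(rng.random((294, 3)), axis=1)   # placeholder Q90, Q50, Q10

    models = {}
    for i, point in enumerate(["Q90", "Q50", "Q10"]):  # one model per FDC point
        m = RandomForestRegressor(n_estimators=200, random_state=0)
        # Ten-fold cross-validation mimics withholding gages as "ungaged" sites.
        r2 = cross_val_score(m, X, fdc[:, i], cv=10, scoring="r2").mean()
        models[point] = m.fit(X, fdc[:, i])
        print(point, round(r2, 3))
    ```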

  15. IND - THE IND DECISION TREE PACKAGE

    NASA Technical Reports Server (NTRS)

    Buntine, W.

    1994-01-01

    A common approach to supervised classification and prediction in artificial intelligence and statistical pattern recognition is the use of decision trees. A tree is "grown" from data using a recursive partitioning algorithm to create a tree which has good prediction of classes on new data. Standard algorithms are CART (by Breiman, Friedman, Olshen and Stone) and ID3 and its successor C4 (by Quinlan). As well as reimplementing parts of these algorithms and offering experimental control suites, IND also introduces Bayesian and MML methods and more sophisticated search in growing trees. These produce more accurate class probability estimates that are important in applications like diagnosis. IND is applicable to most data sets consisting of independent instances, each described by a fixed length vector of attribute values. An attribute value may be a number, one of a set of attribute specific symbols, or it may be omitted. One of the attributes is designated the "target" and IND grows trees to predict the target. Prediction can then be done on new data or the decision tree printed out for inspection. IND provides a range of features and styles with convenience for the casual user as well as fine-tuning for the advanced user or those interested in research. IND can be operated in a CART-like mode (but without regression trees, surrogate splits or multivariate splits), and in a mode like the early version of C4. Advanced features allow more extensive search, interactive control and display of tree growing, and Bayesian and MML algorithms for tree pruning and smoothing. These often produce more accurate class probability estimates at the leaves. IND also comes with a comprehensive experimental control suite. IND consists of four basic kinds of routines: data manipulation routines, tree generation routines, tree testing routines, and tree display routines. The data manipulation routines are used to partition a single large data set into smaller training and test sets. The

  16. Two Trees: Migrating Fault Trees to Decision Trees for Real Time Fault Detection on International Space Station

    NASA Technical Reports Server (NTRS)

    Lee, Charles; Alena, Richard L.; Robinson, Peter

    2004-01-01

    Starting from an ISS fault tree example, we present a method to convert fault trees to decision trees. The method shows that visualizing the root cause of a fault becomes easier and that manipulating the tree becomes more programmatic via available decision tree programs. The visualization of decision trees for diagnostics is straightforward and easy to understand. For real-time fault diagnosis on ISS, the status of the systems can be shown by running the signals through the trees and seeing where they stop. Another advantage of using decision trees is that the trees can learn fault patterns and predict future faults from historic data. The learning need not be restricted to static data sets: by accumulating real-time data sets online, the decision trees can gain and store fault patterns and recognize them when they recur.

  17. Lower Bounds for Algebraic Decision Trees.

    DTIC Science & Technology

    1980-07-01

    Computational model and the general method: let W ⊆ R^n be any set. A (d-th order) decision tree T for testing whether an input x ∈ W is a ternary tree whose internal nodes test the sign of a polynomial of degree at most d, and whose depth h satisfies an inequality of the form 2^h p(h, d, n) ≥ N, where N counts connected components. For each leaf l of T, let V_l be the set of inputs x ∈ R^n leading to l; the bound follows by applying Bezout's theorem (and results of Seidenberg) to a real polynomial p in n variables of degree m.

  18. CUDT: A CUDA Based Decision Tree Algorithm

    PubMed Central

    Sheu, Ruey-Kai; Chiu, Chun-Chieh

    2014-01-01

    The decision tree is one of the most famous classification methods in data mining. Many studies have been proposed focusing on improving the performance of decision trees; however, those algorithms were developed to run on traditional distributed systems, where the latency of processing the huge data generated by ubiquitous sensing nodes cannot be improved without the help of new technology. In order to improve data processing latency in huge data mining, in this paper we design and implement a new parallelized decision tree algorithm on CUDA (compute unified device architecture), a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We have conducted many experiments to evaluate the system performance of CUDT and made a comparison with the traditional CPU version. The results show that CUDT is 5-55 times faster than Weka-j48 and 18 times faster than SPRINT for large data sets. PMID:25140346

  19. Using Decision Trees for Comparing Pattern Recognition Feature Sets

    SciTech Connect

    Proctor, D D

    2005-08-18

    Determination of the best set of features has been acknowledged as one of the most difficult tasks in the pattern recognition process. In this report, significance tests on the sort-ordered, sample-size-normalized vote distribution of an ensemble of decision trees are introduced as a method of evaluating the relative quality of feature sets. Alternative functional forms for feature sets are also examined. Associated standard deviations provide the means to evaluate the effect of the number of folds, the number of classifiers per fold, and the sample size on the resulting classifications. The method is applied to a problem for which a significant portion of the training set cannot be classified unambiguously.

  20. Tree Structure Generation from Ensemble Forecasts for Short-Term Reservoir Optimization

    NASA Astrophysics Data System (ADS)

    Raso, L.; Schwanenberg, D.; Van De Giesen, N.

    2012-12-01

    In short-term reservoir management, weather forecasts enable water managers to look further ahead in time and anticipate future system states. In this context, ensemble forecasts provide information about the uncertainty of the weather information. Tree-Based Model Predictive Control (TB-MPC) is an optimization scheme that embeds ensemble forecasts in a multistage stochastic program. TB-MPC requires a predefined tree structure that specifies when the ensemble trajectories diverge from each other. A correct tree structure is of critical importance because it strongly affects the performance of the optimization, and existing methods do not offer satisfactory results. We present a new methodology to generate a tree structure from the trajectories of an ensemble. The method models the information flow, considering which observations will become available along the forecast horizon, at which moment, and with what level of uncertainty. It places a branching point when there is enough certainty about which trajectory is actually occurring. The method is well suited for trajectories that are close to each other at the beginning of the forecasting horizon and spread out when progressing in time, as ensemble forecasts typically do. The method is compared to other tree structures (two-stage stochastic programming and others) in terms of performance through an application to the short-term management of the Salto Grande hydropower reservoir on the Uruguay River along the Argentine-Uruguayan border.
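
    The tree-generation step can be caricatured as: walk forward in time and place a branching point at the first step where a group of ensemble members becomes distinguishable, then split the group and recurse. The tolerance and the median-based two-way split below are assumptions; the paper's method models the information flow and observation uncertainty more carefully.

    ```python
    import numpy as np

    def build_tree(members, traj, t, tol):
        """Recursively place branching points where ensemble members diverge.
        traj: array (n_members, horizon); members: indices of the current group."""
        horizon = traj.shape[1]
        for step in range(t, horizon):
            vals = traj[members, step]
            if vals.max() - vals.min() > tol:          # group is distinguishable
                split = vals <= np.median(vals)        # crude two-way clustering
                return {"branch_at": step,
                        "children": [build_tree(members[split], traj, step, tol),
                                     build_tree(members[~split], traj, step, tol)]}
        return {"leaf": members.tolist()}

    rng = np.random.default_rng(0)
    traj = np.cumsum(rng.normal(size=(8, 20)), axis=1)   # toy ensemble forecast
    print(build_tree(np.arange(8), traj, 0, tol=3.0))
    ```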

  1. Coherent neuronal ensembles are rapidly recruited when making a look-reach decision.

    PubMed

    Wong, Yan T; Fabiszak, Margaret M; Novikov, Yevgeny; Daw, Nathaniel D; Pesaran, Bijan

    2016-02-01

    Selecting and planning actions recruits neurons across many areas of the brain, but how ensembles of neurons work together to make decisions is unknown. Temporally coherent neural activity may provide a mechanism by which neurons coordinate their activity to make decisions. If so, neurons that are part of coherent ensembles may predict movement choices before other ensembles of neurons. We recorded neuronal activity in the lateral and medial banks of the intraparietal sulcus (IPS) of the posterior parietal cortex while monkeys made choices about where to look and reach. We decoded the activity to predict the choices. Ensembles of neurons that displayed coherent patterns of spiking activity extending across the IPS--'dual-coherent' ensembles--predicted movement choices substantially earlier than other neuronal ensembles. We propose that dual-coherent spike timing reflects interactions between groups of neurons that are important to decisions.

  2. Coherent neuronal ensembles are rapidly recruited when making a look-reach decision

    PubMed Central

    Wong, Yan T.; Fabiszak, Margaret M.; Novikov, Yevgeny; Daw, Nathaniel D.; Pesaran, Bijan

    2015-01-01

    Summary: Selecting and planning actions recruits neurons across many areas of the brain, but how ensembles of neurons work together to make decisions is unknown. Temporally coherent neural activity may provide a mechanism by which neurons coordinate their activity in order to make decisions. If so, neurons that are part of coherent ensembles may predict movement choices before other ensembles of neurons. We recorded neuronal activity in the lateral and medial banks of the intraparietal sulcus (IPS) of the posterior parietal cortex while monkeys made choices about where to look and reach, and decoded the activity to predict the choices. Ensembles of neurons that displayed coherent patterns of spiking activity extending across the IPS, “dual coherent” ensembles, predicted movement choices substantially earlier than other neuronal ensembles. We propose that dual-coherent spike timing reflects interactions between groups of neurons that play an important role in how we make decisions. PMID:26752158

  3. Identification of metabolic syndrome using decision tree analysis.

    PubMed

    Worachartcheewan, Apilak; Nantasenamat, Chanin; Isarankura-Na-Ayudhya, Chartchalerm; Pidetcha, Phannee; Prachayasittikul, Virapong

    2010-10-01

    This study employs decision tree as a decision support system for rapid and automated identification of individuals with metabolic syndrome (MS) among a Thai population. Results demonstrated strong predictivity of the decision tree in classification of individuals with and without MS, displaying an overall accuracy in excess of 99%.

  4. Extracting decision rules from police accident reports through decision trees.

    PubMed

    de Oña, Juan; López, Griselda; Abellán, Joaquín

    2013-01-01

    Given the current number of road accidents, the aim of many road safety analysts is to identify the main factors that contribute to crash severity. To pinpoint those factors, this paper shows an application that applies some of the methods most commonly used to build decision trees (DTs), which have not been applied to the road safety field before. An analysis of accidents on rural highways in the province of Granada (Spain) between 2003 and 2009 (both inclusive) showed that the methods used to build DTs serve our purpose and may even be complementary. Applying these methods has enabled potentially useful decision rules to be extracted that could be used by road safety analysts. For instance, some of the rules may indicate that women, contrary to men, increase their risk of severity under bad lighting conditions. The rules could be used in road safety campaigns to mitigate specific problems. This would enable managers to implement priority actions based on a classification of accidents by types (depending on their severity). However, the primary importance of this proposal is that other databases not used here (i.e. other infrastructure, roads and countries) could be used to identify unconventional problems in a manner easy for road safety managers to understand, as decision rules.
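
    With a fitted scikit-learn tree, decision rules of the kind extracted in this study can be read directly off the tree structure. A sketch on hypothetical coded accident factors (the feature names and labels are placeholders, not the Granada data):

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(0)
    # Hypothetical coded accident factors: gender, lighting, road width.
    X = rng.integers(0, 3, size=(500, 3))
    y = rng.integers(0, 2, size=500)        # 0 = slight, 1 = severe (placeholder)

    tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
    print(export_text(tree, feature_names=["gender", "lighting", "road_width"]))
    # Each root-to-leaf path in the printout is one candidate decision rule,
    # of the illustrative form "gender <= 0.5 and lighting > 1.5 -> severe".
    ```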

  5. Ventriculogram segmentation using boosted decision trees

    NASA Astrophysics Data System (ADS)

    McDonald, John A.; Sheehan, Florence H.

    2004-05-01

    Left ventricular status, reflected in ejection fraction or end systolic volume, is a powerful prognostic indicator in heart disease. Quantitative analysis of these and other parameters from ventriculograms (cine x-rays of the left ventricle) is infrequently performed due to the labor required for manual segmentation. None of the many methods developed for automated segmentation has achieved clinical acceptance. We present a method for semi-automatic segmentation of ventriculograms based on a very accurate two-stage boosted decision-tree pixel classifier. The classifier determines which pixels are inside the ventricle at key ED (end-diastole) and ES (end-systole) frames. The test misclassification rate is about 1%. The classifier is semi-automatic, requiring a user to select three points in each frame: the endpoints of the aortic valve and the apex. The first classifier stage consists of two boosted decision trees, trained using features such as gray-level statistics (e.g. median brightness) and image geometry (e.g. coordinates relative to the three user-supplied points). Second-stage classifiers are trained using the same features as the first, plus the output of the first stage. Border pixels are determined from the segmented images using dilation and erosion. A curve is then fit to the border pixels, minimizing a penalty function that trades off fidelity to the border pixels with smoothness. ED and ES volumes, and ejection fraction, are estimated from border curves using standard area-length formulas. On independent test data, the differences between automatic and manual volumes (and ejection fractions) are similar in size to the differences between two human observers.
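
    The two-stage design, in which the second classifier sees the original features plus the first stage's output, is a form of stacking. A sketch with gradient-boosted trees standing in for the paper's boosted decision trees; out-of-fold stage-1 predictions are used so the second stage does not train on leaked labels. Features and labels are synthetic placeholders.

    ```python
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(0)
    X = rng.random((2000, 10))                 # per-pixel features (placeholder)
    y = (X[:, 0] + X[:, 1] > 1).astype(int)    # inside/outside ventricle (toy)

    stage1 = GradientBoostingClassifier(n_estimators=100)
    # Out-of-fold stage-1 probabilities, so stage 2 never sees leaked labels.
    p1 = cross_val_predict(stage1, X, y, cv=5, method="predict_proba")[:, 1]

    stage2 = GradientBoostingClassifier(n_estimators=100)
    stage2.fit(np.hstack([X, p1[:, None]]), y)  # same features + stage-1 output
    stage1.fit(X, y)                            # refit stage 1 on all data

    def predict_pixels(Xq):
        q1 = stage1.predict_proba(Xq)[:, 1]
        return stage2.predict(np.hstack([Xq, q1[:, None]]))
    ```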

  6. 15 CFR Supplement No 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    15 CFR, Commerce and Foreign Trade (Continued), the EAR, Pt. 732, Supp. 1: Supplement No. 1 to Part 732—Decision Tree (graphic ER06FE04.000).

  7. 15 CFR Supplement 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    15 CFR, Commerce and Foreign Trade (Continued), the EAR, Pt. 732, Supp. 1: Supplement No. 1 to Part 732—Decision Tree (graphic ER06FE04.000).

  8. 15 CFR Supplement No 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    15 CFR, Commerce and Foreign Trade (Continued), the EAR, Pt. 732, Supp. 1: Supplement No. 1 to Part 732—Decision Tree (graphic ER06FE04.000).

  9. 15 CFR Supplement 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    15 CFR, Commerce and Foreign Trade (Continued), the EAR, Pt. 732, Supp. 1: Supplement No. 1 to Part 732—Decision Tree (graphic ER06FE04.000).

  10. 15 CFR Supplement 1 to Part 732 - Decision Tree

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    15 CFR, Commerce and Foreign Trade (Continued), the EAR, Pt. 732, Supp. 1: Supplement No. 1 to Part 732—Decision Tree (graphic ER06FE04.000).

  11. Decision-Tree Formulation With Order-1 Lateral Execution

    NASA Technical Reports Server (NTRS)

    James, Mark

    2007-01-01

    A compact symbolic formulation enables mapping of an arbitrarily complex decision tree of a certain type into a highly computationally efficient multidimensional software object. The type of decision trees to which this formulation applies is that known in the art as the Boolean class of balanced decision trees. Parallel lateral slices of an object created by means of this formulation can be executed in constant time, considerably less than would otherwise be required. Decision trees of various forms are incorporated into almost all large software systems. A decision tree is a way of hierarchically solving a problem, proceeding through a set of true/false responses to a conclusion. By definition, a decision tree has a tree-like structure, wherein each internal node denotes a test on an attribute, each branch from an internal node represents an outcome of a test, and leaf nodes represent classes or class distributions that, in turn, represent possible conclusions. The drawback of decision trees is that execution of them can be computationally expensive (and, hence, time-consuming) because each non-leaf node must be examined to determine whether to progress deeper into a tree structure or to examine an alternative. The present formulation was conceived as an efficient means of representing a decision tree and executing it in as little time as possible. The formulation involves the use of a set of symbolic algorithms to transform a decision tree into a multi-dimensional object, the rank of which equals the number of lateral non-leaf nodes. The tree can then be executed in constant time by means of an order-one table lookup. The sequence of operations performed by the algorithms is summarized as follows: 1. Determination of whether the tree under consideration can be encoded by means of this formulation. 2. Extraction of decision variables. 3. Symbolic optimization of the decision tree to minimize its form. 4. Expansion and transformation of all nested conjunctive
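
    The payoff of the formulation is that a balanced Boolean decision tree over n decision variables collapses into a rank-n array, so execution becomes a single indexed lookup instead of a walk down the tree. A minimal sketch of that compile-then-lookup idea (the tuple encoding is an assumption for illustration, not the paper's symbolic algorithm):

    ```python
    import itertools
    import numpy as np

    def tree_eval(node, bits):
        """Walk a nested (var, if_false, if_true) tuple tree; leaves are ints."""
        while isinstance(node, tuple):
            var, lo, hi = node
            node = hi if bits[var] else lo
        return node

    # Toy balanced Boolean decision tree over 3 variables.
    tree = (0, (1, 0, (2, 0, 1)), (1, (2, 1, 0), 1))

    # Compile: evaluate once per assignment, store outcomes in a rank-3 array.
    n = 3
    table = np.empty((2,) * n, dtype=np.int8)
    for bits in itertools.product((0, 1), repeat=n):
        table[bits] = tree_eval(tree, bits)

    # Order-1 execution: a single table lookup replaces the tree walk.
    assert table[(1, 0, 1)] == tree_eval(tree, (1, 0, 1))
    ```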

  12. Computational study of developing high-quality decision trees

    NASA Astrophysics Data System (ADS)

    Fu, Zhiwei

    2002-03-01

    Recently, decision tree algorithms have been widely used in dealing with data mining problems to find out valuable rules and patterns. However, scalability, accuracy and efficiency are significant concerns regarding how to effectively deal with large and complex data sets in the implementation. In this paper, we propose an innovative machine learning approach (we call our approach GAIT), combining genetic algorithm, statistical sampling, and decision tree, to develop intelligent decision trees that can alleviate some of these problems. We design our computational experiments and run GAIT on three different data sets (namely Socio-Olympic data, Westinghouse data, and FAA data) to test its performance against the standard decision tree algorithm, a neural network classifier, and a statistical discriminant technique, respectively. The computational results show that our approach substantially outperforms the standard decision tree algorithm at lower sampling levels, and achieves significantly better results with less effort than both the neural network and discriminant classifiers.

  13. Decision tree methods: applications for classification and prediction

    PubMed Central

    SONG, Yan-yan; LU, Ying

    2015-01-01

    Summary: Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets: the training dataset is used to build a decision tree model, and the validation dataset is used to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms for developing decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure. PMID:26120265
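
    scikit-learn expresses the train/validate recipe described above through CART-style cost-complexity pruning: the path of candidate alpha values is computed on the training set and the validation set picks the tree size. A brief sketch on a stock dataset:

    ```python
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    # Candidate prunings of the fully grown tree, from CART's alpha path.
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

    best = max(
        (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr)
         for a in path.ccp_alphas),
        key=lambda t: t.score(X_val, y_val),   # validation picks the tree size
    )
    print(best.get_n_leaves(), best.score(X_val, y_val))
    ```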

  14. An Ensemble of Neural Networks for Stock Trading Decision Making

    NASA Astrophysics Data System (ADS)

    Chang, Pei-Chann; Liu, Chen-Hao; Fan, Chin-Yuan; Lin, Jun-Lin; Lai, Chih-Ming

    Stock turning signal detection is an interesting subject arising in numerous financial and economic planning problems. In this paper, an ensemble neural network system with intelligent piecewise linear representation for stock turning point detection is presented. The intelligent piecewise linear representation method is able to generate numerous stock turning signals from the historical database, and the ensemble neural network system is then applied to learn these patterns and retrieve similar stock price patterns from historical data for training. The turning signals represent short-term and long-term trading signals for selling or buying stocks in the market, and are applied to forecast future turning points in the test data. Experimental results demonstrate that the hybrid system can make a significant and consistent amount of profit when compared with other approaches using stock data available in the market.

  15. Automatic design of decision-tree algorithms with evolutionary algorithms.

    PubMed

    Barros, Rodrigo C; Basgalupp, Márcio P; de Carvalho, André C P L F; Freitas, Alex A

    2013-01-01

    This study reports the empirical analysis of a hyper-heuristic evolutionary algorithm that is capable of automatically designing top-down decision-tree induction algorithms. Top-down decision-tree algorithms are of great importance, considering their ability to provide an intuitive and accurate knowledge representation for classification problems. The automatic design of these algorithms seems timely, given the large literature accumulated over more than 40 years of research in the manual design of decision-tree induction algorithms. The proposed hyper-heuristic evolutionary algorithm, HEAD-DT, is extensively tested using 20 public UCI datasets and 10 microarray gene expression datasets. The algorithms automatically designed by HEAD-DT are compared with traditional decision-tree induction algorithms, such as C4.5 and CART. Experimental results show that HEAD-DT is capable of generating algorithms which are significantly more accurate than C4.5 and CART.

  16. Ensemble modelling and structured decision-making to support Emergency Disease Management.

    PubMed

    Webb, Colleen T; Ferrari, Matthew; Lindström, Tom; Carpenter, Tim; Dürr, Salome; Garner, Graeme; Jewell, Chris; Stevenson, Mark; Ward, Michael P; Werkman, Marleen; Backer, Jantien; Tildesley, Michael

    2017-03-01

    Epidemiological models in animal health are commonly used as decision-support tools to understand the impact of various control actions on infection spread in susceptible populations. Different models contain different assumptions and parameterizations, and policy decisions might be improved by considering outputs from multiple models. However, a transparent decision-support framework to integrate outputs from multiple models is nascent in epidemiology. Ensemble modelling and structured decision-making integrate the outputs of multiple models, compare policy actions and support policy decision-making. We briefly review the epidemiological application of ensemble modelling and structured decision-making and illustrate the potential of these methods using foot and mouth disease (FMD) models. In case study one, we apply structured decision-making to compare five possible control actions across three FMD models and show which control actions and outbreak costs are robustly supported and which are impacted by model uncertainty. In case study two, we develop a methodology for weighting the outputs of different models and show how different weighting schemes may impact the choice of control action. Using these case studies, we broadly illustrate the potential of ensemble modelling and structured decision-making in epidemiology to provide better information for decision-making and outline necessary development of these methods for their further application.

  17. Operational optimization of irrigation scheduling for citrus trees using an ensemble based data assimilation approach

    NASA Astrophysics Data System (ADS)

    Hendricks Franssen, H.; Han, X.; Martinez, F.; Jimenez, M.; Manzano, J.; Chanzy, A.; Vereecken, H.

    2013-12-01

    Data assimilation (DA) techniques, like the local ensemble transform Kalman filter (LETKF), not only offer the opportunity to update model predictions by assimilating new measurement data in real time, but also provide an improved basis for real-time (DA-based) control. This study focuses on the optimization of real-time irrigation scheduling for fields of citrus trees near Picassent (Spain). For three selected fields the irrigation was optimized with DA-based control, and for other fields irrigation was optimized on the basis of a more traditional approach where reference evapotranspiration for citrus trees was estimated using the FAO method. The performance of the two methods is compared for the year 2013. The DA-based real-time control approach is based on ensemble predictions of soil moisture profiles, using the Community Land Model (CLM). The uncertainty in the model predictions is introduced by feeding the model with weather predictions from an ensemble prediction system (EPS) and uncertain soil hydraulic parameters. The model predictions are updated daily by assimilating soil moisture data measured by capacitance probes. The measurement data are assimilated with the help of the LETKF. The irrigation need was calculated for each of the ensemble members and averaged, and logistic constraints (hydraulics, energy costs) were taken into account for the final assignment of irrigation in space and time. For the operational scheduling based on this approach, only model states, and no model parameters, were updated. Other, non-operational simulation experiments for the same period were carried out in which (1) neither the ensemble weather forecast nor DA was used (open loop), (2) only the ensemble weather forecast was used, (3) only DA was used, (4) soil hydraulic parameters were also updated in the data assimilation, and (5) both soil hydraulic and plant-specific parameters were updated. The FAO-based and DA-based real-time irrigation control are compared in terms of soil moisture
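
    The study uses the LETKF; as a simplified illustration of the ensemble Kalman filter update that underlies such DA-based control, here is a minimal stochastic EnKF analysis step in Python (all variable names, dimensions and values are our own assumptions, not the study's code):

        import numpy as np

        def enkf_update(X, y, H, R, rng=None):
            # X: n_state x n_ens forecast ensemble (e.g. soil moisture profiles)
            # y: observation vector (e.g. capacitance-probe soil moisture)
            # H: linear observation operator, R: observation error covariance
            if rng is None:
                rng = np.random.default_rng(0)
            n_ens = X.shape[1]
            A = X - X.mean(axis=1, keepdims=True)          # ensemble anomalies
            P = A @ A.T / (n_ens - 1)                      # sample covariance
            K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
            for i in range(n_ens):
                y_pert = y + rng.multivariate_normal(np.zeros(len(y)), R)
                X[:, i] += K @ (y_pert - H @ X[:, i])      # update each member
            return X

        # e.g. 30 ensemble members of a 5-layer profile, observed by 2 probes
        X = np.random.default_rng(1).uniform(0.1, 0.4, (5, 30))
        H = np.zeros((2, 5)); H[0, 0] = H[1, 2] = 1.0
        R = 0.01 * np.eye(2)
        X_a = enkf_update(X, np.array([0.25, 0.30]), H, R)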

  18. Decision Trees for Prediction and Data Mining

    DTIC Science & Technology

    2005-02-10

    ironic, as research in tree-structured methods was originally motivated by the desire for an interpretable alternative to standard methods such as...multiple linear regression and neural networks. Another problem with most tree construction algorithms is that their variable selection methods are biased...software, including well-known ones such as CART (Breiman, Friedman, Olshen and Stone 1984) and M5 (Quinlan 1992). With the exception of the lesser

  19. Generating the Simple Decision Tree with Symbiotic Evolution

    NASA Astrophysics Data System (ADS)

    Otani, Noriko; Shimura, Masamichi

    In representing classification rules by decision trees, simplicity of tree structure is as important as predictive accuracy, especially in view of comprehensibility to humans, memory requirements, and the time required to classify. Trees tend to become complex when they achieve high accuracy. This paper proposes a novel method for generating accurate and simple decision trees based on symbiotic evolution. A distinctive feature of symbiotic evolution is that two different populations are evolved in parallel through genetic algorithms. In our method, the individuals of one population are partial trees of height 1, and the individuals of the other are whole trees represented as combinations of the former. Generally, overfitting to training examples prevents high predictive accuracy. In order to circumvent this difficulty, individuals are evaluated not only by their accuracy on training examples but also by the correct-answer biased rate, which indicates the dispersion of the correct answers across the terminal nodes. Based on our method we developed a system called SESAT for generating decision trees. Our experimental results show that SESAT compares favorably with other systems on several datasets in the UCI repository. SESAT has the ability to generate simpler trees than C5.0 without sacrificing predictive accuracy.

  20. Learning accurate very fast decision trees from uncertain data streams

    NASA Astrophysics Data System (ADS)

    Liang, Chunquan; Zhang, Yang; Shi, Peng; Hu, Zhengguo

    2015-12-01

    Most existing works on data stream classification assume the streaming data is precise and definite. Such assumption, however, does not always hold in practice, since data uncertainty is ubiquitous in data stream applications due to imprecise measurement, missing values, privacy protection, etc. The goal of this paper is to learn accurate decision tree models from uncertain data streams for classification analysis. On the basis of very fast decision tree (VFDT) algorithms, we proposed an algorithm for constructing an uncertain VFDT tree with classifiers at tree leaves (uVFDTc). The uVFDTc algorithm can exploit uncertain information effectively and efficiently in both the learning and the classification phases. In the learning phase, it uses Hoeffding bound theory to learn from uncertain data streams and yield fast and reasonable decision trees. In the classification phase, at tree leaves it uses uncertain naive Bayes (UNB) classifiers to improve the classification performance. Experimental results on both synthetic and real-life datasets demonstrate the strong ability of uVFDTc to classify uncertain data streams. The use of UNB at tree leaves has improved the performance of uVFDTc, especially the any-time property, the benefit of exploiting uncertain information, and the robustness against uncertainty.
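
    The Hoeffding bound used by VFDT-style learners to decide when enough examples have been seen to commit to a split can be sketched as follows (a generic formulation, not the uVFDTc implementation): after n observations of a random variable with range R, the true mean lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the sample mean with probability at least 1 - delta.

        import math

        def hoeffding_bound(value_range, delta, n):
            # epsilon such that the true mean is within +/- epsilon of the
            # sample mean with probability at least 1 - delta
            return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

        # e.g. accept the current best split attribute once the observed gain
        # difference between the two best attributes exceeds epsilon
        eps = hoeffding_bound(value_range=1.0, delta=1e-7, n=2000)
        print(eps)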

  1. Multi-Model Long-Range Ensemble Forecast for Decision Support in Hydroelectric Operations

    NASA Astrophysics Data System (ADS)

    Kunkel, M. L.; Parkinson, S.; Blestrud, D.; Holbrook, V. P.

    2014-12-01

    Idaho Power Company (IPC) is a hydroelectric-based utility serving over a million customers in southern Idaho and eastern Oregon. Hydropower makes up ~50% of our power generation, and accurate predictions of streamflow and precipitation drive our long-term planning and decision support for operations. We investigate the use of a multi-model ensemble approach for mid- and long-range streamflow and precipitation forecasts throughout the Snake River Basin. Forecasts are prepared using an Idaho Power-developed ensemble forecasting technique for 89 locations throughout the Snake River Basin for periods of 3 to 18 months in advance. A series of multivariable linear regression, multivariable non-linear regression and multivariable Kalman filter techniques are combined in an ensemble forecast based upon two data types: historical data (streamflow, precipitation, climate indices [e.g. PDO, ENSO, AO, etc.]) and singular-value-decomposition-derived values based upon atmospheric heights and sea surface temperatures.

  2. Comparing the decision-relevance and utility of alternative ensembles of climate projections in water management and other applications

    NASA Astrophysics Data System (ADS)

    Lempert, R. J.; Tingstad, A.

    2015-12-01

    Decisions to manage the risks of climate change hinge, among many other things, on deeply uncertain and imperfect climate projections. Improving the decision relevance and utility of climate projections requires navigating a trade-off between increasing the physical realism of the model (often by improving the spatial resolution) and increasing the representation of decision-relevant uncertainties. This talk will examine the decision relevance and utility of alternative ensembles of climate information by comparing two decision support applications, in water management and biodiversity preservation, both in California. The climate ensembles will consist of different combinations of high and medium resolution projections from NARCCAP (North American Regional Climate Assessment Program) as well as low resolution, but more numerous, projections from the CMIP3 and CMIP5 ensembles. The decision support applications will use the same ensembles of climate projections in different contexts. Workshops with decision makers examine the extent to which the different ensembles lead to different decisions, the extent to which considering a wider range of uncertainty affects decisions, the extent to which decision makers' confidence in the projections and the decisions based on them is sensitive to the resolution at which they are communicated and to the resolution-dependent skill, and how the answers to these questions vary with the water management and biodiversity contexts. This study aims to provide empirical evidence to support judgments on how best to use uncertain climate information in water management and other decision support applications.

  3. Reconciliation of Decision-Making Heuristics Based on Decision Trees Topologies and Incomplete Fuzzy Probabilities Sets

    PubMed Central

    Doubravsky, Karel; Dohnal, Mirko

    2015-01-01

    Complex decision making tasks of different natures, e.g. economics, safety engineering, ecology and biology, are based on vague, sparse, partially inconsistent and subjective knowledge. Moreover, decision making economists/engineers are usually not willing to invest too much time into the study of complex formal theories. They require decisions which can be (re)checked by human-like common sense reasoning. One important problem related to realistic decision making tasks is the incomplete data sets required by the chosen decision making algorithm. This paper presents a relatively simple algorithm by which some missing input information items (IIIs) can be generated using mainly decision tree topologies and integrated into incomplete data sets. The algorithm is based on easy-to-understand heuristics, e.g. that a longer decision tree sub-path is less probable. This heuristic can solve decision problems under total ignorance, i.e. when the decision tree topology is the only information available. In practice, however, isolated information items, e.g. some vaguely known probabilities (e.g. fuzzy probabilities), are usually available, which means that a realistic problem is analysed under partial ignorance. The proposed algorithm reconciles topology-related heuristics and additional fuzzy sets using fuzzy linear programming. The case study, represented by a tree with six lotteries and one fuzzy probability, is presented in detail. PMID:26158662

  4. Reconciliation of Decision-Making Heuristics Based on Decision Trees Topologies and Incomplete Fuzzy Probabilities Sets.

    PubMed

    Doubravsky, Karel; Dohnal, Mirko

    2015-01-01

    Complex decision making tasks of different natures, e.g. economics, safety engineering, ecology and biology, are based on vague, sparse, partially inconsistent and subjective knowledge. Moreover, decision making economists/engineers are usually not willing to invest too much time into the study of complex formal theories. They require decisions which can be (re)checked by human-like common sense reasoning. One important problem related to realistic decision making tasks is the incomplete data sets required by the chosen decision making algorithm. This paper presents a relatively simple algorithm by which some missing input information items (IIIs) can be generated using mainly decision tree topologies and integrated into incomplete data sets. The algorithm is based on easy-to-understand heuristics, e.g. that a longer decision tree sub-path is less probable. This heuristic can solve decision problems under total ignorance, i.e. when the decision tree topology is the only information available. In practice, however, isolated information items, e.g. some vaguely known probabilities (e.g. fuzzy probabilities), are usually available, which means that a realistic problem is analysed under partial ignorance. The proposed algorithm reconciles topology-related heuristics and additional fuzzy sets using fuzzy linear programming. The case study, represented by a tree with six lotteries and one fuzzy probability, is presented in detail.

  5. Evaluation of Decision Trees for Cloud Detection from AVHRR Data

    NASA Technical Reports Server (NTRS)

    Shiffman, Smadar; Nemani, Ramakrishna

    2005-01-01

    Automated cloud detection and tracking is an important step in assessing changes in radiation budgets associated with global climate change via remote sensing. Data products based on satellite imagery are available to the scientific community for studying trends in the Earth's atmosphere. The data products include pixel-based cloud masks that assign cloud-cover classifications to pixels. Many cloud-mask algorithms have the form of decision trees. The decision trees employ sequential tests that scientists designed based on empirical astrophysics studies and simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In a previous study we compared automatically learned decision trees to cloud masks included in Advanced Very High Resolution Radiometer (AVHRR) data products from the year 2000. In this paper we report the replication of the study for five years of data, and for a gold standard based on surface observations performed by scientists at weather stations in the British Isles. For our sample data, the accuracy of automatically learned decision trees was greater than the accuracy of the cloud masks (p < 0.001).

  6. Classification of posture and activities by using decision trees.

    PubMed

    Zhang, Ting; Tang, Wenlong; Sazonov, Edward S

    2012-01-01

    Obesity prevention and treatment, as well as healthy lifestyle recommendation, require the estimation of everyday physical activity. Monitoring posture allocations and activities with sensor systems is an effective method to achieve this goal. However, at present, most devices available rely on multiple sensors distributed on the body, which might be too obtrusive for everyday use. In this study, data was collected from a wearable shoe sensor system (SmartShoe) and a decision tree algorithm was applied for classification with high accuracy. The dataset was collected from 9 individual subjects performing 6 different activities--sitting, standing, walking, cycling, and stairs ascent/descent. Statistical features were calculated and classification with a decision tree classifier was performed, after which an advanced boosting algorithm was applied. The classification accuracy is as high as 98.85% without boosting, and 98.90% after boosting. Additionally, the simple tree structure provides a direct approach to simplify the feature set.
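
    A hedged sketch of the tree-plus-boosting pattern described above, using scikit-learn's AdaBoost over a shallow decision tree; the synthetic six-class data merely stands in for the SmartShoe features and is not the study's dataset:

        from sklearn.datasets import make_classification
        from sklearn.ensemble import AdaBoostClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        # Stand-in for the SmartShoe statistical features: 6 activity classes
        X, y = make_classification(n_samples=1200, n_features=20,
                                   n_informative=10, n_classes=6, random_state=0)

        tree = DecisionTreeClassifier(max_depth=5, random_state=0)
        boosted = AdaBoostClassifier(tree, n_estimators=50, random_state=0)

        print(cross_val_score(tree, X, y, cv=5).mean())     # plain tree
        print(cross_val_score(boosted, X, y, cv=5).mean())  # boosted trees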

  7. Supervised learning with decision tree-based methods in computational and systems biology.

    PubMed

    Geurts, Pierre; Irrthum, Alexandre; Wehenkel, Louis

    2009-12-01

    At the intersection between artificial intelligence and statistics, supervised learning allows algorithms to automatically build predictive models from just observations of a system. During the last twenty years, supervised learning has been a tool of choice to analyze the ever-increasing and increasingly complex data generated in the context of molecular biology, with successful applications in genome annotation, function prediction, and biomarker discovery. Among supervised learning methods, decision tree-based methods stand out as non-parametric methods that have the unique feature of combining interpretability, efficiency, and, when used in ensembles of trees, excellent accuracy. The goal of this paper is to provide an accessible and comprehensive introduction to this class of methods. The first part of the review is devoted to an intuitive but complete description of decision tree-based methods and a discussion of their strengths and limitations with respect to other supervised learning methods. The second part of the review provides a survey of their applications in the context of computational and systems biology.

  8. Boosting alternating decision trees modeling of disease trait information.

    PubMed

    Liu, Kuang-Yu; Lin, Jennifer; Zhou, Xiaobo; Wong, Stephen T C

    2005-12-30

    We applied the alternating decision trees (ADTrees) method to the last 3 replicates from the Aipotu, Danacca, Karangar, and NYC populations in the Problem 2 simulated Genetic Analysis Workshop dataset. Using information from the 12 binary phenotypes and sex as input and Kofendrerd Personality Disorder disease status as the outcome of ADTrees-based classifiers, we obtained a new quantitative trait based on average prediction scores, which was then used for genome-wide quantitative trait linkage (QTL) analysis. ADTrees are machine learning methods that combine boosting and decision tree algorithms to generate smaller and easier-to-interpret classification rules. In this application, we compared four modeling strategies from the combinations of two boosting iterations (log or exponential loss functions) coupled with two choices of tree generation types (a full alternating decision tree or a classic boosting decision tree). These four different strategies were applied to the founders in each population to construct four classifiers, which were then applied to each study participant. To compute the average prediction score for each subject with a specific trait profile, such a process was repeated with 10 runs of 10-fold cross-validation, and standardized prediction scores obtained from the 10 runs were averaged and used in subsequent expectation-maximization Haseman-Elston QTL analyses (implemented in GENEHUNTER) with the approximately 900 SNPs in Hardy-Weinberg equilibrium provided for each population. Our QTL analyses on the basis of four models (a full alternating decision tree and a classic boosting decision tree paired with either log or exponential loss function) detected evidence for linkage (Z ≥ 1.96, p < 0.01) on chromosomes 1, 3, 5, and 9. Moreover, using average iteration and abundance scores for the 12 phenotypes and sex as their relevancy measurements, we found all relevant phenotypes for all four populations except phenotype b for the Karangar population

  9. The limitations of decision trees and automatic learning in real world medical decision making.

    PubMed

    Kokol, P; Zorman, M; Stiglic, M M; Malčić, I

    1998-01-01

    The decision tree approach is one of the most common approaches in automatic learning and decision making. It is popular for its simplicity of construction, its efficiency in decision making, and its simple representation, which is easily understood by humans. The automatic learning of decision trees and their use usually show very good results in various "theoretical" environments. The training sets are usually large enough for the learning algorithm to construct a hypothesis consistent with the underlying concept. But in real life it is often impossible to find the desired number of training objects, for various reasons: typical examples are the lack of possibilities to measure attribute values, the high cost and complexity of such measurements, and the unavailability of all attributes at the same time. There are different ways to deal with some of these problems, but in the delicate field of medical decision making, we cannot allow ourselves to make any inaccurate decisions. We measured the values of 24 attributes before and after 82 operations on children aged between 2 and 10 years. The aim was to find the dependencies between attribute values and a child's predisposition to acidemia--a decrease of blood pH. Our main interest was in discovering predisposition to two forms of acidosis, metabolic acidosis and respiratory acidosis, which can both have serious effects on a child's health. We decided to construct different decision trees from a set of training objects, which was complete (there were no missing attribute values), but on the other hand not large enough to avoid the effect of overfitting. A common approach to the evaluation of a decision tree is the use of a test set. In our case we decided that, instead of using a test set, we would ask medical experts to take a closer look at the generated trees. They examined and evaluated the decision trees branch by branch. Their comments on the generated trees can be found in this paper. The comments show that

  10. Towards the assimilation of tree-ring-width records using ensemble Kalman filtering techniques

    NASA Astrophysics Data System (ADS)

    Acevedo, Walter; Reich, Sebastian; Cubasch, Ulrich

    2016-03-01

    This paper investigates the applicability of the Vaganov-Shashkin-Lite (VSL) forward model for tree-ring-width (TRW) chronologies as an observation operator within a proxy data assimilation (DA) setting. Based on the principle of limiting factors, VSL combines temperature and moisture time series in a nonlinear fashion to obtain simulated TRW chronologies. When used as an observation operator, this modelling approach implies three compounding, challenging features: (1) time averaging, (2) "switching recording" of 2 variables and (3) bounded response windows leading to "thresholded response". We generate pseudo-TRW observations from a chaotic 2-scale dynamical system, used as a cartoon of the atmosphere-land system, and attempt to assimilate them via ensemble Kalman filtering techniques. Results within our simplified setting reveal that VSL's nonlinearities may lead to considerable loss of assimilation skill, as compared to the utilization of a time-averaged (TA) linear observation operator. In order to understand this undesired effect, we embed VSL's formulation into the framework of fuzzy logic (FL) theory, which thereby exposes multiple representations of the principle of limiting factors. DA experiments employing three alternative growth rate functions disclose a strong link between the lack of smoothness of the growth rate function and the loss of optimality in the estimate of the TA state. Accordingly, VSL's performance as an observation operator can be enhanced by resorting to smoother FL representations of the principle of limiting factors. This finding fosters new interpretations of tree-ring-growth limitation processes.
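
    As a toy illustration of the principle of limiting factors that VSL implements (the thresholds below are invented for illustration and are not VSL's calibrated parameters), growth can be written as the minimum of two piecewise-linear responses:

        import numpy as np

        def ramp(x, lo, hi):
            # piecewise-linear growth response: 0 below lo, 1 above hi
            return np.clip((x - lo) / (hi - lo), 0.0, 1.0)

        def vsl_like_growth(temperature, moisture):
            # principle of limiting factors: growth is controlled by the
            # scarcer of the two resources (a min-rule, as in fuzzy logic)
            g_T = ramp(temperature, lo=5.0, hi=18.0)   # illustrative thresholds
            g_M = ramp(moisture, lo=0.05, hi=0.3)
            return np.minimum(g_T, g_M)

        print(vsl_like_growth(np.array([10.0, 25.0]), np.array([0.10, 0.02])))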

  11. Sinkhole hazard assessment in Minnesota using a decision tree model

    NASA Astrophysics Data System (ADS)

    Gao, Yongli; Alexander, E. Calvin

    2008-05-01

    An understanding of what influences sinkhole formation and the ability to accurately predict sinkhole hazards is critical to environmental management efforts in the karst lands of southeastern Minnesota. Based on the distribution of distances to the nearest sinkhole, sinkhole density, bedrock geology and depth to bedrock in southeastern Minnesota and northwestern Iowa, a decision tree model has been developed to construct maps of sinkhole probability in Minnesota. The decision tree model was converted into cartographic models and implemented in ArcGIS to create a preliminary sinkhole probability map in Goodhue, Wabasha, Olmsted, Fillmore, and Mower Counties. This model quantifies bedrock geology, depth to bedrock, sinkhole density, and neighborhood effects in southeastern Minnesota but excludes potential controlling factors such as structural control, topographic settings, human activities and land use. The sinkhole probability map needs to be verified and updated as more sinkholes are mapped and more information about sinkhole formation is obtained.

  12. A modified classification tree method for personalized medicine decisions

    PubMed Central

    Tsai, Wan-Min; Zhang, Heping; Buta, Eugenia; O’Malley, Stephanie

    2015-01-01

    The tree-based methodology has been widely applied to identify predictors of health outcomes in medical studies. However, the classical tree-based approaches do not pay particular attention to treatment assignment and thus do not consider prediction in the context of treatment received. In recent years, attention has been shifting from average treatment effects to identifying moderators of treatment response, and tree-based approaches to identify subgroups of subjects with enhanced treatment responses are emerging. In this study, we extend and present modifications to one of these approaches (Zhang et al., 2010 [29]) to efficiently identify subgroups of subjects who respond more favorably to one treatment than another based on their baseline characteristics. We extend the algorithm by incorporating an automatic pruning step and propose a measure for assessment of the predictive performance of the constructed tree. We evaluate the proposed method through a simulation study and illustrate the approach using a data set from a clinical trial of treatments for alcohol dependence. This simple and efficient statistical tool can be used for developing algorithms for clinical decision making and personalized treatment for patients based on their characteristics. PMID:26770292

  13. The xeroderma pigmentosum pathway: decision tree analysis of DNA quality.

    PubMed

    Naegeli, Hanspeter; Sugasawa, Kaoru

    2011-07-15

    The nucleotide excision repair (NER) system is a fundamental cellular stress response that uses only a handful of DNA binding factors, mutated in the cancer-prone syndrome xeroderma pigmentosum (XP), to detect an astounding diversity of bulky base lesions, including those induced by ultraviolet light, electrophilic chemicals, oxygen radicals and further genetic insults. Several of these XP proteins are characterized by a mediocre preference for damaged substrates over the native double helix but, intriguingly, none of them recognizes injured bases with sufficient selectivity to account for the very high precision of bulky lesion excision. Instead, substrate versatility as well as damage specificity and strand selectivity are achieved by a multistage quality control strategy whereby different subunits of the XP pathway, in succession, interrogate the DNA double helix for a distinct abnormality in its structural or dynamic parameters. Through this step-by-step filtering procedure, the XP proteins operate like a systematic decision making tool, generally known as decision tree analysis, to sort out rare damaged bases embedded in a vast excess of native DNA. The present review is focused on the mechanisms by which multiple XP subunits of the NER pathway contribute to the proposed decision tree analysis of DNA quality in eukaryotic cells.

  14. An efficient tree classifier ensemble-based approach for pedestrian detection.

    PubMed

    Xu, Yanwu; Cao, Xianbin; Qiao, Hong

    2011-02-01

    Classification-based pedestrian detection systems (PDSs) are currently a hot research topic in the field of intelligent transportation. A PDS detects pedestrians in real time on moving vehicles. A practical PDS demands not only high detection accuracy but also high detection speed. However, most of the existing classification-based approaches mainly seek for high detection accuracy, while the detection speed is not purposely optimized for practical application. At the same time, the performance, particularly the speed, is primarily tuned based on experiments without theoretical foundations, leading to a long training procedure. This paper starts with measuring and optimizing detection speed, and then a practical classification-based pedestrian detection solution with high detection speed and training speed is described. First, an extended classification/detection speed metric, named feature-per-object (fpo), is proposed to measure the detection speed independently from execution. Then, an fpo minimization model with accuracy constraints is formulated based on a tree classifier ensemble, where the minimum fpo can guarantee the highest detection speed. Finally, the minimization problem is solved efficiently by using nonlinear fitting based on radial basis function neural networks. In addition, the optimal solution is directly used to instruct classifier training; thus, the training speed could be accelerated greatly. Therefore, a rapid and accurate classification-based detection technique is proposed for the PDS. Experimental results on urban traffic videos show that the proposed method has a high detection speed with an acceptable detection rate and a false-alarm rate for onboard detection; moreover, the training procedure is also very fast.

  15. A Novel Approach on Designing Augmented Fuzzy Cognitive Maps Using Fuzzified Decision Trees

    NASA Astrophysics Data System (ADS)

    Papageorgiou, Elpiniki I.

    This paper proposes a new methodology for designing Fuzzy Cognitive Maps using crisp decision trees that have been fuzzified. A fuzzy cognitive map is a knowledge-based technique that works as an artificial cognitive network inheriting the main aspects of cognitive maps and artificial neural networks. Decision trees, on the other hand, are well-known intelligent techniques that extract rules from both symbolic and numeric data. Fuzzy theoretical techniques are used to fuzzify crisp decision trees in order to soften the decision boundaries at the decision nodes inherent in this type of tree. Comparisons between crisp decision trees and fuzzified decision trees suggest that the latter are significantly more robust and produce more balanced decision making. The approach proposed in this paper could incorporate any type of fuzzy decision tree. Through this methodology, new linguistic weights are determined in the FCM model, thus producing an augmented FCM tool. The framework consists of a new fuzzy algorithm that generates linguistic weights describing the cause-effect relationships among the concepts of the FCM model, from induced fuzzy decision trees.

  16. Comparative study of biodegradability prediction of chemicals using decision trees, functional trees, and logistic regression.

    PubMed

    Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M

    2014-12-01

    Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out with 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability, with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models; this comparison again showed better predictability for the functional inner regression tree model. The model built in the present study exhibits reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed.

  17. Classification of Subcellular Phenotype Images by Decision Templates for Classifier Ensemble

    NASA Astrophysics Data System (ADS)

    Zhang, Bailing

    2010-01-01

    Subcellular localization is a key functional characteristic of proteins. An automatic, reliable and efficient prediction system for protein subcellular localization is needed for large-scale genome analysis. The automated cell phenotype image classification problem is an interesting "bioimage informatics" application. It can be used to establish knowledge of the spatial distribution of proteins within living cells, and it permits the screening of systems for drug discovery or for early diagnosis of a disease. In this paper, three well-known texture feature extraction methods, including local binary patterns (LBP), Gabor filtering and the Gray-Level Co-occurrence Matrix (GLCM), have been applied to cell phenotype images, and the multilayer perceptron (MLP) method has been used to classify the cell phenotype images. After classification of the extracted features, the decision-templates ensemble algorithm (DT) is used to combine base classifiers built on the different feature sets. Different texture feature sets can provide sufficient diversity among base classifiers, which is known as a necessary condition for improvement in ensemble performance. For the HeLa cells, the human classification error rate on this task is 17%, as reported in previous publications. We obtain with our method an error rate of 4.8%.

  18. Investigating the Utility of Oblique Tree-Based Ensembles for the Classification of Hyperspectral Data

    PubMed Central

    Poona, Nitesh; van Niekerk, Adriaan; Ismail, Riyad

    2016-01-01

    Ensemble classifiers are being widely used for the classification of spectroscopic data. In this regard, the random forest (RF) ensemble has been successfully applied in an array of applications, and has proven to be robust in handling high dimensional data. More recently, several variants of the traditional RF algorithm including rotation forest (rotF) and oblique random forest (oRF) have been applied to classifying high dimensional data. In this study we compare the traditional RF, rotF, and oRF (using three different splitting rules, i.e., ridge regression, partial least squares, and support vector machine) for the classification of healthy and infected Pinus radiata seedlings using high dimensional spectroscopic data. We further test the robustness of these five ensemble classifiers to reduced spectral resolution by spectral resampling (binning) of the original spectral bands. The results showed that the three oblique random forest ensembles outperformed both the traditional RF and rotF ensembles. Additionally, the rotF ensemble proved to be the least robust of the five ensembles tested. Spectral resampling of the original bands provided mixed results. Nevertheless, the results demonstrate that using spectrally resampled bands is a promising approach to classifying asymptomatic stress in Pinus radiata seedlings. PMID:27854290

  19. Applying an Ensemble Classification Tree Approach to the Prediction of Completion of a 12-Step Facilitation Intervention with Stimulant Abusers

    PubMed Central

    Doyle, Suzanne R.; Donovan, Dennis M.

    2014-01-01

    Aims The purpose of this study was to explore the selection of predictor variables in the evaluation of drug treatment completion using an ensemble approach with classification trees. The basic methodology is reviewed and the subagging procedure of random subsampling is applied. Methods Among 234 individuals with stimulant use disorders randomized to a 12-Step facilitative intervention shown to increase stimulant use abstinence, 67.52% were classified as treatment completers. A total of 122 baseline variables were used to identify factors associated with completion. Findings The number of types of self-help activity involvement prior to treatment was the predominant predictor. Other effective predictors included better coping self-efficacy for substance use in high-risk situations, more days of prior meeting attendance, greater acceptance of the Disease model, higher confidence for not resuming use following discharge, lower ASI Drug and Alcohol composite scores, negative urine screens for cocaine or marijuana, and fewer employment problems. Conclusions The application of an ensemble subsampling regression tree method utilizes the fact that classification trees are unstable but, on average, produce an improved prediction of the completion of drug abuse treatment. The results support the notion there are early indicators of treatment completion that may allow for modification of approaches more tailored to fitting the needs of individuals and potentially provide more successful treatment engagement and improved outcomes. PMID:25134038
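
    Subagging, i.e., aggregating trees grown on random subsamples drawn without replacement rather than bootstrap samples, can be sketched with scikit-learn's BaggingClassifier (the synthetic data merely stands in for the 122 baseline predictors; this is not the study's code):

        from sklearn.datasets import make_classification
        from sklearn.ensemble import BaggingClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        # Stand-in for 234 subjects with 122 baseline variables
        X, y = make_classification(n_samples=234, n_features=122,
                                   n_informative=10, random_state=0)

        # Subagging: random subsamples without replacement (bootstrap=False)
        subagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                                     max_samples=0.5, bootstrap=False,
                                     random_state=0)
        print(cross_val_score(subagged, X, y, cv=5).mean())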

  20. A Theoretical Analysis of Why Hybrid Ensembles Work

    PubMed Central

    2017-01-01

    Inspired by the group decision making process, ensembles or combinations of classifiers have been found favorable in a wide variety of application domains. Some researchers propose to use a mixture of two different types of classification algorithms to create a hybrid ensemble. Why does such an ensemble work? The question remains. Following the concept of diversity, which is one of the fundamental elements of the success of ensembles, we conduct a theoretical analysis of why hybrid ensembles work, connecting the use of different algorithms to accuracy gain. We also conduct experiments on the classification performance of hybrid ensembles of classifiers created by decision tree and naïve Bayes classification algorithms, each of which is a top data mining algorithm and often used to create non-hybrid ensembles. Therefore, through this paper, we provide a complement to the theoretical foundation of creating and using hybrid ensembles. PMID:28255296
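
    A minimal sketch of such a hybrid ensemble, combining a decision tree and naïve Bayes with scikit-learn's VotingClassifier (the dataset choice is illustrative, not from the paper):

        from sklearn.datasets import load_breast_cancer
        from sklearn.ensemble import VotingClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB
        from sklearn.tree import DecisionTreeClassifier

        X, y = load_breast_cancer(return_X_y=True)

        # A hybrid ensemble mixing two different algorithm families
        hybrid = VotingClassifier(
            estimators=[('dt', DecisionTreeClassifier(random_state=0)),
                        ('nb', GaussianNB())],
            voting='soft')  # average the predicted class probabilities
        print(cross_val_score(hybrid, X, y, cv=5).mean())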

  1. The value of decision tree analysis in planning anaesthetic care in obstetrics.

    PubMed

    Bamber, J H; Evans, S A

    2016-08-01

    The use of decision tree analysis is discussed in the context of the anaesthetic and obstetric management of a young pregnant woman with joint hypermobility syndrome with a history of insensitivity to local anaesthesia and a previous difficult intubation due to a tongue tumour. The multidisciplinary clinical decision process resulted in the woman being delivered without complication by elective caesarean section under general anaesthesia after an awake fibreoptic intubation. The decision process used is reviewed and compared retrospectively to a decision tree analytical approach. The benefits and limitations of using decision tree analysis are reviewed and its application in obstetric anaesthesia is discussed.

  2. Career Path Suggestion using String Matching and Decision Trees

    NASA Astrophysics Data System (ADS)

    Nagpal, Akshay; P. Panda, Supriya

    2015-05-01

    High school and college graduates often struggle to choose the courses they should major in to reach their target careers. In this paper, we worked on suggesting a career path that takes a graduate from his/her current educational status to his/her dream career. First, we collected career data from professionals and academicians in various career fields and compiled a data set from the relevant information. This data set was then used as the basis for suggesting the most appropriate career path for a person given his/her current educational status. Decision trees and string matching algorithms were employed to suggest the appropriate career path. Finally, an analysis of the results has been carried out, pointing to further improvements in the model.

  3. Probabilistic lung nodule classification with belief decision trees.

    PubMed

    Zinovev, Dmitriy; Feigenbaum, Jonathan; Furst, Jacob; Raicu, Daniela

    2011-01-01

    In reading Computed Tomography (CT) scans with potentially malignant lung nodules, radiologists make use of high level information (semantic characteristics) in their analysis. Computer-Aided Diagnostic Characterization (CADc) systems can assist radiologists by offering a "second opinion"--predicting these semantic characteristics for lung nodules. In this work, we propose a way of predicting the distribution of radiologists' opinions using a multiple-label classification algorithm based on belief decision trees using the National Cancer Institute (NCI) Lung Image Database Consortium (LIDC) dataset, which includes semantic annotations by up to four human radiologists for each one of the 914 nodules. Furthermore, we evaluate our multiple-label results using a novel distance-threshold curve technique--and, measuring the area under this curve, obtain 69% performance on the validation subset. We conclude that multiple-label classification algorithms are an appropriate method of representing the diagnoses of multiple radiologists on lung CT scans when ground truth is unavailable.

  4. Classification of Liss IV Imagery Using Decision Tree Methods

    NASA Astrophysics Data System (ADS)

    Verma, Amit Kumar; Garg, P. K.; Prasad, K. S. Hari; Dadhwal, V. K.

    2016-06-01

    Image classification is a compulsory step in any remote sensing research. Classification uses the spectral information represented by the digital numbers in one or more spectral bands and attempts to classify each individual pixel based on this spectral information. Crop classification is the main concern of remote sensing applications for developing sustainable agriculture systems. Vegetation indices computed from satellite images give a good indication of the presence of vegetation; a vegetation index is an indicator that describes the greenness, density and health of vegetation. Texture is also an important characteristic used to identify objects or regions of interest in an image. This paper illustrates the use of the decision tree method to classify land into cropland and non-cropland and to classify different crops. We evaluate the possibility of crop classification using an integrated approach based on texture properties and different vegetation indices for single-date LISS IV sensor 5.8 m high-spatial-resolution data. Eleven vegetation indices (NDVI, DVI, GEMI, GNDVI, MSAVI2, NDWI, NG, NR, NNIR, OSAVI and VI green) have been generated using the green, red and NIR bands, and the image has then been classified using the decision tree method. The other approach integrates texture features (mean, variance, kurtosis and skewness) with these vegetation indices. A comparison has been made between these two methods. The results indicate that the inclusion of textural features with vegetation indices can be effectively implemented to produce classified maps with 8.33% higher accuracy for Indian satellite IRS-P6, LISS IV sensor images.
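
    The vegetation indices listed above are simple band combinations; for instance, NDVI = (NIR - Red) / (NIR + Red) and GNDVI = (NIR - Green) / (NIR + Green). A small numpy sketch (the random arrays merely stand in for LISS IV band rasters):

        import numpy as np

        def ndvi(nir, red):
            # Normalized Difference Vegetation Index from NIR and red bands
            nir, red = nir.astype(float), red.astype(float)
            return (nir - red) / (nir + red + 1e-10)  # guard against /0

        def gndvi(nir, green):
            nir, green = nir.astype(float), green.astype(float)
            return (nir - green) / (nir + green + 1e-10)

        # bands as 2-D digital-number arrays, e.g. read from a LISS IV scene
        rng = np.random.default_rng(0)
        nir = rng.integers(0, 255, (100, 100))
        red = rng.integers(0, 255, (100, 100))
        print(ndvi(nir, red).mean())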

  5. Decision-Tree Models of Categorization Response Times, Choice Proportions, and Typicality Judgments

    ERIC Educational Resources Information Center

    Lafond, Daniel; Lacouture, Yves; Cohen, Andrew L.

    2009-01-01

    The authors present 3 decision-tree models of categorization adapted from T. Trabasso, H. Rollins, and E. Shaughnessy (1971) and use them to provide a quantitative account of categorization response times, choice proportions, and typicality judgments at the individual-participant level. In Experiment 1, the decision-tree models were fit to…

  6. An Improved Decision Tree for Predicting a Major Product in Competing Reactions

    ERIC Educational Resources Information Center

    Graham, Kate J.

    2014-01-01

    When organic chemistry students encounter competing reactions, they are often overwhelmed by the task of evaluating multiple factors that affect the outcome of a reaction. The use of a decision tree is a useful tool to teach students to evaluate a complex situation and propose a likely outcome. Specifically, a decision tree can help students…

  7. Ensemble learning with trees and rules: supervised, semi-supervised, unsupervised

    Technology Transfer Automated Retrieval System (TEKTRAN)

    In this article, we propose several new approaches for post-processing a large ensemble of conjunctive rules for supervised and semi-supervised learning problems. We show with various examples that for high-dimensional regression problems the models constructed by post-processing the rules with ...

  8. Molecular decision trees realized by ultrafast electronic spectroscopy

    PubMed Central

    Fresch, Barbara; Hiluf, Dawit; Collini, Elisabetta; Levine, R. D.; Remacle, F.

    2013-01-01

    The outcome of a light–matter interaction depends on both the state of matter and the state of light. It is thus a natural setting for implementing bilinear classical logic. A description of the state of a time-varying system requires measuring an (ideally complete) set of time-dependent observables. Typically, this is prohibitive, but in weak-field spectroscopy we can move toward this goal because only a finite number of levels are accessible. Recent progress in nonlinear spectroscopies means that nontrivial measurements can be implemented and thereby give rise to interesting logic schemes where the outputs are functions of the observables. Lie algebra offers a natural tool for generating the outcome of the bilinear light–matter interaction. We show how to synthesize these ideas by explicitly discussing three-photon spectroscopy of a bichromophoric molecule for which there are four accessible states. Switching logic would use the on–off occupancies of these four states as outcomes. Here, we explore the use of all 16 observables that define the time-evolving state of the bichromophoric system. The bilinear laser–system interaction with the three pulses of the setup of a 2D photon echo spectroscopy experiment can be used to generate a rich parallel logic that corresponds to the implementation of a molecular decision tree. Our simulations allow relaxation by weak coupling to the environment, which adds to the complexity of the logic operations. PMID:24043793

  9. Decision trees in selection of featured determined food quality.

    PubMed

    Dębska, B; Guzowska-Świder, B

    2011-10-31

    The determination of food quality and authenticity and the detection of adulterations are problems of increasing importance in food chemistry. Recently, chemometric classification techniques and pattern recognition analysis methods for wine and other alcoholic beverages have received great attention and have been widely used. Beer is a complex mixture of components: on the one hand a volatile fraction, which is responsible for its aroma, and on the other hand a non-volatile fraction, or extract, consisting of a great variety of substances with distinct characteristics. The aim of this study was to consider parameters which contribute to beer differentiation according to quality grade. Chemical features (e.g. pH, acidity, dry extract, alcohol content, CO2 content) and sensory features (e.g. bitter taste, color) were determined in 70 beer samples and used as variables in decision tree techniques. The pattern recognition techniques applied to the dataset were able to extract information useful in obtaining a satisfactory classification of beer samples according to their quality grade. Feature selection procedures indicated which features are the most discriminating for classification.

  10. Using decision trees to measure activities in people with stroke.

    PubMed

    Zhang, Ting; Fulk, George D; Tang, Wenlong; Sazonov, Edward S

    2013-01-01

    Improving community mobility is a common goal for persons with stroke. Measuring daily physical activity is helpful to determine the effectiveness of rehabilitation interventions. In our previous studies, a novel wearable shoe-based sensor system (SmartShoe) was shown to be capable of accurately classifying three major postures and activities (sitting, standing, and walking) of individuals with stroke by using an Artificial Neural Network (ANN). In this study, we utilized decision tree algorithms to develop individual and group activity classification models for stroke patients. The data was acquired from 12 participants with stroke. For 3-class classification, the average accuracy was 99.1% with individual models and 91.5% with group models. Further, we extended the activities to 8 classes: sitting, standing, walking, cycling, stairs-up, stairs-down, wheel-chair-push, and wheel-chair-propel. The classification accuracy for individual models was 97.9%, and for the group model was 80.2%, demonstrating the feasibility of multi-class activity recognition by SmartShoe in stroke patients.

  11. Discovering Patterns in Brain Signals Using Decision Trees

    PubMed Central

    2016-01-01

    Even with emerging technologies, such as Brain-Computer Interface (BCI) systems, understanding how our brains work is a very difficult challenge, so we propose to use a data mining technique to help in this task. As a case study, we analyzed the brain behaviour of blind people and sighted people in a spatial activity. There is a common belief that blind people compensate for their lack of vision using the other senses. If an object is given to sighted people and we ask them to identify it, the sense of vision will probably be the most determinant. If the same experiment is repeated with blind people, they have to use other senses to identify the object. In this work, we propose a methodology that uses decision trees (DT) to investigate how the brains of blind people and sighted people differ in reacting to a spatial problem. We chose the DT algorithm because it can discover patterns in the brain signal, and its presentation is human-interpretable. Our results show that using DT to analyze brain signals can help us to understand the brain's behaviour. PMID:27688746

  12. Learning from examples - Generation and evaluation of decision trees for software resource analysis

    NASA Technical Reports Server (NTRS)

    Selby, Richard W.; Porter, Adam A.

    1988-01-01

    A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for software resource data analysis. The trees identify classes of objects (software modules) that had high development effort. Sixteen software systems ranging from 3,000 to 112,000 source lines were selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4,700 objects, captured information about the development effort, faults, changes, design style, and implementation style. A total of 9,600 decision trees were automatically generated and evaluated. The trees correctly identified 79.3 percent of the software modules that had high development effort or faults, and the trees generated from the best parameter combinations correctly identified 88.4 percent of the modules on the average.

  13. Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine

    NASA Technical Reports Server (NTRS)

    Schwabacher, Mark A.; Aguilar, Robert; Figueroa, Fernando F.

    2009-01-01

    The goal of this work was to use data-driven methods to automatically detect and isolate faults in the J-2X rocket engine. It was decided to use decision trees, since they tend to be easier to interpret than other data-driven methods. The decision tree algorithm automatically "learns" a decision tree by performing a search through the space of possible decision trees to find one that fits the training data. The particular decision tree algorithm used is known as C4.5. Simulated J-2X data from a high-fidelity simulator developed at Pratt & Whitney Rocketdyne and known as the Detailed Real-Time Model (DRTM) was used to "train" and test the decision tree. Fifty-six DRTM simulations were performed for this purpose, with different leak sizes, different leak locations, and different times of leak onset. To make the simulations as realistic as possible, they included simulated sensor noise, and included a gradual degradation in both fuel and oxidizer turbine efficiency. A decision tree was trained using 11 of these simulations, and tested using the remaining 45 simulations. In the training phase, the C4.5 algorithm was provided with labeled examples of data from nominal operation and data including leaks in each leak location. From the data, it "learned" a decision tree that can classify unseen data as having no leak or having a leak in one of the five leak locations. In the test phase, the decision tree produced very low false alarm rates and low missed detection rates on the unseen data. It had very good fault isolation rates for three of the five simulated leak locations, but it tended to confuse the remaining two locations, perhaps because a large leak at one of these two locations can look very similar to a small leak at the other location.

  14. Prediction of Regional Streamflow Frequency using Model Tree Ensembles: A data-driven approach based on natural and anthropogenic drainage area characteristics

    NASA Astrophysics Data System (ADS)

    Schnier, S.; Cai, X.

    2012-12-01

    This study introduces a highly accurate data-driven method to predict streamflow frequency statistics based on known drainage area characteristics which yields insights into the dominant controls of regional streamflow. The model is enhanced by explicit consideration of human interference in local hydrology. The basic idea is to use decision trees (i.e., regression trees) to regionalize the dataset and create a model tree by fitting multi-linear equations to the leaves of the regression tree. We improve model accuracy and obtain a measure of variable importance by creating an ensemble of randomized model trees using bootstrap aggregation (i.e., bagging). The database used to induce the models is built from public domain drainage area characteristics for 715 USGS stream gages (455 in Texas and 260 in Illinois). The database includes information on natural characteristics such as precipitation, soil type and slope, as well as anthropogenic ones including land cover, human population and water use. Model accuracy was evaluated using cross-validation and several performance metrics. During the validation, the gauges that are withheld from the analysis represent ungauged watersheds. The proposed method outperforms standard regression models such as the method of residuals for predictions in ungauged watersheds. Importantly, out-of-bag variable importance combined with models for 17 points along the flow duration curve (FDC) (i.e., from 0% to 100% exceedance frequency) yields insight into the dominant controls of regional streamflow. The most discriminant variables for high flows are drainage area and seasonal precipitation. Discriminant variables for low flows are more complex and model accuracy is improved with base-flow data, which is particularly difficult to obtain for ungauged sites. Consideration of human activities, such as percent urban and water use, is also shown to improve accuracy of low flow predictions. Drainage area characteristics, especially
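
    A hedged sketch of the model tree idea described above, a regression tree that regionalizes the data with a multi-linear equation fitted at each leaf, using scikit-learn (the class and parameter names are our own, and the paper additionally aggregates an ensemble of such randomized trees by bagging):

        import numpy as np
        from sklearn.datasets import make_regression
        from sklearn.linear_model import LinearRegression
        from sklearn.tree import DecisionTreeRegressor

        class SimpleModelTree:
            # Regression tree to regionalize the data, then a multi-linear
            # model fitted to the records falling into each leaf.
            def __init__(self, max_leaf_nodes=8):
                self.tree = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes,
                                                  random_state=0)
                self.leaf_models = {}

            def fit(self, X, y):
                self.tree.fit(X, y)
                leaves = self.tree.apply(X)  # leaf id for each training record
                for leaf in np.unique(leaves):
                    mask = leaves == leaf
                    self.leaf_models[leaf] = LinearRegression().fit(X[mask], y[mask])
                return self

            def predict(self, X):
                leaves = self.tree.apply(X)
                out = np.empty(len(X))
                for leaf, model in self.leaf_models.items():
                    mask = leaves == leaf
                    if mask.any():
                        out[mask] = model.predict(X[mask])
                return out

        # usage on synthetic data standing in for drainage-area characteristics
        X, y = make_regression(n_samples=500, n_features=8, noise=5.0, random_state=0)
        print(SimpleModelTree().fit(X, y).predict(X[:3]))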

  15. Using GEFS ensemble forecasts for decision making in reservoir management in California

    NASA Astrophysics Data System (ADS)

    Scheuerer, M.; Hamill, T.; Webb, R. S.

    2015-12-01

    Reservoirs such as Lake Mendocino in California's Russian River Basin provide flood control, water supply, recreation, and environmental stream flow regulation. Many of these reservoirs are operated by the U.S. Army Corps of Engineers (Corps) according to water control manuals that specify elevations for an upper volume of reservoir storage that must be kept available for capturing storm runoff and reducing flood risk, and a lower volume of storage that may be used for water supply. During extreme rainfall events, runoff is captured by these reservoirs and released as quickly as possible to create flood storage space for another potential storm. These flood control manuals are based on typical historical weather patterns - wet during the winter, dry otherwise - but are not informed directly by weather prediction. Alternative reservoir management approaches such as Forecast-Informed Reservoir Operations (FIRO), which seek to incorporate advances in weather prediction, are currently being explored as a means to improve water supply availability while maintaining flood risk reduction and providing additional ecosystem benefits. We present results from a FIRO proof-of-concept study investigating the reliability of post-processed GEFS ensemble forecasts to predict the probability that day 6-to-10 precipitation accumulations in certain areas of California exceed a high threshold. Our results suggest that reliable forecast guidance can be provided and that the resulting probabilities could be used to inform decisions to release or hold water in the reservoirs. We illustrate the potential of these forecasts in a case study of extreme event probabilities for the Russian River Basin in California.

  16. Potential of ensemble tree methods for early-season prediction of winter wheat yield from short time series of remotely sensed normalized difference vegetation index and in situ meteorological data

    NASA Astrophysics Data System (ADS)

    Heremans, Stien; Dong, Qinghan; Zhang, Beier; Bydekerke, Lieven; Van Orshoven, Jos

    2015-01-01

    We aimed to analyze the potential of two ensemble tree machine learning methods, boosted regression trees and random forests, for (early) prediction of winter wheat yield from short time series of remotely sensed vegetation indices at low spatial resolution and of in situ meteorological data in combination with annual fertilization levels. The study area was the Huaibei Plain in eastern China, and all models were calibrated and validated for five separate prefectures. To this end, a cross-validation process was developed that integrates model meta-parameterization and simple forward feature selection. We found that the resulting models deliver early estimates that are accurate enough to support decision making in the agricultural sector and to allow their operational use for yield forecasting. To attain maximum prediction accuracy, however, incorporating predictors from the end of the growing season is recommended.

  17. Ensemble Methods

    NASA Astrophysics Data System (ADS)

    Re, Matteo; Valentini, Giorgio

    2012-03-01

    Ensemble methods are statistical and computational learning procedures reminiscent of the human social learning behavior of seeking several opinions before making any crucial decision. The idea of combining the opinions of different "experts" to obtain an overall “ensemble” decision is rooted in our culture at least since the classical age of ancient Greece, and it was formalized during the Enlightenment with the Condorcet Jury Theorem [45], which proved that the judgment of a committee is superior to those of individuals, provided the individuals have reasonable competence. Ensembles are sets of learning machines that combine in some way their decisions, or their learning algorithms, or different views of the data, or other specific characteristics to obtain more reliable and more accurate predictions in supervised and unsupervised learning problems [48,116]. A simple example is the majority vote ensemble, in which the decisions of different learning machines are combined and the class that receives the majority of “votes” (i.e., the class predicted by the majority of the learning machines) is the class predicted by the overall ensemble [158]. In the literature, a plethora of terms other than ensembles has been used, such as fusion, combination, aggregation, and committee, to indicate sets of learning machines that work together to solve a machine learning problem [19,40,56,66,99,108,123], but in this chapter we maintain the term ensemble in its widest meaning, in order to include the whole range of combination methods. Nowadays, ensemble methods represent one of the main current research lines in machine learning [48,116], and the interest of the research community in ensemble methods is witnessed by conferences and workshops specifically devoted to ensembles, most notably the multiple classifier systems (MCS) conference organized by Roli, Kittler, Windeatt, and other researchers in this area [14,62,85,149,173]. Several theories have been
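
    As a concrete illustration of the majority-vote combination just described, here is a minimal sketch with hypothetical labels, not code from the chapter:

        # Majority vote: each base learner casts one class label; the most
        # common label is the ensemble's decision.
        from collections import Counter

        def majority_vote(predictions):
            """predictions: one predicted class label per base learner."""
            return Counter(predictions).most_common(1)[0][0]

        print(majority_vote(["A", "B", "A"]))   # -> "A"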

  18. Real-Time Speech/Music Classification With a Hierarchical Oblique Decision Tree

    DTIC Science & Technology

    2008-04-01

    REAL-TIME SPEECH/MUSIC CLASSIFICATION WITH A HIERARCHICAL OBLIQUE DECISION TREE. Jun Wang, Qiong Wu, Haojiang Deng, Qin Yan, Institute of Acoustics... real-time speech/music classification with a hierarchical oblique decision tree. A set of discrimination features in the frequency domain is selected... handle signals without discrimination and cannot work properly in the presence of multimedia signals. This paper proposes a real-time speech/music

  19. Metric Sex Determination of the Human Coxal Bone on a Virtual Sample using Decision Trees.

    PubMed

    Savall, Frédéric; Faruch-Bilfeld, Marie; Dedouit, Fabrice; Sans, Nicolas; Rousseau, Hervé; Rougé, Daniel; Telmon, Norbert

    2015-11-01

    Decision trees provide an alternative to multivariate discriminant analysis, which is still the most commonly used in anthropometric studies. Our study analyzed the metric characterization of a recent virtual sample of 113 coxal bones using decision trees for sex determination. From 17 osteometric type I landmarks, a dataset was built with five classic distances traditionally reported in the literature and six new distances selected using the two-step ratio method. A ten-fold cross-validation was performed, and a decision tree was established on two subsamples (training and test sets). The decision tree established on the training set included three nodes and its application to the test set correctly classified 92% of individuals. This percentage was similar to the data of the literature. The usefulness of decision trees has been demonstrated in numerous fields. They have been already used in sex determination, body mass prediction, and ancestry estimation. This study shows another use of decision trees enabling simple and accurate sex determination.

  20. Prediction of Weather Impacted Airport Capacity using Ensemble Learning

    NASA Technical Reports Server (NTRS)

    Wang, Yao Xun

    2011-01-01

    Ensemble learning with the Bagging Decision Tree (BDT) model was used to assess the impact of weather on airport capacities at selected high-demand airports in the United States. The ensemble bagging decision tree models were developed and validated using the Federal Aviation Administration (FAA) Aviation System Performance Metrics (ASPM) data and weather forecasts at these airports. The study examines the performance of BDT, along with traditional single Support Vector Machines (SVM), for airport runway configuration selection and airport arrival rate (AAR) prediction during weather impacts. Testing of these models was accomplished using observed weather, weather forecasts, and airport operation information at the chosen airports. The experimental results show that ensemble methods are more accurate than a single SVM classifier. The airport capacity ensemble method presented here can be used as a decision support model that supports air traffic flow management in meeting weather-impacted airport capacity, in order to reduce costs and increase safety.
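
    A comparison of this kind is easy to set up with off-the-shelf tools. The sketch below mirrors the BDT-versus-single-SVM comparison in spirit only; it uses synthetic data rather than the ASPM and weather records, and the model settings are assumptions.

        # Sketch: bagged decision trees vs. a single SVM on synthetic data.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import BaggingClassifier
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.svm import SVC
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=2000, n_features=12, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        bdt = BaggingClassifier(DecisionTreeClassifier(),
                                n_estimators=50).fit(X_tr, y_tr)
        svm = SVC().fit(X_tr, y_tr)
        print("BDT:", bdt.score(X_te, y_te), "SVM:", svm.score(X_te, y_te))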

  1. Development of a diagnostic decision tree for obstructive pulmonary diseases based on real-life data.

    PubMed

    Metting, Esther I; In 't Veen, Johannes C C M; Dekhuijzen, P N Richard; van Heijst, Ellen; Kocks, Janwillem W H; Muilwijk-Kroes, Jacqueline B; Chavannes, Niels H; van der Molen, Thys

    2016-01-01

    The aim of this study was to develop and explore the diagnostic accuracy of a decision tree derived from a large real-life primary care population. Data from 9297 primary care patients (45% male, mean age 53±17 years) with suspicion of an obstructive pulmonary disease was derived from an asthma/chronic obstructive pulmonary disease (COPD) service where patients were assessed using spirometry, the Asthma Control Questionnaire, the Clinical COPD Questionnaire, history data and medication use. All patients were diagnosed through the Internet by a pulmonologist. The Chi-squared Automatic Interaction Detection method was used to build the decision tree. The tree was externally validated in another real-life primary care population (n=3215). Our tree correctly diagnosed 79% of the asthma patients, 85% of the COPD patients and 32% of the asthma-COPD overlap syndrome (ACOS) patients. External validation showed a comparable pattern (correct: asthma 78%, COPD 83%, ACOS 24%). Our decision tree is considered to be promising because it was based on real-life primary care patients with a specialist's diagnosis. In most patients the diagnosis could be correctly predicted. Predicting ACOS, however, remained a challenge. The total decision tree can be implemented in computer-assisted diagnostic systems for individual patients. A simplified version of this tree can be used in daily clinical practice as a desk tool.

  2. Application of decision tree model for the ground subsidence hazard mapping near abandoned underground coal mines.

    PubMed

    Lee, Saro; Park, Inhye

    2013-09-30

    Subsidence of ground caused by underground mines poses hazards to human life and property. This study analyzed the ground subsidence hazard using factors that can affect ground subsidence and a decision tree approach in a geographic information system (GIS). The study area was Taebaek, Gangwon-do, Korea, where many abandoned underground coal mines exist. Spatial data on topography, geology, and various ground-engineering properties for the subsidence area were collected and compiled in a database for mapping ground-subsidence hazard (GSH). The subsidence area was randomly split 50/50 for training and validation of the models. A data-mining classification technique was applied to the GSH mapping, and decision trees were constructed using the chi-squared automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. The frequency ratio model was also applied to the GSH mapping for comparison with a probabilistic model. The resulting GSH maps were validated using area-under-the-curve (AUC) analysis with the subsidence area data that had not been used for training the model. The highest accuracy was achieved by the decision tree model using the CHAID algorithm (94.01%), compared with the QUEST algorithm (90.37%) and the frequency ratio model (86.70%). These accuracies are higher than previously reported results for decision trees. Decision tree methods can therefore be used efficiently for GSH analysis and might be widely used for prediction of various spatial events.

  3. [Prediction of regional soil quality based on mutual information theory integrated with decision tree algorithm].

    PubMed

    Lin, Fen-Fang; Wang, Ke; Yang, Ning; Yan, Shi-Guang; Zheng, Xin-Yu

    2012-02-01

    To accurately characterize the spatial distribution of regional soil quality, this paper considered main factors affecting soil quality such as soil type, land use pattern, lithology type, topography, road, and industry type. Mutual information theory was adopted to select the main environmental factors, and the See5.0 decision tree algorithm was applied to predict the grade of regional soil quality. The main factors affecting regional soil quality were soil type, land use, lithology type, distance to town, distance to water area, altitude, distance to road, and distance to industrial land. The prediction accuracy of the decision tree model with the variables selected by mutual information was markedly higher than that of the model with all variables, and for the former model, whether using decision trees or decision rules, the prediction accuracy was above 80%. Based on the continuous and categorical data, the method of mutual information theory integrated with a decision tree could not only reduce the number of input parameters for the decision tree algorithm, but also predict and assess regional soil quality effectively.

  4. Ramping ensemble activity in dorsal anterior cingulate neurons during persistent commitment to a decision

    PubMed Central

    Hayden, Benjamin Y.

    2015-01-01

    We frequently need to commit to a choice to achieve our goals; however, the neural processes that keep us motivated in pursuit of delayed goals remain obscure. We examined ensemble responses of neurons in macaque dorsal anterior cingulate cortex (dACC), an area previously implicated in self-control and persistence, in a task that requires commitment to a choice to obtain a reward. After reward receipt, dACC neurons signaled reward amount with characteristic ensemble firing rate patterns; during the delay in anticipation of the reward, ensemble activity smoothly and gradually came to resemble the postreward pattern. On the subset of risky trials, in which a reward was anticipated with 50% certainty, ramping ensemble activity evolved to the pattern associated with the anticipated reward (and not with the anticipated loss) and then, on loss trials, took on an inverted form anticorrelated with the form associated with a win. These findings enrich our knowledge of reward processing in dACC and may have broader implications for our understanding of persistence and self-control. PMID:26334016

  5. Bootstrap aggregating of alternating decision trees to detect sets of SNPs that associate with disease.

    PubMed

    Guy, Richard T; Santago, Peter; Langefeld, Carl D

    2012-02-01

    Complex genetic disorders are a result of a combination of genetic and nongenetic factors, all potentially interacting. Machine learning methods hold the potential to identify multilocus and environmental associations thought to drive complex genetic traits. Decision trees, a popular machine learning technique, offer a computationally low-complexity algorithm capable of detecting associated sets of single nucleotide polymorphisms (SNPs) of arbitrary size, including modern genome-wide SNP scans. However, interpretation of the importance of an individual SNP within these trees can present challenges. We present a new decision tree algorithm denoted Bagged Alternating Decision Trees (BADTrees) that is based on identifying common structural elements in a bootstrapped set of Alternating Decision Trees (ADTrees). The algorithm is of order O(nk²), where n is the number of SNPs considered and k is the number of SNPs in the tree constructed. Our simulation study suggests that BADTrees have higher power and lower type I error rates than ADTrees alone and comparable power with lower type I error rates compared to logistic regression. We illustrate the application of BADTrees using simulated data as well as data from the Lupus Large Association Study 1 (7,822 SNPs in 3,548 individuals). Our results suggest that BADTrees hold promise as a low computational order algorithm for detecting complex combinations of SNP and environmental factors associated with disease.
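
    Scikit-learn has no alternating decision trees, but the bagging-and-common-structure idea behind BADTrees can be sketched with ordinary trees: grow trees on bootstrap resamples and count how often each SNP is chosen as a split. Everything below (genotype matrix, labels, tree depth) is a hypothetical stand-in, not the BADTrees algorithm itself.

        # Sketch: bootstrap a set of trees and tally split-feature usage as a
        # rough proxy for "common structural elements" across the ensemble.
        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)
        X = rng.integers(0, 3, size=(400, 50))    # synthetic SNP genotypes (0/1/2)
        y = rng.integers(0, 2, size=400)          # synthetic case/control labels

        counts = np.zeros(X.shape[1])
        for _ in range(100):
            idx = rng.integers(0, len(X), len(X))   # bootstrap resample
            t = DecisionTreeClassifier(max_depth=4).fit(X[idx], y[idx])
            used = t.tree_.feature                  # split feature per node (<0 = leaf)
            counts[np.unique(used[used >= 0])] += 1

        print("most frequently selected SNPs:", np.argsort(counts)[::-1][:5])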

  6. Decision Support on the Sediments Flushing of Aimorés Dam Using Medium-Range Ensemble Forecasts

    NASA Astrophysics Data System (ADS)

    Mainardi Fan, Fernando; Schwanenberg, Dirk; Collischonn, Walter; Assis dos Reis, Alberto; Alvarado Montero, Rodolfo; Alencar Siqueira, Vinicius

    2015-04-01

    In the present study we investigate the use of medium-range streamflow forecasts in the Doce River basin (Brazil), at the reservoir of the Aimorés Hydro Power Plant (HPP). During daily operations this reservoir acts as a "trap" for the sediments that originate from the upstream basin of the Doce River. This motivates a cleaning process called "pass through" to periodically remove the sediments from the reservoir. The "pass through" or "sediment flushing" process consists of a decrease of the reservoir's water level to a certain flushing level when a determined reservoir inflow threshold is forecasted. The water in the approaching inflow is then used to flush the sediments from the reservoir through the spillway and to recover the original reservoir storage. To be triggered, the sediment flushing operation requires an inflow larger than 3000 m³/s in a forecast horizon of 7 days. This lead time of 7 days is far beyond the basin's concentration time (around 2 days), meaning that the forecasts for the pass-through procedure depend heavily on Numerical Weather Prediction (NWP) models that generate Quantitative Precipitation Forecasts (QPF). This dependency creates an environment with a high amount of uncertainty for the operator. To support decision making at Aimorés HPP we developed a fully operational hydrological forecasting system for the basin. The system is capable of generating ensemble streamflow forecast scenarios when driven by QPF data from meteorological Ensemble Prediction Systems (EPS). This approach allows accounting for uncertainties in the NWP at the decision-making level. The system is entering operational use by CEMIG and is the one presented in this study, including a hindcasting analysis to assess its performance for the specific flushing problem. The QPF data used in the hindcasting study were derived from the TIGGE (THORPEX Interactive Grand Global Ensemble) database. Among all EPS available on TIGGE, three were

  7. Pruning a decision tree for selecting computer-related assistive devices for people with disabilities.

    PubMed

    Chi, Chia-Fen; Tseng, Li-Kai; Jang, Yuh

    2012-07-01

    Many disabled individuals lack extensive knowledge about assistive technology, which could help them use computers. In 1997, Denis Anson developed a decision tree of 49 evaluative questions designed to evaluate the functional capabilities of the disabled user and choose an appropriate combination of assistive devices, from a selection of 26, that enable the individual to use a computer. In general, occupational therapists guide disabled users through this process. They often have to go over repetitive questions in order to find an appropriate device. A disabled user may require an alphanumeric entry device, a pointing device, an output device, a performance enhancement device, or some combination of these. The current research therefore eliminates redundant questions and divides Anson's decision tree into multiple independent subtrees to meet the actual demands of computer users with disabilities. The modified decision tree was tested by six disabled users to verify that it can determine a complete set of assistive devices with a smaller number of evaluative questions. A means to insert new categories of computer-related assistive devices was included to ensure the decision tree can be expanded and updated. The current decision tree can help disabled users and assistive technology practitioners find appropriate computer-related assistive devices that meet clients' individual needs in an efficient manner.

  8. Using decision trees to characterize verbal communication during change and stuck episodes in the therapeutic process

    PubMed Central

    Masías, Víctor H.; Krause, Mariane; Valdés, Nelson; Pérez, J. C.; Laengle, Sigifredo

    2015-01-01

    Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBTree, and REPTree) are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice. PMID:25914657

  9. [Postmastectomy pain syndrome evidence based guidelines and decision trees].

    PubMed

    Labrèze, Laurent; Dixmérias-Iskandar, Florence; Monnin, Dominique; Bussières, Emmanuel; Delahaye, Evelyne; Bernard, Dominique; Lakdja, Fabrice

    2007-03-01

    A multidisciplinary expert group reviewed all available scientific data on postmastectomy pain syndrome. Seventy-six publications were retained, and thirty evidence-based diagnosis, treatment, and follow-up recommendations are listed. Few of these recommendations are classed as level A. Data analysis makes it possible to propose a strategy based on the systematic combination of drugs, physiotherapy, and psychological support. Evaluation and closer follow-up are necessary. Several decision trees are proposed.

  10. A Modified Decision Tree Algorithm Based on Genetic Algorithm for Mobile User Classification Problem

    PubMed Central

    Liu, Dong-sheng; Fan, Shu-jiang

    2014-01-01

    To offer mobile customers better service, mobile users must first be classified. To address the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification that introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take context information as a classification attribute for the mobile user, classifying context into public and private classes. We then analyze the processes and operators of the algorithm. Finally, we apply the algorithm to mobile user data: it classifies mobile users into Basic service, E-service, Plus service, and Total service classes and also yields rules about the mobile users. Compared to the C4.5 decision tree algorithm and the SVM algorithm, the algorithm proposed in this paper has higher accuracy and greater simplicity. PMID:24688389

  11. Post-event human decision errors: operator action tree/time reliability correlation

    SciTech Connect

    Hall, R E; Fragola, J; Wreathall, J

    1982-11-01

    This report documents an interim framework for the quantification of the probability of errors of decision on the part of nuclear power plant operators after the initiation of an accident. The framework can easily be incorporated into an event tree/fault tree analysis. The method presented consists of a structure called the operator action tree and a time reliability correlation which assumes the time available for making a decision to be the dominating factor in situations requiring cognitive human response. This limited approach decreases the magnitude and complexity of the decision modeling task. Specifically, in the past, some human performance models have attempted prediction by trying to emulate sequences of human actions, or by identifying and modeling the information processing approach applicable to the task. The model developed here is directed at describing the statistical performance of a representative group of hypothetical individuals responding to generalized situations.

  12. A modified decision tree algorithm based on genetic algorithm for mobile user classification problem.

    PubMed

    Liu, Dong-sheng; Fan, Shu-jiang

    2014-01-01

    To offer mobile customers better service, mobile users must first be classified. To address the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification that introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take context information as a classification attribute for the mobile user, classifying context into public and private classes. We then analyze the processes and operators of the algorithm. Finally, we apply the algorithm to mobile user data: it classifies mobile users into Basic service, E-service, Plus service, and Total service classes and also yields rules about the mobile users. Compared to the C4.5 decision tree algorithm and the SVM algorithm, the algorithm proposed in this paper has higher accuracy and greater simplicity.

  13. Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps (LiCABEDS) and its application on modeling ligand functionality for 5HT-subtype GPCR families.

    PubMed

    Ma, Chao; Wang, Lirong; Xie, Xiang-Qun

    2011-03-28

    Advanced high-throughput screening (HTS) technologies generate great amounts of bioactivity data, and this data needs to be analyzed and interpreted with attention to understand how these small molecules affect biological systems. As such, there is an increasing demand to develop and adapt cheminformatics algorithms and tools in order to predict molecular and pharmacological properties on the basis of these large data sets. In this manuscript, we report a novel machine-learning-based ligand classification algorithm, named Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps (LiCABEDS), for data-mining and modeling of large chemical data sets to predict pharmacological properties in an efficient and accurate manner. The performance of LiCABEDS was evaluated through predicting GPCR ligand functionality (agonist or antagonist) using four different molecular fingerprints, including Maccs, FP2, Unity, and Molprint 2D fingerprints. Our studies showed that LiCABEDS outperformed two other popular techniques, classification tree and Naive Bayes classifier, on all four types of molecular fingerprints. Parameters in LiCABEDS, including the number of boosting iterations, initialization condition, and a "reject option" boundary, were thoroughly explored and discussed to demonstrate the capability of handling imbalanced data sets, as well as its robustness and flexibility. In addition, the detailed mathematical concepts and theory are also given to address the principle behind statistical prediction models. The LiCABEDS algorithm has been implemented into a user-friendly software package that is accessible online at http://www.cbligand.org/LiCABEDS/ .
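
    The core of LiCABEDS, adaptively boosted decision stumps, corresponds to classic AdaBoost with depth-1 trees. The sketch below illustrates that combination on random stand-in fingerprints; it is not the LiCABEDS package, and the 166-bit layout merely echoes MACCS-style keys.

        # Sketch: AdaBoost over decision stumps (depth-1 trees) on binary
        # fingerprint bits; data and labels are synthetic stand-ins.
        import numpy as np
        from sklearn.ensemble import AdaBoostClassifier
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)
        X = rng.integers(0, 2, size=(300, 166))   # stand-in 166-bit MACCS-like keys
        y = rng.integers(0, 2, size=300)          # 1 = agonist, 0 = antagonist

        stump = DecisionTreeClassifier(max_depth=1)
        model = AdaBoostClassifier(stump, n_estimators=200).fit(X, y)
        print(model.predict(X[:5]))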

  14. Ensembl 2007.

    PubMed

    Hubbard, T J P; Aken, B L; Beal, K; Ballester, B; Caccamo, M; Chen, Y; Clarke, L; Coates, G; Cunningham, F; Cutts, T; Down, T; Dyer, S C; Fitzgerald, S; Fernandez-Banet, J; Graf, S; Haider, S; Hammond, M; Herrero, J; Holland, R; Howe, K; Howe, K; Johnson, N; Kahari, A; Keefe, D; Kokocinski, F; Kulesha, E; Lawson, D; Longden, I; Melsopp, C; Megy, K; Meidl, P; Ouverdin, B; Parker, A; Prlic, A; Rice, S; Rios, D; Schuster, M; Sealy, I; Severin, J; Slater, G; Smedley, D; Spudich, G; Trevanion, S; Vilella, A; Vogel, J; White, S; Wood, M; Cox, T; Curwen, V; Durbin, R; Fernandez-Suarez, X M; Flicek, P; Kasprzyk, A; Proctor, G; Searle, S; Smith, J; Ureta-Vidal, A; Birney, E

    2007-01-01

    The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of chordate genome sequences. Over the past year the number of genomes available from Ensembl has increased from 15 to 33, with the addition of sites for the mammalian genomes of elephant, rabbit, armadillo, tenrec, platypus, pig, cat, bush baby, common shrew, microbat and european hedgehog; the fish genomes of stickleback and medaka and the second example of the genomes of the sea squirt (Ciona savignyi) and the mosquito (Aedes aegypti). Some of the major features added during the year include the first complete gene sets for genomes with low-sequence coverage, the introduction of new strain variation data and the introduction of new orthology/paralog annotations based on gene trees.

  15. Dynamics of Cortical Neuronal Ensembles Transit from Decision Making to Storage for Later Report

    PubMed Central

    Ponce-Alvarez, Adrián; Nácher, Verónica; Luna, Rogelio; Riehle, Alexa

    2012-01-01

    Decisions based on sensory evaluation during single trials may depend on the collective activity of neurons distributed across brain circuits. Previous studies have deepened our understanding of how the activity of individual neurons relates to the formation of a decision and its storage for later report. However, little is known about how decision-making and decision maintenance processes evolve in single trials. We addressed this problem by studying the activity of simultaneously recorded neurons from different somatosensory and frontal lobe cortices of monkeys performing a vibrotactile discrimination task. We used the hidden Markov model to describe the spatiotemporal pattern of activity in single trials as a sequence of firing rate states. We show that the animal's decision was reliably maintained in frontal lobe activity through a selective state sequence, initiated by an abrupt state transition, during which many neurons changed their activity in a concomitant way, and for which both latency and variability depended on task difficulty. Indeed, transitions were more delayed and more variable for difficult trials compared with easy trials. In contrast, state sequences in somatosensory cortices were weakly decision related, had less variable transitions, and were not affected by the difficulty of the task. In summary, our results suggest that the decision process and its subsequent maintenance are dynamically linked by a cascade of transient events in frontal lobe cortices. PMID:22933781

  16. Dynamics of cortical neuronal ensembles transit from decision making to storage for later report.

    PubMed

    Ponce-Alvarez, Adrián; Nácher, Verónica; Luna, Rogelio; Riehle, Alexa; Romo, Ranulfo

    2012-08-29

    Decisions based on sensory evaluation during single trials may depend on the collective activity of neurons distributed across brain circuits. Previous studies have deepened our understanding of how the activity of individual neurons relates to the formation of a decision and its storage for later report. However, little is known about how decision-making and decision maintenance processes evolve in single trials. We addressed this problem by studying the activity of simultaneously recorded neurons from different somatosensory and frontal lobe cortices of monkeys performing a vibrotactile discrimination task. We used the hidden Markov model to describe the spatiotemporal pattern of activity in single trials as a sequence of firing rate states. We show that the animal's decision was reliably maintained in frontal lobe activity through a selective state sequence, initiated by an abrupt state transition, during which many neurons changed their activity in a concomitant way, and for which both latency and variability depended on task difficulty. Indeed, transitions were more delayed and more variable for difficult trials compared with easy trials. In contrast, state sequences in somatosensory cortices were weakly decision related, had less variable transitions, and were not affected by the difficulty of the task. In summary, our results suggest that the decision process and its subsequent maintenance are dynamically linked by a cascade of transient events in frontal lobe cortices.

  17. ROSE: decision trees, automatic learning and their applications in cardiac medicine.

    PubMed

    Zavrsnik, J; Kokol, P; Malèiae, I; Kancler, K; Mernik, M; Bigec, M

    1995-01-01

    Computerized information systems, especially decision support systems, have acquired an increasingly important role in medical applications, particularly in those where important decisions must be made effectively and reliably. But the possibility of using computers in medical decision making is limited by many difficulties, including the complexity of conventional computer languages, methodologies, and tools. Thus a conceptually simple decision-making model with the possibility of automated learning should be used. In this paper, we introduce a cardiological knowledge-based system, based on the decision tree approach, supporting mitral valve prolapse determination. Prolapse is defined as the displacement of a bodily part from its normal position. The term mitral valve prolapse (PMV), therefore, implies that the mitral leaflets are displaced relative to some structure, generally taken to be the mitral annulus. The implications of PMV are: disturbed normal laminar blood flow, turbulence of the blood flow, injury to the chordae tendineae, the possibility of thrombus formation, bacterial endocarditis, and, finally, hemodynamic changes defined as mitral insufficiency and mitral regurgitation. Uncertainty persists about how it should be diagnosed and about its clinical importance. It is our firm belief that echocardiography enables a properly trained expert, armed with proper criteria, to evaluate PMV almost 100% reliably. But, unfortunately, there are some problems associated with the use of echocardiography. With this in mind, we decided to start a research project aimed at finding new criteria and enabling the general practitioner to evaluate PMV using conventional methods and to select potential patients from the general population. To empower doctors to perform the needed activities, we have developed a computer tool called ROSE (computeRized prOlaps Syndrome dEtermination) based on algorithms of automatic learning. This tool supports the definition of new

  18. A Com-GIS Based Decision Tree Model in Agricultural Application

    NASA Astrophysics Data System (ADS)

    Cheng, Wei; Wang, Ke; Zhang, Xiuying

    The problem of agricultural soil pollution by heavy metals has been receiving increasing attention in the last few decades. The geostatistics module in ArcGIS, however, cannot efficiently simulate the spatial distribution of heavy metals with satisfactory accuracy when the spatial autocorrelation of the study area has been severely destroyed by human activities. In this study, the classification and regression tree (CART) method has been integrated into ArcGIS using ArcObjects and Visual Basic for Applications (VBA) to predict the spatial distribution of soil heavy metal contents in a severely polluted area. This is a great improvement over the ordinary kriging method in ArcGIS. The integrated approach allows for relatively easy, fast, and cost-effective estimation of spatially distributed soil heavy metal pollution.

  19. Image Change Detection via Ensemble Learning

    SciTech Connect

    Martin, Benjamin W; Vatsavai, Raju

    2013-01-01

    The concept of geographic change detection is relevant in many areas. Changes in geography can reveal much information about a particular location. For example, analysis of changes in geography can identify regions of population growth, change in land use, and potential environmental disturbance. A common way to perform change detection is to use a simple method such as differencing to detect regions of change. Though these techniques are simple, their applicability is often very limited. Recently, the use of machine learning methods such as neural networks for change detection has been explored with great success. In this work, we explore the use of ensemble learning methodologies for detecting changes in bitemporal synthetic aperture radar (SAR) images. Ensemble learning uses a collection of weak machine learning classifiers to create a stronger classifier which has higher accuracy than the individual classifiers in the ensemble. The strength of the ensemble lies in the fact that the individual classifiers form a mixture of experts in which the final classification made by the ensemble classifier is calculated from the outputs of the individual classifiers. Our methodology leverages this aspect of ensemble learning by training collections of weak decision-tree-based classifiers to identify regions of change in SAR images collected over a region of the Staten Island, New York area during Hurricane Sandy. Preliminary studies show that the ensemble method has approximately 11.5% higher change detection accuracy than an individual classifier.

  20. Test Reviews: Euler, B. L. (2007). "Emotional Disturbance Decision Tree". Lutz, FL: Psychological Assessment Resources

    ERIC Educational Resources Information Center

    Tansy, Michael

    2009-01-01

    The Emotional Disturbance Decision Tree (EDDT) is a teacher-completed norm-referenced rating scale published by Psychological Assessment Resources, Inc., in Lutz, Florida. The 156-item EDDT was developed for use as part of a broader assessment process to screen and assist in the identification of 5- to 18-year-old children for the special…

  1. What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis

    ERIC Educational Resources Information Center

    Thomas, Emily H.; Galambos, Nora

    2004-01-01

    To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…

  2. Ultrasonographic Diagnosis of Biliary Atresia Based on a Decision-Making Tree Model

    PubMed Central

    Lee, So Mi; Choi, Young Hun; Kim, Woo Sun; Cho, Hyun-Hye; Kim, In-One; You, Sun Kyoung

    2015-01-01

    Objective To assess the diagnostic value of various ultrasound (US) findings and to make a decision-tree model for US diagnosis of biliary atresia (BA). Materials and Methods From March 2008 to January 2014, the following US findings were retrospectively evaluated in 100 infants with cholestatic jaundice (BA, n = 46; non-BA, n = 54): length and morphology of the gallbladder, triangular cord thickness, hepatic artery and portal vein diameters, and visualization of the common bile duct. Logistic regression analyses were performed to determine the features that would be useful in predicting BA. Conditional inference tree analysis was used to generate a decision-making tree for classifying patients into the BA or non-BA groups. Results Multivariate logistic regression analysis showed that abnormal gallbladder morphology and greater triangular cord thickness were significant predictors of BA (p = 0.003 and 0.001; adjusted odds ratio: 345.6 and 65.6, respectively). In the decision-making tree using conditional inference tree analysis, gallbladder morphology and triangular cord thickness (optimal cutoff value of triangular cord thickness, 3.4 mm) were also selected as significant discriminators for differential diagnosis of BA, and gallbladder morphology was the first discriminator. The diagnostic performance of the decision-making tree was excellent, with sensitivity of 100% (46/46), specificity of 94.4% (51/54), and overall accuracy of 97% (97/100). Conclusion Abnormal gallbladder morphology and greater triangular cord thickness (> 3.4 mm) were the most useful predictors of BA on US. We suggest that the gallbladder morphology should be evaluated first and that triangular cord thickness should be evaluated subsequently in cases with normal gallbladder morphology. PMID:26576128

  3. Predicting metabolic syndrome using decision tree and support vector machine methods

    PubMed Central

    Karimi-Alavijeh, Farzaneh; Jalili, Saeed; Sadeghi, Masoumeh

    2016-01-01

    BACKGROUND Metabolic syndrome, which underlies the increased prevalence of cardiovascular disease and Type 2 diabetes, is considered a group of metabolic abnormalities including central obesity, hypertriglyceridemia, glucose intolerance, hypertension, and dyslipidemia. Recently, artificial intelligence based health-care systems have been highly regarded because of their success in diagnosis, prediction, and choice of treatment. This study employs machine learning techniques to predict metabolic syndrome. METHODS This study aims to employ decision tree and support vector machine (SVM) methods to predict the 7-year incidence of metabolic syndrome. This is a practical study in which data from 2107 participants of the Isfahan Cohort Study were utilized. Subjects without metabolic syndrome according to the ATPIII criteria were selected. The features used in this data set include: gender, age, weight, body mass index, waist circumference, waist-to-hip ratio, hip circumference, physical activity, smoking, hypertension, antihypertensive medication use, systolic blood pressure (BP), diastolic BP, fasting blood sugar, 2-hour blood glucose, triglycerides (TGs), total cholesterol, low-density lipoprotein, high-density lipoprotein cholesterol, mean corpuscular volume, and mean corpuscular hemoglobin. Metabolic syndrome was diagnosed based on the ATPIII criteria, and the two methods, decision tree and SVM, were used to predict metabolic syndrome. The criteria of sensitivity, specificity, and accuracy were used for validation. RESULTS The SVM and decision tree methods were examined according to the criteria of sensitivity, specificity, and accuracy. Sensitivity, specificity, and accuracy were 0.774 (0.758), 0.74 (0.72), and 0.757 (0.739) for the SVM (decision tree) method. CONCLUSION The results show that the SVM method is more efficient than the decision tree in terms of sensitivity, specificity, and accuracy. The results of the decision tree method show that the TG is the most important feature in
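
    The validation criteria used above are simple functions of the binary confusion matrix; here is a minimal sketch with hypothetical labels:

        # Sensitivity, specificity, and accuracy from a binary confusion matrix.
        from sklearn.metrics import confusion_matrix

        y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = developed metabolic syndrome
        y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # hypothetical model output

        tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
        print("sensitivity:", tp / (tp + fn))   # true-positive rate
        print("specificity:", tn / (tn + fp))   # true-negative rate
        print("accuracy:", (tp + tn) / (tp + tn + fp + fn))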

  4. An expert-guided decision tree construction strategy: an application in knowledge discovery with medical databases.

    PubMed Central

    Tsai, Y. S.; King, P. H.; Higgins, M. S.; Pierce, D.; Patel, N. P.

    1997-01-01

    With the steady growth in electronic patient records and clinical medical informatics systems, the data collected for routine clinical use have been accumulating at a dramatic rate. Interdisciplinary research that provides a new generation of computational tools for knowledge discovery and data management is in great demand. In this study, an expert-guided decision tree construction strategy is proposed to offer a user-oriented knowledge discovery environment. The strategy allows experts, based on their expertise and/or preference, to override the inductive decision tree construction process. Moreover, by reviewing decision paths, experts can focus on subsets of data that may be clues to new findings, or simply contaminated cases. PMID:9357618

  5. Minimizing the cost of translocation failure with decision-tree models that predict species' behavioral response in translocation sites.

    PubMed

    Ebrahimi, Mehregan; Ebrahimie, Esmaeil; Bull, C Michael

    2015-08-01

    The high number of failures is one reason why translocation is often not recommended. Considering how behavior changes during translocations may improve translocation success. To derive decision-tree models for species' translocation, we used data on the short-term responses of an endangered Australian skink in 5 simulated translocations with different release conditions. We used 4 different decision-tree algorithms (decision tree, decision-tree parallel, decision stump, and random forest) with 4 different criteria (gain ratio, information gain, gini index, and accuracy) to investigate how environmental and behavioral parameters may affect the success of a translocation. We assumed behavioral changes that increased dispersal away from a release site would reduce translocation success. The trees became more complex when we included all behavioral parameters as attributes, but these trees yielded more detailed information about why and how dispersal occurred. According to these complex trees, there were positive associations between some behavioral parameters, such as fight and dispersal, that showed there was a higher chance, for example, of dispersal among lizards that fought than among those that did not fight. Decision trees based on parameters related to release conditions were easier to understand and could be used by managers to make translocation decisions under different circumstances.

  6. A Study of Factors that Influence First-Year Nonmusic Majors' Decisions to Participate in Music Ensembles at Small Liberal Arts Colleges in Indiana

    ERIC Educational Resources Information Center

    Faber, Ardis R.

    2010-01-01

    The purpose of this study was to investigate factors that influence first-year nonmusic majors' decisions regarding participation in music ensembles at small liberal arts colleges in Indiana. A survey questionnaire was used to gather data. The data collected was analyzed to determine significant differences between the nonmusic majors who have…

  7. Tools of the Future: How Decision Tree Analysis Will Impact Mission Planning

    NASA Technical Reports Server (NTRS)

    Otterstatter, Matthew R.

    2005-01-01

    The universe is infinitely complex; however, the human mind has a finite capacity. The multitude of possible variables, metrics, and procedures in mission planning are far too many to address exhaustively. This is unfortunate because, in general, considering more possibilities leads to more accurate and more powerful results. To compensate, we can get more insightful results by employing our greatest tool, the computer. The power of the computer will be utilized through a technology that considers every possibility, decision tree analysis. Although decision trees have been used in many other fields, this is innovative for space mission planning. Because this is a new strategy, no existing software is able to completely accommodate all of the requirements. This was determined through extensive research and testing of current technologies. It was necessary to create original software, for which a short-term model was finished this summer. The model was built into Microsoft Excel to take advantage of the familiar graphical interface for user input, computation, and viewing output. Macros were written to automate the process of tree construction, optimization, and presentation. The results are useful and promising. If this tool is successfully implemented in mission planning, our reliance on old-fashioned heuristics, an error-prone shortcut for handling complexity, will be reduced. The computer algorithms involved in decision trees will revolutionize mission planning. The planning will be faster and smarter, leading to optimized missions with the potential for more valuable data.

  8. Inductive Decision Tree Analysis of the Validity Rank of Construction Parameters of Innovative Gear Pump after Tooth Root Undercutting

    NASA Astrophysics Data System (ADS)

    Deptuła, A.; Partyka, M. A.

    2017-02-01

    The article presents an innovative use of an inductive algorithm for generating a decision tree for an analysis of the validity rank of construction and maintenance parameters of the gear pump with an undercut tooth root. An alternative to existing discrete optimization methods is presented for generating sets of decisions and determining the hierarchy of decision variables.

  9. The bone-grafting decision tree: a systematic methodology for achieving new bone.

    PubMed

    Smiler, Dennis; Soltan, Muna

    2006-06-01

    Successful bone grafting requires that the clinician select the optimal bone grafting material and surgical technique from among a number of alternatives. This article reviews the biology of bone growth and repair, and presents a decision-making protocol in which the clinician first evaluates the bone quality at the surgical site to determine which graft material should be used. Bone quantity is then evaluated to determine the optimal surgical technique. Choices among graft stabilization techniques are also reviewed, and cases are presented to illustrate the use of this decision tree.

  10. An Algorithm for Anticipating Future Decision Trees from Concept-Drifting Data

    NASA Astrophysics Data System (ADS)

    Böttcher, Mirko; Spott, Martin; Kruse, Rudolf

    Concept drift is an important topic in practical data mining, since it is a reality in most business applications. Whenever a mining model is used in an application, it is already outdated, because the world has changed since the model was induced. The solution is to predict the drift of a model and derive a future model based on such a prediction. One way would be to simulate future data and derive a model from it, but this is typically not feasible. Instead, we suggest predicting the values of the measures that drive model induction. In particular, we propose to predict the future values of attribute selection measures and class label distribution for the induction of decision trees. We give an example of how concept drift is reflected in the trend of these measures and show that the resulting decision trees perform considerably better than the ones produced by existing approaches.

  11. Circum-Arctic petroleum systems identified using decision-tree chemometrics

    USGS Publications Warehouse

    Peters, K.E.; Ramos, L.S.; Zumberge, J.E.; Valin, Z.C.; Scotese, C.R.; Gautier, D.L.

    2007-01-01

    Source- and age-related biomarker and isotopic data were measured for more than 1000 crude oil samples from wells and seeps collected above approximately 55°N latitude. A unique, multitiered chemometric (multivariate statistical) decision tree was created that allowed automated classification of 31 genetically distinct circum-Arctic oil families based on a training set of 622 oil samples. The method, which we call decision-tree chemometrics, uses principal components analysis and multiple tiers of K-nearest neighbor and SIMCA (soft independent modeling of class analogy) models to classify and assign confidence limits for newly acquired oil samples and source rock extracts. Geochemical data for each oil sample were also used to infer the age, lithology, organic matter input, depositional environment, and identity of its source rock. These results demonstrate the value of large petroleum databases where all samples were analyzed using the same procedures and instrumentation. Copyright © 2007. The American Association of Petroleum Geologists. All rights reserved.

  12. Three-dimensional object recognition using similar triangles and decision trees

    NASA Technical Reports Server (NTRS)

    Spirkovska, Lilly

    1993-01-01

    A system, TRIDEC, that is capable of distinguishing between a set of objects despite changes in the objects' positions in the input field, their size, or their rotational orientation in 3D space is described. TRIDEC combines very simple yet effective features with the classification capabilities of inductive decision tree methods. The feature vector is a list of all similar triangles defined by connecting all combinations of three pixels in a coarse coded 127 x 127 pixel input field. The classification is accomplished by building a decision tree using the information provided from a limited number of translated, scaled, and rotated samples. Simulation results are presented which show that TRIDEC achieves 94 percent recognition accuracy in the 2D invariant object recognition domain and 98 percent recognition accuracy in the 3D invariant object recognition domain after training on only a small sample of transformed views of the objects.

  13. Identifying Risk and Protective Factors in Recidivist Juvenile Offenders: A Decision Tree Approach.

    PubMed

    Ortega-Campos, Elena; García-García, Juan; Gil-Fenoy, Maria José; Zaldívar-Basurto, Flor

    2016-01-01

    Research on juvenile justice aims to identify profiles of risk and protective factors in juvenile offenders. This paper presents a study of profiles of risk factors that influence young offenders toward committing sanctionable antisocial behavior (S-ASB). Decision tree analysis is used as a multivariate approach to the phenomenon of repeated sanctionable antisocial behavior in juvenile offenders in Spain. The study sample was made up of the set of juveniles who were charged in a court case in the Juvenile Court of Almeria (Spain). The period of study of recidivism was two years from the baseline. The object of study is presented, through the implementation of a decision tree. Two profiles of risk and protective factors are found. Risk factors associated with higher rates of recidivism are antisocial peers, age at baseline S-ASB, problems in school and criminality in family members.

  14. Data mining with decision trees for diagnosis of breast tumor in medical ultrasonic images.

    PubMed

    Kuo, W J; Chang, R F; Chen, D R; Lee, C C

    2001-03-01

    To increase the ability of ultrasonographic (US) technology for the differential diagnosis of solid breast tumors, we describe a novel computer-aided diagnosis (CADx) system using data mining with a decision tree for classification of breast tumors, to increase the level of diagnostic confidence and to provide an immediate second opinion for physicians. Using texture information extracted from the region-of-interest (ROI) image, a decision tree model generated from the training data in a top-down, general-to-specific direction with 24 covariance texture features is used to classify the tumors as benign or malignant. In the experiments, accuracy rates for an experienced physician and the proposed CADx were 86.67% (78/90) and 95.50% (86/90), respectively.

  15. Identifying Risk and Protective Factors in Recidivist Juvenile Offenders: A Decision Tree Approach

    PubMed Central

    Ortega-Campos, Elena; García-García, Juan; Gil-Fenoy, Maria José; Zaldívar-Basurto, Flor

    2016-01-01

    Research on juvenile justice aims to identify profiles of risk and protective factors in juvenile offenders. This paper presents a study of profiles of risk factors that influence young offenders toward committing sanctionable antisocial behavior (S-ASB). Decision tree analysis is used as a multivariate approach to the phenomenon of repeated sanctionable antisocial behavior in juvenile offenders in Spain. The study sample was made up of the set of juveniles who were charged in a court case in the Juvenile Court of Almeria (Spain). The period of study of recidivism was two years from the baseline. The object of study is presented, through the implementation of a decision tree. Two profiles of risk and protective factors are found. Risk factors associated with higher rates of recidivism are antisocial peers, age at baseline S-ASB, problems in school and criminality in family members. PMID:27611313

  16. Office of Legacy Management Decision Tree for Solar Photovoltaic Projects - 13317

    SciTech Connect

    Elmer, John; Butherus, Michael; Barr, Deborah L.

    2013-07-01

    To support consideration of renewable energy power development as a land reuse option, the DOE Office of Legacy Management (LM) and the National Renewable Energy Laboratory (NREL) established a partnership to conduct an assessment of wind and solar renewable energy resources on LM lands. From a solar capacity perspective, the larger sites in the western United States present opportunities for constructing solar photovoltaic (PV) projects. A detailed analysis and preliminary plan was developed for three large sites in New Mexico, assessing the costs, the conceptual layout of a PV system, and the electric utility interconnection process. As a result of the study, a 1,214-hectare (3,000-acre) site near Grants, New Mexico, was chosen for further study. The state incentives, utility connection process, and transmission line capacity were key factors in assessing the feasibility of the project. LM's Durango, Colorado, Disposal Site was also chosen for consideration because the uranium mill tailings disposal cell is on a hillside facing south, transmission lines cross the property, and the community was very supportive of the project. LM worked with the regulators to demonstrate that the disposal cell's long-term performance would not be impacted by the installation of a PV solar system. A number of LM-unique issues were resolved in making the site available for a private party to lease a portion of the site for a solar PV project. A lease was awarded in September 2012. Using a solar decision tree that was developed and launched by the EPA and NREL, LM has modified and expanded the decision tree structure to address the unique aspects and challenges faced by LM on its multiple sites. The LM solar decision tree covers factors such as land ownership, usable acreage, financial viability of the project, stakeholder involvement, and transmission line capacity. As additional sites are transferred to LM in the future, the decision tree will assist in determining whether a solar

  17. An application of contingent valuation and decision tree analysis to water quality improvements.

    PubMed

    Atkins, Jonathan P; Burdon, Daryl; Allen, James H

    2007-01-01

    This paper applies contingent valuation and decision tree analysis to investigate public preferences for water quality improvements, and in particular reduced eutrophication. Such preferences are important given that the development of EU water quality legislation is imposing significant costs on European economies. Results are reported from a survey of residents of Arhus County, Denmark, concerning water quality improvements in the Randers Fjord. The results demonstrate strong public support for reduced eutrophication and identify key determinants of such support.

  18. Automatic rule learning using decision tree for fuzzy classifier in fault diagnosis of roller bearing

    NASA Astrophysics Data System (ADS)

    Sugumaran, V.; Ramachandran, K. I.

    2007-07-01

    The roller bearing is one of the most widely used elements in rotary machines. Condition monitoring of such elements is conceived as a pattern recognition problem. Pattern recognition has two main phases: feature extraction and feature classification. Statistical features such as minimum value, standard error and kurtosis are widely used in fault diagnostics. These features are extracted from vibration signals. A rule set is formed from the extracted features and input to a fuzzy classifier. The rule set necessary for building the fuzzy classifier is obtained largely by intuition and domain knowledge. This paper presents the use of a decision tree to generate the rules automatically from the feature set. The vibration signal from a piezoelectric transducer is captured for the following conditions: good bearing, bearing with inner race fault, bearing with outer race fault, and bearing with both inner and outer race faults. The statistical features are extracted, and good features that discriminate the different fault conditions of the bearing are selected using the decision tree. The rule set for the fuzzy classifier is then obtained from the same decision tree. A fuzzy classifier is built and tested with representative data. The results are found to be encouraging.
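
    The rule-induction step can be pictured with the hedged sketch below: a scikit-learn tree (standing in for the paper's tree, with illustrative feature names and synthetic data) both ranks the statistical features and dumps its splits as human-readable if-then rules of the kind that would seed a fuzzy classifier.

        # Minimal sketch: feature selection and rule extraction from one tree.
        from sklearn.datasets import make_classification
        from sklearn.tree import DecisionTreeClassifier, export_text

        # Four synthetic classes mimic: good, inner race fault, outer race
        # fault, inner+outer race fault.
        X, y = make_classification(n_samples=200, n_features=4, n_informative=3,
                                   n_redundant=0, n_classes=4,
                                   n_clusters_per_class=1, random_state=1)
        features = ["minimum", "std_error", "kurtosis", "rms"]  # illustrative

        tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X, y)
        # feature_importances_ flags the discriminating features; export_text
        # prints the split thresholds as readable if-then rules.
        print(dict(zip(features, tree.feature_importances_.round(3))))
        print(export_text(tree, feature_names=features))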

  19. Decision Trees for Continuous Data and Conditional Mutual Information as a Criterion for Splitting Instances.

    PubMed

    Drakakis, Georgios; Moledina, Saadiq; Chomenidis, Charalampos; Doganis, Philip; Sarimveis, Haralambos

    2016-01-01

    Decision trees are renowned in the computational chemistry and machine learning communities for their interpretability. Their capacity and usage are somewhat limited by the fact that they normally work on categorical data. Improvements to known decision tree algorithms are usually carried out by increasing and tweaking parameters, as well as the post-processing of the class assignment. In this work we attempted to tackle both these issues. Firstly, conditional mutual information was used as the criterion for selecting the attribute on which to split instances. The algorithm performance was compared with the results of C4.5 (WEKA's J48) using default parameters and no restrictions. Two datasets were used for this purpose, DrugBank compounds for HRH1 binding prediction and Traditional Chinese Medicine formulation predicted bioactivities for therapeutic class annotation. Secondly, an automated binning method for continuous data was evaluated, namely Scott's normal reference rule, in order to allow any decision tree to easily handle continuous data. This was applied to all approved drugs in DrugBank for predicting the RDKit SLogP property, using the remaining RDKit physicochemical attributes as input.
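
    Scott's normal reference rule itself is compact: for n samples with standard deviation sigma, the bin width is h = 3.49 * sigma * n^(-1/3). Below is a hedged sketch of that binning step for discretizing a continuous attribute before tree induction; it mirrors the idea described above, not the paper's implementation.

        # Minimal sketch: Scott's-rule binning of a continuous attribute.
        import numpy as np

        def scott_bins(x):
            """Bin edges from h = 3.49 * sigma * n^(-1/3)."""
            x = np.asarray(x, dtype=float)
            h = 3.49 * x.std(ddof=1) * len(x) ** (-1 / 3)
            n_bins = max(1, int(np.ceil((x.max() - x.min()) / h)))
            return np.linspace(x.min(), x.max(), n_bins + 1)

        x = np.random.default_rng(0).normal(size=500)
        edges = scott_bins(x)                # cf. np.histogram_bin_edges(x, "scott")
        codes = np.digitize(x, edges[1:-1])  # categorical codes usable by any tree
        print(len(edges) - 1, "bins")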

  20. MODIS Snow Cover Mapping Decision Tree Technique: Snow and Cloud Discrimination

    NASA Technical Reports Server (NTRS)

    Riggs, George A.; Hall, Dorothy K.

    2010-01-01

    Accurate mapping of snow cover continues to challenge cryospheric scientists and modelers. The Moderate-Resolution Imaging Spectroradiometer (MODIS) snow data products have been used since 2000 by many investigators to map and monitor snow cover extent for various applications. Users have reported on the utility of the products and also on problems encountered. Three problems or hindrances in the use of the MODIS snow data products that have been reported in the literature are: cloud obscuration, snow/cloud confusion, and snow omission errors in thin or sparse snow cover conditions. Implementation of the MODIS snow algorithm in a decision tree technique using surface reflectance input to mitigate those problems is being investigated. The objective of this work is to use a decision tree structure for the snow algorithm. This should alleviate snow/cloud confusion and omission errors and provide a snow map with classes that convey information on how snow was detected, e.g. snow under clear sky, snow under cloud, to enable users' flexibility in interpreting and deriving a snow map. Results of a snow cover decision tree algorithm are compared to the standard MODIS snow map and found to exhibit improved ability to alleviate snow/cloud confusion in some situations, allowing up to about a 5% increase in mapped snow cover extent, and thus accuracy, in some scenes.

  1. Data mining for multiagent rules, strategies, and fuzzy decision tree structure

    NASA Astrophysics Data System (ADS)

    Smith, James F., III; Rhyne, Robert D., II; Fisher, Kristin

    2002-03-01

    A fuzzy logic based resource manager (RM) has been developed that automatically allocates electronic attack resources in real-time over many dissimilar platforms. Two different data mining algorithms have been developed to determine rules, strategies, and fuzzy decision tree structure. The first data mining algorithm uses a genetic algorithm as a data mining function and is called from an electronic game. The game allows a human expert to play against the resource manager in a simulated battlespace, with each of the defending platforms being exclusively directed by the fuzzy resource manager and the attacking platforms being controlled by the human expert or operating autonomously under their own logic. This approach automates the data mining problem: the game automatically creates a database reflecting the domain expert's knowledge, calls a data mining function (a genetic algorithm) to mine the database as required, and allows easy evaluation of the information mined in the second step. The criterion for re-optimization is discussed as well as experimental results. A second data mining algorithm that uses a genetic program as a data mining function is then introduced to automatically discover fuzzy decision tree structures. Finally, a fuzzy decision tree generated through this process is discussed.

  2. Building Decision Trees for Characteristic Ellipsoid Method to Monitor Power System Transient Behaviors

    SciTech Connect

    Ma, Jian; Diao, Ruisheng; Makarov, Yuri V.; Etingov, Pavel V.; Zhou, Ning; Dagle, Jeffery E.

    2010-12-01

    The characteristic ellipsoid is a new method to monitor the dynamics of power systems. Decision trees (DTs) play an important role in applying the characteristic ellipsoid method to system operation and analysis. This paper presents the idea and initial results of building DTs for detecting transient dynamic events using the characteristic ellipsoid method. The objective is to determine fault types, fault locations and clearance time in the system using decision trees based on ellipsoids of system transient responses. The New England 10-machine 39-bus system is used for running dynamic simulations to generate a sufficiently large number of transient events in different system configurations. Comprehensive transient simulations considering three fault types, two fault clearance times and different fault locations were conducted in the study. Bus voltage magnitudes and monitored reactive and active power flows are recorded as the phasor measurements to calculate characteristic ellipsoids whose volume, eccentricity, center and projection of the longest axis are used as indices to build decision trees. The DT performances are tested and compared by considering different sets of PMU locations. The proposed method demonstrates that the characteristic ellipsoid method is a very efficient and promising tool to monitor power system dynamic behaviors.
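
    The ellipsoid indices named above (volume, eccentricity, center, and the projection of the longest axis) can be illustrated with the hedged sketch below, which derives them from the covariance of a window of synthetic PMU measurements; the paper's exact characteristic-ellipsoid construction may differ.

        # Minimal sketch: ellipsoid-style indices from a measurement window.
        import numpy as np

        window = np.random.default_rng(0).normal(size=(60, 5))  # 60 samples x 5 channels
        center = window.mean(axis=0)
        evals, evecs = np.linalg.eigh(np.cov(window, rowvar=False))
        axes = np.sqrt(np.maximum(evals, 0))        # semi-axis lengths
        volume = np.prod(axes)                      # proportional to ellipsoid volume
        eccentricity = np.sqrt(1 - (axes.min() / axes.max()) ** 2)
        longest_axis = evecs[:, np.argmax(evals)]   # direction of the longest axis
        print(volume, eccentricity, center.round(2), longest_axis.round(2))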

  3. Transporter studies in drug development: experience to date and follow-up on decision trees from the International Transporter Consortium.

    PubMed

    Tweedie, D; Polli, J W; Berglund, E Gil; Huang, S M; Zhang, L; Poirier, A; Chu, X; Feng, B

    2013-07-01

    The International Transporter Consortium (ITC) organized a second workshop in March 2012 to expand on the themes developed during the inaugural ITC workshop held in 2008. The final session of the workshop provided perspectives from regulatory and industry-based scientists, with input from academic scientists, and focused primarily on the decision trees published from the first workshop. These decision trees have become a central part of subsequent regulatory drug-drug interaction (DDI) guidances issued over the past few years.

  4. Using decision trees to manage hospital readmission risk for acute myocardial infarction, heart failure, and pneumonia.

    PubMed

    Hilbert, John P; Zasadil, Scott; Keyser, Donna J; Peele, Pamela B

    2014-12-01

    To improve healthcare quality and reduce costs, the Affordable Care Act places hospitals at financial risk for excessive readmissions associated with acute myocardial infarction (AMI), heart failure (HF), and pneumonia (PN). Although predictive analytics is increasingly looked to as a means for measuring, comparing, and managing this risk, many modeling tools require data inputs that are not readily available and/or additional resources to yield actionable information. This article demonstrates how hospitals and clinicians can use their own structured discharge data to create decision trees that produce highly transparent, clinically relevant decision rules for better managing readmission risk associated with AMI, HF, and PN. For illustrative purposes, basic decision trees are trained and tested using publicly available data from the California State Inpatient Databases and an open-source statistical package. As expected, these simple models perform less well than other more sophisticated tools, with areas under the receiver operating characteristic (ROC) curve (AUC) of 0.612, 0.583, and 0.650, respectively, but they achieve a lift of 1.5 or greater for higher-risk patients with any of the three conditions. More importantly, they are shown to offer substantial advantages in terms of transparency, interpretability, comprehensiveness, and adaptability. By enabling hospitals and clinicians to identify important factors associated with readmissions, target subgroups of patients at both high and low risk, and design and implement interventions appropriate to the risk levels observed, decision trees serve as an ideal application for addressing the challenge of reducing hospital readmissions.
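
    As an illustration of the two metrics quoted above, the hedged sketch below trains a shallow tree on synthetic data (not the California State Inpatient Databases) and reports AUC and the lift in the top-decile risk group.

        # Minimal sketch: readmission-style decision tree, AUC and lift.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.metrics import roc_auc_score

        X, y = make_classification(n_samples=5000, n_features=10,
                                   weights=[0.8], random_state=0)  # ~20% "readmitted"
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
        p = tree.predict_proba(X_te)[:, 1]
        print("AUC:", round(roc_auc_score(y_te, p), 3))

        top = np.argsort(p)[-len(p) // 10:]            # highest-risk decile
        print("lift:", round(y_te[top].mean() / y_te.mean(), 2))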

  5. Cloud Detection from Satellite Imagery: A Comparison of Expert-Generated and Automatically-Generated Decision Trees

    NASA Technical Reports Server (NTRS)

    Shiffman, Smadar

    2004-01-01

    Automated cloud detection and tracking is an important step in assessing global climate change via remote sensing. Cloud masks, which indicate whether individual pixels depict clouds, are included in many of the data products that are based on data acquired on board Earth satellites. Many cloud-mask algorithms have the form of decision trees, which employ sequential tests that scientists designed based on empirical studies and simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In this study we explored the potential benefits of automatically learned decision trees for detecting clouds from images acquired using the Advanced Very High Resolution Radiometer (AVHRR) instrument on board the NOAA-14 weather satellite of the National Oceanic and Atmospheric Administration. We constructed three decision trees for a sample of 8 km daily AVHRR data from 2000 using a decision-tree learning procedure provided within MATLAB(R), and compared the accuracy of the decision trees to the accuracy of the cloud mask. We used ground observations collected by the National Aeronautics and Space Administration's Clouds and the Earth's Radiant Energy System (CERES) S'COOL project as the gold standard. For the sample data, the accuracy of the automatically learned decision trees was greater than the accuracy of the cloud masks included in the AVHRR data product.

  6. Decision Tree based Prediction and Rule Induction for Groundwater Trichloroethene (TCE) Pollution Vulnerability

    NASA Astrophysics Data System (ADS)

    Park, J.; Yoo, K.

    2013-12-01

    For groundwater resource conservation, it is important to accurately assess groundwater pollution sensitivity or vulnerability. In this work, we attempted to use a data mining approach to assess groundwater pollution vulnerability in a TCE (trichloroethylene) contaminated Korean industrial site. The conventional DRASTIC method failed to describe the TCE sensitivity data, showing a poor correlation with hydrogeological properties. Among the different data mining methods examined, namely Artificial Neural Network (ANN), Multiple Logistic Regression (MLR), Case-Based Reasoning (CBR), and Decision Tree (DT), the accuracy and consistency of the Decision Tree were the best. According to subsequent tree analyses with the optimal DT model, the failure of the conventional DRASTIC method to fit the TCE sensitivity data may be due to the use of inaccurate weight values for the hydrogeological parameters of the study site. These findings provide a proof of concept that a DT-based data mining approach can be used for prediction and rule induction of groundwater TCE sensitivity without pre-existing information on the weights of hydrogeological properties.

  7. Decision tree and PCA-based fault diagnosis of rotating machinery

    NASA Astrophysics Data System (ADS)

    Sun, Weixiang; Chen, Jin; Li, Jiaqing

    2007-04-01

    After analysing the flaws of conventional fault diagnosis methods, data mining technology is introduced into the fault diagnosis field, and a new method based on the C4.5 decision tree and principal component analysis (PCA) is proposed. In this method, PCA is used to reduce the feature set after data collection, preprocessing and feature extraction. Then, C4.5 is trained on the samples to generate a decision tree model carrying diagnosis knowledge. Finally, the tree model is used for diagnosis. To validate the proposed method, six kinds of running states (normal or without any defect, unbalance, rotor radial rub, oil whirl, shaft crack, and a simultaneous state of unbalance and radial rub) are simulated on a Bently Rotor Kit RK4 to test the C4.5 and PCA-based method against a back-propagation neural network (BPNN). The results show that the C4.5 and PCA-based diagnosis method has higher accuracy and needs less training time than the BPNN.
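
    The PCA-then-tree pipeline can be sketched as below with scikit-learn, whose CART tree with entropy splits stands in for C4.5; the synthetic features are placeholders for the Bently Rotor Kit vibration data.

        # Minimal sketch: PCA feature reduction feeding a decision tree.
        from sklearn.datasets import make_classification
        from sklearn.decomposition import PCA
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.tree import DecisionTreeClassifier

        # Six synthetic classes mimic the six simulated running states.
        X, y = make_classification(n_samples=600, n_features=30, n_informative=8,
                                   n_classes=6, n_clusters_per_class=1,
                                   random_state=0)
        model = make_pipeline(PCA(n_components=8),
                              DecisionTreeClassifier(criterion="entropy",
                                                     random_state=0))
        print(cross_val_score(model, X, y, cv=5).mean())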

  8. Merging Multi-model CMIP5/PMIP3 Past-1000 Ensemble Simulations with Tree Ring Proxy Data by Optimal Interpolation Approach

    NASA Astrophysics Data System (ADS)

    Chen, Xin; Luo, Yong; Xing, Pei; Nie, Suping; Tian, Qinhua

    2015-04-01

    Two sets of gridded annual mean surface air temperature over the Northern Hemisphere for the past millennium were constructed by employing the optimal interpolation (OI) method to merge tree-ring proxy records with simulations from CMIP5 (the fifth phase of the Climate Model Intercomparison Project). The OI algorithm allows the uncertainties in both the proxy reconstructions and the model simulations to be taken into account. To better preserve physically coordinated features and the spatial-temporal completeness of climate variability in the 7 model realizations, we perform an Empirical Orthogonal Function (EOF) analysis to truncate the ensemble mean field used as the first guess (background field) for OI. 681 temperature-sensitive tree-ring chronologies were collected and screened from the International Tree Ring Data Bank (ITRDB) and the Past Global Changes (PAGES-2k) project. First, two methods (variance matching and linear regression) are employed to calibrate the tree-ring chronologies against instrumental data (CRUTEM4v) individually. In addition, we remove the bias of both the background field and the proxy records relative to the instrumental dataset. Second, the time-varying background error covariance matrix (B) and the static "observation" error covariance matrix (R) are calculated for the OI frame. In our scheme, the matrix B is calculated locally, and "observation" error covariances are partially considered in the R matrix (the covariance between pairs of tree-ring sites that are very close to each other is counted), which differs from the traditional assumption that R should be diagonal. Comparing our results, it turns out that the regionally averaged series are not sensitive to the choice of calibration method. Quantile-quantile plots indicate that regional climatologies based on both methods tend to agree more closely with the PAGES-2k regional reconstruction during the 20th-century warming period than during the Little Ice Age (LIA). A larger volcanic cooling response is found over Asia.
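
    For reference, the merging step described here is an instance of the standard OI analysis update, written below in generic textbook notation (not the paper's):

        x_a = x_b + \mathbf{B}\mathbf{H}^{\top}\left(\mathbf{H}\mathbf{B}\mathbf{H}^{\top} + \mathbf{R}\right)^{-1}\left(y - \mathbf{H}x_b\right)

    where x_b is the EOF-truncated ensemble-mean background, y the vector of calibrated proxy values, H the operator mapping the gridded field to the tree-ring sites, and B and R the background and "observation" error covariance matrices defined above.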

  9. Ensembl 2017

    PubMed Central

    Aken, Bronwen L.; Achuthan, Premanand; Akanni, Wasiu; Amode, M. Ridwan; Bernsdorff, Friederike; Bhai, Jyothish; Billis, Konstantinos; Carvalho-Silva, Denise; Cummins, Carla; Clapham, Peter; Gil, Laurent; Girón, Carlos García; Gordon, Leo; Hourlier, Thibaut; Hunt, Sarah E.; Janacek, Sophie H.; Juettemann, Thomas; Keenan, Stephen; Laird, Matthew R.; Lavidas, Ilias; Maurel, Thomas; McLaren, William; Moore, Benjamin; Murphy, Daniel N.; Nag, Rishi; Newman, Victoria; Nuhn, Michael; Ong, Chuang Kee; Parker, Anne; Patricio, Mateus; Riat, Harpreet Singh; Sheppard, Daniel; Sparrow, Helen; Taylor, Kieron; Thormann, Anja; Vullo, Alessandro; Walts, Brandon; Wilder, Steven P.; Zadissa, Amonida; Kostadima, Myrto; Martin, Fergal J.; Muffato, Matthieu; Perry, Emily; Ruffier, Magali; Staines, Daniel M.; Trevanion, Stephen J.; Cunningham, Fiona; Yates, Andrew; Zerbino, Daniel R.; Flicek, Paul

    2017-01-01

    Ensembl (www.ensembl.org) is a database and genome browser for enabling research on vertebrate genomes. We import, analyse, curate and integrate a diverse collection of large-scale reference data to create a more comprehensive view of genome biology than would be possible from any individual dataset. Our extensive data resources include evidence-based gene and regulatory region annotation, genome variation and gene trees. An accompanying suite of tools, infrastructure and programmatic access methods ensure uniform data analysis and distribution for all supported species. Together, these provide a comprehensive solution for large-scale and targeted genomics applications alike. Among many other developments over the past year, we have improved our resources for gene regulation and comparative genomics, and added CRISPR/Cas9 target sites. We released new browser functionality and tools, including improved filtering and prioritization of genome variation, Manhattan plot visualization for linkage disequilibrium and eQTL data, and an ontology search for phenotypes, traits and disease. We have also enhanced data discovery and access with a track hub registry and a selection of new REST end points. All Ensembl data are freely released to the scientific community and our source code is available via the open source Apache 2.0 license. PMID:27899575

  10. Decision Optimization of Machine Sets Taking Into Consideration Logical Tree Minimization of Design Guidelines

    NASA Astrophysics Data System (ADS)

    Deptuła, A.; Partyka, M. A.

    2014-08-01

    The method of minimization of complex partial multi-valued logical functions determines the degree of importance of construction and exploitation parameters, which play the role of logical decision variables. Such logical functions are taken into consideration in modelling machine sets. For multi-valued logical functions with weighting products, it is possible to use a modified Quine-McCluskey algorithm for multi-valued function minimization. Taking weighting coefficients into account in the logical tree minimization reflects the physical model of the analysed object much more faithfully.

  11. Improvement and analysis of ID3 algorithm in decision-making tree

    NASA Astrophysics Data System (ADS)

    Xie, Xiao-Lan; Long, Zhen; Liao, Wen-Qi

    2015-12-01

    The cooperative system under development needs to use spatial analysis and related data mining technology to detect subject conflict and redundancy, and the ID3 algorithm is an important data mining method for this purpose. Because the logarithmic computations in the traditional ID3 decision tree algorithm are rather complicated, this paper derives a new formula for information gain by optimizing the logarithmic part of the algorithm. Experimental comparison and theoretical analysis show that the IID3 (Improved ID3) algorithm achieves higher computational efficiency and accuracy and is thus worth popularizing.
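
    For context, the quantity whose logarithmic part IID3 optimizes is ID3's information gain; the hedged sketch below computes it in its standard form (the paper's improved formula is not reproduced here).

        # Minimal sketch: Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v).
        import numpy as np

        def entropy(labels):
            _, counts = np.unique(labels, return_counts=True)
            p = counts / counts.sum()
            return -np.sum(p * np.log2(p))

        def information_gain(labels, attribute):
            gain = entropy(labels)
            for v in np.unique(attribute):
                subset = labels[attribute == v]
                gain -= len(subset) / len(labels) * entropy(subset)
            return gain

        y = np.array(["yes", "yes", "no", "no", "yes", "no"])
        a = np.array(["sunny", "rain", "rain", "sunny", "sunny", "rain"])
        print(information_gain(y, a))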

  12. Decision Tree Classifier for Classification of Plant and Animal Micro RNA's

    NASA Astrophysics Data System (ADS)

    Pant, Bhasker; Pant, Kumud; Pardasani, K. R.

    Gene expression is regulated by miRNAs, or micro RNAs, which can be 21-23 nucleotides in length. They are non-coding RNAs that control gene expression either by translational repression or mRNA degradation. Both plants and animals contain miRNAs, which have been classified by wet-lab techniques. These techniques are highly expensive, labour intensive and time consuming; hence, faster and more economical computational approaches are needed. In view of the above, a machine learning model has been developed for classification of plant and animal miRNAs using a decision tree classifier. The model has been tested on available data and gives results with 91% accuracy.

  13. Spatial distribution of block falls using volumetric GIS-decision-tree models

    NASA Astrophysics Data System (ADS)

    Abdallah, C.

    2010-10-01

    Block falls are a significant form of surficial instability, contributing to losses in land and socio-economic terms through their damaging effects on natural and human environments. This paper predicts and maps the geographic distribution and volumes of block falls in central Lebanon using remote sensing, geographic information systems (GIS) and decision-tree modeling (un-pruned and pruned trees). Eleven terrain parameters (lithology, proximity to fault line, karst type, soil type, distance to drainage line, elevation, slope gradient, slope aspect, slope curvature, land cover/use, and proximity to roads) were generated to statistically explain the occurrence of block falls. The block falls were discriminated using SPOT4 satellite imagery, and their dimensions were determined during field surveys. The un-pruned tree model based on all considered parameters explained 86% of the variability in field block fall measurements. Once pruned, it explained 50% of the variability in block fall volumes while selecting just four parameters (lithology, slope gradient, soil type, and land cover/use). Both tree models (un-pruned and pruned) were converted to quantitative 1:50,000 block fall maps with different classes, ranging from nil (no block falls) to more than 4000 m3. The two maps match fairly well, with a coincidence value of 45%; both can be used to prioritize specific zones for further measurement and modeling, as well as for land-use management. The proposed tree models are relatively simple and may also be applied to other areas (the choice of the un-pruned or pruned model depends on the availability of terrain parameters in a given area).

  14. Decision support for mitigating the risk of tree induced transmission line failure in utility rights-of-way.

    PubMed

    Poulos, H M; Camp, A E

    2010-02-01

    Vegetation management is a critical component of rights-of-way (ROW) maintenance for preventing electrical outages and safety hazards resulting from tree contact with conductors during storms. Northeast Utilities' (NU) transmission lines are a critical element of the nation's power grid; NU is therefore under scrutiny from federal agencies charged with protecting the electrical transmission infrastructure of the United States. We developed a decision support system to focus right-of-way maintenance and minimize the potential for a tree fall episode that disables transmission capacity across the state of Connecticut. We used field data on tree characteristics to develop a system for identifying hazard trees (HTs) in the field, using limited equipment, to manage Connecticut power line ROW. Results from this study indicated that the tree height-to-diameter ratio, total tree height, and live crown ratio were the key characteristics differentiating potential risk trees (danger trees) from trees with a high probability of falling (HTs). Products from this research can be transferred to adaptive right-of-way management, and the methods we used have great potential for application to other regions of the United States and elsewhere where tree failure can disrupt electrical power.

  15. Integrating Decision Tree and Hidden Markov Model (HMM) for Subtype Prediction of Human Influenza A Virus

    NASA Astrophysics Data System (ADS)

    Attaluri, Pavan K.; Chen, Zhengxin; Weerakoon, Aruna M.; Lu, Guoqing

    Multiple criteria decision making (MCDM) has a significant impact in bioinformatics. In the research reported here, we explore the integration of a decision tree (DT) and a Hidden Markov Model (HMM) for subtype prediction of human influenza A virus. Infection with influenza viruses continues to be an important public health problem. Viral strains of subtypes H3N2 and H1N1 circulate in humans at least twice annually. Subtype detection depends mainly on antigenic assays, which are time-consuming and not fully accurate. We have developed a Web system for accurate subtype detection of human influenza virus sequences. A preliminary experiment showed that this system is easy to use and powerful in identifying human influenza subtypes. Our next step is to examine the informative positions at the protein level and extend the current functionality to detect more subtypes. The web functions can be accessed at http://glee.ist.unomaha.edu/.

  16. A study of fuzzy logic ensemble system performance on face recognition problem

    NASA Astrophysics Data System (ADS)

    Polyakova, A.; Lipinskiy, L.

    2017-02-01

    Some problems are difficult to solve using a single intelligent information technology (IIT). An ensemble of various data mining (DM) techniques is a set of models, each able to solve the problem by itself, whose combination increases the efficiency of the system as a whole. Using IIT ensembles can improve the reliability and efficiency of the final decision, since the approach exploits the diversity of its components. A new method for designing such ensembles is considered in this paper. It is based on fuzzy logic and is designed to solve classification and regression problems. The ensemble consists of several data mining algorithms: an artificial neural network, a support vector machine and decision trees. These algorithms and their ensemble have been tested on face recognition problems. Principal component analysis (PCA) is used for feature selection.
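
    A hedged sketch of such a heterogeneous ensemble follows; scikit-learn's soft-voting combiner stands in for the paper's fuzzy-logic combination rule, which is not reproduced here, and the data are synthetic rather than face images.

        # Minimal sketch: ANN + SVM + decision tree combined by soft voting.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import VotingClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.neural_network import MLPClassifier
        from sklearn.svm import SVC
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=400, n_features=20, random_state=0)
        ensemble = VotingClassifier(
            estimators=[("ann", MLPClassifier(max_iter=1000, random_state=0)),
                        ("svm", SVC(probability=True, random_state=0)),
                        ("tree", DecisionTreeClassifier(random_state=0))],
            voting="soft")  # averages the members' class probabilities
        print(cross_val_score(ensemble, X, y, cv=5).mean())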

  17. Using Boosting Decision Trees in Gravitational Wave Searches triggered by Gamma-ray Bursts

    NASA Astrophysics Data System (ADS)

    Zuraw, Sarah; LIGO Collaboration

    2015-04-01

    The search for gravitational wave bursts requires the ability to distinguish weak signals from background detector noise. Gravitational wave bursts are characterized by their transient nature, making them particularly difficult to detect as they are similar to non-Gaussian noise fluctuations in the detector. The Boosted Decision Tree method is a powerful machine learning algorithm which uses Multivariate Analysis techniques to explore high-dimensional data sets in order to distinguish between gravitational wave signal and background detector noise. It does so by training with known noise events and simulated gravitational wave events. The method is tested using waveform models and compared with the performance of the standard gravitational wave burst search pipeline for Gamma-ray Bursts. It is shown that the method is able to effectively distinguish between signal and background events under a variety of conditions and over multiple Gamma-ray Burst events. This example demonstrates the usefulness and robustness of the Boosted Decision Tree and Multivariate Analysis techniques as a detection method for gravitational wave bursts. LIGO, UMass, PREP, NEGAP.
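
    The signal/background discrimination described above can be pictured with the hedged sketch below, which boosts shallow trees on synthetic data; the LIGO pipeline, its training events and its features are not reproduced here.

        # Minimal sketch: boosted decision trees separating signal from noise.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        # "signal" = simulated events (minority class), "background" = noise.
        X, y = make_classification(n_samples=2000, n_features=15,
                                   weights=[0.9], random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                         random_state=0).fit(X_tr, y_tr)
        print("AUC:", round(roc_auc_score(y_te, bdt.predict_proba(X_te)[:, 1]), 3))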

  18. Diagnostic Features of Common Oral Ulcerative Lesions: An Updated Decision Tree

    PubMed Central

    Safi, Yaser

    2016-01-01

    Diagnosis of oral ulcerative lesions might be quite challenging. This narrative review article aims to introduce an updated decision tree for diagnosing oral ulcerative lesions on the basis of their diagnostic features. Various general search engines and specialized databases including PubMed, PubMed Central, Medline Plus, EBSCO, Science Direct, Scopus, Embase, and authenticated textbooks were used to find relevant topics by means of MeSH keywords such as “oral ulcer,” “stomatitis,” and “mouth diseases.” Thereafter, English-language articles published from 1983 to 2015 in both medical and dental journals including reviews, meta-analyses, original papers, and case reports were appraised. Upon compilation of the relevant data, oral ulcerative lesions were categorized into three major groups: acute, chronic, and recurrent ulcers and into five subgroups: solitary acute, multiple acute, solitary chronic, multiple chronic, and solitary/multiple recurrent, based on the number and duration of lesions. In total, 29 entities were organized in the form of a decision tree in order to help clinicians establish a logical diagnosis by stepwise progression. PMID:27781066

  19. Recognition of Protozoa and Metazoa using image analysis tools, discriminant analysis, neural networks and decision trees.

    PubMed

    Ginoris, Y P; Amaral, A L; Nicolau, A; Coelho, M A Z; Ferreira, E C

    2007-07-09

    Protozoa and metazoa are considered good indicators of treatment quality in activated sludge systems because these organisms are fairly sensitive to physical, chemical and operational conditions. It is therefore possible to establish close relationships between the predominance of certain species or groups of species and several operational parameters of the plant, expressed for instance through biotic indices such as the Sludge Biotic Index (SBI). This procedure requires the identification, classification and enumeration of the different species, which is usually achieved manually, implying both time and expertise availability. Digital image analysis combined with multivariate statistical techniques has proved to be a useful tool for classifying and quantifying organisms in an automatic, objective way. This work presents a semi-automatic image analysis procedure for protozoa and metazoa recognition developed in the Matlab language. The obtained morphological descriptors were analyzed using discriminant analysis, neural networks and decision trees to identify and classify each protozoan or metazoan. The procedure proved quite adequate for distinguishing between the non-sessile protozoa classes and the metazoa classes, with high overall species recognition rates, with the exception of the sessile protozoa. The results were also found suitable for assessing wastewater treatment conditions. Finally, the discriminant analysis and neural network results were quite similar, whereas the decision tree technique was less appropriate.

  20. Using decision-tree classifier systems to extract knowledge from databases

    NASA Technical Reports Server (NTRS)

    St.clair, D. C.; Sabharwal, C. L.; Hacke, Keith; Bond, W. E.

    1990-01-01

    One difficulty in applying artificial intelligence techniques to the solution of real world problems is that the development and maintenance of many AI systems, such as those used in diagnostics, require large amounts of human resources. At the same time, databases frequently exist which contain information about the process(es) of interest. Recently, efforts to reduce development and maintenance costs of AI systems have focused on using machine learning techniques to extract knowledge from existing databases. Research is described in the area of knowledge extraction using a class of machine learning techniques called decision-tree classifier systems. Results of this research suggest ways of performing knowledge extraction which may be applied in numerous situations. In addition, a measurement called the concept strength metric (CSM) is described which can be used to determine how well the resulting decision tree can differentiate between the concepts it has learned. The CSM can be used to determine whether or not additional knowledge needs to be extracted from the database. An experiment involving real world data is presented to illustrate the concepts described.

  1. Development of decision tree models for substrates, inhibitors, and inducers of p-glycoprotein.

    PubMed

    Hammann, Felix; Gutmann, Heike; Jecklin, Ursula; Maunz, Andreas; Helma, Christoph; Drewe, Juergen

    2009-05-01

    In silico classification of new compounds for certain properties is a useful tool to guide further experiments or compound selection. Interaction of new compounds with the efflux pump P-glycoprotein (P-gp) is an important drug property determining tissue distribution and the potential for drug-drug interactions. We present three datasets on substrate, inhibitor, and inducer activities for P-gp (n = 471) obtained from a literature search, which we compared to an existing evaluation of the Prestwick Chemical Library with the calcein-AM assay (retrieved from PubMed). Additionally, we present decision tree models of these activities with predictive accuracies of 77.7% (substrates), 86.9% (inhibitors), and 90.3% (inducers) using three algorithms (CHAID, CART, and C4.5). We also present decision tree models of the calcein-AM assay (79.9%). Apart from a comprehensive dataset of P-gp interacting compounds, our study provides evidence of the efficacy of logD descriptors and of two algorithms not commonly used in pharmacological QSAR studies (CART and CHAID).

  2. Block-Based Connected-Component Labeling Algorithm Using Binary Decision Trees

    PubMed Central

    Chang, Wan-Yu; Chiu, Chung-Cheng; Yang, Jia-Horng

    2015-01-01

    In this paper, we propose a fast labeling algorithm based on block-based concepts. Because the number of memory access points directly affects the time consumption of the labeling algorithms, the aim of the proposed algorithm is to minimize neighborhood operations. Our algorithm utilizes a block-based view and correlates a raster scan to select the necessary pixels generated by a block-based scan mask. We analyze the advantages of a sequential raster scan for the block-based scan mask, and integrate the block-connected relationships using two different procedures with binary decision trees to reduce unnecessary memory access. This greatly simplifies the pixel locations of the block-based scan mask. Furthermore, our algorithm significantly reduces the number of leaf nodes and depth levels required in the binary decision tree. We analyze the labeling performance of the proposed algorithm alongside that of other labeling algorithms using high-resolution images and foreground images. The experimental results from synthetic and real image datasets demonstrate that the proposed algorithm is faster than other methods. PMID:26393597

  3. Block-Based Connected-Component Labeling Algorithm Using Binary Decision Trees.

    PubMed

    Chang, Wan-Yu; Chiu, Chung-Cheng; Yang, Jia-Horng

    2015-09-18

    In this paper, we propose a fast labeling algorithm based on block-based concepts. Because the number of memory access points directly affects the time consumption of the labeling algorithms, the aim of the proposed algorithm is to minimize neighborhood operations. Our algorithm utilizes a block-based view and correlates a raster scan to select the necessary pixels generated by a block-based scan mask. We analyze the advantages of a sequential raster scan for the block-based scan mask, and integrate the block-connected relationships using two different procedures with binary decision trees to reduce unnecessary memory access. This greatly simplifies the pixel locations of the block-based scan mask. Furthermore, our algorithm significantly reduces the number of leaf nodes and depth levels required in the binary decision tree. We analyze the labeling performance of the proposed algorithm alongside that of other labeling algorithms using high-resolution images and foreground images. The experimental results from synthetic and real image datasets demonstrate that the proposed algorithm is faster than other methods.

  4. Computational prediction of blood-brain barrier permeability using decision tree induction.

    PubMed

    Suenderhauf, Claudia; Hammann, Felix; Huwyler, Jörg

    2012-08-31

    Predicting blood-brain barrier (BBB) permeability is essential to drug development, as a molecule cannot exhibit pharmacological activity within the brain parenchyma without first transiting this barrier. Understanding the process of permeation, however, is complicated by a combination of both limited passive diffusion and active transport. Our aim here was to establish predictive models for BBB drug permeation that include both active and passive transport. A database of 153 compounds was compiled using in vivo permeability-surface area product (logPS) values in rats as a quantitative parameter for BBB permeability. The open source Chemistry Development Kit (CDK) was used to calculate physico-chemical properties and descriptors. Predictive computational models were implemented by machine learning paradigms (decision tree induction) on both descriptor sets. Models with a corrected classification rate (CCR) of 90% were established. Mechanistic insight into BBB transport was provided by an Ant Colony Optimization (ACO)-based binary classifier analysis to identify the most predictive chemical substructures. Decision trees revealed descriptors of lipophilicity (aLogP) and charge (polar surface area), which were also previously described in models of passive diffusion. However, measures of molecular geometry and connectivity were found to be related to an active drug transport component.

  5. Decision trees to characterise the roles of permeability and solubility on the prediction of oral absorption.

    PubMed

    Newby, Danielle; Freitas, Alex A; Ghafourian, Taravat

    2015-01-27

    Oral absorption of compounds depends on many physiological, physicochemical and formulation factors. Two important properties that govern oral absorption are in vitro permeability and solubility, which are commonly used as indicators of human intestinal absorption. Despite this, the nature and exact characteristics of the relationship between these parameters are not well understood. In this study a large dataset of human intestinal absorption was collated along with in vitro permeability, aqueous solubility, melting point, and maximum dose for the same compounds. The dataset allowed a permeability threshold to be established objectively to predict high or low intestinal absorption. Using this permeability threshold, classification decision trees incorporating a solubility-related parameter such as experimental or predicted solubility, or the melting-point-based absorption potential (MPbAP), along with structural molecular descriptors were developed and validated to predict oral absorption class. The decision trees were able to determine the individual roles of permeability and solubility in the oral absorption process. Poorly permeable compounds with high solubility show low intestinal absorption, whereas poorly water-soluble compounds with high or low permeability may have high intestinal absorption provided that they have certain molecular characteristics such as a small polar surface area or specific topology.

  6. Decision tree analysis of factors influencing rainfall-related building damage

    NASA Astrophysics Data System (ADS)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-04-01

    Flood damage prediction models are essential building blocks in flood risk assessments. Little research has been dedicated so far to damage of small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period of 1998-2011. The databases include claims of water-related damage, for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor. Response variables being modelled are average claim size and claim frequency, per district per day. The set of predictors include rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), building age (property data only), ownership structure (content data only) and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size, which suggests that variability in average claim size is related to explanatory variables that cannot be defined at the district scale. Cross-validation results show that decision trees were able to predict 22-26% of variance in claim frequency, which is considerably better compared to results from global multiple regression models (11-18% of variance explained).

  7. Decision-tree analysis of factors influencing rainfall-related building structure and content damage

    NASA Astrophysics Data System (ADS)

    Spekkers, M. H.; Kok, M.; Clemens, F. H. L. R.; ten Veldhuis, J. A. E.

    2014-09-01

    Flood-damage prediction models are essential building blocks in flood risk assessments. So far, little research has been dedicated to damage from small-scale urban floods caused by heavy rainfall, while there is a need for reliable damage models for this flood type among insurers and water authorities. The aim of this paper is to investigate a wide range of damage-influencing factors and their relationships with rainfall-related damage, using decision-tree analysis. For this, district-aggregated claim data from private property insurance companies in the Netherlands were analysed, for the period 1998-2011. The databases include claims of water-related damage (for example, damages related to rainwater intrusion through roofs and pluvial flood water entering buildings at ground floor). Response variables being modelled are average claim size and claim frequency, per district, per day. The set of predictors include rainfall-related variables derived from weather radar images, topographic variables from a digital terrain model, building-related variables and socioeconomic indicators of households. Analyses were made separately for property and content damage claim data. Results of decision-tree analysis show that claim frequency is most strongly associated with maximum hourly rainfall intensity, followed by real estate value, ground floor area, household income, season (property data only), building age (property data only), fraction of homeowners (content data only), and fraction of low-rise buildings (content data only). It was not possible to develop statistically acceptable trees for average claim size. It is recommended to investigate explanations for the failure to derive models. Such investigations require the inclusion of other explanatory factors that were not used in the present study, an investigation of the variability in average claim size at different spatial scales, and the collection of more detailed insurance data.

  8. High resolution multisensor fusion of SAR, optical and LiDAR data based on crisp vs. fuzzy and feature vs. decision ensemble systems

    NASA Astrophysics Data System (ADS)

    Bigdeli, Behnaz; Pahlavani, Parham

    2016-10-01

    Synthetic Aperture Radar (SAR) data are of high interest for different applications in remote sensing, especially land cover classification. SAR imaging is independent of solar illumination and weather conditions, and it can even penetrate some of the Earth's surface materials to return information about subsurface features. However, the radar response is more a function of geometry and structure than of surface reflectance as captured in optical images. In addition, the backscatter of objects in the microwave range depends on the frequency band used, and the grey values in SAR images differ from the usual assumption of spectral reflectance of the Earth's surface. Consequently, SAR imaging is often used as a complementary technique to traditional optical remote sensing. This study presents different ensemble systems for multisensor fusion of SAR, multispectral and LiDAR data. First, in the decision ensemble system, after extraction and selection of proper features from each data source, a crisp SVM (Support Vector Machine) and a fuzzy KNN (K Nearest Neighbor) are utilized on each feature space. Bayesian theory is then applied to fuse the SVMs, while Decision Template (DT) and Dempster-Shafer (DS) methods are applied as fuzzy decision fusion schemes on the KNNs. Second, in the feature ensemble system, features from all data sources are stacked into a cube, and classification is performed by SVM and fuzzy KNN as crisp and fuzzy decision-making systems, respectively. A co-registered TerraSAR-X, WorldView-2 and LiDAR data set from San Francisco, USA, was available to examine the effectiveness of the proposed method. The results show that combining SAR data with the other sensors improves classification results for most of the classes.

  9. Determinants of farmers' tree planting investment decision as a degraded landscape management strategy in the central highlands of Ethiopia

    NASA Astrophysics Data System (ADS)

    Gessesse, B.; Bewket, W.; Bräuning, A.

    2015-11-01

    Land degradation due to the lack of sustainable land management practices is one of the critical challenges in many developing countries, including Ethiopia. This study explores the major determinants of farm-level tree planting decisions as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree planting decisions (Chi-square = 37.29, df = 15, P < 0.001). Besides, the computed significant value of the model suggests that all the considered predictor variables jointly influenced the farmers' decision to plant trees as a land management strategy. In this regard, the findings of the study show that local land users' willingness to plant trees is a function of a wide range of biophysical, institutional, socioeconomic and household-level factors; in particular, household size, productive labour force availability, the disparity of schooling age, the level of perception of the process of deforestation and the current land tenure system had a positive and significant influence on tree growing investment decisions in the study watershed. Meanwhile, the processes of land use conversion and land degradation are serious, and have had adverse effects on agricultural productivity, local food security and the poverty trap nexus. Hence, devising and implementing sustainable and integrated land management policy options would enhance ecological restoration and livelihood sustainability in the study watershed.

  10. Determinants of farmers' tree-planting investment decisions as a degraded landscape management strategy in the central highlands of Ethiopia

    NASA Astrophysics Data System (ADS)

    Gessesse, Berhan; Bewket, Woldeamlak; Bräuning, Achim

    2016-04-01

    Land degradation due to lack of sustainable land management practices is one of the critical challenges in many developing countries including Ethiopia. This study explored the major determinants of farm-level tree-planting decisions as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree-planting decisions (χ2 = 37.29, df = 15, P < 0.001). Besides, the computed significant value of the model revealed that all the considered predictor variables jointly influenced the farmers' decisions to plant trees as a land management strategy. The findings of the study demonstrated that the adoption of tree-growing decisions by local land users was a function of a wide range of biophysical, institutional, socioeconomic and household-level factors. In this regard, the likelihood of household size, productive labour force availability, the disparity of schooling age, level of perception of the process of deforestation and the current land tenure system had a critical influence on tree-growing investment decisions in the study watershed. Eventually, the processes of land-use conversion and land degradation were serious, which in turn have had adverse effects on agricultural productivity, local food security and poverty trap nexus. Hence, the study recommended that devising and implementing sustainable land management policy options would enhance ecological restoration and livelihood sustainability in the study watershed.

  11. Generation of 2D Land Cover Maps for Urban Areas Using Decision Tree Classification

    NASA Astrophysics Data System (ADS)

    Höhle, J.

    2014-09-01

    A 2D land cover map can be generated automatically and efficiently from high-resolution multispectral aerial images. First, a digital surface model is produced and each cell of the elevation model is supplemented with attributes. A decision tree classification is then applied to extract map objects such as buildings, roads, grassland, trees, hedges, and walls from this "intelligent" point cloud. The decision tree is derived from training areas whose borders are digitized on top of a false-colour orthoimage. The produced 2D land cover map with six classes is subsequently refined using image analysis techniques. The proposed methodology is described step by step. The classification, assessment, and refinement are carried out with the open source software "R"; the dense and accurate digital surface model is generated with the "Match-T DSM" program of the Trimble Company. A practical example of 2D land cover map generation is carried out using images of a multispectral medium-format aerial camera covering an urban area in Switzerland. The assessment of the produced land cover map is based on class-wise stratified sampling, where reference values of samples are determined by means of stereo-observations of false-colour stereopairs. The stratified statistical assessment of the produced land cover map with six classes, based on 91 points per class, reveals a high thematic accuracy for the classes "building" (99%, 95% CI: 95%-100%) and "road and parking lot" (90%, 95% CI: 83%-95%). Other accuracy measures (overall accuracy, kappa value) and their 95% confidence intervals are derived as well. The proposed methodology has a high potential for automation and fast processing and may be applied to other scenes and sensors.

  12. MediBoost: a Patient Stratification Tool for Interpretable Decision Making in the Era of Precision Medicine

    PubMed Central

    Valdes, Gilmer; Luna, José Marcio; Eaton, Eric; Simone, Charles B.; Ungar, Lyle H.; Solberg, Timothy D.

    2016-01-01

    Machine learning algorithms that are both interpretable and accurate are essential in applications such as medicine where errors can have a dire consequence. Unfortunately, there is currently a tradeoff between accuracy and interpretability among state-of-the-art methods. Decision trees are interpretable and are therefore used extensively throughout medicine for stratifying patients. Current decision tree algorithms, however, are consistently outperformed in accuracy by other, less-interpretable machine learning models, such as ensemble methods. We present MediBoost, a novel framework for constructing decision trees that retain interpretability while having accuracy similar to ensemble methods, and compare MediBoost’s performance to that of conventional decision trees and ensemble methods on 13 medical classification problems. MediBoost significantly outperformed current decision tree algorithms in 11 out of 13 problems, giving accuracy comparable to ensemble methods. The resulting trees are of the same type as decision trees used throughout clinical practice but have the advantage of improved accuracy. Our algorithm thus gives the best of both worlds: it grows a single, highly interpretable tree that has the high accuracy of ensemble methods. PMID:27901055

  13. MediBoost: a Patient Stratification Tool for Interpretable Decision Making in the Era of Precision Medicine

    NASA Astrophysics Data System (ADS)

    Valdes, Gilmer; Luna, José Marcio; Eaton, Eric; Simone, Charles B.; Ungar, Lyle H.; Solberg, Timothy D.

    2016-11-01

    Machine learning algorithms that are both interpretable and accurate are essential in applications such as medicine where errors can have a dire consequence. Unfortunately, there is currently a tradeoff between accuracy and interpretability among state-of-the-art methods. Decision trees are interpretable and are therefore used extensively throughout medicine for stratifying patients. Current decision tree algorithms, however, are consistently outperformed in accuracy by other, less-interpretable machine learning models, such as ensemble methods. We present MediBoost, a novel framework for constructing decision trees that retain interpretability while having accuracy similar to ensemble methods, and compare MediBoost’s performance to that of conventional decision trees and ensemble methods on 13 medical classification problems. MediBoost significantly outperformed current decision tree algorithms in 11 out of 13 problems, giving accuracy comparable to ensemble methods. The resulting trees are of the same type as decision trees used throughout clinical practice but have the advantage of improved accuracy. Our algorithm thus gives the best of both worlds: it grows a single, highly interpretable tree that has the high accuracy of ensemble methods.

  14. CorRECTreatment: A Web-based Decision Support Tool for Rectal Cancer Treatment that Uses the Analytic Hierarchy Process and Decision Tree

    PubMed Central

    Karakülah, G.; Dicle, O.; Sökmen, S.; Çelikoğlu, C.C.

    2015-01-01

    Summary Background The selection of appropriate rectal cancer treatment is a complex multi-criteria decision making process, in which clinical decision support systems might be used to assist and enrich physicians' decision making. Objective The objective of the study was to develop a web-based clinical decision support tool for physicians for the selection of potentially beneficial treatment options for patients with rectal cancer. Methods The updated decision model contained 8 and 10 criteria in the first and second steps, respectively. The decision support model, developed in our previous study by combining the Analytic Hierarchy Process (AHP) method, which determines the priorities of the criteria, with a decision tree formed using these priorities, was updated and applied retrospectively to data from 388 patients. A web-based decision support tool named corRECTreatment was then developed, and the agreement between the treatment recommendations of the tool and expert opinion was examined. Two surgeons were asked to recommend a treatment, and an overall survival value for that treatment, for 20 cases that we selected from among the most common and the rarest treatment options in the patient data set and turned into scenarios. Results In the AHP analyses of the criteria, the matrices generated for both decision steps were found to be consistent (consistency ratio < 0.1). Based on the experts' decisions, agreement for the most frequent cases was 80% for the first decision step and 100% for the second decision step; for rare cases, agreement was 50% for the first decision step and 80% for the second decision step. Conclusions The decision model and corRECTreatment, developed by applying these methods to real patient data, are expected to provide potential users with decision support in rectal cancer treatment processes and to facilitate projections about treatment options.
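
    The AHP step referenced above (criterion priorities from a pairwise-comparison matrix, accepted when the consistency ratio is below 0.1) can be sketched as follows; the comparison matrix is illustrative, not the study's.

        # Minimal sketch: AHP priorities and Saaty's consistency ratio.
        import numpy as np

        A = np.array([[1.0, 3.0, 5.0],
                      [1/3, 1.0, 2.0],
                      [1/5, 1/2, 1.0]])      # reciprocal pairwise comparisons
        n = A.shape[0]

        evals, evecs = np.linalg.eig(A)
        k = np.argmax(evals.real)
        w = np.abs(evecs[:, k].real)
        w /= w.sum()                          # criterion priorities

        ci = (evals[k].real - n) / (n - 1)    # consistency index
        ri = {3: 0.58, 4: 0.90, 5: 1.12}[n]   # Saaty's random index
        print("weights:", w.round(3), "CR:", round(ci / ri, 3))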

  15. Trees

    ERIC Educational Resources Information Center

    Al-Khaja, Nawal

    2007-01-01

    This is a thematic lesson plan for young learners about palm trees and the importance of taking care of them. The two-part lesson teaches listening, reading, and speaking skills. The lesson includes the parts of a tree; the modal auxiliary, can; dialogues; and a role-play activity.

  16. Comparative Analysis of Decision Trees with Logistic Regression in Predicting Fault-Prone Classes

    NASA Astrophysics Data System (ADS)

    Singh, Yogesh; Takkar, Arvinder Kaur; Malhotra, Ruchika

    Metrics are available for predicting fault-prone classes, which may help software organizations plan and perform testing activities. This is possible through the proper allocation of resources to the fault-prone parts of the design and code of the software. Hence, the importance and usefulness of such metrics is understandable, but the empirical validation of these metrics is always a great challenge. Decision Tree (DT) methods have been successfully applied for solving classification problems in many applications. This paper evaluates the capability of three DT methods and compares their performance with a statistical method in predicting fault-prone software classes, using a publicly available NASA data set. The results indicate that the prediction performance of DT is generally better than that of the statistical model. However, similar studies need to be carried out in order to establish the acceptability of the DT models.

  17. Multi-output decision trees for lesion segmentation in multiple sclerosis

    NASA Astrophysics Data System (ADS)

    Jog, Amod; Carass, Aaron; Pham, Dzung L.; Prince, Jerry L.

    2015-03-01

    Multiple Sclerosis (MS) is a disease of the central nervous system in which the protective myelin sheath of the neurons is damaged. MS leads to the formation of lesions, predominantly in the white matter of the brain and the spinal cord. The number and volume of lesions visible in magnetic resonance (MR) imaging are important criteria for diagnosing and tracking the progression of MS. Locating and delineating lesions manually requires the tedious and expensive efforts of highly trained raters. In this paper, we propose an automated algorithm to segment lesions in MR images using multi-output decision trees. We evaluated our algorithm on the publicly available MICCAI 2008 MS Lesion Segmentation Challenge training dataset of 20 subjects, and showed improved results in comparison to state-of-the-art methods. We also evaluated our algorithm on an in-house dataset of 49 subjects with a true positive rate of 0.41 and a positive predictive value of 0.36.
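
    A multi-output decision tree predicts several targets jointly with a single tree, which scikit-learn's tree implementations support natively. A minimal sketch (with synthetic features standing in for the paper's per-voxel MR image features):

    # Sketch of a multi-output decision tree: one tree jointly predicts several
    # target variables (synthetic data, standing in for per-voxel MR features).
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.RandomState(0)
    X = rng.rand(200, 5)                      # e.g. per-voxel image features
    Y = np.column_stack([X[:, 0] + X[:, 1],   # several correlated targets
                         X[:, 0] - X[:, 2]])

    tree = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X, Y)
    print(tree.predict(X[:3]))                # one row of outputs per sample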

  18. Identification of Water Bodies in a Landsat 8 OLI Image Using a J48 Decision Tree

    PubMed Central

    Acharya, Tri Dev; Lee, Dong Ha; Yang, In Tae; Lee, Jae Kang

    2016-01-01

    Water bodies are essential to humans and other forms of life. Identification of water bodies can be useful in various ways, including estimation of water availability, demarcation of flooded regions, change detection, and so on. In past decades, Landsat satellite sensors have been used for land use classification and water body identification. Due to the introduction of the new Operational Land Imager (OLI) sensor on Landsat 8, with a high spectral resolution and improved signal-to-noise ratio, the quality of imagery sensed by Landsat 8 has improved, enabling better characterization of land cover and increased data size. Therefore, it is necessary to explore the most appropriate and practical water identification methods that take advantage of the improved image quality and use the fewest inputs based on the original OLI bands. The objective of the study is to explore the potential of a J48 decision tree (JDT) in identifying water bodies using reflectance bands from Landsat 8 OLI imagery. J48 is the open-source implementation of the C4.5 decision tree algorithm in the Weka workbench. The test site for the study is in the Northern Han River Basin, which is located in Gangwon province, Korea. Training data with individual bands were used to develop the JDT model, which was later applied to the whole study area. The performance of the model was statistically analysed using the kappa statistic and the area under the curve (AUC). The results were compared with five other known water identification methods using a confusion matrix and related statistics. Almost all the methods showed high accuracy, and the JDT was successfully applied to the OLI image using only four bands, where the new additional deep blue band of OLI was found to have the third highest information gain. Thus, the JDT can be a good method for water body identification based on images with improved resolution and increased size. PMID:27420067
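
    J48's parent algorithm, C4.5, selects splits by information gain (entropy). As a rough stand-in, the sketch below trains an entropy-criterion tree and evaluates it with the kappa statistic and AUC as in the paper; the band values and water labels are synthetic placeholders, not Landsat data.

    # Rough C4.5/J48 stand-in: an entropy-criterion decision tree evaluated
    # with kappa and AUC. Features are synthetic placeholders for the four
    # OLI reflectance bands used in the paper.
    import numpy as np
    from sklearn.metrics import cohen_kappa_score, roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.RandomState(0)
    bands = rng.rand(1000, 4)                             # four reflectance bands
    water = (bands[:, 3] + 0.1 * rng.randn(1000) < 0.3).astype(int)  # toy label

    Xtr, Xte, ytr, yte = train_test_split(bands, water, random_state=0)
    clf = DecisionTreeClassifier(criterion="entropy").fit(Xtr, ytr)

    print("kappa:", cohen_kappa_score(yte, clf.predict(Xte)))
    print("AUC:  ", roc_auc_score(yte, clf.predict_proba(Xte)[:, 1]))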

  19. A novel decision-tree method for structured continuous-label classification.

    PubMed

    Hu, Hsiao-Wei; Chen, Yen-Liang; Tang, Kwei

    2013-12-01

    Structured continuous-label classification is a variety of classification in which the label is continuous in the data, but the goal is to classify data into classes that are a set of predefined ranges and can be organized in a hierarchy. In the hierarchy, the ranges at the lower levels are more specific and inherently more difficult to predict, whereas the ranges at the upper levels are less specific and inherently easier to predict. Therefore, both prediction specificity and prediction accuracy must be considered when building a decision tree (DT) from this kind of data. This paper proposes a novel classification algorithm for learning DT classifiers from data with structured continuous labels. This approach considers the distribution of labels throughout the hierarchical structure during the construction of trees without requiring discretization in the preprocessing stage. We compared the results of the proposed method with those of the C4.5 algorithm using eight real data sets. The empirical results indicate that the proposed method outperforms the C4.5 algorithm with regard to prediction accuracy, prediction specificity, and computational complexity.

  20. A data mining approach to optimize pellets manufacturing process based on a decision tree algorithm.

    PubMed

    Ronowicz, Joanna; Thommes, Markus; Kleinebudde, Peter; Krysiński, Jerzy

    2015-06-20

    The present study is focused on a thorough analysis of the cause-effect relationships between pellet formulation characteristics (pellet composition as well as process parameters) and the selected quality attribute of the final product. The quality of the pellets was expressed by their shape, using the aspect ratio value. A data matrix for chemometric analysis consisted of 224 pellet formulations prepared with eight different active pharmaceutical ingredients and several various excipients, using different extrusion/spheronization process conditions. The data set contained 14 input variables (both formulation and process variables) and one output variable (pellet aspect ratio). A tree regression algorithm consistent with the Quality by Design concept was applied to obtain a deeper understanding and knowledge of the formulation and process parameters affecting the final pellet sphericity. A clear, interpretable set of decision rules was generated. The spheronization speed, spheronization time, number of holes, and water content of the extrudate were recognized as the key factors influencing pellet aspect ratio. The most spherical pellets were achieved by using a large number of holes during extrusion, a high spheronizer speed, and a longer spheronization time. The described data mining approach enhances knowledge about the pelletization process and simultaneously facilitates the search for the optimal process conditions that are necessary to achieve ideal spherical pellets, resulting in good flow characteristics. This data mining approach can be taken into consideration by industrial formulation scientists to support rational decision making in the field of pellet technology.
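
    A regression tree of this kind yields a readable rule set directly. A sketch with hypothetical formulation variables (the names and data below are placeholders, not the study's 14 inputs):

    # Sketch of a regression tree mapping formulation/process variables to
    # pellet aspect ratio, then printing its decision rules. Variable names
    # and data are hypothetical placeholders for the study's inputs.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor, export_text

    rng = np.random.RandomState(0)
    cols = ["spheronization_speed", "spheronization_time",
            "n_holes", "water_content"]
    X = rng.rand(224, 4)
    aspect_ratio = 1.4 - 0.2 * X[:, 0] - 0.1 * X[:, 1] + 0.05 * rng.randn(224)

    tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, aspect_ratio)
    print(export_text(tree, feature_names=cols))   # interpretable rule set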

  1. Ensemble machine learning on gene expression data for cancer classification.

    PubMed

    Tan, Aik Choon; Gilbert, David

    2003-01-01

    Whole genome RNA expression studies permit systematic approaches to understanding the correlation between gene expression profiles and disease states or different developmental stages of a cell. Microarray analysis provides quantitative information about the complete transcription profile of cells, which facilitates drug and therapeutics development, disease diagnosis, and understanding of basic cell biology. One of the challenges in microarray analysis, especially for cancerous gene expression profiles, is to identify genes or groups of genes that are highly expressed in tumour cells but not in normal cells, and vice versa. Previously, we have shown that ensemble machine learning consistently performs well in classifying biological data. In this paper, we focus on three different supervised machine learning techniques in cancer classification, namely the C4.5 decision tree and bagged and boosted decision trees. We have performed classification tasks on seven publicly available cancerous microarray datasets and compared the classification/prediction performance of these methods. We have observed that ensemble learning (bagged and boosted decision trees) often performs better than single decision trees in this classification task.
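
    The three classifiers compared here map directly onto standard implementations. A sketch of the comparison, with synthetic high-dimensional data standing in for the microarray profiles:

    # Sketch of the paper's comparison: a single tree versus bagged and boosted
    # trees, on synthetic data standing in for high-dimensional microarrays.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=100, n_features=1000,
                               n_informative=20, random_state=0)
    base = DecisionTreeClassifier(random_state=0)   # C4.5-like single tree
    models = {
        "single tree": base,
        "bagged":      BaggingClassifier(base, n_estimators=50, random_state=0),
        "boosted":     AdaBoostClassifier(n_estimators=50, random_state=0),
    }
    for name, m in models.items():
        print(name, cross_val_score(m, X, y, cv=5).mean())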

  2. Classification of Parkinsonian Syndromes from FDG-PET Brain Data Using Decision Trees with SSM/PCA Features

    PubMed Central

    Mudali, D.; Teune, L. K.; Renken, R. J.; Leenders, K. L.; Roerdink, J. B. T. M.

    2015-01-01

    Medical imaging techniques like fluorodeoxyglucose positron emission tomography (FDG-PET) have been used to aid in the differential diagnosis of neurodegenerative brain diseases. In this study, the objective is to classify FDG-PET brain scans of subjects with Parkinsonian syndromes (Parkinson's disease, multiple system atrophy, and progressive supranuclear palsy) compared to healthy controls. The scaled subprofile model/principal component analysis (SSM/PCA) method was applied to FDG-PET brain image data to obtain covariance patterns and corresponding subject scores. The latter were used as features for supervised classification by the C4.5 decision tree method. Leave-one-out cross validation was applied to determine classifier performance. We carried out a comparison with other types of classifiers. The big advantage of decision tree classification is that the results are easy for humans to understand. A visual representation of decision trees strongly supports the interpretation process, which is very important in the context of medical diagnosis. Further improvements are suggested based on enlarging the training data set, enhancing the decision tree method by bagging, and adding additional features based on (f)MRI data. PMID:25918550
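
    The pipeline described (PCA subject scores feeding a decision tree, validated leave-one-out) can be sketched as follows; plain PCA stands in for the full SSM variant, and the data are random placeholders rather than FDG-PET scans.

    # Sketch of the pipeline: PCA scores as features for a decision tree,
    # validated with leave-one-out cross validation. Plain PCA stands in for
    # SSM/PCA; the data are random placeholders for FDG-PET images.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.RandomState(0)
    scans = rng.rand(40, 500)                 # 40 subjects, voxel features
    labels = rng.randint(0, 2, 40)            # patient vs healthy control

    pipe = make_pipeline(PCA(n_components=10),
                         DecisionTreeClassifier(random_state=0))
    acc = cross_val_score(pipe, scans, labels, cv=LeaveOneOut()).mean()
    print("LOO accuracy:", acc)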

  3. A Decision-Tree-Oriented Guidance Mechanism for Conducting Nature Science Observation Activities in a Context-Aware Ubiquitous Learning

    ERIC Educational Resources Information Center

    Hwang, Gwo-Jen; Chu, Hui-Chun; Shih, Ju-Ling; Huang, Shu-Hsien; Tsai, Chin-Chung

    2010-01-01

    A context-aware ubiquitous learning environment is an authentic learning environment with personalized digital supports. While showing the potential of applying such a learning environment, researchers have also indicated the challenges of providing adaptive and dynamic support to individual students. In this paper, a decision-tree-oriented…

  4. What Satisfies Students? Mining Student-Opinion Data with Regression and Decision-Tree Analysis. AIR 2002 Forum Paper.

    ERIC Educational Resources Information Center

    Thomas, Emily H.; Galambos, Nora

    To investigate how students' characteristics and experiences affect satisfaction, this study used regression and decision-tree analysis with the CHAID algorithm to analyze student opinion data from a sample of 1,783 college students. A data-mining approach identifies the specific aspects of students' university experience that most influence three…

  5. VR-BFDT: A variance reduction based binary fuzzy decision tree induction method for protein function prediction.

    PubMed

    Golzari, Fahimeh; Jalili, Saeed

    2015-07-21

    In the protein function prediction (PFP) problem, the goal is to predict the functions of numerous well-sequenced proteins whose functions are not yet known precisely. PFP is one of the special and complex problems in the machine learning domain, in which a protein (regarded as an instance) may have more than one function simultaneously. Furthermore, the functions (regarded as classes) are dependent and are organized in a hierarchical structure in the form of a tree or a directed acyclic graph. One of the common learning methods proposed for solving this problem is the decision tree, in which partitioning the data into sets with sharp boundaries means that small changes in the attribute values of a new instance may cause an incorrect change in its predicted label and, finally, misclassification. In this paper, a Variance Reduction based Binary Fuzzy Decision Tree (VR-BFDT) algorithm is proposed to predict the functions of proteins. This algorithm just fuzzifies the decision boundaries instead of converting the numeric attributes into fuzzy linguistic terms. It has the ability to assign multiple functions to each protein simultaneously and preserves the hierarchy consistency between functional classes. It uses label variance reduction as the splitting criterion to select the best "attribute-value" at each node of the decision tree. The experimental results show that the overall performance of the proposed algorithm is promising.
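
    The variance-reduction criterion named here is the standard regression tree split score: the chosen threshold is the one that most reduces the weighted label variance of the two child nodes. A small sketch of scoring candidate splits:

    # Sketch of the label-variance-reduction split criterion: the best
    # threshold is the one that most reduces the weighted variance of the
    # labels in the two resulting partitions.
    import numpy as np

    def variance_reduction(y, left_mask):
        """Drop in label variance from splitting y by a boolean mask."""
        yl, yr = y[left_mask], y[~left_mask]
        if len(yl) == 0 or len(yr) == 0:
            return 0.0
        weighted = (len(yl) * yl.var() + len(yr) * yr.var()) / len(y)
        return y.var() - weighted

    x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
    y = np.array([0.1, 0.2, 0.1, 0.9, 1.0, 0.8])
    best = max((variance_reduction(y, x <= t), t) for t in x[:-1])
    print("best split: x <=", best[1], "reduction:", best[0])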

  6. Using Ensemble Decisions and Active Selection to Improve Low-Cost Labeling for Multi-View Data

    NASA Technical Reports Server (NTRS)

    Rebbapragada, Umaa; Wagstaff, Kiri L.

    2011-01-01

    This paper seeks to improve low-cost labeling in terms of training set reliability (the fraction of correctly labeled training items) and test set performance for multi-view learning methods. Co-training is a popular multi-view learning method that combines high-confidence example selection with low-cost (self) labeling. However, co-training with certain base learning algorithms significantly reduces training set reliability, causing an associated drop in prediction accuracy. We propose the use of ensemble labeling to improve reliability in such cases. We also discuss and show promising results on combining low-cost ensemble labeling with active (low-confidence) example selection. We unify these example selection and labeling strategies under collaborative learning, a family of techniques for multi-view learning that we are developing for distributed, sensor-network environments.

  7. Visualization of spatial decision tree for predicting hotspot occurrence in land and forest in Rokan Hilir District Riau

    NASA Astrophysics Data System (ADS)

    Primajaya, Aji; Sukaesih Sitanggang, Imas; Syaufina, Lailan

    2017-01-01

    Visualization is an important issue in data mining, as it makes the patterns extracted from datasets easier to understand. This research applied the Bottom-Up Approach method to develop a visualization module for a spatial decision tree in a geographic information system. The spatial data used in this work consist of nine explanatory layers and one target layer. The explanatory layers include maximum daily temperature, daily precipitation, wind speed, distance to the nearest river, distance to the nearest road, land cover, peatland type, peatland depth, and income source. The target layer contains hotspot and non-hotspot points that occurred in 2008. The result is a visualization module for the spatial decision tree with three main features: a mapping window, an interactive window, and tree node and tabular visualization for predicting hotspot occurrence.

  8. Ensemble-based analysis of Front Range severe convection on 6-7 June 2012: Forecast uncertainty and communication of weather information to Front Range decision-makers

    NASA Astrophysics Data System (ADS)

    Vincente, Vanessa

    The convection-allowing ensemble also showed greater skill in forecasting heavy precipitation amounts in the vicinity of where they were observed during the most active convective period, particularly near urbanized areas. A total of 9 Front Range emergency managers (EMs) were interviewed to investigate how they understood hazardous weather information, and how their perception of forecast uncertainty would influence their decision making following a heavy rain event. Many of the EMs use situational awareness and past experiences with major weather events to guide their emergency planning. They also highly valued their relationship with the National Weather Service for improving their understanding of weather forecasts and for the opportunity to ask questions about the uncertainties. Most of the EMs perceived forecast uncertainty in terms of probability and with the understanding that forecasting the weather is an imprecise science. A greater likelihood of occurrence (implied by a higher probability of precipitation) gave them greater confidence that an event was likely to happen. Five probabilistic forecast products were generated from the convection-allowing ensemble output to create a hypothetical warm-season heavy rain event scenario. Responses varied among the EMs as to which products they found most practical or least useful. Most EMs believed that there was a high probability of flooding, as illustrated by the degree of forecasted precipitation intensity. Most confirmed perceiving uncertainty in the different forecast representations, sharing the idea that there is an inherent uncertainty that follows modeled forecasts. The long-term goal of this research is to develop and add reliable probabilistic forecast products to the "toolbox" of decision-makers to help them better assess hazardous weather information and improve warning notifications and response.

  9. Determining Cutoff Point of Ensemble Trees Based on Sample Size in Predicting Clinical Dose with DNA Microarray Data

    PubMed Central

    Karabulut, Erdem; Alpar, Celal Reha

    2016-01-01

    Background/Aim. Evaluating the success of dose prediction based on genetic or clinical data has substantially advanced recently. The aim of this study is to predict various clinical dose values from DNA gene expression datasets using data mining techniques. Materials and Methods. Eleven real gene expression datasets containing dose values were included. First, important genes for dose prediction were selected using iterative sure independence screening. Then, the performances of regression trees (RTs), support vector regression (SVR), RT bagging, SVR bagging, and RT boosting were examined. Results. The results demonstrated that a regression-based feature selection method substantially reduced the number of irrelevant genes from raw datasets. Overall, the best prediction performance in nine of 11 datasets was achieved using SVR; the second most accurate performance was provided using a gradient-boosting machine (GBM). Conclusion. Analysis of various dose values based on microarray gene expression data identified common genes found in our study and the referenced studies. According to our findings, SVR and GBM can be good predictors of dose-gene datasets. Another result of the study was to identify the sample size of n = 25 as a cutoff point for RT bagging to outperform a single RT. PMID:28096893
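
    The bagging-versus-single-tree comparison behind the reported n = 25 cutoff can be sketched directly, here on synthetic data at several training set sizes (the data and sizes are illustrative, not the study's microarray sets).

    # Sketch of the comparison behind the reported cutoff: bagged regression
    # trees versus a single tree at several training set sizes (synthetic data).
    from sklearn.datasets import make_regression
    from sklearn.ensemble import BaggingRegressor
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeRegressor

    for n in (15, 25, 50, 100):
        X, y = make_regression(n_samples=n, n_features=30, noise=10,
                               random_state=0)
        rt = DecisionTreeRegressor(random_state=0)
        bag = BaggingRegressor(DecisionTreeRegressor(random_state=0),
                               n_estimators=50, random_state=0)
        print(n,
              "single R2:", cross_val_score(rt, X, y, cv=5).mean().round(2),
              "bagged R2:", cross_val_score(bag, X, y, cv=5).mean().round(2))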

  10. Applying of Decision Tree Analysis to Risk Factors Associated with Pressure Ulcers in Long-Term Care Facilities

    PubMed Central

    Moon, Mikyung

    2017-01-01

    Objectives The purpose of this study was to use decision tree analysis to explore the factors associated with pressure ulcers (PUs) among elderly people admitted to Korean long-term care facilities. Methods The data were extracted from the 2014 National Inpatient Sample (NIS)—data of Health Insurance Review and Assessment Service (HIRA). A MapReduce-based program was implemented to join and filter 5 tables of the NIS. The outcome predicted by the decision tree model was the prevalence of PUs as defined by the Korean Standard Classification of Disease-7 (KCD-7; code L89*). Using R 3.3.1, a decision tree was generated with the finalized 15,856 cases and 830 variables. Results The decision tree displayed 15 subgroups with 8 variables showing 0.804 accuracy, 0.820 sensitivity, and 0.787 specificity. The most significant primary predictor of PUs was length of stay less than 0.5 day. Other predictors were the presence of an infectious wound dressing, followed by having diagnoses numbering less than 3.5 and the presence of a simple dressing. Among diagnoses, “injuries to the hip and thigh” was the top predictor ranking 5th overall. Total hospital cost exceeding 2,200,000 Korean won (US $2,000) rounded out the top 7. Conclusions These results support previous studies that showed length of stay, comorbidity, and total hospital cost were associated with PUs. Moreover, wound dressings were commonly used to treat PUs. They also show that machine learning, such as a decision tree, could effectively predict PUs using big data. PMID:28261530

  11. Application Of Decision Tree Approach To Student Selection Model- A Case Study

    NASA Astrophysics Data System (ADS)

    Harwati; Sudiya, Amby

    2016-01-01

    The main purpose of the institution is to provide quality education to students and to improve the quality of managerial decisions. One way to improve the quality of students is to make the selection of new students more selective. This research takes as its case the selection of new students at the Islamic University of Indonesia, Yogyakarta, Indonesia. One of the university's admission routes is an administrative selection based on the records of prospective students at high school, without written testing. Currently, this kind of selection has no standard model or criteria. Selection is done only by comparing candidates' application files, so subjective assessment is very likely to occur because of the lack of standard criteria to differentiate the quality of one student from another. By applying data mining classification techniques, a selection model for new students can be built that includes criteria with certain standards, such as area of origin, school status, average grade, and so on. These criteria are determined using rules that emerge from classifying the academic achievement (GPA) of students in previous years who entered the university through the same route. The decision tree method with the C4.5 algorithm is used here. The results show that students given priority for admission are those who meet the following criteria: they come from the island of Java, attended a public school, majored in science, have an average grade above 75, and earned at least one achievement during their high school studies.

  12. A mutual information-Dempster-Shafer based decision ensemble system for land cover classification of hyperspectral data

    NASA Astrophysics Data System (ADS)

    Pahlavani, Parham; Bigdeli, Behnaz

    2016-12-01

    Hyperspectral images contain extremely rich spectral information that offers great potential to discriminate between various land cover classes. However, these images are usually composed of tens or hundreds of spectrally close bands, which results in high redundancy and a great amount of computation time in hyperspectral classification. Furthermore, in the presence of mixed-coverage pixels, crisp classifiers produce omission and commission errors. This paper presents a mutual information-Dempster-Shafer system, through an ensemble classification approach, for the classification of hyperspectral data. First, mutual information is applied to split the data into a few independent partitions to overcome high dimensionality. Then, a fuzzy maximum likelihood classifier classifies each band subset. Finally, Dempster-Shafer fusion is applied to fuse the results of the fuzzy classifiers. In order to assess the proposed method, a crisp ensemble system, based on a support vector machine as the crisp classifier and weighted majority voting as the crisp fusion method, is applied to the hyperspectral data. Furthermore, a dimension reduction system is utilized to assess the effectiveness of the mutual information band splitting of the proposed method. The proposed methodology provides interesting conclusions on the effectiveness and potential of mutual information-Dempster-Shafer based classification of hyperspectral data.
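
    Dempster's rule of combination, which fuses the fuzzy classifiers' outputs here, has a compact closed form. A sketch for two mass functions restricted to singleton hypotheses (the class names and masses are illustrative):

    # Sketch of Dempster's rule of combination for two mass functions over the
    # same frame of discernment (singleton hypotheses only, for brevity).
    def dempster_combine(m1, m2):
        classes = set(m1) | set(m2)
        conflict = sum(m1.get(a, 0) * m2.get(b, 0)
                       for a in m1 for b in m2 if a != b)
        k = 1.0 - conflict                      # normalisation constant
        return {c: m1.get(c, 0) * m2.get(c, 0) / k for c in classes}

    # Two classifiers' belief masses over three land cover classes:
    m1 = {"water": 0.6, "soil": 0.3, "vegetation": 0.1}
    m2 = {"water": 0.5, "soil": 0.4, "vegetation": 0.1}
    print(dempster_combine(m1, m2))   # water ~0.70, soil ~0.28, veg ~0.02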

  13. Smart on-board diagnostic decision trees for quantitative aviation equipment and safety procedures validation

    NASA Astrophysics Data System (ADS)

    Ali, Ali H.; Markarian, Garik; Tarter, Alex; Kölle, Rainer

    2010-04-01

    The current trend in high-accuracy aircraft navigation systems is towards using data from one or more inertial navigation subsystems and one or more navigational reference subsystems. The enhancement in fault diagnosis and detection is achieved by computing the minimum mean square estimate of the aircraft states using, for instance, the Kalman filter method. However, this enhancement might degrade if the cause of a subsystem fault has some effect on other subsystems that are calculating the same measurement. One instance of such a case is the tragic crash of Air France Flight 447 in June 2009, where message transmissions in the last moments before the crash indicated inconsistencies in measured airspeed, as reported by Airbus. In this research, we propose the use of a mathematical aircraft model to work out the current states of the airplane and, in turn, the use of these states to validate the readings of the navigation equipment through a smart diagnostic decision tree network. Various simulated equipment failures were introduced in a controlled environment to prove the concept of operation. The results showed successful detection of the failing equipment in all cases.

  14. A Low Complexity System Based on Multiple Weighted Decision Trees for Indoor Localization.

    PubMed

    Sánchez-Rodríguez, David; Hernández-Morera, Pablo; Quinteiro, José Ma; Alonso-González, Itziar

    2015-06-23

    Indoor position estimation has become an attractive research topic due to growing interest in location-aware services. Nevertheless, satisfying solutions that consider both accuracy and system complexity have not been found. From the perspective of lightweight mobile devices, these are extremely important characteristics, because both processor power and energy availability are limited. Hence, an indoor localization system with high computational complexity can cause complete battery drain within a few hours. In our research, we use a data mining technique named boosting to develop a localization system based on multiple weighted decision trees to predict the device location, since it has high accuracy and low computational complexity. The localization system is built using a dataset from sensor fusion, which combines the strength of radio signals from different wireless local area network access points with device orientation information from a digital compass built into the mobile device, so that extra sensors are unnecessary. Experimental results indicate that the proposed system substantially improves on the computational complexity of the widely used traditional fingerprinting methods, and that it achieves better accuracy than they do.

  15. Effect of training characteristics on object classification: An application using Boosted Decision Trees

    NASA Astrophysics Data System (ADS)

    Sevilla-Noarbe, I.; Etayo-Sotos, P.

    2015-06-01

    We present an application of a particular machine-learning method (Boosted Decision Trees, BDTs using AdaBoost) to separate stars and galaxies in photometric images using their catalog characteristics. BDTs are a well-established machine learning technique used for classification purposes. They have been widely used, especially in the field of particle and astroparticle physics, and we use them here in an optical astronomy application. This algorithm is able to improve on simple thresholding cuts on standard separation variables, which may be affected by local effects such as blending or badly calculated background levels, or which do not include information from other bands. The improvements are shown using the Sloan Digital Sky Survey Data Release 9, with respect to the type photometric classifier. We obtain an improvement in the impurity of the galaxy sample by a factor of 2-4 for this particular dataset, adjusting for the same selection efficiency. Another main goal of this study is to verify the effects that different input vectors and training sets have on the classification performance, the results being of wider use to other machine learning techniques.
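
    The AdaBoost-on-trees setup described here corresponds to a standard boosted-tree classifier. A sketch with synthetic catalog features (the features are placeholders for magnitudes and shape variables, not SDSS data):

    # Sketch of a boosted decision tree (AdaBoost) star/galaxy classifier.
    # The catalog features are synthetic placeholders for magnitudes and
    # shape variables.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=5000, n_features=10, n_informative=6,
                               random_state=0)   # star = 0 / galaxy = 1
    bdt = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                             n_estimators=200, random_state=0)
    print("CV accuracy:", cross_val_score(bdt, X, y, cv=5).mean())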

  16. A Low Complexity System Based on Multiple Weighted Decision Trees for Indoor Localization

    PubMed Central

    Sánchez-Rodríguez, David; Hernández-Morera, Pablo; Quinteiro, José Ma.; Alonso-González, Itziar

    2015-01-01

    Indoor position estimation has become an attractive research topic due to growing interest in location-aware services. Nevertheless, satisfying solutions that consider both accuracy and system complexity have not been found. From the perspective of lightweight mobile devices, these are extremely important characteristics, because both processor power and energy availability are limited. Hence, an indoor localization system with high computational complexity can cause complete battery drain within a few hours. In our research, we use a data mining technique named boosting to develop a localization system based on multiple weighted decision trees to predict the device location, since it has high accuracy and low computational complexity. The localization system is built using a dataset from sensor fusion, which combines the strength of radio signals from different wireless local area network access points with device orientation information from a digital compass built into the mobile device, so that extra sensors are unnecessary. Experimental results indicate that the proposed system substantially improves on the computational complexity of the widely used traditional fingerprinting methods, and that it achieves better accuracy than they do. PMID:26110413

  17. Trees

    NASA Astrophysics Data System (ADS)

    Epstein, Henri

    2016-11-01

    An algebraic formalism, developed with V. Glaser and R. Stora for the study of the generalized retarded functions of quantum field theory, is used to prove a factorization theorem which provides a complete description of the generalized retarded functions associated with any tree graph. Integrating over the variables associated to internal vertices to obtain the perturbative generalized retarded functions for interacting fields arising from such graphs is shown to be possible for a large category of space-times.

  18. Large unbalanced credit scoring using Lasso-logistic regression ensemble.

    PubMed

    Wang, Hong; Xu, Qingsong; Zhou, Lifeng

    2015-01-01

    Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.
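
    The base learner described, L1-regularised ("Lasso") logistic regression bagged over balanced samples, can be approximated as follows; simple class weighting stands in for the paper's clustering-based balancing step, and the data are synthetic.

    # Sketch of a bagged Lasso-logistic ensemble for unbalanced credit data.
    # Class weighting stands in for the paper's clustering-based balancing.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
    Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

    lasso_lr = LogisticRegression(penalty="l1", solver="liblinear",
                                  class_weight="balanced", C=0.5)
    ensemble = BaggingClassifier(lasso_lr, n_estimators=25, random_state=0)
    ensemble.fit(Xtr, ytr)
    print("AUC:", roc_auc_score(yte, ensemble.predict_proba(Xte)[:, 1]))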

  19. Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble

    PubMed Central

    Wang, Hong; Xu, Qingsong; Zhou, Lifeng

    2015-01-01

    Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988

  20. A hybrid approach of stepwise regression, logistic regression, support vector machine, and decision tree for forecasting fraudulent financial statements.

    PubMed

    Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As fraudulent financial statements of enterprises become an increasingly serious problem with each passing day, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models for comparison. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that issued fraudulent or nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the C5.0 decision tree has the best classification accuracy, at 85.71%.

  1. A Framework for Learning from Distributed Data Using Sufficient Statistics and its Application to Learning Decision Trees

    PubMed Central

    Caragea, Doina; Silvescu, Adrian; Honavar, Vasant

    2009-01-01

    This paper motivates and precisely formulates the problem of learning from distributed data; describes a general strategy for transforming traditional machine learning algorithms into algorithms for learning from distributed data; demonstrates the application of this strategy to devise algorithms for decision tree induction from distributed data; and identifies the conditions under which the algorithms in the distributed setting are superior to their centralized counterparts in terms of time and communication complexity. The resulting algorithms are provably exact in that the decision tree constructed from distributed data is identical to that obtained in the centralized setting. Some natural extensions leading to algorithms for learning from heterogeneous distributed data and learning under privacy constraints are outlined. PMID:20351798

  2. A Hybrid Approach of Stepwise Regression, Logistic Regression, Support Vector Machine, and Decision Tree for Forecasting Fraudulent Financial Statements

    PubMed Central

    Goo, Yeong-Jia James; Shen, Zone-De

    2014-01-01

    As fraudulent financial statements of enterprises become an increasingly serious problem with each passing day, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models for comparison. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that issued fraudulent or nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the C5.0 decision tree has the best classification accuracy, at 85.71%. PMID:25302338

  3. Identification of Potential Sources of Mercury (Hg) in Farmland Soil Using a Decision Tree Method in China

    PubMed Central

    Zhong, Taiyang; Chen, Dongmei; Zhang, Xiuying

    2016-01-01

    Identification of the sources of soil mercury (Hg) on the provincial scale is helpful for enacting effective policies to prevent further contamination and to take reclamation measures. The natural and anthropogenic sources of Hg in Chinese farmland soil, and their contributions, were identified based on a decision tree method. The results showed that the concentrations of Hg in parent materials were most strongly associated with the general spatial distribution pattern of Hg concentrations on a provincial scale. The decision tree analysis achieved a total accuracy of 89.70% in simulating the influence of human activities on the additions of Hg in farmland soil. Human activities—for example, the production of coke, application of fertilizers, discharge of wastewater, discharge of solid waste, and the production of non-ferrous metals—were the main external sources of a large amount of Hg in the farmland soil. PMID:27834884

  4. Ensembl comparative genomics resources.

    PubMed

    Herrero, Javier; Muffato, Matthieu; Beal, Kathryn; Fitzgerald, Stephen; Gordon, Leo; Pignatelli, Miguel; Vilella, Albert J; Searle, Stephen M J; Amode, Ridwan; Brent, Simon; Spooner, William; Kulesha, Eugene; Yates, Andrew; Flicek, Paul

    2016-01-01

    Evolution provides the unifying framework with which to understand biology. The coherent investigation of genic and genomic data often requires comparative genomics analyses based on whole-genome alignments, sets of homologous genes and other relevant datasets in order to evaluate and answer evolutionary-related questions. However, the complexity and computational requirements of producing such data are substantial: this has led to only a small number of reference resources that are used for most comparative analyses. The Ensembl comparative genomics resources are one such reference set that facilitates comprehensive and reproducible analysis of chordate genome data. Ensembl computes pairwise and multiple whole-genome alignments from which large-scale synteny, per-base conservation scores and constrained elements are obtained. Gene alignments are used to define Ensembl Protein Families, GeneTrees and homologies for both protein-coding and non-coding RNA genes. These resources are updated frequently and have a consistent informatics infrastructure and data presentation across all supported species. Specialized web-based visualizations are also available including synteny displays, collapsible gene tree plots, a gene family locator and different alignment views. The Ensembl comparative genomics infrastructure is extensively reused for the analysis of non-vertebrate species by other projects including Ensembl Genomes and Gramene and much of the information here is relevant to these projects. The consistency of the annotation across species and the focus on vertebrates makes Ensembl an ideal system to perform and support vertebrate comparative genomic analyses. We use robust software and pipelines to produce reference comparative data and make it freely available. Database URL: http://www.ensembl.org.

  5. ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography

    NASA Astrophysics Data System (ADS)

    Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

    2016-07-01

    Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method, according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation, is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on 100 PET scans with known true contours. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio, and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contours to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.
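
    The Dice similarity coefficient that ATLAAS predicts is DSC = 2|A ∩ B| / (|A| + |B|), computed over the two segmentation masks. A short sketch (the toy masks below are illustrative):

    # The Dice similarity coefficient between two binary segmentation masks:
    # DSC = 2|A ∩ B| / (|A| + |B|).
    import numpy as np

    def dice(a, b):
        a, b = a.astype(bool), b.astype(bool)
        return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

    auto = np.array([[0, 1, 1], [0, 1, 0]])   # automatic contour (toy mask)
    true = np.array([[0, 1, 1], [1, 1, 0]])   # known true contour
    print(dice(auto, true))                   # 2*3 / (3+4) ~= 0.857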

  6. Chi-squared Automatic Interaction Detection Decision Tree Analysis of Risk Factors for Infant Anemia in Beijing, China

    PubMed Central

    Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin

    2016-01-01

    Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of the population in recent years, available information on infant anemia is inconclusive in large cities of China, especially regarding comparisons between native residents and the floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6–12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. Results: The prevalence of anemia was 12.60%, with a range of 3.47%–40.00% across different subgroup characteristics. The CHAID decision tree model demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by the logistic regression model, including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified a fourth risk factor, the maternal educational level, with higher overall classification accuracy and a larger area under the receiver operating characteristic curve. Conclusions: The infant anemic status in a metropolis is complex and should be carefully considered by basic health care practitioners. CHAID decision tree analysis demonstrated a better performance in the hierarchical analysis of a population with great heterogeneity. Risk factors identified by this study might be meaningful for the early detection and prompt treatment of infant anemia in large cities. PMID:27174328

  7. A comparison of artificial neural net and inductive decision tree learning applied to the diagnosis of coronary artery disease

    SciTech Connect

    Silver, D.L.; Hurwitz, G.A.; Cradduck, T.D.

    1994-05-01

    A variety of artificial intelligence systems are available for applications within nuclear medicine. It is important to understand the strengths and weaknesses of these systems and the class of problems for which each is best. Two supervised machine learning systems, a back propagation neural network and an inductive decision tree, were applied to the classification of coronary artery disease given a set of diagnostic input parameters. A comparison indicates that both paradigms perform well, depending upon the requirements of the user. We examined the setup complexity, learning and classification speed, training accuracy, ability to generalize to previously unseen cases, and the explanatory power of the internal representations generated by the learning systems. A database of 503 patient records composed of ten parameters was used for the analysis. The target response was a binary value of disease or no disease. The results indicate that the inductive decision tree learning system is the better choice for this class of problem. It is easier to set up, and training takes less time. It has good explanatory power, since it produces a printed decision tree of the internal representation of acquired knowledge. On the other hand, the artificial neural net provides better generalization for new test cases and greater classification accuracy.

  8. Prediction of Severe Acute Pancreatitis Using a Decision Tree Model Based on the Revised Atlanta Classification of Acute Pancreatitis

    PubMed Central

    Zhang, Yushun; Yang, Chong; Gou, Shanmiao; Li, Yongfeng; Xiong, Jiongxin; Wu, Heshui; Wang, Chunyou

    2015-01-01

    Objective To develop a model for the early prediction of severe acute pancreatitis based on the revised Atlanta classification of acute pancreatitis. Methods Clinical data of 1308 patients with acute pancreatitis (AP) were included in the retrospective study. A total of 603 patients who were admitted to the hospital within 36 hours of the onset of the disease were ultimately included, according to the inclusion criteria. The clinical data were collected within 12 hours after admission. All the patients were classified as having mild acute pancreatitis (MAP), moderately severe acute pancreatitis (MSAP), or severe acute pancreatitis (SAP) based on the revised Atlanta classification of acute pancreatitis. All 603 patients were randomly divided into a training group (402 cases) and a test group (201 cases). Univariate and multiple regression analyses were used to identify the independent risk factors for the development of SAP in the training group. Then the prediction model was constructed using the decision tree method, and this model was applied to the test group to evaluate its validity. Results The decision tree model was developed using creatinine, lactate dehydrogenase, and oxygenation index to predict SAP. The diagnostic sensitivity and specificity of SAP in the training group were 80.9% and 90.0%, respectively, and the sensitivity and specificity in the test group were 88.6% and 90.4%, respectively. Conclusions The decision tree model based on creatinine, lactate dehydrogenase, and oxygenation index is more likely to predict the occurrence of SAP. PMID:26580397

  9. The Bump Hunting by the Decision Tree with the Genetic Algorithm

    NASA Astrophysics Data System (ADS)

    Hirose, Hideo

    In difficult problems of classifying z-dimensional points into two groups giving 0-1 responses, where the data structure is messy, it is more favorable to search for the regions that are denser in response-1 points than to find boundaries that separate the two groups. For such problems, which can often be seen in customer databases, we have developed a bump hunting method using probabilistic and statistical methods, as shown in the previous study. By specifying a pureness rate in advance, a maximum capture rate can be obtained. In finding the maximum capture rate, we have used the decision tree method combined with the genetic algorithm. Then, a trade-off curve between the pureness rate and the capture rate can be constructed. However, such a trade-off curve could be optimistic if the training data set alone is used. Therefore, we should be careful in assessing the accuracy of the trade-off curve. Using accuracy evaluation procedures such as cross validation or the bootstrapped hold-out method combined with the training and test data sets, we have shown that an actually applicable trade-off curve can be obtained. We have also shown that an attainable upper-bound trade-off curve can be estimated by using extreme-value statistics, because the genetic algorithm provides many local maxima of the capture rates with different initial values. We have constructed three kinds of trade-off curves: the first is the curve obtained by using the training data; the second is the return capture rate curve obtained by using extreme-value statistics; the last is the curve obtained by using the test data. These three are indispensable, like a trinity, for comprehending the whole picture of the trade-off curve between the pureness rate and the capture rate. This paper deals with the behavior of the trade-off curve from a statistical viewpoint.

  10. Effective Visualization of Temporal Ensembles.

    PubMed

    Hao, Lihua; Healey, Christopher G; Bass, Steffen A

    2016-01-01

    An ensemble is a collection of related datasets, called members, built from a series of runs of a simulation or an experiment. Ensembles are large, temporal, multidimensional, and multivariate, making them difficult to analyze. Another important challenge is visualizing ensembles that vary both in space and time. Initial visualization techniques displayed ensembles with a small number of members, or presented an overview of an entire ensemble but without potentially important details. Recently, researchers have suggested combining these two directions, allowing users to choose subsets of members to visualize. This manual selection process places the burden on the user to identify which members to explore. We first introduce a static ensemble visualization system that automatically helps users locate interesting subsets of members to visualize. We next extend the system to support analysis and visualization of temporal ensembles. We employ 3D shape comparison, cluster tree visualization, and glyph-based visualization to represent different levels of detail within an ensemble. This strategy is used to provide two approaches for temporal ensemble analysis: (1) segment-based ensemble analysis, to capture important shape-transition time-steps, cluster groups of similar members, and identify common shape changes over time across multiple members; and (2) time-step based ensemble analysis, which assumes ensemble members are aligned in time, combining similar shapes at common time-steps. Both approaches enable users to interactively visualize and analyze a temporal ensemble from different perspectives at different levels of detail. We demonstrate our techniques on an ensemble studying the transition of matter from hadronic gas to quark-gluon plasma during gold-on-gold particle collisions.

  11. Development and Validation of a Primary Care-Based Family Health History and Decision Support Program (MeTree)

    PubMed Central

    Orlando, Lori A.; Buchanan, Adam H.; Hahn, Susan E.; Christianson, Carol A.; Powell, Karen P.; Skinner, Celette Sugg; Chesnut, Blair; Blach, Colette; Due, Barbara; Ginsburg, Geoffrey S.; Henrich, Vincent C.

    2016-01-01

    INTRODUCTION Family health history is a strong predictor of disease risk. To reduce the morbidity and mortality of many chronic diseases, risk-stratified evidence-based guidelines strongly encourage the collection and synthesis of family health history to guide selection of primary prevention strategies. However, the collection and synthesis of such information is not well integrated into clinical practice. To address barriers to collection and use of family health histories, the Genomedical Connection developed and validated MeTree, a Web-based, patient-facing family health history collection and clinical decision support tool. MeTree is designed for integration into primary care practices as part of the genomic medicine model for primary care. METHODS We describe the guiding principles, operational characteristics, algorithm development, and coding used to develop MeTree. Validation was performed through stakeholder cognitive interviewing, a genetic counseling pilot program, and clinical practice pilot programs in 2 community-based primary care clinics. RESULTS Stakeholder feedback resulted in changes to MeTree’s interface and changes to the phrasing of clinical decision support documents. The pilot studies resulted in the identification and correction of coding errors and the reformatting of clinical decision support documents. MeTree’s strengths in comparison with other tools are its seamless integration into clinical practice and its provision of action-oriented recommendations guided by providers’ needs. LIMITATIONS The tool was validated in a small cohort. CONCLUSION MeTree can be integrated into primary care practices to help providers collect and synthesize family health history information from patients with the goal of improving adherence to risk-stratified evidence-based guidelines. PMID:24044145

  12. Grassland gross carbon dioxide uptake based on an improved model tree ensemble approach considering human interventions: global estimation and covariation with climate.

    PubMed

    Liang, Wei; Lü, Yihe; Zhang, Weibin; Li, Shuai; Jin, Zhao; Ciais, Philippe; Fu, Bojie; Wang, Shuai; Yan, Jianwu; Li, Junyi; Su, Huimin

    2016-12-14

    Grassland ecosystems play a crucial role in the global carbon cycle and provide vital ecosystem services for many species. However, these low-productivity and water-limited ecosystems are sensitive and vulnerable to climate perturbations and human intervention, the latter of which is often not considered due to a lack of spatial information on grassland management. Here, by applying a model tree ensemble (MTE-GRASS) trained on local eddy covariance data and using gridded climate and management intensity fields (grazing and cutting) as predictors, we provide a first estimate of global grassland gross primary production (GPP). GPP from our study compares well (modeling efficiency NSE = 0.85 spatial; NSE between 0.69 and 0.94 interannual) with that from flux measurements. Global grassland GPP was on average 11 ± 0.31 Pg C yr(-1) and exhibited a significantly increasing trend at both annual and seasonal scales, with an annual increase of 0.023 Pg C (0.2%) from 1982 to 2011. Meanwhile, we found that at both annual and seasonal scales, the trend (except for northern summer) and the interannual variability of GPP are primarily driven by arid/semiarid ecosystems, the latter owing to the larger variation in precipitation. Grasslands in arid/semiarid regions have a stronger (33 g C m(-2) yr(-1) per 100 mm) and faster (0- to 1-month time lag) response to precipitation than those in other regions. Although spatial gradients (71%) and interannual changes (51%) in GPP were globally driven mainly by precipitation, mostly in arid/semiarid climate zones, temperature and radiation together accounted for half of the GPP variability in high-latitude or cold regions. Our findings and the results of other studies suggest the overwhelming importance of arid/semiarid regions as a control on the grassland ecosystem carbon cycle. Similarly, under the projected future climate change, grassland ecosystems in these regions

  13. The creation of a digital soil map for Cyprus using decision-tree classification techniques

    NASA Astrophysics Data System (ADS)

    Camera, Corrado; Zomeni, Zomenia; Bruggeman, Adriana; Noller, Joy; Zissimos, Andreas

    2014-05-01

    Considering the increasing threats soils are experiencing, especially in semi-arid Mediterranean environments like Cyprus (erosion, contamination, sealing and salinisation), producing a high-resolution, reliable soil map is essential for further soil conservation studies. This study aims to create a 1:50,000 soil map covering the area under the direct control of the Republic of Cyprus (5,760 km2). The study consists of two major steps. The first is the creation of a raster database of predictive variables selected according to the scorpan formula (McBratney et al., 2003). Of particular interest is the possibility of using, as soil properties, data coming from three older island-wide soil maps and the recently published geochemical atlas of Cyprus (Cohen et al., 2011). Ten highly characterizing elements were selected and used as predictors in the present study. For the other factors, the usual variables were used: temperature and aridity index for climate; total loss on ignition and vegetation and forestry type maps for organic matter; the DEM and related relief derivatives (slope, aspect, curvature, landscape units); bedrock, surficial geology and geomorphology (Noller, 2009) for parent material and age; and a sub-watershed map to better bound locations related to parent material sources. In the second step, the digital soil map is created using the Random Forests package in R. Random Forests is a decision tree classification technique in which many trees, instead of a single one, are developed and compared to increase the stability and reliability of the prediction. The model is trained and verified on areas where a published 1:25,000 soil map obtained from field work is available, and it is then applied for predictive mapping to the other areas. Preliminary results obtained in a small area of the plain around the city of Lefkosia, where eight different soil classes are present, show very good capabilities of the method. The Random Forest approach leads to reproduce soil
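
    The study uses R's Random Forests package; an equivalent sketch in scikit-learn shows the same train-on-mapped-areas, predict-elsewhere workflow, with synthetic rasters standing in for the scorpan covariates listed above.

    # Sketch of the workflow: train a random forest on cells covered by a
    # published soil map, then predict the soil class on unmapped cells.
    # Predictor rasters are synthetic placeholders for the scorpan covariates.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.RandomState(0)
    mapped_X = rng.rand(2000, 12)                 # covariates where soil is known
    mapped_y = rng.randint(0, 8, 2000)            # eight soil classes
    unmapped_X = rng.rand(500, 12)                # covariates elsewhere

    rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
    rf.fit(mapped_X, mapped_y)
    print("OOB accuracy:", rf.oob_score_)         # internal verification
    predicted_classes = rf.predict(unmapped_X)    # predictive soil mapping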

  14. Ensemble Models

    EPA Science Inventory

    Ensemble forecasting has been used for operational numerical weather prediction in the United States and Europe since the early 1990s. An ensemble of weather or climate forecasts is used to characterize the two main sources of uncertainty in computer models of physical systems: ...

  15. Using decision trees to predict benthic communities within and near the German Exclusive Economic Zone (EEZ) of the North Sea.

    PubMed

    Pesch, Roland; Pehlke, Hendrik; Jerosch, Kerstin; Schröder, Winfried; Schlüter, Michael

    2008-01-01

    In this article a concept is described for predicting and mapping the occurrence of benthic communities within and near the German Exclusive Economic Zone (EEZ) of the North Sea. The approach consists of two work steps: (1) geostatistical analysis of abiotic measurement data and (2) calculation of benthic provinces by means of Classification and Regression Trees (CART) and GIS techniques. Raster maps were calculated by geostatistical methods from bottom-water measurements of salinity, temperature, silicate and nutrients, as well as from point data on grain size ranges (0-20, 20-63, 63-2,000 μm). First, the autocorrelation structure was examined and modelled with the help of variogram analysis. The resulting variogram models were then used to calculate raster maps by applying ordinary kriging procedures. After intersecting these raster maps with point data on eight benthic communities, a decision tree was derived to predict the occurrence of these communities within the study area. Since such a CART tree corresponds to a hierarchically ordered set of decision rules, it was applied to the geostatistically estimated raster data to predict benthic habitats within and near the EEZ.
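
    The second work step, training a CART-style tree on station data and applying it to the kriged raster stack, can be sketched as follows. This is an illustrative sketch, not the authors' code: the predictor names come from the abstract, but all values are simulated and the kriged rasters are stood in for by random arrays.

    ```python
    # CART on station samples, then cell-by-cell prediction over the grid.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(1)

    feature_names = ["salinity", "temperature", "silicate", "nutrients",
                     "grain_0_20", "grain_20_63", "grain_63_2000"]

    # Predictor values at stations where benthic communities were sampled.
    X_stations = rng.random((300, len(feature_names)))
    y_communities = rng.integers(0, 8, size=300)   # eight benthic communities

    cart = DecisionTreeClassifier(max_depth=5, random_state=0)
    cart.fit(X_stations, y_communities)

    # A CART tree is a hierarchy of decision rules, so it can be printed ...
    print(export_text(cart, feature_names=feature_names))

    # ... and applied to the kriged raster stack to map benthic provinces.
    raster_stack = rng.random((50 * 50, len(feature_names)))   # flattened grid
    province_map = cart.predict(raster_stack).reshape(50, 50)
    ```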

  16. Analysis of the impact of recreational trail usage for prioritising management decisions: a regression tree approach

    NASA Astrophysics Data System (ADS)

    Tomczyk, Aleksandra; Ewertowski, Marek; White, Piran; Kasprzak, Leszek

    2016-04-01

    The dual role of many Protected Natural Areas in providing benefits for both conservation and recreation poses challenges for management. Although recreation-based damage to ecosystems can occur very quickly, restoration can take many years. Protecting conservation interests at the same time as providing for recreation requires decisions about how to prioritise and direct management actions. Trails are commonly used to divert visitors from the most important areas of a site, but high visitor pressure can lead to increases in trail width and a concomitant increase in soil erosion. Here we use detailed field data on the condition of recreational trails in Gorce National Park, Poland, as the basis for a regression tree analysis to determine the factors influencing trail deterioration, and link specific trail impacts with environmental, use-related and managerial factors. We distinguished 12 types of trails, characterised by four levels of degradation: (1) trails with an acceptable level of degradation; (2) threatened trails; (3) damaged trails; and (4) heavily damaged trails. Damaged trails were the most vulnerable of all trails and should be prioritised for appropriate conservation and restoration. We also proposed five types of monitoring of recreational trail conditions: (1) rapid inventory of negative impacts; (2) monitoring of visitor numbers and variation in type of use; (3) change-oriented monitoring focusing on sections of trail subjected to changes in type or level of use or to extreme weather events; (4) monitoring of the dynamics of trail conditions; and (5) full assessment of trail conditions, to be carried out every 10-15 years. The application of the proposed framework can enhance the ability of Park managers to prioritise their trail management activities, enhancing trail conditions and visitor safety, while minimising adverse impacts on the conservation value of the ecosystem. A.M.T. was supported by the Polish Ministry of

  17. Cost-Effectiveness of a new Rotavirus Vaccination Program in Pakistan: a Decision Tree Model

    PubMed Central

    Patel, Hiten D.; Roberts, Eric T.; Constenla, Dagna O.

    2013-01-01

    Background Rotavirus gastroenteritis places a significant health and economic burden on Pakistan. To determine the public health impact of a national rotavirus vaccination program, we performed a cost-effectiveness study from the perspective of the health care system. Methods A decision tree model was developed to assess the cost-effectiveness of a national vaccination program in Pakistan. Disease and cost burden with the program were compared to the current state. Disease parameters, vaccine-related costs, and medical treatment costs were based on published epidemiological and economic data, which were specific to Pakistan when possible. An annual birth cohort of children was followed for 5 years to model the public health impact of vaccination on health-related events and costs. The cost-effectiveness was assessed and quantified in cost (2012 US$) per disability-adjusted life-year (DALY) averted and cost per death averted. Sensitivity analyses were performed to assess the robustness of the incremental cost-effectiveness ratios (ICERs). Results The base case results showed vaccination prevented 1.2 million cases of rotavirus gastroenteritis, 93,000 outpatient visits, 43,000 hospitalizations, and 6,700 deaths by 5 years of age for an annual birth cohort scaled from 6% current coverage to DPT3 levels (85%). The medical cost savings would be US$1.4 million from hospitalizations and US$200,000 from outpatient visit costs. The vaccination program would cost US$35 million at a vaccine price of US$5.00. The ICER was US$149.50 per DALY averted or US$4,972 per death averted. Sensitivity analyses showed changes in case-fatality ratio, vaccine efficacy, and vaccine cost exerted the greatest influence on the ICER. Conclusions Across a range of sensitivity analyses, a national rotavirus vaccination program was predicted to decrease health and economic burden due to rotavirus gastroenteritis in Pakistan by ~40%. Vaccination was highly cost-effective in this context. As
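
    The reported figures can be cross-checked against the standard ICER definition, ICER = incremental cost / effect averted. The short sketch below back-calculates the implied DALYs and deaths averted from the abstract's own numbers; the rounding is ours.

    ```python
    # Back-of-the-envelope ICER check using the figures quoted above.
    program_cost   = 35_000_000               # US$, vaccination program
    medical_saving = 1_400_000 + 200_000      # US$, hospitalizations + visits

    incremental_cost = program_cost - medical_saving   # US$33.4 million

    icer_per_daly = 149.50                    # US$ per DALY averted, as reported
    dalys_averted = incremental_cost / icer_per_daly   # implied DALY total

    icer_per_death = 4_972                    # US$ per death averted, as reported
    deaths_averted = incremental_cost / icer_per_death # ~6,700, matching the abstract

    print(f"Implied DALYs averted:  {dalys_averted:,.0f}")
    print(f"Implied deaths averted: {deaths_averted:,.0f}")
    ```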

  18. Decision tree-based method for integrating gene expression, demographic, and clinical data to determine disease endotypes

    PubMed Central

    2013-01-01

    Background Complex diseases are often difficult to diagnose, treat and study due to the multi-factorial nature of the underlying etiology. Large data sets are now widely available that can be used to define novel, mechanistically distinct disease subtypes (endotypes) in a completely data-driven manner. However, significant challenges exist with regard to how to segregate individuals into suitable subtypes of the disease and understand the distinct biological mechanisms of each when the goal is to maximize the discovery potential of these data sets. Results A multi-step decision tree-based method is described for defining endotypes based on gene expression, clinical covariates, and disease indicators using childhood asthma as a case study. We also tried alternative approaches, such as the Student's t-test, single-data-domain clustering, and the Modk-prototypes algorithm, which incorporates multiple data domains into a single analysis; none performed as well as the novel multi-step decision tree method. This new method gave the best segregation of asthmatics and non-asthmatics, and it provides easy access to all genes and clinical covariates that distinguish the groups. Conclusions The multi-step decision tree method described here will lead to better understanding of complex disease in general by allowing purely data-driven disease endotypes to facilitate the discovery of new mechanisms underlying these diseases. This application should be considered a complement to ongoing efforts to better define and diagnose known endotypes. When coupled with existing methods developed to determine the genetics of gene expression, these methods provide a mechanism for linking genetics and exposomics data and thereby accounting for both major determinants of disease. PMID:24188919

  19. Multiclass Cancer Classification by Using Fuzzy Support Vector Machine and Binary Decision Tree With Gene Selection

    PubMed Central

    2005-01-01

    We investigate the problem of multiclass cancer classification with gene selection from gene expression data. Two multiclass classifiers with gene selection are proposed: a fuzzy support vector machine (FSVM) and a binary classification tree based on SVM. Using the F test and SVM-based recursive feature elimination (SVM-RFE) as gene selection methods, three combinations are tested in our experiments: the SVM-based binary classification tree with the F test, the SVM-based binary classification tree with SVM-RFE, and FSVM with SVM-RFE. To accelerate computation, preselection of the strongest genes is also used. The proposed techniques are applied to breast cancer data, small round blue-cell tumors, and acute leukemia data. Compared to existing multiclass cancer classifiers and to the SVM-based binary classification trees with the F test or with SVM-RFE considered in this paper, FSVM with SVM-RFE can find the most important genes affecting certain types of cancer with high recognition accuracy. PMID:16046822

  20. Use of CHAID Decision Trees to Formulate Pathways for the Early Detection of Metabolic Syndrome in Young Adults

    PubMed Central

    Liu, Pei-Yang

    2014-01-01

    Metabolic syndrome (MetS) in young adults (age 20-39) is often undiagnosed. A simple screening tool using a surrogate measure might be invaluable in the early detection of MetS. Methods. A chi-squared automatic interaction detection (CHAID) decision tree analysis with waist circumference user-specified as the first level was used to detect MetS in young adults using data from the National Health and Nutrition Examination Survey (NHANES) 2009-2010 cohort as a representative sample of the United States population (n = 745). Results. Twenty percent of the sample met the National Cholesterol Education Program Adult Treatment Panel III (NCEP) classification criteria for MetS. The user-specified CHAID model was compared to both a CHAID model with no user-specified first level and a logistic regression-based model. This analysis identified waist circumference as a strong predictor in the MetS diagnosis. The accuracy of the final model with waist circumference user-specified as the first level was 92.3%, with an ability to detect MetS of 71.8%, which outperformed the comparison models. Conclusions. Preliminary findings suggest that young adults at risk for MetS could be identified for further follow-up based on their waist circumference. Decision tree methods show promise for the development of a preliminary detection algorithm for MetS. PMID:24817904

  1. Use of CHAID decision trees to formulate pathways for the early detection of metabolic syndrome in young adults.

    PubMed

    Miller, Brian; Fridline, Mark; Liu, Pei-Yang; Marino, Deborah

    2014-01-01

    Metabolic syndrome (MetS) in young adults (age 20-39) is often undiagnosed. A simple screening tool using a surrogate measure might be invaluable in the early detection of MetS. Methods. A chi-squared automatic interaction detection (CHAID) decision tree analysis with waist circumference user-specified as the first level was used to detect MetS in young adults using data from the National Health and Nutrition Examination Survey (NHANES) 2009-2010 cohort as a representative sample of the United States population (n = 745). Results. Twenty percent of the sample met the National Cholesterol Education Program Adult Treatment Panel III (NCEP) classification criteria for MetS. The user-specified CHAID model was compared to both a CHAID model with no user-specified first level and a logistic regression-based model. This analysis identified waist circumference as a strong predictor in the MetS diagnosis. The accuracy of the final model with waist circumference user-specified as the first level was 92.3%, with an ability to detect MetS of 71.8%, which outperformed the comparison models. Conclusions. Preliminary findings suggest that young adults at risk for MetS could be identified for further follow-up based on their waist circumference. Decision tree methods show promise for the development of a preliminary detection algorithm for MetS.
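
    The distinctive feature of this model, a user-specified first split, can be emulated as follows. This is a hedged sketch, not the authors' CHAID analysis: scikit-learn has no CHAID, so ordinary CART subtrees stand in, the 102 cm waist-circumference threshold is purely illustrative, and all data are simulated.

    ```python
    # Force the root split on waist circumference by hand, then grow a separate
    # tree in each branch.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(2)

    waist = rng.normal(95, 15, size=745)     # cm, NHANES-sized sample
    other = rng.random((745, 4))             # remaining screening variables
    mets  = rng.integers(0, 2, size=745)     # MetS yes/no

    WC_CUTOFF = 102.0                        # assumed root-split threshold
    high = waist >= WC_CUTOFF

    subtrees = {}
    for branch, mask in [("high_wc", high), ("low_wc", ~high)]:
        t = DecisionTreeClassifier(max_depth=3, random_state=0)
        t.fit(other[mask], mets[mask])
        subtrees[branch] = t

    def predict(wc, x):
        """Route through the forced first level, then the fitted subtree."""
        branch = "high_wc" if wc >= WC_CUTOFF else "low_wc"
        return subtrees[branch].predict(x.reshape(1, -1))[0]

    print(predict(110.0, other[0]))
    ```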

  2. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections.

    PubMed

    Kraszewska-Głomba, Barbara; Szymańska-Toczek, Zofia; Szenborn, Leszek

    2016-03-10

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the C4.5 algorithm resulted in the following decision tree: viral infection if CRP ≤ 19.1 mg/L; otherwise, for cases with CRP > 19.1 mg/L: bacterial infection if PCT > 0.65 ng/mL, PFAPA if PCT ≤ 0.65 ng/mL. The model was tested using 10-fold cross validation and in an independent test cohort (n=30); the rule's overall accuracy was 76.4% and 90%, respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context.
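
    Because the abstract states the fitted tree explicitly, it can be transcribed directly; the thresholds below are those reported above, and only the function name is ours.

    ```python
    # Direct transcription of the published decision tree: CRP first, then PCT.
    def classify_episode(crp_mg_per_l: float, pct_ng_per_ml: float) -> str:
        """Classify a febrile episode per the C4.5-derived rule above."""
        if crp_mg_per_l <= 19.1:
            return "viral infection"
        if pct_ng_per_ml > 0.65:
            return "bacterial infection"
        return "PFAPA flare"

    # Example: high CRP but low PCT points to a PFAPA flare.
    print(classify_episode(crp_mg_per_l=45.0, pct_ng_per_ml=0.3))
    ```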

  3. Lessons Learned from Applications of a Climate Change Decision Tree to Water System Projects in Kenya and Nepal

    NASA Astrophysics Data System (ADS)

    Ray, P. A.; Bonzanigo, L.; Taner, M. U.; Wi, S.; Yang, Y. C. E.; Brown, C.

    2015-12-01

    The Decision Tree Framework developed for the World Bank's Water Partnership Program provides resource-limited project planners and program managers with a cost-effective and effort-efficient, scientifically defensible, repeatable, and clear method for demonstrating the robustness of a project to climate change. At the conclusion of this process, the project planner is empowered to confidently communicate the method by which the vulnerabilities of the project have been assessed, and how the adjustments that were made (if any were necessary) improved the project's feasibility and profitability. The framework adopts a "bottom-up" approach to risk assessment that aims at a thorough understanding of a project's vulnerabilities to climate change in the context of other nonclimate uncertainties (e.g., economic, environmental, demographic, political). It helps identify projects that perform well across a wide range of potential future climate conditions, as opposed to seeking solutions that are optimal in expected conditions but fragile to conditions deviating from the expected. Lessons learned through application of the Decision Tree to case studies in Kenya and Nepal will be presented, and aspects of the framework requiring further refinement will be described.

  4. Mapping potential carbon and timber losses from hurricanes using a decision tree and ecosystem services driver model.

    PubMed

    Delphin, S; Escobedo, F J; Abd-Elrahman, A; Cropper, W

    2013-11-15

    Information on the effect of direct drivers such as hurricanes on ecosystem services is relevant to landowners and policy makers due to predicted effects of climate change. We identified forest damage risk zones due to hurricanes and estimated the potential loss of 2 key ecosystem services: aboveground carbon storage and timber volume. Using land cover, plot-level forest inventory data, the Integrated Valuation of Ecosystem Services and Tradeoffs (InVEST) model, and a decision tree-based framework, we determined potential damage to subtropical forests from hurricanes in the Lower Suwannee River (LS) and Pensacola Bay (PB) watersheds in Florida, US. We used biophysical factors identified in previous studies as influential in forest damage in our decision tree and hurricane wind risk maps. Results show that 31% and 0.5% of the total aboveground carbon storage in the LS and PB, respectively, was located in high forest damage risk (HR) zones. Overall, 15% and 0.7% of the total timber net volume in the LS and PB, respectively, was in HR zones. This model can also be used for identifying timber salvage areas, developing ecosystem service provision and management scenarios, and assessing the effect of other drivers on ecosystem services and goods.

  5. Procalcitonin and C-reactive protein-based decision tree model for distinguishing PFAPA flares from acute infections

    PubMed Central

    Kraszewska-Głomba, Barbara; Szymańska-Toczek, Zofia; Szenborn, Leszek

    2016-01-01

    As no specific laboratory test has been identified, PFAPA (periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis) remains a diagnosis of exclusion. We searched for a practical use of procalcitonin (PCT) and C-reactive protein (CRP) in distinguishing PFAPA attacks from acute bacterial and viral infections. Levels of PCT and CRP were measured in 38 patients with PFAPA and 81 children diagnosed with an acute bacterial (n=42) or viral (n=39) infection. Statistical analysis with the C4.5 algorithm resulted in the following decision tree: viral infection if CRP ≤ 19.1 mg/L; otherwise, for cases with CRP > 19.1 mg/L: bacterial infection if PCT > 0.65 ng/mL, PFAPA if PCT ≤ 0.65 ng/mL. The model was tested using 10-fold cross validation and in an independent test cohort (n=30); the rule's overall accuracy was 76.4% and 90%, respectively. Although limited by a small sample size, the obtained decision tree might present a potential diagnostic tool for distinguishing PFAPA flares from acute infections when interpreted cautiously and with reference to the clinical context. PMID:27131024

  6. Comparison of decision tree-fuzzy and rough set-fuzzy methods for fault categorization of mono-block centrifugal pump

    NASA Astrophysics Data System (ADS)

    Sakthivel, N. R.; Sugumaran, V.; Nair, Binoy. B.

    2010-08-01

    Mono-block centrifugal pumps are widely used in a variety of applications. In many applications the role of the mono-block centrifugal pump is critical, and condition monitoring is essential. Vibration-based continuous monitoring and analysis using machine learning approaches is gaining momentum. In particular, artificial neural networks and fuzzy logic have been employed for continuous monitoring and fault diagnosis. This paper presents the use of decision trees and rough sets to generate rules from statistical features extracted from vibration signals under good and faulty conditions of a mono-block centrifugal pump. A fuzzy classifier is built using the decision tree and rough set rules and tested using test data. The results obtained using decision tree rules and those obtained using rough set rules are compared. Finally, the accuracy of a principal component analysis based decision tree-fuzzy system is also evaluated. The study reveals that the overall classification accuracy obtained by the decision tree-fuzzy hybrid system is somewhat better than that of the rough set-fuzzy hybrid system.
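
    The rule-generation step can be sketched as follows: statistical features extracted from vibration signals feed a decision tree whose printed if-then rules could then seed a fuzzy classifier. The feature set and the simulated signals are assumptions, not the study's data.

    ```python
    # Statistical features from vibration records -> decision tree -> rules.
    import numpy as np
    from scipy import stats
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(3)

    def features(signal):
        # Typical statistical features used in vibration-based monitoring.
        return [signal.mean(), signal.std(), stats.skew(signal),
                stats.kurtosis(signal), np.sqrt(np.mean(signal ** 2))]

    # Simulated vibration records for good (0) and faulty (1) pump conditions.
    X = np.array([features(rng.normal(0, 1 + label, 1024))
                  for label in (0, 1) for _ in range(100)])
    y = np.repeat([0, 1], 100)

    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

    # The printed if-then rules are the raw material for fuzzy membership design.
    print(export_text(tree, feature_names=["mean", "std", "skew", "kurtosis", "rms"]))
    ```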

  7. Refined estimation of solar energy potential on roof areas using decision trees on CityGML-data

    NASA Astrophysics Data System (ADS)

    Baumanns, K.; Löwner, M.-O.

    2009-04-01

    We present a decision tree for refined estimation of solar energy plant potential on roof areas using the exchange format CityGML. Compared to raster datasets, CityGML data holds geometric and semantic information on buildings and roof areas in more detail. In addition to shadowing effects, ownership structures and the lifetime of roof areas can be incorporated into the valuation. Since the Renewable Energy Sources Act came into force in Germany in 2000, private house owners and municipalities have paid attention to the production of green electricity. The return on investment depends on the statutory price per watt, the initial costs of the solar energy plant, its lifetime, and the real production of the installation. The latter depends on the radiation received and on the size of the solar energy plant. In this context, the exposition and slope of the roof area are as important as building parts such as chimneys or dormers that might shadow parts of the roof. Knowing the controlling factors, a decision tree can be created to support a beneficial deployment of a solar energy plant, provided sufficient data are available. Airborne raster datasets can only support a coarse estimation of the solar energy potential of roof areas: they carry no semantic information, and even roof installations are hard to identify. CityGML, an Open Geospatial Consortium standard, is an interoperable exchange data format for virtual 3-dimensional cities. Based on international standards, it holds the aforementioned geometric properties as well as semantic information. In Germany many cities are on the way to providing CityGML datasets, e.g. Berlin. Here we present a decision tree that incorporates geometric as well as semantic requirements for a refined estimation of the solar energy potential on roof areas. Based on CityGML's attribute lists, we consider the geometries of roofs and roof installations as well as global radiation, which can be derived e.g. from the European Solar

  8. The relation of student behavior, peer status, race, and gender to decisions about school discipline using CHAID decision trees and regression modeling.

    PubMed

    Horner, Stacy B; Fireman, Gary D; Wang, Eugene W

    2010-04-01

    Peer nominations and demographic information were collected from a diverse sample of 1493 elementary school participants to examine behavior (overt and relational aggression, impulsivity, and prosociality), context (peer status), and demographic characteristics (race and gender) as predictors of teacher and administrator decisions about discipline. Exploratory results using classification tree analyses indicated students nominated as average or highly overtly aggressive were more likely to be disciplined than others. Among these students, race was the most significant predictor, with African American students more likely to be disciplined than Caucasians, Hispanics, or Others. Among the students nominated as low in overt aggression, a lack of prosocial behavior was the most significant predictor. Confirmatory analysis using hierarchical logistic regression supported the exploratory results. Similarities with other biased referral patterns, proactive classroom management strategies, and culturally sensitive recommendations are discussed.

  9. A decision tree algorithm for investigation of model biases related to dynamical cores and physical parameterizations

    PubMed Central

    Rood, Richard B.

    2016-01-01

    An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center of Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smooth topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena due to the spectral transform method of CAM Eulerian spectral dynamical core is prominent, and is an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, both in horizontal and vertical dimensions, have significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study about the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly mean) analysis over local scales. PMID:28239437

  10. A decision tree algorithm for investigation of model biases related to dynamical cores and physical parameterizations.

    PubMed

    Soner Yorgun, M; Rood, Richard B

    2016-12-01

    An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center of Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smooth topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena due to the spectral transform method of CAM Eulerian spectral dynamical core is prominent, and is an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, both in horizontal and vertical dimensions, have significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study about the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly mean) analysis over local scales.

  11. Contrasting determinants for the introduction and establishment success of exotic birds in Taiwan using decision trees models

    PubMed Central

    Liang, Shih-Hsiung; Walther, Bruno Andreas

    2017-01-01

    Background Biological invasions have become a major threat to biodiversity, and identifying the determinants underlying success at different stages of the invasion process is essential both for prevention management and for testing ecological theories. To investigate variables associated with different stages of the invasion process in a local region such as Taiwan, potential problems with traditional parametric analyses include too many variables of different data types (nominal, ordinal, and interval) and a relatively small data set with too many missing values. Methods We therefore used five decision tree models instead and compared their performance. Our dataset contains 283 exotic bird species which were transported to Taiwan; of these 283 species, 95 escaped to the field successfully (introduction success); of these 95 introduced species, 36 reproduced successfully in the field of Taiwan (establishment success). For each species, we collected 22 variables associated with human selectivity and species traits which may determine success during the introduction and establishment stages. For each decision tree model, we performed three variable treatments: (I) including all 22 variables, (II) excluding nominal variables, and (III) excluding nominal variables and replacing ordinal values with binary ones. Five performance measures were used to compare models, namely, area under the receiver operating characteristic curve (AUROC), specificity, precision, recall, and accuracy. Results The gradient boosting models performed best overall among the five decision tree models for both introduction and establishment success and across variable treatments. The most important variables for predicting introduction success were the bird family, the number of invaded countries, and variables associated with environmental adaptation, whereas the most important variables for predicting establishment success were the number of invaded countries and variables
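
    A minimal sketch of this comparison, assuming scikit-learn's gradient boosting as a stand-in for the models used and simulating the 283-species, 22-variable dataset, might look like the following; the five reported performance measures are computed on a hold-out split.

    ```python
    # Gradient boosting on simulated invasion data, scored with the five measures.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import (accuracy_score, confusion_matrix,
                                 precision_score, recall_score, roc_auc_score)
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(4)
    X = rng.random((283, 22))                      # 283 transported species, 22 traits
    y = (rng.random(283) < 95 / 283).astype(int)   # introduction success (95 of 283)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0, stratify=y)

    gbm = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
    prob = gbm.predict_proba(X_te)[:, 1]
    pred = gbm.predict(X_te)

    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    print("AUROC:      ", roc_auc_score(y_te, prob))
    print("specificity:", tn / (tn + fp))
    print("precision:  ", precision_score(y_te, pred, zero_division=0))
    print("recall:     ", recall_score(y_te, pred))
    print("accuracy:   ", accuracy_score(y_te, pred))

    # Feature importances indicate which traits drive the predicted success.
    print("most important trait index:", np.argmax(gbm.feature_importances_))
    ```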

  12. Decision-tree model for predicting outcomes after out-of-hospital cardiac arrest in the emergency department

    PubMed Central

    2013-01-01

    Introduction Estimation of outcomes in patients after out-of-hospital cardiac arrest (OHCA) soon after arrival at the hospital may help clinicians guide in-hospital strategies, particularly in the emergency department. This study aimed to develop a simple and generally applicable bedside model for predicting outcomes after cardiac arrest. Methods We analyzed data for 390,226 adult patients who had undergone OHCA, from a prospectively recorded nationwide Utstein-style Japanese database for 2005 through 2009. The primary end point was survival with favorable neurologic outcome (cerebral performance category (CPC) scale, categories 1 to 2 [CPC 1 to 2]) at 1 month. The secondary end point was survival at 1 month. We developed a decision-tree prediction model by using data from a 4-year period (2005 through 2008, n = 307,896), with validation by using external data from 2009 (n = 82,330). Results Recursive partitioning analysis of the development cohort for 10 predictors indicated that the best single predictor for survival and CPC 1 to 2 was shockable initial rhythm. The next predictors for patients with shockable initial rhythm were age (<70 years) followed by witnessed arrest and age (>70 years) followed by arrest witnessed by emergency medical services (EMS) personnel. For patients with unshockable initial rhythm, the next best predictor was witnessed arrest. A simple decision-tree prediction model permitted stratification into four prediction groups: good, moderately good, poor, and absolutely poor. This model identified patient groups with a range from 1.2% to 30.2% for survival and from 0.3% to 23.2% for CPC 1 to 2 probabilities. Similar results were observed when this model was applied to the validation cohort. Conclusions On the basis of a decision-tree prediction model using four prehospital variables (shockable initial rhythm, age, witnessed arrest, and witnessed by EMS personnel), OHCA patients can be readily stratified into the four groups (good, moderately

  13. Application of decision trees to the analysis of soil radon data for earthquake prediction.

    PubMed

    Zmazek, B; Todorovski, L; Dzeroski, S; Vaupotic, J; Kobal, I

    2003-06-01

    Different regression methods have been used to predict radon concentration in soil gas on the basis of environmental data, i.e. barometric pressure, soil temperature, air temperature and rainfall. Analyses of the radon data from three stations in the Krsko basin, Slovenia, have shown that model trees outperform other regression methods. A model has been built which predicts radon concentration with a correlation of 0.8, provided it is influenced only by the environmental parameters. In periods with seismic activity this correlation is much lower. This decrease in predictive accuracy appears 1-7 days before earthquakes with local magnitude 0.8-3.3.
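
    The prediction scheme lends itself to a short sketch: a regression tree learns radon concentration from the environmental drivers, and a drop in out-of-sample correlation would flag a candidate seismic precursor. scikit-learn has no M5-style model trees, so a plain regression tree stands in, and all data below are synthetic.

    ```python
    # Regression tree predicting soil-gas radon from environmental variables.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(5)

    # Daily records: barometric pressure, soil temperature, air temperature, rainfall.
    X = rng.random((1000, 4))
    radon = 10 + 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(0, 0.5, 1000)  # synthetic

    model = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X[:800], radon[:800])
    pred = model.predict(X[800:])

    corr = np.corrcoef(pred, radon[800:])[0, 1]
    # The study reports a correlation of about 0.8 in seismically quiet periods;
    # a marked drop in this statistic is the anomaly signal preceding earthquakes.
    print(f"out-of-sample correlation: {corr:.2f}")
    ```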

  14. Ensembl 2016

    PubMed Central

    Yates, Andrew; Akanni, Wasiu; Amode, M. Ridwan; Barrell, Daniel; Billis, Konstantinos; Carvalho-Silva, Denise; Cummins, Carla; Clapham, Peter; Fitzgerald, Stephen; Gil, Laurent; Girón, Carlos García; Gordon, Leo; Hourlier, Thibaut; Hunt, Sarah E.; Janacek, Sophie H.; Johnson, Nathan; Juettemann, Thomas; Keenan, Stephen; Lavidas, Ilias; Martin, Fergal J.; Maurel, Thomas; McLaren, William; Murphy, Daniel N.; Nag, Rishi; Nuhn, Michael; Parker, Anne; Patricio, Mateus; Pignatelli, Miguel; Rahtz, Matthew; Riat, Harpreet Singh; Sheppard, Daniel; Taylor, Kieron; Thormann, Anja; Vullo, Alessandro; Wilder, Steven P.; Zadissa, Amonida; Birney, Ewan; Harrow, Jennifer; Muffato, Matthieu; Perry, Emily; Ruffier, Magali; Spudich, Giulietta; Trevanion, Stephen J.; Cunningham, Fiona; Aken, Bronwen L.; Zerbino, Daniel R.; Flicek, Paul

    2016-01-01

    The Ensembl project (http://www.ensembl.org) is a system for genome annotation, analysis, storage and dissemination designed to facilitate access to genomic annotation from chordates and key model organisms. It provides access to data from 87 species across our main and early-access Pre! websites. This year we introduced three newly annotated species and released numerous updates across our supported species, with a concentration on data for the latest genome assemblies of human, mouse, zebrafish and rat. We also provided two data updates for the previous human assembly, GRCh37, through a dedicated website (http://grch37.ensembl.org). Our tools, in particular the VEP, have been improved significantly through integration of additional third party data. REST is now capable of larger-scale analysis and our regulatory data BioMart can deliver faster results. The website is now capable of displaying long-range interactions such as those found in cis-regulated datasets. Finally we have launched a website optimized for mobile devices providing views of genes, variants and phenotypes. Our data is made available without restriction and all code is available from our GitHub organization site (http://github.com/Ensembl) under an Apache 2.0 license. PMID:26687719

  15. An approach for automated fault diagnosis based on a fuzzy decision tree and boundary analysis of a reconstructed phase space.

    PubMed

    Aydin, Ilhan; Karakose, Mehmet; Akin, Erhan

    2014-03-01

    Although the reconstructed phase space is one of the most powerful methods for analyzing a time series, it can fail in fault diagnosis of an induction motor when appropriate pre-processing is not performed. Therefore, a new feature extraction method based on boundary analysis in phase space is proposed for the diagnosis of induction motor faults. The proposed approach requires the measurement of one phase current signal to construct the phase space representation. Each phase space is converted into an image, and the boundary of each image is extracted by a boundary detection algorithm. A fuzzy decision tree has been designed to detect broken rotor bars and broken connector faults. The results indicate that the proposed approach has a higher recognition rate than other methods on the same dataset.

  16. Towards closed-loop deep brain stimulation: decision tree-based essential tremor patient's state classifier and tremor reappearance predictor.

    PubMed

    Shukla, Pitamber; Basu, Ishita; Tuninetti, Daniela

    2014-01-01

    Deep Brain Stimulation (DBS) is a surgical procedure to treat some progressive neurological movement disorders, such as Essential Tremor (ET), in an advanced stage. Current FDA-approved DBS systems operate open-loop, i.e., their parameters are unchanged over time. This work develops a Decision Tree (DT) based algorithm that, by using non-invasively measured surface EMG and accelerometer signals as inputs during DBS-OFF periods, classifies the ET patient's state and then predicts when tremor is about to reappear, at which point DBS is turned ON again for a fixed amount of time. The proposed algorithm achieves an overall accuracy of 93.3% and sensitivity of 97.4%, along with 2.9% false alarm rate. Also, the ratio between predicted tremor delay and the actual detected tremor delay is about 0.93, indicating that tremor prediction is very close to the instant where tremor actually reappeared.

  17. Effective Prediction of Errors by Non-native Speakers Using Decision Tree for Speech Recognition-Based CALL System

    NASA Astrophysics Data System (ADS)

    Wang, Hongcui; Kawahara, Tatsuya

    CALL (Computer Assisted Language Learning) systems using ASR (Automatic Speech Recognition) for second language learning have received increasing interest recently. However, it remains a challenge to achieve high speech recognition performance, including accurate detection of erroneous utterances by non-native speakers. Conventionally, possible error patterns, based on linguistic knowledge, are added to the lexicon and language model, or to the ASR grammar network. However, this approach quickly runs into a trade-off between the coverage of errors and increased perplexity. To solve the problem, we propose a decision tree-based method to learn effective prediction of errors made by non-native speakers. An experimental evaluation with a number of foreign students learning Japanese shows that the proposed method can effectively generate an ASR grammar network, given a target sentence, that achieves both better coverage of errors and smaller perplexity, resulting in significant improvement in ASR accuracy.

  18. Using image processing technology combined with decision tree algorithm in laryngeal video stroboscope automatic identification of common vocal fold diseases.

    PubMed

    Jeffrey Kuo, Chung-Feng; Wang, Po-Chun; Chu, Yueng-Hsiang; Wang, Hsing-Won; Lai, Chun-Yu

    2013-10-01

    This study used actual laryngeal video stroboscope videos taken by physicians in clinical practice as the samples for experimental analysis. The samples were dynamic vocal fold videos. Image processing technology was used to automatically capture the image of the largest glottal area from each video to obtain the physiological data of the vocal folds. In this study, an automatic vocal fold disease identification system was designed, which obtains the physiological parameters of normal vocal folds, vocal paralysis and vocal nodules from image processing according to the pathological features. The decision tree algorithm was used as the classifier of the vocal fold diseases. The identification rate was 92.6%, and it improved to 98.7% with an additional image-recognition refinement procedure after classification. Hence, the proposed system has value in clinical practice.

  19. Improved γ/hadron separation for the detection of faint γ-ray sources using boosted decision trees

    NASA Astrophysics Data System (ADS)

    Krause, Maria; Pueschel, Elisa; Maier, Gernot

    2017-03-01

    Imaging atmospheric Cherenkov telescopes record an enormous number of cosmic-ray background events. Suppressing these background events while retaining γ-rays is key to achieving good sensitivity to faint γ-ray sources. The differentiation between signal and background events can be accomplished using machine learning algorithms, which are already used in various fields of physics. Multivariate analyses combine several variables into a single variable that indicates the degree to which an event is γ-ray-like or cosmic-ray-like. In this paper we will focus on the use of "boosted decision trees" for γ/hadron separation. We apply the method to data from the Very Energetic Radiation Imaging Telescope Array System (VERITAS), and demonstrate an improved sensitivity compared to the VERITAS standard analysis.
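
    A boosted-decision-tree separator of this kind can be sketched with standard tools. This is not the VERITAS analysis chain: the image-parameter features are simulated Gaussians, and AdaBoost over shallow trees stands in for the production BDT implementation.

    ```python
    # Boosted shallow trees combine several image parameters into a single
    # gamma-likeness score, which is then thresholded to trade gamma-ray
    # efficiency against cosmic-ray background suppression.
    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(6)

    n = 5000
    X_gamma  = rng.normal(0.0, 1.0, (n, 6))   # signal events (simulated)
    X_hadron = rng.normal(0.8, 1.2, (n, 6))   # cosmic-ray background (simulated)
    X = np.vstack([X_gamma, X_hadron])
    y = np.array([1] * n + [0] * n)           # 1 = gamma-like

    # 'estimator' is the parameter name in recent scikit-learn releases
    # (older versions call it 'base_estimator').
    bdt = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=3),
                             n_estimators=200, random_state=0)
    bdt.fit(X, y)

    score = bdt.predict_proba(X)[:, 1]
    print("separation AUROC:", roc_auc_score(y, score))
    ```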

  20. A method of building of decision trees based on data from wearable device during a rehabilitation of patients with tibia fractures

    NASA Astrophysics Data System (ADS)

    Kupriyanov, M. S.; Shukeilo, E. Y.; Shichkina, J. A.

    2015-11-01

    Technologies used in traumatology today combine mechanical, electronic, computational and software tools. The development of mobile applications for rapid processing of data received from medical devices (in particular, wearable devices) and for formulating management decisions is increasingly relevant. This article considers the use of a mathematical method for building decision trees to assess a patient's health condition from wearable-device data.

  1. A method of building of decision trees based on data from wearable device during a rehabilitation of patients with tibia fractures

    SciTech Connect

    Kupriyanov, M. S. Shukeilo, E. Y. Shichkina, J. A.

    2015-11-17

    Technologies used in traumatology today combine mechanical, electronic, computational and software tools. The development of mobile applications for rapid processing of data received from medical devices (in particular, wearable devices) and for formulating management decisions is increasingly relevant. This article considers the use of a mathematical method for building decision trees to assess a patient's health condition from wearable-device data.

  2. Exploring the intrinsic differences among breast tumor subtypes defined using immunohistochemistry markers based on the decision tree

    PubMed Central

    Li, Yang; Tang, Xu-Qing; Bai, Zhonghu; Dai, Xiaofeng

    2016-01-01

    Exploring the intrinsic differences among breast cancer subtypes is of crucial importance for precise diagnosis and therapeutic decision-making in diseases of high heterogeneity. The subtypes defined with several layers of information are related but not consistent, especially those defined using immunohistochemistry markers versus gene expression profiling. Here, we explored the intrinsic differences among the subtypes defined by the estrogen receptor, progesterone receptor and human epidermal growth factor receptor 2, based on the decision tree. We identified 30 mRNAs and 7 miRNAs differentially expressed along the tree's branches. The final signature panel contained 30 mRNAs, whose performance was validated using two public datasets and 3 well-known classifiers. Network and pathway analyses were carried out for the feature genes, revealing key molecules, including FOXQ1 and SFRP1, that are densely connected with other molecules and participate in the validated metabolic pathways. Our study uncovered the differences among the four IHC-defined breast tumor subtypes at the mRNA and miRNA levels, presented a novel signature for breast tumor subtyping, and identified several key molecules potentially driving the heterogeneity of such tumors. The results help us further understand breast tumor heterogeneity, which could be exploited in the clinic. PMID:27786176

  3. Forest or the trees: At what scale do elephants make foraging decisions?

    NASA Astrophysics Data System (ADS)

    Shrader, Adrian M.; Bell, Caroline; Bertolli, Liandra; Ward, David

    2012-07-01

    For herbivores, food is distributed spatially in a hierarchical manner ranging from plant parts to regions. Ultimately, utilisation of food depends on the scale at which herbivores make foraging decisions. A key factor that influences these decisions is body size, because selectivity is inversely related to body size. As a result, large animals can be less selective than small herbivores. Savanna elephants (Loxodonta africana) are the largest terrestrial herbivore and thus represent a potential extreme with respect to unselective feeding. However, several studies have indicated that elephants prefer specific habitats and certain woody plant species, so it is unclear at which scale elephants focus their foraging decisions. To determine this, we recorded the seasonal selection of habitats and woody plant species by elephants in the Ithala Game Reserve, South Africa. We expected that during the wet season, when both food quality and availability were high, elephants would select primarily for habitats. This does not mean that they would utilise plant species within these habitats in proportion to availability, but rather that they would show a stronger selection for habitats than for plants. In contrast, during the dry season, when food quality and availability declined, we expected that elephants would shift and select for the remaining high-quality woody species across all habitats. Consistent with our predictions, elephants selected for the larger spatial scale (i.e. habitats) during the wet season. However, elephants did not increase their selection of woody species during the dry season, but rather increased their selection of habitats relative to woody plant selection. Unlike a number of earlier studies, we found that neither palatability (i.e. crude protein, digestibility, and energy) alone nor tannin concentrations had a significant effect in determining the elephants' selection of woody species. However, the palatability:tannin ratio was

  4. Decision-tree analysis of clinical data to aid diagnostic reasoning for equine laminitis: a cross-sectional study.

    PubMed

    Wylie, C E; Shaw, D J; Verheyen, K L P; Newton, J R

    2016-04-23

    The objective of this cross-sectional study was to compare the prevalence of selected clinical signs in laminitis cases and non-laminitic but lame controls to evaluate their capability to discriminate laminitis from other causes of lameness. Participating veterinary practitioners completed a checklist of laminitis-associated clinical signs identified by literature review. Cases were defined as horses/ponies with veterinary-diagnosed, clinically apparent laminitis; controls were horses/ponies with any lameness other than laminitis. Associations were tested by logistic regression with adjusted odds ratios (ORs) and 95% confidence intervals, with veterinary practice as an a priori fixed effect. Multivariable analysis using graphical classification tree-based statistical models linked laminitis prevalence with specific combinations of clinical signs. Data were collected for 588 cases and 201 controls. Five clinical signs had a difference in prevalence of greater than +50 per cent: 'reluctance to walk' (OR 4.4), 'short, stilted gait at walk' (OR 9.4), 'difficulty turning' (OR 16.9), 'shifting weight' (OR 17.7) and 'increased digital pulse' (OR 13.2) (all P<0.001). 'Bilateral forelimb lameness' was the best discriminator; 92 per cent of animals with this clinical sign had laminitis (OR 40.5, P<0.001). If, in addition, horses/ponies had an 'increased digital pulse', 99 per cent were identified as laminitis. 'Presence of a flat/convex sole' also significantly enhanced clinical diagnosis discrimination (OR 15.5, P<0.001). This is the first epidemiological laminitis study to use decision-tree analysis, providing the first evidence base for evaluating clinical signs to differentially diagnose laminitis from other causes of lameness. Improved evaluation of the clinical signs displayed by laminitic animals examined by first-opinion practitioners will lead to equine welfare improvements.

  5. An improved methodology for land-cover classification using artificial neural networks and a decision tree classifier

    NASA Astrophysics Data System (ADS)

    Arellano-Neri, Olimpia

    Mapping is essential for the analysis of land and land-cover dynamics, which influence many environmental processes and properties. When creating land-cover maps it is important to minimize error, since error will propagate into later analyses based upon these maps. The reliability of land-cover maps derived from remotely sensed data depends upon an accurate classification. For decades, traditional statistical methods have been applied to land-cover classification with varying degrees of accuracy. One of the most significant developments in the field of land-cover classification using remotely sensed data has been the introduction of Artificial Neural Network (ANN) procedures. In this research, Artificial Neural Networks were applied to remotely sensed data of the southwestern Ohio region for land-cover classification. Three variants on traditional ANN-based classifiers were explored: (1) a customized architecture of the neural network in terms of the input layer for each land-cover class, (2) texture analysis to combine spectral and spatial information, which is essential for urban classes, and (3) decision tree (DT) classification to refine the ANN classification and ultimately achieve a more reliable land-cover thematic map. The objective of this research was to show that a classification based on Artificial Neural Networks (ANN) and a decision tree (DT) would outperform the National Land Cover Data (NLCD) by far. The NLCD is a land-cover classification produced by a cooperative effort between the United States Geological Survey (USGS) and the United States Environmental Protection Agency (USEPA). In order to achieve this objective, an accuracy assessment was conducted for both the NLCD classification and the ANN/DT classification. Error matrices resulting from the accuracy assessments provided overall accuracy, accuracy of each class, omission errors, and commission errors for each classification. The

  6. The WHO classification of lymphomas: cost-effective immunohistochemistry using a deductive reasoning "decision tree" approach: part II: the decision tree approach: diffuse patterns of proliferation in lymph nodes.

    PubMed

    Taylor, Clive R

    2009-12-01

    The 2008 World Health Organization Classification of Tumors of the Haematopoietic and Lymphoid Tissues defines current standards of practice for the diagnosis and classification of malignant lymphomas and related entities. More than 50 different types of lymphomas are described. Faced with such a broad range of different lymphomas, some encountered only rarely, and a rapidly growing armamentarium of 80 or more pertinent immunohistochemical (IHC) "stains," the challenge to the pathologist is to use IHC in an efficient manner to arrive at an assured and timely diagnosis. This review uses deductive reasoning following a decision tree or dendrogram model, combining basic morphologic patterns and common IHC markers to classify node-based malignancies by the World Health Organization schema. The review is divided into 2 parts; the first, addressing those lymphomas that produce a follicular or nodular pattern of lymph nodal involvement, appeared in the previous issue of AIMM. The second part addresses diffuse proliferations in lymph nodes. Emphasis is given to the more common lymphomas and the more commonly available IHC "stains" for a pragmatic and practical approach that is both broadly feasible and cost-effective. By this method, an assured diagnosis may be reached in the majority of nodal lymphomas, while developing sufficient data to recognize those rare or atypical cases that require referral to a specialized center.

  7. Detecting subcanopy invasive plant species in tropical rainforest by integrating optical and microwave (InSAR/PolInSAR) remote sensing data, and a decision tree algorithm

    NASA Astrophysics Data System (ADS)

    Ghulam, Abduwasit; Porton, Ingrid; Freeman, Karen

    2014-02-01

    In this paper, we propose a decision tree algorithm to characterize the spatial extent and spectral features of invasive plant species (i.e., guava, Madagascar cardamom, and Molucca raspberry) in tropical rainforests by integrating datasets from passive and active remote sensing sensors. The decision tree algorithm is based on a number of input variables, including matching score and infeasibility images from Mixture Tuned Matched Filtering (MTMF), land-cover maps, tree height information derived from high-resolution stereo imagery, polarimetric feature images, the Radar Forest Degradation Index (RFDI), and polarimetric and InSAR coherence and phase difference images. Spatial distributions of the study organisms are mapped using a pixel-based Winner-Takes-All (WTA) algorithm, object-oriented feature extraction, and spectral unmixing, and compared with the newly developed decision tree approach. Our results show that the InSAR phase difference and PolInSAR HH-VV coherence images of L-band PALSAR data are the most important variables after the MTMF outputs in mapping subcanopy invasive plant species in tropical rainforest. We also show that the three types of invasive plants alone occupy about 17.6% of the Betampona Nature Reserve (BNR), while mixed forest, shrubland and grassland areas together account for 11.9% of the reserve. This work presents the first systematic attempt to evaluate forest degradation, habitat quality and invasive plant statistics in the BNR, and provides significant insights as to management strategies for the control of invasive plants and conservation in the reserve.

  8. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    NASA Astrophysics Data System (ADS)

    Khader, A.; Rosenberg, D.; McKee, M.

    2012-12-01

    Nitrate pollution poses a health risk for infants whose freshwater drinking source is groundwater. This risk creates a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision maker and the expected outcomes from these alternatives. The alternatives include: (i) ignore the health risk of nitrate contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, pollution transport processes, and climate (Khader and McKee, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine where methemoglobinemia is the main health problem associated with the principal pollutant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not-use aquifer water, and whether people get sick from drinking contaminated water. Outcome costs include healthcare for methemoglobinemia, purchase of bottled water, and installation and maintenance of the groundwater monitoring system. At current

  9. A decision tree model to estimate the value of information provided by a groundwater quality monitoring network

    NASA Astrophysics Data System (ADS)

    Khader, A. I.; Rosenberg, D. E.; McKee, M.

    2013-05-01

    Groundwater contaminated with nitrate poses a serious health risk to infants when this contaminated water is used for culinary purposes. To avoid this health risk, people need to know whether their culinary water is contaminated or not. Therefore, there is a need to design an effective groundwater monitoring network, acquire information on groundwater conditions, and use acquired information to inform management options. These actions require time, money, and effort. This paper presents a method to estimate the value of information (VOI) provided by a groundwater quality monitoring network located in an aquifer whose water poses a spatially heterogeneous and uncertain health risk. A decision tree model describes the structure of the decision alternatives facing the decision-maker and the expected outcomes from these alternatives. The alternatives include (i) ignore the health risk of nitrate-contaminated water, (ii) switch to alternative water sources such as bottled water, or (iii) implement a previously designed groundwater quality monitoring network that takes into account uncertainties in aquifer properties, contaminant transport processes, and climate (Khader, 2012). The VOI is estimated as the difference between the expected costs of implementing the monitoring network and the lowest-cost uninformed alternative. We illustrate the method for the Eocene Aquifer, West Bank, Palestine, where methemoglobinemia (blue baby syndrome) is the main health problem associated with the principal contaminant nitrate. The expected cost of each alternative is estimated as the weighted sum of the costs and probabilities (likelihoods) associated with the uncertain outcomes resulting from the alternative. Uncertain outcomes include actual nitrate concentrations in the aquifer, concentrations reported by the monitoring system, whether people abide by manager recommendations to use/not use aquifer water, and whether people get sick from drinking contaminated water. Outcome costs
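
    The expected-cost arithmetic behind the VOI estimate is simple to illustrate. All probabilities and costs below are hypothetical placeholders, not values from the Eocene Aquifer study; only the structure (probability-weighted outcome costs, with VOI as the saving over the cheapest uninformed alternative) follows the description above.

    ```python
    # Expected cost of each decision alternative, and VOI of the monitoring network.
    ALTERNATIVES = {
        # alternative: list of (probability, cost in US$) outcome pairs
        "ignore_risk":   [(0.70, 0.0),          # nobody gets sick
                          (0.30, 9_000_000)],   # methemoglobinemia treatment costs
        "bottled_water": [(1.00, 4_000_000)],   # purchase costs, certain
        "monitoring":    [(0.85, 2_000_000),    # network cost, advice followed
                          (0.15, 5_000_000)],   # network cost + residual illness
    }

    def expected_cost(outcomes):
        # Weighted sum of outcome costs, as described in the abstract.
        return sum(p * c for p, c in outcomes)

    costs = {name: expected_cost(o) for name, o in ALTERNATIVES.items()}
    cheapest_uninformed = min(costs["ignore_risk"], costs["bottled_water"])
    voi = cheapest_uninformed - costs["monitoring"]

    print(costs)
    print(f"Value of information: US${voi:,.0f}")
    ```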

  10. Diagnosis of pulmonary hypertension from magnetic resonance imaging–based computational models and decision tree analysis

    PubMed Central

    Swift, Andrew J.; Capener, David; Kiely, David; Hose, Rod; Wild, Jim M.

    2016-01-01

    Abstract Accurately identifying patients with pulmonary hypertension (PH) using noninvasive methods is challenging, and right heart catheterization (RHC) is the gold standard. Magnetic resonance imaging (MRI) has been proposed as an alternative to echocardiography and RHC in the assessment of cardiac function and pulmonary hemodynamics in patients with suspected PH. The aim of this study was to assess whether machine learning using computational modeling techniques and image-based metrics of PH can improve the diagnostic accuracy of MRI in PH. Seventy-two patients with suspected PH attending a referral center underwent RHC and MRI within 48 hours. Fifty-seven patients were diagnosed with PH, and 15 had no PH. A number of functional and structural cardiac and cardiovascular markers derived from 2 mathematical models and also solely from MRI of the main pulmonary artery and heart were integrated into a classification algorithm to investigate the diagnostic utility of the combination of the individual markers. A physiological marker based on the quantification of wave reflection in the pulmonary artery was shown to perform best individually, but optimal diagnostic performance was found by the combination of several image-based markers. Classifier results, validated using leave-one-out cross validation, demonstrated that combining computation-derived metrics reflecting hemodynamic changes in the pulmonary vasculature with measurement of right ventricular morphology and function, in a decision support algorithm, provides a method to noninvasively diagnose PH with high accuracy (92%). The high diagnostic accuracy of these MRI-based model parameters may reduce the need for RHC in patients with suspected PH. PMID:27252844

  11. Prediction of healthy blood with data mining classification by using Decision Tree, Naive Bayesian and SVM approaches

    NASA Astrophysics Data System (ADS)

    Khalilinezhad, Mahdieh; Minaei, Behrooz; Vernazza, Gianni; Dellepiane, Silvana

    2015-03-01

    Data mining (DM) is the process of discovering knowledge from large databases. Applications of data mining in blood transfusion organizations could be useful for improving the performance of blood donation services. The aim of this research is the prediction of the healthiness of blood donors in the Blood Transfusion Organization (BTO). For this goal, three well-known algorithms (the C4.5 decision tree, the naïve Bayesian classifier, and the support vector machine) were chosen and applied to a real database of 11,006 donors. Seven fields (sex, age, job, education, marital status, type of donor, and results of blood tests, i.e., doctors' comments and lab results about healthy or unhealthy blood donors) were selected as input to these algorithms. The results of the three algorithms were compared and an error cost analysis was performed. According to the obtained results, the algorithm with the lowest error cost and highest accuracy is the SVM. This research helps the BTO build a model of blood donors in each area in order to predict whether donated blood is healthy or unhealthy, and it could be useful when used in parallel with laboratory tests to better separate unhealthy blood.
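
    A sketch of such a three-classifier comparison, assuming scikit-learn and a synthetic stand-in for the donor database (scikit-learn's DecisionTreeClassifier implements CART rather than C4.5, so it only approximates the tree algorithm named above):

      # Compare a decision tree, naive Bayes and an SVM on synthetic,
      # donor-like data (the real study used 11,006 donor records).
      from sklearn.datasets import make_classification
      from sklearn.model_selection import cross_val_score
      from sklearn.naive_bayes import GaussianNB
      from sklearn.svm import SVC
      from sklearn.tree import DecisionTreeClassifier

      # Seven input fields, imbalanced healthy/unhealthy labels.
      X, y = make_classification(n_samples=2000, n_features=7, n_informative=5,
                                 weights=[0.9, 0.1], random_state=0)

      models = {
          "decision tree": DecisionTreeClassifier(random_state=0),
          "naive Bayes": GaussianNB(),
          "SVM": SVC(kernel="rbf", class_weight="balanced"),
      }
      for name, model in models.items():
          scores = cross_val_score(model, X, y, cv=5)
          print(f"{name}: mean cross-validated accuracy = {scores.mean():.3f}")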

  12. A decision-tree approach to the assessment of posttraumatic stress disorder: Engineering empirically rigorous and ecologically valid assessment measures.

    PubMed

    Stewart, Regan W; Tuerk, Peter W; Metzger, Isha W; Davidson, Tatiana M; Young, John

    2016-02-01

    Structured diagnostic interviews are widely considered to be the optimal method of assessing symptoms of posttraumatic stress; however, few clinicians report using structured assessments to guide clinical practice. One commonly cited impediment to these assessment approaches is the amount of time required for test administration and interpretation. Empirically keyed methods to reduce the administration time of structured assessments may be a viable solution to increase the use of standardized and reliable diagnostic tools. Thus, the present research conducted an initial feasibility study using a sample of treatment-seeking military veterans (N = 1,517) to develop a truncated assessment protocol based on the Clinician-Administered Posttraumatic Stress Disorder (PTSD) Scale (CAPS). Decision-tree analysis was utilized to identify a subset of predictor variables among the CAPS items that were most predictive of a diagnosis of PTSD. The algorithm-driven, atheoretical sequence of questions reduced the number of items administered by more than 75% and classified the validation sample at 92% accuracy. These results demonstrated the feasibility of developing a protocol to assess PTSD in a way that imposes little assessment burden while still providing a reliable categorization.

  13. Systemic inflammation and family history in relation to the prevalence of type 2 diabetes based on an alternating decision tree

    PubMed Central

    Uemura, Hirokazu; Ghaibeh, A. Ammar; Katsuura-Kamano, Sakurako; Yamaguchi, Miwa; Bahari, Tirani; Ishizu, Masashi; Moriguchi, Hiroki; Arisawa, Kokichi

    2017-01-01

    To investigate unknown patterns associated with type 2 diabetes in the Japanese population, we first used an alternating decision tree (ADTree) algorithm, a powerful classification algorithm from data mining, for the data from 1,102 subjects aged 35–69 years. On the basis of the investigated patterns, we then evaluated the associations of serum high-sensitivity C-reactive protein (hs-CRP) as a biomarker of systemic inflammation and family history of diabetes (negative, positive or unknown) with the prevalence of type 2 diabetes because their detailed associations have been scarcely reported. Elevated serum hs-CRP levels were proportionally associated with the increased prevalence of type 2 diabetes after adjusting for probable covariates, including body mass index and family history of diabetes (P for trend = 0.016). Stratified analyses revealed that elevated serum hs-CRP levels were proportionally associated with increased prevalence of diabetes in subjects without a family history of diabetes (P for trend = 0.020) but not in those with a family history or with an unknown family history of diabetes. Our study demonstrates that systemic inflammation was proportionally associated with increased prevalence of type 2 diabetes even after adjusting for body mass index, especially in subjects without a family history of diabetes. PMID:28361994

  14. Landsat-derived cropland mask for Tanzania using 2010-2013 time series and decision tree classifier methods

    NASA Astrophysics Data System (ADS)

    Justice, C. J.

    2015-12-01

    About 80% of Tanzania's population is involved in the agriculture sector. Despite this national dependence, agricultural reporting is minimal and monitoring efforts are in their infancy. The cropland mask developed through this study provides the framework for agricultural monitoring by informing analysis of crop conditions, dispersion, and intensity at a national scale. Tanzania is dominated by smallholder agricultural systems with an average field size of less than one hectare (Sarris et al., 2006). At this field scale, previous classifications of agricultural land in Tanzania using coarse-resolution MODIS data are insufficient to inform a working monitoring system. The nationwide cropland mask in this study was developed using composited Landsat tiles from a 2010-2013 time series. Decision tree classifier methods were used, with representative training areas collected for the agriculture and non-agriculture classes and appropriate indices used to separate them (Hansen et al., 2013). Validation was done using a random sample and high-resolution satellite images to compare agriculture and non-agriculture samples from the study area. The techniques used in this study were successful and have the potential to be adapted for other countries, allowing targeted monitoring efforts to improve food security, inform market prices, and guide agricultural policy.

  15. Feature selection using Decision Tree and classification through Proximal Support Vector Machine for fault diagnostics of roller bearing

    NASA Astrophysics Data System (ADS)

    Sugumaran, V.; Muralidharan, V.; Ramachandran, K. I.

    2007-02-01

    The roller bearing is one of the most widely used rotary elements in rotating machinery. The nature of a roller bearing's vibration reveals its condition, and the features that capture this nature must be extracted through indirect means. Statistical parameters like kurtosis, standard deviation, maximum value, etc. form a set of features that are widely used in fault diagnostics. The problem is often finding good features that discriminate among the different fault conditions of the bearing. Selection of good features is an important phase in pattern recognition and requires detailed domain knowledge. This paper illustrates the use of a decision tree to identify the best features from a given set of samples for the purpose of classification. It uses the Proximal Support Vector Machine (PSVM), which can efficiently classify the faults using statistical features. The vibration signal from a piezoelectric transducer is captured for the following conditions: good bearing, bearing with inner race fault, bearing with outer race fault, and bearing with both inner and outer race faults. The statistical features are extracted therefrom and classified successfully using PSVM and SVM. The results of PSVM and SVM are compared.
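
    A condensed sketch of this two-stage pipeline, assuming scikit-learn and SciPy, with synthetic vibration windows in place of the transducer signals and the standard SVC in place of PSVM:

      # Decision-tree feature ranking followed by SVM classification of
      # bearing conditions (synthetic vibration windows).
      import numpy as np
      from scipy.stats import kurtosis
      from sklearn.model_selection import train_test_split
      from sklearn.svm import SVC
      from sklearn.tree import DecisionTreeClassifier

      rng = np.random.default_rng(0)
      # 4 conditions x 50 windows, with condition-dependent amplitude so
      # the synthetic classes are separable.
      labels = np.repeat(np.arange(4), 50)   # good, inner, outer, both faults
      amplitude = np.array([0.5, 1.0, 1.5, 2.0])[labels][:, None]
      signals = rng.normal(size=(200, 1024)) * amplitude

      # Statistical features per window: kurtosis, std, max, min, RMS.
      feats = np.column_stack([
          kurtosis(signals, axis=1),
          signals.std(axis=1),
          signals.max(axis=1),
          signals.min(axis=1),
          np.sqrt((signals ** 2).mean(axis=1)),
      ])

      # The tree ranks features by impurity-based importance; the SVM is
      # then trained on the top-ranked ones only.
      tree = DecisionTreeClassifier(random_state=0).fit(feats, labels)
      top3 = np.argsort(tree.feature_importances_)[-3:]

      X_tr, X_te, y_tr, y_te = train_test_split(feats[:, top3], labels,
                                                stratify=labels, random_state=0)
      print("held-out accuracy:", SVC().fit(X_tr, y_tr).score(X_te, y_te))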

  16. An expert system with radial basis function neural network based on decision trees for predicting sediment transport in sewers.

    PubMed

    Ebtehaj, Isa; Bonakdari, Hossein; Zaji, Amir Hossein

    2016-01-01

    In this study, an expert system with a radial basis function neural network (RBF-NN) based on decision trees (DT) is designed to predict sediment transport in sewer pipes at the limit of deposition. First, sensitivity analysis is carried out to investigate the effect of each parameter on predicting the densimetric Froude number (Fr). The results indicate that utilizing the ratio of the median particle diameter to pipe diameter (d/D), the ratio of median particle diameter to hydraulic radius (d/R) and the volumetric sediment concentration (CV) as the input combination leads to the best Fr prediction. Subsequently, the new hybrid DT-RBF method is presented. The results of DT-RBF are compared with RBF and RBF-particle swarm optimization (PSO), which uses PSO for RBF training. It appears that DT-RBF is more accurate (R² = 0.934, MARE = 0.103, RMSE = 0.527, SI = 0.13, BIAS = -0.071) than the two other RBF methods. Moreover, the proposed DT-RBF model offers explicit expressions for use by practicing engineers.

  17. Systemic inflammation and family history in relation to the prevalence of type 2 diabetes based on an alternating decision tree.

    PubMed

    Uemura, Hirokazu; Ghaibeh, A Ammar; Katsuura-Kamano, Sakurako; Yamaguchi, Miwa; Bahari, Tirani; Ishizu, Masashi; Moriguchi, Hiroki; Arisawa, Kokichi

    2017-03-31

    To investigate unknown patterns associated with type 2 diabetes in the Japanese population, we first used an alternating decision tree (ADTree) algorithm, a powerful classification algorithm from data mining, for the data from 1,102 subjects aged 35-69 years. On the basis of the investigated patterns, we then evaluated the associations of serum high-sensitivity C-reactive protein (hs-CRP) as a biomarker of systemic inflammation and family history of diabetes (negative, positive or unknown) with the prevalence of type 2 diabetes because their detailed associations have been scarcely reported. Elevated serum hs-CRP levels were proportionally associated with the increased prevalence of type 2 diabetes after adjusting for probable covariates, including body mass index and family history of diabetes (P for trend = 0.016). Stratified analyses revealed that elevated serum hs-CRP levels were proportionally associated with increased prevalence of diabetes in subjects without a family history of diabetes (P for trend = 0.020) but not in those with a family history or with an unknown family history of diabetes. Our study demonstrates that systemic inflammation was proportionally associated with increased prevalence of type 2 diabetes even after adjusting for body mass index, especially in subjects without a family history of diabetes.

  18. Assessing and monitoring the risk of desertification in Dobrogea, Romania, using Landsat data and decision tree classifier.

    PubMed

    Vorovencii, Iosif

    2015-04-01

    The risk of desertification of a part of Romania is increasingly evident, constituting a serious problem for the environment and society. This article attempts to assess and monitor the risk of desertification in Dobrogea using Landsat Thematic Mapper (TM) satellite images acquired in 1987, 1994, 2000, 2007 and 2011. In order to assess the risk of desertification, we used as indicators the Modified Soil Adjusted Vegetation Index 1 (MSAVI1), the Moving Standard Deviation Index (MSDI) and the albedo, indices relating to vegetation conditions, landscape pattern and micrometeorology. The decision tree classifier (DTC) was applied on the basis of pre-established rules, and maps displaying six grades of desertification risk were obtained: none, very low, low, medium, high and severe. Land surface temperature (LST) was also used for the analysis. The results indicate that, according to the pre-established rules, two grades of desertification risk showed an ascending trend in Dobrogea over the period 1987-2011, namely very low and medium. An investigation into the causes of the desertification risk revealed that high temperature is the main factor, accompanied by the destruction of forest shelterbelts and of the irrigation system and, to a smaller extent, by the fragmentation of agricultural land and deforestation in the study area.

  19. A Decision-Tree Approach to the Assessment of Posttraumatic Stress Disorder: Engineering Empirically Rigorous and Ecologically Valid Assessment Measures

    PubMed Central

    Stewart, Regan W.; Tuerk, Peter W.; Metzger, Isha W.; Davidson, Tatiana M.; Young, John

    2017-01-01

    Structured diagnostic interviews are widely considered to be the optimal method of assessing symptoms of posttraumatic stress; however, few clinicians report using structured assessments to guide clinical practice. One commonly cited impediment to these assessment approaches is the amount of time required for test administration and interpretation. Empirically keyed methods to reduce the administration time of structured assessments may be a viable solution to increase the use of standardized and reliable diagnostic tools. Thus, the present research conducted an initial feasibility study using a sample of treatment-seeking military veterans (N = 1,517) to develop a truncated assessment protocol based on the Clinician-Administered Posttraumatic Stress Disorder (PTSD) Scale (CAPS). Decision-tree analysis was utilized to identify a subset of predictor variables among the CAPS items that were most predictive of a diagnosis of PTSD. The algorithm-driven, atheoretical sequence of questions reduced the number of items administered by more than 75% and classified the validation sample at 92% accuracy. These results demonstrated the feasibility of developing a protocol to assess PTSD in a way that imposes little assessment burden while still providing a reliable categorization. PMID:26654473

  20. Ensemble Feature Learning of Genomic Data Using Support Vector Machine.

    PubMed

    Anaissi, Ali; Goyal, Madhu; Catchpoole, Daniel R; Braytee, Ali; Kennedy, Paul J

    2016-01-01

    The identification of a subset of genes having the ability to capture the necessary information to distinguish classes of patients is crucial in bioinformatics applications. Ensemble and bagging methods have been shown to work effectively in the process of gene selection and classification. Testament to that is random forest, which combines random decision trees with bagging to improve overall feature selection and classification accuracy. Surprisingly, the adoption of these methods in support vector machines has only recently received attention, and mostly for classification rather than gene selection. This paper introduces an ensemble SVM-Recursive Feature Elimination (ESVM-RFE) method for gene selection that follows the concepts of ensemble and bagging used in random forest but adopts the backward elimination strategy that is the rationale of the RFE algorithm. The idea is that building ensemble SVM models from randomly drawn bootstrap samples of the training set produces different feature rankings, which are subsequently aggregated into one feature ranking. As a result, the decision to eliminate a feature is based on the rankings of multiple SVM models instead of one particular model. Moreover, this approach addresses the problem of imbalanced datasets by constructing nearly balanced bootstrap samples. Our experiments show that ESVM-RFE for gene selection substantially increased the classification performance on five microarray datasets compared to state-of-the-art methods. Experiments on the childhood leukaemia dataset show that ESVM-RFE achieves on average 9% better accuracy than SVM-RFE and 5% better than a random forest-based approach. The genes selected by the ESVM-RFE algorithm were further explored with Singular Value Decomposition (SVD), which reveals significant clusters within the selected data.
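
    A sketch of the ESVM-RFE idea, assuming scikit-learn and synthetic microarray-like data; averaging the per-bootstrap rankings is one simple aggregation rule, not necessarily the exact one used by the authors:

      # SVM-RFE on bootstrap samples, with the per-model feature rankings
      # aggregated into a single ensemble ranking (synthetic data).
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.feature_selection import RFE
      from sklearn.svm import SVC
      from sklearn.utils import resample

      X, y = make_classification(n_samples=100, n_features=200,
                                 n_informative=10, random_state=0)

      rankings = []
      for b in range(25):
          # Stratified resampling keeps class proportions stable across
          # bootstraps (the paper goes further and draws nearly balanced ones).
          Xb, yb = resample(X, y, random_state=b, stratify=y)
          rfe = RFE(SVC(kernel="linear"), n_features_to_select=10).fit(Xb, yb)
          rankings.append(rfe.ranking_)   # 1 = kept; larger = dropped earlier

      mean_rank = np.mean(rankings, axis=0)      # lower = more relevant
      selected = np.argsort(mean_rank)[:10]
      print("genes selected by the ensemble ranking:", selected)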

  1. An Ensemble Rule Learning Approach for Automated Morphological Classification of Erythrocytes.

    PubMed

    Maity, Maitreya; Mungle, Tushar; Dhane, Dhiraj; Maiti, A K; Chakraborty, Chandan

    2017-04-01

    The analysis of pathophysiological changes to erythrocytes is important for early diagnosis of anaemia. The manual assessment of pathology slides is time-consuming and complicated by the variety of cell types to be identified. This paper proposes an ensemble rule-based decision-making approach for the morphological classification of erythrocytes. First, the digital microscopic blood smear images are pre-processed to remove spurious regions, followed by colour normalisation and thresholding. The erythrocytes are segmented from the background using the watershed algorithm. Shape features are then extracted from the segmented image to detect shape abnormalities present in microscopic blood smear images. The decision about abnormality is taken by the proposed ensemble of rule-based expert systems, with majority voting over the ensemble as the deciding factor for abnormally shaped erythrocytes. Shape-based features are considered for nine erythrocyte classes, including normal erythrocytes. The adaptive boosting algorithm is used to generate multiple decision tree models, each of which yields an individual rule set; the rules are generated with a supervised C4.5 decision tree. The proposed ensemble approach detects eight types of abnormal erythrocytes with an overall accuracy of 97.81%, weighted sensitivity of 97.33%, weighted specificity of 99.7%, and weighted precision of 98%. These results show the robustness of the proposed strategy for classifying erythrocytes into abnormal and normal classes, and its potential for incorporation into point-of-care technology solutions targeting rapid clinical assistance.
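
    A minimal sketch of the boosting step, assuming scikit-learn (whose trees are CART rather than C4.5) and synthetic stand-ins for the shape features:

      # AdaBoost over shallow decision trees; the boosted ensemble votes on
      # the erythrocyte class. Data are synthetic placeholders.
      from sklearn.datasets import make_classification
      from sklearn.ensemble import AdaBoostClassifier
      from sklearn.model_selection import train_test_split
      from sklearn.tree import DecisionTreeClassifier

      # 9 erythrocyte classes, 12 shape features (area, eccentricity, ...).
      X, y = make_classification(n_samples=900, n_features=12, n_informative=8,
                                 n_classes=9, n_clusters_per_class=1,
                                 random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      # 'estimator' is named 'base_estimator' in scikit-learn older than 1.2.
      clf = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=3),
                               n_estimators=50, random_state=0)
      clf.fit(X_tr, y_tr)
      print("held-out accuracy:", clf.score(X_te, y_te))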

  2. Rapid decision support tool based on novel ecosystem service variables for retrofitting of permeable pavement systems in the presence of trees.

    PubMed

    Scholz, Miklas; Uzomah, Vincent C

    2013-08-01

    The retrofitting of sustainable drainage systems (SuDS) such as permeable pavements is currently undertaken ad hoc, using expert experience supported by minimal guidance based predominantly on hard engineering variables. There is a lack of practical decision support tools useful for a rapid assessment of the potential ecosystem services when retrofitting permeable pavements in urban areas that either feature existing trees or should be planted with trees in the near future. Thus the aim of this paper is to develop an innovative rapid decision support tool based on novel ecosystem service variables for retrofitting of permeable pavement systems close to trees. This unique tool proposes the retrofitting of the permeable pavement that obtained the highest ecosystem service score for a specific urban site enhanced by the presence of trees. This approach is based on a novel ecosystem service philosophy adapted to permeable pavements rather than on traditional engineering judgement associated with variables based on quick community and environment assessments. For an example case study area such as Greater Manchester, which was dominated by Sycamore and Common Lime, a comparison with the traditional approach of determining community and environment variables indicates that permeable pavements are generally a preferred SuDS option. Permeable pavements combined with urban trees received relatively high scores because of their great potential impact on water and air quality improvement and on flood control. The outcomes of this paper are likely to lead to more combined permeable pavement and tree systems in the urban landscape, which are beneficial for humans and the environment.

  3. The risk of disabling, surgery and reoperation in Crohn’s disease – A decision tree-based approach to prognosis

    PubMed Central

    Dias, Cláudia Camila; Pereira Rodrigues, Pedro; Fernandes, Samuel; Portela, Francisco; Ministro, Paula; Martins, Diana; Sousa, Paula; Lago, Paula; Rosa, Isadora; Correia, Luis; Moura Santos, Paula

    2017-01-01

    Introduction: Crohn's disease (CD) is a chronic inflammatory bowel disease known to carry a high risk of disabling and often to require surgical intervention. This article describes a decision tree-based approach that defines CD patients' risk of undergoing disabling events, surgical interventions and reoperations, based on clinical and demographic variables. Materials and methods: This multicentric study involved 1547 CD patients, retrospectively enrolled and divided into two cohorts: a derivation cohort (80%) and a validation cohort (20%). Decision trees were built by applying the CHAID algorithm for the selection of variables. Results: Three-level decision trees were built for the risk of disabling and reoperation, whereas the risk of surgery was described by a two-level tree. A receiver operating characteristic (ROC) analysis was performed, and the area under the curve (AUC) was higher than 70% for all outcomes. The defined risk cut-off values show usefulness for the assessed outcomes: risk levels above 75% for disabling had an odds of test positivity of 4.06 [3.50–4.71], whereas risk levels below 34% and 19% excluded surgery and reoperation with an odds of test negativity of 0.15 [0.09–0.25] and 0.50 [0.24–1.01], respectively. Overall, patients with a B2 or B3 phenotype had a higher proportion of disabling disease and surgery, while patients with later introduction of pharmacological therapy (1 month after initial surgery) had a higher proportion of reoperation. Conclusions: The decision tree-based approach used in this study, with demographic and clinical variables, has been shown to be a valid and useful approach to depict the risks of disabling, surgery and reoperation. PMID:28225800

  4. Selecting Relevant Descriptors for Classification by Bayesian Estimates: A Comparison with Decision Trees and Support Vector Machines Approaches for Disparate Data Sets.

    PubMed

    Carbon-Mangels, Miriam; Hutter, Michael C

    2011-10-01

    Classification algorithms suffer from the curse of dimensionality, which leads to overfitting, particularly if the problem is over-determined. Therefore it is of particular interest to identify the most relevant descriptors to reduce the complexity. We applied Bayesian estimates to model the probability distribution of descriptor values used for binary classification using n-fold cross-validation. As a measure of the discriminative power of the classifiers, the symmetric form of the Kullback-Leibler divergence of their probability distributions was computed. We found that the most relevant descriptors possess a Gaussian-like distribution of their values, show the largest divergences, and therefore appear most often in the cross-validation scenario. The results were compared to those of the LASSO feature selection method applied to multiple decision trees and support vector machine approaches for data sets of substrates and nonsubstrates of three Cytochrome P450 isoenzymes, which comprise strongly unbalanced compound distributions. In contrast to decision trees and support vector machines, the performance of Bayesian estimates is less affected by unbalanced data sets. This strategy reveals those descriptors that allow a simple linear separation of the classes, whereas the superior accuracy of decision trees and support vector machines can be attributed to nonlinear separation, which is in turn more prone to overfitting.

  5. MSEBAG: a dynamic classifier ensemble generation based on `minimum-sufficient ensemble' and bagging

    NASA Astrophysics Data System (ADS)

    Chen, Lei; Kamel, Mohamed S.

    2016-01-01

    In this paper, we propose a dynamic classifier system, MSEBAG, which is characterised by searching for the 'minimum-sufficient ensemble' and bagging at the ensemble level. It adopts an 'over-generation and selection' strategy and aims to achieve a good bias-variance trade-off. In the training phase, MSEBAG first searches for the 'minimum-sufficient ensemble', which maximises the in-sample fitness with the minimal number of base classifiers. Then, starting from the 'minimum-sufficient ensemble', a backward stepwise algorithm is employed to generate a collection of ensembles. The objective is to create a collection of ensembles with a descending fitness on the data, as well as a descending complexity in the structure. MSEBAG dynamically selects the ensembles from the collection for the decision aggregation. The extended adaptive aggregation (EAA) approach, a bagging-style algorithm performed at the ensemble level, is employed for this task. EAA searches for the competent ensembles using a score function, which takes into consideration both the in-sample fitness and the confidence of the statistical inference, and averages the decisions of the selected ensembles to label the test pattern. The experimental results show that the proposed MSEBAG outperforms the benchmarks on average.

  6. IHC and the WHO classification of lymphomas: cost effective immunohistochemistry using a deductive reasoning "decision tree" approach.

    PubMed

    Taylor, Clive R

    2009-10-01

    The 2008 World Health Organization Classification of Tumors of the Hematopoietic and Lymphoid Tissues defines current standards of practice for the diagnosis and classification of malignant lymphomas and related entities. More than 50 different types of lymphomas are described, combining fine morphologic criteria with immunohistochemical (IHC), and sometimes molecular, findings. Faced with such a broad range of different lymphomas, some encountered only rarely, and a rapidly growing, ever changing, armamentarium of approximately 80 pertinent IHC "stains", the challenge to the pathologist is to employ IHC in an efficient manner, to arrive at an assured diagnosis as rapidly as possible. This review uses deductive reasoning, after a decision tree or dendrogram model that relies upon recognition of basic morphologic patterns for efficient selection, use and interpretation of IHC markers to classify node-based malignancies by the World Health Organization schema. The review is divided into 2 parts, the first addressing those lymphomas that produce a follicular or nodular pattern of lymph nodal involvement; the second addressing diffuse proliferations in lymph nodes. It is accepted that only specialized centers are able to apply all of the technical resources and experience necessary for definitive diagnosis of unusual cases. Emphasis therefore is given to the more common lymphomas and the more commonly available IHC "stains", for a pragmatic and practical approach that is both broadly feasible and cost effective. By this method an assured diagnosis may be reached in the majority of nodal lymphomas, at the same time developing a sufficiency of data to recognize those rare or atypical cases that require referral to a specialized center.

  7. Energy spectra unfolding of fast neutron sources using the group method of data handling and decision tree algorithms

    NASA Astrophysics Data System (ADS)

    Hosseini, Seyed Abolfazl; Afrakoti, Iman Esmaili Paeen

    2017-04-01

    Accurate unfolding of the energy spectrum of a neutron source gives important information about unknown neutron sources. The obtained information is useful in many areas like nuclear safeguards, nuclear nonproliferation, and homeland security. In the present study, the energy spectrum of a poly-energetic fast neutron source is reconstructed using computational codes developed on the basis of the Group Method of Data Handling (GMDH) and Decision Tree (DT) algorithms. The neutron pulse height distribution (neutron response function) in the considered NE-213 liquid organic scintillator was simulated using the developed MCNPX-ESUT computational code (MCNPX-Energy engineering of Sharif University of Technology). The computational codes based on the GMDH and DT algorithms require data for the training, testing and validation steps. In order to prepare the required data, 4000 randomly generated energy spectra distributed over 52 bins are used. The randomly generated energy spectra and the neutron pulse height distributions simulated by MCNPX-ESUT for each energy spectrum are used as the output and input data, respectively. Since there is no need to solve an inverse problem with an ill-conditioned response matrix, the unfolded energy spectrum has the highest accuracy. The 241Am-9Be and 252Cf neutron sources are used in the validation step of the calculation. The unfolded energy spectra for these fast neutron sources are in excellent agreement with the reference ones. Also, the accuracy of the unfolded energy spectra obtained using the GMDH is slightly better than that obtained from the DT. The results of the present study compare well with those of a previously published paper based on the logsig and tansig transfer functions.
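
    The training setup described above maps naturally onto multi-output regression; in the sketch below a random matrix stands in for the MCNPX-ESUT-simulated NE-213 response, and scikit-learn's tree regressor stands in for the authors' DT code:

      # Learn the pulse-height-distribution -> spectrum mapping with a
      # multi-output decision tree (synthetic response matrix, 52 bins).
      import numpy as np
      from sklearn.tree import DecisionTreeRegressor

      rng = np.random.default_rng(0)
      n_bins, n_channels, n_train = 52, 128, 4000
      R = rng.uniform(0, 1, (n_channels, n_bins))   # stand-in response matrix

      spectra = rng.dirichlet(np.ones(n_bins), size=n_train)  # outputs
      pulses = spectra @ R.T                                  # inputs

      model = DecisionTreeRegressor(max_depth=12, random_state=0)
      model.fit(pulses, spectra)

      test_spectrum = rng.dirichlet(np.ones(n_bins))
      unfolded = model.predict((test_spectrum @ R.T).reshape(1, -1))[0]
      print("mean absolute reconstruction error:",
            np.abs(unfolded - test_spectrum).mean())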

  8. Assessment of the potential allergenicity of ice structuring protein type III HPLC 12 using the FAO/WHO 2001 decision tree for novel foods.

    PubMed

    Bindslev-Jensen, C; Sten, E; Earl, L K; Crevel, R W R; Bindslev-Jensen, U; Hansen, T K; Stahl Skov, P; Poulsen, L K

    2003-01-01

    The introduction of novel proteins into foods carries a risk of eliciting allergic reactions in individuals sensitive to the introduced protein. Therefore, decision trees for evaluating this risk have been developed, the latest proposed by WHO/FAO early in 2001. Proteins developed using modern biotechnology and derived from fish are being considered for use in food and other applications, and since allergy to fish is well established, a potential risk from such proteins to susceptible human beings exists. The overall aim of the study was to investigate the potential allergenicity of an Ice Structuring Protein (ISP) originating from an arctic fish (the ocean pout, Macrozoarces americanus) using the newly developed decision tree proposed by FAO/WHO. The methods used were those proposed by FAO/WHO, including amino acid sequence analysis for sequence similarity to known allergens, methods for assessing degradability under standardised conditions, assays for detection of specific IgE against the protein (Maxisorb RAST) and histamine release from human basophils. In the present paper we describe the serum screening phase of the study and discuss the overall application of the decision tree to the assessment of the potential allergenicity of ISP Type III. In an accompanying paper [Food Chem. Toxicol. 40 (2002) 965], we detail the specific methodology used for the sequence analysis and assessment of resistance to pepsin-catalysed proteolysis of this protein. The ISP showed no sequence similarity to known allergens, nor was it stable to proteolytic degradation using standardised methods. Sera from 20 patients with a well-documented clinical history of fish allergy and positive skin prick tests to ocean pout, eel pout and eel were used, and positive IgE binding in vitro to extracts of the same fish was confirmed. The sera also elicited histamine release in vitro in the presence of the same extracts. The ISP was negative in all cases in the same experiments. Using the

  9. Decision Tree Phytoremediation

    DTIC Science & Technology

    1999-12-01

    Volatile metals are taken up, changed in species, and transpired (mercury and selenium). ... Disposal methods must be developed and approved by regulatory agencies. ... Transpiration of heavy metals, such as mercury, or organic contaminants, such as TCE, may create a hazard to human health or the environment. The transpiration products

  10. The use of decision tree induction and artificial neural networks for recognizing the geochemical distribution patterns of LREE in the Choghart deposit, Central Iran

    NASA Astrophysics Data System (ADS)

    Zaremotlagh, S.; Hezarkhani, A.

    2017-04-01

    Some evidence of rare earth element (REE) concentrations is found in iron oxide-apatite (IOA) deposits located in the Central Iranian microcontinent. There are many unsolved problems concerning the origin and metallogenesis of IOA deposits in this district. Although felsic magmatism and mineralization are considered to have been simultaneous in the district, the interaction of multi-stage hydrothermal-magmatic processes within the Early Cambrian volcano-sedimentary sequence probably caused some epigenetic mineralization. Secondary geological processes (e.g., multi-stage mineralization, alteration, and weathering) have affected the variations of major elements and the possible redistribution of REE in IOA deposits. Hence, the geochemical behaviors and distribution patterns of REE are expected to be complicated in different zones of these deposits. The aim of this paper is to recognize LREE distribution patterns based on whole-rock chemical compositions and to automatically discover their geochemical rules. For this purpose, pattern recognition techniques including decision trees and neural networks were applied to a high-dimensional geochemical dataset from the Choghart IOA deposit. Because some data features were irrelevant or redundant for recognizing the distribution patterns of each LREE, a greedy attribute subset selection technique was employed to select the best subset of predictors used in the classification tasks. The decision trees (CART algorithm) were pruned optimally so as to categorize independent test data more accurately than unpruned ones. The most effective classification rules were extracted from the pruned tree to describe the meaningful relationships between the predictors and different concentrations of LREE. A feed-forward artificial neural network was also applied to reliably predict the influence of various rock compositions on the spatial distribution patterns of LREE, with better performance than the decision tree induction. The findings of this study could be

  11. Assessing the safety of co-exposure to food packaging migrants in food and water using the maximum cumulative ratio and an established decision tree.

    PubMed

    Price, Paul; Zaleski, Rosemary; Hollnagel, Heli; Ketelslegers, Hans; Han, Xianglu

    2014-01-01

    Food contact materials can release low levels of multiple chemicals (migrants) into foods and beverages, to which individuals can be exposed through food consumption. This paper investigates the potential for non-carcinogenic effects from exposure to multiple migrants using the Cefic Mixtures Ad hoc Team (MIAT) decision tree. The purpose of the assessment is to demonstrate how the decision tree can be applied to concurrent exposures to multiple migrants using either hazard or structural data on the specific components, i.e. based on the acceptable daily intake (ADI) or the threshold of toxicological concern. The tree was used to assess risks from co-exposure to migrants reported in a study on non-intentionally added substances (NIAS) eluting from food contact-grade plastic and two studies of water bottles: one on organic compounds and the other on ionic forms of various elements. The MIAT decision tree assigns co-exposures to different risk management groups (I, II, IIIA and IIIB) based on the hazard index, and the maximum cumulative ratio (MCR). The predicted co-exposures for all examples fell into Group II (low toxicological concern) and had MCR values of 1.3 and 2.4 (indicating that one or two components drove the majority of the mixture's toxicity). MCR values from the study of inorganic ions (126 mixtures) ranged from 1.1 to 3.8 for glass and from 1.1 to 5.0 for plastic containers. The MCR values indicated that a single compound drove toxicity in 58% of the mixtures. MCR values also declined with increases in the hazard index for the screening assessments of exposure (suggesting fewer substances contributed as risk potential increased). Overall, it can be concluded that the data on co-exposure to migrants evaluated in these case studies are of low toxicological concern and the safety assessment approach described in this paper was shown to be a helpful screening tool.
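
    The two screening statistics behind this grouping can be computed directly: each migrant's hazard quotient (HQ) is its exposure divided by its acceptable intake, the hazard index (HI) is the sum of the HQs, and the MCR is the HI divided by the largest single HQ. A sketch with invented numbers (the group boundaries below are a simplification of the MIAT scheme):

      # Hazard index and maximum cumulative ratio for a migrant mixture.
      # Exposures and ADIs are invented for illustration (mg/kg bw/day).
      exposure = {"migrant_A": 0.002, "migrant_B": 0.010, "migrant_C": 0.001}
      adi      = {"migrant_A": 0.050, "migrant_B": 0.025, "migrant_C": 0.100}

      hq = {m: exposure[m] / adi[m] for m in exposure}   # hazard quotients
      hi = sum(hq.values())
      mcr = hi / max(hq.values())   # 1 = one migrant dominates; n = equal shares

      # Coarse MIAT-style grouping (simplified sketch, not the full tree).
      if hi < 1:
          group = "II (low toxicological concern)"
      elif mcr < 2:
          group = "IIIA (risk driven mainly by one component)"
      else:
          group = "IIIB (risk driven by several components)"
      print(f"HI = {hi:.3f}, MCR = {mcr:.2f}, group {group}")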

  12. Ensemble Integration of Forest Disturbance Maps for the Landscape Change Monitoring System (LCMS)

    NASA Astrophysics Data System (ADS)

    Cohen, W. B.; Healey, S. P.; Yang, Z.; Zhu, Z.; Woodcock, C. E.; Kennedy, R. E.; Huang, C.; Steinwand, D.; Vogelmann, J. E.; Stehman, S. V.; Loveland, T. R.

    2014-12-01

    The recent convergence of free, high quality Landsat data and acceleration in the development of dense Landsat time series algorithms has spawned a nascent interagency effort known as the Landscape Change Monitoring System (LCMS). LCMS is being designed to map historic land cover changes associated with all major disturbance agents and land cover types in the US. Currently, five existing algorithms are being evaluated for inclusion in LCMS. The priorities of these five algorithms overlap to some degree, but each has its own strengths. This has led to the adoption of a novel approach, within LCMS, to integrate the map outputs (i.e., base learners) from these change detection algorithms using empirical ensemble models. Training data are derived from independent datasets representing disturbances such as: harvest, fire, insects, wind, and land use change. Ensemble modeling is expected to produce significant increases in predictive accuracy relative to the results of the individual base learners. The non-parametric models used in LCMS also provide a framework for matching output ensemble maps to independent sample-based statistical estimates of disturbance area. Multiple decision trees "vote" on class assignment, and it is possible to manipulate vote thresholds to ensure that ensemble maps reflect areas of disturbance derived from sources such as national-scale ground or image-based inventories. This talk will focus on results of the first ensemble integration of the base learners for six Landsat scenes distributed across the US. We will present an assessment of base learner performance across different types of disturbance against an independently derived, sample-based disturbance dataset (derived from the TimeSync Landsat time series visualization tool). The goal is to understand the contributions of each base learner to the quality of the ensemble map products. We will also demonstrate how the ensemble map products can be manipulated to match sample-based annual
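
    The vote-threshold manipulation mentioned above can be sketched with any multi-tree classifier: score each pixel by the fraction of trees voting "disturbed", then sweep the threshold until the mapped disturbance area matches an independent sample-based estimate. All data and the 8% target below are hypothetical:

      # Tune the tree-vote threshold of a random forest so the mapped
      # disturbance area matches an independent area estimate.
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier

      X, y = make_classification(n_samples=5000, weights=[0.92, 0.08],
                                 random_state=0)
      forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

      # predict_proba averages the trees, approximating the vote fraction.
      votes = forest.predict_proba(X)[:, 1]
      target_area = 0.08            # hypothetical sample-based estimate

      thresholds = np.linspace(0.05, 0.95, 91)
      areas = np.array([(votes >= t).mean() for t in thresholds])
      best = thresholds[np.argmin(np.abs(areas - target_area))]
      print(f"vote threshold {best:.2f} maps "
            f"{(votes >= best).mean():.3f} of the area as disturbed")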

  13. Measurement of single top quark production in the tau+jets channel using boosted decision trees at D0

    SciTech Connect

    Liu, Zhiyi

    2009-12-01

    The top quark is the heaviest known matter particle and plays an important role in the Standard Model of particle physics. At hadron colliders, it is possible to produce single top quarks via the weak interaction. This allows a direct measurement of the CKM matrix element Vtb and serves as a window to new physics. The first direct measurement of single top quark production with a tau lepton in the final state (the tau+jets channel) is presented in this thesis. The measurement uses 4.8 fb⁻¹ of Tevatron Run II data in ppbar collisions at √s = 1.96 TeV acquired by the D0 experiment. After selecting a data sample and building a background model, the data and background model are in good agreement. A multivariate technique, boosted decision trees, is employed to discriminate the small single top quark signal from a large background. The expected sensitivity of the tau+jets channel in the Standard Model is 1.8 standard deviations. Using a Bayesian statistical approach, an upper limit on the cross section of single top quark production in the tau+jets channel is measured as 7.3 pb at 95% confidence level, and the cross section is measured as 3.4 +2.0/-1.8 pb. The result in the tau+jets channel is also combined with those in the electron+jets and muon+jets channels. The expected sensitivity of the combined electron, muon and tau analysis is 4.7 standard deviations, compared to 4.5 standard deviations for electron and muon alone. The measured cross section in the three combined final states is σ(ppbar → tb + X, tqb + X) = 3.84 +0.89/-0.83 pb. A lower limit on |Vtb| in the three combined final states is measured to be larger than 0.85 at 95% confidence level. These results are consistent with Standard Model expectations.
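
    In spirit, the boosted-decision-tree discriminant works like the sketch below (a gradient-boosted stand-in for the D0 implementation, with synthetic data in place of the real kinematic variables):

      # Separate a small signal from a large background with boosted trees
      # and inspect the discriminant distribution (synthetic data).
      from sklearn.datasets import make_classification
      from sklearn.ensemble import GradientBoostingClassifier
      from sklearn.model_selection import train_test_split

      # Stand-in kinematic variables; 5% "single top" signal.
      X, y = make_classification(n_samples=20000, n_features=20,
                                 n_informative=12, weights=[0.95, 0.05],
                                 random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                       random_state=0).fit(X_tr, y_tr)
      disc = bdt.predict_proba(X_te)[:, 1]   # near 1 for signal-like events
      print("mean discriminant: signal", disc[y_te == 1].mean().round(3),
            "background", disc[y_te == 0].mean().round(3))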

  14. Decision tree model for predicting long-term outcomes in children with out-of-hospital cardiac arrest: a nationwide, population-based observational study

    PubMed Central

    2014-01-01

    Introduction: At hospital arrival, early prognostication for children after out-of-hospital cardiac arrest (OHCA) might help clinicians formulate strategies, particularly in the emergency department. In this study, we aimed to develop a simple and generally applicable bedside tool for predicting outcomes in children after cardiac arrest. Methods: We analyzed data on 5,379 children who had undergone OHCA. The data were extracted from a prospectively recorded, nationwide, Utstein-style Japanese database. The primary endpoint was survival with favorable neurological outcome (Cerebral Performance Category (CPC) scale categories 1 and 2) at 1 month after OHCA. We developed a decision tree prediction model by using data from a 2-year period (2008 to 2009, n = 3,693), and the model was validated using external data from 2010 (n = 1,686). Results: Recursive partitioning analysis for 11 predictors in the development cohort indicated that the best single predictor for CPC 1 and 2 at 1 month was the prehospital return of spontaneous circulation (ROSC). The next predictor for children with prehospital ROSC was an initial shockable rhythm. For children without prehospital ROSC, the next best predictor was a witnessed arrest. Use of a simple decision tree prediction model permitted stratification into four outcome prediction groups: good (prehospital ROSC and initial shockable rhythm), moderately good (prehospital ROSC and initial nonshockable rhythm), poor (prehospital non-ROSC and witnessed arrest) and very poor (prehospital non-ROSC and unwitnessed arrest). By using this model, we identified patient groups with 1-month CPC 1 and 2 probabilities ranging from 0.2% to 66.2%. The validated decision tree prediction model demonstrated a sensitivity of 69.7% (95% confidence interval (CI) = 58.7% to 78.9%), a specificity of 95.2% (95% CI = 94.1% to 96.2%) and an area under the receiver operating characteristic curve of 0.88 (95% CI = 0.87 to 0.90) for predicting 1-month
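
    The resulting tree is simple enough to state as plain rules; the sketch below encodes the four-group stratification, with the group labels taken from the abstract:

      # The four outcome-prediction groups from the abstract as plain rules.
      def ohca_outcome_group(prehospital_rosc: bool,
                             initial_shockable_rhythm: bool,
                             witnessed_arrest: bool) -> str:
          """Stratify a paediatric OHCA patient at hospital arrival."""
          if prehospital_rosc:
              return "good" if initial_shockable_rhythm else "moderately good"
          return "poor" if witnessed_arrest else "very poor"

      # Example: no prehospital ROSC and an unwitnessed arrest.
      print(ohca_outcome_group(False, False, False))   # -> "very poor"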

  15. Can Religious Beliefs be a Protective Factor for Suicidal Behavior? A Decision Tree Analysis in a Mid-Sized City in Iran, 2013.

    PubMed

    Baneshi, Mohammad Reza; Haghdoost, Ali Akbar; Zolala, Farzaneh; Nakhaee, Nouzar; Jalali, Maryam; Tabrizi, Reza; Akbari, Maryam

    2017-04-01

    This study aimed to assess, using tree-based models, the impact of different dimensions of religiosity and other risk factors on suicide attempts in the Islamic Republic of Iran. Three hundred patients who had attempted suicide and 300 age- and sex-matched attendants of patients with other types of disease who were referred to Kerman Afzalipour Hospital were recruited following convenience sampling. Religiosity was assessed by the Duke University Religion Index. A tree-based model was constructed using the Gini index as the homogeneity criterion, and a complementary discrimination analysis was also applied. The variables contributing to the construction of the tree were stressful life events, mental disorder, family support, and religious belief. Strong religious belief was a protective factor for those with a low number of stressful life events and those with a high mental disorder score; 72% of those who formed these two groups had not attempted suicide. Moreover, 63% of those with a high number of stressful life events, strong family support, strong problem-solving skills, and a low mental disorder score were less likely to attempt suicide. The significance of four other variables (GHQ, problem-coping skills, friend support, and neuroticism) was revealed in the discrimination analysis. Religious beliefs seem to be an independent factor that can predict the risk of suicidal behavior. Based on the decision tree, religious belief among people with a high number of stressful life events might not be a dissuading factor; such subjects need more family support and problem-solving skills.
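
    For a candidate split, the Gini criterion used here scores each child node by the impurity 1 - sum(p_k^2) over class proportions p_k, weighted by child size; a minimal sketch:

      # Gini impurity of a label set and the weighted Gini of a binary split.
      from collections import Counter

      def gini(labels):
          n = len(labels)
          return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

      def split_gini(left, right):
          n = len(left) + len(right)
          return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

      # Toy labels: 1 = attempted suicide, 0 = did not.
      parent = [1, 1, 1, 0, 0, 0, 0, 0]
      left, right = [1, 1, 1, 0], [0, 0, 0, 0]      # one candidate split
      print(gini(parent), "->", split_gini(left, right))  # lower = purer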

  16. The Reliability of Classification of Terminal Nodes in GUIDE Decision Tree to Predict the Nonalcoholic Fatty Liver Disease.

    PubMed

    Birjandi, Mehdi; Ayatollahi, Seyyed Mohammad Taghi; Pourahmad, Saeedeh

    2016-01-01

    Tree-structured modeling is a data mining technique used to recursively partition a dataset into relatively homogeneous subgroups in order to make more accurate predictions on the generated classes. One of the classification tree induction algorithms, GUIDE, is a nonparametric method with suitable accuracy and low selection bias that is used for predicting binary classes from many predictors. In this tree, evaluating the accuracy of the predicted classes (terminal nodes) is of special clinical importance. For this purpose, we used the GUIDE classification tree under two settings, equal and unequal misclassification costs, to predict nonalcoholic fatty liver disease (NAFLD) from 30 predictors. Then, to evaluate the accuracy of the predicted classes using the bootstrap method, we considered first the classification reliability, with which individuals are assigned to a unique class, and then the prediction probability reliability that supports it.

  17. The Reliability of Classification of Terminal Nodes in GUIDE Decision Tree to Predict the Nonalcoholic Fatty Liver Disease

    PubMed Central

    Pourahmad, Saeedeh

    2016-01-01

    Tree-structured modeling is a data mining technique used to recursively partition a dataset into relatively homogeneous subgroups in order to make more accurate predictions on the generated classes. One of the classification tree induction algorithms, GUIDE, is a nonparametric method with suitable accuracy and low selection bias that is used for predicting binary classes from many predictors. In this tree, evaluating the accuracy of the predicted classes (terminal nodes) is of special clinical importance. For this purpose, we used the GUIDE classification tree under two settings, equal and unequal misclassification costs, to predict nonalcoholic fatty liver disease (NAFLD) from 30 predictors. Then, to evaluate the accuracy of the predicted classes using the bootstrap method, we considered first the classification reliability, with which individuals are assigned to a unique class, and then the prediction probability reliability that supports it. PMID:28053651

  18. Class Evolution Tree: a graphical tool to support decisions on the number of classes in exploratory categorical latent variable modeling for rehabilitation research.

    PubMed

    Kriston, Levente; Melchior, Hanne; Hergert, Anika; Bergelt, Corinna; Watzke, Birgit; Schulz, Holger; von Wolff, Alessa

    2011-06-01

    The aim of our study was to develop a graphical tool that can be used in addition to standard statistical criteria to support decisions on the number of classes in explorative categorical latent variable modeling for rehabilitation research. Data from two rehabilitation research projects were used. In the first study, a latent profile analysis was carried out in patients with cancer receiving an inpatient rehabilitation program to identify prototypical combinations of treatment elements. In the second study, growth mixture modeling was used to identify latent trajectory classes based on weekly symptom severity measurements during inpatient treatment of patients with mental disorders. A graphical tool, the Class Evolution Tree, was developed, and its central components were described. The Class Evolution Tree can be used in addition to statistical criteria to systematically address the issue of number of classes in explorative categorical latent variable modeling.

  19. An ensemble classification-based approach applied to retinal blood vessel segmentation.

    PubMed

    Fraz, Muhammad Moazam; Remagnino, Paolo; Hoppe, Andreas; Uyyanonvara, Bunyarit; Rudnicka, Alicja R; Owen, Christopher G; Barman, Sarah A

    2012-09-01

    This paper presents a new supervised method for segmentation of blood vessels in retinal photographs. The method uses an ensemble system of bagged and boosted decision trees and utilizes a feature vector based on the orientation analysis of the gradient vector field, morphological transformation, line strength measures, and Gabor filter responses. The feature vector encodes information to handle both healthy and pathological retinal images. The method is evaluated on the publicly available DRIVE and STARE databases, frequently used for this purpose, and also on a new public retinal vessel reference dataset, CHASE_DB1, which is a subset of retinal images of multiethnic children from the Child Heart and Health Study in England (CHASE) dataset. The performance of the ensemble system is evaluated in detail, and its accuracy, speed, robustness, and simplicity make the algorithm a suitable tool for automated retinal image analysis.
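
    A compact sketch of an ensemble mixing bagged and boosted trees, assuming scikit-learn and a synthetic stand-in for the pixel-wise feature vectors; soft voting over the two sub-ensembles is one simple way to combine them:

      # Soft-voting ensemble of bagged and boosted decision trees over
      # synthetic pixel features (vessel vs background).
      from sklearn.datasets import make_classification
      from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                                    VotingClassifier)
      from sklearn.model_selection import train_test_split
      from sklearn.tree import DecisionTreeClassifier

      X, y = make_classification(n_samples=10000, n_features=9,
                                 n_informative=6, weights=[0.88, 0.12],
                                 random_state=0)   # ~12% vessel pixels
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      ensemble = VotingClassifier(
          estimators=[
              ("bagged", BaggingClassifier(DecisionTreeClassifier(),
                                           n_estimators=50, random_state=0)),
              ("boosted", AdaBoostClassifier(DecisionTreeClassifier(max_depth=2),
                                             n_estimators=50, random_state=0)),
          ],
          voting="soft",   # average the class probabilities of both ensembles
      ).fit(X_tr, y_tr)
      print("pixel classification accuracy:", ensemble.score(X_te, y_te))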

  20. Application of Decision Tree to Obtain Optimal Operation Rules for Reservoir Flood Control Considering Sediment Desilting-Case Study of Tseng Wen Reservoir

    NASA Astrophysics Data System (ADS)

    ShiouWei, L.

    2014-12-01

    Reservoirs are the most important water resources facilities in Taiwan. However, due to the steep slopes and fragile geological conditions in the mountain areas, storm events usually cause serious debris flows and floods, and the floods then flush large amounts of sediment into reservoirs. The sedimentation caused by floods has a great impact on reservoir life. Hence, how to operate a reservoir during flood events to increase the efficiency of sediment desilting without risking reservoir safety or impacting the water supply afterwards is a crucial issue in Taiwan. Therefore, this study developed a novel optimization planning model for reservoir flood operation considering flood control and sediment desilting, and proposed easy-to-use operating rules represented by decision trees. The decision tree rules consider flood mitigation, water supply and sediment desilting. The optimal planning model computes the optimal reservoir release for each flood event that minimizes the water supply impact and maximizes sediment desilting without risking reservoir safety. Besides the optimal flood operation planning model, this study also proposed decision tree-based flood operating rules trained on the multiple optimal reservoir releases for synthetic flood scenarios. The synthetic flood scenarios consist of various synthetic storm events, the reservoir's initial storage, and target storages at the end of the flood operation. Comparing the results of the decision tree operation rules (DTOR) with those of the historical operation for Typhoon Krosa in 2007, the DTOR removed 15.4% more sediment than the historical operation, with a reservoir storage only 8.38 × 10^6 m^3 less. For Typhoon Jangmi in 2008, the DTOR removed 24.4% more sediment than the historical operation, with a reservoir storage only 7.58 × 10^6 m^3 less. The results show that the proposed DTOR model can increase the sediment desilting efficiency and extend the

  1. Segregating the Effects of Seed Traits and Common Ancestry of Hardwood Trees on Eastern Gray Squirrel Foraging Decisions

    PubMed Central

    Sundaram, Mekala; Willoughby, Janna R.; Lichti, Nathanael I.; Steele, Michael A.; Swihart, Robert K.

    2015-01-01

    The evolution of specific seed traits in scatter-hoarded tree species often has been attributed to granivore foraging behavior. However, the degree to which foraging investments and seed traits correlate with phylogenetic relationships among trees remains unexplored. We presented seeds of 23 different hardwood tree species (families Betulaceae, Fagaceae, Juglandaceae) to eastern gray squirrels (Sciurus carolinensis), and measured the time and distance travelled by squirrels that consumed or cached each seed. We estimated 11 physical and chemical seed traits for each species, and the phylogenetic relationships between the 23 hardwood trees. Variance partitioning revealed that considerable variation in foraging investment was attributable to seed traits alone (27–73%), and combined effects of seed traits and phylogeny of hardwood trees (5–55%). A phylogenetic PCA (pPCA) on seed traits and tree phylogeny resulted in 2 “global” axes of traits that were phylogenetically autocorrelated at the family and genus level and a third “local” axis in which traits were not phylogenetically autocorrelated. Collectively, these axes explained 30–76% of the variation in squirrel foraging investments. The first global pPCA axis, which produced large scores for seed species with thin shells, low lipid and high carbohydrate content, was negatively related to time to consume and cache seeds and travel distance to cache. The second global pPCA axis, which produced large scores for seeds with high protein, low tannin and low dormancy levels, was an important predictor of consumption time only. The local pPCA axis primarily reflected kernel mass. Although it explained only 12% of the variation in trait space and was not autocorrelated among phylogenetic clades, the local axis was related to all four squirrel foraging investments. Squirrel foraging behaviors are influenced by a combination of phylogenetically conserved and more evolutionarily labile seed traits that is

  2. Segregating the Effects of Seed Traits and Common Ancestry of Hardwood Trees on Eastern Gray Squirrel Foraging Decisions.

    PubMed

    Sundaram, Mekala; Willoughby, Janna R; Lichti, Nathanael I; Steele, Michael A; Swihart, Robert K

    2015-01-01

    The evolution of specific seed traits in scatter-hoarded tree species often has been attributed to granivore foraging behavior. However, the degree to which foraging investments and seed traits correlate with phylogenetic relationships among trees remains unexplored. We presented seeds of 23 different hardwood tree species (families Betulaceae, Fagaceae, Juglandaceae) to eastern gray squirrels (Sciurus carolinensis), and measured the time and distance travelled by squirrels that consumed or cached each seed. We estimated 11 physical and chemical seed traits for each species, and the phylogenetic relationships between the 23 hardwood trees. Variance partitioning revealed that considerable variation in foraging investment was attributable to seed traits alone (27-73%), and combined effects of seed traits and phylogeny of hardwood trees (5-55%). A phylogenetic PCA (pPCA) on seed traits and tree phylogeny resulted in 2 "global" axes of traits that were phylogenetically autocorrelated at the family and genus level and a third "local" axis in which traits were not phylogenetically autocorrelated. Collectively, these axes explained 30-76% of the variation in squirrel foraging investments. The first global pPCA axis, which produced large scores for seed species with thin shells, low lipid and high carbohydrate content, was negatively related to time to consume and cache seeds and travel distance to cache. The second global pPCA axis, which produced large scores for seeds with high protein, low tannin and low dormancy levels, was an important predictor of consumption time only. The local pPCA axis primarily reflected kernel mass. Although it explained only 12% of the variation in trait space and was not autocorrelated among phylogenetic clades, the local axis was related to all four squirrel foraging investments. Squirrel foraging behaviors are influenced by a combination of phylogenetically conserved and more evolutionarily labile seed traits that is consistent with a weak

  3. Exploring ensemble visualization

    NASA Astrophysics Data System (ADS)

    Phadke, Madhura N.; Pinto, Lifford; Alabi, Oluwafemi; Harter, Jonathan; Taylor, Russell M., II; Wu, Xunlei; Petersen, Hannah; Bass, Steffen A.; Healey, Christopher G.

    2012-01-01

    An ensemble is a collection of related datasets. Each dataset, or member, of an ensemble is normally large, multidimensional, and spatio-temporal. Ensembles are used extensively by scientists and mathematicians, for example, by executing a simulation repeatedly with slightly different input parameters and saving the results in an ensemble to see how parameter choices affect the simulation. To draw inferences from an ensemble, scientists need to compare data both within and between ensemble members. We propose two techniques to support ensemble exploration and comparison: a pairwise sequential animation method that visualizes locally neighboring members simultaneously, and a screen door tinting method that visualizes subsets of members using screen space subdivision. We demonstrate the capabilities of both techniques, first using synthetic data, then with simulation data of heavy ion collisions in high-energy physics. Results show that both techniques are capable of supporting meaningful comparisons of ensemble data.

  4. Exploring Ensemble Visualization

    PubMed Central

    Phadke, Madhura N.; Pinto, Lifford; Alabi, Femi; Harter, Jonathan; Taylor, Russell M.; Wu, Xunlei; Petersen, Hannah; Bass, Steffen A.; Healey, Christopher G.

    2012-01-01

    An ensemble is a collection of related datasets. Each dataset, or member, of an ensemble is normally large, multidimensional, and spatio-temporal. Ensembles are used extensively by scientists and mathematicians, for example, by executing a simulation repeatedly with slightly different input parameters and saving the results in an ensemble to see how parameter choices affect the simulation. To draw inferences from an ensemble, scientists need to compare data both within and between ensemble members. We propose two techniques to support ensemble exploration and comparison: a pairwise sequential animation method that visualizes locally neighboring members simultaneously, and a screen door tinting method that visualizes subsets of members using screen space subdivision. We demonstrate the capabilities of both techniques, first using synthetic data, then with simulation data of heavy ion collisions in high-energy physics. Results show that both techniques are capable of supporting meaningful comparisons of ensemble data. PMID:22347540

  5. World Music Ensemble: Kulintang

    ERIC Educational Resources Information Center

    Beegle, Amy C.

    2012-01-01

    As instrumental world music ensembles such as steel pan, mariachi, gamelan and West African drums are becoming more the norm than the exception in North American school music programs, there are other world music ensembles just starting to gain popularity in particular parts of the United States. The kulintang ensemble, a drum and gong ensemble…

  6. Improvement of the identification of four heavy metals in environmental samples by using predictive decision tree models coupled with a set of five bioluminescent bacteria.

    PubMed

    Jouanneau, Sulivan; Durand, Marie-José; Courcoux, Philippe; Blusseau, Thomas; Thouand, Gérald

    2011-04-01

    A primary statistical model based on the crossings between the different detection ranges of a set of five bioluminescent bacterial strains was developed to identify and quantify four metals present at several concentrations in different mixtures: cadmium, arsenic(III), mercury, and copper. Four specific decision trees based on the CHAID algorithm (CHi-squared Automatic Interaction Detector type), which compose this model, were designed from a database of 576 experiments (192 different mixture conditions). A dedicated software package, 'Metalsoft', helped us choose the best decision tree and provided a user-friendly way to identify the metals. To validate this innovative approach, 18 environmental samples containing a mixture of these metals were submitted to a bioassay and to standardized chemical methods. The results show on average a high correlation of 98.6% for the qualitative metal identification and 94.2% for the quantification. The results are particularly encouraging, and our model is able to provide semiquantitative information after only 60 min without sample pretreatment.
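
    CHAID grows a tree by repeatedly splitting on the predictor most strongly associated with the class under a chi-squared test. The sketch below illustrates only that split-selection step, on hypothetical bacterial-strain response data; it is not the authors' Metalsoft implementation.

    ```python
    # Sketch of CHAID's core split-selection step: pick the categorical
    # predictor whose association with the class is most significant by a
    # chi-squared test. Data and strain names are hypothetical.
    import pandas as pd
    from scipy.stats import chi2_contingency

    df = pd.DataFrame({
        "strain_A": ["on", "on", "off", "off", "on", "off"],
        "strain_B": ["off", "on", "on", "off", "on", "on"],
        "metal":    ["Cd", "Cd", "Hg", "Cu", "Cd", "Hg"],
    })

    def best_chaid_split(df, predictors, target):
        """Return (predictor, p_value) with the smallest chi-squared p-value."""
        p_values = {}
        for col in predictors:
            table = pd.crosstab(df[col], df[target])   # contingency table
            chi2, p, dof, _ = chi2_contingency(table)
            p_values[col] = p
        best = min(p_values, key=p_values.get)
        return best, p_values[best]

    print(best_chaid_split(df, ["strain_A", "strain_B"], "metal"))
    ```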

  7. Predicting skin sensitisation using a decision tree integrated testing strategy with an in silico model and in chemico/in vitro assays.

    PubMed

    Macmillan, Donna S; Canipa, Steven J; Chilton, Martyn L; Williams, Richard V; Barber, Christopher G

    2016-04-01

    There is a pressing need for non-animal methods to predict skin sensitisation potential and a number of in chemico and in vitro assays have been designed with this in mind. However, some compounds can fall outside the applicability domain of these in chemico/in vitro assays and may not be predicted accurately. Rule-based in silico models such as Derek Nexus are expert-derived from animal and/or human data and the mechanism-based alert domain can take a number of factors into account (e.g. abiotic/biotic activation). Therefore, Derek Nexus may be able to predict for compounds outside the applicability domain of in chemico/in vitro assays. To this end, an integrated testing strategy (ITS) decision tree using Derek Nexus and a maximum of two assays (from DPRA, KeratinoSens, LuSens, h-CLAT and U-SENS) was developed. Generally, the decision tree improved upon other ITS evaluated in this study with positive and negative predictivity calculated as 86% and 81%, respectively. Our results demonstrate that an ITS using an in silico model such as Derek Nexus with a maximum of two in chemico/in vitro assays can predict the sensitising potential of a number of chemicals, including those outside the applicability domain of existing non-animal assays.
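
    For reference, the positive and negative predictivity figures quoted above correspond to the usual confusion-matrix definitions:

    ```latex
    % Positive and negative predictivity from confusion-matrix counts
    % (TP, FP, TN, FN).
    \mathrm{PPV} = \frac{TP}{TP + FP}, \qquad
    \mathrm{NPV} = \frac{TN}{TN + FN}
    ```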

  8. Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036

    ERIC Educational Resources Information Center

    Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.

    2014-01-01

    This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the more difficult to interpret logistic regression model. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…

  9. The Relation of Student Behavior, Peer Status, Race, and Gender to Decisions about School Discipline Using CHAID Decision Trees and Regression Modeling

    ERIC Educational Resources Information Center

    Horner, Stacy B.; Fireman, Gary D.; Wang, Eugene W.

    2010-01-01

    Peer nominations and demographic information were collected from a diverse sample of 1493 elementary school participants to examine behavior (overt and relational aggression, impulsivity, and prosociality), context (peer status), and demographic characteristics (race and gender) as predictors of teacher and administrator decisions about…

  10. A Multi Criteria Group Decision-Making Model for Teacher Evaluation in Higher Education Based on Cloud Model and Decision Tree

    ERIC Educational Resources Information Center

    Chang, Ting-Cheng; Wang, Hui

    2016-01-01

    This paper proposes a cloud multi-criteria group decision-making model for teacher evaluation in higher education, which involves subjectivity, imprecision and fuzziness. First, selecting the appropriate evaluation index depending on the evaluation objectives, indicating a clear structural relationship between the evaluation index and…

  11. Under which conditions, additional monitoring data are worth gathering for improving decision making? Application of the VOI theory in the Bayesian Event Tree eruption forecasting framework

    NASA Astrophysics Data System (ADS)

    Loschetter, Annick; Rohmer, Jérémy

    2016-04-01

    Standard and new-generation monitoring observations provide, in almost real time, important information about the evolution of the volcanic system. These observations are used to update the model and contribute to a better hazard assessment and to support decision making concerning potential evacuation. The framework BET_EF (based on Bayesian Event Tree) developed by INGV enables dealing with the integration of information from monitoring with the prospect of decision making. Using this framework, the objectives of the present work are i. to propose a method to assess the added value of information (within the Value Of Information (VOI) theory) from monitoring; ii. to perform sensitivity analysis on the different parameters that influence the VOI from monitoring. VOI consists in assessing the possible increase in expected value provided by gathering information, for instance through monitoring. Basically, the VOI is the difference between the value with information and the value without additional information in a Cost-Benefit approach. This theory is well suited to deal with situations that can be represented in the form of a decision tree such as the BET_EF tool. Reference values and ranges of variation (for sensitivity analysis) were defined for input parameters, based on data from the MESIMEX exercise (performed at Vesuvio volcano in 2006). Complementary methods for sensitivity analyses were implemented: local, global using Sobol' indices and regional using Contribution to Sample Mean and Variance plots. The results (specific to the case considered) obtained with the different techniques are in good agreement and enable answering the following questions: i. Which characteristics of monitoring are important for early warning (reliability)? ii. How do experts' opinions influence the hazard assessment and thus the decision? Concerning the characteristics of monitoring, the most influential parameters are the means rather than the variances for the case considered
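
    The cost-benefit calculation behind VOI can be made concrete with a toy decision problem. The Python sketch below compares the expected cost of the best action with and without an imperfect monitoring signal; all probabilities and costs are hypothetical illustrations, not values from BET_EF or the MESIMEX exercise.

    ```python
    # Toy Value of Information (VOI) calculation: expected cost of the best
    # action without monitoring, minus expected cost when acting on an
    # imperfect alarm. All numbers are invented for illustration.
    p_eruption = 0.10   # prior probability of eruption
    cost_evac  = 1.0    # cost of evacuating (incurred either way)
    loss_stay  = 10.0   # loss if we stay and an eruption occurs

    # Without monitoring: choose the cheaper action in expectation.
    cost_without = min(cost_evac, p_eruption * loss_stay)

    # With an imperfect monitor (hypothetical reliability).
    sens, fpr = 0.9, 0.2                       # P(alarm|eruption), P(alarm|quiet)
    p_alarm = sens * p_eruption + fpr * (1 - p_eruption)
    p_erupt_given_alarm = sens * p_eruption / p_alarm
    p_erupt_given_quiet = (1 - sens) * p_eruption / (1 - p_alarm)

    # Act optimally in each branch, then average over branches.
    cost_with = (p_alarm * min(cost_evac, p_erupt_given_alarm * loss_stay)
                 + (1 - p_alarm) * min(cost_evac, p_erupt_given_quiet * loss_stay))

    print("VOI =", cost_without - cost_with)   # expected gain from monitoring
    ```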

  12. The Hydrologic Ensemble Prediction Experiment (HEPEX)

    NASA Astrophysics Data System (ADS)

    Wood, A. W.; Thielen, J.; Pappenberger, F.; Schaake, J. C.; Hartman, R. K.

    2012-12-01

    The Hydrologic Ensemble Prediction Experiment was established in March 2004, at a workshop hosted by the European Centre for Medium-Range Weather Forecasts (ECMWF). With support from the US National Weather Service (NWS) and the European Commission (EC), the HEPEX goal was to bring the international hydrological and meteorological communities together to advance the understanding and adoption of hydrological ensemble forecasts for decision support in emergency management and water resources sectors. The strategy to meet this goal includes meetings that connect the user, forecast producer and research communities to exchange ideas, data and methods; the coordination of experiments to address specific challenges; and the formation of testbeds to facilitate shared experimentation. HEPEX has organized about a dozen international workshops, as well as sessions at scientific meetings (including AMS, AGU and EGU) and special issues of scientific journals where workshop results have been published. Today, the HEPEX mission is to demonstrate the added value of hydrological ensemble prediction systems (HEPS) for emergency management and water resources sectors to make decisions that have important consequences for economy, public health, safety, and the environment. HEPEX is now organised around six major themes that represent core elements of a hydrologic ensemble prediction enterprise: input and pre-processing, ensemble techniques, data assimilation, post-processing, verification, and communication and use in decision making. This poster presents an overview of recent and planned HEPEX activities, highlighting case studies that exemplify the focus and objectives of HEPEX.

  13. Subspace ensembles for classification

    NASA Astrophysics Data System (ADS)

    Sun, Shiliang; Zhang, Changshui

    2007-11-01

    Ensemble learning constitutes one of the principal current directions in machine learning and data mining. In this paper, we explore subspace ensembles for classification by manipulating different feature subspaces. Starting from the nature of ensemble efficacy, we examine the fine-grained meaning of ensemble diversity, and propose to use region partitioning and region weighting to implement effective subspace ensembles. Individual classifiers that perform well on a partitioned region, as reflected by high neighborhood accuracies, are deemed to contribute strongly to that region, and are assigned large weights in determining the labels of instances in this area. A robust algorithm, “Sena”, that embodies this mechanism is presented; it is insensitive to the number of nearest neighbors chosen to calculate neighborhood accuracies. The algorithm exhibits improved performance over the well-known ensembles of bagging, AdaBoost and random subspace. Its effectiveness with varying base classifiers is also investigated.

  14. Human Activity Recognition from Smart-Phone Sensor Data using a Multi-Class Ensemble Learning in Home Monitoring.

    PubMed

    Ghose, Soumya; Mitra, Jhimli; Karunanithi, Mohan; Dowling, Jason

    2015-01-01

    Home monitoring of chronically ill or elderly patients can reduce frequent hospitalisations, providing improved quality of care at reduced cost to the community and lessening the burden on the healthcare system. Activity recognition of such patients is of high importance in such a design. In this work, a system for automatic human physical activity recognition from smart-phone inertial sensor data is proposed. An ensemble of decision trees framework is adopted to train and predict the multi-class human activity system. A comparison of our proposed method with a traditional multi-class support vector machine shows significant improvement in activity recognition accuracies.
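
    A minimal sketch of this kind of comparison, assuming scikit-learn and using synthetic features as a stand-in for the smart-phone inertial data:

    ```python
    # Compare an ensemble of decision trees with a multi-class SVM, in the
    # spirit of the abstract above. Synthetic features replace the real
    # windowed sensor data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=600, n_features=20, n_informative=10,
                               n_classes=4, n_clusters_per_class=1,
                               random_state=0)

    tree_ensemble = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                      random_state=0)
    svm = SVC(kernel="rbf", gamma="scale")

    for name, clf in [("tree ensemble", tree_ensemble), ("SVM", svm)]:
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"{name}: mean accuracy = {scores.mean():.3f}")
    ```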

  15. Identification of Some Zeolite Group Minerals by Application of Artificial Neural Network and Decision Tree Algorithm Based on SEM-EDS Data

    NASA Astrophysics Data System (ADS)

    Akkaş, Efe; Evren Çubukçu, H.; Akin, Lutfiye; Erkut, Volkan; Yurdakul, Yasin; Karayigit, Ali Ihsan

    2016-04-01

    Identification of zeolite group minerals is complicated due to their similar chemical formulas and habits. Although the morphologies of various zeolite crystals can be recognized under a Scanning Electron Microscope (SEM), it is a relatively more challenging and problematic process to identify zeolites using their mineral chemical data. SEMs integrated with energy dispersive X-ray spectrometers (EDS) provide fast and reliable chemical data of minerals. However, considering elemental similarities of characteristic chemical formulae of zeolite species (e.g. Clinoptilolite ((Na,K,Ca)2-3Al3(Al,Si)2Si13O36·12H2O) and Erionite ((Na2,K2,Ca)2Al4Si14O36·15H2O)), EDS data alone does not seem to be sufficient for correct identification. Furthermore, the physical properties of the specimen (e.g. roughness, electrical conductivity) and the applied analytical conditions (e.g. accelerating voltage, beam current, spot size) of the SEM-EDS should be uniform in order to obtain reliable elemental results of minerals having high alkali (Na, K) and H2O (approx. 14-18%) contents. This study, which was funded by The Scientific and Technological Research Council of Turkey (TUBITAK Project No: 113Y439), aims to construct a database as large as possible for various zeolite minerals and to develop a general prediction model for the identification of zeolite minerals using SEM-EDS data. For this purpose, an artificial neural network and a rule-based decision tree algorithm were employed. Throughout the analyses, a total of 1850 chemical data were collected from four distinct zeolite species (Clinoptilolite-Heulandite, Erionite, Analcime and Mordenite) observed in various rocks (e.g. coals, pyroclastics). In order to obtain a representative training data set for each mineral, a selection procedure for reference mineral analyses was applied. During the selection procedure, SEM-based crystal morphology data, XRD spectra and re-calculated cationic distributions obtained by EDS have been used for the

  16. Fast decision tree-based method to index large DNA-protein sequence databases using hybrid distributed-shared memory programming model.

    PubMed

    Jaber, Khalid Mohammad; Abdullah, Rosni; Rashid, Nur'Aini Abdul

    2014-01-01

    In recent times, the size of biological databases has increased significantly, with continuous growth in the number of users and the rate of queries, such that some databases have reached terabyte size. There is, therefore, an increasing need to access databases at the fastest rates possible. In this paper, the decision tree indexing model was parallelised (PDTIM), using a hybrid of distributed and shared memory on a resident database, with horizontal and vertical growth through the Message Passing Interface (MPI) and POSIX Threads (PThreads), to accelerate index building time. The PDTIM was implemented using 1, 2, 4 and 5 processors on 1, 2, 3 and 4 threads, respectively. The results show that the hybrid technique improved the speedup compared to a sequential version. It can be concluded from the results that the proposed PDTIM is appropriate for large data sets in terms of index building time.

  17. A comparison of the decision tree approach and the neural-networks-based heuristic dynamic programming approach for subcircuit extraction problem

    NASA Astrophysics Data System (ADS)

    Zhang, Nian; Wunsch, Donald C., II

    2003-08-01

    Applications of non-standard logic devices are increasing rapidly in industry. Many of these applications require high speed, low power, functionality and flexibility, which cannot be obtained with standard logic devices. These special logic cells can be constructed by the topology design strategy, automatically or manually. However, this raises the need for topology design verification. Layout versus schematic (LVS) analysis is an essential part of topology design verification, and subcircuit extraction is one of the operations in LVS testing. In this paper, we first provide an efficient decision tree approach to the graph isomorphism problem, and then apply it effectively to the subcircuit extraction problem based on the solution to the graph isomorphism problem. To evaluate its performance, we compare it with the neural-network-based heuristic dynamic programming algorithm (SubHDP), which is one of the fastest algorithms for the subcircuit extraction problem.

  18. Sediment source fingerprinting as an aid to catchment management: A review of the current state of knowledge and a methodological decision-tree for end-users.

    PubMed

    Collins, A L; Pulley, S; Foster, I D L; Gellis, A; Porto, P; Horowitz, A J

    2016-10-12

    The growing awareness of the environmental significance of fine-grained sediment fluxes through catchment systems continues to underscore the need for reliable information on the principal sources of this material. Source estimates are difficult to obtain using traditional monitoring techniques, but sediment source fingerprinting or tracing procedures have emerged as a potentially valuable alternative. Despite the rapidly increasing numbers of studies reporting the use of sediment source fingerprinting, several key challenges and uncertainties continue to hamper consensus among the international scientific community on key components of the existing methodological procedures. Accordingly, this contribution reviews and presents recent developments for several key aspects of fingerprinting, namely: sediment source classification, catchment source and target sediment sampling, tracer selection, grain size issues, tracer conservatism, source apportionment modelling, and assessment of source predictions using artificial mixtures. Finally, a decision-tree representing the current state of knowledge is presented to guide end-users in applying the fingerprinting approach.

  19. The Ensemble Canon

    NASA Technical Reports Server (NTRS)

    MIittman, David S

    2011-01-01

    Ensemble is an open architecture for the development, integration, and deployment of mission operations software. Fundamentally, it is an adaptation of the Eclipse Rich Client Platform (RCP), a widespread, stable, and supported framework for component-based application development. By capitalizing on the maturity and availability of the Eclipse RCP, Ensemble offers a low-risk, politically neutral path towards a tighter integration of operations tools. The Ensemble project is a highly successful, ongoing collaboration among NASA Centers. Since 2004, the Ensemble project has supported the development of mission operations software for NASA's Exploration Systems, Science, and Space Operations Directorates.

  20. Hydrological Ensemble Prediction System (HEPS)

    NASA Astrophysics Data System (ADS)

    Thielen-Del Pozo, J.; Schaake, J.; Martin, E.; Pailleux, J.; Pappenberger, F.

    2010-09-01

    Flood forecasting systems form a key part of 'preparedness' strategies for disastrous floods and provide hydrological services, civil protection authorities and the public with information on upcoming events. Provided the warning leadtime is sufficiently long, adequate preparatory actions can be taken to efficiently reduce the impacts of the flooding. Following on the success of the use of ensembles for weather forecasting, the hydrological community now moves increasingly towards Hydrological Ensemble Prediction Systems (HEPS) for improved flood forecasting using operationally available NWP products as inputs. However, these products are often generated on relatively coarse scales compared to hydrologically relevant basin units and suffer systematic biases that may have considerable impact when passed through the non-linear hydrological filters. Therefore, a better understanding of how best to produce, communicate and use hydrologic ensemble forecasts in hydrological short-, medium- and long-term prediction of hydrological processes is necessary. The "Hydrologic Ensemble Prediction Experiment" (HEPEX) is an international initiative consisting of hydrologists, meteorologists and end-users to advance probabilistic hydrologic forecast techniques for flood, drought and water management applications. Different aspects of the hydrological ensemble processor are being addressed, including:
    • Production of useful meteorological products relevant for hydrological applications, ranging from nowcasting products to seasonal forecasts. The importance of hindcasts that are consistent with the operational weather forecasts will be discussed to support bias correction and downscaling, statistically meaningful verification of HEPS, and the development and testing of operating rules;
    • Need for downscaling and post-processing of weather ensembles to reduce bias before entering hydrological applications;
    • Hydrological model and parameter uncertainty and how to correct and

  1. Ensemble habitat mapping of invasive plant species.

    PubMed

    Stohlgren, Thomas J; Ma, Peter; Kumar, Sunil; Rocca, Monique; Morisette, Jeffrey T; Jarnevich, Catherine S; Benson, Nate

    2010-02-01

    Ensemble species distribution models combine the strengths of several species environmental matching models, while minimizing the weakness of any one model. Ensemble models may be particularly useful in risk analysis of recently arrived, harmful invasive species because species may not yet have spread to all suitable habitats, leaving species-environment relationships difficult to determine. We tested five individual models (logistic regression, boosted regression trees, random forest, multivariate adaptive regression splines (MARS), and maximum entropy model or Maxent) and ensemble modeling for selected nonnative plant species in Yellowstone and Grand Teton National Parks, Wyoming; Sequoia and Kings Canyon National Parks, California, and areas of interior Alaska. The models are based on field data provided by the park staffs, combined with topographic, climatic, and vegetation predictors derived from satellite data. For the four invasive plant species tested, ensemble models were the only models that ranked in the top three models for both field validation and test data. Ensemble models may be more robust than individual species-environment matching models for risk analysis.

  2. Ensemble habitat mapping of invasive plant species

    USGS Publications Warehouse

    Stohlgren, T.J.; Ma, P.; Kumar, S.; Rocca, M.; Morisette, J.T.; Jarnevich, C.S.; Benson, N.

    2010-01-01

    Ensemble species distribution models combine the strengths of several species environmental matching models, while minimizing the weakness of any one model. Ensemble models may be particularly useful in risk analysis of recently arrived, harmful invasive species because species may not yet have spread to all suitable habitats, leaving species-environment relationships difficult to determine. We tested five individual models (logistic regression, boosted regression trees, random forest, multivariate adaptive regression splines (MARS), and maximum entropy model or Maxent) and ensemble modeling for selected nonnative plant species in Yellowstone and Grand Teton National Parks, Wyoming; Sequoia and Kings Canyon National Parks, California, and areas of interior Alaska. The models are based on field data provided by the park staffs, combined with topographic, climatic, and vegetation predictors derived from satellite data. For the four invasive plant species tested, ensemble models were the only models that ranked in the top three models for both field validation and test data. Ensemble models may be more robust than individual species-environment matching models for risk analysis. © 2010 Society for Risk Analysis.

  3. Using Decision Tree Analysis to Understand Foundation Science Student Performance. Insight Gained at One South African University

    NASA Astrophysics Data System (ADS)

    Kirby, Nicola Frances; Dempster, Edith Roslyn

    2014-11-01

    The Foundation Programme of the Centre for Science Access at the University of KwaZulu-Natal, South Africa provides access to tertiary science studies to educationally disadvantaged students who do not meet formal faculty entrance requirements. The low number of students proceeding from the programme into mainstream is of concern, particularly given the national imperative to increase participation and levels of performance in tertiary-level science. An attempt was made to understand foundation student performance on one campus of this university, with a view to identifying challenges and opportunities for remediation in the curriculum and processes of selection into the programme. A classification and regression tree analysis was used to identify which variables best described student performance. The explanatory variables included biographical and school-history data, performance in selection tests, and socio-economic data pertaining to the students' year in the programme. The results illustrate the prognostic reliability of the model used to select students, raise concerns about the inefficiency of school performance indicators as a measure of students' academic potential in the Foundation Programme, and highlight the importance of accommodation arrangements and financial support for student success in their access year.

  4. Detection of chewing from piezoelectric film sensor signals using ensemble classifiers.

    PubMed

    Farooq, Muhammad; Sazonov, Edward

    2016-08-01

    Selection and use of pattern recognition algorithms is application dependent. In this work, we explored the use of several ensembles of weak classifiers to classify signals captured from a wearable sensor system to detect food intake based on chewing. Three sensor signals (piezoelectric sensor, accelerometer, and hand-to-mouth gesture) were collected from 12 subjects in free-living conditions for 24 hrs. Sensor signals were divided into 10-second epochs, and for each epoch a combination of time- and frequency-domain features was computed. In this work, we present a comparison of three different ensemble techniques: boosting (AdaBoost), bootstrap aggregation (bagging) and stacking, each trained with 3 different weak classifiers (Decision Trees, Linear Discriminant Analysis (LDA) and Logistic Regression). The type of feature normalization used can also impact the classification results. For each ensemble method, three feature normalization techniques were tested: no normalization, z-score normalization, and min-max normalization. A 12-fold cross-validation scheme was used to evaluate the performance of each model, where performance was evaluated in terms of precision, recall, and accuracy. The best results achieved here show an improvement of about 4% over our previous algorithms.
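
    The ensemble/normalization grid described above can be sketched with scikit-learn, pairing each ensemble with z-score scaling in a pipeline. The data here are synthetic stand-ins for the chewing features, and the AdaBoost variant uses tree stumps, since not all of the paper's weak learners accept the sample weights boosting requires.

    ```python
    # Compare AdaBoost, bagging and stacking, each behind z-score
    # normalization, on synthetic stand-in data for the chewing features.
    from sklearn.datasets import make_classification
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                                  StackingClassifier)
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, n_features=30,
                               weights=[0.8, 0.2], random_state=0)

    ensembles = {
        "AdaBoost (tree stumps)": AdaBoostClassifier(n_estimators=50,
                                                     random_state=0),
        "Bagging (trees)": BaggingClassifier(DecisionTreeClassifier(),
                                             n_estimators=50, random_state=0),
        "Stacking (DT+LDA+LR)": StackingClassifier(
            estimators=[("dt", DecisionTreeClassifier(max_depth=3)),
                        ("lda", LinearDiscriminantAnalysis()),
                        ("lr", LogisticRegression(max_iter=1000))],
            final_estimator=LogisticRegression(max_iter=1000)),
    }

    for name, ens in ensembles.items():
        model = make_pipeline(StandardScaler(), ens)   # z-score normalization
        scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
        print(f"{name}: {scores.mean():.3f}")
    ```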

  5. Ensembl regulation resources

    PubMed Central

    Zerbino, Daniel R.; Johnson, Nathan; Juetteman, Thomas; Sheppard, Dan; Wilder, Steven P.; Lavidas, Ilias; Nuhn, Michael; Perry, Emily; Raffaillac-Desfosses, Quentin; Sobral, Daniel; Keefe, Damian; Gräf, Stefan; Ahmed, Ikhlak; Kinsella, Rhoda; Pritchard, Bethan; Brent, Simon; Amode, Ridwan; Parker, Anne; Trevanion, Steven; Birney, Ewan; Dunham, Ian; Flicek, Paul

    2016-01-01

    New experimental techniques in epigenomics allow researchers to assay a diversity of highly dynamic features such as histone marks, DNA modifications or chromatin structure. The study of their fluctuations should provide insights into gene expression regulation, cell differentiation and disease. The Ensembl project collects and maintains the Ensembl regulation data resources on epigenetic marks, transcription factor binding and DNA methylation for human and mouse, as well as microarray probe mappings and annotations for a variety of chordate genomes. From this data, we produce a functional annotation of the regulatory elements along the human and mouse genomes with plans to expand to other species as data becomes available. Starting from well-studied cell lines, we will progressively expand our library of measurements to a greater variety of samples. Ensembl’s regulation resources provide a central and easy-to-query repository for reference epigenomes. As with all Ensembl data, it is freely available at http://www.ensembl.org, from the Perl and REST APIs and from the public Ensembl MySQL database server at ensembldb.ensembl.org. Database URL: http://www.ensembl.org PMID:26888907

  6. Land cover and forest formation distributions for St. Kitts, Nevis, St. Eustatius, Grenada and Barbados from decision tree classification of cloud-cleared satellite imagery

    USGS Publications Warehouse

    Helmer, E.H.; Kennaway, T.A.; Pedreros, D.H.; Clark, M.L.; Marcano-Vega, H.; Tieszen, L.L.; Ruzycki, T.R.; Schill, S.R.; Carrington, C.M.S.

    2008-01-01

    Satellite image-based mapping of tropical forests is vital to conservation planning. Standard methods for automated image classification, however, limit classification detail in complex tropical landscapes. In this study, we test an approach to Landsat image interpretation on four islands of the Lesser Antilles, including Grenada and St. Kitts, Nevis and St. Eustatius, applying a more detailed classification than earlier work on the latter three islands. Secondly, we estimate the extents of land cover and protected forest by formation for five islands and ask how land cover has changed over the second half of the 20th century. The image interpretation approach combines image mosaics and ancillary geographic data, classifying the resulting set of raster data with decision tree software. Cloud-free image mosaics for one or two seasons were created by applying regression tree normalization to scene dates that could fill cloudy areas in a base scene. Such mosaics are also known as cloud-filled, cloud-minimized or cloud-cleared imagery, mosaics, or composites. The approach accurately distinguished several classes that more standard methods would confuse; the seamless mosaics aided reference data collection; and the multiseason imagery allowed us to separate drought deciduous forests and woodlands from semi-deciduous ones. Cultivated land areas declined 60 to 100 percent from about 1945 to 2000 on several islands. Meanwhile, forest cover has increased by 50 to 950 percent. This trend will likely continue where sugar cane cultivation has dominated. Like the island of Puerto Rico, most higher-elevation forest formations are protected in formal or informal reserves. Also similarly, lowland forests, which are drier forest types on these islands, are not well represented in reserves. Former cultivated lands in lowland areas could provide lands for new reserves of drier forest types. The land-use history of these islands may provide insight for planners in countries currently considering

  7. Fragmentation of random trees

    NASA Astrophysics Data System (ADS)

    Kalay, Ziya; Ben-Naim, Eli

    2015-03-01

    We investigate the fragmentation of a random recursive tree by repeated removal of nodes, resulting in a forest of disjoint trees. The initial tree is generated by sequentially attaching new nodes to randomly chosen existing nodes until the tree contains N nodes. As nodes are removed, one at a time, the tree dissolves into an ensemble of separate trees, namely a forest. We study the statistical properties of trees and nodes in this heterogeneous forest. In the limit N → ∞, we find that the system is characterized by a single parameter: the fraction of remaining nodes m. We obtain analytically the size density φ_s of trees of size s, which has a power-law tail φ_s ~ s^(−α), with exponent α = 1 + 1/m. Therefore, the tail becomes steeper as further nodes are removed, producing an unusual scaling exponent that increases continuously with time. Furthermore, we investigate the fragment size distribution in a growing tree, where nodes are added as well as removed, and find that the distribution for this case is much narrower.
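
    A small Monte Carlo in Python reproduces the setup: grow a random recursive tree, delete a random fraction 1 − m of the nodes, and tally the sizes of the surviving components, whose counts should decay roughly like s^(−(1 + 1/m)). This is an independent illustration, not the authors' code.

    ```python
    # Simulate fragmentation of a random recursive tree: build the tree,
    # remove a fraction 1-m of nodes, and measure component sizes of the
    # resulting forest with a union-find over surviving edges.
    import random
    from collections import Counter

    def fragment_recursive_tree(N, m, rng):
        # node i > 0 attaches to a uniformly random earlier node
        parent = [None] + [rng.randrange(i) for i in range(1, N)]
        removed = set(rng.sample(range(N), int((1 - m) * N)))
        root = list(range(N))
        def find(i):                       # union-find with path halving
            while root[i] != i:
                root[i] = root[root[i]]
                i = root[i]
            return i
        for i in range(1, N):
            if i not in removed and parent[i] not in removed:
                root[find(i)] = find(parent[i])
        # map each component root to its size
        return Counter(find(i) for i in range(N) if i not in removed)

    rng = random.Random(0)
    components = fragment_recursive_tree(100_000, m=0.5, rng=rng)
    size_counts = Counter(components.values())
    for s in (1, 2, 4, 8, 16):
        print(s, size_counts.get(s, 0))    # decays roughly like s**-(1 + 1/m)
    ```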

  8. Evaluating influences of seasonal variations and anthropogenic activities on alluvial groundwater hydrochemistry using ensemble learning approaches

    NASA Astrophysics Data System (ADS)

    Singh, Kunwar P.; Gupta, Shikha; Mohan, Dinesh

    2014-04-01

    The chemical composition and hydrochemistry of groundwater are influenced by seasonal variations and anthropogenic activities in a region. Understanding such influences and the responsible factors is vital for the effective management of groundwater. In this study, ensemble-learning-based classification and regression models are constructed and applied to the groundwater hydrochemistry data of the Unnao and Ghaziabad regions of northern India. Accordingly, single decision tree (SDT), decision tree forest (DTF), and decision treeboost (DTB) models were constructed. Predictive and generalization abilities of the proposed models were investigated using several statistical parameters and compared with the support vector machines (SVM) method. The DT and SVM models discriminated the groundwater in shallow and deep aquifers, industrial and non-industrial areas, and pre- and post-monsoon seasons, yielding misclassification rates (MR) of 1.52-14.92% (SDT), 0.91-6.52% (DTF), 0.61-5.27% (DTB), and 1.52-11.69% (SVM), respectively. The respective regression models yielded correlations between measured and predicted COD values, and root mean squared errors, of 0.874 and 0.66 (SDT); 0.952 and 0.48 (DTF); 0.943 and 0.52 (DTB); and 0.785 and 0.85 (SVR) in the complete data array of Ghaziabad. The DTF and DTB models outperformed the SVM in both classification and regression. It may be noted that incorporation of the bagging and stochastic gradient boosting algorithms in the DTF and DTB models, respectively, resulted in their enhanced predictive ability. The proposed ensemble models successfully delineated the influences of seasonal variations and anthropogenic activities on groundwater hydrochemistry and can be used as effective tools for forecasting the chemical composition of groundwater for its management.

  9. Ensemble Data Mining Methods

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj C.

    2004-01-01

    Ensemble Data Mining Methods, also known as Committee Methods or Model Combiners, are machine learning methods that leverage the power of multiple models to achieve better prediction accuracy than any of the individual models could on their own. The basic goal when designing an ensemble is the same as when establishing a committee of people: each member of the committee should be as competent as possible, but the members should be complementary to one another. If the members are not complementary, i.e., if they always agree, then the committee is unnecessary---any one member is sufficient. If the members are complementary, then when one or a few members make an error, the probability is high that the remaining members can correct this error. Research in ensemble methods has largely revolved around designing ensembles consisting of competent yet complementary models.
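
    The committee argument can be made quantitative: if M members err independently, each with accuracy p > 0.5, the accuracy of a majority vote grows with M. A short worked computation, with independence as the idealized assumption:

    ```python
    # Probability that a majority vote of M independent members, each
    # correct with probability p, is correct (odd M so ties cannot occur).
    from math import comb

    def majority_vote_accuracy(M, p):
        return sum(comb(M, k) * p**k * (1 - p)**(M - k)
                   for k in range(M // 2 + 1, M + 1))

    for M in (1, 5, 15, 51):
        print(M, round(majority_vote_accuracy(M, p=0.7), 4))
    # accuracy climbs toward 1 as complementary (independent) members are added
    ```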

  10. Trees Are Terrific!

    ERIC Educational Resources Information Center

    Braus, Judy, Ed.

    1992-01-01

    Ranger Rick's NatureScope is a creative education series dedicated to inspiring in children an understanding and appreciation of the natural world while developing the skills they will need to make responsible decisions about the environment. Contents are organized into the following sections: (1) "What Makes a Tree a Tree?," including…

  11. Structural Equation Model Trees

    ERIC Educational Resources Information Center

    Brandmaier, Andreas M.; von Oertzen, Timo; McArdle, John J.; Lindenberger, Ulman

    2013-01-01

    In the behavioral and social sciences, structural equation models (SEMs) have become widely accepted as a modeling tool for the relation between latent and observed variables. SEMs can be seen as a unification of several multivariate analysis techniques. SEM Trees combine the strengths of SEMs and the decision tree paradigm by building tree…

  12. Response times from ensembles of accumulators

    PubMed Central

    Zandbelt, Bram; Purcell, Braden A.; Palmeri, Thomas J.; Logan, Gordon D.; Schall, Jeffrey D.

    2014-01-01

    Decision-making is explained by psychologists through stochastic accumulator models and by neurophysiologists through the activity of neurons believed to instantiate these models. We investigated an overlooked scaling problem: How does a response time (RT) that can be explained by a single model accumulator arise from numerous, redundant accumulator neurons, each of which individually appears to explain the variability of RT? We explored this scaling problem by developing a unique ensemble model of RT, called e pluribus unum, which embodies the well-known dictum “out of many, one.” We used the e pluribus unum model to analyze the RTs produced by ensembles of redundant, idiosyncratic stochastic accumulators under various termination mechanisms and accumulation rate correlations in computer simulations of ensembles of varying size. We found that predicted RT distributions are largely invariant to ensemble size if the accumulators share at least modestly correlated accumulation rates and RT is not governed by the most extreme accumulators. Under these regimes, the termination times of individual accumulators were predictive of ensemble RT. We also found that the threshold measured on individual accumulators, corresponding to the firing rate of neurons measured at RT, can be invariant with RT but is equivalent to the specified model threshold only when the rate correlation is very high. PMID:24550315
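
    A loose illustration of the scaling question, assuming a discrete-time accumulator with correlated drift rates and a pooled termination rule; parameters are invented for the sketch and no attempt is made to reproduce the paper's fits:

    ```python
    # Ensemble of noisy accumulators: RT is the first time the ensemble-
    # averaged evidence crosses a threshold. With strongly correlated rates
    # the RT distribution should change little with ensemble size.
    import numpy as np

    def ensemble_rt(n_acc, rho, n_trials=500, threshold=50.0, seed=0):
        rng = np.random.default_rng(seed)
        rts = []
        for _ in range(n_trials):
            shared = rng.normal()                       # trial-wide rate term
            rates = 1.0 + rho * shared + (1.0 - rho) * rng.normal(size=n_acc)
            rates = np.clip(rates, 0.05, None)          # keep drift positive
            evidence = np.zeros(n_acc)
            t = 0
            while evidence.mean() < threshold:          # pooled termination
                evidence += rates + rng.normal(scale=2.0, size=n_acc)
                t += 1
            rts.append(t)
        return np.mean(rts), np.std(rts)

    for n in (1, 10, 100):
        print(n, ensemble_rt(n, rho=0.8))
    ```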

  13. Support vector machine-based decision tree for snow cover extraction in mountain areas using high spatial resolution remote sensing image

    NASA Astrophysics Data System (ADS)

    Zhu, Liujun; Xiao, Pengfeng; Feng, Xuezhi; Zhang, Xueliang; Wang, Zuo; Jiang, Luyuan

    2014-01-01

    Snow cover extraction in mountain areas is a complex task, especially from high spatial resolution remote sensing (HSRRS) data. The influence of mountain shadows in HSRRS is severe, and snow cover extraction methods based on the normalized difference snow index are not applicable. A decision tree building method for snow cover extraction (DTSE), integrated with an efficient feature selection algorithm, is proposed. The severe influence of terrain shadows is eliminated by extracting snow in sunlight and snow in shadow separately in different nodes. In the feature selection algorithm, the deviation of the fuzzy grade matrix is proposed as a class-specific criterion, which improves the efficiency and robustness of the selected feature set, thus making the snow cover extraction accurate. Two experiments are carried out based on ZY-3 images of two regions (regions A and B) located in the Tianshan Mountains, China. The experiment on region A achieves adequate accuracy, demonstrating the robustness of the DTSE building method. The experiment on region B shows that a general DTSE model achieves unsatisfactory accuracy for snow in shadow and that DTSE rebuilding evidently improves the performance, thus providing an accurate and fast way to extract snow cover in mountain areas.

  14. The Performance Analysis of the Map-Aided Fuzzy Decision Tree Based on the Pedestrian Dead Reckoning Algorithm in an Indoor Environment

    PubMed Central

    Chiang, Kai-Wei; Liao, Jhen-Kai; Tsai, Guang-Je; Chang, Hsiu-Wen

    2015-01-01

    Hardware sensors embedded in a smartphone allow the device to become an excellent mobile navigator. A smartphone is ideal for this task because its widespread popularity has driven increases in processing power, and most of the necessary infrastructure is already in place. However, using a smartphone for indoor pedestrian navigation can be problematic due to the low accuracy of sensors, the imprecise predictability of pedestrian motion, and the inaccessibility of the Global Navigation Satellite System (GNSS) in some indoor environments. Pedestrian Dead Reckoning (PDR) is one of the most common technologies used for pedestrian navigation, but in its present form, various errors tend to accumulate. This study introduces a fuzzy decision tree (FDT) aided by map information to improve the accuracy and stability of PDR with less dependency on infrastructure. First, the map is quickly surveyed by the Indoor Mobile Mapping System (IMMS). Next, Bluetooth beacons are implemented to enable initialization at any position. Finally, the map-aided FDT can estimate navigation solutions in real time. The experiments were conducted in different fields using a variety of smartphones and users in order to verify stability. The contrast PDR system demonstrates low stability in each case without pre-calibration and post-processing, but the proposed low-complexity FDT algorithm shows good stability and accuracy under the same conditions. PMID:26729114
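
    The accumulating-error behaviour of PDR comes from its basic update, where each detected step advances the position along the current heading. A hedged minimal version follows; the heading convention and step lengths are illustrative only.

    ```python
    # Bare-bones Pedestrian Dead Reckoning update: each detected step moves
    # the position by a step length along the current heading. Errors in
    # length and heading accumulate, which is why map aiding helps.
    import math

    def pdr_step(x, y, step_length, heading_rad):
        """Advance one step; heading measured clockwise from north."""
        return (x + step_length * math.sin(heading_rad),
                y + step_length * math.cos(heading_rad))

    x, y = 0.0, 0.0
    # two steps north, one step east (step length 0.7 m, hypothetical)
    for length, heading_deg in [(0.7, 0), (0.7, 0), (0.7, 90)]:
        x, y = pdr_step(x, y, length, math.radians(heading_deg))
    print(round(x, 2), round(y, 2))   # -> 0.7 1.4
    ```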

  15. Assessment of the potential enhancement of rural food security in Mexico using decision tree land use classification on medium resolution satellite imagery

    NASA Astrophysics Data System (ADS)

    Bermeo, A.; Couturier, S.

    2017-01-01

    Because of its renewed importance in international agendas, food security in sub-tropical countries has been the object of studies at different scales, although the spatial components of food security are still largely undocumented. Among other aspects, food security can be assessed using a food self-sufficiency index. We propose a spatial representation of this assessment in the densely populated rural area of the Huasteca Poblana, Mexico, where there is a known tendency towards the loss of self-sufficiency in basic grains. The main agricultural systems in this area are the traditional milpa system (a multicrop practice with maize as the main basic crop), coffee plantations and grazing land for bovine livestock. We estimate the potential additional milpa-based maize production by smallholders, identifying the presence of extensive coffee and pasture systems in the production data of the agricultural census. The surfaces of extensive coffee plantations and pasture land were estimated using the detailed coffee agricultural census data and a decision tree combining unsupervised and supervised spectral classification techniques on medium-resolution (Landsat) satellite imagery. We find that 30% of the territory would gain more than a 50% increase in food security and 13% could theoretically become maize self-sufficient through the conversion of extensive systems to the traditional multicrop milpa system.

  16. The Performance Analysis of the Map-Aided Fuzzy Decision Tree Based on the Pedestrian Dead Reckoning Algorithm in an Indoor Environment.

    PubMed

    Chiang, Kai-Wei; Liao, Jhen-Kai; Tsai, Guang-Je; Chang, Hsiu-Wen

    2015-12-28

    Hardware sensors embedded in a smartphone allow the device to become an excellent mobile navigator. A smartphone is ideal for this task because its widespread popularity has driven increases in processing power, and most of the necessary infrastructure is already in place. However, using a smartphone for indoor pedestrian navigation can be problematic due to the low accuracy of sensors, the imprecise predictability of pedestrian motion, and the inaccessibility of the Global Navigation Satellite System (GNSS) in some indoor environments. Pedestrian Dead Reckoning (PDR) is one of the most common technologies used for pedestrian navigation, but in its present form, various errors tend to accumulate. This study introduces a fuzzy decision tree (FDT) aided by map information to improve the accuracy and stability of PDR with less dependency on infrastructure. First, the map is quickly surveyed by the Indoor Mobile Mapping System (IMMS). Next, Bluetooth beacons are implemented to enable initialization at any position. Finally, the map-aided FDT can estimate navigation solutions in real time. The experiments were conducted in different fields using a variety of smartphones and users in order to verify stability. The contrast PDR system demonstrates low stability in each case without pre-calibration and post-processing, but the proposed low-complexity FDT algorithm shows good stability and accuracy under the same conditions.

  17. The Hydrologic Ensemble Prediction Experiment (HEPEX)

    NASA Astrophysics Data System (ADS)

    Wood, Andy; Wetterhall, Fredrik; Ramos, Maria-Helena

    2015-04-01

    The Hydrologic Ensemble Prediction Experiment was established in March 2004, at a workshop hosted by the European Centre for Medium-Range Weather Forecasts (ECMWF), and co-sponsored by the US National Weather Service (NWS) and the European Commission (EC). The HEPEX goal was to bring the international hydrological and meteorological communities together to advance the understanding and adoption of hydrological ensemble forecasts for decision support. HEPEX pursues this goal through research efforts and practical implementations involving six core elements of a hydrologic ensemble prediction enterprise: input and pre-processing, ensemble techniques, data assimilation, post-processing, verification, and communication and use in decision making. HEPEX has grown through meetings that connect the user, forecast producer and research communities to exchange ideas, data and methods; the coordination of experiments to address specific challenges; and the formation of testbeds to facilitate shared experimentation. In the last decade, HEPEX has organized over a dozen international workshops, as well as sessions at scientific meetings (including AMS, AGU and EGU) and special issues of scientific journals where workshop results have been published. Through these interactions and an active online blog (www.hepex.org), HEPEX has built a strong and active community of nearly 400 researchers & practitioners around the world. This poster presents an overview of recent and planned HEPEX activities, highlighting case studies that exemplify the focus and objectives of HEPEX.

  18. Fragmentation of random trees

    NASA Astrophysics Data System (ADS)

    Kalay, Z.; Ben-Naim, E.

    2015-01-01

    We study fragmentation of a random recursive tree into a forest by repeated removal of nodes. The initial tree consists of N nodes and it is generated by sequential addition of nodes with each new node attaching to a randomly-selected existing node. As nodes are removed from the tree, one at a time, the tree dissolves into an ensemble of separate trees, namely, a forest. We study statistical properties of trees and nodes in this heterogeneous forest, and find that the fraction of remaining nodes m characterizes the system in the limit N → ∞. We obtain analytically the size density φ_s of trees of size s. The size density has a power-law tail φ_s ~ s^(−α) with exponent α = 1 + 1/m. Therefore, the tail becomes steeper as further nodes are removed, and the fragmentation process is unusual in that the exponent α increases continuously with time. We also extend our analysis to the case where nodes are added as well as removed, and obtain the asymptotic size density for growing trees.

  19. Ensemble Statistical Post-Processing of the National Air Quality Forecast Capability: Enhancing Ozone Forecasts in Baltimore, Maryland

    NASA Technical Reports Server (NTRS)

    Garner, Gregory G.; Thompson, Anne M.

    2013-01-01

    An ensemble statistical post-processor (ESP) is developed for the National Air Quality Forecast Capability (NAQFC) to address the unique challenges of forecasting surface ozone in Baltimore, MD. Air quality and meteorological data were collected from the eight monitors that constitute the Baltimore forecast region. These data were used to build the ESP using a moving-block bootstrap, regression tree models, and extreme-value theory. The ESP was evaluated using a 10-fold cross-validation to avoid evaluation with the same data used in the development process. Results indicate that the ESP is conditionally biased, likely due to slight overfitting while training the regression tree models. When viewed from the perspective of a decision-maker, the ESP provides a wealth of additional information previously not available through the NAQFC alone. The user is provided the freedom to tailor the forecast to the decision at hand by using decision-specific probability thresholds that define a forecast for an ozone exceedance. Taking advantage of the ESP, the user not only receives an increase in value over the NAQFC, but also receives value for
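
    One ingredient named above, the moving-block bootstrap, resamples contiguous blocks of a series so that short-range temporal dependence survives in each replicate. A minimal sketch, with invented stand-in data:

    ```python
    # Moving-block bootstrap: build one bootstrap replicate of a time
    # series by concatenating randomly chosen fixed-length blocks.
    import numpy as np

    def moving_block_bootstrap(series, block_len, rng):
        """One bootstrap replicate of `series` built from random blocks."""
        series = np.asarray(series)
        n = len(series)
        n_blocks = int(np.ceil(n / block_len))
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        blocks = [series[s:s + block_len] for s in starts]
        return np.concatenate(blocks)[:n]

    rng = np.random.default_rng(42)
    # hypothetical stand-in for an hourly ozone series
    ozone = np.sin(np.linspace(0, 20, 200)) + rng.normal(0, 0.2, 200)
    replicate = moving_block_bootstrap(ozone, block_len=24, rng=rng)
    print(replicate[:5])
    ```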

  20. Online breakage detection of multitooth tools using classifier ensembles for imbalanced data

    NASA Astrophysics Data System (ADS)

    Bustillo, Andrés; Rodríguez, Juan J.

    2014-12-01

    Cutting tool breakage detection is an important task, due to its economic impact on mass production lines in the automobile industry. This task presents a central limitation: real data-sets are extremely imbalanced because breakage occurs in very few cases compared with normal operation of the cutting process. In this paper, we present an analysis of different data-mining techniques applied to the detection of insert breakage in multitooth tools. The analysis applies only one experimental variable: the electrical power consumption of the tool drive. This restriction profiles real industrial conditions more accurately than other physical variables, such as acoustic or vibration signals, which are not so easily measured. Many efforts have been made to design a method that is able to identify breakages with a high degree of reliability within a short period of time. The solution is based on classifier ensembles for imbalanced data-sets. Classifier ensembles are combinations of classifiers, which in many situations are more accurate than individual classifiers. Six different base classifiers are tested: Decision Trees, Rules, Naïve Bayes, Nearest Neighbour, Multilayer Perceptrons and Logistic Regression. Three different balancing strategies are tested with each of the classifier ensembles and compared to their performance with the original data-set: Synthetic Minority Over-Sampling Technique (SMOTE), undersampling and a combination of SMOTE and undersampling. To identify the most suitable data-mining solution, Receiver Operating Characteristic (ROC) and recall-precision graphs are generated and discussed. Logistic regression ensembles on the data-set balanced with the combination of SMOTE and undersampling turned out to be the most suitable technique. Finally a comparison using industrial performance measures is presented, which concludes that this technique is also more suited to this industrial problem than the other techniques presented in
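
    The best-performing recipe, SMOTE followed by undersampling ahead of a logistic regression ensemble, can be sketched as follows. The use of the imbalanced-learn package is an assumption here, since the paper does not name its implementation, and the data are a synthetic stand-in for the breakage signals.

    ```python
    # SMOTE + random undersampling feeding a bagged logistic regression
    # ensemble, on a highly imbalanced synthetic data-set (1% positives).
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from imblearn.over_sampling import SMOTE
    from imblearn.under_sampling import RandomUnderSampler
    from imblearn.pipeline import Pipeline

    X, y = make_classification(n_samples=5000, n_features=10,
                               weights=[0.99], random_state=0)

    model = Pipeline([
        ("smote", SMOTE(sampling_strategy=0.1, random_state=0)),   # oversample minority
        ("under", RandomUnderSampler(sampling_strategy=0.5, random_state=0)),
        ("ensemble", BaggingClassifier(LogisticRegression(max_iter=1000),
                                       n_estimators=25, random_state=0)),
    ])
    print(cross_val_score(model, X, y, cv=5, scoring="recall").mean())
    ```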

  1. Decision tree-based modelling for identification of potential interactions between type 2 diabetes risk factors: a decade follow-up in a Middle East prospective cohort study

    PubMed Central

    Ramezankhani, Azra; Hadavandi, Esmaeil; Pournik, Omid; Shahrabi, Jamal; Azizi, Fereidoun; Hadaegh, Farzad

    2016-01-01

    Objective The current study was undertaken to use the decision tree (DT) method to develop different prediction models for incidence of type 2 diabetes (T2D) and to explore interactions between predictor variables in those models. Design Prospective cohort study. Setting Tehran Lipid and Glucose Study (TLGS). Methods A total of 6647 participants (43.4% men) aged >20 years, without T2D at baselines (1999–2001 and 2002–2005), were followed until 2012. Two series of models (with and without 2-hour postchallenge plasma glucose (2h-PCPG)) were developed using 3 types of DT algorithms. The performances of the models were assessed using sensitivity, specificity, area under the ROC curve (AUC), geometric mean (G-Mean) and F-Measure. Primary outcome measure T2D was the primary outcome, defined as fasting plasma glucose (FPG) ≥7 mmol/L, 2h-PCPG ≥11.1 mmol/L, or use of antidiabetic medication. Results During a median follow-up of 9.5 years, 729 new cases of T2D were identified. The Quick Unbiased Efficient Statistical Tree (QUEST) algorithm had the highest sensitivity and G-Mean among all the models for men and women. The models that included 2h-PCPG had sensitivity and G-Mean of 78% and 0.75, and 78% and 0.78, for men and women, respectively. Both models achieved good discrimination power with AUC above 0.78. FPG, 2h-PCPG, waist-to-height ratio (WHtR) and mean arterial blood pressure (MAP) were the most important factors for incidence of T2D in both genders. Among men, those with an FPG≤4.9 mmol/L and 2h-PCPG≤7.7 mmol/L had the lowest risk, and those with an FPG>5.3 mmol/L and 2h-PCPG>4.4 mmol/L had the highest risk for T2D incidence. In women, those with an FPG≤5.2 mmol/L and WHtR≤0.55 had the lowest risk, and those with an FPG>5.2 mmol/L and WHtR>0.56 had the highest risk for T2D incidence. Conclusions Our study emphasises the utility of DT for exploring interactions between
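
    The balanced measures used above follow directly from the standard confusion-matrix counts:

    ```latex
    % Sensitivity, specificity, their geometric mean (G-Mean), and the
    % F-Measure, from confusion-matrix counts TP, TN, FP, FN.
    \mathrm{Se} = \frac{TP}{TP + FN}, \quad
    \mathrm{Sp} = \frac{TN}{TN + FP}, \quad
    \text{G-Mean} = \sqrt{\mathrm{Se} \cdot \mathrm{Sp}}, \quad
    F = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}
             {\mathrm{precision} + \mathrm{recall}}
    ```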

  2. Input Decimated Ensembles

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Oza, Nikunj C.; Clancy, Daniel (Technical Monitor)

    2001-01-01

    Using an ensemble of classifiers instead of a single classifier has been shown to improve generalization performance in many pattern recognition problems. However, the extent of such improvement depends greatly on the amount of correlation among the errors of the base classifiers. Therefore, reducing those correlations while keeping the classifiers' performance levels high is an important area of research. In this article, we explore input decimation (ID), a method which selects feature subsets for their ability to discriminate among the classes and uses them to decouple the base classifiers. We provide a summary of the theoretical benefits of correlation reduction, along with results of our method on two underwater sonar data sets, three benchmarks from the Proben1/UCI repositories, and two synthetic data sets. The results indicate that input decimated ensembles (IDEs) outperform ensembles whose base classifiers use all the input features; randomly selected subsets of features; and features created using principal components analysis, on a wide range of domains.

  3. Matlab Cluster Ensemble Toolbox

    SciTech Connect

    Sapio, Vincent De; Kegelmeyer, Philip

    2009-04-27

    This is a Matlab toolbox for investigating the application of cluster ensembles to data classification, with the objective of improving the accuracy and/or speed of clustering. The toolbox divides the cluster ensemble problem into four areas, providing functionality for each: (1) synthetic data generation, (2) clustering to generate individual data partitions and similarity matrices, (3) consensus function generation and final clustering to generate the ensemble data partitioning, and (4) implementation of accuracy metrics. With regard to data generation, Gaussian data of arbitrary dimension can be generated. The kcenters algorithm can then be used to generate individual data partitions, either by (a) subsampling the data and clustering each subsample, or by (b) randomly initializing the algorithm and generating a clustering for each initialization. In either case, an overall similarity matrix can be computed using a consensus function operating on the individual similarity matrices. A final clustering can then be performed, and performance metrics are provided for evaluation purposes.
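
    The toolbox itself is MATLAB, but the consensus-function idea translates directly. Below is a hedged Python analogue that builds a co-association (similarity) matrix from several randomly initialized k-means runs and re-clusters it; this is one standard consensus function, not a port of the toolbox's exact pipeline.

        # Hedged sketch of a cluster-ensemble consensus function: average
        # co-association over random k-means restarts, then recluster.
        # Assumes scikit-learn >= 1.2 (metric= on AgglomerativeClustering).
        import numpy as np
        from sklearn.cluster import AgglomerativeClustering, KMeans
        from sklearn.datasets import make_blobs

        X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
        n_runs, k = 10, 3

        # Each run contributes a 0/1 "same cluster" matrix; average them.
        co = np.zeros((len(X), len(X)))
        for seed in range(n_runs):
            labels = KMeans(n_clusters=k, n_init=1, random_state=seed).fit_predict(X)
            co += labels[:, None] == labels[None, :]
        co /= n_runs

        # Final clustering of the consensus similarity (1 - co is a distance).
        final = AgglomerativeClustering(n_clusters=k, metric="precomputed",
                                        linkage="average").fit_predict(1 - co)
        print(np.bincount(final))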

  4. Decision tree analysis as a supplementary tool to enhance histomorphological differentiation when distinguishing human from non-human cranial bone in both burnt and unburnt states: A feasibility study.

    PubMed

    Simmons, T; Goodburn, B; Singhrao, S K

    2016-01-01

    This feasibility study was undertaken to describe and record the histological characteristics of burnt and unburnt cranial bone fragments of human and non-human origin. Reference series of fully mineralized, transverse sections of cranial bone, covering all variables and specimen states, were prepared by manual cutting and semi-automated grinding and polishing methods. A photomicrograph catalogue reflecting differences between burnt and unburnt bone from humans and non-humans was recorded, and qualitative analysis was performed using an established classification system based on primary bone characteristics. The histomorphology associated with human and non-human samples was, for the most part, preserved following burning at high temperature. Notably, fibro-lamellar complex tissue subtypes, such as plexiform or laminar primary bone, were only present in non-human bones. A decision tree analysis based on histological features provided a definitive identification key for distinguishing human from non-human bone, with an accuracy of 100%. The decision tree for samples where the burning state was unknown was 96% accurate, and multi-step classification to taxon was possible with 100% accuracy. The results of this feasibility study strongly suggest that histology remains a viable alternative technique when fragments of cranial bone require forensic examination, in both burnt and unburnt states. The decision tree analysis may provide an additional but vital tool to enhance data interpretation. Further studies are needed to assess variation in histomorphology, taking into account other cranial bones, ontogeny, species and burning conditions.

  5. Development of a decision tree to classify the most accurate tissue-specific tissue to plasma partition coefficient algorithm for a given compound.

    PubMed

    Yun, Yejin Esther; Cotton, Cecilia A; Edginton, Andrea N

    2014-02-01

    Physiologically based pharmacokinetic (PBPK) modeling is a tool used in drug discovery and human health risk assessment. PBPK models are mathematical representations of the anatomy, physiology and biochemistry of an organism and are used to predict a drug's pharmacokinetics in various situations. Tissue to plasma partition coefficients (Kp), key PBPK model parameters, define the steady-state concentration differential between tissue and plasma and are used to predict the volume of distribution. The experimental determination of these parameters once limited the development of PBPK models; however, in silico prediction methods were introduced to overcome this issue. The available algorithms vary in input parameters and prediction accuracy, and none is considered standard, warranting further research. In this study, a novel decision-tree-based Kp prediction method was developed using six previously published algorithms. The aim of the developed classifier was to identify the most accurate tissue-specific Kp prediction algorithm for a new drug. A dataset of 122 drugs was used to train the classifier and to identify the most accurate Kp prediction algorithm for a given region of physicochemical space. Three versions of the tissue-specific classifier were developed, differing in their required inputs. Using the classifier gave better prediction accuracy than applying any single Kp prediction algorithm to all tissues, which is the current practice in PBPK model building. Because built-in estimation equations for the input parameters are not always available, this tool provides Kp predictions even when only limited input parameters are available. The presented method will improve tissue distribution prediction accuracy, thus enhancing confidence in PBPK modeling outputs.
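
    The classifier's job, choosing which Kp algorithm to trust for a given compound, is a small meta-learning problem. A schematic sketch follows; the descriptors and the per-drug "best algorithm" labels are hypothetical placeholders (in practice the labels would come from benchmarking the six algorithms on drugs with measured Kp values).

        # Hedged sketch: a decision tree that selects, per compound, the most
        # accurate of several Kp-prediction algorithms. All data are placeholders.
        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)
        X = rng.normal(size=(122, 3))               # stand-in descriptors (e.g. logP)
        best_algorithm = rng.integers(0, 6, 122)    # placeholder benchmark labels

        selector = DecisionTreeClassifier(max_depth=3, random_state=0)
        selector.fit(X, best_algorithm)

        new_drug = rng.normal(size=(1, 3))
        print("use Kp algorithm #", selector.predict(new_drug)[0])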

  6. Gaining efficiency by parallel quantification and identification of iTRAQ-labeled peptides using HCD and decision tree guided CID/ETD on an LTQ Orbitrap.

    PubMed

    Mischerikow, Nikolai; van Nierop, Pim; Li, Ka Wan; Bernstein, Hans-Gert; Smit, August B; Heck, Albert J R; Altelaar, A F Maarten

    2010-10-01

    Isobaric stable isotope labeling of peptides using iTRAQ is an important method for MS-based quantitative proteomics. Traditionally, quantitative analysis of iTRAQ-labeled peptides has been confined to beam-type instruments because of the weak detection capabilities of ion traps for low-mass ions. Recent technical advances in fragmentation techniques on linear ion traps and the hybrid linear ion trap-orbitrap allow this limitation to be circumvented: PQD and HCD facilitate iTRAQ analysis on these instrument types. Here we report a method for iTRAQ-based relative quantification on the ETD-enabled LTQ Orbitrap XL, based on parallel peptide quantification and peptide identification. iTRAQ reporter ion generation is performed by HCD, while CID and ETD provide peptide identification data in parallel in the LTQ ion trap. This approach circumvents the problems accompanying iTRAQ reporter ion generation with ETD and allows quantitative, decision tree-based CID/ETD experiments. Furthermore, using HCD solely for iTRAQ reporter ion read-out significantly reduces the number of ions needed to obtain informative spectra, which shortens the analysis time. Finally, we show that integrating this method with existing CID and ETD methods, as well as with existing iTRAQ data analysis workflows, is simple to realize. By applying our approach to the analysis of the synapse proteome from human brain biopsies, we demonstrate that it outperforms a latest-generation MALDI TOF/TOF instrument, with improvements in both peptide and protein identification and quantification. In conclusion, our work shows how HCD, CID and ETD can be beneficially combined to enable iTRAQ-based quantification on an ETD-enabled LTQ Orbitrap XL.

  7. Evaluating the High Risk Groups for Suicide: A Comparison of Logistic Regression, Support Vector Machine, Decision Tree and Artificial Neural Network

    PubMed Central

    AMINI, Payam; AHMADINIA, Hasan; POOROLAJAL, Jalal; MOQADDASI AMIRI, Mohammad

    2016-01-01

    Background: We aimed to assess the high-risk groups for suicide using different classification methods including logistic regression (LR), decision tree (DT), artificial neural network (ANN), and support vector machine (SVM). Methods: We used the dataset of a study conducted to predict risk factors of completed suicide in Hamadan Province, in the west of Iran, in 2010. To evaluate the high-risk groups for suicide, LR, SVM, DT and ANN were performed. The methods were compared using sensitivity, specificity, positive predictive value, negative predictive value, accuracy and the area under the curve. The Cochran Q test was applied to check differences in proportions among methods. To assess the association between the observed and predicted values, the φ coefficient, contingency coefficient, and Kendall tau-b were calculated. Results: Gender, age, and job were the most important risk factors for fatal suicide attempts across all four methods. The SVM method showed the highest accuracy: 0.68 and 0.67 for the training and testing samples, respectively. This method also yielded the highest specificity (0.67 for the training and 0.68 for the testing sample) and the highest sensitivity for the training sample (0.85), but the lowest sensitivity for the testing sample (0.53). The Cochran Q test showed differences between the proportions of the different methods (P<0.001). For the association between SVM predictions and observed values, the φ coefficient, contingency coefficient, and Kendall tau-b were 0.239, 0.232 and 0.239, respectively. Conclusion: SVM had the best performance in classifying fatal suicide attempts compared with DT, LR and ANN. PMID:27957463

  8. The use of the decision tree technique and image cytometry to characterize aggressiveness in World Health Organization (WHO) grade II superficial transitional cell carcinomas of the bladder.

    PubMed

    Decaestecker, C; van Velthoven, R; Petein, M; Janssen, T; Salmon, I; Pasteels, J L; van Ham, P; Schulman, C; Kiss, R

    1996-03-01

    The aggressiveness of human bladder tumours can be assessed by means of various classification systems, including the one proposed by the World Health Organization (WHO). According to the WHO classification, three levels of malignancy are identified as grades I (low), II (intermediate), and III (high). This classification system operates satisfactorily for two of the three grades in forecasting clinical progression, most grade I tumours being associated with good prognoses and most grade III with bad. In contrast, the grade II group is very heterogeneous in terms of clinical behaviour. The present study used two computer-assisted methods to investigate whether it is possible to sub-classify grade II tumours: computer-assisted microscope analysis (image cytometry) of Feulgen-stained nuclei and the decision tree technique. The latter belongs to the family of supervised learning algorithms and enables an objective assessment of the diagnostic value associated with a given parameter. The combined use of these two methods in a series of 292 superficial transitional cell carcinomas shows that it is possible to identify one subgroup of grade II tumours which behaves clinically like grade I tumours and a second subgroup which behaves clinically like grade III tumours. Of the nine ploidy-related parameters computed by means of image cytometry [the DNA index (DI), DNA histogram type (DHT), and the percentages of diploid, hyperdiploid, triploid, hypertriploid, tetraploid, hypertetraploid, and polyploid cell nuclei], it was the percentage of hyperdiploid and hypertetraploid cell nuclei that enabled this identification, rather than conventional parameters such as the DI or the DHT.

  9. Music Ensemble: Course Proposal.

    ERIC Educational Resources Information Center

    Kovach, Brian

    A proposal is presented for a Music Ensemble course to be offered at the Community College of Philadelphia for music students who have had previous vocal or instrumental training. A standardized course proposal cover form is followed by a statement of purpose for the course, a list of major course goals, a course outline, and a bibliography. Next,…

  10. Protective Garment Ensemble

    NASA Technical Reports Server (NTRS)

    Wakefield, M. E.

    1982-01-01

    Protective garment ensemble with internally-mounted environmental- control unit contains its own air supply. Alternatively, a remote-environmental control unit or an air line is attached at the umbilical quick disconnect. Unit uses liquid air that is vaporized to provide both breathing air and cooling. Totally enclosed garment protects against toxic substances.

  11. Top Quark Produced Through the Electroweak Force: Discovery Using the Matrix Element Analysis and Search for Heavy Gauge Bosons Using Boosted Decision Trees

    SciTech Connect

    Pangilinan, Monica

    2010-05-01

    The top quark produced through the electroweak channel provides a direct measurement of the Vtb element in the CKM matrix, which can be viewed as a transition rate of a top quark to a bottom quark. This production channel of the top quark is also sensitive to theories beyond the Standard Model, such as heavy charged gauge bosons termed W'. This thesis measures the cross section of the electroweak-produced top quark using a technique based on the matrix elements of the processes under consideration. The technique is applied to 2.3 fb-1 of data from the D0 detector. From a comparison of the matrix element discriminants between data and the signal and background model using Bayesian statistics, we measure the cross section of the top quark produced through the electroweak mechanism σ(p$\bar{p}$ → tb + X, tqb + X) = 4.30$^{+0.98}_{-1.20}$ pb. The measured result corresponds to a 4.9σ Gaussian-equivalent significance. By combining this analysis with other analyses based on the Bayesian Neural Network (BNN) and Boosted Decision Tree (BDT) methods, the measured cross section is 3.94 ± 0.88 pb with a significance of 5.0σ, resulting in the discovery of electroweak-produced top quarks. Using this measured cross section and constraining |Vtb| < 1, the 95% confidence level (C.L.) lower limit is |Vtb| > 0.78. Additionally, a search is made for the production of W' using the same samples as for the electroweak-produced top quark. An analysis based on the BDT method is used to separate the signal from expected backgrounds. No significant excess is found, and 95% C.L. upper limits on the production cross section are set for W' with masses within 600-950 GeV. For four general models of W' boson production using the decay channel W' → t$\bar{b}$, the lower mass limits are the following: M(W'L with SM couplings) > 840 GeV; M(W'R) > 880 GeV or 890 GeV if the right-handed neutrino is lighter or heavier than W'R, respectively; and M(W'L+R) > 915 GeV.

  12. Top quark produced through the electroweak force: Discovery using the matrix element analysis and search for heavy gauge bosons using boosted decision trees

    NASA Astrophysics Data System (ADS)

    Pangilinan, Monica

    The top quark produced through the electroweak channel provides a direct measurement of the Vtb element in the CKM matrix, which can be viewed as a transition rate of a top quark to a bottom quark. This production channel of the top quark is also sensitive to theories beyond the Standard Model, such as heavy charged gauge bosons termed W'. This thesis measures the cross section of the electroweak-produced top quark using a technique based on the matrix elements of the processes under consideration. The technique is applied to 2.3 fb-1 of data from the D0 detector. From a comparison of the matrix element discriminants between data and the signal and background model using Bayesian statistics, we measure the cross section of the top quark produced through the electroweak mechanism σ(p$\bar{p}$ → tb + X, tqb + X) = 4.30$^{+0.98}_{-1.20}$ pb. The measured result corresponds to a 4.9σ Gaussian-equivalent significance. By combining this analysis with other analyses based on the Bayesian Neural Network (BNN) and Boosted Decision Tree (BDT) methods, the measured cross section is 3.94 ± 0.88 pb with a significance of 5.0σ, resulting in the discovery of electroweak-produced top quarks. Using this measured cross section and constraining |Vtb| < 1, the 95% confidence level (C.L.) lower limit is |Vtb| > 0.78. Additionally, a search is made for the production of W' using the same samples as for the electroweak-produced top quark. An analysis based on the BDT method is used to separate the signal from expected backgrounds. No significant excess is found, and 95% C.L. upper limits on the production cross section are set for W' with masses within 600-950 GeV. For four general models of W' boson production using the decay channel W' → t$\bar{b}$, the lower mass limits are the following: M(W'L with SM couplings) > 840 GeV; M(W'R) > 880 GeV or 890 GeV if the right-handed neutrino is lighter or heavier than W'R, respectively; and M(W'L+R) > 915 GeV.
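
    The boosted-decision-tree discriminant used in the W' search follows a standard pattern: train gradient-boosted trees on labeled signal/background events and cut on the resulting score. A hedged generic sketch on synthetic "event" features (the generic technique, not the D0 analysis code):

        # Hedged sketch of BDT-style signal/background separation on
        # synthetic event features; the rare positive class plays the signal.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=20000, n_features=12, n_informative=6,
                                   weights=[0.95], random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        bdt = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                         random_state=0).fit(X_tr, y_tr)
        disc = bdt.predict_proba(X_te)[:, 1]        # per-event BDT discriminant
        print("separation (ROC AUC):", roc_auc_score(y_te, disc))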

  13. Ensemble Pulsar Time Scale

    NASA Astrophysics Data System (ADS)

    Yin, D. S.; Gao, Y. P.; Zhao, S. H.

    2016-05-01

    Millisecond pulsars can generate another type of time scale that is totally independent of the atomic time scale, because the physical mechanisms of the pulsar time scale and the atomic time scale are quite different from each other. Usually, pulsar timing observations are not evenly sampled, and the intervals between data points range from several hours to more than half a month. Moreover, these data sets are sparse. All of this makes it difficult to generate an ensemble pulsar time scale. Hence, a new algorithm to calculate the ensemble pulsar time scale is proposed. Firstly, we use cubic spline interpolation to densify the data set and make the intervals between data points even. Then, we employ the Vondrak filter to smooth the data set and remove high-frequency noise, and finally we adopt the weighted average method to generate the ensemble pulsar time scale. The pulsar timing residuals represent the clock difference between the pulsar time and atomic time, and high-precision pulsar timing data provide this clock difference measurement with a high signal-to-noise ratio, which is fundamental to generating pulsar time. We use the latest released NANOGrav (North American Nanohertz Observatory for Gravitational Waves) 9-year data set to generate the ensemble pulsar time scale. This data set, from the newest NANOGrav data release, includes 9 years of observational data for 37 millisecond pulsars taken with the 100-meter Green Bank Telescope and the 305-meter Arecibo telescope. We find that the algorithm used in this paper can lower the influence of noise in the timing residuals and improve the long-term stability of pulsar time. Results show that the long-term (> 1 yr) frequency stability of the pulsar time is better than 3.4×10^-15.
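
    The three-step algorithm (densify by cubic spline, smooth, weighted average) can be sketched with SciPy. In the sketch a Savitzky-Golay filter stands in for the Vondrak filter, which has no stock SciPy implementation, and the toy residuals are random placeholders.

        # Hedged sketch of the ensemble pulsar time scale pipeline.
        import numpy as np
        from scipy.interpolate import CubicSpline
        from scipy.signal import savgol_filter

        rng = np.random.default_rng(1)
        grid = np.linspace(0.0, 9.0, 500)                # years, even sampling

        smoothed, weights = [], []
        for _ in range(5):                               # 5 toy pulsars
            t = np.sort(rng.uniform(0.0, 9.0, 80))       # uneven observing epochs
            resid = 1e-6 * rng.normal(size=t.size)       # timing residuals (s)
            dense = CubicSpline(t, resid)(grid)          # step 1: densify/even out
            smooth = savgol_filter(dense, 51, 3)         # step 2: suppress noise
            smoothed.append(smooth)
            weights.append(1.0 / np.var(dense - smooth)) # weight by noise level

        # Step 3: weighted average across pulsars -> ensemble time scale.
        ensemble = np.average(smoothed, axis=0, weights=weights)
        print(ensemble[:3])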

  14. Lung Cancer Survival Prediction using Ensemble Data Mining on Seer Data

    DOE PAGES

    Agrawal, Ankit; Misra, Sanchit; Narayanan, Ramanathan; ...

    2012-01-01

    We analyze the lung cancer data available from the SEER program with the aim of developing accurate survival prediction models for lung cancer. Carefully designed preprocessing steps resulted in removal/modification/splitting of several attributes, and 2 of the 11 derived attributes were found to have significant predictive power. Several supervised classification methods were used on the preprocessed data along with various data mining optimizations and validations. In our experiments, ensemble voting of five decision tree based classifiers and meta-classifiers was found to result in the best prediction performance in terms of accuracy and area under the ROC curve. We have developed an on-line lung cancer outcome calculator for estimating the risk of mortality after 6 months, 9 months, 1 year, 2 years and 5 years of diagnosis, for which a smaller non-redundant subset of 13 attributes was carefully selected using attribute selection techniques, while trying to retain the predictive power of the original set of attributes. Further, ensemble voting models were also created for predicting conditional survival outcome for lung cancer (estimating risk of mortality after 5 years of diagnosis, given that the patient has already survived for a period of time), and included in the calculator. The on-line lung cancer outcome calculator developed as a result of this study is available at http://info.eecs.northwestern.edu:8080/LungCancerOutcomeCalculator/.
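
    The core of the approach, soft voting over several tree-based classifiers, has a direct scikit-learn analogue; the member list below is illustrative rather than the authors' exact five classifiers and meta-classifiers.

        # Hedged sketch of ensemble voting over tree-based classifiers.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import (ExtraTreesClassifier,
                                      GradientBoostingClassifier,
                                      RandomForestClassifier, VotingClassifier)
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=1000, random_state=0)
        vote = VotingClassifier(
            estimators=[("dt", DecisionTreeClassifier(random_state=0)),
                        ("rf", RandomForestClassifier(random_state=0)),
                        ("et", ExtraTreesClassifier(random_state=0)),
                        ("gb", GradientBoostingClassifier(random_state=0))],
            voting="soft")                       # average predicted probabilities
        print("CV accuracy:", cross_val_score(vote, X, y, cv=5).mean())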

  15. Learning classification trees

    NASA Technical Reports Server (NTRS)

    Buntine, Wray

    1991-01-01

    Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived from Bayesian decision theory, introducing Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule turns out to be similar to Quinlan's information gain splitting rule, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, Quinlan's C4, and Breiman et al.'s CART show that the full Bayesian algorithm is consistently as good as, or more accurate than, these other approaches, though at a computational price.

  16. Multilevel ensemble Kalman filtering

    DOE PAGES

    Hoel, Hakon; Law, Kody J. H.; Tempone, Raul

    2016-06-14

    This study embeds a multilevel Monte Carlo sampling strategy into the Monte Carlo step of the ensemble Kalman filter (EnKF) in the setting of finite dimensional signal evolution and noisy discrete-time observations. The signal dynamics is assumed to be governed by a stochastic differential equation (SDE), and a hierarchy of time grids is introduced for multilevel numerical integration of that SDE. Finally, the resulting multilevel EnKF is proved to asymptotically outperform EnKF in terms of computational cost versus approximation accuracy. The theoretical results are illustrated numerically.
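
    For orientation, the single-level analysis step that the multilevel scheme builds on can be written in a few lines. This is the textbook perturbed-observation EnKF update with a linear observation operator, not the paper's multilevel estimator.

        # Hedged sketch of one (single-level) EnKF analysis step; the
        # multilevel method couples such filters across a time-grid hierarchy.
        import numpy as np

        rng = np.random.default_rng(0)
        d, m, N = 3, 2, 50                       # state dim, obs dim, ensemble size
        H = rng.normal(size=(m, d))              # linear observation operator
        R = 0.1 * np.eye(m)                      # observation noise covariance

        ens = rng.normal(size=(d, N))            # forecast ensemble (columns)
        y = rng.normal(size=m)                   # observation

        A = ens - ens.mean(axis=1, keepdims=True)
        P = A @ A.T / (N - 1)                    # sample forecast covariance
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain

        # Update each member against a perturbed copy of the observation.
        y_pert = y[:, None] + rng.multivariate_normal(np.zeros(m), R, size=N).T
        ens = ens + K @ (y_pert - H @ ens)
        print("analysis mean:", ens.mean(axis=1))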

  17. ESPC Coupled Global Ensemble Design

    DTIC Science & Technology

    2014-09-30

    DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. ESPC Coupled Global Ensemble Design, Justin McLay. …range global atmospheric ensemble forecasting system using the Navy Global Environmental Model (NAVGEM). Couple NAVGEM to a simple SST model that…

  18. Decimated Input Ensembles for Improved Generalization

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Oza, Nikunj C.; Norvig, Peter (Technical Monitor)

    1999-01-01

    Recently, many researchers have demonstrated that using classifier ensembles (e.g., averaging the outputs of multiple classifiers before reaching a classification decision) leads to improved performance for many difficult generalization problems. However, in many domains there are serious impediments to such "turnkey" classification accuracy improvements. Most notable among these is the deleterious effect of highly correlated classifiers on ensemble performance. One particular solution to this problem is generating "new" training sets by sampling the original one. However, with a finite number of patterns, this reduces the number of training patterns each classifier sees, often resulting in considerably worsened generalization performance for each individual classifier, particularly for high-dimensional data domains. Generally, this drop in the accuracy of individual classifier performance more than offsets any potential gains due to combining, unless diversity among classifiers is actively promoted. In this work, we introduce a method that: (1) reduces the correlation among the classifiers; (2) reduces the dimensionality of the data, thus lessening the impact of the 'curse of dimensionality'; and (3) improves the classification performance of the ensemble.

  19. Hierarchical Ensemble Methods for Protein Function Prediction

    PubMed Central

    2014-01-01

    Protein function prediction is a complex multiclass, multilabel classification problem, characterized by multiple issues such as the incompleteness of the available annotations, the integration of multiple sources of high-dimensional biomolecular data, the imbalance of several functional classes, and the difficulty of univocally determining negative examples. Moreover, the hierarchical relationships between functional classes that characterize both the Gene Ontology and FunCat taxonomies motivate the development of hierarchy-aware prediction methods, which have shown significantly better performance than hierarchy-unaware "flat" prediction methods. In this paper, we provide a comprehensive review of hierarchical methods for protein function prediction based on ensembles of learning machines. In this general approach, a separate learning machine is trained to learn a specific functional term, and the resulting predictions are then assembled into a "consensus" ensemble decision, taking into account the hierarchical relationships between classes. The main hierarchical ensemble methods proposed in the literature are discussed in the context of existing computational methods for protein function prediction, highlighting their characteristics, advantages, and limitations. Open problems of this exciting research area of computational biology are finally considered, outlining novel perspectives for future research. PMID:25937954

  20. Integrated approach using data mining-based decision tree and object-based image analysis for high-resolution urban mapping of WorldView-2 satellite sensor data

    NASA Astrophysics Data System (ADS)

    Hamedianfar, Alireza; Shafri, Helmi Zulhaidi Mohd

    2016-04-01

    This paper integrates decision tree-based data mining (DM) and object-based image analysis (OBIA) to provide a transferable model for the detailed characterization of urban land-cover classes using WorldView-2 (WV-2) satellite images. Many articles have been published on DM-based OBIA in recent years for different applications. However, less attention has been paid to generating a transferable model for characterizing detailed urban land-cover features. Three subsets of WV-2 images were used in this paper to generate transferable OBIA rule-sets. Many features were explored using a DM algorithm, which created the classification rules as a decision tree (DT) structure from the first study area. The developed DT algorithm was applied to object-based classification in the first study area. After this process, we validated the capability and transferability of the classification rules on the second and third subsets. Detailed ground truth samples were collected to assess the classification results. The first, second, and third study areas achieved 88%, 85%, and 85% overall accuracy, respectively. Results from the investigation indicate that DM is an efficient method for providing optimal and transferable classification rules for OBIA, which accelerates the rule-set creation stage in the OBIA classification domain.

  1. Density of states for Gaussian unitary ensemble, Gaussian orthogonal ensemble, and interpolating ensembles through supersymmetric approach

    SciTech Connect

    Shamis, Mira

    2013-11-15

    We use the supersymmetric formalism to derive an integral formula for the density of states of the Gaussian Orthogonal Ensemble, and then apply saddle-point analysis to give a new derivation of the 1/N-correction to Wigner's law. This extends the work of Disertori on the Gaussian Unitary Ensemble. We also apply our method to the interpolating ensembles of Mehta–Pandey.
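
    For context, "Wigner's law" here is the semicircle distribution, the N → ∞ eigenvalue density of these ensembles; in the normalization where the support is [−2, 2] it reads (a standard result, quoted for orientation rather than taken from the paper):

        % Wigner's semicircle law: limiting eigenvalue density (support [-2, 2]).
        \rho(\lambda) = \frac{1}{2\pi}\sqrt{4 - \lambda^{2}}, \qquad |\lambda| \le 2

    The paper's subject is the O(1/N) correction to this limiting density.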

  2. Improving Climate Projections Using "Intelligent" Ensembles

    NASA Technical Reports Server (NTRS)

    Baker, Noel C.; Taylor, Patrick C.

    2015-01-01

    Recent changes in the climate system have led to growing concern, especially in communities which are highly vulnerable to resource shortages and weather extremes. There is an urgent need for better climate information to develop solutions and strategies for adapting to a changing climate. Climate models provide excellent tools for studying the current state of climate and making future projections. However, these models are subject to biases created by structural uncertainties. Performance metrics-or the systematic determination of model biases-succinctly quantify aspects of climate model behavior. Efforts to standardize climate model experiments and collect simulation data-such as the Coupled Model Intercomparison Project (CMIP)-provide the means to directly compare and assess model performance. Performance metrics have been used to show that some models reproduce present-day climate better than others. Simulation data from multiple models are often used to add value to projections by creating a consensus projection from the model ensemble, in which each model is given an equal weight. It has been shown that the ensemble mean generally outperforms any single model. It is possible to use unequal weights to produce ensemble means, in which models are weighted based on performance (called "intelligent" ensembles). Can performance metrics be used to improve climate projections? Previous work introduced a framework for comparing the utility of model performance metrics, showing that the best metrics are related to the variance of top-of-atmosphere outgoing longwave radiation. These metrics improve present-day climate simulations of Earth's energy budget using the "intelligent" ensemble method. The current project identifies several approaches for testing whether performance metrics can be applied to future simulations to create "intelligent" ensemble-mean climate projections. It is shown that certain performance metrics test key climate processes in the models, and
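
    The "intelligent" ensemble idea, weighting each model by a performance metric before averaging, reduces to a weighted mean. A toy sketch with synthetic model fields and RMSE-based weights (an illustrative skill score, not the paper's metric):

        # Hedged sketch of a performance-weighted ("intelligent") ensemble mean
        # versus the equally weighted mean, on synthetic model output.
        import numpy as np

        rng = np.random.default_rng(0)
        truth = np.sin(np.linspace(0, 2 * np.pi, 100))     # reference field
        models = [truth + rng.normal(0, s, 100) for s in (0.1, 0.3, 0.9)]

        rmse = np.array([np.sqrt(np.mean((m - truth) ** 2)) for m in models])
        w = (1.0 / rmse) / np.sum(1.0 / rmse)              # skill-based weights

        equal = np.mean(models, axis=0)
        smart = np.average(models, axis=0, weights=w)
        print("equal-weight RMSE:", np.sqrt(np.mean((equal - truth) ** 2)))
        print("skill-weight RMSE:", np.sqrt(np.mean((smart - truth) ** 2)))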

  3. Hi-trees and their layout.

    PubMed

    Marriott, Kim; Sbarski, Peter; van Gelder, Tim; Prager, Daniel; Bulka, Andy

    2011-03-01

    We introduce hi-trees, a new visual representation for hierarchical data in which, depending on the kind of parent node, the child relationship is represented using either containment or links. We give a drawing convention for hi-trees based on the standard layered drawing convention for rooted trees, then show how to extend standard bottom-up tree layout algorithms to draw hi-trees in this convention. We also explore a number of more compact layout styles for larger hi-trees and give algorithms for computing these. Finally, we describe two applications of hi-trees: argument mapping and business decision support.

  4. A multi-model ensemble approach to seabed mapping

    NASA Astrophysics Data System (ADS)

    Diesing, Markus; Stephens, David

    2015-06-01

    Seabed habitat mapping based on swath acoustic data and ground-truth samples is an emergent and active marine science discipline. Significant progress could be achieved by transferring techniques and approaches that have been successfully developed and employed in such fields as terrestrial land cover mapping. One such promising approach is the multiple classifier system, which aims at improving classification performance by combining the outputs of several classifiers. Here we present results of a multi-model ensemble applied to multibeam acoustic data covering more than 5000 km² of seabed in the North Sea, with the aim of deriving accurate spatial predictions of seabed substrate. A suite of six machine learning classifiers (k-Nearest Neighbour, Support Vector Machine, Classification Tree, Random Forest, Neural Network and Naïve Bayes) was trained with ground-truth sample data classified into seabed substrate classes, and their prediction accuracy was assessed with an independent set of samples. The three and five best performing models were combined into classifier ensembles. Both ensembles led to increased prediction accuracy as compared to the best performing single classifier. The improvements were, however, not statistically significant at the 5% level. Although the three-model ensemble did not perform significantly better than its individual component models, we noticed that the five-model ensemble did perform significantly better than three of the five component models. A classifier ensemble might therefore be an effective strategy to improve classification performance. Another advantage is the fact that the agreement in predicted substrate class between the individual models of the ensemble can be used as a measure of confidence. We propose a simple and spatially explicit measure of confidence that is based on model agreement and prediction accuracy.
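
    The proposed confidence measure can be illustrated directly: take the majority vote of the trained classifiers as the prediction and the fraction of members agreeing with it as the per-sample confidence. A hedged sketch with stand-in classifiers and synthetic data:

        # Hedged sketch: majority-vote class prediction with per-sample
        # confidence = fraction of ensemble members agreeing with the vote.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.naive_bayes import GaussianNB
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.svm import SVC
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=400, n_classes=3, n_informative=5,
                                   random_state=0)
        members = [KNeighborsClassifier(), SVC(),
                   DecisionTreeClassifier(random_state=0),
                   RandomForestClassifier(random_state=0), GaussianNB()]
        votes = np.array([m.fit(X, y).predict(X) for m in members])  # (5, n)

        pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
        confidence = (votes == pred).mean(axis=0)    # model agreement per sample
        print("mean agreement:", confidence.mean())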

  5. The Ensembl gene annotation system

    PubMed Central

    Aken, Bronwen L.; Ayling, Sarah; Barrell, Daniel; Clarke, Laura; Curwen, Valery; Fairley, Susan; Fernandez Banet, Julio; Billis, Konstantinos; García Girón, Carlos; Hourlier, Thibaut; Howe, Kevin; Kähäri, Andreas; Kokocinski, Felix; Martin, Fergal J.; Murphy, Daniel N.; Nag, Rishi; Ruffier, Magali; Schuster, Michael; Tang, Y. Amy; Vogel, Jan-Hinnerk; White, Simon; Zadissa, Amonida; Flicek, Paul

    2016-01-01

    The Ensembl gene annotation system has been used to annotate over 70 different vertebrate species across a wide range of genome projects. Furthermore, it generates the automatic alignment-based annotation for the human and mouse GENCODE gene sets. The system is based on the alignment of biological sequences, including cDNAs, proteins and RNA-seq reads, to the target genome in order to construct candidate transcript models. Careful assessment and filtering of these candidate transcripts ultimately leads to the final gene set, which is made available on the Ensembl website. Here, we describe the annotation process in detail. Database URL: http://www.ensembl.org/index.html PMID:27337980

  6. Class Evolution Tree: A Graphical Tool to Support Decisions on the Number of Classes in Exploratory Categorical Latent Variable Modeling for Rehabilitation Research

    ERIC Educational Resources Information Center

    Kriston, Levente; Melchior, Hanne; Hergert, Anika; Bergelt, Corinna; Watzke, Birgit; Schulz, Holger; von Wolff, Alessa

    2011-01-01

    The aim of our study was to develop a graphical tool that can be used in addition to standard statistical criteria to support decisions on the number of classes in explorative categorical latent variable modeling for rehabilitation research. Data from two rehabilitation research projects were used. In the first study, a latent profile analysis was…

  7. Ensemble manifold regularization.

    PubMed

    Geng, Bo; Tao, Dacheng; Xu, Chao; Yang, Linjun; Hua, Xian-Sheng

    2012-06-01

    We propose an automatic approximation of the intrinsic manifold for general semi-supervised learning (SSL) problems. Unfortunately, it is not trivial to define an optimization function to obtain optimal hyperparameters. Usually, cross validation is applied, but it does not necessarily scale up. Other problems derive from the suboptimality incurred by discrete grid search and the overfitting. Therefore, we develop an ensemble manifold regularization (EMR) framework to approximate the intrinsic manifold by combining several initial guesses. Algorithmically, we designed EMR carefully so it 1) learns both the composite manifold and the semi-supervised learner jointly, 2) is fully automatic for learning the intrinsic manifold hyperparameters implicitly, 3) is conditionally optimal for intrinsic manifold approximation under a mild and reasonable assumption, and 4) is scalable for a large number of candidate manifold hyperparameters, from both time and space perspectives. Furthermore, we prove the convergence property of EMR to the deterministic matrix at rate root-n. Extensive experiments over both synthetic and real data sets demonstrate the effectiveness of the proposed framework.

  8. In silico prediction of toxicity of phenols to Tetrahymena pyriformis by using genetic algorithm and decision tree-based modeling approach.

    PubMed

    Abbasitabar, Fatemeh; Zare-Shahabadi, Vahid

    2017-04-01

    Risk assessment of chemicals is an important issue in environmental protection; however, there is a huge lack of experimental data for a large number of end-points. The experimental determination of toxicity of chemicals involves high costs and time-consuming processes. In silico tools such as quantitative structure-toxicity relationship (QSTR) models, which are constructed on the basis of computational molecular descriptors, can predict missing data for toxic end-points for existing or even not yet synthesized chemicals. Phenol derivatives are known to be aquatic pollutants. With this background, we aimed to develop an accurate and reliable QSTR model for the prediction of toxicity of 206 phenols to Tetrahymena pyriformis. A multiple linear regression (MLR)-based QSTR was obtained using a powerful descriptor selection tool named the Memorized_ACO algorithm. Statistical parameters of the model were $R^2_{training}$ = 0.72 and $R^2_{test}$ = 0.68. To develop a high-quality QSTR model, classification and regression trees (CART) were employed. Two approaches were considered: (1) the phenols were classified into different modes of action using CART, and (2) the phenols in the training set were partitioned into several subsets by a tree in such a manner that a high-quality MLR could be developed for each subset. For the first approach, the statistical parameters of the resultant QSTR model improved to $R^2_{training}$ = 0.83 and $R^2_{test}$ = 0.75. A genetic algorithm was employed in the second approach to obtain an optimal tree, and it was shown that the final QSTR model provided excellent prediction accuracy for the training and test sets ($R^2_{training}$ = 0.91 and $R^2_{test}$ = 0.93). The mean absolute error for the test set was computed as 0.1615.
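
    The second approach, partitioning the training set with a tree so that each subset supports its own MLR, is essentially a model tree. A hedged sketch with random stand-ins for descriptors and toxicity values, and a plain CART in place of the genetic-algorithm-optimized tree:

        # Hedged sketch of tree-partitioned multiple linear regression:
        # a shallow CART splits the space, then one MLR is fitted per leaf.
        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.tree import DecisionTreeRegressor

        rng = np.random.default_rng(0)
        X = rng.normal(size=(206, 5))                     # stand-in descriptors
        y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=206)  # stand-in toxicity

        tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
        leaves = tree.apply(X)                            # leaf index per compound

        models = {leaf: LinearRegression().fit(X[leaves == leaf], y[leaves == leaf])
                  for leaf in np.unique(leaves)}

        # Predict a new compound with the MLR of the leaf it falls into.
        x_new = rng.normal(size=(1, 5))
        print("prediction:", models[tree.apply(x_new)[0]].predict(x_new)[0])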

  9. Tree-Ring Based Climate Scenarios to Inform Decision Making in Water Resource Management: A Case Study From the Inland Empire, CA

    NASA Astrophysics Data System (ADS)

    Groves, D.; Tingstad, A.

    2009-12-01

    Water agencies in California are concerned about meeting future demand under climate conditions that are potentially drier than today. Tree-rings can be used to reconstruct past climate, which features droughts that were longer and more severe than any dry period during the 20th century, the period for which the instrumental data used by water managers are available. This research developed a new technique for modifying tree-ring based precipitation and temperature sequences that can be widely used in water management applications. A pilot study using this new method was done for the Inland Empire Utilities Agency (IEUA) in Southern California. This work employed Robust Decision Making to identify near-term management actions that may help mitigate future water shortages over a wide range of uncertainties related to climate, planning, and costs. The results of this work suggest that the current IEUA management plan is vulnerable to extended, high-magnitude droughts in the paleoclimate record as well as situations where management goals are not met. Increasing water banking, recycling, groundwater replenishment, and efficiency in the near-term could reduce the risk of unmet demand and shortage costs in the future.

  10. Teleportation of an atomic ensemble quantum state.

    PubMed

    Dantan, A; Treps, N; Bramati, A; Pinard, M

    2005-02-11

    We propose a protocol to achieve high fidelity quantum state teleportation of a macroscopic atomic ensemble using a pair of quantum-correlated atomic ensembles. We show how to prepare this pair of ensembles using quasiperfect quantum state transfer processes between light and atoms. Our protocol relies on optical joint measurements of the atomic ensemble states and magnetic feedback reconstruction.

  11. Is It Curtains for Traditional Ensembles?

    ERIC Educational Resources Information Center

    Van Zandt, Kathryn

    2001-01-01

    Focuses on traditional music ensembles (orchestras, bands, and choirs), discussing such issues as the effects of block scheduling and how to deal with scheduling issues, the effects of funding on large ensemble programs, nontraditional ensembles in music programs, and trying to teach the National Standards for Music Education within a large ensemble.…

  12. Using high-resolution topography and hyperspectral data to classify tree species at the San Joaquin Experimental Range

    NASA Astrophysics Data System (ADS)

    Dibb, S. D.; Ustin, S.; Grigsby, S.

    2015-12-01

    Air- and space-borne remote sensing instruments allow for rapid and precise study of the diversity of the Earth's ecosystems. After atmospheric correction and ground validation are performed, the gathered hyperspectral and topographic data can be assembled into a stack of layers for land cover classification. Data for this project were collected in multiple field campaigns, including the 2013 NSF NEON California campaign and the 2015 NASA SARP campaign. Using hyperspectral and high-resolution topography data, 25 discriminatory attributes were processed in Exelis' ENVI software and collected for use in a decision forest to classify the four major tree species (Blue Oak, Live Oak, California Buckeye, and Foothill Pine) at the San Joaquin Experimental Range near Fresno, CA. These attributes include 21 classic vegetation indices and other spectral characteristics, such as color and albedo, and four topographic layers: slope, aspect, elevation, and tree height. Additionally, a number of nearby terrain classes were created, including bare earth, asphalt, water, rock, shadow, structures, and grass. Fifty training pixels were used for each class. The training pixels for each tree species came from GPS points collected in the field. Ensemble bootstrap aggregation of decision trees was performed in MATLAB, and 500 trees (an arbitrary choice) were grown. The ensemble that produced the minimum out-of-bag classification error (4.65%) was selected to classify the entire scene. The classification accurately distinguished between the oak species but was suboptimal in dense areas. The entire San Joaquin Experimental Range was mapped with an overall accuracy of 94.7% and a Kappa coefficient of 0.94. Finally, the commission and omission error percentages averaged 5.3% each.
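
    Bootstrap aggregation of decision trees with out-of-bag error, as described above, has a direct Python analogue (the original work used MATLAB); here synthetic features stand in for the 25 spectral and topographic attributes and the class labels.

        # Hedged Python analogue of the bagged-decision-tree step; synthetic
        # stand-ins for the attribute stack, with OOB error reported as in the text.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import BaggingClassifier
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=550, n_features=25, n_informative=10,
                                   n_classes=4, random_state=0)
        bag = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                                n_estimators=500, oob_score=True,
                                random_state=0).fit(X, y)
        print("OOB error:", 1.0 - bag.oob_score_)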

  13. Definition of Ensemble Error Statistics for Optimal Ensemble Data Assimilation

    NASA Astrophysics Data System (ADS)

    Frehlich, R.

    2009-09-01

    Next generation data assimilation methods must include state-dependent observation errors, i.e., the spatial and temporal variations produced by the atmospheric turbulent field. A rigorous analysis of optimal data assimilation algorithms and ensemble forecast systems requires a definition of model "truth" or perfect measurement, which then defines the total observation error and forecast error. Truth is defined as the spatial average of the continuous atmospheric state variables centered on the model grid locations. To be consistent with the climatology of turbulence, the spatial average is chosen as the effective spatial filter of the numerical model. The observation errors then consist of two independent components: an instrument error and an observation sampling error which describes the mismatch between the spatial average of the observation and the spatial average of the perfect measurement or "truth". The observation sampling error is related to the "error of representativeness" but is defined only in terms of the local statistics of the atmosphere and the sampling pattern of the observation. Optimal data assimilation requires an estimate of the local background error correlation as well as the local observation error correlation. Both of these local correlations can be estimated with ensemble assimilation techniques, where each member of the ensemble is produced by generating and assimilating random observations consistent with the estimated local sampling errors, which are in turn based on estimates of the local turbulent statistics. A rigorous evaluation of these optimal ensemble data assimilation techniques requires a definition of the ensemble members and of the ensemble average that describes the error correlations. A new formulation is presented that is consistent with the climatology of atmospheric turbulence, and the implications of this formulation for ensemble forecast systems are discussed.

  14. Reduction of predictive uncertainty in estimating irrigation water requirement through multi-model ensembles and ensemble averaging

    NASA Astrophysics Data System (ADS)

    Multsch, S.; Exbrayat, J.-F.; Kirby, M.; Viney, N. R.; Frede, H.-G.; Breuer, L.

    2014-11-01

    Irrigation agriculture plays an increasingly important role in food supply. Many evapotranspiration models are used today to estimate the water demand for irrigation. They consider different stages of crop growth by empirical crop coefficients to adapt evapotranspiration throughout the vegetation period. We investigate the importance of the model structural vs. model parametric uncertainty for irrigation simulations by considering six evapotranspiration models and five crop coefficient sets to estimate irrigation water requirements for growing wheat in the Murray-Darling Basin, Australia. The study is carried out using the spatial decision support system SPARE:WATER. We find that structural model uncertainty is far more important than model parametric uncertainty to estimate irrigation water requirement. Using the Reliability Ensemble Averaging (REA) technique, we are able to reduce the overall predictive model uncertainty by more than 10%. The exceedance probability curve of irrigation water requirements shows that a certain threshold, e.g. an irrigation water limit due to water right of 400 mm, would be less frequently exceeded in case of the REA ensemble average (45%) in comparison to the equally weighted ensemble average (66%). We conclude that multi-model ensemble predictions and sophisticated model averaging techniques are helpful in predicting irrigation demand and provide relevant information for decision making.

  15. Reduction of predictive uncertainty in estimating irrigation water requirement through multi-model ensembles and ensemble averaging

    NASA Astrophysics Data System (ADS)

    Multsch, S.; Exbrayat, J.-F.; Kirby, M.; Viney, N. R.; Frede, H.-G.; Breuer, L.

    2015-04-01

    Irrigation agriculture plays an increasingly important role in food supply. Many evapotranspiration models are used today to estimate the water demand for irrigation. They consider different stages of crop growth by empirical crop coefficients to adapt evapotranspiration throughout the vegetation period. We investigate the importance of the model structural versus model parametric uncertainty for irrigation simulations by considering six evapotranspiration models and five crop coefficient sets to estimate irrigation water requirements for growing wheat in the Murray-Darling Basin, Australia. The study is carried out using the spatial decision support system SPARE:WATER. We find that structural model uncertainty among reference ET is far more important than model parametric uncertainty introduced by crop coefficients. These crop coefficients are used to estimate irrigation water requirement following the single crop coefficient approach. Using the reliability ensemble averaging (REA) technique, we are able to reduce the overall predictive model uncertainty by more than 10%. The exceedance probability curve of irrigation water requirements shows that a certain threshold, e.g. an irrigation water limit due to water right of 400 mm, would be less frequently exceeded in case of the REA ensemble average (45%) in comparison to the equally weighted ensemble average (66%). We conclude that multi-model ensemble predictions and sophisticated model averaging techniques are helpful in predicting irrigation demand and provide relevant information for decision making.
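
    Reliability ensemble averaging weights each member by a reliability score before averaging. A deliberately simplified sketch is below: reliability is based only on distance from the evolving consensus (full REA also scores model bias against observations), so the fixed-point iteration converges toward a robust, median-like consensus.

        # Hedged, simplified REA-style weighting of an irrigation-requirement
        # ensemble; the 30 member estimates are random stand-ins (mm).
        import numpy as np

        rng = np.random.default_rng(0)
        estimates = rng.normal(400.0, 60.0, size=30)

        w = np.ones_like(estimates)
        for _ in range(50):                      # iterate weights to convergence
            consensus = np.average(estimates, weights=w)
            w = 1.0 / np.maximum(np.abs(estimates - consensus), 1e-6)

        print("equal-weight mean:", estimates.mean())
        print("REA-style mean:  ", np.average(estimates, weights=w))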

  16. Gradient Flow and Scale Setting on MILC HISQ Ensembles

    DOE PAGES

    Bazavov, A.; Bernard, C.; Brown, N.; ...

    2016-05-25

    We report on a scale determination with gradient-flow techniques on the Nf = 2 + 1 + 1 HISQ ensembles generated by the MILC collaboration. The ensembles include four lattice spacings, ranging from approximately 0.15 to 0.06 fm, and both physical and unphysical values of the quark masses. The scales √t0/a and w0/a and their tree-level improvements, √t0,imp/a and w0,imp/a, are computed on each ensemble using Symanzik flow and the cloverleaf definition of the energy density E. Using a combination of continuum chiral perturbation theory and a Taylor-series ansatz for the lattice-spacing and strong-coupling dependence, the results are simultaneously extrapolated to the continuum and interpolated to physical quark masses. We also determine the scales √t0 = 0.1416(+8−5) fm and w0 = 0.1717(+12−11) fm, where the errors are sums, in quadrature, of statistical and all systematic errors. The precisions of w0 and √t0 are comparable to or better than those of the best previous estimates, respectively. We also find the continuum mass-dependence of w0, which will be useful for estimating the scales of other ensembles. Furthermore, we estimate the integrated autocorrelation length of ⟨E⟩. For long flow times, the autocorrelation length of ⟨E⟩ appears to be comparable to or smaller than that of the topological charge.

  17. Efficient Gene Tree Correction Guided by Genome Evolution

    PubMed Central

    Lafond, Manuel; Seguin, Jonathan; Boussau, Bastien; Guéguen, Laurent; El-Mabrouk, Nadia; Tannier, Eric

    2016-01-01

    Motivations Gene trees inferred solely from multiple alignments of homologous sequences often contain weakly supported and uncertain branches. Information for their full resolution may lie in the dependency between gene families and their genomic context. Integrative methods, using species tree information in addition to sequence information, often rely on a computationally intensive tree space search which forecloses an application to large genomic databases. Results We propose a new method, called ProfileNJ, that takes a gene tree with statistical supports on its branches, and corrects its weakly supported parts by using a combination of information from a species tree and a distance matrix. Its low running time enabled us to use it on the whole Ensembl Compara database, for which we propose an alternative, arguably more plausible set of gene trees. This allowed us to perform a genome-wide analysis of duplication and loss patterns over the history of 63 eukaryote species, and to predict ancestral gene content and order for all ancestors along the phylogeny. Availability A web interface called RefineTree, including ProfileNJ as well as other gene tree correction methods, which we also test on the Ensembl gene families, is available at: http://www-ens.iro.umontreal.ca/~adbit/polytomysolver.html. The code of ProfileNJ as well as the set of gene trees corrected by ProfileNJ from Ensembl Compara version 73 families are also made available. PMID:27513924

  18. Comparative Visualization of Ensembles Using Ensemble Surface Slicing

    PubMed Central

    Alabi, Oluwafemi S.; Wu, Xunlei; Harter, Jonathan M.; Phadke, Madhura; Pinto, Lifford; Petersen, Hannah; Bass, Steffen; Keifer, Michael; Zhong, Sharon; Healey, Chris; Taylor, Russell M.

    2012-01-01

    By definition, an ensemble is a set of surfaces or volumes derived from a series of simulations or experiments. Sometimes the series is run with different initial conditions for one parameter to determine parameter sensitivity. The understanding and identification of visual similarities and differences among the shapes of members of an ensemble is an acute and growing challenge for researchers across the physical sciences. More specifically, the task of gaining spatial understanding and identifying similarities and differences between multiple complex geometric data sets simultaneously has proved challenging. This paper proposes a comparison and visualization technique to support the visual study of parameter sensitivity. We present a novel single-image view and sampling technique which we call Ensemble Surface Slicing (ESS). ESS produces a single image that is useful for determining differences and similarities between surfaces simultaneously from several data sets. We demonstrate the usefulness of ESS on two real-world data sets from our collaborators. PMID:23560167

  19. Algorithms on ensemble quantum computers.

    PubMed

    Boykin, P Oscar; Mor, Tal; Roychowdhury, Vwani; Vatan, Farrokh

    2010-06-01

    In ensemble (or bulk) quantum computation, all computations are performed on an ensemble of computers rather than on a single computer. Measurements of qubits in an individual computer cannot be performed; instead, only expectation values (over the complete ensemble of computers) can be measured. As a result of this limitation on the model of computation, many algorithms cannot be processed directly on such computers, and must be modified, as the common strategy of delaying the measurements usually does not resolve this ensemble-measurement problem. Here we present several new strategies for resolving this problem. Based on these strategies we provide new versions of some of the most important quantum algorithms, versions that are suitable for implementing on ensemble quantum computers, e.g., on liquid NMR quantum computers. These algorithms are Shor's factorization algorithm, Grover's search algorithm (with several marked items), and an algorithm for quantum fault-tolerant computation. The first two algorithms are simply modified using randomizing and sorting strategies. For the last algorithm, we develop a classical-quantum hybrid strategy for removing measurements. We use it to present a novel quantum fault-tolerant scheme. More explicitly, we present schemes for fault-tolerant measurement-free implementation of Toffoli and σz^(1/4), as these operations cannot be implemented "bitwise", and their standard fault-tolerant implementations require measurement.

  20. Estimating preselected and postselected ensembles

    SciTech Connect

    Massar, Serge; Popescu, Sandu

    2011-11-15

    In analogy with the usual quantum state-estimation problem, we introduce the problem of state estimation for a pre- and postselected ensemble. The problem has fundamental physical significance since, as argued by Y. Aharonov and collaborators, pre- and postselected ensembles are the most basic quantum ensembles. Two new features are shown to appear: (1) information is flowing to the measuring device both from the past and from the future; (2) because of the postselection, certain measurement outcomes can be forced never to occur. Due to these features, state estimation in such ensembles is dramatically different from the case of ordinary, preselected-only ensembles. We develop a general theoretical framework for studying this problem and illustrate it through several examples. We also prove general theorems establishing that information flowing from the future is closely related to, and in some cases equivalent to, the complex conjugate information flowing from the past. Finally, we illustrate our approach on examples involving covariant measurements on spin-1/2 particles. We emphasize that all state-estimation problems can be extended to the pre- and postselected situation. The present work thus lays the foundations of a much more general theory of quantum state estimation.

  1. Using ensembles in water management: forecasting dry and wet episodes

    NASA Astrophysics Data System (ADS)

    van het Schip-Haverkamp, Tessa; van den Berg, Wim; van de Beek, Remco

    2015-04-01

    Extreme weather situations such as droughts and extensive precipitation are becoming more frequent, which makes it more important to obtain accurate weather forecasts for the short and long term. Ensembles can provide a solution in the form of scenario forecasts. MeteoGroup uses ensembles in a new forecasting technique which presents a number of weather scenarios for a dynamical water management project, called Water-Rijk, in which water storage and water retention play a large role. The Water-Rijk is part of Park Lingezegen, which is located between Arnhem and Nijmegen in the Netherlands. In collaboration with the University of Wageningen, Alterra and Eijkelkamp, a forecasting system is being developed for this area which can provide water boards with a number of weather and hydrology scenarios in order to assist in the decision whether or not water retention or water storage is necessary in the near future. To forecast drought and extensive precipitation, the difference 'precipitation − evaporation' is used as a measure of drought in the weather forecasts. In the case of an upcoming drought this difference takes larger negative values; in the case of a wet episode, it is positive. The Makkink potential evaporation is used, which gives the most accurate potential evaporation values during the summer, when evaporation plays an important role in the availability of surface water. Scenarios are determined by reducing the large number of forecasts in the ensemble to a number of averaged members, each with its own likelihood of occurrence. For the Water-Rijk project, 5 scenario forecasts are calculated: extreme dry, dry, normal, wet and extreme wet. These scenarios are constructed for two forecasting periods, each using its own ensemble technique: up to 48 hours ahead and up to 15 days ahead. The 48-hour forecast uses an ensemble constructed from forecasts of multiple high-resolution regional models: UKMO's Euro4 model, the ECMWF model, WRF and
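
    Reducing an ensemble to five named scenarios with likelihoods can be done, for instance, by binning members on their cumulative precipitation-minus-evaporation total; the quantile bin edges below are an illustrative choice, not MeteoGroup's actual scheme.

        # Hedged sketch: collapse an ensemble of P-E forecasts into five
        # scenarios (extreme dry .. extreme wet) with occurrence likelihoods.
        import numpy as np

        rng = np.random.default_rng(0)
        p_minus_e = rng.normal(-0.5, 2.0, size=(51, 15))  # 51 members x 15 days (mm/day)
        totals = p_minus_e.sum(axis=1)

        edges = np.quantile(totals, [0.1, 0.3, 0.7, 0.9]) # illustrative bin edges
        names = ["extreme dry", "dry", "normal", "wet", "extreme wet"]
        bins = np.digitize(totals, edges)

        for k, name in enumerate(names):
            members = totals[bins == k]
            if members.size:
                print(f"{name:12s} likelihood={members.size / totals.size:.0%} "
                      f"mean P-E={members.mean():+.1f} mm")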

  2. Talking Trees

    ERIC Educational Resources Information Center

    Tolman, Marvin

    2005-01-01

    Students love outdoor activities and will love them even more when they build confidence in their tree identification and measurement skills. Through these activities, students will learn to identify the major characteristics of trees and discover how the pace--a nonstandard measuring unit--can be used to estimate not only distances but also the…

  3. Quantum Gibbs ensemble Monte Carlo

    SciTech Connect

    Fantoni, Riccardo; Moroni, Saverio

    2014-09-21

    We present a path integral Monte Carlo method which is the full quantum analogue of the Gibbs ensemble Monte Carlo method of Panagiotopoulos to study the gas-liquid coexistence line of a classical fluid. Unlike previous extensions of Gibbs ensemble Monte Carlo to include quantum effects, our scheme is viable even for systems with strong quantum delocalization in the degenerate regime of temperature. This is demonstrated by an illustrative application to the gas-superfluid transition of ⁴He in two dimensions.

  4. Quantum metrology with molecular ensembles

    SciTech Connect

    Schaffry, Marcus; Gauger, Erik M.; Morton, John J. L.; Fitzsimons, Joseph; Benjamin, Simon C.; Lovett, Brendon W.

    2010-10-15

    The field of quantum metrology promises measurement devices that are fundamentally superior to conventional technologies. Specifically, when quantum entanglement is harnessed, the precision achieved is supposed to scale more favorably with the resources employed, such as system size and time required. Here, we consider measurement of magnetic-field strength using an ensemble of spin-active molecules. We identify a third essential resource: the change in ensemble polarization (entropy increase) during the metrology experiment. We find that performance depends crucially on the form of decoherence present; for a plausible dephasing model, we describe a quantum strategy, which can indeed beat the standard strategy.

  5. An Effective and Novel Neural Network Ensemble for Shift Pattern Detection in Control Charts.

    PubMed

    Barghash, Mahmoud

    2015-01-01

    Pattern recognition in control charts is critical to striking a balance between discovering faults as early as possible and reducing the number of false alarms. This work is devoted to designing a multistage neural network ensemble that achieves this balance, reducing rework and scrap without reducing productivity. The ensemble under focus is composed of a series of neural network stages and a series of decision points. Initially, this work compared using multiple decision points against a single decision point on the performance of the ANN, which showed that multiple decision points are highly preferable to a single decision point. This work also tested the effect of population percentages on the ANN and used this to optimize the ANN's performance. It further used optimized and nonoptimized ANNs in an ensemble and showed that using nonoptimized ANNs may reduce the performance of the ensemble. The ensemble that used only optimized ANNs improved performance over individual ANNs and the three-sigma rule. In that respect, using the designed ensemble can help in reducing the number of false stops and increasing productivity. It can also be used to discover even small shifts in the mean as early as possible.

  6. An Effective and Novel Neural Network Ensemble for Shift Pattern Detection in Control Charts

    PubMed Central

    Barghash, Mahmoud

    2015-01-01

    Pattern recognition in control charts is critical to striking a balance between discovering faults as early as possible and reducing the number of false alarms. This work is devoted to designing a multistage neural network ensemble that achieves this balance, reducing rework and scrap without reducing productivity. The ensemble under focus is composed of a series of neural network stages and a series of decision points. Initially, this work compared using multiple decision points against a single decision point on the performance of the ANN, which showed that multiple decision points are highly preferable to a single decision point. This work also tested the effect of population percentages on the ANN and used this to optimize the ANN's performance. It further used optimized and nonoptimized ANNs in an ensemble and showed that using nonoptimized ANNs may reduce the performance of the ensemble. The ensemble that used only optimized ANNs improved performance over individual ANNs and the three-sigma rule. In that respect, using the designed ensemble can help in reducing the number of false stops and increasing productivity. It can also be used to discover even small shifts in the mean as early as possible. PMID:26339235
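
    The cascaded stage/decision-point design described above can be illustrated with a minimal sketch; the window length, thresholds, network sizes, and synthetic shift data below are assumptions, not the paper's settings:

    ```python
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(1)

    def windows(n, shift):
        """n windows of 20 consecutive observations from an N(shift, 1) process."""
        return rng.normal(shift, 1.0, size=(n, 20))

    # Training data: in-control windows versus windows with a small mean shift.
    X = np.vstack([windows(500, 0.0), windows(500, 1.0)])
    y = np.repeat([0, 1], 500)

    # Two stages, each followed by its own decision point: stage 1 screens
    # sensitively (low threshold); stage 2 confirms (high threshold) to cut
    # false alarms -- a rough analogue of the multistage design.
    stage1 = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                           random_state=0).fit(X, y)
    stage2 = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000,
                           random_state=1).fit(X, y)

    def alarm(w, t1=0.3, t2=0.7):
        """Signal a shift only if stage 1 flags the window and stage 2 confirms."""
        w = w.reshape(1, -1)
        return (stage1.predict_proba(w)[0, 1] > t1
                and stage2.predict_proba(w)[0, 1] > t2)

    in_control = windows(200, 0.0)
    print("false alarm rate:", np.mean([alarm(w) for w in in_control]))
    ```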

  7. The assisted prediction modelling frame with hybridisation and ensemble for business risk forecasting and an implementation

    NASA Astrophysics Data System (ADS)

    Li, Hui; Hong, Lu-Yao; Zhou, Qing; Yu, Hai-Jie

    2015-08-01

    The business failure of numerous companies results in financial crises. The high social costs associated with such crises have led people to search for effective tools for business risk prediction, among which the support vector machine is very effective. Several modelling means, including single-technique modelling, hybrid modelling, and ensemble modelling, have been suggested for forecasting business risk with support vector machines. However, the existing literature seldom focuses on a general modelling frame for business risk prediction, and seldom investigates performance differences among different modelling means. We reviewed research on forecasting business risk with support vector machines, proposed the general assisted prediction modelling frame with hybridisation and ensemble (APMF-WHAE), and finally investigated the use of principal components analysis, support vector machines, random sampling, and group decision under the general frame in forecasting business risk. Under the APMF-WHAE frame with the support vector machine as the base predictive model, four specific predictive models were produced: a pure support vector machine, a hybrid support vector machine involving principal components analysis, a support vector machine ensemble involving random sampling and group decision, and an ensemble of hybrid support vector machines using group decision to integrate various hybrid support vector machines built on variables produced from principal components analysis and samples from random sampling. The experimental results indicate that the hybrid support vector machine and the ensemble of hybrid support vector machines produced performance dominating that of the pure support vector machine and the support vector machine ensemble.
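
    The four model families under the APMF-WHAE frame can be sketched with scikit-learn; here BaggingClassifier's random sampling plus majority vote stands in for the paper's "group decision", and the dataset and hyperparameters are invented:

    ```python
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    # Toy stand-in for a business-risk dataset (failed vs. healthy firms).
    X, y = make_classification(n_samples=400, n_features=30, n_informative=8,
                               random_state=0)

    models = {
        "pure SVM": SVC(),
        # Hybrid: principal components analysis feeding an SVM.
        "hybrid PCA+SVM": make_pipeline(PCA(n_components=8), SVC()),
        # Ensemble: random sampling of the training data; the bagging
        # majority vote stands in for the paper's "group decision".
        "SVM ensemble": BaggingClassifier(SVC(), n_estimators=25, random_state=0),
        # Ensemble of hybrids: random sampling over PCA+SVM base models.
        "ensemble of hybrids": BaggingClassifier(
            make_pipeline(PCA(n_components=8), SVC()),
            n_estimators=25, random_state=0),
    }
    for name, model in models.items():
        acc = cross_val_score(model, X, y, cv=5).mean()
        print(f"{name:20s} CV accuracy = {acc:.3f}")
    ```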

  8. Characterizing and visualizing predictive uncertainty in numerical ensembles through Bayesian model averaging.

    PubMed

    Gosink, Luke; Bensema, Kevin; Pulsipher, Trenton; Obermaier, Harald; Henry, Michael; Childs, Hank; Joy, Kenneth I

    2013-12-01

    Numerical ensemble forecasting is a powerful tool that drives many risk analysis efforts and decision making tasks. These ensembles are composed of individual simulations that each uniquely model a possible outcome for a common event of interest: e.g., the direction and force of a hurricane, or the path of travel and mortality rate of a pandemic. This paper presents a new visual strategy to help quantify and characterize a numerical ensemble's predictive uncertainty: i.e., the ability for ensemble constituents to accurately and consistently predict an event of interest based on ground truth observations. Our strategy employs a Bayesian framework to first construct a statistical aggregate from the ensemble. We extend the information obtained from the aggregate with a visualization strategy that characterizes predictive uncertainty at two levels: at a global level, which assesses the ensemble as a whole, as well as a local level, which examines each of the ensemble's constituents. Through this approach, modelers are able to better assess the predictive strengths and weaknesses of the ensemble as a whole, as well as individual models. We apply our method to two datasets to demonstrate its broad applicability.
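
    A heavily simplified sketch of the BMA aggregation step: member weights are set from the Gaussian likelihood of past observations under each member, whereas a full BMA implementation (as used in the paper) would also fit the weights and spread, typically by EM; all data are synthetic:

    ```python
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2)

    # Toy ensemble: 5 members forecasting a scalar at 50 past verification times.
    truth = rng.normal(0.0, 1.0, size=50)
    member_err = np.array([[0.5], [1.0], [1.5], [0.7], [2.0]])
    forecasts = truth + rng.normal(0.0, member_err, size=(5, 50))

    sigma = 1.0  # fixed forecast-error spread; full BMA fits this via EM
    # Member weights proportional to the likelihood of past observations.
    loglik = norm.logpdf(truth, loc=forecasts, scale=sigma).sum(axis=1)
    w = np.exp(loglik - loglik.max())
    w /= w.sum()

    new_fc = forecasts[:, -1]                 # members' latest forecasts
    mean = np.dot(w, new_fc)                  # BMA predictive mean
    # Predictive variance: between-member spread plus within-member spread.
    var = np.dot(w, (new_fc - mean) ** 2) + sigma ** 2
    print("weights:", w.round(2), " mean: %.2f  std: %.2f" % (mean, var ** 0.5))
    ```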

  9. China PEACE risk estimation tool for in-hospital death from acute myocardial infarction: an early risk classification tree for decisions about fibrinolytic therapy

    PubMed Central

    Li, Xi; Li, Jing; Masoudi, Frederick A; Spertus, John A; Lin, Zhenqiu; Krumholz, Harlan M; Jiang, Lixin

    2016-01-01

    Objectives As the predominant approach to acute reperfusion for ST segment elevation myocardial infarction (STEMI) in many countries, fibrinolytic therapy provides a relative risk reduction for death of ∼16% across the range of baseline risk. For patients with low baseline mortality risk, fibrinolytic therapy may therefore provide little benefit, which may be offset by the risk of major bleeding. We aimed to construct a tool to determine if it is possible to identify a low-risk group among fibrinolytic therapy-eligible patients. Design Cross-sectional study. Setting The China Patient-centered Evaluative Assessment of Cardiac Events (PEACE) study includes a nationally representative retrospective sample of patients admitted with acute myocardial infarction (AMI) in 162 hospitals. Participants 3741 patients with STEMI who were fibrinolytic-eligible but did not receive reperfusion therapy. Main outcome measures In-hospital mortality, which was defined as a composite of death occurring within hospitalisation or withdrawal from treatment due to a terminal status at discharge. Results In the study cohort, the in-hospital mortality was 14.7%. In the derivation cohort and the validation cohort, the combination of systolic blood pressure (≥100 mm Hg), age (<60 years old) and gender (male) identified one-fifth of the cohort with an average mortality rate of <3.0%. Half of this low risk group—those with non-anterior AMI—had an average in-hospital death risk of 1.5%. Conclusions Nearly one in five patients with STEMI who are eligible for fibrinolytic therapy are at a low risk for in-hospital death. Three simple factors available at the time of presentation can identify these individuals and support decision-making about the use of fibrinolytic therapy. Trial registration number NCT01624883. PMID:27798032
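
    The three-factor rule reported in the abstract is simple enough to state as code; this is an illustrative, hypothetical encoding (thresholds taken from the abstract, not a clinical tool):

    ```python
    def stemi_low_risk(sbp_mmhg: float, age_years: float, male: bool,
                       anterior_mi: bool) -> str:
        """Illustrative encoding of the three-factor rule in the abstract:
        SBP >= 100 mm Hg, age < 60, and male sex identify a low-risk group
        (average in-hospital mortality < 3%); within it, non-anterior AMI
        marks a very-low-risk subgroup (~1.5%). Not for clinical use."""
        if sbp_mmhg >= 100 and age_years < 60 and male:
            return "very low risk (~1.5%)" if not anterior_mi else "low risk (<3%)"
        return "not low risk"

    print(stemi_low_risk(sbp_mmhg=120, age_years=52, male=True, anterior_mi=False))
    ```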

  10. Graphic Representations as Tools for Decision Making.

    ERIC Educational Resources Information Center

    Howard, Judith

    2001-01-01

    Focuses on the use of graphic representations to enable students to improve their decision making skills in the social studies. Explores three visual aids used in assisting students with decision making: (1) the force field; (2) the decision tree; and (3) the decision making grid. (CMK)

  11. Ensembl genomes 2016: more genomes, more complexity

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent...

  12. African Drum and Steel Pan Ensembles.

    ERIC Educational Resources Information Center

    Sunkett, Mark E.

    2000-01-01

    Discusses how to develop both African drum and steel pan ensembles providing information on teacher preparation, instrument choice, beginning the ensemble, and lesson planning. Includes additional information for the drum ensembles. Lists references and instructional materials, sources of drums and pans, and common note layout/range for steel pan…

  13. Modeling of stage-discharge relationship for Gharraf River, southern Iraq using backpropagation artificial neural networks, M5 decision trees, and Takagi-Sugeno inference system technique: a comparative study

    NASA Astrophysics Data System (ADS)

    Al-Abadi, Alaa M.

    2016-11-01

    The potential of using three different data-driven techniques, namely the multilayer perceptron with backpropagation artificial neural network (MLP), the M5 decision tree model, and the Takagi-Sugeno (TS) inference system, to mimic the stage-discharge relationship of the Gharraf River system, southern Iraq, has been investigated and discussed in this study. The study used the available stage and discharge data for predicting discharge using different combinations of stage, antecedent stages, and antecedent discharge values. The models' results were compared using the root mean squared error (RMSE) and coefficient of determination (R²) error statistics. The results of the comparison in the testing stage reveal that the M5 and Takagi-Sugeno techniques have certain advantages over the multilayer perceptron artificial neural network for modeling the stage-discharge relationship. Although the performance of the TS inference system was very close to that of the M5 model in terms of R², the M5 method has the lowest RMSE (8.10 m³/s). The study implies that both the M5 and TS inference systems are promising tools for identifying the stage-discharge relationship in the study area.
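
    The comparison protocol can be sketched with scikit-learn; since it ships no M5 implementation, a plain CART regression tree stands in for M5, and the rating-curve data are synthetic:

    ```python
    import numpy as np
    from sklearn.metrics import mean_squared_error, r2_score
    from sklearn.neural_network import MLPRegressor
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(3)

    # Synthetic rating curve: discharge Q = a * stage^b with multiplicative noise.
    stage = rng.uniform(0.5, 5.0, size=300)
    Q = 12.0 * stage ** 1.8 * rng.lognormal(0.0, 0.08, size=300)

    # Predictors: stage, antecedent stage, antecedent discharge.
    X = np.column_stack([stage, np.roll(stage, 1), np.roll(Q, 1)])
    X_tr, X_te, y_tr, y_te = X[:200], X[200:], Q[:200], Q[200:]

    models = [
        ("MLP", MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000, random_state=0)),
        ("tree (M5 stand-in)", DecisionTreeRegressor(min_samples_leaf=5, random_state=0)),
    ]
    for name, model in models:
        pred = model.fit(X_tr, y_tr).predict(X_te)
        rmse = mean_squared_error(y_te, pred) ** 0.5
        print(f"{name:18s} RMSE = {rmse:6.2f} m^3/s   R^2 = {r2_score(y_te, pred):.3f}")
    ```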

  14. Statistical Analysis of Protein Ensembles

    NASA Astrophysics Data System (ADS)

    Máté, Gabriell; Heermann, Dieter

    2014-04-01

    As 3D protein-configuration data piles up, there is an ever-increasing need for well-defined, mathematically rigorous analysis approaches, especially as the vast majority of the currently available methods rely heavily on heuristics. We propose an analysis framework which stems from topology, the field of mathematics which studies properties preserved under continuous deformations. First, we calculate a barcode representation of the molecules employing computational topology algorithms. Bars in this barcode represent different topological features. Molecules are compared through their barcodes by statistically determining the difference in the set of their topological features. As a proof-of-principle application, we analyze a dataset compiled of ensembles of different proteins, obtained from the Ensemble Protein Database. We demonstrate that our approach correctly detects the different protein groupings.

  15. SRNL PARTICIPATION IN THE MULTI-SCALE ENSEMBLE EXERCISES

    SciTech Connect

    Buckley, R

    2007-10-29

    Consequence assessment during emergency response often requires atmospheric transport and dispersion modeling to guide decision making. A statistical analysis of the ensemble of results from several models is a useful way of estimating the uncertainty for a given forecast. ENSEMBLE is a European Union program that utilizes an internet-based system to ingest transport results from numerous modeling agencies. A recent set of exercises required output on three distinct spatial and temporal scales. The Savannah River National Laboratory (SRNL) uses a regional prognostic model nested within a larger-scale synoptic model to generate the meteorological conditions which are in turn used in a Lagrangian particle dispersion model. A discussion of SRNL participation in these exercises is given, with particular emphasis on requirements for provision of results in a timely manner with regard to the various spatial scales.

  16. Ensemble Learning Approaches to Predicting Complications of Blood Transfusion

    PubMed Central

    Murphree, Dennis; Ngufor, Che; Upadhyaya, Sudhindra; Madde, Nagesh; Clifford, Leanne; Kor, Daryl J.; Pathak, Jyotishman

    2016-01-01

    Of the 21 million blood components transfused in the United States during 2011, approximately 1 in 414 resulted in complication [1]. Two complications in particular, transfusion-related acute lung injury (TRALI) and transfusion-associated circulatory overload (TACO), are especially concerning. These two alone accounted for 62% of reported transfusion-related fatalities in 2013 [2]. We have previously developed a set of machine learning base models for predicting the likelihood of these adverse reactions, with a goal towards better informing the clinician prior to a transfusion decision. Here we describe recent work incorporating ensemble learning approaches to predicting TACO/TRALI. In particular we describe combining base models via majority voting, stacking of model sets with varying diversity, as well as a resampling/boosting combination algorithm called RUSBoost. We find that while the performance of many models is very good, the ensemble models do not yield significantly better performance in terms of AUC. PMID:26737958
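
    The majority-voting and stacking combinations mentioned above map directly onto scikit-learn; a minimal sketch on an invented, imbalanced dataset (soft voting is used so that AUC can be scored; the RUSBoost algorithm named in the abstract is available separately as imblearn.ensemble.RUSBoostClassifier):

    ```python
    from sklearn.datasets import make_classification
    from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                                  StackingClassifier, VotingClassifier)
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Imbalanced toy stand-in for a TACO/TRALI dataset (complications are rare).
    X, y = make_classification(n_samples=1000, n_features=20, weights=[0.95],
                               random_state=0)

    base = [("lr", LogisticRegression(max_iter=1000)),
            ("rf", RandomForestClassifier(random_state=0)),
            ("gb", GradientBoostingClassifier(random_state=0))]

    # Soft voting averages base-model probabilities (plain majority voting
    # would be voting="hard", but it cannot be scored with AUC).
    voting = VotingClassifier(estimators=base, voting="soft")
    # Stacking: a meta-learner combines the base models' predictions.
    stacking = StackingClassifier(estimators=base,
                                  final_estimator=LogisticRegression(max_iter=1000))

    for name, model in [("voting", voting), ("stacking", stacking)]:
        auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
        print(f"{name:9s} AUC = {auc:.3f}")
    ```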

  17. Statistical Ensemble of Large Eddy Simulations

    NASA Technical Reports Server (NTRS)

    Carati, Daniele; Rogers, Michael M.; Wray, Alan A.; Mansour, Nagi N. (Technical Monitor)

    2001-01-01

    A statistical ensemble of large eddy simulations (LES) is run simultaneously for the same flow. The information provided by the different large scale velocity fields is used to propose an ensemble averaged version of the dynamic model. This produces local model parameters that only depend on the statistical properties of the flow. An important property of the ensemble averaged dynamic procedure is that it does not require any spatial averaging and can thus be used in fully inhomogeneous flows. Also, the ensemble of LES's provides statistics of the large scale velocity that can be used for building new models for the subgrid-scale stress tensor. The ensemble averaged dynamic procedure has been implemented with various models for three flows: decaying isotropic turbulence, forced isotropic turbulence, and the time developing plane wake. It is found that the results are almost independent of the number of LES's in the statistical ensemble provided that the ensemble contains at least 16 realizations.
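
    A plausible form of the ensemble-averaged dynamic procedure, following the standard Germano-Lilly least-squares coefficient with the spatial average replaced by an average over the N simultaneous LES realizations (notation assumed, not taken from the paper):

    ```latex
    % Ensemble-averaged dynamic-model coefficient: the average over the N
    % LES realizations replaces the usual spatial average, so the model
    % parameter stays purely local in space and time.
    \[
      C(\mathbf{x},t)
      = \frac{\langle L_{ij} M_{ij} \rangle_e}{\langle M_{ij} M_{ij} \rangle_e},
      \qquad
      \langle f \rangle_e \equiv \frac{1}{N} \sum_{n=1}^{N} f^{(n)},
    \]
    % where L_ij is the Germano-identity (resolved) stress and M_ij the
    % model tensor of the dynamic procedure.
    ```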

  18. Short-term optimal operation of water systems using ensemble forecasts

    NASA Astrophysics Data System (ADS)

    Raso, L.; Schwanenberg, D.; van de Giesen, N. C.; van Overloop, P. J.

    2014-09-01

    Short-term water system operation can be realized using Model Predictive Control (MPC). MPC is a method for the operational management of complex dynamic systems. Applied to open water systems, MPC provides integrated, optimal, and proactive management when forecasts are available. Notwithstanding these properties, if forecast uncertainty is not properly taken into account, system performance can deteriorate critically. Ensemble forecasts are a way to represent short-term forecast uncertainty: an ensemble forecast is a set of possible future trajectories of a meteorological or hydrological system. The growing availability and accuracy of ensemble forecasts raise the question of how to use them for operational management. The theoretical innovation presented here is the use of ensemble forecasts for optimal operation. Specifically, we introduce a tree-based approach, which we call Tree-Based Model Predictive Control (TB-MPC). In TB-MPC, a tree is used to set up a Multistage Stochastic Programming problem, which finds a different optimal strategy for each branch and enhances adaptivity to forecast uncertainty. Adaptivity reduces the sensitivity to wrong forecasts and improves operational performance. TB-MPC is applied to the operational management of the Salto Grande reservoir, located at the border between Argentina and Uruguay, and compared to other methods.
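
    A toy two-stage illustration of the tree-based idea: a shared here-and-now release, with either one shared recourse decision (ordinary MPC over the ensemble) or a separate recourse per branch (TB-MPC). The reservoir numbers, cost function, and scenario tree are all invented:

    ```python
    import numpy as np

    # Toy two-stage scenario tree for reservoir operation (all numbers invented).
    p = np.array([0.3, 0.5, 0.2])                   # branch probabilities
    inflow1 = 10.0                                  # known stage-1 inflow
    inflow2 = np.array([2.0, 10.0, 25.0])           # uncertain stage-2 inflows
    s0, s_target = 50.0, 50.0                       # initial and target storage
    grid = np.linspace(0.0, 30.0, 61)               # candidate releases

    def cost(s):
        """Penalize deviation of storage from its target."""
        return (s - s_target) ** 2

    def expected_cost(u0, tree_based):
        s1 = s0 + inflow1 - u0
        if tree_based:   # TB-MPC: a separate optimal recourse release per branch
            stage2 = sum(pb * min(cost(s1 + q - u1) for u1 in grid)
                         for pb, q in zip(p, inflow2))
        else:            # ordinary MPC: one second-stage release for all branches
            stage2 = min(sum(pb * cost(s1 + q - u1) for pb, q in zip(p, inflow2))
                         for u1 in grid)
        return cost(s1) + stage2

    for mode, name in [(False, "MPC   "), (True, "TB-MPC")]:
        u0 = min(grid, key=lambda u: expected_cost(u, mode))
        print(f"{name} u0* = {u0:5.1f}   E[cost] = {expected_cost(u0, mode):7.1f}")
    ```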

  19. Dimensionality Reduction Through Classifier Ensembles

    NASA Technical Reports Server (NTRS)

    Oza, Nikunj C.; Tumer, Kagan; Norvig, Peter (Technical Monitor)

    1999-01-01

    In data mining, one often needs to analyze datasets with a very large number of attributes. Performing machine learning directly on such data sets is often impractical because of extensive run times, excessive complexity of the fitted model (often leading to overfitting), and the well-known "curse of dimensionality." In practice, to avoid such problems, feature selection and/or extraction are often used to reduce data dimensionality prior to the learning step. However, existing feature selection/extraction algorithms either evaluate features by their effectiveness across the entire data set or simply disregard class information altogether (e.g., principal component analysis). Furthermore, feature extraction algorithms such as principal components analysis create new features that are often meaningless to human users. In this article, we present input decimation, a method that provides "feature subsets" that are selected for their ability to discriminate among the classes. These features are subsequently used in ensembles of classifiers, yielding results superior to single classifiers, ensembles that use the full set of features, and ensembles based on principal component analysis on both real and synthetic datasets.
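
    A rough sketch of the input-decimation idea: for each class, keep the features most correlated with that class's indicator, train a model per class-specific subset, and combine the models' probability outputs. The feature count, base learner, and dataset are arbitrary choices, not the paper's setup:

    ```python
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    k = 16  # features kept per class-specific model (arbitrary choice)
    models, subsets = [], []
    for c in np.unique(y):
        ind = (y_tr == c).astype(float)            # one-vs-rest class indicator
        # Rank features by |correlation| with this class's indicator.
        Xc = X_tr - X_tr.mean(axis=0)
        corr = (Xc * (ind - ind.mean())[:, None]).mean(axis=0)
        corr /= X_tr.std(axis=0) * ind.std() + 1e-12
        keep = np.argsort(-np.abs(corr))[:k]
        subsets.append(keep)
        models.append(LogisticRegression(max_iter=2000).fit(X_tr[:, keep], y_tr))

    # Combine the class-specialized models by averaging predicted probabilities.
    proba = np.mean([m.predict_proba(X_te[:, s]) for m, s in zip(models, subsets)],
                    axis=0)
    print("input-decimation ensemble accuracy:",
          round(float((proba.argmax(axis=1) == y_te).mean()), 3))
    ```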

  20. Image Segmentation Using Hierarchical Merge Tree.

    PubMed

    Liu, Ting; Seyedhosseini, Mojtaba; Tasdizen, Tolga

    2016-07-18

    This paper investigates one of the most fundamental computer vision problems: image segmentation. We propose a supervised hierarchical approach to object-independent image segmentation. Starting with over-segmenting superpixels, we use a tree structure to represent the hierarchy of region merging, by which we reduce the problem of segmenting image regions to finding a set of label assignments to tree nodes. We formulate the tree structure as a constrained conditional model to associate region merging with likelihoods predicted using an ensemble boundary classifier. Final segmentations can then be inferred by finding globally optimal solutions to the model efficiently. We also present an iterative training and testing algorithm that generates various tree structures and combines them to emphasize accurate boundaries by segmentation accumulation. Experiment results and comparisons with other recent methods on six public data sets demonstrate that our approach achieves state-of-the-art region accuracy and is competitive in image segmentation without semantic priors.

  1. Audubon Tree Study Program.

    ERIC Educational Resources Information Center

    National Audubon Society, New York, NY.

    Included are an illustrated student reader, "The Story of Trees," a leaders' guide, and a large tree chart with 37 colored pictures. The student reader reviews several aspects of trees: a definition of a tree; where and how trees grow; flowers, pollination and seed production; how trees make their food; how to recognize trees; seasonal changes;…

  2. Machine Learning Through Signature Trees. Applications to Human Speech.

    ERIC Educational Resources Information Center

    White, George M.

    A signature tree is a binary decision tree used to classify unknown patterns. An attempt was made to develop a computer program for manipulating signature trees as a general research tool for exploring machine learning and pattern recognition. The program was applied to the problem of speech recognition to test its effectiveness for a specific…

  3. Gradient Flow and Scale Setting on MILC HISQ Ensembles

    SciTech Connect

    Bazavov, A.; Bernard, C.; Brown, N.; Komijani, J.; DeTar, C.; Foley, J.; Levkova, L.; Gottlieb, Steven; Heller, U. M.; Laiho, J.; Sugar, R. L.; Toussaint, D.; Van de Water, R. S.

    2016-05-25

    We report on a scale determination with gradient-flow techniques on the Nf = 2+1+1 HISQ ensembles generated by the MILC collaboration. The ensembles include four lattice spacings, ranging from approximately 0.15 to 0.06 fm, and both physical and unphysical values of the quark masses. The scales √t0/a and w0/a and their tree-level improvements, √t0,imp and w0,imp, are computed on each ensemble using Symanzik flow and the cloverleaf definition of the energy density E. Using a combination of continuum chiral perturbation theory and a Taylor-series ansatz for the lattice-spacing and strong-coupling dependence, the results are simultaneously extrapolated to the continuum and interpolated to physical quark masses. We also determine the scales √t0 = 0.1416(+8/-5) fm and w0 = 0.1717(+12/-11) fm, where the errors are sums, in quadrature, of statistical and all systematic errors. The precision of w0 and √t0 is comparable to or better than that of the best previous estimates, respectively. We also find the continuum mass-dependence of w0, which will be useful for estimating the scales of other ensembles. Furthermore, we estimate the integrated autocorrelation length of ⟨E(t)⟩. For long flow times, the autocorrelation length of ⟨E⟩ appears to be comparable to or smaller than that of the topological charge.

  4. Gradient flow and scale setting on MILC HISQ ensembles

    NASA Astrophysics Data System (ADS)

    Bazavov, A.; Bernard, C.; Brown, N.; Komijani, J.; DeTar, C.; Foley, J.; Levkova, L.; Gottlieb, Steven; Heller, U. M.; Laiho, J.; Sugar, R. L.; Toussaint, D.; Van de Water, R. S.; MILC Collaboration

    2016-05-01

    We report on a scale determination with gradient-flow techniques on the Nf = 2+1+1 highly improved staggered quark ensembles generated by the MILC Collaboration. The ensembles include four lattice spacings, ranging from approximately 0.15 to 0.06 fm, and both physical and unphysical values of the quark masses. The scales √{t0}/a and w0/a and their tree-level improvements, √{t0,imp} and w0,imp, are computed on each ensemble using Symanzik flow and the cloverleaf definition of the energy density E. Using a combination of continuum chiral-perturbation theory and a Taylor-series ansatz for the lattice-spacing and strong-coupling dependence, the results are simultaneously extrapolated to the continuum and interpolated to physical quark masses. We determine the scales √{t0} = 0.1416(+8/-5) fm and w0 = 0.1714(+15/-12) fm, where the errors are sums, in quadrature, of statistical and all systematic errors. The precision of w0 and √{t0} is comparable to or more precise than the best previous estimates, respectively. We then find the continuum mass dependence of √{t0} and w0, which will be useful for estimating the scales of new ensembles. We also estimate the integrated autocorrelation length of ⟨E(t)⟩. For long flow times, the autocorrelation length of ⟨E⟩ appears to be comparable to that of the topological charge.

  5. Developing Climate-Informed Ensemble Streamflow Forecasts over the Colorado River Basin

    NASA Astrophysics Data System (ADS)

    Miller, W. P.; Lhotak, J.; Werner, K.; Stokes, M.

    2014-12-01

    As climate change is realized, the assumption of hydrometeorologic stationarity embedded within many hydrologic models is no longer valid over the Colorado River Basin. As such, resource managers have begun to request more information to support decisions, specifically with regard to the incorporation of climate change information and operational risk. To this end, ensemble methodologies have become increasingly popular among the scientific and forecasting communities, and resource managers have begun to incorporate this information into decision support tools and operational models. Over the Colorado River Basin, reservoir operations are determined, in large part, by forecasts issued by the Colorado Basin River Forecast Center (CBRFC). The CBRFC produces both single-value and ensemble forecasts for use by resource managers in their operational decision-making process. These ensemble forecasts are currently driven by a combination of daily updating model states used as initial conditions and weather forecasts plus historical meteorological information, used to generate forecasts under the assumption that past hydroclimatological conditions are representative of future hydroclimatology. Recent efforts have produced updated bias-corrected and spatially downscaled projections of future climate over the Colorado River Basin. In this study, the historical climatology used as input to the CBRFC forecast model is adjusted to reflect these updated projections of future climate. Ensemble streamflow forecasts reflecting the impacts of climate change are then developed. These forecasts are subsequently compared to non-informed ensemble streamflow forecasts to evaluate the changing range of streamflow forecasts and risk over the Colorado River Basin. Ensemble forecasts may be compared through the use of a reservoir operations planning model, providing resource managers with ensemble information regarding changing

  6. Tree harvesting

    SciTech Connect

    Badger, P.C.

    1995-12-31

    Short rotation intensive culture tree plantations have been a major part of biomass energy concepts since the beginning. One aspect receiving less attention than it deserves is harvesting. This article describes a harvesting method somewhere between agricultural mowing machines and the huge feller-bunchers of the pulpwood and lumber industries.

  7. Extended Gibbs ensembles with flow

    SciTech Connect

    Ison, M. J.

    2007-11-15

    A recently proposed [Ph. Chomaz, F. Gulminelli, and O. Juillet, Ann. Phys. (Paris) 320, 135 (2005)] statistical treatment of finite unbound systems in the presence of collective motions is applied to a classical Lennard-Jones system, numerically simulated through molecular dynamics. In the ideal gas limit, the flow dynamics can be exactly recast into effective time-dependent Lagrange parameters acting on a standard Gibbs ensemble with an extra total energy conservation constraint. Using this same ansatz for the low-density freeze-out configurations of an interacting expanding system, we show that the presence of flow can have a sizable effect on the microstate distribution.

  8. Heteroclinic contours in oscillatory ensembles.

    PubMed

    Komarov, M A; Osipov, G V; Zhou, C S

    2013-02-01

    In this work, we study the onset of sequential activity in ensembles of neuronlike oscillators with inhibitorylike coupling between them. The winnerless competition (WLC) principle is a dynamical concept underlying sequential activity generation. According to the WLC principle, stable heteroclinic sequences in the phase space of a network model represent sequential metastable dynamics. We show that stable heteroclinic sequences and stable heteroclinic channels, connecting saddle limit cycles, can appear in oscillatory models of neural activity. We find the key bifurcations which lead to the occurrence of sequential activity as well as heteroclinic sequences and channels.

  9. A Localized Ensemble Kalman Smoother

    NASA Technical Reports Server (NTRS)

    Butala, Mark D.

    2012-01-01

    Numerous geophysical inverse problems prove difficult because the available measurements are indirectly related to the underlying unknown dynamic state and the physics governing the system may involve imperfect models or unobserved parameters. Data assimilation addresses these difficulties by combining the measurements and physical knowledge. The main challenge in such problems usually involves their high dimensionality and the standard statistical methods prove computationally intractable. This paper develops and addresses the theoretical convergence of a new high-dimensional Monte-Carlo approach called the localized ensemble Kalman smoother.
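
    While the paper's smoother and its convergence theory are beyond a short example, the core localized ensemble update can be sketched as a stochastic EnKF analysis step with an elementwise covariance taper (a simple Gaussian taper here; practical systems typically use a compactly supported Gaspari-Cohn function). All dimensions and data are synthetic:

    ```python
    import numpy as np

    rng = np.random.default_rng(4)
    n, N, m = 40, 20, 10                     # state dim, ensemble size, obs count

    X = rng.normal(size=(n, N))              # forecast ensemble (columns = members)
    H = np.zeros((m, n))
    H[np.arange(m), np.arange(0, n, 4)] = 1.0  # observe every 4th state variable
    R = 0.5 * np.eye(m)                      # observation-error covariance
    y = rng.normal(size=m)                   # synthetic observations

    # Localization: damp long-range sample covariances with a distance-based taper.
    dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    taper = np.exp(-(dist / 8.0) ** 2)

    A = X - X.mean(axis=1, keepdims=True)    # ensemble anomalies
    Pf = taper * (A @ A.T) / (N - 1)         # localized forecast covariance
    K = Pf @ H.T @ np.linalg.inv(H @ Pf @ H.T + R)  # Kalman gain

    # Stochastic update: every member assimilates perturbed observations.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(m), R, size=N).T
    Xa = X + K @ (Y - H @ X)
    print("mean update norm:", np.linalg.norm(Xa.mean(1) - X.mean(1)).round(3))
    ```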

  10. Measuring social interaction in music ensembles.

    PubMed

    Volpe, Gualtiero; D'Ausilio, Alessandro; Badino, Leonardo; Camurri, Antonio; Fadiga, Luciano

    2016-05-05

    Music ensembles are an ideal test-bed for quantitative analysis of social interaction. Music is an inherently social activity, and music ensembles offer a broad variety of scenarios which are particularly suitable for investigation. Small ensembles, such as string quartets, are deemed a significant example of self-managed teams, where all musicians contribute equally to a task. In bigger ensembles, such as orchestras, the relationship between a leader (the conductor) and a group of followers (the musicians) clearly emerges. This paper presents an overview of recent research on social interaction in music ensembles with a particular focus on (i) studies from cognitive neuroscience; and (ii) studies adopting a computational approach for carrying out automatic quantitative analysis of ensemble music performances.

  11. NIMEFI: gene regulatory network inference using multiple ensemble feature importance algorithms.

    PubMed

    Ruyssinck, Joeri; Huynh-Thu, Vân Anh; Geurts, Pierre; Dhaene, Tom; Demeester, Piet; Saeys, Yvan

    2014-01-01

    One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As a second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.
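
    The GENIE3-style decomposition described above is compact enough to sketch: regress each gene on all others with a tree ensemble and read putative edge weights off the feature importances (toy data; the real method adds normalization and much larger forests):

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def genie3_like(expr, n_trees=100, seed=0):
        """Score putative regulatory links via the regression decomposition
        described in the abstract: regress each target gene on all others
        and use tree-ensemble feature importances as edge weights."""
        n_genes = expr.shape[1]
        scores = np.zeros((n_genes, n_genes))   # scores[i, j]: gene i -> gene j
        for j in range(n_genes):
            predictors = np.delete(np.arange(n_genes), j)
            rf = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
            rf.fit(expr[:, predictors], expr[:, j])
            scores[predictors, j] = rf.feature_importances_
        return scores

    expr = np.random.default_rng(5).normal(size=(80, 12))  # samples x genes (toy)
    S = genie3_like(expr)
    print("top putative edge:", np.unravel_index(S.argmax(), S.shape))
    ```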

  12. Using Bayesian Belief Networks and event trees for volcanic hazard assessment and decision support : reconstruction of past eruptions of La Soufrière volcano, Guadeloupe and retrospective analysis of 1975-77 unrest.

    NASA Astrophysics Data System (ADS)

    Komorowski, Jean-Christophe; Hincks, Thea; Sparks, Steve; Aspinall, Willy; Legendre, Yoann; Boudon, Georges

    2013-04-01

    the contemporary volcanological narrative, and demonstrates that a formal evidential case could have been made to support the authorities' concerns and decision to evacuate. Revisiting the circumstances of the 1976 crisis highlights many contemporary challenges of decision-making under conditions of volcanological uncertainty. We suggest the BBN concept is a suitable framework for marshalling multiple observations, model results and interpretations - and all associated uncertainties - in a methodical manner. Base-rate eruption probabilities for Guadeloupe can be updated now with a new chronology of activity suggesting that 10 major explosive phases and 9 dome-forming phases occurred in the last 9150 years, associated with ≥ 8 flank-collapses and ≥ 6-7 high-energy pyroclastic density currents (blasts). Eruptive recurrence, magnitude and intensity place quantitative constraints on La Soufrière's event tree to elaborate credible scenarios. The current unrest offers an opportunity to update the BBN model and explore the uncertainty on inferences about the system's internal state. This probabilistic formalism would provoke key questions relating to unrest evolution: 1) is the unrest hydrothermal or magmatic? 2) what controls dyke/intrusion arrest and hence failed-magmatic eruptions like 1976? 3) what conditions could lead to significant pressurization with potential for explosive activity and edifice instability, and what monitoring signs might be manifest?

  13. Multi-Model Ensemble Wake Vortex Prediction

    NASA Technical Reports Server (NTRS)

    Koerner, Stephan; Holzaepfel, Frank; Ahmad, Nash'at N.

    2015-01-01

    Several multi-model ensemble methods are investigated for predicting wake vortex transport and decay. This study is a joint effort between National Aeronautics and Space Administration and Deutsches Zentrum fuer Luft- und Raumfahrt to develop a multi-model ensemble capability using their wake models. An overview of different multi-model ensemble methods and their feasibility for wake applications is presented. The methods include Reliability Ensemble Averaging, Bayesian Model Averaging, and Monte Carlo Simulations. The methodologies are evaluated using data from wake vortex field experiments.

  14. Forecast of iceberg ensemble drift

    SciTech Connect

    El-Tahan, M.S.; El-Tahan, H.W.; Venkatesh, S.

    1983-05-01

    The objectives of the study are to gain a better understanding of the characteristics of iceberg motion and the factors controlling iceberg drift, and to develop an iceberg ensemble drift forecast system to be operated by the Canadian Atmospheric Environment Service. An extensive review of field and theoretical studies on iceberg behaviour, and the factors controlling iceberg motion has been carried out. Long term and short term behaviour of icebergs are critically examined. A quantitative assessment of the effects of the factors controlling iceberg motion is presented. The study indicated that wind and currents are the primary driving forces. Coriolis Force and ocean surface slope also have significant effects. As for waves, only the higher waves have a significant effect. Iceberg drift is also affected by iceberg size characteristics. Based on the findings of the study a comprehensive computerized forecast system to predict the drift of iceberg ensembles off Canada's east coast has been designed. The expected accuracy of the forecast system is discussed and recommendations are made for future improvements to the system.

  15. Technical Tree Climbing.

    ERIC Educational Resources Information Center

    Jenkins, Peter

    Tree climbing offers a safe, inexpensive adventure sport that can be performed almost anywhere. Using standard procedures practiced in tree surgery or rock climbing, almost any tree can be climbed. Tree climbing provides challenge and adventure as well as a vigorous upper-body workout. Tree Climbers International classifies trees using a system…

  16. Exotic trees.

    PubMed

    Burda, Z; Erdmann, J; Petersson, B; Wattenberg, M

    2003-02-01

    We discuss the scaling properties of free branched polymers. The scaling behavior of the model is classified by the Hausdorff dimensions for the internal geometry, d(L) and d(H), and for the external one, D(L) and D(H). The dimensions d(H) and D(H) characterize the behavior for long distances, while d(L) and D(L) do so for short distances. We show that the internal Hausdorff dimension is d(L)=2 for generic and scale-free trees, contrary to d(H), which is known to be equal to 2 for generic trees and to vary between 2 and infinity for scale-free trees. We show that the external Hausdorff dimension D(H) is directly related to the internal one as D(H) = alpha d(H), where alpha is the stability index of the embedding weights for the nearest-vertex interactions. The index is alpha = 2 for weights from the Gaussian domain of attraction and 0 < alpha < 2 otherwise.

  17. GACEM: Genetic Algorithm Based Classifier Ensemble in a Multi-sensor System

    PubMed Central

    Xu, Rongwu; He, Lin

    2008-01-01

    Multi-sensor systems (MSS) have been increasingly applied in pattern classification, yet searching for the optimal classification framework is still an open problem. The development of the classifier ensemble seems to provide a promising solution. The classifier ensemble is a learning paradigm where many classifiers are jointly used to solve a problem, which has been proven an effective method for enhancing classification ability. In this paper, by introducing the concepts of Meta-feature (MF) and Trans-function (TF) for describing the relationship between the nature and the measurement of the observed phenomenon, classification in a multi-sensor system can be unified in the classifier ensemble framework. An approach called Genetic Algorithm based Classifier Ensemble in Multi-sensor system (GACEM) is then presented, in which a genetic algorithm is utilized to optimize both the selection of feature subsets and the decision combination simultaneously. GACEM first trains a number of classifiers based on different combinations of feature vectors and then selects those classifiers whose weight exceeds a pre-set threshold to make up the ensemble. An empirical study shows that, compared with conventional feature-level voting and decision-level voting, GACEM not only achieves better and more robust performance but also simplifies the system markedly. PMID:27873866

  18. Ensemble perception of size in 4-5-year-old children.

    PubMed

    Sweeny, Timothy D; Wurnitsch, Nicole; Gopnik, Alison; Whitney, David

    2015-07-01

    Groups of objects are nearly everywhere we look. Adults can perceive and understand the 'gist' of multiple objects at once, engaging ensemble-coding mechanisms that summarize a group's overall appearance. Are these group-perception mechanisms in place early in childhood? Here, we provide the first evidence that 4-5-year-old children use ensemble coding to perceive the average size of a group of objects. Children viewed a pair of trees, with each containing a group of differently sized oranges. We found that, in order to determine which tree had the larger oranges overall, children integrated the sizes of multiple oranges into ensemble representations. This pooling occurred rapidly, and it occurred despite conflicting information from numerosity, continuous extent, density, and contrast. An ideal observer analysis showed that although children's integration mechanisms are sensitive, they are not yet as efficient as adults'. Overall, our results provide a new insight into the way children see and understand the environment, and they illustrate the fundamental nature of ensemble coding in visual perception.

  19. SIMULATION OF THE ICELAND VOLCANIC ERUPTION OF APRIL 2010 USING THE ENSEMBLE SYSTEM

    SciTech Connect

    Buckley, R.

    2011-05-10

    The Eyjafjallajokull volcanic eruption in Iceland in April 2010 disrupted transportation in Europe which ultimately affected travel plans for many on a global basis. The Volcanic Ash Advisory Centre (VAAC) is responsible for providing guidance to the aviation industry of the transport of volcanic ash clouds. There are nine such centers located globally, and the London branch (headed by the United Kingdom Meteorological Office, or UKMet) was responsible for modeling the Iceland volcano. The guidance provided by the VAAC created some controversy due to the burdensome travel restrictions and uncertainty involved in the prediction of ash transport. The Iceland volcanic eruption provides a useful exercise of the European ENSEMBLE program, coordinated by the Joint Research Centre (JRC) in Ispra, Italy. ENSEMBLE, a decision support system for emergency response, uses transport model results from a variety of countries in an effort to better understand the uncertainty involved with a given accident scenario. Model results in the form of airborne concentration and surface deposition are required from each member of the ensemble in a prescribed format that may then be uploaded to a website for manipulation. The Savannah River National Laboratory (SRNL) is the lone regular United States participant throughout the 10-year existence of ENSEMBLE. For the Iceland volcano, four separate source term estimates have been provided to ENSEMBLE participants. This paper focuses only on one of those source terms. The SRNL results in relation to other modeling agency results along with useful information obtained using an ensemble of transport results will be discussed.

  20. Applications of Bayesian Procrustes shape analysis to ensemble radar reflectivity nowcast verification

    NASA Astrophysics Data System (ADS)

    Fox, Neil I.; Micheas, Athanasios C.; Peng, Yuqiang

    2016-07-01

    This paper introduces the use of Bayesian full Procrustes shape analysis in object-oriented meteorological applications. In particular, the Procrustes methodology is used to generate mean forecast precipitation fields from a set of ensemble forecasts. This approach has advantages over other ensemble averaging techniques in that it can produce a forecast that retains the morphological features of the precipitation structures and present the range of forecast outcomes represented by the ensemble. The production of the ensemble mean avoids the problems of smoothing that result from simple pixel or cell averaging, while producing credible sets that retain information on ensemble spread. Also in this paper, the full Bayesian Procrustes scheme is used as an object verification tool for precipitation forecasts. This is an extension of a previously presented Procrustes shape analysis based verification approach into a full Bayesian format designed to handle the verification of precipitation forecasts that match objects from an ensemble of forecast fields to a single truth image. The methodology is tested on radar reflectivity nowcasts produced in the Warning Decision Support System - Integrated Information (WDSS-II) by varying parameters in the K-means cluster tracking scheme.
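
    As a loose illustration of Procrustes-based averaging of forecast objects, the sketch below aligns an ensemble of noisy object boundaries and iterates toward a mean shape using scipy's two-set Procrustes routine; this is a simplified generalized-Procrustes scheme, not the paper's Bayesian formulation:

    ```python
    import numpy as np
    from scipy.spatial import procrustes

    rng = np.random.default_rng(6)

    # Toy ensemble: 8 forecast "objects", each a ring of 30 boundary points
    # with a random rotation, scale, shift, and noise applied.
    t = np.linspace(0.0, 2.0 * np.pi, 30, endpoint=False)
    base = np.column_stack([np.cos(t), 0.6 * np.sin(t)])
    members = []
    for _ in range(8):
        a = rng.uniform(0.0, 2.0 * np.pi)
        rot = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
        members.append(rng.uniform(0.5, 2.0) * base @ rot.T
                       + rng.normal(0.0, 0.03, base.shape) + rng.normal(0.0, 1.0, 2))

    # Iterative Procrustes averaging: align every member to the current mean
    # shape, then update the mean. The morphology of the object survives,
    # unlike pixel-wise averaging, which smears it out.
    mean = members[0]
    for _ in range(5):
        aligned = [procrustes(mean, m)[1] for m in members]
        mean = np.mean(aligned, axis=0)

    print("per-member disparity from mean shape:",
          [round(procrustes(mean, m)[2], 4) for m in members])
    ```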

  1. What Makes a Tree a Tree?

    ERIC Educational Resources Information Center

    NatureScope, 1986

    1986-01-01

    Provides: (1) background information on trees, focusing on the parts of trees and how they differ from other plants; (2) eight activities; and (3) ready-to-copy pages dealing with tree identification and tree rings. Activities include objective(s), recommended age level(s), subject area(s), list of materials needed, and procedures. (JN)

  2. Layered Ensemble Architecture for Time Series Forecasting.

    PubMed

    Rahman, Md Mustafizur; Islam, Md Monirul; Murase, Kazuyuki; Yao, Xin

    2016-01-01

    Time series forecasting (TSF) has been widely used in many application areas such as science, engineering, and finance. The phenomena generating time series are usually unknown and the information available for forecasting is limited to the past values of the series. It is, therefore, necessary to use an appropriate number of past values, termed the lag, for forecasting. This paper proposes a layered ensemble architecture (LEA) for TSF problems. Our LEA consists of two layers, each of which uses an ensemble of multilayer perceptron (MLP) networks. While the first ensemble layer tries to find an appropriate lag, the second ensemble layer employs the obtained lag for forecasting. Unlike most previous work on TSF, the proposed architecture considers both the accuracy and the diversity of the individual networks in constructing an ensemble. LEA trains different networks in the ensemble by using different training sets, with the aim of maintaining diversity among the networks. However, it uses the appropriate lag and combines the best trained networks to construct the ensemble, reflecting LEA's emphasis on the accuracy of the networks. The proposed architecture has been tested extensively on time series data from the NN3 and NN5 competitions. It has also been tested on several standard benchmark time series. In terms of forecasting accuracy, our experimental results clearly show that LEA is better than other ensemble and nonensemble methods.

  3. Visual stimuli recruit intrinsically generated cortical ensembles.

    PubMed

    Miller, Jae-eun Kang; Ayzenshtat, Inbal; Carrillo-Reid, Luis; Yuste, Rafael

    2014-09-23

    The cortical microcircuit is built with recurrent excitatory connections, and it has long been suggested that the purpose of this design is to enable intrinsically driven reverberating activity. To understand the dynamics of neocortical intrinsic activity better, we performed two-photon calcium imaging of populations of neurons from the primary visual cortex of awake mice during visual stimulation and spontaneous activity. In both conditions, cortical activity is dominated by coactive groups of neurons, forming ensembles whose activation cannot be explained by the independent firing properties of their contributing neurons, considered in isolation. Moreover, individual neurons flexibly join multiple ensembles, vastly expanding the encoding potential of the circuit. Intriguingly, the same coactive ensembles can repeat spontaneously and in response to visual stimuli, indicating that stimulus-evoked responses arise from activating these intrinsic building blocks. Although the spatial properties of stimulus-driven and spontaneous ensembles are similar, spontaneous ensembles are active at random intervals, whereas visually evoked ensembles are time-locked to stimuli. We conclude that neuronal ensembles, built by the coactivation of flexible groups of neurons, are emergent functional units of cortical activity and propose that visual stimuli recruit intrinsically generated ensembles to represent visual attributes.

  4. Fine-Tuning Your Ensemble's Jazz Style.

    ERIC Educational Resources Information Center

    Garcia, Antonio J.

    1991-01-01

    Proposes instructional strategies for directors of jazz groups, including guidelines for developing of skills necessary for good performance. Includes effective methods for positive changes in ensemble style. Addresses jazz group problems such as beat, tempo, staying in tune, wind power, and solo/ensemble lines. Discusses percussionists, bassists,…

  5. Predicting the predictive power of IDP ensembles.

    PubMed

    Tompa, Peter; Varadi, Mihaly

    2014-02-04

    The function of intrinsically disordered proteins may be interpreted in terms of their structural ensembles. The article by Schwalbe and colleagues in this issue of Structure combines NMR and SAXS constraints to generate structural ensembles that unveil important functional and pathological features.

  6. Combining Structural Modeling with Ensemble Machine Learning to Accurately Predict Protein Fold Stability and Binding Affinity Effects upon Mutation

    PubMed Central

    Garcia Lopez, Sebastian; Kim, Philip M.

    2014-01-01

    Advances in sequencing have led to a rapid accumulation of mutations, some of which are associated with diseases. However, to draw mechanistic conclusions, a biochemical understanding of these mutations is necessary. For coding mutations, accurate prediction of significant changes in either the stability of proteins or their affinity to their binding partners is required. Traditional methods have used semi-empirical force fields, while newer methods employ machine learning of sequence and structural features. Here, we show how combining both of these approaches leads to a marked boost in accuracy. We introduce ELASPIC, a novel ensemble machine learning approach that is able to predict stability effects upon mutation in both domain cores and domain-domain interfaces. We combine semi-empirical energy terms, sequence conservation, and a wide variety of molecular details with a Stochastic Gradient Boosting of Decision Trees (SGB-DT) algorithm. The accuracy of our predictions surpasses existing methods by a considerable margin, achieving correlation coefficients of 0.77 for stability and 0.75 for affinity predictions. Notably, we integrated homology modeling to enable proteome-wide prediction and show that accurate prediction on modeled structures is possible. Lastly, ELASPIC showed significant differences between various types of disease-associated mutations, as well as between disease and common neutral mutations. Unlike pure sequence-based prediction methods that try to predict phenotypic effects of mutations, our predictions unravel the molecular details governing protein instability, and help us better understand the molecular causes of diseases. PMID:25243403
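
    The SGB-DT component can be sketched with scikit-learn, where GradientBoostingRegressor with subsample < 1 performs stochastic gradient boosting of trees; the features and labels below are invented stand-ins for ELASPIC's energy, conservation, and structural inputs:

    ```python
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(7)

    # Invented stand-ins for ELASPIC-style inputs: semi-empirical energy
    # terms, sequence conservation, and structural descriptors per mutation.
    n = 500
    X = np.column_stack([
        rng.normal(size=n),        # e.g., van der Waals energy change
        rng.normal(size=n),        # e.g., electrostatics change
        rng.uniform(0, 1, n),      # sequence conservation
        rng.uniform(0, 1, n),      # relative solvent accessibility
    ])
    ddg = 1.5 * X[:, 0] + 0.8 * X[:, 2] + rng.normal(0, 0.5, n)  # toy ddG labels

    # subsample < 1 makes this *stochastic* gradient boosting of trees (SGB-DT).
    model = GradientBoostingRegressor(n_estimators=300, max_depth=3,
                                      subsample=0.5, random_state=0)
    r = cross_val_score(model, X, ddg, cv=5, scoring="r2").mean()
    print(f"cross-validated R^2 on toy ddG data: {r:.2f}")
    ```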

  7. Perception of ensemble statistics requires attention.

    PubMed

    Jackson-Nielsen, Molly; Cohen, Michael A; Pitts, Michael A

    2017-02-01

    To overcome inherent limitations in perceptual bandwidth, many aspects of the visual world are represented as summary statistics (e.g., average size, orientation, or density of objects). Here, we investigated the relationship between summary (ensemble) statistics and visual attention. Recently, it was claimed that one ensemble statistic in particular, color diversity, can be perceived without focal attention. However, a broader debate exists over the attentional requirements of conscious perception, and it is possible that some form of attention is necessary for ensemble perception. To test this idea, we employed a modified inattentional blindness paradigm and found that multiple types of summary statistics (color and size) often go unnoticed without attention. In addition, we found attentional costs in dual-task situations, further implicating a role for attention in statistical perception. Overall, we conclude that while visual ensembles may be processed efficiently, some amount of attention is necessary for conscious perception of ensemble statistics.

  8. Medium Range Ensembles Flood Forecasts for Community Level Applications

    NASA Astrophysics Data System (ADS)

    Fakhruddin, S.; Kawasaki, A.; Babel, M. S.; AIT

    2013-05-01

    Early warning is a key element for disaster risk reduction. In recent decades, there has been major advancement in medium-range and seasonal forecasting. This could provide a great opportunity to improve early warning systems and advisories for early action in strategic and long-term planning, resulting in increasing emphasis on proactive rather than reactive management of the adverse consequences of flood events. This can also be very helpful for the agricultural sector by providing a diversity of options to farmers (e.g. changing cropping pattern, planting timing, etc.). An experimental medium-range (1-10 days) flood forecasting model has been developed for Bangladesh which provides a set of 51 discharge ensemble forecasts for one to ten days ahead with significant persistence and high certainty. This could help communities (i.e. farmers) with gain/loss estimation as well as crop savings. This paper describes the application of ensemble probabilistic flood forecasts at the community level for differential decision making focused on agriculture. The framework allows users to interactively specify the objectives and criteria that are germane to a particular situation, and obtain the management options that are possible and the exogenous influences that should be taken into account before planning and decision making. Risk and vulnerability assessment was conducted through community consultation. The forecast lead-time requirements, users' needs, impacts and management options for the crops, livestock and fisheries sectors were identified through focus group discussions, informal interviews and a questionnaire survey.

  9. Modeling Dynamic Systems with Efficient Ensembles of Process-Based Models

    PubMed Central

    Simidjievski, Nikola; Todorovski, Ljupčo; Džeroski, Sašo

    2016-01-01

    Ensembles are a well established machine learning paradigm, leading to accurate and robust models, predominantly applied to predictive modeling tasks. Ensemble models comprise a finite set of diverse predictive models whose combined output is expected to yield an improved predictive performance as compared to an individual model. In this paper, we propose a new method for learning ensembles of process-based models of dynamic systems. The process-based modeling paradigm employs domain-specific knowledge to automatically learn models of dynamic systems from time-series observational data. Previous work has shown that ensembles based on sampling observational data (i.e., bagging and boosting) significantly improve the predictive performance of process-based models. However, this improvement comes at the cost of a substantial increase in the computational time needed for learning. To address this problem, the paper proposes a method that aims at efficiently learning ensembles of process-based models, while maintaining their accurate long-term predictive performance. This is achieved by constructing ensembles by sampling domain-specific knowledge instead of sampling data. We apply the proposed method to a set of problems of automated predictive modeling in three lake ecosystems, using a library of process-based knowledge for modeling population dynamics, and evaluate its performance. The experimental results identify the optimal design decisions regarding the learning algorithm. The results also show that the proposed ensembles yield significantly more accurate predictions of population dynamics as compared to individual process-based models. Finally, while their predictive performance is comparable to that of ensembles obtained with the state-of-the-art methods of bagging and boosting, they are substantially more efficient. PMID:27078633

  10. A Hyper-Heuristic Ensemble Method for Static Job-Shop Scheduling.

    PubMed

    Hart, Emma; Sim, Kevin

    2016-01-01

    We describe a new hyper-heuristic method NELLI-GP for solving job-shop scheduling problems (JSSP) that evolves an ensemble of heuristics. The ensemble adopts a divide-and-conquer approach in which each heuristic solves a unique subset of the instance set considered. NELLI-GP extends an existing ensemble method called NELLI by introducing a novel heuristic generator that evolves heuristics composed of linear sequences of dispatching rules: each rule is represented using a tree structure and is itself evolved. Following a training period, the ensemble is shown to outperform both existing dispatching rules and a standard genetic programming algorithm on a large set of new test instances. In addition, it obtains superior results on a set of 210 benchmark problems from the literature when compared to two state-of-the-art hyper-heuristic approaches. Further analysis of the relationship between heuristics in the evolved ensemble and the instances each solves provides new insights into features that might describe similar instances.
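
    NELLI-GP composes evolved sequences of dispatching rules; to make that building block concrete, here is a minimal hand-written dispatching rule (shortest processing time) sequencing jobs on a single machine. The job data are invented, and the paper's evolved tree-structured rules are not reproduced.

      # One dispatching rule (SPT) on one machine: the primitive that
      # hyper-heuristics like NELLI-GP select and combine per instance.
      jobs = {"A": 4, "B": 2, "C": 7, "D": 1}          # job -> processing time

      def spt_schedule(jobs):
          order = sorted(jobs, key=jobs.get)           # shortest job first
          t, completion = 0, {}
          for j in order:
              t += jobs[j]
              completion[j] = t                        # completion time of job j
          return order, completion

      order, completion = spt_schedule(jobs)
      print(order)                                     # ['D', 'B', 'A', 'C']
      print(sum(completion.values()) / len(jobs))      # mean flow time; SPT minimises it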

  11. Room-temperature and temperature-dependent QSRR modelling for predicting the nitrate radical reaction rate constants of organic chemicals using ensemble learning methods.

    PubMed

    Gupta, S; Basant, N; Mohan, D; Singh, K P

    2016-07-01

    Experimental determination of the rate constants of the reaction of NO3 with a large number of organic chemicals is tedious and resource-intensive, and the development of computational methods has been widely advocated. In this study, we developed room-temperature (298 K) and temperature-dependent quantitative structure-reactivity relationship (QSRR) models based on ensemble learning approaches (decision tree forest (DTF) and decision treeboost (DTB)) for predicting the rate constant of the reaction of NO3 radicals with diverse organic chemicals, under OECD guidelines. The predictive power of the developed models was established in terms of statistical coefficients. In the test phase, the QSRR models yielded a correlation (r²) of >0.94 between experimental and predicted rate constants. The applicability domains of the constructed models were determined. An attempt has been made to provide a mechanistic interpretation of the features selected for QSRR development. The proposed QSRR models outperformed previous reports, and the temperature-dependent models offered a much wider applicability domain. This is the first report presenting a temperature-dependent QSRR model for predicting the nitrate radical reaction rate constant at different temperatures. The proposed models can be useful tools for predicting the reactivities of chemicals towards NO3 radicals in the atmosphere and, hence, for their persistence and exposure risk assessment.
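
    Decision treeboost (DTB) is a stochastic-gradient-boosting implementation; a hedged analogue of the paper's regression setup, with synthetic descriptors and rate constants standing in for the real data, can be sketched with scikit-learn:

      # QSRR-style regression with stochastic gradient boosting of trees,
      # an open-source analogue of the DTB model (data are placeholders).
      import numpy as np
      from sklearn.ensemble import GradientBoostingRegressor
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import r2_score

      rng = np.random.default_rng(1)
      X = rng.normal(size=(500, 8))                            # 8 descriptors (invented)
      log_k = X @ rng.normal(size=8) + 0.2 * rng.standard_normal(500)

      X_tr, X_te, y_tr, y_te = train_test_split(X, log_k, random_state=0)
      model = GradientBoostingRegressor(subsample=0.5, random_state=0)  # subsample<1 = "stochastic"
      model.fit(X_tr, y_tr)
      print(r2_score(y_te, model.predict(X_te)))               # test-phase r², cf. >0.94 above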

  12. Hybrid Data Assimilation without Ensemble Filtering

    NASA Technical Reports Server (NTRS)

    Todling, Ricardo; Akkraoui, Amal El

    2014-01-01

    The Global Modeling and Assimilation Office is preparing to upgrade its three-dimensional variational system to a hybrid approach in which the ensemble is generated using a square-root ensemble Kalman filter (EnKF) and the variational problem is solved using the Grid-point Statistical Interpolation system. As in most EnKF applications, we found it necessary to employ a combination of multiplicative and additive inflations to compensate for sampling and modeling errors, respectively, and to maintain the small-member ensemble solution close to the variational solution; we also found it necessary to re-center the members of the ensemble about the variational analysis. During tuning of the filter we found re-centering and additive inflation to play a considerably larger role than expected, particularly in a dual-resolution context when the variational analysis is run at higher resolution than the ensemble. This led us to consider a hybrid strategy in which the members of the ensemble are generated by simply converting the variational analysis to the resolution of the ensemble and applying additive inflation, thus bypassing the EnKF. Comparisons of this so-called filter-free hybrid procedure with an EnKF-based hybrid procedure and a control non-hybrid, traditional scheme show both hybrid strategies to provide equally significant improvement over the control; more interestingly, the filter-free procedure was found to give qualitatively similar results to the EnKF-based procedure.
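
    The filter-free step lends itself to a few lines of numpy; a minimal sketch under the assumption that the perturbations are random noise (operationally they would be drawn from an additive-inflation perturbation library):

      # Filter-free hybrid ensemble: every member is the variational
      # analysis (converted to ensemble resolution) plus a zero-mean
      # additive perturbation; no EnKF is run. Re-centering shown below.
      import numpy as np

      rng = np.random.default_rng(2)
      n_members, n_state = 32, 1000
      xa = rng.normal(size=n_state)            # variational analysis at ensemble resolution

      perts = 0.3 * rng.standard_normal((n_members, n_state))
      perts -= perts.mean(axis=0)              # zero mean: members stay centred on the analysis
      members = xa + perts                     # the filter-free ensemble

      # Re-centering an existing (e.g., EnKF) ensemble about the analysis:
      recentred = members - members.mean(axis=0) + xa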

  13. The Tree Worker's Manual.

    ERIC Educational Resources Information Center

    Smithyman, S. J.

    This manual is designed to prepare students for entry-level positions as tree care professionals. Addressed in the individual chapters of the guide are the following topics: the tree service industry; clothing, equipment, and tools; tree workers; basic tree anatomy; techniques of pruning; procedures for climbing and working in the tree; aerial…

  14. Towards reliable seasonal ensemble streamflow forecasts for ephemeral rivers

    NASA Astrophysics Data System (ADS)

    Bennett, James; Wang, Qj; Li, Ming; Robertson, David

    2016-04-01

    Despite their inherently variable nature, ephemeral rivers are an important water resource in many dry regions. Water managers are likely to benefit considerably from even mildly skilful ensemble forecasts of streamflow in ephemeral rivers. As with any ensemble forecast, forecast uncertainty - i.e., the spread of the ensemble - must be reliably quantified to allow users of the forecasts to make well-founded decisions. Correctly quantifying uncertainty in ephemeral rivers is particularly challenging because of the high incidence of zero flows, which are difficult to handle with conventional statistical techniques. Here we apply a seasonal streamflow forecasting system, the model for generating Forecast Guided Stochastic Scenarios (FoGSS), to 26 Australian ephemeral rivers. FoGSS uses post-processed ensemble rainfall forecasts from a coupled ocean-atmosphere prediction system to force an initialised monthly rainfall-runoff model, and then applies a staged hydrological error model to describe and propagate hydrological uncertainty in the forecast. FoGSS produces 12-month streamflow forecasts; as forecast skill declines with lead time, the forecasts are designed to transition seamlessly into stochastic scenarios. The ensemble rainfall forecasts used in FoGSS are known to be unbiased and reliable, so we concentrate here on the hydrological error model. The FoGSS error model has several features that make it well suited to forecasting ephemeral rivers. First, FoGSS models the error after the data are transformed with a log-sinh transformation. The log-sinh transformation is able to normalise even highly skewed data and homogenise its variance, allowing us to assume that errors are Gaussian. Second, FoGSS handles zero values using data censoring. Data censoring allows streamflow in ephemeral rivers to be treated as a continuous variable, rather than having to model the occurrence of non-zero values and the distribution of non-zero values separately. This greatly simplifies parameter
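
    The two features singled out above can be stated concretely. A minimal numpy sketch, with illustrative parameter values (FoGSS estimates them from data), of the log-sinh transformation and of treating zero flows as censored values:

      # Log-sinh transformation and censoring of zero flows.
      import numpy as np

      def log_sinh(y, a, b):
          # T(y) = (1/b) ln(sinh(a + b*y)); normalises skewed flow data
          return np.log(np.sinh(a + b * y)) / b

      def inv_log_sinh(z, a, b):
          return (np.arcsinh(np.exp(b * z)) - a) / b

      flows = np.array([0.0, 0.0, 0.4, 2.1, 15.0])   # ephemeral record with zeros
      z = log_sinh(flows, a=0.01, b=0.1)

      # Censoring: a zero observation only says flow <= 0, so its Gaussian
      # likelihood term uses the CDF at the threshold T(0) rather than the
      # density -- streamflow stays one continuous variable.
      threshold = log_sinh(0.0, a=0.01, b=0.1)
      is_censored = flows <= 0.0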

  15. Statistical Ensembles for Economic Networks

    NASA Astrophysics Data System (ADS)

    Bargigli, Leonardo

    2014-03-01

    Economic networks share with other social networks the fundamental property of sparsity. It is well known that the maximum entropy techniques usually employed to estimate or simulate weighted networks produce unrealistic dense topologies. At the same time, strengths should not be neglected, since they are related to core economic variables like supply and demand. To overcome this limitation, the exponential Bosonic model has been previously extended in order to obtain ensembles where the average degree and strength sequences are simultaneously fixed (conditional geometric model). In this paper a new exponential model, which is the network equivalent of Boltzmann ideal systems, is introduced and then extended to the case of joint degree-strength constraints (conditional Poisson model). Finally, the fitness of these alternative models is tested against a number of networks. While the conditional geometric model generally provides a better goodness-of-fit in terms of log-likelihoods, the conditional Poisson model could nevertheless be preferred whenever it provides a higher similarity with original data. If we are interested instead only in topological properties, the simple Bernoulli model appears to be preferable to the correlated topologies of the two more complex models.

  16. In silico prediction of toxicity of non-congeneric industrial chemicals using ensemble learning based modeling approaches

    SciTech Connect

    Singh, Kunwar P.; Gupta, Shikha

    2014-03-15

    Ensemble learning based decision treeboost (DTB) and decision tree forest (DTF) models are introduced in order to establish a quantitative structure–toxicity relationship (QSTR) for predicting the toxicity of 1450 diverse chemicals. Eight non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals was evaluated using the Tanimoto similarity index. DTB and DTF models, supplemented with stochastic gradient boosting and bagging algorithms, were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was paid to the prediction ability and robustness of the models, investigated both in external and 10-fold cross-validation processes. On the complete data, optimal DTB and DTF models rendered accuracies of 98.90% and 98.83% in two-category, and 98.14% and 98.14% in four-category toxicity classifications. Both models further yielded classification accuracies of 100% on the external T. pyriformis toxicity data. The constructed regression models (DTB and DTF) using five descriptors yielded correlation coefficients (R²) of 0.945 and 0.944 between the measured and predicted toxicities, with mean squared errors (MSEs) of 0.059 and 0.064 on the complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R² and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest wide applicability of the inter-species models in predicting the toxicity of new chemicals for regulatory purposes. These approaches provide a useful strategy and robust tools for screening the ecotoxicological risk or environmental hazard potential of chemicals. - Graphical abstract: Importance of input variables in DTB and DTF classification models for (a) two-category and (b) four-category toxicity intervals in T. pyriformis data. Generalization and predictive abilities of the
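
    DTB and DTF are proprietary implementations; a hedged open-source analogue of the two-category classification protocol (gradient-boosted trees and a random forest under 10-fold cross-validation), on synthetic stand-in data:

      # DTB-like and DTF-like classifiers checked with 10-fold CV.
      from sklearn.datasets import make_classification
      from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      X, y = make_classification(n_samples=600, n_features=8, random_state=0)

      for name, clf in [("DTB-like", GradientBoostingClassifier(random_state=0)),
                        ("DTF-like", RandomForestClassifier(random_state=0))]:
          print(name, cross_val_score(clf, X, y, cv=10).mean())   # 10-fold CV accuracy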

  17. Teaching the Tools of Pharmaceutical Care Decision-Analysis.

    ERIC Educational Resources Information Center

    Rittenhouse, Brian E.

    1994-01-01

    A method of decision-analysis in pharmaceutical care that integrates epidemiology and economics is presented, including an example illustrating both the deceptive nature of medical decision making and the power of decision analysis. Principles in determining both general and specific probabilities of interest and use of decision trees for…

  18. Additive Similarity Trees

    ERIC Educational Resources Information Center

    Sattath, Shmuel; Tversky, Amos

    1977-01-01

    Tree representations of similarity data are investigated. Hierarchical clustering is critically examined, and a more general procedure, called the additive tree, is presented. The additive tree representation is then compared to multidimensional scaling. (Author/JKS)

  19. A 4D-Ensemble-Variational System for Data Assimilation and Ensemble Initialization

    NASA Astrophysics Data System (ADS)

    Bowler, Neill; Clayton, Adam; Jardak, Mohamed; Lee, Eunjoo; Jermey, Peter; Lorenc, Andrew; Piccolo, Chiara; Pring, Stephen; Wlasak, Marek; Barker, Dale; Inverarity, Gordon; Swinbank, Richard

    2016-04-01

    The Met Office has been developing a four-dimensional ensemble variational (4DEnVar) data assimilation system over the past four years. The 4DEnVar system is intended both as a data assimilation system in its own right and as an improved means of initializing the Met Office Global and Regional Ensemble Prediction System (MOGREPS). The global MOGREPS ensemble has been initialized by running an ensemble of 4DEnVars (En-4DEnVar). The scalability and maintainability of ensemble data assimilation methods make them increasingly attractive, and 4DEnVar may be adopted in the context of the Met Office's LFRic project to redevelop the technical infrastructure so that its Unified Model (MetUM) can be run efficiently on massively parallel supercomputers. This presentation will report on the results of the 4DEnVar development project, including experiments that have been run using ensemble sizes of up to 200 members.

  20. Applications and Limitations of Using Large State-of-the-Art Climate Ensembles for Risk Management

    NASA Astrophysics Data System (ADS)

    Tredger, E. R.; Stainforth, D. A.; Smith, L. A.

    2007-12-01

    Climate model output is increasingly being offered to both policy makers and industry in support of detailed risk management and decision-making strategies. Ensemble techniques illuminate the impact of some sources of uncertainty in future climate scenarios, and may provide valuable information to risk managers in their attempts to make robust decisions. Ensemble climate experiments are designed to explore variability in model behaviour due to differences in initial conditions, model parameters, and, to a more limited extent, model structure and other uncertainties. After discussing the strengths and limitations of this approach, an unprecedented range of behaviour is shown to arise in an ensemble of 45,000 climateprediction.net runs of a General Circulation Model (HadSM3). Initial condition uncertainty is shown to play a significant role if this model were to be used for risk management, especially in terms of estimating extremes. In terms of global behaviour, both low (less than 1 degree Celsius) and high (over 16 degrees Celsius) climate sensitivity runs are observed in model versions with comparable performance in 1xCO2 simulations. The data set is produced using a perturbed-physics grand ensemble generated by the climateprediction.net experiment, a publicly distributed computing experiment. Over 10,000 different model versions (the same structural model with different parameter values) are run, allowing for an assessment of parametric uncertainty. For each of these model versions, an initial condition ensemble is run, providing an estimate of each model version's climate distribution. A grand ensemble of runs gives the opportunity for a better understanding of the state of climate modeling science. The wide range of behaviour raises important questions of how to interpret climate predictions for policy makers and how state-of-the-art (2007) climate modeling experiments might be related to the Earth's climate. While ensembles contain useful information, the limits

  1. An ensemble weighting approach for dendroclimatology: drought reconstructions for the northeastern Tibetan Plateau.

    PubMed

    Fang, Keyan; Wilmking, Martin; Davi, Nicole; Zhou, Feifei; Liu, Changzhi

    2014-01-01

    Traditional detrending methods assign equal mean value to all tree-ring series for chronology development, even though mean annual growth changes across time periods. We find that the strength of a tree-ring model can be improved by giving more weight to tree-ring series that have a stronger climate signal and less weight to series that have a weaker signal. We thus present an ensemble weighting method to mitigate these potential biases and to more accurately extract the climate signals in dendroclimatology studies. This new method has been used to develop the first annual precipitation reconstruction (previous August to current July) at Songmingyan Mountain and to recalculate the tree-ring chronology from the Shenge site in the Dulan area of the northeastern Tibetan Plateau (TP), a marginal area of the Asian summer monsoon. The ensemble weighting method explains 31.7% of the instrumental variance for the reconstruction at Songmingyan Mountain and 57.3% of the instrumental variance in the Dulan area, higher than the values obtained with traditional methods. We focus on the newly introduced reconstruction at Songmingyan Mountain, which shows extremely dry epochs in 1862-1874, 1914-1933 and 1991-1999, and an extremely wet epoch in 1882-1905. These dry/wet epochs were also found in the marginal areas of the summer monsoon and the Indian subcontinent, indicating linkages between regional hydroclimate changes and the Indian summer monsoon.
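
    A minimal sketch of the weighting idea, assuming (as an illustration, not the paper's exact formulation) that each series is weighted by its correlation with the instrumental climate series before averaging:

      # Weighted chronology: series with a stronger climate signal get
      # larger weights; compare against the equally weighted mean.
      import numpy as np

      rng = np.random.default_rng(3)
      n_years, n_trees = 80, 12
      climate = rng.standard_normal(n_years)                 # instrumental record
      strength = rng.uniform(0.1, 0.9, n_trees)
      series = strength[:, None] * climate + rng.standard_normal((n_trees, n_years))

      r = np.array([np.corrcoef(s, climate)[0, 1] for s in series])
      w = np.clip(r, 0, None)                                # stronger signal -> larger weight
      w /= w.sum()

      weighted = w @ series                                  # ensemble-weighted chronology
      print(np.corrcoef(weighted, climate)[0, 1])            # vs. the unweighted mean:
      print(np.corrcoef(series.mean(axis=0), climate)[0, 1])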

  2. Developing planning hydrologic ensembles that reflect combined paleoclimate and projected climate information sets

    NASA Astrophysics Data System (ADS)

    Prairie, J. R.; Brekke, L.; Pruitt, T.; Rajagopalan, B.; Woodhouse, C.

    2008-12-01

    Historically, Reclamation has performed probabilistic analysis to assess risk and reliability considering only the instrumental record. Understanding that the assumption of a future similar to the relatively short instrumental past is losing credibility for long-term planning, Reclamation has conducted recent studies in the Colorado River Basin involving methods that relate water supply assumptions to a blend of the instrumental record with tree-ring based reconstructed flow information. In addition, Reclamation has conducted studies in California that relate projected climate information to natural runoff change and adjusted water supply assumptions for long-term simulation of Central Valley Project and State Water Project operations. Both methods provide means to estimate probabilities and risks in water management that do not consider only the relatively short instrumental record. Motivated by both of these efforts, Reclamation is exploring a method that relates blended tree-ring based reconstructed flow information and projected hydroclimate information to ensemble water supply assumptions suitable for long-term planning. The presentation will focus on method application and results in the Missouri River Basin above Toston, Montana. The method builds from a recently published nonparametric method that resamples the flow magnitudes of the instrumental record conditioned on the hydrologic "state" sequences from tree-ring based reconstructions. In this application, magnitudes from the instrumental record are replaced by magnitudes from runoff simulations consistent with climate projections. The resultant hydrologic ensemble is then compared to the ensemble consistent with only projected climate information and runoff projections, to explore the advantages and disadvantages of conditioning the climate projections on paleoclimate information. This is accomplished by comparing ensemble descriptive statistics and probabilities of drought and surplus events.
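
    The resampling idea can be sketched in a few lines, with invented states, thresholds, and flow pools (the operational method conditions on reconstructed hydrologic state sequences and uses projection-consistent runoff magnitudes):

      # Resample flow magnitudes from climate-projection pools, conditioned
      # on the wet/dry state sequence taken from a tree-ring reconstruction.
      import numpy as np

      rng = np.random.default_rng(4)
      proj_flows = rng.gamma(2.0, 500.0, size=300)   # projection-consistent runoff
      median = np.median(proj_flows)
      pools = {"dry": proj_flows[proj_flows < median],
               "wet": proj_flows[proj_flows >= median]}

      states = rng.choice(["dry", "wet"], size=50, p=[0.55, 0.45])  # paleo sequence

      ensemble = np.array([[rng.choice(pools[s]) for s in states]
                           for _ in range(100)])     # 100 traces of 50 years each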

  3. From interacting particles to equilibrium statistical ensembles

    NASA Astrophysics Data System (ADS)

    Ilievski, Enej; Quinn, Eoin; Caux, Jean-Sébastien

    2017-03-01

    We argue that a particle language provides a conceptually simple framework for the description of anomalous equilibration in isolated quantum systems. We address this paradigm in the context of integrable models, which are those where particles scatter completely elastically and are stable against decay. In particular, we demonstrate that a complete description of equilibrium ensembles for interacting integrable models requires a formulation built from the mode occupation numbers of the underlying particle content, mirroring the case of noninteracting particles. This yields an intuitive physical interpretation of generalized Gibbs ensembles, and reconciles them with the microcanonical ensemble. We explain how previous attempts to identify an appropriate ensemble overlooked an essential piece of information, and provide explicit examples in the context of quantum quenches.

  4. Ensemble treatments of thermal pairing in nuclei

    NASA Astrophysics Data System (ADS)

    Hung, Nguyen Quang; Dang, Nguyen Dinh

    2009-10-01

    A systematic comparison is conducted for pairing properties of finite systems at nonzero temperature as predicted by the exact solutions of the pairing problem embedded in three principal statistical ensembles, namely the grand canonical ensemble, canonical ensemble and microcanonical ensemble, as well as the unprojected (FTBCS1+SCQRPA) and Lipkin-Nogami projected (FTLN1+SCQRPA) theories that include the quasiparticle number fluctuation and coupling to pair vibrations within the self-consistent quasiparticle random-phase approximation. The numerical calculations are performed for the pairing gap, total energy, heat capacity, entropy, and microcanonical temperature within the doubly-folded equidistant multilevel pairing model. The FTLN1+SCQRPA predictions are found to agree best with the exact grand canonical results. In general, all approaches clearly show that the superfluid-normal phase transition is smoothed out in finite systems. A novel formula is suggested for extracting the empirical pairing gap in reasonable agreement with the exact canonical results.

  5. Particle number fluctuations in the microcanonical ensemble

    NASA Astrophysics Data System (ADS)

    Begun, V. V.; Gorenstein, M. I.; Kostyuk, A. P.; Zozulya, O. S.

    2005-05-01

    Particle number fluctuations are studied in the microcanonical ensemble. For the Boltzmann statistics we deduce exact analytical formulas for the microcanonical partition functions in the case of noninteracting massless neutral particles and charged particles with zero net charge. The particle number fluctuations are calculated and we find that in the microcanonical ensemble they are suppressed in comparison to the fluctuations in the canonical and grand canonical ensembles. This remains valid in the thermodynamic limit too, so that the well-known equivalence of all statistical ensembles refers to average quantities, but does not apply to fluctuations. In the thermodynamic limit we are able to calculate the particle number fluctuations in the system of massive bosons and fermions when the exact conservation laws of both the energy and charge are taken into account.

  6. "Verfremdung" in Action at the Berliner Ensemble

    ERIC Educational Resources Information Center

    Brown, Thomas K.

    1973-01-01

    Discussion of Brecht's aesthetic principles, particularly "Verfremdung" (the device of renewal and estrangement), including the opinions of the Berliner Ensemble concerning to what degree they have retained Brecht's principles in productions of his plays. (DD)

  7. Training Tree Transducers

    DTIC Science & Technology

    2004-01-01

    trees (similar to the role played by the finite-state acceptor FSA for strings). We describe the version (equivalent to TSG (Schabes, 1990)) where ... strictly contained in tree sets of tree adjoining grammars (Joshi and Schabes, 1997). 4 Extended-LHS Tree Transducers (xR) Section 1 informally described ... changes without modifying the training procedure, as long as we stick to tree automata. 10 Related Work Tree substitution grammars or TSG (Schabes, 1990

  8. An ensemble climate projection for Africa

    NASA Astrophysics Data System (ADS)

    Buontempo, Carlo; Mathison, Camilla; Jones, Richard; Williams, Karina; Wang, Changgui; McSweeney, Carol

    2015-04-01

    The Met Office Hadley Centre's PRECIS regional climate modelling system has been used to generate a five member ensemble of climate projections for Africa over the 50 km resolution Coordinated Regional climate Downscaling Experiment-Africa domain. The ensemble comprises the downscaling of a subset of the Hadley Centre's perturbed physics global climate model (GCM) ensemble chosen to exclude ensemble members unable to represent the African climate realistically and then to capture the spread in outcomes from the projections of the remaining models. The PRECIS simulations were run from December 1949 to December 2100. The regional climate model (RCM) ensemble captures the annual cycle of temperatures well both for Africa as a whole and the sub-regions. It slightly overestimates precipitation over Africa as a whole and captures the annual cycle of rainfall for most of the African regions. The RCM ensemble substantially improve the patterns and magnitude of precipitation simulation compared to their driving GCM which is particularly noticeable in the Sahel for both the magnitude and timing of the wet season. Present-day simulations of the RCM ensemble are more similar to each other than those of the driving GCM ensemble which indicates that their climatologies are influenced significantly by the RCM formulation and less so by their driving GCMs. Consistent with this, the spread and magnitudes of the large-scale responses of the RCMs are often different than the driving GCMs and arguably more credible given the improved performance of the RCM. This also suggests that local climate forcing will be a significant driver of the regional response to climate change over Africa.

  10. Meaning of temperature in different thermostatistical ensembles.

    PubMed

    Hänggi, Peter; Hilbert, Stefan; Dunkel, Jörn

    2016-03-28

    Depending on the exact experimental conditions, the thermodynamic properties of physical systems can be related to one or more thermostatistical ensembles. Here, we survey the notion of thermodynamic temperature in different statistical ensembles, focusing in particular on subtleties that arise when ensembles become non-equivalent. The 'mother' of all ensembles, the microcanonical ensemble, uses entropy and internal energy (the most fundamental, dynamically conserved quantity) to derive temperature as a secondary thermodynamic variable. Over the past century, some confusion has been caused by the fact that several competing microcanonical entropy definitions are used in the literature, most commonly the volume and surface entropies introduced by Gibbs. It can be proved, however, that only the volume entropy satisfies exactly the traditional form of the laws of thermodynamics for a broad class of physical systems, including all standard classical Hamiltonian systems, regardless of their size. This mathematically rigorous fact implies that negative 'absolute' temperatures and Carnot efficiencies greater than 1 are not achievable within a standard thermodynamical framework. As an important offspring of microcanonical thermostatistics, we shall briefly consider the canonical ensemble and comment on the validity of the Boltzmann weight factor. We conclude by addressing open mathematical problems that arise for systems with discrete energy spectra.
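
    For context, the two competing microcanonical entropies mentioned above can be recapped compactly (standard definitions, not quoted from the paper); with Θ the step function and H the Hamiltonian:

      % Volume (Gibbs) vs. surface (Boltzmann) entropy; temperature is the
      % derived, secondary variable.
      \Omega(E) = \operatorname{Tr}\,\Theta(E - H), \qquad
      \omega(E) = \partial \Omega / \partial E,
      \quad
      S_G(E) = k_B \ln \Omega(E), \qquad
      S_B(E) = k_B \ln\!\left[\epsilon\,\omega(E)\right],
      \quad
      T_G = \left(\frac{\partial S_G}{\partial E}\right)^{-1}
          = \frac{\Omega(E)}{k_B\,\omega(E)} \;\ge\; 0 .

    Since Ω is non-decreasing in E, the volume-entropy temperature T_G is never negative, which is the formal reason negative 'absolute' temperatures do not arise in this framework.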

  11. Heterogeneous Ensemble Combination Search Using Genetic Algorithm for Class Imbalanced Data Classification.

    PubMed

    Haque, Mohammad Nazmul; Noman, Nasimul; Berretta, Regina; Moscato, Pablo

    2016-01-01

    Classification of datasets with imbalanced sample distributions has always been a challenge. In general, a popular approach for enhancing classification performance is the construction of an ensemble of classifiers. However, the performance of an ensemble is dependent on the choice of constituent base classifiers. Therefore, we propose a genetic algorithm-based search method for finding the optimum combination from a pool of base classifiers to form a heterogeneous ensemble. The algorithm, called GA-EoC, utilises 10-fold cross-validation on training data for evaluating the quality of each candidate ensemble. In order to combine the base classifiers' decisions into the ensemble's output, we adopted the simple and widely used majority voting approach. The proposed algorithm, along with the random sub-sampling approach to balance the class distribution, has been used for classifying class-imbalanced datasets. Additionally, if a feature set was not available, we used the (α, β) - k Feature Set method to select a better subset of features for classification. We have tested GA-EoC with three benchmarking datasets from the UCI Machine Learning repository, one Alzheimer's disease dataset and a subset of the PubFig database of Columbia University. In general, the performance of the proposed method on the chosen datasets is robust and better than that of the constituent base classifiers and many other well-known ensembles. Based on our empirical study, we claim that a genetic algorithm is a superior and reliable approach to heterogeneous ensemble construction, and we expect that the proposed GA-EoC would perform consistently in other cases.
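
    The combination step is plain majority voting; a minimal scikit-learn sketch of a heterogeneous voting ensemble in the spirit of GA-EoC, without the genetic search over member subsets that the paper adds:

      # Heterogeneous ensemble with hard majority voting on an imbalanced set.
      from sklearn.datasets import make_classification
      from sklearn.ensemble import VotingClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.naive_bayes import GaussianNB
      from sklearn.tree import DecisionTreeClassifier
      from sklearn.model_selection import cross_val_score

      X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

      ensemble = VotingClassifier(
          estimators=[("lr", LogisticRegression(max_iter=1000)),
                      ("nb", GaussianNB()),
                      ("dt", DecisionTreeClassifier(random_state=0))],
          voting="hard")                       # simple majority vote
      print(cross_val_score(ensemble, X, y, cv=10).mean())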

  12. Evaluation of real-time hydrometeorological ensemble prediction on hydrologic scales in Northern California

    NASA Astrophysics Data System (ADS)

    Georgakakos, Konstantine P.; Graham, Nicholas E.; Modrick, Theresa M.; Murphy, Michael J.; Shamir, Eylon; Spencer, Cristopher R.; Sperfslage, Jason A.

    2014-11-01

    … Reservoir inflow forecasts also exhibit good skill for the shorter lead times out to a week or so, and provide a good quantitative basis in support of reservoir management decisions pertaining to objectives with a short-term horizon (e.g., flood control and energy production). For the northernmost basin (Trinity), reservoir inflow forecasts exhibit good skill for lead times longer than 3 weeks in the snowmelt season. Bias correction of the ensemble precipitation and temperature forecasts with fixed bias factors over the range of lead times improves forecast performance at almost all lead times for precipitation and temperature, and at the shorter lead times for reservoir inflow. The results constitute a first look at the performance of operational coupled hydrometeorological ensemble forecasts in support of reservoir management.

  13. Short-term ensemble streamflow forecasting using operationally-produced single-valued streamflow forecasts

    NASA Astrophysics Data System (ADS)

    Regonda, Satish; Seo, Dong-Jun; Lawrence, Bill

    2010-05-01

    We present a statistical procedure that generates short-term streamflow ensemble forecasts from single-valued, or deterministic, forecasts operationally produced by the National Weather Service (NWS) River Forecast Centers (RFC). The resulting ensemble forecast provides an estimate of the uncertainty in the single-valued forecast to aid risk-based decision making by the emergency managers and by the users of the forecast products and services. The single-valued forecasts are produced at a 6-hr time step for 5 days into the future, and reflect single-valued short-term quantitative precipitation and temperature forecasts (QPF, QTF) and various run-time modifications (MOD), or manual data assimilation, by human forecasters to reduce various sources of error in the end-to-end forecast process. The proposed procedure generates 5 day-ahead ensemble traces of streamflow from a very parsimonious approximation of the conditional multivariate probability distribution of future streamflow given the single-valued streamflow forecasts, QPF and recent streamflow observations. For parameter estimation and evaluation, we used a 10-year archive of the single-valued river stage forecasts for six forecast points in Oklahoma produced operationally by the Arkansas-Red River Basin River Forecast Center (ABRFC). To evaluate the procedure, we carried out dependent and leave-one-year-out cross validation. The resulting ensemble hindcasts are then verified using the Ensemble Verification System (EVS) developed at the NWS Office of Hydrologic Development (OHD).

  14. Ensemble approaches to structural seismology: seek many rather than one

    NASA Astrophysics Data System (ADS)

    Sambridge, M.; Bodin, T.; Tkalcic, H.; Gallagher, K.

    2011-12-01

    For the past forty years seismologists have built models of the Earth's seismic structure over local, regional and global distance scales using derived quantities of a seismogram covering the frequency spectrum. A feature common to (almost) all cases is the objective of building a single `best' Earth model, in some sense. This is despite the fact that the data by themselves often do not require, or even allow, a single best fit Earth model to exist. It is widely recognized that many seismic inverse problems are ill-posed and non-unique and hence require regularization or additional constraints to obtain a single structural model. Interpretation of optimal models can be fraught with difficulties, particularly when formal uncertainty estimates become heavily dependent on the regularization imposed. An alternative approach is to embrace the non-uniqueness directly and employ an inference process based on parameter space sampling. Instead of seeking a best model within an optimization framework one seeks an ensemble of solutions and derives properties of that ensemble for inspection. While this idea has itself been employed for more than 30 years, it is not commonplace in seismology. Recent work has shown that trans-dimensional and hierarchical sampling methods have some considerable benefits for seismological problems involving multiple parameter types, uncertain data errors and/or uncertain model parameterizations. Rather than being forced to make decisions on parameterization, level of data noise and weights between data types in advance, as is often the case in an optimization framework, these choices can be relaxed and instead constrained by the data themselves. Limitations exist with sampling based approaches in that computational cost is often considered to be high for large scale structural problems, i.e. many unknowns and data. However there are a surprising number of areas where they are now feasible. This presentation will describe recent developments in

  15. Theory of the decision/problem state

    NASA Technical Reports Server (NTRS)

    Dieterly, D. L.

    1980-01-01

    A theory of the decision-problem state was introduced and elaborated. Starting with the basic model of a decision-problem condition, an attempt was made to explain how a major decision-problem may consist of subsets of decision-problem conditions composing different condition sequences. In addition, the basic classical decision-tree model was modified to allow for the introduction of a series of characteristics that may be encountered in an analysis of a decision-problem state. The resulting hierarchical model reflects the unique attributes of the decision-problem state. The basic model of a decision-problem condition was used as a base to evolve a more complex model that is more representative of the decision-problem state and may be used to initiate research on decision-problem states.

  16. Decision making.

    PubMed

    Chambers, David W

    2011-01-01

    A decision is a commitment of resources under conditions of risk in expectation of the best future outcome. The smart decision is always the strategy with the best overall expected value: the best combination of facts and values. Some of the special circumstances involved in decision making are discussed, including decisions where there are multiple goals, those where more than one person is involved in making the decision, using trigger points, framing decisions correctly, commitments to lost causes, and expert decision makers. A complex example of deciding about removal of asymptomatic third molars, with and without an EBD search, is discussed.

  17. Ensemble postprocessing for probabilistic quantitative precipitation forecasts

    NASA Astrophysics Data System (ADS)

    Bentzien, S.; Friederichs, P.

    2012-12-01

    Precipitation is one of the most difficult weather variables to predict in hydrometeorological applications. In order to assess the uncertainty inherent in deterministic numerical weather prediction (NWP), meteorological services around the globe develop ensemble prediction systems (EPS) based on high-resolution NWP systems. With non-hydrostatic model dynamics and without parameterization of deep moist convection, high-resolution NWP models are able to describe convective processes in more detail and provide more realistic mesoscale structures. However, precipitation forecasts are still affected by displacement errors, systematic biases and fast error growth on small scales. Probabilistic guidance can be achieved from an ensemble setup which accounts for model error and uncertainty of initial and boundary conditions. The German Meteorological Service (Deutscher Wetterdienst, DWD) provides such an ensemble system based on the German-focused limited-area model COSMO-DE. With a horizontal grid-spacing of 2.8 km, COSMO-DE is the convection-permitting high-resolution part of the operational model chain at DWD. The COSMO-DE-EPS consists of 20 realizations of COSMO-DE, driven by initial and boundary conditions derived from 4 global models and 5 perturbations of model physics. Ensemble systems like COSMO-DE-EPS are often limited with respect to ensemble size due to the immense computational costs. As a consequence, they can be biased, exhibit insufficient ensemble spread, and their probabilistic forecasts may not be well calibrated. In this study, probabilistic quantitative precipitation forecasts are derived from COSMO-DE-EPS and evaluated at more than 1000 rain gauges located all over Germany. COSMO-DE-EPS is a frequently updated ensemble system, initialized 8 times a day. We use the time-lagged approach to inexpensively increase ensemble spread, which results in more reliable forecasts especially for extreme precipitation events. Moreover, we will show that statistical

  18. ON THE CONVERGENCE OF THE ENSEMBLE KALMAN FILTER

    PubMed Central

    Mandel, Jan; Cobb, Loren; Beezley, Jonathan D.

    2013-01-01

    Convergence of the ensemble Kalman filter in the limit for large ensembles to the Kalman filter is proved. In each step of the filter, convergence of the ensemble sample covariance follows from a weak law of large numbers for exchangeable random variables, the continuous mapping theorem gives convergence in probability of the ensemble members, and Lp bounds on the ensemble then give Lp convergence. PMID:24843228
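
    To make the quantities in the proof concrete, here is one stochastic (perturbed-observation) EnKF analysis step; the Kalman gain is built from the ensemble sample covariance whose large-ensemble convergence the paper establishes. Dimensions and inputs are invented:

      import numpy as np

      rng = np.random.default_rng(5)
      n, m, N = 10, 4, 50                     # state dim, obs dim, ensemble size
      H = rng.normal(size=(m, n))             # linear observation operator
      R = 0.5 * np.eye(m)                     # observation error covariance
      ens = rng.normal(size=(n, N))           # forecast ensemble (one member per column)
      y = rng.normal(size=m)                  # observation

      A = ens - ens.mean(axis=1, keepdims=True)
      P = A @ A.T / (N - 1)                   # ensemble sample covariance
      K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain

      Y = y[:, None] + np.linalg.cholesky(R) @ rng.standard_normal((m, N))
      analysis = ens + K @ (Y - H @ ens)      # perturbed-observation update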

  19. Dynamically weighted ensemble classification for non-stationary EEG processing

    NASA Astrophysics Data System (ADS)

    Liyanage, Sidath Ravindra; Guan, Cuntai; Zhang, Haihong; Keng Ang, Kai; Xu, JianXin; Lee, Tong Heng

    2013-06-01

    Objective. The non-stationary nature of EEG poses a major challenge to robust operation of brain-computer interfaces (BCIs). The objective of this paper is to propose and investigate a computational method to address non-stationarity in EEG classification. Approach. We developed a novel dynamically weighted ensemble classification (DWEC) framework whereby an ensemble of multiple classifiers are trained on clustered features. The decisions from these multiple classifiers are dynamically combined based on the distances of the cluster centres to each test data sample being classified. Main Results. The clusters of the feature space from the second session spanned a different space compared to the clusters of the feature space from the first session which highlights the processes of session-to-session non-stationarity. The session-to-session performance of the proposed DWEC method was evaluated on two datasets. The results on publicly available BCI Competition IV dataset 2A yielded a significantly higher mean accuracy of 81.48% compared to 75.9% from the baseline support vector machine (SVM) classifier without dynamic weighting. Results on the data collected from our twelve in-house subjects yielded a significantly higher mean accuracy of 73% compared to 69.4% from the baseline SVM classifier without dynamic weighting. Significance. The cluster based analysis provides insight into session-to-session non-stationarity in EEG data. The results demonstrate the effectiveness of the proposed method in addressing non-stationarity in EEG data for the operation of a BCI.
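
    A rough sketch of the DWEC idea, with the clustering method, base classifier, and weight form chosen for brevity rather than fidelity to the paper:

      # Train one classifier per feature cluster; weight each vote by the
      # inverse distance from the test sample to that cluster's centre.
      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.datasets import make_classification
      from sklearn.tree import DecisionTreeClassifier

      X, y = make_classification(n_samples=400, n_features=10, random_state=0)
      km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
      clfs = [DecisionTreeClassifier(random_state=0).fit(X[km.labels_ == c],
                                                         y[km.labels_ == c])
              for c in range(3)]

      def dwec_predict(x):
          d = np.linalg.norm(km.cluster_centers_ - x, axis=1)
          w = 1.0 / (d + 1e-9)                          # closer cluster, larger weight
          votes = np.array([c.predict(x[None, :])[0] for c in clfs])
          return int(np.round(np.average(votes, weights=w)))  # binary labels 0/1

      print(dwec_predict(X[0]), y[0])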

  20. Class-specific Error Bounds for Ensemble Classifiers

    SciTech Connect

    Prenger, R; Lemmond, T; Varshney, K; Chen, B; Hanley, W

    2009-10-06

    The generalization error, or probability of misclassification, of ensemble classifiers has been shown to be bounded above by a function of the mean correlation between the constituent (i.e., base) classifiers and their average strength. This bound suggests that increasing the strength and/or decreasing the correlation of an ensemble's base classifiers may yield improved performance under the assumption of equal error costs. However, this and other existing bounds do not directly address application spaces in which error costs are inherently unequal. For applications involving binary classification, Receiver Operating Characteristic (ROC) curves, performance curves that explicitly trade off false alarms and missed detections, are often utilized to support decision making. To address performance optimization in this context, we have developed a lower bound for the entire ROC curve that can be expressed in terms of the class-specific strength and correlation of the base classifiers. We present empirical analyses demonstrating the efficacy of these bounds in predicting relative classifier performance. In addition, we specify performance regions of the ROC curve that are naturally delineated by the class-specific strengths of the base classifiers and show that each of these regions can be associated with a unique set of guidelines for performance optimization of binary classifiers within unequal error cost regimes.

  1. Categorizing ideas about trees: a tree of trees.

    PubMed

    Fisler, Marie; Lecointre, Guillaume

    2013-01-01

    The aim of this study is to explore whether matrices and MP trees used to produce systematic categories of organisms could be useful to produce categories of ideas in the history of science. We study the history of the use of trees in systematics to represent the diversity of life from 1766 to 1991. We apply to those ideas a method inspired by the coding of homologous parts of organisms. We discretize conceptual parts of ideas, writings and drawings about trees contained in 41 main writings; we detect shared parts among authors, code them into a 91-character matrix, and use a tree representation to show who shares what with whom. In other words, we propose a hierarchical representation of the shared ideas about trees among authors: this produces a "tree of trees." Then, we categorize schools of tree-representations. Classical schools like "cladists" and "pheneticists" are recovered but others are not: "gradists" are separated into two blocks, one of them being called here "grade theoreticians." We propose new interesting categories like the "buffonian school," the "metaphoricians," and those using "strictly genealogical classifications." We consider that networks are not useful to represent shared ideas at the present step of the study. A cladogram is made to show who shares what with whom, but also heterobathmy and homoplasy of characters. The present cladogram does not model processes of transmission of ideas about trees; here it is mostly used to test for proximity of ideas of the same age and for categorization.

  2. Tree Tectonics

    NASA Astrophysics Data System (ADS)

    Vogt, Peter R.

    2004-09-01

    Nature often replicates her processes at different scales of space and time in differing media. Here a tree-trunk cross section I am preparing for a dendrochronological display at the Battle Creek Cypress Swamp Nature Sanctuary (Calvert County, Maryland) dried and cracked in a way that replicates practically all the planform features found along the Mid-Oceanic Ridge (see Figure 1). The left-lateral offset of saw marks, contrasting with the right-lateral "rift" offset, even illustrates the distinction between transcurrent (strike-slip) and transform faults, the latter only recognized as a geologic feature, by J. Tuzo Wilson, in 1965. However, wood cracking is but one of many examples of natural processes that replicate one or several elements of lithospheric plate tectonics. Many of these examples occur in everyday venues and thus make great teaching aids, "teachable" from primary school to university levels. Plate tectonics, the dominant process of Earth geology, also occurs in miniature on the surface of some lava lakes, and as "ice plate tectonics" on our frozen seas and lakes. Ice tectonics also happens at larger spatial and temporal scales on the Jovian moons Europa and perhaps Ganymede. Tabletop plate tectonics, in which a molten-paraffin "asthenosphere" is surfaced by a skin of congealing wax "plates," first replicated Mid-Oceanic Ridge type seafloor spreading more than three decades ago. A seismologist (J. Brune, personal communication, 2004) discovered wax plate tectonics by casually and serendipitously pulling a stick across a container of molten wax his wife and daughters had used in making candles. Brune and his student D. Oldenburg followed up and mirabile dictu published the results in Science (178, 301-304).

  3. Effects of initial conditions uncertainty on regional climate variability: An analysis using a low-resolution CESM ensemble

    NASA Astrophysics Data System (ADS)

    Sriver, Ryan L.; Forest, Chris E.; Keller, Klaus

    2015-07-01

    The uncertainties surrounding the initial conditions in Earth system models can considerably influence interpretations about climate trends and variability. Here we present results from a new climate change ensemble experiment using the Community Earth System Model (CESM) to analyze the effect of internal variability on regional climate variables that are relevant for decision making. Each simulation is initialized from a unique and dynamically consistent model state sampled from a ~10,000 year fully coupled equilibrium simulation, which captures the internal unforced variability of the coupled Earth system. We find that internal variability has a sizeable contribution to the modeled ranges of temperature and precipitation. The effects increase for more localized regions. The ensemble exhibits skill in simulating key regional climate processes relevant to decision makers, such as seasonal temperature variability and extremes. The presented ensemble framework and results can provide useful resources for uncertainty quantification, integrated assessment, and climate risk management.

  4. Pre- and post-processing of hydro-meteorological ensembles for the Norwegian flood forecasting system in 145 basins.

    NASA Astrophysics Data System (ADS)

    Jahr Hegdahl, Trine; Steinsland, Ingelin; Merete Tallaksen, Lena; Engeland, Kolbjørn

    2016-04-01

    Probabilistic flood forecasting has added value for decision making. The Norwegian flood forecasting service is based on a flood forecasting model that runs for 145 basins. Covering all of Norway, the basins differ in both size and hydrological regime. Currently the flood forecasts are based on deterministic meteorological forecasts, and an auto-regressive procedure is used to achieve probabilistic forecasts. An alternative approach is to use meteorological and hydrological ensemble forecasts to quantify the uncertainty in forecasted streamflow. The hydrological ensembles are based on forcing a hydrological model with meteorological ensemble forecasts of precipitation and temperature. However, the ensembles of precipitation are often biased and their spread is too small, especially for the shortest lead times; i.e., they are not calibrated. These properties will, to some extent, propagate to the hydrological ensembles, which will most likely be uncalibrated as well. Pre- and post-processing methods are commonly used to obtain calibrated meteorological and hydrological ensembles, respectively. Quantitative studies showing the effect of combined processing of the meteorological (pre-processing) and hydrological (post-processing) ensembles are, however, few. The aim of this study is to evaluate the influence of pre- and post-processing on the skill of streamflow predictions, and especially to investigate whether forecasting skill depends on lead time, basin size and hydrological regime. This aim is achieved by applying the 51-member medium-range ensemble forecasts of precipitation and temperature provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). These ensembles are used as input to the operational Norwegian flood forecasting model, both raw and pre-processed. Precipitation ensembles are calibrated using a zero-adjusted gamma distribution. Temperature ensembles are calibrated using a Gaussian distribution and altitude corrected by a constant gradient
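
    As a toy illustration of the calibration step for temperature (a Gaussian whose mean removes the ensemble-mean bias and whose spread is estimated from past errors), with all data invented and the altitude correction omitted:

      import numpy as np

      rng = np.random.default_rng(6)
      past_mean = rng.normal(2.0, 3.0, 200)                   # archived ensemble means
      past_obs = past_mean - 1.2 + rng.normal(0, 1.5, 200)    # biased, noisy "truth"

      a, b = np.polyfit(past_mean, past_obs, 1)               # linear bias correction
      sigma = np.std(past_obs - (a * past_mean + b))          # calibrated spread

      raw = rng.normal(4.0, 0.4, 51)                          # today's 51 raw members
      calibrated = rng.normal(a * raw.mean() + b, sigma, raw.size)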

  5. The Needs of Trees

    ERIC Educational Resources Information Center

    Boyd, Amy E.; Cooper, Jim

    2004-01-01

    Tree rings can be used not only to look at plant growth, but also to make connections between plant growth and resource availability. In this lesson, students in 2nd-4th grades use role-play to become familiar with basic requirements of trees and how availability of those resources is related to tree ring sizes and tree growth. These concepts can…

  6. A dynamic fault tree model of a propulsion system

    NASA Technical Reports Server (NTRS)

    Xu, Hong; Dugan, Joanne Bechta; Meshkat, Leila

    2006-01-01

    We present a dynamic fault tree model of the benchmark propulsion system, and solve it using Galileo. Dynamic fault trees (DFT) extend traditional static fault trees with special gates to model spares and other sequence dependencies. Galileo solves DFT models using a judicious combination of automatically generated Markov and Binary Decision Diagram models. Galileo easily handles the complexities exhibited by the benchmark problem. In particular, Galileo is designed to model phased mission systems.

  7. Ensembl Plants: Integrating Tools for Visualizing, Mining, and Analyzing Plant Genomics Data.

    PubMed

    Bolser, Dan; Staines, Daniel M; Pritchard, Emily; Kersey, Paul

    2016-01-01

    Ensembl Plants (http://plants.ensembl.org) is an integrative resource presenting genome-scale information for a growing number of sequenced plant species (currently 33). The data provided include genome sequence, gene models, functional annotation, and polymorphic loci. Additional information is provided for variation data, including population structure, individual genotypes, linkage, and phenotype data. In each release, comparative analyses are performed on whole genome and protein sequences, and genome alignments and gene trees are made available that show the implied evolutionary history of each gene family. Access to the data is provided through a genome browser incorporating many specialist interfaces for different data types, and through a variety of additional methods for programmatic access and data mining. These access routes are consistent with those offered through the Ensembl interface for the genomes of non-plant species, including those of plant pathogens, pests, and pollinators. Ensembl Plants is updated 4-5 times a year and is developed in collaboration with our international partners in the Gramene (http://www.gramene.org) and transPLANT (http://www.transplantdb.org) projects.

  8. Improving sub-pixel imperviousness change prediction by ensembling heterogeneous non-linear regression models

    NASA Astrophysics Data System (ADS)

    Drzewiecki, Wojciech

    2016-12-01

    In this work nine non-linear regression models were compared for sub-pixel impervious surface area mapping from Landsat images. The comparison was done in three study areas, both for the accuracy of imperviousness coverage evaluation at individual points in time and for the accuracy of imperviousness change assessment. The performance of individual machine learning algorithms (Cubist, Random Forest, stochastic gradient boosting of regression trees, k-nearest neighbors regression, random k-nearest neighbors regression, Multivariate Adaptive Regression Splines, averaged neural networks, and support vector machines with polynomial and radial kernels) was also compared with the performance of heterogeneous model ensembles constructed from the best models trained using the particular techniques. The results showed that, in the case of sub-pixel evaluation, the most accurate prediction of change may not necessarily be based on the most accurate individual assessments. When single methods are considered, the obtained results suggest the Cubist algorithm for Landsat-based mapping of imperviousness at single dates. However, Random Forest may be endorsed when the most reliable evaluation of imperviousness change is the primary goal. It gave lower accuracies for individual assessments, but better prediction of change due to more correlated errors of individual predictions. Heterogeneous model ensembles performed at least as well as the best individual models for individual time-point assessments. For imperviousness change assessment, the ensembles always outperformed single-model approaches. This means it is possible to improve the accuracy of sub-pixel imperviousness change assessment using ensembles of heterogeneous non-linear regression models.
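
    The ensembling step itself is just an average of dissimilar regressors; a hedged sketch with scikit-learn stand-ins (Cubist has no scikit-learn implementation, so gradient boosting takes its place here):

      # Heterogeneous regression ensemble: average predictions from models
      # of different families and compare errors. Data are synthetic.
      import numpy as np
      from sklearn.datasets import make_regression
      from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
      from sklearn.neighbors import KNeighborsRegressor
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import mean_squared_error

      X, y = make_regression(n_samples=500, n_features=6, noise=10.0, random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      models = {"RF": RandomForestRegressor(random_state=0),
                "GBM": GradientBoostingRegressor(random_state=0),
                "kNN": KNeighborsRegressor()}
      preds = {name: m.fit(X_tr, y_tr).predict(X_te) for name, m in models.items()}

      for name, p in preds.items():
          print(name, mean_squared_error(y_te, p))
      print("ensemble", mean_squared_error(y_te, np.mean(list(preds.values()), axis=0)))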

  9. Laser noise imposed limitations of ensemble quantum metrology

    NASA Astrophysics Data System (ADS)

    Plankensteiner, D.; Schachenmayer, J.; Ritsch, H.; Genes, C.

    2016-12-01

    Laser noise is a decisive limiting factor in high-precision spectroscopy of narrow lines using atomic ensembles. In an idealized Doppler- and differential-light-shift-free magic-wavelength lattice configuration, it remains one distinct principal limitation beyond collective atomic decay. In this work we study the limitations originating from laser phase and amplitude noise in an idealized Ramsey pulse interrogation scheme with uncorrelated atoms. Phase noise leads to a saturation of the frequency sensitivity with increasing atom number, while amplitude noise implies a 1/√τ scaling, where τ is the interrogation time. We employ a technique using decoherence-free subspaces, first introduced by Dorner (2012 New J. Phys. 14 043011), which can restore the 1/√N scaling with the inverse square root of the particle number. Similar results and improvements are obtained numerically for a Rabi spectroscopy setup.

  10. Decision analysis: a primer and application to pain-related studies.

    PubMed

    Kim, Jaewhan; Nelson, Richard; Biskupiak, Joseph

    2008-01-01

    Decision analysis is a quantitative approach to decision making under uncertainty that explicitly states all relevant components of the decision, including statement of the problem, identification of the perspective of the decision maker, alternative courses of action and their consequences, and a model that illustrates the decision-making process. Decision trees and Markov models are used to provide a simplified version of complex clinical problems to help decision makers understand the risks and benefits of several clinical options. This article provides an introduction to decision analysis by describing the construction of decision trees and Markov models and employing examples from the recent literature.
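
    The core computation behind a decision tree is folding back expected values; a minimal sketch with invented probabilities and utilities:

      # Each option's expected value is the probability-weighted sum of its
      # chance outcomes; pick the option with the best expected value.
      options = {
          "treat":   [(0.85, 0.95), (0.15, 0.30)],   # (probability, utility)
          "observe": [(0.60, 0.90), (0.40, 0.50)],
      }

      def expected_value(branches):
          assert abs(sum(p for p, _ in branches) - 1.0) < 1e-9
          return sum(p * u for p, u in branches)

      for name, branches in options.items():
          print(name, round(expected_value(branches), 3))
      print("choose:", max(options, key=lambda o: expected_value(options[o])))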

  11. Inclusion of Sea-Surface Temperature Variation in the U.S. Navy Ensemble-Transform Global Ensemble Prediction System

    DTIC Science & Technology

    2012-10-13

    [Fragmentary record; recoverable content:] The study incorporates sea-surface temperature variation in the U.S. Navy ensemble-transform global ensemble prediction system, using the Navy Operational Global Atmospheric Prediction System (NOGAPS) global spectral model to generate a medium-range forecast ensemble that is compared to a control. Citation: McLay, J. G., M. K. Flatau, et al. (2012), Inclusion of sea-surface temperature variation in the U.S. Navy ensemble-transform global ensemble prediction system, J. Geophys. Res., 117, D19120, doi:10.1029/2011JD016937.

  12. GumTree: Data reduction

    NASA Astrophysics Data System (ADS)

    Rayner, Hugh; Hathaway, Paul; Hauser, Nick; Fei, Yang; Franceschini, Ferdi; Lam, Tony

    2006-11-01

    Access to software tools for interactive data reduction, visualisation and analysis during a neutron scattering experiment enables instrument users to make informed decisions regarding the direction and success of their experiment. ANSTO aims to enhance the experimental experience of its facility's users by integrating these data reduction tools with the instrument control interface for immediate feedback. GumTree is a software framework and application designed to support an Integrated Scientific Experimental Environment, providing concurrent access to instrument control, data acquisition, visualisation and analysis software. The Data Reduction and Analysis (DRA) module is a component of the GumTree framework that allows users to perform data reduction, correction and basic analysis within GumTree while an experiment is running. It is highly integrated with GumTree, able to pull experiment data and metadata directly from the instrument control and data acquisition components. The DRA itself uses components common to all instruments at the facility, providing a consistent interface. It features familiar ISAW-based 1D and 2D plotting, an OpenGL-based 3D plotter and peak fitting performed by fityk. This paper covers the benefits of integration, the flexibility of the DRA module, the ease of use of the interface, and audit trail generation.

  13. Potential value of operationally available and spatially distributed ensemble soil water estimates for agriculture

    NASA Astrophysics Data System (ADS)

    Georgakakos, Konstantine P.; Carpenter, Theresa M.

    2006-08-01

    The focus of this paper is to develop a methodology to answer the question: do the spatially distributed soil water estimates produced by operational distributed hydrologic models provide potential benefits for agriculture? The formulation quantifies the potential value through a cost-loss analysis, whereby cost for the farmer is associated with the decision to irrigate the field and loss is associated with the decision not to irrigate while damaging soil water deficits occur. Farmer decisions are made in view of the likelihood of damaging events as estimated by the ensemble distributed model simulations of soil water deficit. The ensemble simulations account for parametric and radar rainfall uncertainty. The application area for the economic value analysis is the farmland of the Illinois River watershed in northwestern Arkansas (mainly) and eastern Oklahoma, for which operational-quality distributed model input is available. The land is used to produce hay for feed. The analysis indicates that there is substantial potential economic value in using the ensemble soil water estimates to make decisions regarding irrigation within the watershed for the months of July, August and September, when severe soil water deficits may occur. The benefits are higher for lower cost-loss ratios and for higher yield plants. They exhibit considerable spatial variability within the watershed in agreement with the spatial variability of the incidence of soil water deficits and with the spatial variability of the ability of the ensemble model simulations to reproduce this variability. The results of this study warrant additional analysis of the economic value of distributed model simulations in other regions, different distributed models and for other types of crops. Consideration of forecasts in addition to simulations is also an important next step.
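
    The cost-loss decision rule in the formulation above reduces to comparing the ensemble-estimated probability of a damaging deficit with the ratio C/L. A minimal sketch, with illustrative numbers only:

        import numpy as np

        def should_irrigate(deficit_members, damage_threshold, cost, loss):
            """Irrigate when the ensemble probability of a damaging soil
            water deficit exceeds the cost-loss ratio C/L."""
            p_damage = np.mean(np.asarray(deficit_members) > damage_threshold)
            return p_damage > cost / loss, p_damage

        # Illustrative ensemble of simulated deficits and economic parameters.
        members = np.array([0.12, 0.31, 0.25, 0.08, 0.40, 0.22])
        decision, p = should_irrigate(members, damage_threshold=0.30,
                                      cost=10.0, loss=80.0)  # p=1/3 > 1/8 -> irrigate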

  14. Tea tree oil.

    PubMed

    Hartford, Orville; Zug, Kathryn A

    2005-09-01

    Tea tree oil is a popular ingredient in many over-the-counter healthcare and cosmetic products. With the explosion of the natural and alternative medicine industry, more and more people are using products containing tea tree oil. This article reviews basic information about tea tree oil and contact allergy, including sources of tea tree oil, chemical composition, potential cross reactions, reported cases of allergic contact dermatitis, allergenic compounds in tea tree oil, practical patch testing information, and preventive measures.

  15. Imaging and Optically Manipulating Neuronal Ensembles.

    PubMed

    Carrillo-Reid, Luis; Yang, Weijian; Kang Miller, Jae-Eun; Peterka, Darcy S; Yuste, Rafael

    2017-03-15

    The neural code that relates the firing of neurons to the generation of behavior and mental states must be implemented by spatiotemporal patterns of activity across neuronal populations. These patterns engage selective groups of neurons, called neuronal ensembles, which are emergent building blocks of neural circuits. We review optical and computational methods, based on two-photon calcium imaging and two-photon optogenetics, to detect, characterize, and manipulate neuronal ensembles in three dimensions. We review data using these methods in the mammalian cortex that demonstrate the existence of neuronal ensembles in the spontaneous and evoked cortical activity in vitro and in vivo. Moreover, two-photon optogenetics enable the possibility of artificially imprinting neuronal ensembles into awake, behaving animals and of later recalling those ensembles selectively by stimulating individual cells. These methods could enable deciphering the neural code and also be used to understand the pathophysiology of neurological and mental diseases and design novel therapies.

  16. A Bayesian Ensemble Approach for Epidemiological Projections

    PubMed Central

    Lindström, Tom; Tildesley, Michael; Webb, Colleen

    2015-01-01

    Mathematical models are powerful tools for epidemiology and can be used to compare control actions. However, different models and model parameterizations may provide different predictions of outcomes. In other fields of research, ensemble modeling has been used to combine multiple projections. We explore the possibility of applying such methods to epidemiology by adapting Bayesian techniques developed for climate forecasting. We exemplify the implementation with single model ensembles based on different parameterizations of the Warwick model run for the 2001 United Kingdom foot and mouth disease outbreak and compare the efficacy of different control actions. This allows us to investigate the effect that discrepancy among projections based on different modeling assumptions has on the ensemble prediction. A sensitivity analysis showed that the choice of prior can have a pronounced effect on the posterior estimates of quantities of interest, in particular for ensembles with large discrepancy among projections. However, by using a hierarchical extension of the method we show that prior sensitivity can be circumvented. We further extend the method to include a priori beliefs about different modeling assumptions and demonstrate that the effect of this can have different consequences depending on the discrepancy among projections. We propose that the method is a promising analytical tool for ensemble modeling of disease outbreaks. PMID:25927892
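
    The paper's hierarchical Bayesian scheme is more elaborate, but the core idea of weighting projections by how well they explain data can be sketched as plain Bayesian model averaging under a flat prior and an assumed Gaussian error model; all numbers below are hypothetical.

        import numpy as np
        from scipy.stats import norm

        def bma_weights(projections, observed, sigma):
            """Posterior model weights under a flat prior: each projection
            is weighted by its Gaussian likelihood of the observations."""
            projections = np.asarray(projections, dtype=float)
            loglik = norm.logpdf(observed, loc=projections, scale=sigma).sum(axis=1)
            w = np.exp(loglik - loglik.max())   # stabilized exponentiation
            return w / w.sum()

        proj = [[100, 150, 90], [120, 160, 100], [80, 140, 95]]  # three model runs
        obs = [110, 155, 92]                                     # observed counts
        w = bma_weights(proj, obs, sigma=10.0)
        combined = w @ np.asarray(proj)          # weighted ensemble projection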

  17. Multiscale macromolecular simulation: role of evolving ensembles.

    PubMed

    Singharoy, A; Joshi, H; Ortoleva, P J

    2012-10-22

    Multiscale analysis provides an algorithm for the efficient simulation of macromolecular assemblies. This algorithm involves the coevolution of a quasiequilibrium probability density of atomic configurations and the Langevin dynamics of spatial coarse-grained variables denoted order parameters (OPs) characterizing nanoscale system features. In practice, implementation of the probability density involves the generation of constant OP ensembles of atomic configurations. Such ensembles are used to construct thermal forces and diffusion factors that mediate the stochastic OP dynamics. Generation of all-atom ensembles at every Langevin time step is computationally expensive. Here, multiscale computation for macromolecular systems is made more efficient by a method that self-consistently folds in ensembles of all-atom configurations constructed earlier in the history of the Langevin evolution. This procedure accounts for the temporal evolution of these ensembles, accurately providing thermal forces and diffusion factors. It is shown that the efficiency and accuracy of the OP-based simulations are increased via the integration of this historical information. Accuracy improves with the square root of the number of historical timesteps included in the calculation. As a result, CPU usage can be decreased by a factor of 3-8 without loss of accuracy. The algorithm is implemented into our existing force-field based multiscale simulation platform and demonstrated via the structural dynamics of viral capsomers.

  18. Verification of Ensemble Forecasts for the New York City Operations Support Tool

    NASA Astrophysics Data System (ADS)

    Day, G.; Schaake, J. C.; Thiemann, M.; Draijer, S.; Wang, L.

    2012-12-01

    The New York City water supply system operated by the Department of Environmental Protection (DEP) serves nine million people. It covers 2,000 square miles of portions of the Catskill, Delaware, and Croton watersheds, and it includes nineteen reservoirs and three controlled lakes. DEP is developing an Operations Support Tool (OST) to support its water supply operations and planning activities. OST includes historical and real-time data, a model of the water supply system complete with operating rules, and lake water quality models developed to evaluate alternatives for managing turbidity in the New York City Catskill reservoirs. OST will enable DEP to manage turbidity in its unfiltered system while satisfying its primary objective of meeting the City's water supply needs, in addition to considering secondary objectives of maintaining ecological flows, supporting fishery and recreation releases, and mitigating downstream flood peaks. The current version of OST relies on statistical forecasts of flows in the system based on recent observed flows. To improve short-term decision making, plans are being made to transition to National Weather Service (NWS) ensemble forecasts based on hydrologic models that account for short-term weather forecast skill, longer-term climate information, as well as the hydrologic state of the watersheds and recent observed flows. To ensure that the ensemble forecasts are unbiased and that the ensemble spread reflects the actual uncertainty of the forecasts, a statistical model has been developed to post-process the NWS ensemble forecasts to account for hydrologic model error as well as any inherent bias and uncertainty in initial model states, meteorological data and forecasts. The post-processor is designed to produce adjusted ensemble forecasts that are consistent with the DEP historical flow sequences that were used to develop the system operating rules. A set of historical hindcasts that is representative of the real-time ensemble forecasts supports verification of the post-processed forecasts.
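
    The statistical post-processor described above is tailored to DEP's flow climatology; as a generic illustration of the idea only, the sketch below removes a mean bias and rescales the spread of an underdispersive ensemble, with both parameters assumed to have been estimated from historical hindcasts.

        import numpy as np

        def postprocess(ensemble, mean_bias, spread_factor):
            """Debias the ensemble mean and rescale member spread around it.
            mean_bias: mean forecast error from hindcast verification.
            spread_factor: > 1 inflates an underdispersive ensemble."""
            m = ensemble.mean(axis=-1, keepdims=True)
            return (ensemble - m) * spread_factor + (m - mean_bias)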

  19. Stable feature selection for clinical prediction: exploiting ICD tree structure using Tree-Lasso.

    PubMed

    Kamkar, Iman; Gupta, Sunil Kumar; Phung, Dinh; Venkatesh, Svetha

    2015-02-01

    Modern healthcare is getting reshaped by growing Electronic Medical Records (EMR). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnoses and procedures codes. These codes are usually represented in a tree form (e.g. the ICD-10 tree) and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model, and an appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related l1-penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to select one of many correlated features at random. This hinders clinicians from arriving at a stable feature set, which is crucial for the clinical decision-making process. In this paper, we solve this problem by using a recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study the stability behavior of Tree-Lasso and compare it with other feature selection methods. Using a synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods, e.g. Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our result has implications in identifying stable risk factors for many healthcare problems and therefore can support the clinical decision-making process.
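
    Tree-Lasso itself has no standard scikit-learn implementation, but the notion of selection stability studied above can be illustrated by measuring how often plain Lasso selects each feature across bootstrap resamples; unstable selectors show selection frequencies far from 0 or 1. A minimal sketch:

        import numpy as np
        from sklearn.linear_model import Lasso
        from sklearn.utils import resample

        def selection_stability(X, y, alpha=0.05, n_boot=50, seed=0):
            """Fraction of bootstrap resamples in which each feature
            receives a nonzero Lasso coefficient."""
            rng = np.random.RandomState(seed)
            freq = np.zeros(X.shape[1])
            for _ in range(n_boot):
                Xb, yb = resample(X, y, random_state=rng)
                coef = Lasso(alpha=alpha).fit(Xb, yb).coef_
                freq += (coef != 0)
            return freq / n_boot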

  20. Ensemble forecasting for renewable energy applications - status and current challenges for their generation and verification

    NASA Astrophysics Data System (ADS)

    Pinson, Pierre

    2016-04-01

    The operational management of renewable energy generation in power systems and electricity markets requires forecasts in various forms, e.g., deterministic or probabilistic, continuous or categorical, depending upon the decision process at hand. Besides, such forecasts may also be necessary at various spatial and temporal scales, from high temporal resolutions (in the order of minutes) and very localized for an offshore wind farm, to coarser temporal resolutions (hours) and covering a whole country for day-ahead power scheduling problems. As of today, weather predictions are a common input to forecasting methodologies for renewable energy generation. Since, for most decision processes, optimal decisions can only be made by accounting for forecast uncertainties, ensemble predictions and density forecasts are increasingly seen as the products of choice. After discussing some of the basic approaches to obtaining ensemble forecasts of renewable power generation, it will be argued that space-time trajectories of renewable power production may or may not necessitate post-processing of ensemble forecasts for relevant weather variables. Example approaches and test case applications will be covered, e.g., looking at the Horns Rev offshore wind farm in Denmark, or gridded forecasts for the whole of continental Europe. Finally, we will illustrate some of the limitations of current frameworks for forecast verification, which actually make it difficult to fully assess the quality of post-processing approaches to obtain renewable energy predictions.

  1. Decision technology.

    PubMed

    Edwards, W; Fasolo, B

    2001-01-01

    This review is about decision technology: the rules and tools that help us make wiser decisions. First, we review the three rules at the heart of most traditional decision technology: multi-attribute utility, Bayes' theorem, and subjective expected utility maximization. Since the inception of decision research, these rules have prescribed how we should infer values and probabilities and how we should combine them to make better decisions. We suggest how to make best use of all three rules in a comprehensive 19-step model. The remainder of the review explores recently developed tools of decision technology. It examines the characteristics and problems of decision-facilitating sites on the World Wide Web. Such sites now provide anyone who can use a personal computer with access to very sophisticated decision-aiding tools structured mainly to facilitate consumer decision making. It seems likely that the Web will be the mode by means of which decision tools will be distributed to lay users. But methods for doing such apparently simple things as winnowing 3000 options down to a more reasonable number, like 10, contain traps for unwary decision technologists. The review briefly examines Bayes nets and influence diagrams, judgment and decision-making tools that are available as computer programs. It very briefly summarizes the state of the art of eliciting probabilities from experts. It concludes that decision tools will be as important in the 21st century as spreadsheets were in the 20th.
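
    Subjective expected utility maximization, the third rule mentioned above, simply weights elicited utilities by subjective probabilities and picks the action with the highest sum. A toy sketch with hypothetical numbers:

        # Subjective expected utility: combine elicited probabilities and
        # utilities, then choose the action with the highest SEU.
        actions = {
            "buy":  [(0.6, 80), (0.4, 20)],   # (subjective probability, utility)
            "wait": [(0.6, 50), (0.4, 60)],
        }
        seu = {a: sum(p * u for p, u in outcomes)
               for a, outcomes in actions.items()}
        choice = max(seu, key=seu.get)        # 'buy' (56.0) beats 'wait' (54.0)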

  2. Foraging Behaviour in Magellanic Woodpeckers Is Consistent with a Multi-Scale Assessment of Tree Quality

    PubMed Central

    Vergara, Pablo M.; Soto, Gerardo E.; Rodewald, Amanda D.; Meneses, Luis O.; Pérez-Hernández, Christian G.

    2016-01-01

    Theoretical models predict that animals should make foraging decisions after assessing the quality of available habitat, but most models fail to consider the spatio-temporal scales at which animals perceive habitat availability. We tested three foraging strategies that explain how Magellanic woodpeckers (Campephilus magellanicus) assess the relative quality of trees: 1) Woodpeckers with local knowledge select trees based on the available trees in the immediate vicinity. 2) Woodpeckers lacking local knowledge select trees based on their availability at previously visited locations. 3) Woodpeckers using information from long-term memory select trees based on knowledge about trees available within the entire landscape. We observed foraging woodpeckers and used a Brownian Bridge Movement Model to identify trees available to woodpeckers along foraging routes. Woodpeckers selected trees with a later decay stage than available trees. Selection models indicated that preferences of Magellanic woodpeckers were based on clusters of trees near the most recently visited trees, thus suggesting that woodpeckers use visual cues from neighboring trees. In a second analysis, Cox’s proportional hazards models showed that woodpeckers used information consolidated across broader spatial scales to adjust tree residence times. Specifically, woodpeckers spent more time at trees with larger diameters and in a more advanced stage of decay than trees available along their routes. These results suggest that Magellanic woodpeckers make foraging decisions based on the relative quality of trees that they perceive and memorize information at different spatio-temporal scales. PMID:27416115

  3. Foraging Behaviour in Magellanic Woodpeckers Is Consistent with a Multi-Scale Assessment of Tree Quality.

    PubMed

    Vergara, Pablo M; Soto, Gerardo E; Moreira-Arce, Darío; Rodewald, Amanda D; Meneses, Luis O; Pérez-Hernández, Christian G

    2016-01-01

    Theoretical models predict that animals should make foraging decisions after assessing the quality of available habitat, but most models fail to consider the spatio-temporal scales at which animals perceive habitat availability. We tested three foraging strategies that explain how Magellanic woodpeckers (Campephilus magellanicus) assess the relative quality of trees: 1) Woodpeckers with local knowledge select trees based on the available trees in the immediate vicinity. 2) Woodpeckers lacking local knowledge select trees based on their availability at previously visited locations. 3) Woodpeckers using information from long-term memory select trees based on knowledge about trees available within the entire landscape. We observed foraging woodpeckers and used a Brownian Bridge Movement Model to identify trees available to woodpeckers along foraging routes. Woodpeckers selected trees with a later decay stage than available trees. Selection models indicated that preferences of Magellanic woodpeckers were based on clusters of trees near the most recently visited trees, thus suggesting that woodpeckers use visual cues from neighboring trees. In a second analysis, Cox's proportional hazards models showed that woodpeckers used information consolidated across broader spatial scales to adjust tree residence times. Specifically, woodpeckers spent more time at trees with larger diameters and in a more advanced stage of decay than trees available along their routes. These results suggest that Magellanic woodpeckers make foraging decisions based on the relative quality of trees that they perceive and memorize information at different spatio-temporal scales.

  4. Cavity cooling of an ensemble spin system.

    PubMed

    Wood, Christopher J; Borneman, Troy W; Cory, David G

    2014-02-07

    We describe how sideband cooling techniques may be applied to large spin ensembles in magnetic resonance. Using the Tavis-Cummings model in the presence of a Rabi drive, we solve a Markovian master equation describing the joint spin-cavity dynamics to derive cooling rates as a function of ensemble size. Our calculations indicate that the coupled angular momentum subspaces of a spin ensemble containing roughly 10^11 electron spins may be polarized in a time many orders of magnitude shorter than the typical thermal relaxation time. The described techniques should permit efficient removal of entropy for spin-based quantum information processors and fast polarization of spin samples. The proposed application of a standard technique in quantum optics to magnetic resonance also serves to reinforce the connection between the two fields, which has recently begun to be explored in further detail due to the development of hybrid designs for manufacturing noise-resilient quantum devices.

  5. Optimized gold nanoshell ensembles for biomedical applications

    PubMed Central

    2013-01-01

    We theoretically study the properties of the optimal size distribution in an ensemble of hollow gold nanoshells (HGNs) that exhibits the best performance in in vivo biomedical applications. For the first time, to the best of our knowledge, we analyze the dependence of the optimal geometric means of the nanoshells’ thicknesses and core radii on the excitation wavelength and the type of human tissue, while assuming a lognormal fit to the size distribution in a real HGN ensemble. Regardless of the tissue type, short-wavelength, near-infrared lasers are found to be the most effective in both absorption- and scattering-based applications. We derive approximate analytical expressions enabling one to readily estimate the parameters of the optimal distribution for which an HGN ensemble exhibits the maximum efficiency of absorption or scattering inside a human tissue irradiated by a near-infrared laser. PMID:23537206

  6. Optimized gold nanoshell ensembles for biomedical applications.

    PubMed

    Sikdar, Debabrata; Rukhlenko, Ivan D; Cheng, Wenlong; Premaratne, Malin

    2013-03-28

    We theoretically study the properties of the optimal size distribution in an ensemble of hollow gold nanoshells (HGNs) that exhibits the best performance in in vivo biomedical applications. For the first time, to the best of our knowledge, we analyze the dependence of the optimal geometric means of the nanoshells' thicknesses and core radii on the excitation wavelength and the type of human tissue, while assuming a lognormal fit to the size distribution in a real HGN ensemble. Regardless of the tissue type, short-wavelength, near-infrared lasers are found to be the most effective in both absorption- and scattering-based applications. We derive approximate analytical expressions enabling one to readily estimate the parameters of the optimal distribution for which an HGN ensemble exhibits the maximum efficiency of absorption or scattering inside a human tissue irradiated by a near-infrared laser.

  7. Spectroscopy with Random and Displaced Random Ensembles

    NASA Astrophysics Data System (ADS)

    Velázquez, V.; Zuker, A. P.

    2002-02-01

    Because of the time reversal invariance of the angular momentum operator J², the average energies and variances at fixed J for random two-body Hamiltonians exhibit odd-even-J staggering that may be especially strong for J = 0. It is shown that upon ensemble averaging over random runs, this behavior is reflected in the yrast states. Displaced (attractive) random ensembles lead to rotational spectra with strongly enhanced B(E2) transitions for a certain class of model spaces. It is explained how to generalize these results to other forms of collectivity.

  8. Quantum measurement of a mesoscopic spin ensemble

    SciTech Connect

    Giedke, G.; Taylor, J. M.; Lukin, M. D.; D'Alessandro, D.; Imamoglu, A.

    2006-09-15

    We describe a method for precise estimation of the polarization of a mesoscopic spin ensemble by using its coupling to a single two-level system. Our approach requires a minimal number of measurements on the two-level system for a given measurement precision. We consider the application of this method to the case of nuclear-spin ensemble defined by a single electron-charged quantum dot: we show that decreasing the electron spin dephasing due to nuclei and increasing the fidelity of nuclear-spin-based quantum memory could be within the reach of present day experiments.

  9. Ensemble Eclipse: A Process for Prefab Development Environment for the Ensemble Project

    NASA Technical Reports Server (NTRS)

    Wallick, Michael N.; Mittman, David S.; Shams, Khawaja S.; Bachmann, Andrew G.; Ludowise, Melissa

    2013-01-01

    This software simplifies the process of setting up an Eclipse IDE programming environment for the members of the cross-NASA center project, Ensemble. It achieves this by assembling all the necessary add-ons and custom tools/preferences. This software is unique in that it allows developers in the Ensemble Project (approximately 20 to 40 at any time) across multiple NASA centers to set up a development environment almost instantly and work on Ensemble software. The software automatically includes the source code repositories and other vital information and settings. The Eclipse IDE is an open-source development framework. The NASA (Ensemble-specific) version of the software includes Ensemble-specific plug-ins as well as settings for the Ensemble project. This software saves developers the time and hassle of setting up a programming environment, making sure that everything is set up in the correct manner for Ensemble development. Existing software (i.e., standard Eclipse) requires an intensive setup process that is both time-consuming and error prone. This software is built once by a single user and tested, allowing other developers to simply download and use the software.

  10. Calibrated Ensemble Forecasts using Quantile Regression Forests and Ensemble Model Output Statistics.

    NASA Astrophysics Data System (ADS)

    Taillardat, Maxime; Mestre, Olivier; Zamo, Michaël; Naveau, Philippe

    2016-04-01

    Ensembles used for probabilistic weather forecasting tend to be biased and underdispersive. This presentation proposes a statistical method for postprocessing ensembles based on Quantile Regression Forests (QRF), a generalization of random forests for quantile regression. This method does not fit a parametric probability density function, as in Ensemble Model Output Statistics (EMOS), but instead provides an estimate of the desired quantiles. This non-parametric approach eliminates any assumption on the variable subject to calibration. The method can estimate quantiles using not only members of the ensemble but any available predictor, including statistics on other variables, for example. The method is applied to the Météo-France 35-member ensemble forecast (PEARP) for surface temperature and wind speed, for lead times from 3 up to 54 hours, and compared to EMOS. All postprocessed ensembles are much better calibrated than the raw PEARP ensemble, and experiments on real data also show that QRF performs better than EMOS and can bring a real gain for forecasters. QRF provides sharp and reliable probabilistic forecasts. Finally, classical scoring rules for verifying predictive forecasts are complemented by the introduction of entropy as a general measure of reliability.
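
    Quantile Regression Forests derive predictive quantiles from the training observations that share leaves with the query point. The sketch below is a simplified pooled variant built on scikit-learn's random forest (proper QRF weights each co-leaf sample by the inverse of its leaf size, per tree); the data are synthetic.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        def weighted_quantiles(y, w, qs):
            # Empirical quantiles of y under weights w (w sums to 1).
            order = np.argsort(y)
            y_s, cw = y[order], np.cumsum(w[order])
            return [y_s[min(np.searchsorted(cw, q), len(y_s) - 1)] for q in qs]

        def qrf_predict(forest, X_train, y_train, X_new, qs=(0.1, 0.5, 0.9)):
            # Pool, over all trees, the training targets sharing a leaf
            # with each query point, then take weighted quantiles.
            train_leaves = forest.apply(X_train)        # (n_train, n_trees)
            out = []
            for leaves in forest.apply(X_new):          # (n_trees,) per query
                counts = (train_leaves == leaves).sum(axis=1).astype(float)
                out.append(weighted_quantiles(y_train, counts / counts.sum(), qs))
            return np.array(out)

        rng = np.random.default_rng(1)
        X = rng.normal(size=(500, 4))
        y = X[:, 0] + 0.3 * rng.normal(size=500)
        rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=5).fit(X, y)
        q = qrf_predict(rf, X, y, X[:3])                # rows: [q10, q50, q90]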

  11. Total probabilities of ensemble runoff forecasts

    NASA Astrophysics Data System (ADS)

    Olav Skøien, Jon; Bogner, Konrad; Salamon, Peter; Smith, Paul; Pappenberger, Florian

    2016-04-01

    Ensemble forecasting has long been used in meteorological modelling to indicate the uncertainty of the forecasts. However, as the ensembles often exhibit both bias and dispersion errors, it is necessary to calibrate and post-process them. Two of the most common methods for this are Bayesian Model Averaging (Raftery et al., 2005) and Ensemble Model Output Statistics (EMOS) (Gneiting et al., 2005). There are also methods for regionalizing these approaches (Berrocal et al., 2007) and for incorporating the correlation between lead times (Hemri et al., 2013). Engeland and Steinsland (2014) developed a framework that can estimate post-processing parameters that differ in space and time while still giving spatially and temporally consistent output. However, their method is computationally demanding for the large number of stations considered here and cannot directly be regionalized in the way we would like, so we suggest a different path below. The target of our work is to create a mean forecast with uncertainty bounds for a large number of locations in the framework of the European Flood Awareness System (EFAS - http://www.efas.eu). We are therefore more interested in improving forecast skill for high flows than for lower runoff levels. EFAS uses a combination of ensemble forecasts and deterministic forecasts from different forecasters to force a distributed hydrologic model and to compute runoff ensembles for each river pixel within the model domain. Instead of showing the mean and the variability of each forecast ensemble individually, we will now post-process all model outputs to find a total probability, the post-processed mean and the uncertainty of all ensembles. The post-processing parameters are first calibrated for each calibration location, while ensuring that they have some spatial correlation by adding a spatial penalty to the calibration process. This can in some cases have a slight negative effect on local performance.
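
    For reference, the EMOS approach cited above (Gneiting et al., 2005) fits a Gaussian predictive distribution whose mean and variance are affine in the ensemble mean and variance, typically by minimizing the average CRPS. A minimal sketch using the closed-form Gaussian CRPS (all names and starting values are illustrative):

        import numpy as np
        from scipy.optimize import minimize
        from scipy.stats import norm

        def gaussian_crps(mu, sigma, y):
            """Closed-form CRPS of a Gaussian N(mu, sigma^2) against y."""
            z = (y - mu) / sigma
            return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z)
                            - 1 / np.sqrt(np.pi))

        def fit_emos(ens_mean, ens_var, obs):
            """Fit N(a + b*ens_mean, c + d*ens_var) by minimum CRPS."""
            def loss(p):
                a, b, c, d = p
                mu = a + b * ens_mean
                sigma = np.sqrt(np.maximum(c + d * ens_var, 1e-6))
                return gaussian_crps(mu, sigma, obs).mean()
            return minimize(loss, x0=[0.0, 1.0, 1.0, 0.1],
                            method="Nelder-Mead").x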

  12. Fault-Tree Compiler

    NASA Technical Reports Server (NTRS)

    Butler, Ricky W.; Boerschlein, David P.

    1993-01-01

    Fault-Tree Compiler (FTC) program is software tool used to calculate probability of top event in fault tree. Gates of five different types allowed in fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N. High-level input language easy to understand and use. In addition, program supports hierarchical fault-tree definition feature, which simplifies tree-description process and reduces execution time. Set of programs created forming basis for reliability-analysis workstation: SURE, ASSIST, PAWS/STEM, and FTC fault-tree tool (LAR-14586). Written in PASCAL, ANSI-compliant C language, and FORTRAN 77. Other versions available upon request.
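
    Assuming independent basic events, the gate probabilities that a tool like FTC evaluates can be sketched directly. The helper below covers the five gate types named above; reading M OF N as "at least M of N inputs occur" is an assumption, as is the toy tree at the end.

        from itertools import combinations
        import math

        def and_gate(ps): return math.prod(ps)
        def or_gate(ps):  return 1 - math.prod(1 - p for p in ps)
        def invert(p):    return 1 - p
        def xor_gate(p, q): return p * (1 - q) + q * (1 - p)

        def m_of_n(ps, m):
            """P(at least m of the n independent events occur)."""
            n, total = len(ps), 0.0
            for k in range(m, n + 1):
                for idx in combinations(range(n), k):
                    total += math.prod(ps[i] if i in idx else 1 - ps[i]
                                       for i in range(n))
            return total

        # Toy tree: top = OR( AND(e1, e2), 2-of-3(e3, e4, e5) ).
        top = or_gate([and_gate([0.01, 0.02]), m_of_n([0.1, 0.2, 0.05], 2)])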

  13. Categorizing Ideas about Trees: A Tree of Trees

    PubMed Central

    Fisler, Marie; Lecointre, Guillaume

    2013-01-01

    The aim of this study is to explore whether matrices and MP trees used to produce systematic categories of organisms could be useful to produce categories of ideas in the history of science. We study the history of the use of trees in systematics to represent the diversity of life from 1766 to 1991. We apply to those ideas a method inspired by the coding of homologous parts of organisms. We discretize conceptual parts of ideas, writings and drawings about trees contained in 41 main writings; we detect shared parts among authors, code them into a 91-character matrix, and use a tree representation to show who shares what with whom. In other words, we propose a hierarchical representation of the shared ideas about trees among authors: this produces a “tree of trees.” Then, we categorize schools of tree-representations. Classical schools like “cladists” and “pheneticists” are recovered but others are not: “gradists” are separated into two blocks, one of them being called here “grade theoreticians.” We propose new interesting categories like the “buffonian school,” the “metaphoricians,” and those using “strictly genealogical classifications.” We consider that networks are not useful to represent shared ideas at the present step of the study. A cladogram is made to show who shares what with whom, and also the heterobathmy and homoplasy of characters. The present cladogram does not model processes of transmission of ideas about trees; here it is mostly used to test for proximity of ideas of the same age and for categorization. PMID:23950877

  14. Ensemble forecasting of short-term system scale irrigation demands using real-time flow data and numerical weather predictions

    NASA Astrophysics Data System (ADS)

    Perera, Kushan C.; Western, Andrew W.; Robertson, David E.; George, Biju; Nawarathna, Bandara

    2016-06-01

    Irrigation demands fluctuate in response to weather variations and a range of irrigation management decisions, which creates challenges for water supply system operators. This paper develops a method for real-time ensemble forecasting of irrigation demand and applies it to irrigation command areas of various sizes for lead times of 1 to 5 days. The ensemble forecasts are based on a deterministic time series model coupled with ensemble representations of the various inputs to that model. Forecast inputs include past flow, precipitation, and potential evapotranspiration. These inputs are variously derived from flow observations from a modernized irrigation delivery system; short-term weather forecasts derived from numerical weather prediction models and observed weather data available from automatic weather stations. The predictive performance for the ensemble spread of irrigation demand was quantified using rank histograms, the mean continuous rank probability score (CRPS), the mean CRPS reliability and the temporal mean of the ensemble root mean squared error (MRMSE). The mean forecast was evaluated using root mean squared error (RMSE), Nash-Sutcliffe model efficiency (NSE) and bias. The NSE values for evaluation periods ranged between 0.96 (1 day lead time, whole study area) and 0.42 (5 days lead time, smallest command area). Rank histograms and comparison of MRMSE, mean CRPS, mean CRPS reliability and RMSE indicated that the ensemble spread is generally a reliable representation of the forecast uncertainty for short lead times but underestimates the uncertainty for long lead times.
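
    Two of the verification measures used above are easy to state exactly: the Nash-Sutcliffe efficiency for the mean forecast, and the sample CRPS for an ensemble. A minimal sketch of both:

        import numpy as np

        def nse(sim, obs):
            """Nash-Sutcliffe efficiency: 1 is perfect; 0 is no better
            than forecasting the mean of the observations."""
            sim, obs = np.asarray(sim), np.asarray(obs)
            return 1 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

        def ensemble_crps(members, y):
            """Plug-in sample CRPS of one ensemble against a scalar obs."""
            m = np.asarray(members, dtype=float)
            return (np.mean(np.abs(m - y))
                    - 0.5 * np.mean(np.abs(m[:, None] - m[None, :])))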

  15. Transfer of radiative heat through clothing ensembles.

    PubMed

    Lotens, W A; Pieters, A M

    1995-06-01

    A mathematical model was designed to calculate the temperature and dry heat transfer in the various layers of a clothing ensemble, and the total heat loss of a human who is irradiated for a certain fraction of his or her area. The clothing ensemble that is irradiated by an external heat source is considered to be composed of underclothing, trapped air, and outer fabric. The model was experimentally tested with heat balance methods, using subjects, varying the activity, wind, and radiation characteristics of the outer garment of two-layer ensembles. In two experiments the subjects could only give off dry heat because they were wrapped in plastic foil. The model appeared to be correct within about 1 degree C (rms error) and 10 W·m⁻² (rms error). In a third experiment, sweat evaporation was also taken into account, showing that the resulting physiological heat load of 10 to 30% of the intercepted additional radiation is compensated by additional sweating. The resulting heat strain was rather mild. It is concluded that the mathematical model is a valid tool for the investigation of heat transfer through two-layer ensembles in radiant environments.

  16. A Hierarchical Bayes Ensemble Kalman Filter

    NASA Astrophysics Data System (ADS)

    Tsyrulnikov, Michael; Rakitko, Alexander

    2017-01-01

    A new ensemble filter that allows for the uncertainty in the prior distribution is proposed and tested. The filter relies on the conditional Gaussian distribution of the state given the model-error and predictability-error covariance matrices. The latter are treated as random matrices and updated in a hierarchical Bayes scheme along with the state. The (hyper)prior distribution of the covariance matrices is assumed to be inverse Wishart. The new Hierarchical Bayes Ensemble Filter (HBEF) assimilates ensemble members as generalized observations and allows ordinary observations to influence the covariances. The actual probability distribution of the ensemble members is allowed to be different from the true one. An approximation that leads to a practicable analysis algorithm is proposed. The new filter is studied in numerical experiments with a doubly stochastic one-variable model of "truth". The model permits the assessment of the variance of the truth and the true filtering error variance at each time instance. The HBEF is shown to outperform the EnKF and the HEnKF by Myrseth and Omre (2010) in a wide range of filtering regimes in terms of performance of its primary and secondary filters.
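
    The HBEF's hierarchical covariance update is its novel ingredient; for orientation, the sketch below shows only the baseline it extends, a standard stochastic EnKF analysis step with perturbed observations (not the HBEF itself; all shapes and names are illustrative).

        import numpy as np

        def enkf_update(ensemble, y_obs, H, obs_var, rng):
            """One stochastic EnKF analysis step with perturbed observations.
            ensemble: (n_ens, n_state); H: (n_obs, n_state) linear obs operator."""
            n_ens, n_obs = ensemble.shape[0], H.shape[0]
            X = ensemble - ensemble.mean(axis=0)          # anomalies
            P = X.T @ X / (n_ens - 1)                     # sample covariance
            K = P @ H.T @ np.linalg.inv(H @ P @ H.T + obs_var * np.eye(n_obs))
            y_pert = y_obs + rng.normal(0.0, np.sqrt(obs_var),
                                        size=(n_ens, n_obs))
            return ensemble + (y_pert - ensemble @ H.T) @ K.T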

  17. NMR Studies of Dynamic Biomolecular Conformational Ensembles

    PubMed Central

    Torchia, Dennis A.

    2015-01-01

    Multidimensional heteronuclear NMR approaches can provide nearly complete sequential signal assignments of isotopically enriched biomolecules. The availability of assignments together with measurements of spin relaxation rates, residual spin interactions, J-couplings and chemical shifts provides information at atomic resolution about internal dynamics on timescales ranging from ps to ms, both in solution and in the solid state. However, due to the complexity of biomolecules, it is not possible to extract a unique atomic-resolution description of biomolecular motions even from extensive NMR data when many conformations are sampled on multiple timescales. For this reason, powerful computational approaches are increasingly applied to large NMR data sets to elucidate conformational ensembles sampled by biomolecules. In the past decade, considerable attention has been directed at an important class of biomolecules that function by binding to a wide variety of target molecules. Questions of current interest are: “Does the free biomolecule sample a conformational ensemble that encompasses the conformations found when it binds to various targets; and if so, on what time scale is the ensemble sampled?” This article reviews recent efforts to answer these questions, with a focus on comparing ensembles obtained for the same biomolecules by different investigators. A detailed comparison of results obtained is provided for three biomolecules: ubiquitin, calmodulin and the HIV-1 trans-activation response RNA. PMID:25669739

  18. Ensembl Genomes 2016: more genomes, more complexity

    PubMed Central

    Kersey, Paul Julian; Allen, James E.; Armean, Irina; Boddu, Sanjay; Bolt, Bruce J.; Carvalho-Silva, Denise; Christensen, Mikkel; Davis, Paul; Falin, Lee J.; Grabmueller, Christoph; Humphrey, Jay; Kerhornou, Arnaud; Khobova, Julia; Aranganathan, Naveen K.; Langridge, Nicholas; Lowy, Ernesto; McDowall, Mark D.; Maheswari, Uma; Nuhn, Michael; Ong, Chuang Kee; Overduin, Bert; Paulini, Michael; Pedro, Helder; Perry, Emily; Spudich, Giulietta; Tapanari, Electra; Walts, Brandon; Williams, Gareth; Tello–Ruiz, Marcela; Stein, Joshua; Wei, Sharon; Ware, Doreen; Bolser, Daniel M.; Howe, Kevin L.; Kulesha, Eugene; Lawson, Daniel; Maslen, Gareth; Staines, Daniel M.

    2016-01-01

    Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces. PMID:26578574

  19. Cosmological ensemble and directional averages of observables

    SciTech Connect

    Bonvin, Camille; Clarkson, Chris; Durrer, Ruth; Maartens, Roy; Umeh, Obinna

    2015-07-01

    We show that at second order, ensemble averages of observables and directional averages do not commute due to gravitational lensing—observing the same thing in many directions over the sky is not the same as taking an ensemble average. In principle this non-commutativity is significant for a variety of quantities that we often use as observables and can lead to a bias in parameter estimation. We derive the relation between the ensemble average and the directional average of an observable, at second order in perturbation theory. We discuss the relevance of these two types of averages for making predictions of cosmological observables, focusing on observables related to distances and magnitudes. In particular, we show that the ensemble average of the distance in a given observed direction is increased by gravitational lensing, whereas the directional average of the distance is decreased. For a generic observable, there exists a particular function of the observable that is not affected by second-order lensing perturbations. We also show that standard areas have an advantage over standard rulers, and we discuss the subtleties involved in averaging in the case of supernova observations.

  20. Memory for Multiple Visual Ensembles in Infancy

    ERIC Educational Resources Information Center

    Zosh, Jennifer M.; Halberda, Justin; Feigenson, Lisa

    2011-01-01

    The number of individual items that can be maintained in working memory is limited. One solution to this problem is to store representations of ensembles that contain summary information about large numbers of items (e.g., the approximate number or cumulative area of a group of many items). Here we explored the developmental origins of ensemble…

  1. Marking up lattice QCD configurations and ensembles

    SciTech Connect

    P. Coddington; B. Joo; C. M. Maynard; D. Pleiter; T. Yoshie

    2007-10-01

    QCDml is an XML-based markup language designed for sharing QCD configurations and ensembles world-wide via the International Lattice Data Grid (ILDG). Based on the latest release, we present key ingredients of the QCDml in order to provide some starting points for colleagues in this community to markup valuable configurations and submit them to the ILDG.

  2. Conductor gestures influence evaluations of ensemble performance

    PubMed Central

    Morrison, Steven J.; Price, Harry E.; Smedley, Eric M.; Meals, Cory D.

    2014-01-01

    Previous research has found that listener evaluations of ensemble performances vary depending on the expressivity of the conductor’s gestures, even when performances are otherwise identical. It was the purpose of the present study to test whether this effect of visual information was evident in the evaluation of specific aspects of ensemble performance: articulation and dynamics. We constructed a set of 32 music performances that combined auditory and visual information and were designed to feature a high degree of contrast along one of two target characteristics: articulation and dynamics. We paired each of four music excerpts recorded by a chamber ensemble in both a high- and low-contrast condition with video of four conductors demonstrating high- and low-contrast gesture specifically appropriate to either articulation or dynamics. Using one of two equivalent test forms, college music majors and non-majors (N = 285) viewed sixteen 30 s performances and evaluated the quality of the ensemble’s articulation, dynamics, technique, and tempo along with overall expressivity. Results showed significantly higher evaluations for performances featuring high rather than low conducting expressivity regardless of the ensemble’s performance quality. Evaluations for both articulation and dynamics were strongly and positively correlated with evaluations of overall ensemble expressivity. PMID:25104944

  3. Eigenstate Gibbs ensemble in integrable quantum systems

    NASA Astrophysics Data System (ADS)

    Nandy, Sourav; Sen, Arnab; Das, Arnab; Dhar, Abhishek

    2016-12-01

    The eigenstate thermalization hypothesis conjectures that for a thermodynamically large system in one of its energy eigenstates, the reduced density matrix describing any finite subsystem is determined solely by a set of relevant conserved quantities. In a chaotic quantum system, only the energy is expected to play that role and hence eigenstates appear locally thermal. Integrable systems, on the other hand, possess an extensive number of such conserved quantities and therefore the reduced density matrix requires specification of all the corresponding parameters (generalized Gibbs ensemble). However, here we show by unbiased statistical sampling of the individual eigenstates with a given finite energy density that the local description of an overwhelming majority of these states of even such an integrable system is actually Gibbs-like, i.e., requires only the energy density of the eigenstate. Rare eigenstates that cannot be represented by the Gibbs ensemble can also be sampled efficiently by our method and their local properties are then shown to be described by appropriately truncated generalized Gibbs ensembles. We further show that the presence of these rare eigenstates differentiates the model from the chaotic case and leads to the system being described by a generalized Gibbs ensemble at long time under a unitary dynamics following a sudden quench, even when the initial state is a typical (Gibbs-like) eigenstate of the prequench Hamiltonian.

  4. Ensembl Genomes 2016: more genomes, more complexity.

    PubMed

    Kersey, Paul Julian; Allen, James E; Armean, Irina; Boddu, Sanjay; Bolt, Bruce J; Carvalho-Silva, Denise; Christensen, Mikkel; Davis, Paul; Falin, Lee J; Grabmueller, Christoph; Humphrey, Jay; Kerhornou, Arnaud; Khobova, Julia; Aranganathan, Naveen K; Langridge, Nicholas; Lowy, Ernesto; McDowall, Mark D; Maheswari, Uma; Nuhn, Michael; Ong, Chuang Kee; Overduin, Bert; Paulini, Michael; Pedro, Helder; Perry, Emily; Spudich, Giulietta; Tapanari, Electra; Walts, Brandon; Williams, Gareth; Tello-Ruiz, Marcela; Stein, Joshua; Wei, Sharon; Ware, Doreen; Bolser, Daniel M; Howe, Kevin L; Kulesha, Eugene; Lawson, Daniel; Maslen, Gareth; Staines, Daniel M

    2016-01-04

    Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments. These include the development of new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar); and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and make it available through the Ensembl user interfaces.

  5. Evolution of tree nutrition.

    PubMed

    Raven, John A; Andrews, Mitchell

    2010-09-01

    Using a broad definition of trees, the evolutionary origins of trees are considered in a nutritional context, using data from the fossil record and molecular phylogeny. Trees are first known from the Late Devonian, about 380 million years ago, and originated polyphyletically at the pteridophyte grade of organization; the earliest gymnosperms were trees, and trees are polyphyletic in the angiosperms. Nutrient transporters, assimilatory pathways, homoiohydry (cuticle, intercellular gas spaces, stomata, endohydric water transport systems including xylem and phloem-like tissue) and arbuscular mycorrhizas preceded the origin of trees. Nutritional innovations that began uniquely in trees were the seed habit and, certainly (but not necessarily uniquely) in trees, ectomycorrhizas and cyanobacterial, actinorhizal and rhizobial (Parasponia, some legumes) diazotrophic symbioses and cluster roots.

  6. Tree Classification Software

    NASA Technical Reports Server (NTRS)

    Buntine, Wray

    1993-01-01

    This paper introduces the IND Tree Package to prospective users. IND does supervised learning using classification trees. This learning task is a basic tool used in the development of diagnosis, monitoring and expert systems. The IND Tree Package was developed as part of a NASA project to semi-automate the development of data analysis and modelling algorithms using artificial intelligence techniques. The IND Tree Package integrates features from CART and C4 with newer Bayesian and minimum encoding methods for growing classification trees and graphs. The IND Tree Package also provides an experimental control suite on top. The newer features give improved probability estimates often required in diagnostic and screening tasks. The package comes with a manual, Unix 'man' entries, and a guide to tree methods and research. The IND Tree Package is implemented in C under Unix and was beta-tested at university and commercial research laboratories in the United States.

  7. Ensemble of classifiers for confidence-rated classification of NDE signal

    NASA Astrophysics Data System (ADS)

    Banerjee, Portia; Safdarnejad, Seyed; Udpa, Lalita; Udpa, Satish

    2016-02-01

    Ensembles of classifiers, in general, aim to improve classification accuracy by combining results from multiple weak hypotheses into a single strong classifier through weighted majority voting. Improved versions of classifier ensembles generate self-rated confidence scores that estimate the reliability of each prediction and boost the classifier using these confidence-rated predictions. However, such a confidence metric is based only on the rate of correct classification. Although ensembles of classifiers have been widely used in computational intelligence, existing work largely overlooks the effect of the various sources of unreliability on the confidence of classification. In NDE, classification results are affected by the inherent ambiguity of classification, non-discriminative features, inadequate training samples, and measurement noise. In this paper, we extend existing ensemble classification by maximizing the confidence of every classification decision in addition to minimizing the classification error. Initial results of the approach on data from eddy current inspection show improvement in the classification performance of defect and non-defect indications.
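
    The weighted majority vote underlying such ensembles, with a vote-share confidence attached to each decision, can be sketched as follows (weights would come from, e.g., validation accuracy; all inputs are hypothetical):

        import numpy as np

        def weighted_majority_vote(preds, weights):
            """preds: (n_classifiers, n_samples) class labels;
            weights: per-classifier weights (e.g. validation accuracy).
            Returns winning labels and their vote shares as confidences."""
            preds = np.asarray(preds)
            weights = np.asarray(weights, dtype=float)
            classes = np.unique(preds)
            scores = np.array([((preds == c) * weights[:, None]).sum(axis=0)
                               for c in classes])   # (n_classes, n_samples)
            votes = classes[scores.argmax(axis=0)]
            confidence = scores.max(axis=0) / scores.sum(axis=0)
            return votes, confidence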

  8. Illumination Under Trees

    SciTech Connect

    Max, N

    2002-08-19

    This paper is a survey of the author's work on illumination and shadows under trees, including the effects of sky illumination, sun penumbras, scattering in a misty atmosphere below the trees, and multiple scattering and transmission between leaves. It also describes a hierarchical image-based rendering method for trees.

  9. The Wish Tree Project

    ERIC Educational Resources Information Center

    Brooks, Sarah DeWitt

    2010-01-01

    This article describes the author's experience in implementing a Wish Tree project in her school in an effort to bring the school community together with a positive art-making experience during a potentially stressful time. The concept of a wish tree is simple: plant a tree; provide tags and pencils for writing wishes; and encourage everyone to…

  10. Diary of a Tree.

    ERIC Educational Resources Information Center

    Srulowitz, Frances

    1992-01-01

    Describes an activity to develop students' skills of observation and recordkeeping by studying the growth of a tree's leaves during the spring. Children monitor the growth of 11 trees over a 2-month period, draw pictures of the tree at different stages of growth, and write diaries of the tree's growth. (MDH)