Generation and Termination of Binary Decision Trees for Nonparametric Multiclass Classification.
1984-10-01
OCTOBER 1984. LIDS-P-1411. GENERATION AND TERMINATION OF BINARY DECISION TREES FOR ... minimizes the Bayes risk. Tree generation and termination are based on the training and test samples, respectively. I. Introduction We state
Balk, Benjamin; Elder, Kelly
2000-01-01
We model the spatial distribution of snow across a mountain basin using an approach that combines binary decision tree and geostatistical techniques. In April 1997 and 1998, intensive snow surveys were conducted in the 6.9‐km2 Loch Vale watershed (LVWS), Rocky Mountain National Park, Colorado. Binary decision trees were used to model the large‐scale variations in snow depth, while the small‐scale variations were modeled through kriging interpolation methods. Binary decision trees related depth to the physically based independent variables of net solar radiation, elevation, slope, and vegetation cover type. These decision tree models explained 54–65% of the observed variance in the depth measurements. The tree‐based modeled depths were then subtracted from the measured depths, and the resulting residuals were spatially distributed across LVWS through kriging techniques. The kriged estimates of the residuals were added to the tree‐based modeled depths to produce a combined depth model. The combined depth estimates explained 60–85% of the variance in the measured depths. Snow densities were mapped across LVWS using regression analysis. Snow‐covered area was determined from high‐resolution aerial photographs. Combining the modeled depths and densities with a snow cover map produced estimates of the spatial distribution of snow water equivalence (SWE). This modeling approach offers improvement over previous methods of estimating SWE distribution in mountain basins.
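The combination described above (tree-modeled depth plus spatially interpolated residuals) can be sketched as follows. This is only an illustration under stated assumptions: the one-split "tree", its threshold and leaf values, and the survey points are invented, and inverse-distance weighting is swapped in as a simple stand-in for the kriging step used in the study.

```python
import math

def tree_depth(elevation_m):
    # Hypothetical one-split "tree"; the threshold and leaf depths are made up.
    return 1.0 if elevation_m < 3300.0 else 2.0

def idw_residual(x, y, samples, power=2.0):
    # Inverse-distance weighting of residuals: a stand-in for kriging, not kriging itself.
    num = den = 0.0
    for sx, sy, res in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return res  # exactly on a survey point: return its residual
        w = 1.0 / d ** power
        num += w * res
        den += w
    return num / den

def combined_depth(x, y, elevation_m, samples):
    # Combined model = tree-based depth + interpolated residual.
    return tree_depth(elevation_m) + idw_residual(x, y, samples)

# Invented survey points: (x, y, elevation_m, measured_depth_m).
survey = [(0.0, 0.0, 3200.0, 1.2), (10.0, 0.0, 3400.0, 1.7)]
# Residual = measured depth minus tree-modeled depth at each survey point.
residuals = [(x, y, d - tree_depth(e)) for x, y, e, d in survey]
```

At a survey point the combined estimate reproduces the measured depth, because the interpolated residual there equals the observed residual.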
Method and apparatus for detecting a desired behavior in digital image data
Kegelmeyer, Jr., W. Philip
1997-01-01
A method for detecting stellate lesions in digitized mammographic image data includes the steps of prestoring a plurality of reference images, calculating a plurality of features for each of the pixels of the reference images, and creating a binary decision tree from features of randomly sampled pixels from each of the reference images. Once the binary decision tree has been created, a plurality of features, preferably including an ALOE feature (analysis of local oriented edges), are calculated for each of the pixels of the digitized mammographic data. Each of the plurality of features of each pixel is input into the binary decision tree and a probability is determined, for each of the pixels, corresponding to the likelihood of the presence of a stellate lesion, to create a probability image. Finally, the probability image is spatially filtered to enforce local consensus among neighboring pixels and the spatially filtered image is output.
Method and apparatus for detecting a desired behavior in digital image data
Kegelmeyer, Jr., W. Philip
1997-01-01
A method for detecting stellate lesions in digitized mammographic image data includes the steps of prestoring a plurality of reference images, calculating a plurality of features for each of the pixels of the reference images, and creating a binary decision tree from features of randomly sampled pixels from each of the reference images. Once the binary decision tree has been created, a plurality of features, preferably including an ALOE feature (analysis of local oriented edges), are calculated for each of the pixels of the digitized mammographic data. Each of the plurality of features of each pixel is input into the binary decision tree and a probability is determined, for each of the pixels, corresponding to the likelihood of the presence of a stellate lesion, to create a probability image. Finally, the probability image is spatially filtered to enforce local consensus among neighboring pixels and the spatially filtered image is output.
Kernel and divergence techniques in high energy physics separations
NASA Astrophysics Data System (ADS)
Bouř, Petr; Kůs, Václav; Franc, Jiří
2017-10-01
Binary decision trees under the Bayesian decision technique are used for supervised classification of high-dimensional data. We demonstrate the considerable potential of adaptive kernel density estimation as the nested separation method of the supervised binary divergence decision tree. We also provide a proof of an alternative computational approach for kernel estimates that uses the Fourier transform. Further, we apply our method to a Monte Carlo data set from the DØ experiment at the Tevatron particle accelerator at Fermilab and provide final top-antitop signal separation results. We achieved up to 82% AUC while using restricted feature selection in the signal separation procedure.
Soft context clustering for F0 modeling in HMM-based speech synthesis
NASA Astrophysics Data System (ADS)
Khorram, Soheil; Sameti, Hossein; King, Simon
2015-12-01
This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional 'hard' decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency, which is an important factor in synthesizing natural-sounding high-quality speech. Conventionally, hard decision tree-clustered hidden Markov models (HMMs) are used, in which each model parameter is assigned to a single leaf node. However, this 'divide-and-conquer' approach leads to data sparsity, with the consequence that it suffers from poor generalization, meaning that it is unable to accurately predict parameters for models of unseen contexts: the hard decision tree is a weak function approximator. To alleviate this, we propose the soft decision tree, which is a binary decision tree with soft decisions at the internal nodes. In this soft clustering method, internal nodes select both their children with certain membership degrees; therefore, each node can be viewed as a fuzzy set with a context-dependent membership function. The soft decision tree improves model generalization and provides a superior function approximator because it is able to assign each context to several overlapped leaves. In order to use such a soft decision tree to predict the parameters of the HMM output probability distribution, we derive the smoothest (maximum entropy) distribution which captures all partial first-order moments and a global second-order moment of the training samples. Employing such a soft decision tree architecture with maximum entropy distributions, a novel speech synthesis system is trained using maximum likelihood (ML) parameter re-estimation and synthesis is achieved via maximum output probability parameter generation.
In addition, a soft decision tree construction algorithm optimizing a log-likelihood measure is developed. Both subjective and objective evaluations were conducted and indicate a considerable improvement over the conventional method.
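The idea of internal nodes selecting both children with membership degrees can be sketched as below. This is a minimal illustration, not the authors' formulation: the sigmoid membership function, the dictionary layout of the tree, and the field names (feature, center, slope, leaf) are all assumptions made for the example.

```python
import math

def leaf_memberships(node, x, weight=1.0):
    """Return {leaf_name: membership degree} for input x in a soft binary tree."""
    if "leaf" in node:
        return {node["leaf"]: weight}
    # Hypothetical sigmoid membership: degree to which x belongs to the right child.
    z = node["slope"] * (x[node["feature"]] - node["center"])
    p_right = 1.0 / (1.0 + math.exp(-z))
    # Both children are visited; the path weight is split between them.
    out = leaf_memberships(node["left"], x, weight * (1.0 - p_right))
    out.update(leaf_memberships(node["right"], x, weight * p_right))
    return out

def soft_predict(memberships, leaf_values):
    # Prediction is the membership-weighted combination of leaf values.
    return sum(m * leaf_values[name] for name, m in memberships.items())

# Invented two-level tree with three leaves.
tree = {
    "feature": 0, "center": 0.0, "slope": 1.0,
    "left": {"leaf": "A"},
    "right": {
        "feature": 1, "center": 0.0, "slope": 2.0,
        "left": {"leaf": "B"},
        "right": {"leaf": "C"},
    },
}
memberships = leaf_memberships(tree, [0.0, 0.0])
```

Because each internal node splits its weight between its two children, the leaf memberships sum to one, and an input is assigned to several overlapping leaves rather than a single one.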
Block-Based Connected-Component Labeling Algorithm Using Binary Decision Trees
Chang, Wan-Yu; Chiu, Chung-Cheng; Yang, Jia-Horng
2015-01-01
In this paper, we propose a fast labeling algorithm based on block-based concepts. Because the number of memory access points directly affects the time consumption of the labeling algorithms, the aim of the proposed algorithm is to minimize neighborhood operations. Our algorithm utilizes a block-based view and correlates a raster scan to select the necessary pixels generated by a block-based scan mask. We analyze the advantages of a sequential raster scan for the block-based scan mask, and integrate the block-connected relationships using two different procedures with binary decision trees to reduce unnecessary memory access. This greatly simplifies the pixel locations of the block-based scan mask. Furthermore, our algorithm significantly reduces the number of leaf nodes and depth levels required in the binary decision tree. We analyze the labeling performance of the proposed algorithm alongside that of other labeling algorithms using high-resolution images and foreground images. The experimental results from synthetic and real image datasets demonstrate that the proposed algorithm is faster than other methods. PMID:26393597
Pitcher, Brandon; Alaqla, Ali; Noujeim, Marcel; Wealleans, James A; Kotsakis, Georgios; Chrepa, Vanessa
2017-03-01
Cone-beam computed tomographic (CBCT) analysis allows for 3-dimensional assessment of periradicular lesions and may facilitate preoperative periapical cyst screening. The purpose of this study was to develop and assess the predictive validity of a cyst screening method based on CBCT volumetric analysis alone or combined with designated radiologic criteria. Three independent examiners evaluated 118 presurgical CBCT scans from cases that underwent apicoectomies and had an accompanying gold standard histopathological diagnosis of either a cyst or granuloma. Lesion volume, density, and specific radiologic characteristics were assessed using specialized software. Logistic regression models with histopathological diagnosis as the dependent variable were constructed for cyst prediction, and receiver operating characteristic curves were used to assess the predictive validity of the models. A conditional inference binary decision tree based on a recursive partitioning algorithm was constructed to facilitate preoperative screening. Interobserver agreement was excellent for volume and density, but it varied from poor to good for the radiologic criteria. Volume and root displacement were strong predictors for cyst screening in all analyses. The binary decision tree classifier determined that if the volume of the lesion was >247 mm³, there was 80% probability of a cyst. If volume was <247 mm³ and root displacement was present, cyst probability was 60% (78% accuracy). The good accuracy and high specificity of the decision tree classifier render it a useful preoperative cyst screening tool that can aid in clinical decision making but not a substitute for definitive histopathological diagnosis after biopsy. Confirmatory studies are required to validate the present findings. Published by Elsevier Inc.
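The two rules reported for the decision tree classifier can be written as a small function. This is a sketch of the stated rules only: the function name and arguments are hypothetical, and the branches the abstract does not report (volume below the threshold without root displacement, or volume exactly at the threshold) return None rather than a guessed probability.

```python
def cyst_probability(volume_mm3, root_displacement):
    """Cyst probability from the two rules stated in the abstract, else None."""
    if volume_mm3 > 247:
        return 0.80  # lesion volume > 247 mm^3
    if volume_mm3 < 247 and root_displacement:
        return 0.60  # smaller lesion with root displacement present
    return None  # branch not reported in the abstract
```

For example, cyst_probability(300, False) falls in the first branch and cyst_probability(100, True) in the second.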
Machine Learning Through Signature Trees. Applications to Human Speech.
ERIC Educational Resources Information Center
White, George M.
A signature tree is a binary decision tree used to classify unknown patterns. An attempt was made to develop a computer program for manipulating signature trees as a general research tool for exploring machine learning and pattern recognition. The program was applied to the problem of speech recognition to test its effectiveness for a specific…
A dynamic fault tree model of a propulsion system
NASA Technical Reports Server (NTRS)
Xu, Hong; Dugan, Joanne Bechta; Meshkat, Leila
2006-01-01
We present a dynamic fault tree model of the benchmark propulsion system, and solve it using Galileo. Dynamic fault trees (DFT) extend traditional static fault trees with special gates to model spares and other sequence dependencies. Galileo solves DFT models using a judicious combination of automatically generated Markov and Binary Decision Diagram models. Galileo easily handles the complexities exhibited by the benchmark problem. In particular, Galileo is designed to model phased mission systems.
NASA Astrophysics Data System (ADS)
To, Cuong; Pham, Tuan D.
2010-01-01
In machine learning, pattern recognition may be the most popular task. Identification of "similar" patterns is also very important in biology. First, it is useful for predicting patterns associated with disease, for example cancer tissue (normal or tumor). Second, similarity or dissimilarity of kinetic patterns is used to identify coordinately controlled genes or proteins involved in the same regulatory process. Third, similar genes (proteins) share similar functions. In this paper, we present an algorithm that uses genetic programming to create a decision tree for binary classification problems. The algorithm was applied to five real biological databases. Based on comparisons with well-known methods, the algorithm performs best in most cases.
Decision tree modeling using R.
Zhang, Zhongheng
2016-08-01
In the machine learning field, the decision tree learner is powerful and easy to interpret. It employs a recursive binary partitioning algorithm that splits the sample on the partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example, I focus on the conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. Because a single tree is sensitive to small changes in the training data, the random forests procedure is introduced to address this problem. The sources of diversity for random forests come from random sampling and the restricted set of input variables available for selection. Finally, I introduce R functions to perform model-based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
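The recursive binary partitioning loop described above can be sketched in a few lines. Note the substitution: the article uses conditional inference trees in R, which choose splits by association tests, whereas this sketch uses weighted Gini impurity (a CART-style criterion) purely for brevity. The function names, the tuple layout of the tree, and the max_depth stopping rule are assumptions of the example.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini the most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (j, t), score
    return best

def grow(rows, labels, depth=0, max_depth=3):
    # Stopping criteria: depth limit reached, or no split reduces impurity.
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return (j, t,
            grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, row):
    # Internal nodes are tuples (feature, threshold, left, right); leaves are labels.
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

tree = grow([[1.0], [2.0], [3.0], [4.0]], ["a", "a", "b", "b"])
```

Class labels must not be tuples in this sketch, since tuples are reserved for internal nodes.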
Decision and Game Theory for Security
NASA Astrophysics Data System (ADS)
Alpcan, Tansu; Buttyán, Levente; Baras, John S.
Attack–defense trees are used to describe security weaknesses of a system and possible countermeasures. In this paper, the connection between attack–defense trees and game theory is made explicit. We show that attack–defense trees and binary zero-sum two-player extensive form games have equivalent expressive power when considering satisfiability, in the sense that they can be converted into each other while preserving their outcome and their internal structure.
Phan, Thanh G; Chen, Jian; Beare, Richard; Ma, Henry; Clissold, Benjamin; Van Ly, John; Srikanth, Velandai
2017-01-01
Prognostication following intracerebral hemorrhage (ICH) has focused on poor outcome at the expense of lumping together mild and moderate disability. We aimed to develop a novel approach to classifying a range of disability following ICH. The Virtual International Stroke Trial Archive collaboration database was searched for patients with ICH and known volume of ICH on baseline CT scans. Disability was partitioned into mild [modified Rankin Scale (mRS) at 90 days of 0-2], moderate (mRS = 3-4), and severe disabilities (mRS = 5-6). We used binary and trichotomy decision tree methodology. The data were randomly divided into training (2/3 of data) and validation (1/3 of data) datasets. The area under the receiver operating characteristic curve (AUC) was used to calculate the accuracy of the decision tree model. We identified 957 patients, age 65.9 ± 12.3 years, 63.7% males, and ICH volume 22.6 ± 22.1 ml. The binary tree showed that lower ICH volume (<13.7 ml), age (<66.5 years), serum glucose (<8.95 mmol/l), and systolic blood pressure (<170 mm Hg) discriminate between mild versus moderate-to-severe disabilities with AUC of 0.79 (95% CI 0.73-0.85). Large ICH volume (>27.9 ml), older age (>69.5 years), and low Glasgow Coma Scale (<15) classify severe disability with AUC of 0.80 (95% CI 0.75-0.86). The trichotomy tree showed that ICH volume, age, and serum glucose can separate mild, moderate, and severe disability groups with AUC 0.79 (95% CI 0.71-0.87). Both the binary and trichotomy methods provide equivalent discrimination of disability outcome after ICH. The trichotomy method can classify three categories at once, whereas this action was not possible with the binary method. The trichotomy method may be of use to clinicians and trialists for classifying a range of disability in ICH.
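The three-way partition of the outcome variable stated above maps directly onto a small helper. The function name is hypothetical; the cut points (mRS 0-2, 3-4, 5-6) are the ones given in the abstract.

```python
def disability_class(mrs_90_day):
    """Map a 90-day modified Rankin Scale score (0-6) to the abstract's three classes."""
    if not 0 <= mrs_90_day <= 6:
        raise ValueError("mRS must be between 0 and 6")
    if mrs_90_day <= 2:
        return "mild"
    if mrs_90_day <= 4:
        return "moderate"
    return "severe"

# Class label for every possible score, in order 0..6.
classes = [disability_class(score) for score in range(7)]
```

A binary tree would collapse two of these labels into one target (for example mild versus moderate-to-severe), whereas the trichotomy tree predicts all three at once.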
Comprehensive decision tree models in bioinformatics.
Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter
2012-01-01
Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by the so-called one-button data mining approach, where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm.
In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics.
Comprehensive Decision Tree Models in Bioinformatics
Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter
2012-01-01
Purpose: Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. Methods: This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by the so-called one-button data mining approach, where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. Results: The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase of accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. Conclusions: The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm.
In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics. PMID:22479449
Kleinhans, Sonja; Herrmann, Eva; Kohnen, Thomas; Bühren, Jens
2017-08-15
Background: Iatrogenic keratectasia is one of the most dreaded complications of refractive surgery. In most cases, keratectasia develops after refractive surgery of eyes suffering from subclinical stages of keratoconus with few or no signs. Unfortunately, there has been no reliable procedure for the early detection of keratoconus. In this study, we used binary decision trees (recursive partitioning) to assess their suitability for discrimination between normal eyes and eyes with subclinical keratoconus. Patients and Methods: The method of decision tree analysis was compared with discriminant analysis which has shown good results in previous studies. Input data were 32 eyes of 32 patients with newly diagnosed keratoconus in the contralateral eye and preoperative data of 10 eyes of 5 patients with keratectasia after laser in-situ keratomileusis (LASIK). The control group was made up of 245 normal eyes after LASIK and 12-month follow-up without any signs of iatrogenic keratectasia. Results: Decision trees gave better accuracy and specificity than did discriminant analysis. The sensitivity of decision trees was lower than the sensitivity of discriminant analysis. Conclusion: On the basis of the patient population of this study, decision trees did not prove to be superior to linear discriminant analysis for the detection of subclinical keratoconus. Georg Thieme Verlag KG Stuttgart · New York.
Wheeler, David C.; Burstyn, Igor; Vermeulen, Roel; Yu, Kai; Shortreed, Susan M.; Pronk, Anjoeka; Stewart, Patricia A.; Colt, Joanne S.; Baris, Dalsu; Karagas, Margaret R.; Schwenn, Molly; Johnson, Alison; Silverman, Debra T.; Friesen, Melissa C.
2014-01-01
Objectives: Evaluating occupational exposures in population-based case-control studies often requires exposure assessors to review each study participant's reported occupational information job-by-job to derive exposure estimates. Although such assessments likely have underlying decision rules, they usually lack transparency, are time-consuming and have uncertain reliability and validity. We aimed to identify the underlying rules to enable documentation, review, and future use of these expert-based exposure decisions. Methods: Classification and regression trees (CART, predictions from a single tree) and random forests (predictions from many trees) were used to identify the underlying rules from the questionnaire responses and an expert's exposure assignments for occupational diesel exhaust exposure for several metrics: binary exposure probability and ordinal exposure probability, intensity, and frequency. Data were split into training (n=10,488 jobs), testing (n=2,247), and validation (n=2,248) data sets. Results: The CART and random forest models' predictions agreed with 92–94% of the expert's binary probability assignments. For ordinal probability, intensity, and frequency metrics, the two models extracted decision rules more successfully for unexposed and highly exposed jobs (86–90% and 57–85%, respectively) than for low or medium exposed jobs (7–71%). Conclusions: CART and random forest models extracted decision rules and accurately predicted an expert's exposure decisions for the majority of jobs and identified questionnaire response patterns that would require further expert review if the rules were applied to other jobs in the same or different study. This approach makes the exposure assessment process in case-control studies more transparent and creates a mechanism to efficiently replicate exposure decisions in future studies. PMID:23155187
Binary recursive partitioning: background, methods, and application to psychology.
Merkle, Edgar C; Shaffer, Victoria A
2011-02-01
Binary recursive partitioning (BRP) is a computationally intensive statistical method that can be used in situations where linear models are often used. Instead of imposing many assumptions to arrive at a tractable statistical model, BRP simply seeks to accurately predict a response variable based on values of predictor variables. The method outputs a decision tree depicting the predictor variables that were related to the response variable, along with the nature of the variables' relationships. No significance tests are involved, and the tree's 'goodness' is judged based on its predictive accuracy. In this paper, we describe BRP methods in a detailed manner and illustrate their use in psychological research. We also provide R code for carrying out the methods.
Efficient algorithms for dilated mappings of binary trees
NASA Technical Reports Server (NTRS)
Iqbal, M. Ashraf
1990-01-01
The problem addressed is to find a 1-1 mapping of the vertices of a binary tree onto those of a target binary tree such that the son of a node on the first binary tree is mapped onto a descendant of the image of that node in the second binary tree. There are two natural measures of the cost of this mapping. The first is the dilation cost, i.e., the maximum distance in the target binary tree between the images of vertices that are adjacent in the original tree. The other measure, expansion cost, is defined as the number of extra nodes/edges to be added to the target binary tree in order to ensure a 1-1 mapping. An efficient algorithm to find a mapping of one binary tree onto another is described. It is shown that it is possible to minimize one cost of mapping at the expense of the other. This problem arises when designing pipelined arithmetic logic units (ALU) for special purpose computers. The pipeline is composed of ALU chips connected in the form of a binary tree. The operands to the pipeline can be supplied to the leaf nodes of the binary tree which then process and pass the results up to their parents. The final result is available at the root. As each new application may require a distinct nesting of operations, it is useful to be able to find a good mapping of a new binary tree onto an existing ALU tree. Another problem arises if every distinct required binary tree is known beforehand. Here it is useful to hardwire the pipeline in the form of a minimal supertree that contains all required binary trees.
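The dilation cost defined above can be computed directly for a given mapping. This sketch only evaluates the cost of a mapping supplied by the caller; it is not the mapping algorithm of the report. The edge-list representation, the function names, and the example trees are assumptions made for illustration.

```python
from collections import deque

def adjacency(edges):
    # Build an undirected adjacency map from (parent, child) edges.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def distances_from(adj, start):
    # Breadth-first search distances (in edges) from start to every reachable vertex.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def dilation_cost(source_edges, target_edges, mapping):
    """Maximum target-tree distance between images of adjacent source vertices."""
    adj = adjacency(target_edges)
    return max(distances_from(adj, mapping[u])[mapping[v]] for u, v in source_edges)

# Invented example: source root "a" with sons "b" and "c";
# target root 1 with sons 2 and 3, and node 2 has son 4.
source = [("a", "b"), ("a", "c")]
target = [(1, 2), (1, 3), (2, 4)]
stretched = dilation_cost(source, target, {"a": 1, "b": 4, "c": 3})
tight = dilation_cost(source, target, {"a": 1, "b": 2, "c": 3})
```

In the first mapping, the son "b" is sent to node 4, a descendant two edges below the image of its parent, so the edge ("a", "b") is stretched to length two.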
Rough Set Based Splitting Criterion for Binary Decision Tree Classifiers
2006-09-26
Alata O., Fernandez-Maloigne C., and Ferrie J.C. (2001). Unsupervised Algorithm for the Segmentation of Three-Dimensional Magnetic Resonance Brain ... instinctual and learned responses in the brain, causing it to make decisions based on patterns in the stimuli. Using this deceptively simple process ... 2001. [2] Bohn C. (1997). An Incremental Unsupervised Learning Scheme for Function Approximation. In: Proceedings of the 1997 IEEE International
NASA Astrophysics Data System (ADS)
Gessesse, B.; Bewket, W.; Bräuning, A.
2015-11-01
Land degradation due to lack of sustainable land management practices is one of the critical challenges in many developing countries including Ethiopia. This study explores the major determinants of farm-level tree planting decisions as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree planting decisions (Chi-square = 37.29, df = 15, P<0.001). Besides, the computed significance value of the model suggests that all the considered predictor variables jointly influenced the farmers' decisions to plant trees as a land management strategy. In this regard, the findings of the study show that local land users' willingness to adopt tree growing is a function of a wide range of biophysical, institutional, socioeconomic and household-level factors. In particular, household size, productive labour force availability, the disparity of schooling age, level of perception of the process of deforestation and the current land tenure system have a positive and significant influence on tree growing investment decisions in the study watershed. Eventually, the processes of land use conversion and land degradation are serious, which in turn have had adverse effects on agricultural productivity, local food security and the poverty trap nexus. Hence, devising sustainable and integrated land management policy options and implementing them would enhance ecological restoration and livelihood sustainability in the study watershed.
NASA Astrophysics Data System (ADS)
Gessesse, Berhan; Bewket, Woldeamlak; Bräuning, Achim
2016-04-01
Land degradation due to lack of sustainable land management practices is one of the critical challenges in many developing countries including Ethiopia. This study explored the major determinants of farm-level tree-planting decisions as a land management strategy in a typical farming and degraded landscape of the Modjo watershed, Ethiopia. The main data were generated from household surveys and analysed using descriptive statistics and a binary logistic regression model. The model significantly predicted farmers' tree-planting decisions (χ2 = 37.29, df = 15, P < 0.001). Besides, the computed significant value of the model revealed that all the considered predictor variables jointly influenced the farmers' decisions to plant trees as a land management strategy. The findings of the study demonstrated that the adoption of tree-growing decisions by local land users was a function of a wide range of biophysical, institutional, socioeconomic and household-level factors. In this regard, the likelihood of household size, productive labour force availability, the disparity of schooling age, level of perception of the process of deforestation and the current land tenure system had a critical influence on tree-growing investment decisions in the study watershed. Eventually, the processes of land-use conversion and land degradation were serious, which in turn have had adverse effects on agricultural productivity, local food security and poverty trap nexus. Hence, the study recommended that devising and implementing sustainable land management policy options would enhance ecological restoration and livelihood sustainability in the study watershed.
Exact Algorithms for Duplication-Transfer-Loss Reconciliation with Non-Binary Gene Trees.
Kordi, Misagh; Bansal, Mukul S
2017-06-01
Duplication-Transfer-Loss (DTL) reconciliation is a powerful method for studying gene family evolution in the presence of horizontal gene transfer. DTL reconciliation seeks to reconcile gene trees with species trees by postulating speciation, duplication, transfer, and loss events. Efficient algorithms exist for finding optimal DTL reconciliations when the gene tree is binary. In practice, however, gene trees are often non-binary due to uncertainty in the gene tree topologies, and DTL reconciliation with non-binary gene trees is known to be NP-hard. In this paper, we present the first exact algorithms for DTL reconciliation with non-binary gene trees. Specifically, we (i) show that the DTL reconciliation problem for non-binary gene trees is fixed-parameter tractable in the maximum degree of the gene tree, (ii) present an exponential-time, but in-practice efficient, algorithm to track and enumerate all optimal binary resolutions of a non-binary input gene tree, and (iii) apply our algorithms to a large empirical data set of over 4700 gene trees from 100 species to study the impact of gene tree uncertainty on DTL-reconciliation and to demonstrate the applicability and utility of our algorithms. The new techniques and algorithms introduced in this paper will help biologists avoid incorrect evolutionary inferences caused by gene tree uncertainty.
NASA Technical Reports Server (NTRS)
Lure, Y. M. Fleming; Grody, Norman C.; Chiou, Y. S. Peter; Yeh, H. Y. Michael
1993-01-01
A data fusion system with artificial neural networks (ANN) is used for fast and accurate classification of five earth surface conditions and surface changes, based on seven SSMI multichannel microwave satellite measurements. The measurements include brightness temperatures at 19, 22, 37, and 85 GHz at both H and V polarizations (only V at 22 GHz). The seven channel measurements are processed through a convolution computation such that all measurements are located on the same grid. Five surface classes including non-scattering surface, precipitation over land, over ocean, snow, and desert are identified from ground-truth observations. The system processes sensory data in three consecutive phases: (1) pre-processing to extract feature vectors and enhance separability among detected classes; (2) preliminary classification of Earth surface patterns using two separate classifiers acting in parallel: back-propagation neural network and binary decision tree classifiers; and (3) data fusion of results from preliminary classifiers to obtain the optimal performance in overall classification. Both the binary decision tree classifier and the fusion processing centers are implemented by neural network architectures. The fusion system configuration is a hierarchical neural network architecture, in which each functional neural net handles a different processing phase in a pipelined fashion. There is a total of around 13,500 samples for this analysis, of which 4 percent are used as the training set and 96 percent as the testing set. After training, this classification system is able to bring the detection accuracy up to 94 percent, compared with 88 percent for back-propagation artificial neural networks and 80 percent for binary decision tree classifiers. The neural network data fusion classification is currently being integrated in an image processing system at NOAA and is to be implemented in a prototype of a massively parallel and dynamically reconfigurable Modular Neural Ring (MNR).
Assessment of various supervised learning algorithms using different performance metrics
NASA Astrophysics Data System (ADS)
Susheel Kumar, S. M.; Laxkar, Deepak; Adhikari, Sourav; Vijayarajan, V.
2017-11-01
Our work compares the performance of supervised machine learning algorithms on a binary classification task. The supervised machine learning algorithms considered are Support Vector Machine (SVM), Decision Tree (DT), K Nearest Neighbour (KNN), Naïve Bayes (NB) and Random Forest (RF). This paper mostly focuses on comparing the performance of the above-mentioned algorithms on one binary classification task by analysing metrics such as Accuracy, F-Measure, G-Measure, Precision, Misclassification Rate, False Positive Rate, True Positive Rate, Specificity, and Prevalence.
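As an illustration of the metrics listed in this abstract, the following minimal sketch computes them from the four cells of a binary confusion matrix. The function name and the counts are hypothetical and not taken from the paper. The G-measure is assumed here to be the geometric mean of precision and recall, which is one common definition; the abstract does not state which definition the authors used.

```python
from math import sqrt


def binary_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall is the true positive rate
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "precision": precision,
        "true_positive_rate": recall,
        "false_positive_rate": fp / (fp + tn),
        "specificity": tn / (tn + fp),
        "prevalence": (tp + fn) / total,
        "f_measure": 2 * precision * recall / (precision + recall),
        # Assumption: G-measure taken as sqrt(precision * recall).
        "g_measure": sqrt(precision * recall),
    }


# Hypothetical counts, chosen only for illustration.
m = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Accuracy and misclassification rate always sum to one, as do specificity and false positive rate, so reporting both members of each pair adds no independent information.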
On the Complexity of Duplication-Transfer-Loss Reconciliation with Non-Binary Gene Trees.
Kordi, Misagh; Bansal, Mukul S
2017-01-01
Duplication-Transfer-Loss (DTL) reconciliation has emerged as a powerful technique for studying gene family evolution in the presence of horizontal gene transfer. DTL reconciliation takes as input a gene family phylogeny and the corresponding species phylogeny, and reconciles the two by postulating speciation, gene duplication, horizontal gene transfer, and gene loss events. Efficient algorithms exist for finding optimal DTL reconciliations when the gene tree is binary. However, gene trees are frequently non-binary. With such non-binary gene trees, the reconciliation problem seeks to find a binary resolution of the gene tree that minimizes the reconciliation cost. Given the prevalence of non-binary gene trees, many efficient algorithms have been developed for this problem in the context of the simpler Duplication-Loss (DL) reconciliation model. Yet, no efficient algorithms exist for DTL reconciliation with non-binary gene trees and the complexity of the problem remains unknown. In this work, we resolve this open question by showing that the problem is, in fact, NP-hard. Our reduction applies to both the dated and undated formulations of DTL reconciliation. By resolving this long-standing open problem, this work will spur the development of both exact and heuristic algorithms for this important problem.
NASA Astrophysics Data System (ADS)
Kotelnikov, E. V.; Milov, V. R.
2018-05-01
Rule-based learning algorithms are more transparent and easier to interpret than neural networks and deep learning algorithms. These properties make it possible to effectively use such algorithms to solve descriptive tasks of data mining. The choice of an algorithm depends also on its ability to solve predictive tasks. The article compares the quality of solutions to binary and multiclass classification problems based on experiments with six datasets from the UCI Machine Learning Repository. The authors investigate three algorithms: Ripper (rule induction), C4.5 (decision trees), In-Close (formal concept analysis). The results of the experiments show that In-Close demonstrates the best quality of classification in comparison with Ripper and C4.5; however, the latter two generate more compact rule sets.
Visualizing the Bayesian 2-test case: The effect of tree diagrams on medical decision making.
Binder, Karin; Krauss, Stefan; Bruckmaier, Georg; Marienhagen, Jörg
2018-01-01
In medicine, diagnoses based on medical test results are probabilistic by nature. Unfortunately, cognitive illusions regarding the statistical meaning of test results are well documented among patients, medical students, and even physicians. There are two effective strategies that can foster insight into what is known as Bayesian reasoning situations: (1) translating the statistical information on the prevalence of a disease and the sensitivity and the false-alarm rate of a specific test for that disease from probabilities into natural frequencies, and (2) illustrating the statistical information with tree diagrams, for instance, or with other pictorial representation. So far, such strategies have only been empirically tested in combination for "1-test cases", where one binary hypothesis ("disease" vs. "no disease") has to be diagnosed based on one binary test result ("positive" vs. "negative"). However, in reality, often more than one medical test is conducted to derive a diagnosis. In two studies, we examined a total of 388 medical students from the University of Regensburg (Germany) with medical "2-test scenarios". Each student had to work on two problems: diagnosing breast cancer with mammography and sonography test results, and diagnosing HIV infection with the ELISA and Western Blot tests. In Study 1 (N = 190 participants), we systematically varied the presentation of statistical information ("only textual information" vs. "only tree diagram" vs. "text and tree diagram in combination"), whereas in Study 2 (N = 198 participants), we varied the kinds of tree diagrams ("complete tree" vs. "highlighted tree" vs. "pruned tree"). All versions were implemented in probability format (including probability trees) and in natural frequency format (including frequency trees). We found that natural frequency trees, especially when the question-related branches were highlighted, improved performance, but that none of the corresponding probabilistic visualizations did.
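The natural-frequency reasoning for a 2-test case can be sketched as a small frequency tree. This is only an illustration: the prevalence, sensitivities and false-alarm rates below are hypothetical numbers, not values from the study, and the two tests are assumed to be conditionally independent given the true disease status.

```python
from fractions import Fraction


def two_test_frequency_tree(population, prevalence, sens1, fpr1, sens2, fpr2):
    """Walk a natural-frequency tree for two tests that both come back positive."""
    sick = population * prevalence
    healthy = population - sick
    # First branching: who tests positive on test 1?
    sick_pos1 = sick * sens1
    healthy_pos1 = healthy * fpr1
    # Second branching (assumes conditional independence of the tests).
    sick_pos_both = sick_pos1 * sens2
    healthy_pos_both = healthy_pos1 * fpr2
    return {
        "sick": sick,
        "healthy": healthy,
        "sick_and_both_positive": sick_pos_both,
        "healthy_and_both_positive": healthy_pos_both,
        "ppv_both_positive": sick_pos_both / (sick_pos_both + healthy_pos_both),
    }


# Hypothetical inputs: 1% prevalence; test 1 has 80% sensitivity and a 10%
# false-alarm rate; test 2 has 90% sensitivity and a 10% false-alarm rate.
tree = two_test_frequency_tree(
    population=10_000,
    prevalence=Fraction(1, 100),
    sens1=Fraction(8, 10),
    fpr1=Fraction(1, 10),
    sens2=Fraction(9, 10),
    fpr2=Fraction(1, 10),
)
print(tree["sick_and_both_positive"], "of",
      tree["sick_and_both_positive"] + tree["healthy_and_both_positive"],
      "people with two positive results are actually sick")
```

With these assumed inputs, 72 of the 171 people who test positive twice are sick. Stating the result as a count of people rather than as a conditional probability is the essence of the natural frequency format.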
Two Improved Access Methods on Compact Binary (CB) Trees.
ERIC Educational Resources Information Center
Shishibori, Masami; Koyama, Masafumi; Okada, Makoto; Aoe, Jun-ichi
2000-01-01
Discusses information retrieval and the use of binary trees as a fast access method for search strategies such as hashing. Proposes new methods based on compact binary trees that provide faster access and more compact storage, explains the theoretical basis, and confirms the validity of the methods through empirical observations. (LRW)
NASA Astrophysics Data System (ADS)
Behzadi, Naghi; Ahansaz, Bahram
2018-04-01
We propose a mechanism for quantum state transfer (QST) over a binary tree spin network on the basis of incomplete collapsing measurements. To this aim, we initially perform a weak measurement (WM) on the central qubit of the binary tree network, where the state of our concern has been prepared. After the time evolution of the whole system, a quantum measurement reversal (QMR) is performed on a chosen target qubit. By taking the optimal value for the strength of QMR, it is shown that the QST quality from the sending qubit to any typical target qubit on the binary tree is considerably improved in terms of the WM strength. Also, we show how high-quality entanglement distribution over the binary tree network is achievable by using this approach.
NASA Technical Reports Server (NTRS)
Owre, Sam; Shankar, Natarajan
1997-01-01
PVS (Prototype Verification System) is a general-purpose environment for developing specifications and proofs. This document deals primarily with the abstract datatype mechanism in PVS which generates theories containing axioms and definitions for a class of recursive datatypes. The concepts underlying the abstract datatype mechanism are illustrated using ordered binary trees as an example. Binary trees are described by a PVS abstract datatype that is parametric in its value type. The type of ordered binary trees is then presented as a subtype of binary trees where the ordering relation is also taken as a parameter. We define the operations of inserting an element into, and searching for an element in an ordered binary tree; the bulk of the report is devoted to PVS proofs of some useful properties of these operations. These proofs illustrate various approaches to proving properties of abstract datatype operations. They also describe the built-in capabilities of the PVS proof checker for simplifying abstract datatype expressions.
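The ordered binary tree operations described here can be sketched in Python (not PVS). This is an informal illustration under assumptions: the names `Node`, `insert`, `search` and `is_ordered` are hypothetical, the ordering relation is passed as a parameter to mirror the parametric datatype, and the property that insertion preserves ordering is only checked on one example, whereas the report proves such properties formally.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Node:
    value: Any
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def _lt(a, b):
    return a < b


def insert(tree: Optional[Node], x, less: Callable = _lt) -> Node:
    """Return a new tree containing x; the input tree is left unchanged."""
    if tree is None:
        return Node(x)
    if less(x, tree.value):
        return Node(tree.value, insert(tree.left, x, less), tree.right)
    if less(tree.value, x):
        return Node(tree.value, tree.left, insert(tree.right, x, less))
    return tree  # x is already present


def search(tree: Optional[Node], x, less: Callable = _lt) -> bool:
    """Follow one root-to-leaf path, guided by the ordering relation."""
    while tree is not None:
        if less(x, tree.value):
            tree = tree.left
        elif less(tree.value, x):
            tree = tree.right
        else:
            return True
    return False


def is_ordered(tree, less: Callable = _lt, lo=None, hi=None) -> bool:
    """Check that every value lies strictly between the bounds lo and hi."""
    if tree is None:
        return True
    if lo is not None and not less(lo, tree.value):
        return False
    if hi is not None and not less(tree.value, hi):
        return False
    return (is_ordered(tree.left, less, lo, tree.value)
            and is_ordered(tree.right, less, tree.value, hi))


t = None
for v in [5, 2, 8, 1, 9, 3]:
    t = insert(t, v)
print(is_ordered(t), search(t, 8), search(t, 7))
```

The persistent (copy-on-insert) style is a design choice made here because it keeps each operation a pure function, which is closer to how such operations are specified in a proof assistant.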
Efficient Merge and Insert Operations for Binary Heaps and Trees
NASA Technical Reports Server (NTRS)
Kuszmaul, Christopher Lee; Woo, Alex C. (Technical Monitor)
2000-01-01
Binary heaps and binary search trees merge efficiently. We introduce a new amortized analysis that allows us to prove the cost of merging either binary heaps or balanced binary trees is O(1), in the amortized sense. The standard set of other operations (create, insert, delete, extract minimum, in the case of binary heaps, and balanced binary trees, as well as a search operation for balanced binary trees) remain with a cost of O(log n). For binary heaps implemented as arrays, we show a new merge algorithm that has a single operation cost for merging two heaps, a and b, of O(|a| + min(log |b| log log |b|, log |a| log |b|)). This is an improvement over O(|a| + log |a| log |b|). The cost of the new merge is so low that it can be used in a new structure, which we call shadow heaps, to implement the insert operation to a tunable efficiency. Shadow heaps support the insert operation for simple priority queues in an amortized time of O(f(n)) and other operations in time O((log n log log n)/f(n)), where 1 ≤ f(n) ≤ log log n. More generally, the results here show that any data structure with operations that change its size by at most one, with the exception of a merge (aka meld) operation, can efficiently amortize the cost of the merge under conditions that are true for most implementations of binary heaps and search trees.
Quantum computation with classical light: Implementation of the Deutsch-Jozsa algorithm
NASA Astrophysics Data System (ADS)
Perez-Garcia, Benjamin; McLaren, Melanie; Goyal, Sandeep K.; Hernandez-Aranda, Raul I.; Forbes, Andrew; Konrad, Thomas
2016-05-01
We propose an optical implementation of the Deutsch-Jozsa Algorithm using classical light in a binary decision-tree scheme. Our approach uses a ring cavity and linear optical devices in order to efficiently query the oracle functional values. In addition, we take advantage of the intrinsic Fourier transforming properties of a lens to read out whether the function given by the oracle is balanced or constant.
Binary partition tree analysis based on region evolution and its application to tree simplification.
Lu, Huihai; Woods, John C; Ghanbari, Mohammed
2007-04-01
Pyramid image representations via tree structures are recognized methods for region-based image analysis. Binary partition trees can be applied which document the merging process with small details found at the bottom levels and larger ones close to the root. Hindsight of the merging process is stored within the tree structure and provides the change histories of an image property from the leaf to the root node. In this work, the change histories are modelled by evolvement functions and their second order statistics are analyzed by using a knee function. Knee values show the reluctancy of each merge. We have systematically formulated these findings to provide a novel framework for binary partition tree analysis, where tree simplification is demonstrated. Based on an evolvement function, for each upward path in a tree, the tree node associated with the first reluctant merge is considered as a pruning candidate. The result is a simplified version providing a reduced solution space and still complying with the definition of a binary tree. The experiments show that image details are preserved whilst the number of nodes is dramatically reduced. An image filtering tool also results which preserves object boundaries and has applications for segmentation.
The Use of Binary Search Trees in External Distribution Sorting.
ERIC Educational Resources Information Center
Cooper, David; Lynch, Michael F.
1984-01-01
Suggests new method of external distribution called tree partitioning that involves use of binary tree to split incoming file into successively smaller partitions for internal sorting. Number of disc accesses during a tree-partitioning sort were calculated in simulation using files extracted from British National Bibliography catalog files. (19…
An effective method on pornographic images realtime recognition
NASA Astrophysics Data System (ADS)
Wang, Baosong; Lv, Xueqiang; Wang, Tao; Wang, Chengrui
2013-03-01
In this paper, skin detection, texture filtering and face detection are used to extract features from an image library, and a decision tree algorithm is trained on them to produce rules that serve as a decision tree classifier for unknown images. In an experiment based on more than twenty thousand images, the precision rate reaches 76.21% when testing on 13025 pornographic images, and the elapsed time is less than 0.2 s. This experiment shows that the method is broadly applicable. Among the steps mentioned above, we propose a new skin detection model, called the irregular polygon region skin detection model, based on the YCbCr color space. This skin detection model can lower the false detection rate of skin detection. A new method called sequence region labeling on binary connected areas can calculate features of connected areas; it is faster and needs less memory than recursive methods.
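The abstract does not describe how its sequence region labeling works, so the following is not the authors' method. It is a generic, non-recursive connected-area labeling sketch that uses an explicit queue instead of recursion and computes one simple feature (area) per region. The function name, the 4-connectivity choice and the sample mask are all assumptions made for illustration.

```python
from collections import deque


def label_regions(image):
    """Label 4-connected foreground regions of a binary image without recursion.

    Returns (labels, areas): labels has the same shape as image, with 0 for
    background, and areas maps each region label to its pixel count.
    """
    rows = len(image)
    cols = len(image[0]) if image else 0
    labels = [[0] * cols for _ in range(rows)]
    areas = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and not labels[r][c]:
                next_label += 1
                labels[r][c] = next_label
                queue = deque([(r, c)])
                area = 0
                while queue:
                    y, x = queue.popleft()
                    area += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
                areas[next_label] = area
    return labels, areas


# Hypothetical binary mask, e.g. the output of a skin detector.
mask = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
labels, areas = label_regions(mask)
print(areas)
```

Avoiding recursion matters in practice because a recursive flood fill can exceed the call-stack limit on large connected areas, while an explicit queue only grows on the heap.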
Pärkkä, Juha; Cluitmans, Luc; Ermes, Miikka
2010-09-01
Inactive and sedentary lifestyle is a major problem in many industrialized countries today. Automatic recognition of type of physical activity can be used to show the user the distribution of his daily activities and to motivate him into more active lifestyle. In this study, an automatic activity-recognition system consisting of wireless motion bands and a PDA is evaluated. The system classifies raw sensor data into activity types online. It uses a decision tree classifier, which has low computational cost and low battery consumption. The classifier parameters can be personalized online by performing a short bout of an activity and by telling the system which activity is being performed. Data were collected with seven volunteers during five everyday activities: lying, sitting/standing, walking, running, and cycling. The online system can detect these activities with overall 86.6% accuracy and with 94.0% accuracy after classifier personalization.
CARTAM. The Cartesian Access Method for Data Structures with n-dimensional Keys.
1979-01-01
become apparent later, I have chosen to store structural information in an explicit binary tree, with modifications. Instead of the left and right links of...the usual binary tree, I use the child and twin pointers of a ring structure or circular list. This ring structure as illustrated in figure 3-1* also...Since the file is being stored as an explicit binary tree, note that additional records are being generated, and the concept of an N-th record for
Binary tree eigen solver in finite element analysis
NASA Technical Reports Server (NTRS)
Akl, F. A.; Janetzke, D. C.; Kiraly, L. J.
1993-01-01
This paper presents a transputer-based binary tree eigensolver for the solution of the generalized eigenproblem in linear elastic finite element analysis. The algorithm is based on the method of recursive doubling, in which a parallel implementation of a number of associative operations on an arbitrary set of N elements takes on the order of O(log2 N) steps, compared to (N-1) steps if implemented sequentially. The hardware used in the implementation of the binary tree consists of 32 transputers. The algorithm is written in OCCAM, which is a high-level language developed with the transputers to address parallel programming constructs and to provide the communications between processors. The algorithm can be replicated to match the size of the binary tree transputer network. Parallel and sequential finite element analysis programs have been developed to solve for the set of the least-order eigenpairs using the modified subspace method. The speed-up obtained for a typical analysis problem indicates close agreement with the theoretical prediction given by the method of recursive doubling.
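The step count quoted for associative operations can be illustrated with a pairwise, binary-tree reduction. This sketch runs sequentially in Python and only counts the rounds that a parallel machine would need; it is not the paper's OCCAM implementation, and `tree_reduce` is a hypothetical name.

```python
from operator import add


def tree_reduce(values, op):
    """Combine values pairwise, level by level, as in a binary reduction tree.

    Returns (result, steps), where steps is the number of levels. All
    combinations within one level are independent, so with enough processors
    each level could run as a single parallel step.
    """
    level = list(values)
    if not level:
        raise ValueError("tree_reduce needs at least one value")
    steps = 0
    while len(level) > 1:
        # Combine neighbours (0,1), (2,3), ...; order is preserved, so op
        # only needs to be associative, not commutative.
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element carries over unchanged
        level = nxt
        steps += 1
    return level[0], steps


total, steps = tree_reduce(range(1, 33), add)
print(total, steps)  # 32 elements need 5 levels instead of 31 sequential steps
```

For N = 32, matching the 32 transputers mentioned in the abstract, the tree needs log2(32) = 5 levels, whereas a sequential loop needs N - 1 = 31 combinations.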
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have been published to date. Two previously published examples of CART analysis in this field are didactically explained in detail. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
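A single partitioning step of a classification tree can be sketched as follows. The abstract only says that the sample is partitioned "based on a certain criterion"; Gini impurity is assumed here because it is a common choice for classification trees. The function names and the toy data are hypothetical and unrelated to any haemophilia data set.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Find the threshold on one numeric predictor that minimises the
    size-weighted Gini impurity of the two resulting subgroups.

    Returns (threshold, impurity); threshold is None if no split improves
    on the impurity of the unsplit sample.
    """
    pairs = sorted(zip(xs, ys), key=lambda p: p[0])
    n = len(pairs)
    best_threshold, best_score = None, gini(ys)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Toy data: one numeric predictor and a binary outcome.
xs = [1, 2, 3, 10, 11, 12]
ys = ["low", "low", "low", "high", "high", "high"]
print(best_split(xs, ys))
```

A full CART procedure would apply this search to every predictor, recurse on each subgroup, and then prune the tree; only the core splitting step is shown.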
Symbolic Boolean Manipulation with Ordered Binary Decision Diagrams
1992-07-01
memories, where careful attention has been given to programming the memory management routines [Brace et al 1990]. To extract maximum performance, it...OBDDs) represent Boolean functions as directed acyclic graphs. They form a canonical representation, making testing of functional properties such as...indicated [Figure 1: Truth Table and Decision Tree Representations]
An object-based approach for tree species extraction from digital orthophoto maps
NASA Astrophysics Data System (ADS)
Jamil, Akhtar; Bayram, Bulent
2018-05-01
Tree segmentation is an active and ongoing research area in the field of photogrammetry and remote sensing. It is more challenging due to both intra-class and inter-class similarities among various tree species. In this study, we exploited various statistical features for extraction of hazelnut trees from 1 : 5000 scaled digital orthophoto maps. Initially, the non-vegetation areas were eliminated using traditional normalized difference vegetation index (NDVI) followed by application of mean shift segmentation for transforming the pixels into meaningful homogeneous objects. In order to eliminate false positives, morphological opening and closing was employed on candidate objects. A number of heuristics were also derived to eliminate unwanted effects such as shadow and bounding box aspect ratios, before passing them into the classification stage. Finally, a knowledge based decision tree was constructed to distinguish the hazelnut trees from rest of objects which include manmade objects and other type of vegetation. We evaluated the proposed methodology on 10 sample orthophoto maps obtained from Giresun province in Turkey. The manually digitized hazelnut tree boundaries were taken as reference data for accuracy assessment. Both manually digitized and segmented tree borders were converted into binary images and the differences were calculated. According to the obtained results, the proposed methodology obtained an overall accuracy of more than 85 % for all sample images.
Nearest Neighbor Searching in Binary Search Trees: Simulation of a Multiprocessor System.
ERIC Educational Resources Information Center
Stewart, Mark; Willett, Peter
1987-01-01
Describes the simulation of a nearest neighbor searching algorithm for document retrieval using a pool of microprocessors. Three techniques are described which allow parallel searching of a binary search tree as well as a PASCAL-based system, PASSIM, which can simulate these techniques. Fifty-six references are provided. (Author/LRW)
Live phylogeny with polytomies: Finding the most compact parsimonious trees.
Papamichail, D; Huang, A; Kennedy, E; Ott, J-L; Miller, A; Papamichail, G
2017-08-01
Construction of phylogenetic trees has traditionally focused on binary trees where all species appear on leaves, a problem for which numerous efficient solutions have been developed. Certain application domains though, such as viral evolution and transmission, paleontology, linguistics, and phylogenetic stemmatics, often require phylogeny inference that involves placing input species on ancestral tree nodes (live phylogeny), and polytomies. These requirements, despite their prevalence, lead to computationally harder algorithmic solutions and have been sparsely examined in the literature to date. In this article we prove some unique properties of most parsimonious live phylogenetic trees with polytomies, and their mapping to traditional binary phylogenetic trees. We show that our problem reduces to finding the most compact parsimonious tree for n species, and describe a novel efficient algorithm to find such trees without resorting to exhaustive enumeration of all possible tree topologies. Copyright © 2017 Elsevier Ltd. All rights reserved.
Thuillard, Marc; Fraix-Burnet, Didier
2015-01-01
This article presents an innovative approach to phylogenies based on the reduction of multistate characters to binary-state characters. We show that the reduction-to-binary-characters approach can be applied to both character- and distance-based phylogenies and provides a unifying framework to explain simply and intuitively the similarities and differences between distance- and character-based phylogenies. Building on these results, this article gives a possible explanation of why phylogenetic trees obtained from a distance matrix or a set of characters are often quite reasonable despite lateral transfers of genetic material between taxa. In the presence of lateral transfers, outer planar networks furnish a better description of evolution than phylogenetic trees. We present a polynomial-time reconstruction algorithm for perfect outer planar networks with a fixed number of states, characters, and lateral transfers.
Rajavel, Rajkumar; Thangarathinam, Mala
2015-01-01
Optimization of negotiation conflict in the cloud service negotiation framework is identified as one of the major challenging issues. This negotiation conflict occurs during the bilateral negotiation process between the participants due to misperception, aggressive behavior, and uncertain preferences and goals about their opponents. Existing research work focuses on the prerequest context of negotiation conflict optimization by grouping similar negotiation pairs using distance, binary, context-dependent, and fuzzy similarity approaches. To some extent, these approaches can maximize the success rate and minimize the communication overhead among the participants. To further optimize the success rate and communication overhead, the proposed research work introduces a novel probabilistic decision making model for optimizing the negotiation conflict in the long-term negotiation context. This decision model formulates the problem of managing different types of negotiation conflict that occur during the negotiation process as a multistage Markov decision problem. At each stage of the negotiation process, the proposed decision model generates the heuristic decision based on the past negotiation state information without causing any break-off among the participants. In addition, this heuristic decision using the stochastic decision tree scenario can maximize the revenue among the participants available in the cloud service negotiation framework. PMID:26543899
Reliable binary cell-fate decisions based on oscillations
NASA Astrophysics Data System (ADS)
Pfeuty, B.; Kaneko, K.
2014-02-01
Biological systems often have to perform binary decisions under highly dynamic and noisy environments, such as during cell-fate determination. These decisions can be implemented by two main bifurcation mechanisms based on the transitions from either monostability or oscillation to bistability. We compare these two mechanisms by using stochastic models with time-varying fields and by establishing asymptotic formulas for the choice probabilities. Different scaling laws for decision sensitivity with respect to noise strength and signal timescale are obtained, supporting a role for oscillatory dynamics in performing noise-robust and temporally tunable binary decision-making. This result provides a rationale for recent experimental evidence showing that oscillatory expression of proteins often precedes binary cell-fate decisions.
Two Upper Bounds for the Weighted Path Length of Binary Trees. Report No. UIUCDCS-R-73-565.
ERIC Educational Resources Information Center
Pradels, Jean Louis
Rooted binary trees with weighted nodes are structures encountered in many areas, such as coding theory, searching and sorting, information storage and retrieval. The path length is a meaningful quantity which gives indications about the expected time of a search or the length of a code, for example. In this paper, two sharp bounds for the total…
NASA Technical Reports Server (NTRS)
Chang, Chi-Yung (Inventor); Fang, Wai-Chi (Inventor); Curlander, John C. (Inventor)
1995-01-01
A system for data compression utilizing systolic array architecture for Vector Quantization (VQ) is disclosed for both full-searched and tree-searched. For a tree-searched VQ, the special case of a Binary Tree-Search VQ (BTSVQ) is disclosed with identical Processing Elements (PE) in the array for both a Raw-Codebook VQ (RCVQ) and a Difference-Codebook VQ (DCVQ) algorithm. A fault tolerant system is disclosed which allows a PE that has developed a fault to be bypassed in the array and replaced by a spare at the end of the array, with codebook memory assignment shifted one PE past the faulty PE of the array.
NASA Technical Reports Server (NTRS)
Lee, Charles; Alena, Richard L.; Robinson, Peter
2004-01-01
Starting from an example of ISS fault trees, we present a method to convert fault trees to decision trees. The method shows that the root cause of a fault is easier to visualize and that tree manipulation becomes more programmatic via available decision tree programs. The decision tree visualization for diagnostics is straightforward and easy to understand. For ISS real-time fault diagnostics, the status of the systems can be shown by passing the signals through the trees and seeing where they stop. Another advantage of decision trees is that they can learn fault patterns and predict future faults from historical data. The learning is not limited to static data sets but can also be online: by accumulating real-time data sets, the decision trees can gain and store fault patterns and recognize them when they recur.
Fast Localization in Large-Scale Environments Using Supervised Indexing of Binary Features.
Youji Feng; Lixin Fan; Yihong Wu
2016-01-01
The essence of image-based localization lies in matching 2D key points in the query image and 3D points in the database. State-of-the-art methods mostly employ sophisticated key point detectors and feature descriptors, e.g., Difference of Gaussian (DoG) and Scale Invariant Feature Transform (SIFT), to ensure robust matching. While a high registration rate is attained, the registration speed is impeded by the expensive key point detection and the descriptor extraction. In this paper, we propose to use efficient key point detectors along with binary feature descriptors, since the extraction of such binary features is extremely fast. The naive usage of binary features, however, does not lend itself to significant speedup of localization, since existing indexing approaches, such as hierarchical clustering trees and locality sensitive hashing, are not efficient enough in indexing binary features, and matching binary features turns out to be much slower than matching SIFT features. To overcome this, we propose a much more efficient indexing approach for approximate nearest neighbor search of binary features. This approach resorts to randomized trees that are constructed in a supervised training process by exploiting the label information derived from the fact that multiple features correspond to a common 3D point. In the tree construction process, node tests are selected in a way such that trees have uniform leaf sizes and low error rates, which are two desired properties for efficient approximate nearest neighbor search. To further improve the search efficiency, a probabilistic priority search strategy is adopted. Apart from the label information, this strategy also uses non-binary pixel intensity differences available in descriptor extraction. By using the proposed indexing approach, matching binary features is no longer much slower but slightly faster than matching SIFT features. 
Consequently, the overall localization speed is significantly improved due to the much faster key point detection and descriptor extraction. It is empirically demonstrated that the localization speed is improved by an order of magnitude as compared with state-of-the-art methods, while comparable registration rate and localization accuracy are still maintained.
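For orientation, matching binary descriptors comes down to nearest-neighbor search under Hamming distance. The sketch below shows only the exhaustive baseline that an index such as the one described above is meant to speed up; it is not the authors' randomized-tree method. The descriptors are stored as Python integers, and the names `hamming` and `nearest` are illustrative assumptions.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def nearest(query: int, database: list) -> int:
    """Index of the database descriptor closest to the query (exhaustive scan)."""
    return min(range(len(database)), key=lambda i: hamming(query, database[i]))


# Hypothetical 4-bit descriptors, for illustration only.
db = [0b0000, 0b1110, 0b0011]
best = nearest(0b1111, db)
print(best, hamming(0b1111, db[best]))
```

The scan costs one XOR and one bit count per database entry, which is why large databases call for an approximate index.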
NASA Astrophysics Data System (ADS)
Xiao, Guoqiang; Jiang, Yang; Song, Gang; Jiang, Jianmin
2010-12-01
We propose a support-vector-machine (SVM) tree to hierarchically learn from domain knowledge represented by low-level features toward automatic classification of sports videos. The proposed SVM tree adopts a binary tree structure to exploit the nature of SVM's binary classification, where each internal node is a single SVM learning unit, and each external node represents the classified output type. Such a SVM tree presents a number of advantages, which include: 1. low computing cost; 2. integrated learning and classification while preserving individual SVM's learning strength; and 3. flexibility in both structure and learning modules, where different numbers of nodes and features can be added to address specific learning requirements, and various learning models can be added as individual nodes, such as neural networks, AdaBoost, hidden Markov models, dynamic Bayesian networks, etc. Experiments support that the proposed SVM tree achieves good performances in sports video classifications.
Steensels, M; Antler, A; Bahr, C; Berckmans, D; Maltz, E; Halachmi, I
2016-09-01
Early detection of post-calving health problems is critical for dairy operations. Separating sick cows from the herd is important, especially in robotic-milking dairy farms, where searching for a sick cow can disturb the other cows' routine. The objectives of this study were to develop and apply a behaviour- and performance-based health-detection model to post-calving cows in a robotic-milking dairy farm, with the aim of detecting sick cows based on available commercial sensors. The study was conducted in an Israeli robotic-milking dairy farm with 250 Israeli-Holstein cows. All cows were equipped with rumination- and neck-activity sensors. Milk yield, visits to the milking robot and BW were recorded in the milking robot. A decision-tree model was developed on a calibration data set (historical data of the 10 months before the study) and was validated on the new data set. The decision model generated a probability of being sick for each cow. The model was applied once a week just before the veterinarian performed the weekly routine post-calving health check. The veterinarian's diagnosis served as a binary reference for the model (healthy-sick). The overall accuracy of the model was 78%, with a specificity of 87% and a sensitivity of 69%, suggesting its practical value.
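The three reported rates are tied together through a 2x2 confusion matrix. The sketch below computes them from counts. The counts used are hypothetical (100 sick and 100 healthy cows), chosen only because they reproduce the reported percentages under equal class sizes; the abstract does not give the actual counts.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,     # all correct decisions
        "sensitivity": tp / (tp + fn),     # sick cows flagged as sick
        "specificity": tn / (tn + fp),     # healthy cows flagged as healthy
    }


# Hypothetical counts, not taken from the study.
print(binary_metrics(tp=69, fn=31, tn=87, fp=13))
```

With balanced classes, accuracy is the mean of sensitivity and specificity, which is consistent with 78% lying midway between 69% and 87%.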
A new approach to enhance the performance of decision tree for classifying gene expression data.
Hassan, Md; Kotagiri, Ramamohanarao
2013-12-20
Gene expression data classification is a challenging task due to the large dimensionality and very small number of samples. Decision tree is one of the popular machine learning approaches to address such classification problems. However, the existing decision tree algorithms use a single gene feature at each node to split the data into its child nodes and hence might suffer from poor performance, especially when classifying gene expression datasets. By using a new decision tree algorithm where each node of the tree consists of more than one gene, we enhance the classification performance of traditional decision tree classifiers. Our method selects suitable genes that are combined using a linear function to form a derived composite feature. To determine the structure of the tree we use the area under the Receiver Operating Characteristics curve (AUC). Experimental analysis demonstrates higher classification accuracy using the new decision tree compared to the other existing decision trees in literature. We experimentally compare the effect of our scheme against other well known decision tree techniques. Experiments show that our algorithm can substantially boost the classification performance of the decision tree.
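A small sketch can show why a linear composite of genes may rank samples better than any single gene, with AUC as the yardstick. The toy data, the equal weights, and the function names are assumptions for illustration. The authors' gene-selection and weighting procedure is not described in the abstract and is not reproduced here.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def composite(sample, weights):
    """Linear combination of the expression values of several genes."""
    return sum(w * x for w, x in zip(weights, sample))


# Hypothetical expression values for two genes in four samples.
samples = [(1, 3), (3, 1), (2, 4), (4, 2)]
labels = [0, 0, 1, 1]

print(auc([s[0] for s in samples], labels))                  # gene 1 alone
print(auc([s[1] for s in samples], labels))                  # gene 2 alone
print(auc([composite(s, (1, 1)) for s in samples], labels))  # composite feature
```

In this toy set each gene alone misorders one of the four positive-negative pairs, while their sum separates the classes completely.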
Safety validation of decision trees for hepatocellular carcinoma.
Wang, Xian-Qiang; Liu, Zhe; Lv, Wen-Ping; Luo, Ying; Yang, Guang-Yun; Li, Chong-Hui; Meng, Xiang-Fei; Liu, Yang; Xu, Ke-Sen; Dong, Jia-Hong
2015-08-21
To evaluate a different decision tree for safe liver resection and verify its efficiency. A total of 2457 patients underwent hepatic resection between January 2004 and December 2010 at the Chinese PLA General Hospital, and 634 hepatocellular carcinoma (HCC) patients were eligible for the final analyses. Post-hepatectomy liver failure (PHLF) was identified by the association of prothrombin time < 50% and serum bilirubin > 50 μmol/L (the "50-50" criteria), which were assessed at day 5 postoperatively or later. The Swiss-Clavien decision tree, Tokyo University-Makuuchi decision tree, and Chinese consensus decision tree were adopted to divide patients into two groups based on those decision trees in sequence, and the PHLF rates were recorded. The overall mortality and PHLF rate were 0.16% and 3.0%. A total of 19 patients experienced PHLF. The numbers of patients to whom the Swiss-Clavien, Tokyo University-Makuuchi, and Chinese consensus decision trees were applied were 581, 573, and 622, and the PHLF rates were 2.75%, 2.62%, and 2.73%, respectively. Significantly more cases satisfied the Chinese consensus decision tree than the Swiss-Clavien decision tree and Tokyo University-Makuuchi decision tree (P < 0.01,P < 0.01); nevertheless, the latter two shared no difference (P = 0.147). The PHLF rate exhibited no significant difference with respect to the three decision trees. The Chinese consensus decision tree expands the indications for hepatic resection for HCC patients and does not increase the PHLF rate compared to the Swiss-Clavien and Tokyo University-Makuuchi decision trees. It would be a safe and effective algorithm for hepatectomy in patients with hepatocellular carcinoma.
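The "50-50" criteria stated above amount to a simple rule, sketched below together with a check of the reported overall PHLF rate. The function and parameter names are illustrative assumptions; only the thresholds and the 19-of-634 figure come from the abstract.

```python
def meets_50_50(pt_percent: float, bilirubin_umol_l: float, postop_day: int) -> bool:
    """True when prothrombin time < 50% and serum bilirubin > 50 umol/L,
    assessed on postoperative day 5 or later."""
    return postop_day >= 5 and pt_percent < 50 and bilirubin_umol_l > 50


# Hypothetical patient values, for illustration only.
print(meets_50_50(pt_percent=45, bilirubin_umol_l=60, postop_day=5))

# Overall PHLF rate: 19 of the 634 eligible patients.
print(round(100 * 19 / 634, 1))
```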
Pixel-based skin segmentation in psoriasis images.
George, Y; Aldeen, M; Garnavi, R
2016-08-01
In this paper, we present a detailed comparison study of skin segmentation methods for psoriasis images. Different techniques are modified and then applied to a set of psoriasis images acquired from the Royal Melbourne Hospital, Melbourne, Australia, with the aim of finding the best technique suited for application to psoriasis images. We investigate the effect of different colour transformations on skin detection performance. In this respect, explicit skin thresholding is evaluated with three different decision boundaries (CbCr, HS and rgHSV). Histogram-based Bayesian classifier is applied to extract skin probability maps (SPMs) for different colour channels. This is then followed by using different approaches to find a binary skin map (SM) image from the SPMs. The approaches used include binary decision tree (DT) and Otsu's thresholding. Finally, a set of morphological operations is applied to refine the resulting SM image. The paper provides detailed analysis and comparison of the performance of the Bayesian classifier in five different colour spaces (YCbCr, HSV, RGB, XYZ and CIELab). The results show that histogram-based Bayesian classifier is more effective than explicit thresholding, when applied to psoriasis images. It is also found that decision boundary CbCr outperforms HS and rgHSV. Another finding is that the SPMs of Cb, Cr, H and B-CIELab colour bands yield the best SMs for psoriasis images. In this study, we used a set of 100 psoriasis images for training and testing the presented methods. True Positive (TP) and True Negative (TN) are used as statistical evaluation measures.
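Otsu's thresholding, one of the two approaches named above for turning a skin probability map into a binary skin map, picks the threshold that maximizes between-class variance. The following is a minimal pure-Python sketch. It assumes the probabilities have been quantized to integers in 0..255 and is not the authors' implementation.

```python
def otsu_threshold(values, levels=256):
    """Threshold t maximizing between-class variance; class 0 holds values <= t."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0          # number of values in class 0 so far
    sum0 = 0        # sum of values in class 0 so far
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # class 0 still empty
        w1 = total - w0
        if w1 == 0:
            break               # class 1 empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:  # strict comparison keeps the first maximizer
            best_t, best_var = t, between
    return best_t


# Hypothetical quantized skin probabilities: one dark and one bright cluster.
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
skin_map = [1 if p > t else 0 for p in pixels]
print(t, sum(skin_map))
```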
A framework for designing and analyzing binary decision-making strategies in cellular systems
Porter, Joshua R.; Andrews, Burton W.; Iglesias, Pablo A.
2015-01-01
Cells make many binary (all-or-nothing) decisions based on noisy signals gathered from their environment and processed through noisy decision-making pathways. Reducing the effect of noise to improve the fidelity of decision-making comes at the expense of increased complexity, creating a tradeoff between performance and metabolic cost. We present a framework based on rate distortion theory, a branch of information theory, to quantify this tradeoff and design binary decision-making strategies that balance low cost and accuracy in optimal ways. With this framework, we show that several observed behaviors of binary decision-making systems, including random strategies, hysteresis, and irreversibility, are optimal in an information-theoretic sense for various situations. This framework can also be used to quantify the goals around which a decision-making system is optimized and to evaluate the optimality of cellular decision-making systems by a fundamental information-theoretic criterion. As proof of concept, we use the framework to quantify the goals of the externally triggered apoptosis pathway. PMID:22370552
A Fast Framework for Abrupt Change Detection Based on Binary Search Trees and Kolmogorov Statistic
Qi, Jin-Peng; Qi, Jie; Zhang, Qing
2016-01-01
Change-Point (CP) detection has attracted considerable attention in the fields of data mining and statistics; it is very meaningful to discuss how to quickly and efficiently detect abrupt change from large-scale bioelectric signals. Currently, most of the existing methods, like Kolmogorov-Smirnov (KS) statistic and so forth, are time-consuming, especially for large-scale datasets. In this paper, we propose a fast framework for abrupt change detection based on binary search trees (BSTs) and a modified KS statistic, named BSTKS (binary search trees and Kolmogorov statistic). In this method, first, two binary search trees, termed as BSTcA and BSTcD, are constructed by multilevel Haar Wavelet Transform (HWT); second, three search criteria are introduced in terms of the statistic and variance fluctuations in the diagnosed time series; last, an optimal search path is detected from the root to leaf nodes of two BSTs. The studies on both the synthetic time series samples and the real electroencephalograph (EEG) recordings indicate that the proposed BSTKS can detect abrupt change more quickly and efficiently than KS, t-statistic (t), and Singular-Spectrum Analyses (SSA) methods, with the shortest computation time, the highest hit rate, the smallest error, and the highest accuracy out of four methods. This study suggests that the proposed BSTKS is very helpful for useful information inspection on all kinds of bioelectric time series signals. PMID:27413364
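To make the cost argument concrete, the sketch below implements the kind of plain KS-based scan that such a framework is meant to accelerate: every candidate split is scored with a two-sample Kolmogorov-Smirnov statistic. This is an assumed baseline for illustration only. It does not implement BSTKS, the Haar wavelet trees, or the authors' search criteria, and the series is synthetic.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def naive_change_point(series, min_size=2):
    """Split index k maximizing the KS statistic between series[:k] and series[k:]."""
    return max(
        range(min_size, len(series) - min_size + 1),
        key=lambda k: ks_statistic(series[:k], series[k:]),
    )


# Synthetic series with an abrupt level shift at index 5.
series = [0.0, 0.1, 0.2, 0.1, 0.0, 5.0, 5.1, 5.2, 5.1, 5.0]
print(naive_change_point(series))
```

Each candidate split re-sorts both segments, so the scan grows quickly with series length, which is the inefficiency the abstract refers to.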
An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees.
Liang, Ying; Liao, Bo; Zhu, Wen
2017-01-01
Tumourigenesis is a mutation accumulation process, which is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution through genetic variation data. Copy number variation (CNV) is the major genetic marker of the genome with more genes, disease loci, and functional elements involved. Fluorescence in situ hybridization (FISH) accurately measures multiple gene copy number of hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic tree based on FISH platform. The topology analysis of tumor progression tree shows that the pathway of tumor subcell expansion varies greatly during different stages of tumor formation. And the classification experiment shows that tree-based features are better than data-based features in distinguishing tumor. The constructed phylogenetic trees have great performance in characterizing tumor development process, which outperforms other similar algorithms.
Maximum-likelihood soft-decision decoding of block codes using the A* algorithm
NASA Technical Reports Server (NTRS)
Ekroot, L.; Dolinar, S.
1994-01-01
The A* algorithm finds the path in a finite depth binary tree that optimizes a function. Here, it is applied to maximum-likelihood soft-decision decoding of block codes where the function optimized over the codewords is the likelihood function of the received sequence given each codeword. The algorithm considers codewords one bit at a time, making use of the most reliable received symbols first and pursuing only the partially expanded codewords that might be maximally likely. A version of the A* algorithm for maximum-likelihood decoding of block codes has been implemented for block codes up to 64 bits in length. The efficiency of this algorithm makes simulations of codes up to length 64 feasible. This article details the implementation currently in use, compares the decoding complexity with that of exhaustive search and Viterbi decoding algorithms, and presents performance curves obtained with this implementation of the A* algorithm for several codes.
Computing all hybridization networks for multiple binary phylogenetic input trees.
Albrecht, Benjamin
2015-07-30
The computation of phylogenetic trees on the same set of species that are based on different orthologous genes can lead to incongruent trees. One possible explanation for this behavior is interspecific hybridization events recombining genes of different species. An important approach to analyze such events is the computation of hybridization networks. This work presents the first algorithm computing the hybridization number as well as a set of representative hybridization networks for multiple binary phylogenetic input trees on the same set of taxa. To improve its practical runtime, we show how this algorithm can be parallelized. Moreover, we demonstrate the efficiency of the software Hybroscale, containing an implementation of our algorithm, by comparing it to PIRNv2.0, which is so far the best available software computing the exact hybridization number for multiple binary phylogenetic trees on the same set of taxa. The algorithm is part of the software Hybroscale, which was developed specifically for the investigation of hybridization networks including their computation and visualization. Hybroscale is freely available and runs on all three major operating systems. Our simulation study indicates that our approach is on average 100 times faster than PIRNv2.0. Moreover, we show how Hybroscale improves the interpretation of the reported hybridization networks by adding certain features to its graphical representation.
Decision-Tree Formulation With Order-1 Lateral Execution
NASA Technical Reports Server (NTRS)
James, Mark
2007-01-01
A compact symbolic formulation enables mapping of an arbitrarily complex decision tree of a certain type into a highly computationally efficient multidimensional software object. The type of decision trees to which this formulation applies is that known in the art as the Boolean class of balanced decision trees. Parallel lateral slices of an object created by means of this formulation can be executed in constant time, considerably less time than would otherwise be required. Decision trees of various forms are incorporated into almost all large software systems. A decision tree is a way of hierarchically solving a problem, proceeding through a set of true/false responses to a conclusion. By definition, a decision tree has a tree-like structure, wherein each internal node denotes a test on an attribute, each branch from an internal node represents an outcome of a test, and leaf nodes represent classes or class distributions that, in turn, represent possible conclusions. The drawback of decision trees is that execution of them can be computationally expensive (and, hence, time-consuming) because each non-leaf node must be examined to determine whether to progress deeper into a tree structure or to examine an alternative. The present formulation was conceived as an efficient means of representing a decision tree and executing it in as little time as possible. The formulation involves the use of a set of symbolic algorithms to transform a decision tree into a multi-dimensional object, the rank of which equals the number of lateral non-leaf nodes. The tree can then be executed in constant time by means of an order-one table lookup. The sequence of operations performed by the algorithms is summarized as follows: 1. Determination of whether the tree under consideration can be encoded by means of this formulation. 2. Extraction of decision variables. 3. Symbolic optimization of the decision tree to minimize its form. 4. Expansion and transformation of all nested conjunctive-disjunctive paths to a flattened conjunctive form composed only of equality checks when possible. If each reduced conjunctive form contains only equality checks and all of these forms use the same variables, then the decision tree can be reduced to an order-one operation through a table lookup. The speedup to order one is accomplished by distributing each decision variable over a surface of a multidimensional object by mapping the equality constant to an index
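The table-lookup idea can be illustrated with a small sketch: a Boolean decision tree is evaluated once for every combination of its decision variables, and the results are stored in a table indexed by those values, so that later executions are a single lookup. This illustrates the general idea only and is not the symbolic formulation described above. The tuple-based tree encoding and the names `evaluate` and `flatten` are assumptions.

```python
from itertools import product


def evaluate(tree, assignment):
    """Walk the tree. A leaf is a string; an internal node is
    (variable, subtree_if_false, subtree_if_true)."""
    while not isinstance(tree, str):
        var, if_false, if_true = tree
        tree = if_true if assignment[var] else if_false
    return tree


def flatten(tree, variables):
    """Precompute the outcome for every combination of the decision variables."""
    return {
        bits: evaluate(tree, dict(zip(variables, bits)))
        for bits in product((False, True), repeat=len(variables))
    }


# Hypothetical two-variable tree.
tree = ("a", ("b", "reject", "review"), "accept")
table = flatten(tree, ["a", "b"])
print(table[(False, True)])   # one lookup instead of a tree walk
```

The table has 2^k entries for k Boolean variables, so the constant-time lookup is paid for in memory.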
2010-01-01
Background The Maximal Pairing Problem (MPP) is the prototype of a class of combinatorial optimization problems that are of considerable interest in bioinformatics: Given an arbitrary phylogenetic tree T and weights ω_xy for the paths between any two pairs of leaves (x, y), what is the collection of edge-disjoint paths between pairs of leaves that maximizes the total weight? Special cases of the MPP for binary trees and equal weights have been described previously; algorithms to solve the general MPP are still missing, however. Results We describe a relatively simple dynamic programming algorithm for the special case of binary trees. We then show that the general case of multifurcating trees can be treated by interleaving solutions to certain auxiliary Maximum Weighted Matching problems with an extension of this dynamic programming approach, resulting in an overall polynomial-time solution of complexity O(n^4 log n) w.r.t. the number n of leaves. The source code of a C implementation can be obtained under the GNU Public License from http://www.bioinf.uni-leipzig.de/Software/Targeting. For binary trees, we furthermore discuss several constrained variants of the MPP as well as a partition function approach to the probabilistic version of the MPP. Conclusions The algorithms introduced here make it possible to solve the MPP also for large trees with high-degree vertices. This has practical relevance in the field of comparative phylogenetics and, for example, in the context of phylogenetic targeting, i.e., data collection with resource limitations. PMID:20525185
A note on subtrees rooted along the primary path of a binary tree
Troutman, B.M.; Karlinger, M.R.
1993-01-01
Let F_n denote the set of rooted binary plane trees with n external nodes; for given T ∈ F_n, let u_i(T) be the altitude-i node along the primary path of T, and let ν_i(T) denote the number of external nodes in the induced subtree rooted at u_i(T). We set ν_i(T) = 0 if i is greater than the length of the primary path of T. We prove lim_{n→∞} Σ_{i ≤ x/n} E_n{ν_i} / Σ_i E_n{ν_i} = G(x), where E_n denotes the average over trees T ∈ F_n and where the distribution function G is determined by its moments, for which we present an explicit expression. © 1993.
Accurate reliability analysis method for quantum-dot cellular automata circuits
NASA Astrophysics Data System (ADS)
Cui, Huanqing; Cai, Li; Wang, Sen; Liu, Xiaoqiang; Yang, Xiaokuo
2015-10-01
Probabilistic transfer matrix (PTM) is a widely used model in the reliability research of circuits. However, PTM model cannot reflect the impact of input signals on reliability, so it does not completely conform to the mechanism of the novel field-coupled nanoelectronic device which is called quantum-dot cellular automata (QCA). It is difficult to get accurate results when PTM model is used to analyze the reliability of QCA circuits. To solve this problem, we present the fault tree models of QCA fundamental devices according to different input signals. After that, the binary decision diagram (BDD) is used to quantitatively investigate the reliability of two QCA XOR gates depending on the presented models. By employing the fault tree models, the impact of input signals on reliability can be identified clearly and the crucial components of a circuit can be found out precisely based on the importance values (IVs) of components. So this method is contributive to the construction of reliable QCA circuits.
A laid-back trip through the Hennigian Forests
2017-01-01
Background This paper is a comment on the idea of matrix-free Cladistics. Demonstration of this idea’s efficiency is a major goal of the study. Within the proposed framework, the ordinary (phenetic) matrix is necessary only as “source” of Hennigian trees, not as a primary subject of the analysis. Switching from the matrix-based thinking to the matrix-free Cladistic approach clearly reveals that optimizations of the character-state changes are related not to the real processes, but to the form of the data representation. Methods We focused our study on the binary data. We wrote the simple ruby-based script FORESTER version 1.0 that helps represent a binary matrix as an array of the rooted trees (as a “Hennigian forest”). The binary representations of the genomic (DNA) data have been made by script 1001. The Average Consensus method as well as the standard Maximum Parsimony (MP) approach has been used to analyze the data. Principal findings The binary matrix may be easily re-written as a set of rooted trees (maximal relationships). The latter might be analyzed by the Average Consensus method. Paradoxically, this method, if applied to the Hennigian forests, in principle can help to identify clades despite the absence of the direct evidence from the primary data. Our approach may handle the clock- or non clock-like matrices, as well as the hypothetical, molecular or morphological data. Discussion Our proposal clearly differs from the numerous phenetic alignment-free techniques of the construction of the phylogenetic trees. Dealing with the relations, not with the actual “data” also distinguishes our approach from all optimization-based methods, if the optimization is defined as a way to reconstruct the sequences of the character-state changes on a tree, either the standard alignment-based techniques or the “direct” alignment-free procedure. We are not viewing our recent framework as an alternative to the three-taxon statement analysis (3TA), but there are two major differences between our recent proposal and the 3TA, as originally designed and implemented: (1) the 3TA deals with the three-taxon statements or minimal relationships. According to the logic of 3TA, the set of the minimal trees must be established as a binary matrix and used as an input for the parsimony program. In this paper, we operate directly with maximal relationships written just as trees, not as binary matrices, while also using the Average Consensus method instead of the MP analysis. The solely ‘reversal’-based groups can always be found by our method without the separate scoring of the putative reversals before analyses. PMID:28740753
Chen, Xiao Yu; Ma, Li Zhuang; Chu, Na; Zhou, Min; Hu, Yiyang
2013-01-01
Chronic hepatitis B (CHB) is a serious public health problem, and Traditional Chinese Medicine (TCM) plays an important role in the control and treatment for CHB. In the treatment of TCM, zheng discrimination is the most important step. In this paper, an approach based on CFS-GA (Correlation based Feature Selection and Genetic Algorithm) and C5.0 boost decision tree is used for zheng classification and progression in the TCM treatment of CHB. The CFS-GA performs better than the typical method of CFS. By CFS-GA, the acquired attribute subset is classified by C5.0 boost decision tree for TCM zheng classification of CHB, and C5.0 decision tree outperforms two typical decision trees of NBTree and REPTree on CFS-GA, CFS, and nonselection in comparison. Based on the critical indicators from C5.0 decision tree, important lab indicators in zheng progression are obtained by the method of stepwise discriminant analysis for expressing TCM zhengs in CHB, and alterations of the important indicators are also analyzed in zheng progression. In conclusion, all the three decision trees perform better on CFS-GA than on CFS and nonselection, and C5.0 decision tree outperforms the two typical decision trees both on attribute selection and nonselection.
TreePOD: Sensitivity-Aware Selection of Pareto-Optimal Decision Trees.
Muhlbacher, Thomas; Linhardt, Lorenz; Moller, Torsten; Piringer, Harald
2018-01-01
Balancing accuracy gains with other objectives such as interpretability is a key challenge when building decision trees. However, this process is difficult to automate because it involves know-how about the domain as well as the purpose of the model. This paper presents TreePOD, a new approach for sensitivity-aware model selection along trade-offs. TreePOD is based on exploring a large set of candidate trees generated by sampling the parameters of tree construction algorithms. Based on this set, visualizations of quantitative and qualitative tree aspects provide a comprehensive overview of possible tree characteristics. Along trade-offs between two objectives, TreePOD provides efficient selection guidance by focusing on Pareto-optimal tree candidates. TreePOD also conveys the sensitivities of tree characteristics on variations of selected parameters by extending the tree generation process with a full-factorial sampling. We demonstrate how TreePOD supports a variety of tasks involved in decision tree selection and describe its integration in a holistic workflow for building and selecting decision trees. For evaluation, we illustrate a case study for predicting critical power grid states, and we report qualitative feedback from domain experts in the energy sector. This feedback suggests that TreePOD enables users with and without statistical background a confident and efficient identification of suitable decision trees.
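Selecting Pareto-optimal candidates along a two-objective trade-off can be sketched as a dominance filter. The example assumes the two objectives are accuracy (higher is better) and number of leaves (lower is better, as a stand-in for interpretability). The candidate values are hypothetical and the function is not taken from TreePOD.

```python
def pareto_front(candidates):
    """Keep (accuracy, n_leaves) pairs that no other candidate dominates.
    A candidate dominates another if it is at least as good on both
    objectives and strictly better on at least one."""
    front = []
    for i, (acc_i, size_i) in enumerate(candidates):
        dominated = any(
            acc_j >= acc_i and size_j <= size_i and (acc_j > acc_i or size_j < size_i)
            for j, (acc_j, size_j) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((acc_i, size_i))
    return front


# Hypothetical candidate trees: (accuracy, number of leaves).
trees = [(0.90, 20), (0.85, 5), (0.80, 8), (0.90, 25), (0.70, 3)]
print(pareto_front(trees))
```

Only the trees on the front need to be shown to a user, since every other candidate is matched or beaten on both objectives by one of them.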
Selecting Power-Efficient Signal Features for a Low-Power Fall Detector.
Wang, Changhong; Redmond, Stephen J; Lu, Wei; Stevens, Michael C; Lord, Stephen R; Lovell, Nigel H
2017-11-01
Falls are a serious threat to the health of older people. A wearable fall detector can automatically detect the occurrence of a fall and alert a caregiver or an emergency response service so they may deliver immediate assistance, improving the chances of recovering from fall-related injuries. One constraint of such a wearable technology is its limited battery life. Thus, minimization of power consumption is an important design concern, all the while maintaining satisfactory accuracy of the fall detection algorithms implemented on the wearable device. This paper proposes an approach for selecting power-efficient signal features such that the minimum desirable fall detection accuracy is assured. Using data collected in simulated falls, simulated activities of daily living, and real free-living trials, all using young volunteers, the proposed approach selects four features from a set of ten commonly used features, providing a power saving of 75.3%, while limiting the error rate of a binary classification decision tree fall detection algorithm to 7.1%.
VC-dimension of univariate decision trees.
Yildiz, Olcay Taner
2015-02-01
In this paper, we give and prove the lower bounds of the Vapnik-Chervonenkis (VC)-dimension of the univariate decision tree hypothesis class. The VC-dimension of the univariate decision tree depends on the VC-dimension values of its subtrees and the number of inputs. Via a search algorithm that calculates the VC-dimension of univariate decision trees exhaustively, we show that our VC-dimension bounds are tight for simple trees. To verify that the VC-dimension bounds are useful, we also use them to get VC-generalization bounds for complexity control using structural risk minimization in decision trees, i.e., pruning. Our simulation results show that structural risk minimization pruning using the VC-dimension bounds finds trees that are more accurate than those pruned using cross validation.
Predictor Combination in Binary Decision-Making Situations
ERIC Educational Resources Information Center
McGrath, Robert E.
2008-01-01
Professional psychologists are often confronted with the task of making binary decisions about individuals, such as predictions about future behavior or employee selection. Test users familiar with linear models and Bayes's theorem are likely to assume that the accuracy of decisions is consistently improved by combination of outcomes across valid…
On Tree-Based Phylogenetic Networks.
Zhang, Louxin
2016-07-01
A large class of phylogenetic networks can be obtained from trees by the addition of horizontal edges between the tree edges. These networks are called tree-based networks. We present a simple necessary and sufficient condition for tree-based networks and prove that a universal tree-based network exists for any number of taxa that contains as its base every phylogenetic tree on the same set of taxa. This answers two problems posted by Francis and Steel recently. A byproduct is a computer program for generating random binary phylogenetic networks under the uniform distribution model.
Schmid, Matthias; Küchenhoff, Helmut; Hoerauf, Achim; Tutz, Gerhard
2016-02-28
Survival trees are a popular alternative to parametric survival modeling when there are interactions between the predictor variables or when the aim is to stratify patients into prognostic subgroups. A limitation of classical survival tree methodology is that most algorithms for tree construction are designed for continuous outcome variables. Hence, classical methods might not be appropriate if failure time data are measured on a discrete time scale (as is often the case in longitudinal studies where data are collected, e.g., quarterly or yearly). To address this issue, we develop a method for discrete survival tree construction. The proposed technique is based on the result that the likelihood of a discrete survival model is equivalent to the likelihood of a regression model for binary outcome data. Hence, we modify tree construction methods for binary outcomes such that they result in optimized partitions for the estimation of discrete hazard functions. By applying the proposed method to data from a randomized trial in patients with filarial lymphedema, we demonstrate how discrete survival trees can be used to identify clinically relevant patient groups with similar survival behavior. Copyright © 2015 John Wiley & Sons, Ltd.
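The equivalence between the discrete survival likelihood and a binary-outcome likelihood is commonly exploited by expanding each subject into one row per time interval at risk. The following is a minimal sketch of that expansion, not the authors' implementation; the function names, the tuple layout (subject_id, observed_time, event), and the toy records are assumptions made for illustration.

```python
def expand_person_period(subjects):
    """Expand (subject_id, observed_time, event) records into person-period rows.

    Each subject contributes one row per discrete interval 1..observed_time.
    The binary outcome y is 1 only in the last interval of a subject whose
    event was observed; censored subjects contribute only zeros.
    """
    rows = []
    for subject_id, observed_time, event in subjects:
        for t in range(1, observed_time + 1):
            y = 1 if (event and t == observed_time) else 0
            rows.append((subject_id, t, y))
    return rows


def empirical_hazard(rows):
    """Estimate the discrete hazard per interval as the mean of y among rows at risk."""
    at_risk, events = {}, {}
    for _, t, y in rows:
        at_risk[t] = at_risk.get(t, 0) + 1
        events[t] = events.get(t, 0) + y
    return {t: events[t] / at_risk[t] for t in sorted(at_risk)}


# Hypothetical toy data: two observed events and one censored subject.
subjects = [("a", 2, True), ("b", 3, False), ("c", 1, True)]
rows = expand_person_period(subjects)
hazard = empirical_hazard(rows)
```

Any tree method for binary outcomes could then, in principle, be fitted to the expanded rows with the interval index `t` as an additional predictor.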
The Decision Tree: A Tool for Achieving Behavioral Change.
ERIC Educational Resources Information Center
Saren, Dru
1999-01-01
Presents a "Decision Tree" process for structuring team decision making and problem solving about specific student behavioral goals. The Decision Tree involves a sequence of questions/decisions that can be answered in "yes/no" terms. Questions address reasonableness of the goal, time factors, importance of the goal, responsibilities, safety,…
Lee, Daniel Joseph; Veneri, Diana A
2018-05-01
The most common complaint reported by lower limb prosthesis users is an inadequate socket fit. In many scenarios, adjustments to the residual limb-socket interface can be made by the prosthesis user through skilled self-management, without consulting a clinician. Decision trees guide prosthesis wearers through the self-management process, empowering them to rectify fit issues, or referring them to a clinician when necessary. This study examines the development and acceptability testing of patient-centered decision trees for lower limb prosthesis users. Decision trees underwent a four-stage process: literature review and expert consultation, designing, two rounds of expert panel review and revisions, and target audience testing. Fifteen lower limb prosthesis users (average age 61 years) reviewed the decision trees and completed an acceptability questionnaire. Participants reported agreement of 80% or above in five of the eight questions related to acceptability of the decision trees. Disagreement was related to the level of experience of the respondent. Decision trees were found to be easy to use, illustrate correct solutions to common issues, and have terminology consistent with that of a new prosthesis user. Some users with greater than 1.5 years of experience would not use the decision trees based on their own self-management skills. Implications for Rehabilitation: Discomfort of the residual limb-prosthetic socket interface is the most common reason for clinician visits. Prosthesis users can use decision trees to guide them through the process of obtaining a proper socket fit independently. Newer users may benefit from using the decision trees more than experienced users.
Ebrahimi, Mehregan; Ebrahimie, Esmaeil; Bull, C Michael
2015-08-01
The high number of failures is one reason why translocation is often not recommended. Considering how behavior changes during translocations may improve translocation success. To derive decision-tree models for species' translocation, we used data on the short-term responses of an endangered Australian skink in 5 simulated translocations with different release conditions. We used 4 different decision-tree algorithms (decision tree, decision-tree parallel, decision stump, and random forest) with 4 different criteria (gain ratio, information gain, gini index, and accuracy) to investigate how environmental and behavioral parameters may affect the success of a translocation. We assumed behavioral changes that increased dispersal away from a release site would reduce translocation success. The trees became more complex when we included all behavioral parameters as attributes, but these trees yielded more detailed information about why and how dispersal occurred. According to these complex trees, there were positive associations between some behavioral parameters, such as fight and dispersal, that showed there was a higher chance, for example, of dispersal among lizards that fought than among those that did not fight. Decision trees based on parameters related to release conditions were easier to understand and could be used by managers to make translocation decisions under different circumstances. © 2015 Society for Conservation Biology.
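Three of the split criteria named above (information gain, gain ratio, and Gini index) can be computed directly from class counts. The sketch below shows the standard textbook definitions; it is not the software used in the study, and the labels "dispersed" and "stayed" are hypothetical placeholders rather than the study's data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(ch) / n * entropy(ch) for ch in children)
    return entropy(parent) - weighted


def gain_ratio(parent, children):
    """Information gain normalized by the entropy of the split sizes."""
    n = len(parent)
    split_info = -sum(
        (len(ch) / n) * math.log2(len(ch) / n) for ch in children if ch
    )
    return information_gain(parent, children) / split_info if split_info > 0 else 0.0


# Hypothetical node with 8 animals, split into two equally sized children.
parent = ["dispersed"] * 4 + ["stayed"] * 4
children = [["dispersed"] * 3 + ["stayed"], ["dispersed"] + ["stayed"] * 3]
ig = information_gain(parent, children)
gr = gain_ratio(parent, children)
```

With two equally sized children the split information is exactly one bit, so the gain ratio equals the information gain in this toy case.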
Decision trees in epidemiological research.
Venkatasubramaniam, Ashwini; Wolfson, Julian; Mitchell, Nathan; Barnes, Timothy; JaKa, Meghan; French, Simone
2017-01-01
In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods. We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees. Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
An automated approach to the design of decision tree classifiers
NASA Technical Reports Server (NTRS)
Argentiero, P.; Chin, R.; Beaudet, P.
1982-01-01
An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.
Creating ensembles of decision trees through sampling
Kamath, Chandrika; Cantu-Paz, Erick
2005-08-30
A system for decision tree ensembles that includes a module to read the data, a module to sort the data, a module to evaluate a potential split of the data according to some criterion using a random sample of the data, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method is based on statistical sampling techniques and includes the steps of reading the data; sorting the data; evaluating a potential split according to some criterion using a random sample of the data, splitting the data, and combining multiple decision trees in ensembles.
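The step of evaluating a potential split on a random sample of the data can be sketched as follows. This is an illustration under stated assumptions, not the patented system: the Gini criterion, the midpoint thresholds, the function names, and the toy data are all choices made here for the example.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_on_sample(values, labels, sample_size, rng):
    """Choose a threshold on one feature by scoring candidates on a random sample only."""
    idx = rng.sample(range(len(values)), min(sample_size, len(values)))
    pairs = sorted((values[i], labels[i]) for i in idx)
    best_threshold, best_score = None, float("inf")
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold fits between equal feature values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical, perfectly separable toy data.
rng = random.Random(0)
values = [float(i) for i in range(100)]
labels = [0 if v < 50 else 1 for v in values]
threshold, score = best_split_on_sample(values, labels, sample_size=20, rng=rng)
```

Because each tree sees a different sample, repeated calls with different random states yield different thresholds, which is one way the members of an ensemble can be made to differ.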
Bioinformatics in proteomics: application, terminology, and pitfalls.
Wiemer, Jan C; Prokudin, Alexander
2004-01-01
Bioinformatics applies data mining, i.e., modern computer-based statistics, to biomedical data. It leverages machine learning approaches, such as artificial neural networks, decision trees and clustering algorithms, and is ideally suited for handling very large amounts of data. In this article, we review the analysis of mass spectrometry data in proteomics, starting with common pre-processing steps and using single decision trees and decision tree ensembles for classification. Special emphasis is put on the pitfall of overfitting, i.e., of generating overly complex single decision trees. Finally, we discuss the pros and cons of the two different decision tree usages.
Binary space partitioning trees and their uses
NASA Technical Reports Server (NTRS)
Bell, Bradley N.
1989-01-01
Binary Space Partitioning (BSP) trees have some qualities that make them useful in solving many graphics-related problems. The purpose is to describe what a BSP tree is and how it can be used to solve the problems of hidden surface removal and constructive solid geometry. The BSP tree is based on the idea that a plane acting as a divider subdivides space into two parts, one on the positive side and the other on the negative side. A polygonal solid is then represented as the volume defined by the collective interior half spaces of the solid's bounding surfaces. The way the tree is organized lends itself well to sorting polygons relative to an arbitrary point in 3-space. The tree can be traversed for depth sorting fast enough to provide hidden surface removal at interactive speeds. The fact that a BSP tree actually represents a polygonal solid as a bounded volume also makes it quite useful in performing the boolean operations used in constructive solid geometry. Due to the nature of the BSP tree, polygons can be classified as they are subdivided, which can simplify the implementation of constructive solid geometry.
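The depth sorting described above can be sketched as a back-to-front traversal: at each node, the subtree on the far side of the divider plane from the viewer is visited first. The class and function names below are hypothetical, each node stores only a plane and a label rather than real polygons, and polygon splitting during tree construction is omitted.

```python
class BSPNode:
    """One divider plane, dot(normal, p) = offset, with front/back subtrees."""

    def __init__(self, normal, offset, label, front=None, back=None):
        self.normal = normal
        self.offset = offset
        self.label = label
        self.front = front  # subtree on the positive side of the plane
        self.back = back    # subtree on the negative side of the plane


def side(node, point):
    """Positive if the point lies on the positive side of the node's plane."""
    return sum(n * p for n, p in zip(node.normal, point)) - node.offset


def back_to_front(node, eye, out=None):
    """Return labels ordered from farthest to nearest relative to the eye point."""
    if out is None:
        out = []
    if node is None:
        return out
    if side(node, eye) >= 0:
        far, near = node.back, node.front
    else:
        far, near = node.front, node.back
    back_to_front(far, eye, out)
    out.append(node.label)
    back_to_front(near, eye, out)
    return out


# Three parallel walls at x = -5, x = 0 and x = 5 (illustrative scene).
tree = BSPNode(
    (1, 0, 0), 0, "wall_x0",
    front=BSPNode((1, 0, 0), 5, "wall_x5"),
    back=BSPNode((1, 0, 0), -5, "wall_x-5"),
)
order_from_right = back_to_front(tree, (10, 0, 0))
order_from_left = back_to_front(tree, (-10, 0, 0))
```

The same tree serves any viewpoint; only the traversal order changes, which is why the sort can be redone for every frame.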
The k-d Tree: A Hierarchical Model for Human Cognition.
ERIC Educational Resources Information Center
Vandendorpe, Mary M.
This paper discusses a model of information storage and retrieval, the k-d tree (Bentley, 1975), a binary, hierarchical tree with multiple associate terms, which has been explored in computer research, and it is suggested that this model could be useful for describing human cognition. Included are two models of human long-term memory--networks and…
Short communication: Prediction of retention pay-off using a machine learning algorithm.
Shahinfar, Saleh; Kalantari, Afshin S; Cabrera, Victor; Weigel, Kent
2014-05-01
Replacement decisions have a major effect on dairy farm profitability. Dynamic programming (DP) has been widely studied to find the optimal replacement policies in dairy cattle. However, DP models are computationally intensive and might not be practical for daily decision making. Hence, the ability of applying machine learning on a prerun DP model to provide fast and accurate predictions of nonlinear and intercorrelated variables makes it an ideal methodology. Milk class (1 to 5), lactation number (1 to 9), month in milk (1 to 20), and month of pregnancy (0 to 9) were used to describe all cows in a herd in a DP model. Twenty-seven scenarios based on all combinations of 3 levels (base, 20% above, and 20% below) of milk production, milk price, and replacement cost were solved with the DP model, resulting in a data set of 122,716 records, each with a calculated retention pay-off (RPO). Then, a machine learning model tree algorithm was used to mimic the evaluated RPO with DP. The correlation coefficient factor was used to observe the concordance of RPO evaluated by DP and RPO predicted by the model tree. The obtained correlation coefficient was 0.991, with a corresponding value of 0.11 for relative absolute error. At least 100 instances were required per model constraint, resulting in 204 total equations (models). When these models were used for binary classification of positive and negative RPO, error rates were 1% false negatives and 9% false positives. Applying this trained model from simulated data for prediction of RPO for 102 actual replacement records from the University of Wisconsin-Madison dairy herd resulted in a 0.994 correlation with 0.10 relative absolute error rate. Overall results showed that model tree has a potential to be used in conjunction with DP to assist farmers in their replacement decisions. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Learning in data-limited multimodal scenarios: Scandent decision forests and tree-based features.
Hor, Soheil; Moradi, Mehdi
2016-12-01
Incomplete and inconsistent datasets often pose difficulties in multimodal studies. We introduce the concept of scandent decision trees to tackle these difficulties. Scandent trees are decision trees that optimally mimic the partitioning of the data determined by another decision tree, and crucially, use only a subset of the feature set. We show how scandent trees can be used to enhance the performance of decision forests trained on a small number of multimodal samples when we have access to larger datasets with vastly incomplete feature sets. Additionally, we introduce the concept of tree-based feature transforms in the decision forest paradigm. When combined with scandent trees, the tree-based feature transforms enable us to train a classifier on a rich multimodal dataset, and use it to classify samples with only a subset of features of the training data. Using this methodology, we build a model trained on MRI and PET images of the ADNI dataset, and then test it on cases with only MRI data. We show that this is significantly more effective in staging of cognitive impairments compared to a similar decision forest model trained and tested on MRI only, or one that uses other kinds of feature transform applied to the MRI data. Copyright © 2016. Published by Elsevier B.V.
Sankari, E Siva; Manimegalai, D
2017-12-21
Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Because of the large number of uncharacterized protein sequences in databases, traditional methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Because the datasets considered are imbalanced, the performance of various decision tree classifiers, such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree, and REP (Reduced Error Pruning) tree, and of ensemble methods, such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest, is analysed. Among the various decision tree classifiers, Random forest performs well in less time, with a good accuracy of 96.35%. Another inference is that the RUS boost decision tree classifier is able to classify one or two samples in the class with very few samples, while the other classifiers, such as DT, Adaboost, Rotation forest and Random forest, are not sensitive to the classes with fewer samples. The performance of the decision tree classifiers is also compared with SVM (Support Vector Machine) and Naive Bayes classifiers. Copyright © 2017 Elsevier Ltd. All rights reserved.
Metric Sex Determination of the Human Coxal Bone on a Virtual Sample using Decision Trees.
Savall, Frédéric; Faruch-Bilfeld, Marie; Dedouit, Fabrice; Sans, Nicolas; Rousseau, Hervé; Rougé, Daniel; Telmon, Norbert
2015-11-01
Decision trees provide an alternative to multivariate discriminant analysis, which is still the most commonly used in anthropometric studies. Our study analyzed the metric characterization of a recent virtual sample of 113 coxal bones using decision trees for sex determination. From 17 osteometric type I landmarks, a dataset was built with five classic distances traditionally reported in the literature and six new distances selected using the two-step ratio method. A ten-fold cross-validation was performed, and a decision tree was established on two subsamples (training and test sets). The decision tree established on the training set included three nodes and its application to the test set correctly classified 92% of individuals. This percentage was similar to the data of the literature. The usefulness of decision trees has been demonstrated in numerous fields. They have been already used in sex determination, body mass prediction, and ancestry estimation. This study shows another use of decision trees enabling simple and accurate sex determination. © 2015 American Academy of Forensic Sciences.
Multi-test decision tree and its application to microarray data classification.
Czajkowski, Marcin; Grześ, Marek; Kretowski, Marek
2014-05-01
A desirable property of tools used to investigate biological data is that they produce easy-to-understand models and predictive decisions. Decision trees are particularly promising in this regard due to their comprehensible nature that resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees have a tendency to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity. We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions. Experimental validation was performed on several real-life gene expression datasets. Comparison results with eight classifiers show that MTDT has a statistically significantly higher accuracy than popular decision tree classifiers, and it was highly competitive with ensemble learning algorithms. The proposed solution managed to outperform its baseline algorithm on 14 datasets by an average of 6%. A study performed on one of the datasets showed that the discovered genes used in the MTDT classification model are supported by biological evidence in the literature. This paper introduces a new type of decision tree which is more suitable for solving biological problems. MTDTs are relatively easy to analyze and much more powerful in modeling high dimensional microarray data than their popular counterparts. Copyright © 2014 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Maiti, Anup Kumar; Nath Roy, Jitendra; Mukhopadhyay, Sourangshu
2007-08-01
In the field of optical computing and parallel information processing, several number systems have been used for different arithmetic and algebraic operations. Therefore, an efficient conversion scheme from one number system to another is very important. The modified trinary number (MTN) system already plays a significant role in carry- and borrow-free arithmetic operations. In this communication, we propose an all-optical conversion scheme, based on a tree-net architecture, from a binary number to its MTN form. An optical switch using nonlinear material (NLM) plays an important role.
Wang, Jie; Zeng, Hao-Long; Du, Hongying; Liu, Zeyuan; Cheng, Ji; Liu, Taotao; Hu, Ting; Kamal, Ghulam Mustafa; Li, Xihai; Liu, Huili; Xu, Fuqiang
2018-03-01
Metabolomics generates a profile of small molecules from cellular/tissue metabolism, which can directly reflect the mechanisms of complex networks of biochemical reactions. Traditional metabolomics methods, such as OPLS-DA and PLS-DA, are mainly used for binary class discrimination. However, biological systems usually involve multiple groups, especially in brain research. Multiple brain regions are involved in the neuronal study of brain metabolic dysfunctions such as alcoholism, Alzheimer's disease, etc. In the current study, 10 different brain regions were utilized for comparative studies between alcohol-preferring and non-preferring rats, and between male and female rats. As many classes are involved (ten different regions and four types of animals), traditional metabolomics methods are no longer efficient for showing differentiation. Here, a novel strategy based on the decision tree algorithm was employed to construct different classification models that screen out the major characteristics of ten brain regions at the same time. Subsequently, this method was also utilized to select the major effective brain regions related to alcohol preference and gender difference. Compared with the traditional multivariate statistical methods, the decision tree could construct acceptable and understandable classification models for multi-class data analysis. Therefore, the current technology could also be applied to other general metabolomics studies involving multi-class data. Copyright © 2017 Elsevier B.V. All rights reserved.
Stolzer, Maureen; Lai, Han; Xu, Minli; Sathaye, Deepa; Vernot, Benjamin; Durand, Dannie
2012-09-15
Gene duplication (D), transfer (T), loss (L) and incomplete lineage sorting (I) are crucial to the evolution of gene families and the emergence of novel functions. The history of these events can be inferred via comparison of gene and species trees, a process called reconciliation, yet current reconciliation algorithms model only a subset of these evolutionary processes. We present an algorithm to reconcile a binary gene tree with a nonbinary species tree under a DTLI parsimony criterion. This is the first reconciliation algorithm to capture all four evolutionary processes driving tree incongruence and the first to reconcile non-binary species trees with a transfer model. Our algorithm infers all optimal solutions and reports complete, temporally feasible event histories, giving the gene and species lineages in which each event occurred. It is fixed-parameter tractable, with polytime complexity when the maximum species outdegree is fixed. Application of our algorithms to prokaryotic and eukaryotic data shows that use of an incomplete event model has substantial impact on the events inferred and the resulting biological conclusions. Our algorithms have been implemented in Notung, a freely available phylogenetic reconciliation software package, available at http://www.cs.cmu.edu/~durand/Notung.
Improving children's affective decision making in the Children's Gambling Task.
Andrews, Glenda; Moussaumai, Jennifer
2015-11-01
Affective decision making was examined in 108 children (3-, 4-, and 5-year-olds) using the Children's Gambling Task (CGT). Children completed the CGT and then responded to awareness questions. Children in the binary_experience and binary_experience+awareness (not control) conditions first completed two simpler versions. Children in the binary_experience+awareness condition also responded to questions about relational components of the simpler versions. Experience with simpler versions facilitated decision making in 4- and 5-year-olds, but 3-year-olds' advantageous choices declined across trial blocks in the binary_experience and control conditions. Responding to questions about relational components further benefited the 4- and 5-year-olds. The 3-year-olds' advantageous choices on the final block were at chance level in the binary_experience+awareness condition but were below chance level in the other conditions. Awareness following the CGT was strongly correlated with advantageous choices and with age. Awareness was demonstrated by 5-year-olds (all conditions) and 4-year-olds (binary_experience and binary_experience+awareness) but not by 3-year-olds. The findings demonstrate the importance of complexity and conscious awareness in cognitive development. Copyright © 2015 Elsevier Inc. All rights reserved.
Using histograms to introduce randomization in the generation of ensembles of decision trees
Kamath, Chandrika; Cantu-Paz, Erick; Littau, David
2005-02-22
A system for decision tree ensembles that includes a module to read the data, a module to create a histogram, a module to evaluate a potential split according to some criterion using the histogram, a module to select a split point randomly in an interval around the best split, a module to split the data, and a module to combine multiple decision trees in ensembles. The decision tree method includes the steps of reading the data; creating a histogram; evaluating a potential split according to some criterion using the histogram, selecting a split point randomly in an interval around the best split, splitting the data, and combining multiple decision trees in ensembles.
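The histogram-based randomization can be sketched as follows: class counts are accumulated per bin, each bin boundary is scored as a candidate split, and the final split point is drawn at random from an interval around the best boundary. This is only an illustration under assumptions made here, not the patented system: binary labels, equal-width bins, the Gini criterion, and an interval of half a bin width on either side are all choices for the example.

```python
import random


def gini_from_counts(counts):
    """Gini impurity computed from a list of per-class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0


def randomized_histogram_split(values, labels, n_bins, rng):
    """Return (random split point, best bin boundary) for one numeric feature.

    Assumes labels are 0 or 1, max(values) > min(values), and that at least
    one bin boundary has data on both sides.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [[0, 0] for _ in range(n_bins)]  # per-bin class counts
    for v, y in zip(values, labels):
        b = min(int((v - lo) / width), n_bins - 1)
        counts[b][y] += 1
    total = [sum(c[0] for c in counts), sum(c[1] for c in counts)]
    n = sum(total)
    best_edge, best_score = None, float("inf")
    left = [0, 0]
    for b in range(n_bins - 1):
        left[0] += counts[b][0]
        left[1] += counts[b][1]
        right = [total[0] - left[0], total[1] - left[1]]
        if sum(left) == 0 or sum(right) == 0:
            continue  # boundary does not separate any data
        score = (sum(left) * gini_from_counts(left)
                 + sum(right) * gini_from_counts(right)) / n
        if score < best_score:
            best_edge, best_score = lo + (b + 1) * width, score
    half = width / 2
    return rng.uniform(best_edge - half, best_edge + half), best_edge


# Hypothetical toy data with the class boundary between 49 and 50.
rng = random.Random(1)
values = [float(i) for i in range(100)]
labels = [0 if v < 50 else 1 for v in values]
split_point, best_edge = randomized_histogram_split(values, labels, 10, rng)
```

Scoring only the bin boundaries keeps the cost of each split evaluation proportional to the number of bins, while the random draw makes the trees in an ensemble differ from one another.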
Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine
NASA Technical Reports Server (NTRS)
Schwabacher, Mark A.; Aguilar, Robert; Figueroa, Fernando F.
2009-01-01
The goal of this work was to use data-driven methods to automatically detect and isolate faults in the J-2X rocket engine. It was decided to use decision trees, since they tend to be easier to interpret than other data-driven methods. The decision tree algorithm automatically "learns" a decision tree by performing a search through the space of possible decision trees to find one that fits the training data. The particular decision tree algorithm used is known as C4.5. Simulated J-2X data from a high-fidelity simulator developed at Pratt & Whitney Rocketdyne and known as the Detailed Real-Time Model (DRTM) was used to "train" and test the decision tree. Fifty-six DRTM simulations were performed for this purpose, with different leak sizes, different leak locations, and different times of leak onset. To make the simulations as realistic as possible, they included simulated sensor noise, and included a gradual degradation in both fuel and oxidizer turbine efficiency. A decision tree was trained using 11 of these simulations, and tested using the remaining 45 simulations. In the training phase, the C4.5 algorithm was provided with labeled examples of data from nominal operation and data including leaks in each leak location. From the data, it "learned" a decision tree that can classify unseen data as having no leak or having a leak in one of the five leak locations. In the test phase, the decision tree produced very low false alarm rates and low missed detection rates on the unseen data. It had very good fault isolation rates for three of the five simulated leak locations, but it tended to confuse the remaining two locations, perhaps because a large leak at one of these two locations can look very similar to a small leak at the other location.
Objective consensus from decision trees.
Putora, Paul Martin; Panje, Cedric M; Papachristofilou, Alexandros; Dal Pra, Alan; Hundsberger, Thomas; Plasswilm, Ludwig
2014-12-05
Consensus-based approaches provide an alternative to evidence-based decision making, especially in situations where high-level evidence is limited. Our aim was to demonstrate a novel source of information, objective consensus based on recommendations in decision tree format from multiple sources. Based on nine sample recommendations in decision tree format a representative analysis was performed. The most common (mode) recommendations for each eventuality (each permutation of parameters) were determined. The same procedure was applied to real clinical recommendations for primary radiotherapy for prostate cancer. Data was collected from 16 radiation oncology centres, converted into decision tree format and analyzed in order to determine the objective consensus. Based on information from multiple sources in decision tree format, treatment recommendations can be assessed for every parameter combination. An objective consensus can be determined by means of mode recommendations without compromise or confrontation among the parties. In the clinical example involving prostate cancer therapy, three parameters were used with two cut-off values each (Gleason score, PSA, T-stage) resulting in a total of 27 possible combinations per decision tree. Despite significant variations among the recommendations, a mode recommendation could be found for specific combinations of parameters. Recommendations represented as decision trees can serve as a basis for objective consensus among multiple parties.
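The mode-based consensus can be sketched in a few lines: every parameter combination is enumerated, each source's decision tree is evaluated on it, and the most frequent recommendation is recorded. Three parameters with two cut-off values each give three levels per parameter and therefore 3 x 3 x 3 = 27 combinations. The three "centres" and the recommendation labels "A" and "B" below are hypothetical placeholders, not clinical recommendations from the study.

```python
from collections import Counter
from itertools import product


def objective_consensus(trees, levels):
    """Map every parameter combination to (mode recommendation, share of sources).

    `trees` is a list of callables, one per source, each returning a
    recommendation for a given combination of parameter levels.
    """
    consensus = {}
    for combo in product(*levels):
        votes = Counter(tree(*combo) for tree in trees)
        recommendation, count = votes.most_common(1)[0]
        consensus[combo] = (recommendation, count / len(trees))
    return consensus


# Three levels per parameter, as produced by two cut-off values each.
levels = [("low", "mid", "high")] * 3  # Gleason score, PSA, T-stage

# Hypothetical decision trees from three sources.
centre_1 = lambda gleason, psa, t_stage: "A" if gleason == "low" else "B"
centre_2 = lambda gleason, psa, t_stage: "A" if psa == "low" else "B"
centre_3 = lambda gleason, psa, t_stage: (
    "A" if gleason == "low" and psa == "low" else "B"
)

consensus = objective_consensus([centre_1, centre_2, centre_3], levels)
```

In this sketch a tie would be resolved in favour of the recommendation encountered first; a real analysis would need an explicit rule for combinations without a unique mode.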
On the error probability of general tree and trellis codes with applications to sequential decoding
NASA Technical Reports Server (NTRS)
Johannesson, R.
1973-01-01
An upper bound on the average error probability for maximum-likelihood decoding of the ensemble of random binary tree codes is derived and shown to be independent of the length of the tree. An upper bound on the average error probability for maximum-likelihood decoding of the ensemble of random L-branch binary trellis codes of rate R = 1/n is derived which separates the effects of the tail length T and the memory length M of the code. It is shown that the bound is independent of the length L of the information sequence. This implication is investigated by computer simulations of sequential decoding utilizing the stack algorithm. These simulations confirm the implication and further suggest an empirical formula for the true undetected decoding error probability with sequential decoding.
Gretchen G. Moisen; Elizabeth A. Freeman; Jock A. Blackard; Tracey S. Frescino; Niklaus E. Zimmermann; Thomas C. Edwards
2006-01-01
Many efforts are underway to produce broad-scale forest attribute maps by modelling forest class and structure variables collected in forest inventories as functions of satellite-based and biophysical information. Typically, variants of classification and regression trees implemented in Rulequest's© See5 and Cubist (for binary and continuous responses,...
The decision tree approach to classification
NASA Technical Reports Server (NTRS)
Wu, C.; Landgrebe, D. A.; Swain, P. H.
1975-01-01
A class of multistage decision tree classifiers is proposed and studied relative to the classification of multispectral remotely sensed data. The decision tree classifiers are shown to have the potential for improving both the classification accuracy and the computation efficiency. Dimensionality in pattern recognition is discussed and two theorems on the lower bound of logic computation for multiclass classification are derived. The automatic or optimization approach is emphasized. Experimental results on real data are reported, which clearly demonstrate the usefulness of decision tree classifiers.
Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin
2015-08-01
Improving the accuracy of supervised classification algorithms in biomedical applications is an active area of research. In this study, we improve the performance of Particle Swarm Optimization (PSO) combined with the C4.5 decision tree (PSO+C4.5) classifier by applying the Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of our proposed method, it is implemented on 1 microarray dataset and 5 different medical data sets obtained from UCI machine learning databases. Moreover, the results of the PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine with a Radial Basis Function kernel, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes and Weighted K-Nearest neighbor). Repeated five-fold cross-validation was used to assess the performance of the classifiers. Experimental results show that our proposed method not only improves the performance of PSO+C4.5 but also obtains higher classification accuracy than the other classification methods.
Decision tree and ensemble learning algorithms with their applications in bioinformatics.
Che, Dongsheng; Liu, Qi; Rasheed, Khaled; Tao, Xiuping
2011-01-01
Machine learning approaches have wide applications in bioinformatics, and decision tree is one of the successful approaches applied in this field. In this chapter, we briefly review decision tree and related ensemble algorithms and show the successful applications of such approaches on solving biological problems. We hope that by learning the algorithms of decision trees and ensemble classifiers, biologists can get the basic ideas of how machine learning algorithms work. On the other hand, by being exposed to the applications of decision trees and ensemble algorithms in bioinformatics, computer scientists can get better ideas of which bioinformatics topics they may work on in their future research directions. We aim to provide a platform to bridge the gap between biologists and computer scientists.
A Decision Tree for Psychology Majors: Supplying Questions as Well as Answers.
ERIC Educational Resources Information Center
Poe, Retta E.
1988-01-01
Outlines the development of a psychology careers decision tree to help faculty advise students as they plan their programs. States that students using the decision tree may benefit by learning more about their career options and by acquiring better question-asking skills. (GEA)
Lin, Fen-Fang; Wang, Ke; Yang, Ning; Yan, Shi-Guang; Zheng, Xin-Yu
2012-02-01
In this paper, the main factors that affect soil quality, such as soil type, land use pattern, lithology type, topography, road, and industry type, were used to obtain precisely the spatial distribution characteristics of regional soil quality; mutual information theory was adopted to select the main environmental factors, and the decision tree algorithm See 5.0 was applied to predict the grade of regional soil quality. The main factors affecting regional soil quality were soil type, land use, lithology type, distance to town, distance to water area, altitude, distance to road, and distance to industrial land. The prediction accuracy of the decision tree model with the variables selected by mutual information was clearly higher than that of the model with all variables, and, for the former model, the prediction accuracy was higher than 80% whether it was expressed as a decision tree or as decision rules. Based on continuous and categorical data, the method integrating mutual information theory with a decision tree could not only reduce the number of input parameters for the decision tree algorithm but also predict and assess regional soil quality effectively.
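As a rough illustration of selecting variables by mutual information before tree induction, the sketch below scores categorical features against the class label. The function names and the dict-per-row data layout are assumptions; the abstract does not describe how continuous variables were handled, so this sketch covers categorical values only.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """I(X;Y) in bits for two equally long sequences of categorical values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def rank_features(rows, labels):
    """Rank features (keys of each row dict) by mutual information with the labels."""
    scores = {
        name: mutual_information([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Features at the bottom of the ranking are candidates for removal before the tree is trained; where to cut the ranking is a separate choice that this sketch leaves open.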
Hydrochemical analysis of groundwater using a tree-based model
NASA Astrophysics Data System (ADS)
Litaor, M. Iggy; Brielmann, H.; Reichmann, O.; Shenker, M.
2010-06-01
Summary: Hydrochemical indices are commonly used to ascertain aquifer characteristics, salinity problems, anthropogenic inputs and resource management, among others. This study was conducted to test the applicability of a binary decision tree model to aquifer evaluation using hydrochemical indices as input. The main advantage of the tree-based model compared to other commonly used statistical procedures such as cluster and factor analyses is the ability to classify groundwater samples with assigned probability and the reduction of a large data set into a few significant variables without creating new factors. We tested the model using data sets collected from headwater springs of the Jordan River, Israel. The model evaluation consisted of several levels of complexity, from simple separation between the calcium-magnesium-bicarbonate water type of karstic aquifers to the more challenging separation of calcium-sodium-bicarbonate water type flowing through perched and regional basaltic aquifers. In all cases, the model assigned measures for goodness of fit in the form of misclassification errors and singled out the most significant variable in the analysis. The model proceeded through a sequence of partitions providing insight into different possible pathways and changing lithology. The model results were extremely useful in constraining the interpretation of geological heterogeneity and constructing a conceptual flow model for a given aquifer. The tree model clearly identified the hydrochemical indices that were excluded from the analysis, thus providing information that can lead to a decrease in the number of routinely analyzed variables and a significant reduction in laboratory cost.
Beaulieu, Jeremy M; O'Meara, Brian C; Donoghue, Michael J
2013-09-01
The growth of phylogenetic trees in scope and in size is promising from the standpoint of understanding a wide variety of evolutionary patterns and processes. With trees comprised of larger, older, and globally distributed clades, it is likely that the lability of a binary character will differ significantly among lineages, which could lead to errors in estimating transition rates and the associated inference of ancestral states. Here we develop and implement a new method for identifying different rates of evolution in a binary character along different branches of a phylogeny. We illustrate this approach by exploring the evolution of growth habit in Campanulidae, a flowering plant clade containing some 35,000 species. The distribution of woody versus herbaceous species calls into question the use of traditional models of binary character evolution. The recognition and accommodation of changes in the rate of growth form evolution in different lineages demonstrates, for the first time, a robust picture of growth form evolution across a very large, very old, and very widespread flowering plant clade.
Novel ID-based anti-collision approach for RFID
NASA Astrophysics Data System (ADS)
Zhang, De-Gan; Li, Wen-Bin
2016-09-01
A novel correlation ID-based (CID) anti-collision approach for RFID in the context of the Internet of Things (IoT) is presented in this paper. The key insights are as follows. Building on the deterministic algorithms that are based on the binary search tree, we propose a method to increase the association between tags so that tags can proactively send their own ID under certain trigger conditions; at the same time, we present a multi-tree search method for querying. When the number of tags is small, replacing the actual ID with a temporary ID can greatly reduce the number of times that the reader reads and writes a tag's ID. Active tags send data to the reader by modulating binary pulses. When this method is applied to the non-deterministic ALOHA algorithms, the reader can determine the locations of the empty slots from the position of the binary pulse, and so avoids the loss of efficiency caused by reading empty slots. Theory and experiment show that this method can greatly improve the recognition efficiency of the system when applied to either the search tree or the ALOHA anti-collision algorithms.
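For orientation, the sketch below simulates a plain binary query-tree anti-collision scheme, the kind of deterministic baseline the abstract builds on. It is not the CID method itself: the tag correlation, temporary IDs and multi-tree search are not reproduced, and the function name `query_tree` and the bit-string ID format are assumptions.

```python
def query_tree(tag_ids):
    """Identify all tags by splitting colliding prefixes into two branches.

    tag_ids: unique bit strings of equal length (an assumption of this sketch).
    Returns the identified IDs and the number of reader queries issued.
    """
    identified, queries = [], 0
    stack = [""]  # prefixes the reader still has to query
    while stack:
        prefix = stack.pop()
        queries += 1
        responders = [tag for tag in tag_ids if tag.startswith(prefix)]
        if len(responders) == 1:
            identified.append(responders[0])  # exactly one reply: tag is read
        elif len(responders) > 1:
            stack.append(prefix + "1")  # collision: descend into both subtrees,
            stack.append(prefix + "0")  # visiting the 0 branch first
        # no responders: an idle query, nothing to do
    return identified, queries
```

The query count makes the cost of collisions and idle branches visible, which is the overhead that schemes such as the one described here aim to reduce.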
The value of decision tree analysis in planning anaesthetic care in obstetrics.
Bamber, J H; Evans, S A
2016-08-01
The use of decision tree analysis is discussed in the context of the anaesthetic and obstetric management of a young pregnant woman with joint hypermobility syndrome with a history of insensitivity to local anaesthesia and a previous difficult intubation due to a tongue tumour. The multidisciplinary clinical decision process resulted in the woman being delivered without complication by elective caesarean section under general anaesthesia after an awake fibreoptic intubation. The decision process used is reviewed and compared retrospectively to a decision tree analytical approach. The benefits and limitations of using decision tree analysis are reviewed and its application in obstetric anaesthesia is discussed. Copyright © 2016 Elsevier Ltd. All rights reserved.
Building of fuzzy decision trees using ID3 algorithm
NASA Astrophysics Data System (ADS)
Begenova, S. B.; Avdeenko, T. V.
2018-05-01
Decision trees are widely used in the field of machine learning and artificial intelligence. This popularity stems from the fact that decision trees yield graphical models and textual rules that are easily understood by the end user. Because of inaccurate observations and uncertainty, data collected in the environment often take an imprecise form. Therefore, fuzzy decision trees are becoming popular in the field of machine learning. This article presents a method that combines the features of the two above-mentioned approaches: a graphical representation of the rule system in the form of a tree and a fuzzy representation of the data. The approach exploits the high comprehensibility of decision trees and the ability of fuzzy representation to cope with inaccurate and uncertain information. The resulting learning method is suitable for classification problems with both numerical and symbolic features. The article gives illustrative solutions and numerical results.
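One common way to carry the ID3 splitting criterion over to fuzzy data is to let each sample count with its membership degree instead of a crisp 0 or 1. The sketch below follows that formulation; it is an assumption that the article uses the same one, and the function names and the term-to-memberships layout are hypothetical.

```python
from math import log2


def fuzzy_entropy(memberships, labels):
    """Class entropy where each sample is weighted by its membership degree."""
    total = sum(memberships)
    if total == 0:
        return 0.0
    weight = {}
    for degree, label in zip(memberships, labels):
        weight[label] = weight.get(label, 0.0) + degree
    return -sum((w / total) * log2(w / total) for w in weight.values() if w > 0)


def fuzzy_information_gain(term_memberships, labels):
    """Gain of splitting on a fuzzy attribute.

    term_memberships maps each linguistic term (e.g. "low", "high") to a list
    with one membership degree per sample.
    """
    grand_total = sum(sum(degrees) for degrees in term_memberships.values())
    root = fuzzy_entropy([1.0] * len(labels), labels)
    remainder = sum(
        sum(degrees) / grand_total * fuzzy_entropy(degrees, labels)
        for degrees in term_memberships.values()
    )
    return root - remainder
```

With memberships restricted to 0 and 1 this reduces to the ordinary ID3 information gain, which gives a simple sanity check for the fuzzy version.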
Evolutionary Algorithm Based Automated Reverse Engineering and Defect Discovery
2007-09-21
a previous application of a GP as a data mining function to evolve fuzzy decision trees symbolically [3-5], the terminal set consisted of fuzzy...of input and output information is required. In the case of fuzzy decision trees, the database represented a collection of scenarios about which the...fuzzy decision tree to be evolved would make decisions . The database also had entries created by experts representing decisions about the scenarios
NASA Astrophysics Data System (ADS)
Chen, Zhang; Peng, Zhenming; Peng, Lingbing; Liao, Dongyi; He, Xin
2011-11-01
With the rapid development of the Multimedia Messaging Service (MMS), filtering Multimedia Message (MM) spam effectively in real time has become an urgent task. Because most MMs contain images or videos, this paper presents a method based on retrieving images for filtering MM spam. The detection method used in this paper is a combination of skin-color detection, texture detection, and face detection, and the classifier for this imbalanced problem is a very fast multi-class classifier combining a support vector machine (SVM) with a unilateral binary decision tree. The experiments on 3 test sets show that the proposed method is effective, with an interception rate of up to 60% and an average detection time of less than 1 second per image.
Creating ensembles of oblique decision trees with evolutionary algorithms and sampling
Cantu-Paz, Erick [Oakland, CA; Kamath, Chandrika [Tracy, CA
2006-06-13
A decision tree system that is part of a parallel object-oriented pattern recognition system, which in turn is part of an object oriented data mining system. A decision tree process includes the step of reading the data. If necessary, the data is sorted. A potential split of the data is evaluated according to some criterion. An initial split of the data is determined. The final split of the data is determined using evolutionary algorithms and statistical sampling techniques. The data is split. Multiple decision trees are combined in ensembles.
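The idea of evaluating a split by a criterion and then refining it with an evolutionary search can be sketched as follows. This is only a minimal (1+1) evolution strategy with Gini impurity as the criterion; the criterion, the mutation scheme, the absence of sampling, and all names (`gini`, `split_impurity`, `evolve_split`) are assumptions and are not taken from the described system.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def split_impurity(weights, threshold, X, y):
    """Weighted Gini impurity of the oblique split  w . x <= threshold."""
    left, right = [], []
    for x, label in zip(X, y):
        projection = sum(w * v for w, v in zip(weights, x))
        (left if projection <= threshold else right).append(label)
    n = len(y)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def evolve_split(X, y, generations=200, sigma=0.5, seed=0):
    """(1+1) evolution strategy: mutate the hyperplane, keep it if not worse."""
    rng = random.Random(seed)
    best = [rng.gauss(0, 1) for _ in X[0]] + [0.0]  # weights followed by threshold
    best_score = split_impurity(best[:-1], best[-1], X, y)
    for _ in range(generations):
        child = [gene + rng.gauss(0, sigma) for gene in best]
        score = split_impurity(child[:-1], child[-1], X, y)
        if score <= best_score:
            best, best_score = child, score
    return best[:-1], best[-1], best_score
```

Because the split is a linear combination of features rather than a test on a single feature, it can separate classes along a diagonal boundary that an axis-parallel tree would need several levels to approximate.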
The decision tree classifier - Design and potential. [for Landsat-1 data
NASA Technical Reports Server (NTRS)
Hauska, H.; Swain, P. H.
1975-01-01
A new classifier has been developed for the computerized analysis of remote sensor data. The decision tree classifier is essentially a maximum likelihood classifier using multistage decision logic. It is characterized by the fact that an unknown sample can be classified into a class using one or several decision functions in a successive manner. The classifier is applied to the analysis of data sensed by Landsat-1 over Kenosha Pass, Colorado. The classifier is illustrated by a tree diagram which for processing purposes is encoded as a string of symbols such that there is a unique one-to-one relationship between string and decision tree.
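A one-to-one mapping between a decision tree and a string of symbols can be obtained with a parenthesised pre-order encoding, sketched below. The abstract does not describe the actual symbol scheme used, so the tuple representation, the token format and the names `encode` and `decode` are assumptions chosen for illustration.

```python
def encode(node):
    """Serialise a binary tree to a string.

    A node is either a class label (str) or a tuple
    (decision_function_name, left_subtree, right_subtree).
    Names and labels must not contain spaces or parentheses.
    """
    if isinstance(node, str):
        return node
    name, left, right = node
    return f"({name} {encode(left)} {encode(right)})"


def decode(text):
    """Rebuild the tree from the string produced by encode()."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def parse(pos):
        if tokens[pos] != "(":
            return tokens[pos], pos + 1  # a leaf: the class label itself
        name = tokens[pos + 1]
        left, pos = parse(pos + 2)
        right, pos = parse(pos)
        return (name, left, right), pos + 1  # skip the closing parenthesis

    tree, _ = parse(0)
    return tree
```

Since decoding the encoded string returns the original tree, each tree corresponds to exactly one string under this scheme and vice versa.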
Liang, Shih-Hsiung; Walther, Bruno Andreas; Shieh, Bao-Sen
2017-01-01
Biological invasions have become a major threat to biodiversity, and identifying determinants underlying success at different stages of the invasion process is essential for both prevention management and testing ecological theories. To investigate variables associated with different stages of the invasion process in a local region such as Taiwan, potential problems using traditional parametric analyses include too many variables of different data types (nominal, ordinal, and interval) and a relatively small data set with too many missing values. We therefore used five decision tree models instead and compared their performance. Our dataset contains 283 exotic bird species which were transported to Taiwan; of these 283 species, 95 species escaped to the field successfully (introduction success); of these 95 introduced species, 36 species reproduced in the field of Taiwan successfully (establishment success). For each species, we collected 22 variables associated with human selectivity and species traits which may determine success during the introduction stage and establishment stage. For each decision tree model, we performed three variable treatments: (I) including all 22 variables, (II) excluding nominal variables, and (III) excluding nominal variables and replacing ordinal values with binary ones. Five performance measures were used to compare models, namely, area under the receiver operating characteristic curve (AUROC), specificity, precision, recall, and accuracy. The gradient boosting models performed best overall among the five decision tree models for both introduction and establishment success and across variable treatments. The most important variables for predicting introduction success were the bird family, the number of invaded countries, and variables associated with environmental adaptation, whereas the most important variables for predicting establishment success were the number of invaded countries and variables associated with reproduction. 
Our final optimal models achieved relatively high performance values, and we discuss differences in performance with regard to sample size and variable treatments. Our results showed that, for both the establishment model and introduction model, the number of invaded countries was the most important or second most important determinant, respectively. Therefore, we suggest that future success for introduction and establishment of exotic birds may be gauged by simply looking at previous success in invading other countries. Finally, we found that species traits related to reproduction were more important in establishment models than in introduction models; importantly, these determinants were not averaged but either minimum or maximum values of species traits. Therefore, we suggest that in addition to averaged values, reproductive potential represented by minimum and maximum values of species traits should be considered in invasion studies. PMID:28316893
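Four of the five performance measures named in this abstract follow directly from the counts of a binary confusion matrix, as the sketch below shows. AUROC is left out because it needs ranked scores rather than hard predictions. The function name and the 0/1 label coding are assumptions.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and specificity for labels coded 1 (success) / 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```

Reporting several measures side by side matters for data like these, where successes (36 of 95, or 95 of 283) are the minority class and accuracy alone can look high for a model that rarely predicts success.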
Automated rule-base creation via CLIPS-Induce
NASA Technical Reports Server (NTRS)
Murphy, Patrick M.
1994-01-01
Many CLIPS rule-bases contain one or more rule groups that perform classification. In this paper we describe CLIPS-Induce, an automated system for the creation of a CLIPS classification rule-base from a set of test cases. CLIPS-Induce consists of two components, a decision tree induction component and a CLIPS production extraction component. ID3, a popular decision tree induction algorithm, is used to induce a decision tree from the test cases. CLIPS production extraction is accomplished through a top-down traversal of the decision tree. Nodes of the tree are used to construct query rules, and branches of the tree are used to construct classification rules. The learned CLIPS productions may easily be incorporated into a large CLIPS system that performs tasks such as accessing a database or displaying information.
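The top-down traversal that turns a decision tree into rules can be sketched generically as below. The output is a list of plain (conditions, class) pairs rather than CLIPS productions, since the abstract does not give the generated rule syntax; the tree representation and the name `extract_rules` are assumptions.

```python
def extract_rules(tree, conditions=()):
    """Return one (conditions, class_label) rule per root-to-leaf path.

    tree is either a class label (str) or a pair
    (attribute_name, {attribute_value: subtree}).
    """
    if isinstance(tree, str):
        return [(list(conditions), tree)]  # leaf: emit the accumulated path
    attribute, branches = tree
    rules = []
    for value, subtree in branches.items():
        # Each branch adds one attribute test to the rule's left-hand side.
        rules.extend(extract_rules(subtree, conditions + ((attribute, value),)))
    return rules
```

Each returned pair could then be rendered in whatever rule language the target system expects, with the conditions forming the left-hand side and the class label the conclusion.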
Decision tree methods: applications for classification and prediction.
Song, Yan-Yan; Lu, Ying
2015-04-25
Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets. The training dataset is used to build a decision tree model, and the validation dataset is used to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces algorithms frequently used to develop decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
Learning from examples - Generation and evaluation of decision trees for software resource analysis
NASA Technical Reports Server (NTRS)
Selby, Richard W.; Porter, Adam A.
1988-01-01
A general solution method for the automatic generation of decision (or classification) trees is investigated. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for software resource data analysis. The trees identify classes of objects (software modules) that had high development effort. Sixteen software systems ranging from 3,000 to 112,000 source lines were selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4,700 objects, captured information about the development effort, faults, changes, design style, and implementation style. A total of 9,600 decision trees were automatically generated and evaluated. The trees correctly identified 79.3 percent of the software modules that had high development effort or faults, and the trees generated from the best parameter combinations correctly identified 88.4 percent of the modules on the average.
Compact 0-complete trees: A new method for searching large files
DOE Office of Scientific and Technical Information (OSTI.GOV)
Orlandic, R.; Pfaltz, J.L.
1988-01-26
In this report, a novel approach to ordered retrieval in very large files is developed. The method employs a B-tree like search algorithm that is independent of key type or key length because all keys in index blocks are encoded by a 1 byte surrogate. The replacement of actual key sequences by the 1 byte surrogate ensures a maximal possible fan out and greatly reduces the storage overhead of maintaining access indices. Initially, retrieval in binary trie structure is developed. With the aid of a fairly complex recurrence relation, the rather scraggly binary trie is transformed into a compact multi-way search tree. Then the recurrence relation itself is replaced by an unusually simple search algorithm. Then implementation details and empirical performance results are presented. Reduction of index size by 50%--75% opens up the possibility of replicating system-wide indices for parallel access in distributed databases. 23 figs.
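The starting point named in this abstract, ordered retrieval in a binary trie, can be sketched as below. This covers only the plain trie; the 1 byte surrogates, the recurrence relation and the compact multi-way tree are not reproduced. The class name `BinaryTrie`, the nested-dict layout and the bit-string keys are assumptions.

```python
class BinaryTrie:
    """Binary trie over bit-string keys, stored as nested dicts."""

    def __init__(self):
        self.root = {}

    def insert(self, key, value):
        node = self.root
        for bit in key:  # follow or create one child per bit
            node = node.setdefault(bit, {})
        node["value"] = value

    def get(self, key):
        node = self.root
        for bit in key:
            if bit not in node:
                return None
            node = node[bit]
        return node.get("value")

    def items(self, node=None, prefix=""):
        """Yield (key, value) pairs in lexicographic key order."""
        node = self.root if node is None else node
        if "value" in node:
            yield prefix, node["value"]
        for bit in "01":  # visiting 0 before 1 keeps the output ordered
            if bit in node:
                yield from self.items(node[bit], prefix + bit)
```

Lookup cost depends on key length rather than on the number of stored keys, and an in-order walk returns keys sorted, which is what makes the structure usable for ordered retrieval.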
Canonical multi-valued input Reed-Muller trees and forms
NASA Technical Reports Server (NTRS)
Perkowski, M. A.; Johnson, P. D.
1991-01-01
There is recently an increased interest in logic synthesis using EXOR gates. The paper introduces the fundamental concept of Orthogonal Expansion, which generalizes the ring form of the Shannon expansion to the logic with multiple-valued (mv) inputs. Based on this concept we are able to define a family of canonical tree circuits. Such circuits can be considered for binary and multiple-valued input cases. They can be multi-level (trees and DAG's) or flattened to two-level AND-EXOR circuits. Input decoders similar to those used in Sum of Products (SOP) PLA's are used in realizations of multiple-valued input functions. In the case of the binary logic the family of flattened AND-EXOR circuits includes several forms discussed by Davio and Green. For the case of the logic with multiple-valued inputs, the family of the flattened mv AND-EXOR circuits includes three expansions known from literature and two new expansions.
NASA Astrophysics Data System (ADS)
Danandeh Mehr, Ali; Nourani, Vahid; Hrnjica, Bahrudin; Molajou, Amir
2017-12-01
The effectiveness of genetic programming (GP) for solving regression problems in hydrology has been recognized in recent studies. However, its capability to solve classification problems has not been sufficiently explored so far. This study develops and applies a novel classification-forecasting model, namely Binary GP (BGP), for teleconnection studies between sea surface temperature (SST) variations and maximum monthly rainfall (MMR) events. The BGP integrates certain types of data pre-processing and post-processing methods with conventional GP engine to enhance its ability to solve both regression and classification problems simultaneously. The model was trained and tested using SST series of Black Sea, Mediterranean Sea, and Red Sea as potential predictors as well as classified MMR events at two locations in Iran as predictand. Skill of the model was measured in regard to different rainfall thresholds and SST lags and compared to that of the hybrid decision tree-association rule (DTAR) model available in the literature. The results indicated that the proposed model can identify potential teleconnection signals of surrounding seas beneficial to long-term forecasting of the occurrence of the classified MMR events.
Reconfigurable tree architectures using subtree oriented fault tolerance
NASA Technical Reports Server (NTRS)
Lowrie, Matthew B.
1987-01-01
An approach to the design of reconfigurable tree architecture is presented in which spare processors are allocated at the leaves. The approach is unique in that spares are associated with subtrees and sharing of spares between these subtrees can occur. The Subtree Oriented Fault Tolerance (SOFT) approach is more reliable than previous approaches capable of tolerating link and switch failures for both single chip and multichip tree implementations while reducing redundancy in terms of both spare processors and links. VLSI layout is O(n) for binary trees and is directly extensible to N-ary trees and fault tolerance through performance degradation.
Decision-Tree Models of Categorization Response Times, Choice Proportions, and Typicality Judgments
ERIC Educational Resources Information Center
Lafond, Daniel; Lacouture, Yves; Cohen, Andrew L.
2009-01-01
The authors present 3 decision-tree models of categorization adapted from T. Trabasso, H. Rollins, and E. Shaughnessy (1971) and use them to provide a quantitative account of categorization response times, choice proportions, and typicality judgments at the individual-participant level. In Experiment 1, the decision-tree models were fit to…
Masías, Víctor H.; Krause, Mariane; Valdés, Nelson; Pérez, J. C.; Laengle, Sigifredo
2015-01-01
Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBTree, and REPTree) are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice. PMID:25914657
Delgado-Gomez, D; Baca-Garcia, E; Aguado, D; Courtet, P; Lopez-Castroman, J
2016-12-01
Several Computerized Adaptive Tests (CATs) have been proposed to facilitate assessments in mental health. These tests are built in a standard way, disregarding useful and usually available information not included in the assessment scales that could increase the precision and utility of CATs, such as the history of suicide attempts. Using the items of a previously developed scale for suicidal risk, we compared the performance of a standard CAT and a decision tree in a decision support system to identify suicidal behavior. We included the history of past suicide attempts as a class for the separation of patients in the decision tree. The decision tree needed an average of four items to achieve an accuracy similar to that of a standard CAT with nine items. The accuracy of the decision tree, obtained after 25 cross-validations, was 81.4%. A shortened test adapted for the separation of suicidal and non-suicidal patients was developed. CATs can be very useful tools for the assessment of suicidal risk. However, standard CATs do not use all the information that is available. A decision tree can improve the precision of the assessment since it is constructed using a priori information. Copyright © 2016 Elsevier B.V. All rights reserved.
Doubravsky, Karel; Dohnal, Mirko
2015-01-01
Complex decision-making tasks of different natures, e.g. in economics, safety engineering, ecology and biology, are based on vague, sparse, partially inconsistent and subjective knowledge. Moreover, decision-making economists and engineers are usually unwilling to invest much time in studying complex formal theories. They require decisions that can be (re)checked by human-like common-sense reasoning. One important problem in realistic decision-making tasks is the incompleteness of the data sets required by the chosen decision-making algorithm. This paper presents a relatively simple algorithm by which some missing III (input information items) can be generated, mainly using decision tree topologies, and integrated into incomplete data sets. The algorithm is based on easy-to-understand heuristics, e.g. a longer decision tree sub-path is less probable. This heuristic can solve decision problems under total ignorance, i.e. when the decision tree topology is the only information available. In practice, however, isolated information items, e.g. some vaguely known probabilities (e.g. fuzzy probabilities), are usually available. This means that a realistic problem is analysed under partial ignorance. The proposed algorithm reconciles topology-related heuristics and additional fuzzy sets using fuzzy linear programming. A case study, represented by a tree with six lotteries and one fuzzy probability, is presented in detail. PMID:26158662
Encoding phylogenetic trees in terms of weighted quartets.
Grünewald, Stefan; Huber, Katharina T; Moulton, Vincent; Semple, Charles
2008-04-01
One of the main problems in phylogenetics is to develop systematic methods for constructing evolutionary or phylogenetic trees. For a set of species X, an edge-weighted phylogenetic X-tree or phylogenetic tree is a (graph theoretical) tree with leaf set X and no degree 2 vertices, together with a map assigning a non-negative length to each edge of the tree. Within phylogenetics, several methods have been proposed for constructing such trees that work by trying to piece together quartet trees on X, i.e. phylogenetic trees each having four leaves in X. Hence, it is of interest to characterise when a collection of quartet trees corresponds to a (unique) phylogenetic tree. Recently, Dress and Erdös provided such a characterisation for binary phylogenetic trees, that is, phylogenetic trees all of whose internal vertices have degree 3. Here we provide a new characterisation for arbitrary phylogenetic trees.
Freitas, Alex A; Limbu, Kriti; Ghafourian, Taravat
2015-01-01
Volume of distribution is an important pharmacokinetic property that indicates the extent of a drug's distribution in the body tissues. This paper addresses the problem of how to estimate the apparent volume of distribution at steady state (Vss) of chemical compounds in the human body using decision tree-based regression methods from the area of data mining (or machine learning). Hence, the pros and cons of several different types of decision tree-based regression methods have been discussed. The regression methods predict Vss using, as predictive features, both the compounds' molecular descriptors and the compounds' tissue:plasma partition coefficients (Kt:p), often used in physiologically-based pharmacokinetics. Therefore, this work has assessed whether the data mining-based prediction of Vss can be made more accurate by using as input not only the compounds' molecular descriptors but also (a subset of) their predicted Kt:p values. Comparison of the models that used only molecular descriptors, in particular, the Bagging decision tree (mean fold error of 2.33), with those employing predicted Kt:p values in addition to the molecular descriptors, such as the Bagging decision tree using adipose Kt:p (mean fold error of 2.29), indicated that the use of predicted Kt:p values as descriptors may be beneficial for accurate prediction of Vss using decision trees if prior feature selection is applied. Decision tree based models presented in this work have an accuracy that is reasonable and similar to the accuracy of reported Vss inter-species extrapolations in the literature. The estimation of Vss for new compounds in drug discovery will benefit from methods that are able to integrate large and varied sources of data and flexible non-linear data mining methods such as decision trees, which can produce interpretable models. Graphical abstract: Decision trees for the prediction of tissue partition coefficient and volume of distribution of drugs.
Speckle Imaging and Spectroscopy of Kepler Exo-planet Transit Candidate Stars
NASA Astrophysics Data System (ADS)
Howell, Steve B.; Sherry, William; Horch, Elliott; Doyle, Laurance
2010-02-01
The NASA Kepler mission was successfully launched on 6 March 2009 and has begun science operations. Commissioning tests done early in the mission have shown that, for bright sources, 10-15 ppm relative photometry can be achieved. This level assures that we will detect Earth-like transits if they are present. "Hot Jupiter" and similar large planet candidates have already been discovered and will be discussed at the January AAS meeting as well as in a special issue of Science magazine to appear near year's end. The plethora of variability observed is astounding and includes a number of eclipsing binaries which appear to have Jupiter-size and smaller objects as their orbiting body. Our proposal consists of three highly related objectives: 1) to continue our highly successful speckle imaging program, which is a major component of the defense to weed out false positive candidate transiting planets found by Kepler and move the rest to probable or certain exo-planet detections; 2) to obtain low-resolution "discovery"-type spectra for planet candidate stars in order to provide spectral type and luminosity class indicators as well as a first-look triage to eliminate binaries and rapid rotators; and 3) to obtain ~1 Å resolution time-ordered spectra of eclipsing binaries that are exo-planet candidates in order to obtain the velocity solution for the binary star, allowing its signal to be modeled and removed from the Keck or HET exo-planet velocity search. As of this writing, Kepler has produced a list of 227 exo-planet candidates which require false positive decision tree observations. Our proposed effort performs much of the first line of defense for the mission.
Lucini, Filipe R; S Fogliatto, Flavio; C da Silveira, Giovani J; L Neyeloff, Jeruza; Anzanello, Michel J; de S Kuchenbecker, Ricardo; D Schaan, Beatriz
2017-04-01
Emergency department (ED) overcrowding is a serious issue for hospitals. Early information on short-term inward bed demand from patients receiving care at the ED may reduce the overcrowding problem and optimize the use of hospital resources. In this study, we use text mining methods to process data from early ED patient records using the SOAP framework, and predict future hospitalizations and discharges. We try different approaches for pre-processing text records and for predicting hospitalization. Sets-of-words are obtained via binary representation, term frequency, and term frequency-inverse document frequency. Unigrams, bigrams and trigrams are tested for feature formation. Feature selection is based on χ² and F-score metrics. In the prediction module, eight text mining methods are tested: Decision Tree, Random Forest, Extremely Randomized Tree, AdaBoost, Logistic Regression, Multinomial Naïve Bayes, Support Vector Machine (linear kernel) and Nu-Support Vector Machine (linear kernel). Prediction performance is evaluated by F1-scores. Precision and recall values are also reported for all text mining methods tested. Nu-Support Vector Machine was the text mining method with the best overall performance. Its average F1-score in predicting hospitalization was 77.70%, with a standard deviation (SD) of 0.66%. The method could be used to manage daily routines in EDs such as capacity planning and resource allocation. Text mining could provide valuable information and facilitate decision-making by inward bed management teams. Copyright © 2017 Elsevier Ireland Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moody, A. T.
2014-12-26
Avalaunch implements a tree-based process launcher. It first bootstraps itself onto a set of compute nodes by launching child processes, which immediately connect back to the parent process to acquire the information needed to launch their own children. Once the tree is established, user processes are started by broadcasting commands and application binaries through the tree. All communication flows over high-performance network protocols via spawnnet. The goal is to start MPI jobs having hundreds of thousands of processes within seconds.
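To illustrate why a tree-based launch can reach very many processes in few steps, here is a minimal sketch. It assumes a complete k-ary tree over ranks 0..n-1 with heap-style numbering; the record does not state the tree shape or fan-out Avalaunch actually uses, and the function names `children` and `launch_rounds` are hypothetical.

```python
from collections import deque


def children(rank, n, k=2):
    """Ranks launched directly by `rank` in an assumed k-ary tree over 0..n-1."""
    first = rank * k + 1
    return [c for c in range(first, first + k) if c < n]


def launch_rounds(n, k=2):
    """Depth of the deepest process, i.e. sequential launch hops from the root."""
    depth = {0: 0}
    queue = deque([0])
    while queue:
        parent = queue.popleft()
        for child in children(parent, n, k):
            # A child can only start after its parent is running.
            depth[child] = depth[parent] + 1
            queue.append(child)
    return max(depth.values())
```

Under these assumptions the number of sequential hops grows only logarithmically with the process count, so a larger fan-out trades more work per parent for a shallower tree.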
CSTutor: A Sketch-Based Tool for Visualizing Data Structures
ERIC Educational Resources Information Center
Buchanan, Sarah; Laviola, Joseph J., Jr.
2014-01-01
We present CSTutor, a sketch-based interface designed to help students understand data structures, specifically Linked Lists, Binary Search Trees, AVL Trees, and Heaps. CSTutor creates an environment that seamlessly combines a user's sketched diagram and code. In each of these data structure modes, the user can naturally sketch a data structure on…
Using classification tree analysis to predict oak wilt distribution in Minnesota and Texas
Marla C. Downing; Vernon L. Thomas; Jennifer Juzwik; David N. Appel; Robin M. Reich; Kim Camilli
2008-01-01
We developed a methodology and compared results for predicting the potential distribution of Ceratocystis fagacearum (causal agent of oak wilt), in both Anoka County, MN, and Fort Hood, TX. The Potential Distribution of Oak Wilt (PDOW) utilizes a binary classification tree statistical technique that incorporates: geographical information systems (GIS...
Peculiar spectral statistics of ensembles of trees and star-like graphs
NASA Astrophysics Data System (ADS)
Kovaleva, V.; Maximov, Yu; Nechaev, S.; Valba, O.
2017-07-01
In this paper we investigate the eigenvalue statistics of exponentially weighted ensembles of full binary trees and p-branching star graphs. We show that spectral densities of corresponding adjacency matrices demonstrate peculiar ultrametric structure inherent to sparse systems. In particular, the tails of the distribution for binary trees share the ‘Lifshitz singularity’ emerging in the one-dimensional localization, while the spectral statistics of p-branching star-like graphs is less universal, being strongly dependent on p. The hierarchical structure of spectra of adjacency matrices is interpreted as sets of resonance frequencies that emerge in ensembles of fully branched tree-like systems, known as dendrimers. However, the relaxational spectrum is not determined by the cluster topology, but rather has a number-theoretic origin, reflecting the peculiarities of the rare-event statistics typical for one-dimensional systems with a quenched structural disorder. The similarity of spectral densities of an individual dendrimer and of an ensemble of linear chains with exponential distribution in lengths demonstrates that dendrimers could serve as simple disorder-less toy models of one-dimensional systems with quenched disorder.
Universal features of dendrites through centripetal branch ordering
Effenberger, Felix; Muellerleile, Julia
2017-01-01
Dendrites form predominantly binary trees that are exquisitely embedded in the networks of the brain. While neuronal computation is known to depend on the morphology of dendrites, their underlying topological blueprint remains unknown. Here, we used a centripetal branch ordering scheme originally developed to describe river networks, the Horton-Strahler order (SO), to examine hierarchical relationships of branching statistics in reconstructed and model dendritic trees. We report on a number of universal topological relationships with SO that are true for all binary trees and distinguish those from SO-sorted metric measures that appear to be cell type-specific. The latter are therefore potential new candidates for categorising dendritic tree structures. Interestingly, we find a faithful correlation of branch diameters with centripetal branch orders, indicating a possible functional importance of SO for dendritic morphology and growth. Also, simulated local voltage responses to synaptic inputs are strongly correlated with SO. In summary, our study identifies important SO-dependent measures in dendritic morphology that are relevant for neural function while at the same time it describes other relationships that are universal for all dendrites. PMID:28671947
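The Horton-Strahler order mentioned above follows a simple recursive rule, sketched here on a tree written as nested tuples (an encoding chosen for this illustration, not taken from the study): a terminal branch has order 1, and a parent's order exceeds the highest child order only when the two highest child orders are equal.

```python
def strahler_order(node):
    """Horton-Strahler order of a tree given as nested tuples of children.

    An empty tuple () is a terminal branch; (left, right) is a bifurcation.
    """
    if not node:
        return 1  # terminal branches have order 1
    orders = sorted((strahler_order(child) for child in node), reverse=True)
    # The order increases only when the two highest child orders tie.
    if len(orders) > 1 and orders[0] == orders[1]:
        return orders[0] + 1
    return orders[0]
```

For example, a single bifurcation `((), ())` has order 2, and attaching one more terminal branch to it, `(((), ()), ())`, leaves the order at 2, because the two child orders (2 and 1) differ.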
van Iersel, Leo; Kelk, Steven; Lekić, Nela; Scornavacca, Celine
2014-05-05
Reticulate events play an important role in determining evolutionary relationships. The problem of computing the minimum number of such events to explain discordance between two phylogenetic trees is a hard computational problem. Even for binary trees, exact solvers struggle to solve instances with reticulation number larger than 40-50. Here we present CycleKiller and NonbinaryCycleKiller, the first methods to produce solutions verifiably close to optimality for instances with hundreds or even thousands of reticulations. Using simulations, we demonstrate that these algorithms run quickly for large and difficult instances, producing solutions that are very close to optimality. As a spin-off from our simulations we also present TerminusEst, which is the fastest exact method currently available that can handle nonbinary trees: this is used to measure the accuracy of the NonbinaryCycleKiller algorithm. All three methods are based on extensions of previous theoretical work (SIDMA 26(4):1635-1656, TCBB 10(1):18-25, SIDMA 28(1):49-66) and are publicly available. We also apply our methods to real data.
Peculiar spectral statistics of ensembles of trees and star-like graphs
Kovaleva, V.; Maximov, Yu; Nechaev, S.; ...
2017-07-11
In this paper we investigate the eigenvalue statistics of exponentially weighted ensembles of full binary trees and p-branching star graphs. We show that spectral densities of corresponding adjacency matrices demonstrate peculiar ultrametric structure inherent to sparse systems. In particular, the tails of the distribution for binary trees share the "Lifshitz singularity" emerging in the one-dimensional localization, while the spectral statistics of p-branching star-like graphs is less universal, being strongly dependent on p. The hierarchical structure of spectra of adjacency matrices is interpreted as sets of resonance frequencies that emerge in ensembles of fully branched tree-like systems, known as dendrimers. However, the relaxational spectrum is not determined by the cluster topology, but rather has a number-theoretic origin, reflecting the peculiarities of the rare-event statistics typical for one-dimensional systems with a quenched structural disorder. The similarity of spectral densities of an individual dendrimer and of an ensemble of linear chains with exponential distribution in lengths demonstrates that dendrimers could serve as simple disorder-less toy models of one-dimensional systems with quenched disorder.
Peculiar spectral statistics of ensembles of trees and star-like graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kovaleva, V.; Maximov, Yu; Nechaev, S.
In this paper we investigate the eigenvalue statistics of exponentially weighted ensembles of full binary trees and p-branching star graphs. We show that spectral densities of corresponding adjacency matrices demonstrate peculiar ultrametric structure inherent to sparse systems. In particular, the tails of the distribution for binary trees share the "Lifshitz singularity" emerging in the one-dimensional localization, while the spectral statistics of p-branching star-like graphs is less universal, being strongly dependent on p. The hierarchical structure of spectra of adjacency matrices is interpreted as sets of resonance frequencies that emerge in ensembles of fully branched tree-like systems, known as dendrimers. However, the relaxational spectrum is not determined by the cluster topology, but rather has a number-theoretic origin, reflecting the peculiarities of the rare-event statistics typical for one-dimensional systems with a quenched structural disorder. The similarity of spectral densities of an individual dendrimer and of an ensemble of linear chains with exponential distribution in lengths demonstrates that dendrimers could serve as simple disorder-less toy models of one-dimensional systems with quenched disorder.
NASA Technical Reports Server (NTRS)
Shiffman, Smadar
2004-01-01
Automated cloud detection and tracking is an important step in assessing global climate change via remote sensing. Cloud masks, which indicate whether individual pixels depict clouds, are included in many of the data products that are based on data acquired on board earth satellites. Many cloud-mask algorithms have the form of decision trees, which employ sequential tests that scientists designed based on empirical astrophysics studies and astrophysics simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In this study we explored the potential benefits of automatically learned decision trees for detecting clouds from images acquired using the Advanced Very High Resolution Radiometer (AVHRR) instrument on board the NOAA-14 weather satellite of the National Oceanic and Atmospheric Administration. We constructed three decision trees for a sample of 8 km daily AVHRR data from 2000 using a decision-tree learning procedure provided within MATLAB(R), and compared the accuracy of the decision trees to the accuracy of the cloud mask. We used ground observations collected by the National Aeronautics and Space Administration Clouds and the Earth's Radiant Energy Systems S'COOL project as the gold standard. For the sample data, the accuracy of automatically learned decision trees was greater than the accuracy of the cloud masks included in the AVHRR data product.
Batterham, Philip J; Christensen, Helen; Mackinnon, Andrew J
2009-11-22
Relative to physical health conditions such as cardiovascular disease, little is known about risk factors that predict the prevalence of depression. The present study investigates the expected effects of a reduction of these risks over time, using the decision tree method favoured in assessing cardiovascular disease risk. The PATH through Life cohort was used for the study, comprising 2,105 20-24 year olds, 2,323 40-44 year olds and 2,177 60-64 year olds sampled from the community in the Canberra region, Australia. A decision tree methodology was used to predict the presence of major depressive disorder after four years of follow-up. The decision tree was compared with a logistic regression analysis using ROC curves. The decision tree was found to distinguish and delineate a wide range of risk profiles. Previous depressive symptoms were most highly predictive of depression after four years; however, modifiable risk factors such as substance use and employment status played significant roles in assessing the risk of depression. The decision tree was found to have better sensitivity and specificity than a logistic regression using identical predictors. The decision tree method was useful in assessing the risk of major depressive disorder over four years. Application of the model to the development of a predictive tool for tailored interventions is discussed.
Implementation of Data Mining to Analyze Drug Cases Using C4.5 Decision Tree
NASA Astrophysics Data System (ADS)
Wahyuni, Sri
2018-03-01
Data mining is the process of finding useful information in large databases. One of the established techniques in data mining is classification. The method used here was the decision tree, built with the C4.5 algorithm. The decision tree method transforms a very large set of facts into a tree that represents rules. It is useful for exploring data and for finding hidden relationships between a number of potential input variables and a target variable. A C4.5 decision tree is constructed in several stages: an attribute is selected as the root, a branch is created for each of its values, and the cases are divided among the branches. These stages are repeated for each branch until all the cases on the branch have the same class. Rules for a case can then be read from the resulting tree. In this study the researcher classified data on prisoners at Labuhan Deli prison to identify the factors behind detainees committing drug offences. Applying the C4.5 algorithm yielded knowledge that can be used as information to minimize drug offences. The findings showed that the most influential factor in detainees committing drug offences was the address variable.
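The construction stages described in this abstract can be sketched as a short recursive procedure. This is a simplified illustration, not the study's implementation: it handles only categorical attributes and uses the gain ratio as the selection criterion, whereas full C4.5 also deals with continuous attributes, missing values and pruning. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Information gain of splitting on `attr`, normalised by split information."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    split_info = entropy([row[attr] for row in rows])
    return (entropy(labels) - remainder) / split_info if split_info > 0 else 0.0


def build_tree(rows, labels, attrs):
    """Return a class label (leaf) or {attribute: {value: subtree}}."""
    # Stop when all cases on the branch share a class or no attributes remain.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    tree = {best: {}}
    for value in sorted({row[best] for row in rows}):
        # Divide the cases among the branches, one branch per attribute value.
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in subset]
        sub_labels = [l for _, l in subset]
        tree[best][value] = build_tree(sub_rows, sub_labels, remaining)
    return tree
```

Each root-to-leaf path of the returned dictionary corresponds to one rule of the form "if attribute = value and ... then class".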
An Improved Decision Tree for Predicting a Major Product in Competing Reactions
ERIC Educational Resources Information Center
Graham, Kate J.
2014-01-01
When organic chemistry students encounter competing reactions, they are often overwhelmed by the task of evaluating multiple factors that affect the outcome of a reaction. The use of a decision tree is a useful tool to teach students to evaluate a complex situation and propose a likely outcome. Specifically, a decision tree can help students…
Decision Tree Phytoremediation
1999-12-01
aromatic hydrocarbons, and landfill leachates. Phytoremediation has been used for point and nonpoint source hazardous waste control. 1.2 Types of... Phytoremediation. Prepared by Interstate Technology and Regulatory Cooperation Work Group, Phytoremediation Work Team, December 1999. Decision Tree...
Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data.
Barros, Rodrigo C; Winck, Ana T; Machado, Karina S; Basgalupp, Márcio P; de Carvalho, André C P L F; Ruiz, Duncan D; de Souza, Osmar Norberto
2012-11-21
This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, especially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible receptor.
Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data
2012-01-01
Background: This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, especially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. Results: The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions: We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible receptor. PMID:23171000
Nair, Shalini Rajandran; Tan, Li Kuo; Mohd Ramli, Norlisah; Lim, Shen Yang; Rahmat, Kartini; Mohd Nor, Hazman
2013-06-01
To develop a decision tree based on standard magnetic resonance imaging (MRI) and diffusion tensor imaging (DTI) to differentiate multiple system atrophy (MSA) from Parkinson's disease (PD). 3-T brain MRI and DTI were performed on 26 PD and 13 MSA patients. Regions of interest (ROIs) were the putamen, substantia nigra, pons, middle cerebellar peduncles (MCP) and cerebellum. Linear, volumetric and DTI (fractional anisotropy and mean diffusivity) measurements were taken. A three-node decision tree was formulated, with design goals being 100% specificity at node 1, 100% sensitivity at node 2 and highest combined sensitivity and specificity at node 3. Nine parameters (mean width, fractional anisotropy (FA) and mean diffusivity (MD) of MCP; anteroposterior diameter of pons; cerebellar FA and volume; pons and mean putamen volume; mean FA substantia nigra compacta-rostral) showed statistically significant (P < 0.05) differences between MSA and PD, with mean MCP width, anteroposterior diameter of pons and mean FA MCP chosen for the decision tree. Threshold values were 14.6 mm, 21.8 mm and 0.55, respectively. Overall performance of the decision tree was 92% sensitivity, 96% specificity, 92% PPV and 96% NPV. Twelve out of 13 MSA patients were accurately classified. Formation of the decision tree using these parameters was both descriptive and predictive in differentiating between MSA and PD. • Parkinson's disease and multiple system atrophy can be distinguished on MR imaging. • Combined conventional MRI and diffusion tensor imaging improves the accuracy of diagnosis. • A decision tree is descriptive and predictive in differentiating between clinical entities. • A decision tree can reliably differentiate Parkinson's disease from multiple system atrophy.
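The reported performance figures can be checked with a few lines of arithmetic. The abstract states only that 12 of 13 MSA patients were correctly classified; the remaining counts used below (25 of 26 PD patients correctly classified, hence one false positive) are an assumption inferred from the reported 96% specificity, not figures given in the record.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard screening metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# tp and fn follow from "12 out of 13 MSA patients"; tn and fp are assumed.
metrics = diagnostic_metrics(tp=12, fn=1, tn=25, fp=1)
percentages = {name: round(100 * value) for name, value in metrics.items()}
```

With these counts the rounded percentages are 92, 96, 92 and 96, which agrees with the sensitivity, specificity, PPV and NPV quoted in the abstract.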
Application of preprocessing filtering on Decision Tree C4.5 and rough set theory
NASA Astrophysics Data System (ADS)
Chan, Joseph C. C.; Lin, Tsau Y.
2001-03-01
This paper compares two artificial intelligence methods: the Decision Tree C4.5 and Rough Set Theory on the stock market data. The Decision Tree C4.5 is reviewed with the Rough Set Theory. An enhanced window application is developed to facilitate the pre-processing filtering by introducing the feature (attribute) transformations, which allows users to input formulas and create new attributes. Also, the application produces three varieties of data set with delaying, averaging, and summation. The results prove the improvement of pre-processing by applying feature (attribute) transformations on Decision Tree C4.5. Moreover, the comparison between Decision Tree C4.5 and Rough Set Theory is based on the clarity, automation, accuracy, dimensionality, raw data, and speed, which is supported by the rules sets generated by both algorithms on three different sets of data.
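The three derived data sets mentioned above (delaying, averaging and summation) can be pictured as simple transformations of a time series. The sketch below is only one plausible reading: the record does not give the paper's exact formulas, so the window semantics and the function names `delay`, `rolling_sum` and `rolling_mean` are assumptions. It expects `0 <= lag <= len(series)` and `window >= 1`.

```python
def delay(series, lag):
    """Shift the series by `lag` steps; the first `lag` entries have no value."""
    return [None] * lag + series[:len(series) - lag]


def rolling_sum(series, window):
    """Sum of the last `window` values; None until a full window is available."""
    return [
        sum(series[i - window + 1:i + 1]) if i >= window - 1 else None
        for i in range(len(series))
    ]


def rolling_mean(series, window):
    """Average of the last `window` values, built on rolling_sum."""
    return [None if s is None else s / window for s in rolling_sum(series, window)]
```

Each transformed column can then be appended to the original attributes before the data are passed to a learner such as C4.5.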
Multivariate analysis of flow cytometric data using decision trees.
Simon, Svenja; Guthke, Reinhard; Kamradt, Thomas; Frey, Oliver
2012-01-01
Characterization of the response of the host immune system is important in understanding the bidirectional interactions between the host and microbial pathogens. For research on the host site, flow cytometry has become one of the major tools in immunology. Advances in technology and reagents now allow the simultaneous assessment of multiple markers on a single-cell level, generating multidimensional data sets that require multivariate statistical analysis. We explored the explanatory power of the supervised machine learning method called "induction of decision trees" in flow cytometric data. In order to examine whether the production of a certain cytokine depends on other cytokines, datasets from intracellular staining for six cytokines with complex patterns of co-expression were analyzed by induction of decision trees. After weighting the data according to their class probabilities, we created a total of 13,392 different decision trees for each given cytokine with different parameter settings. For a more realistic estimation of the decision trees' quality, we used stratified fivefold cross validation and chose the "best" tree according to a combination of different quality criteria. While some of the decision trees reflected previously known co-expression patterns, we found that the expression of some cytokines was not only dependent on the co-expression of others per se, but was also dependent on the intensity of expression. Thus, for the first time we successfully used induction of decision trees for the analysis of high-dimensional flow cytometric data and demonstrated the feasibility of this method to reveal structural patterns in such data sets.
The attentional drift-diffusion model extends to simple purchasing decisions.
Krajbich, Ian; Lu, Dingchao; Camerer, Colin; Rangel, Antonio
2012-01-01
How do we make simple purchasing decisions (e.g., whether or not to buy a product at a given price)? Previous work has shown that the attentional drift-diffusion model (aDDM) can provide accurate quantitative descriptions of the psychometric data for binary and trinary value-based choices, and of how the choice process is guided by visual attention. Here we extend the aDDM to the case of purchasing decisions, and test it using an eye-tracking experiment. We find that the model also provides a reasonably accurate quantitative description of the relationship between choice, reaction time, and visual fixations using parameters that are very similar to those that best fit the previous data. The only critical difference is that the choice biases induced by the fixations are about half as big in purchasing decisions as in binary choices. This suggests that a similar computational process is used to make binary choices, trinary choices, and simple purchasing decisions.
The Attentional Drift-Diffusion Model Extends to Simple Purchasing Decisions
Krajbich, Ian; Lu, Dingchao; Camerer, Colin; Rangel, Antonio
2012-01-01
How do we make simple purchasing decisions (e.g., whether or not to buy a product at a given price)? Previous work has shown that the attentional drift-diffusion model (aDDM) can provide accurate quantitative descriptions of the psychometric data for binary and trinary value-based choices, and of how the choice process is guided by visual attention. Here we extend the aDDM to the case of purchasing decisions, and test it using an eye-tracking experiment. We find that the model also provides a reasonably accurate quantitative description of the relationship between choice, reaction time, and visual fixations using parameters that are very similar to those that best fit the previous data. The only critical difference is that the choice biases induced by the fixations are about half as big in purchasing decisions as in binary choices. This suggests that a similar computational process is used to make binary choices, trinary choices, and simple purchasing decisions. PMID:22707945
Automated detection of tuberculosis on sputum smeared slides using stepwise classification
NASA Astrophysics Data System (ADS)
Divekar, Ajay; Pangilinan, Corina; Coetzee, Gerrit; Sondh, Tarlochan; Lure, Fleming Y. M.; Kennedy, Sean
2012-03-01
Routine visual slide screening for identification of tuberculosis (TB) bacilli in stained sputum slides under a microscope system is a tedious, labor-intensive task and can miss up to 50% of TB. Based on the Shannon cofactor expansion of Boolean functions for classification, a stepwise classification (SWC) algorithm is developed to remove different types of false positives, one type at a time, and to increase the detection of TB bacilli at different concentrations. Both bacilli and non-bacilli objects are first analyzed and classified into several different categories including scanty positive, high concentration positive, and several non-bacilli categories: small bright objects, beaded, dim elongated objects, etc. The morphological and contrast features are extracted based on a priori clinical knowledge. The SWC is composed of several individual classifiers. The individual classifier that increases the bacilli counts uses an adaptive algorithm based on a microbiologist's statistical heuristic decision process. The individual classifier that reduces false positives is developed through minimization from a binary decision tree to classify different types of true and false positives based on feature vectors. Finally, the detection algorithm was tested on 102 independent confirmed negative and 74 positive cases. A multi-class task analysis shows high accordance rates for negative, scanty, and high-concentration cases of 88.24%, 56.00%, and 97.96%, respectively. A binary-class task analysis using a receiver operating characteristics method with the area under the curve (Az) is also utilized to analyze the performance of this detection algorithm, showing superior detection performance on the high-concentration cases (Az=0.913) and on cases mixing high-concentration and scanty cases (Az=0.878).
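For readers unfamiliar with the expansion this abstract builds on, the Shannon cofactor expansion of a Boolean function with respect to one variable can be written as follows. The notation is the textbook one and is not taken from the paper itself.

```latex
f(x_1,\dots,x_n) \;=\;
  x_i \cdot f(x_1,\dots,x_{i-1},1,x_{i+1},\dots,x_n)
  \;+\;
  \bar{x}_i \cdot f(x_1,\dots,x_{i-1},0,x_{i+1},\dots,x_n)
```

Applying the expansion repeatedly, one variable at a time, yields a binary decision tree in which each internal node tests a single variable and its two children are the cofactors. This is presumably the sense in which a stepwise classifier can peel off one false-positive type per step.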
15 CFR Supplement 1 to Part 732 - Decision Tree
Code of Federal Regulations, 2010 CFR
2010-01-01
... 15 Commerce and Foreign Trade 2 2010-01-01 2010-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000 [69 FR 5687, Feb. 6, 2004] ...
15 CFR Supplement No 1 to Part 732 - Decision Tree
Code of Federal Regulations, 2013 CFR
2013-01-01
... 15 Commerce and Foreign Trade 2 2013-01-01 2013-01-01 false Decision Tree No Supplement No 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued... THE EAR Pt. 732, Supp. 1 Supplement No 1 to Part 732—Decision Tree ER06FE04.000 [69 FR 5687, Feb. 6...
15 CFR Supplement No 1 to Part 732 - Decision Tree
Code of Federal Regulations, 2014 CFR
2014-01-01
... 15 Commerce and Foreign Trade 2 2014-01-01 2014-01-01 false Decision Tree No Supplement No 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued... THE EAR Pt. 732, Supp. 1 Supplement No 1 to Part 732—Decision Tree ER06FE04.000 [69 FR 5687, Feb. 6...
15 CFR Supplement 1 to Part 732 - Decision Tree
Code of Federal Regulations, 2012 CFR
2012-01-01
... 15 Commerce and Foreign Trade 2 2012-01-01 2012-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000 [69 FR 5687, Feb. 6, 2004] ...
15 CFR Supplement 1 to Part 732 - Decision Tree
Code of Federal Regulations, 2011 CFR
2011-01-01
... 15 Commerce and Foreign Trade 2 2011-01-01 2011-01-01 false Decision Tree 1 Supplement 1 to Part 732 Commerce and Foreign Trade Regulations Relating to Commerce and Foreign Trade (Continued) BUREAU... THE EAR Pt. 732, Supp. 1 Supplement 1 to Part 732—Decision Tree ER06FE04.000 [69 FR 5687, Feb. 6, 2004] ...
Improved Frame Mode Selection for AMR-WB+ Based on Decision Tree
NASA Astrophysics Data System (ADS)
Kim, Jong Kyu; Kim, Nam Soo
In this letter, we propose a coding mode selection method for the AMR-WB+ audio coder based on a decision tree. In order to reduce computation while maintaining good performance, decision tree classifier is adopted with the closed loop mode selection results as the target classification labels. The size of the decision tree is controlled by pruning, so the proposed method does not increase the memory requirement significantly. Through an evaluation test on a database covering both speech and music materials, the proposed method is found to achieve a much better mode selection accuracy compared with the open loop mode selection module in the AMR-WB+.
Activity classification using realistic data from wearable sensors.
Pärkkä, Juha; Ermes, Miikka; Korpipää, Panu; Mäntyjärvi, Jani; Peltola, Johannes; Korhonen, Ilkka
2006-01-01
Automatic classification of everyday activities can be used for promotion of health-enhancing physical activities and a healthier lifestyle. In this paper, methods used for classification of everyday activities like walking, running, and cycling are described. The aim of the study was to find out how to recognize activities, which sensors are useful and what kind of signal processing and classification is required. A large and realistic data library of sensor data was collected. Sixteen test persons took part in the data collection, resulting in approximately 31 h of annotated, 35-channel data recorded in an everyday environment. The test persons carried a set of wearable sensors while performing several activities during the 2-h measurement session. Classification results of three classifiers are shown: custom decision tree, automatically generated decision tree, and artificial neural network. The classification accuracies using leave-one-subject-out cross validation range from 58 to 97% for custom decision tree classifier, from 56 to 97% for automatically generated decision tree, and from 22 to 96% for artificial neural network. Total classification accuracy is 82% for custom decision tree classifier, 86% for automatically generated decision tree, and 82% for artificial neural network.
The coalescent of a sample from a binary branching process.
Lambert, Amaury
2018-04-25
At time 0, start a time-continuous binary branching process, where particles give birth to a single particle independently (at a possibly time-dependent rate) and die independently (at a possibly time-dependent and age-dependent rate). A particular case is the classical birth-death process. Stop this process at time T>0. It is known that the tree spanned by the N tips alive at time T of the tree thus obtained (called a reduced tree or coalescent tree) is a coalescent point process (CPP), which basically means that the depths of interior nodes are independent and identically distributed (iid). Now select each of the N tips independently with probability y (Bernoulli sample). It is known that the tree generated by the selected tips, which we will call the Bernoulli sampled CPP, is again a CPP. Now instead, select exactly k tips uniformly at random among the N tips (a k-sample). We show that the tree generated by the selected tips is a mixture of Bernoulli sampled CPPs with the same parent CPP, over some explicit distribution of the sampling probability y. An immediate consequence is that the genealogy of a k-sample can be obtained by the realization of k random variables, first the random sampling probability Y and then the k-1 node depths which are iid conditional on Y=y. Copyright © 2018. Published by Elsevier Inc.
QTest: Quantitative Testing of Theories of Binary Choice.
Regenwetter, Michel; Davis-Stober, Clintin P; Lim, Shiau Hong; Guo, Ying; Popova, Anna; Zwilling, Chris; Cha, Yun-Shil; Messner, William
2014-01-01
The goal of this paper is to make modeling and quantitative testing accessible to behavioral decision researchers interested in substantive questions. We provide a novel, rigorous, yet very general, quantitative diagnostic framework for testing theories of binary choice. This permits the nontechnical scholar to proceed far beyond traditionally rather superficial methods of analysis, and it permits the quantitatively savvy scholar to triage theoretical proposals before investing effort into complex and specialized quantitative analyses. Our theoretical framework links static algebraic decision theory with observed variability in behavioral binary choice data. The paper is supplemented with a custom-designed public-domain statistical analysis package, the QTest software. We illustrate our approach with a quantitative analysis using published laboratory data, including tests of novel versions of "Random Cumulative Prospect Theory." A major asset of the approach is the potential to distinguish decision makers who have a fixed preference and commit errors in observed choices from decision makers who waver in their preferences.
A universal hybrid decision tree classifier design for human activity classification.
Chien, Chieh; Pottie, Gregory J
2012-01-01
A system that reliably classifies daily life activities can contribute to more effective and economical treatments for patients with chronic conditions or undergoing rehabilitative therapy. We propose a universal hybrid decision tree classifier for this purpose. The tree classifier can flexibly implement different decision rules at its internal nodes, and can be adapted from a population-based model when supplemented by training data for individuals. The system was tested using seven subjects each monitored by 14 triaxial accelerometers. Each subject performed fourteen different activities typical of daily life. Using leave-one-out cross validation, our decision tree produced average classification accuracies of 89.9%. In contrast, the MATLAB personalized tree classifiers using Gini's diversity index as the split criterion followed by optimally tuning the thresholds for each subject yielded 69.2%.
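The Gini diversity index mentioned as the split criterion can be sketched in a few lines. The example computes the impurity of a label set and picks the threshold on a single feature that minimizes the weighted impurity of the two children. The function names and the toy accelerometer values are hypothetical; this is a generic illustration, not the MATLAB implementation referred to in the abstract.

```python
from collections import Counter


def gini(labels):
    """Gini diversity index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(values, labels, threshold):
    """Size-weighted Gini impurity of the children of the split x <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; return the best one.

    Returns None when the feature has only one distinct value.
    """
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates,
               key=lambda t: split_impurity(values, labels, t),
               default=None)


if __name__ == "__main__":
    accel = [1, 2, 8, 9]                      # hypothetical feature values
    activity = ["sit", "sit", "run", "run"]   # hypothetical labels
    t = best_threshold(accel, activity)
    print(t, split_impurity(accel, activity, t))
```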
An Isometric Mapping Based Co-Location Decision Tree Algorithm
NASA Astrophysics Data System (ADS)
Zhou, G.; Wei, J.; Zhou, X.; Zhang, R.; Huang, W.; Sha, H.; Chen, J.
2018-05-01
Decision tree (DT) induction has been widely used in pattern classification. However, most traditional DTs have the disadvantage that they consider only non-spatial attributes (i.e., spectral information) when classifying pixels, which can result in objects being misclassified. Therefore, some researchers have proposed a co-location decision tree (Cl-DT) method, which combines co-location and decision tree to solve the above-mentioned problems of traditional decision trees. Cl-DT overcomes the shortcomings of the existing DT algorithms, which create a node for each value of a given attribute, and has a higher accuracy than the existing decision tree approach. However, for non-linearly distributed data instances, the Euclidean distance between instances does not reflect the true positional relationship between them. In order to overcome these shortcomings, this paper proposes an isometric mapping method based on Cl-DT (called Isomap-based Cl-DT), which is a method that combines heterogeneous and Cl-DT together. Because isometric mapping methods use geodesic distances instead of Euclidean distances between non-linearly distributed instances, the true distance between instances can be reflected. The experimental results and several comparative analyses show that: (1) The extraction method of exposed carbonate rocks is of high accuracy. (2) The proposed method has many advantages, because the total number of nodes, the number of leaf nodes and the number of nodes are greatly reduced compared to Cl-DT. Therefore, the Isomap-based Cl-DT algorithm can construct a more accurate and faster decision tree.
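The difference between straight-line distance and the graph-based distance that isometric mapping relies on can be shown with a small sketch. Points are placed on a half circle, a k-nearest-neighbour graph is built, and the shortest path through that graph approximates the distance along the curve. The function names, the half-circle data, and the choice k=2 are illustrative assumptions. Only the distance step of Isomap is shown, not the embedding or the Cl-DT construction.

```python
import heapq
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j)
                       for j in range(n) if j != i)
        for dist, j in dists[:k]:
            graph[i][j] = dist
            graph[j][i] = dist
    return graph


def geodesic_distance(graph, source, target):
    """Shortest-path length between two nodes (Dijkstra's algorithm)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale heap entry
        for nbr, weight in graph[node].items():
            cand = dist + weight
            if cand < best.get(nbr, math.inf):
                best[nbr] = cand
                heapq.heappush(heap, (cand, nbr))
    return math.inf  # target not reachable


# 50 points on a unit half circle: the two end points are 2 apart in a
# straight line but about pi apart when measured along the curve.
points = [(math.cos(math.pi * i / 49), math.sin(math.pi * i / 49))
          for i in range(50)]
graph = knn_graph(points, k=2)
euclid = math.dist(points[0], points[-1])
geo = geodesic_distance(graph, 0, len(points) - 1)

if __name__ == "__main__":
    print(f"Euclidean: {euclid:.3f}, graph geodesic: {geo:.3f}")
```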
Wang, Ting; Li, Weiying; Zheng, Xiaofeng; Lin, Zhifen; Kong, Deyang
2014-02-01
During the past decades, there has been an increasing number of studies on the estrogenic activities of environmental pollutants in amphibians, and many determination methods have been proposed. However, these determination methods are time-consuming and expensive, and a rapid and simple method to screen and test chemicals for estrogenic activity in amphibians is therefore imperative. Herein is proposed a new decision tree, formulated not only with physicochemical parameters but also with a biological parameter, that was successfully used to screen estrogenic activities of chemicals in amphibians. The biological parameter, the CDOCKER interaction energy (Ebinding) between chemicals and the target proteins, was calculated by molecular docking, and it was used to revise the decision tree formulated by Hong, which used only physicochemical parameters, for screening estrogenic activity of chemicals in rat. According to the correlation between Ebinding of rat and Xenopus laevis, a new decision tree for estrogenic activities in Xenopus laevis is finally proposed. It was then validated using 8 randomly selected chemicals to which Xenopus laevis can be frequently exposed, and the agreement between the results from the new decision tree and those from experiments is generally satisfactory. Consequently, the new decision tree can be used to screen the estrogenic activities of chemicals, and the combined use of Ebinding and classical physicochemical parameters can greatly improve Hong's decision tree. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Stonecipher, Karl; Parrish, Joseph; Stonecipher, Megan
2018-05-18
This review is intended to update and educate the reader on the currently available options for laser vision correction, more specifically, laser-assisted in-situ keratomileusis (LASIK). In addition, some related clinical outcomes data from over 1000 cases performed over a 1-year period are presented to highlight some differences between the various treatment profiles currently available, including the rapidity of visual recovery. The cases in question were performed on the basis of a decision tree to segregate patients on the basis of anatomical, topographic and aberrometry findings; the decision tree was formulated based on the data available in some of the reviewed articles. Numerous recent studies reported in the literature provide data related to the risks and benefits of LASIK; alternatives to a laser refractive procedure are also discussed. The results from these studies have been used to prepare a decision tree to assist the surgeon in choosing the best option for the patient based on the data from several standard preoperative diagnostic tests. The data presented here should aid surgeons in understanding the effects of currently available LASIK treatment profiles. Surgeons should also be able to appreciate how the findings were used to create a decision tree to help choose the most appropriate treatment profile for patients. Finally, the retrospective evaluation of clinical outcomes based on the decision tree should provide surgeons with a realistic expectation for their own outcomes should they adopt such a decision tree in their own practice.
Analysis of the Discourse of Power in Etel Adnan's Play "Like a Christmas Tree"
ERIC Educational Resources Information Center
Alashqar, Hossam Mahmoud
2015-01-01
This paper seeks to investigate the sources of power in the discourse of an Arab-American writer, Etel Adnan's one act play, "Like a Christmas Tree." The play represents a heated argument between two figures who stand for two different ideologies and who fall within the frame of "binary opposition," transcultural…
NASA Astrophysics Data System (ADS)
Estuar, Maria Regina Justina; Victorino, John Noel; Coronel, Andrei; Co, Jerelyn; Tiausas, Francis; Señires, Chiara Veronica
2017-09-01
Use of wireless sensor networks and smartphone integration design to monitor environmental parameters surrounding plantations is made possible because of readily available and affordable sensors. Providing low cost monitoring devices would be beneficial, especially to small farm owners, in a developing country like the Philippines, where agriculture covers a significant amount of the labor market. This study discusses the integration of wireless soil sensor devices and smartphones to create an application that will use multidimensional analysis to detect the presence or absence of plant disease. Specifically, soil sensors are designed to collect soil quality parameters in a sink node from which the smartphone collects data via Bluetooth. Given these, there is a need to develop a classification model on the mobile phone that will report the infection status of a soil. Though tree classification is the most appropriate approach for continuous parameter-based datasets, there is a need to determine whether tree models will yield coherent results or not. Soil sensor data that resides on the phone is modeled using several variations of decision tree, namely: decision tree (DT), best-fit (BF) decision tree, functional tree (FT), Naive Bayes (NB) decision tree, J48, J48graft and LAD tree, where decision tree approaches the problem by considering all sensor nodes as one. Results show that there are significant differences among soil sensor parameters, indicating that there are variances in scores between the infected and uninfected sites. Furthermore, analysis of variance in accuracy, recall, precision and F1 measure scores from tree classification models shows homogeneity among NBTree, J48graft and J48 tree classification models.
FPGA implementation of concatenated non-binary QC-LDPC codes for high-speed optical transport.
Zou, Ding; Djordjevic, Ivan B
2015-06-01
In this paper, we propose a soft-decision-based FEC scheme that is the concatenation of a non-binary LDPC code and a hard-decision FEC code. The proposed NB-LDPC + RS with overhead of 27.06% provides a superior NCG of 11.9 dB at a post-FEC BER of 10^-15. As a result, the proposed NB-LDPC codes represent a strong soft-decision FEC candidate for beyond-100 Gb/s optical transmission systems.
Lo, Benjamin W Y; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A H
2016-01-01
Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56-2.45, P < 0.01). A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.
A survey of decision tree classifier methodology
NASA Technical Reports Server (NTRS)
Safavian, S. R.; Landgrebe, David
1991-01-01
Decision tree classifiers (DTCs) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTCs is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTCs over single-stage classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.
A survey of decision tree classifier methodology
NASA Technical Reports Server (NTRS)
Safavian, S. Rasoul; Landgrebe, David
1990-01-01
Decision Tree Classifiers (DTC's) are used successfully in many diverse areas such as radar signal classification, character recognition, remote sensing, medical diagnosis, expert systems, and speech recognition. Perhaps the most important feature of DTC's is their capability to break down a complex decision-making process into a collection of simpler decisions, thus providing a solution which is often easier to interpret. A survey of current methods is presented for DTC designs and the various existing issues. After considering potential advantages of DTC's over single-stage classifiers, subjects of tree structure design, feature selection at each internal node, and decision and search strategies are discussed.
Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.
Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G
2017-09-01
To investigate whether the use of ensemble learning algorithms improves physical activity recognition accuracy compared to single classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one-subject-out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
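Of the three decision fusion methods named in this abstract, weighted majority voting is the simplest to sketch. Each base classifier casts a vote for its predicted label, scaled by a weight, and the label with the largest total wins. How the weights are obtained is an assumption here (for instance a validation accuracy per classifier); the abstract does not specify it, and the function name is hypothetical.

```python
from collections import defaultdict


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting classifiers carry the largest total weight.

    predictions: one predicted label per base classifier
    weights:     one non-negative weight per base classifier (assumed to
                 reflect how much each classifier is trusted)
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical outputs of a tree, kNN, SVM and neural network
    votes = ["walk", "run", "walk", "run"]
    trust = [0.9, 0.6, 0.8, 0.7]
    print(weighted_majority_vote(votes, trust))
```

With equal weights this reduces to a plain majority vote, so the weights only matter when the base classifiers disagree.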
Two Dimensional Path Planning with Obstacles and Shadows.
1987-01-01
College Park, MD 20742 ...quadtree is a tip node of the tree. It represents a uniformly colored square region of the picture. A gray node of the tree is a node...Sight Algorithm traversal of the quadtree, they can be sorted using a binary tree by their relative location on the line of sight, given by the x or y
Study on the Secant Segmentation Algorithm of Rubber Tree
NASA Astrophysics Data System (ADS)
Li, Shute; Zhang, Jie; Zhang, Jian; Sun, Liang; Liu, Yongna
2018-04-01
Natural rubber is one of the most important materials in national defense and industry, and tapping panel dryness (TPD) of the rubber tree is one of the most serious diseases affecting the production of rubber. Although considerable progress has been made in more than 100 years of research on TPD, there are still many areas to be improved. At present, artificial observation is widely used to identify TPD, but the diversity of rubber tree secant symptoms leads to inaccurate judgement of the level of TPD. In this paper, image processing technology is used to separate the secant and the latex, so that interference factors can be removed and exact binary images of the secant and the latex can be obtained. By calculating the area ratio of the corresponding binary images, the grade of TPD can be classified accurately, which can also provide an objective basis for the accurate identification of the TPD level.
Binary-space-partitioned images for resolving image-based visibility.
Fu, Chi-Wing; Wong, Tien-Tsin; Tong, Wai-Shun; Tang, Chi-Keung; Hanson, Andrew J
2004-01-01
We propose a novel 2D representation for 3D visibility sorting, the Binary-Space-Partitioned Image (BSPI), to accelerate real-time image-based rendering. BSPI is an efficient 2D realization of a 3D BSP tree, which is commonly used in computer graphics for time-critical visibility sorting. Since the overall structure of a BSP tree is encoded in a BSPI, traversing a BSPI is comparable to traversing the corresponding BSP tree. BSPI performs visibility sorting efficiently and accurately in the 2D image space by warping the reference image triangle-by-triangle instead of pixel-by-pixel. Multiple BSPIs can be combined to solve "disocclusion," when an occluded portion of the scene becomes visible at a novel viewpoint. Our method is highly automatic, including a tensor voting preprocessing step that generates candidate image partition lines for BSPIs, filters the noisy input data by rejecting outliers, and interpolates missing information. Our system has been applied to a variety of real data, including stereo, motion, and range images.
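The visibility sorting that a BSP tree provides, and that the abstract says a BSPI reproduces in image space, can be sketched with the classic back-to-front traversal. At each node the subtree on the far side of the partition from the viewer is emitted first, then the node itself, then the near side. The 2D setting, the `BSPNode` class, and the wall labels are illustrative assumptions; this is the textbook BSP traversal, not the BSPI algorithm of the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BSPNode:
    line: Tuple[float, float, float]   # (a, b, c) of the partition a*x + b*y + c = 0
    label: str                         # geometry stored at this node
    front: Optional["BSPNode"] = None  # subtree where a*x + b*y + c > 0
    back: Optional["BSPNode"] = None   # subtree where a*x + b*y + c <= 0


def back_to_front(node: Optional[BSPNode], eye: Tuple[float, float]) -> List[str]:
    """Return node labels ordered from farthest to nearest as seen from eye."""
    if node is None:
        return []
    a, b, c = node.line
    side = a * eye[0] + b * eye[1] + c
    if side > 0:
        near, far = node.front, node.back
    else:
        near, far = node.back, node.front
    # Far side first, then the partition itself, then the near side
    return back_to_front(far, eye) + [node.label] + back_to_front(near, eye)


# Three parallel walls at x = -2, x = 0 and x = 2
tree = BSPNode(
    line=(1.0, 0.0, 0.0), label="wall x=0",
    front=BSPNode(line=(1.0, 0.0, -2.0), label="wall x=2"),
    back=BSPNode(line=(1.0, 0.0, 2.0), label="wall x=-2"),
)

if __name__ == "__main__":
    print(back_to_front(tree, eye=(5.0, 0.0)))
    print(back_to_front(tree, eye=(-5.0, 0.0)))
```

The same tree serves every viewpoint: only the order in which the two children are visited changes, which is why the structure suits time-critical rendering.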
SVM-based tree-type neural networks as a critic in adaptive critic designs for control.
Deb, Alok Kanti; Jayadeva; Gopal, Madan; Chandra, Suresh
2007-07-01
In this paper, we use the approach of adaptive critic design (ACD) for control, specifically, the action-dependent heuristic dynamic programming (ADHDP) method. A least squares support vector machine (SVM) regressor has been used for generating the control actions, while an SVM-based tree-type neural network (NN) is used as the critic. After a failure occurs, the critic and action are retrained in tandem using the failure data. Failure data is binary classification data, where the number of failure states is very small compared to the number of no-failure states. The difficulty of conventional multilayer feedforward NNs in learning this type of classification data has been overcome by using the SVM-based tree-type NN, which, because it adds neurons to learn misclassified data, can learn any binary classification data without a priori choice of the number of neurons or the structure of the network. The capability of the trained controller to handle unforeseen situations is demonstrated.
Development of a diagnostic decision tree for obstructive pulmonary diseases based on real-life data
in ’t Veen, Johannes C.C.M.; Dekhuijzen, P.N. Richard; van Heijst, Ellen; Kocks, Janwillem W.H.; Muilwijk-Kroes, Jacqueline B.; Chavannes, Niels H.; van der Molen, Thys
2016-01-01
The aim of this study was to develop and explore the diagnostic accuracy of a decision tree derived from a large real-life primary care population. Data from 9297 primary care patients (45% male, mean age 53±17 years) with suspicion of an obstructive pulmonary disease was derived from an asthma/chronic obstructive pulmonary disease (COPD) service where patients were assessed using spirometry, the Asthma Control Questionnaire, the Clinical COPD Questionnaire, history data and medication use. All patients were diagnosed through the Internet by a pulmonologist. The Chi-squared Automatic Interaction Detection method was used to build the decision tree. The tree was externally validated in another real-life primary care population (n=3215). Our tree correctly diagnosed 79% of the asthma patients, 85% of the COPD patients and 32% of the asthma–COPD overlap syndrome (ACOS) patients. External validation showed a comparable pattern (correct: asthma 78%, COPD 83%, ACOS 24%). Our decision tree is considered to be promising because it was based on real-life primary care patients with a specialist's diagnosis. In most patients the diagnosis could be correctly predicted. Predicting ACOS, however, remained a challenge. The total decision tree can be implemented in computer-assisted diagnostic systems for individual patients. A simplified version of this tree can be used in daily clinical practice as a desk tool. PMID:27730177
Evolving optimised decision rules for intrusion detection using particle swarm paradigm
NASA Astrophysics Data System (ADS)
Sivatha Sindhu, Siva S.; Geetha, S.; Kannan, A.
2012-12-01
The aim of this article is to construct a practical intrusion detection system (IDS) that properly analyses the statistics of network traffic patterns and classifies them as normal or anomalous. The objective of this article is to show that the choice of effective network traffic features and a proficient machine-learning paradigm enhances the detection accuracy of an IDS. In this article, a rule-based approach with a family of six decision tree classifiers, namely Decision Stump, C4.5, Naive Bayes Tree, Random Forest, Random Tree and Representative Tree model, is introduced to perform the detection of anomalous network patterns. In particular, the proposed swarm optimisation-based approach selects the instances that compose the training set, and the optimised decision trees operate over this training set, producing classification rules with improved coverage, classification capability and generalisation ability. Experiments with the Knowledge Discovery and Data mining (KDD) data set, which has information on traffic patterns during normal and intrusive behaviour, show that the proposed algorithm produces optimised decision rules and outperforms other machine-learning algorithms.
A Decision Tree for Nonmetric Sex Assessment from the Skull.
Langley, Natalie R; Dudzik, Beatrix; Cloutier, Alesia
2018-01-01
This study uses five well-documented cranial nonmetric traits (glabella, mastoid process, mental eminence, supraorbital margin, and nuchal crest) and one additional trait (zygomatic extension) to develop a validated decision tree for sex assessment. The decision tree was built and cross-validated on a sample of 293 U.S. White individuals from the William M. Bass Donated Skeletal Collection. Ordinal scores from the six traits were analyzed using the partition modeling option in JMP Pro 12. A holdout sample of 50 skulls was used to test the model. The most accurate decision tree includes three variables: glabella, zygomatic extension, and mastoid process. This decision tree yielded 93.5% accuracy on the training sample, 94% on the cross-validated sample, and 96% on a holdout validation sample. Linear weighted kappa statistics indicate acceptable agreement among observers for these variables. Mental eminence should be avoided, and definitions and figures should be referenced carefully to score nonmetric traits. © 2017 American Academy of Forensic Sciences.
A framework for sensitivity analysis of decision trees.
Kamiński, Bogumił; Jakubczyk, Michał; Szufel, Przemysław
2018-01-01
In the paper, we consider sequential decision problems with uncertainty, represented as decision trees. Sensitivity analysis is always a crucial element of decision making and in decision trees it often focuses on probabilities. In the stochastic model considered, the user often has only limited information about the true values of probabilities. We develop a framework for performing sensitivity analysis of optimal strategies accounting for this distributional uncertainty. We design this robust optimization approach in an intuitive and not overly technical way, to make it simple to apply in daily managerial practice. The proposed framework allows for (1) analysis of the stability of the expected-value-maximizing strategy and (2) identification of strategies which are robust with respect to pessimistic/optimistic/mode-favoring perturbations of probabilities. We verify the properties of our approach in two cases: (a) probabilities in a tree are the primitives of the model and can be modified independently; (b) probabilities in a tree reflect some underlying, structural probabilities, and are interrelated. We provide a free software tool implementing the methods described.
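The kind of question this framework addresses can be illustrated with a small sketch: evaluate a decision tree by backward induction, then vary one probability to see where the expected-value-maximizing choice flips. The tree layout, the payoffs, and the function names are hypothetical and much simpler than the paper's framework, which handles perturbations of many probabilities at once. This is not the authors' software tool.

```python
def expected_value(node):
    """Backward induction: average over chance nodes, maximize at decision nodes."""
    kind = node["type"]
    if kind == "payoff":
        return node["value"]
    if kind == "chance":
        return sum(p * expected_value(child) for p, child in node["branches"])
    if kind == "decision":
        return max(expected_value(child) for _, child in node["options"])
    raise ValueError(f"unknown node type: {kind!r}")


def best_option(node):
    """Name of the option with the highest expected value at a decision node."""
    return max(node["options"], key=lambda opt: expected_value(opt[1]))[0]


def make_tree(p_success):
    """Hypothetical problem: launch a product (risky) or hold (safe payoff)."""
    launch = {"type": "chance", "branches": [
        (p_success, {"type": "payoff", "value": 100.0}),
        (1.0 - p_success, {"type": "payoff", "value": -50.0}),
    ]}
    hold = {"type": "payoff", "value": 10.0}
    return {"type": "decision", "options": [("launch", launch), ("hold", hold)]}


if __name__ == "__main__":
    # One-way sensitivity scan over the success probability
    for p in (0.2, 0.3, 0.4, 0.5, 0.6):
        tree = make_tree(p)
        print(p, best_option(tree), round(expected_value(tree), 2))
```

In this toy problem the launch option is worth 150p - 50, so it overtakes the safe payoff of 10 once p exceeds 0.4. An estimate of p close to that value signals that the recommended strategy is not stable.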
Learning accurate very fast decision trees from uncertain data streams
NASA Astrophysics Data System (ADS)
Liang, Chunquan; Zhang, Yang; Shi, Peng; Hu, Zhengguo
2015-12-01
Most existing works on data stream classification assume the streaming data is precise and definite. Such assumption, however, does not always hold in practice, since data uncertainty is ubiquitous in data stream applications due to imprecise measurement, missing values, privacy protection, etc. The goal of this paper is to learn accurate decision tree models from uncertain data streams for classification analysis. On the basis of very fast decision tree (VFDT) algorithms, we proposed an algorithm for constructing an uncertain VFDT tree with classifiers at tree leaves (uVFDTc). The uVFDTc algorithm can exploit uncertain information effectively and efficiently in both the learning and the classification phases. In the learning phase, it uses Hoeffding bound theory to learn from uncertain data streams and yield fast and reasonable decision trees. In the classification phase, at tree leaves it uses uncertain naive Bayes (UNB) classifiers to improve the classification performance. Experimental results on both synthetic and real-life datasets demonstrate the strong ability of uVFDTc to classify uncertain data streams. The use of UNB at tree leaves has improved the performance of uVFDTc, especially the any-time property, the benefit of exploiting uncertain information, and the robustness against uncertainty.
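The Hoeffding bound that VFDT-style learners rely on can be sketched as follows. This shows only the standard split test for precise data, not the uncertainty handling of uVFDTc; the function names and the example numbers are assumptions made for illustration.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean observed
    over n independent samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split a leaf once the observed advantage of the best attribute over
    the runner-up exceeds the Hoeffding bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    eps = hoeffding_bound(value_range=1.0, delta=1e-7, n=1000)
    print(round(eps, 4), should_split(0.30, 0.15, 1.0, 1e-7, 1000))
```

Because the bound shrinks as the number of examples seen at a leaf grows, a leaf that cannot be split yet may become splittable later in the stream without storing the examples themselves.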
Reconstructing Unrooted Phylogenetic Trees from Symbolic Ternary Metrics.
Grünewald, Stefan; Long, Yangjing; Wu, Yaokun
2018-03-09
Böcker and Dress (Adv Math 138:105-125, 1998) presented a 1-to-1 correspondence between symbolically dated rooted trees and symbolic ultrametrics. We consider the corresponding problem for unrooted trees. More precisely, given a tree T with leaf set X and a proper vertex coloring of its interior vertices, we can map every triple of three different leaves to the color of its median vertex. We characterize all ternary maps that can be obtained in this way in terms of 4- and 5-point conditions, and we show that the corresponding tree and its coloring can be reconstructed from a ternary map that satisfies those conditions. Further, we give an additional condition that characterizes whether the tree is binary, and we describe an algorithm that reconstructs general trees in a bottom-up fashion.
Klein, M D; Rabbani, A B; Rood, K D; Durham, T; Rosenberg, N M; Bahr, M J; Thomas, R L; Langenburg, S E; Kuhns, L R
2001-09-01
The authors compared 3 quantitative methods for assisting clinicians in the differential diagnosis of abdominal pain in children, where the most common important endpoint is whether the patient has appendicitis. Pretest probabilities in different age and sex groups were determined to perform Bayesian analysis, binary logistic regression was used to determine which variables were statistically significantly likely to contribute to a diagnosis, and recursive partitioning was used to build decision trees with quantitative endpoints. The records of all children (1,208) seen at a large urban emergency department (ED) with a chief complaint of abdominal pain were reviewed retrospectively soon after the encounter (24 to 72 hours later). Attempts were made to contact all the patients' families to determine an accurate final diagnosis. A total of 1,008 (83%) families were contacted. Data were analyzed by calculation of the posttest probability, recursive partitioning, and binary logistic regression. In all groups the most common diagnosis was abdominal pain (ICD-9 Code 789). After this, however, the order of the most common final diagnoses for abdominal pain varied significantly. The entire group had a pretest probability of appendicitis of 0.06. This varied with age and sex from 0.02 in boys 2 to 5 years old to 0.16 in boys older than 12 years. In boys age 5 to 12, recursive partitioning and binary logistic regression agreed on guarding and anorexia as important variables. Guarding and tenderness were important in girls age 5 to 12. In boys age greater than 12, both agreed on guarding and anorexia. Using sensitivities and specificities from the literature, computed tomography improved the posttest probability for the group from 0.06 to 0.33; ultrasound improved it from 0.06 to 0.48; and barium enema improved it from 0.06 to 0.58. Knowing the pretest probabilities in a specific population allows the physician to evaluate the likely diagnoses first. 
Other quantitative methods can help judge how much importance a certain criterion should have in the decision making and how much a particular test is likely to influence the probability of a correct diagnosis. It now should be possible to make these sophisticated quantitative methods readily available to clinicians via the computer. Copyright 2001 by W.B. Saunders Company.
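The step from a pretest to a posttest probability follows Bayes' rule in odds form. The sketch below is an illustration only: the abstract does not report the sensitivities and specificities it took from the literature, so the values of 0.90 used in the example are hypothetical and will not reproduce the reported posttest probabilities.

```python
def posttest_probability(pretest, sensitivity, specificity, positive=True):
    """Posttest probability of disease after a positive or negative test.

    Uses the odds form of Bayes' rule:
    posttest odds = pretest odds * likelihood ratio.
    """
    pretest_odds = pretest / (1.0 - pretest)
    if positive:
        likelihood_ratio = sensitivity / (1.0 - specificity)
    else:
        likelihood_ratio = (1.0 - sensitivity) / specificity
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


if __name__ == "__main__":
    # Pretest probability 0.06 is from the abstract; 0.90/0.90 are assumed.
    print(posttest_probability(0.06, sensitivity=0.90, specificity=0.90))
    print(posttest_probability(0.06, 0.90, 0.90, positive=False))
```

The same function shows why a low pretest probability limits what any single test can achieve: even a fairly accurate test leaves the posttest probability well below certainty when the starting probability is only 0.06.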
Real-Time Speech/Music Classification With a Hierarchical Oblique Decision Tree
2008-04-01
REAL-TIME SPEECH/MUSIC CLASSIFICATION WITH A HIERARCHICAL OBLIQUE DECISION TREE Jun Wang, Qiong Wu, Haojiang Deng, Qin Yan Institute of Acoustics...time speech/music classification with a hierarchical oblique decision tree. A set of discrimination features in frequency domain are selected...handle signals without discrimination and cannot work properly in the existence of multimedia signals. This paper proposes a real-time speech/music
PCA based feature reduction to improve the accuracy of decision tree c4.5 classification
NASA Astrophysics Data System (ADS)
Nasution, M. Z. F.; Sitompul, O. S.; Ramli, M.
2018-03-01
Attribute splitting is a major process in Decision Tree C4.5 classification. However, this process does not have a significant impact on building the decision tree in terms of removing irrelevant features. A major problem in the decision tree classification process is over-fitting, which results from noisy data and irrelevant features. In turn, over-fitting creates misclassification and data imbalance. Many algorithms have been proposed to overcome misclassification and over-fitting in Decision Tree C4.5 classification. Feature reduction is one of the important issues in classification modelling; it is intended to remove irrelevant data in order to improve accuracy. The feature reduction framework is used to simplify high-dimensional data to low-dimensional data with non-correlated attributes. In this research, we propose a framework for selecting relevant and non-correlated feature subsets. We consider principal component analysis (PCA) for feature reduction to perform non-correlated feature selection and the Decision Tree C4.5 algorithm for classification. In experiments conducted using the Cervical cancer data set from the UCI repository, with 858 instances and 36 attributes, we evaluated the performance of our framework based on accuracy, specificity and precision. Experimental results show that our proposed framework robustly enhances classification accuracy, reaching a 90.70% accuracy rate.
Chi, Chia-Fen; Tseng, Li-Kai; Jang, Yuh
2012-07-01
Many disabled individuals lack extensive knowledge about assistive technology, which could help them use computers. In 1997, Denis Anson developed a decision tree of 49 evaluative questions designed to evaluate the functional capabilities of the disabled user and choose an appropriate combination of assistive devices, from a selection of 26, that enable the individual to use a computer. In general, occupational therapists guide the disabled users through this process. They often have to go over repetitive questions in order to find an appropriate device. A disabled user may require an alphanumeric entry device, a pointing device, an output device, a performance enhancement device, or some combination of these. Therefore, the current research eliminates redundant questions and divides Anson's decision tree into multiple independent subtrees to meet the actual demand of computer users with disabilities. The modified decision tree was tested by six disabled users to prove it can determine a complete set of assistive devices with a smaller number of evaluative questions. The means to insert new categories of computer-related assistive devices was included to ensure the decision tree can be expanded and updated. The current decision tree can help the disabled users and assistive technology practitioners to find appropriate computer-related assistive devices that meet with clients' individual needs in an efficient manner.
Uncertain decision tree inductive inference
NASA Astrophysics Data System (ADS)
Zarban, L.; Jafari, S.; Fakhrahmad, S. M.
2011-10-01
Induction is the process of reasoning in which general rules are formulated based on limited observations of recurring phenomenal patterns. Decision tree learning is one of the most widely used and practical inductive methods, which represents the results in a tree scheme. Various decision tree algorithms have already been proposed, such as CLS, ID3, Assistant, C4.5, REPTree and Random Tree. These algorithms suffer from some major shortcomings. In this article, after discussing the main limitations of the existing methods, we introduce a new decision tree induction algorithm, which overcomes all the problems existing in its counterparts. The new method uses bit strings and maintains important information on them. The use of bit strings and logical operations on them makes the induction process fast. The method also has several important features: it deals with inconsistencies in data, avoids overfitting and handles uncertainty. We also illustrate more advantages and the new features of the proposed method. The experimental results show the effectiveness of the method in comparison with other methods existing in the literature.
ERIC Educational Resources Information Center
Liu, Tsung-Yu
2016-01-01
This study investigates how educational games affect students' academic performance and multimedia flow experiences in a computer science course. A curriculum consisting of five basic learning units (the stack, queue, sort, tree traversal, and binary search tree) was conducted for 110 university students during one semester. Two groups…
Comparative Issues and Methods in Organizational Diagnosis. Report II. The Decision Tree Approach.
organizational diagnosis. The advantages and disadvantages of the decision-tree approach generally, and in this study specifically, are examined. A pre-test, using a civilian sample of 174 work groups with Survey of Organizations data, was conducted to assess various decision-tree classification criteria, in terms of their similarity to the distance function used by Bowers and Hausser (1977). The results suggested the use of a large developmental sample, which should result in more distinctly defined boundary lines between classification profiles. Also, the decision matrix
Durham, Erin-Elizabeth A; Yu, Xiaxia; Harrison, Robert W
2014-12-01
Effective machine-learning handles large datasets efficiently. One key feature of handling large data is the use of databases such as MySQL. The freeware fuzzy decision tree induction tool, FDT, is a scalable supervised-classification software tool implementing fuzzy decision trees. It is based on an optimized fuzzy ID3 (FID3) algorithm. FDT 2.0 improves upon FDT 1.0 by bridging the gap between data science and data engineering: it combines a robust decisioning tool with data retention for future decisions, so that the tool does not need to be recalibrated from scratch every time a new decision is required. In this paper we briefly review the analytical capabilities of the freeware FDT tool and its major features and functionalities; examples of large biological datasets from HIV, microRNAs and sRNAs are included. This work shows how to integrate fuzzy decision algorithms with modern database technology. In addition, we show that integrating the fuzzy decision tree induction tool with database storage allows for optimal user satisfaction in today's Data Analytics world.
Toward Predicting Social Support Needs in Online Health Social Networks.
Choi, Min-Je; Kim, Sung-Hee; Lee, Sukwon; Kwon, Bum Chul; Yi, Ji Soo; Choo, Jaegul; Huh, Jina
2017-08-02
While online health social networks (OHSNs) serve as an effective platform for patients to fulfill their various social support needs, predicting the needs of users and providing tailored information remains a challenge. The objective of this study was to discriminate important features for identifying users' social support needs based on knowledge gathered from survey data. This study also provides guidelines for a technical framework, which can be used to predict users' social support needs based on raw data collected from OHSNs. We initially conducted a Web-based survey with 184 OHSN users. From this survey data, we extracted 34 features based on 5 categories: (1) demographics, (2) reading behavior, (3) posting behavior, (4) perceived roles in OHSNs, and (5) values sought in OHSNs. Features from the first 4 categories were used as variables for binary classification. For the prediction outcomes, we used features from the last category: the needs for emotional support, experience-based information, unconventional information, and medical facts. We compared 5 binary classifier algorithms: gradient boosting tree, random forest, decision tree, support vector machines, and logistic regression. We then calculated the scores of the area under the receiver operating characteristic (ROC) curve (AUC) to understand the comparative effectiveness of the used features. The best performance was AUC scores of 0.89 for predicting users seeking emotional support, 0.86 for experience-based information, 0.80 for unconventional information, and 0.83 for medical facts. With the gradient boosting tree as our best performing model, we analyzed the strength of individual features in predicting one's social support need. Among other discoveries, we found that users seeking emotional support tend to post more in OHSNs compared with others. We developed an initial framework for automatically predicting social support needs in OHSNs using survey data. 
Future work should involve nonsurvey data to evaluate the feasibility of the framework. Our study contributes to providing personalized social support in OHSNs. ©Min-Je Choi, Sung-Hee Kim, Sukwon Lee, Bum Chul Kwon, Ji Soo Yi, Jaegul Choo, Jina Huh. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 02.08.2017.
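The AUC scores used to compare the classifiers can be computed directly from labels and predicted scores. The sketch below uses the pairwise (Mann-Whitney) definition: the fraction of positive/negative pairs in which the positive example receives the higher score. The function name and toy data are assumptions for illustration, not material from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    example is scored higher; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

This quadratic-time version is meant for clarity; for large data sets a rank-based implementation gives the same value more efficiently.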
Lee, Saro; Park, Inhye
2013-09-30
Subsidence of ground caused by underground mines poses hazards to human life and property. This study analyzed ground-subsidence hazard using factors that can affect ground subsidence and a decision tree approach in a geographic information system (GIS). The study area was Taebaek, Gangwon-do, Korea, where many abandoned underground coal mines exist. Spatial data, topography, geology, and various ground-engineering data for the subsidence area were collected and compiled in a database for mapping ground-subsidence hazard (GSH). The subsidence area was randomly split 50/50 for training and validation of the models. A data-mining classification technique was applied to the GSH mapping, and decision trees were constructed using the chi-squared automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. The frequency ratio model was also applied to the GSH mapping for comparison with a probabilistic model. The resulting GSH maps were validated using area-under-the-curve (AUC) analysis with the subsidence area data that had not been used for training the model. The highest accuracy was achieved by the decision tree model using the CHAID algorithm (94.01%), compared with the QUEST algorithm (90.37%) and the frequency ratio model (86.70%). These accuracies are higher than previously reported results for decision trees. Decision tree methods can therefore be used efficiently for GSH analysis and might be widely used for prediction of various spatial events. Copyright © 2013. Published by Elsevier Ltd.
MRI-based decision tree model for diagnosis of biliary atresia.
Kim, Yong Hee; Kim, Myung-Joon; Shin, Hyun Joo; Yoon, Haesung; Han, Seok Joo; Koh, Hong; Roh, Yun Ho; Lee, Mi-Jung
2018-02-23
To evaluate MRI findings and to generate a decision tree model for diagnosis of biliary atresia (BA) in infants with jaundice. We retrospectively reviewed features of MRI and ultrasonography (US) performed in infants with jaundice between January 2009 and June 2016 under approval of the institutional review board, including the maximum diameter of periportal signal change on MRI (MR triangular cord thickness, MR-TCT) or US (US-TCT), visibility of common bile duct (CBD) and abnormality of gallbladder (GB). Hepatic subcapsular flow was reviewed on Doppler US. We performed conditional inference tree analysis using MRI findings to generate a decision tree model. A total of 208 infants were included, 112 in the BA group and 96 in the non-BA group. Mean age at the time of MRI was 58.7 ± 36.6 days. Visibility of CBD, abnormality of GB and MR-TCT were good discriminators for the diagnosis of BA, and the MRI-based decision tree using these findings with an MR-TCT cut-off of 5.1 mm showed 97.3% sensitivity, 94.8% specificity and 96.2% accuracy. MRI-based decision tree model reliably differentiates BA in infants with jaundice. MRI can be an objective imaging modality for the diagnosis of BA. • MRI-based decision tree model reliably differentiates biliary atresia in neonatal cholestasis. • Common bile duct, gallbladder and periportal signal changes are the discriminators. • MRI has comparable performance to ultrasonography for diagnosis of biliary atresia.
Satomi, Junichiro; Ghaibeh, A Ammar; Moriguchi, Hiroki; Nagahiro, Shinji
2015-07-01
The severity of clinical signs and symptoms of cranial dural arteriovenous fistulas (DAVFs) are well correlated with their pattern of venous drainage. Although the presence of cortical venous drainage can be considered a potential predictor of aggressive DAVF behaviors, such as intracranial hemorrhage or progressive neurological deficits due to venous congestion, accurate statistical analyses are currently not available. Using a decision tree data mining method, the authors aimed at clarifying the predictability of the future development of aggressive behaviors of DAVF and at identifying the main causative factors. Of 266 DAVF patients, 89 were eligible for analysis. Under observational management, 51 patients presented with intracranial hemorrhage/infarction during the follow-up period. The authors created a decision tree able to assess the risk for the development of aggressive DAVF behavior. Evaluated by 10-fold cross-validation, the decision tree's accuracy, sensitivity, and specificity were 85.28%, 88.33%, and 80.83%, respectively. The tree shows that the main factor in symptomatic patients was the presence of cortical venous drainage. In its absence, the lesion location determined the risk of a DAVF developing aggressive behavior. Decision tree analysis accurately predicts the future development of aggressive DAVF behavior.
Park, Myonghwa; Choi, Sora; Shin, A Mi; Koo, Chul Hoi
2013-02-01
The purpose of this study was to develop a prediction model for the characteristics of older adults with depression using the decision tree method. A large dataset from the 2008 Korean Elderly Survey was used and data of 14,970 elderly people were analyzed. Target variable was depression and 53 input variables were general characteristics, family & social relationship, economic status, health status, health behavior, functional status, leisure & social activity, quality of life, and living environment. Data were analyzed by decision tree analysis, a data mining technique using SPSS Window 19.0 and Clementine 12.0 programs. The decision trees were classified into five different rules to define the characteristics of older adults with depression. Classification & Regression Tree (C&RT) showed the best prediction with an accuracy of 80.81% among data mining models. Factors in the rules were life satisfaction, nutritional status, daily activity difficulty due to pain, functional limitation for basic or instrumental daily activities, number of chronic diseases and daily activity difficulty due to disease. The different rules classified by the decision tree model in this study should contribute as baseline data for discovering informative knowledge and developing interventions tailored to these individual characteristics.
Applied Swarm-based medicine: collecting decision trees for patterns of algorithms analysis.
Panje, Cédric M; Glatzer, Markus; von Rappard, Joscha; Rothermundt, Christian; Hundsberger, Thomas; Zumstein, Valentin; Plasswilm, Ludwig; Putora, Paul Martin
2017-08-16
The objective consensus methodology has recently been applied in consensus finding in several studies on medical decision-making among clinical experts or guidelines. The main advantages of this method are an automated analysis and comparison of treatment algorithms of the participating centers which can be performed anonymously. Based on the experience from completed consensus analyses, the main steps for the successful implementation of the objective consensus methodology were identified and discussed among the main investigators. The following steps for the successful collection and conversion of decision trees were identified and defined in detail: problem definition, population selection, draft input collection, tree conversion, criteria adaptation, problem re-evaluation, results distribution and refinement, tree finalisation, and analysis. This manuscript provides information on the main steps for successful collection of decision trees and summarizes important aspects at each point of the analysis.
Shao, Q; Rowe, R C; York, P
2007-06-01
Understanding of the cause-effect relationships between formulation ingredients, process conditions and product properties is essential for developing a quality product. However, the formulation knowledge is often hidden in experimental data and not easily interpretable. This study compares neurofuzzy logic and decision tree approaches in discovering hidden knowledge from an immediate release tablet formulation database relating formulation ingredients (silica aerogel, magnesium stearate, microcrystalline cellulose and sodium carboxymethylcellulose) and process variables (dwell time and compression force) to tablet properties (tensile strength, disintegration time, friability, capping and drug dissolution at various time intervals). Both approaches successfully generated useful knowledge in the form of either "if then" rules or decision trees. Although different strategies are employed by the two approaches in generating rules/trees, similar knowledge was discovered in most cases. However, as decision trees are not able to deal with continuous dependent variables, data discretisation procedures are generally required.
Parallel object-oriented decision tree system
Kamath, Chandrika [Dublin, CA]; Cantu-Paz, Erick [Oakland, CA]
2006-02-28
A data mining decision tree system that uncovers patterns, associations, anomalies, and other statistically significant structures in data by reading and displaying data files, extracting relevant features for each of the objects, and using a method of recognizing patterns among the objects based upon object features through a decision tree that reads the data, sorts the data if necessary, determines the best manner to split the data into subsets according to some criterion, and splits the data.
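The core step described above (sort the data, determine the best split according to some criterion, then split) can be sketched for a single numeric feature. The record does not say which criterion the system uses; Gini impurity is assumed here purely as a common example, and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best binary split.

    Sorts the examples by feature value and evaluates a threshold halfway
    between each pair of distinct neighbouring values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


if __name__ == "__main__":
    print(best_split([1, 2, 3, 10, 11, 12], list("aaabbb")))
```

A full tree builder would apply this search to every feature, split on the best one, and recurse on the two subsets until a stopping rule is met.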
EEG feature selection method based on decision tree.
Duan, Lijuan; Ge, Hui; Ma, Wei; Miao, Jun
2015-01-01
This paper aims to solve the automated feature selection problem in brain-computer interfaces (BCI). In order to automate the feature selection process, we propose a novel EEG feature selection method based on decision tree (DT). During electroencephalogram (EEG) signal processing, a feature extraction method based on principal component analysis (PCA) was used, and the selection process based on decision tree was performed by searching the feature space and automatically selecting optimal features. Considering that EEG signals are a series of non-linear signals, a generalized linear classifier named support vector machine (SVM) was chosen. In order to test the validity of the proposed method, we applied the EEG feature selection method based on decision tree to BCI Competition II dataset Ia, and the experiment showed encouraging results.
Collell, Guillem; Prelec, Drazen; Patil, Kaustubh R
2018-01-31
Class imbalance presents a major hurdle in the application of classification methods. A commonly taken approach is to learn ensembles of classifiers using rebalanced data. Examples include bootstrap averaging (bagging) combined with either undersampling or oversampling of the minority class examples. However, rebalancing methods entail asymmetric changes to the examples of different classes, which in turn can introduce their own biases. Furthermore, these methods often require specifying the performance measure of interest a priori, i.e., before learning. An alternative is to employ the threshold-moving technique, which applies a threshold to the continuous output of a model, offering the possibility to adapt to a performance measure a posteriori, i.e., a plug-in method. Surprisingly, little attention has been paid to this combination of a bagging ensemble and threshold-moving. In this paper, we study this combination and demonstrate its competitiveness. Contrary to the other resampling methods, we preserve the natural class distribution of the data, resulting in well-calibrated posterior probabilities. Additionally, we extend the proposed method to handle multiclass data. We validated our method on binary and multiclass benchmark data sets using both decision trees and neural networks as base classifiers. We perform analyses that provide insights into the proposed method.
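Threshold-moving on top of an averaged ensemble output can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the ensemble members are represented only by their predicted probabilities, F1 is assumed as the performance measure of interest, and all names and numbers are hypothetical.

```python
def f1_score(labels, predictions):
    """F1 of the positive class (label 1); 0.0 when there are no true positives."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2.0 * tp / (2.0 * tp + fp + fn)


def average_probabilities(member_probs):
    """Average the positive-class probabilities of the ensemble members."""
    n_models = len(member_probs)
    return [sum(column) / n_models for column in zip(*member_probs)]


def best_threshold(labels, probs, metric=f1_score):
    """Pick, after training, the cut-off that maximizes the chosen metric."""
    best_t, best_value = 0.5, -1.0
    for t in sorted(set(probs)):
        predictions = [1 if p >= t else 0 for p in probs]
        value = metric(labels, predictions)
        if value > best_value:
            best_t, best_value = t, value
    return best_t, best_value


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    probs = average_probabilities([[0.1, 0.2, 0.3, 0.6], [0.3, 0.2, 0.5, 0.8]])
    print(best_threshold(labels, probs))
```

In this toy example the default cut-off of 0.5 misses one of the two positives, while a lower threshold recovers it. In practice the threshold would be chosen on held-out data rather than on the data used for evaluation.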
Jarnevich, Catherine S.; Talbert, Marian; Morisette, Jeffrey T.; Aldridge, Cameron L.; Brown, Cynthia; Kumar, Sunil; Manier, Daniel; Talbert, Colin; Holcombe, Tracy R.
2017-01-01
Evaluating the conditions where a species can persist is an important question in ecology both to understand tolerances of organisms and to predict distributions across landscapes. Presence data combined with background or pseudo-absence locations are commonly used with species distribution modeling to develop these relationships. However, there is not a standard method to generate background or pseudo-absence locations, and method choice affects model outcomes. We evaluated combinations of both model algorithms (simple and complex generalized linear models, multivariate adaptive regression splines, Maxent, boosted regression trees, and random forest) and background methods (random, minimum convex polygon, and continuous and binary kernel density estimator (KDE)) to assess the sensitivity of model outcomes to choices made. We evaluated six questions related to model results, including five beyond the common comparison of model accuracy assessment metrics (biological interpretability of response curves, cross-validation robustness, independent data accuracy and robustness, and prediction consistency). For our case study with cheatgrass in the western US, random forest was least sensitive to background choice and the binary KDE method was least sensitive to model algorithm choice. While this outcome may not hold for other locations or species, the methods we used can be implemented to help determine appropriate methodologies for particular research questions.
Calibrating emergent phenomena in stock markets with agent based models
Fievet, Lucas; Sornette, Didier
2018-01-01
Since the 2008 financial crisis, agent-based models (ABMs), which account for out-of-equilibrium dynamics, heterogeneous preferences, time horizons and strategies, have often been envisioned as the new frontier that could revolutionise and displace the more standard models and tools in economics. However, their adoption and generalisation is drastically hindered by the absence of general reliable operational calibration methods. Here, we start with a different calibration angle that qualifies an ABM for its ability to achieve abnormal trading performance with respect to the buy-and-hold strategy when fed with real financial data. Starting from the common definition of standard minority and majority agents with binary strategies, we prove their equivalence to optimal decision trees. This efficient representation allows us to exhaustively test all meaningful single-agent models for their potential anomalous investment performance, which we apply to the NASDAQ Composite index over the last 20 years. We uncover large significant predictive power, with anomalous Sharpe ratio and directional accuracy, in particular during the dotcom bubble and crash and the 2008 financial crisis. A principal component analysis reveals transient convergence between the anomalous minority and majority models. A novel combination of the optimal single-agent models of both classes into a two-agent model leads to remarkably superior investment performance, especially during the periods of bubbles and crashes. Our design opens the field of ABMs to construct novel types of advanced warning systems of market crises, based on the emergent collective intelligence of ABMs built on carefully designed optimal decision trees that can be reverse engineered from real financial data. PMID:29499049
Mearelli, Filippo; Fiotti, Nicola; Altamura, Nicola; Zanetti, Michela; Fernandes, Giovanni; Burekovic, Ismet; Occhipinti, Alessandro; Orso, Daniele; Giansante, Carlo; Casarsa, Chiara; Biolo, Gianni
2014-10-01
The objective of the study was to determine the accuracy of phospholipase A2 group II (PLA2-II), interferon-gamma-inducible protein 10 (IP-10), angiopoietin-2 (Ang-2), and procalcitonin (PCT) plasma levels in early ruling in/out of sepsis among systemic inflammatory response syndrome (SIRS) patients. Biomarker levels were determined in 80 SIRS patients during the first 4 h of admission to the medical ward. The final diagnosis of sepsis or non-infective SIRS was issued according to good clinical practice. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for sepsis diagnosis were assessed. The optimal biomarker combinations with clinical variables were investigated by logistic regression and decision tree (CART). PLA2-II, IP-10 and PCT, but not Ang-2, were significantly higher in septic (n = 60) than in non-infective SIRS (n = 20) patients (P ≤ 0.001, 0.027, and 0.002, respectively). PLA2-II PPV and NPV were 88 and 86%, respectively. The corresponding figures were 100 and 31% for IP-10, and 93 and 35% for PCT. Binary logistic regression model had 100% PPV and NPV, while manual and software-generated CART reached an overall accuracy of 95 and 98%, respectively, both with 100% NPV. PLA2-II and IP-10 associated with clinical variables in regression or decision tree heterogeneous models may be valuable biomarkers for sepsis diagnosis in SIRS patients admitted to medical ward (MW). Further studies are needed to introduce them into clinical practice.
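The diagnostic measures reported above all derive from the four cells of a 2×2 confusion table. The sketch below shows the definitions; the counts in the example are hypothetical (chosen only to match the group sizes of 60 septic and 20 non-infective SIRS patients) and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from confusion-table counts.

    tp/fn: diseased patients with a positive/negative test.
    fp/tn: non-diseased patients with a positive/negative test.
    Assumes every denominator is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


if __name__ == "__main__":
    # Hypothetical counts: 60 septic (54 detected), 20 non-septic (15 cleared).
    for name, value in diagnostic_metrics(tp=54, fp=5, fn=6, tn=15).items():
        print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on the proportion of septic patients in the sample, which is why a marker can combine a high PPV with a low NPV in a cohort where most patients are septic.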
The Decision Tree for Teaching Management of Uncertainty
ERIC Educational Resources Information Center
Knaggs, Sara J.; And Others
1974-01-01
A 'decision tree' consists of an outline of the patient's symptoms and a logic for decision and action. It is felt that this approach to the decisionmaking process better facilitates each learner's application of his own level of knowledge and skills. (Author)
Predicting metabolic syndrome using decision tree and support vector machine methods.
Karimi-Alavijeh, Farzaneh; Jalili, Saeed; Sadeghi, Masoumeh
2016-05-01
Metabolic syndrome, which underlies the increased prevalence of cardiovascular disease and Type 2 diabetes, is a group of metabolic abnormalities including central obesity, hypertriglyceridemia, glucose intolerance, hypertension, and dyslipidemia. Recently, artificial intelligence based health-care systems have become highly regarded because of their success in diagnosis, prediction, and choice of treatment. This study employs machine learning techniques to predict metabolic syndrome. It aims to use decision tree and support vector machine (SVM) methods to predict the 7-year incidence of metabolic syndrome. This is an applied study in which data from 2107 participants of the Isfahan Cohort Study were used. The subjects without metabolic syndrome according to the ATPIII criteria were selected. The features used in this data set include: gender, age, weight, body mass index, waist circumference, waist-to-hip ratio, hip circumference, physical activity, smoking, hypertension, antihypertensive medication use, systolic blood pressure (BP), diastolic BP, fasting blood sugar, 2-hour blood glucose, triglycerides (TGs), total cholesterol, low-density lipoprotein, high density lipoprotein-cholesterol, mean corpuscular volume, and mean corpuscular hemoglobin. Metabolic syndrome was diagnosed based on ATPIII criteria, and the decision tree and SVM methods were selected to predict it. Both methods were validated according to the criteria of sensitivity, specificity and accuracy. Sensitivity, specificity and accuracy were 0.774 (0.758), 0.74 (0.72) and 0.757 (0.739) for the SVM (decision tree) method. The results show that the SVM method outperforms the decision tree in sensitivity, specificity and accuracy. The results of the decision tree method show that TG is the most important feature in predicting metabolic syndrome. 
According to this study, in cases where only the final result of the decision is regarded as significant, the SVM method can be used with acceptable accuracy in medical decision making. This method has not been implemented in previous research.
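The three validation criteria named in this abstract can be computed from a binary confusion matrix. The following is a minimal sketch; the function name and the counts are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true cases that were detected
    specificity = tn / (tn + fp)  # share of non-cases correctly ruled out
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all correct predictions
    return sensitivity, specificity, accuracy


# Hypothetical counts, for illustration only (not the Isfahan Cohort data).
sens, spec, acc = binary_metrics(tp=80, fp=30, tn=70, fn=20)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Reporting all three figures side by side, as the abstract does, guards against a classifier that scores well on accuracy only because one class dominates the sample.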
NASA Astrophysics Data System (ADS)
Baker, Paul T.; Caudill, Sarah; Hodge, Kari A.; Talukder, Dipongkar; Capano, Collin; Cornish, Neil J.
2015-03-01
Searches for gravitational waves produced by coalescing black hole binaries with total masses ≳25 M⊙ use matched filtering with templates of short duration. Non-Gaussian noise bursts in gravitational wave detector data can mimic short signals and limit the sensitivity of these searches. Previous searches have relied on empirically designed statistics incorporating signal-to-noise ratio and signal-based vetoes to separate gravitational wave candidates from noise candidates. We report on sensitivity improvements achieved using a multivariate candidate ranking statistic derived from a supervised machine learning algorithm. We apply the random forest of bagged decision trees technique to two separate searches in the high mass (≳25 M⊙ ) parameter space. For a search which is sensitive to gravitational waves from the inspiral, merger, and ringdown of binary black holes with total mass between 25 M⊙ and 100 M⊙ , we find sensitive volume improvements as high as 70±13%-109±11% when compared to the previously used ranking statistic. For a ringdown-only search which is sensitive to gravitational waves from the resultant perturbed intermediate mass black hole with mass roughly between 10 M⊙ and 600 M⊙ , we find sensitive volume improvements as high as 61±4%-241±12% when compared to the previously used ranking statistic. We also report how sensitivity improvements can differ depending on mass regime, mass ratio, and available data quality information. Finally, we describe the techniques used to tune and train the random forest classifier that can be generalized to its use in other searches for gravitational waves.
QTest: Quantitative Testing of Theories of Binary Choice
Regenwetter, Michel; Davis-Stober, Clintin P.; Lim, Shiau Hong; Guo, Ying; Popova, Anna; Zwilling, Chris; Cha, Yun-Shil; Messner, William
2014-01-01
The goal of this paper is to make modeling and quantitative testing accessible to behavioral decision researchers interested in substantive questions. We provide a novel, rigorous, yet very general, quantitative diagnostic framework for testing theories of binary choice. This permits the nontechnical scholar to proceed far beyond traditionally rather superficial methods of analysis, and it permits the quantitatively savvy scholar to triage theoretical proposals before investing effort into complex and specialized quantitative analyses. Our theoretical framework links static algebraic decision theory with observed variability in behavioral binary choice data. The paper is supplemented with a custom-designed public-domain statistical analysis package, the QTest software. We illustrate our approach with a quantitative analysis using published laboratory data, including tests of novel versions of “Random Cumulative Prospect Theory.” A major asset of the approach is the potential to distinguish decision makers who have a fixed preference and commit errors in observed choices from decision makers who waver in their preferences. PMID:24999495
Universal artifacts affect the branching of phylogenetic trees, not universal scaling laws.
Altaba, Cristian R
2009-01-01
The superficial resemblance of phylogenetic trees to other branching structures allows searching for macroevolutionary patterns. However, such trees are just statistical inferences of particular historical events. Recent meta-analyses report finding regularities in the branching pattern of phylogenetic trees. But is this supported by evidence, or are such regularities just methodological artifacts? If so, is there any signal in a phylogeny? In order to evaluate the impact of polytomies and imbalance on tree shape, the distribution of all binary and polytomic trees of up to 7 taxa was assessed in tree-shape space. The relationship between the proportion of outgroups and the amount of imbalance introduced with them was assessed applying four different tree-building methods to 100 combinations from a set of 10 ingroup and 9 outgroup species, and performing covariance analyses. The relevance of this analysis was explored taking 61 published phylogenies, based on nucleic acid sequences and involving various taxa, taxonomic levels, and tree-building methods. All methods of phylogenetic inference are quite sensitive to the artifacts introduced by outgroups. However, published phylogenies appear to be subject to a rather effective, albeit rather intuitive control against such artifacts. The data and methods used to build phylogenetic trees are varied, so any meta-analysis is subject to pitfalls due to their uneven intrinsic merits, which translate into artifacts in tree shape. The binary branching pattern is an imposition of methods, and seldom reflects true relationships in intraspecific analyses, yielding artifactual polytomies in short trees. Above the species level, the departure of real trees from simplistic random models is caused at least by two natural factors--uneven speciation and extinction rates; and artifacts such as choice of taxa included in the analysis, and imbalance introduced by outgroups and basal paraphyletic taxa. 
This artifactual imbalance accounts for tree shape convergence of large trees. There is no evidence for any universal scaling in the tree of life. Instead, there is a need for improved methods of tree analysis that can be used to discriminate the noise due to outgroups from the phylogenetic signal within the taxon of interest, and to evaluate realistic models of evolution, correcting the retrospective perspective and explicitly recognizing extinction as a driving force. Artifacts are pervasive, and can only be overcome through understanding the structure and biological meaning of phylogenetic trees. Catalan Abstract in Translation S1.
Cost-effectiveness Analysis with Influence Diagrams.
Arias, M; Díez, F J
2015-01-01
Cost-effectiveness analysis (CEA) is used increasingly in medicine to determine whether the health benefit of an intervention is worth the economic cost. Decision trees, the standard decision modeling technique for non-temporal domains, can only perform CEA for very small problems. To develop a method for CEA in problems involving several dozen variables. We explain how to build influence diagrams (IDs) that explicitly represent cost and effectiveness. We propose an algorithm for evaluating cost-effectiveness IDs directly, i.e., without expanding an equivalent decision tree. The evaluation of an ID returns a set of intervals for the willingness to pay - separated by cost-effectiveness thresholds - and, for each interval, the cost, the effectiveness, and the optimal intervention. The algorithm that evaluates the ID directly is in general much more efficient than the brute-force method, which is in turn more efficient than the expansion of an equivalent decision tree. Using OpenMarkov, an open-source software tool that implements this algorithm, we have been able to perform CEAs on several IDs whose equivalent decision trees contain millions of branches. IDs can perform CEA on large problems that cannot be analyzed with decision trees.
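To make the notion of willingness-to-pay intervals concrete, the sketch below picks the intervention with the highest net monetary benefit for a given willingness to pay and computes the incremental cost-effectiveness ratio that separates two options. It illustrates the general CEA arithmetic only, not the influence-diagram algorithm of the paper; the intervention names, costs and effectiveness values are hypothetical.

```python
def optimal_intervention(interventions, wtp):
    """Return the name with the highest net monetary benefit, wtp * effectiveness - cost."""
    def net_benefit(name):
        cost, effectiveness = interventions[name]
        return wtp * effectiveness - cost
    return max(interventions, key=net_benefit)


def icer(a, b):
    """Incremental cost-effectiveness ratio of option b relative to option a."""
    (cost_a, eff_a), (cost_b, eff_b) = a, b
    return (cost_b - cost_a) / (eff_b - eff_a)


# Hypothetical (cost, effectiveness) pairs, for illustration only.
interventions = {
    "no treatment": (0.0, 0.0),
    "drug": (1000.0, 1.0),
    "surgery": (5000.0, 2.0),
}

# Thresholds between consecutive options, ordered by effectiveness.
threshold_low = icer(interventions["no treatment"], interventions["drug"])
threshold_high = icer(interventions["drug"], interventions["surgery"])

for wtp in (500.0, 2000.0, 5000.0):
    print(wtp, optimal_intervention(interventions, wtp))
```

In this toy case the two thresholds split the willingness-to-pay axis into three intervals, each with its own optimal intervention, which is the shape of result the abstract describes.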
ERIC Educational Resources Information Center
Chen, Gwo-Dong; Liu, Chen-Chung; Ou, Kuo-Liang; Liu, Baw-Jhiune
2000-01-01
Discusses the use of Web logs to record student behavior that can assist teachers in assessing performance and making curriculum decisions for distance learning students who are using Web-based learning systems. Adopts decision tree and data cube information processing methodologies for developing more effective pedagogical strategies. (LRW)
Genetic Algorithms and Classification Trees in Feature Discovery: Diabetes and the NHANES database
DOE Office of Scientific and Technical Information (OSTI.GOV)
Heredia-Langner, Alejandro; Jarman, Kristin H.; Amidan, Brett G.
2013-09-01
This paper presents a feature selection methodology that can be applied to datasets containing a mixture of continuous and categorical variables. Using a Genetic Algorithm (GA), this method explores a dataset and selects a small set of features relevant for the prediction of a binary (1/0) response. Binary classification trees and an objective function based on conditional probabilities are used to measure the fitness of a given subset of features. The method is applied to health data in order to find factors useful for the prediction of diabetes. Results show that our algorithm is capable of narrowing down the set of predictors to around 8 factors that can be validated using reputable medical and public health resources.
Multistage classification of multispectral Earth observational data: The design approach
NASA Technical Reports Server (NTRS)
Bauer, M. E. (Principal Investigator); Muasher, M. J.; Landgrebe, D. A.
1981-01-01
An algorithm is proposed which predicts the optimal features at every node in a binary tree procedure. The algorithm estimates the probability of error by approximating the area under the likelihood ratio function for two classes and taking into account the number of training samples used in estimating each of these two classes. Some results on feature selection techniques, particularly in the presence of a very limited set of training samples, are presented. Probabilities of error predicted by the proposed algorithm as a function of dimensionality are compared with experimental observations for aircraft and LANDSAT data. Results are obtained for both real and simulated data. Finally, two binary tree examples which use the algorithm are presented to illustrate the usefulness of the procedure.
Assessing School Readiness for a Practice Arrangement Using Decision Tree Methodology.
ERIC Educational Resources Information Center
Barger, Sara E.
1998-01-01
Questions in a decision-tree address mission, faculty interest, administrative support, and practice plan as a way of assessing arrangements for nursing faculty's clinical practice. Decisions should be based on congruence between the human resource allocation and the reward systems. (SK)
Automated Decision Tree Classification of Corneal Shape
Twa, Michael D.; Parthasarathy, Srinivasan; Roberts, Cynthia; Mahmoud, Ashraf M.; Raasch, Thomas W.; Bullimore, Mark A.
2011-01-01
Purpose: The volume and complexity of data produced during videokeratography examinations present a challenge of interpretation. As a consequence, results are often analyzed qualitatively by subjective pattern recognition or reduced to comparisons of summary indices. We describe the application of decision tree induction, an automated machine learning classification method, to discriminate between normal and keratoconic corneal shapes in an objective and quantitative way. We then compared this method with other known classification methods. Methods: The corneal surface was modeled with a seventh-order Zernike polynomial for 132 normal eyes of 92 subjects and 112 eyes of 71 subjects diagnosed with keratoconus. A decision tree classifier was induced using the C4.5 algorithm, and its classification performance was compared with the modified Rabinowitz–McDonnell index, Schwiegerling’s Z3 index (Z3), Keratoconus Prediction Index (KPI), KISA%, and Cone Location and Magnitude Index using recommended classification thresholds for each method. We also evaluated the area under the receiver operator characteristic (ROC) curve for each classification method. Results: Our decision tree classifier performed as well as or better than the other classifiers tested: accuracy was 92% and the area under the ROC curve was 0.97. Our decision tree classifier reduced the information needed to distinguish between normal and keratoconus eyes using four of 36 Zernike polynomial coefficients. The four surface features selected as classification attributes by the decision tree method were inferior elevation, greater sagittal depth, oblique toricity, and trefoil. Conclusions: Automated decision tree classification of corneal shape through Zernike polynomials is an accurate quantitative method of classification that is interpretable and can be generated from any instrument platform capable of raw elevation data output. 
This method of pattern classification is extendable to other classification problems. PMID:16357645
Surucu, Murat; Shah, Karan K; Mescioglu, Ibrahim; Roeske, John C; Small, William; Choi, Mehee; Emami, Bahman
2016-02-01
To develop decision trees predicting for tumor volume reduction in patients with head and neck (H&N) cancer using pretreatment clinical and pathological parameters. Forty-eight patients treated with definitive concurrent chemoradiotherapy for squamous cell carcinoma of the nasopharynx, oropharynx, oral cavity, or hypopharynx were retrospectively analyzed. These patients were rescanned at a median dose of 37.8 Gy and replanned to account for anatomical changes. The percentages of gross tumor volume (GTV) change from initial to rescan computed tomography (CT; %GTVΔ) were calculated. Two decision trees were generated to correlate %GTVΔ in primary and nodal volumes with 14 characteristics including age, gender, Karnofsky performance status (KPS), site, human papilloma virus (HPV) status, tumor grade, primary tumor growth pattern (endophytic/exophytic), tumor/nodal/group stages, chemotherapy regimen, and primary, nodal, and total GTV volumes in the initial CT scan. The C4.5 Decision Tree induction algorithm was implemented. The median %GTVΔ for primary, nodal, and total GTVs was 26.8%, 43.0%, and 31.2%, respectively. Type of chemotherapy, age, primary tumor growth pattern, site, KPS, and HPV status were the most predictive parameters for primary %GTVΔ decision tree, whereas for nodal %GTVΔ, KPS, site, age, primary tumor growth pattern, initial primary GTV, and total GTV volumes were predictive. Both decision trees had an accuracy of 88%. There can be significant changes in primary and nodal tumor volumes during the course of H&N chemoradiotherapy. Considering the proposed decision trees, radiation oncologists can select patients predicted to have high %GTVΔ, who would theoretically gain the most benefit from adaptive radiotherapy, in order to better use limited clinical resources. © The Author(s) 2015.
An ordinal classification approach for CTG categorization.
Georgoulas, George; Karvelis, Petros; Gavrilis, Dimitris; Stylios, Chrysostomos D; Nikolakopoulos, George
2017-07-01
Evaluation of the cardiotocogram (CTG) is a standard approach employed during pregnancy and delivery. However, its interpretation requires a high level of expertise to decide whether the recording is Normal, Suspicious or Pathological. Therefore, a number of attempts have been made over the past three decades to develop sophisticated automated systems. These systems are usually (multiclass) classification systems that assign a category to the respective CTG. However, most of these systems do not take into consideration the natural ordering of the categories associated with CTG recordings. In this work, an algorithm that explicitly takes into consideration the ordering of CTG categories, based on a binary decomposition method, is investigated. The results achieved, using the C4.5 decision tree classifier as the base classifier, show that for several performance criteria the ordinal classification approach is marginally better than the traditional multiclass classification approach, which uses the standard C4.5 algorithm.
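One common binary decomposition for ordered classes trains K-1 binary classifiers, each estimating the probability that the label exceeds a given category, and then recombines their outputs. The abstract does not say which decomposition was used, so the scheme below is an assumption, and the probabilities are hypothetical stand-ins for the outputs of base classifiers such as C4.5.

```python
def ordinal_probabilities(p_greater):
    """Combine estimates of P(y > k), k = 1..K-1, into P(y = k), k = 1..K.

    If the binary classifiers are inconsistent with one another, a middle
    difference can come out negative; this sketch does not correct for that.
    """
    probs = [1.0 - p_greater[0]]                   # P(y = 1) = 1 - P(y > 1)
    for i in range(1, len(p_greater)):
        probs.append(p_greater[i - 1] - p_greater[i])  # P(y > k-1) - P(y > k)
    probs.append(p_greater[-1])                    # P(y = K) = P(y > K-1)
    return probs


labels = ["Normal", "Suspicious", "Pathological"]
# Hypothetical outputs of the two binary classifiers:
# P(y > Normal) and P(y > Suspicious).
p_greater = [0.7, 0.2]

probs = ordinal_probabilities(p_greater)
predicted = labels[max(range(len(probs)), key=probs.__getitem__)]
print(probs, predicted)
```

Because each binary problem groups neighbouring categories together, the ordering of Normal, Suspicious and Pathological is built into the training targets, which a plain multiclass classifier ignores.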
Multi-Sensor Characterization of the Boreal Forest: Initial Findings
NASA Technical Reports Server (NTRS)
Reith, Ernest; Roberts, Dar A.; Prentiss, Dylan
2001-01-01
Results are presented from an initial a priori knowledge approach toward using complementary multi-sensor multi-temporal imagery in characterizing vegetated landscapes over a site in the Boreal Ecosystem-Atmosphere Study (BOREAS). Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and Airborne Synthetic Aperture Radar (AIRSAR) data were segmented using multiple endmember spectral mixture analysis and binary decision tree approaches. Individual date/sensor land cover maps had overall accuracies between 55.0% and 69.8%. The best eight land cover layers from all dates and sensors correctly characterized 79.3% of the cover types. An overlay approach was used to create a final land cover map. An overall accuracy of 71.3% was achieved in this multi-sensor approach, a 1.5% improvement over our most accurate single scene technique, but 8% less than the original input. Black spruce was evaluated to be particularly undermapped in the final map, possibly because it was also contained within jack pine and muskeg land coverages.
Mental Effort in Binary Categorization Aided by Binary Cues
ERIC Educational Resources Information Center
Botzer, Assaf; Meyer, Joachim; Parmet, Yisrael
2013-01-01
Binary cueing systems assist in many tasks, often alerting people about potential hazards (such as alarms and alerts). We investigate whether cues, besides possibly improving decision accuracy, also affect the effort users invest in tasks and whether the required effort in tasks affects the responses to cues. We developed a novel experimental tool…
On Parallelism and the Penman Natural Language Generation System.
1988-04-01
[Figure 2-2: System network with choosers and realization statements] ...decision. We will give a more detailed account of...2: enter the current system. The chooser of the system is in charge of selection of features. The chooser is itself a decision tree with certain...organization of a chooser is the same as a decision (discrimination) tree, and each branching point in the tree is defined by an Ask operation. For example, in
Permutation parity machines for neural cryptography.
Reyes, Oscar Mauricio; Zimmermann, Karl-Heinz
2010-06-01
Recently, synchronization was proved for permutation parity machines, multilayer feed-forward neural networks proposed as a binary variant of the tree parity machines. This ability was already used in the case of tree parity machines to introduce a key-exchange protocol. In this paper, a protocol based on permutation parity machines is proposed and its performance against common attacks (simple, geometric, majority and genetic) is studied.
Permutation parity machines for neural cryptography
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reyes, Oscar Mauricio; Escuela de Ingenieria Electrica, Electronica y Telecomunicaciones, Universidad Industrial de Santander, Bucaramanga; Zimmermann, Karl-Heinz
2010-06-15
Recently, synchronization was proved for permutation parity machines, multilayer feed-forward neural networks proposed as a binary variant of the tree parity machines. This ability was already used in the case of tree parity machines to introduce a key-exchange protocol. In this paper, a protocol based on permutation parity machines is proposed and its performance against common attacks (simple, geometric, majority and genetic) is studied.
Nonbinary Tree-Based Phylogenetic Networks.
Jetten, Laura; van Iersel, Leo
2018-01-01
Rooted phylogenetic networks are used to describe evolutionary histories that contain non-treelike evolutionary events such as hybridization and horizontal gene transfer. In some cases, such histories can be described by a phylogenetic base-tree with additional linking arcs, which can, for example, represent gene transfer events. Such phylogenetic networks are called tree-based. Here, we consider two possible generalizations of this concept to nonbinary networks, which we call tree-based and strictly-tree-based nonbinary phylogenetic networks. We give simple graph-theoretic characterizations of tree-based and strictly-tree-based nonbinary phylogenetic networks. Moreover, we show for each of these two classes that it can be decided in polynomial time whether a given network is contained in the class. Our approach also provides a new view on tree-based binary phylogenetic networks. Finally, we discuss two examples of nonbinary phylogenetic networks in biology and show how our results can be applied to them.
An automated approach to the design of decision tree classifiers
NASA Technical Reports Server (NTRS)
Argentiero, P.; Chin, P.; Beaudet, P.
1980-01-01
The classification of large dimensional data sets arising from the merging of remote sensing data with more traditional forms of ancillary data is considered. Decision tree classification, a popular approach to the problem, is characterized by the property that samples are subjected to a sequence of decision rules before they are assigned to a unique class. An automated technique for effective decision tree design which relies only on a priori statistics is presented. This procedure utilizes a set of two dimensional canonical transforms and Bayes table look-up decision rules. An optimal design at each node is derived based on the associated decision table. A procedure for computing the global probability of correct classification is also provided. An example is given in which class statistics obtained from an actual LANDSAT scene are used as input to the program. The resulting decision tree design has an associated probability of correct classification of .76 compared to the theoretically optimum .79 probability of correct classification associated with a full dimensional Bayes classifier. Recommendations for future research are included.
Evaluation of Decision Trees for Cloud Detection from AVHRR Data
NASA Technical Reports Server (NTRS)
Shiffman, Smadar; Nemani, Ramakrishna
2005-01-01
Automated cloud detection and tracking is an important step in assessing changes in radiation budgets associated with global climate change via remote sensing. Data products based on satellite imagery are available to the scientific community for studying trends in the Earth's atmosphere. The data products include pixel-based cloud masks that assign cloud-cover classifications to pixels. Many cloud-mask algorithms have the form of decision trees. The decision trees employ sequential tests that scientists designed based on empirical astrophysics studies and simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In a previous study we compared automatically learned decision trees to cloud masks included in Advanced Very High Resolution Radiometer (AVHRR) data products from the year 2000. In this paper we report the replication of the study for five-year data, and for a gold standard based on surface observations performed by scientists at weather stations in the British Islands. For our sample data, the accuracy of automatically learned decision trees was greater than the accuracy of the cloud masks (p < 0.001).
Chen, Hsiu-Chin; Bennett, Sean
2016-08-01
Little evidence shows the use of decision-tree algorithms in identifying predictors and analyzing their associations with pass rates for the NCLEX-RN® in associate degree nursing students. This longitudinal and retrospective cohort study investigated whether a decision-tree algorithm could be used to develop an accurate prediction model for the students' passing or failing the NCLEX-RN. This study used archived data from 453 associate degree nursing students in a selected program. The chi-squared automatic interaction detection analysis of the decision trees module was used to examine the effect of the collected predictors on passing/failing the NCLEX-RN. The actual percentage scores of Assessment Technologies Institute®'s RN Comprehensive Predictor® accurately identified students at risk of failing. The classification model correctly classified 92.7% of the students for passing. This study applied the decision-tree model to analyze a sequence database for developing a prediction model for early remediation in preparation for the NCLEX-RN. [J Nurs Educ. 2016;55(8):454-457.]. Copyright 2016, SLACK Incorporated.
NASA Astrophysics Data System (ADS)
Sheikh, Alireza; Amat, Alexandre Graell i.; Liva, Gianluigi
2017-12-01
We analyze the achievable information rates (AIRs) for coded modulation schemes with QAM constellations with both bit-wise and symbol-wise decoders, corresponding to the case where a binary code is used in combination with a higher-order modulation using the bit-interleaved coded modulation (BICM) paradigm and to the case where a nonbinary code over a field matched to the constellation size is used, respectively. In particular, we consider hard decision decoding, which is the preferable option for fiber-optic communication systems where decoding complexity is a concern. Recently, Liga et al. analyzed the AIRs for bit-wise and symbol-wise decoders considering what the authors called a "hard decision decoder" which, however, exploits soft information of the transition probabilities of the discrete-input discrete-output channel resulting from the hard detection. As such, the complexity of the decoder is essentially the same as the complexity of a soft decision decoder. In this paper, we analyze instead the AIRs for the standard hard decision decoder, commonly used in practice, where the decoding is based on the Hamming distance metric. We show that if standard hard decision decoding is used, bit-wise decoders yield significantly higher AIRs than symbol-wise decoders. As a result, contrary to the conclusion by Liga et al., binary decoders together with the BICM paradigm are preferable for spectrally-efficient fiber-optic systems. We also design binary and nonbinary staircase codes and show that, in agreement with the AIRs, binary codes yield better performance.
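Standard hard decision decoding, as described in this abstract, chooses the codeword closest to the received hard-detected word in Hamming distance, without using any channel probabilities. The sketch below shows that rule on a toy length-5 repetition code; the codebook and the received word are hypothetical and unrelated to the staircase codes designed in the paper.

```python
def hamming_distance(a, b):
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def hard_decision_decode(received, codebook):
    """Return the codeword with the smallest Hamming distance to `received`."""
    return min(codebook, key=lambda codeword: hamming_distance(received, codeword))


# Toy example: length-5 binary repetition code (illustration only).
codebook = ["00000", "11111"]
received = "01001"  # hard-detected bits with two flipped positions

decoded = hard_decision_decode(received, codebook)
print(decoded)
```

The decoder only ever sees the detected bits, which is what keeps its complexity below that of a decoder that also uses transition probabilities.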
Sequential decision tree using the analytic hierarchy process for decision support in rectal cancer.
Suner, Aslı; Çelikoğlu, Can Cengiz; Dicle, Oğuz; Sökmen, Selman
2012-09-01
The aim of the study is to determine the most appropriate method for construction of a sequential decision tree in the management of rectal cancer, using various patient-specific criteria and treatments such as surgery, chemotherapy, and radiotherapy. An analytic hierarchy process (AHP) was used to determine the priorities of variables. Relevant criteria used in two decision steps and their relative priorities were established by a panel of five general surgeons. Data were collected via a web-based application and analyzed using the "Expert Choice" software specifically developed for the AHP. Consistency ratios in the AHP method were calculated for each set of judgments, and the priorities of sub-criteria were determined. A sequential decision tree was constructed for the best treatment decision process, using priorities determined by the AHP method. Consistency ratios in the AHP method were calculated for each decision step, and the judgments were considered consistent. The tumor-related criterion "presence of perforation" (0.331) and the patient-surgeon-related criterion "surgeon's experience" (0.630) had the highest priority in the first decision step. In the second decision step, the tumor-related criterion "the stage of the disease" (0.230) and the patient-surgeon-related criterion "surgeon's experience" (0.281) were the paramount criteria. The results showed some variation in the ranking of criteria between the decision steps. In the second decision step, for instance, the tumor-related criterion "presence of perforation" was just the fifth. The consistency of decision support systems largely depends on the quality of the underlying decision tree. When several choices and variables have to be considered in a decision, it is very important to determine priorities. The AHP method seems to be effective for this purpose. The decision algorithm developed by this method is more realistic and will improve the quality of the decision tree. 
Copyright © 2012 Elsevier B.V. All rights reserved.
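The AHP steps mentioned in this abstract, deriving priorities from pairwise judgments and checking them with a consistency ratio, can be sketched as follows. The comparison matrix is hypothetical and not from the study; the random index values are the commonly tabulated ones attributed to Saaty, and a ratio below 0.10 is conventionally treated as acceptably consistent.

```python
# Commonly tabulated random consistency index values by matrix size.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def matrix_times_vector(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def ahp_priorities(matrix, iterations=100):
    """Approximate the principal eigenvector by power iteration, normalized to sum to 1."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = matrix_times_vector(matrix, weights)
        total = sum(product)
        weights = [value / total for value in product]
    return weights


def consistency_ratio(matrix, weights):
    """Consistency index (lambda_max - n) / (n - 1), divided by the random index."""
    n = len(matrix)
    product = matrix_times_vector(matrix, weights)
    lambda_max = sum(p / w for p, w in zip(product, weights)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


# Hypothetical pairwise comparisons of three criteria (reciprocal matrix).
comparisons = [
    [1.0, 3.0, 5.0],
    [1.0 / 3.0, 1.0, 2.0],
    [1.0 / 5.0, 1.0 / 2.0, 1.0],
]

weights = ahp_priorities(comparisons)
ratio = consistency_ratio(comparisons, weights)
print([round(w, 3) for w in weights], round(ratio, 4))
```

Power iteration is used here only to keep the sketch free of dependencies; any eigenvector routine would serve the same purpose.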
NASA Astrophysics Data System (ADS)
Beverly, D.; Ewers, B. E.; Hyde, K.; Ohara, N.; Speckman, H. N.
2015-12-01
High elevation watersheds of the Rocky Mountains region contribute over 70% of the streamflow needed for infrastructure, agriculture, and ecological processes. Snow-water yields are heterogeneous in space and time and are driven by a multitude of snow distribution processes, including snowpack evolution driven by physical and biological factors. Quantifying heterogeneity of snowpack is further complicated by vegetation perturbations; much of the Rocky Mountains has experienced significant tree mortality due to bark beetle outbreaks. Reduction of living crown area decreases canopy interception while increasing radiation to snow surfaces, which alters snowpack distribution throughout the catchment. We hypothesize that, in a complex watershed, topographic variation (i.e., slope and aspect) will have a greater effect on snowpack evolution and distribution than densities of canopy mortality due to beetle infestation. The 120 ha No Name watershed, located in southern Wyoming at 3000 m elevation, was divided into twenty-one 175 m2 parcels, in which plots were randomly assigned within each parcel. Peak snow was measured in April; in the 50 m2 plots, depths were measured every 2 m along north-south and east-west transects. Twenty-one snow pits were excavated to quantify snow densities in 10 cm increments throughout the pit profile. Forest inventories occurred the following summer. Peak snowpack levels occurred in April with mean depth of 92.3 ± 2.4 cm and peak SWE of 34.0 ± 0.84 cm. Binary decision trees accounted for 63% of the variability after including topographic indices, beetle condition of the trees, LAI, and basal area. Snow depth showed a slight positive relationship with increases in beetle mortality on slopes less than 11 degrees. Overall, topographic indices are greater drivers of snow distributions than the effects of tree mortality.
Comparison of Taxi Time Prediction Performance Using Different Taxi Speed Decision Trees
NASA Technical Reports Server (NTRS)
Lee, Hanbong
2017-01-01
In the STBO modeler and tactical surface scheduler for the ATD-2 project, taxi speed decision trees are used to calculate the unimpeded taxi times of flights taxiing on the airport surface. The initial taxi speed values in these decision trees did not predict taxi times accurately. Using more recent, reliable surveillance data, new taxi speed values in the ramp area and movement area were computed. Before integrating these values into the STBO system, we performed test runs using live data from Charlotte airport, with different taxi speed settings: 1) initial taxi speed values and 2) new ones. Taxi time prediction performance was evaluated by comparing various metrics. The results show that the new taxi speed decision trees can calculate the unimpeded taxi-out times more accurately.
NASA Astrophysics Data System (ADS)
Bouffon, T.; Rice, R.; Bales, R.
2006-12-01
The spatial distributions of snow water equivalent (SWE) and snow depth within a 1, 4, and 16 km2 grid element around two automated snow pillows in a forested and open-forested region of the Upper Merced River Basin (2,800 km2) of Yosemite National Park were characterized using field observations and analyzed using binary regression trees. Snow surveys occurred at the forested site during the accumulation and ablation seasons, while at the open-forest site a survey was performed only during the accumulation season. An average of 130 snow depth and 7 snow density measurements were made on each survey, within the 4 km2 grid. Snow depth was distributed using binary regression trees and geostatistical methods using the physiographic parameters (e.g. elevation, slope, vegetation, aspect). Results in the forest region indicate that the snow pillow overestimated average SWE within the 1, 4, and 16 km2 areas by 34 percent during ablation. During accumulation, the snow pillow provided a good estimate of the modeled mean SWE grid value, although it is suspected that the snow pillow was underestimating SWE. At the open forest site, however, the snow pillow was 28 percent greater than the mean modeled grid element during accumulation. In addition, the binary regression trees indicate that the independent variables of vegetation, slope, and aspect are the most influential parameters of snow depth distribution. The binary regression tree and multivariate linear regression models explain about 60 percent of the initial variance for snow depth and 80 percent for density, respectively. This short-term study provides motivation and direction for the installation of a distributed snow measurement network to fill the information gap in basin-wide SWE and snow depth measurements.
Guided by these results, a distributed snow measurement network was installed in the Fall 2006 at Gin Flat in the Upper Merced River Basin with the specific objective of measuring accumulation and ablation across topographic variables with the aim of providing guidance for future larger scale observation network designs.
Low intensity magnetic field influences short-term memory: A study in a group of healthy students.
Navarro, Enrique A; Gomez-Perretta, Claudio; Montes, Francisco
2016-01-01
This study analyzes if an external magnetic stimulus (2 kHz and approximately 0.1 μT applied near frontal cortex) influences working memory, perception, binary decision, motor execution, and sustained attention in humans. A magnetic stimulus and a sham stimulus were applied to both sides of the head (frontal cortex close to temporal-parietal area) in young and healthy male test subjects (n = 65) while performing Sternberg's memory scanning task. There was a significant change in reaction time. Times recorded for perception, sustained attention, and motor execution were lower in exposed subjects (P < 0.01). However, time employed in binary decision increased for subjects exposed to magnetic fields. From results, it seems that a low intensity 2 kHz exposure modifies short-term working memory, as well as perception, binary decision, motor execution, and sustained attention. © 2015 Wiley Periodicals, Inc.
Zhao, Yang; Zheng, Wei; Zhuo, Daisy Y; Lu, Yuefeng; Ma, Xiwen; Liu, Hengchang; Zeng, Zhen; Laird, Glen
2017-10-11
Personalized medicine, or tailored therapy, has been an active and important topic in recent medical research. Many methods have been proposed in the literature for predictive biomarker detection and subgroup identification. In this article, we propose a novel decision tree-based approach applicable in randomized clinical trials. We model the prognostic effects of the biomarkers using additive regression trees and the biomarker-by-treatment effect using a single regression tree. A Bayesian approach is used to periodically revise the split variables and the split rules of the decision trees, which provides a better overall fit. A Gibbs sampler is implemented in the MCMC procedure, which updates the prognostic trees and the interaction tree separately. We use the posterior distribution of the interaction tree to construct the predictive scores of the biomarkers and to identify the subgroup where the treatment is superior to the control. Numerical simulations show that our proposed method performs well under various settings compared to existing methods. We also demonstrate an application of our method in a real clinical trial.
RE-Powering’s Electronic Decision Tree
Developed by US EPA's RE-Powering America's Land Initiative, the RE-Powering Decision Trees tool guides interested parties through a process to screen sites for their suitability for solar photovoltaics or wind installations.
Modeling Search Behaviors during the Acquisition of Expertise in a Sequential Decision-Making Task.
Moënne-Loccoz, Cristóbal; Vergara, Rodrigo C; López, Vladimir; Mery, Domingo; Cosmelli, Diego
2017-01-01
Our daily interaction with the world is full of situations in which we develop expertise through self-motivated repetition of the same task. In many of these interactions, and especially when dealing with computer and machine interfaces, we must deal with sequences of decisions and actions. For instance, when drawing cash from an ATM, choices are presented in a step-by-step fashion and a specific sequence of choices must be performed in order to produce the expected outcome. But, as we become experts in the use of such interfaces, is it possible to identify specific search and learning strategies? And if so, can we use this information to predict future actions? In addition to better understanding the cognitive processes underlying sequential decision making, this could allow building adaptive interfaces that can facilitate interaction at different moments of the learning curve. Here we tackle the question of modeling sequential decision-making behavior in a simple human-computer interface that instantiates a 4-level binary decision tree (BDT) task. We record behavioral data from voluntary participants while they attempt to solve the task. Using a Hidden Markov Model-based approach that capitalizes on the hierarchical structure of behavior, we then model their performance during the interaction. Our results show that partitioning the problem space into a small set of hierarchically related stereotyped strategies can potentially capture a host of individual decision making policies. This allows us to follow how participants learn and develop expertise in the use of the interface. Moreover, using a Mixture of Experts based on these stereotyped strategies, the model is able to predict the behavior of participants that master the task.
Decision Tree Approach for Soil Liquefaction Assessment
Gandomi, Amir H.; Fridline, Mark M.; Roke, David A.
2013-01-01
In the current study, the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used in this study. Three decision tree techniques are used here in two different ways, considering statistical and engineering points of view, to develop decision rules. The DT results are compared to the logistic regression (LR) model. The results of this study indicate that the DTs not only successfully predict liquefaction but they can also outperform the LR model. The best DT models are interpreted and evaluated based on an engineering point of view. PMID:24489498
Decision tree approach for soil liquefaction assessment.
Gandomi, Amir H; Fridline, Mark M; Roke, David A
2013-01-01
In the current study, the performances of some decision tree (DT) techniques are evaluated for postearthquake soil liquefaction assessment. A database containing 620 records of seismic parameters and soil properties is used in this study. Three decision tree techniques are used here in two different ways, considering statistical and engineering points of view, to develop decision rules. The DT results are compared to the logistic regression (LR) model. The results of this study indicate that the DTs not only successfully predict liquefaction but they can also outperform the LR model. The best DT models are interpreted and evaluated based on an engineering point of view.
Fast Image Texture Classification Using Decision Trees
NASA Technical Reports Server (NTRS)
Thompson, David R.
2011-01-01
Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation-hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.
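The "integral image" transform mentioned in this abstract can be illustrated with a minimal sketch. It assumes a small grayscale image stored as a list of integer rows; the function names `integral_image` and `box_sum` are hypothetical and are not taken from the report. It shows only why box-filter features need nothing beyond integer additions and four table lookups, not the report's actual feature set or classifier.

```python
def integral_image(img):
    """Summed-area table: ii[r][c] is the sum of img over rows < r and cols < c."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        row_sum = 0  # running sum of the current row
        for c in range(cols):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and cols left..right-1 (four lookups)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


# Illustrative 3x3 image (made-up values).
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
table = integral_image(image)
lower_right_2x2 = box_sum(table, 1, 1, 3, 3)  # 5 + 6 + 8 + 9
```

Once the table is built, the cost of any rectangular sum is independent of the rectangle's size, which is the property that makes such features attractive for integer-only hardware.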
Efficient Decoding of Compressed Data.
ERIC Educational Resources Information Center
Bassiouni, Mostafa A.; Mukherjee, Amar
1995-01-01
Discusses the problem of enhancing the speed of Huffman decoding of compressed data. Topics addressed include the Huffman decoding tree; multibit decoding; binary string mapping problems; and algorithms for solving mapping problems. (22 references) (LRW)
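For orientation, the baseline that faster schemes aim to improve on is bit-by-bit traversal of the Huffman decoding tree. The sketch below assumes a hypothetical four-symbol prefix code and a nested-dict tree representation; it is not the multibit algorithm discussed in the article.

```python
def build_decoding_tree(codes):
    """Build a nested-dict binary tree from a {symbol: bitstring} prefix code."""
    root = {}
    for symbol, bits in codes.items():
        node = root
        for bit in bits[:-1]:
            node = node.setdefault(bit, {})
        node[bits[-1]] = symbol  # a leaf stores the symbol itself
    return root


def decode(bits, root):
    """Walk the tree one bit at a time, emitting a symbol at each leaf."""
    out = []
    node = root
    for bit in bits:
        node = node[bit]
        if not isinstance(node, dict):  # reached a leaf
            out.append(node)
            node = root  # restart at the root for the next symbol
    return "".join(out)


# Hypothetical prefix code, chosen only for illustration.
codes = {"a": "0", "b": "10", "c": "110", "d": "111"}
tree = build_decoding_tree(codes)
decoded = decode("0101100111", tree)  # "0" "10" "110" "0" "111"
```

This loop performs one tree step per input bit; multibit decoding, as named in the abstract, consumes several bits per step instead.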
A Model of Adding Relations in Multi-levels to a Formal Organization Structure with Two Subordinates
NASA Astrophysics Data System (ADS)
Sawada, Kiyoshi; Amano, Kazuyuki
2009-10-01
This paper proposes a model of adding relations in multi-levels to a formal organization structure with two subordinates such that the communication of information between every member in the organization becomes the most efficient. When edges between every pair of nodes with the same depth in L (L = 1, 2, …, H) levels are added to a complete binary tree of height H, an optimal set of depths {N1, N2, …, NL} (H⩾N1>N2> …>NL⩾1) is obtained by maximizing the total shortening path length which is the sum of shortening lengths of shortest paths between every pair of all nodes in the complete binary tree. It is shown that {N1, N2, …, NL}* = {H, H-1, …, H-L+1}.
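The quantity being maximized, the total shortening path length, can be checked by brute force on small trees. The sketch below assumes heap-order node numbering (node i has children 2i and 2i+1) and uses breadth-first search; the function names are hypothetical, and the approach is only a numerical check, not the paper's analytical derivation.

```python
from collections import deque
from itertools import combinations


def complete_binary_tree(height):
    """Adjacency sets for nodes 1..2^(H+1)-1 in heap order."""
    n = 2 ** (height + 1) - 1
    adj = {i: set() for i in range(1, n + 1)}
    for child in range(2, n + 1):
        adj[child].add(child // 2)
        adj[child // 2].add(child)
    return adj


def total_path_length(adj):
    """Sum of shortest-path lengths over all unordered node pairs (BFS per node)."""
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # each unordered pair was counted twice


def total_shortening(height, depths):
    """Total shortening after joining every pair of nodes at each given depth."""
    adj = complete_binary_tree(height)
    before = total_path_length(adj)
    for depth in depths:
        level = range(2 ** depth, 2 ** (depth + 1))  # nodes at this depth
        for u, v in combinations(level, 2):
            adj[u].add(v)
            adj[v].add(u)
    return before - total_path_length(adj)


H = 2
best_single_depth = max(range(1, H + 1), key=lambda n: total_shortening(H, [n]))
```

For H = 2 the brute-force search picks depth 2, which agrees with the stated optimum {H} for L = 1.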
Gundogdu, Erhan; Ozkan, Huseyin; Alatan, A Aydin
2017-11-01
Correlation filters have been successfully used in visual tracking due to their modeling power and computational efficiency. However, the state-of-the-art correlation filter-based (CFB) tracking algorithms tend to quickly discard the previous poses of the target, since they consider only a single filter in their models. On the contrary, our approach is to register multiple CFB trackers for previous poses and exploit the registered knowledge when an appearance change occurs. To this end, we propose a novel tracking algorithm [of complexity O(D)] based on a large ensemble of CFB trackers. The ensemble [of size O(2^D)] is organized over a binary tree (depth D), and learns the target appearance subspaces such that each constituent tracker becomes an expert of a certain appearance. During tracking, the proposed algorithm combines only the appearance-aware relevant experts to produce boosted tracking decisions. Additionally, we propose a versatile spatial windowing technique to enhance the individual expert trackers. For this purpose, spatial windows are learned for target objects as well as the correlation filters and then the windowed regions are processed for more robust correlations. In our extensive experiments on benchmark datasets, we achieve a substantial performance increase by using the proposed tracking algorithm together with the spatial windowing.
Ethnographic Decision Tree Modeling: A Research Method for Counseling Psychology.
ERIC Educational Resources Information Center
Beck, Kirk A.
2005-01-01
This article describes ethnographic decision tree modeling (EDTM; C. H. Gladwin, 1989) as a mixed method design appropriate for counseling psychology research. EDTM is introduced and located within a postpositivist research paradigm. Decision theory that informs EDTM is reviewed, and the 2 phases of EDTM are highlighted. The 1st phase, model…
Stereo-vision-based terrain mapping for off-road autonomous navigation
NASA Astrophysics Data System (ADS)
Rankin, Arturo L.; Huertas, Andres; Matthies, Larry H.
2009-05-01
Successful off-road autonomous navigation by an unmanned ground vehicle (UGV) requires reliable perception and representation of natural terrain. While perception algorithms are used to detect driving hazards, terrain mapping algorithms are used to represent the detected hazards in a world model a UGV can use to plan safe paths. There are two primary ways to detect driving hazards with perception sensors mounted to a UGV: binary obstacle detection and traversability cost analysis. Binary obstacle detectors label terrain as either traversable or non-traversable, whereas, traversability cost analysis assigns a cost to driving over a discrete patch of terrain. In uncluttered environments where the non-obstacle terrain is equally traversable, binary obstacle detection is sufficient. However, in cluttered environments, some form of traversability cost analysis is necessary. The Jet Propulsion Laboratory (JPL) has explored both approaches using stereo vision systems. A set of binary detectors has been implemented that detect positive obstacles, negative obstacles, tree trunks, tree lines, excessive slope, low overhangs, and water bodies. A compact terrain map is built from each frame of stereo images. The mapping algorithm labels cells that contain obstacles as no-go regions, and encodes terrain elevation, terrain classification, terrain roughness, traversability cost, and a confidence value. The single frame maps are merged into a world map where temporal filtering is applied. In previous papers, we have described our perception algorithms that perform binary obstacle detection. In this paper, we summarize the terrain mapping capabilities that JPL has implemented during several UGV programs over the last decade and discuss some challenges to building terrain maps with stereo range data.
Stereo Vision Based Terrain Mapping for Off-Road Autonomous Navigation
NASA Technical Reports Server (NTRS)
Rankin, Arturo L.; Huertas, Andres; Matthies, Larry H.
2009-01-01
Successful off-road autonomous navigation by an unmanned ground vehicle (UGV) requires reliable perception and representation of natural terrain. While perception algorithms are used to detect driving hazards, terrain mapping algorithms are used to represent the detected hazards in a world model a UGV can use to plan safe paths. There are two primary ways to detect driving hazards with perception sensors mounted to a UGV: binary obstacle detection and traversability cost analysis. Binary obstacle detectors label terrain as either traversable or non-traversable, whereas, traversability cost analysis assigns a cost to driving over a discrete patch of terrain. In uncluttered environments where the non-obstacle terrain is equally traversable, binary obstacle detection is sufficient. However, in cluttered environments, some form of traversability cost analysis is necessary. The Jet Propulsion Laboratory (JPL) has explored both approaches using stereo vision systems. A set of binary detectors has been implemented that detect positive obstacles, negative obstacles, tree trunks, tree lines, excessive slope, low overhangs, and water bodies. A compact terrain map is built from each frame of stereo images. The mapping algorithm labels cells that contain obstacles as no-go regions, and encodes terrain elevation, terrain classification, terrain roughness, traversability cost, and a confidence value. The single frame maps are merged into a world map where temporal filtering is applied. In previous papers, we have described our perception algorithms that perform binary obstacle detection. In this paper, we summarize the terrain mapping capabilities that JPL has implemented during several UGV programs over the last decade and discuss some challenges to building terrain maps with stereo range data.
Mudali, D; Teune, L K; Renken, R J; Leenders, K L; Roerdink, J B T M
2015-01-01
Medical imaging techniques like fluorodeoxyglucose positron emission tomography (FDG-PET) have been used to aid in the differential diagnosis of neurodegenerative brain diseases. In this study, the objective is to classify FDG-PET brain scans of subjects with Parkinsonian syndromes (Parkinson's disease, multiple system atrophy, and progressive supranuclear palsy) compared to healthy controls. The scaled subprofile model/principal component analysis (SSM/PCA) method was applied to FDG-PET brain image data to obtain covariance patterns and corresponding subject scores. The latter were used as features for supervised classification by the C4.5 decision tree method. Leave-one-out cross validation was applied to determine classifier performance. We carried out a comparison with other types of classifiers. The big advantage of decision tree classification is that the results are easy to understand by humans. A visual representation of decision trees strongly supports the interpretation process, which is very important in the context of medical diagnosis. Further improvements are suggested based on enlarging the number of the training data, enhancing the decision tree method by bagging, and adding additional features based on (f)MRI data.
PRIA 3 Fee Determination Decision Tree
The PRIA 3 decision tree will help applicants requesting a pesticide registration or certain tolerance action to accurately identify the category of their application and the amount of the required fee before they submit the application.
Solar and Wind Site Screening Decision Trees
EPA and NREL created a decision tree to guide state and local governments and other stakeholders through a process for screening sites for their suitability for future redevelopment with solar photovoltaic (PV) energy and wind energy.
Krajbich, Ian; Rangel, Antonio
2011-08-16
How do we make decisions when confronted with several alternatives (e.g., on a supermarket shelf)? Previous work has shown that accumulator models, such as the drift-diffusion model, can provide accurate descriptions of the psychometric data for binary value-based choices, and that the choice process is guided by visual attention. However, the computational processes used to make choices in more complicated situations involving three or more options are unknown. We propose a model of trinary value-based choice that generalizes what is known about binary choice, and test it using an eye-tracking experiment. We find that the model provides a quantitatively accurate description of the relationship between choice, reaction time, and visual fixation data using the same parameters that were estimated in previous work on binary choice. Our findings suggest that the brain uses similar computational processes to make binary and trinary choices.
Inferring phylogenetic trees from the knowledge of rare evolutionary events.
Hellmuth, Marc; Hernandez-Rosales, Maribel; Long, Yangjing; Stadler, Peter F
2018-06-01
Rare events have played an increasing role in molecular phylogenetics as potentially homoplasy-poor characters. In this contribution we analyze the phylogenetic information content from a combinatorial point of view by considering the binary relation on the set of taxa defined by the existence of a single event separating two taxa. We show that the graph-representation of this relation must be a tree. Moreover, we characterize completely the relationship between the tree of such relations and the underlying phylogenetic tree. With directed operations such as tandem-duplication-random-loss events in mind we demonstrate how non-symmetric information constrains the position of the root in the partially reconstructed phylogeny.
Rosso, Nicholas; Giabbanelli, Philippe
2018-05-30
National surveys in public health nutrition commonly record the weight of every food consumed by an individual. However, if the goal is to identify whether individuals are in compliance with the 5 main national nutritional guidelines (sodium, saturated fats, sugars, fruit and vegetables, and fats), much less information may be needed. A previous study showed that tracking only 2.89% of all foods (113/3911) was sufficient to accurately identify compliance. Further reducing the data needs could lower participation burden, thus decreasing the costs for monitoring national compliance with key guidelines. This study aimed to assess whether national public health nutrition surveys can be further simplified by only recording whether a food was consumed, rather than having to weigh it. Our dataset came from a generalized sample of inhabitants in the United Kingdom, more specifically from the National Diet and Nutrition Survey 2008-2012. After simplifying food consumptions to a binary value (1 if an individual consumed a food and 0 otherwise), we built and optimized decision trees to find whether the foods could accurately predict compliance with the major 5 nutritional guidelines. When using decision trees of a similar size to previous studies (ie, involving as many foods), we were able to correctly infer compliance for the 5 guidelines with an average accuracy of 80.1%. This is an average increase of 2.5 percentage points over a previous study, showing that further simplifying the surveys can actually yield more robust estimates. When we allowed the new decision trees to use slightly more foods than in previous studies, we were able to optimize the performance with an average increase of 3.1 percentage points. Although one may expect a further simplification of surveys to decrease accuracy, our study found that public health dietary surveys can be simplified (from accurately weighing items to simply checking whether they were consumed) while improving accuracy. 
One possibility is that the simplification reduced noise and made it easier for patterns to emerge. Using simplified surveys will allow public health nutrition to be monitored in a more cost-effective manner and may decrease the number of errors as participation burden is reduced. ©Nicholas Rosso, Philippe Giabbanelli. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 30.05.2018.
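The binarization step described in this abstract, followed by the choice of a first split, can be sketched as follows. The weights and compliance labels are made-up toy values, not survey data, and the Gini criterion is an assumption: the abstract does not say which splitting criterion the authors used.

```python
def binarize(weights_in_grams):
    """1 if the food was consumed at all, 0 otherwise."""
    return [1 if w > 0 else 0 for w in weights_in_grams]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Index of the binary feature with the lowest weighted Gini impurity."""
    n = len(rows)
    best_feature, best_score = None, float("inf")
    for f in range(len(rows[0])):
        absent = [y for x, y in zip(rows, labels) if x[f] == 0]
        present = [y for x, y in zip(rows, labels) if x[f] == 1]
        score = (len(absent) * gini(absent) + len(present) * gini(present)) / n
        if score < best_score:  # ties keep the earlier feature
            best_feature, best_score = f, score
    return best_feature, best_score


# Toy data: four individuals, three foods (grams), hypothetical compliance labels.
weights = [[0, 120, 30], [50, 0, 0], [0, 80, 0], [200, 0, 10]]
compliant = [1, 0, 1, 0]
rows = [binarize(w) for w in weights]
feature, score = best_split(rows, compliant)
```

A full decision tree would apply `best_split` recursively to each resulting subgroup; only the root split is shown here.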
Stanislawski, Jerzy; Kotulska, Malgorzata; Unold, Olgierd
2013-01-17
Amyloids are proteins capable of forming fibrils. Many of them underlie serious diseases, like Alzheimer disease. The number of amyloid-associated diseases is constantly increasing. Recent studies indicate that amyloidogenic properties can be associated with short segments of amino acids, which transform the structure when exposed. A few hundred such peptides have been experimentally found. Experimental testing of all possible amino acid combinations is currently not feasible. Instead, they can be predicted by computational methods. The 3D profile is a physicochemical-based method that has generated the largest dataset, ZipperDB. However, it is computationally very demanding. Here, we show that dataset generation can be accelerated. Two methods to increase the classification efficiency of amyloidogenic candidates are presented and tested: simplified 3D profile generation and machine learning methods. We generated a new dataset of hexapeptides, using a more economical 3D profile algorithm, which showed very good classification overlap with ZipperDB (93.5%). The new part of our dataset contains 1779 segments, with 204 classified as amyloidogenic. The dataset of 6-residue sequences with their binary classification, based on the energy of the segment, was applied for training machine learning methods. A separate set of sequences from ZipperDB was used as a test set. The most effective methods were Alternating Decision Tree and Multilayer Perceptron. Both methods obtained area under ROC curve of 0.96, accuracy 91%, true positive rate ca. 78%, and true negative rate 95%. A few other machine learning methods also achieved a good performance. The computational time was reduced from 18-20 CPU-hours (full 3D profile) to 0.5 CPU-hours (simplified 3D profile) to seconds (machine learning). We showed that the simplified profile generation method does not introduce an error with regard to the original method, while increasing the computational efficiency.
Our new dataset proved representative enough to use simple statistical methods for testing amyloidogenicity based only on six-letter sequences. Statistical machine learning methods such as Alternating Decision Tree and Multilayer Perceptron can replace the energy-based classifier, with the advantages of greatly reduced computational time and a simpler analysis. Additionally, a decision tree provides a set of very easily interpretable rules.
A Sequence of Sorting Strategies.
ERIC Educational Resources Information Center
Duncan, David R.; Litwiller, Bonnie H.
1984-01-01
Describes eight increasingly sophisticated and efficient sorting algorithms including linear insertion, binary insertion, shellsort, bubble exchange, shakersort, quick sort, straight selection, and tree selection. Provides challenges for the reader and the student to program these efficiently. (JM)
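As one concrete instance of the strategies listed, binary insertion can be sketched as below. This is an illustrative Python version written for this listing, not code from the article: each new item's position in the already-sorted prefix is found by binary search rather than by a linear scan.

```python
def binary_insertion_sort(items):
    """Insert each item into a sorted list at a position found by binary search."""
    result = []
    for item in items:
        low, high = 0, len(result)
        while low < high:  # locate the first position whose value exceeds item
            mid = (low + high) // 2
            if result[mid] <= item:
                low = mid + 1
            else:
                high = mid
        result.insert(low, item)  # equal items stay in arrival order (stable)
    return result


sorted_values = binary_insertion_sort([5, 2, 9, 1, 5, 6])
```

Binary search reduces the number of comparisons per insertion to about log2(n), although moving elements to make room still takes linear time per insertion.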
Optimum Array Processing for Detecting Binary Signals Corrupted by Directional Interference.
1972-12-01
specific cases. Two different series representations of a vector random process are discussed in Van Trees [3]. These two methods both require the... spacing d, etc.) its detection error represents a lower bound for the performance that might be obtained with other types of array processing (such...Middleton, Introduction to Statistical Communication Theory, New York: McGraw-Hill, 1960. 3. H.L. Van Trees, Detection, Estimation, and Modulation Theory
Nodal distances for rooted phylogenetic trees.
Cardona, Gabriel; Llabrés, Mercè; Rosselló, Francesc; Valiente, Gabriel
2010-08-01
Dissimilarity measures for (possibly weighted) phylogenetic trees, based on the comparison of their vectors of path lengths between pairs of taxa, have been present in the systematics literature since the early seventies. For rooted phylogenetic trees, however, these vectors can only separate non-weighted binary trees, and therefore these dissimilarity measures are metrics only on this class of rooted phylogenetic trees. In this paper we overcome this problem by splitting, in a suitable way, each path length between two taxa into two lengths. We prove that the resulting split path length matrices single out arbitrary rooted phylogenetic trees with nested taxa and arcs weighted in the set of positive real numbers. This allows the definition of metrics on this general class of rooted phylogenetic trees by comparing these matrices through metrics in spaces M_n(R) of real-valued n x n matrices. We conclude this paper by establishing some basic facts about the metrics for non-weighted phylogenetic trees defined in this way using L^p metrics on M_n(R), with p ∈ R_{>0}.
Moon, Mikyung; Lee, Soo-Kyoung
2017-01-01
The purpose of this study was to use decision tree analysis to explore the factors associated with pressure ulcers (PUs) among elderly people admitted to Korean long-term care facilities. The data were extracted from the 2014 National Inpatient Sample (NIS) data of the Health Insurance Review and Assessment Service (HIRA). A MapReduce-based program was implemented to join and filter 5 tables of the NIS. The outcome predicted by the decision tree model was the prevalence of PUs as defined by the Korean Standard Classification of Disease-7 (KCD-7; code L89*). Using R 3.3.1, a decision tree was generated with the finalized 15,856 cases and 830 variables. The decision tree displayed 15 subgroups with 8 variables showing 0.804 accuracy, 0.820 sensitivity, and 0.787 specificity. The most significant primary predictor of PUs was length of stay less than 0.5 day. Other predictors were the presence of an infectious wound dressing, followed by having diagnoses numbering less than 3.5 and the presence of a simple dressing. Among diagnoses, "injuries to the hip and thigh" was the top predictor ranking 5th overall. Total hospital cost exceeding 2,200,000 Korean won (US $2,000) rounded out the top 7. These results support previous studies that showed length of stay, comorbidity, and total hospital cost were associated with PUs. Moreover, wound dressings were commonly used to treat PUs. They also show that machine learning, such as a decision tree, could effectively predict PUs using big data.
Predicting the probability of mortality of gastric cancer patients using decision tree.
Mohammadzadeh, F; Noorkojuri, H; Pourhoseingholi, M A; Saadat, S; Baghestani, A R
2015-06-01
Gastric cancer is the fourth most common cancer worldwide, which motivated us to investigate gastric cancer risk factors using statistical methods. The aim of this study was to identify the most important factors influencing the mortality of patients with gastric cancer and to introduce a classification approach based on a decision tree model for predicting the probability of mortality from this disease. Data on 216 patients with gastric cancer who were registered in Taleghani hospital in Tehran, Iran, were analyzed. First, patients were divided into two groups: dead and alive. Then, to fit the decision tree model to our data, we randomly selected 20% of the dataset as the test sample and used the remainder as the training sample. Finally, the validity of the model was examined with sensitivity, specificity, diagnostic accuracy, and the area under the receiver operating characteristic curve. CART version 6.0 and SPSS version 19.0 were used for the analysis of the data. Diabetes, ethnicity, tobacco, tumor size, surgery, pathologic stage, age at diagnosis, exposure to chemical weapons, and alcohol consumption were identified as factors affecting mortality from gastric cancer. The sensitivity, specificity, and accuracy of the decision tree were 0.72, 0.75, and 0.74, respectively. These indices indicate that the decision tree model has acceptable accuracy for predicting the probability of mortality in gastric cancer patients. A simple decision tree built from the factors affecting mortality from gastric cancer may therefore help clinicians as a reliable and practical tool to predict the probability of mortality in these patients.
Diagnostic classification scheme in Iranian breast cancer patients using a decision tree.
Malehi, Amal Saki
2014-01-01
The objective of this study was to determine a diagnostic classification scheme using a decision tree based model. The study was conducted as a retrospective case-control study in Imam Khomeini hospital in Tehran from 2001 to 2009. Data, including demographic and clinical-pathological characteristics, were uniformly collected from 624 women: 312 referred with a positive diagnosis of breast cancer (cases) and 312 healthy women (controls). The decision tree was implemented to develop a diagnostic classification scheme using CART 6.0 software. The AUC (area under the curve) was measured as the overall performance of the diagnostic classification of the decision tree. Five variables were identified as main risk factors of breast cancer and six subgroups as high risk. The results indicated that increasing age, low age at menarche, single and divorced statuses, irregular menarche pattern and family history of breast cancer are the important diagnostic factors in Iranian breast cancer patients. The sensitivity and specificity of the analysis were 66% and 86.9%, respectively. The high AUC (0.82) also showed an excellent classification and diagnostic performance of the model. The decision tree based model appears to be suitable for identifying risk factors and high or low risk subgroups. It can also assist clinicians in making a decision, since it can identify underlying prognostic relationships and the model is very explicit and easy to understand.
Ultrasonographic Diagnosis of Biliary Atresia Based on a Decision-Making Tree Model.
Lee, So Mi; Cheon, Jung-Eun; Choi, Young Hun; Kim, Woo Sun; Cho, Hyun-Hae; Cho, Hyun-Hye; Kim, In-One; You, Sun Kyoung
2015-01-01
To assess the diagnostic value of various ultrasound (US) findings and to make a decision-tree model for US diagnosis of biliary atresia (BA). From March 2008 to January 2014, the following US findings were retrospectively evaluated in 100 infants with cholestatic jaundice (BA, n = 46; non-BA, n = 54): length and morphology of the gallbladder, triangular cord thickness, hepatic artery and portal vein diameters, and visualization of the common bile duct. Logistic regression analyses were performed to determine the features that would be useful in predicting BA. Conditional inference tree analysis was used to generate a decision-making tree for classifying patients into the BA or non-BA groups. Multivariate logistic regression analysis showed that abnormal gallbladder morphology and greater triangular cord thickness were significant predictors of BA (p = 0.003 and 0.001; adjusted odds ratio: 345.6 and 65.6, respectively). In the decision-making tree using conditional inference tree analysis, gallbladder morphology and triangular cord thickness (optimal cutoff value of triangular cord thickness, 3.4 mm) were also selected as significant discriminators for differential diagnosis of BA, and gallbladder morphology was the first discriminator. The diagnostic performance of the decision-making tree was excellent, with sensitivity of 100% (46/46), specificity of 94.4% (51/54), and overall accuracy of 97% (97/100). Abnormal gallbladder morphology and greater triangular cord thickness (> 3.4 mm) were the most useful predictors of BA on US. We suggest that the gallbladder morphology should be evaluated first and that triangular cord thickness should be evaluated subsequently in cases with normal gallbladder morphology.
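The diagnostic performance figures in this abstract follow directly from the counts it reports. A minimal Python sketch, assuming the 2x2 table implied by those counts (46 true positives, 0 false negatives, 51 true negatives, 3 false positives); the function name is illustrative and does not come from the paper:

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from 2x2 confusion counts."""
    sensitivity = tp / (tp + fn)                # true positive rate
    specificity = tn / (tn + fp)                # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # overall fraction correct
    return sensitivity, specificity, accuracy


# Counts implied by the abstract: 46/46 BA cases detected,
# 51/54 non-BA cases correctly excluded.
sens, spec, acc = diagnostic_metrics(tp=46, fn=0, tn=51, fp=3)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.2f}")
```

The same function applies to any of the sensitivity/specificity pairs quoted in this listing, provided the underlying counts are available.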
2013-05-01
specifics of the correlation will be explored followed by discussion of new paradigms, the ordered event list (OEL) and the decision tree, that result from... 4.2.1 Brief Overview of the Decision Tree Paradigm... 4.2.2 OEL Explained... Figure 3. A depiction of a notional fault/activation tree.
Personalized Modeling for Prediction with Decision-Path Models
Visweswaran, Shyam; Ferreira, Antonio; Ribeiro, Guilherme A.; Oliveira, Alexandre C.; Cooper, Gregory F.
2015-01-01
Deriving predictive models in medicine typically relies on a population approach where a single model is developed from a dataset of individuals. In this paper we describe and evaluate a personalized approach in which we construct a new type of decision tree model, called a decision-path model, that takes advantage of the particular features of a given person of interest. We introduce three personalized methods that derive personalized decision-path models. We compared the performance of these methods to that of Classification And Regression Tree (CART), a population decision tree, in predicting seven different outcomes in five medical datasets. Two of the three personalized methods performed statistically significantly better on area under the ROC curve (AUC) and Brier skill score compared to CART. The personalized approach of learning decision-path models is a new approach for predictive modeling that can perform better than a population approach. PMID:26098570
Space/age forestry: Implications of planting density and rotation age in SRIC management decisions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Merriam, R.A.; Phillips, V.D.; Liu, W.
1993-12-31
Short-rotation intensive-culture (SRIC) of promising tree crops is being evaluated worldwide for the production of methanol, ethanol, and electricity from renewable biomass resources. Planting density and rotation age are fundamental management decisions associated with SRIC energy plantations. Most studies of these variables have been conducted without the benefit of a unifying theory of the effects of growing space and rotation age on individual tree growth and stand level productivity. A modeling procedure based on field trials of Eucalyptus spp. is presented that evaluates the growth potential of a tree in the absence and presence of competition of neighboring trees in a stand. The results of this analysis are useful in clarifying economic implications of different growing space and rotation age decisions that tree plantation managers must make. The procedure is readily applicable to other species under consideration for SRIC plantations at any location.
Distortion of Probability and Outcome Information in Risky Decisions
ERIC Educational Resources Information Center
DeKay, Michael L.; Patino-Echeverri, Dalia; Fischbeck, Paul S.
2009-01-01
Substantial evidence indicates that information is distorted during decision making, but very few studies have assessed the distortion of probability and outcome information in risky decisions. In two studies involving six binary decisions (e.g., banning blood donations from people who have visited England, because of "mad cow disease"),…
Fernández, Alberto; Carmona, Cristobal José; José Del Jesus, María; Herrera, Francisco
2017-09-01
Imbalanced classification is related to those problems that have an uneven distribution among classes. In addition to the former, when instances are located into the overlapped areas, the correct modeling of the problem becomes harder. Current solutions for both issues are often focused on the binary case study, as multi-class datasets require an additional effort to be addressed. In this research, we overcome these problems by carrying out a combination between feature and instance selections. Feature selection will allow simplifying the overlapping areas easing the generation of rules to distinguish among the classes. Selection of instances from all classes will address the imbalance itself by finding the most appropriate class distribution for the learning task, as well as possibly removing noise and difficult borderline examples. For the sake of obtaining an optimal joint set of features and instances, we embedded the searching for both parameters in a Multi-Objective Evolutionary Algorithm, using the C4.5 decision tree as baseline classifier in this wrapper approach. The multi-objective scheme allows taking a double advantage: the search space becomes broader, and we may provide a set of different solutions in order to build an ensemble of classifiers. This proposal has been contrasted versus several state-of-the-art solutions on imbalanced classification showing excellent results in both binary and multi-class problems.
Capel, Paul D.; Wolock, David M.; Coupe, Richard H.; Roth, Jason L.
2018-01-10
Agricultural activities can affect water quality and the health of aquatic ecosystems; many water-quality issues originate with the movement of water, agricultural chemicals, and eroded soil from agricultural areas to streams and groundwater. Most agricultural activities are designed to sustain or increase crop production, while some are designed to protect soil and water resources. Numerous soil- and water-protection practices are designed to reduce the volume and velocity of runoff and increase infiltration. This report presents a conceptual framework that combines generalized concepts on the movement of water, the environmental behavior of chemicals and eroded soil, and the designed functions of various agricultural activities, as they relate to hydrology, to create attainable expectations for the protection of (with the goal of improving) water quality through changes in an agricultural activity. The framework presented uses two types of decision trees to guide decision making toward attainable expectations regarding the effectiveness of changing agricultural activities to protect and improve water quality in streams. One decision tree organizes decision making by considering the hydrologic setting and chemical behaviors, largely at the field scale. This decision tree can help determine which agricultural activities could effectively protect and improve water quality in a stream from the movement of chemicals, or sediment, from a field. The second decision tree is a chemical fate accounting tree. This decision tree helps set attainable expectations for the permanent removal of sediment, elements, and organic chemicals (such as herbicides and insecticides) through trapping or conservation tillage practices. Collectively, this conceptual framework consolidates diverse hydrologic settings, chemicals, and agricultural activities into a single, broad context that can be used to set attainable expectations for agricultural activities.
This framework also enables better decision making for future agricultural activities as a means to reduce current, and prevent new, water-quality issues.
VLSI implementation of flexible architecture for decision tree classification in data mining
NASA Astrophysics Data System (ADS)
Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak
2017-07-01
Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a sharp rise in the size of the data being collected and analyzed. Classification is the main difficulty faced in data mining. Of the solutions developed for this problem, the most widely accepted is Decision Tree Classification (DTC), which gives high precision while handling very large amounts of data. This paper presents a VLSI implementation of a flexible architecture for decision tree classification in data mining using the C4.5 algorithm.
Khalkhali, Hamid Reza; Lotfnezhad Afshar, Hadi; Esnaashari, Omid; Jabbari, Nasrollah
2016-01-01
Breast cancer survival has been analyzed by many standard data mining algorithms, a group of which belong to the decision tree category. The ability of decision tree algorithms to visualize and formulate hidden patterns among study variables was the main reason for applying an algorithm from this category in the current study, something that had not been studied before. The classification and regression trees (CART) algorithm was applied to a breast cancer database containing information on 569 patients from 2007-2010. The Gini impurity measure, which is used for categorical target variables, was utilized. The classification error, which is a function of tree size, was measured by 10-fold cross-validation experiments. The performance of the resulting model was evaluated by accuracy, sensitivity and specificity. The CART model produced a decision tree with 17 nodes, 9 of which were associated with a set of rules. The rules were clinically meaningful. They showed, in if-then format, that Stage was the most important variable for predicting breast cancer survival. The accuracy, sensitivity and specificity were 80.3%, 93.5% and 53%, respectively. The current model, the first one created by CART, was able to extract useful hidden rules from a relatively small dataset.
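The Gini impurity named in this abstract is the standard CART splitting criterion for categorical targets: one minus the sum of squared class proportions. A minimal Python sketch of the measure and of the size-weighted impurity of a binary split; the labels are hypothetical toy values, not data from the study, and the function names are illustrative:

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of the two children of a binary split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Hypothetical toy labels (assumption, for illustration only).
parent = ["alive"] * 6 + ["dead"] * 4
left, right = parent[:5], parent[5:]
print(gini_impurity(parent), split_impurity(left, right))
```

A candidate split is preferred when the weighted impurity of its children is lower than the impurity of the parent node.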
An efficient indexing scheme for binary feature based biometric database
NASA Astrophysics Data System (ADS)
Gupta, P.; Sana, A.; Mehrotra, H.; Hwang, C. Jinshong
2007-04-01
The paper proposes an efficient indexing scheme for binary feature template using B+ tree. In this scheme the input image is decomposed into approximation, vertical, horizontal and diagonal coefficients using the discrete wavelet transform. The binarized approximation coefficient at second level is divided into four quadrants of equal size and Hamming distance (HD) for each quadrant with respect to sample template of all ones is measured. This HD value of each quadrant is used to generate upper and lower range values which are inserted into B+ tree. The nodes of tree at first level contain the lower and upper range values generated from HD of first quadrant. Similarly, lower and upper range values for the three quadrants are stored in the second, third and fourth level respectively. Finally leaf node contains the set of identifiers. At the time of identification, the test image is used to generate HD for four quadrants. Then the B+ tree is traversed based on the value of HD at every node and terminates to leaf nodes with set of identifiers. The feature vector for each identifier is retrieved from the particular bin of secondary memory and matched with test feature template to get top matches. The proposed scheme is implemented on ear biometric database collected at IIT Kanpur. The system is giving an overall accuracy of 95.8% at penetration rate of 34%.
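The per-quadrant key described in this abstract can be sketched as follows. This is a minimal Python illustration, assuming a square binary matrix with even side length and a fixed quadrant order; the abstract does not say how the lower and upper range values are derived from the Hamming distance, so the symmetric `tolerance` used here is purely a hypothetical choice, as are the function names:

```python
def quadrant_hamming_distances(bits):
    """Split a square binary matrix into four equal quadrants and return, for
    each one, the Hamming distance to an all-ones template (the count of
    entries that are not 1)."""
    n = len(bits)
    half = n // 2
    quadrants = [
        (0, half, 0, half),  # top-left
        (0, half, half, n),  # top-right
        (half, n, 0, half),  # bottom-left
        (half, n, half, n),  # bottom-right
    ]
    return [
        sum(1 for r in range(r0, r1) for c in range(c0, c1) if bits[r][c] != 1)
        for (r0, r1, c0, c1) in quadrants
    ]


def key_range(hd, tolerance):
    """Hypothetical (lower, upper) key range around a Hamming distance."""
    return max(0, hd - tolerance), hd + tolerance


# Toy 4x4 binarized template (assumption, not data from the paper).
template = [
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 1, 1, 1],
]
distances = quadrant_hamming_distances(template)
ranges = [key_range(hd, tolerance=1) for hd in distances]
print(distances, ranges)
```

In the scheme described, the four ranges would serve as keys at successive levels of the B+ tree, with the leaf holding the set of matching identifiers.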
A three-sided rearrangeable switching network for a binary fat tree
NASA Astrophysics Data System (ADS)
Yen, Mao-Hsu; Yu, Chu; Shin, Haw-Yun; Chen, Sao-Jie
2011-06-01
A binary fat tree needs an internal node to interconnect the left-children, right-children and parent terminals to each other. In this article, we first propose a three-stage, 3-sided rearrangeable switching network for the implementation of a binary fat tree. The main component of this 3-sided switching network (3SSN) consists of a polygonal switch block (PSB) interconnected by crossbars. With the same size and the same number of switches as our 3SSN, a three-stage, 3-sided clique-based switching network is shown to be not rearrangeable. Also, the effects of the rearrangeable structure and the number of terminals on the network switch-efficiency are explored and a proper set of parameters has been determined to minimise the number of switches. We derive that a rearrangeable 3-sided switching network with switches proportional to N^(3/2) is most suitable to interconnect N terminals. Moreover, we propose a new Polygonal Field Programmable Gate Array (PFPGA) that consists of logic blocks interconnected by our 3SSN, such that the logic blocks in this PFPGA can be grouped into clusters to implement different logic functions. Since the programmable switches usually have high resistance and capacitance and occupy a large area, we have to consider the effect of the 3SSN structure and the granularity of its cluster logic blocks on the switch efficiency of PFPGA. Experiments on benchmark circuits show that the switch and speed performances are significantly improved. Based on the experimental results, we can determine the parameters of PFPGA for the VLSI implementation.
Steel, Mike
2012-10-01
Neutral macroevolutionary models, such as the Yule model, give rise to a probability distribution on the set of discrete rooted binary trees over a given leaf set. Such models can provide a signal as to the approximate location of the root when only the unrooted phylogenetic tree is known, and this signal becomes relatively more significant as the number of leaves grows. In this short note, we show that among models that treat all taxa equally, and are sampling consistent (i.e. the distribution on trees is not affected by taxa yet to be included), all such models, except one (the so-called PDA model), convey some information as to the location of the ancestral root in an unrooted tree. Copyright © 2012 Elsevier Inc. All rights reserved.
The Utility of Decision Trees in Oncofertility Care in Japan.
Ito, Yuki; Shiraishi, Eriko; Kato, Atsuko; Haino, Takayuki; Sugimoto, Kouhei; Okamoto, Aikou; Suzuki, Nao
2017-03-01
To identify the utility and issues associated with the use of decision trees in oncofertility patient care in Japan. A total of 35 women who had been diagnosed with cancer, but had not begun anticancer treatment, were enrolled. We applied the oncofertility decision tree for women published by Gardino et al. to counsel a consecutive series of women on fertility preservation (FP) options following cancer diagnosis. Percentage of women who decided to undergo oocyte retrieval for embryo cryopreservation and the expected live-birth rate for these patients were calculated using the following equation: expected live-birth rate = pregnancy rate at each age per embryo transfer × (1 - miscarriage rate) × No. of cryopreserved embryos. Oocyte retrieval was performed for 17 patients (48.6%; mean ± standard deviation [SD] age, 36.35 ± 3.82 years). The mean ± SD number of cryopreserved embryos was 5.29 ± 4.63. The expected live-birth rate was 0.66. The expected live-birth rate with FP indicated that one in three oncofertility patients would not expect to have a live birth following oocyte retrieval and embryo cryopreservation. While the decision trees were useful as decision-making tools for women contemplating FP, in the context of the current restrictions on oocyte donation and the extremely small number of adoptions in Japan, the remaining options for fertility after cancer are limited. In order for cancer survivors to feel secure in their decisions, the decision tree may need to be adapted simultaneously with improvements to the social environment, such as greater support for adoption.
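The equation quoted in this abstract is a simple product of three factors. A minimal Python sketch; the input values below are hypothetical and are not taken from the study, which used age-specific pregnancy rates that the abstract does not list:

```python
def expected_live_birth_rate(pregnancy_rate, miscarriage_rate, n_embryos):
    """Expected live-birth rate as defined in the abstract:
    pregnancy rate per embryo transfer x (1 - miscarriage rate) x number of
    cryopreserved embryos."""
    return pregnancy_rate * (1.0 - miscarriage_rate) * n_embryos


# Hypothetical inputs for illustration only (assumptions, not study data).
rate = expected_live_birth_rate(pregnancy_rate=0.25, miscarriage_rate=0.20, n_embryos=3)
print(round(rate, 2))
```

As written, the product is not capped at 1, so with many embryos it behaves like an expected count of live births per patient.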
NASA Astrophysics Data System (ADS)
Rahmadani, S.; Dongoran, A.; Zarlis, M.; Zakarias
2018-03-01
This paper discusses the problem of feature selection using genetic algorithms (GA) on a dataset for classification problems. The classification models used are the decision tree (DT) and Naive Bayes. We discuss how the Naive Bayes and decision tree models handle the classification problem when the dataset features are selected using GA. The performance of both models is then compared to determine whether accuracy increases. The results show an increase in accuracy when features are selected using GA. The proposed models are referred to as GADT (GA-Decision Tree) and GANB (GA-Naive Bayes). The datasets tested in this paper are taken from the UCI Machine Learning repository.
A parallelized binary search tree
USDA-ARS's Scientific Manuscript database
PTTRNFNDR is an unsupervised statistical learning algorithm that detects patterns in DNA sequences, protein sequences, or any natural language texts that can be decomposed into letters of a finite alphabet. PTTRNFNDR performs complex mathematical computations and its processing time increases when i...
Jiao, Y; Chen, R; Ke, X; Cheng, L; Chu, K; Lu, Z; Herskovits, E H
2011-01-01
Autism spectrum disorder (ASD) is a neurodevelopmental disorder, of which Asperger syndrome and high-functioning autism are subtypes. Our goal is: 1) to determine whether a diagnostic model based on single-nucleotide polymorphisms (SNPs), brain regional thickness measurements, or brain regional volume measurements can distinguish Asperger syndrome from high-functioning autism; and 2) to compare the SNP, thickness, and volume-based diagnostic models. Our study included 18 children with ASD: 13 subjects with high-functioning autism and 5 subjects with Asperger syndrome. For each child, we obtained 25 SNPs for 8 ASD-related genes; we also computed regional cortical thicknesses and volumes for 66 brain structures, based on structural magnetic resonance (MR) examination. To generate diagnostic models, we employed five machine-learning techniques: decision stump, alternating decision trees, multi-class alternating decision trees, logistic model trees, and support vector machines. For SNP-based classification, three decision-tree-based models performed better than the other two machine-learning models. The performance metrics for three decision-tree-based models were similar: decision stump was modestly better than the other two methods, with accuracy = 90%, sensitivity = 0.95 and specificity = 0.75. All thickness and volume-based diagnostic models performed poorly. The SNP-based diagnostic models were superior to those based on thickness and volume. For SNP-based classification, rs878960 in GABRB3 (gamma-aminobutyric acid A receptor, beta 3) was selected by all tree-based models. Our analysis demonstrated that SNP-based classification was more accurate than morphometry-based classification in ASD subtype classification. Also, we found that one SNP--rs878960 in GABRB3--distinguishes Asperger syndrome from high-functioning autism.
The application of a decision tree to establish the parameters associated with hypertension.
Tayefi, Maryam; Esmaeili, Habibollah; Saberi Karimian, Maryam; Amirabadi Zadeh, Alireza; Ebrahimi, Mahmoud; Safarian, Mohammad; Nematy, Mohsen; Parizadeh, Seyed Mohammad Reza; Ferns, Gordon A; Ghayour-Mobarhan, Majid
2017-02-01
Hypertension is an important risk factor for cardiovascular disease (CVD). The goal of this study was to establish the factors associated with hypertension by using a decision-tree algorithm as a supervised classification method of data mining. Data from a cross-sectional study were used in this study. A total of 9078 subjects who met the inclusion criteria were recruited. 70% of these subjects (6358 cases) were randomly allocated to the training dataset for constructing the decision tree. The remaining 30% (2720 cases) were used as the testing dataset to evaluate the performance of the decision tree. Two models were evaluated in this study. In model I, age, gender, body mass index, marital status, level of education, occupation status, depression and anxiety status, physical activity level, smoking status, LDL, TG, TC, FBG, uric acid and hs-CRP were considered as input variables, and in model II, age, gender, WBC, RBC, HGB, HCT, MCV, MCH, PLT, RDW and PDW were considered as input variables. The validation of the model was assessed by constructing a receiver operating characteristic (ROC) curve. The prevalence rate of hypertension was 32% in our population. For the decision-tree model I, the accuracy, sensitivity, specificity and area under the ROC curve (AUC) value for identifying the related risk factors of hypertension were 73%, 63%, 77% and 0.72, respectively. The corresponding values for model II were 70%, 61%, 74% and 0.68, respectively. We have developed a decision tree model to identify the risk factors associated with hypertension that may be used to develop programs for hypertension management. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
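The random 70/30 allocation described in this abstract can be sketched as an index shuffle. This is a minimal Python illustration under stated assumptions: the seed, the rounding rule and the function name are hypothetical, and the authors' actual procedure is not described. An exact 70% cut of 9078 gives 6355 training cases, whereas the abstract reports 6358 and 2720, so their allocation presumably differed in detail.

```python
import random


def train_test_split_indices(n, train_fraction, seed=0):
    """Shuffle the indices 0..n-1 and cut them into training and testing parts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n))
    rng.shuffle(indices)
    n_train = round(n * train_fraction)
    return indices[:n_train], indices[n_train:]


# 9078 subjects as in the abstract; 0.70 is the stated training fraction.
train_idx, test_idx = train_test_split_indices(9078, 0.70)
print(len(train_idx), len(test_idx))
```

The tree is fitted on the training indices only, and the reported accuracy, sensitivity, specificity and AUC are computed on the held-out testing indices.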
James, Lachlan P; Robertson, Sam; Haff, G Gregory; Beckman, Emma M; Kelly, Vincent G
2017-03-01
To determine those performance indicators that have the greatest influence on classifying outcome at the elite level of mixed martial arts (MMA). A secondary objective was to establish the efficacy of decision tree analysis in explaining the characteristics of victory when compared to alternate statistical methods. Cross-sectional observational. Eleven raw performance indicators from male Ultimate Fighting Championship bouts (n=234) from July 2014 to December 2014 were screened for analysis. Each raw performance indicator was also converted to a rate-dependent measure to be scaled to fight duration. Further, three additional performance indicators were calculated from the dataset and included in the analysis. Cohen's d effect sizes were employed to determine the magnitude of the differences between Wins and Losses, while decision tree (chi-square automatic interaction detector (CHAID)) and discriminant function analyses (DFA) were used to classify outcome (Win and Loss). Effect size comparisons revealed differences between Wins and Losses across a number of performance indicators. Decision tree (raw: 71.8%; rate-scaled: 76.3%) and DFA (raw: 71.4%; rate-scaled 71.2%) achieved similar classification accuracies. Grappling and accuracy performance indicators were the most influential in explaining outcome. The decision tree models also revealed multiple combinations of performance indicators leading to victory. The decision tree analyses suggest that grappling activity and technique accuracy are of particular importance in achieving victory in elite-level MMA competition. The DFA results supported the importance of these performance indicators. Decision tree induction represents an intuitive and slightly more accurate approach to explaining bout outcome in this sport when compared to DFA. Copyright © 2016 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
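Cohen's d, used in this abstract to size the differences between Wins and Losses, is the difference in group means divided by a standard deviation. A minimal Python sketch, assuming the common pooled-standard-deviation variant (the abstract does not say which variant was used); the sample values are hypothetical and the function name is illustrative:

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (sample variances, n - 1)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical values of one performance indicator (assumption, not study data).
wins = [4, 5, 6]
losses = [2, 3, 4]
d = cohens_d(wins, losses)
print(d)
```

A positive d indicates a higher mean for the first group; the magnitude is expressed in units of the pooled standard deviation.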
Minimum triplet covers of binary phylogenetic X-trees.
Huber, K T; Moulton, V; Steel, M
2017-12-01
Trees with labelled leaves and with all other vertices of degree three play an important role in systematic biology and other areas of classification. A classical combinatorial result ensures that such trees can be uniquely reconstructed from the distances between the leaves (when the edges are given any strictly positive lengths). Moreover, a linear number of these pairwise distance values suffices to determine both the tree and its edge lengths. A natural set of pairs of leaves is provided by any 'triplet cover' of the tree (based on the fact that each non-leaf vertex is the median vertex of three leaves). In this paper we describe a number of new results concerning triplet covers of minimum size. In particular, we characterize such covers in terms of an associated graph being a 2-tree. Also, we show that minimum triplet covers are 'shellable' and thereby provide a set of pairs for which the inter-leaf distance values will uniquely determine the underlying tree and its associated branch lengths.
NASA Astrophysics Data System (ADS)
Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.
2014-12-01
This study aims at comparing the performances of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams, both stretching for a few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian sea. This area was struck on 1 October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly of the debris flow and debris avalanche types, involving the weathered layer of a low to high grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; in addition, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-fold tests so as to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performances, with a general congruence between BLR and BRT in the predictor importance. In particular, the research highlighted that BRT models reached a higher prediction performance than BLR models for RP-based modelling, whilst for the SP-based models the difference in predictive skill between the two methods dropped drastically, converging to an analogous excellent performance.
However, when looking at the precision of the probability estimates, BLR produced more robust models in terms of selected predictors and coefficients, as well as of dispersion of the estimated probabilities around the mean value for each mapped pixel. The difference in behaviour could be interpreted as the result of overfitting effects, which affect decision tree classification more heavily than logistic regression techniques.
Hostettler, Isabel Charlotte; Muroi, Carl; Richter, Johannes Konstantin; Schmid, Josef; Neidert, Marian Christoph; Seule, Martin; Boss, Oliver; Pangalu, Athina; Germans, Menno Robbert; Keller, Emanuela
2018-01-19
OBJECTIVE The aim of this study was to create prediction models for outcome parameters by decision tree analysis based on clinical and laboratory data in patients with aneurysmal subarachnoid hemorrhage (aSAH). METHODS The database consisted of clinical and laboratory parameters of 548 patients with aSAH who were admitted to the Neurocritical Care Unit, University Hospital Zurich. To examine the model performance, the cohort was randomly divided into a derivation cohort (60% [n = 329]; training data set) and a validation cohort (40% [n = 219]; test data set). The classification and regression tree prediction algorithm was applied to predict death, functional outcome, and ventriculoperitoneal (VP) shunt dependency. Chi-square automatic interaction detection was applied to predict delayed cerebral infarction on days 1, 3, and 7. RESULTS The overall mortality was 18.4%. The accuracy of the decision tree models was good for survival on day 1 and favorable functional outcome at all time points, with a difference between the training and test data sets of < 5%. Prediction accuracy for survival on day 1 was 75.2%. The most important differentiating factor was the interleukin-6 (IL-6) level on day 1. Favorable functional outcome, defined as Glasgow Outcome Scale scores of 4 and 5, was observed in 68.6% of patients. Favorable functional outcome at all time points had a prediction accuracy of 71.1% in the training data set, with procalcitonin on day 1 being the most important differentiating factor at all time points. A total of 148 patients (27%) developed VP shunt dependency. The most important differentiating factor was hyperglycemia on admission. CONCLUSIONS The multiple variable analysis capability of decision trees enables exploration of dependent variables in the context of multiple changing influences over the course of an illness. 
The decision tree currently generated increases awareness of the early systemic stress response, which is seemingly pertinent for prognostication.
Faults Discovery By Using Mined Data
NASA Technical Reports Server (NTRS)
Lee, Charles
2005-01-01
Fault discovery in complex systems consists of model-based reasoning, fault tree analysis, rule-based inference methods, and other approaches. Model-based reasoning builds models for the systems either by mathematical formulations or by experimental models. Fault tree analysis shows the possible causes of a system malfunction by enumerating the suspect components and their respective failure modes that may have induced the problem. Rule-based inference builds the model based on expert knowledge. Those models and methods have one thing in common: they presume some prior conditions. Complex systems often use fault trees to analyze the faults. When an error occurs, fault diagnosis is performed by engineers and analysts who extensively examine all data gathered during the mission. The International Space Station (ISS) control center operates on the data feedback from the system, and decisions are made based on threshold values by using fault trees. Since those decision-making tasks are safety critical and must be done promptly, the engineers who manually analyze the data face a time challenge. To automate this process, this paper presents an approach that uses decision trees to discover faults from data in real time and captures the contents of fault trees as the initial state of the trees.
Sancak, Eyup Burak; Kılınç, Muhammet Fatih; Yücebaş, Sait Can
2017-01-01
The decision on the choice of proximal ureteral stone therapy depends on many factors, and sometimes urologists have difficulty in choosing the treatment option. This study is aimed at evaluating the factors affecting the success of semirigid ureterorenoscopy (URS) using the "decision tree" method. From January 2005 to November 2015, the data of consecutive patients treated for proximal ureteral stone were retrospectively analyzed. A total of 920 patients with proximal ureteral stone treated with semirigid URS were included in the study. All statistically significant attributes were tested using the decision tree method. The model created using decision tree had a sensitivity of 0.993 and an accuracy of 0.857. While URS treatment was successful in 752 patients (81.7%), it was unsuccessful in 168 patients (18.3%). According to the decision tree method, the most important factor affecting the success of URS is whether the stone is impacted to the ureteral wall. The second most important factor affecting treatment was intramural stricture requiring dilatation if the stone is impacted, and the size of the stone if not impacted. Our study suggests that the impacted stone, intramural stricture requiring dilatation and stone size may have a significant effect on the success rate of semirigid URS for proximal ureteral stone. Further studies with population-based and longitudinal design should be conducted to confirm this finding. © 2017 S. Karger AG, Basel.
C-fuzzy variable-branch decision tree with storage and classification error rate constraints
NASA Astrophysics Data System (ADS)
Yang, Shiueng-Bien
2009-10-01
The C-fuzzy decision tree (CFDT), which is based on the fuzzy C-means algorithm, has recently been proposed. The CFDT is grown by selecting the nodes to be split according to their classification error rate. However, the CFDT design does not consider the time taken to classify the input vector, so the CFDT can be improved. We propose a new C-fuzzy variable-branch decision tree (CFVBDT) with storage and classification error rate constraints. The design of the CFVBDT consists of two phases: growing and pruning. The CFVBDT is grown by selecting the nodes to be split according to the classification error rate and the classification time in the decision tree. Additionally, the pruning method selects the nodes to prune based on the storage requirement and the classification time of the CFVBDT. Furthermore, the number of branches of each internal node is variable in the CFVBDT. Experimental results indicate that the proposed CFVBDT outperforms the CFDT and other methods.
A Modified Decision Tree Algorithm Based on Genetic Algorithm for Mobile User Classification Problem
Liu, Dong-sheng; Fan, Shu-jiang
2014-01-01
In order to offer mobile customers better service, we should first classify the mobile users. Addressing the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduces a genetic algorithm to optimize the results of the decision tree algorithm. We also take context information as a classification attribute for the mobile user, and we classify the context into public context and private context classes. Then we analyze the processes and operators of the algorithm. Finally, we run an experiment on mobile users with the algorithm: we can classify the mobile users into Basic service user, E-service user, Plus service user, and Total service user classes, and we can also derive some rules about the mobile users. Compared to the C4.5 decision tree algorithm and the SVM algorithm, the algorithm proposed in this paper has higher accuracy and greater simplicity. PMID:24688389
Planning effectiveness may grow on fault trees.
Chow, C W; Haddad, K; Mannino, B
1991-10-01
The first step of a strategic planning process--identifying and analyzing threats and opportunities--requires subjective judgments. By using an analytical tool known as a fault tree, healthcare administrators can reduce the unreliability of subjective decision making by creating a logical structure for problem solving and decision making. A case study of 11 healthcare administrators showed that an analysis technique called prospective hindsight can add to a fault tree's ability to improve a strategic planning process.
Discriminating crop and other canopies by overlapping binary image layers
NASA Astrophysics Data System (ADS)
Doi, Ryoichi
2013-02-01
For optimal management of agricultural fields by remote sensing, discrimination of the crop canopy from weeds and other objects is essential. In a digital photograph, a rice canopy was discriminated from a variety of weed and tree canopies and other objects by overlapping binary image layers of red-green-blue and other color components indicating the pixels with target canopy-specific (intensity) values based on the ranges of means ±(3×) standard deviations. By overlapping and merging the binary image layers, the target canopy specificity improved to 0.0015 from 0.027 for the yellow 1× standard deviation binary image layer, which was the best among all combinations of color components and means ±(3×) standard deviations. The most target rice canopy-likely pixels were further identified by limiting the pixels at different luminosity values. The discriminatory power was also visually demonstrated in this manner.
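The layer-overlap idea in this abstract can be sketched in a few lines. This is an assumption-laden illustration rather than the author's pipeline: the function names `binary_layer` and `overlap`, the use of plain RGB tuples, the sample standard deviation, and every pixel value below are hypothetical.

```python
import statistics


def binary_layer(pixels, channel, target_samples, k):
    """Return 1 for pixels whose channel value lies within
    mean ± k * standard deviation of the target-canopy samples."""
    values = [sample[channel] for sample in target_samples]
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; an assumption of this sketch
    low, high = mean - k * sd, mean + k * sd
    return [1 if low <= pixel[channel] <= high else 0 for pixel in pixels]


def overlap(layers):
    """Keep only pixels flagged in every binary layer (logical AND)."""
    return [int(all(bits)) for bits in zip(*layers)]


# Hypothetical (R, G, B) values sampled from the target canopy.
canopy = [(60, 120, 50), (64, 126, 54), (62, 123, 52)]
# Hypothetical pixels to classify: canopy-like, low green, high red.
pixels = [(62, 122, 51), (62, 90, 52), (120, 124, 53)]

layers = [binary_layer(pixels, channel, canopy, k=3) for channel in range(3)]
merged = overlap(layers)
print(layers[0], merged)
```

Each single-channel layer admits some non-target pixels; intersecting the layers removes any pixel that fails on at least one color component.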
Prescriptive models to support decision making in genetics.
Pauker, S G; Pauker, S P
1987-01-01
Formal prescriptive models can help patients and clinicians better understand the risks and uncertainties they face and better formulate well-reasoned decisions. Using Bayes rule, the clinician can interpret pedigrees, historical data, physical findings and laboratory data, providing individualized probabilities of various diagnoses and outcomes of pregnancy. With the advent of screening programs for genetic disease, it becomes increasingly important to consider the prior probabilities of disease when interpreting an abnormal screening test result. Decision trees provide a convenient formalism for structuring diagnostic, therapeutic and reproductive decisions; such trees can also enhance communication between clinicians and patients. Utility theory provides a mechanism for patients to understand the choices they face and to communicate their attitudes about potential reproductive outcomes in a manner which encourages the integration of those attitudes into appropriate decisions. Using a decision tree, the relevant probabilities and the patients' utilities, physicians can estimate the relative worth of various medical and reproductive options by calculating the expected utility of each. By performing relevant sensitivity analyses, clinicians and patients can understand the impact of various soft data, including the patients' attitudes toward various health outcomes, on the decision making process. Formal clinical decision analytic models can provide deeper understanding and improved decision making in clinical genetics.
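The role of the prior probability when interpreting an abnormal screening result follows directly from Bayes' rule. The snippet below is a minimal sketch; the function name and the prior, sensitivity, and specificity values are illustrative assumptions and do not come from the paper.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Bayes' rule: probability of disease given an abnormal test result."""
    true_positive = sensitivity * prior
    false_positive = (1.0 - specificity) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


# Hypothetical test characteristics applied to two hypothetical populations.
screening = posterior_given_positive(prior=0.001, sensitivity=0.99, specificity=0.95)
high_risk = posterior_given_positive(prior=0.25, sensitivity=0.99, specificity=0.95)
print(f"general screening: {screening:.3f}, high-risk pedigree: {high_risk:.3f}")
```

The same abnormal result implies a very different probability of disease depending on the prior, which is why population screening results need to be read differently from results in a family with a known history.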
Genetic Programming Based Ensemble System for Microarray Data Classification
Liu, Kun-Hong; Tong, Muchenxuan; Xie, Shu-Tong; Yee Ng, Vincent To
2015-01-01
Recently, more and more machine learning techniques have been applied to microarray data analysis. The aim of this study is to propose a genetic programming (GP) based new ensemble system (named GPES), which can be used to effectively classify different types of cancers. Decision trees are deployed as base classifiers in this ensemble framework with three operators: Min, Max, and Average. Each individual of the GP is an ensemble system, and they become more and more accurate in the evolutionary process. The feature selection technique and balanced subsampling technique are applied to increase the diversity in each ensemble system. The final ensemble committee is selected by a forward search algorithm, which is shown to be capable of fitting data automatically. The performance of GPES is evaluated using five binary class and six multiclass microarray datasets, and results show that the algorithm can achieve better results in most cases compared with some other ensemble systems. By using elaborate base classifiers or applying other sampling techniques, the performance of GPES may be further improved. PMID:25810748
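The three fusion operators named in the abstract (Min, Max, and Average) can be illustrated on the class-probability outputs of a few base classifiers. This sketch covers only the fusion step and assumes each decision tree emits a probability per class; the function name `combine` and all numbers are hypothetical, and the genetic programming search itself is not shown.

```python
def combine(prob_rows, operator):
    """Fuse base-classifier outputs and return (predicted class, fused scores).

    prob_rows[i][c] is the probability that base classifier i assigns to class c.
    """
    operators = {
        "min": min,
        "max": max,
        "average": lambda xs: sum(xs) / len(xs),
    }
    fuse = operators[operator]
    n_classes = len(prob_rows[0])
    scores = [fuse([row[c] for row in prob_rows]) for c in range(n_classes)]
    return scores.index(max(scores)), scores


# Hypothetical outputs of three decision trees for one sample and three classes.
rows = [
    [0.60, 0.30, 0.10],
    [0.10, 0.40, 0.50],
    [0.30, 0.35, 0.35],
]
for name in ("min", "max", "average"):
    label, scores = combine(rows, name)
    print(name, label, [round(s, 3) for s in scores])
```

On this sample Min and Average pick class 1 while Max picks class 0, which shows why the choice of operator is a meaningful part of the ensemble design.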
A method of real-time fault diagnosis for power transformers based on vibration analysis
NASA Astrophysics Data System (ADS)
Hong, Kaixing; Huang, Hai; Zhou, Jianping; Shen, Yimin; Li, Yujie
2015-11-01
In this paper, a novel probability-based classification model is proposed for real-time fault detection of power transformers. First, the transformer vibration principle is introduced, and two effective feature extraction techniques are presented. Next, the details of the classification model based on support vector machine (SVM) are shown. The model also includes a binary decision tree (BDT) which divides transformers into different classes according to health state. The trained model produces posterior probabilities of membership to each predefined class for a tested vibration sample. During the experiments, the vibrations of transformers under different conditions are acquired, and the corresponding feature vectors are used to train the SVM classifiers. The effectiveness of this model is illustrated experimentally on typical in-service transformers. The consistency between the results of the proposed model and the actual condition of the test transformers indicates that the model can be used as a reliable method for transformer fault detection.
NASA Astrophysics Data System (ADS)
Chen, Wen-Yuan; Liu, Chen-Chung
2006-01-01
The problems with binary watermarking schemes are that they have only a small amount of embeddable space and are not robust enough. We develop a slice-based large-cluster algorithm (SBLCA) to construct a robust watermarking scheme for binary images. In SBLCA, a small-amount cluster selection (SACS) strategy is used to search for a feasible slice in a large-cluster flappable-pixel decision (LCFPD) method, which is used to search for the best location for concealing a secret bit from a selected slice. This method has four major advantages over the others: (a) SBLCA has a simple and effective decision function to select appropriate concealment locations, (b) SBLCA utilizes a blind watermarking scheme without the original image in the watermark extracting process, (c) SBLCA uses slice-based shuffling capability to transfer the regular image into a hash state without remembering the state before shuffling, and finally, (d) SBLCA has enough embeddable space that every 64 pixels could accommodate a secret bit of the binary image. Furthermore, empirical results on test images reveal that our approach is a robust watermarking scheme for binary images.
Dexter H. Locke; J. Morgan Grove; Michael Galvin; Jarlath P.M. ONeil-Dunne; Charles Murphy
2013-01-01
Urban Tree Canopy (UTC) Prioritizations can be both a set of geographic analysis tools and a planning process for collaborative decision-making. In this paper, we describe how UTC Prioritizations can be used as a planning process to provide decision support to multiple government agencies, civic groups and private businesses to aid in reaching a canopy target. Linkages...
New Splitting Criteria for Decision Trees in Stationary Data Streams.
Jaworski, Maciej; Duda, Piotr; Rutkowski, Leszek
2018-06-01
The most popular tools for stream data mining are based on decision trees. In the previous 15 years, all designed methods, headed by the very fast decision tree algorithm, relied on Hoeffding's inequality, and hundreds of researchers followed this scheme. Recently, we have demonstrated that although the Hoeffding decision trees are an effective tool for dealing with stream data, they are a purely heuristic procedure; for example, classical decision trees such as ID3 or CART cannot be adapted to data stream mining using Hoeffding's inequality. Therefore, there is an urgent need to develop new algorithms, which are both mathematically justified and characterized by good performance. In this paper, we address this problem by developing a family of new splitting criteria for classification in stationary data streams and investigating their probabilistic properties. The new criteria, derived using appropriate statistical tools, are based on the misclassification error and the Gini index impurity measures. The general division of splitting criteria into two types is proposed. Attributes chosen based on type-I splitting criteria guarantee, with high probability, the highest expected value of split measure. Type-II criteria ensure that the chosen attribute is the same, with high probability, as it would be chosen based on the whole infinite data stream. Moreover, in this paper, two hybrid splitting criteria are proposed, which are the combinations of single criteria based on the misclassification error and Gini index.
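For orientation, the two impurity measures named in the abstract can be computed as below. This sketch shows only the classical batch definitions of the Gini index and the misclassification error and the resulting split measure; it does not implement the paper's stream-specific criteria or their probabilistic bounds. The function names and class counts are hypothetical.

```python
def gini(counts):
    """Gini index of a node given its per-class sample counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def misclassification(counts):
    """Misclassification error: fraction of samples outside the majority class."""
    n = sum(counts)
    return 1.0 - max(counts) / n


def split_gain(parent, children, impurity):
    """Impurity reduction achieved by splitting `parent` into `children`."""
    n = sum(parent)
    weighted = sum(sum(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical two-class node with 40 samples per class and two candidate splits.
parent = [40, 40]
split_a = [[30, 10], [10, 30]]
split_b = [[40, 20], [0, 20]]

for name, measure in (("gini", gini), ("misclassification", misclassification)):
    print(name, split_gain(parent, split_a, measure), split_gain(parent, split_b, measure))
```

On this toy node the misclassification error rates both splits equally, while the Gini index prefers the split that produces a pure child. Differences of this kind are one reason to consider criteria built on both measures.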
Tanaka, Tomohiro; Voigt, Michael D
2018-03-01
Non-melanoma skin cancer (NMSC) is the most common de novo malignancy in liver transplant (LT) recipients; it behaves more aggressively and it increases mortality. We used decision tree analysis to develop a tool to stratify and quantify risk of NMSC in LT recipients. We performed Cox regression analysis to identify which predictive variables to enter into the decision tree analysis. Data were from the Organ Procurement Transplant Network (OPTN) STAR files of September 2016 (n = 102984). NMSC developed in 4556 of the 105984 recipients, a mean of 5.6 years after transplant. The 5/10/20-year rates of NMSC were 2.9/6.3/13.5%, respectively. Cox regression identified male gender, Caucasian race, age, body mass index (BMI) at LT, and sirolimus use as key predictive or protective factors for NMSC. These factors were entered into a decision tree analysis. The final tree stratified non-Caucasians as low risk (0.8%), and Caucasian males > 47 years, BMI < 40 who did not receive sirolimus, as high risk (7.3% cumulative incidence of NMSC). The predictions in the derivation set were almost identical to those in the validation set (r² = 0.971, p < 0.0001). Cumulative incidence of NMSC in low, moderate and high risk groups at 5/10/20 year was 0.5/1.2/3.3, 2.1/4.8/11.7 and 5.6/11.6/23.1% (p < 0.0001). The decision tree model accurately stratifies the risk of developing NMSC in the long-term after LT.
Interpretation of diagnostic data: 6. How to do it with more complex maths.
1983-11-15
We have now shown you how to use decision analysis in making those rare, tough diagnostic decisions that are not soluble through other, easier routes. In summary, to "use more complex maths" the following steps will be useful: Create a decision tree or map of all the pertinent courses of action and their consequences. Assign probabilities to the branches of each chance node. Assign utilities to each of the potential outcomes shown on the decision tree. Combine the probabilities and utilities for each node on the decision tree. Pick the decision that leads to the highest expected utility. Test your decision for its sensitivity to clinically sensible changes in probabilities and utilities. That concludes this series of clinical epidemiology rounds. You've come a long way from "doing it with pictures" and are now able to extract most of the diagnostic information that can be provided from signs, symptoms and laboratory investigations. We would appreciate learning whether you have found this series useful and how we can do a better job of presenting these and other elements of "the science of the art of medicine".
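The steps listed above (build the tree, assign probabilities and utilities, combine them at each node, pick the best decision, then test sensitivity) can be sketched as a small roll-back calculation. The tuple-based node encoding, the function names, and every probability and utility below are assumptions made for illustration only.

```python
def expected_utility(node):
    """Roll back a decision tree.

    A node is one of:
      ("outcome", utility)
      ("chance", [(probability, child), ...])
      ("decision", {label: child, ...})
    """
    kind = node[0]
    if kind == "outcome":
        return node[1]
    if kind == "chance":
        return sum(p * expected_utility(child) for p, child in node[1])
    if kind == "decision":
        return max(expected_utility(child) for child in node[1].values())
    raise ValueError(f"unknown node type: {kind}")


def best_option(decision_node):
    """Label of the branch with the highest expected utility."""
    options = decision_node[1]
    return max(options, key=lambda label: expected_utility(options[label]))


def build_tree(p_disease):
    """Hypothetical treat-versus-wait decision; utilities are made up."""
    treat = ("chance", [(p_disease, ("outcome", 0.90)),
                        (1 - p_disease, ("outcome", 0.95))])
    wait = ("chance", [(p_disease, ("outcome", 0.50)),
                       (1 - p_disease, ("outcome", 1.00))])
    return ("decision", {"treat": treat, "wait": wait})


# Sensitivity analysis: vary the probability of disease and watch the choice.
for p in (0.05, 0.10, 0.20, 0.30):
    tree = build_tree(p)
    print(p, best_option(tree), round(expected_utility(tree), 4))
```

With these made-up numbers the preferred option switches from waiting to treating once the probability of disease exceeds roughly 0.11, which is the kind of threshold a sensitivity analysis is meant to expose.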
Salience from the decision perspective: You know where it is before you know it is there.
Zehetleitner, Michael; Müller, Hermann J
2010-12-31
In visual search for feature contrast ("odd-one-out") singletons, identical manipulations of salience, whether by varying target-distractor similarity or dimensional redundancy of target definition, had smaller effects on reaction times (RTs) for binary localization decisions than for yes/no detection decisions. According to formal models of binary decisions, identical differences in drift rates would yield larger RT differences for slow than for fast decisions. From this principle and the present findings, it follows that decisions on the presence of feature contrast singletons are slower than decisions on their location. This is at variance with two classes of standard models of visual search and object recognition that assume a serial cascade of first detection, then localization and identification of a target object, but also inconsistent with models assuming that as soon as a target is detected all its properties, spatial as well as non-spatial (e.g., its category), are available immediately. As an alternative, we propose a model of detection and localization tasks based on random walk processes, which can account for the present findings.
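The principle that identical drift-rate differences produce larger RT differences for slow decisions can be checked numerically. The sketch below assumes the simplest case, an unbiased Wiener process with symmetric absorbing boundaries, for which the mean decision time has a closed form; the boundary, noise, and drift values are arbitrary assumptions, and this is not the specific model proposed by the authors.

```python
import math


def mean_decision_time(drift, bound=1.0, noise=1.0):
    """Mean first-passage time of a Wiener process with the given drift,
    started at 0 and absorbed at +bound or -bound."""
    return (bound / drift) * math.tanh(drift * bound / noise ** 2)


step = 0.5  # identical drift-rate increase applied to both regimes

# Slow decisions: low drift rates. Fast decisions: high drift rates.
slow_gain = mean_decision_time(0.5) - mean_decision_time(0.5 + step)
fast_gain = mean_decision_time(2.0) - mean_decision_time(2.0 + step)

print(f"RT reduction, slow regime: {slow_gain:.3f}")
print(f"RT reduction, fast regime: {fast_gain:.3f}")
```

The same drift increment shortens the mean decision time more in the slow regime, which is the inference the abstract uses to conclude that detection decisions are slower than localization decisions.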
Policy Route Map for Academic Libraries' Digital Content
ERIC Educational Resources Information Center
Koulouris, Alexandros; Kapidakis, Sarantos
2012-01-01
This paper presents a policy decision tree for digital information management in academic libraries. The decision tree is a policy guide, which offers alternative access and reproduction policy solutions according to the prevailing circumstances (for example acquisition method, copyright ownership). It refers to the digital information life cycle,…
Efforts are increasingly being made to classify the world’s wetland resources, an important ecosystem and habitat that is diminishing in abundance. There are multiple remote sensing classification methods, including a suite of nonparametric classifiers such as decision-tree...
Korucu, M Kemal; Karademir, Aykan
2014-02-01
A multi-criteria decision analysis procedure supported by geographic information systems was applied to the site selection process of a planned municipal solid waste management practice based on twelve different scenarios. The scenarios included two different decision tree modes and two different weighting models for three different area requirements. The suitability rankings of the suitable sites obtained from the application of the decision procedure for the scenarios were assessed by a factorial experimental design concerning the effect of some external criteria on the final decision of the site selection process. The external criteria used in the factorial experimental design were defined as "Risk perception and approval of stakeholders" and "Visibility". The effects of the presence of these criteria in the decision trees were evaluated in detail. For a quantitative expression of the differences observed in the suitability rankings, the ranking data were subjected to an ANOVA test after normalization. The results of these tests were then evaluated by a Tukey test to measure the effects of the external criteria on the final decision. The results of the Tukey tests indicated that including the external criteria in the decision trees produced statistically meaningful differences in the suitability rankings. Since the external criteria could cause considerable external costs during the operation of the disposal facilities, the presence of these criteria in the decision tree, in addition to the other criteria related to environmental and legislative requisites, could prevent subsequent external costs in the first place.
Spectral analysis of white ash response to emerald ash borer infestations
NASA Astrophysics Data System (ADS)
Calandra, Laura
The emerald ash borer (EAB) (Agrilus planipennis Fairmaire) is an invasive insect that has killed over 50 million ash trees in the US. The goal of this research was to establish a method to identify ash trees infested with EAB using remote sensing techniques at the leaf-level and tree crown level. First, a field-based study at the leaf-level used the range of spectral bands from the WorldView-2 sensor to determine if there was a significant difference between EAB-infested white ash (Fraxinus americana) and healthy leaves. Binary logistic regression models were developed using individual and combinations of wavelengths; the most successful model included 545 and 950 nm bands. The second half of this research employed imagery to identify healthy and EAB-infested trees, comparing pixel- and object-based methods by applying an unsupervised classification approach and a tree crown delineation algorithm, respectively. The pixel-based models attained the highest overall accuracies.
Fast Construction of Near Parsimonious Hybridization Networks for Multiple Phylogenetic Trees.
Mirzaei, Sajad; Wu, Yufeng
2016-01-01
Hybridization networks represent plausible evolutionary histories of species that are affected by reticulate evolutionary processes. An established computational problem on hybridization networks is constructing the most parsimonious hybridization network such that each of the given phylogenetic trees (called gene trees) is "displayed" in the network. There have been several previous approaches, including an exact method and several heuristics, for this NP-hard problem. However, the exact method is only applicable to a limited range of data, and heuristic methods can be less accurate and also slow sometimes. In this paper, we develop a new algorithm for constructing near parsimonious networks for multiple binary gene trees. This method is more efficient for large numbers of gene trees than previous heuristics. This new method also produces more parsimonious results on many simulated datasets as well as a real biological dataset than a previous method. We also show that our method produces topologically more accurate networks for many datasets.
Post-School Articulation in Australia: A Case of Unresolved Tensions
ERIC Educational Resources Information Center
Keating, Jack
2006-01-01
Post-school education and training in Australia is based upon a binary system of universities and technical and further education (TAFE) institutes. The binary system has been fashioned through decisions that established different curriculum currencies and qualifications, sector orientations and governance, and student profiles for the two…
Poulos, H M; Camp, A E
2010-02-01
Vegetation management is a critical component of rights-of-way (ROW) maintenance for preventing electrical outages and safety hazards resulting from tree contact with conductors during storms. Northeast Utility's (NU) transmission lines are a critical element of the nation's power grid; NU is therefore under scrutiny from federal agencies charged with protecting the electrical transmission infrastructure of the United States. We developed a decision support system to focus right-of-way maintenance and minimize the potential for a tree fall episode that disables transmission capacity across the state of Connecticut. We used field data on tree characteristics to develop a system for identifying hazard trees (HTs) in the field using limited equipment to manage Connecticut power line ROW. Results from this study indicated that the tree height-to-diameter ratio, total tree height, and live crown ratio were the key characteristics that differentiated potential risk trees (danger trees) from trees with a high probability of tree fall (HTs). Products from this research can be transferred to adaptive right-of-way management, and the methods we used have great potential for future application to other regions of the United States and elsewhere where tree failure can disrupt electrical power.
Helium: lifting high-performance stencil kernels from stripped x86 binaries to halide DSL code
Mendis, Charith; Bosboom, Jeffrey; Wu, Kevin; ...
2015-06-03
Highly optimized programs are prone to bit rot, where performance quickly becomes suboptimal in the face of new hardware and compiler techniques. In this paper we show how to automatically lift performance-critical stencil kernels from a stripped x86 binary and generate the corresponding code in the high-level domain-specific language Halide. Using Halide's state-of-the-art optimizations targeting current hardware, we show that new optimized versions of these kernels can replace the originals to rejuvenate the application for newer hardware. The original optimized code for kernels in stripped binaries is nearly impossible to analyze statically. Instead, we rely on dynamic traces to regenerate the kernels. We perform buffer structure reconstruction to identify input, intermediate and output buffer shapes. Here, we abstract from a forest of concrete dependency trees which contain absolute memory addresses to symbolic trees suitable for high-level code generation. This is done by canonicalizing trees, clustering them based on structure, inferring higher-dimensional buffer accesses and finally by solving a set of linear equations based on buffer accesses to lift them up to simple, high-level expressions. Helium can handle highly optimized, complex stencil kernels with input-dependent conditionals. We lift seven kernels from Adobe Photoshop giving a 75% performance improvement, four kernels from IrfanView, leading to 4.97x performance, and one stencil from the miniGMG multigrid benchmark netting a 4.25x improvement in performance. We manually rejuvenated Photoshop by replacing eleven of Photoshop's filters with our lifted implementations, giving 1.12x speedup without affecting the user experience.
Prediction of the compression ratio for municipal solid waste using decision tree.
Heshmati R, Ali Akbar; Mokhtari, Maryam; Shakiba Rad, Saeed
2014-01-01
The compression ratio of municipal solid waste (MSW) is an essential parameter for evaluation of waste settlement and landfill design. However, no appropriate model has been proposed to estimate the waste compression ratio so far. In this study, a decision tree method was utilized to predict the waste compression ratio (C'c). The tree was constructed using Quinlan's M5 algorithm. A reliable database retrieved from the literature was used to develop a practical model that relates C'c to waste composition and properties, including dry density, dry weight water content, and percentage of biodegradable organic waste using the decision tree method. The performance of the developed model was examined in terms of different statistical criteria, including correlation coefficient, root mean squared error, mean absolute error and mean bias error, recommended by researchers. The obtained results demonstrate that the suggested model is able to evaluate the compression ratio of MSW effectively.
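The four statistical criteria named in the abstract can be computed as shown below. This is a generic sketch of the standard definitions; the function name `evaluate` and the observed and predicted compression-ratio values are hypothetical and are not taken from the study's database.

```python
import math


def evaluate(observed, predicted):
    """Correlation coefficient, RMSE, MAE and mean bias error (predicted - observed)."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "r": cov / math.sqrt(var_o * var_p),
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mbe": sum(errors) / n,
    }


# Hypothetical observed and model-predicted compression ratios.
observed = [0.20, 0.25, 0.30, 0.35]
predicted = [0.22, 0.24, 0.33, 0.34]
metrics = evaluate(observed, predicted)
print({name: round(value, 4) for name, value in metrics.items()})
```

Reporting the mean bias error alongside the absolute measures shows whether a model systematically over- or under-predicts, which RMSE and MAE alone cannot reveal.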
Huben, Neil; Hussein, Ahmed; May, Paul; Whittum, Michelle; Kraswowki, Collin; Ahmed, Youssef; Jing, Zhe; Khan, Hijab; Kim, Hyung; Schwaab, Thomas; Underwood Iii, Willie; Kauffman, Eric; Mohler, James L; Guru, Khurshid A
2018-04-10
To develop a methodology for predicting operative times for robot-assisted radical prostatectomy (RARP) using preoperative patient, disease, procedural and surgeon variables to facilitate operating room (OR) scheduling. The model included preoperative metrics: BMI, ASA score, clinical stage, National Comprehensive Cancer Network (NCCN) risk, prostate weight, nerve-sparing status, extent and laterality of lymph node dissection, and operating surgeon (6 surgeons were included in the study). A binary decision tree was fit using a conditional inference tree method to predict operative times. The variables most associated with operative time were determined using permutation tests. The data was split at the value of the variable that results in the largest difference in means for surgical time across the split. This process was repeated recursively on the resultant data. 1709 RARPs were included. The variable most strongly associated with operative time was the surgeon (surgeons 2 and 4 - 102 minutes shorter than surgeons 1, 3, 5, and 6, p<0.001). Among surgeons 2 and 4, BMI had the strongest association with surgical time (p<0.001). Among patients operated by surgeons 1, 3, 5 and 6, RARP time was again most strongly associated with the surgeon performing RARP. Surgeons 1, 3, and 6 were on average 76 minutes faster than surgeon 5 (p<0.001). The regression tree output in the form of box plots showed operative time median and ranges according to patient, disease, procedural and surgeon metrics. We developed a methodology that can predict operative times for RARP based on patient, disease and surgeon variables. This methodology can be utilized for quality control, facilitate OR scheduling and maximize OR efficiency.
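The splitting rule described in this abstract, choosing the value of a variable that yields the largest difference in mean surgical time across the split, can be sketched as follows. This covers only the split-point search for one numeric variable; the permutation tests that the conditional inference tree method uses to select variables are not reproduced, and the function name and all BMI and operative-time values are hypothetical.

```python
def best_mean_split(values, times):
    """Return (threshold, gap): the cut point on `values` that maximizes the
    absolute difference in mean `times` between the two sides."""
    pairs = sorted(zip(values, times))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no cut point exists between equal values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        gap = abs(sum(left) / len(left) - sum(right) / len(right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or gap > best[1]:
            best = (threshold, gap)
    return best


# Hypothetical BMI values and operative times in minutes.
bmi = [22, 24, 27, 31, 35, 38]
minutes = [180, 185, 190, 240, 250, 245]
threshold, gap = best_mean_split(bmi, minutes)
print(threshold, gap)  # 29.0 60.0
```

Applying the same search recursively to the data on each side of the chosen threshold yields the binary tree structure described in the abstract.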
What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis
ERIC Educational Resources Information Center
Thomas, Emily H.; Galambos, Nora
2004-01-01
To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…
NASA Astrophysics Data System (ADS)
Luo, Qiu; Xin, Wu; Qiming, Xiong
2017-06-01
In remote sensing extraction of vegetation information, phenological features are often not considered and the analysis algorithms perform poorly. To solve this problem, a method for extracting remote sensing vegetation information based on EVI time series and decision-tree classification with multi-source branch similarity is proposed. First, to improve the time-series stability of recognition accuracy, the seasonal features of vegetation are extracted based on the fitting span range of the time series. Second, decision-tree similarity is determined by adaptive selection of the path or the probability parameter of component prediction. This similarity serves as an index to evaluate the degree of task association, decide whether to perform migration of the multi-source decision tree, and ensure the speed of migration. Finally, the accuracy of classification and recognition of pests and diseases reaches 87%-98% for commercial Dalbergia hainanensis forest, which is significantly better than the MODIS coverage accuracy of 80%-96% in this area. These results support the validity of the proposed method.
Blooming Trees: Substructures and Surrounding Groups of Galaxy Clusters
NASA Astrophysics Data System (ADS)
Yu, Heng; Diaferio, Antonaldo; Serra, Ana Laura; Baldi, Marco
2018-06-01
We develop the Blooming Tree Algorithm, a new technique that uses spectroscopic redshift data alone to identify the substructures and the surrounding groups of galaxy clusters, along with their member galaxies. Based on the estimated binding energy of galaxy pairs, the algorithm builds a binary tree that hierarchically arranges all of the galaxies in the field of view. The algorithm searches for buds, corresponding to gravitational potential minima on the binary tree branches; for each bud, the algorithm combines the number of galaxies, their velocity dispersion, and their average pairwise distance into a parameter that discriminates between the buds that do not correspond to any substructure or group, and thus eventually die, and the buds that correspond to substructures and groups, and thus bloom into the identified structures. We test our new algorithm with a sample of 300 mock redshift surveys of clusters in different dynamical states; the clusters are extracted from a large cosmological N-body simulation of a ΛCDM model. We limit our analysis to substructures and surrounding groups identified in the simulation with mass larger than 10¹³ h⁻¹ M⊙. With mock redshift surveys with 200 galaxies within 6 h⁻¹ Mpc from the cluster center, the technique recovers 80% of the real substructures and 60% of the surrounding groups; in 57% of the identified structures, at least 60% of the member galaxies of the substructures and groups belong to the same real structure. These results improve by roughly a factor of two the performance of the best substructure identification algorithm currently available, the σ plateau algorithm, and suggest that our Blooming Tree Algorithm can be an invaluable tool for detecting substructures of galaxy clusters and investigating their complex dynamics.
On the Number of Non-equivalent Ancestral Configurations for Matching Gene Trees and Species Trees.
Disanto, Filippo; Rosenberg, Noah A
2017-09-14
An ancestral configuration is one of the combinatorially distinct sets of gene lineages that, for a given gene tree, can reach a given node of a specified species tree. Ancestral configurations have appeared in recursive algebraic computations of the conditional probability that a gene tree topology is produced under the multispecies coalescent model for a given species tree. For matching gene trees and species trees, we study the number of ancestral configurations, considered up to an equivalence relation introduced by Wu (Evolution 66:763-775, 2012) to reduce the complexity of the recursive probability computation. We examine the largest number of non-equivalent ancestral configurations possible for a given tree size n. Whereas the smallest number of non-equivalent ancestral configurations increases polynomially with n, we show that the largest number increases with [Formula: see text], where k is a constant that satisfies [Formula: see text]. Under a uniform distribution on the set of binary labeled trees with a given size n, the mean number of non-equivalent ancestral configurations grows exponentially with n. The results refine an earlier analysis of the number of ancestral configurations considered without applying the equivalence relation, showing that use of the equivalence relation does not alter the exponential nature of the increase with tree size.
Multiprocessor sparse L/U decomposition with controlled fill-in
NASA Technical Reports Server (NTRS)
Alaghband, G.; Jordan, H. F.
1985-01-01
Generation of the maximal compatibles of pivot elements for a class of small sparse matrices is studied. The algorithm involves a binary tree search and has a complexity exponential in the order of the matrix. Different strategies for selection of a set of compatible pivots based on the Markowitz criterion are investigated. The competing issues of parallelism and fill-in generation are studied and results are provided. A technique for obtaining an ordered compatible set directly from the ordered incompatible table is given. This technique generates a set of compatible pivots with the property of generating few fills. A new heuristic algorithm is then proposed that combines the idea of an ordered compatible set with a limited binary tree search to generate several sets of compatible pivots in linear time. Finally, an elimination set to reduce the matrix is selected. Parameters are suggested to obtain a balance between parallelism and fill-ins. Results of applying the proposed algorithms on several large application matrices are presented and analyzed.
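As a rough illustration of the Markowitz criterion mentioned in the abstract above, the sketch below scores each nonzero entry of a small sparsity pattern by (r_i - 1)(c_j - 1), where r_i and c_j are the nonzero counts of its row and column; a low score suggests little fill-in when that entry is used as a pivot. The pattern and the function name `markowitz_costs` are hypothetical, and the sketch does not attempt to reproduce the paper's compatible-set construction or tree search.

```python
def markowitz_costs(pattern):
    """Return {(i, j): (r_i - 1) * (c_j - 1)} for every nonzero position (i, j).

    `pattern` is a set of (row, col) positions of nonzero entries.
    """
    row_counts, col_counts = {}, {}
    for i, j in pattern:
        row_counts[i] = row_counts.get(i, 0) + 1
        col_counts[j] = col_counts.get(j, 0) + 1
    return {(i, j): (row_counts[i] - 1) * (col_counts[j] - 1) for i, j in pattern}


# Hypothetical 4x4 sparsity pattern, for illustration only.
pattern = {(0, 0), (0, 3), (1, 1), (1, 2), (2, 1), (2, 2), (3, 0), (3, 2), (3, 3)}
costs = markowitz_costs(pattern)

# Pick the lowest-cost candidate; ties are broken by position order.
best_pivot = min(sorted(costs), key=costs.get)
```

Here the entry at (3, 2) sits in the densest row and column and therefore gets the highest score, which is why a Markowitz-style rule would avoid it as an early pivot.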
Pak, Kyoungjune; Kim, Keunyoung; Kim, Mi-Hyun; Eom, Jung Seop; Lee, Min Ki; Cho, Jeong Su; Kim, Yun Seong; Kim, Bum Soo; Kim, Seong Jang; Kim, In Joo
2018-01-01
We aimed to develop a decision tree model to improve diagnostic performance of positron emission tomography/computed tomography (PET/CT) to detect metastatic lymph nodes (LN) in non-small cell lung cancer (NSCLC). 115 patients with NSCLC were included in this study. The training dataset included 66 patients. A decision tree model was developed with 9 variables, and validated with 49 patients: short and long diameters of LNs, ratio of short and long diameters, maximum standardized uptake value (SUVmax) of LN, mean Hounsfield unit, ratio of LN SUVmax and ascending aorta SUVmax (LN/AA), and ratio of LN SUVmax and superior vena cava SUVmax. A total of 301 LNs of 115 patients were evaluated in this study. Nodular calcification was applied as the initial imaging parameter, and LN SUVmax (≥3.95) was assessed as the second. LN/AA (≥2.92) was additionally required for LNs with high SUVmax. Sensitivity was 50% for training dataset, and 40% for validation dataset. However, specificity was 99.28% for training dataset, and 96.23% for validation dataset. In conclusion, we have developed a new decision tree model for interpreting mediastinal LNs. All LNs with nodular calcification were benign, and LNs with high LN SUVmax and high LN/AA were metastatic. Further studies are needed to incorporate subjective parameters and pathologic evaluations into a decision tree model to improve the test performance of PET/CT.
Amirabadizadeh, Alireza; Nezami, Hossein; Vaughn, Michael G; Nakhaee, Samaneh; Mehrpour, Omid
2018-05-12
Substance abuse exacts considerable social and health care burdens throughout the world. The aim of this study was to create a prediction model to better identify risk factors for drug use. A prospective cross-sectional study was conducted in South Khorasan Province, Iran. Of the total of 678 eligible subjects, 70% (n: 474) were randomly selected to provide a training set for constructing decision tree and multiple logistic regression (MLR) models. The remaining 30% (n: 204) were employed in a holdout sample to test the performance of the decision tree and MLR models. Predictive performance of different models was analyzed by the receiver operating characteristic (ROC) curve using the testing set. Independent variables were selected from demographic characteristics and history of drug use. For the decision tree model, the sensitivity and specificity for identifying people at risk for drug abuse were 66% and 75%, respectively, while the MLR model was somewhat less effective at 60% and 73%. Key independent variables in the analyses included first substance experience, age at first drug use, age, place of residence, history of cigarette use, and occupational and marital status. While study findings are exploratory and lack generalizability, they do suggest that the decision tree model holds promise as an effective classification approach for identifying risk factors for drug use. Convergent with prior research in Western contexts is that age of drug use initiation was a critical factor predicting a substance use disorder.
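The 70/30 holdout design described in the abstract above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names, the seed, and the confusion counts passed to `sensitivity_specificity` are hypothetical and are not taken from the study.

```python
import random


def holdout_split(items, train_fraction, seed=0):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# 678 subject identifiers, split 70% / 30% as in the abstract.
train, test = holdout_split(range(678), 0.7)

# Hypothetical confusion counts chosen only to show the arithmetic.
sens, spec = sensitivity_specificity(tp=66, fn=34, tn=75, fp=25)
```

Truncating 678 x 0.7 to an integer gives the 474 / 204 split reported in the abstract; both metrics are then computed on the held-out subjects only.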
Phan, Thanh G; Chen, Jian; Singhal, Shaloo; Ma, Henry; Clissold, Benjamin B; Ly, John; Beare, Richard
2018-01-01
Prognostication following hypoxic ischemic encephalopathy (brain injury) is important for clinical management. The aim of this exploratory study is to use a decision tree model to find clinical and MRI associates of severe disability and death in this condition. We evaluate a clinical model and then the added value of MRI data. The inclusion criteria were as follows: age ≥17 years, cardio-respiratory arrest, and coma on admission (2003-2011). Decision tree analysis was used to find clinical [Glasgow Coma Score (GCS), features about cardiac arrest, therapeutic hypothermia, age, and sex] and MRI (infarct volume) associates of severe disability and death. We used the area under the ROC (auROC) to determine the accuracy of the model. There were 41 patients (63.7% males) who underwent MRI, with an average age of 51.5 ± 18.9 years. The decision trees showed that infarct volume and age were important factors for discrimination between mild to moderate disability and severe disability and death at day 0 and day 2. The auROC for this model was 0.94 (95% CI 0.82-1.00). At day 7, GCS value was the only predictor; the auROC was 0.96 (95% CI 0.86-1.00). Our findings provide proof of concept for further exploration of the role of MR imaging and decision tree analysis in the early prognostication of hypoxic ischemic brain injury.
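For readers unfamiliar with the auROC used in the abstract above, one common way to compute it is as the fraction of (positive, negative) case pairs in which the positive case receives the higher model score, with ties counted as one half. The sketch below assumes that definition; the scores are hypothetical and unrelated to the study data.

```python
def auroc(scores_pos, scores_neg):
    """Probability that a random positive case outranks a random negative case.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model scores: 3 cases with the outcome, 3 without.
auc = auroc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

In this toy example 8 of the 9 pairs are ordered correctly, so the area is 8/9; a value of 0.5 would correspond to chance-level discrimination.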
NASA Astrophysics Data System (ADS)
Shinya, A.; Ishihara, T.; Inoue, K.; Nozaki, K.; Kita, S.; Notomi, M.
2018-02-01
We propose an optical parallel adder based on a binary decision diagram that can calculate simply by propagating light through electrically controlled optical pass gates. The CARRY and CARRY operations are multiplexed in one circuit by a wavelength division multiplexing scheme to reduce the number of optical elements, and only a single gate constitutes the critical path for one digit calculation. The processing time reaches picoseconds per digit when we use 100-μm-long optical path gates, which is ten times faster than a CMOS circuit.
Otsuka, Momoka; Uchida, Yuki; Kawaguchi, Takumi; Taniguchi, Eitaro; Kawaguchi, Atsushi; Kitani, Shingo; Itou, Minoru; Oriishi, Tetsuharu; Kakuma, Tatsuyuki; Tanaka, Suiko; Yagi, Minoru; Sata, Michio
2012-10-01
Dietary habits are involved in the development of chronic inflammation; however, the impact of dietary profiles of hepatitis C virus carriers with persistently normal alanine transaminase levels (HCV-PNALT) remains unclear. The decision-tree algorithm is a data-mining statistical technique, which uncovers meaningful profiles of factors from a data collection. We aimed to investigate dietary profiles associated with HCV-PNALT using a decision-tree algorithm. Twenty-seven HCV-PNALT and 41 patients with chronic hepatitis C were enrolled in this study. Dietary habit was assessed using a validated semiquantitative food frequency questionnaire. A decision-tree algorithm was created by dietary variables, and was evaluated by area under the receiver operating characteristic curve analysis (AUROC). In multivariate analysis, fish to meat ratio, dairy product and cooking oils were identified as independent variables associated with HCV-PNALT. The decision-tree algorithm was created with two variables: a fish to meat ratio and cooking oils/ideal bodyweight. When subjects showed a fish to meat ratio of 1.24 or more, 68.8% of the subjects were HCV-PNALT. On the other hand, 11.5% of the subjects were HCV-PNALT when subjects showed a fish to meat ratio of less than 1.24 and cooking oil/ideal bodyweight of less than 0.23 g/kg. The difference in the proportion of HCV-PNALT between these groups was significant (odds ratio 16.87, 95% CI 3.40-83.67, P = 0.0005). Fivefold cross-validation of the decision-tree algorithm showed an AUROC of 0.6947 (95% CI 0.5656-0.8238, P = 0.0067). The decision-tree algorithm disclosed that fish to meat ratio and cooking oil/ideal bodyweight were associated with HCV-PNALT.
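The odds ratio and confidence interval quoted in the abstract above can be checked with a short calculation. The sketch below uses the standard log-based (Woolf) interval. The group counts are an assumption: the abstract gives only percentages, and 11 of 16 (68.8%) versus 3 of 26 (11.5%) are the smallest counts consistent with them, not figures reported by the authors.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio (a*d)/(b*c) with a log-based (Woolf) confidence interval.

    a, b: cases / non-cases in the first group; c, d: the same in the second.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Assumed counts (inferred from the percentages, not stated in the abstract):
# high fish-to-meat group: 11 HCV-PNALT, 5 not; low group: 3 HCV-PNALT, 23 not.
odds_ratio, lower, upper = odds_ratio_ci(11, 5, 3, 23)
```

Under these assumed counts the calculation gives an odds ratio of about 16.87 with an interval of roughly 3.40 to 83.67, which agrees with the reported values and suggests the authors used a comparable method.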
Data Clustering and Evolving Fuzzy Decision Tree for Data Base Classification Problems
NASA Astrophysics Data System (ADS)
Chang, Pei-Chann; Fan, Chin-Yuan; Wang, Yen-Wen
Data base classification suffers from two well-known difficulties, i.e., the high dimensionality and non-stationary variations within the large historic data. This paper presents a hybrid classification model that integrates a case-based reasoning technique, a Fuzzy Decision Tree (FDT), and Genetic Algorithms (GA) to construct a decision-making system for data classification in various data base applications. The model is mainly based on the idea that the historic data base can be transformed into a smaller case base together with a group of fuzzy decision rules. As a result, the model can respond more accurately to the data currently being classified, using the inductions from these smaller case-based fuzzy decision trees. Hit rate is applied as a performance measure, and the effectiveness of our proposed model is demonstrated by experimental comparison with other approaches on different data base classification applications. The average hit rate of our proposed model is the highest among the compared approaches.
Aguirre-Junco, Angel-Ricardo; Colombet, Isabelle; Zunino, Sylvain; Jaulent, Marie-Christine; Leneveut, Laurence; Chatellier, Gilles
2004-01-01
The initial step for the computerization of guidelines is the knowledge specification from the prose text of guidelines. We describe a method of knowledge specification based on a structured and systematic analysis of text allowing detailed specification of a decision tree. We use decision tables to validate the decision algorithm and decision trees to specify and represent this algorithm, along with elementary messages of recommendation. Editing tools are also necessary to facilitate the process of validation and workflow between expert physicians who will validate the specified knowledge and computer scientists who will encode the specified knowledge in a guideline model. Applied to eleven different guidelines issued by an official agency, the method allows a quick and valid computerization and integration in a larger decision support system called EsPeR (Personalized Estimate of Risks). The quality of the text guidelines is however still to be developed further. The method used for computerization could help to define a framework usable at the initial step of guideline development in order to produce guidelines ready for electronic implementation.
Verbakel, Jan Y; Lemiengre, Marieke B; De Burghgraeve, Tine; De Sutter, An; Aertgeerts, Bert; Bullens, Dominique M A; Shinkins, Bethany; Van den Bruel, Ann; Buntinx, Frank
2015-08-07
Acute infection is the most common presentation of children in primary care with only few having a serious infection (eg, sepsis, meningitis, pneumonia). To avoid complications or death, early recognition and adequate referral are essential. Clinical prediction rules have the potential to improve diagnostic decision-making for rare but serious conditions. In this study, we aimed to validate a recently developed decision tree in a new but similar population. Diagnostic accuracy study validating a clinical prediction rule. Acutely ill children presenting to ambulatory care in Flanders, Belgium, consisting of general practice and paediatric assessment in outpatient clinics or the emergency department. Physicians were asked to score the decision tree in every child. The outcome of interest was hospital admission for at least 24 h with a serious infection within 5 days after initial presentation. We report the diagnostic accuracy of the decision tree in sensitivity, specificity, likelihood ratios and predictive values. In total, 8962 acute illness episodes were included, of which 283 led to admission to hospital with a serious infection. Sensitivity of the decision tree was 100% (95% CI 71.5% to 100%) at a specificity of 83.6% (95% CI 82.3% to 84.9%) in the general practitioner setting with 17% of children testing positive. In the paediatric outpatient and emergency department setting, sensitivities were below 92%, with specificities below 44.8%. In an independent validation cohort, this clinical prediction rule has been shown to be extremely sensitive to identify children at risk of hospital admission for a serious infection in general practice, making it suitable for ruling out. NCT02024282.
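The abstract above reports likelihood ratios among its accuracy measures without quoting them. As a hedged illustration of how they follow from sensitivity and specificity, the sketch below applies the usual definitions to the general practice point estimates (100% and 83.6%); the resulting numbers are derived here and are not values stated by the authors.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Point estimates for the general practitioner setting, from the abstract.
lr_pos, lr_neg = likelihood_ratios(1.0, 0.836)
```

A negative likelihood ratio of zero at the point estimate is what makes the rule attractive for ruling out serious infection, although the wide confidence interval on sensitivity (71.5% to 100%) means the true value may be noticeably higher.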
Decay fungi of oaks and associated hardwoods for western arborists
Jessie A. Glaeser; Kevin T. Smith
2010-01-01
Examination of trees for the presence and extent of decay should be part of any hazard tree assessment. Identification of the fungi responsible for the decay improves prediction of tree performance and the quality of management decisions, including tree pruning or removal. Scouting for Sudden Oak Death (SOD) in the West has drawn attention to hardwood tree species,...
1990-04-01
focus of attention ). The inherent local control in the FA/C model allows it to achieve just that, since it only requires a global goal to become...Computing Terms Agent Modelling : is concerned with modelling actor’s intentions and plans, and their modification in the light of information... model or program that is based on a mathematical system of logic. B-tree : or "binary-tree" is a self organising storage mechanism that works by taking
Cheaib, Alissar; Badeau, Vincent; Boe, Julien; Chuine, Isabelle; Delire, Christine; Dufrêne, Eric; François, Christophe; Gritti, Emmanuel S; Legay, Myriam; Pagé, Christian; Thuiller, Wilfried; Viovy, Nicolas; Leadley, Paul
2012-06-01
Model-based projections of shifts in tree species range due to climate change are becoming an important decision support tool for forest management. However, poorly evaluated sources of uncertainty require more scrutiny before relying heavily on models for decision-making. We evaluated uncertainty arising from differences in model formulations of tree response to climate change based on a rigorous intercomparison of projections of tree distributions in France. We compared eight models ranging from niche-based to process-based models. On average, models project large range contractions of temperate tree species in lowlands due to climate change. There was substantial disagreement between models for temperate broadleaf deciduous tree species, but differences in the capacity of models to account for rising CO2 impacts explained much of the disagreement. There was good quantitative agreement among models concerning the range contractions for Scots pine. For the dominant Mediterranean tree species, Holm oak, all models foresee substantial range expansion.
A multivariate decision tree analysis of biophysical factors in tropical forest fire occurrence
Rey S. Ofren; Edward Harvey
2000-01-01
A multivariate decision tree model was used to quantify the relative importance of complex hierarchical relationships between biophysical variables and the occurrence of tropical forest fires. The study site is the Huai Kha Khaeng wildlife sanctuary, a World Heritage Site in northwestern Thailand where annual fires are common and particularly destructive. Thematic...
Which Types of Leadership Styles Do Followers Prefer? A Decision Tree Approach
ERIC Educational Resources Information Center
Salehzadeh, Reza
2017-01-01
Purpose: The purpose of this paper is to propose a new method to find the appropriate leadership styles based on the followers' preferences using the decision tree technique. Design/methodology/approach: Statistical population includes the students of the University of Isfahan. In total, 750 questionnaires were distributed; out of which, 680…
The Americans with Disabilities Act: A Decision Tree for Social Services Administrators
ERIC Educational Resources Information Center
O'Brien, Gerald V.; Ellegood, Christina
2005-01-01
The 1990 Americans with Disabilities Act has had a profound influence on social workers and social services administrators in virtually all work settings. Because of the multiple elements of the act, however, assessing the validity of claims can be a somewhat arduous and complicated task. This article provides a "decision tree" for…
ERIC Educational Resources Information Center
Hwang, Gwo-Jen; Chu, Hui-Chun; Shih, Ju-Ling; Huang, Shu-Hsien; Tsai, Chin-Chung
2010-01-01
A context-aware ubiquitous learning environment is an authentic learning environment with personalized digital supports. While showing the potential of applying such a learning environment, researchers have also indicated the challenges of providing adaptive and dynamic support to individual students. In this paper, a decision-tree-oriented…
A decision tree approach using silvics to guide planning for forest restoration
Sharon M. Hermann; John S. Kush; John C. Gilbert
2013-01-01
We created a decision tree based on silvics of longleaf pine (Pinus palustris) and historical descriptions to develop approaches for restoration management at Horseshoe Bend National Military Park located in central Alabama. A National Park Service goal is to promote structure and composition of a forest that likely surrounded the 1814 battlefield....
ERIC Educational Resources Information Center
Thomas, Emily H.; Galambos, Nora
To investigate how students' characteristics and experiences affect satisfaction, this study used regression and decision-tree analysis with the CHAID algorithm to analyze student opinion data from a sample of 1,783 college students. A data-mining approach identifies the specific aspects of students' university experience that most influence three…
Vergara, Pablo M.; Soto, Gerardo E.; Rodewald, Amanda D.; Meneses, Luis O.; Pérez-Hernández, Christian G.
2016-01-01
Theoretical models predict that animals should make foraging decisions after assessing the quality of available habitat, but most models fail to consider the spatio-temporal scales at which animals perceive habitat availability. We tested three foraging strategies that explain how Magellanic woodpeckers (Campephilus magellanicus) assess the relative quality of trees: 1) Woodpeckers with local knowledge select trees based on the available trees in the immediate vicinity. 2) Woodpeckers lacking local knowledge select trees based on their availability at previously visited locations. 3) Woodpeckers using information from long-term memory select trees based on knowledge about trees available within the entire landscape. We observed foraging woodpeckers and used a Brownian Bridge Movement Model to identify trees available to woodpeckers along foraging routes. Woodpeckers selected trees with a later decay stage than available trees. Selection models indicated that preferences of Magellanic woodpeckers were based on clusters of trees near the most recently visited trees, thus suggesting that woodpeckers use visual cues from neighboring trees. In a second analysis, Cox’s proportional hazards models showed that woodpeckers used information consolidated across broader spatial scales to adjust tree residence times. Specifically, woodpeckers spent more time at trees with larger diameters and in a more advanced stage of decay than trees available along their routes. These results suggest that Magellanic woodpeckers make foraging decisions based on the relative quality of trees that they perceive and memorize information at different spatio-temporal scales. PMID:27416115
Vergara, Pablo M; Soto, Gerardo E; Moreira-Arce, Darío; Rodewald, Amanda D; Meneses, Luis O; Pérez-Hernández, Christian G
2016-01-01
Theoretical models predict that animals should make foraging decisions after assessing the quality of available habitat, but most models fail to consider the spatio-temporal scales at which animals perceive habitat availability. We tested three foraging strategies that explain how Magellanic woodpeckers (Campephilus magellanicus) assess the relative quality of trees: 1) Woodpeckers with local knowledge select trees based on the available trees in the immediate vicinity. 2) Woodpeckers lacking local knowledge select trees based on their availability at previously visited locations. 3) Woodpeckers using information from long-term memory select trees based on knowledge about trees available within the entire landscape. We observed foraging woodpeckers and used a Brownian Bridge Movement Model to identify trees available to woodpeckers along foraging routes. Woodpeckers selected trees with a later decay stage than available trees. Selection models indicated that preferences of Magellanic woodpeckers were based on clusters of trees near the most recently visited trees, thus suggesting that woodpeckers use visual cues from neighboring trees. In a second analysis, Cox's proportional hazards models showed that woodpeckers used information consolidated across broader spatial scales to adjust tree residence times. Specifically, woodpeckers spent more time at trees with larger diameters and in a more advanced stage of decay than trees available along their routes. These results suggest that Magellanic woodpeckers make foraging decisions based on the relative quality of trees that they perceive and memorize information at different spatio-temporal scales.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kupriyanov, M. S.; Shukeilo, E. Y.; Shichkina, J. A.
2015-11-17
Technologies used in traumatology today combine mechanical, electronic, computing, and software tools. Mobile applications for the rapid processing of data received from medical devices (in particular, wearable devices) and for formulating management decisions are becoming increasingly relevant. This article considers the use of a mathematical method of building decision trees to assess a patient's health condition using data from a wearable device.
NASA Astrophysics Data System (ADS)
Kupriyanov, M. S.; Shukeilo, E. Y.; Shichkina, J. A.
2015-11-01
Technologies used in traumatology today combine mechanical, electronic, computing, and software tools. Mobile applications for the rapid processing of data received from medical devices (in particular, wearable devices) and for formulating management decisions are becoming increasingly relevant. This article considers the use of a mathematical method of building decision trees to assess a patient's health condition using data from a wearable device.
Protein attributes contribute to halo-stability, bioinformatics approach
2011-01-01
Halophile proteins can tolerate high salt concentrations. Understanding halophilicity features is the first step toward engineering halostable crops. To this end, we examined protein features contributing to the halo-toleration of halophilic organisms. We compared more than 850 features for halophilic and non-halophilic proteins with various screening, clustering, decision tree, and generalized rule induction models to search for patterns that code for halo-toleration. Up to 251 protein attributes selected by various attribute weighting algorithms as important features contribute to halo-stability; from them 14 attributes selected by 90% of models and the count of hydrogen gained the highest value (1.0) in 70% of attribute weighting models, showing the importance of this attribute in feature selection modeling. The other attributes mostly were the frequencies of di-peptides. No changes were found in the numbers of groups when K-Means and TwoStep clustering modeling were performed on datasets with or without feature selection filtering. Although the depths of induced trees were not high, the accuracies of trees were higher than 94% and the frequency of hydrophobic residues pointed as the most important feature to build trees. The performance evaluation of decision tree models had the same values and the best correctness percentage recorded with the Exhaustive CHAID and CHAID models. We did not find any significant difference in the percent of correctness, performance evaluation, and mean correctness of various decision tree models with or without feature selection. For the first time, we analyzed the performance of different screening, clustering, and decision tree algorithms for discriminating halophilic and non-halophilic proteins and the results showed that amino acid composition can be used to discriminate between halo-tolerant and halo-sensitive proteins. PMID:21592393
Classification tree for the assessment of sedentary lifestyle among hypertensive.
Castelo Guedes Martins, Larissa; Venícios de Oliveira Lopes, Marcos; Gomes Guedes, Nirla; Paixão de Menezes, Angélica; de Oliveira Farias, Odaleia; Alves Dos Santos, Naftale
2016-04-01
To develop a classification tree of clinical indicators for the correct prediction of the nursing diagnosis "Sedentary lifestyle" (SL) in people with high blood pressure (HTN). A cross-sectional study was conducted in an outpatient care center specializing in high blood pressure and diabetes mellitus located in northeastern Brazil. The sample consisted of 285 people between 19 and 59 years old diagnosed with high blood pressure. An interview and physical examination were used to obtain socio-demographic information, related factors, and the signs and symptoms that made up the defining characteristics for the diagnosis under study. The tree was generated using the CHAID algorithm (Chi-square Automatic Interaction Detection). The construction of the decision tree made it possible to establish the interactions between clinical indicators, facilitating a probabilistic analysis of multiple situations and quantifying the probability of an individual presenting a sedentary lifestyle. The tree included the clinical indicator Choose daily routine without exercise as the first node. People with this indicator showed a probability of 0.88 of presenting SL. The second node was composed of the indicator Does not perform physical activity during leisure, with a 0.99 probability of presenting SL with these two indicators. The predictive capacity of the tree was established at 69.5%. Decision trees help nurses who care for people with HTN in decision-making when assessing the characteristics that increase the probability of the SL nursing diagnosis, optimizing the time for diagnostic inference.
NASA Technical Reports Server (NTRS)
Tian, Jianhui; Porter, Adam; Zelkowitz, Marvin V.
1992-01-01
Identification of high cost modules has been viewed as one mechanism to improve overall system reliability, since such modules tend to produce more than their share of problems. A decision tree model was used to identify such modules. In this current paper, a previously developed axiomatic model of program complexity is merged with the previously developed decision tree process for an improvement in the ability to identify such modules. This improvement was tested using data from the NASA Software Engineering Laboratory.
On defining a unique phylogenetic tree with homoplastic characters.
Goloboff, Pablo A; Wilkinson, Mark
2018-05-01
This paper discusses the problem of whether creating a matrix with all the character state combinations that have a fixed number of steps (or extra steps) on a given tree T, produces the same tree T when analyzed with maximum parsimony or maximum likelihood. Exhaustive enumeration of cases up to 20 taxa for binary characters, and up to 12 taxa for 4-state characters, shows that the same tree is recovered (as unique most likely or most parsimonious tree) as long as the number of extra steps is within 1/4 of the number of taxa. This dependence, 1/4 of the number of taxa, is discussed with a general argumentation, in terms of the spread of the character changes on the tree used to select character state distributions. The present finding allows creating matrices which have as much homoplasy as possible for the most parsimonious or likely tree to be predictable, and examination of these matrices with hill-climbing search algorithms provides additional evidence on the (lack of a) necessary relationship between homoplasy and the ability of search methods to find optimal trees.
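The notion of "steps" and "extra steps" in the abstract above can be made concrete with Fitch's algorithm, the standard way to count the minimum number of state changes a character requires on a given tree. The sketch below is only an illustration of that counting; the tree, the taxon names, and the character states are hypothetical, and the paper's enumeration procedure is not reproduced.

```python
def fitch_steps(tree, states):
    """Fitch parsimony for one character on a rooted binary tree.

    `tree` is either a leaf name (str) or a pair (left, right);
    `states` maps each leaf name to its character state.
    Returns (possible_state_set, minimum_steps) for the subtree.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no change needed here.
        return common, left_steps + right_steps
    # Children disagree: one change is required at this node.
    return left_set | right_set, left_steps + right_steps + 1


# Hypothetical 8-taxon tree and binary character.
tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
states = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 1, "F": 0, "G": 0, "H": 0}

_, steps = fitch_steps(tree, states)
# A binary character showing both states needs at least one step.
extra_steps = steps - 1
```

For this character the tree requires two steps, so there is one extra step, which lies within a quarter of the eight taxa.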
A key for the Forest Service hardwood tree grades
Gary W. Miller; Leland F. Hanks; Harry V., Jr. Wiant
1986-01-01
A dichotomous key organizes the USDA Forest Service hardwood tree grade specifications into a stepwise procedure for those learning to grade hardwood sawtimber. The key addresses the major grade factors, tree size, surface characteristics, and allowable cull deductions in a series of paired choices that lead the user to a decision regarding tree grade.
Inferences from growing trees backwards
David W. Green; Kent A. McDonald
1997-01-01
The objective of this paper is to illustrate how longitudinal stress wave techniques can be useful in tracking the future quality of a growing tree. Monitoring the quality of selected trees in a plantation forest could provide early input to decisions on the effectiveness of management practices, or future utilization options, for trees in a plantation. There will...
Morales, Susana; Barros, Jorge; Echávarri, Orietta; García, Fabián; Osses, Alex; Moya, Claudia; Maino, María Paz; Fischman, Ronit; Núñez, Catalina; Szmulewicz, Tita; Tomicic, Alemka
2017-01-01
In efforts to develop reliable methods to detect the likelihood of impending suicidal behaviors, we have proposed the following. To gain a deeper understanding of the state of suicide risk by determining the combination of variables that distinguishes between groups with and without suicide risk. A study involving 707 patients consulting for mental health issues in three health centers in Greater Santiago, Chile. Using 345 variables, an analysis was carried out with artificial intelligence tools, Cross Industry Standard Process for Data Mining processes, and decision tree techniques. The basic algorithm was top-down, and the most suitable division produced by the tree was selected by using the lowest Gini index as a criterion and by looping it until the condition of belonging to the group with suicidal behavior was fulfilled. Four trees distinguishing the groups were obtained, of which the elements of one were analyzed in greater detail, since this tree included both clinical and personality variables. This specific tree consists of six nodes without suicide risk and eight nodes with suicide risk (tree decision 01, accuracy 0.674, precision 0.652, recall 0.678, specificity 0.670, F measure 0.665, receiver operating characteristic (ROC) area under the curve (AUC) 73.35%; tree decision 02, accuracy 0.669, precision 0.642, recall 0.694, specificity 0.647, F measure 0.667, ROC AUC 68.91%; tree decision 03, accuracy 0.681, precision 0.675, recall 0.638, specificity 0.721, F measure, 0.656, ROC AUC 65.86%; tree decision 04, accuracy 0.714, precision 0.734, recall 0.628, specificity 0.792, F measure 0.677, ROC AUC 58.85%). This study defines the interactions among a group of variables associated with suicidal ideation and behavior. By using these variables, it may be possible to create a quick and easy-to-use tool. 
As such, psychotherapeutic interventions could be designed to mitigate the impact of these variables on the emotional state of individuals, thereby reducing eventual risk of suicide. Such interventions may reinforce psychological well-being, feelings of self-worth, and reasons for living, for each individual in certain groups of patients.
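The split criterion mentioned in this abstract (choosing the division with the lowest Gini index) can be illustrated with a minimal sketch. The function names and the toy data are hypothetical and are not taken from the study; the sketch only shows how a single numeric variable might be split.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the split x <= t with the lowest weighted Gini."""
    n = len(values)
    best = (None, float("inf"))
    # The largest value is skipped because it would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Toy data (hypothetical): one numeric variable and a binary risk label.
scores = [1, 2, 3, 7, 8, 9]
risk = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(scores, risk)
```

A top-down tree builder would apply this search to every candidate variable at a node, keep the split with the lowest weighted impurity, and repeat on the two resulting subsets.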
NASA Astrophysics Data System (ADS)
Kaur, Parneet; Singh, Sukhwinder; Garg, Sushil; Harmanpreet
2010-11-01
In this paper we study classification algorithms for a farm decision support system (DSS). The classification algorithms Limited search, ID3, CHAID, C4.5, Improved C4.5 and One VS all Decision Tree are applied to a common crop data set with a specified class, and results are obtained. The tool used to derive the results is SPINA. The graphical results obtained from the tool are compared to suggest the best technique for developing a farm decision support system. This analysis should help researchers design an effective and fast DSS that farmers can use to make decisions that enhance their yield.
Uninjured trees - a meaningful guide to white-pine weevil control decisions
William E. Waters
1962-01-01
The white-pine weevil, Pissodes strobi, is a particularly insidious forest pest that can render a stand of host trees virtually worthless. It rarely, if ever, kills a tree; but the crooks, forks, and internal defects that develop in attacked trees over a period of years may reduce the merchantable volume and value of the tree at harvest age to zero. Dollar losses are...
Compensatory value of urban trees in the United States
David J. Nowak; Daniel E. Crane; John F. Dwyer
2002-01-01
Understanding the value of an urban forest can give decision makers a better foundation for urban tree management. Based on tree-valuation methods of the Council of Tree and Landscape Appraisers and field data from eight cities, total compensatory value of tree populations in U.S. cities ranges from $101 million in Jersey City, New Jersey, to $6.2 billion in New York,...
A P2P Botnet detection scheme based on decision tree and adaptive multilayer neural networks.
Alauthaman, Mohammad; Aslam, Nauman; Zhang, Li; Alasem, Rafe; Hossain, M A
2018-01-01
In recent years, Botnets have been adopted as a popular method to carry and spread many malicious codes on the Internet. These malicious codes pave the way to execute many fraudulent activities including spam mail, distributed denial-of-service attacks and click fraud. While many Botnets are set up using centralized communication architecture, the peer-to-peer (P2P) Botnets can adopt a decentralized architecture using an overlay network for exchanging command and control data making their detection even more difficult. This work presents a method of P2P Bot detection based on an adaptive multilayer feed-forward neural network in cooperation with decision trees. A classification and regression tree is applied as a feature selection technique to select relevant features. With these features, a multilayer feed-forward neural network training model is created using a resilient back-propagation learning algorithm. A comparison of feature set selection based on the decision tree, principal component analysis and the ReliefF algorithm indicated that the neural network model with features selection based on decision tree has a better identification accuracy along with lower rates of false positives. The usefulness of the proposed approach is demonstrated by conducting experiments on real network traffic datasets. In these experiments, an average detection rate of 99.08 % with false positive rate of 0.75 % was observed.
In Search of Speedier Searches.
ERIC Educational Resources Information Center
Peterson, Ivars
1984-01-01
Methods to make computer searching as simple and efficient as possible have led to the development of various data structures. Data structures specify the items involved in searching and what can be done to them. The nature and advantages of using "self-adjusting" data structures (self-adjusting binary search trees) are discussed. (JN)
A new algorithm to construct phylogenetic networks from trees.
Wang, J
2014-03-06
Developing appropriate methods for constructing phylogenetic networks from tree sets is an important problem, and much research is currently being undertaken in this area. BIMLR is an algorithm that constructs phylogenetic networks from tree sets. The algorithm can construct a much simpler network than other available methods. Here, we introduce an improved version of the BIMLR algorithm, QuickCass. QuickCass changes the selection strategy of the labels of leaves below the reticulate nodes, i.e., the nodes with an indegree of at least 2 in BIMLR. We show that QuickCass can construct simpler phylogenetic networks than BIMLR. Furthermore, we show that QuickCass is a polynomial-time algorithm when the output network that is constructed by QuickCass is binary.
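The definition used in this abstract (reticulate nodes are the nodes with an indegree of at least 2) can be made concrete with a short sketch. The edge-list representation, the function name, and the example network are assumptions made for illustration; this is not the BIMLR or QuickCass algorithm.

```python
from collections import defaultdict

def reticulate_nodes(edges):
    """Return the sorted nodes whose indegree is at least 2.

    `edges` is a list of (parent, child) pairs describing a rooted network.
    """
    indegree = defaultdict(int)
    for _parent, child in edges:
        indegree[child] += 1
    return sorted(node for node, degree in indegree.items() if degree >= 2)

# Hypothetical network: node "h" has two parents ("u" and "v") and one child ("b").
edges = [
    ("root", "u"), ("root", "v"),
    ("u", "a"), ("u", "h"),
    ("v", "h"), ("v", "c"),
    ("h", "b"),
]
hybrids = reticulate_nodes(edges)
```

In this example the leaf "b" is the label below the reticulate node, which is the kind of leaf whose selection strategy the abstract says QuickCass changes.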
Prognostic Factors and Decision Tree for Long-term Survival in Metastatic Uveal Melanoma.
Lorenzo, Daniel; Ochoa, María; Piulats, Josep Maria; Gutiérrez, Cristina; Arias, Luis; Català, Jaum; Grau, María; Peñafiel, Judith; Cobos, Estefanía; Garcia-Bru, Pere; Rubio, Marcos Javier; Padrón-Pérez, Noel; Dias, Bruno; Pera, Joan; Caminal, Josep Maria
2017-12-04
The purpose of this study was to demonstrate the existence of a bimodal survival pattern in metastatic uveal melanoma. Secondary aims were to identify the characteristics and prognostic factors associated with long-term survival and to develop a clinical decision tree. The medical records of 99 metastatic uveal melanoma patients were retrospectively reviewed. Patients were classified as either short (≤ 12 months) or long-term survivors (> 12 months) based on a graphical interpretation of the survival curve after diagnosis of the first metastatic lesion. Ophthalmic and oncological characteristics were assessed in both groups. Of the 99 patients, 62 (62.6%) were classified as short-term survivors, and 37 (37.4%) as long-term survivors. The multivariate analysis identified the following predictors of long-term survival: age ≤ 65 years (p=0.012) and unaltered serum lactate dehydrogenase levels (p=0.018); additionally, the size (smaller vs. larger) of the largest liver metastasis showed a trend towards significance (p=0.063). Based on the variables significantly associated with long-term survival, we developed a decision tree to facilitate clinical decision-making. The findings of this study demonstrate the existence of a bimodal survival pattern in patients with metastatic uveal melanoma. The presence of certain clinical characteristics at diagnosis of distant disease is associated with long-term survival. A decision tree was developed to facilitate clinical decision-making and to counsel patients about the expected course of disease.
ERIC Educational Resources Information Center
Tansy, Michael
2009-01-01
The Emotional Disturbance Decision Tree (EDDT) is a teacher-completed norm-referenced rating scale published by Psychological Assessment Resources, Inc., in Lutz, Florida. The 156-item EDDT was developed for use as part of a broader assessment process to screen and assist in the identification of 5- to 18-year-old children for the special…
Phytotechnology Technical and Regulatory Guidance Document
2001-04-01
contaminated media is rather new. Throughout the development process of this document, we referred to the science as “phytoremediation.” Recently...the media containing contaminants, we now refer to “phytotechnologies” as the overarching terminology, while using “phytoremediation” more...publication of the ITRC document, Phytoremediation Decision Tree. The decision tree was designed to allow potential users to take basic information
Özdemir, Merve Erkınay; Telatar, Ziya; Eroğul, Osman; Tunca, Yusuf
2018-05-01
Dysmorphic syndromes have different facial malformations. These malformations are significant for early diagnosis of dysmorphic syndromes and contain distinctive information for face recognition. In this study we define certain features of each syndrome by considering facial malformations and automatically classify Fragile X, Hurler, Prader Willi, Down, and Wolf Hirschhorn syndromes and healthy groups. Reference points are marked on the face images, and ratios between the points' distances are taken as features. We suggest a neural network based hierarchical decision tree structure in order to classify the syndrome types. We also implement k-nearest neighbor (k-NN) and artificial neural network (ANN) classifiers to compare classification accuracy with our hierarchical decision tree. The classification accuracy is 50, 73 and 86.7% with k-NN, ANN and hierarchical decision tree methods, respectively. The same images were then shown to a clinical expert, who achieved a recognition rate of 46.7%. We develop an efficient system that recognizes different syndrome types automatically and with high accuracy from simple, non-invasive imaging data, independent of the patient's age, sex and race. The promising results indicate that our method can be used for pre-diagnosis of the dysmorphic syndromes by clinical experts.
Ramezankhani, Azra; Pournik, Omid; Shahrabi, Jamal; Khalili, Davood; Azizi, Fereidoun; Hadaegh, Farzad
2014-09-01
The aim of this study was to create a prediction model using data mining approach to identify low risk individuals for incidence of type 2 diabetes, using the Tehran Lipid and Glucose Study (TLGS) database. For a 6647 population without diabetes, aged ≥20 years, followed for 12 years, a prediction model was developed using classification by the decision tree technique. Seven hundred and twenty-nine (11%) diabetes cases occurred during the follow-up. Predictor variables were selected from demographic characteristics, smoking status, medical and drug history and laboratory measures. We developed the predictive models by decision tree using 60 input variables and one output variable. The overall classification accuracy was 90.5%, with 31.1% sensitivity, 97.9% specificity; and for the subjects without diabetes, precision and f-measure were 92% and 0.95, respectively. The identified variables included fasting plasma glucose, body mass index, triglycerides, mean arterial blood pressure, family history of diabetes, educational level and job status. In conclusion, decision tree analysis, using routine demographic, clinical, anthropometric and laboratory measurements, created a simple tool to predict individuals at low risk for type 2 diabetes. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
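The abstract above reports high overall accuracy together with low sensitivity, a pattern that is typical when one class is much rarer than the other. The following sketch shows how these measures are commonly computed from a confusion matrix. The counts are hypothetical and are not the study's data; the function name is also an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification measures from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one positive case,
    one negative case, and one positive prediction).
    """
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # recall for the positive class
    specificity = tn / (tn + fp)   # recall for the negative class
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
    }

# Hypothetical imbalanced sample: 100 positive cases among 1000 subjects.
metrics = binary_metrics(tp=30, fp=20, fn=70, tn=880)
```

With these made-up counts the accuracy is 0.91 even though only 30% of the positive cases are detected, which is why sensitivity and specificity are reported alongside accuracy.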
Intelligent Diagnostic Assistant for Complicated Skin Diseases through C5's Algorithm.
Jeddi, Fatemeh Rangraz; Arabfard, Masoud; Kermany, Zahra Arab
2017-09-01
Intelligent Diagnostic Assistant can be used for complicated diagnosis of skin diseases, which are among the most common causes of disability. The aim of this study was to design and implement a computerized intelligent diagnostic assistant for complicated skin diseases through C5's Algorithm. An applied-developmental study was done in 2015. Knowledge base was developed based on interviews with dermatologists through questionnaires and checklists. Knowledge representation was obtained from the train data in the database using Excel Microsoft Office. Clementine Software and C5's Algorithms were applied to draw the decision tree. Analysis of test accuracy was performed based on rules extracted using inference chains. The rules extracted from the decision tree were entered into the CLIPS programming environment and the intelligent diagnostic assistant was designed then. The rules were defined using forward chaining inference technique and were entered into Clips programming environment as RULE. The accuracy and error rates obtained in the training phase from the decision tree were 99.56% and 0.44%, respectively. The accuracy of the decision tree was 98% and the error was 2% in the test phase. Intelligent diagnostic assistant can be used as a reliable system with high accuracy, sensitivity, specificity, and agreement.
Data mining for multiagent rules, strategies, and fuzzy decision tree structure
NASA Astrophysics Data System (ADS)
Smith, James F., III; Rhyne, Robert D., II; Fisher, Kristin
2002-03-01
A fuzzy logic based resource manager (RM) has been developed that automatically allocates electronic attack resources in real-time over many dissimilar platforms. Two different data mining algorithms have been developed to determine rules, strategies, and fuzzy decision tree structure. The first data mining algorithm uses a genetic algorithm as a data mining function and is called from an electronic game. The game allows a human expert to play against the resource manager in a simulated battlespace with each of the defending platforms being exclusively directed by the fuzzy resource manager and the attacking platforms being controlled by the human expert or operating autonomously under their own logic. This approach automates the data mining problem. The game automatically creates a database reflecting the domain expert's knowledge. It calls a data mining function, a genetic algorithm, for data mining of the database as required and allows easy evaluation of the information mined in the second step. The criterion for re- optimization is discussed as well as experimental results. Then a second data mining algorithm that uses a genetic program as a data mining function is introduced to automatically discover fuzzy decision tree structures. Finally, a fuzzy decision tree generated through this process is discussed.
Amini, Payam; Maroufizadeh, Saman; Samani, Reza Omani; Hamidi, Omid; Sepidarkish, Mahdi
2017-06-01
Preterm birth (PTB) is a leading cause of neonatal death and the second biggest cause of death in children under five years of age. The objective of this study was to determine the prevalence of PTB and its associated factors using logistic regression and decision tree classification methods. This cross-sectional study was conducted on 4,415 pregnant women in Tehran, Iran, from July 6-21, 2015. Data were collected by a researcher-developed questionnaire through interviews with mothers and review of their medical records. To evaluate the accuracy of the logistic regression and decision tree methods, several indices such as sensitivity, specificity, and the area under the curve were used. The PTB rate was 5.5% in this study. The logistic regression outperformed the decision tree for the classification of PTB based on risk factors. Logistic regression showed that multiple pregnancies, mothers with preeclampsia, and those who conceived with assisted reproductive technology had an increased risk for PTB (p < 0.05). Identifying and training mothers at risk as well as improving prenatal care may reduce the PTB rate. We also recommend that statisticians utilize the logistic regression model for the classification of risk groups for PTB.
A Conditional Curie-Weiss Model for Stylized Multi-group Binary Choice with Social Interaction
NASA Astrophysics Data System (ADS)
Opoku, Alex Akwasi; Edusei, Kwame Owusu; Ansah, Richard Kwame
2018-04-01
This paper proposes a conditional Curie-Weiss model as a model for decision making in a stylized society made up of binary decision makers that face a particular dichotomous choice between two options. Following Brock and Durlauf (Discrete choice with social interaction I: theory, 1955), we set up both socio-economic and statistical mechanical models for the choice problem. We point out when both the socio-economic and statistical mechanical models give rise to the same self-consistent equilibrium mean choice level(s). The phase diagram of the associated statistical mechanical model and its socio-economic implications are discussed.
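To illustrate what a self-consistent equilibrium mean choice level looks like, the sketch below solves the textbook single-group Curie-Weiss mean-field equation m = tanh(beta * (J * m + h)) by fixed-point iteration. This is an assumption chosen for illustration: it is the standard unconditional model, not the conditional multi-group model proposed in the paper, and the function and parameter names are hypothetical.

```python
import math

def mean_choice(beta, coupling, field, start=0.5, tol=1e-12, max_iter=10_000):
    """Iterate m <- tanh(beta * (coupling * m + field)) until it stabilises."""
    m = start
    for _ in range(max_iter):
        new_m = math.tanh(beta * (coupling * m + field))
        if abs(new_m - m) < tol:
            return new_m
        m = new_m
    return m  # best estimate if the tolerance was not reached

# Weak social interaction (beta * coupling < 1): the mean choice settles near 0.
weak = mean_choice(beta=0.5, coupling=1.0, field=0.0)
# Strong social interaction (beta * coupling > 1): a non-zero mean choice appears.
strong = mean_choice(beta=2.0, coupling=1.0, field=0.0)
```

Starting the strong-interaction case from a negative value would converge to the mirrored solution, which is the multiplicity that the phrase "mean choice level(s)" alludes to.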
Ultra Low Energy Binary Decision Diagram Circuits Using Few Electron Transistors
NASA Astrophysics Data System (ADS)
Saripalli, Vinay; Narayanan, Vijay; Datta, Suman
Novel medical applications involving embedded sensors require ultra-low energy dissipation with low-to-moderate performance (10kHz-100MHz), driving conventional MOSFETs into the sub-threshold operation regime. In this paper, we present an alternate ultra-low power computing architecture using Binary Decision Diagram (BDD) based logic circuits implemented using Single Electron Transistors (SETs) operating in the Coulomb blockade regime with very low supply voltages. We evaluate the energy-performance tradeoff metrics of such BDD circuits using time domain Monte Carlo simulations and compare them with energy-optimized CMOS logic circuits. Simulation results show that the proposed approach achieves better energy-delay characteristics than CMOS realizations.
Soft-decision decoding techniques for linear block codes and their error performance analysis
NASA Technical Reports Server (NTRS)
Lin, Shu
1996-01-01
The first paper presents a new minimum-weight trellis-based soft-decision iterative decoding algorithm for binary linear block codes. The second paper derives an upper bound on the probability of block error for multilevel concatenated codes (MLCC). The bound evaluates the difference in performance for different decompositions of some codes. The third paper investigates the bit error probability code for maximum likelihood decoding of binary linear codes. The fourth and final paper included in this report concerns itself with the construction of multilevel concatenated block modulation codes using a multilevel concatenation scheme for the frequency non-selective Rayleigh fading channel.
Decision tree and PCA-based fault diagnosis of rotating machinery
NASA Astrophysics Data System (ADS)
Sun, Weixiang; Chen, Jin; Li, Jiaqing
2007-04-01
After analysing the flaws of conventional fault diagnosis methods, data mining technology is introduced to the fault diagnosis field, and a new method based on the C4.5 decision tree and principal component analysis (PCA) is proposed. In this method, PCA is used to reduce features after data collection, preprocessing and feature extraction. Then, C4.5 is trained on the samples to generate a decision tree model with diagnosis knowledge. Finally, the tree model is used to perform diagnosis analysis. To validate the proposed method, six kinds of running states (normal or without any defect, unbalance, rotor radial rub, oil whirl, shaft crack and a simultaneous state of unbalance and radial rub) are simulated on a Bently Rotor Kit RK4 to test the C4.5 and PCA-based method and a back-propagation neural network (BPNN). The result shows that the C4.5 and PCA-based diagnosis method has higher accuracy and needs less training time than BPNN.
NASA Astrophysics Data System (ADS)
Park, J.; Yoo, K.
2013-12-01
For groundwater resource conservation, it is important to accurately assess groundwater pollution sensitivity or vulnerability. In this work, we attempted to use a data mining approach to assess groundwater pollution vulnerability at a TCE (trichloroethylene) contaminated Korean industrial site. The conventional DRASTIC method failed to describe the TCE sensitivity data, showing a poor correlation with hydrogeological properties. Among the different data mining methods, such as Artificial Neural Network (ANN), Multiple Logistic Regression (MLR), Case Base Reasoning (CBR), and Decision Tree (DT), the accuracy and consistency of the Decision Tree (DT) were the best. According to the subsequent tree analyses with the optimal DT model, the failure of the conventional DRASTIC method to fit the TCE sensitivity data may be due to the use of inaccurate weight values of hydrogeological parameters for the study site. These findings provide a proof of concept that a DT based data mining approach can be used for prediction and rule induction of groundwater TCE sensitivity without pre-existing information on weights of hydrogeological properties.
The application of data mining techniques to oral cancer prognosis.
Tseng, Wan-Ting; Chiang, Wei-Fan; Liu, Shyun-Yeu; Roan, Jinsheng; Lin, Chun-Nan
2015-05-01
This study adopted an integrated procedure that combines the clustering and classification features of data mining technology to determine the differences between the symptoms shown in past cases where patients died from or survived oral cancer. Two data mining tools, namely decision tree and artificial neural network, were used to analyze the historical cases of oral cancer, and their performance was compared with that of logistic regression, the popular statistical analysis tool. Both decision tree and artificial neural network models showed superiority to the traditional statistical model. However, for clinicians, the trees created by the decision tree models are easier to interpret than the artificial neural network models. Cluster analysis also revealed that stage 4 patients who also possess the following four characteristics have an extremely low survival rate: pN is N2b, level of RLNM is level I-III, AJCC-T is T4, and cell mutation situation (G) is moderate.
Modeling individual tree survival
Quang V. Cao
2016-01-01
Information provided by growth and yield models is the basis for forest managers to make decisions on how to manage their forests. Among different types of growth models, whole-stand models offer predictions at stand level, whereas individual-tree models give detailed information at tree level. The well-known logistic regression is commonly used to predict tree...
Integrative relational machine-learning for understanding drug side-effect profiles
2013-01-01
Background Drug side effects represent a common reason for stopping drug development during clinical trials. Improving our ability to understand drug side effects is necessary to reduce attrition rates during drug development as well as the risk of discovering novel side effects in available drugs. Today, most investigations deal with isolated side effects and overlook possible redundancy and their frequent co-occurrence. Results In this work, drug annotations are collected from SIDER and DrugBank databases. Terms describing individual side effects reported in SIDER are clustered with a semantic similarity measure into term clusters (TCs). Maximal frequent itemsets are extracted from the resulting drug x TC binary table, leading to the identification of what we call side-effect profiles (SEPs). A SEP is defined as the longest combination of TCs which are shared by a significant number of drugs. Frequent SEPs are explored on the basis of integrated drug and target descriptors using two machine learning methods: decision-trees and inductive-logic programming. Although both methods yield explicit models, inductive-logic programming method performs relational learning and is able to exploit not only drug properties but also background knowledge. Learning efficiency is evaluated by cross-validation and direct testing with new molecules. Comparison of the two machine-learning methods shows that the inductive-logic-programming method displays a greater sensitivity than decision trees and successfully exploit background knowledge such as functional annotations and pathways of drug targets, thereby producing rich and expressive rules. All models and theories are available on a dedicated web site. Conclusions Side effect profiles covering significant number of drugs have been extracted from a drug ×side-effect association table. 
Integration of background knowledge concerning both chemical and biological spaces has been combined with a relational learning method for discovering rules which explicitly characterize drug-SEP associations. These rules are successfully used for predicting SEPs associated with new drugs. PMID:23802887
Integrative relational machine-learning for understanding drug side-effect profiles.
Bresso, Emmanuel; Grisoni, Renaud; Marchetti, Gino; Karaboga, Arnaud Sinan; Souchet, Michel; Devignes, Marie-Dominique; Smaïl-Tabbone, Malika
2013-06-26
Drug side effects represent a common reason for stopping drug development during clinical trials. Improving our ability to understand drug side effects is necessary to reduce attrition rates during drug development as well as the risk of discovering novel side effects in available drugs. Today, most investigations deal with isolated side effects and overlook possible redundancy and their frequent co-occurrence. In this work, drug annotations are collected from SIDER and DrugBank databases. Terms describing individual side effects reported in SIDER are clustered with a semantic similarity measure into term clusters (TCs). Maximal frequent itemsets are extracted from the resulting drug x TC binary table, leading to the identification of what we call side-effect profiles (SEPs). A SEP is defined as the longest combination of TCs which are shared by a significant number of drugs. Frequent SEPs are explored on the basis of integrated drug and target descriptors using two machine learning methods: decision-trees and inductive-logic programming. Although both methods yield explicit models, inductive-logic programming method performs relational learning and is able to exploit not only drug properties but also background knowledge. Learning efficiency is evaluated by cross-validation and direct testing with new molecules. Comparison of the two machine-learning methods shows that the inductive-logic-programming method displays a greater sensitivity than decision trees and successfully exploit background knowledge such as functional annotations and pathways of drug targets, thereby producing rich and expressive rules. All models and theories are available on a dedicated web site. Side effect profiles covering significant number of drugs have been extracted from a drug ×side-effect association table. 
Integration of background knowledge concerning both chemical and biological spaces has been combined with a relational learning method for discovering rules which explicitly characterize drug-SEP associations. These rules are successfully used for predicting SEPs associated with new drugs.
Predicting Tillage Patterns in the Tiffin River Watershed Using Remote Sensing Methods
NASA Astrophysics Data System (ADS)
Brooks, C.; McCarty, J. L.; Dean, D. B.; Mann, B. F.
2012-12-01
Previous research in tillage mapping has focused primarily on utilizing low to no-cost, moderate (30 m to 15 m) resolution satellite data. Successful data processing techniques published in the scientific literature have focused on extracting and/or classifying tillage patterns through manipulation of spectral bands. For instance, Daughtry et al. (2005) evaluated several spectral indices for crop residue cover using satellite multispectral and hyperspectral data and to categorize soil tillage intensity in agricultural fields. A weak to moderate relationship between Landsat Thematic Mapper (TM) indices and crop residue cover was found; similar results were reported in Minnesota. Building on the findings from the scientific literature and previous work done by MTRI in the heavily agricultural Tiffin watershed of northwest Ohio and southeast Michigan, a decision tree classifier approach (also referred to as a classification tree) was used, linking several satellite data to on-the-ground tillage information in order to boost classification results. This approach included five tillage indices and derived products. A decision tree methodology enabled the development of statistically optimized (i.e., minimizing misclassification rates) classification algorithms at various desired time steps: monthly, seasonally, and annual over the 2006-2010 time period. Due to their flexibility, processing speed, and availability within all major remote sensing and statistical software packages, decision trees can ingest several data inputs from multiple sensors and satellite products, selecting only the bands, band ratios, indices, and products that further reduce misclassification errors. 
The project team created crop-specific tillage pattern classification trees whereby a training data set (~ 50% of available ground data) was created for production of the actual decision tree and a validation data set was set aside (~ 50% of available ground data) in order to assess the accuracy of the classification. A seasonal time step was used, optimizing a decision tree based on seasonal ground data for tillage patterns and satellite data and products for years 2006 through 2010. Annual crop type maps derived by the project team and the USDA Cropland Data Layer project were used as an input to understand locations of corn, soybeans, wheat, etc. on a yearly basis. As previously stated, the strength of the decision tree approach is the ability to incorporate various satellite data and products across temporal, spectral, and spatial resolutions, thereby improving the resulting classification and providing a reliable method that is not sensor-dependent. Tillage pattern classification from satellite imagery is not a simple task and has proven a challenge to previous researchers investigating this remote sensing topic. The team's decision tree method produced a practical, usable output within a focused project time period. Daughtry, C.S.T., Hunt Jr., E.R., Doraiswamy, P.C., McMurtrey III, J.E. 2005. Remote sensing the spatial distribution of crop residues. Agron. J. 97, 864-871.
Using decision tree models to depict primary care physicians CRC screening decision heuristics.
Wackerbarth, Sarah B; Tarasenko, Yelena N; Curtis, Laurel A; Joyce, Jennifer M; Haist, Steven A
2007-10-01
The purpose of this study was to identify decision heuristics utilized by primary care physicians in formulating colorectal cancer screening recommendations. Qualitative research using in-depth semi-structured interviews. We interviewed 66 primary care internists and family physicians evenly drawn from academic and community practices. A majority of physicians were male, and almost all were white, non-Hispanic. Three researchers independently reviewed each transcript to determine the physician's decision criteria and developed decision trees. Final trees were developed by consensus. The constant comparative methodology was used to define the categories. Physicians were found to use 1 of 4 heuristics ("age 50," "age 50, if family history, then earlier," "age 50, if family history, then screen at age 40," or "age 50, if family history, then adjust relative to reference case") for the timing recommendation and 5 heuristics ["fecal occult blood test" (FOBT), "colonoscopy," "if not colonoscopy, then...," "FOBT and another test," and "a choice between options"] for the type decision. No connection was found between timing and screening type heuristics. We found evidence of heuristic use. Further research is needed to determine the potential impact on quality of care.
NASA Astrophysics Data System (ADS)
Dogon-Yaro, M. A.; Kumar, P.; Rahman, A. Abdul; Buyuksalih, G.
2016-09-01
Mapping of trees plays an important role in modern urban spatial data management, as many benefits and applications derive from these detailed, up-to-date data sources. Timely and accurate acquisition of information on the condition of urban trees serves as a tool for decision makers to better appreciate urban ecosystems and their numerous values, which are critical to building strategies for sustainable development. The conventional techniques used for extracting trees include ground surveying and interpretation of aerial photography. However, these techniques are associated with some constraints, such as labour-intensive field work and high financial cost, which can be overcome by means of integrated LiDAR and digital image datasets. Compared to predominant studies on tree extraction, mainly in purely forested areas, this study concentrates on urban areas, which have a high structural complexity with a multitude of different objects. This paper presents a workflow for a semi-automated approach to extracting urban trees from integrated processing of airborne LiDAR point cloud and multispectral digital image datasets over the city of Istanbul, Turkey. The paper shows that the integrated datasets are a suitable technology and viable source of information for urban tree management. In conclusion, the extracted information provides a snapshot of the location, composition and extent of trees in the study area that is useful to city planners and other decision makers in order to understand how much canopy cover exists, identify new planting, removal, or reforestation opportunities, and determine what locations have the greatest need or potential to maximize benefits of return on investment. It can also help track trends or changes to the urban trees over time and inform future management decisions.
Multiple confidence estimates as indices of eyewitness memory.
Sauer, James D; Brewer, Neil; Weber, Nathan
2008-08-01
Eyewitness identification decisions are vulnerable to various influences on witnesses' decision criteria that contribute to false identifications of innocent suspects and failures to choose perpetrators. An alternative procedure using confidence estimates to assess the degree of match between novel and previously viewed faces was investigated. Classification algorithms were applied to participants' confidence data to determine when a confidence value or pattern of confidence values indicated a positive response. Experiment 1 compared confidence group classification accuracy with a binary decision control group's accuracy on a standard old-new face recognition task and found superior accuracy for the confidence group for target-absent trials but not for target-present trials. Experiment 2 used a face mini-lineup task and found reduced target-present accuracy offset by large gains in target-absent accuracy. Using a standard lineup paradigm, Experiments 3 and 4 also found improved classification accuracy for target-absent lineups and, with a more sophisticated algorithm, for target-present lineups. This demonstrates the accessibility of evidence for recognition memory decisions and points to a more sensitive index of memory quality than is afforded by binary decisions.
NASA Technical Reports Server (NTRS)
Buntine, Wray
1994-01-01
IND computer program introduces Bayesian and Markov/maximum-likelihood (MML) methods and more-sophisticated methods of searching in growing trees. Produces more-accurate class-probability estimates important in applications like diagnosis. Provides range of features and styles with convenience for casual user, fine-tuning for advanced user or for those interested in research. Consists of four basic kinds of routines: data-manipulation, tree-generation, tree-testing, and tree-display. Written in C language.
Interpretable Categorization of Heterogeneous Time Series Data
NASA Technical Reports Server (NTRS)
Lee, Ritchie; Kochenderfer, Mykel J.; Mengshoel, Ole J.; Silbermann, Joshua
2017-01-01
We analyze data from simulated aircraft encounters to validate and inform the development of a prototype aircraft collision avoidance system. The high-dimensional and heterogeneous time series dataset is analyzed to discover properties of near mid-air collisions (NMACs) and categorize the NMAC encounters. Domain experts use these properties to better organize and understand NMAC occurrences. Existing solutions either are not capable of handling high-dimensional and heterogeneous time series datasets or do not provide explanations that are interpretable by a domain expert. The latter is critical to the acceptance and deployment of safety-critical systems. To address this gap, we propose grammar-based decision trees along with a learning algorithm. Our approach extends decision trees with a grammar framework for classifying heterogeneous time series data. A context-free grammar is used to derive decision expressions that are interpretable, application-specific, and support heterogeneous data types. In addition to classification, we show how grammar-based decision trees can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply grammar-based decision trees to a simulated aircraft encounter dataset and evaluate the performance of four variants of our learning algorithm. The best algorithm is used to analyze and categorize near mid-air collisions in the aircraft encounter dataset. We describe each discovered category in detail and discuss its relevance to aircraft collision avoidance.
Fang, H; Lu, B; Wang, X; Zheng, L; Sun, K; Cai, W
2017-08-17
This study proposed a decision tree model to screen upper urinary tract damage (UUTD) for patients with neurogenic bladder (NGB). Thirty-four NGB patients with UUTD were recruited in the case group, while 78 without UUTD were included in the control group. A decision tree method, classification and regression tree (CART), was then applied to develop the model in which UUTD was used as a dependent variable and history of urinary tract infections, bladder management, conservative treatment, and urodynamic findings were used as independent variables. The urethra function factor was found to be the primary screening information of patients and treated as the root node of the tree; Pabd max (maximum abdominal pressure, >14 cmH2O), Pves max (maximum intravesical pressure, ≤89 cmH2O), and gender (female) were also variables associated with UUTD. The accuracy of the proposed model was 84.8%, and the area under curve was 0.901 (95%CI=0.844-0.958), suggesting that the decision tree model might provide a new and convenient way to screen UUTD for NGB patients in both undeveloped and developing areas.
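As a rough illustration of how a CART-style tree arrives at a cut-off such as Pabd max > 14 cmH2O, the sketch below scores every candidate threshold on one variable by weighted Gini impurity and keeps the best one. The pressure values and labels are invented for the example and are not taken from the study; the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value > threshold'."""
    best = (None, float("inf"))
    n = len(values)
    # Every distinct value except the largest is a candidate cut point.
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Invented toy data: abdominal pressure (cmH2O) and a 0/1 outcome label.
pressures = [8, 10, 12, 14, 16, 20, 25, 30]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, impurity = best_threshold(pressures, outcome)
```

A full CART implementation repeats this search over all variables, recurses into each child node, and then prunes; only the single-split step is shown here.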
Graphic Representations as Tools for Decision Making.
ERIC Educational Resources Information Center
Howard, Judith
2001-01-01
Focuses on the use of graphic representations to enable students to improve their decision making skills in the social studies. Explores three visual aids used in assisting students with decision making: (1) the force field; (2) the decision tree; and (3) the decision making grid. (CMK)
The Effect of Defense R&D Expenditures on Military Capability and Technological Spillover
2013-03-01
Figure 1. Decision Tree for Sectoring R&D Units...approach, often called sectoring, categorizes R&D activities by funding source, and the functional approach categorizes R&D activities by their objective...economic objectives (defense, and control and care of environment) (OECD, 2002). Figure 1 shows the decision tree for sectoring R&D units and
NASA Astrophysics Data System (ADS)
Ragettli, S.; Zhou, J.; Wang, H.; Liu, C.
2017-12-01
Flash floods in small mountain catchments are one of the most frequent causes of loss of life and property from natural hazards in China. Hydrological models can be a useful tool for the anticipation of these events and the issuing of timely warnings. Since sub-daily streamflow information is unavailable for most small basins in China, one of the main challenges is finding appropriate parameter values for simulating flash floods in ungauged catchments. In this study, we use decision tree learning to explore parameter set transferability between different catchments. For this purpose, the physically-based, semi-distributed rainfall-runoff model PRMS-OMS is set up for 35 catchments in ten Chinese provinces. Hourly data from more than 800 storm runoff events are used to calibrate the model and evaluate the performance of parameter set transfers between catchments. For each catchment, 58 catchment attributes are extracted from several data sets available for the whole of China. We then use a data mining technique (decision tree learning) to identify catchment similarities that can be related to good transfer performance. Finally, we use the splitting rules of decision trees to find suitable donor catchments for ungauged target catchments. We show that decision tree learning makes optimal use of the information content of available catchment descriptors and outperforms regionalization based on a conventional measure of physiographic-climatic similarity by 15%-20%. Similar performance can be achieved with a regionalization method based on spatial proximity, but decision trees offer flexible rules for selecting suitable donor catchments that do not rely on the vicinity of gauged catchments. This flexibility makes the method particularly suitable for implementation in sparsely gauged environments. We evaluate the probability of detecting flood events exceeding a given return period, considering measured discharge and PRMS-OMS simulated flows with regionalized parameters. Overall, the probability of detection of an event with a return period of 10 years is 62%. 44% of all 10-year flood peaks can be detected with a timing error of 2 hours or less. These results indicate that the modeling system can provide useful information about the timing and magnitude of flood events at ungauged sites.
Shi, Huilan; Jia, Junya; Li, Dong; Wei, Li; Shang, Wenya; Zheng, Zhenfeng
2018-02-09
Precise renal histopathological diagnosis guides therapy strategy in patients with lupus nephritis. Blood oxygen level dependent (BOLD) magnetic resonance imaging (MRI) is an applicable noninvasive technique in renal disease. This study was performed to explore whether BOLD MRI could contribute to diagnosing renal pathological patterns. Adult patients with a renal pathological diagnosis of lupus nephritis were recruited for this study. Renal biopsy tissues were assessed based on the lupus nephritis ISN/RPS 2003 classification. BOLD MRI was used to obtain a functional magnetic resonance parameter, R2* values. Several functions of R2* values were calculated and used to construct algorithmic models for renal pathological patterns. In addition, the algorithmic models were compared as to their diagnostic capability. Both histopathology and BOLD MRI were used to examine a total of twelve patients. Renal pathological patterns included five class III (including 3 as class III + V) and seven class IV (including 4 as class IV + V). Three algorithmic models, including decision tree, linear discriminant, and logistic regression, were constructed to distinguish the renal pathological patterns of class III and class IV. The sensitivity of the decision tree model was better than that of the linear discriminant model (71.87% vs 59.48%, P < 0.001) and inferior to that of the logistic regression model (71.87% vs 78.71%, P < 0.001). The specificity of the decision tree model was equivalent to that of the linear discriminant model (63.87% vs 63.73%, P = 0.939) and higher than that of the logistic regression model (63.87% vs 38.0%, P < 0.001). The area under the ROC curve (AUROCC) of the decision tree model was greater than that of the linear discriminant model (0.765 vs 0.629, P < 0.001) and the logistic regression model (0.765 vs 0.662, P < 0.001). BOLD MRI is a useful non-invasive imaging technique for the evaluation of lupus nephritis. Decision tree models constructed using functions of R2* values may facilitate the prediction of renal pathological patterns.
Goodman, Katherine E; Lessler, Justin; Cosgrove, Sara E; Harris, Anthony D; Lautenbach, Ebbing; Han, Jennifer H; Milstone, Aaron M; Massey, Colin J; Tamma, Pranita D
2016-10-01
Timely identification of extended-spectrum β-lactamase (ESBL) bacteremia can improve clinical outcomes while minimizing unnecessary use of broad-spectrum antibiotics, including carbapenems. However, most clinical microbiology laboratories currently require at least 24 additional hours from the time of microbial genus and species identification to confirm ESBL production. Our objective was to develop a user-friendly decision tree to predict which organisms are ESBL producing, to guide appropriate antibiotic therapy. We included patients ≥18 years of age with bacteremia due to Escherichia coli or Klebsiella species from October 2008 to March 2015 at Johns Hopkins Hospital. Isolates with ceftriaxone minimum inhibitory concentrations ≥2 µg/mL underwent ESBL confirmatory testing. Recursive partitioning was used to generate a decision tree to determine the likelihood that a bacteremic patient was infected with an ESBL producer. Discrimination of the original and cross-validated models was evaluated using receiver operating characteristic curves and by calculation of C-statistics. A total of 1288 patients with bacteremia met eligibility criteria. For 194 patients (15%), bacteremia was due to a confirmed ESBL producer. The final classification tree for predicting ESBL-positive bacteremia included 5 predictors: history of ESBL colonization/infection, chronic indwelling vascular hardware, age ≥43 years, recent hospitalization in an ESBL high-burden region, and ≥6 days of antibiotic exposure in the prior 6 months. The decision tree's positive and negative predictive values were 90.8% and 91.9%, respectively. Our findings suggest that a clinical decision tree can be used to estimate a bacteremic patient's likelihood of infection with ESBL-producing bacteria. Recursive partitioning offers a practical, user-friendly approach for addressing important diagnostic questions. © The Author 2016. Published by Oxford University Press for the Infectious Diseases Society of America. 
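The positive and negative predictive values quoted for the ESBL decision tree are simple ratios over a confusion matrix. The sketch below shows the arithmetic; the counts are hypothetical and are not the study's data, and the function name is an assumption made for illustration.

```python
def predictive_values(tp, fp, tn, fn):
    """Return PPV, NPV, sensitivity and specificity from confusion-matrix counts.

    tp/fp: tree predicts ESBL-positive and the isolate is / is not ESBL-producing.
    tn/fn: tree predicts ESBL-negative and the isolate is not / is ESBL-producing.
    Assumes every denominator is non-zero.
    """
    return {
        "ppv": tp / (tp + fp),          # P(truly positive | predicted positive)
        "npv": tn / (tn + fn),          # P(truly negative | predicted negative)
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = predictive_values(tp=90, fp=10, tn=180, fn=20)
```

Unlike sensitivity and specificity, PPV and NPV depend on prevalence, so values obtained at one institution may not transfer to a population with a different ESBL burden.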
Ensemble stump classifiers and gene expression signatures in lung cancer.
Frey, Lewis; Edgerton, Mary; Fisher, Douglas; Levy, Shawn
2007-01-01
Microarray data sets for cancer tumor tissue generally have very few samples, each sample having thousands of probes (i.e., continuous variables). The sparsity of samples makes it difficult for machine learning techniques to discover probes relevant to the classification of tumor tissue. By combining data from different platforms (i.e., data sources), data sparsity is reduced, but this typically requires normalizing data from the different platforms, which can be non-trivial. This paper proposes a variant on the idea of ensemble learners to circumvent the need for normalization. To facilitate comprehension we build ensembles of very simple classifiers known as decision stumps--decision trees of one test each. The Ensemble Stump Classifier (ESC) identifies an mRNA signature having three probes and high accuracy for distinguishing between adenocarcinoma and squamous cell carcinoma of the lung across four data sets. In terms of accuracy, ESC outperforms a decision tree classifier on all four data sets, outperforms ensemble decision trees on three data sets, and simple stump classifiers on two data sets.
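To make the idea of a decision stump concrete, here is a minimal sketch of one-test classifiers combined by majority vote. It is not the authors' ESC procedure: the training rule (minimise training errors on one feature), the voting rule, and the toy expression values are all assumptions made for illustration.

```python
def train_stump(xs, ys):
    """Fit a one-test classifier on a single feature.

    Returns (threshold, label_if_above, label_if_not_above) with the fewest
    training errors. Assumes binary labels 0/1 and at least two distinct values.
    """
    vals = sorted(set(xs))
    if len(vals) < 2:
        raise ValueError("need at least two distinct feature values")
    best = None
    # Candidate thresholds are midpoints between consecutive distinct values.
    for a, b in zip(vals, vals[1:]):
        t = (a + b) / 2
        for above, below in ((1, 0), (0, 1)):
            errors = sum((above if x > t else below) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, above, below)
    return best[1:]


def predict_stump(stump, x):
    t, above, below = stump
    return above if x > t else below


def train_ensemble(rows, ys):
    """Train one stump per feature (per probe, in the microarray setting)."""
    return [train_stump([r[j] for r in rows], ys) for j in range(len(rows[0]))]


def predict_ensemble(stumps, row):
    """Majority vote over the stumps; ties go to class 0."""
    votes = sum(predict_stump(s, x) for s, x in zip(stumps, row))
    return 1 if 2 * votes > len(stumps) else 0


# Invented toy data: three probes per sample, two classes.
rows = [[1.0, 5.0, 0.2], [1.2, 4.8, 0.9], [3.0, 1.0, 0.3], [3.3, 1.2, 0.8]]
labels = [0, 0, 1, 1]
stumps = train_ensemble(rows, labels)
```

Because each stump only compares one probe against a threshold learned on its own scale, an ensemble of this kind can in principle be trained per platform without first normalizing the data, which is the motivation given in the abstract.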
1987-07-01
position of the camshaft (for example), the position of each other kinematic pair will have a definite computable value. Thus the only state variables needed... [Figure 2. CSG Tree for Crank Mechanism] program...ships plate. The actual representation is a binary tree whose internal nodes are set operations and rigid motions and The first step for kinematic
Retrieving and Indexing Spatial Data in the Cloud Computing Environment
NASA Astrophysics Data System (ADS)
Wang, Yonggang; Wang, Sheng; Zhou, Daliang
To address the drawbacks of spatial data storage on common Cloud Computing platforms, we design and present a framework for retrieving, indexing, accessing and managing spatial data in the Cloud environment. An interoperable spatial data object model is provided, based on the Simple Feature encodings from the OGC such as Well Known Binary (WKB) and Well Known Text (WKT). The classic spatial indexing algorithms, such as Quad-Tree and R-Tree, are re-designed for the Cloud Computing environment. Finally, we develop a software prototype based on Google App Engine to implement the proposed model.
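For readers unfamiliar with the two OGC encodings mentioned, the following sketch encodes a single 2D point as WKT and as little-endian WKB using only the standard library. It covers the Point geometry type only, and the helper names are hypothetical; it is not taken from the paper's prototype.

```python
import struct

WKB_POINT = 1          # OGC geometry type code for Point
WKB_LITTLE_ENDIAN = 1  # byte-order flag: 1 = little-endian (NDR), 0 = big-endian (XDR)


def point_to_wkt(x, y):
    """Well Known Text for a 2D point."""
    return f"POINT ({x} {y})"


def point_to_wkb(x, y):
    """Well Known Binary for a 2D point: 1-byte order flag, uint32 type, two doubles."""
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


def wkb_to_point(data):
    """Decode a 2D WKB point produced in either byte order."""
    order = "<" if data[0] == WKB_LITTLE_ENDIAN else ">"
    geom_type, x, y = struct.unpack(order + "Idd", data[1:21])
    if geom_type != WKB_POINT:
        raise ValueError("only 2D Point geometries are handled in this sketch")
    return x, y


wkb = point_to_wkb(29.0, 41.0)   # 21 bytes: 1 + 4 + 8 + 8
wkt = point_to_wkt(29.0, 41.0)
```

WKB is compact and fast to parse, which suits key-value stores, while WKT is easier to inspect and debug.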
CytoSPADE: high-performance analysis and visualization of high-dimensional cytometry data
Linderman, Michael D.; Simonds, Erin F.; Qiu, Peng; Bruggner, Robert V.; Sheode, Ketaki; Meng, Teresa H.; Plevritis, Sylvia K.; Nolan, Garry P.
2012-01-01
Motivation: Recent advances in flow cytometry enable simultaneous single-cell measurement of 30+ surface and intracellular proteins. CytoSPADE is a high-performance implementation of an interface for the Spanning-tree Progression Analysis of Density-normalized Events algorithm for tree-based analysis and visualization of this high-dimensional cytometry data. Availability: Source code and binaries are freely available at http://cytospade.org and via Bioconductor version 2.10 onwards for Linux, OSX and Windows. CytoSPADE is implemented in R, C++ and Java. Contact: michael.linderman@mssm.edu Supplementary Information: Additional documentation available at http://cytospade.org. PMID:22782546
Machine-assisted discovery of relationships in astronomy
NASA Astrophysics Data System (ADS)
Graham, Matthew J.; Djorgovski, S. G.; Mahabal, Ashish A.; Donalek, Ciro; Drake, Andrew J.
2013-05-01
High-volume feature-rich data sets are becoming the bread-and-butter of 21st century astronomy but present significant challenges to scientific discovery. In particular, identifying scientifically significant relationships between sets of parameters is non-trivial. Similar problems in biological and geosciences have led to the development of systems which can explore large parameter spaces and identify potentially interesting sets of associations. In this paper, we describe the application of automated discovery systems of relationships to astronomical data sets, focusing on an evolutionary programming technique and an information-theory technique. We demonstrate their use with classical astronomical relationships - the Hertzsprung-Russell diagram and the Fundamental Plane of elliptical galaxies. We also show how they work with the issue of binary classification which is relevant to the next generation of large synoptic sky surveys, such as the Large Synoptic Survey Telescope (LSST). We find that comparable results to more familiar techniques, such as decision trees, are achievable. Finally, we consider the reality of the relationships discovered and how this can be used for feature selection and extraction.
Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin
2016-01-01
Background: In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. Methods: As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6–12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. Results: The prevalence of anemia was 12.60% with a range of 3.47%–40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. Conclusions: The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities. PMID:27174328
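CHAID chooses splits using chi-squared tests of association between each candidate predictor and the outcome. The sketch below computes only the Pearson chi-squared statistic for a contingency table; real CHAID additionally merges categories and applies adjusted p-values, which are omitted here. The counts are invented and are not the study's data.

```python
def chi_square(table):
    """Pearson chi-squared statistic for a contingency table (list of rows).

    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 2x2 tables: rows = predictor category, columns = anemic / not anemic.
candidates = {
    "predictor_a": [[10, 20], [20, 10]],   # some association with the outcome
    "predictor_b": [[10, 10], [20, 20]],   # no association
}
# A CHAID-like step would split first on the predictor most associated with the outcome.
first_split = max(candidates, key=lambda name: chi_square(candidates[name]))
```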
Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano
2016-07-07
Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation, is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on 100 PET scans with known true contours. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contours to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.
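The Dice similarity coefficient used to score each segmentation is twice the overlap between two regions divided by the sum of their sizes. A minimal sketch follows, with contours represented as sets of voxel indices; the selection step at the end mirrors the idea of picking the method with the highest predicted DSC, and the method names and scores there are hypothetical.

```python
def dice(a, b):
    """Dice similarity coefficient between two sets of voxel indices (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty contours are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


true_contour = {(0, 0), (0, 1), (1, 0), (1, 1)}
segmented = {(1, 0), (1, 1), (2, 0), (2, 1)}
score = dice(true_contour, segmented)  # two shared voxels out of 4 + 4

# Hypothetical predicted DSCs, one per candidate method, as a per-method tree might output.
predicted_dsc = {"method_a": 0.71, "method_b": 0.84, "method_c": 0.79}
chosen = max(predicted_dsc, key=predicted_dsc.get)
```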
Dias, Cláudia Camila; Pereira Rodrigues, Pedro; Fernandes, Samuel; Portela, Francisco; Ministro, Paula; Martins, Diana; Sousa, Paula; Lago, Paula; Rosa, Isadora; Correia, Luis; Moura Santos, Paula; Magro, Fernando
2017-01-01
Crohn's disease (CD) is a chronic inflammatory bowel disease known to carry a high risk of disabling disease, often requiring surgical intervention. This article describes a decision-tree based approach that defines CD patients' risk of undergoing disabling events, surgical interventions and reoperations, based on clinical and demographic variables. This multicentric study involved 1547 CD patients retrospectively enrolled and divided into two cohorts: a derivation one (80%) and a validation one (20%). Decision trees were built by applying the CHAIRT algorithm for the selection of variables. Three-level decision trees were built for the risk of disabling and reoperation, whereas the risk of surgery was described in a two-level one. A receiver operating characteristic (ROC) analysis was performed, and the area under the curve (AUC) was higher than 70% for all outcomes. The defined risk cut-off values show usefulness for the assessed outcomes: risk levels above 75% for disabling had an odds test positivity of 4.06 [3.50-4.71], whereas risk levels below 34% and 19% excluded surgery and reoperation with an odds test negativity of 0.15 [0.09-0.25] and 0.50 [0.24-1.01], respectively. Overall, patients with B2 or B3 phenotype had a higher proportion of disabling disease and surgery, while patients with later introduction of pharmacological therapy (1 month after initial surgery) had a higher proportion of reoperation. The decision-tree based approach used in this study, with demographic and clinical variables, has been shown to be a valid and useful way to depict such risks of disabling, surgery and reoperation.
Ye, Fang; Chen, Zhi-Hua; Chen, Jie; Liu, Fang; Zhang, Yong; Fan, Qin-Ying; Wang, Lin
2016-05-20
In the past decades, studies on infant anemia have mainly focused on rural areas of China. With the increasing heterogeneity of population in recent years, available information on infant anemia is inconclusive in large cities of China, especially with comparison between native residents and floating population. This population-based cross-sectional study was implemented to determine the anemic status of infants as well as the risk factors in a representative downtown area of Beijing. As useful methods to build a predictive model, Chi-squared automatic interaction detection (CHAID) decision tree analysis and logistic regression analysis were introduced to explore risk factors of infant anemia. A total of 1091 infants aged 6-12 months together with their parents/caregivers living at Heping Avenue Subdistrict of Beijing were surveyed from January 1, 2013 to December 31, 2014. The prevalence of anemia was 12.60% with a range of 3.47%-40.00% in different subgroup characteristics. The CHAID decision tree model has demonstrated multilevel interaction among risk factors through stepwise pathways to detect anemia. Besides the three predictors identified by logistic regression model including maternal anemia during pregnancy, exclusive breastfeeding in the first 6 months, and floating population, CHAID decision tree analysis also identified the fourth risk factor, the maternal educational level, with higher overall classification accuracy and larger area below the receiver operating characteristic curve. The infant anemic status in metropolis is complex and should be carefully considered by the basic health care practitioners. CHAID decision tree analysis has demonstrated a better performance in hierarchical analysis of population with great heterogeneity. Risk factors identified by this study might be meaningful in the early detection and prompt treatment of infant anemia in large cities.
NASA Astrophysics Data System (ADS)
Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano
2016-07-01
Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation, is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on 100 PET scans with known true contours. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contours to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.
Model of the best-of-N nest-site selection process in honeybees.
Reina, Andreagiovanni; Marshall, James A R; Trianni, Vito; Bose, Thomas
2017-05-01
The ability of a honeybee swarm to select the best nest site plays a fundamental role in determining the future colony's fitness. To date, the nest-site selection process has mostly been modeled and theoretically analyzed for the case of binary decisions. However, when the number of alternative nests is larger than two, the decision-process dynamics qualitatively change. In this work, we extend previous analyses of a value-sensitive decision-making mechanism to a decision process among N nests. First, we present the decision-making dynamics in the symmetric case of N equal-quality nests. Then, we generalize our findings to a best-of-N decision scenario with one superior nest and N-1 inferior nests, previously studied empirically in bees and ants. Whereas previous binary models highlighted the crucial role of inhibitory stop-signaling, the key parameter in our new analysis is the relative time invested by swarm members in individual discovery and in signaling behaviors. Our new analysis reveals conflicting pressures on this ratio in symmetric and best-of-N decisions, which could be solved through a time-dependent signaling strategy. Additionally, our analysis suggests how ecological factors determining the density of suitable nest sites may have led to selective pressures for an optimal stable signaling ratio.
Model of the best-of-N nest-site selection process in honeybees
NASA Astrophysics Data System (ADS)
Reina, Andreagiovanni; Marshall, James A. R.; Trianni, Vito; Bose, Thomas
2017-05-01
The ability of a honeybee swarm to select the best nest site plays a fundamental role in determining the future colony's fitness. To date, the nest-site selection process has mostly been modeled and theoretically analyzed for the case of binary decisions. However, when the number of alternative nests is larger than two, the decision-process dynamics qualitatively change. In this work, we extend previous analyses of a value-sensitive decision-making mechanism to a decision process among N nests. First, we present the decision-making dynamics in the symmetric case of N equal-quality nests. Then, we generalize our findings to a best-of-N decision scenario with one superior nest and N -1 inferior nests, previously studied empirically in bees and ants. Whereas previous binary models highlighted the crucial role of inhibitory stop-signaling, the key parameter in our new analysis is the relative time invested by swarm members in individual discovery and in signaling behaviors. Our new analysis reveals conflicting pressures on this ratio in symmetric and best-of-N decisions, which could be solved through a time-dependent signaling strategy. Additionally, our analysis suggests how ecological factors determining the density of suitable nest sites may have led to selective pressures for an optimal stable signaling ratio.
ERIC Educational Resources Information Center
Braus, Judy, Ed.
1992-01-01
Ranger Rick's NatureScope is a creative education series dedicated to inspiring in children an understanding and appreciation of the natural world while developing the skills they will need to make responsible decisions about the environment. Contents are organized into the following sections: (1) "What Makes a Tree a Tree?," including…
Binary Coded Web Access Pattern Tree in Education Domain
ERIC Educational Resources Information Center
Gomathi, C.; Moorthi, M.; Duraiswamy, K.
2008-01-01
Web Access Pattern (WAP), which is the sequence of accesses pursued by users frequently, is a kind of interesting and useful knowledge in practice. Sequential Pattern mining is the process of applying data mining techniques to a sequential database for the purposes of discovering the correlation relationships that exist among an ordered list of…
Khosravi, Khabat; Pham, Binh Thai; Chapi, Kamran; Shirzadi, Ataollah; Shahabi, Himan; Revhaug, Inge; Prakash, Indra; Tien Bui, Dieu
2018-06-15
Floods are one of the most damaging natural hazards, causing huge loss of property, infrastructure and lives. Predicting the locations of flash floods is very difficult due to sudden changes in climatic conditions and manmade factors. However, prior identification of flood-susceptible areas can be done with the help of machine learning techniques for proper and timely management of flood hazards. In this study, we tested four decision-tree-based machine learning models, namely Logistic Model Trees (LMT), Reduced Error Pruning Trees (REPT), Naïve Bayes Trees (NBT), and Alternating Decision Trees (ADT), for flash flood susceptibility mapping at the Haraz Watershed in the northern part of Iran. For this, a spatial database was constructed with 201 present and past flood locations and eleven flood-influencing factors, namely ground slope, altitude, curvature, Stream Power Index (SPI), Topographic Wetness Index (TWI), land use, rainfall, river density, distance from river, lithology, and Normalized Difference Vegetation Index (NDVI). Statistical evaluation measures, the Receiver Operating Characteristic (ROC) curve, and Friedman and Wilcoxon signed-rank tests were used to validate and compare the prediction capability of the models. Results show that the ADT model has the highest prediction capability for flash flood susceptibility assessment, followed by the NBT, the LMT, and the REPT, respectively. These techniques have proven successful in quickly determining flood-susceptible areas. Copyright © 2018 Elsevier B.V. All rights reserved.
Finding structure in data using multivariate tree boosting
Miller, Patrick J.; Lubke, Gitta H.; McArtor, Daniel B.; Bergeman, C. S.
2016-01-01
Technology and collaboration enable dramatic increases in the size of psychological and psychiatric data collections, but finding structure in these large data sets with many collected variables is challenging. Decision tree ensembles such as random forests (Strobl, Malley, & Tutz, 2009) are a useful tool for finding structure, but are difficult to interpret with multiple outcome variables which are often of interest in psychology. To find and interpret structure in data sets with multiple outcomes and many predictors (possibly exceeding the sample size), we introduce a multivariate extension to a decision tree ensemble method called gradient boosted regression trees (Friedman, 2001). Our extension, multivariate tree boosting, is a method for nonparametric regression that is useful for identifying important predictors, detecting predictors with nonlinear effects and interactions without specification of such effects, and for identifying predictors that cause two or more outcome variables to covary. We provide the R package ‘mvtboost’ to estimate, tune, and interpret the resulting model, which extends the implementation of univariate boosting in the R package ‘gbm’ (Ridgeway et al., 2015) to continuous, multivariate outcomes. To illustrate the approach, we analyze predictors of psychological well-being (Ryff & Keyes, 1995). Simulations verify that our approach identifies predictors with nonlinear effects and achieves high prediction accuracy, exceeding or matching the performance of (penalized) multivariate multiple regression and multivariate decision trees over a wide range of conditions. PMID:27918183
Tools of the Future: How Decision Tree Analysis Will Impact Mission Planning
NASA Technical Reports Server (NTRS)
Otterstatter, Matthew R.
2005-01-01
The universe is infinitely complex; however, the human mind has a finite capacity. The multitude of possible variables, metrics, and procedures in mission planning are far too many to address exhaustively. This is unfortunate because, in general, considering more possibilities leads to more accurate and more powerful results. To compensate, we can get more insightful results by employing our greatest tool, the computer. The power of the computer will be utilized through a technology that considers every possibility, decision tree analysis. Although decision trees have been used in many other fields, this is innovative for space mission planning. Because this is a new strategy, no existing software is able to completely accommodate all of the requirements. This was determined through extensive research and testing of current technologies. It was necessary to create original software, for which a short-term model was finished this summer. The model was built into Microsoft Excel to take advantage of the familiar graphical interface for user input, computation, and viewing output. Macros were written to automate the process of tree construction, optimization, and presentation. The results are useful and promising. If this tool is successfully implemented in mission planning, our reliance on old-fashioned heuristics, an error-prone shortcut for handling complexity, will be reduced. The computer algorithms involved in decision trees will revolutionize mission planning. The planning will be faster and smarter, leading to optimized missions with the potential for more valuable data.
Lateral gene transfers have polished animal genomes: lessons from nematodes
Danchin, Etienne G. J.; Rosso, Marie-Noëlle
2012-01-01
It is now accepted that lateral gene transfers (LGT) have significantly contributed to the composition of bacterial genomes. The amplitude of the phenomenon is considered so high in prokaryotes that it challenges the traditional view of a binary hierarchical tree of life to correctly represent the evolutionary history of species. Given the plethora of transfers between prokaryotes, it is currently impossible to infer the last common ancestral gene set for any extant species. For this ensemble of reasons, it has been proposed that the Darwinian binary tree of life may be inappropriate to correctly reflect the actual relations between species, at least in prokaryotes. In contrast, the contribution of LGT to the composition of animal genomes is less documented. In the light of recent analyses that reported series of LGT events in nematodes, we discuss the importance of this phenomenon in the evolutionary history and in the current composition of an animal genome. Far from being neutral, it appears that besides having contributed to nematode genome contents, LGT have favored the emergence of important traits such as plant-parasitism. PMID:22919619
A fast learning method for large scale and multi-class samples of SVM
NASA Astrophysics Data System (ADS)
Fan, Yu; Guo, Huiming
2017-06-01
A fast learning method for multi-class SVM (Support Vector Machine) classification, based on a binary tree, is presented to address the low learning efficiency of SVMs when processing large-scale multi-class samples. The method builds the binary tree hierarchy bottom-up; given that hierarchy, the sub-classifier at each node learns from the samples corresponding to that node. During learning, a first clustering of the training samples generates several class clusters. Central points are extracted directly from clusters that contain only one type of sample. For clusters that contain two types of samples, the numbers of clusters for the positive and negative samples are set according to their degree of mixture, a secondary clustering is performed, and central points are then extracted from the resulting sub-class clusters. The sub-classifiers are obtained by learning from the reduced sample set formed by combining the extracted central points. Simulation experiments show that this fast learning method, which is based on multi-level clustering, can guarantee higher classification accuracy, greatly reduce sample numbers, and effectively improve learning efficiency.
Barbosa, Rommel Melgaço; Nacano, Letícia Ramos; Freitas, Rodolfo; Batista, Bruno Lemos; Barbosa, Fernando
2014-09-01
This article aims to evaluate 2 machine learning algorithms, decision trees and naïve Bayes (NB), for egg classification (free-range eggs compared with battery eggs). The database used for the study consisted of 15 chemical elements (As, Ba, Cd, Co, Cs, Cu, Fe, Mg, Mn, Mo, Pb, Se, Sr, V, and Zn) determined in 52 egg samples (20 free-range and 32 battery eggs) by inductively coupled plasma mass spectrometry. Our results demonstrated that decision trees and NB associated with the mineral contents of eggs provide a high level of accuracy (above 80% and 90%, respectively) for classification between free-range and battery eggs and can be used as an alternative method for adulteration evaluation. © 2014 Institute of Food Technologists®
Pollution mitigation and carbon sequestration by an urban forest.
Brack, C L
2002-01-01
At the beginning of the 1900s, the Canberra plain was largely treeless. Graziers had carried out extensive clearing of the original trees since the 1820s leaving only scattered remnants and some plantings near homesteads. With the selection of Canberra as the site for the new capital of Australia, extensive tree plantings began in 1911. These trees have delivered a number of benefits, including aesthetic values and the amelioration of climatic extremes. Recently, however, it was considered that the benefits might extend to pollution mitigation and the sequestration of carbon. This paper outlines a case study of the value of the Canberra urban forest with particular reference to pollution mitigation. This study uses a tree inventory, modelling and decision support system developed to collect and use data about trees for tree asset management. The decision support system (DISMUT) was developed to assist in the management of about 400,000 trees planted in Canberra. The size of trees during the 5-year Kyoto Commitment Period was estimated using DISMUT and multiplied by estimates of value per square meter of canopy derived from available literature. The planted trees are estimated to have a combined energy reduction, pollution mitigation and carbon sequestration value of US$20-67 million during the period 2008-2012.
Using real options analysis to support strategic management decisions
NASA Astrophysics Data System (ADS)
Kabaivanov, Stanimir; Markovska, Veneta; Milev, Mariyan
2013-12-01
Decision making is a complex process that requires taking into consideration multiple heterogeneous sources of uncertainty. Standard valuation and financial analysis techniques often fail to properly account for all these sources of risk as well as for all sources of additional flexibility. In this paper we explore applications of a modified binomial tree method for real options analysis (ROA) in an effort to improve the decision-making process. Common use cases for real options are analyzed, with a detailed study of the applications and advantages that company management can derive from them. Numeric results based on extending the simple binomial tree approach to multiple sources of uncertainty are provided to demonstrate the improvement in management decisions.
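The simple binomial tree that the abstract above takes as its starting point can be sketched as follows. This is a minimal illustration only: it uses the standard Cox-Ross-Rubinstein lattice with a single source of uncertainty, not the authors' modified multi-source method, and the function name, parameters, and the "option to invest" reading of the payoff are assumptions made for the example.

```python
import math

def binomial_option_value(v0, strike, rate, sigma, years, steps, american=True):
    """Value an option to invest (payoff max(V - strike, 0)) on a CRR lattice.

    v0 is the current project value, strike the investment cost; all names
    here are hypothetical and chosen for illustration.
    """
    dt = years / steps
    u = math.exp(sigma * math.sqrt(dt))        # up factor per step
    d = 1.0 / u                                # down factor per step
    p = (math.exp(rate * dt) - d) / (u - d)    # risk-neutral up probability
    disc = math.exp(-rate * dt)                # one-step discount factor

    # Payoffs at the final layer of the tree (j = number of up moves).
    values = [max(v0 * u**j * d**(steps - j) - strike, 0.0)
              for j in range(steps + 1)]

    # Roll back through the tree one layer at a time.
    for i in range(steps - 1, -1, -1):
        for j in range(i + 1):
            value = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            if american:
                # Early exercise: invest now if that beats waiting.
                value = max(value, v0 * u**j * d**(i - j) - strike)
            values[j] = value
    return values[0]

if __name__ == "__main__":
    print(binomial_option_value(100.0, 100.0, 0.05, 0.2, 1.0, 200))
```

The gap between the returned value and the immediate payoff max(v0 - strike, 0) is the value of managerial flexibility that a static net-present-value calculation ignores.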
An attentional drift diffusion model over binary-attribute choice.
Fisher, Geoffrey
2017-11-01
In order to make good decisions, individuals need to identify and properly integrate information about various attributes associated with a choice. Since choices are often complex and made rapidly, they are typically affected by contextual variables that are thought to influence how much attention is paid to different attributes. I propose a modification of the attentional drift-diffusion model, the binary-attribute attentional drift diffusion model (baDDM), which describes the choice process over simple binary-attribute choices and how it is affected by fluctuations in visual attention. Using an eye-tracking experiment, I find the baDDM makes accurate quantitative predictions about several key variables including choices, reaction times, and how these variables are correlated with attention to two attributes in an accept-reject decision. Furthermore, I estimate an attribute-based fixation bias that suggests attention to an attribute increases its subjective weight by 5%, while the unattended attribute's weight is decreased by 10%. Copyright © 2017 Elsevier B.V. All rights reserved.
Improving ensemble decision tree performance using Adaboost and Bagging
NASA Astrophysics Data System (ADS)
Hasan, Md. Rajib; Siraj, Fadzilah; Sainin, Mohd Shamrie
2015-12-01
Ensemble classifier systems are considered among the most promising approaches to medical data classification, and the performance of a decision tree classifier can be increased by ensemble methods, which have been shown to outperform single classifiers. However, in an ensemble setting the performance depends on the selection of a suitable base classifier. This research employed two prominent ensembles, namely Adaboost and Bagging, with independently selected base classifiers such as Random Forest, Random Tree, j48, j48grafts and Logistic Model Regression (LMT). The empirical study shows that performance varies when different base classifiers are selected, and in some cases overfitting was also noted. The evidence shows that ensemble decision tree classifiers using Adaboost and Bagging improve performance on the selected medical data sets.
Knowledge Quality Functions for Rule Discovery
1994-09-01
Managers in many organizations finding themselves in the possession of large and rapidly growing databases are beginning to suspect the information in their...missing values (Smyth and Goodman, 1992, p. 303). Decision trees "tend to grow very large for realistic applications and are thus difficult to interpret...by humans" (Holsheimer, 1994, p. 42). Decision trees also grow excessively complicated in the presence of noisy databases (Dhar and Tuzhilin, 1993, p
Structural Equation Model Trees
ERIC Educational Resources Information Center
Brandmaier, Andreas M.; von Oertzen, Timo; McArdle, John J.; Lindenberger, Ulman
2013-01-01
In the behavioral and social sciences, structural equation models (SEMs) have become widely accepted as a modeling tool for the relation between latent and observed variables. SEMs can be seen as a unification of several multivariate analysis techniques. SEM Trees combine the strengths of SEMs and the decision tree paradigm by building tree…
NASA Astrophysics Data System (ADS)
Zhang, C.; Pan, X.; Zhang, S. Q.; Li, H. P.; Atkinson, P. M.
2017-09-01
Recent advances in remote sensing have yielded a great number of very high resolution (VHR) images acquired at sub-metre spatial resolution. These VHR remotely sensed data have posed enormous challenges for effective processing, analysis and classification because of their high spatial complexity and heterogeneity. Although many computer-aided classification methods based on machine learning approaches have been developed over the past decades, most of them target pixel-level spectral differentiation, e.g. Multi-Layer Perceptron (MLP), and are unable to exploit the abundant spatial details within VHR images. This paper introduced a rough set model as a general framework to objectively characterize the uncertainty in CNN classification results, and to further partition them into correct and incorrect regions on the map. The correct classification regions of CNN were trusted and maintained, whereas the misclassified areas were reclassified using a decision tree with both CNN and MLP. The effectiveness of the proposed rough set decision tree based MLP-CNN was tested using an urban area at Bournemouth, United Kingdom. The MLP-CNN, which captured the complementarity between CNN and MLP through the rough set based decision tree, achieved the best classification performance both visually and numerically. Therefore, this research paves the way to achieve fully automatic and effective VHR image classification.
Park, Ji Hyun; Kim, Hyeon-Young; Lee, Hanna; Yun, Eun Kyoung
2015-12-01
This study compares the performance of the logistic regression and decision tree analysis methods for assessing the risk factors for infection in cancer patients undergoing chemotherapy. The subjects were 732 cancer patients who were receiving chemotherapy at K university hospital in Seoul, Korea. The data were collected between March 2011 and February 2013 and were processed for descriptive analysis, logistic regression and decision tree analysis using the IBM SPSS Statistics 19 and Modeler 15.1 programs. The most common risk factors for infection in cancer patients receiving chemotherapy were identified as alkylating agents, vinca alkaloid and underlying diabetes mellitus. The logistic regression explained 66.7% of the variation in the data in terms of sensitivity and 88.9% in terms of specificity. The decision tree analysis accounted for 55.0% of the variation in the data in terms of sensitivity and 89.0% in terms of specificity. As for the overall classification accuracy, the logistic regression explained 88.0% and the decision tree analysis explained 87.2%. The logistic regression analysis showed a higher degree of sensitivity and classification accuracy. Therefore, logistic regression analysis is concluded to be the more effective and useful method for establishing an infection prediction model for patients undergoing chemotherapy. Copyright © 2015 Elsevier Ltd. All rights reserved.
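The sensitivity, specificity, and overall accuracy figures used in the comparison above all derive from the four cells of a confusion matrix. A minimal sketch, assuming hypothetical counts rather than the study's data:

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: infected patients predicted infected / not infected
    tn/fp: non-infected patients predicted not infected / infected
    """
    sensitivity = tp / (tp + fn)                 # true positive rate
    specificity = tn / (tn + fp)                 # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # share of all cases correct
    return sensitivity, specificity, accuracy

if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the calculation.
    sens, spec, acc = classification_metrics(tp=8, fn=2, tn=45, fp=5)
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

When the positive class is rare, accuracy is dominated by specificity, which is why two models can have nearly equal accuracy yet clearly different sensitivity.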
MODIS Snow Cover Mapping Decision Tree Technique: Snow and Cloud Discrimination
NASA Technical Reports Server (NTRS)
Riggs, George A.; Hall, Dorothy K.
2010-01-01
Accurate mapping of snow cover continues to challenge cryospheric scientists and modelers. The Moderate-Resolution Imaging Spectroradiometer (MODIS) snow data products have been used since 2000 by many investigators to map and monitor snow cover extent for various applications. Users have reported on the utility of the products and also on problems encountered. Three problems or hindrances in the use of the MODIS snow data products that have been reported in the literature are: cloud obscuration, snow/cloud confusion, and snow omission errors in thin or sparse snow cover conditions. Implementation of the MODIS snow algorithm in a decision tree technique using surface reflectance input to mitigate those problems is being investigated. The objective of this work is to use a decision tree structure for the snow algorithm. This should alleviate snow/cloud confusion and omission errors and provide a snow map with classes that convey information on how snow was detected, e.g. snow under clear sky, snow under cloud, giving users flexibility in interpreting and deriving a snow map. Results of a snow cover decision tree algorithm are compared to the standard MODIS snow map and found to exhibit improved ability to alleviate snow/cloud confusion in some situations, allowing up to about a 5% increase in mapped snow cover extent, and thus accuracy, in some scenes.
Tayefi, Maryam; Tajfard, Mohammad; Saffar, Sara; Hanachi, Parichehr; Amirabadizadeh, Ali Reza; Esmaeily, Habibollah; Taghipour, Ali; Ferns, Gordon A; Moohebati, Mohsen; Ghayour-Mobarhan, Majid
2017-04-01
Coronary heart disease (CHD) is an important public health problem globally. Algorithms incorporating the assessment of clinical biomarkers together with several established traditional risk factors can help clinicians to predict CHD and support clinical decision making with respect to interventions. Decision tree (DT) is a data mining model for extracting hidden knowledge from large databases. We aimed to establish a predictive model for coronary heart disease using a decision tree algorithm. Here we used a dataset of 2346 individuals including 1159 healthy participants and 1187 participants who had undergone coronary angiography (405 participants with negative angiography and 782 participants with positive angiography). We entered 10 variables of a total 12 variables into the DT algorithm (including age, sex, FBG, TG, hs-CRP, TC, HDL, LDL, SBP and DBP). Our model could identify the associated risk factors of CHD with sensitivity, specificity, and accuracy of 96%, 87%, and 94%, respectively. Serum hs-CRP level was at the top of the tree in our model, followed by FBG, gender and age. Our model appears to be an accurate, specific and sensitive model for identifying the presence of CHD, but will require validation in prospective studies. Copyright © 2017 Elsevier B.V. All rights reserved.
Esmaily, Habibollah; Tayefi, Maryam; Doosti, Hassan; Ghayour-Mobarhan, Majid; Nezami, Hossein; Amirabadizadeh, Alireza
2018-04-24
We aimed to identify the associated risk factors of type 2 diabetes mellitus (T2DM) using a data mining approach, with decision tree and random forest techniques, in the Mashhad Stroke and Heart Atherosclerotic Disorders (MASHAD) Study program. A cross-sectional study. The MASHAD study started in 2010 and will continue until 2020. Two data mining tools, namely decision trees and random forests, are used for predicting T2DM when some other characteristics are observed on 9528 subjects recruited from the MASHAD database. This paper compares these two models in terms of accuracy, sensitivity, specificity and the area under the ROC curve. The prevalence rate of T2DM was 14% among these subjects. The decision tree model has 64.9% accuracy, 64.5% sensitivity, 66.8% specificity, and an area under the ROC curve of 68.6%, while the random forest model has 71.1% accuracy, 71.3% sensitivity, 69.9% specificity, and an area under the ROC curve of 77.3%. The random forest model, when used with demographic, clinical, anthropometric and biochemical measurements, can provide a simple tool to identify associated risk factors for type 2 diabetes. Such identification can be of substantial use in managing health policy to reduce the number of subjects with T2DM.
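The area under the ROC curve reported above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that pairwise definition, using hypothetical labels and scores rather than MASHAD data:

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positive (e.g. T2DM), 0 for negative.
    scores: model scores, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Hypothetical example: 3 of the 4 positive/negative pairs are ordered correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

This quadratic-time form is meant for clarity; for large samples a rank-based computation gives the same value more efficiently.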
Dreyer, Nancy A; Bryant, Allison; Velentgas, Priscilla
2016-10-01
Recognizing the growing need for robust evidence about treatment effectiveness in real-world populations, the Good Research for Comparative Effectiveness (GRACE) guidelines have been developed for noninterventional studies of comparative effectiveness to determine which studies are sufficiently rigorous to be reliable enough for use in health technology assessments. To evaluate which aspects of the GRACE Checklist contribute most strongly to recognition of quality. We assembled 28 observational comparative effectiveness articles published from 2001 to 2010 that compared treatment effectiveness and/or safety of drugs, medical devices, and medical procedures. Twenty-two volunteers from academia, pharmaceutical companies, and government agencies applied the GRACE Checklist to those articles, providing 56 assessments. Ten senior academic and industry experts provided assessments of overall article quality for the purpose of decision support. We also rated each article based on the number of annual citations and impact factor of the journal in which the article was published. To identify checklist items that were most predictive of quality, classification and regression tree (CART) analysis, a binary, recursive, partitioning methodology, was used to create 3 decision trees, which compared the 56 article assessments with 3 external quality outcomes: (1) expert assessment of overall quality, (2) citation frequency, and (3) impact factor. A fourth tree looked at the composite outcome of all 3 quality indicators. The best predictors of quality included the following: use of concurrent comparators, limiting the study to new initiators of the study drug, equivalent measurement of outcomes in study groups, collecting data on most if not all known confounders or effect modifiers, accounting for immortal time bias in the analysis, and use of sensitivity analyses to test how much effect estimates depended on various assumptions. 
Only sensitivity analyses appeared consistently as a predictor of quality in all 4 trees. When a composite outcome of the 3 quality measures was used, the GRACE Checklist showed high sensitivity and specificity (71.43% and 80.95%, respectively). The GRACE Checklist stands out from other consensus-driven and expert guidance documents because of its extensive validation efforts. This most recent work shows that the checklist has strong sensitivity and specificity, increasing its utility as a screening tool to identify high-quality observational comparative effectiveness research worthy of in-depth review and applicability for decision support. No outside funding supported this research. All authors are full-time employees of Quintiles, which provides research and consulting services to the biopharmaceutical industry. The authors have no other disclosures to report. Two of the 3 CART trees were presented at the International Society of Pharmacoepidemiology in 2015 ("Article Citations per Year" and "Journal Impact Factor"). The original validation study was published in the March 2014 issue of the Journal of Managed Care & Specialty Pharmacy. The checklist questions and scoring were included using a table that was originally published by this journal in 2014. Study concept and design were primarily contributed by Dreyer and Velentgas, along with Bryant. Bryant took the lead in data collection and analysis, along with Dreyer and Velentgas, and data interpretation was performed by Dreyer, Velentgas, and Bryant. The manuscript was written and revised primarily by Dreyer, along with Bryant and Velentgas.
Geoffrey H. Donovan; John Mills
2014-01-01
Many cities have policies encouraging homeowners to plant trees. For these policies to be effective, it is important to understand what motivates a homeowner's tree-planting decision. Researchers address this question by identifying variables that influence participation in a tree-planting program in Portland, Oregon, U.S. According to the study, homeowners with street...
Individual differences in decision making by foraging hummingbirds.
Morgan, Kate V; Hurly, T Andrew; Healy, Susan D
2014-11-01
For both humans and animals preference for one option over others can be influenced by the context in which the options occur. In animals, changes in preference could be due to comparative decision-making or to changes in the energy state of the animal when making decisions. We investigated which of these possibilities better explained the response of wild hummingbirds to the addition of a decoy option to a set of two options by presenting Rufous hummingbirds (Selasphorus rufus) with a foraging experiment with two treatments. In each treatment the birds were presented with a binary choice between two options and a trinary choice with three options. In treatment one the binary choice was between a volume option and a concentration option, whereas in treatment two the same volume option was presented alongside an alternative concentration option. In the trinary choice, birds were presented with the same options as in the binary choice plus one of two inferior options. Birds changed their preferences when a poorer option was added to the choice set: birds increased their preference for the same option when in the presence of either decoy. Which option differed across individuals and the changes in preference were not readily explained by either energy maximisation or the decoy effect. The consistency in response within individuals, however, would suggest that the individual itself brings an extra dimension to context-dependent decision-making. This article is part of a Special Issue entitled: Cognition in the wild. Copyright © 2014 Elsevier B.V. All rights reserved.
Yang, Cheng-Hong; Wu, Kuo-Chuan; Chuang, Li-Yeh; Chang, Hsueh-Wei
2018-01-01
DNA barcode sequences are accumulating in large data sets. A barcode is generally a sequence larger than 1000 base pairs and generates a computational burden. Although DNA barcodes were originally envisioned as straightforward species tags, the use of barcode sequences for identification is rarely emphasized currently. Single-nucleotide polymorphism (SNP) association studies suggest that SNPs may be ideal targets for feature selection to discriminate between different species. We hypothesize that SNP-based barcodes may be more effective than the full length of DNA barcode sequences for species discrimination. To address this issue, we tested a ribulose diphosphate carboxylase (rbcL) SNP barcoding (RSB) strategy using a decision tree algorithm. After alignment and trimming, 31 SNPs were discovered in the rbcL sequences from 38 Brassicaceae plant species. In the decision tree construction, these SNPs were used to set up the decision rules that assign the sequences into 2 groups level by level. After algorithm processing, 37 nodes and 31 loci were required for discriminating 38 species. Finally, the sequence tags consisting of 31 rbcL SNP barcodes were identified for discriminating 38 Brassicaceae species based on the decision tree-selected SNP pattern using the RSB method. Taken together, this study provides the rationale that the SNP aspect of the DNA barcode for the rbcL gene is a useful and effective sequence for tagging 38 Brassicaceae species.
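The level-by-level splitting described above can be illustrated with a toy recursive partition. This is a sketch under stated assumptions, not the authors' RSB algorithm: the species names, sequences, and the rule "split on the first locus that still varies" are all hypothetical. It does show why 37 nodes are consistent with 38 species, since a strictly binary tree with n leaves has n - 1 internal decision nodes.

```python
def build_tree(seqs, loci):
    """Recursively split species into two groups on one SNP locus at a time.

    seqs: dict mapping species name -> aligned sequence (string).
    loci: candidate SNP positions, tried in order.
    Returns a species name (leaf) or a tuple (locus, allele, left, right).
    """
    if len(seqs) == 1:
        return next(iter(seqs))
    for locus in loci:
        alleles = sorted({s[locus] for s in seqs.values()})
        if len(alleles) > 1:
            first = alleles[0]
            left = {n: s for n, s in seqs.items() if s[locus] == first}
            right = {n: s for n, s in seqs.items() if s[locus] != first}
            return (locus, first, build_tree(left, loci), build_tree(right, loci))
    raise ValueError("species cannot be distinguished with the given loci")

def count_internal(node):
    """Count decision nodes; leaves are plain species names."""
    if isinstance(node, str):
        return 0
    return 1 + count_internal(node[2]) + count_internal(node[3])

if __name__ == "__main__":
    # Hypothetical three-locus barcodes for four made-up species.
    toy = {"sp1": "AAG", "sp2": "ACG", "sp3": "GAT", "sp4": "GAG"}
    tree = build_tree(toy, loci=[0, 1, 2])
    print(tree, count_internal(tree))
```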
NASA Astrophysics Data System (ADS)
de Barros, Felipe P. J.; Bolster, Diogo; Sanchez-Vila, Xavier; Nowak, Wolfgang
2011-05-01
Assessing health risk in hydrological systems is an interdisciplinary field. It relies on the expertise in the fields of hydrology and public health and needs powerful translation concepts to provide decision support and policy making. Reliable health risk estimates need to account for the uncertainties and variabilities present in hydrological, physiological, and human behavioral parameters. Despite significant theoretical advancements in stochastic hydrology, there is still a dire need to further propagate these concepts to practical problems and to society in general. Following a recent line of work, we use fault trees to address the task of probabilistic risk analysis and to support related decision and management problems. Fault trees allow us to decompose the assessment of health risk into individual manageable modules, thus tackling a complex system by a structural divide and conquer approach. The complexity within each module can be chosen individually according to data availability, parsimony, relative importance, and stage of analysis. Three differences are highlighted in this paper when compared to previous works: (1) The fault tree proposed here accounts for the uncertainty in both hydrological and health components, (2) system failure within the fault tree is defined in terms of risk being above a threshold value, whereas previous studies that used fault trees used auxiliary events such as exceedance of critical concentration levels, and (3) we introduce a new form of stochastic fault tree that allows us to weaken the assumption of independent subsystems that is required by a classical fault tree approach. We illustrate our concept in a simple groundwater-related setting.
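For context, the classical fault tree calculation that the abstract above sets out to relax can be sketched as follows. The sketch assumes independent basic events, which is exactly the assumption the authors' stochastic fault tree weakens; the event names and probabilities are hypothetical.

```python
from math import prod

def and_gate(probs):
    """Probability that all independent input events occur."""
    return prod(probs)

def or_gate(probs):
    """Probability that at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probs)

if __name__ == "__main__":
    # Hypothetical module probabilities for a groundwater setting.
    p_source_release = 0.10       # contaminant is released
    p_plume_reaches_well = 0.20   # plume arrives at the supply well
    p_treatment_fails = 0.05      # treatment does not remove it
    p_direct_exposure = 0.01      # alternative exposure pathway

    # Exposure through the well requires all three events (AND);
    # system failure occurs through either pathway (OR).
    p_well_pathway = and_gate([p_source_release, p_plume_reaches_well, p_treatment_fails])
    p_failure = or_gate([p_well_pathway, p_direct_exposure])
    print(p_failure)
```

Each gate input can itself be the output of a more detailed sub-model, which is the modular divide-and-conquer structure the abstract describes.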
NASA Technical Reports Server (NTRS)
Roberts, Dar A.; Church, Richard; Ustin, Susan L.; Brass, James A. (Technical Monitor)
2001-01-01
Large urban wildfires throughout southern California have caused billions of dollars of damage and significant loss of life over the last few decades. Rapid urban growth along the wildland interface, high fuel loads and a potential increase in the frequency of large fires due to climatic change suggest that the problem will worsen in the future. Improved fire spread prediction and reduced uncertainty in assessing fire hazard would be significant, both economically and socially. Current problems in the modeling of fire spread include the role of plant community differences, spatial heterogeneity in fuels and spatio-temporal changes in fuels. In this research, we evaluated the potential of Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) and Airborne Synthetic Aperture Radar (AIRSAR) data for providing improved maps of wildfire fuel properties. Analysis concentrated in two areas of Southern California, the Santa Monica Mountains and Santa Barbara Front Range. Wildfire fuel information can be divided into four basic categories: fuel type, fuel load (live green and woody biomass), fuel moisture and fuel condition (live vs senesced fuels). To map fuel type, AVIRIS data were used to map vegetation species using Multiple Endmember Spectral Mixture Analysis (MESMA) and Binary Decision Trees. Green live biomass and canopy moisture were mapped using AVIRIS through analysis of the 980 nm liquid water absorption feature and compared to alternate measures of moisture and field measurements. Woody biomass was mapped using L and P band cross polarimetric data acquired in 1998 and 1999. Fuel condition was mapped using spectral mixture analysis to map green vegetation (green leaves), nonphotosynthetic vegetation (NPV; stems, wood and litter), shade and soil. Summaries describing the potential of hyperspectral and SAR data for fuel mapping are provided by Roberts et al. and Dennison et al. 
To utilize remotely sensed data to assess fire hazard, fuel-type maps were translated into standard fuel models accessible to the FARSITE fire spread simulator. The FARSITE model and BEHAVE are considered industry standards for fire behavior analysis. Anderson-level fuel maps, generated using a binary decision tree classifier, are available for multiple dates in the Santa Monica Mountains and at least one date for Santa Barbara. Fuel maps that will fill in the areas between the Santa Barbara and Santa Monica Mountains study sites are in progress, as part of a NASA Regional Earth Science Application Center, the Southern California Wildfire Hazard Center. Species-level maps were supplied to fire management agencies (Los Angeles County Fire, California Department of Forestry). Research results were published extensively in the refereed and non-refereed literature. Educational outreach included funding of several graduate students, undergraduate intern training and an article featured in the California Alliance for Minorities Program (CAMP) Quarterly Journal.
Automated diagnosis of coronary artery disease based on data mining and fuzzy modeling.
Tsipouras, Markos G; Exarchos, Themis P; Fotiadis, Dimitrios I; Kotsia, Anna P; Vakalis, Konstantinos V; Naka, Katerina K; Michalis, Lampros K
2008-07-01
A fuzzy rule-based decision support system (DSS) is presented for the diagnosis of coronary artery disease (CAD). The system is automatically generated from an initial annotated dataset, using a four stage methodology: 1) induction of a decision tree from the data; 2) extraction of a set of rules from the decision tree, in disjunctive normal form and formulation of a crisp model; 3) transformation of the crisp set of rules into a fuzzy model; and 4) optimization of the parameters of the fuzzy model. The dataset used for the DSS generation and evaluation consists of 199 subjects, each one characterized by 19 features, including demographic and history data, as well as laboratory examinations. Tenfold cross validation is employed, and the average sensitivity and specificity obtained is 62% and 54%, respectively, using the set of rules extracted from the decision tree (first and second stages), while the average sensitivity and specificity increase to 80% and 65%, respectively, when the fuzzification and optimization stages are used. The system offers several advantages since it is automatically generated, it provides CAD diagnosis based on easily and noninvasively acquired features, and is able to provide interpretation for the decisions made.
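Stage 2 of the methodology above (extracting a rule set in disjunctive normal form from an induced tree and evaluating it as a crisp model) can be illustrated with a minimal Python sketch. The toy tree, the feature names `age` and `cholesterol`, the thresholds, and the helper names `extract_rules` and `matches` are all hypothetical and are not taken from the paper. The sketch only shows the general idea that each root-to-leaf path becomes one conjunction and that the rule set for a class is the disjunction of those conjunctions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    feature: str
    threshold: float
    left: Union["Node", Leaf]   # branch taken when feature <= threshold
    right: Union["Node", Leaf]  # branch taken when feature > threshold


def extract_rules(tree, target, path=()):
    """Return one conjunction (list of conditions) per path ending in `target`."""
    if isinstance(tree, Leaf):
        return [list(path)] if tree.label == target else []
    left_cond = (tree.feature, "<=", tree.threshold)
    right_cond = (tree.feature, ">", tree.threshold)
    return (extract_rules(tree.left, target, path + (left_cond,))
            + extract_rules(tree.right, target, path + (right_cond,)))


def matches(rules, sample):
    """Crisp DNF evaluation: true if any conjunction is fully satisfied."""
    def holds(cond):
        feature, op, threshold = cond
        value = sample[feature]
        return value <= threshold if op == "<=" else value > threshold
    return any(all(holds(c) for c in conjunction) for conjunction in rules)


# Hypothetical toy tree, for illustration only (not the tree from the study).
toy_tree = Node("age", 55.0,
                Leaf("no_CAD"),
                Node("cholesterol", 240.0, Leaf("no_CAD"), Leaf("CAD")))

cad_rules = extract_rules(toy_tree, "CAD")
print(cad_rules)  # [[('age', '>', 55.0), ('cholesterol', '>', 240.0)]]
print(matches(cad_rules, {"age": 60, "cholesterol": 250}))  # True
```

A fuzzy version, as in stages 3 and 4, would replace the hard `<=` and `>` tests with membership functions whose parameters are then optimized; that step is not shown here.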
Re-Construction of Reference Population and Generating Weights by Decision Tree
2017-07-21
2017 Claflin University Orangeburg, SC 29115 DEFENSE EQUAL OPPORTUNITY MANAGEMENT INSTITUTE RESEARCH, DEVELOPMENT, AND STRATEGIC...Original Dataset 32 List of Figures in Appendix B Figure 1: Flow and Components of Project 20 Figure 2: Decision Tree 31 Figure 3: Effects of Weight...can compare the sample data. The dataset of this project has the reference population on unit level for group and gender, which is in red-dotted box
1983-03-01
Decision Tree -------------------- 62 4-E. PACKAGE unitrep Action/Area Selection flow Chart 82 4-7. PACKAGE unitrep Control Flow Chart...the originator would manually draft simple, readable, formatted messages using predefined forms and decision logic trees. This alternative was...Study Analysis DATA CONTENT ERRORS PERCENT OF ERRORS Character Type 2.1 Calculations/Associations 14.3 Message Identification 4.? Value Pisiratch 22.E
Zhong, Taiyang; Chen, Dongmei; Zhang, Xiuying
2016-11-09
Identification of the sources of soil mercury (Hg) on the provincial scale is helpful for enacting effective policies to prevent further contamination and take reclamation measures. The natural and anthropogenic sources of Hg in Chinese farmland soil, and their contributions, were identified based on a decision tree method. The results showed that the concentrations of Hg in parent materials were most strongly associated with the general spatial distribution pattern of Hg concentration on a provincial scale. The decision tree analysis achieved an 89.70% total accuracy in simulating the influence of human activities on the additions of Hg in farmland soil. Human activities (for example, the production of coke, application of fertilizers, discharge of wastewater, discharge of solid waste, and the production of non-ferrous metals) were the main external sources of a large amount of Hg in the farmland soil.
Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
Fraudulent financial statements are an increasingly serious problem, and establishing a valid model for forecasting them has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study uses logistic regression, support vector machine, and decision tree methods to construct classification models and compares them. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that issued fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the C5.0 decision tree has the best classification performance, at 85.71%. PMID:25302338
Identifying Risk and Protective Factors in Recidivist Juvenile Offenders: A Decision Tree Approach
Ortega-Campos, Elena; García-García, Juan; Gil-Fenoy, Maria José; Zaldívar-Basurto, Flor
2016-01-01
Research on juvenile justice aims to identify profiles of risk and protective factors in juvenile offenders. This paper presents a study of profiles of risk factors that influence young offenders toward committing sanctionable antisocial behavior (S-ASB). Decision tree analysis is used as a multivariate approach to the phenomenon of repeated sanctionable antisocial behavior in juvenile offenders in Spain. The study sample was made up of the set of juveniles who were charged in a court case in the Juvenile Court of Almeria (Spain). The period of study of recidivism was two years from the baseline. The object of study is presented, through the implementation of a decision tree. Two profiles of risk and protective factors are found. Risk factors associated with higher rates of recidivism are antisocial peers, age at baseline S-ASB, problems in school and criminality in family members. PMID:27611313
Circum-Arctic petroleum systems identified using decision-tree chemometrics
Peters, K.E.; Ramos, L.S.; Zumberge, J.E.; Valin, Z.C.; Scotese, C.R.; Gautier, D.L.
2007-01-01
Source- and age-related biomarker and isotopic data were measured for more than 1000 crude oil samples from wells and seeps collected above approximately 55°N latitude. A unique, multitiered chemometric (multivariate statistical) decision tree was created that allowed automated classification of 31 genetically distinct circum-Arctic oil families based on a training set of 622 oil samples. The method, which we call decision-tree chemometrics, uses principal components analysis and multiple tiers of K-nearest neighbor and SIMCA (soft independent modeling of class analogy) models to classify and assign confidence limits for newly acquired oil samples and source rock extracts. Geochemical data for each oil sample were also used to infer the age, lithology, organic matter input, depositional environment, and identity of its source rock. These results demonstrate the value of large petroleum databases where all samples were analyzed using the same procedures and instrumentation. Copyright © 2007. The American Association of Petroleum Geologists. All rights reserved.
Three-dimensional object recognition using similar triangles and decision trees
NASA Technical Reports Server (NTRS)
Spirkovska, Lilly
1993-01-01
A system, TRIDEC, that is capable of distinguishing between a set of objects despite changes in the objects' positions in the input field, their size, or their rotational orientation in 3D space is described. TRIDEC combines very simple yet effective features with the classification capabilities of inductive decision tree methods. The feature vector is a list of all similar triangles defined by connecting all combinations of three pixels in a coarse coded 127 x 127 pixel input field. The classification is accomplished by building a decision tree using the information provided from a limited number of translated, scaled, and rotated samples. Simulation results are presented which show that TRIDEC achieves 94 percent recognition accuracy in the 2D invariant object recognition domain and 98 percent recognition accuracy in the 3D invariant object recognition domain after training on only a small sample of transformed views of the objects.
Zhong, Taiyang; Chen, Dongmei; Zhang, Xiuying
2016-01-01
Identification of the sources of soil mercury (Hg) on the provincial scale is helpful for enacting effective policies to prevent further contamination and take reclamation measures. The natural and anthropogenic sources of Hg in Chinese farmland soil, and their contributions, were identified based on a decision tree method. The results showed that the concentrations of Hg in parent materials were most strongly associated with the general spatial distribution pattern of Hg concentration on a provincial scale. The decision tree analysis achieved an 89.70% total accuracy in simulating the influence of human activities on the additions of Hg in farmland soil. Human activities (for example, the production of coke, application of fertilizers, discharge of wastewater, discharge of solid waste, and the production of non-ferrous metals) were the main external sources of a large amount of Hg in the farmland soil. PMID:27834884
Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
Fraudulent financial statements are an increasingly serious problem, and establishing a valid model for forecasting them has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study uses logistic regression, support vector machine, and decision tree methods to construct classification models and compares them. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that issued fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the C5.0 decision tree has the best classification performance, at 85.71%.
DLRS: gene tree evolution in light of a species tree.
Sjöstrand, Joel; Sennblad, Bengt; Arvestad, Lars; Lagergren, Jens
2012-11-15
PrIME-DLRS (or colloquially: 'Delirious') is a phylogenetic software tool to simultaneously infer and reconcile a gene tree given a species tree. It accounts for duplication and loss events and a relaxed molecular clock, and is intended for the study of homologous gene families, for example in a comparative genomics setting involving multiple species. PrIME-DLRS uses a Bayesian MCMC framework, where the input is a known species tree with divergence times and a multiple sequence alignment, and the output is a posterior distribution over gene trees and model parameters. PrIME-DLRS is available for Java SE 6+ under the New BSD License, and JAR files and source code can be downloaded from http://code.google.com/p/jprime/. There is also a slightly older C++ version available as a binary package for Ubuntu, with download instructions at http://prime.sbc.su.se. The C++ source code is available upon request. Contact: joel.sjostrand@scilifelab.se or jens.lagergren@scilifelab.se. PrIME-DLRS is based on a sound probabilistic model (Åkerborg et al., 2009) and has been thoroughly validated on synthetic and biological datasets (Supplementary Material online).
iNJclust: Iterative Neighbor-Joining Tree Clustering Framework for Inferring Population Structure.
Limpiti, Tulaya; Amornbunchornvej, Chainarong; Intarapanich, Apichart; Assawamakin, Anunchai; Tongsima, Sissades
2014-01-01
Understanding genetic differences among populations is one of the most important issues in population genetics. Genetic variations, e.g., single nucleotide polymorphisms, are used to characterize commonality and difference of individuals from various populations. This paper presents an efficient graph-based clustering framework which operates iteratively on the Neighbor-Joining (NJ) tree called the iNJclust algorithm. The framework uses well-known genetic measurements, namely the allele-sharing distance, the neighbor-joining tree, and the fixation index. The behavior of the fixation index is utilized in the algorithm's stopping criterion. The algorithm provides an estimated number of populations, individual assignments, and relationships between populations as outputs. The clustering result is reported in the form of a binary tree, whose terminal nodes represent the final inferred populations and the tree structure preserves the genetic relationships among them. The clustering performance and the robustness of the proposed algorithm are tested extensively using simulated and real data sets from bovine, sheep, and human populations. The result indicates that the number of populations within each data set is reasonably estimated, the individual assignment is robust, and the structure of the inferred population tree corresponds to the intrinsic relationships among populations within the data.
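The allele-sharing distance named above is the pairwise measure from which the neighbor-joining tree is built. A minimal Python sketch is given below under a stated assumption: genotypes are coded as counts (0, 1 or 2) of one allele at each biallelic SNP, and the per-locus distance is 0, 1 or 2 according to how many alleles differ, averaged over loci. The function name `allele_sharing_distance` and the example genotypes are hypothetical, and the exact normalisation used by iNJclust may differ.

```python
def allele_sharing_distance(genotype_a, genotype_b):
    """Mean per-locus allele difference between two individuals.

    Assumes each genotype is a list of allele counts in {0, 1, 2} at the
    same biallelic loci, so |a - b| is 0 (both alleles shared),
    1 (one shared) or 2 (none shared).
    """
    if len(genotype_a) != len(genotype_b) or not genotype_a:
        raise ValueError("genotypes must be non-empty and of equal length")
    total = sum(abs(a - b) for a, b in zip(genotype_a, genotype_b))
    return total / len(genotype_a)


# Hypothetical genotypes at four SNPs, for illustration only.
individual_1 = [0, 1, 2, 2]
individual_2 = [0, 2, 0, 2]
print(allele_sharing_distance(individual_1, individual_2))  # 0.75
```

Computing this value for every pair of individuals gives the distance matrix that a neighbor-joining implementation would take as input.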
Tree value system: users guide.
J.K. Ayer Sachet; D.G. Briggs; R.D. Fight
1989-01-01
This paper instructs resource analysts on use of the Tree Value System (TREEVAL). TREEVAL is a microcomputer system of programs for calculating tree or stand values and volumes based on predicted product recovery. Designed for analyzing silvicultural decisions, the system can also be used for appraisals and for evaluating log bucking. The system calculates results...
NASA Astrophysics Data System (ADS)
Shiju, S.; Sumitra, S.
2017-12-01
In this paper, multiple kernel learning (MKL) is formulated as a supervised classification problem. We deal with binary classification data, and hence the data modelling problem involves the computation of two decision boundaries, one related to kernel learning and the other to the input data. In our approach, they are found with the aid of a single cost function by constructing a global reproducing kernel Hilbert space (RKHS) as the direct sum of the RKHSs corresponding to the decision boundaries of kernel learning and input data, and searching the global RKHS for the function that can be represented as the direct sum of the decision boundaries under consideration. In our experimental analysis, the proposed model showed superior performance in comparison with the existing two-stage function approximation formulation of MKL, where the decision functions of kernel learning and input data are found separately using two different cost functions. This is because the single-stage representation helps knowledge transfer between the computation procedures for finding the decision boundaries of kernel learning and input data, which in turn boosts the generalisation capacity of the model.
Mailloux, Allan T; Cummings, Stephen W; Mugdh, Mrinal
2010-01-01
Our objective was to use Wisconsin's Medicaid Evaluation and Decision Support (MEDS) data warehouse to develop and validate a decision support tool (DST) that (1) identifies Wisconsin Medicaid fee-for-service recipients who are abusing controlled substances, (2) effectively replicates clinical pharmacist recommendations for interventions intended to curb abuse of physician and pharmacy services, and (3) automates data extraction, profile generation and tracking of recommendations and interventions. From pharmacist manual reviews of medication profiles, seven measures of overutilization of controlled substances were developed, including (1-2) 6-month and 2-month "shopping" scores, (3-4) 6-month and 2-month forgery scores, (5) duplicate/same day prescriptions, (6) count of controlled substance claims, and the (7) shopping 6-month score for the individual therapeutic class with the highest score. The pattern analysis logic for the measures was encoded into SQL and applied to the medication profiles of 190 recipients who had already undergone manual review. The scores for each measure and numbers of providers were analyzed by exhaustive chi-squared automatic interaction detection (CHAID) to determine significant thresholds and combinations of predictors of pharmacist recommendations, resulting in a decision tree to classify recipients by pharmacist recommendations. The overall correct classification rate of the decision tree was 95.3%, with a 2.4% false positive rate and 4.0% false negative rate for lock-in versus prescriber-alert letter recommendations. Measures used by the decision tree include the 2-month and 6-month shopping scores, and the number of pharmacies and prescribers. The number of pharmacies was the best predictor of abuse of controlled substances. When a Medicaid recipient receives prescriptions for controlled substances at 8 or more pharmacies, the likelihood of a lock-in recommendation is 90%. 
The availability of the Wisconsin MEDS data warehouse has enabled development and application of a decision tree for detecting recipient fraud and abuse of controlled substance medications. Using standard pharmacy claims data, the decision tree effectively replicates pharmacist manual review recommendations. The DST has automated extraction and evaluation of pharmacy claims data for creating recommendations for guiding pharmacists in the selection of profiles for manual review. The DST is now the primary method used by the Wisconsin Medicaid program to detect fraud and abuse of physician and pharmacy services committed by recipients.
A decision support system using combined-classifier for high-speed data stream in smart grid
NASA Astrophysics Data System (ADS)
Yang, Hang; Li, Peng; He, Zhian; Guo, Xiaobin; Fong, Simon; Chen, Huajun
2016-11-01
Large volumes of high-speed streaming data are generated continuously by big power grids. In order to detect and avoid power grid failure, decision support systems (DSSs) are commonly adopted in power grid enterprises. Among all the decision-making algorithms, the incremental decision tree is the most widely used. In this paper, we propose a combined classifier that is a composite of a cache-based classifier (CBC) and a main tree classifier (MTC). We integrate this classifier into a stream processing engine on top of the DSS such that high-speed streaming data can be transformed into operational intelligence efficiently. Experimental results show that our proposed classifier can return more accurate answers than other existing ones.
ERIC Educational Resources Information Center
Stewart, Neil; Chater, Nick; Brown, Gordon D. A.
2006-01-01
We present a theory of decision by sampling (DbS) in which, in contrast with traditional models, there are no underlying psychoeconomic scales. Instead, we assume that an attribute's subjective value is constructed from a series of binary, ordinal comparisons to a sample of attribute values drawn from memory and is its rank within the sample. We…
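The core mechanism described above, in which subjective value is the rank of an attribute within a sample drawn from memory and is built from binary ordinal comparisons, can be sketched in a few lines of Python. The function name `dbs_subjective_value`, the example sample, and the choice to count only strict wins (ties count as losses) are assumptions made for illustration; they are not details given in the abstract.

```python
def dbs_subjective_value(target, memory_sample):
    """Relative rank of `target` within `memory_sample`.

    Each sample item contributes one binary, ordinal comparison
    (does the target exceed it or not). The value is the fraction
    of comparisons the target wins, so it lies between 0 and 1.
    """
    if not memory_sample:
        raise ValueError("memory_sample must not be empty")
    wins = sum(1 for item in memory_sample if target > item)
    return wins / len(memory_sample)


# Hypothetical sample of amounts recalled from memory.
remembered_amounts = [10, 20, 50, 100, 500]
print(dbs_subjective_value(60, remembered_amounts))   # 0.6
print(dbs_subjective_value(120, remembered_amounts))  # 0.8
```

Because the sample is dense at small amounts, raising the target from 60 to 120 moves its value by only one rank step, which illustrates how such a scheme yields a value function that depends on the distribution of remembered values rather than on a fixed underlying scale.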
Tang, Keshuang; Xu, Yanqing; Wang, Fen; Oguchi, Takashi
2016-10-01
The objective of this study is to empirically analyze and model the stop-go decision behavior of drivers at rural high-speed intersections in China, where a flashing green signal of 3 s followed by a yellow signal of 3 s is commonly applied to end a green phase. A total of 1,186 high-resolution vehicle trajectories were collected at four typical high-speed intersection approaches in Shanghai and used for the identification of actual stop-go decision zones and the modeling of stop-go decision behavior. Results indicate that the presence of flashing green significantly changed the theoretical decision zones based on the conventional Dilemma Zone theory. The actual stop-go decision zones at the study intersections were thus formulated and identified based on the empirical data. A Binary Logistic model and a Fuzzy Logic model were then developed to further explore the impacts of flashing green on the stop-go behavior of drivers. It was found that the Fuzzy Logic model could produce estimation results comparable to those of the traditional Binary Logistic model. The findings of this study could contribute to the development of effective dilemma zone protection strategies, the improvement of the stop-go decision models embedded in microscopic traffic simulation software and the proper design of signal change and clearance intervals at high-speed intersections in China. Copyright © 2016 Elsevier Ltd. All rights reserved.
Effect of feedback mode and task difficulty on quality of timing decisions in a zero-sum game.
Tikuisis, Peter; Vartanian, Oshin; Mandel, David R
2014-09-01
The objective was to investigate the interaction between the mode of performance outcome feedback and task difficulty on timing decisions (i.e., when to act). Feedback is widely acknowledged to affect task performance. However, the extent to which feedback display mode and its impact on timing decisions is moderated by task difficulty remains largely unknown. Participants repeatedly engaged a zero-sum game involving silent duels with a computerized opponent and were given visual performance feedback after each engagement. They were sequentially tested on three different levels of task difficulty (low, intermediate, and high) in counterbalanced order. Half received relatively simple "inside view" binary outcome feedback, and the other half received complex "outside view" hit rate probability feedback. The key dependent variables were response time (i.e., time taken to make a decision) and survival outcome. When task difficulty was low to moderate, participants were more likely to learn and perform better from hit rate probability feedback than binary outcome feedback. However, better performance with hit rate feedback exacted a higher cognitive cost manifested by higher decision response time. The beneficial effect of hit rate probability feedback on timing decisions is partially moderated by task difficulty. Performance feedback mode should be judiciously chosen in relation to task difficulty for optimal performance in tasks involving timing decisions.
An information-based network approach for protein classification
Wan, Xiaogeng; Zhao, Xin; Yau, Stephen S. T.
2017-01-01
Protein classification is one of the critical problems in bioinformatics. Early studies used geometric distances and polygenetic-tree to classify proteins. These methods use binary trees to present protein classification. In this paper, we propose a new protein classification method, whereby theories of information and networks are used to classify the multivariate relationships of proteins. In this study, protein universe is modeled as an undirected network, where proteins are classified according to their connections. Our method is unsupervised, multivariate, and alignment-free. It can be applied to the classification of both protein sequences and structures. Nine examples are used to demonstrate the efficiency of our new method. PMID:28350835
TreSpEx—Detection of Misleading Signal in Phylogenetic Reconstructions Based on Tree Information
Struck, Torsten H
2014-01-01
Phylogenies of species or genes are commonplace nowadays in many areas of comparative biological studies. However, for phylogenetic reconstructions one must refer to artificial signals such as paralogy, long-branch attraction, saturation, or conflict between different datasets. These signals might eventually mislead the reconstruction even in phylogenomic studies employing hundreds of genes. Unfortunately, there has been no program allowing the detection of such effects in combination with an implementation into automatic process pipelines. TreSpEx (Tree Space Explorer) now combines different approaches (including statistical tests), which utilize tree-based information like nodal support or patristic distances (PDs) to identify misleading signals. The program enables the parallel analysis of hundreds of trees and/or predefined gene partitions, and being command-line driven, it can be integrated into automatic process pipelines. TreSpEx is implemented in Perl and supported on Linux, Mac OS X, and MS Windows. Source code, binaries, and additional material are freely available at http://www.annelida.de/research/bioinformatics/software.html. PMID:24701118
2012-03-01
with each SVM discriminating between a pair of the N total speakers in the data set. The (N(N + 1))/2 classifiers then vote on the final...classification of a test sample. The Random Forest classifier is an ensemble classifier that votes amongst decision trees generated with each node using...Forest vote, and the effects of overtraining will be mitigated by the fact that each decision tree is overtrained differently (due to the random
Using Decision Trees for Estimating Mode Choice of Trips in Buca-Izmir
NASA Astrophysics Data System (ADS)
Oral, L. O.; Tecim, V.
2013-05-01
Decision makers develop transportation plans and models for providing sustainable transport systems in urban areas. Mode Choice is one of the stages in transportation modelling. Data mining techniques can discover factors affecting the mode choice. These techniques can be applied with knowledge process approach. In this study a data mining process model is applied to determine the factors affecting the mode choice with decision trees techniques by considering individual trip behaviours from household survey data collected within Izmir Transportation Master Plan. From this perspective transport mode choice problem is solved on a case in district of Buca-Izmir, Turkey with CRISP-DM knowledge process model.
NASA Astrophysics Data System (ADS)
Elleuch, Hanene; Wali, Ali; Samet, Anis; Alimi, Adel M.
2017-03-01
Two recognition systems, one for eye movements and one for hand gestures, are used to control mobile devices. Based on real-time video streamed from the device's camera, the first system recognizes the motion of the user's eyes and the second detects static hand gestures. To avoid any confusion between natural and intentional movements, we developed a system to fuse the decisions coming from the eye and hand gesture recognition systems. The fusion phase was based on a decision tree approach. We conducted a study on 5 volunteers, and the results show that our system is robust and competitive.
A biclustering algorithm for extracting bit-patterns from binary datasets.
Rodriguez-Baena, Domingo S; Perez-Pulido, Antonio J; Aguilar-Ruiz, Jesus S
2011-10-01
Binary datasets represent a compact and simple way to store data about the relationships between a group of objects and their possible properties. In the last few years, different biclustering algorithms have been specially developed to be applied to binary datasets. Several approaches based on matrix factorization, suffix trees or divide-and-conquer techniques have been proposed to extract useful biclusters from binary data, and these approaches provide information about the distribution of patterns and intrinsic correlations. A novel approach to extracting biclusters from binary datasets, BiBit, is introduced here. The results obtained from different experiments with synthetic data reveal the excellent performance and the robustness of BiBit to density and size of input data. Also, BiBit is applied to a central nervous system embryonic tumor gene expression dataset to test the quality of the results. A novel gene expression preprocessing methodology, based on expression level layers, and the selective search performed by BiBit, based on a very fast bit-pattern processing technique, provide very satisfactory results in quality and computational cost. The power of biclustering in finding genes involved simultaneously in different cancer processes is also shown. Finally, a comparison with Bimax, one of the most cited binary biclustering algorithms, shows that BiBit is faster while providing essentially the same results. The source and binary codes, the datasets used in the experiments and the results can be found at: http://www.upo.es/eps/bigs/BiBit.html. Contact: dsrodbae@upo.es. Supplementary data are available at Bioinformatics online.
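The idea of bit-pattern processing on binary data can be illustrated with a simplified Python sketch. It is not the authors' BiBit implementation: the function names, the `min_rows` and `min_cols` parameters and the toy matrix are assumptions made for illustration. The sketch packs each row into an integer, takes the bitwise AND of each pair of rows as a candidate column pattern, and then collects every row that contains that pattern.

```python
from itertools import combinations


def encode_row(bits):
    """Pack a list of 0/1 values into one integer (first column = highest bit)."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def bit_pattern_biclusters(matrix, min_rows=2, min_cols=2):
    """Return (row_indices, column_indices) pairs sharing an all-ones pattern."""
    n_cols = len(matrix[0])
    rows = [encode_row(r) for r in matrix]
    seen = set()
    biclusters = []
    for i, j in combinations(range(len(rows)), 2):
        pattern = rows[i] & rows[j]  # columns where both rows hold a 1
        if pattern in seen or bin(pattern).count("1") < min_cols:
            continue
        seen.add(pattern)
        # A row belongs to the bicluster if it contains every bit of the pattern.
        members = [k for k, r in enumerate(rows) if r & pattern == pattern]
        if len(members) >= min_rows:
            cols = [c for c in range(n_cols) if (pattern >> (n_cols - 1 - c)) & 1]
            biclusters.append((members, cols))
    return biclusters


# Hypothetical 4 x 4 binary matrix, for illustration only.
toy_matrix = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]
found = bit_pattern_biclusters(toy_matrix)
print(found)
# [([0, 1, 3], [0, 1]), ([0, 2, 3], [1, 3]), ([0, 3], [0, 1, 3])]
```

Working on packed integers means that the containment test for a whole row is a single AND and comparison, which is the kind of operation that makes bit-pattern approaches fast on binary data.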
NASA Astrophysics Data System (ADS)
Herguido, Estela; Pulido, Manuel; Francisco Lavado Contador, Joaquín; Schnabel, Susanne
2017-04-01
In Iberian dehesas and montados, the lack of tree recruitment compromises their long-term sustainability. However, in marginal areas of dehesas, shrub encroachment facilitates tree recruitment while altering the distinctive physiognomic and cultural characteristics of the system. These are ongoing processes that should be considered when designing afforestation measures and policies. Based on spatial variables, we modeled the proneness of a piece of land to undergo tree recruitment, and the results were related to the afforestation measures carried out under the EU First Afforestation Agricultural Land Program between 1992 and 2008. We analyzed the temporal tree population dynamics in 800 randomly selected plots of 100 m radius (2,510 ha in total) in dehesas and treeless pasturelands of Extremadura (hereafter rangelands). Tree changes were revealed by comparing aerial images taken in 1956 with orthophotographs and infrared images from 2012. Spatial models that predict the areas prone either to lack or to undergo tree recruitment were developed based on three data mining algorithms: MARS (Multivariate Adaptive Regression Splines), Random Forest (RF) and Stochastic Gradient Boosting (Tree-Net, TN). Recruited-tree locations (1) vs. locations with no recruitment (0) (randomly selected from the study areas) were used as the binary dependent variable. Five percent of the data were used as a test data set. As candidate explanatory variables we used 51 different topographic, climatic, bioclimatic, land cover-related and edaphic variables. The statistical models developed were extrapolated to the spatial context of the afforested areas in the region and also to the whole of the Extremaduran rangelands, and the percentage of area modelled as prone to tree recruitment was calculated for each case. A total of 46,674.63 ha were afforested with holm oak (Quercus ilex) or cork oak (Quercus suber) in the studied rangelands under the EU First Afforestation Agricultural Land Program.
In the sampled plots, 16,747 trees were detected as recruited, while 47,058 were present on both dates and 12,803 were lost during the studied period. Based on the Area Under the ROC Curve (AUC), all the data mining models considered showed a good fit (MARS AUC = 0.86; TN AUC = 0.92; RF AUC = 0.95) and low misclassification rates. Correctly predicted test samples for absence and presence of tree recruitment amounted respectively to 78.3% and 76.8% using MARS, 90.8% and 90.8% using TN, and 88.9% and 89.1% using RF. The spatial patterns of the different models were similar. However, considering only the percentage of area prone to tree recruitment, marked differences were observed among models for the total surface of rangelands (36.03% in MARS, 22.88% in TN and 6.72% in RF). Despite these differences, when comparing the results with those of the afforested surfaces (31.73% in MARS, 20.70% in TN and 5.63% in RF), the three algorithms pointed to similar conclusions, i.e. the afforestations performed in rangelands of Extremadura under the EU First Afforestation Agricultural Land Program barely discriminated between areas with and without natural regeneration. In conclusion, data mining techniques are suitable for developing high-performance spatial models of vegetation dynamics. These models could be useful for policy and decision makers aiming to assess the implementation of afforestation measures and to select more adequate locations.
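The models above are compared by the Area Under the ROC Curve for a binary response (recruitment = 1, no recruitment = 0). As a minimal illustration of what that statistic measures, the Python sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auc_score` and the labels and scores are hypothetical; they are not data from the study.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 1 for presence and 0 for absence; `scores` holds the
    model's predicted proneness for the same locations.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical test locations, for illustration only.
example_labels = [1, 1, 0, 0, 1, 0]
example_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(round(auc_score(example_labels, example_scores), 3))  # 0.889
```

In this toy case eight of the nine positive-negative pairs are ranked correctly, giving 8/9. A value of 0.5 would correspond to ranking no better than chance, and 1.0 to perfect separation.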
Including public-health benefits of trees in urban-forestry decision making
Geoffrey H. Donovan
2017-01-01
Research demonstrating the biophysical benefits of urban trees is often used to justify investments in urban forestry. Far less emphasis, however, is placed on the non-biophysical benefits such as improvements in public health. Indeed, the public-health benefits of trees may be significantly larger than the biophysical benefits, and, therefore, failure to account for...
Goal Programming: A New Tool for the Christmas Tree Industry
Bruce G. Hansen
1977-01-01
Goal programing (GP) can be useful for decision making in the natural Christmas tree industry. Its usefulness is demonstrated through an analysis of a hypothetical problem in which two potential growers decide how to use 10 acres in growing Christmas trees. Though the physical settings are identical, distinct differences between their goals significantly influence the...
An approach to the language discrimination in different scripts using adjacent local binary pattern
NASA Astrophysics Data System (ADS)
Brodić, D.; Amelio, A.; Milivojević, Z. N.
2017-09-01
The paper proposes a method for discriminating the language of documents. First, each letter is encoded with a certain script type according to its status in the baseline area. The resulting cipher text is subjected to a feature extraction process, in which the local binary pattern and its expanded version, called the adjacent local binary pattern, are extracted. Because of the differences in language characteristics, this analysis shows significant diversity. This diversity is a key aspect in the decision-making differentiation of the languages. The proposed method is tested on an example set of documents. The experiments give encouraging results.
NASA Astrophysics Data System (ADS)
Ray, P. A.; Wi, S.; Bonzanigo, L.; Taner, M. U.; Rodriguez, D.; Garcia, L.; Brown, C.
2016-12-01
The Decision Tree for Confronting Climate Change Uncertainty is a hierarchical, staged framework for accomplishing climate change risk management in water resources system investments. Since its development for the World Bank Water Group two years ago, the framework has been applied to pilot demonstration projects in Nepal (hydropower generation), Mexico (water supply), Kenya (multipurpose reservoir operation), and Indonesia (flood risks to dam infrastructure). An important finding of the Decision Tree demonstration projects has been the need to present the risks/opportunities of climate change to stakeholders and investors in proportion to risks/opportunities and hazards of other kinds. This presentation will provide an overview of tools and techniques used to quantify risks/opportunities to each of the project types listed above, with special attention to those found most useful for exploration of the risk space. Careful exploration of the risk/opportunity space shows that some interventions would be better taken now, whereas risks/opportunities of other types would be better instituted incrementally in order to maintain reversibility and flexibility. A number of factors contribute to the robustness/flexibility tradeoff: available capital, magnitude and imminence of potential risk/opportunity, modular (or not) character of investment, and risk aversion of the decision maker, among others. Finally, in each case, nuance was required in the translation of Decision Tree findings into actionable policy recommendations. Though the narrative of stakeholder solicitation, engagement, and ultimate partnership is unique to each case, summary lessons are available from the portfolio that can serve as a guideline to the community of climate change risk managers.
Pinzón-Sánchez, C; Cabrera, V E; Ruegg, P L
2011-04-01
The objective of this study was to develop a decision tree to evaluate the economic impact of different durations of intramammary treatment for the first case of mild or moderate clinical mastitis (CM) occurring in early lactation with various scenarios of pathogen distributions and use of on-farm culture. The tree included 2 decision and 3 probability events. The first decision evaluated use of on-farm culture (OFC; 2 programs using OFC and 1 not using OFC) and the second decision evaluated treatment strategies (no intramammary antimicrobials or antimicrobials administered for 2, 5, or 8 d). The tree included probabilities for the distribution of etiologies (gram-positive, gram-negative, or no growth), bacteriological cure, and recurrence. The economic consequences of mastitis included costs of diagnosis and initial treatment, additional treatments, labor, discarded milk, milk production losses due to clinical and subclinical mastitis, culling, and transmission of infection to other cows (only for CM caused by Staphylococcus aureus). Pathogen-specific estimates for bacteriological cure and milk losses were used. The economically optimal path for several scenarios was determined by comparison of expected monetary values. For most scenarios, the optimal economic strategy was to treat CM caused by gram-positive pathogens for 2 d and to avoid antimicrobials for CM cases caused by gram-negative pathogens or when no pathogen was recovered. Use of extended intramammary antimicrobial therapy (5 or 8 d) resulted in the least expected monetary values. Copyright © 2011 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
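The comparison of expected monetary values described above can be illustrated with a minimal sketch. It assumes each strategy leads to a single chance node with (probability, payoff) branches; the strategy names and all numbers are invented for illustration and are not taken from the study.

```python
from typing import Dict, List, Tuple

# A chance node is a list of (probability, monetary payoff) branches.
ChanceNode = List[Tuple[float, float]]


def expected_value(branches: ChanceNode) -> float:
    """Expected monetary value (EMV) of one chance node."""
    total_p = sum(p for p, _ in branches)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    return sum(p * v for p, v in branches)


def best_option(options: Dict[str, ChanceNode]) -> Tuple[str, float]:
    """Pick the decision branch with the highest EMV."""
    emv = {name: expected_value(b) for name, b in options.items()}
    best = max(emv, key=emv.get)
    return best, emv[best]


# Hypothetical strategies; payoffs are negative because they are costs.
options = {
    "strategy_a": [(0.5, -100.0), (0.5, -300.0)],
    "strategy_b": [(0.8, -120.0), (0.2, -320.0)],
    "strategy_c": [(0.9, -250.0), (0.1, -450.0)],
}

name, value = best_option(options)
print(name, value)
```

A full tree with several decision and probability events would apply the same two steps recursively from the leaves back to the root.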
Scholz, Miklas; Uzomah, Vincent C
2013-08-01
The retrofitting of sustainable drainage systems (SuDS) such as permeable pavements is currently undertaken ad hoc using expert experience supported by minimal guidance based predominantly on hard engineering variables. There is a lack of practical decision support tools useful for a rapid assessment of the potential of ecosystem services when retrofitting permeable pavements in urban areas that either feature existing trees or should be planted with trees in the near future. Thus the aim of this paper is to develop an innovative rapid decision support tool based on novel ecosystem service variables for retrofitting of permeable pavement systems close to trees. This unique tool proposes the retrofitting of permeable pavements that obtained the highest ecosystem service score for a specific urban site enhanced by the presence of trees. This approach is based on a novel ecosystem service philosophy adapted to permeable pavements rather than on traditional engineering judgement associated with variables based on quick community and environment assessments. For an example case study area such as Greater Manchester, which was dominated by Sycamore and Common Lime, a comparison with the traditional approach of determining community and environment variables indicates that permeable pavements are generally a preferred SuDS option. Permeable pavements combined with urban trees received relatively high scores, because of their great potential impact in terms of water and air quality improvement, and flood control, respectively. The outcomes of this paper are likely to lead to more combined permeable pavement and tree systems in the urban landscape, which are beneficial for humans and the environment. Copyright © 2013 Elsevier B.V. All rights reserved.
Hoggart, Lesley
2018-05-21
This paper scrutinises the concepts of moral reasoning and personal reasoning, problematising the binary model by looking at young women's pregnancy decision-making. Data from two UK empirical studies are subjected to theoretically driven qualitative secondary analysis, and illustrative cases show how complex decision-making is characterised by an intertwining of the personal and the moral, and is thus best understood by drawing on moral relativism.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elter, M.; Schulz-Wendtland, R.; Wittenberg, T.
2007-11-15
Mammography is the most effective method for breast cancer screening available today. However, the low positive predictive value of breast biopsy resulting from mammogram interpretation leads to approximately 70% unnecessary biopsies with benign outcomes. To reduce the high number of unnecessary breast biopsies, several computer-aided diagnosis (CAD) systems have been proposed in the last several years. These systems help physicians in their decision to perform a breast biopsy on a suspicious lesion seen in a mammogram or to perform a short term follow-up examination instead. We present two novel CAD approaches that both emphasize an intelligible decision process to predict breast biopsy outcomes from BI-RADS findings. An intelligible reasoning process is an important requirement for the acceptance of CAD systems by physicians. The first approach induces a global model based on decision-tree learning. The second approach is based on case-based reasoning and applies an entropic similarity measure. We have evaluated the performance of both CAD approaches on two large publicly available mammography reference databases using receiver operating characteristic (ROC) analysis, bootstrap sampling, and the ANOVA statistical significance test. Both approaches outperform the diagnosis decisions of the physicians. Hence, both systems have the potential to reduce the number of unnecessary breast biopsies in clinical practice. A comparison of the performance of the proposed decision tree and CBR approaches with a state of the art approach based on artificial neural networks (ANN) shows that the CBR approach performs slightly better than the ANN approach, which in turn results in slightly better performance than the decision-tree approach. The differences are statistically significant (p value <0.001).
On 2100 masses extracted from the DDSM database, the CBR approach for example resulted in an area under the ROC curve of A(z)=0.89±0.01, the decision-tree approach in A(z)=0.87±0.01, and the ANN approach in A(z)=0.88±0.01.
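The A(z) figures above are areas under the ROC curve. As a minimal sketch, and not the authors' evaluation code, the area can be computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half. The scores below are invented.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier outputs for malignant and benign cases.
malignant = [0.9, 0.8, 0.4]
benign = [0.5, 0.3, 0.2]
print(roc_auc(malignant, benign))
```

The pairwise loop is quadratic in the number of cases; a rank-based formulation would be preferable for large test sets.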
Automated Classification of ROSAT Sources Using Heterogeneous Multiwavelength Source Catalogs
NASA Technical Reports Server (NTRS)
McGlynn, Thomas; Suchkov, A. A.; Winter, E. L.; Hanisch, R. J.; White, R. L.; Ochsenbein, F.; Derriere, S.; Voges, W.; Corcoran, M. F.
2004-01-01
We describe an on-line system for automated classification of X-ray sources, ClassX, and present preliminary results of classification of the three major catalogs of ROSAT sources, RASS BSC, RASS FSC, and WGACAT, into six class categories: stars, white dwarfs, X-ray binaries, galaxies, AGNs, and clusters of galaxies. ClassX is based on a machine learning technology. It represents a system of classifiers, each classifier consisting of a considerable number of oblique decision trees. These trees are built as the classifier is 'trained' to recognize various classes of objects using a training sample of sources of known object types. Each source is characterized by a preselected set of parameters, or attributes; the same set is then used as the classifier conducts classification of sources of unknown identity. The ClassX pipeline features an automatic search for X-ray source counterparts among heterogeneous data sets in on-line data archives using Virtual Observatory protocols; it retrieves from those archives all the attributes required by the selected classifier and inputs them to the classifier. The user input to ClassX is typically a file with target coordinates, optionally complemented with target IDs. The output contains the class name, attributes, and class probabilities for all classified targets. We discuss ways to characterize and assess the classifier quality and performance and present the respective validation procedures. Based on both internal and external validation, we conclude that the ClassX classifiers yield reasonable and reliable classifications for ROSAT sources and have the potential to broaden class representation significantly for rare object types.
NASA Astrophysics Data System (ADS)
ShiouWei, L.
2014-12-01
Reservoirs are the most important water resources facilities in Taiwan. However, due to the steep slopes and fragile geological conditions in the mountain areas, storm events usually cause serious debris flows and floods, and the floods then flush large amounts of sediment into reservoirs. The sedimentation caused by floods has a great impact on reservoir life. Hence, how to operate a reservoir during flood events to increase the efficiency of sediment desilting without risking reservoir safety or impairing the subsequent water supply is a crucial issue in Taiwan. Therefore, this study developed a novel optimization planning model for reservoir flood operation considering flood control and sediment desilting, and proposed easy-to-use operating rules represented by decision trees. The decision tree rules consider flood mitigation, water supply and sediment desilting. The optimal planning model computes the optimal reservoir release for each flood event, minimizing the water supply impact and maximizing sediment desilting without risking reservoir safety. Besides the optimal flood operation planning model, this study also proposed decision-tree-based flood operating rules that were trained on the optimal reservoir releases for multiple synthetic flood scenarios. The synthetic flood scenarios consist of various synthetic storm events, the reservoir's initial storage and the target storages at the end of flood operation. Comparing the results of the decision tree operation rules (DTOR) with those of the historical operation for Krosa Typhoon in 2007, the DTOR removed 15.4% more sediment than the historical operation, with reservoir storage only 8.38×10^6 m^3 less than that of the historical operation. For Jangmi Typhoon in 2008, the DTOR removed 24.4% more sediment than the historical operation, with reservoir storage only 7.58×10^6 m^3 less than that of the historical operation. The results show that the proposed DTOR model can increase the sediment desilting efficiency and extend the reservoir life.
Porpiglia, Ermelinda; Hidalgo, Daniel; Koulnis, Miroslav; Tzafriri, Abraham R.; Socolovsky, Merav
2012-01-01
Erythropoietin (Epo)-induced Stat5 phosphorylation (p-Stat5) is essential for both basal erythropoiesis and for its acceleration during hypoxic stress. A key challenge lies in understanding how Stat5 signaling elicits distinct functions during basal and stress erythropoiesis. Here we asked whether these distinct functions might be specified by the dynamic behavior of the Stat5 signal. We used flow cytometry to analyze Stat5 phosphorylation dynamics in primary erythropoietic tissue in vivo and in vitro, identifying two signaling modalities. In later (basophilic) erythroblasts, Epo stimulation triggers a low intensity but decisive, binary (digital) p-Stat5 signal. In early erythroblasts the binary signal is superseded by a high-intensity graded (analog) p-Stat5 response. We elucidated the biological functions of binary and graded Stat5 signaling using the EpoR-HM mice, which express a “knocked-in” EpoR mutant lacking cytoplasmic phosphotyrosines. Strikingly, EpoR-HM mice are restricted to the binary signaling mode, which rescues these mice from fatal perinatal anemia by promoting binary survival decisions in erythroblasts. However, the absence of the graded p-Stat5 response in the EpoR-HM mice prevents them from accelerating red cell production in response to stress, including a failure to upregulate the transferrin receptor, which we show is a novel stress target. We found that Stat5 protein levels decline with erythroblast differentiation, governing the transition from high-intensity graded signaling in early erythroblasts to low-intensity binary signaling in later erythroblasts. Thus, using exogenous Stat5, we converted later erythroblasts into high-intensity graded signal transducers capable of eliciting a downstream stress response. Unlike the Stat5 protein, EpoR expression in erythroblasts does not limit the Stat5 signaling response, a non-Michaelian paradigm with therapeutic implications in myeloproliferative disease. 
Our findings show how the binary and graded modalities combine to generate high-fidelity Stat5 signaling over the entire basal and stress Epo range. They suggest that dynamic behavior may encode information during STAT signal transduction. PMID:22969412
Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson
2010-08-01
Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Type 2 Diabetes Mellitus Screening and Risk Factors Using Decision Tree: Results of Data Mining.
Habibi, Shafi; Ahmadi, Maryam; Alizadeh, Somayeh
2015-03-18
The aim of this study was to examine a predictive model using features related to the diabetes type 2 risk factors. The data were obtained from a database in a diabetes control system in Tabriz, Iran. The data included all people referred for diabetes screening between 2009 and 2011. The features considered as "Inputs" were: age, sex, systolic and diastolic blood pressure, family history of diabetes, and body mass index (BMI). Moreover, we used diagnosis as "Class". We applied the "Decision Tree" technique and "J48" algorithm in the WEKA (3.6.10 version) software to develop the model. After data preprocessing and preparation, we used 22,398 records for data mining. The model precision to identify patients was 0.717. The age factor was placed in the root node of the tree as a result of higher information gain. The ROC curve indicates the model function in identification of patients and those individuals who are healthy. The curve indicates high capability of the model, especially in identification of the healthy persons. We developed a model using the decision tree for screening T2DM which did not require laboratory tests for T2DM diagnosis.
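The abstract above notes that age was placed at the root because of its higher information gain. A minimal sketch of that criterion follows, assuming categorical attribute values. The toy records are invented, and C4.5-style learners such as J48 normally use a normalized variant (gain ratio), which this sketch does not implement.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Entropy reduction obtained by splitting the labels on one attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical records: the attribute with the largest gain becomes the root.
diagnosis = ["healthy", "healthy", "diabetic", "diabetic"]
attributes = {
    "age_group": ["young", "young", "old", "old"],
    "sex": ["f", "m", "f", "m"],
}
gains = {name: information_gain(vals, diagnosis) for name, vals in attributes.items()}
root = max(gains, key=gains.get)
print(gains, root)
```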
Erdoğan, Onur; Aydin Son, Yeşim
2014-01-01
Single Nucleotide Polymorphisms (SNPs) are the most common genomic variations where only a single nucleotide differs between individuals. Individual SNPs and SNP profiles associated with diseases can be utilized as biological markers. But there is a need to determine the SNP subsets and patients' clinical data which is informative for the diagnosis. Data mining approaches have the highest potential for extracting the knowledge from genomic datasets and selecting the representative SNPs as well as most effective and informative clinical features for the clinical diagnosis of the diseases. In this study, we have applied one of the widely used data mining classification methodology: "decision tree" for associating the SNP biomarkers and significant clinical data with the Alzheimer's disease (AD), which is the most common form of "dementia". Different tree construction parameters have been compared for the optimization, and the most accurate tree for predicting the AD is presented.
Pricing and reimbursement frameworks in Central Eastern Europe: a decision tool to support choices.
Kolasa, Katarzyna; Kalo, Zoltan; Hornby, Edward
2015-02-01
Given limited financial resources in the Central Eastern European (CEE) region, challenges in obtaining access to innovative medical technologies are formidable. The objective of this research was to develop a decision tree that supports decision makers and drug manufacturers from CEE region in their search for optimal innovative pricing and reimbursement scheme (IPRSs). A systematic literature review was performed to search for published IPRSs, and then ten experts from the CEE region were interviewed to ascertain their opinions on these schemes. In total, 33 articles representing 46 unique IPRSs were analyzed. Based on our literature review and subsequent expert input, key decision nodes and branches of the decision tree were developed. The results indicate that outcome-based schemes are better suited to deal with uncertainties surrounding cost effectiveness, while non-outcome-based schemes are more appropriate for pricing and budget impact challenges.
NASA Astrophysics Data System (ADS)
Coupon, Jean; Leauthaud, Alexie; Kilbinger, Martin; Medezinski, Elinor
2017-07-01
SWOT (Super W Of Theta) computes two-point statistics for very large data sets, based on “divide and conquer” algorithms, mainly, but not limited to, data storage in binary trees, approximation at large scale, parallelization (open MPI), and bootstrap and jackknife resampling methods “on the fly”. It currently supports projected and 3D galaxy auto and cross correlations, galaxy-galaxy lensing, and weighted histograms.
Jiulong Xie; Chung-Yun Hse; Todd F. Shupe; Tingxing Hu
2015-01-01
Lignocellulosic biomass (Moso Bamboo, Chinese tallow tree wood, switchgrass, and pine wood) was subjected to a novel delignification process using microwave energy in a binary glycerol/methanol solvent. The physicochemical properties of the recovered lignin were analyzed prior to its application in the fabrication of polylactic acid (PLA)-lignin composites. The results...
Extracting the information of coastline shape and its multiple representations
NASA Astrophysics Data System (ADS)
Liu, Ying; Li, Shujun; Tian, Zhen; Chen, Huirong
2007-06-01
Based on a study of coastlines, this paper puts forward a new method of multiple representation. The method simulates the way humans think when they generalize: it builds an appropriate mathematical model, describes the coastline graphically, and extracts the various kinds of coastline shape information. Automatic coastline generalization is then carried out on the basis of knowledge rules and arithmetic operators. Representing the coastline shape information by building a Douglas binary tree of the curve reveals the shape character of the coastline both microscopically and macroscopically. The extracted coastline information includes the local characteristic points and their orientation, the curve structure, and the topological traits. The curve structure can be divided into single curves and curve clusters. By establishing the knowledge rules of coastline generalization, the generalization scale and its shape parameters, the automatic coastline generalization model is finally established. The multi-scale coastline representation method in this paper has several strengths: it follows the human mode of thinking and preserves the natural character of the original curve. The binary tree structure can control the coastline comparability, avoid self-intersection, and maintain consistent topological relationships.
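One plausible reading of the binary tree described above is the recursive split used by the Douglas-Peucker line-simplification algorithm; this is an assumption, since the abstract does not give the construction. In the sketch below each node stores the point farthest from the chord of its segment, and pruning the tree at a distance tolerance yields a coarser or finer version of the same curve. The class, function names, and sample points are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    start: int                      # index of the first point of the segment
    end: int                        # index of the last point of the segment
    split: Optional[int] = None     # index of the farthest interior point
    dist: float = 0.0               # its distance from the chord start-end
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def dist_to_chord(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / length


def build_tree(pts: List[Point], start: int, end: int) -> Node:
    """Recursively split the curve at the point farthest from the chord."""
    node = Node(start, end)
    if end - start < 2:             # no interior points: leaf
        return node
    k = max(range(start + 1, end),
            key=lambda i: dist_to_chord(pts[i], pts[start], pts[end]))
    node.split = k
    node.dist = dist_to_chord(pts[k], pts[start], pts[end])
    node.left = build_tree(pts, start, k)
    node.right = build_tree(pts, k, end)
    return node


def generalize(node: Node, tol: float) -> List[int]:
    """Indices of the points kept when detail below tol is dropped."""
    if node.split is None or node.dist <= tol:
        return [node.start, node.end]
    left = generalize(node.left, tol)
    right = generalize(node.right, tol)
    return left + right[1:]         # the split point appears in both halves


# Hypothetical coastline fragment.
coast = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (3.0, 2.0), (4.0, 0.0)]
tree = build_tree(coast, 0, len(coast) - 1)
print(generalize(tree, 0.5), generalize(tree, 1.5))
```

Because the tree is built once and pruned per tolerance, several scales of the same curve can be derived without recomputing distances.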
Orlando, Lori A.; Buchanan, Adam H.; Hahn, Susan E.; Christianson, Carol A.; Powell, Karen P.; Skinner, Celette Sugg; Chesnut, Blair; Blach, Colette; Due, Barbara; Ginsburg, Geoffrey S.; Henrich, Vincent C.
2016-01-01
INTRODUCTION Family health history is a strong predictor of disease risk. To reduce the morbidity and mortality of many chronic diseases, risk-stratified evidence-based guidelines strongly encourage the collection and synthesis of family health history to guide selection of primary prevention strategies. However, the collection and synthesis of such information is not well integrated into clinical practice. To address barriers to collection and use of family health histories, the Genomedical Connection developed and validated MeTree, a Web-based, patient-facing family health history collection and clinical decision support tool. MeTree is designed for integration into primary care practices as part of the genomic medicine model for primary care. METHODS We describe the guiding principles, operational characteristics, algorithm development, and coding used to develop MeTree. Validation was performed through stakeholder cognitive interviewing, a genetic counseling pilot program, and clinical practice pilot programs in 2 community-based primary care clinics. RESULTS Stakeholder feedback resulted in changes to MeTree’s interface and changes to the phrasing of clinical decision support documents. The pilot studies resulted in the identification and correction of coding errors and the reformatting of clinical decision support documents. MeTree’s strengths in comparison with other tools are its seamless integration into clinical practice and its provision of action-oriented recommendations guided by providers’ needs. LIMITATIONS The tool was validated in a small cohort. CONCLUSION MeTree can be integrated into primary care practices to help providers collect and synthesize family health history information from patients with the goal of improving adherence to risk-stratified evidence-based guidelines. PMID:24044145
Multistable binary decision making on networks
NASA Astrophysics Data System (ADS)
Lucas, Andrew; Lee, Ching Hua
2013-03-01
We propose a simple model for a binary decision making process on a graph, motivated by modeling social decision making with cooperative individuals. The model is similar to a random field Ising model or fiber bundle model, but with key differences in behavior on heterogeneous networks. For many types of disorder and interactions between the nodes, we predict with mean field theory discontinuous phase transitions that are largely independent of network structure. We show how these phase transitions can also be understood by studying microscopic avalanches and describe how network structure enhances fluctuations in the distribution of avalanches. We suggest theoretically the existence of a “glassy” spectrum of equilibria associated with a typical phase, even on infinite graphs, so long as the first moment of the degree distribution is finite. This behavior implies that the model is robust against noise below a certain scale and also that phase transitions can switch from discontinuous to continuous on networks with too few edges. Numerical simulations suggest that our theory is accurate.
Collective decision dynamics in the presence of external drivers
NASA Astrophysics Data System (ADS)
Bassett, Danielle S.; Alderson, David L.; Carlson, Jean M.
2012-09-01
We develop a sequence of models describing information transmission and decision dynamics for a network of individual agents subject to multiple sources of influence. Our general framework is set in the context of an impending natural disaster, where individuals, represented by nodes on the network, must decide whether or not to evacuate. Sources of influence include a one-to-many externally driven global broadcast as well as pairwise interactions, across links in the network, in which agents transmit either continuous opinions or binary actions. We consider both uniform and variable threshold rules on the individual opinion as baseline models for decision making. Our results indicate that (1) social networks lead to clustering and cohesive action among individuals, (2) binary information introduces high temporal variability and stagnation, and (3) information transmission over the network can either facilitate or hinder action adoption, depending on the influence of the global broadcast relative to the social network. Our framework highlights the essential role of local interactions between agents in predicting collective behavior of the population as a whole.
Kamphuis, C; Mollenhorst, H; Heesterbeek, J A P; Hogeveen, H
2010-08-01
The objective was to develop and validate a clinical mastitis (CM) detection model by means of decision-tree induction. For farmers milking with an automatic milking system (AMS), it is desirable that the detection model has a high level of sensitivity (Se), especially for more severe cases of CM, at a very high specificity (Sp). In addition, an alert for CM should be generated preferably at the quarter milking (QM) at which the CM infection is visible for the first time. Data were collected from 9 Dutch dairy herds milking automatically during a 2.5-yr period. Data included sensor data (electrical conductivity, color, and yield) at the QM level and visual observations of quarters with CM recorded by the farmers. Visual observations of quarters with CM were combined with sensor data of the most recent automatic milking recorded for that same quarter, within a 24-h time window before the visual assessment time. Sensor data of 3.5 million QM were collected, of which 348 QM were combined with a CM observation. Data were divided into a training set, including two-thirds of all data, and a test set. Cows in the training set were not included in the test set and vice versa. A decision-tree model was trained using only clear examples of healthy (n=24,717) or diseased (n=243) QM. The model was tested on 105 QM with CM and a random sample of 50,000 QM without CM. While keeping the Se at a level comparable to that of models currently used by AMS, the decision-tree model was able to decrease the number of false-positive alerts by more than 50%. At an Sp of 99%, 40% of the CM cases were detected. Sixty-four percent of the severe CM cases were detected and only 12.5% of the CM that were scored as watery milk. The Se increased considerably from 40% to 66.7% when the time window increased from less than 24h before the CM observation, to a time window from 24h before to 24h after the CM observation. Even at very wide time windows, however, it was impossible to reach an Se of 100%. 
This indicates the inability to detect all CM cases based on sensor data alone. Sensitivity levels varied largely when the decision tree was validated per herd. This trend was confirmed when decision trees were trained using data from 8 herds and tested on data from the ninth herd. This indicates that when using the decision tree as a generic CM detection model in practice, some herds will continue having difficulties in detecting CM using mastitis alert lists, whereas others will perform well. Copyright (c) 2010 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
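The sensitivity and specificity figures in this abstract follow from a confusion matrix. As a rough sketch, the counts below are back-calculated from the reported test set (105 quarter milkings with CM, 50,000 without) and the rounded rates of 40% and 99%; they are implied approximations, not counts given in the paper.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of true cases that raised an alert."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of healthy observations that raised no alert."""
    return tn / (tn + fp)


# Counts implied by the rounded percentages (assumption, see lead-in).
tp, fn = 42, 63          # 42 / 105 = 40% of CM cases detected
tn, fp = 49_500, 500     # 49,500 / 50,000 = 99% specificity

print(f"Se = {sensitivity(tp, fn):.3f}, Sp = {specificity(tn, fp):.3f}")
```

Even at 99% specificity, the implied 500 false alerts outnumber the 42 detected cases in this test sample, which illustrates why the abstract stresses very high specificity.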
Sharon Hood; Duncan Lutes
2017-01-01
Accurate prediction of fire-caused tree mortality is critical for making sound land management decisions such as developing burning prescriptions and post-fire management guidelines. To improve efforts to predict post-fire tree mortality, we developed 3-year post-fire mortality models for 12 Western conifer species - white fir (Abies concolor [Gord. &...
Context-Sensitive Ethics in School Psychology
ERIC Educational Resources Information Center
Lasser, Jon; Klose, Laurie McGarry; Robillard, Rachel
2013-01-01
Ethical codes and licensing rules provide foundational guidance for practicing school psychologists, but these sources fall short in their capacity to facilitate effective decision-making. When faced with ethical dilemmas, school psychologists can turn to decision-making models, but step-wise decision trees frequently lack the situation…
The fourfold way of the genetic code.
Jiménez-Montaño, Miguel Angel
2009-11-01
We describe a compact representation of the genetic code that factorizes the table in quartets. It represents a "least grammar" for the genetic language. It is justified by the Klein-4 group structure of RNA bases and codon doublets. The matrix of the outer product between the column-vector of bases and the corresponding row-vector V(T)=(C G U A), considered as signal vectors, has a block structure consisting of the four cosets of the KxK group of base transformations acting on doublet AA. This matrix, translated into weak/strong (W/S) and purine/pyrimidine (R/Y) nucleotide classes, leads to a code table with mixed and unmixed families in separate regions. A basic difference between them is the non-commuting (R/Y) doublets: AC/CA, GU/UG. We describe the degeneracy in the canonical code and the systematic changes in deviant codes in terms of the divisors of 24, employing modulo multiplication groups. We illustrate binary sub-codes characterizing mutations in the quartets. We introduce a decision-tree to predict the mode of tRNA recognition corresponding to each codon, and compare our result with related findings by Jestin and Soulé [Jestin, J.-L., Soulé, C., 2007. Symmetries by base substitutions in the genetic code predict 2' or 3' aminoacylation of tRNAs. J. Theor. Biol. 247, 391-394], and the rearrangements of the table by Delarue [Delarue, M., 2007. An asymmetric underlying rule in the assignment of codons: possible clue to a quick early evolution of the genetic code via successive binary choices. RNA 13, 161-169] and Rodin and Rodin [Rodin, S.N., Rodin, A.S., 2008. On the origin of the genetic code: signatures of its primordial complementarity in tRNAs and aminoacyl-tRNA synthetases. Heredity 100, 341-355], respectively.
Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi
2017-11-02
New data mining algorithms continue to emerge as related disciplines develop. These algorithms differ in their scope of application and in their performance. Hence, finding a suitable algorithm for a dataset is becoming an important concern for biomedical researchers who need to solve practical problems promptly. In this paper, seven established and actively used algorithms, namely, C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, sample size of each class, correlation coefficients between variables, class entropy of the task variable, and the ratio of the sample size of the largest class to that of the smallest class were calculated to characterize the 12 research datasets. The two ensemble algorithms reach high classification accuracy on most datasets. Moreover, random forest performs better than AdaBoost on the unbalanced dataset of the multi-class task. Simple algorithms, such as naïve Bayes and logistic regression, are suitable for a small dataset with high correlation between the task variable and the other non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is better suited to the balanced small dataset of the binary-class task. No algorithm maintains the best performance on all datasets. The applicability of the seven data mining algorithms to datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.
Don't bet on it! Wagering as a measure of awareness in decision making under uncertainty.
Konstantinidis, Emmanouil; Shanks, David R
2014-12-01
Can our decisions be guided by unconscious or implicit influences? According to the somatic marker hypothesis, emotion-based signals can guide our decisions in uncertain environments outside awareness. Postdecision wagering, in which participants make wagers on the outcomes of their decisions, has been recently proposed as an objective and sensitive measure of conscious content. In 5 experiments we employed variations of a classic decision-making assessment, the Iowa Gambling Task, in combination with wagering in order to investigate the role played by unconscious influences. We examined the validity of postdecision wagering by comparing it with alternative measures of conscious knowledge, specifically confidence ratings and quantitative questions. Consistent with a putative role for unconscious influences, in Experiments 2 and 3 we observed a lag between choice accuracy and the onset of advantageous wagering. However, the lag was eliminated by a change in the wagering payoff matrix (Experiment 2) and by a switch from a binary wager response to either a binary or a 4-point confidence response (Experiment 3), and wagering underestimated awareness compared to explicit quantitative questions (Experiments 1 and 4). Our results demonstrate the insensitivity of postdecision wagering as a direct measure of conscious knowledge and challenge the claim that implicit processes influence decision making under uncertainty. PsycINFO Database Record (c) 2014 APA, all rights reserved.
Branch: an interactive, web-based tool for testing hypotheses and developing predictive models.
Gangavarapu, Karthik; Babji, Vyshakh; Meißner, Tobias; Su, Andrew I; Good, Benjamin M
2016-07-01
Branch is a web application that provides users with the ability to interact directly with large biomedical datasets. The interaction is mediated through a collaborative graphical user interface for building and evaluating decision trees. These trees can be used to compose and test sophisticated hypotheses and to develop predictive models. Decision trees are built and evaluated based on a library of imported datasets and can be stored in a collective area for sharing and re-use. Branch is hosted at http://biobranch.org/ and the open source code is available at http://bitbucket.org/sulab/biobranch/. Contact: asu@scripps.edu or bgood@scripps.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Event Classification and Identification Based on the Characteristic Ellipsoid of Phasor Measurement
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ma, Jian; Diao, Ruisheng; Makarov, Yuri V.
2011-09-23
In this paper, a method to classify and identify power system events based on the characteristic ellipsoid of phasor measurement is presented. The decision tree technique is used to perform the event classification and identification. Event types, event locations and clearance times are identified by decision trees based on the indices of the characteristic ellipsoid. A sufficiently large number of transient events were simulated on the New England 10-machine 39-bus system based on different system configurations. Transient simulations taking into account different event types, clearance times and various locations are conducted to simulate phasor measurement. Bus voltage magnitudes and recorded reactive and active power flows are used to build the characteristic ellipsoid. The volume, eccentricity, center and projection of the longest axis in the parameter space coordinates of the characteristic ellipsoids are used to classify and identify events. Results demonstrate that the characteristic ellipsoid and the decision tree are capable of detecting the event type, location, and clearance time with very high accuracy.
Online adaptive decision trees: pattern classification and function approximation.
Basak, Jayanta
2006-09-01
Recently we have shown that decision trees can be trained in the online adaptive (OADT) mode (Basak, 2004), leading to better generalization score. OADTs were bottlenecked by the fact that they are able to handle only two-class classification tasks with a given structure. In this article, we provide an architecture based on OADT, ExOADT, which can handle multiclass classification tasks and is able to perform function approximation. ExOADT is structurally similar to OADT extended with a regression layer. We also show that ExOADT is capable not only of adapting the local decision hyperplanes in the nonterminal nodes but also has the potential of smoothly changing the structure of the tree depending on the data samples. We provide the learning rules based on steepest gradient descent for the new model ExOADT. Experimentally we demonstrate the effectiveness of ExOADT in the pattern classification and function approximation tasks. Finally, we briefly discuss the relationship of ExOADT with other classification models.
A hybrid method for classifying cognitive states from fMRI data.
Parida, S; Dehuri, S; Cho, S-B; Cacha, L A; Poznanski, R R
2015-09-01
Functional magnetic resonance imaging (fMRI) makes it possible to detect brain activities in order to elucidate cognitive states. The complex nature of fMRI data requires understanding of the analyses applied in order to open possible avenues for developing models of cognitive state classification and improving brain activity prediction. While many models for the classification of fMRI data have been developed, in this paper we present a novel hybrid technique that combines the best attributes of genetic algorithms (GAs) and an ensemble decision tree technique and that consistently outperforms all other methods currently used for cognitive state classification. Specifically, this paper illustrates the combined use of a decision tree ensemble and GAs for feature selection through an extensive simulation study and discusses the classification performance with respect to fMRI data. We have shown that our proposed method achieves a significant reduction in the number of features, with a clear edge in classification accuracy over an ensemble of decision trees.
NASA Astrophysics Data System (ADS)
Muslim, M. A.; Herowati, A. J.; Sugiharti, E.; Prasetiyo, B.
2018-03-01
Data mining is a technique for uncovering valuable, previously unknown patterns hidden in large data collections. Data mining has been applied in the healthcare industry. One data mining technique is classification. Decision trees are a classification technique, and C4.5 is a decision tree algorithm. A classifier for diagnosing chronic kidney disease is designed by applying pessimistic pruning to the C4.5 algorithm. Pessimistic pruning identifies and removes branches that are not needed, in order to avoid overfitting the decision tree generated by the C4.5 algorithm. In this paper, the results obtained using these classifiers are presented and discussed. Pessimistic pruning increases the accuracy of the C4.5 algorithm by 1.5 percentage points, from 95% to 96.5%, in diagnosing chronic kidney disease.
The economic impact of pig-associated parasitic zoonosis in Northern Lao PDR.
Choudhury, Adnan Ali Khan; Conlan, James V; Racloz, Vanessa Nadine; Reid, Simon Andrew; Blacksell, Stuart D; Fenwick, Stanley G; Thompson, Andrew R C; Khamlome, Boualam; Vongxay, Khamphouth; Whittaker, Maxine
2013-03-01
The parasitic zoonoses human cysticercosis (Taenia solium), taeniasis (other Taenia species) and trichinellosis (Trichinella species) are endemic in the Lao People's Democratic Republic (Lao PDR). This study was designed to quantify the economic burden that pig-associated zoonotic diseases pose in Lao PDR. In particular, the analysis included estimation of the losses in the pork industry as well as losses due to human illness and lost productivity. A Markov-probability based decision-tree model was chosen to form the basis of the calculations to estimate the economic and public health impacts of taeniasis, trichinellosis and cysticercosis. Two different decision trees were run simultaneously on the model's human cohort. A third decision tree simulated the potential impacts on pig production. The human capital method was used to estimate productivity loss. The results varied significantly depending on the rate of hospitalisation due to neurocysticercosis. This study is the first systematic estimate of the economic impact of pig-associated zoonotic diseases in Lao PDR, and it demonstrates the significance of these diseases in that country.
Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V
2012-01-01
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
On multi-site damage identification using single-site training data
NASA Astrophysics Data System (ADS)
Barthorpe, R. J.; Manson, G.; Worden, K.
2017-11-01
This paper proposes a methodology for developing multi-site damage location systems for engineering structures that can be trained using single-site damaged state data only. The methodology involves training a sequence of binary classifiers based upon single-site damage data and combining the developed classifiers into a robust multi-class damage locator. In this way, the multi-site damage identification problem may be decomposed into a sequence of binary decisions. In this paper Support Vector Classifiers are adopted as the means of making these binary decisions. The proposed methodology represents an advancement on the state of the art in the field of multi-site damage identification which require either: (1) full damaged state data from single- and multi-site damage cases or (2) the development of a physics-based model to make multi-site model predictions. The potential benefit of the proposed methodology is that a significantly reduced number of recorded damage states may be required in order to train a multi-site damage locator without recourse to physics-based model predictions. In this paper it is first demonstrated that Support Vector Classification represents an appropriate approach to the multi-site damage location problem, with methods for combining binary classifiers discussed. Next, the proposed methodology is demonstrated and evaluated through application to a real engineering structure - a Piper Tomahawk trainer aircraft wing - with its performance compared to classifiers trained using the full damaged-state dataset.
Peña, Carlos; Espeland, Marianne
2015-01-01
The species rich butterfly family Nymphalidae has been used to study evolutionary interactions between plants and insects. Theories of insect-hostplant dynamics predict accelerated diversification due to key innovations. In evolutionary biology, analysis of maximum credibility trees in the software MEDUSA (modelling evolutionary diversity using stepwise AIC) is a popular method for estimation of shifts in diversification rates. We investigated whether phylogenetic uncertainty can produce different results by extending the method across a random sample of trees from the posterior distribution of a Bayesian run. Using the MultiMEDUSA approach, we found that phylogenetic uncertainty greatly affects diversification rate estimates. Different trees produced diversification rates ranging from high values to almost zero for the same clade, and both significant rate increase and decrease in some clades. Only four out of 18 significant shifts found on the maximum clade credibility tree were consistent across most of the sampled trees. Among these, we found accelerated diversification for Ithomiini butterflies. We used the binary speciation and extinction model (BiSSE) and found that a hostplant shift to Solanaceae is correlated with increased net diversification rates in Ithomiini, congruent with the diffuse cospeciation hypothesis. Our results show that taking phylogenetic uncertainty into account when estimating net diversification rate shifts is of great importance, as very different results can be obtained when using the maximum clade credibility tree and other trees from the posterior distribution. PMID:25830910
Bevilacqua, M; Ciarapica, F E; Giacchetta, G
2008-07-01
This work is an attempt to apply classification tree methods to data regarding accidents in a medium-sized refinery, so as to identify the important relationships between the variables, which can be considered as decision-making rules when adopting any measures for improvement. The results obtained using the CART (Classification And Regression Trees) method proved to be the most precise and, in general, they are encouraging concerning the use of tree diagrams as preliminary explorative techniques for the assessment of the ergonomic, management and operational parameters which influence high accident risk situations. The Occupational Injury analysis carried out in this paper was planned as a dynamic process and can be repeated systematically. The CART technique, which considers a very wide set of objective and predictive variables, shows new cause-effect correlations in occupational safety which had never been previously described, highlighting possible injury risk groups and supporting decision-making in these areas. The use of classification trees must not, however, be seen as an attempt to supplant other techniques, but as a complementary method which can be integrated into traditional types of analysis.
NASA Astrophysics Data System (ADS)
Książek, Judyta
2015-10-01
At present, there is great interest in the development of texture-based image classification methods in many different areas. This study presents the results of research carried out to assess the usefulness of selected textural features for detecting asbestos-cement roofs in orthophotomap classification. Two different orthophotomaps of southern Poland (with ground resolutions of 5 cm and 25 cm) were used. On both orthoimages, representative samples were selected for two classes: asbestos-cement roofing sheets and other roofing materials. The usefulness of texture analysis was evaluated using machine learning methods based on decision trees (C5.0 algorithm). For this purpose, various sets of texture parameters were calculated in MaZda software. When building the decision trees, different numbers of texture parameter groups were considered. Cross-validation was performed to obtain the best settings for the decision tree models. The decision tree models with the lowest mean classification error were selected. Classification accuracy was assessed using validation data sets that were not used for training the classifier. For the 5 cm ground resolution samples, the lowest mean classification error was 15.6%. For the 25 cm ground resolution, the lowest mean classification error was 20.0%. The results confirm the potential usefulness of texture parameters for detecting asbestos-cement roofing sheets. To improve the accuracy, a further extended study should be considered in which additional textural features as well as spectral characteristics are analyzed.
Rezaei-Darzi, Ehsan; Farzadfar, Farshad; Hashemi-Meshkini, Amir; Navidi, Iman; Mahmoudi, Mahmoud; Varmaghani, Mehdi; Mehdipour, Parinaz; Soudi Alamdari, Mahsa; Tayefi, Batool; Naderimagham, Shohreh; Soleymani, Fatemeh; Mesdaghinia, Alireza; Delavari, Alireza; Mohammad, Kazem
2014-12-01
This study aimed to evaluate and compare the prediction accuracy of two data mining techniques, decision tree and neural network models, in assigning diagnoses to gastrointestinal prescriptions in Iran. This study was conducted in three phases: data preparation, training phase, and testing phase. A sample from a database consisting of 23 million pharmacy insurance claim records, from 2004 to 2011, was used, in which a total of 330 prescriptions were assessed and used to train and test the models simultaneously. In the training phase, the selected prescriptions were assessed by both a physician and a pharmacist separately and assigned a diagnosis. To test the performance of each model, a k-fold stratified cross validation was conducted in addition to measuring their sensitivity and specificity. Generally, the two methods had very similar accuracies. Considering the weighted average of true positive rate (sensitivity) and true negative rate (specificity), the decision tree had slightly higher accuracy in its ability for correct classification (83.3% and 96% versus 80.3% and 95.1%, respectively). However, when the weighted average of ROC area (AUC between each class and all other classes) was measured, the ANN displayed higher accuracy in predicting the diagnosis (93.8% compared with 90.6%). According to the results of this study, the artificial neural network and decision tree models show similar accuracy in assigning diagnoses to GI prescriptions.
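The sensitivity and specificity figures quoted in the abstract above are ratios of confusion-matrix counts. A minimal sketch of the calculation for a single class follows; the function name and the counts are hypothetical and are not taken from the study, which reports weighted averages over several diagnosis classes.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


# Hypothetical counts for one class, for illustration only.
sens, spec = sensitivity_specificity(tp=50, fn=10, tn=96, fp=4)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In a multiclass setting these two values would be computed per class (that class versus all others) and then averaged with class-size weights.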
Miles, Kenneth A; Ganeshan, Balaji; Rodriguez-Justo, Manuel; Goh, Vicky J; Ziauddin, Zia; Engledow, Alec; Meagher, Marie; Endozo, Raymondo; Taylor, Stuart A; Halligan, Stephen; Ell, Peter J; Groves, Ashley M
2014-03-01
This study explores the potential for multifunctional imaging to provide a signature for V-KI-RAS2 Kirsten rat sarcoma viral oncogene homolog (KRAS) gene mutations in colorectal cancer. This prospective study approved by the institutional review board comprised 33 patients undergoing PET/CT before surgery for proven primary colorectal cancer. Tumor tissue was examined histologically for presence of the KRAS mutations and for expression of hypoxia-inducible factor-1 (HIF-1) and minichromosome maintenance protein 2 (mcm2). The following imaging parameters were derived for each tumor: (18)F-FDG uptake ((18)F-FDG maximum standardized uptake value [SUVmax]), CT texture (expressed as mean of positive pixels [MPP]), and blood flow measured by dynamic contrast-enhanced CT. A recursive decision tree was developed in which the imaging investigations were applied sequentially to identify tumors with KRAS mutations. Monte Carlo analysis provided mean values and 95% confidence intervals for sensitivity, specificity, and accuracy. The final decision tree comprised 4 decision nodes and 5 terminal nodes, 2 of which identified KRAS mutants. The true-positive rate, false-positive rate, and accuracy (95% confidence intervals) of the decision tree were 82.4% (63.9%-93.9%), 0% (0%-10.4%), and 90.1% (79.2%-96.0%), respectively. KRAS mutants with high (18)F-FDG SUVmax and low MPP showed greater frequency of HIF-1 expression (P = 0.032). KRAS mutants with low (18)F-FDG SUV(max), high MPP, and high blood flow expressed mcm2 (P = 0.036). Multifunctional imaging with PET/CT and recursive decision-tree analysis to combine measurements of tumor (18)F-FDG uptake, CT texture, and perfusion has the potential to identify imaging signatures for colorectal cancers with KRAS mutations exhibiting hypoxic or proliferative phenotypes.
Adaptive segmentation of cerebrovascular tree in time-of-flight magnetic resonance angiography.
Hao, J T; Li, M L; Tang, F L
2008-01-01
Accurate segmentation of the human vasculature is an important prerequisite for a number of clinical procedures, such as diagnosis, image-guided neurosurgery and pre-surgical planning. In this paper, an improved statistical approach to extracting whole cerebrovascular tree in time-of-flight magnetic resonance angiography is proposed. Firstly, in order to get a more accurate segmentation result, a localized observation model is proposed instead of defining the observation model over the entire dataset. Secondly, for the binary segmentation, an improved Iterative Conditional Model (ICM) algorithm is presented to accelerate the segmentation process. The experimental results showed that the proposed algorithm can obtain more satisfactory segmentation results and save more processing time than conventional approaches, simultaneously.
NASA Astrophysics Data System (ADS)
Raziff, Abdul Rafiez Abdul; Sulaiman, Md Nasir; Mustapha, Norwati; Perumal, Thinagaran
2017-10-01
Gait recognition is widely used in many applications. In gait identification of people, the number of classes (people) is large and may exceed 20. With so many classes, a single classification mapping (direct classification) may not be suitable, as most existing algorithms are designed for binary classification. Furthermore, having many classes in a dataset may produce a high degree of overlap between class boundaries. This paper discusses the application of multiclass classifier mappings such as one-vs-all (OvA), one-vs-one (OvO) and random correction code (RCC) to handheld smartphone gait signals for person identification. The results are then compared with a single J48 decision tree as a benchmark. The results suggest that multiclass classification mapping partially improves the overall accuracy, especially for OvO and for RCC with a width factor greater than 4. For OvA, the accuracy is worse than that of a single J48 because of the high number of classes.
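The OvA and OvO mappings named in the abstract above differ in how many binary classifiers they train. A minimal sketch of the standard counts follows, assuming one classifier per class for OvA and one per unordered pair of classes for OvO; the function name is hypothetical, and the RCC mapping is omitted because its code length depends on a width factor the abstract does not define.

```python
def binary_classifier_counts(n_classes):
    """Return (ova, ovo): how many binary classifiers each mapping trains."""
    ova = n_classes                          # one "class k vs rest" model per class
    ovo = n_classes * (n_classes - 1) // 2   # one model per unordered class pair
    return ova, ovo


# 20 classes, the order of magnitude mentioned in the abstract.
ova, ovo = binary_classifier_counts(20)
print(ova, ovo)
```

Each OvA classifier faces one class against all the others, so its training set becomes increasingly unbalanced as the number of classes grows, while each OvO classifier only ever sees two classes.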
A Dictionary Approach to Electron Backscatter Diffraction Indexing.
Chen, Yu H; Park, Se Un; Wei, Dennis; Newstadt, Greg; Jackson, Michael A; Simmons, Jeff P; De Graef, Marc; Hero, Alfred O
2015-06-01
We propose a framework for indexing of grain and subgrain structures in electron backscatter diffraction patterns of polycrystalline materials. We discretize the domain of a dynamical forward model onto a dense grid of orientations, producing a dictionary of patterns. For each measured pattern, we identify the most similar patterns in the dictionary, and identify boundaries, detect anomalies, and index crystal orientations. The statistical distribution of these closest matches is used in an unsupervised binary decision tree (DT) classifier to identify grain boundaries and anomalous regions. The DT classifies a pattern as an anomaly if it has an abnormally low similarity to any pattern in the dictionary. It classifies a pixel as being near a grain boundary if the highly ranked patterns in the dictionary differ significantly over the pixel's neighborhood. Indexing is accomplished by computing the mean orientation of the closest matches to each pattern. The mean orientation is estimated using a maximum likelihood approach that models the orientation distribution as a mixture of Von Mises-Fisher distributions over the quaternionic three sphere. The proposed dictionary matching approach permits segmentation, anomaly detection, and indexing to be performed in a unified manner with the additional benefit of uncertainty quantification.
Insurance Contract Analysis for Company Decision Support in Acquisition Management
NASA Astrophysics Data System (ADS)
Chernovita, H. P.; Manongga, D.; Iriani, A.
2017-01-01
One of the activities companies undertake to sustain their business is marketing their products, which includes acquisition management to gain new customers. Insurance contracts are analyzed using ID3 to produce a decision tree and rules that provide decision support for the insurance company. The decision tree yields 13 rules that lead to a contract termination claim. These rules could guide the insurance company in acquisition management to avoid entering contracts under these conditions, because such customers are likely to terminate their insurance contracts before the expiry date. As a result, several factors emerge as determinants of contract termination: 1) customer age, whether too young or too old, 2) a long insurance period (above 10 years), 3) a large insured amount, 4) large premium charges, and 5) payment method.
Comparative seed-tree and selection harvesting costs in young-growth mixed-conifer stands
William A. Atkinson; Dale O. Hall
1963-01-01
Little difference was found between yarding and felling costs in seed-tree and selection harvest cuts. The volume per acre logged was 23,800 board feet on the seed-tree compartments and 10,600 board feet on the selection compartments. For a comparable operation with this range of volumes, cutting method decisions should be based on factors other than logging costs....
Louis R. Iverson; Anantha M. Prasad; Stephen N. Matthews; Matthew P. Peters
2010-01-01
Climate change will likely cause impacts that are species specific and significant; modeling is critical to better understand potential changes in suitable habitat. We use empirical, abundance-based habitat models utilizing decision tree-based ensemble methods to explore potential changes of 134 tree species habitats in the eastern United States (http://www.nrs.fs.fed....
ERIC Educational Resources Information Center
Gillingham, Mark G.
A study examined what happened when a group of adult students read a hypertext for the goal of answering specific questions. Subjects, 30 students enrolled in an upper-division psychology course at a state university in the northwestern United States, read a binary tree-structured hypertext to answer three two-part questions on the topic of…
Hollemeyer, Klaus; Altmeyer, Wolfgang; Heinzle, Elmar; Pitra, Christian
2012-08-30
The identification of fur origins from the 5300-year-old Tyrolean Iceman's accoutrement is not yet complete, although definite identification is essential for the socio-cultural context of his epoch. Neither have all potential samples been identified so far, nor has a consensus been reached on the species identified using the classical methods. Archaeological hair often lacks analyzable hair scale patterns in microscopic analyses, and polymerase chain reaction (PCR)-based techniques are often inapplicable due to the lack of amplifiable ancient DNA. To overcome these drawbacks, a matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) method based exclusively on hair keratins was used. Thirteen fur specimens from his accoutrement were analyzed after tryptic digest of native hair. Peptide mass fingerprints (pmfs) from ancient samples and from reference species mostly occurring in the Alpine surroundings during his lifetime were compared to each other using multidimensional scaling and binary hierarchical cluster tree analysis. Both statistical methods closely reflect spectral similarities among pmfs as close zoological relationships. While multidimensional scaling was useful to discriminate specimens at the zoological order level, the binary hierarchical cluster tree reached the family or subfamily level. Additionally, the presence and/or absence of order, family and/or species-specific diagnostic masses in their pmfs allowed the identification of mammals mostly down to single species level. Red deer was found in his shoe vamp, goat in the leggings, cattle in his shoe sole and at his quiver's closing flap, as well as sheep and chamois in his coat. Canid species, like grey wolf, domestic dog or European red fox, were discovered in his leggings for the first time, but could not be differentiated to species level. This widens the spectrum of processed fur-bearing species to at least one member of the Canidae family. His fur cap was allocated to a carnivore species, but differentiation between brown bear and a canid species could not be made with certainty. Copyright © 2012 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Sanchez-Vila, X.; de Barros, F.; Bolster, D.; Nowak, W.
2010-12-01
Assessing the potential risk of hydro(geo)logical supply systems to human population is an interdisciplinary field. It relies on the expertise in fields as distant as hydrogeology, medicine, or anthropology, and needs powerful translation concepts to provide decision support and policy making. Reliable health risk estimates need to account for the uncertainties in hydrological, physiological and human behavioral parameters. We propose the use of fault trees to address the task of probabilistic risk analysis (PRA) and to support related management decisions. Fault trees allow decomposing the assessment of health risk into individual manageable modules, thus tackling a complex system by a structural “Divide and Conquer” approach. The complexity within each module can be chosen individually according to data availability, parsimony, relative importance and stage of analysis. The separation in modules allows for a true inter- and multi-disciplinary approach. This presentation highlights the three novel features of our work: (1) we define failure in terms of risk being above a threshold value, whereas previous studies used auxiliary events such as exceedance of critical concentration levels, (2) we plot an integrated fault tree that handles uncertainty in both hydrological and health components in a unified way, and (3) we introduce a new form of stochastic fault tree that makes it possible to weaken the assumption of independent subsystems that is required by a classical fault tree approach. We illustrate our concept in a simple groundwater-related setting.
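As a minimal sketch of the classical fault tree calculation that the abstract contrasts with its stochastic variant, the following combines basic-event probabilities through AND and OR gates under the independence assumption. The event names and probability values are hypothetical and purely illustrative; they are not taken from the study, and the authors' stochastic fault tree is not reproduced here.

```python
from math import prod


def and_gate(probs):
    """Probability that all input events occur, assuming independence."""
    return prod(probs)


def or_gate(probs):
    """Probability that at least one input event occurs, assuming independence."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical basic-event probabilities (illustrative only).
p_contaminant_release = 0.1
p_barrier_failure = 0.2
p_exposure_ingestion = 0.05
p_exposure_dermal = 0.02

# Exposure happens through either pathway (OR gate).
p_exposure = or_gate([p_exposure_ingestion, p_exposure_dermal])

# The top event needs release, barrier failure and exposure together (AND gate).
p_top = and_gate([p_contaminant_release, p_barrier_failure, p_exposure])

print(f"P(exposure) = {p_exposure:.4f}")
print(f"P(top event) = {p_top:.5f}")
```

Each gate can be evaluated on its own, which is the modular "Divide and Conquer" property the abstract describes; the product rule inside the gates is exactly where the independence assumption enters.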
Modeling time-to-event (survival) data using classification tree analysis.
Linden, Ariel; Yarnold, Paul R
2017-12-01
Time to the occurrence of an event is often studied in health research. Survival analysis differs from other designs in that follow-up times for individuals who do not experience the event by the end of the study (called censored) are accounted for in the analysis. Cox regression is the standard method for analysing censored data, but the assumptions required of these models are easily violated. In this paper, we introduce classification tree analysis (CTA) as a flexible alternative for modelling censored data. Classification tree analysis is a "decision-tree"-like classification model that provides parsimonious, transparent (ie, easy to visually display and interpret) decision rules that maximize predictive accuracy, derives exact P values via permutation tests, and evaluates model cross-generalizability. Using empirical data, we identify all statistically valid, reproducible, longitudinally consistent, and cross-generalizable CTA survival models and then compare their predictive accuracy to estimates derived via Cox regression and an unadjusted naïve model. Model performance is assessed using integrated Brier scores and a comparison between estimated survival curves. The Cox regression model best predicts average incidence of the outcome over time, whereas CTA survival models best predict either relatively high, or low, incidence of the outcome over time. Classification tree analysis survival models offer many advantages over Cox regression, such as explicit maximization of predictive accuracy, parsimony, statistical robustness, and transparency. Therefore, researchers interested in accurate prognoses and clear decision rules should consider developing models using the CTA-survival framework. © 2017 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
He, Xin; Frey, Eric C.
2007-03-01
Binary ROC analysis has solid decision-theoretic foundations and a close relationship to linear discriminant analysis (LDA). In particular, for the case of Gaussian equal covariance input data, the area under the ROC curve (AUC) value has a direct relationship to the Hotelling trace. Many attempts have been made to extend binary classification methods to multi-class. For example, Fukunaga extended binary LDA to obtain multi-class LDA, which uses the multi-class Hotelling trace as a figure-of-merit, and we have previously developed a three-class ROC analysis method. This work explores the relationship between conventional multi-class LDA and three-class ROC analysis. First, we developed a linear observer, the three-class Hotelling observer (3-HO). For Gaussian equal covariance data, the 3-HO provides equivalent performance to the three-class ideal observer and, under less strict conditions, maximizes the signal to noise ratio for classification of all pairs of the three classes simultaneously. The 3-HO templates are not the eigenvectors obtained from multi-class LDA. Second, we show that the three-class Hotelling trace, which is the figure-of-merit in the conventional three-class extension of LDA, has significant limitations. Third, we demonstrate that, under certain conditions, there is a linear relationship between the eigenvectors obtained from multi-class LDA and 3-HO templates. We conclude that the 3-HO based on decision theory has advantages both in its decision theoretic background and in the usefulness of its figure-of-merit. Additionally, there exists the possibility of interpreting the two linear features extracted by the conventional extension of LDA from a decision theoretic point of view.
Bayesian truthing as experimental verification of C4ISR sensors
NASA Astrophysics Data System (ADS)
Jannson, Tomasz; Forrester, Thomas; Romanov, Volodymyr; Wang, Wenjian; Nielsen, Thomas; Kostrzewski, Andrew
2015-05-01
In this paper, a general methodology for experimental verification/validation of the performance of C4ISR and other sensors is presented, based on Bayesian inference in general and on binary sensors in particular. This methodology, called Bayesian Truthing, defines Performance Metrics for binary sensors in: physics, optics, electronics, medicine, law enforcement, C3ISR, QC, ATR (Automatic Target Recognition), terrorism related events, and many others. For Bayesian Truthing, the sensing medium itself is not what is truly important; it is how the decision process is affected.
Optical communication system performance with tracking error induced signal fading.
NASA Technical Reports Server (NTRS)
Tycz, M.; Fitzmaurice, M. W.; Premo, D. A.
1973-01-01
System performance is determined for an optical communication system using noncoherent detection in the presence of tracking error induced signal fading assuming (1) binary on-off modulation (OOK) with both fixed and adaptive threshold receivers, and (2) binary polarization modulation (BPM). BPM is shown to maintain its inherent 2- to 3-dB advantage over OOK when adaptive thresholding is used, and to have a substantially greater advantage when the OOK system is restricted to a fixed decision threshold.
Wang, Bo; Ives, Anthony R
2017-03-01
Individual variation in seed size and seed production is high in many plant species. How does this variation affect seed-dispersing animals and, in turn, the fitness of individual plants? In this study, we first surveyed intraspecific variation in seed mass and production in a population of a Chinese white pine, Pinus armandii. For 134 target trees investigated in 2012, there was very high variation in seed size, with mean seed mass varying among trees almost tenfold, from 0.038 to 0.361 g. Furthermore, 30 of the 134 trees produced seeds 2 years later, and for these individuals there was a correlation in seed mass of 0.59 between years, implying consistent differences among individuals. For a subset of 67 trees, we monitored the foraging preferences of scatter-hoarding rodents on a total of 15,301 seeds: 8380 were ignored, 3184 were eaten in situ, 2651 were eaten after being cached, and 395 were successfully dispersed (cached and left intact). At the scale of individual seeds, seed mass affected almost every decision that rodents made to eat, remove, and cache individual seeds. At the level of individual trees, larger seeds had increased probabilities of both predation and successful dispersal: the effects of mean seed size on costs (predation) and benefits (caching) balanced out. Thus, despite seed size affecting rodent decisions, variation among trees in dispersal success associated with mean seed size was small once seeds were harvested. This might explain, at least in part, the maintenance of high variation in mean seed mass among tree individuals.
ERIC Educational Resources Information Center
Markham, Mary T.
2000-01-01
Introduces a unit on forest management in which students manage the school forest. Involves students in tree identification, determining the size or volume and height of trees, and evaluation of the forest for management decisions. Integrates mathematics, writing, and social studies with plant classification, plant reproduction, and the use of…
Beauregard, Eric; Deslauriers-Varin, Nadine; St-Yves, Michel
2010-09-01
Most studies of confessions have looked at the influence of individual factors, neglecting the potential interactions between these factors and their impact on the decision to confess or not during an interrogation. Classification and regression tree analyses conducted on a sample of 624 convicted sex offenders showed that certain factors related to the offenders (e.g., personality, criminal career), victims (e.g., sex, relationship to offender), and case (e.g., time of day of the crime) were related to the decision to confess or not during the police interrogation. Several interactions were also observed between these factors. Results will be discussed in light of previous findings and interrogation strategies for sex offenders.
Chen, Guangchao; Li, Xuehua; Chen, Jingwen; Zhang, Ya-Nan; Peijnenburg, Willie J G M
2014-12-01
Biodegradation is the principal environmental dissipation process of chemicals. As such, it is a dominant factor determining the persistence and fate of organic chemicals in the environment, and is therefore of critical importance to chemical management and regulation. In the present study, the authors developed in silico methods assessing biodegradability based on a large heterogeneous set of 825 organic compounds, using the techniques of the C4.5 decision tree, the functional inner regression tree, and logistic regression. External validation was subsequently carried out by 2 independent test sets of 777 and 27 chemicals. As a result, the functional inner regression tree exhibited the best predictability with predictive accuracies of 81.5% and 81.0%, respectively, on the training set (825 chemicals) and test set I (777 chemicals). Performance of the developed models on the 2 test sets was subsequently compared with that of the Estimation Program Interface (EPI) Suite Biowin 5 and Biowin 6 models, which also showed a better predictability of the functional inner regression tree model. The model built in the present study exhibits a reasonable predictability compared with existing models while possessing a transparent algorithm. Interpretation of the mechanisms of biodegradation was also carried out based on the models developed. © 2014 SETAC.
Angelis, Aris; Kanavos, Panos
2017-09-01
Escalating drug prices have catalysed the generation of numerous "value frameworks" with the aim of informing payers, clinicians and patients on the assessment and appraisal process of new medicines for the purpose of coverage and treatment selection decisions. Although this is an important step towards a more inclusive Value Based Assessment (VBA) approach, aspects of these frameworks are based on weak methodologies and could potentially result in misleading recommendations or decisions. In this paper, a Multiple Criteria Decision Analysis (MCDA) methodological process, based on Multi Attribute Value Theory (MAVT), is adopted for building a multi-criteria evaluation model. A five-stage model-building process is followed, using a top-down "value-focused thinking" approach, involving literature reviews and expert consultations. A generic value tree is structured capturing decision-makers' concerns for assessing the value of new medicines in the context of Health Technology Assessment (HTA) and in alignment with decision theory. The resulting value tree (Advance Value Tree) consists of three levels of criteria (top level criteria clusters, mid-level criteria, bottom level sub-criteria or attributes) relating to five key domains that can be explicitly measured and assessed: (a) burden of disease, (b) therapeutic impact, (c) safety profile, (d) innovation level and (e) socioeconomic impact. A number of MAVT modelling techniques are introduced for operationalising (i.e. estimating) the model, for scoring the alternative treatment options, assigning relative weights of importance to the criteria, and combining scores and weights. Overall, the combination of these MCDA modelling techniques for the elicitation and construction of value preferences across the generic value tree provides a new value framework (Advance Value Framework) enabling the comprehensive measurement of value in a structured and transparent way.
Given its flexibility to meet diverse requirements and become readily adaptable across different settings, the Advance Value Framework could be offered as a decision-support tool for evaluators and payers to aid coverage and reimbursement of new medicines. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
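The step of "combining scores and weights" can be illustrated with a short sketch. It assumes a simple additive (weighted-sum) value function, which is a common choice in MAVT but is an assumption here, since the abstract does not state the aggregation rule. The five criteria are the domains named in the abstract; the weights, the 0 to 100 scores and the name `treatment_a` are hypothetical.

```python
def additive_value(scores, weights):
    """Weighted-sum value of one option; weights are normalised to sum to 1."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[c] / total_weight * scores[c] for c in weights)


# Hypothetical relative weights for the five top-level domains.
weights = {
    "burden_of_disease": 0.20,
    "therapeutic_impact": 0.30,
    "safety_profile": 0.20,
    "innovation_level": 0.15,
    "socioeconomic_impact": 0.15,
}

# Hypothetical scores for one treatment option on a 0-100 value scale.
treatment_a = {
    "burden_of_disease": 60,
    "therapeutic_impact": 80,
    "safety_profile": 70,
    "innovation_level": 50,
    "socioeconomic_impact": 40,
}

overall_value = additive_value(treatment_a, weights)
print(f"Overall value of treatment A: {overall_value:.1f}")
```

Normalising the weights inside the function keeps the result on the same scale as the scores even when the elicited weights do not sum exactly to one.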
Spatial and spatiotemporal pattern analysis of coconut lethal yellowing in Mozambique.
Bonnot, F; de Franqueville, H; Lourenço, E
2010-04-01
Coconut lethal yellowing (LY) is caused by a phytoplasma and is a major threat for coconut production throughout its growing area. Incidence of LY was monitored visually on every coconut tree in six fields in Mozambique for 34 months. Disease progress curves were plotted and average monthly disease incidence was estimated. Spatial patterns of disease incidence were analyzed at six assessment times. Aggregation was tested by the coefficient of spatial autocorrelation of the beta-binomial distribution of diseased trees in quadrats. The binary power law was used as an assessment of overdispersion across the six fields. Spatial autocorrelation between symptomatic trees was measured by the BB join count statistic based on the number of pairs of diseased trees separated by a specific distance and orientation, and tested using permutation methods. Aggregation of symptomatic trees was detected in every field in both cumulative and new cases. Spatiotemporal patterns were analyzed with two methods. The proximity of symptomatic trees at two assessment times was investigated using the spatiotemporal BB join count statistic based on the number of pairs of trees separated by a specific distance and orientation and exhibiting the first symptoms of LY at the two times. The semivariogram of times of appearance of LY was calculated to characterize how the lag between times of appearance of LY was related to the distance between symptomatic trees. Both statistics were tested using permutation methods. A tendency for new cases to appear in the proximity of previously diseased trees and a spatially structured pattern of times of appearance of LY within clusters of diseased trees were detected, suggesting secondary spread of the disease.
Hegazi, E M; Khafagi, W E; Konstantopoulou, M A; Schlyter, F; Raptopoulos, D; Shweil, S; Abd El-Rahman, S; Atwa, A; Ali, S E; Tawfik, H
2010-10-01
The leopard moth, Zeuzera pyrina (L.) (Lepidoptera: Cossidae), is a damaging pest for many fruit trees (e.g., apple [Malus spp.], pear [Pyrus spp.], peach [Prunus spp.], and olive [Olea]). Recently, it caused serious yield losses in newly established olive orchards in Egypt, including the death of young trees. Chemical and biological control have shown limited efficiency against this pest. Field tests were conducted in 2005 and 2006 to evaluate mating disruption (MD) for the control of the leopard moth, on heavily infested, densely planted olive plots (336 trees per ha). The binary blend of the pheromone components (E,Z)-2,13-octadecenyl acetate and (E,Z)-3,13-octadecenyl acetate (95:5) was dispensed from polyethylene vials. Efficacy was measured considering reduction of catches in pheromone traps, reduction of active galleries of leopard moth per tree and fruit yield in the pheromone-treated plots (MD) compared with control plots (CO). Male captures in MD plots were reduced by 89.3% in 2005 and 82.9% in 2006, during a trapping period of 14 and 13 wk, respectively. Application of MD over two consecutive years progressively reduced the number of active galleries per tree in the third year where no sex pheromone was applied. In all years, larval galleries outnumbered moth captures. Fruit yield from trees where sex pheromone had been applied in 2005 and 2006 increased significantly in 2006 (98.8 +/- 2.9 kg per tree) and 2007 (23 +/- 1.3 kg per tree) compared with control ones (61.0 +/- 3.9 and 10.0 +/- 0.6 kg per tree, respectively). Mating disruption shows promise for suppressing leopard moth infestation in olives.
Receiver Statistics for Cognitive Radios in Dynamic Spectrum Access Networks
2012-02-28
SNR) are employed by many protocols and processes in direct-sequence (DS) spread-spectrum packet radio networks, including soft-decision decoding...adaptive modulation protocols, and power adjustment protocols. For DS spread spectrum, we have introduced and evaluated SNR estimators that employ...obtained during demodulation in a binary CDMA receiver. We investigated several methods to apply the proposed metric to the demodulator’s soft-decision
Office of Legacy Management Decision Tree for Solar Photovoltaic Projects - 13317
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elmer, John; Butherus, Michael; Barr, Deborah L.
2013-07-01
To support consideration of renewable energy power development as a land reuse option, the DOE Office of Legacy Management (LM) and the National Renewable Energy Laboratory (NREL) established a partnership to conduct an assessment of wind and solar renewable energy resources on LM lands. From a solar capacity perspective, the larger sites in the western United States present opportunities for constructing solar photovoltaic (PV) projects. A detailed analysis and preliminary plan was developed for three large sites in New Mexico, assessing the costs, the conceptual layout of a PV system, and the electric utility interconnection process. As a result of the study, a 1,214-hectare (3,000-acre) site near Grants, New Mexico, was chosen for further study. The state incentives, utility connection process, and transmission line capacity were key factors in assessing the feasibility of the project. LM's Durango, Colorado, Disposal Site was also chosen for consideration because the uranium mill tailings disposal cell is on a hillside facing south, transmission lines cross the property, and the community was very supportive of the project. LM worked with the regulators to demonstrate that the disposal cell's long-term performance would not be impacted by the installation of a PV solar system. A number of LM-unique issues were resolved in making the site available for a private party to lease a portion of the site for a solar PV project. A lease was awarded in September 2012. Using a solar decision tree that was developed and launched by the EPA and NREL, LM has modified and expanded the decision tree structure to address the unique aspects and challenges faced by LM on its multiple sites. The LM solar decision tree covers factors such as land ownership, usable acreage, financial viability of the project, stakeholder involvement, and transmission line capacity.
As additional sites are transferred to LM in the future, the decision tree will assist in determining whether a solar PV project is feasible on the new sites. (authors)
Wendling, T; Jung, K; Callahan, A; Schuler, A; Shah, N H; Gallego, B
2018-06-03
There is growing interest in using routinely collected data from health care databases to study the safety and effectiveness of therapies in "real-world" conditions, as it can provide complementary evidence to that of randomized controlled trials. Causal inference from health care databases is challenging because the data are typically noisy, high dimensional, and most importantly, observational. It requires methods that can estimate heterogeneous treatment effects while controlling for confounding in high dimensions. Bayesian additive regression trees, causal forests, causal boosting, and causal multivariate adaptive regression splines are off-the-shelf methods that have shown good performance for estimation of heterogeneous treatment effects in observational studies of continuous outcomes. However, it is not clear how these methods would perform in health care database studies where outcomes are often binary and rare and data structures are complex. In this study, we evaluate these methods in simulation studies that recapitulate key characteristics of comparative effectiveness studies. We focus on the conditional average effect of a binary treatment on a binary outcome using the conditional risk difference as an estimand. To emulate health care database studies, we propose a simulation design where real covariate and treatment assignment data are used and only outcomes are simulated based on nonparametric models of the real outcomes. We apply this design to 4 published observational studies that used records from 2 major health care databases in the United States. Our results suggest that Bayesian additive regression trees and causal boosting consistently provide low bias in conditional risk difference estimates in the context of health care database studies. Copyright © 2018 John Wiley & Sons, Ltd.
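To make the estimand concrete, the sketch below computes a conditional risk difference, P(Y=1 | T=1, X=x) - P(Y=1 | T=0, X=x), by plain stratification on a single discrete covariate. This is a deliberately naive illustration and is not one of the methods evaluated in the study (Bayesian additive regression trees, causal forests, causal boosting, causal MARS); the function name, the strata and the toy records are all hypothetical.

```python
from collections import defaultdict


def conditional_risk_difference(records):
    """Risk difference per stratum from (stratum, treated, outcome) records.

    treated and outcome are 0 or 1. Strata missing either treatment arm
    are skipped because the difference is undefined there.
    """
    # counts[stratum][treated] = [number of subjects, number of events]
    counts = defaultdict(lambda: [[0, 0], [0, 0]])
    for stratum, treated, outcome in records:
        counts[stratum][treated][0] += 1
        counts[stratum][treated][1] += outcome

    result = {}
    for stratum, arms in counts.items():
        (n0, e0), (n1, e1) = arms
        if n0 and n1:
            result[stratum] = e1 / n1 - e0 / n0
    return result


# Hypothetical toy data: (stratum, treated, outcome).
records = (
    [("young", 1, 1)] + [("young", 1, 0)] * 3
    + [("young", 0, 1)] * 2 + [("young", 0, 0)] * 2
    + [("old", 1, 1)] + [("old", 1, 0)]
    + [("old", 0, 1)] + [("old", 0, 0)] * 3
)

crd = conditional_risk_difference(records)
print(crd)
```

Stratification of this kind breaks down once covariates are high dimensional or outcomes are rare, which is the setting where the abstract's flexible regression methods are meant to help.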