tree construction algorithms: Topics by Science.gov

Sample records for tree construction algorithms

A new algorithm to construct phylogenetic networks from trees.

PubMed

Wang, J

2014-03-06

Developing appropriate methods for constructing phylogenetic networks from tree sets is an important problem, and much research is currently being undertaken in this area. BIMLR is an algorithm that constructs phylogenetic networks from tree sets. The algorithm can construct a much simpler network than other available methods. Here, we introduce an improved version of the BIMLR algorithm, QuickCass. QuickCass changes the selection strategy of the labels of leaves below the reticulate nodes, i.e., the nodes with an indegree of at least 2 in BIMLR. We show that QuickCass can construct simpler phylogenetic networks than BIMLR. Furthermore, we show that QuickCass is a polynomial-time algorithm when the output network that is constructed by QuickCass is binary.
Memory-Scalable GPU Spatial Hierarchy Construction.

PubMed

Qiming Hou; Xin Sun; Kun Zhou; Lauterbach, C; Manocha, D

2011-04-01

Recent GPU algorithms for constructing spatial hierarchies have achieved promising performance for moderately complex models by using the breadth-first search (BFS) construction order. While being able to exploit the massive parallelism on the GPU, the BFS order also consumes excessive GPU memory, which becomes a serious issue for interactive applications involving very complex models with more than a few million triangles. In this paper, we propose to use the partial breadth-first search (PBFS) construction order to control memory consumption while maximizing performance. We apply the PBFS order to two hierarchy construction algorithms. The first algorithm is for kd-trees that automatically balances between the level of parallelism and intermediate memory usage. With PBFS, peak memory consumption during construction can be efficiently controlled without costly CPU-GPU data transfer. We also develop memory allocation strategies to effectively limit memory fragmentation. The resulting algorithm scales well with GPU memory and constructs kd-trees of models with millions of triangles at interactive rates on GPUs with 1 GB memory. Compared with existing algorithms, our algorithm is an order of magnitude more scalable for a given GPU memory bound. The second algorithm is for out-of-core bounding volume hierarchy (BVH) construction for very large scenes based on the PBFS construction order. At each iteration, all constructed nodes are dumped to the CPU memory, and the GPU memory is freed for the next iteration's use. In this way, the algorithm is able to build trees that are too large to be stored in the GPU memory. Experiments show that our algorithm can construct BVHs for scenes with up to 20 M triangles, several times larger than previous GPU algorithms.
Finding Minimum-Power Broadcast Trees for Wireless Networks

NASA Technical Reports Server (NTRS)

Arabshahi, Payman; Gray, Andrew; Das, Arindam; El-Sharkawi, Mohamed; Marks, Robert, II

2004-01-01

Some algorithms have been devised for use in a method of constructing tree graphs that represent connections among the nodes of a wireless communication network. These algorithms provide for determining the viability of any given candidate connection tree and for generating an initial set of viable trees that can be used in any of a variety of search algorithms (e.g., a genetic algorithm) to find a tree that enables the network to broadcast from a source node to all other nodes while consuming the minimum amount of total power. The method yields solutions better than those of a prior algorithm known as the broadcast incremental power algorithm, albeit at a slightly greater computational cost.
Automatic creation of object hierarchies for ray tracing

NASA Technical Reports Server (NTRS)

Goldsmith, Jeffrey; Salmon, John

1987-01-01

Various methods for evaluating generated trees are proposed. The use of the hierarchical extent method of Rubin and Whitted (1980) to find the objects that will be hit by a ray is examined. This method employs tree searching; the construction of a tree of bounding volumes in order to determine the number of objects that will be hit by a ray is discussed. A tree generation algorithm, which uses a heuristic tree search strategy, is described. The effects of shuffling and sorting on the input data are investigated. The cost of inserting an object into the hierarchy during the construction of a tree algorithm is estimated. The steps involved in estimating the number of intersection calculations are presented.
Using traveling salesman problem algorithms for evolutionary tree construction.

PubMed

Korostensky, C; Gonnet, G H

2000-07-01

The construction of evolutionary trees is one of the major problems in computational biology, mainly due to its complexity. We present a new tree construction method that constructs a tree with minimum score for a given set of sequences, where the score is the amount of evolution measured in PAM distances. To do this, the problem of tree construction is reduced to the Traveling Salesman Problem (TSP). The input for the TSP algorithm are the pairwise distances of the sequences and the output is a circular tour through the optimal, unknown tree plus the minimum score of the tree. The circular order and the score can be used to construct the topology of the optimal tree. Our method can be used for any scoring function that correlates to the amount of changes along the branches of an evolutionary tree, for instance it could also be used for parsimony scores, but it cannot be used for least squares fit of distances. A TSP solution reduces the space of all possible trees to 2n. Using this order, we can guarantee that we reconstruct a correct evolutionary tree if the absolute value of the error for each distance measurement is smaller than f2.gif" BORDER="0">, where f3.gif" BORDER="0">is the length of the shortest edge in the tree. For data sets with large errors, a dynamic programming approach is used to reconstruct the tree. Finally simulations and experiments with real data are shown.
BIMLR: a method for constructing rooted phylogenetic networks from rooted phylogenetic trees.

PubMed

Wang, Juan; Guo, Maozu; Xing, Linlin; Che, Kai; Liu, Xiaoyan; Wang, Chunyu

2013-09-15

Rooted phylogenetic trees constructed from different datasets (e.g. from different genes) are often conflicting with one another, i.e. they cannot be integrated into a single phylogenetic tree. Phylogenetic networks have become an important tool in molecular evolution, and rooted phylogenetic networks are able to represent conflicting rooted phylogenetic trees. Hence, the development of appropriate methods to compute rooted phylogenetic networks from rooted phylogenetic trees has attracted considerable research interest of late. The CASS algorithm proposed by van Iersel et al. is able to construct much simpler networks than other available methods, but it is extremely slow, and the networks it constructs are dependent on the order of the input data. Here, we introduce an improved CASS algorithm, BIMLR. We show that BIMLR is faster than CASS and less dependent on the input data order. Moreover, BIMLR is able to construct much simpler networks than almost all other methods. BIMLR is available at http://nclab.hit.edu.cn/wangjuan/BIMLR/. © 2013 Elsevier B.V. All rights reserved.
Reconciliation of Gene and Species Trees

PubMed Central

Rusin, L. Y.; Lyubetskaya, E. V.; Gorbunov, K. Y.; Lyubetsky, V. A.

2014-01-01

The first part of the paper briefly overviews the problem of gene and species trees reconciliation with the focus on defining and algorithmic construction of the evolutionary scenario. Basic ideas are discussed for the aspects of mapping definitions, costs of the mapping and evolutionary scenario, imposing time scales on a scenario, incorporating horizontal gene transfers, binarization and reconciliation of polytomous trees, and construction of species trees and scenarios. The review does not intend to cover the vast diversity of literature published on these subjects. Instead, the authors strived to overview the problem of the evolutionary scenario as a central concept in many areas of evolutionary research. The second part provides detailed mathematical proofs for the solutions of two problems: (i) inferring a gene evolution along a species tree accounting for various types of evolutionary events and (ii) trees reconciliation into a single species tree when only gene duplications and losses are allowed. All proposed algorithms have a cubic time complexity and are mathematically proved to find exact solutions. Solving algorithms for problem (ii) can be naturally extended to incorporate horizontal transfers, other evolutionary events, and time scales on the species tree. PMID:24800245
OCTGRAV: Sparse Octree Gravitational N-body Code on Graphics Processing Units

NASA Astrophysics Data System (ADS)

Gaburov, Evghenii; Bédorf, Jeroen; Portegies Zwart, Simon

2010-10-01

Octgrav is a very fast tree-code which runs on massively parallel Graphical Processing Units (GPU) with NVIDIA CUDA architecture. The algorithms are based on parallel-scan and sort methods. The tree-construction and calculation of multipole moments is carried out on the host CPU, while the force calculation which consists of tree walks and evaluation of interaction list is carried out on the GPU. In this way, a sustained performance of about 100GFLOP/s and data transfer rates of about 50GB/s is achieved. It takes about a second to compute forces on a million particles with an opening angle of heta approx 0.5. To test the performance and feasibility, we implemented the algorithms in CUDA in the form of a gravitational tree-code which completely runs on the GPU. The tree construction and traverse algorithms are portable to many-core devices which have support for CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second. The code has a convenient user interface and is freely available for use.
Polynomial-Time Algorithms for Building a Consensus MUL-Tree

PubMed Central

Cui, Yun; Jansson, Jesper

2012-01-01

Abstract A multi-labeled phylogenetic tree, or MUL-tree, is a generalization of a phylogenetic tree that allows each leaf label to be used many times. MUL-trees have applications in biogeography, the study of host–parasite cospeciation, gene evolution studies, and computer science. Here, we consider the problem of inferring a consensus MUL-tree that summarizes a given set of conflicting MUL-trees, and present the first polynomial-time algorithms for solving it. In particular, we give a straightforward, fast algorithm for building a strict consensus MUL-tree for any input set of MUL-trees with identical leaf label multisets, as well as a polynomial-time algorithm for building a majority rule consensus MUL-tree for the special case where every leaf label occurs at most twice. We also show that, although it is NP-hard to find a majority rule consensus MUL-tree in general, the variant that we call the singular majority rule consensus MUL-tree can be constructed efficiently whenever it exists. PMID:22963134
Polynomial-time algorithms for building a consensus MUL-tree.

PubMed

Cui, Yun; Jansson, Jesper; Sung, Wing-Kin

2012-09-01

A multi-labeled phylogenetic tree, or MUL-tree, is a generalization of a phylogenetic tree that allows each leaf label to be used many times. MUL-trees have applications in biogeography, the study of host-parasite cospeciation, gene evolution studies, and computer science. Here, we consider the problem of inferring a consensus MUL-tree that summarizes a given set of conflicting MUL-trees, and present the first polynomial-time algorithms for solving it. In particular, we give a straightforward, fast algorithm for building a strict consensus MUL-tree for any input set of MUL-trees with identical leaf label multisets, as well as a polynomial-time algorithm for building a majority rule consensus MUL-tree for the special case where every leaf label occurs at most twice. We also show that, although it is NP-hard to find a majority rule consensus MUL-tree in general, the variant that we call the singular majority rule consensus MUL-tree can be constructed efficiently whenever it exists.
An Extension of CART's Pruning Algorithm. Program Statistics Research Technical Report No. 91-11.

ERIC Educational Resources Information Center

Kim, Sung-Ho

Among the computer-based methods used for the construction of trees such as AID, THAID, CART, and FACT, the only one that uses an algorithm that first grows a tree and then prunes the tree is CART. The pruning component of CART is analogous in spirit to the backward elimination approach in regression analysis. This idea provides a tool in…
Efficient FPT Algorithms for (Strict) Compatibility of Unrooted Phylogenetic Trees.

PubMed

Baste, Julien; Paul, Christophe; Sau, Ignasi; Scornavacca, Celine

2017-04-01

In phylogenetics, a central problem is to infer the evolutionary relationships between a set of species X; these relationships are often depicted via a phylogenetic tree-a tree having its leaves labeled bijectively by elements of X and without degree-2 nodes-called the "species tree." One common approach for reconstructing a species tree consists in first constructing several phylogenetic trees from primary data (e.g., DNA sequences originating from some species in X), and then constructing a single phylogenetic tree maximizing the "concordance" with the input trees. The obtained tree is our estimation of the species tree and, when the input trees are defined on overlapping-but not identical-sets of labels, is called "supertree." In this paper, we focus on two problems that are central when combining phylogenetic trees into a supertree: the compatibility and the strict compatibility problems for unrooted phylogenetic trees. These problems are strongly related, respectively, to the notions of "containing as a minor" and "containing as a topological minor" in the graph community. Both problems are known to be fixed parameter tractable in the number of input trees k, by using their expressibility in monadic second-order logic and a reduction to graphs of bounded treewidth. Motivated by the fact that the dependency on k of these algorithms is prohibitively large, we give the first explicit dynamic programming algorithms for solving these problems, both running in time [Formula: see text], where n is the total size of the input.
Tree tensor network approach to simulating Shor's algorithm

NASA Astrophysics Data System (ADS)

Dumitrescu, Eugene

2017-12-01

Constructively simulating quantum systems furthers our understanding of qualitative and quantitative features which may be analytically intractable. In this paper, we directly simulate and explore the entanglement structure present in the paradigmatic example for exponential quantum speedups: Shor's algorithm. To perform our simulation, we construct a dynamic tree tensor network which manifestly captures two salient circuit features for modular exponentiation. These are the natural two-register bipartition and the invariance of entanglement with respect to permutations of the top-register qubits. Our construction help identify the entanglement entropy properties, which we summarize by a scaling relation. Further, the tree network is efficiently projected onto a matrix product state from which we efficiently execute the quantum Fourier transform. Future simulation of quantum information states with tensor networks exploiting circuit symmetries is discussed.
Rare itemsets mining algorithm based on RP-Tree and spark framework

NASA Astrophysics Data System (ADS)

Liu, Sainan; Pan, Haoan

2018-05-01

For the issues of the rare itemsets mining in big data, this paper proposed a rare itemsets mining algorithm based on RP-Tree and Spark framework. Firstly, it arranged the data vertically according to the transaction identifier, in order to solve the defects of scan the entire data set, the vertical datasets are divided into frequent vertical datasets and rare vertical datasets. Then, it adopted the RP-Tree algorithm to construct the frequent pattern tree that contains rare items and generate rare 1-itemsets. After that, it calculated the support of the itemsets by scanning the two vertical data sets, finally, it used the iterative process to generate rare itemsets. The experimental show that the algorithm can effectively excavate rare itemsets and have great superiority in execution time.
Decision tree methods: applications for classification and prediction.

PubMed

Song, Yan-Yan; Lu, Ying

2015-04-25

Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets. Using the training dataset to build a decision tree model and a validation dataset to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms used to develop decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
Technology transfer by means of fault tree synthesis

NASA Astrophysics Data System (ADS)

Batzias, Dimitris F.

2012-12-01

Since Fault Tree Analysis (FTA) attempts to model and analyze failure processes of engineering, it forms a common technique for good industrial practice. On the contrary, fault tree synthesis (FTS) refers to the methodology of constructing complex trees either from dentritic modules built ad hoc or from fault tress already used and stored in a Knowledge Base. In both cases, technology transfer takes place in a quasi-inductive mode, from partial to holistic knowledge. In this work, an algorithmic procedure, including 9 activity steps and 3 decision nodes is developed for performing effectively this transfer when the fault under investigation occurs within one of the latter stages of an industrial procedure with several stages in series. The main parts of the algorithmic procedure are: (i) the construction of a local fault tree within the corresponding production stage, where the fault has been detected, (ii) the formation of an interface made of input faults that might occur upstream, (iii) the fuzzy (to count for uncertainty) multicriteria ranking of these faults according to their significance, and (iv) the synthesis of an extended fault tree based on the construction of part (i) and on the local fault tree of the first-ranked fault in part (iii). An implementation is presented, referring to 'uneven sealing of Al anodic film', thus proving the functionality of the developed methodology.
GIGA: a simple, efficient algorithm for gene tree inference in the genomic age

PubMed Central

2010-01-01

Background Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. Results We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. Conclusions GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. We compared trees produced by GIGA to those in the TreeFam database, and they were very similar in general, with most differences likely due to poor alignment quality. However, some remaining differences are algorithmic, and can be explained by the fact that GIGA tends to put a larger emphasis on minimizing gene duplication and deletion events. PMID:20534164
GIGA: a simple, efficient algorithm for gene tree inference in the genomic age.

PubMed

Thomas, Paul D

2010-06-09

Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. We compared trees produced by GIGA to those in the TreeFam database, and they were very similar in general, with most differences likely due to poor alignment quality. However, some remaining differences are algorithmic, and can be explained by the fact that GIGA tends to put a larger emphasis on minimizing gene duplication and deletion events.
The stopping rules for winsorized tree

NASA Astrophysics Data System (ADS)

Ch'ng, Chee Keong; Mahat, Nor Idayu

2017-11-01

Winsorized tree is a modified tree-based classifier that is able to investigate and to handle all outliers in all nodes along the process of constructing the tree. It overcomes the tedious process of constructing a classical tree where the splitting of branches and pruning go concurrently so that the constructed tree would not grow bushy. This mechanism is controlled by the proposed algorithm. In winsorized tree, data are screened for identifying outlier. If outlier is detected, the value is neutralized using winsorize approach. Both outlier identification and value neutralization are executed recursively in every node until predetermined stopping criterion is met. The aim of this paper is to search for significant stopping criterion to stop the tree from further splitting before overfitting. The result obtained from the conducted experiment on pima indian dataset proved that the node could produce the final successor nodes (leaves) when it has achieved the range of 70% in information gain.
Data-Parallel Algorithm for Contour Tree Construction

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sewell, Christopher Meyer; Ahrens, James Paul; Carr, Hamish

2017-01-19

The goal of this project is to develop algorithms for additional visualization and analysis filters in order to expand the functionality of the VTK-m toolkit to support less critical but commonly used operators.

An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees.

PubMed

Liang, Ying; Liao, Bo; Zhu, Wen

2017-01-01

Tumourigenesis is a mutation accumulation process, which is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution through genetic variation data. Copy number variation (CNV) is the major genetic marker of the genome with more genes, disease loci, and functional elements involved. Fluorescence in situ hybridization (FISH) accurately measures multiple gene copy number of hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic tree based on FISH platform. The topology analysis of tumor progression tree shows that the pathway of tumor subcell expansion varies greatly during different stages of tumor formation. And the classification experiment shows that tree-based features are better than data-based features in distinguishing tumor. The constructed phylogenetic trees have great performance in characterizing tumor development process, which outperforms other similar algorithms.
Live phylogeny with polytomies: Finding the most compact parsimonious trees.

PubMed

Papamichail, D; Huang, A; Kennedy, E; Ott, J-L; Miller, A; Papamichail, G

2017-08-01

Construction of phylogenetic trees has traditionally focused on binary trees where all species appear on leaves, a problem for which numerous efficient solutions have been developed. Certain application domains though, such as viral evolution and transmission, paleontology, linguistics, and phylogenetic stemmatics, often require phylogeny inference that involves placing input species on ancestral tree nodes (live phylogeny), and polytomies. These requirements, despite their prevalence, lead to computationally harder algorithmic solutions and have been sparsely examined in the literature to date. In this article we prove some unique properties of most parsimonious live phylogenetic trees with polytomies, and their mapping to traditional binary phylogenetic trees. We show that our problem reduces to finding the most compact parsimonious tree for n species, and describe a novel efficient algorithm to find such trees without resorting to exhaustive enumeration of all possible tree topologies. Copyright © 2017 Elsevier Ltd. All rights reserved.
Constructing high-quality bounding volume hierarchies for N-body computation using the acceptance volume heuristic

NASA Astrophysics Data System (ADS)

Olsson, O.

2018-01-01

We present a novel heuristic derived from a probabilistic cost model for approximate N-body simulations. We show that this new heuristic can be used to guide tree construction towards higher quality trees with improved performance over current N-body codes. This represents an important step beyond the current practice of using spatial partitioning for N-body simulations, and enables adoption of a range of state-of-the-art algorithms developed for computer graphics applications to yield further improvements in N-body simulation performance. We outline directions for further developments and review the most promising such algorithms.
SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction.

PubMed

Hagopian, Raffi; Davidson, John R; Datta, Ruchira S; Samad, Bushra; Jarvis, Glen R; Sjölander, Kimmen

2010-07-01

We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on a benchmark dataset of 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/.
Hierarchical Learning of Tree Classifiers for Large-Scale Plant Species Identification.

PubMed

Fan, Jianping; Zhou, Ning; Peng, Jinye; Gao, Ling

2015-11-01

In this paper, a hierarchical multi-task structural learning algorithm is developed to support large-scale plant species identification, where a visual tree is constructed for organizing large numbers of plant species in a coarse-to-fine fashion and determining the inter-related learning tasks automatically. For a given parent node on the visual tree, it contains a set of sibling coarse-grained categories of plant species or sibling fine-grained plant species, and a multi-task structural learning algorithm is developed to train their inter-related classifiers jointly for enhancing their discrimination power. The inter-level relationship constraint, e.g., a plant image must first be assigned to a parent node (high-level non-leaf node) correctly if it can further be assigned to the most relevant child node (low-level non-leaf node or leaf node) on the visual tree, is formally defined and leveraged to learn more discriminative tree classifiers over the visual tree. Our experimental results have demonstrated the effectiveness of our hierarchical multi-task structural learning algorithm on training more discriminative tree classifiers for large-scale plant species identification.
Automated rule-base creation via CLIPS-Induce

NASA Technical Reports Server (NTRS)

Murphy, Patrick M.

1994-01-01

Many CLIPS rule-bases contain one or more rule groups that perform classification. In this paper we describe CLIPS-Induce, an automated system for the creation of a CLIPS classification rule-base from a set of test cases. CLIPS-Induce consists of two components, a decision tree induction component and a CLIPS production extraction component. ID3, a popular decision tree induction algorithm, is used to induce a decision tree from the test cases. CLIPS production extraction is accomplished through a top-down traversal of the decision tree. Nodes of the tree are used to construct query rules, and branches of the tree are used to construct classification rules. The learned CLIPS productions may easily be incorporated into a large CLIPS system that perform tasks such as accessing a database or displaying information.
Fast Construction of Near Parsimonious Hybridization Networks for Multiple Phylogenetic Trees.

PubMed

Mirzaei, Sajad; Wu, Yufeng

2016-01-01

Hybridization networks represent plausible evolutionary histories of species that are affected by reticulate evolutionary processes. An established computational problem on hybridization networks is constructing the most parsimonious hybridization network such that each of the given phylogenetic trees (called gene trees) is "displayed" in the network. There have been several previous approaches, including an exact method and several heuristics, for this NP-hard problem. However, the exact method is only applicable to a limited range of data, and heuristic methods can be less accurate and also slow sometimes. In this paper, we develop a new algorithm for constructing near parsimonious networks for multiple binary gene trees. This method is more efficient for large numbers of gene trees than previous heuristics. This new method also produces more parsimonious results on many simulated datasets as well as a real biological dataset than a previous method. We also show that our method produces topologically more accurate networks for many datasets.
Learning accurate very fast decision trees from uncertain data streams

NASA Astrophysics Data System (ADS)

Liang, Chunquan; Zhang, Yang; Shi, Peng; Hu, Zhengguo

2015-12-01

Most existing works on data stream classification assume the streaming data is precise and definite. Such assumption, however, does not always hold in practice, since data uncertainty is ubiquitous in data stream applications due to imprecise measurement, missing values, privacy protection, etc. The goal of this paper is to learn accurate decision tree models from uncertain data streams for classification analysis. On the basis of very fast decision tree (VFDT) algorithms, we proposed an algorithm for constructing an uncertain VFDT tree with classifiers at tree leaves (uVFDTc). The uVFDTc algorithm can exploit uncertain information effectively and efficiently in both the learning and the classification phases. In the learning phase, it uses Hoeffding bound theory to learn from uncertain data streams and yield fast and reasonable decision trees. In the classification phase, at tree leaves it uses uncertain naive Bayes (UNB) classifiers to improve the classification performance. Experimental results on both synthetic and real-life datasets demonstrate the strong ability of uVFDTc to classify uncertain data streams. The use of UNB at tree leaves has improved the performance of uVFDTc, especially the any-time property, the benefit of exploiting uncertain information, and the robustness against uncertainty.
Indexing Volumetric Shapes with Matching and Packing

PubMed Central

Koes, David Ryan; Camacho, Carlos J.

2014-01-01

We describe a novel algorithm for bulk-loading an index with high-dimensional data and apply it to the problem of volumetric shape matching. Our matching and packing algorithm is a general approach for packing data according to a similarity metric. First an approximate k-nearest neighbor graph is constructed using vantage-point initialization, an improvement to previous work that decreases construction time while improving the quality of approximation. Then graph matching is iteratively performed to pack related items closely together. The end result is a dense index with good performance. We define a new query specification for shape matching that uses minimum and maximum shape constraints to explicitly specify the spatial requirements of the desired shape. This specification provides a natural language for performing volumetric shape matching and is readily supported by the geometry-based similarity search (GSS) tree, an indexing structure that maintains explicit representations of volumetric shape. We describe our implementation of a GSS tree for volumetric shape matching and provide a comprehensive evaluation of parameter sensitivity, performance, and scalability. Compared to previous bulk-loading algorithms, we find that matching and packing can construct a GSS-tree index in the same amount of time that is denser, flatter, and better performing, with an observed average performance improvement of 2X. PMID:26085707
Implementation of Data Mining to Analyze Drug Cases Using C4.5 Decision Tree

NASA Astrophysics Data System (ADS)

Wahyuni, Sri

2018-03-01

Data mining was the process of finding useful information from a large set of databases. One of the existing techniques in data mining was classification. The method used was decision tree method and algorithm used was C4.5 algorithm. The decision tree method was a method that transformed a very large fact into a decision tree which was presenting the rules. Decision tree method was useful for exploring data, as well as finding a hidden relationship between a number of potential input variables with a target variable. The decision tree of the C4.5 algorithm was constructed with several stages including the selection of attributes as roots, created a branch for each value and divided the case into the branch. These stages would be repeated for each branch until all the cases on the branch had the same class. From the solution of the decision tree there would be some rules of a case. In this case the researcher classified the data of prisoners at Labuhan Deli prison to know the factors of detainees committing criminal acts of drugs. By applying this C4.5 algorithm, then the knowledge was obtained as information to minimize the criminal acts of drugs. From the findings of the research, it was found that the most influential factor of the detainee committed the criminal act of drugs was from the address variable.
Binary tree eigen solver in finite element analysis

NASA Technical Reports Server (NTRS)

Akl, F. A.; Janetzke, D. C.; Kiraly, L. J.

1993-01-01

This paper presents a transputer-based binary tree eigensolver for the solution of the generalized eigenproblem in linear elastic finite element analysis. The algorithm is based on the method of recursive doubling, which parallel implementation of a number of associative operations on an arbitrary set having N elements is of the order of o(log2N), compared to (N-1) steps if implemented sequentially. The hardware used in the implementation of the binary tree consists of 32 transputers. The algorithm is written in OCCAM which is a high-level language developed with the transputers to address parallel programming constructs and to provide the communications between processors. The algorithm can be replicated to match the size of the binary tree transputer network. Parallel and sequential finite element analysis programs have been developed to solve for the set of the least-order eigenpairs using the modified subspace method. The speed-up obtained for a typical analysis problem indicates close agreement with the theoretical prediction given by the method of recursive doubling.
VCFtoTree: a user-friendly tool to construct locus-specific alignments and phylogenies from thousands of anthropologically relevant genome sequences.

PubMed

Xu, Duo; Jaber, Yousef; Pavlidis, Pavlos; Gokcumen, Omer

2017-09-26

Constructing alignments and phylogenies for a given locus from large genome sequencing studies with relevant outgroups allow novel evolutionary and anthropological insights. However, no user-friendly tool has been developed to integrate thousands of recently available and anthropologically relevant genome sequences to construct complete sequence alignments and phylogenies. Here, we provide VCFtoTree, a user friendly tool with a graphical user interface that directly accesses online databases to download, parse and analyze genome variation data for regions of interest. Our pipeline combines popular sequence datasets and tree building algorithms with custom data parsing to generate accurate alignments and phylogenies using all the individuals from the 1000 Genomes Project, Neanderthal and Denisovan genomes, as well as reference genomes of Chimpanzee and Rhesus Macaque. It can also be applied to other phased human genomes, as well as genomes from other species. The output of our pipeline includes an alignment in FASTA format and a tree file in newick format. VCFtoTree fulfills the increasing demand for constructing alignments and phylogenies for a given loci from thousands of available genomes. Our software provides a user friendly interface for a wider audience without prerequisite knowledge in programming. VCFtoTree can be accessed from https://github.com/duoduoo/VCFtoTree_3.0.0 .
Application of a fast skyline computation algorithm for serendipitous searching problems

NASA Astrophysics Data System (ADS)

Koizumi, Kenichi; Hiraki, Kei; Inaba, Mary

2018-02-01

Skyline computation is a method of extracting interesting entries from a large population with multiple attributes. These entries, called skyline or Pareto optimal entries, are known to have extreme characteristics that cannot be found by outlier detection methods. Skyline computation is an important task for characterizing large amounts of data and selecting interesting entries with extreme features. When the population changes dynamically, the task of calculating a sequence of skyline sets is called continuous skyline computation. This task is known to be difficult to perform for the following reasons: (1) information of non-skyline entries must be stored since they may join the skyline in the future; (2) the appearance or disappearance of even a single entry can change the skyline drastically; (3) it is difficult to adopt a geometric acceleration algorithm for skyline computation tasks with high-dimensional datasets. Our new algorithm called jointed rooted-tree (JR-tree) manages entries using a rooted tree structure. JR-tree delays extend the tree to deep levels to accelerate tree construction and traversal. In this study, we presented the difficulties in extracting entries tagged with a rare label in high-dimensional space and the potential of fast skyline computation in low-latency cell identification technology.
An efficient group multicast routing for multimedia communication

NASA Astrophysics Data System (ADS)

Wang, Yanlin; Sun, Yugen; Yan, Xinfang

2004-04-01

Group multicasting is a kind of communication mechanism whereby each member of a group sends messages to all the other members of the same group. Group multicast routing algorithms capable of satisfying quality of service (QoS) requirements of multimedia applications are essential for high-speed networks. We present a heuristic algorithm for group multicast routing with end to end delay constraint. Source-specific routing trees for each member are generated in our algorithm, which satisfy member"s bandwidth and end to end delay requirements. Simulations over random network were carried out to compare proposed algorithm performance with Low and Song"s. The experimental results show that our proposed algorithm performs better in terms of network cost and ability in constructing feasible multicast trees for group members. Moreover, our algorithm achieves good performance in balancing traffic, which can avoid link blocking and enhance the network behavior efficiently.
Multipoint to multipoint routing and wavelength assignment in multi-domain optical networks

NASA Astrophysics Data System (ADS)

Qin, Panke; Wu, Jingru; Li, Xudong; Tang, Yongli

2018-01-01

In multi-point to multi-point (MP2MP) routing and wavelength assignment (RWA) problems, researchers usually assume the optical networks to be a single domain. However, the optical networks develop toward to multi-domain and larger scale in practice. In this context, multi-core shared tree (MST)-based MP2MP RWA are introduced problems including optimal multicast domain sequence selection, core nodes belonging in which domains and so on. In this letter, we focus on MST-based MP2MP RWA problems in multi-domain optical networks, mixed integer linear programming (MILP) formulations to optimally construct MP2MP multicast trees is presented. A heuristic algorithm base on network virtualization and weighted clustering algorithm (NV-WCA) is proposed. Simulation results show that, under different traffic patterns, the proposed algorithm achieves significant improvement on network resources occupation and multicast trees setup latency in contrast with the conventional algorithms which were proposed base on a single domain network environment.
Phylogenetic Copy-Number Factorization of Multiple Tumor Samples.

PubMed

Zaccaria, Simone; El-Kebir, Mohammed; Klau, Gunnar W; Raphael, Benjamin J

2018-04-16

Cancer is an evolutionary process driven by somatic mutations. This process can be represented as a phylogenetic tree. Constructing such a phylogenetic tree from genome sequencing data is a challenging task due to the many types of mutations in cancer and the fact that nearly all cancer sequencing is of a bulk tumor, measuring a superposition of somatic mutations present in different cells. We study the problem of reconstructing tumor phylogenies from copy-number aberrations (CNAs) measured in bulk-sequencing data. We introduce the Copy-Number Tree Mixture Deconvolution (CNTMD) problem, which aims to find the phylogenetic tree with the fewest number of CNAs that explain the copy-number data from multiple samples of a tumor. We design an algorithm for solving the CNTMD problem and apply the algorithm to both simulated and real data. On simulated data, we find that our algorithm outperforms existing approaches that either perform deconvolution/factorization of mixed tumor samples or build phylogenetic trees assuming homogeneous tumor samples. On real data, we analyze multiple samples from a prostate cancer patient, identifying clones within these samples and a phylogenetic tree that relates these clones and their differing proportions across samples. This phylogenetic tree provides a higher resolution view of copy-number evolution of this cancer than published analyses.
A survival tree method for the analysis of discrete event times in clinical and epidemiological studies.

PubMed

Schmid, Matthias; Küchenhoff, Helmut; Hoerauf, Achim; Tutz, Gerhard

2016-02-28

Survival trees are a popular alternative to parametric survival modeling when there are interactions between the predictor variables or when the aim is to stratify patients into prognostic subgroups. A limitation of classical survival tree methodology is that most algorithms for tree construction are designed for continuous outcome variables. Hence, classical methods might not be appropriate if failure time data are measured on a discrete time scale (as is often the case in longitudinal studies where data are collected, e.g., quarterly or yearly). To address this issue, we develop a method for discrete survival tree construction. The proposed technique is based on the result that the likelihood of a discrete survival model is equivalent to the likelihood of a regression model for binary outcome data. Hence, we modify tree construction methods for binary outcomes such that they result in optimized partitions for the estimation of discrete hazard functions. By applying the proposed method to data from a randomized trial in patients with filarial lymphedema, we demonstrate how discrete survival trees can be used to identify clinically relevant patient groups with similar survival behavior. Copyright © 2015 John Wiley & Sons, Ltd.
Development of Gis Tool for the Solution of Minimum Spanning Tree Problem using Prim's Algorithm

NASA Astrophysics Data System (ADS)

Dutta, S.; Patra, D.; Shankar, H.; Alok Verma, P.

2014-11-01

minimum spanning tree (MST) of a connected, undirected and weighted network is a tree of that network consisting of all its nodes and the sum of weights of all its edges is minimum among all such possible spanning trees of the same network. In this study, we have developed a new GIS tool using most commonly known rudimentary algorithm called Prim's algorithm to construct the minimum spanning tree of a connected, undirected and weighted road network. This algorithm is based on the weight (adjacency) matrix of a weighted network and helps to solve complex network MST problem easily, efficiently and effectively. The selection of the appropriate algorithm is very essential otherwise it will be very hard to get an optimal result. In case of Road Transportation Network, it is very essential to find the optimal results by considering all the necessary points based on cost factor (time or distance). This paper is based on solving the Minimum Spanning Tree (MST) problem of a road network by finding it's minimum span by considering all the important network junction point. GIS technology is usually used to solve the network related problems like the optimal path problem, travelling salesman problem, vehicle routing problems, location-allocation problems etc. Therefore, in this study we have developed a customized GIS tool using Python script in ArcGIS software for the solution of MST problem for a Road Transportation Network of Dehradun city by considering distance and time as the impedance (cost) factors. It has a number of advantages like the users do not need a greater knowledge of the subject as the tool is user-friendly and that allows to access information varied and adapted the needs of the users. This GIS tool for MST can be applied for a nationwide plan called Prime Minister Gram Sadak Yojana in India to provide optimal all weather road connectivity to unconnected villages (points). This tool is also useful for constructing highways or railways spanning several cities optimally or connecting all cities with minimum total road length.
Resolution of the 1D regularized Burgers equation using a spatial wavelet approximation

NASA Technical Reports Server (NTRS)

Liandrat, J.; Tchamitchian, PH.

1990-01-01

The Burgers equation with a small viscosity term, initial and periodic boundary conditions is resolved using a spatial approximation constructed from an orthonormal basis of wavelets. The algorithm is directly derived from the notions of multiresolution analysis and tree algorithms. Before the numerical algorithm is described these notions are first recalled. The method uses extensively the localization properties of the wavelets in the physical and Fourier spaces. Moreover, the authors take advantage of the fact that the involved linear operators have constant coefficients. Finally, the algorithm can be considered as a time marching version of the tree algorithm. The most important point is that an adaptive version of the algorithm exists: it allows one to reduce in a significant way the number of degrees of freedom required for a good computation of the solution. Numerical results and description of the different elements of the algorithm are provided in combination with different mathematical comments on the method and some comparison with more classical numerical algorithms.
Adaptive Broadcasting Mechanism for Bandwidth Allocation in Mobile Services

PubMed Central

Horng, Gwo-Jiun; Wang, Chi-Hsuan; Chou, Chih-Lun

2014-01-01

This paper proposes a tree-based adaptive broadcasting (TAB) algorithm for data dissemination to improve data access efficiency. The proposed TAB algorithm first constructs a broadcast tree to determine the broadcast frequency of each data and splits the broadcast tree into some broadcast wood to generate the broadcast program. In addition, this paper develops an analytical model to derive the mean access latency of the generated broadcast program. In light of the derived results, both the index channel's bandwidth and the data channel's bandwidth can be optimally allocated to maximize bandwidth utilization. This paper presents experiments to help evaluate the effectiveness of the proposed strategy. From the experimental results, it can be seen that the proposed mechanism is feasible in practice. PMID:25057509

Numerical taxonomy on data: Experimental results

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cohen, J.; Farach, M.

1997-12-01

The numerical taxonomy problems associated with most of the optimization criteria described above are NP - hard [3, 5, 1, 4]. In, the first positive result for numerical taxonomy was presented. They showed that if e is the distance to the closest tree metric under the L{sub {infinity}} norm. i.e., e = min{sub T} [L{sub {infinity}} (T-D)], then it is possible to construct a tree T such that L{sub {infinity}} (T-D) {le} 3e, that is, they gave a 3-approximation algorithm for this problem. We will refer to this algorithm as the Single Pivot (SP) heuristic.
A fast algorithm for identifying friends-of-friends halos

NASA Astrophysics Data System (ADS)

Feng, Y.; Modi, C.

2017-07-01

We describe a simple and fast algorithm for identifying friends-of-friends features and prove its correctness. The algorithm avoids unnecessary expensive neighbor queries, uses minimal memory overhead, and rejects slowdown in high over-density regions. We define our algorithm formally based on pair enumeration, a problem that has been heavily studied in fast 2-point correlation codes and our reference implementation employs a dual KD-tree correlation function code. We construct features in a hierarchical tree structure, and use a splay operation to reduce the average cost of identifying the root of a feature from O [ log L ] to O [ 1 ] (L is the size of a feature) without additional memory costs. This reduces the overall time complexity of merging trees from O [ L log L ] to O [ L ] , reducing the number of operations per splay by orders of magnitude. We next introduce a pruning operation that skips merge operations between two fully self-connected KD-tree nodes. This improves the robustness of the algorithm, reducing the number of merge operations in high density peaks from O [δ2 ] to O [ δ ] . We show that for cosmological data set the algorithm eliminates more than half of merge operations for typically used linking lengths b ∼ 0 . 2 (relative to mean separation). Furthermore, our algorithm is extremely simple and easy to implement on top of an existing pair enumeration code, reusing the optimization effort that has been invested in fast correlation function codes.
a Gross Error Elimination Method for Point Cloud Data Based on Kd-Tree

NASA Astrophysics Data System (ADS)

Kang, Q.; Huang, G.; Yang, S.

2018-04-01

Point cloud data has been one type of widely used data sources in the field of remote sensing. Key steps of point cloud data's pro-processing focus on gross error elimination and quality control. Owing to the volume feature of point could data, existed gross error elimination methods need spend massive memory both in space and time. This paper employed a new method which based on Kd-tree algorithm to construct, k-nearest neighbor algorithm to search, settled appropriate threshold to determine with result turns out a judgement that whether target point is or not an outlier. Experimental results show that, our proposed algorithm will help to delete gross error in point cloud data and facilitate to decrease memory consumption, improve efficiency.
Automated construction of arterial and venous trees in retinal images.

PubMed

Hu, Qiao; Abràmoff, Michael D; Garvin, Mona K

2015-10-01

While many approaches exist to segment retinal vessels in fundus photographs, only a limited number focus on the construction and disambiguation of arterial and venous trees. Previous approaches are local and/or greedy in nature, making them susceptible to errors or limiting their applicability to large vessels. We propose a more global framework to generate arteriovenous trees in retinal images, given a vessel segmentation. In particular, our approach consists of three stages. The first stage is to generate an overconnected vessel network, named the vessel potential connectivity map (VPCM), consisting of vessel segments and the potential connectivity between them. The second stage is to disambiguate the VPCM into multiple anatomical trees, using a graph-based metaheuristic algorithm. The third stage is to classify these trees into arterial or venous (A/V) trees. We evaluated our approach with a ground truth built based on a public database, showing a pixel-wise classification accuracy of 88.15% using a manual vessel segmentation as input, and 86.11% using an automatic vessel segmentation as input.
Blood oxygen level dependent magnetic resonance imaging for detecting pathological patterns in lupus nephritis patients: a preliminary study using a decision tree model.

PubMed

Shi, Huilan; Jia, Junya; Li, Dong; Wei, Li; Shang, Wenya; Zheng, Zhenfeng

2018-02-09

Precise renal histopathological diagnosis will guide therapy strategy in patients with lupus nephritis. Blood oxygen level dependent (BOLD) magnetic resonance imaging (MRI) has been applicable noninvasive technique in renal disease. This current study was performed to explore whether BOLD MRI could contribute to diagnose renal pathological pattern. Adult patients with lupus nephritis renal pathological diagnosis were recruited for this study. Renal biopsy tissues were assessed based on the lupus nephritis ISN/RPS 2003 classification. The Blood oxygen level dependent magnetic resonance imaging (BOLD-MRI) was used to obtain functional magnetic resonance parameter, R2* values. Several functions of R2* values were calculated and used to construct algorithmic models for renal pathological patterns. In addition, the algorithmic models were compared as to their diagnostic capability. Both Histopathology and BOLD MRI were used to examine a total of twelve patients. Renal pathological patterns included five classes III (including 3 as class III + V) and seven classes IV (including 4 as class IV + V). Three algorithmic models, including decision tree, line discriminant, and logistic regression, were constructed to distinguish the renal pathological pattern of class III and class IV. The sensitivity of the decision tree model was better than that of the line discriminant model (71.87% vs 59.48%, P < 0.001) and inferior to that of the Logistic regression model (71.87% vs 78.71%, P < 0.001). The specificity of decision tree model was equivalent to that of the line discriminant model (63.87% vs 63.73%, P = 0.939) and higher than that of the logistic regression model (63.87% vs 38.0%, P < 0.001). The Area under the ROC curve (AUROCC) of the decision tree model was greater than that of the line discriminant model (0.765 vs 0.629, P < 0.001) and logistic regression model (0.765 vs 0.662, P < 0.001). BOLD MRI is a useful non-invasive imaging technique for the evaluation of lupus nephritis. Decision tree models constructed using functions of R2* values may facilitate the prediction of renal pathological patterns.
Evolving optimised decision rules for intrusion detection using particle swarm paradigm

NASA Astrophysics Data System (ADS)

Sivatha Sindhu, Siva S.; Geetha, S.; Kannan, A.

2012-12-01

The aim of this article is to construct a practical intrusion detection system (IDS) that properly analyses the statistics of network traffic pattern and classify them as normal or anomalous class. The objective of this article is to prove that the choice of effective network traffic features and a proficient machine-learning paradigm enhances the detection accuracy of IDS. In this article, a rule-based approach with a family of six decision tree classifiers, namely Decision Stump, C4.5, Naive Baye's Tree, Random Forest, Random Tree and Representative Tree model to perform the detection of anomalous network pattern is introduced. In particular, the proposed swarm optimisation-based approach selects instances that compose training set and optimised decision tree operate over this trained set producing classification rules with improved coverage, classification capability and generalisation ability. Experiment with the Knowledge Discovery and Data mining (KDD) data set which have information on traffic pattern, during normal and intrusive behaviour shows that the proposed algorithm produces optimised decision rules and outperforms other machine-learning algorithm.
Association between split selection instability and predictive error in survival trees.

PubMed

Radespiel-Tröger, M; Gefeller, O; Rabenstein, T; Hothorn, T

2006-01-01

To evaluate split selection instability in six survival tree algorithms and its relationship with predictive error by means of a bootstrap study. We study the following algorithms: logrank statistic with multivariate p-value adjustment without pruning (LR), Kaplan-Meier distance of survival curves (KM), martingale residuals (MR), Poisson regression for censored data (PR), within-node impurity (WI), and exponential log-likelihood loss (XL). With the exception of LR, initial trees are pruned by using split-complexity, and final trees are selected by means of cross-validation. We employ a real dataset from a clinical study of patients with gallbladder stones. The predictive error is evaluated using the integrated Brier score for censored data. The relationship between split selection instability and predictive error is evaluated by means of box-percentile plots, covariate and cutpoint selection entropy, and cutpoint selection coefficients of variation, respectively, in the root node. We found a positive association between covariate selection instability and predictive error in the root node. LR yields the lowest predictive error, while KM and MR yield the highest predictive error. The predictive error of survival trees is related to split selection instability. Based on the low predictive error of LR, we recommend the use of this algorithm for the construction of survival trees. Unpruned survival trees with multivariate p-value adjustment can perform equally well compared to pruned trees. The analysis of split selection instability can be used to communicate the results of tree-based analyses to clinicians and to support the application of survival trees.
Outdoor Illegal Construction Identification Algorithm Based on 3D Point Cloud Segmentation

NASA Astrophysics Data System (ADS)

An, Lu; Guo, Baolong

2018-03-01

Recently, various illegal constructions occur significantly in our surroundings, which seriously restrict the orderly development of urban modernization. The 3D point cloud data technology is used to identify the illegal buildings, which could address the problem above effectively. This paper proposes an outdoor illegal construction identification algorithm based on 3D point cloud segmentation. Initially, in order to save memory space and reduce processing time, a lossless point cloud compression method based on minimum spanning tree is proposed. Then, a ground point removing method based on the multi-scale filtering is introduced to increase accuracy. Finally, building clusters on the ground can be obtained using a region growing method, as a result, the illegal construction can be marked. The effectiveness of the proposed algorithm is verified using a publicly data set collected from the International Society for Photogrammetry and Remote Sensing (ISPRS).
Toward a Better Compression for DNA Sequences Using Huffman Encoding

PubMed Central

Almarri, Badar; Al Yami, Sultan; Huang, Chun-Hsi

2017-01-01

Abstract Due to the significant amount of DNA data that are being generated by next-generation sequencing machines for genomes of lengths ranging from megabases to gigabases, there is an increasing need to compress such data to a less space and a faster transmission. Different implementations of Huffman encoding incorporating the characteristics of DNA sequences prove to better compress DNA data. These implementations center on the concepts of selecting frequent repeats so as to force a skewed Huffman tree, as well as the construction of multiple Huffman trees when encoding. The implementations demonstrate improvements on the compression ratios for five genomes with lengths ranging from 5 to 50 Mbp, compared with the standard Huffman tree algorithm. The research hence suggests an improvement on all such DNA sequence compression algorithms that use the conventional Huffman encoding. The research suggests an improvement on all DNA sequence compression algorithms that use the conventional Huffman encoding. Accompanying software is publicly available (AL-Okaily, 2016). PMID:27960065
Toward a Better Compression for DNA Sequences Using Huffman Encoding.

PubMed

Al-Okaily, Anas; Almarri, Badar; Al Yami, Sultan; Huang, Chun-Hsi

2017-04-01

Due to the significant amount of DNA data that are being generated by next-generation sequencing machines for genomes of lengths ranging from megabases to gigabases, there is an increasing need to compress such data to a less space and a faster transmission. Different implementations of Huffman encoding incorporating the characteristics of DNA sequences prove to better compress DNA data. These implementations center on the concepts of selecting frequent repeats so as to force a skewed Huffman tree, as well as the construction of multiple Huffman trees when encoding. The implementations demonstrate improvements on the compression ratios for five genomes with lengths ranging from 5 to 50 Mbp, compared with the standard Huffman tree algorithm. The research hence suggests an improvement on all such DNA sequence compression algorithms that use the conventional Huffman encoding. The research suggests an improvement on all DNA sequence compression algorithms that use the conventional Huffman encoding. Accompanying software is publicly available (AL-Okaily, 2016 ).
Subgrouping Automata: automatic sequence subgrouping using phylogenetic tree-based optimum subgrouping algorithm.

PubMed

Seo, Joo-Hyun; Park, Jihyang; Kim, Eun-Mi; Kim, Juhan; Joo, Keehyoung; Lee, Jooyoung; Kim, Byung-Gee

2014-02-01

Sequence subgrouping for a given sequence set can enable various informative tasks such as the functional discrimination of sequence subsets and the functional inference of unknown sequences. Because an identity threshold for sequence subgrouping may vary according to the given sequence set, it is highly desirable to construct a robust subgrouping algorithm which automatically identifies an optimal identity threshold and generates subgroups for a given sequence set. To meet this end, an automatic sequence subgrouping method, named 'Subgrouping Automata' was constructed. Firstly, tree analysis module analyzes the structure of tree and calculates the all possible subgroups in each node. Sequence similarity analysis module calculates average sequence similarity for all subgroups in each node. Representative sequence generation module finds a representative sequence using profile analysis and self-scoring for each subgroup. For all nodes, average sequence similarities are calculated and 'Subgrouping Automata' searches a node showing statistically maximum sequence similarity increase using Student's t-value. A node showing the maximum t-value, which gives the most significant differences in average sequence similarity between two adjacent nodes, is determined as an optimum subgrouping node in the phylogenetic tree. Further analysis showed that the optimum subgrouping node from SA prevents under-subgrouping and over-subgrouping. Copyright © 2013. Published by Elsevier Ltd.
Comparing Phylogenetic Trees by Matching Nodes Using the Transfer Distance Between Partitions.

PubMed

Bogdanowicz, Damian; Giaro, Krzysztof

2017-05-01

Ability to quantify dissimilarity of different phylogenetic trees describing the relationship between the same group of taxa is required in various types of phylogenetic studies. For example, such metrics are used to assess the quality of phylogeny construction methods, to define optimization criteria in supertree building algorithms, or to find horizontal gene transfer (HGT) events. Among the set of metrics described so far in the literature, the most commonly used seems to be the Robinson-Foulds distance. In this article, we define a new metric for rooted trees-the Matching Pair (MP) distance. The MP metric uses the concept of the minimum-weight perfect matching in a complete bipartite graph constructed from partitions of all pairs of leaves of the compared phylogenetic trees. We analyze the properties of the MP metric and present computational experiments showing its potential applicability in tasks related to finding the HGT events.
Detecting non-orthology in the COGs database and other approaches grouping orthologs using genome-specific best hits.

PubMed

Dessimoz, Christophe; Boeckmann, Brigitte; Roth, Alexander C J; Gonnet, Gaston H

2006-01-01

Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings.
Improving estimation of tree carbon stocks by harvesting aboveground woody biomass within airborne LiDAR flight areas

NASA Astrophysics Data System (ADS)

Colgan, M.; Asner, G. P.; Swemmer, A. M.

2011-12-01

The accurate estimation of carbon stored in a tree is essential to accounting for the carbon emissions due to deforestation and degradation. Airborne LiDAR (Light Detection and Ranging) has been successful in estimating aboveground carbon density (ACD) by correlating airborne metrics, such as canopy height, to field-estimated biomass. This latter step is reliant on field allometry which is applied to forest inventory quantities, such as stem diameter and height, to predict the biomass of a given tree stem. Constructing such allometry is expensive, time consuming, and requires destructive sampling. Consequently, the sample sizes used to construct such allometry are often small, and the largest tree sampled is often much smaller than the largest in the forest population. The uncertainty resulting from these sampling errors can lead to severe biases when the allometry is applied to stems larger than those harvested to construct the allometry, which is then subsequently propagated to airborne ACD estimates. The Kruger National Park (KNP) mission of maintaining biodiversity coincides with preserving ecosystem carbon stocks. However, one hurdle to accurately quantifying carbon density in savannas is that small stems are typically harvested to construct woody biomass allometry, yet they are not representative of Kruger's distribution of biomass. Consequently, these equations inadequately capture large tree variation in sapwood/hardwood composition, root/shoot/leaf allocation, branch fall, and stem rot. This study eliminates the "middleman" of field allometry by directly measuring, or harvesting, tree biomass within the extent of airborne LiDAR. This enables comparisons of field and airborne ACD estimates, and also enables creation of new airborne algorithms to estimate biomass at the scale of individual trees. A field campaign was conducted at Pompey Silica Mine 5km outside Kruger National Park, South Africa, in Mar-Aug 2010 to harvest and weigh tree mass. Since harvesting of trees is not possible within KNP, this was a unique opportunity to fell trees already scheduled to be cleared for mining operations. The area was first flown by the Carnegie Airborne Observatory in early May, prior to harvest, to enable correlation of LiDAR-measured tree height and crown diameter to harvested tree mass. Results include over 4,000 harvested stems and 13 species-specific biomass equations, including seven Kruger woody species previously without allometry. We found existing biomass stem allometry over-estimates ACD in the field, whereas airborne estimates based on harvest data avoid this bias while maintaining similar precision to field-based estimates. Lastly, a new airborne algorithm estimating biomass at the tree-level reduced error from tree canopies "leaning" into field plots but whose stems are outside plot boundaries. These advances pave the way to better understanding of savanna and forest carbon density at landscape and regional scales.
Reducing process delays for real-time earthquake parameter estimation - An application of KD tree to large databases for Earthquake Early Warning

NASA Astrophysics Data System (ADS)

Yin, Lucy; Andrews, Jennifer; Heaton, Thomas

2018-05-01

Earthquake parameter estimations using nearest neighbor searching among a large database of observations can lead to reliable prediction results. However, in the real-time application of Earthquake Early Warning (EEW) systems, the accurate prediction using a large database is penalized by a significant delay in the processing time. We propose to use a multidimensional binary search tree (KD tree) data structure to organize large seismic databases to reduce the processing time in nearest neighbor search for predictions. We evaluated the performance of KD tree on the Gutenberg Algorithm, a database-searching algorithm for EEW. We constructed an offline test to predict peak ground motions using a database with feature sets of waveform filter-bank characteristics, and compare the results with the observed seismic parameters. We concluded that large database provides more accurate predictions of the ground motion information, such as peak ground acceleration, velocity, and displacement (PGA, PGV, PGD), than source parameters, such as hypocenter distance. Application of the KD tree search to organize the database reduced the average searching process by 85% time cost of the exhaustive method, allowing the method to be feasible for real-time implementation. The algorithm is straightforward and the results will reduce the overall time of warning delivery for EEW.
Application of decision tree model for the ground subsidence hazard mapping near abandoned underground coal mines.

PubMed

Lee, Saro; Park, Inhye

2013-09-30

Subsidence of ground caused by underground mines poses hazards to human life and property. This study analyzed the hazard to ground subsidence using factors that can affect ground subsidence and a decision tree approach in a geographic information system (GIS). The study area was Taebaek, Gangwon-do, Korea, where many abandoned underground coal mines exist. Spatial data, topography, geology, and various ground-engineering data for the subsidence area were collected and compiled in a database for mapping ground-subsidence hazard (GSH). The subsidence area was randomly split 50/50 for training and validation of the models. A data-mining classification technique was applied to the GSH mapping, and decision trees were constructed using the chi-squared automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. The frequency ratio model was also applied to the GSH mapping for comparing with probabilistic model. The resulting GSH maps were validated using area-under-the-curve (AUC) analysis with the subsidence area data that had not been used for training the model. The highest accuracy was achieved by the decision tree model using CHAID algorithm (94.01%) comparing with QUEST algorithms (90.37%) and frequency ratio model (86.70%). These accuracies are higher than previously reported results for decision tree. Decision tree methods can therefore be used efficiently for GSH analysis and might be widely used for prediction of various spatial events. Copyright © 2013. Published by Elsevier Ltd.
An Isometric Mapping Based Co-Location Decision Tree Algorithm

NASA Astrophysics Data System (ADS)

Zhou, G.; Wei, J.; Zhou, X.; Zhang, R.; Huang, W.; Sha, H.; Chen, J.

2018-05-01

Decision tree (DT) induction has been widely used in different pattern classification. However, most traditional DTs have the disadvantage that they consider only non-spatial attributes (ie, spectral information) as a result of classifying pixels, which can result in objects being misclassified. Therefore, some researchers have proposed a co-location decision tree (Cl-DT) method, which combines co-location and decision tree to solve the above the above-mentioned traditional decision tree problems. Cl-DT overcomes the shortcomings of the existing DT algorithms, which create a node for each value of a given attribute, which has a higher accuracy than the existing decision tree approach. However, for non-linearly distributed data instances, the euclidean distance between instances does not reflect the true positional relationship between them. In order to overcome these shortcomings, this paper proposes an isometric mapping method based on Cl-DT (called, (Isomap-based Cl-DT), which is a method that combines heterogeneous and Cl-DT together. Because isometric mapping methods use geodetic distances instead of Euclidean distances between non-linearly distributed instances, the true distance between instances can be reflected. The experimental results and several comparative analyzes show that: (1) The extraction method of exposed carbonate rocks is of high accuracy. (2) The proposed method has many advantages, because the total number of nodes, the number of leaf nodes and the number of nodes are greatly reduced compared to Cl-DT. Therefore, the Isomap -based Cl-DT algorithm can construct a more accurate and faster decision tree.
Arterial tree tracking from anatomical landmarks in magnetic resonance angiography scans

NASA Astrophysics Data System (ADS)

O'Neil, Alison; Beveridge, Erin; Houston, Graeme; McCormick, Lynne; Poole, Ian

2014-03-01

This paper reports on arterial tree tracking in fourteen Contrast Enhanced MRA volumetric scans, given the positions of a predefined set of vascular landmarks, by using the A* algorithm to find the optimal path for each vessel based on voxel intensity and a learnt vascular probability atlas. The algorithm is intended for use in conjunction with an automatic landmark detection step, to enable fully automatic arterial tree tracking. The scan is filtered to give two further images using the top-hat transform with 4mm and 8mm cubic structuring elements. Vessels are then tracked independently on the scan in which the vessel of interest is best enhanced, as determined from knowledge of typical vessel diameter and surrounding structures. A vascular probability atlas modelling expected vessel location and orientation is constructed by non-rigidly registering the training scans to the test scan using a 3D thin plate spline to match landmark correspondences, and employing kernel density estimation with the ground truth center line points to form a probability density distribution. Threshold estimation by histogram analysis is used to segment background from vessel intensities. The A* algorithm is run using a linear cost function constructed from the threshold and the vascular atlas prior. Tracking results are presented for all major arteries excluding those in the upper limbs. An improvement was observed when tracking was informed by contextual information, with particular benefit for peripheral vessels.
Automated construction of arterial and venous trees in retinal images

PubMed Central

Hu, Qiao; Abràmoff, Michael D.; Garvin, Mona K.

2015-01-01

Abstract. While many approaches exist to segment retinal vessels in fundus photographs, only a limited number focus on the construction and disambiguation of arterial and venous trees. Previous approaches are local and/or greedy in nature, making them susceptible to errors or limiting their applicability to large vessels. We propose a more global framework to generate arteriovenous trees in retinal images, given a vessel segmentation. In particular, our approach consists of three stages. The first stage is to generate an overconnected vessel network, named the vessel potential connectivity map (VPCM), consisting of vessel segments and the potential connectivity between them. The second stage is to disambiguate the VPCM into multiple anatomical trees, using a graph-based metaheuristic algorithm. The third stage is to classify these trees into arterial or venous (A/V) trees. We evaluated our approach with a ground truth built based on a public database, showing a pixel-wise classification accuracy of 88.15% using a manual vessel segmentation as input, and 86.11% using an automatic vessel segmentation as input. PMID:26636114
Extensions and applications of ensemble-of-trees methods in machine learning

NASA Astrophysics Data System (ADS)

Bleich, Justin

Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate its superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of violence during probation hearings in court systems.

Efficient enumeration of monocyclic chemical graphs with given path frequencies

PubMed Central

2014-01-01

Background The enumeration of chemical graphs (molecular graphs) satisfying given constraints is one of the fundamental problems in chemoinformatics and bioinformatics because it leads to a variety of useful applications including structure determination and development of novel chemical compounds. Results We consider the problem of enumerating chemical graphs with monocyclic structure (a graph structure that contains exactly one cycle) from a given set of feature vectors, where a feature vector represents the frequency of the prescribed paths in a chemical compound to be constructed and the set is specified by a pair of upper and lower feature vectors. To enumerate all tree-like (acyclic) chemical graphs from a given set of feature vectors, Shimizu et al. and Suzuki et al. proposed efficient branch-and-bound algorithms based on a fast tree enumeration algorithm. In this study, we devise a novel method for extending these algorithms to enumeration of chemical graphs with monocyclic structure by designing a fast algorithm for testing uniqueness. The results of computational experiments reveal that the computational efficiency of the new algorithm is as good as those for enumeration of tree-like chemical compounds. Conclusions We succeed in expanding the class of chemical graphs that are able to be enumerated efficiently. PMID:24955135
Treecode with a Special-Purpose Processor

NASA Astrophysics Data System (ADS)

Makino, Junichiro

1991-08-01

We describe an implementation of the modified Barnes-Hut tree algorithm for a gravitational N-body calculation on a GRAPE (GRAvity PipE) backend processor. GRAPE is a special-purpose computer for N-body calculations. It receives the positions and masses of particles from a host computer and then calculates the gravitational force at each coordinate specified by the host. To use this GRAPE processor with the hierarchical tree algorithm, the host computer must maintain a list of all nodes that exert force on a particle. If we create this list for each particle of the system at each timestep, the number of floating-point operations on the host and that on GRAPE would become comparable, and the increased speed obtained by using GRAPE would be small. In our modified algorithm, we create a list of nodes for many particles. Thus, the amount of the work required of the host is significantly reduced. This algorithm was originally developed by Barnes in order to vectorize the force calculation on a Cyber 205. With this algorithm, the computing time of the force calculation becomes comparable to that of the tree construction, if the GRAPE backend processor is sufficiently fast. The obtained speed-up factor is 30 to 50 for a RISC-based host computer and GRAPE-1A with a peak speed of 240 Mflops.
On the Effective Construction of Compactly Supported Wavelets Satisfying Homogenous Boundary Conditions on the Interval

NASA Technical Reports Server (NTRS)

Chiavassa, G.; Liandrat, J.

1996-01-01

We construct compactly supported wavelet bases satisfying homogeneous boundary conditions on the interval (0,1). The maximum features of multiresolution analysis on the line are retained, including polynomial approximation and tree algorithms. The case of H(sub 0)(sup 1)(0, 1)is detailed, and numerical values, required for the implementation, are provided for the Neumann and Dirichlet boundary conditions.
Algorithm of composing the schedule of construction and installation works

NASA Astrophysics Data System (ADS)

Nehaj, Rustam; Molotkov, Georgij; Rudchenko, Ivan; Grinev, Anatolij; Sekisov, Aleksandr

2017-10-01

An algorithm for scheduling works is developed, in which the priority of the work corresponds to the total weight of the subordinate works, the vertices of the graph, and it is proved that for graphs of the tree type the algorithm is optimal. An algorithm is synthesized to reduce the search for solutions when drawing up schedules of construction and installation works, allocating a subset with the optimal solution of the problem of the minimum power, which is determined by the structure of its initial data and numerical values. An algorithm for scheduling construction and installation work is developed, taking into account the schedule for the movement of brigades, which is characterized by the possibility to efficiently calculate the values of minimizing the time of work performance by the parameters of organizational and technological reliability through the use of the branch and boundary method. The program of the computational algorithm was compiled in the MatLAB-2008 program. For the initial data of the matrix, random numbers were taken, uniformly distributed in the range from 1 to 100. It takes 0.5; 2.5; 7.5; 27 minutes to solve the problem. Thus, the proposed method for estimating the lower boundary of the solution is sufficiently accurate and allows efficient solution of the minimax task of scheduling construction and installation works.
The Extrapolation of Elementary Sequences

NASA Technical Reports Server (NTRS)

Laird, Philip; Saul, Ronald

1992-01-01

We study sequence extrapolation as a stream-learning problem. Input examples are a stream of data elements of the same type (integers, strings, etc.), and the problem is to construct a hypothesis that both explains the observed sequence of examples and extrapolates the rest of the stream. A primary objective -- and one that distinguishes this work from previous extrapolation algorithms -- is that the same algorithm be able to extrapolate sequences over a variety of different types, including integers, strings, and trees. We define a generous family of constructive data types, and define as our learning bias a stream language called elementary stream descriptions. We then give an algorithm that extrapolates elementary descriptions over constructive datatypes and prove that it learns correctly. For freely-generated types, we prove a polynomial time bound on descriptions of bounded complexity. An especially interesting feature of this work is the ability to provide quantitative measures of confidence in competing hypotheses, using a Bayesian model of prediction.
Multi-level tree analysis of pulmonary artery/vein trees in non-contrast CT images

NASA Astrophysics Data System (ADS)

Gao, Zhiyun; Grout, Randall W.; Hoffman, Eric A.; Saha, Punam K.

2012-02-01

Diseases like pulmonary embolism and pulmonary hypertension are associated with vascular dystrophy. Identifying such pulmonary artery/vein (A/V) tree dystrophy in terms of quantitative measures via CT imaging significantly facilitates early detection of disease or a treatment monitoring process. A tree structure, consisting of nodes and connected arcs, linked to the volumetric representation allows multi-level geometric and volumetric analysis of A/V trees. Here, a new theory and method is presented to generate multi-level A/V tree representation of volumetric data and to compute quantitative measures of A/V tree geometry and topology at various tree hierarchies. The new method is primarily designed on arc skeleton computation followed by a tree construction based topologic and geometric analysis of the skeleton. The method starts with a volumetric A/V representation as input and generates its topologic and multi-level volumetric tree representations long with different multi-level morphometric measures. A new recursive merging and pruning algorithms are introduced to detect bad junctions and noisy branches often associated with digital geometric and topologic analysis. Also, a new notion of shortest axial path is introduced to improve the skeletal arc joining two junctions. The accuracy of the multi-level tree analysis algorithm has been evaluated using computer generated phantoms and pulmonary CT images of a pig vessel cast phantom while the reproducibility of method is evaluated using multi-user A/V separation of in vivo contrast-enhanced CT images of a pig lung at different respiratory volumes.
Efficient Construction of Mesostate Networks from Molecular Dynamics Trajectories.

PubMed

Vitalis, Andreas; Caflisch, Amedeo

2012-03-13

The coarse-graining of data from molecular simulations yields conformational space networks that may be used for predicting the system's long time scale behavior, to discover structural pathways connecting free energy basins in the system, or simply to represent accessible phase space regions of interest and their connectivities in a two-dimensional plot. In this contribution, we present a tree-based algorithm to partition conformations of biomolecules into sets of similar microstates, i.e., to coarse-grain trajectory data into mesostates. On account of utilizing an architecture similar to that of established tree-based algorithms, the proposed scheme operates in near-linear time with data set size. We derive expressions needed for the fast evaluation of mesostate properties and distances when employing typical choices for measures of similarity between microstates. Using both a pedagogically useful and a real-word application, the algorithm is shown to be robust with respect to tree height, which in addition to mesostate threshold size is the main adjustable parameter. It is demonstrated that the derived mesostate networks can preserve information regarding the free energy basins and barriers by which the system is characterized.
A construction scheme of web page comment information extraction system based on frequent subtree mining

NASA Astrophysics Data System (ADS)

Zhang, Xiaowen; Chen, Bingfeng

2017-08-01

Based on the frequent sub-tree mining algorithm, this paper proposes a construction scheme of web page comment information extraction system based on frequent subtree mining, referred to as FSM system. The entire system architecture and the various modules to do a brief introduction, and then the core of the system to do a detailed description, and finally give the system prototype.
Performance Analysis of Evolutionary Algorithms for Steiner Tree Problems.

PubMed

Lai, Xinsheng; Zhou, Yuren; Xia, Xiaoyun; Zhang, Qingfu

2017-01-01

The Steiner tree problem (STP) aims to determine some Steiner nodes such that the minimum spanning tree over these Steiner nodes and a given set of special nodes has the minimum weight, which is NP-hard. STP includes several important cases. The Steiner tree problem in graphs (GSTP) is one of them. Many heuristics have been proposed for STP, and some of them have proved to be performance guarantee approximation algorithms for this problem. Since evolutionary algorithms (EAs) are general and popular randomized heuristics, it is significant to investigate the performance of EAs for STP. Several empirical investigations have shown that EAs are efficient for STP. However, up to now, there is no theoretical work on the performance of EAs for STP. In this article, we reveal that the (1+1) EA achieves 3/2-approximation ratio for STP in a special class of quasi-bipartite graphs in expected runtime [Formula: see text], where [Formula: see text], [Formula: see text], and [Formula: see text] are, respectively, the number of Steiner nodes, the number of special nodes, and the largest weight among all edges in the input graph. We also show that the (1+1) EA is better than two other heuristics on two GSTP instances, and the (1+1) EA may be inefficient on a constructed GSTP instance.
Pattern Matcher for Trees Constructed from Lists

NASA Technical Reports Server (NTRS)

James, Mark

2007-01-01

A software library has been developed that takes a high-level description of a pattern to be satisfied and applies it to a target. If the two match, it returns success; otherwise, it indicates a failure. The target is semantically a tree that is constructed from elements of terminal and non-terminal nodes represented through lists and symbols. Additionally, functionality is provided for finding the element in a set that satisfies a given pattern and doing a tree search, finding all occurrences of leaf nodes that match a given pattern. This process is valuable because it is a new algorithmic approach that significantly improves the productivity of the programmers and has the potential of making their resulting code more efficient by the introduction of a novel semantic representation language. This software has been used in many applications delivered to NASA and private industry, and the cost savings that have resulted from it are significant.
Comparing Phylogenetic Trees by Matching Nodes Using the Transfer Distance Between Partitions

PubMed Central

Giaro, Krzysztof

2017-01-01

Abstract Ability to quantify dissimilarity of different phylogenetic trees describing the relationship between the same group of taxa is required in various types of phylogenetic studies. For example, such metrics are used to assess the quality of phylogeny construction methods, to define optimization criteria in supertree building algorithms, or to find horizontal gene transfer (HGT) events. Among the set of metrics described so far in the literature, the most commonly used seems to be the Robinson–Foulds distance. In this article, we define a new metric for rooted trees—the Matching Pair (MP) distance. The MP metric uses the concept of the minimum-weight perfect matching in a complete bipartite graph constructed from partitions of all pairs of leaves of the compared phylogenetic trees. We analyze the properties of the MP metric and present computational experiments showing its potential applicability in tasks related to finding the HGT events. PMID:28177699
A Distance Measure for Genome Phylogenetic Analysis

NASA Astrophysics Data System (ADS)

Cao, Minh Duc; Allison, Lloyd; Dix, Trevor

Phylogenetic analyses of species based on single genes or parts of the genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The availability of more and more sequenced genomes allows phylogeny construction from complete genomes that is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often not possible due to their intensive computational requirement. Another class of tree construction methods, namely distance-based methods, require a measure of distances between any two genomes. Some measures such as evolutionary edit distance of gene order and gene content are computational expensive or do not perform well when the gene content of the organisms are similar. This study presents an information theoretic measure of genetic distances between genomes based on the biological compression algorithm expert model. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, the statistical bias of which would mislead conventional analysis methods. Our approach is also used to successfully construct a plausible evolutionary tree for the γ-Proteobacteria group whose genomes are known to contain many horizontally transferred genes.
Introduction to Blueweb: A Decentralized Scatternet Formation Algorithm for Bluetooth Ad Hoc Networks

NASA Astrophysics Data System (ADS)

Yu, Chih-Min; Huang, Chia-Chi

In this letter, a decentralized scatternet formation algorithm called Bluelayer is proposed. First, Bluelayer uses a designated root to construct a tree-shaped subnet and propagates an integer variable k1 called counter limit as well as a constant k in its downstream direction to determine new roots. Then each new root asks its upstream master to start a return connection procedure to convert the tree-shaped subnet into a web-shaped subnet for its immediate upstream root. At the same time, each new root repeats the same procedure as the root to build its own subnet until the whole scatternet is formed. Simulation results show that Bluelayer achieves good network scalability and generates an efficient scatternet configuration for various sizes of Bluetooth ad hoc network.
INDDGO: Integrated Network Decomposition & Dynamic programming for Graph Optimization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Groer, Christopher S; Sullivan, Blair D; Weerapurage, Dinesh P

2012-10-01

It is well-known that dynamic programming algorithms can utilize tree decompositions to provide a way to solve some \\emph{NP}-hard problems on graphs where the complexity is polynomial in the number of nodes and edges in the graph, but exponential in the width of the underlying tree decomposition. However, there has been relatively little computational work done to determine the practical utility of such dynamic programming algorithms. We have developed software to construct tree decompositions using various heuristics and have created a fast, memory-efficient dynamic programming implementation for solving maximum weighted independent set. We describe our software and the algorithms wemore » have implemented, focusing on memory saving techniques for the dynamic programming. We compare the running time and memory usage of our implementation with other techniques for solving maximum weighted independent set, including a commercial integer programming solver and a semi-definite programming solver. Our results indicate that it is possible to solve some instances where the underlying decomposition has width much larger than suggested by the literature. For certain types of problems, our dynamic programming code runs several times faster than these other methods.« less
A Method of Data Aggregation for Wearable Sensor Systems

PubMed Central

Shen, Bo; Fu, Jun-Song

2016-01-01

Data aggregation has been considered as an effective way to decrease the data to be transferred in sensor networks. Particularly for wearable sensor systems, smaller battery has less energy, which makes energy conservation in data transmission more important. Nevertheless, wearable sensor systems usually have features like frequently dynamic changes of topologies and data over a large range, of which current aggregating methods can’t adapt to the demand. In this paper, we study the system composed of many wearable devices with sensors, such as the network of a tactical unit, and introduce an energy consumption-balanced method of data aggregation, named LDA-RT. In the proposed method, we develop a query algorithm based on the idea of ‘happened-before’ to construct a dynamic and energy-balancing routing tree. We also present a distributed data aggregating and sorting algorithm to execute top-k query and decrease the data that must be transferred among wearable devices. Combining these algorithms, LDA-RT tries to balance the energy consumptions for prolonging the lifetime of wearable sensor systems. Results of evaluation indicate that LDA-RT performs well in constructing routing trees and energy balances. It also outperforms the filter-based top-k monitoring approach in energy consumption, load balance, and the network’s lifetime, especially for highly dynamic data sources. PMID:27347953
Phylogenomic analyses data of the avian phylogenomics project.

PubMed

Jarvis, Erich D; Mirarab, Siavash; Aberer, Andre J; Li, Bo; Houde, Peter; Li, Cai; Ho, Simon Y W; Faircloth, Brant C; Nabholz, Benoit; Howard, Jason T; Suh, Alexander; Weber, Claudia C; da Fonseca, Rute R; Alfaro-Núñez, Alonzo; Narula, Nitish; Liu, Liang; Burt, Dave; Ellegren, Hans; Edwards, Scott V; Stamatakis, Alexandros; Mindell, David P; Cracraft, Joel; Braun, Edward L; Warnow, Tandy; Jun, Wang; Gilbert, M Thomas Pius; Zhang, Guojie

2015-01-01

Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses. Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) A well annotated data set across species based on genome synteny; 2) Alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) Diverse data sets, including genes and their inferred trees, indels, and transposable elements. Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence. The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.
Exact Algorithms for Duplication-Transfer-Loss Reconciliation with Non-Binary Gene Trees.

PubMed

Kordi, Misagh; Bansal, Mukul S

2017-06-01

Duplication-Transfer-Loss (DTL) reconciliation is a powerful method for studying gene family evolution in the presence of horizontal gene transfer. DTL reconciliation seeks to reconcile gene trees with species trees by postulating speciation, duplication, transfer, and loss events. Efficient algorithms exist for finding optimal DTL reconciliations when the gene tree is binary. In practice, however, gene trees are often non-binary due to uncertainty in the gene tree topologies, and DTL reconciliation with non-binary gene trees is known to be NP-hard. In this paper, we present the first exact algorithms for DTL reconciliation with non-binary gene trees. Specifically, we (i) show that the DTL reconciliation problem for non-binary gene trees is fixed-parameter tractable in the maximum degree of the gene tree, (ii) present an exponential-time, but in-practice efficient, algorithm to track and enumerate all optimal binary resolutions of a non-binary input gene tree, and (iii) apply our algorithms to a large empirical data set of over 4700 gene trees from 100 species to study the impact of gene tree uncertainty on DTL-reconciliation and to demonstrate the applicability and utility of our algorithms. The new techniques and algorithms introduced in this paper will help biologists avoid incorrect evolutionary inferences caused by gene tree uncertainty.
Scheduling language and algorithm development study. Volume 2, phase 2: Introduction to plans programming. [user guide

NASA Technical Reports Server (NTRS)

Cochran, D. R.; Ishikawa, M. K.; Paulson, R. E.; Ramsey, H. R.

1975-01-01

A user guide for the Programming Language for Allocation and Network Scheduling (PLANS) is presented. Information is included for the construction of PLANS programs. The basic philosophy of PLANS is discussed, and access and update reference techniques are described along with the use of tree structures.
Recursive algorithms for phylogenetic tree counting.

PubMed

Gavryushkina, Alexandra; Welch, David; Drummond, Alexei J

2013-10-28

In Bayesian phylogenetic inference we are interested in distributions over a space of trees. The number of trees in a tree space is an important characteristic of the space and is useful for specifying prior distributions. When all samples come from the same time point and no prior information available on divergence times, the tree counting problem is easy. However, when fossil evidence is used in the inference to constrain the tree or data are sampled serially, new tree spaces arise and counting the number of trees is more difficult. We describe an algorithm that is polynomial in the number of sampled individuals for counting of resolutions of a constraint tree assuming that the number of constraints is fixed. We generalise this algorithm to counting resolutions of a fully ranked constraint tree. We describe a quadratic algorithm for counting the number of possible fully ranked trees on n sampled individuals. We introduce a new type of tree, called a fully ranked tree with sampled ancestors, and describe a cubic time algorithm for counting the number of such trees on n sampled individuals. These algorithms should be employed for Bayesian Markov chain Monte Carlo inference when fossil data are included or data are serially sampled.
SDIA: A dynamic situation driven information fusion algorithm for cloud environment

NASA Astrophysics Data System (ADS)

Guo, Shuhang; Wang, Tong; Wang, Jian

2017-09-01

Information fusion is an important issue in information integration domain. In order to form an extensive information fusion technology under the complex and diverse situations, a new information fusion algorithm is proposed. Firstly, a fuzzy evaluation model of tag utility was proposed that can be used to count the tag entropy. Secondly, a ubiquitous situation tag tree model is proposed to define multidimensional structure of information situation. Thirdly, the similarity matching between the situation models is classified into three types: the tree inclusion, the tree embedding, and the tree compatibility. Next, in order to reduce the time complexity of the tree compatible matching algorithm, a fast and ordered tree matching algorithm is proposed based on the node entropy, which is used to support the information fusion by ubiquitous situation. Since the algorithm revolve from the graph theory of disordered tree matching algorithm, it can improve the information fusion present recall rate and precision rate in the situation. The information fusion algorithm is compared with the star and the random tree matching algorithm, and the difference between the three algorithms is analyzed in the view of isomorphism, which proves the innovation and applicability of the algorithm.

An implementation of a tree code on a SIMD, parallel computer

NASA Technical Reports Server (NTRS)

Olson, Kevin M.; Dorband, John E.

1994-01-01

We describe a fast tree algorithm for gravitational N-body simulation on SIMD parallel computers. The tree construction uses fast, parallel sorts. The sorted lists are recursively divided along their x, y and z coordinates. This data structure is a completely balanced tree (i.e., each particle is paired with exactly one other particle) and maintains good spatial locality. An implementation of this tree-building algorithm on a 16k processor Maspar MP-1 performs well and constitutes only a small fraction (approximately 15%) of the entire cycle of finding the accelerations. Each node in the tree is treated as a monopole. The tree search and the summation of accelerations also perform well. During the tree search, node data that is needed from another processor is simply fetched. Roughly 55% of the tree search time is spent in communications between processors. We apply the code to two problems of astrophysical interest. The first is a simulation of the close passage of two gravitationally, interacting, disk galaxies using 65,636 particles. We also simulate the formation of structure in an expanding, model universe using 1,048,576 particles. Our code attains speeds comparable to one head of a Cray Y-MP, so single instruction, multiple data (SIMD) type computers can be used for these simulations. The cost/performance ratio for SIMD machines like the Maspar MP-1 make them an extremely attractive alternative to either vector processors or large multiple instruction, multiple data (MIMD) type parallel computers. With further optimizations (e.g., more careful load balancing), speeds in excess of today's vector processing computers should be possible.
Coalescent-based species tree inference from gene tree topologies under incomplete lineage sorting by maximum likelihood.

PubMed

Wu, Yufeng

2012-03-01

Incomplete lineage sorting can cause incongruence between the phylogenetic history of genes (the gene tree) and that of the species (the species tree), which can complicate the inference of phylogenies. In this article, I present a new coalescent-based algorithm for species tree inference with maximum likelihood. I first describe an improved method for computing the probability of a gene tree topology given a species tree, which is much faster than an existing algorithm by Degnan and Salter (2005). Based on this method, I develop a practical algorithm that takes a set of gene tree topologies and infers species trees with maximum likelihood. This algorithm searches for the best species tree by starting from initial species trees and performing heuristic search to obtain better trees with higher likelihood. This algorithm, called STELLS (which stands for Species Tree InfErence with Likelihood for Lineage Sorting), has been implemented in a program that is downloadable from the author's web page. The simulation results show that the STELLS algorithm is more accurate than an existing maximum likelihood method for many datasets, especially when there is noise in gene trees. I also show that the STELLS algorithm is efficient and can be applied to real biological datasets. © 2011 The Author. Evolution© 2011 The Society for the Study of Evolution.
TreePOD: Sensitivity-Aware Selection of Pareto-Optimal Decision Trees.

PubMed

Muhlbacher, Thomas; Linhardt, Lorenz; Moller, Torsten; Piringer, Harald

2018-01-01

Balancing accuracy gains with other objectives such as interpretability is a key challenge when building decision trees. However, this process is difficult to automate because it involves know-how about the domain as well as the purpose of the model. This paper presents TreePOD, a new approach for sensitivity-aware model selection along trade-offs. TreePOD is based on exploring a large set of candidate trees generated by sampling the parameters of tree construction algorithms. Based on this set, visualizations of quantitative and qualitative tree aspects provide a comprehensive overview of possible tree characteristics. Along trade-offs between two objectives, TreePOD provides efficient selection guidance by focusing on Pareto-optimal tree candidates. TreePOD also conveys the sensitivities of tree characteristics on variations of selected parameters by extending the tree generation process with a full-factorial sampling. We demonstrate how TreePOD supports a variety of tasks involved in decision tree selection and describe its integration in a holistic workflow for building and selecting decision trees. For evaluation, we illustrate a case study for predicting critical power grid states, and we report qualitative feedback from domain experts in the energy sector. This feedback suggests that TreePOD enables users with and without statistical background a confident and efficient identification of suitable decision trees.
Detection of fraudulent financial statements using the hybrid data mining approach.

PubMed

Chen, Suduan

2016-01-01

The purpose of this study is to construct a valid and rigorous fraudulent financial statement detection model. The research objects are companies which experienced both fraudulent and non-fraudulent financial statements between the years 2002 and 2013. In the first stage, two decision tree algorithms, including the classification and regression trees (CART) and the Chi squared automatic interaction detector (CHAID) are applied in the selection of major variables. The second stage combines CART, CHAID, Bayesian belief network, support vector machine and artificial neural network in order to construct fraudulent financial statement detection models. According to the results, the detection performance of the CHAID-CART model is the most effective, with an overall accuracy of 87.97 % (the FFS detection accuracy is 92.69 %).
Path Planning Method in Multi-obstacle Marine Environment

NASA Astrophysics Data System (ADS)

Zhang, Jinpeng; Sun, Hanxv

2017-12-01

In this paper, an improved algorithm for particle swarm optimization is proposed for the application of underwater robot in the complex marine environment. Not only did consider to avoid obstacles when path planning, but also considered the current direction and the size effect on the performance of the robot dynamics. The algorithm uses the trunk binary tree structure to construct the path search space and A * heuristic search method is used in the search space to find a evaluation standard path. Then the particle swarm algorithm to optimize the path by adjusting evaluation function, which makes the underwater robot in the current navigation easier to control, and consume less energy.
MGUPGMA: A Fast UPGMA Algorithm With Multiple Graphics Processing Units Using NCCL

PubMed Central

Hua, Guan-Jie; Hung, Che-Lun; Lin, Chun-Yuan; Wu, Fu-Che; Chan, Yu-Wei; Tang, Chuan Yi

2017-01-01

A phylogenetic tree is a visual diagram of the relationship between a set of biological species. The scientists usually use it to analyze many characteristics of the species. The distance-matrix methods, such as Unweighted Pair Group Method with Arithmetic Mean and Neighbor Joining, construct a phylogenetic tree by calculating pairwise genetic distances between taxa. These methods have the computational performance issue. Although several new methods with high-performance hardware and frameworks have been proposed, the issue still exists. In this work, a novel parallel Unweighted Pair Group Method with Arithmetic Mean approach on multiple Graphics Processing Units is proposed to construct a phylogenetic tree from extremely large set of sequences. The experimental results present that the proposed approach on a DGX-1 server with 8 NVIDIA P100 graphic cards achieves approximately 3-fold to 7-fold speedup over the implementation of Unweighted Pair Group Method with Arithmetic Mean on a modern CPU and a single GPU, respectively. PMID:29051701
MGUPGMA: A Fast UPGMA Algorithm With Multiple Graphics Processing Units Using NCCL.

PubMed

Hua, Guan-Jie; Hung, Che-Lun; Lin, Chun-Yuan; Wu, Fu-Che; Chan, Yu-Wei; Tang, Chuan Yi

2017-01-01

A phylogenetic tree is a visual diagram of the relationship between a set of biological species. The scientists usually use it to analyze many characteristics of the species. The distance-matrix methods, such as Unweighted Pair Group Method with Arithmetic Mean and Neighbor Joining, construct a phylogenetic tree by calculating pairwise genetic distances between taxa. These methods have the computational performance issue. Although several new methods with high-performance hardware and frameworks have been proposed, the issue still exists. In this work, a novel parallel Unweighted Pair Group Method with Arithmetic Mean approach on multiple Graphics Processing Units is proposed to construct a phylogenetic tree from extremely large set of sequences. The experimental results present that the proposed approach on a DGX-1 server with 8 NVIDIA P100 graphic cards achieves approximately 3-fold to 7-fold speedup over the implementation of Unweighted Pair Group Method with Arithmetic Mean on a modern CPU and a single GPU, respectively.
Finding Frequent Closed Itemsets in Sliding Window in Linear Time

NASA Astrophysics Data System (ADS)

Chen, Junbo; Zhou, Bo; Chen, Lu; Wang, Xinyu; Ding, Yiqun

One of the most well-studied problems in data mining is computing the collection of frequent itemsets in large transactional databases. Since the introduction of the famous Apriori algorithm [14], many others have been proposed to find the frequent itemsets. Among such algorithms, the approach of mining closed itemsets has raised much interest in data mining community. The algorithms taking this approach include TITANIC [8], CLOSET+[6], DCI-Closed [4], FCI-Stream [3], GC-Tree [15], TGC-Tree [16] etc. Among these algorithms, FCI-Stream, GC-Tree and TGC-Tree are online algorithms work under sliding window environments. By the performance evaluation in [16], GC-Tree [15] is the fastest one. In this paper, an improved algorithm based on GC-Tree is proposed, the computational complexity of which is proved to be a linear combination of the average transaction size and the average closed itemset size. The algorithm is based on the essential theorem presented in Sect. 4.2. Empirically, the new algorithm is several orders of magnitude faster than the state of art algorithm, GC-Tree.
An algorithm for computing the gene tree probability under the multispecies coalescent and its application in the inference of population tree

PubMed Central

2016-01-01

Motivation: Gene tree represents the evolutionary history of gene lineages that originate from multiple related populations. Under the multispecies coalescent model, lineages may coalesce outside the species (population) boundary. Given a species tree (with branch lengths), the gene tree probability is the probability of observing a specific gene tree topology under the multispecies coalescent model. There are two existing algorithms for computing the exact gene tree probability. The first algorithm is due to Degnan and Salter, where they enumerate all the so-called coalescent histories for the given species tree and the gene tree topology. Their algorithm runs in exponential time in the number of gene lineages in general. The second algorithm is the STELLS algorithm (2012), which is usually faster but also runs in exponential time in almost all the cases. Results: In this article, we present a new algorithm, called CompactCH, for computing the exact gene tree probability. This new algorithm is based on the notion of compact coalescent histories: multiple coalescent histories are represented by a single compact coalescent history. The key advantage of our new algorithm is that it runs in polynomial time in the number of gene lineages if the number of populations is fixed to be a constant. The new algorithm is more efficient than the STELLS algorithm both in theory and in practice when the number of populations is small and there are multiple gene lineages from each population. As an application, we show that CompactCH can be applied in the inference of population tree (i.e. the population divergence history) from population haplotypes. Simulation results show that the CompactCH algorithm enables efficient and accurate inference of population trees with much more haplotypes than a previous approach. Availability: The CompactCH algorithm is implemented in the STELLS software package, which is available for download at http://www.engr.uconn.edu/ywu/STELLS.html. Contact: ywu@engr.uconn.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307621
Application of Boosting Regression Trees to Preliminary Cost Estimation in Building Construction Projects

PubMed Central

2015-01-01

Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT) is applied to cost estimations at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to have a high performance in cost estimation domains. The BRT model has shown results similar to those of NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information such as the importance plot and structure model, which can support estimators in comprehending the decision making process. Consequently, the boosting approach has potential applicability in preliminary cost estimations in a building construction project. PMID:26339227
Application of Boosting Regression Trees to Preliminary Cost Estimation in Building Construction Projects.

PubMed

Shin, Yoonseok

2015-01-01

Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong boundaries in terms of its generalization performance. However, the boosting approach has yet to be used in regression problems within the construction domain, including cost estimations, but has been actively utilized in other domains. Therefore, a boosting regression tree (BRT) is applied to cost estimations at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, its performance was compared with that of a neural network (NN) model, which has been proven to have a high performance in cost estimation domains. The BRT model has shown results similar to those of NN model using 234 actual cost datasets of a building construction project. In addition, the BRT model can provide additional information such as the importance plot and structure model, which can support estimators in comprehending the decision making process. Consequently, the boosting approach has potential applicability in preliminary cost estimations in a building construction project.
Learning Instance-Specific Predictive Models

PubMed Central

Visweswaran, Shyam; Cooper, Gregory F.

2013-01-01

This paper introduces a Bayesian algorithm for constructing predictive models from data that are optimized to predict a target variable well for a particular instance. This algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict a target variable of the instance at hand, and employs an instance-specific heuristic to locate a set of suitable models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five different performance measures and its performance was compared to that of several commonly used predictive algorithms, including nave Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Over all the data sets, the ISMB algorithm performed better on average on all performance measures against all the comparison algorithms. PMID:25045325
Decision Tree Algorithm-Generated Single-Nucleotide Polymorphism Barcodes of rbcL Genes for 38 Brassicaceae Species Tagging.

PubMed

Yang, Cheng-Hong; Wu, Kuo-Chuan; Chuang, Li-Yeh; Chang, Hsueh-Wei

2018-01-01

DNA barcode sequences are accumulating in large data sets. A barcode is generally a sequence larger than 1000 base pairs and generates a computational burden. Although the DNA barcode was originally envisioned as straightforward species tags, the identification usage of barcode sequences is rarely emphasized currently. Single-nucleotide polymorphism (SNP) association studies provide us an idea that the SNPs may be the ideal target of feature selection to discriminate between different species. We hypothesize that SNP-based barcodes may be more effective than the full length of DNA barcode sequences for species discrimination. To address this issue, we tested a r ibulose diphosphate carboxylase ( rbcL ) S NP b arcoding (RSB) strategy using a decision tree algorithm. After alignment and trimming, 31 SNPs were discovered in the rbcL sequences from 38 Brassicaceae plant species. In the decision tree construction, these SNPs were computed to set up the decision rule to assign the sequences into 2 groups level by level. After algorithm processing, 37 nodes and 31 loci were required for discriminating 38 species. Finally, the sequence tags consisting of 31 rbcL SNP barcodes were identified for discriminating 38 Brassicaceae species based on the decision tree-selected SNP pattern using RSB method. Taken together, this study provides the rational that the SNP aspect of DNA barcode for rbcL gene is a useful and effective sequence for tagging 38 Brassicaceae species.
A short note on the use of the red-black tree in Cartesian adaptive mesh refinement algorithms

NASA Astrophysics Data System (ADS)

Hasbestan, Jaber J.; Senocak, Inanc

2017-12-01

Mesh adaptivity is an indispensable capability to tackle multiphysics problems with large disparity in time and length scales. With the availability of powerful supercomputers, there is a pressing need to extend time-proven computational techniques to extreme-scale problems. Cartesian adaptive mesh refinement (AMR) is one such method that enables simulation of multiscale, multiphysics problems. AMR is based on construction of octrees. Originally, an explicit tree data structure was used to generate and manipulate an adaptive Cartesian mesh. At least eight pointers are required in an explicit approach to construct an octree. Parent-child relationships are then used to traverse the tree. An explicit octree, however, is expensive in terms of memory usage and the time it takes to traverse the tree to access a specific node. For these reasons, implicit pointerless methods have been pioneered within the computer graphics community, motivated by applications requiring interactivity and realistic three dimensional visualization. Lewiner et al. [1] provides a concise review of pointerless approaches to generate an octree. Use of a hash table and Z-order curve are two key concepts in pointerless methods that we briefly discuss next.
Frequent statistics of link-layer bit stream data based on AC-IM algorithm

NASA Astrophysics Data System (ADS)

Cao, Chenghong; Lei, Yingke; Xu, Yiming

2017-08-01

At present, there are many relevant researches on data processing using classical pattern matching and its improved algorithm, but few researches on statistical data of link-layer bit stream. This paper adopts a frequent statistical method of link-layer bit stream data based on AC-IM algorithm for classical multi-pattern matching algorithms such as AC algorithm has high computational complexity, low efficiency and it cannot be applied to binary bit stream data. The method's maximum jump distance of the mode tree is length of the shortest mode string plus 3 in case of no missing? In this paper, theoretical analysis is made on the principle of algorithm construction firstly, and then the experimental results show that the algorithm can adapt to the binary bit stream data environment and extract the frequent sequence more accurately, the effect is obvious. Meanwhile, comparing with the classical AC algorithm and other improved algorithms, AC-IM algorithm has a greater maximum jump distance and less time-consuming.
An O(log sup 2 N) parallel algorithm for computing the eigenvalues of a symmetric tridiagonal matrix

NASA Technical Reports Server (NTRS)

Swarztrauber, Paul N.

1989-01-01

An O(log sup 2 N) parallel algorithm is presented for computing the eigenvalues of a symmetric tridiagonal matrix using a parallel algorithm for computing the zeros of the characteristic polynomial. The method is based on a quadratic recurrence in which the characteristic polynomial is constructed on a binary tree from polynomials whose degree doubles at each level. Intervals that contain exactly one zero are determined by the zeros of polynomials at the previous level which ensures that different processors compute different zeros. The exact behavior of the polynomials at the interval endpoints is used to eliminate the usual problems induced by finite precision arithmetic.
Construction of phylogenetic trees by kernel-based comparative analysis of metabolic networks.

PubMed

Oh, S June; Joung, Je-Gun; Chang, Jeong-Ho; Zhang, Byoung-Tak

2006-06-06

To infer the tree of life requires knowledge of the common characteristics of each species descended from a common ancestor as the measuring criteria and a method to calculate the distance between the resulting values of each measure. Conventional phylogenetic analysis based on genomic sequences provides information about the genetic relationships between different organisms. In contrast, comparative analysis of metabolic pathways in different organisms can yield insights into their functional relationships under different physiological conditions. However, evaluating the similarities or differences between metabolic networks is a computationally challenging problem, and systematic methods of doing this are desirable. Here we introduce a graph-kernel method for computing the similarity between metabolic networks in polynomial time, and use it to profile metabolic pathways and to construct phylogenetic trees. To compare the structures of metabolic networks in organisms, we adopted the exponential graph kernel, which is a kernel-based approach with a labeled graph that includes a label matrix and an adjacency matrix. To construct the phylogenetic trees, we used an unweighted pair-group method with arithmetic mean, i.e., a hierarchical clustering algorithm. We applied the kernel-based network profiling method in a comparative analysis of nine carbohydrate metabolic networks from 81 biological species encompassing Archaea, Eukaryota, and Eubacteria. The resulting phylogenetic hierarchies generally support the tripartite scheme of three domains rather than the two domains of prokaryotes and eukaryotes. By combining the kernel machines with metabolic information, the method infers the context of biosphere development that covers physiological events required for adaptation by genetic reconstruction. The results show that one may obtain a global view of the tree of life by comparing the metabolic pathway structures using meta-level information rather than sequence information. This method may yield further information about biological evolution, such as the history of horizontal transfer of each gene, by studying the detailed structure of the phylogenetic tree constructed by the kernel-based method.
Autumn Algorithm-Computation of Hybridization Networks for Realistic Phylogenetic Trees.

PubMed

Huson, Daniel H; Linz, Simone

2018-01-01

A minimum hybridization network is a rooted phylogenetic network that displays two given rooted phylogenetic trees using a minimum number of reticulations. Previous mathematical work on their calculation has usually assumed the input trees to be bifurcating, correctly rooted, or that they both contain the same taxa. These assumptions do not hold in biological studies and "realistic" trees have multifurcations, are difficult to root, and rarely contain the same taxa. We present a new algorithm for computing minimum hybridization networks for a given pair of "realistic" rooted phylogenetic trees. We also describe how the algorithm might be used to improve the rooting of the input trees. We introduce the concept of "autumn trees", a nice framework for the formulation of algorithms based on the mathematics of "maximum acyclic agreement forests". While the main computational problem is hard, the run-time depends mainly on how different the given input trees are. In biological studies, where the trees are reasonably similar, our parallel implementation performs well in practice. The algorithm is available in our open source program Dendroscope 3, providing a platform for biologists to explore rooted phylogenetic networks. We demonstrate the utility of the algorithm using several previously studied data sets.
Traversing the tangle: algorithms and applications for cophylogenetic studies.

PubMed

Charleston, Michael A; Perkins, Susan L

2006-02-01

Cophylogenetic analysis supposes that two or more phylogenetic trees for linked groups have been constructed, and explores the relationships the trees have with each other. These types of analyses are most commonly used to assess relationships between hosts and their parasites, however the methodology can also be applied to diverse types of problems such as an examination of the phylogenies of genes with respect to those of organisms or those of geographic areas and the organisms that reside there. The working hypothesis is that the trees are correct, though sometimes attempts are made to take into account their uncertainty. Cophylogeny is computationally hard: that is, there are no known fast methods to compute relationships among such trees for any but the simplest of models. A review of methodology that has been developed to examine cophylogenetic relationships is presented and a brief discussion of some medically relevant examples is given.
Prediction of the compression ratio for municipal solid waste using decision tree.

PubMed

Heshmati R, Ali Akbar; Mokhtari, Maryam; Shakiba Rad, Saeed

2014-01-01

The compression ratio of municipal solid waste (MSW) is an essential parameter for evaluation of waste settlement and landfill design. However, no appropriate model has been proposed to estimate the waste compression ratio so far. In this study, a decision tree method was utilized to predict the waste compression ratio (C'c). The tree was constructed using Quinlan's M5 algorithm. A reliable database retrieved from the literature was used to develop a practical model that relates C'c to waste composition and properties, including dry density, dry weight water content, and percentage of biodegradable organic waste using the decision tree method. The performance of the developed model was examined in terms of different statistical criteria, including correlation coefficient, root mean squared error, mean absolute error and mean bias error, recommended by researchers. The obtained results demonstrate that the suggested model is able to evaluate the compression ratio of MSW effectively.

Direct evaluation of fault trees using object-oriented programming techniques

NASA Technical Reports Server (NTRS)

Patterson-Hine, F. A.; Koen, B. V.

1989-01-01

Object-oriented programming techniques are used in an algorithm for the direct evaluation of fault trees. The algorithm combines a simple bottom-up procedure for trees without repeated events with a top-down recursive procedure for trees with repeated events. The object-oriented approach results in a dynamic modularization of the tree at each step in the reduction process. The algorithm reduces the number of recursive calls required to solve trees with repeated events and calculates intermediate results as well as the solution of the top event. The intermediate results can be reused if part of the tree is modified. An example is presented in which the results of the algorithm implemented with conventional techniques are compared to those of the object-oriented approach.
Quasi-Optimal Elimination Trees for 2D Grids with Singularities

DOE PAGES

Paszyńska, A.; Paszyński, M.; Jopek, K.; ...

2015-01-01

We consmore » truct quasi-optimal elimination trees for 2D finite element meshes with singularities. These trees minimize the complexity of the solution of the discrete system. The computational cost estimates of the elimination process model the execution of the multifrontal algorithms in serial and in parallel shared-memory executions. Since the meshes considered are a subspace of all possible mesh partitions, we call these minimizers quasi-optimal. We minimize the cost functionals using dynamic programming. Finding these minimizers is more computationally expensive than solving the original algebraic system. Nevertheless, from the insights provided by the analysis of the dynamic programming minima, we propose a heuristic construction of the elimination trees that has cost O N e log ⁡ N e , where N e is the number of elements in the mesh. We show that this heuristic ordering has similar computational cost to the quasi-optimal elimination trees found with dynamic programming and outperforms state-of-the-art alternatives in our numerical experiments.« less
Quasi-Optimal Elimination Trees for 2D Grids with Singularities

DOE Office of Scientific and Technical Information (OSTI.GOV)

Paszyńska, A.; Paszyński, M.; Jopek, K.

We consmore » truct quasi-optimal elimination trees for 2D finite element meshes with singularities. These trees minimize the complexity of the solution of the discrete system. The computational cost estimates of the elimination process model the execution of the multifrontal algorithms in serial and in parallel shared-memory executions. Since the meshes considered are a subspace of all possible mesh partitions, we call these minimizers quasi-optimal. We minimize the cost functionals using dynamic programming. Finding these minimizers is more computationally expensive than solving the original algebraic system. Nevertheless, from the insights provided by the analysis of the dynamic programming minima, we propose a heuristic construction of the elimination trees that has cost O N e log ⁡ N e , where N e is the number of elements in the mesh. We show that this heuristic ordering has similar computational cost to the quasi-optimal elimination trees found with dynamic programming and outperforms state-of-the-art alternatives in our numerical experiments.« less
Bayesian Networks for Modeling Dredging Decisions

DTIC Science & Technology

2011-10-01

change scenarios. Arctic Expert elicitation Netica Bacon et al . 2002 Identify factors that might lead to a change in land use from farming to...tree) algorithms developed by Lauritzen and Spiegelhalter (1988) and Jensen et al . (1990). Statistical inference is simply the process of...causality when constructing a Bayesian network (Kjaerulff and Madsen 2008, Darwiche 2009, Marcot et al . 2006). A knowledge representation approach is the
A short note on the paper of Liu et al. (2012). A relative Lempel-Ziv complexity: Application to comparing biological sequences. Chemical Physics Letters, volume 530, 19 March 2012, pages 107-112

NASA Astrophysics Data System (ADS)

Arit, Turkan; Keskin, Burak; Firuzan, Esin; Cavas, Cagin Kandemir; Liu, Liwei; Cavas, Levent

2018-04-01

The report entitled "L. Liu, D. Li, F. Bai, A relative Lempel-Ziv complexity: Application to comparing biological sequences, Chem. Phys. Lett. 530 (2012) 107-112" mentions on the powerful construction of phylogenetic trees based on Lempel-Ziv algorithm. On the other hand, the method explained in the paper does not give promising result on the data set on invasive Caulerpa taxifolia in the Mediterranean Sea. The phylogenetic trees are obtained by the proposed method of the aforementioned paper in this short note.
A Fast Exact k-Nearest Neighbors Algorithm for High Dimensional Search Using k-Means Clustering and Triangle Inequality.

PubMed

Wang, Xueyi

2012-02-08

The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses the k-means clustering and the triangle inequality to accelerate the searching for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-tree, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10(6) records and 10(4) dimensions, kMkNN shows a 2-to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significant better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces.
Decision tree and ensemble learning algorithms with their applications in bioinformatics.

PubMed

Che, Dongsheng; Liu, Qi; Rasheed, Khaled; Tao, Xiuping

2011-01-01

Machine learning approaches have wide applications in bioinformatics, and decision tree is one of the successful approaches applied in this field. In this chapter, we briefly review decision tree and related ensemble algorithms and show the successful applications of such approaches on solving biological problems. We hope that by learning the algorithms of decision trees and ensemble classifiers, biologists can get the basic ideas of how machine learning algorithms work. On the other hand, by being exposed to the applications of decision trees and ensemble algorithms in bioinformatics, computer scientists can get better ideas of which bioinformatics topics they may work on in their future research directions. We aim to provide a platform to bridge the gap between biologists and computer scientists.
A Hybrid Shared-Memory Parallel Max-Tree Algorithm for Extreme Dynamic-Range Images.

PubMed

Moschini, Ugo; Meijster, Arnold; Wilkinson, Michael H F

2018-03-01

Max-trees, or component trees, are graph structures that represent the connected components of an image in a hierarchical way. Nowadays, many application fields rely on images with high-dynamic range or floating point values. Efficient sequential algorithms exist to build trees and compute attributes for images of any bit depth. However, we show that the current parallel algorithms perform poorly already with integers at bit depths higher than 16 bits per pixel. We propose a parallel method combining the two worlds of flooding and merging max-tree algorithms. First, a pilot max-tree of a quantized version of the image is built in parallel using a flooding method. Later, this structure is used in a parallel leaf-to-root approach to compute efficiently the final max-tree and to drive the merging of the sub-trees computed by the threads. We present an analysis of the performance both on simulated and actual 2D images and 3D volumes. Execution times are about better than the fastest sequential algorithm and speed-up goes up to on 64 threads.
Computational analyses of spectral trees from electrospray multi-stage mass spectrometry to aid metabolite identification.

PubMed

Cao, Mingshu; Fraser, Karl; Rasmussen, Susanne

2013-10-31

Mass spectrometry coupled with chromatography has become the major technical platform in metabolomics. Aided by peak detection algorithms, the detected signals are characterized by mass-over-charge ratio (m/z) and retention time. Chemical identities often remain elusive for the majority of the signals. Multi-stage mass spectrometry based on electrospray ionization (ESI) allows collision-induced dissociation (CID) fragmentation of selected precursor ions. These fragment ions can assist in structural inference for metabolites of low molecular weight. Computational investigations of fragmentation spectra have increasingly received attention in metabolomics and various public databases house such data. We have developed an R package "iontree" that can capture, store and analyze MS2 and MS3 mass spectral data from high throughput metabolomics experiments. The package includes functions for ion tree construction, an algorithm (distMS2) for MS2 spectral comparison, and tools for building platform-independent ion tree (MS2/MS3) libraries. We have demonstrated the utilization of the package for the systematic analysis and annotation of fragmentation spectra collected in various metabolomics platforms, including direct infusion mass spectrometry, and liquid chromatography coupled with either low resolution or high resolution mass spectrometry. Assisted by the developed computational tools, we have demonstrated that spectral trees can provide informative evidence complementary to retention time and accurate mass to aid with annotating unknown peaks. These experimental spectral trees once subjected to a quality control process, can be used for querying public MS2 databases or de novo interpretation. The putatively annotated spectral trees can be readily incorporated into reference libraries for routine identification of metabolites.
Concurrent approach for evolving compact decision rule sets

NASA Astrophysics Data System (ADS)

Marmelstein, Robert E.; Hammack, Lonnie P.; Lamont, Gary B.

1999-02-01

The induction of decision rules from data is important to many disciplines, including artificial intelligence and pattern recognition. To improve the state of the art in this area, we introduced the genetic rule and classifier construction environment (GRaCCE). It was previously shown that GRaCCE consistently evolved decision rule sets from data, which were significantly more compact than those produced by other methods (such as decision tree algorithms). The primary disadvantage of GRaCCe, however, is its relatively poor run-time execution performance. In this paper, a concurrent version of the GRaCCE architecture is introduced, which improves the efficiency of the original algorithm. A prototype of the algorithm is tested on an in- house parallel processor configuration and the results are discussed.
Forecasting of the development of professional medical equipment engineering based on neuro-fuzzy algorithms

NASA Astrophysics Data System (ADS)

Vaganova, E. V.; Syryamkin, M. V.

2015-11-01

The purpose of the research is the development of evolutionary algorithms for assessments of promising scientific directions. The main attention of the present study is paid to the evaluation of the foresight possibilities for identification of technological peaks and emerging technologies in professional medical equipment engineering in Russia and worldwide on the basis of intellectual property items and neural network modeling. An automated information system consisting of modules implementing various classification methods for accuracy of the forecast improvement and the algorithm of construction of neuro-fuzzy decision tree have been developed. According to the study result, modern trends in this field will focus on personalized smart devices, telemedicine, bio monitoring, «e-Health» and «m-Health» technologies.
Attention trees and semantic paths

NASA Astrophysics Data System (ADS)

Giusti, Christian; Pieroni, Goffredo G.; Pieroni, Laura

2007-02-01

In the last few decades several techniques for image content extraction, often based on segmentation, have been proposed. It has been suggested that under the assumption of very general image content, segmentation becomes unstable and classification becomes unreliable. According to recent psychological theories, certain image regions attract the attention of human observers more than others and, generally, the image main meaning appears concentrated in those regions. Initially, regions attracting our attention are perceived as a whole and hypotheses on their content are formulated; successively the components of those regions are carefully analyzed and a more precise interpretation is reached. It is interesting to observe that an image decomposition process performed according to these psychological visual attention theories might present advantages with respect to a traditional segmentation approach. In this paper we propose an automatic procedure generating image decomposition based on the detection of visual attention regions. A new clustering algorithm taking advantage of the Delaunay- Voronoi diagrams for achieving the decomposition target is proposed. By applying that algorithm recursively, starting from the whole image, a transformation of the image into a tree of related meaningful regions is obtained (Attention Tree). Successively, a semantic interpretation of the leaf nodes is carried out by using a structure of Neural Networks (Neural Tree) assisted by a knowledge base (Ontology Net). Starting from leaf nodes, paths toward the root node across the Attention Tree are attempted. The task of the path consists in relating the semantics of each child-parent node pair and, consequently, in merging the corresponding image regions. The relationship detected in this way between two tree nodes generates, as a result, the extension of the interpreted image area through each step of the path. The construction of several Attention Trees has been performed and partial results will be shown.
Automatic determination of trunk diameter, crown base and height of scots pine (Pinus Sylvestris L.) Based on analysis of 3D point clouds gathered from multi-station terrestrial laser scanning. (Polish Title: Automatyczne okreslanie srednicy pnia, podstawy korony oraz wysokosci sosny zwyczajnej (Pinus Silvestris L.) Na podstawie analiz chmur punktow 3D pochodzacych z wielostanowiskowego naziemnego skanowania laserowego)

NASA Astrophysics Data System (ADS)

Ratajczak, M.; Wężyk, P.

2015-12-01

Rapid development of terrestrial laser scanning (TLS) in recent years resulted in its recognition and implementation in many industries, including forestry and nature conservation. The use of the 3D TLS point clouds in the process of inventory of trees and stands, as well as in the determination of their biometric features (trunk diameter, tree height, crown base, number of trunk shapes), trees and lumber size (volume of trees) is slowly becoming a practice. In addition to the measurement precision, the primary added value of TLS is the ability to automate the processing of the clouds of points 3D in the direction of the extraction of selected features of trees and stands. The paper presents the original software (GNOM) for the automatic measurement of selected features of trees, based on the cloud of points obtained by the ground laser scanner FARO. With the developed algorithms (GNOM), the location of tree trunks on the circular research surface was specified and the measurement was performed; the measurement covered the DBH (l: 1.3m), further diameters of tree trunks at different heights of the tree trunk, base of the tree crown and volume of the tree trunk (the selection measurement method), as well as the tree crown. Research works were performed in the territory of the Niepolomice Forest in an unmixed pine stand (Pinussylvestris L.) on the circular surface with a radius of 18 m, within which there were 16 pine trees (14 of them were cut down). It was characterized by a two-storey and even-aged construction (147 years old) and was devoid of undergrowth. Ground scanning was performed just before harvesting. The DBH of 16 pine trees was specified in a fully automatic way, using the algorithm GNOM with an accuracy of +2.1%, as compared to the reference measurement by the DBH measurement device. The medium, absolute measurement error in the cloud of points - using semi-automatic methods "PIXEL" (between points) and PIPE (fitting the cylinder) in the FARO Scene 5.x., showed the error, 3.5% and 5.0%,.respectively The reference height was assumed as the measurement performed by the tape on the cut tree. The average error of automatic determination of the tree height by the algorithm GNOM based on the TLS point clouds amounted to 6.3% and was slightly higher than when using the manual method of measurements on profiles in the TerraScan (Terrasolid; the error of 5.6%). The relatively high value of the error may be mainly related to the small number of points TLS in the upper parts of crowns. The crown height measurement showed the error of +9.5%. The reference in this case was the tape measurement performed already on the trunks of cut pine trees. Processing the clouds of points by the algorithms GNOM for 16 analyzed trees took no longer than 10 min. (37 sec. /tree). The paper mainly showed the TLS measurement innovation and its high precision in acquiring biometric data in forestry, and at the same time also the further need to increase the degree of automation of processing the clouds of points 3D from terrestrial laser scanning.
Bayes Forest: a data-intensive generator of morphological tree clones

PubMed Central

Järvenpää, Marko; Åkerblom, Markku; Raumonen, Pasi; Kaasalainen, Mikko

2017-01-01

Abstract Detailed and realistic tree form generators have numerous applications in ecology and forestry. For example, the varying morphology of trees contributes differently to formation of landscapes, natural habitats of species, and eco-physiological characteristics of the biosphere. Here, we present an algorithm for generating morphological tree “clones” based on the detailed reconstruction of the laser scanning data, statistical measure of similarity, and a plant growth model with simple stochastic rules. The algorithm is designed to produce tree forms, i.e., morphological clones, similar (and not identical) in respect to tree-level structure, but varying in fine-scale structural detail. Although we opted for certain choices in our algorithm, individual parts may vary depending on the application, making it a general adaptable pipeline. Namely, we showed that a specific multipurpose procedural stochastic growth model can be algorithmically adjusted to produce the morphological clones replicated from the target experimentally measured tree. For this, we developed a statistical measure of similarity (structural distance) between any given pair of trees, which allows for the comprehensive comparing of the tree morphologies by means of empirical distributions describing the geometrical and topological features of a tree. Finally, we developed a programmable interface to manipulate data required by the algorithm. Our algorithm can be used in a variety of applications for exploration of the morphological potential of the growth models (both theoretical and experimental), arising in all sectors of plant science research. PMID:29020742
Detection and Counting of Orchard Trees from Vhr Images Using a Geometrical-Optical Model and Marked Template Matching

NASA Astrophysics Data System (ADS)

Maillard, Philippe; Gomes, Marília F.

2016-06-01

This article presents an original algorithm created to detect and count trees in orchards using very high resolution images. The algorithm is based on an adaptation of the "template matching" image processing approach, in which the template is based on a "geometricaloptical" model created from a series of parameters, such as illumination angles, maximum and ambient radiance, and tree size specifications. The algorithm is tested on four images from different regions of the world and different crop types. These images all have < 1 meter spatial resolution and were downloaded from the GoogleEarth application. Results show that the algorithm is very efficient at detecting and counting trees as long as their spectral and spatial characteristics are relatively constant. For walnut, mango and orange trees, the overall accuracy was clearly above 90%. However, the overall success rate for apple trees fell under 75%. It appears that the openness of the apple tree crown is most probably responsible for this poorer result. The algorithm is fully explained with a step-by-step description. At this stage, the algorithm still requires quite a bit of user interaction. The automatic determination of most of the required parameters is under development.
A Modified Decision Tree Algorithm Based on Genetic Algorithm for Mobile User Classification Problem

PubMed Central

Liu, Dong-sheng; Fan, Shu-jiang

2014-01-01

In order to offer mobile customers better service, we should classify the mobile user firstly. Aimed at the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduced genetic algorithm to optimize the results of the decision tree algorithm. We also take the context information as a classification attributes for the mobile user and we classify the context into public context and private context classes. Then we analyze the processes and operators of the algorithm. At last, we make an experiment on the mobile user with the algorithm, we can classify the mobile user into Basic service user, E-service user, Plus service user, and Total service user classes and we can also get some rules about the mobile user. Compared to C4.5 decision tree algorithm and SVM algorithm, the algorithm we proposed in this paper has higher accuracy and more simplicity. PMID:24688389
Pruning Rogue Taxa Improves Phylogenetic Accuracy: An Efficient Algorithm and Webservice

PubMed Central

Aberer, Andre J.; Krompass, Denis; Stamatakis, Alexandros

2013-01-01

Abstract The presence of rogue taxa (rogues) in a set of trees can frequently have a negative impact on the results of a bootstrap analysis (e.g., the overall support in consensus trees). We introduce an efficient graph-based algorithm for rogue taxon identification as well as an interactive webservice implementing this algorithm. Compared with our previous method, the new algorithm is up to 4 orders of magnitude faster, while returning qualitatively identical results. Because of this significant improvement in scalability, the new algorithm can now identify substantially more complex and compute-intensive rogue taxon constellations. On a large and diverse collection of real-world data sets, we show that our method yields better supported reduced/pruned consensus trees than any competing rogue taxon identification method. Using the parallel version of our open-source code, we successfully identified rogue taxa in a set of 100 trees with 116 334 taxa each. For simulated data sets, we show that when removing/pruning rogue taxa with our method from a tree set, we consistently obtain bootstrap consensus trees as well as maximum-likelihood trees that are topologically closer to the respective true trees. PMID:22962004
Pruning rogue taxa improves phylogenetic accuracy: an efficient algorithm and webservice.

PubMed

Aberer, Andre J; Krompass, Denis; Stamatakis, Alexandros

2013-01-01

The presence of rogue taxa (rogues) in a set of trees can frequently have a negative impact on the results of a bootstrap analysis (e.g., the overall support in consensus trees). We introduce an efficient graph-based algorithm for rogue taxon identification as well as an interactive webservice implementing this algorithm. Compared with our previous method, the new algorithm is up to 4 orders of magnitude faster, while returning qualitatively identical results. Because of this significant improvement in scalability, the new algorithm can now identify substantially more complex and compute-intensive rogue taxon constellations. On a large and diverse collection of real-world data sets, we show that our method yields better supported reduced/pruned consensus trees than any competing rogue taxon identification method. Using the parallel version of our open-source code, we successfully identified rogue taxa in a set of 100 trees with 116 334 taxa each. For simulated data sets, we show that when removing/pruning rogue taxa with our method from a tree set, we consistently obtain bootstrap consensus trees as well as maximum-likelihood trees that are topologically closer to the respective true trees.
Hierarchical clustering using mutual information

NASA Astrophysics Data System (ADS)

Kraskov, A.; Stögbauer, H.; Andrzejak, R. G.; Grassberger, P.

2005-04-01

We present a conceptually simple method for hierarchical clustering of data called mutual information clustering (MIC) algorithm. It uses mutual information (MI) as a similarity measure and exploits its grouping property: The MI between three objects X, Y, and Z is equal to the sum of the MI between X and Y, plus the MI between Z and the combined object (XY). We use this both in the Shannon (probabilistic) version of information theory and in the Kolmogorov (algorithmic) version. We apply our method to the construction of phylogenetic trees from mitochondrial DNA sequences and to the output of independent components analysis (ICA) as illustrated with the ECG of a pregnant woman.
GSHR-Tree: a spatial index tree based on dynamic spatial slot and hash table in grid environments

NASA Astrophysics Data System (ADS)

Chen, Zhanlong; Wu, Xin-cai; Wu, Liang

2008-12-01

Computation Grids enable the coordinated sharing of large-scale distributed heterogeneous computing resources that can be used to solve computationally intensive problems in science, engineering, and commerce. Grid spatial applications are made possible by high-speed networks and a new generation of Grid middleware that resides between networks and traditional GIS applications. The integration of the multi-sources and heterogeneous spatial information and the management of the distributed spatial resources and the sharing and cooperative of the spatial data and Grid services are the key problems to resolve in the development of the Grid GIS. The performance of the spatial index mechanism is the key technology of the Grid GIS and spatial database affects the holistic performance of the GIS in Grid Environments. In order to improve the efficiency of parallel processing of a spatial mass data under the distributed parallel computing grid environment, this paper presents a new grid slot hash parallel spatial index GSHR-Tree structure established in the parallel spatial indexing mechanism. Based on the hash table and dynamic spatial slot, this paper has improved the structure of the classical parallel R tree index. The GSHR-Tree index makes full use of the good qualities of R-Tree and hash data structure. This paper has constructed a new parallel spatial index that can meet the needs of parallel grid computing about the magnanimous spatial data in the distributed network. This arithmetic splits space in to multi-slots by multiplying and reverting and maps these slots to sites in distributed and parallel system. Each sites constructs the spatial objects in its spatial slot into an R tree. On the basis of this tree structure, the index data was distributed among multiple nodes in the grid networks by using large node R-tree method. The unbalance during process can be quickly adjusted by means of a dynamical adjusting algorithm. This tree structure has considered the distributed operation, reduplication operation transfer operation of spatial index in the grid environment. The design of GSHR-Tree has ensured the performance of the load balance in the parallel computation. This tree structure is fit for the parallel process of the spatial information in the distributed network environments. Instead of spatial object's recursive comparison where original R tree has been used, the algorithm builds the spatial index by applying binary code operation in which computer runs more efficiently, and extended dynamic hash code for bit comparison. In GSHR-Tree, a new server is assigned to the network whenever a split of a full node is required. We describe a more flexible allocation protocol which copes with a temporary shortage of storage resources. It uses a distributed balanced binary spatial tree that scales with insertions to potentially any number of storage servers through splits of the overloaded ones. The application manipulates the GSHR-Tree structure from a node in the grid environment. The node addresses the tree through its image that the splits can make outdated. This may generate addressing errors, solved by the forwarding among the servers. In this paper, a spatial index data distribution algorithm that limits the number of servers has been proposed. We improve the storage utilization at the cost of additional messages. The structure of GSHR-Tree is believed that the scheme of this grid spatial index should fit the needs of new applications using endlessly larger sets of spatial data. Our proposal constitutes a flexible storage allocation method for a distributed spatial index. The insertion policy can be tuned dynamically to cope with periods of storage shortage. In such cases storage balancing should be favored for better space utilization, at the price of extra message exchanges between servers. This structure makes a compromise in the updating of the duplicated index and the transformation of the spatial index data. Meeting the needs of the grid computing, GSHRTree has a flexible structure in order to satisfy new needs in the future. The GSHR-Tree provides the R-tree capabilities for large spatial datasets stored over interconnected servers. The analysis, including the experiments, confirmed the efficiency of our design choices. The scheme should fit the needs of new applications of spatial data, using endlessly larger datasets. Using the system response time of the parallel processing of spatial scope query algorithm as the performance evaluation factor, According to the result of the simulated the experiments, GSHR-Tree is performed to prove the reasonable design and the high performance of the indexing structure that the paper presented.

The application of remote sensing image sea ice monitoring method in Bohai Bay based on C4.5 decision tree algorithm

NASA Astrophysics Data System (ADS)

Ye, Wei; Song, Wei

2018-02-01

In The Paper, the remote sensing monitoring of sea ice problem was turned into a classification problem in data mining. Based on the statistic of the related band data of HJ1B remote sensing images, the main bands of HJ1B images related with the reflectance of seawater and sea ice were found. On the basis, the decision tree rules for sea ice monitoring were constructed by the related bands found above, and then the rules were applied to Liaodong Bay area seriously covered by sea ice for sea ice monitoring. The result proved that the method is effective.
Concurrent computation of attribute filters on shared memory parallel machines.

PubMed

Wilkinson, Michael H F; Gao, Hui; Hesselink, Wim H; Jonker, Jan-Eppo; Meijster, Arnold

2008-10-01

Morphological attribute filters have not previously been parallelized, mainly because they are both global and non-separable. We propose a parallel algorithm that achieves efficient parallelism for a large class of attribute filters, including attribute openings, closings, thinnings and thickenings, based on Salembier's Max-Trees and Min-trees. The image or volume is first partitioned in multiple slices. We then compute the Max-trees of each slice using any sequential Max-Tree algorithm. Subsequently, the Max-trees of the slices can be merged to obtain the Max-tree of the image. A C-implementation yielded good speed-ups on both a 16-processor MIPS 14000 parallel machine, and a dual-core Opteron-based machine. It is shown that the speed-up of the parallel algorithm is a direct measure of the gain with respect to the sequential algorithm used. Furthermore, the concurrent algorithm shows a speed gain of up to 72 percent on a single-core processor, due to reduced cache thrashing.
Phylogenetic search through partial tree mixing

PubMed Central

2012-01-01

Background Recent advances in sequencing technology have created large data sets upon which phylogenetic inference can be performed. Current research is limited by the prohibitive time necessary to perform tree search on a reasonable number of individuals. This research develops new phylogenetic algorithms that can operate on tens of thousands of species in a reasonable amount of time through several innovative search techniques. Results When compared to popular phylogenetic search algorithms, better trees are found much more quickly for large data sets. These algorithms are incorporated in the PSODA application available at http://dna.cs.byu.edu/psoda Conclusions The use of Partial Tree Mixing in a partition based tree space allows the algorithm to quickly converge on near optimal tree regions. These regions can then be searched in a methodical way to determine the overall optimal phylogenetic solution. PMID:23320449
Reliable Adaptive Data Aggregation Route Strategy for a Trade-off between Energy and Lifetime in WSNs

PubMed Central

Guo, Wenzhong; Hong, Wei; Zhang, Bin; Chen, Yuzhong; Xiong, Naixue

2014-01-01

Mobile security is one of the most fundamental problems in Wireless Sensor Networks (WSNs). The data transmission path will be compromised for some disabled nodes. To construct a secure and reliable network, designing an adaptive route strategy which optimizes energy consumption and network lifetime of the aggregation cost is of great importance. In this paper, we address the reliable data aggregation route problem for WSNs. Firstly, to ensure nodes work properly, we propose a data aggregation route algorithm which improves the energy efficiency in the WSN. The construction process achieved through discrete particle swarm optimization (DPSO) saves node energy costs. Then, to balance the network load and establish a reliable network, an adaptive route algorithm with the minimal energy and the maximum lifetime is proposed. Since it is a non-linear constrained multi-objective optimization problem, in this paper we propose a DPSO with the multi-objective fitness function combined with the phenotype sharing function and penalty function to find available routes. Experimental results show that compared with other tree routing algorithms our algorithm can effectively reduce energy consumption and trade off energy consumption and network lifetime. PMID:25215944
Performance and policy dimensions in internet routing

NASA Technical Reports Server (NTRS)

Mills, David L.; Boncelet, Charles G.; Elias, John G.; Schragger, Paul A.; Jackson, Alden W.; Thyagarajan, Ajit

1995-01-01

The Internet Routing Project, referred to in this report as the 'Highball Project', has been investigating architectures suitable for networks spanning large geographic areas and capable of very high data rates. The Highball network architecture is based on a high speed crossbar switch and an adaptive, distributed, TDMA scheduling algorithm. The scheduling algorithm controls the instantaneous configuration and swell time of the switch, one of which is attached to each node. In order to send a single burst or a multi-burst packet, a reservation request is sent to all nodes. The scheduling algorithm then configures the switches immediately prior to the arrival of each burst, so it can be relayed immediately without requiring local storage. Reservations and housekeeping information are sent using a special broadcast-spanning-tree schedule. Progress to date in the Highball Project includes the design and testing of a suite of scheduling algorithms, construction of software reservation/scheduling simulators, and construction of a strawman hardware and software implementation. A prototype switch controller and timestamp generator have been completed and are in test. Detailed documentation on the algorithms, protocols and experiments conducted are given in various reports and papers published. Abstracts of this literature are included in the bibliography at the end of this report, which serves as an extended executive summary.
Analytical simulation and PROFAT II: a new methodology and a computer automated tool for fault tree analysis in chemical process industries.

PubMed

Khan, F I; Abbasi, S A

2000-07-10

Fault tree analysis (FTA) is based on constructing a hypothetical tree of base events (initiating events) branching into numerous other sub-events, propagating the fault and eventually leading to the top event (accident). It has been a powerful technique used traditionally in identifying hazards in nuclear installations and power industries. As the systematic articulation of the fault tree is associated with assigning probabilities to each fault, the exercise is also sometimes called probabilistic risk assessment. But powerful as this technique is, it is also very cumbersome and costly, limiting its area of application. We have developed a new algorithm based on analytical simulation (named as AS-II), which makes the application of FTA simpler, quicker, and cheaper; thus opening up the possibility of its wider use in risk assessment in chemical process industries. Based on the methodology we have developed a computer-automated tool. The details are presented in this paper.
QuateXelero: An Accelerated Exact Network Motif Detection Algorithm

PubMed Central

Khakabimamaghani, Sahand; Sharafuddin, Iman; Dichter, Norbert; Koch, Ina; Masoudi-Nejad, Ali

2013-01-01

Finding motifs in biological, social, technological, and other types of networks has become a widespread method to gain more knowledge about these networks’ structure and function. However, this task is very computationally demanding, because it is highly associated with the graph isomorphism which is an NP problem (not known to belong to P or NP-complete subsets yet). Accordingly, this research is endeavoring to decrease the need to call NAUTY isomorphism detection method, which is the most time-consuming step in many existing algorithms. The work provides an extremely fast motif detection algorithm called QuateXelero, which has a Quaternary Tree data structure in the heart. The proposed algorithm is based on the well-known ESU (FANMOD) motif detection algorithm. The results of experiments on some standard model networks approve the overal superiority of the proposed algorithm, namely QuateXelero, compared with two of the fastest existing algorithms, G-Tries and Kavosh. QuateXelero is especially fastest in constructing the central data structure of the algorithm from scratch based on the input network. PMID:23874498
Formal modeling of a system of chemical reactions under uncertainty.

PubMed

Ghosh, Krishnendu; Schlipf, John

2014-10-01

We describe a novel formalism representing a system of chemical reactions, with imprecise rates of reactions and concentrations of chemicals, and describe a model reduction method, pruning, based on the chemical properties. We present two algorithms, midpoint approximation and interval approximation, for construction of efficient model abstractions with uncertainty in data. We evaluate computational feasibility by posing queries in computation tree logic (CTL) on a prototype of extracellular-signal-regulated kinase (ERK) pathway.
Comparing barrier algorithms

NASA Technical Reports Server (NTRS)

Arenstorf, Norbert S.; Jordan, Harry F.

1987-01-01

A barrier is a method for synchronizing a large number of concurrent computer processes. After considering some basic synchronization mechanisms, a collection of barrier algorithms with either linear or logarithmic depth are presented. A graphical model is described that profiles the execution of the barriers and other parallel programming constructs. This model shows how the interaction between the barrier algorithms and the work that they synchronize can impact their performance. One result is that logarithmic tree structured barriers show good performance when synchronizing fixed length work, while linear self-scheduled barriers show better performance when synchronizing fixed length work with an imbedded critical section. The linear barriers are better able to exploit the process skew associated with critical sections. Timing experiments, performed on an eighteen processor Flex/32 shared memory multiprocessor, that support these conclusions are detailed.
DVD-COOP: Innovative Conjunction Prediction Using Voronoi-filter based on the Dynamic Voronoi Diagram of 3D Spheres

NASA Astrophysics Data System (ADS)

Cha, J.; Ryu, J.; Lee, M.; Song, C.; Cho, Y.; Schumacher, P.; Mah, M.; Kim, D.

Conjunction prediction is one of the critical operations in space situational awareness (SSA). For geospace objects, common algorithms for conjunction prediction are usually based on all-pairwise check, spatial hash, or kd-tree. Computational load is usually reduced through some filters. However, there exists a good chance of missing potential collisions between space objects. We present a novel algorithm which both guarantees no missing conjunction and is efficient to answer to a variety of spatial queries including pairwise conjunction prediction. The algorithm takes only O(k log N) time for N objects in the worst case to answer conjunctions where k is a constant which is linear to prediction time length. The proposed algorithm, named DVD-COOP (Dynamic Voronoi Diagram-based Conjunctive Orbital Object Predictor), is based on the dynamic Voronoi diagram of moving spherical balls in 3D space. The algorithm has a preprocessing which consists of two steps: The construction of an initial Voronoi diagram (taking O(N) time on average) and the construction of a priority queue for the events of topology changes in the Voronoi diagram (taking O(N log N) time in the worst case). The scalability of the proposed algorithm is also discussed. We hope that the proposed Voronoi-approach will change the computational paradigm in spatial reasoning among space objects.
[Construction of information management-based virtual forest landscape and its application].

PubMed

Chen, Chongcheng; Tang, Liyu; Quan, Bing; Li, Jianwei; Shi, Song

2005-11-01

Based on the analysis of the contents and technical characteristics of different scale forest visualization modeling, this paper brought forward the principles and technical systems of constructing an information management-based virtual forest landscape. With the combination of process modeling and tree geometric structure description, a software method of interactively and parameterized tree modeling was developed, and the corresponding renderings and geometrical elements simplification algorithms were delineated to speed up rendering run-timely. As a pilot study, the geometrical model bases associated with the typical tree categories in Zhangpu County of Fujian Province, southeast China were established as template files. A Virtual Forest Management System prototype was developed with GIS component (ArcObject), OpenGL graphics environment, and Visual C++ language, based on forest inventory and remote sensing data. The prototype could be used for roaming between 2D and 3D, information query and analysis, and virtual and interactive forest growth simulation, and its reality and accuracy could meet the needs of forest resource management. Some typical interfaces of the system and the illustrative scene cross-sections of simulated masson pine growth under conditions of competition and thinning were listed.
Fuzzy α-minimum spanning tree problem: definition and solutions

NASA Astrophysics Data System (ADS)

Zhou, Jian; Chen, Lu; Wang, Ke; Yang, Fan

2016-04-01

In this paper, the minimum spanning tree problem is investigated on the graph with fuzzy edge weights. The notion of fuzzy ? -minimum spanning tree is presented based on the credibility measure, and then the solutions of the fuzzy ? -minimum spanning tree problem are discussed under different assumptions. First, we respectively, assume that all the edge weights are triangular fuzzy numbers and trapezoidal fuzzy numbers and prove that the fuzzy ? -minimum spanning tree problem can be transformed to a classical problem on a crisp graph in these two cases, which can be solved by classical algorithms such as the Kruskal algorithm and the Prim algorithm in polynomial time. Subsequently, as for the case that the edge weights are general fuzzy numbers, a fuzzy simulation-based genetic algorithm using Prüfer number representation is designed for solving the fuzzy ? -minimum spanning tree problem. Some numerical examples are also provided for illustrating the effectiveness of the proposed solutions.
Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees.

PubMed

Stolzer, Maureen; Lai, Han; Xu, Minli; Sathaye, Deepa; Vernot, Benjamin; Durand, Dannie

2012-09-15

Gene duplication (D), transfer (T), loss (L) and incomplete lineage sorting (I) are crucial to the evolution of gene families and the emergence of novel functions. The history of these events can be inferred via comparison of gene and species trees, a process called reconciliation, yet current reconciliation algorithms model only a subset of these evolutionary processes. We present an algorithm to reconcile a binary gene tree with a nonbinary species tree under a DTLI parsimony criterion. This is the first reconciliation algorithm to capture all four evolutionary processes driving tree incongruence and the first to reconcile non-binary species trees with a transfer model. Our algorithm infers all optimal solutions and reports complete, temporally feasible event histories, giving the gene and species lineages in which each event occurred. It is fixed-parameter tractable, with polytime complexity when the maximum species outdegree is fixed. Application of our algorithms to prokaryotic and eukaryotic data show that use of an incomplete event model has substantial impact on the events inferred and resulting biological conclusions. Our algorithms have been implemented in Notung, a freely available phylogenetic reconciliation software package, available at http://www.cs.cmu.edu/~durand/Notung. mstolzer@andrew.cmu.edu.
A stochastic multiple imputation algorithm for missing covariate data in tree-structured survival analysis.

PubMed

Wallace, Meredith L; Anderson, Stewart J; Mazumdar, Sati

2010-12-20

Missing covariate data present a challenge to tree-structured methodology due to the fact that a single tree model, as opposed to an estimated parameter value, may be desired for use in a clinical setting. To address this problem, we suggest a multiple imputation algorithm that adds draws of stochastic error to a tree-based single imputation method presented by Conversano and Siciliano (Technical Report, University of Naples, 2003). Unlike previously proposed techniques for accommodating missing covariate data in tree-structured analyses, our methodology allows the modeling of complex and nonlinear covariate structures while still resulting in a single tree model. We perform a simulation study to evaluate our stochastic multiple imputation algorithm when covariate data are missing at random and compare it to other currently used methods. Our algorithm is advantageous for identifying the true underlying covariate structure when complex data and larger percentages of missing covariate observations are present. It is competitive with other current methods with respect to prediction accuracy. To illustrate our algorithm, we create a tree-structured survival model for predicting time to treatment response in older, depressed adults. Copyright © 2010 John Wiley & Sons, Ltd.
Routing Algorithm based on Minimum Spanning Tree and Minimum Cost Flow for Hybrid Wireless-optical Broadband Access Network

NASA Astrophysics Data System (ADS)

Le, Zichun; Suo, Kaihua; Fu, Minglei; Jiang, Ling; Dong, Wen

2012-03-01

In order to minimize the average end to end delay for data transporting in hybrid wireless optical broadband access network, a novel routing algorithm named MSTMCF (minimum spanning tree and minimum cost flow) is devised. The routing problem is described as a minimum spanning tree and minimum cost flow model and corresponding algorithm procedures are given. To verify the effectiveness of MSTMCF algorithm, extensively simulations based on OWNS have been done under different types of traffic source.
Exact solutions for species tree inference from discordant gene trees.

PubMed

Chang, Wen-Chieh; Górecki, Paweł; Eulenstein, Oliver

2013-10-01

Phylogenetic analysis has to overcome the grant challenge of inferring accurate species trees from evolutionary histories of gene families (gene trees) that are discordant with the species tree along whose branches they have evolved. Two well studied approaches to cope with this challenge are to solve either biologically informed gene tree parsimony (GTP) problems under gene duplication, gene loss, and deep coalescence, or the classic RF supertree problem that does not rely on any biological model. Despite the potential of these problems to infer credible species trees, they are NP-hard. Therefore, these problems are addressed by heuristics that typically lack any provable accuracy and precision. We describe fast dynamic programming algorithms that solve the GTP problems and the RF supertree problem exactly, and demonstrate that our algorithms can solve instances with data sets consisting of as many as 22 taxa. Extensions of our algorithms can also report the number of all optimal species trees, as well as the trees themselves. To better asses the quality of the resulting species trees that best fit the given gene trees, we also compute the worst case species trees, their numbers, and optimization score for each of the computational problems. Finally, we demonstrate the performance of our exact algorithms using empirical and simulated data sets, and analyze the quality of heuristic solutions for the studied problems by contrasting them with our exact solutions.
Shortcomings with Tree-Structured Edge Encodings for Neural Networks

NASA Technical Reports Server (NTRS)

Hornby, Gregory S.

2004-01-01

In evolutionary algorithms a common method for encoding neural networks is to use a tree structured assembly procedure for constructing them. Since node operators have difficulties in specifying edge weights and these operators are execution-order dependent, an alternative is to use edge operators. Here we identify three problems with edge operators: in the initialization phase most randomly created genotypes produce an incorrect number of inputs and outputs; variation operators can easily change the number of input/output (I/O) units; and units have a connectivity bias based on their order of creation. Instead of creating I/O nodes as part of the construction process we propose using parameterized operators to connect to preexisting I/O units. Results from experiments show that these parameterized operators greatly improve the probability of creating and maintaining networks with the correct number of I/O units, remove the connectivity bias with I/O units and produce better controllers for a goal-scoring task.
The OGCleaner: filtering false-positive homology clusters.

PubMed

Fujimoto, M Stanley; Suvorov, Anton; Jensen, Nicholas O; Clement, Mark J; Snell, Quinn; Bybee, Seth M

2017-01-01

Detecting homologous sequences in organisms is an essential step in protein structure and function prediction, gene annotation and phylogenetic tree construction. Heuristic methods are often employed for quality control of putative homology clusters. These heuristics, however, usually only apply to pairwise sequence comparison and do not examine clusters as a whole. We present the Orthology Group Cleaner (the OGCleaner), a tool designed for filtering putative orthology groups as homology or non-homology clusters by considering all sequences in a cluster. The OGCleaner relies on high-quality orthologous groups identified in OrthoDB to train machine learning algorithms that are able to distinguish between true-positive and false-positive homology groups. This package aims to improve the quality of phylogenetic tree construction especially in instances of lower-quality transcriptome assemblies. https://github.com/byucsl/ogcleaner CONTACT: sfujimoto@gmail.comSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Reaction Mechanism Generator: Automatic construction of chemical kinetic mechanisms

NASA Astrophysics Data System (ADS)

Gao, Connie W.; Allen, Joshua W.; Green, William H.; West, Richard H.

2016-06-01

Reaction Mechanism Generator (RMG) constructs kinetic models composed of elementary chemical reaction steps using a general understanding of how molecules react. Species thermochemistry is estimated through Benson group additivity and reaction rate coefficients are estimated using a database of known rate rules and reaction templates. At its core, RMG relies on two fundamental data structures: graphs and trees. Graphs are used to represent chemical structures, and trees are used to represent thermodynamic and kinetic data. Models are generated using a rate-based algorithm which excludes species from the model based on reaction fluxes. RMG can generate reaction mechanisms for species involving carbon, hydrogen, oxygen, sulfur, and nitrogen. It also has capabilities for estimating transport and solvation properties, and it automatically computes pressure-dependent rate coefficients and identifies chemically-activated reaction paths. RMG is an object-oriented program written in Python, which provides a stable, robust programming architecture for developing an extensible and modular code base with a large suite of unit tests. Computationally intensive functions are cythonized for speed improvements.
Algorithm for protecting light-trees in survivable mesh wavelength-division-multiplexing networks

NASA Astrophysics Data System (ADS)

Luo, Hongbin; Li, Lemin; Yu, Hongfang

2006-12-01

Wavelength-division-multiplexing (WDM) technology is expected to facilitate bandwidth-intensive multicast applications such as high-definition television. A single fiber cut in a WDM mesh network, however, can disrupt the dissemination of information to several destinations on a light-tree based multicast session. Thus it is imperative to protect multicast sessions by reserving redundant resources. We propose a novel and efficient algorithm for protecting light-trees in survivable WDM mesh networks. The algorithm is called segment-based protection with sister node first (SSNF), whose basic idea is to protect a light-tree using a set of backup segments with a higher priority to protect the segments from a branch point to its children (sister nodes). The SSNF algorithm differs from the segment protection scheme proposed in the literature in how the segments are identified and protected. Our objective is to minimize the network resources used for protecting each primary light-tree such that the blocking probability can be minimized. To verify the effectiveness of the SSNF algorithm, we conduct extensive simulation experiments. The simulation results demonstrate that the SSNF algorithm outperforms existing algorithms for the same problem.

Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data.

PubMed

Barros, Rodrigo C; Winck, Ana T; Machado, Karina S; Basgalupp, Márcio P; de Carvalho, André C P L F; Ruiz, Duncan D; de Souza, Osmar Norberto

2012-11-21

This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, specially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general-use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible-receptor.
Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data

PubMed Central

2012-01-01

Background This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, specially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general-use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. Results The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible-receptor. PMID:23171000
a Novel Approach of Indexing and Retrieving Spatial Polygons for Efficient Spatial Region Queries

NASA Astrophysics Data System (ADS)

Zhao, J. H.; Wang, X. Z.; Wang, F. Y.; Shen, Z. H.; Zhou, Y. C.; Wang, Y. L.

2017-10-01

Spatial region queries are more and more widely used in web-based applications. Mechanisms to provide efficient query processing over geospatial data are essential. However, due to the massive geospatial data volume, heavy geometric computation, and high access concurrency, it is difficult to get response in real time. Spatial indexes are usually used in this situation. In this paper, based on k-d tree, we introduce a distributed KD-Tree (DKD-Tree) suitbable for polygon data, and a two-step query algorithm. The spatial index construction is recursive and iterative, and the query is an in memory process. Both the index and query methods can be processed in parallel, and are implemented based on HDFS, Spark and Redis. Experiments on a large volume of Remote Sensing images metadata have been carried out, and the advantages of our method are investigated by comparing with spatial region queries executed on PostgreSQL and PostGIS. Results show that our approach not only greatly improves the efficiency of spatial region query, but also has good scalability, Moreover, the two-step spatial range query algorithm can also save cluster resources to support a large number of concurrent queries. Therefore, this method is very useful when building large geographic information systems.
Learning classification trees

NASA Technical Reports Server (NTRS)

Buntine, Wray

1991-01-01

Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. How a tree learning algorithm can be derived from Bayesian decision theory is outlined. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule turns out to be similar to Quinlan's information gain splitting rule, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, Quinlan's C4 and Breiman et al. Cart show the full Bayesian algorithm is consistently as good, or more accurate than these other approaches though at a computational price.
A new approach to enhance the performance of decision tree for classifying gene expression data.

PubMed

Hassan, Md; Kotagiri, Ramamohanarao

2013-12-20

Gene expression data classification is a challenging task due to the large dimensionality and very small number of samples. Decision tree is one of the popular machine learning approaches to address such classification problems. However, the existing decision tree algorithms use a single gene feature at each node to split the data into its child nodes and hence might suffer from poor performance specially when classifying gene expression dataset. By using a new decision tree algorithm where, each node of the tree consists of more than one gene, we enhance the classification performance of traditional decision tree classifiers. Our method selects suitable genes that are combined using a linear function to form a derived composite feature. To determine the structure of the tree we use the area under the Receiver Operating Characteristics curve (AUC). Experimental analysis demonstrates higher classification accuracy using the new decision tree compared to the other existing decision trees in literature. We experimentally compare the effect of our scheme against other well known decision tree techniques. Experiments show that our algorithm can substantially boost the classification performance of the decision tree.
Enumerating Substituted Benzene Isomers of Tree-Like Chemical Graphs.

PubMed

Li, Jinghui; Nagamochi, Hiroshi; Akutsu, Tatsuya

2018-01-01

Enumeration of chemical structures is useful for drug design, which is one of the main targets of computational biology and bioinformatics. A chemical graph with no other cycles than benzene rings is called tree-like, and becomes a tree possibly with multiple edges if we contract each benzene ring into a single virtual atom of valence 6. All tree-like chemical graphs with a given tree representation are called the substituted benzene isomers of . When we replace each virtual atom in with a benzene ring to obtain a substituted benzene isomer, distinct isomers of are caused by the difference in arrangements of atom groups around a benzene ring. In this paper, we propose an efficient algorithm that enumerates all substituted benzene isomers of a given tree representation . Our algorithm first counts the number of all the isomers of the tree representation by a dynamic programming method. To enumerate all the isomers, for each , our algorithm then generates the th isomer by backtracking the counting phase of the dynamic programming. We also implemented our algorithm for computational experiments.
Efficient tree tensor network states (TTNS) for quantum chemistry: Generalizations of the density matrix renormalization group algorithm

NASA Astrophysics Data System (ADS)

Nakatani, Naoki; Chan, Garnet Kin-Lic

2013-04-01

We investigate tree tensor network states for quantum chemistry. Tree tensor network states represent one of the simplest generalizations of matrix product states and the density matrix renormalization group. While matrix product states encode a one-dimensional entanglement structure, tree tensor network states encode a tree entanglement structure, allowing for a more flexible description of general molecules. We describe an optimal tree tensor network state algorithm for quantum chemistry. We introduce the concept of half-renormalization which greatly improves the efficiency of the calculations. Using our efficient formulation we demonstrate the strengths and weaknesses of tree tensor network states versus matrix product states. We carry out benchmark calculations both on tree systems (hydrogen trees and π-conjugated dendrimers) as well as non-tree molecules (hydrogen chains, nitrogen dimer, and chromium dimer). In general, tree tensor network states require much fewer renormalized states to achieve the same accuracy as matrix product states. In non-tree molecules, whether this translates into a computational savings is system dependent, due to the higher prefactor and computational scaling associated with tree algorithms. In tree like molecules, tree network states are easily superior to matrix product states. As an illustration, our largest dendrimer calculation with tree tensor network states correlates 110 electrons in 110 active orbitals.
The application of a decision tree to establish the parameters associated with hypertension.

PubMed

Tayefi, Maryam; Esmaeili, Habibollah; Saberi Karimian, Maryam; Amirabadi Zadeh, Alireza; Ebrahimi, Mahmoud; Safarian, Mohammad; Nematy, Mohsen; Parizadeh, Seyed Mohammad Reza; Ferns, Gordon A; Ghayour-Mobarhan, Majid

2017-02-01

Hypertension is an important risk factor for cardiovascular disease (CVD). The goal of this study was to establish the factors associated with hypertension by using a decision-tree algorithm as a supervised classification method of data mining. Data from a cross-sectional study were used in this study. A total of 9078 subjects who met the inclusion criteria were recruited. 70% of these subjects (6358 cases) were randomly allocated to the training dataset for the constructing of the decision-tree. The remaining 30% (2720 cases) were used as the testing dataset to evaluate the performance of decision-tree. Two models were evaluated in this study. In model I, age, gender, body mass index, marital status, level of education, occupation status, depression and anxiety status, physical activity level, smoking status, LDL, TG, TC, FBG, uric acid and hs-CRP were considered as input variables and in model II, age, gender, WBC, RBC, HGB, HCT MCV, MCH, PLT, RDW and PDW were considered as input variables. The validation of the model was assessed by constructing a receiver operating characteristic (ROC) curve. The prevalence rates of hypertension were 32% in our population. For the decision-tree model I, the accuracy, sensitivity, specificity and area under the ROC curve (AUC) value for identifying the related risk factors of hypertension were 73%, 63%, 77% and 0.72, respectively. The corresponding values for model II were 70%, 61%, 74% and 0.68, respectively. We have developed a decision tree model to identify the risk factors associated with hypertension that maybe used to develop programs for hypertension management. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
The influence of conifer forest canopy cover on the accuracy of two individual tree measurement algorithms using lidar data

Treesearch

Michael J. Falkowski; Alistair M.S. Smith; Paul E. Gessler; Andrew T. Hudak; Lee A. Vierling; Jeffrey S. Evans

2008-01-01

Individual tree detection algorithms can provide accurate measurements of individual tree locations, crown diameters (from aerial photography and light detection and ranging (lidar) data), and tree heights (from lidar data). However, to be useful for forest management goals relating to timber harvest, carbon accounting, and ecological processes, there is a need to...
Distance-Based Phylogenetic Methods Around a Polytomy.

PubMed

Davidson, Ruth; Sullivant, Seth

2014-01-01

Distance-based phylogenetic algorithms attempt to solve the NP-hard least-squares phylogeny problem by mapping an arbitrary dissimilarity map representing biological data to a tree metric. The set of all dissimilarity maps is a Euclidean space properly containing the space of all tree metrics as a polyhedral fan. Outputs of distance-based tree reconstruction algorithms such as UPGMA and neighbor-joining are points in the maximal cones in the fan. Tree metrics with polytomies lie at the intersections of maximal cones. A phylogenetic algorithm divides the space of all dissimilarity maps into regions based upon which combinatorial tree is reconstructed by the algorithm. Comparison of phylogenetic methods can be done by comparing the geometry of these regions. We use polyhedral geometry to compare the local nature of the subdivisions induced by least-squares phylogeny, UPGMA, and neighbor-joining when the true tree has a single polytomy with exactly four neighbors. Our results suggest that in some circumstances, UPGMA and neighbor-joining poorly match least-squares phylogeny.
Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees.

PubMed

Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H

2017-10-25

Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.
Convergence properties of halo merger trees; halo and substructure merger rates across cosmic history

NASA Astrophysics Data System (ADS)

Poole, Gregory B.; Mutch, Simon J.; Croton, Darren J.; Wyithe, Stuart

2017-12-01

We introduce GBPTREES: an algorithm for constructing merger trees from cosmological simulations, designed to identify and correct for pathological cases introduced by errors or ambiguities in the halo finding process. GBPTREES is built upon a halo matching method utilizing pseudo-radial moments constructed from radially sorted particle ID lists (no other information is required) and a scheme for classifying merger tree pathologies from networks of matches made to-and-from haloes across snapshots ranging forward-and-backward in time. Focusing on SUBFIND catalogues for this work, a sweep of parameters influencing our merger tree construction yields the optimal snapshot cadence and scanning range required for converged results. Pathologies proliferate when snapshots are spaced by ≲0.128 dynamical times; conveniently similar to that needed for convergence of semi-analytical modelling, as established by Benson et al. Total merger counts are converged at the level of ∼5 per cent for friends-of-friends (FoF) haloes of size np ≳ 75 across a factor of 512 in mass resolution, but substructure rates converge more slowly with mass resolution, reaching convergence of ∼10 per cent for np ≳ 100 and particle mass mp ≲ 109 M⊙. We present analytic fits to FoF and substructure merger rates across nearly all observed galactic history (z ≤ 8.5). While we find good agreement with the results presented by Fakhouri et al. for FoF haloes, a slightly flatter dependence on merger ratio and increased major merger rates are found, reducing previously reported discrepancies with extended Press-Schechter estimates. When appropriately defined, substructure merger rates show a similar mass ratio dependence as FoF rates, but with stronger mass and redshift dependencies for their normalization.
Image compression using quad-tree coding with morphological dilation

NASA Astrophysics Data System (ADS)

Wu, Jiaji; Jiang, Weiwei; Jiao, Licheng; Wang, Lei

2007-11-01

In this paper, we propose a new algorithm which integrates morphological dilation operation to quad-tree coding, the purpose of doing this is to compensate each other's drawback by using quad-tree coding and morphological dilation operation respectively. New algorithm can not only quickly find the seed significant coefficient of dilation but also break the limit of block boundary of quad-tree coding. We also make a full use of both within-subband and cross-subband correlation to avoid the expensive cost of representing insignificant coefficients. Experimental results show that our algorithm outperforms SPECK and SPIHT. Without using any arithmetic coding, our algorithm can achieve good performance with low computational cost and it's more suitable to mobile devices or scenarios with a strict real-time requirement.
Identification of Some Zeolite Group Minerals by Application of Artificial Neural Network and Decision Tree Algorithm Based on SEM-EDS Data

NASA Astrophysics Data System (ADS)

Akkaş, Efe; Evren Çubukçu, H.; Akin, Lutfiye; Erkut, Volkan; Yurdakul, Yasin; Karayigit, Ali Ihsan

2016-04-01

Identification of zeolite group minerals is complicated due to their similar chemical formulas and habits. Although the morphologies of various zeolite crystals can be recognized under Scanning Electron Microscope (SEM), it is relatively more challenging and problematic process to identify zeolites using their mineral chemical data. SEMs integrated with energy dispersive X-ray spectrometers (EDS) provide fast and reliable chemical data of minerals. However, considering elemental similarities of characteristic chemical formulae of zeolite species (e.g. Clinoptilolite ((Na,K,Ca)2 -3Al3(Al,Si)2Si13O3612H2O) and Erionite ((Na2,K2,Ca)2Al4Si14O36ṡ15H2O)) EDS data alone does not seem to be sufficient for correct identification. Furthermore, the physical properties of the specimen (e.g. roughness, electrical conductivity) and the applied analytical conditions (e.g. accelerating voltage, beam current, spot size) of the SEM-EDS should be uniform in order to obtain reliable elemental results of minerals having high alkali (Na, K) and H2O (approx. %14-18) contents. This study which was funded by The Scientific and Technological Research Council of Turkey (TUBITAK Project No: 113Y439), aims to construct a database as large as possible for various zeolite minerals and to develop a general prediction model for the identification of zeolite minerals using SEM-EDS data. For this purpose, an artificial neural network and rule based decision tree algorithm were employed. Throughout the analyses, a total of 1850 chemical data were collected from four distinct zeolite species, (Clinoptilolite-Heulandite, Erionite, Analcime and Mordenite) observed in various rocks (e.g. coals, pyroclastics). In order to obtain a representative training data set for each minerals, a selection procedure for reference mineral analyses was applied. During the selection procedure, SEM based crystal morphology data, XRD spectra and re-calculated cationic distribution, obtained by EDS have been used for the characterization of the training set. Consequently, for each zeolite species 250 EDS data (as elemental intensities) used for training and 200 ±50 analyses were tested. Finally, two prediction models were developed. The constructed models with various cross-correlation values (r) yielded an average accuracy of >91% for the best predictions using C5.0 Decision Tree algorithm and back propagation artificial neural network. Despite having similar accuracies, the developed models exhibit different prediction behaviors for some zeolite minerals. The results demonstrate that artificial neural network as a nonlinear tool and decision tree algorithm as a rule based prediction model would be employed to provide considerably efficient and reliable identification/classification of some zeolite minerals regardless of their similar elemental compositions. Keywords: mineral identification; zeolites; energy dispersive spectrometry; artificial neural networks; decision tree.
Greedy algorithms in disordered systems

NASA Astrophysics Data System (ADS)

Duxbury, P. M.; Dobrin, R.

1999-08-01

We discuss search, minimal path and minimal spanning tree algorithms and their applications to disordered systems. Greedy algorithms solve these problems exactly, and are related to extremal dynamics in physics. Minimal cost path (Dijkstra) and minimal cost spanning tree (Prim) algorithms provide extremal dynamics for a polymer in a random medium (the KPZ universality class) and invasion percolation (without trapping) respectively.
A data driven approach for condition monitoring of wind turbine blade using vibration signals through best-first tree algorithm and functional trees algorithm: A comparative study.

PubMed

Joshuva, A; Sugumaran, V

2017-03-01

Wind energy is one of the important renewable energy resources available in nature. It is one of the major resources for production of energy because of its dependability due to the development of the technology and relatively low cost. Wind energy is converted into electrical energy using rotating blades. Due to environmental conditions and large structure, the blades are subjected to various vibration forces that may cause damage to the blades. This leads to a liability in energy production and turbine shutdown. The downtime can be reduced when the blades are diagnosed continuously using structural health condition monitoring. These are considered as a pattern recognition problem which consists of three phases namely, feature extraction, feature selection, and feature classification. In this study, statistical features were extracted from vibration signals, feature selection was carried out using a J48 decision tree algorithm and feature classification was performed using best-first tree algorithm and functional trees algorithm. The better algorithm is suggested for fault diagnosis of wind turbine blade. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Object-Oriented Algorithm For Evaluation Of Fault Trees

NASA Technical Reports Server (NTRS)

Patterson-Hine, F. A.; Koen, B. V.

1992-01-01

Algorithm for direct evaluation of fault trees incorporates techniques of object-oriented programming. Reduces number of calls needed to solve trees with repeated events. Provides significantly improved software environment for such computations as quantitative analyses of safety and reliability of complicated systems of equipment (e.g., spacecraft or factories).
Surface temperature dataset for North America obtained by application of optimal interpolation algorithm merging tree-ring chronologies and climate model output

NASA Astrophysics Data System (ADS)

Chen, Xin; Xing, Pei; Luo, Yong; Nie, Suping; Zhao, Zongci; Huang, Jianbin; Wang, Shaowu; Tian, Qinhua

2017-02-01

A new dataset of surface temperature over North America has been constructed by merging climate model results and empirical tree-ring data through the application of an optimal interpolation algorithm. Errors of both the Community Climate System Model version 4 (CCSM4) simulation and the tree-ring reconstruction were considered to optimize the combination of the two elements. Variance matching was used to reconstruct the surface temperature series. The model simulation provided the background field, and the error covariance matrix was estimated statistically using samples from the simulation results with a running 31-year window for each grid. Thus, the merging process could continue with a time-varying gain matrix. This merging method (MM) was tested using two types of experiment, and the results indicated that the standard deviation of errors was about 0.4 °C lower than the tree-ring reconstructions and about 0.5 °C lower than the model simulation. Because of internal variabilities and uncertainties in the external forcing data, the simulated decadal warm-cool periods were readjusted by the MM such that the decadal variability was more reliable (e.g., the 1940-1960s cooling). During the two centuries (1601-1800 AD) of the preindustrial period, the MM results revealed a compromised spatial pattern of the linear trend of surface temperature, which is in accordance with the phase transition of the Pacific decadal oscillation and Atlantic multidecadal oscillation. Compared with pure CCSM4 simulations, it was demonstrated that the MM brought a significant improvement to the decadal variability of the gridded temperature via the merging of temperature-sensitive tree-ring records.
Applications and Benefits for Big Data Sets Using Tree Distances and The T-SNE Algorithm

DTIC Science & Technology

2016-03-01

BENEFITS FOR BIG DATA SETS USING TREE DISTANCES AND THE T-SNE ALGORITHM by Suyoung Lee March 2016 Thesis Advisor: Samuel E. Buttrey...REPORT TYPE AND DATES COVERED Master’s thesis 4. TITLE AND SUBTITLE APPLICATIONS AND BENEFITS FOR BIG DATA SETS USING TREE DISTANCES AND THE T-SNE...this work we use tree distance computed using Buttrey’s treeClust package in R, as discussed by Buttrey and Whitaker in 2015, to process mixed data
Wavelet compression of multichannel ECG data by enhanced set partitioning in hierarchical trees algorithm.

PubMed

Sharifahmadian, Ershad

2006-01-01

The set partitioning in hierarchical trees (SPIHT) algorithm is very effective and computationally simple technique for image and signal compression. Here the author modified the algorithm which provides even better performance than the SPIHT algorithm. The enhanced set partitioning in hierarchical trees (ESPIHT) algorithm has performance faster than the SPIHT algorithm. In addition, the proposed algorithm reduces the number of bits in a bit stream which is stored or transmitted. I applied it to compression of multichannel ECG data. Also, I presented a specific procedure based on the modified algorithm for more efficient compression of multichannel ECG data. This method employed on selected records from the MIT-BIH arrhythmia database. According to experiments, the proposed method attained the significant results regarding compression of multichannel ECG data. Furthermore, in order to compress one signal which is stored for a long time, the proposed multichannel compression method can be utilized efficiently.

Traversing and labeling interconnected vascular tree structures from 3D medical images

NASA Astrophysics Data System (ADS)

O'Dell, Walter G.; Govindarajan, Sindhuja Tirumalai; Salgia, Ankit; Hegde, Satyanarayan; Prabhakaran, Sreekala; Finol, Ender A.; White, R. James

2014-03-01

Purpose: Detailed characterization of pulmonary vascular anatomy has important applications for the diagnosis and management of a variety of vascular diseases. Prior efforts have emphasized using vessel segmentation to gather information on the number or branches, number of bifurcations, and branch length and volume, but accurate traversal of the vessel tree to identify and repair erroneous interconnections between adjacent branches and neighboring tree structures has not been carefully considered. In this study, we endeavor to develop and implement a successful approach to distinguishing and characterizing individual vascular trees from among a complex intermingling of trees. Methods: We developed strategies and parameters in which the algorithm identifies and repairs false branch inter-tree and intra-tree connections to traverse complicated vessel trees. A series of two-dimensional (2D) virtual datasets with a variety of interconnections were constructed for development, testing, and validation. To demonstrate the approach, a series of real 3D computed tomography (CT) lung datasets were obtained, including that of an anthropomorphic chest phantom; an adult human chest CT; a pediatric patient chest CT; and a micro-CT of an excised rat lung preparation. Results: Our method was correct in all 2D virtual test datasets. For each real 3D CT dataset, the resulting simulated vessel tree structures faithfully depicted the vessel tree structures that were originally extracted from the corresponding lung CT scans. Conclusion: We have developed a comprehensive strategy for traversing and labeling interconnected vascular trees and successfully implemented its application to pulmonary vessels observed using 3D CT images of the chest.
On the Complexity of Duplication-Transfer-Loss Reconciliation with Non-Binary Gene Trees.

PubMed

Kordi, Misagh; Bansal, Mukul S

2017-01-01

Duplication-Transfer-Loss (DTL) reconciliation has emerged as a powerful technique for studying gene family evolution in the presence of horizontal gene transfer. DTL reconciliation takes as input a gene family phylogeny and the corresponding species phylogeny, and reconciles the two by postulating speciation, gene duplication, horizontal gene transfer, and gene loss events. Efficient algorithms exist for finding optimal DTL reconciliations when the gene tree is binary. However, gene trees are frequently non-binary. With such non-binary gene trees, the reconciliation problem seeks to find a binary resolution of the gene tree that minimizes the reconciliation cost. Given the prevalence of non-binary gene trees, many efficient algorithms have been developed for this problem in the context of the simpler Duplication-Loss (DL) reconciliation model. Yet, no efficient algorithms exist for DTL reconciliation with non-binary gene trees and the complexity of the problem remains unknown. In this work, we resolve this open question by showing that the problem is, in fact, NP-hard. Our reduction applies to both the dated and undated formulations of DTL reconciliation. By resolving this long-standing open problem, this work will spur the development of both exact and heuristic algorithms for this important problem.
Automated Mobile System for Accurate Outdoor Tree Crop Enumeration Using an Uncalibrated Camera.

PubMed

Nguyen, Thuy Tuong; Slaughter, David C; Hanson, Bradley D; Barber, Andrew; Freitas, Amy; Robles, Daniel; Whelan, Erin

2015-07-28

This paper demonstrates an automated computer vision system for outdoor tree crop enumeration in a seedling nursery. The complete system incorporates both hardware components (including an embedded microcontroller, an odometry encoder, and an uncalibrated digital color camera) and software algorithms (including microcontroller algorithms and the proposed algorithm for tree crop enumeration) required to obtain robust performance in a natural outdoor environment. The enumeration system uses a three-step image analysis process based upon: (1) an orthographic plant projection method integrating a perspective transform with automatic parameter estimation; (2) a plant counting method based on projection histograms; and (3) a double-counting avoidance method based on a homography transform. Experimental results demonstrate the ability to count large numbers of plants automatically with no human effort. Results show that, for tree seedlings having a height up to 40 cm and a within-row tree spacing of approximately 10 cm, the algorithms successfully estimated the number of plants with an average accuracy of 95.2% for trees within a single image and 98% for counting of the whole plant population in a large sequence of images.
Automated Mobile System for Accurate Outdoor Tree Crop Enumeration Using an Uncalibrated Camera

PubMed Central

Nguyen, Thuy Tuong; Slaughter, David C.; Hanson, Bradley D.; Barber, Andrew; Freitas, Amy; Robles, Daniel; Whelan, Erin

2015-01-01

This paper demonstrates an automated computer vision system for outdoor tree crop enumeration in a seedling nursery. The complete system incorporates both hardware components (including an embedded microcontroller, an odometry encoder, and an uncalibrated digital color camera) and software algorithms (including microcontroller algorithms and the proposed algorithm for tree crop enumeration) required to obtain robust performance in a natural outdoor environment. The enumeration system uses a three-step image analysis process based upon: (1) an orthographic plant projection method integrating a perspective transform with automatic parameter estimation; (2) a plant counting method based on projection histograms; and (3) a double-counting avoidance method based on a homography transform. Experimental results demonstrate the ability to count large numbers of plants automatically with no human effort. Results show that, for tree seedlings having a height up to 40 cm and a within-row tree spacing of approximately 10 cm, the algorithms successfully estimated the number of plants with an average accuracy of 95.2% for trees within a single image and 98% for counting of the whole plant population in a large sequence of images. PMID:26225982
Cloud Detection from Satellite Imagery: A Comparison of Expert-Generated and Automatically-Generated Decision Trees

NASA Technical Reports Server (NTRS)

Shiffman, Smadar

2004-01-01

Automated cloud detection and tracking is an important step in assessing global climate change via remote sensing. Cloud masks, which indicate whether individual pixels depict clouds, are included in many of the data products that are based on data acquired on- board earth satellites. Many cloud-mask algorithms have the form of decision trees, which employ sequential tests that scientists designed based on empirical astrophysics studies and astrophysics simulations. Limitations of existing cloud masks restrict our ability to accurately track changes in cloud patterns over time. In this study we explored the potential benefits of automatically-learned decision trees for detecting clouds from images acquired using the Advanced Very High Resolution Radiometer (AVHRR) instrument on board the NOAA-14 weather satellite of the National Oceanic and Atmospheric Administration. We constructed three decision trees for a sample of 8km-daily AVHRR data from 2000 using a decision-tree learning procedure provided within MATLAB(R), and compared the accuracy of the decision trees to the accuracy of the cloud mask. We used ground observations collected by the National Aeronautics and Space Administration Clouds and the Earth s Radiant Energy Systems S COOL project as the gold standard. For the sample data, the accuracy of automatically learned decision trees was greater than the accuracy of the cloud masks included in the AVHRR data product.
StrainSeeker: fast identification of bacterial strains from raw sequencing reads using user-provided guide trees.

PubMed

Roosaare, Märt; Vaher, Mihkel; Kaplinski, Lauris; Möls, Märt; Andreson, Reidar; Lepamets, Maarja; Kõressaar, Triinu; Naaber, Paul; Kõljalg, Siiri; Remm, Maido

2017-01-01

Fast, accurate and high-throughput identification of bacterial isolates is in great demand. The present work was conducted to investigate the possibility of identifying isolates from unassembled next-generation sequencing reads using custom-made guide trees. A tool named StrainSeeker was developed that constructs a list of specific k -mers for each node of any given Newick-format tree and enables the identification of bacterial isolates in 1-2 min. It uses a novel algorithm, which analyses the observed and expected fractions of node-specific k -mers to test the presence of each node in the sample. This allows StrainSeeker to determine where the isolate branches off the guide tree and assign it to a clade whereas other tools assign each read to a reference genome. Using a dataset of 100 Escherichia coli isolates, we demonstrate that StrainSeeker can predict the clades of E. coli with 92% accuracy and correct tree branch assignment with 98% accuracy. Twenty-five thousand Illumina HiSeq reads are sufficient for identification of the strain. StrainSeeker is a software program that identifies bacterial isolates by assigning them to nodes or leaves of a custom-made guide tree. StrainSeeker's web interface and pre-computed guide trees are available at http://bioinfo.ut.ee/strainseeker. Source code is stored at GitHub: https://github.com/bioinfo-ut/StrainSeeker.
Classification tree for the assessment of sedentary lifestyle among hypertensive.

PubMed

Castelo Guedes Martins, Larissa; Venícios de Oliveira Lopes, Marcos; Gomes Guedes, Nirla; Paixão de Menezes, Angélica; de Oliveira Farias, Odaleia; Alves Dos Santos, Naftale

2016-04-01

To develop a classification tree of clinical indicators for the correct prediction of the nursing diagnosis "Sedentary lifestyle" (SL) in people with high blood pressure (HTN). A cross-sectional study conducted in an outpatient care center specializing in high blood pressure and Mellitus diabetes located in northeastern Brazil. The sample consisted of 285 people between 19 and 59 years old diagnosed with high blood pressure and was applied an interview and physical examination, obtaining socio-demographic information, related factors and signs and symptoms that made the defining characteristics for the diagnosis under study. The tree was generated using the CHAID algorithm (Chi-square Automatic Interaction Detection). The construction of the decision tree allowed establishing the interactions between clinical indicators that facilitate a probabilistic analysis of multiple situations allowing quantify the probability of an individual presenting a sedentary lifestyle. The tree included the clinical indicator Choose daily routine without exercise as the first node. People with this indicator showed a probability of 0.88 of presenting the SL. The second node was composed of the indicator Does not perform physical activity during leisure, with 0.99 probability of presenting the SL with these two indicators. The predictive capacity of the tree was established at 69.5%. Decision trees help nurses who care HTN people in decision-making in assessing the characteristics that increase the probability of SL nursing diagnosis, optimizing the time for diagnostic inference.
Certification trails for data structures

NASA Technical Reports Server (NTRS)

Sullivan, Gregory F.; Masson, Gerald M.

1993-01-01

Certification trails are a recently introduced and promising approach to fault detection and fault tolerance. The applicability of the certification trail technique is significantly generalized. Previously, certification trails had to be customized to each algorithm application; trails appropriate to wide classes of algorithms were developed. These certification trails are based on common data-structure operations such as those carried out using these sets of operations such as those carried out using balanced binary trees and heaps. Any algorithms using these sets of operations can therefore employ the certification trail method to achieve software fault tolerance. To exemplify the scope of the generalization of the certification trail technique provided, constructions of trails for abstract data types such as priority queues and union-find structures are given. These trails are applicable to any data-structure implementation of the abstract data type. It is also shown that these ideals lead naturally to monitors for data-structure operations.
Segmentation algorithm on smartphone dual camera: application to plant organs in the wild

NASA Astrophysics Data System (ADS)

Bertrand, Sarah; Cerutti, Guillaume; Tougne, Laure

2018-04-01

In order to identify the species of a tree, the different organs that are the leaves, the bark, the flowers and the fruits, are inspected by botanists. So as to develop an algorithm that identifies automatically the species, we need to extract these objects of interest from their complex natural environment. In this article, we focus on the segmentation of flowers and fruits and we present a new method of segmentation based on an active contour algorithm using two probability maps. The first map is constructed via the dual camera that we can find on the back of the latest smartphones. The second map is made with the help of a multilayer perceptron (MLP). The combination of these two maps to drive the evolution of the object contour allows an efficient segmentation of the organ from a natural background.
In silico prediction of toxicity of phenols to Tetrahymena pyriformis by using genetic algorithm and decision tree-based modeling approach.

PubMed

Abbasitabar, Fatemeh; Zare-Shahabadi, Vahid

2017-04-01

Risk assessment of chemicals is an important issue in environmental protection; however, there is a huge lack of experimental data for a large number of end-points. The experimental determination of toxicity of chemicals involves high costs and time-consuming process. In silico tools such as quantitative structure-toxicity relationship (QSTR) models, which are constructed on the basis of computational molecular descriptors, can predict missing data for toxic end-points for existing or even not yet synthesized chemicals. Phenol derivatives are known to be aquatic pollutants. With this background, we aimed to develop an accurate and reliable QSTR model for the prediction of toxicity of 206 phenols to Tetrahymena pyriformis. A multiple linear regression (MLR)-based QSTR was obtained using a powerful descriptor selection tool named Memorized_ACO algorithm. Statistical parameters of the model were 0.72 and 0.68 for R training 2 and R test 2 , respectively. To develop a high-quality QSTR model, classification and regression tree (CART) was employed. Two approaches were considered: (1) phenols were classified into different modes of action using CART and (2) the phenols in the training set were partitioned to several subsets by a tree in such a manner that in each subset, a high-quality MLR could be developed. For the first approach, the statistical parameters of the resultant QSTR model were improved to 0.83 and 0.75 for R training 2 and R test 2 , respectively. Genetic algorithm was employed in the second approach to obtain an optimal tree, and it was shown that the final QSTR model provided excellent prediction accuracy for the training and test sets (R training 2 and R test 2 were 0.91 and 0.93, respectively). The mean absolute error for the test set was computed as 0.1615. Copyright © 2016 Elsevier Ltd. All rights reserved.
Block-Based Connected-Component Labeling Algorithm Using Binary Decision Trees

PubMed Central

Chang, Wan-Yu; Chiu, Chung-Cheng; Yang, Jia-Horng

2015-01-01

In this paper, we propose a fast labeling algorithm based on block-based concepts. Because the number of memory access points directly affects the time consumption of the labeling algorithms, the aim of the proposed algorithm is to minimize neighborhood operations. Our algorithm utilizes a block-based view and correlates a raster scan to select the necessary pixels generated by a block-based scan mask. We analyze the advantages of a sequential raster scan for the block-based scan mask, and integrate the block-connected relationships using two different procedures with binary decision trees to reduce unnecessary memory access. This greatly simplifies the pixel locations of the block-based scan mask. Furthermore, our algorithm significantly reduces the number of leaf nodes and depth levels required in the binary decision tree. We analyze the labeling performance of the proposed algorithm alongside that of other labeling algorithms using high-resolution images and foreground images. The experimental results from synthetic and real image datasets demonstrate that the proposed algorithm is faster than other methods. PMID:26393597
Undergraduate Students’ Difficulties in Reading and Constructing Phylogenetic Tree

NASA Astrophysics Data System (ADS)

Sa'adah, S.; Tapilouw, F. S.; Hidayat, T.

2017-02-01

Representation is a very important communication tool to communicate scientific concepts. Biologists produce phylogenetic representation to express their understanding of evolutionary relationships. The phylogenetic tree is visual representation depict a hypothesis about the evolutionary relationship and widely used in the biological sciences. Phylogenetic tree currently growing for many disciplines in biology. Consequently, learning about phylogenetic tree become an important part of biological education and an interesting area for biology education research. However, research showed many students often struggle with interpreting the information that phylogenetic trees depict. The purpose of this study was to investigate undergraduate students’ difficulties in reading and constructing a phylogenetic tree. The method of this study is a descriptive method. In this study, we used questionnaires, interviews, multiple choice and open-ended questions, reflective journals and observations. The findings showed students experiencing difficulties, especially in constructing a phylogenetic tree. The students’ responds indicated that main reasons for difficulties in constructing a phylogenetic tree are difficult to placing taxa in a phylogenetic tree based on the data provided so that the phylogenetic tree constructed does not describe the actual evolutionary relationship (incorrect relatedness). Students also have difficulties in determining the sister group, character synapomorphy, autapomorphy from data provided (character table) and comparing among phylogenetic tree. According to them building the phylogenetic tree is more difficult than reading the phylogenetic tree. Finding this studies provide information to undergraduate instructor and students to overcome learning difficulties of reading and constructing phylogenetic tree.
Faster Bit-Parallel Algorithms for Unordered Pseudo-tree Matching and Tree Homeomorphism

NASA Astrophysics Data System (ADS)

Kaneta, Yusaku; Arimura, Hiroki

In this paper, we consider the unordered pseudo-tree matching problem, which is a problem of, given two unordered labeled trees P and T, finding all occurrences of P in T via such many-one embeddings that preserve node labels and parent-child relationship. This problem is closely related to tree pattern matching problem for XPath queries with child axis only. If m > w , we present an efficient algorithm that solves the problem in O(nm log(w)/w) time using O(hm/w + mlog(w)/w) space and O(m log(w)) preprocessing on a unit-cost arithmetic RAM model with addition, where m is the number of nodes in P, n is the number of nodes in T, h is the height of T, and w is the word length. We also discuss a modification of our algorithm for the unordered tree homeomorphism problem, which corresponds to a tree pattern matching problem for XPath queries with descendant axis only.
Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting

DOE Office of Scientific and Technical Information (OSTI.GOV)

Azad, Ariful; Buluc, Aydn; Pothen, Alex

It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. This algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single source algorithms. Algorithms that employ multiple-source searches cannot discard a search tree once no augmenting pathmore » is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. Here, we provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.« less
Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting

DOE PAGES

Azad, Ariful; Buluc, Aydn; Pothen, Alex

2016-03-24

It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. This algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single source algorithms. Algorithms that employ multiple-source searches cannot discard a search tree once no augmenting pathmore » is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. Here, we provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.« less
Data Clustering and Evolving Fuzzy Decision Tree for Data Base Classification Problems

NASA Astrophysics Data System (ADS)

Chang, Pei-Chann; Fan, Chin-Yuan; Wang, Yen-Wen

Data base classification suffers from two well known difficulties, i.e., the high dimensionality and non-stationary variations within the large historic data. This paper presents a hybrid classification model by integrating a case based reasoning technique, a Fuzzy Decision Tree (FDT), and Genetic Algorithms (GA) to construct a decision-making system for data classification in various data base applications. The model is major based on the idea that the historic data base can be transformed into a smaller case-base together with a group of fuzzy decision rules. As a result, the model can be more accurately respond to the current data under classifying from the inductions by these smaller cases based fuzzy decision trees. Hit rate is applied as a performance measure and the effectiveness of our proposed model is demonstrated by experimentally compared with other approaches on different data base classification applications. The average hit rate of our proposed model is the highest among others.
Input selection and performance optimization of ANN-based streamflow forecasts in the drought-prone Murray Darling Basin region using IIS and MODWT algorithm

NASA Astrophysics Data System (ADS)

Prasad, Ramendra; Deo, Ravinesh C.; Li, Yan; Maraseni, Tek

2017-11-01

Forecasting streamflow is vital for strategically planning, utilizing and redistributing water resources. In this paper, a wavelet-hybrid artificial neural network (ANN) model integrated with iterative input selection (IIS) algorithm (IIS-W-ANN) is evaluated for its statistical preciseness in forecasting monthly streamflow, and it is then benchmarked against M5 Tree model. To develop hybrid IIS-W-ANN model, a global predictor matrix is constructed for three local hydrological sites (Richmond, Gwydir, and Darling River) in Australia's agricultural (Murray-Darling) Basin. Model inputs comprised of statistically significant lagged combination of streamflow water level, are supplemented by meteorological data (i.e., precipitation, maximum and minimum temperature, mean solar radiation, vapor pressure and evaporation) as the potential model inputs. To establish robust forecasting models, iterative input selection (IIS) algorithm is applied to screen the best data from the predictor matrix and is integrated with the non-decimated maximum overlap discrete wavelet transform (MODWT) applied on the IIS-selected variables. This resolved the frequencies contained in predictor data while constructing a wavelet-hybrid (i.e., IIS-W-ANN and IIS-W-M5 Tree) model. Forecasting ability of IIS-W-ANN is evaluated via correlation coefficient (r), Willmott's Index (WI), Nash-Sutcliffe Efficiency (ENS), root-mean-square-error (RMSE), and mean absolute error (MAE), including the percentage RMSE and MAE. While ANN models are seen to outperform M5 Tree executed for all hydrological sites, the IIS variable selector was efficient in determining the appropriate predictors, as stipulated by the better performance of the IIS coupled (ANN and M5 Tree) models relative to the models without IIS. When IIS-coupled models are integrated with MODWT, the wavelet-hybrid IIS-W-ANN and IIS-W-M5 Tree are seen to attain significantly accurate performance relative to their standalone counterparts. Importantly, IIS-W-ANN model accuracy outweighs IIS-ANN, as evidenced by a larger r and WI (by 7.5% and 3.8%, respectively) and a lower RMSE (by 21.3%). In comparison to the IIS-W-M5 Tree model, IIS-W-ANN model yielded larger values of WI = 0.936-0.979 and ENS = 0.770-0.920. Correspondingly, the errors (RMSE and MAE) ranged from 0.162-0.487 m and 0.139-0.390 m, respectively, with relative errors, RRMSE = (15.65-21.00) % and MAPE = (14.79-20.78) %. Distinct geographic signature is evident where the most and least accurately forecasted streamflow data is attained for the Gwydir and Darling River, respectively. Conclusively, this study advocates the efficacy of iterative input selection, allowing the proper screening of model predictors, and subsequently, its integration with MODWT resulting in enhanced performance of the models applied in streamflow forecasting.
Pentium Pro inside. 1; A treecode at 430 Gigaflops on ASCI Red

NASA Technical Reports Server (NTRS)

Warren, M. S.; Becker, D. J.; Sterling, T.; Salmon, J. K.; Goda, M. P.

1997-01-01

As an entry for the 1997 Gordon Bell performance prize, we present results from two methods of solving the gravitational N-body problem on the Intel Teraflops system at Sandia National Laboratory (ASCI Red). The first method, an O(N2) algorithm, obtained 635 Gigaflops for a 1 million particle problem on 6800 Pentium Pro processors. The second solution method, a tree-code which scales as O(N log N), sustained 170 Gigaflops over a continuous 9.4 hour period on 4096 processors, integrating the motion of 322 million mutually interacting particles in a cosmology simulation, while saving over 100 Gigabytes of raw data. Additionally, the tree-code sustained 430 Gigaflops on 6800 processors for the first 5 time-steps of that simulation. This tree-code solution is approximately 105 times more efficient than the O(N2) algorithm for this problem. As an entry for the 1997 Gordon Bell price/performance prize, we present two calculations from the disciplines of astrophysics and fluid dynamics. The simulations were performed on two 16 Pentium Pro processor Beowulf-class computers (Loki and Hyglac) constructed entirely from commodity personal computer technology, at a cost of roughly $50k each in September, 1996. The price of an equivalent system in August 1997 is less than $30. At Los Alamos, Loki performed a gravitational tree-code N-body simulation of galaxy formation using 9.75 million particles, which sustained an average of 879 Mflops over a ten day period, and produced roughly 10 Gbytes of raw data.
Efficient algorithms for a class of partitioning problems

NASA Technical Reports Server (NTRS)

Iqbal, M. Ashraf; Bokhari, Shahid H.

1990-01-01

The problem of optimally partitioning the modules of chain- or tree-like tasks over chain-structured or host-satellite multiple computer systems is addressed. This important class of problems includes many signal processing and industrial control applications. Prior research has resulted in a succession of faster exact and approximate algorithms for these problems. Polynomial exact and approximate algorithms are described for this class that are better than any of the previously reported algorithms. The approach is based on a preprocessing step that condenses the given chain or tree structured task into a monotonic chain or tree. The partitioning of this monotonic take can then be carried out using fast search techniques.
Reaction Mechanism Generator: Automatic construction of chemical kinetic mechanisms

DOE PAGES

Gao, Connie W.; Allen, Joshua W.; Green, William H.; ...

2016-02-24

Reaction Mechanism Generator (RMG) constructs kinetic models composed of elementary chemical reaction steps using a general understanding of how molecules react. Species thermochemistry is estimated through Benson group additivity and reaction rate coefficients are estimated using a database of known rate rules and reaction templates. At its core, RMG relies on two fundamental data structures: graphs and trees. Graphs are used to represent chemical structures, and trees are used to represent thermodynamic and kinetic data. Models are generated using a rate-based algorithm which excludes species from the model based on reaction fluxes. RMG can generate reaction mechanisms for species involvingmore » carbon, hydrogen, oxygen, sulfur, and nitrogen. It also has capabilities for estimating transport and solvation properties, and it automatically computes pressure-dependent rate coefficients and identifies chemically-activated reaction paths. RMG is an object-oriented program written in Python, which provides a stable, robust programming architecture for developing an extensible and modular code base with a large suite of unit tests. Computationally intensive functions are cythonized for speed improvements.« less

A New Minimum Trees-Based Approach for Shape Matching with Improved Time Computing: Application to Graphical Symbols Recognition

NASA Astrophysics Data System (ADS)

Franco, Patrick; Ogier, Jean-Marc; Loonis, Pierre; Mullot, Rémy

Recently we have developed a model for shape description and matching. Based on minimum spanning trees construction and specifics stages like the mixture, it seems to have many desirable properties. Recognition invariance in front shift, rotated and noisy shape was checked through median scale tests related to GREC symbol reference database. Even if extracting the topology of a shape by mapping the shortest path connecting all the pixels seems to be powerful, the construction of graph induces an expensive algorithmic cost. In this article we discuss on the ways to reduce time computing. An alternative solution based on image compression concepts is provided and evaluated. The model no longer operates in the image space but in a compact space, namely the Discrete Cosine space. The use of block discrete cosine transform is discussed and justified. The experimental results led on the GREC2003 database show that the proposed method is characterized by a good discrimination power, a real robustness to noise with an acceptable time computing.
iNJclust: Iterative Neighbor-Joining Tree Clustering Framework for Inferring Population Structure.

PubMed

Limpiti, Tulaya; Amornbunchornvej, Chainarong; Intarapanich, Apichart; Assawamakin, Anunchai; Tongsima, Sissades

2014-01-01

Understanding genetic differences among populations is one of the most important issues in population genetics. Genetic variations, e.g., single nucleotide polymorphisms, are used to characterize commonality and difference of individuals from various populations. This paper presents an efficient graph-based clustering framework which operates iteratively on the Neighbor-Joining (NJ) tree called the iNJclust algorithm. The framework uses well-known genetic measurements, namely the allele-sharing distance, the neighbor-joining tree, and the fixation index. The behavior of the fixation index is utilized in the algorithm's stopping criterion. The algorithm provides an estimated number of populations, individual assignments, and relationships between populations as outputs. The clustering result is reported in the form of a binary tree, whose terminal nodes represent the final inferred populations and the tree structure preserves the genetic relationships among them. The clustering performance and the robustness of the proposed algorithm are tested extensively using simulated and real data sets from bovine, sheep, and human populations. The result indicates that the number of populations within each data set is reasonably estimated, the individual assignment is robust, and the structure of the inferred population tree corresponds to the intrinsic relationships among populations within the data.
Performance analysis of a dual-tree algorithm for computing spatial distance histograms

PubMed Central

Chen, Shaoping; Tu, Yi-Cheng; Xia, Yuni

2011-01-01

Many scientific and engineering fields produce large volume of spatiotemporal data. The storage, retrieval, and analysis of such data impose great challenges to database systems design. Analysis of scientific spatiotemporal data often involves computing functions of all point-to-point interactions. One such analytics, the Spatial Distance Histogram (SDH), is of vital importance to scientific discovery. Recently, algorithms for efficient SDH processing in large-scale scientific databases have been proposed. These algorithms adopt a recursive tree-traversing strategy to process point-to-point distances in the visited tree nodes in batches, thus require less time when compared to the brute-force approach where all pairwise distances have to be computed. Despite the promising experimental results, the complexity of such algorithms has not been thoroughly studied. In this paper, we present an analysis of such algorithms based on a geometric modeling approach. The main technique is to transform the analysis of point counts into a problem of quantifying the area of regions where pairwise distances can be processed in batches by the algorithm. From the analysis, we conclude that the number of pairwise distances that are left to be processed decreases exponentially with more levels of the tree visited. This leads to the proof of a time complexity lower than the quadratic time needed for a brute-force algorithm and builds the foundation for a constant-time approximate algorithm. Our model is also general in that it works for a wide range of point spatial distributions, histogram types, and space-partitioning options in building the tree. PMID:21804753
An algorithm to count the number of repeated patient data entries with B tree.

PubMed

Okada, M; Okada, M

1985-04-01

An algorithm to obtain the number of different values that appear a specified number of times in a given data field of a given data file is presented. Basically, a well-known B-tree structure is employed in this study. Some modifications were made to the basic B-tree algorithm. The first step of the modifications is to allow a data item whose values are not necessary distinct from one record to another to be used as a primary key. When a key value is inserted, the number of previous appearances is counted. At the end of all the insertions, the number of key values which are unique in the tree, the number of key values which appear twice, three times, and so forth are obtained. This algorithm is especially powerful for a large size file in disk storage.
Wavelet tree structure based speckle noise removal for optical coherence tomography

NASA Astrophysics Data System (ADS)

Yuan, Xin; Liu, Xuan; Liu, Yang

2018-02-01

We report a new speckle noise removal algorithm in optical coherence tomography (OCT). Though wavelet domain thresholding algorithms have demonstrated superior advantages in suppressing noise magnitude and preserving image sharpness in OCT, the wavelet tree structure has not been investigated in previous applications. In this work, we propose an adaptive wavelet thresholding algorithm via exploiting the tree structure in wavelet coefficients to remove the speckle noise in OCT images. The threshold for each wavelet band is adaptively selected following a special rule to retain the structure of the image across different wavelet layers. Our results demonstrate that the proposed algorithm outperforms conventional wavelet thresholding, with significant advantages in preserving image features.
Using trees to compute approximate solutions to ordinary differential equations exactly

NASA Technical Reports Server (NTRS)

Grossman, Robert

1991-01-01

Some recent work is reviewed which relates families of trees to symbolic algorithms for the exact computation of series which approximate solutions of ordinary differential equations. It turns out that the vector space whose basis is the set of finite, rooted trees carries a natural multiplication related to the composition of differential operators, making the space of trees an algebra. This algebraic structure can be exploited to yield a variety of algorithms for manipulating vector fields and the series and algebras they generate.
Benchmarking protein classification algorithms via supervised cross-validation.

PubMed

Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor

2008-04-24

Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison.
Functional grouping of similar genes using eigenanalysis on minimum spanning tree based neighborhood graph.

PubMed

Jothi, R; Mohanty, Sraban Kumar; Ojha, Aparajita

2016-04-01

Gene expression data clustering is an important biological process in DNA microarray analysis. Although there have been many clustering algorithms for gene expression analysis, finding a suitable and effective clustering algorithm is always a challenging problem due to the heterogeneous nature of gene profiles. Minimum Spanning Tree (MST) based clustering algorithms have been successfully employed to detect clusters of varying shapes and sizes. This paper proposes a novel clustering algorithm using Eigenanalysis on Minimum Spanning Tree based neighborhood graph (E-MST). As MST of a set of points reflects the similarity of the points with their neighborhood, the proposed algorithm employs a similarity graph obtained from k(') rounds of MST (k(')-MST neighborhood graph). By studying the spectral properties of the similarity matrix obtained from k(')-MST graph, the proposed algorithm achieves improved clustering results. We demonstrate the efficacy of the proposed algorithm on 12 gene expression datasets. Experimental results show that the proposed algorithm performs better than the standard clustering algorithms. Copyright © 2016 Elsevier Ltd. All rights reserved.
Improved quantum backtracking algorithms using effective resistance estimates

NASA Astrophysics Data System (ADS)

Jarret, Michael; Wan, Kianna

2018-02-01

We investigate quantum backtracking algorithms of the type introduced by Montanaro (Montanaro, arXiv:1509.02374). These algorithms explore trees of unknown structure and in certain settings exponentially outperform their classical counterparts. Some of the previous work focused on obtaining a quantum advantage for trees in which a unique marked vertex is promised to exist. We remove this restriction by recharacterizing the problem in terms of the effective resistance of the search space. In this paper, we present a generalization of one of Montanaro's algorithms to trees containing k marked vertices, where k is not necessarily known a priori. Our approach involves using amplitude estimation to determine a near-optimal weighting of a diffusion operator, which can then be applied to prepare a superposition state with support only on marked vertices and ancestors thereof. By repeatedly sampling this state and updating the input vertex, a marked vertex is reached in a logarithmic number of steps. The algorithm thereby achieves the conjectured bound of O ˜(√{T Rmax }) for finding a single marked vertex and O ˜(k √{T Rmax }) for finding all k marked vertices, where T is an upper bound on the tree size and Rmax is the maximum effective resistance encountered by the algorithm. This constitutes a speedup over Montanaro's original procedure in both the case of finding one and the case of finding multiple marked vertices in an arbitrary tree.
A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran.

PubMed

Khosravi, Khabat; Pham, Binh Thai; Chapi, Kamran; Shirzadi, Ataollah; Shahabi, Himan; Revhaug, Inge; Prakash, Indra; Tien Bui, Dieu

2018-06-15

Floods are one of the most damaging natural hazards causing huge loss of property, infrastructure and lives. Prediction of occurrence of flash flood locations is very difficult due to sudden change in climatic condition and manmade factors. However, prior identification of flood susceptible areas can be done with the help of machine learning techniques for proper timely management of flood hazards. In this study, we tested four decision trees based machine learning models namely Logistic Model Trees (LMT), Reduced Error Pruning Trees (REPT), Naïve Bayes Trees (NBT), and Alternating Decision Trees (ADT) for flash flood susceptibility mapping at the Haraz Watershed in the northern part of Iran. For this, a spatial database was constructed with 201 present and past flood locations and eleven flood-influencing factors namely ground slope, altitude, curvature, Stream Power Index (SPI), Topographic Wetness Index (TWI), land use, rainfall, river density, distance from river, lithology, and Normalized Difference Vegetation Index (NDVI). Statistical evaluation measures, the Receiver Operating Characteristic (ROC) curve, and Freidman and Wilcoxon signed-rank tests were used to validate and compare the prediction capability of the models. Results show that the ADT model has the highest prediction capability for flash flood susceptibility assessment, followed by the NBT, the LMT, and the REPT, respectively. These techniques have proven successful in quickly determining flood susceptible areas. Copyright © 2018 Elsevier B.V. All rights reserved.
Towards Robust Multiagent Plans

DTIC Science & Technology

2016-01-20

corre- sponding to the core UCT algorithm, and Figure 2 illustrates the UCT tree construction, with n denoting the number of state-space samples. 3...is, A(sk) = ∅, or when k = H. Each node/action pair (s, a) is associated with a counter n (s, a) and a value accumulator Q̂(s, a). Both n (s, a) and Q̂...multi-armed bandit (MAB) problems (Robbins, 1952): If n (si, a) > 0 for all a ∈ A(si), then ai+1 = argmax a [ Q̂(si, a) + c √ log n (si) n (si, a) ] , (1
Uncertain decision tree inductive inference

NASA Astrophysics Data System (ADS)

Zarban, L.; Jafari, S.; Fakhrahmad, S. M.

2011-10-01

Induction is the process of reasoning in which general rules are formulated based on limited observations of recurring phenomenal patterns. Decision tree learning is one of the most widely used and practical inductive methods, which represents the results in a tree scheme. Various decision tree algorithms have already been proposed such as CLS, ID3, Assistant C4.5, REPTree and Random Tree. These algorithms suffer from some major shortcomings. In this article, after discussing the main limitations of the existing methods, we introduce a new decision tree induction algorithm, which overcomes all the problems existing in its counterparts. The new method uses bit strings and maintains important information on them. This use of bit strings and logical operation on them causes high speed during the induction process. Therefore, it has several important features: it deals with inconsistencies in data, avoids overfitting and handles uncertainty. We also illustrate more advantages and the new features of the proposed method. The experimental results show the effectiveness of the method in comparison with other methods existing in the literature.
A fast bottom-up algorithm for computing the cut sets of noncoherent fault trees

DOE Office of Scientific and Technical Information (OSTI.GOV)

Corynen, G.C.

1987-11-01

An efficient procedure for finding the cut sets of large fault trees has been developed. Designed to address coherent or noncoherent systems, dependent events, shared or common-cause events, the method - called SHORTCUT - is based on a fast algorithm for transforming a noncoherent tree into a quasi-coherent tree (COHERE), and on a new algorithm for reducing cut sets (SUBSET). To assure sufficient clarity and precision, the procedure is discussed in the language of simple sets, which is also developed in this report. Although the new method has not yet been fully implemented on the computer, we report theoretical worst-casemore » estimates of its computational complexity. 12 refs., 10 figs.« less
Learning Extended Finite State Machines

NASA Technical Reports Server (NTRS)

Cassel, Sofia; Howar, Falk; Jonsson, Bengt; Steffen, Bernhard

2014-01-01

We present an active learning algorithm for inferring extended finite state machines (EFSM)s, combining data flow and control behavior. Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses the tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We describe sufficient conditions for the properties that the symbolic constraints provided by a tree query in general must have to be usable in our learning model. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through (black-box) testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.
A new tool for post-AGB SED classification

NASA Astrophysics Data System (ADS)

Bendjoya, P.; Suarez, O.; Galluccio, L.; Michel, O.

We present the results of an unsupervised classification method applied on a set of 344 spectral energy distributions (SED) of post-AGB stars extracted from the Torun catalogue of Galactic post-AGB stars. This method aims to find a new unbiased method for post-AGB star classification based on the information contained in the IR region of the SED (fluxes, IR excess, colours). We used the data from IRAS and MSX satellites, and from the 2MASS survey. We applied a classification method based on the construction of the dataset of a minimal spanning tree (MST) with the Prim's algorithm. In order to build this tree, different metrics have been tested on both flux and color indices. Our method is able to classify the set of 344 post-AGB stars in 9 distinct groups according to their SEDs.
Genealogical Trees of Scientific Papers

PubMed Central

Waumans, Michaël Charles; Bersini, Hugues

2016-01-01

Many results have been obtained when studying scientific papers citations databases in a network perspective. Articles can be ranked according to their current in-degree and their future popularity or citation counts can even be predicted. The dynamical properties of such networks and the observation of the time evolution of their nodes started more recently. This work adopts an evolutionary perspective and proposes an original algorithm for the construction of genealogical trees of scientific papers on the basis of their citation count evolution in time. The fitness of a paper now amounts to its in-degree growing trend and a “dying” paper will suddenly see this trend declining in time. It will give birth and be taken over by some of its most prevalent citing “offspring”. Practically, this might be used to trace the successive published milestones of a research field. PMID:26954677
Structure theorems for game trees

PubMed Central

Govindan, Srihari; Wilson, Robert

2002-01-01

Kohlberg and Mertens [Kohlberg, E. & Mertens, J. (1986) Econometrica 54, 1003–1039] proved that the graph of the Nash equilibrium correspondence is homeomorphic to its domain when the domain is the space of payoffs in normal-form games. A counterexample disproves the analog for the equilibrium outcome correspondence over the space of payoffs in extensive-form games, but we prove an analog when the space of behavior strategies is perturbed so that every path in the game tree has nonzero probability. Without such perturbations, the graph is the closure of the union of a finite collection of its subsets, each diffeomorphic to a corresponding path-connected open subset of the space of payoffs. As an application, we construct an algorithm for computing equilibria of an extensive-form game with a perturbed strategy space, and thus approximate equilibria of the unperturbed game. PMID:12060702
Structure theorems for game trees.

PubMed

Govindan, Srihari; Wilson, Robert

2002-06-25

Kohlberg and Mertens [Kohlberg, E. & Mertens, J. (1986) Econometrica 54, 1003-1039] proved that the graph of the Nash equilibrium correspondence is homeomorphic to its domain when the domain is the space of payoffs in normal-form games. A counterexample disproves the analog for the equilibrium outcome correspondence over the space of payoffs in extensive-form games, but we prove an analog when the space of behavior strategies is perturbed so that every path in the game tree has nonzero probability. Without such perturbations, the graph is the closure of the union of a finite collection of its subsets, each diffeomorphic to a corresponding path-connected open subset of the space of payoffs. As an application, we construct an algorithm for computing equilibria of an extensive-form game with a perturbed strategy space, and thus approximate equilibria of the unperturbed game.
RS-Forest: A Rapid Density Estimator for Streaming Anomaly Detection.

PubMed

Wu, Ke; Zhang, Kun; Fan, Wei; Edwards, Andrea; Yu, Philip S

Anomaly detection in streaming data is of high interest in numerous application domains. In this paper, we propose a novel one-class semi-supervised algorithm to detect anomalies in streaming data. Underlying the algorithm is a fast and accurate density estimator implemented by multiple fully randomized space trees (RS-Trees), named RS-Forest. The piecewise constant density estimate of each RS-tree is defined on the tree node into which an instance falls. Each incoming instance in a data stream is scored by the density estimates averaged over all trees in the forest. Two strategies, statistical attribute range estimation of high probability guarantee and dual node profiles for rapid model update, are seamlessly integrated into RS-Forest to systematically address the ever-evolving nature of data streams. We derive the theoretical upper bound for the proposed algorithm and analyze its asymptotic properties via bias-variance decomposition. Empirical comparisons to the state-of-the-art methods on multiple benchmark datasets demonstrate that the proposed method features high detection rate, fast response, and insensitivity to most of the parameter settings. Algorithm implementations and datasets are available upon request.
RS-Forest: A Rapid Density Estimator for Streaming Anomaly Detection

PubMed Central

Wu, Ke; Zhang, Kun; Fan, Wei; Edwards, Andrea; Yu, Philip S.

2015-01-01

Anomaly detection in streaming data is of high interest in numerous application domains. In this paper, we propose a novel one-class semi-supervised algorithm to detect anomalies in streaming data. Underlying the algorithm is a fast and accurate density estimator implemented by multiple fully randomized space trees (RS-Trees), named RS-Forest. The piecewise constant density estimate of each RS-tree is defined on the tree node into which an instance falls. Each incoming instance in a data stream is scored by the density estimates averaged over all trees in the forest. Two strategies, statistical attribute range estimation of high probability guarantee and dual node profiles for rapid model update, are seamlessly integrated into RS-Forest to systematically address the ever-evolving nature of data streams. We derive the theoretical upper bound for the proposed algorithm and analyze its asymptotic properties via bias-variance decomposition. Empirical comparisons to the state-of-the-art methods on multiple benchmark datasets demonstrate that the proposed method features high detection rate, fast response, and insensitivity to most of the parameter settings. Algorithm implementations and datasets are available upon request. PMID:25685112

GRAPE-6A: A Single-Card GRAPE-6 for Parallel PC-GRAPE Cluster Systems

NASA Astrophysics Data System (ADS)

Fukushige, Toshiyuki; Makino, Junichiro; Kawai, Atsushi

2005-12-01

In this paper, we describe the design and performance of GRAPE-6A, a special-purpose computer for gravitational many-body simulations. It was designed to be used with a PC cluster, in which each node has one GRAPE-6A. Such a configuration is particularly cost-effective in running parallel tree algorithms. Though the use of parallel tree algorithms was possible with the original GRAPE-6 hardware, it was not very cost-effective since a single GRAPE-6 board was still too fast and too expensive. Therefore, we designed GRAPE-6A as a single PCI card to minimize the reproduction cost and to optimize the computing speed. The peak performance is 130 Gflops for one GRAPE-6A board and 3.1 Tflops for our 24 node cluster. We describe the implementation of the tree, TreePM and individual timestep algorithms on both a single GRAPE-6A system and GRAPE-6A cluster. Using the tree algorithm on our 16-node GRAPE-6A system, we can complete a collisionless simulation with 100 million particles (8000 steps) within 10 days.
From learning taxonomies to phylogenetic learning: integration of 16S rRNA gene data into FAME-based bacterial classification.

PubMed

Slabbinck, Bram; Waegeman, Willem; Dawyndt, Peter; De Vos, Paul; De Baets, Bernard

2010-01-30

Machine learning techniques have shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure allows to easily evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. Summarized, by phylogenetic learning we are able to situate and evaluate FAME-based bacterial species classification in a more informative context.
From learning taxonomies to phylogenetic learning: Integration of 16S rRNA gene data into FAME-based bacterial classification

PubMed Central

2010-01-01

Background Machine learning techniques have shown to improve bacterial species classification based on fatty acid methyl ester (FAME) data. Nonetheless, FAME analysis has a limited resolution for discrimination of bacteria at the species level. In this paper, we approach the species classification problem from a taxonomic point of view. Such a taxonomy or tree is typically obtained by applying clustering algorithms on FAME data or on 16S rRNA gene data. The knowledge gained from the tree can then be used to evaluate FAME-based classifiers, resulting in a novel framework for bacterial species classification. Results In view of learning in a taxonomic framework, we consider two types of trees. First, a FAME tree is constructed with a supervised divisive clustering algorithm. Subsequently, based on 16S rRNA gene sequence analysis, phylogenetic trees are inferred by the NJ and UPGMA methods. In this second approach, the species classification problem is based on the combination of two different types of data. Herein, 16S rRNA gene sequence data is used for phylogenetic tree inference and the corresponding binary tree splits are learned based on FAME data. We call this learning approach 'phylogenetic learning'. Supervised Random Forest models are developed to train the classification tasks in a stratified cross-validation setting. In this way, better classification results are obtained for species that are typically hard to distinguish by a single or flat multi-class classification model. Conclusions FAME-based bacterial species classification is successfully evaluated in a taxonomic framework. Although the proposed approach does not improve the overall accuracy compared to flat multi-class classification, it has some distinct advantages. First, it has better capabilities for distinguishing species on which flat multi-class classification fails. Secondly, the hierarchical classification structure allows to easily evaluate and visualize the resolution of FAME data for the discrimination of bacterial species. Summarized, by phylogenetic learning we are able to situate and evaluate FAME-based bacterial species classification in a more informative context. PMID:20113515
Integrated Network Decompositions and Dynamic Programming for Graph Optimization (INDDGO)

DOE Office of Scientific and Technical Information (OSTI.GOV)

The INDDGO software package offers a set of tools for finding exact solutions to graph optimization problems via tree decompositions and dynamic programming algorithms. Currently the framework offers serial and parallel (distributed memory) algorithms for finding tree decompositions and solving the maximum weighted independent set problem. The parallel dynamic programming algorithm is implemented on top of the MADNESS task-based runtime.
Binary Decision Trees for Preoperative Periapical Cyst Screening Using Cone-beam Computed Tomography.

PubMed

Pitcher, Brandon; Alaqla, Ali; Noujeim, Marcel; Wealleans, James A; Kotsakis, Georgios; Chrepa, Vanessa

2017-03-01

Cone-beam computed tomographic (CBCT) analysis allows for 3-dimensional assessment of periradicular lesions and may facilitate preoperative periapical cyst screening. The purpose of this study was to develop and assess the predictive validity of a cyst screening method based on CBCT volumetric analysis alone or combined with designated radiologic criteria. Three independent examiners evaluated 118 presurgical CBCT scans from cases that underwent apicoectomies and had an accompanying gold standard histopathological diagnosis of either a cyst or granuloma. Lesion volume, density, and specific radiologic characteristics were assessed using specialized software. Logistic regression models with histopathological diagnosis as the dependent variable were constructed for cyst prediction, and receiver operating characteristic curves were used to assess the predictive validity of the models. A conditional inference binary decision tree based on a recursive partitioning algorithm was constructed to facilitate preoperative screening. Interobserver agreement was excellent for volume and density, but it varied from poor to good for the radiologic criteria. Volume and root displacement were strong predictors for cyst screening in all analyses. The binary decision tree classifier determined that if the volume of the lesion was >247 mm 3 , there was 80% probability of a cyst. If volume was <247 mm 3 and root displacement was present, cyst probability was 60% (78% accuracy). The good accuracy and high specificity of the decision tree classifier renders it a useful preoperative cyst screening tool that can aid in clinical decision making but not a substitute for definitive histopathological diagnosis after biopsy. Confirmatory studies are required to validate the present findings. Published by Elsevier Inc.
Enumeration of spanning trees in planar unclustered networks

NASA Astrophysics Data System (ADS)

Xiao, Yuzhi; Zhao, Haixing; Hu, Guona; Ma, Xiujuan

2014-07-01

Among a variety of subgraphs, spanning trees are one of the most important and fundamental categories. They are relevant to diverse aspects of networks, including reliability, transport, self-organized criticality, loop-erased random walks and so on. In this paper, we introduce a family of modular, self-similar planar networks with zero clustering. Relevant properties of this family are comparable to those networks associated with technological systems having low clustering, like power grids, some electronic circuits, the Internet and some biological systems. So, it is very significant to research on spanning trees of planar networks. However, for a large network, evaluating the relevant determinant is intractable. In this paper, we propose a fairly generic linear algorithm for counting the number of spanning trees of a planar network. Using the algorithm, we derive analytically the exact numbers of spanning trees in planar networks. Our result shows that the computational complexity is O(t) , which is better than that of the matrix tree theorem with O(m2t2) , where t is the number of steps and m is the girth of the planar network. We also obtain the entropy for the spanning trees of a given planar network. We find that the entropy of spanning trees in the studied network is small, which is in sharp contrast to the previous result for planar networks with the same average degree. We also determine an upper bound and a lower bound for the numbers of spanning trees in the family of planar networks by the algorithm. As another application of the algorithm, we give a formula for the number of spanning trees in an outerplanar network with small-world features.
Phylogenetic Analyses of Meloidogyne Small Subunit rDNA.

PubMed

De Ley, Irma Tandingan; De Ley, Paul; Vierstraete, Andy; Karssen, Gerrit; Moens, Maurice; Vanfleteren, Jacques

2002-12-01

Phylogenies were inferred from nearly complete small subunit (SSU) 18S rDNA sequences of 12 species of Meloidogyne and 4 outgroup taxa (Globodera pallida, Nacobbus abberans, Subanguina radicicola, and Zygotylenchus guevarai). Alignments were generated manually from a secondary structure model, and computationally using ClustalX and Treealign. Trees were constructed using distance, parsimony, and likelihood algorithms in PAUP* 4.0b4a. Obtained tree topologies were stable across algorithms and alignments, supporting 3 clades: clade I = [M. incognita (M. javanica, M. arenaria)]; clade II = M. duytsi and M. maritima in an unresolved trichotomy with (M. hapla, M. microtyla); and clade III = (M. exigua (M. graminicola, M. chitwoodi)). Monophyly of [(clade I, clade II) clade III] was given maximal bootstrap support (mbs). M. artiellia was always a sister taxon to this joint clade, while M. ichinohei was consistently placed with mbs as a basal taxon within the genus. Affinities with the outgroup taxa remain unclear, although G. pallida and S. radicicola were never placed as closest relatives of Meloidogyne. Our results show that SSU sequence data are useful in addressing deeper phylogeny within Meloidogyne, and that both M. ichinohei and M. artiellia are credible outgroups for phylogenetic analysis of speciations among the major species.
Phylogenetic Analyses of Meloidogyne Small Subunit rDNA

PubMed Central

De Ley, Irma Tandingan; De Ley, Paul; Vierstraete, Andy; Karssen, Gerrit; Moens, Maurice; Vanfleteren, Jacques

2002-01-01

Phylogenies were inferred from nearly complete small subunit (SSU) 18S rDNA sequences of 12 species of Meloidogyne and 4 outgroup taxa (Globodera pallida, Nacobbus abberans, Subanguina radicicola, and Zygotylenchus guevarai). Alignments were generated manually from a secondary structure model, and computationally using ClustalX and Treealign. Trees were constructed using distance, parsimony, and likelihood algorithms in PAUP* 4.0b4a. Obtained tree topologies were stable across algorithms and alignments, supporting 3 clades: clade I = [M. incognita (M. javanica, M. arenaria)]; clade II = M. duytsi and M. maritima in an unresolved trichotomy with (M. hapla, M. microtyla); and clade III = (M. exigua (M. graminicola, M. chitwoodi)). Monophyly of [(clade I, clade II) clade III] was given maximal bootstrap support (mbs). M. artiellia was always a sister taxon to this joint clade, while M. ichinohei was consistently placed with mbs as a basal taxon within the genus. Affinities with the outgroup taxa remain unclear, although G. pallida and S. radicicola were never placed as closest relatives of Meloidogyne. Our results show that SSU sequence data are useful in addressing deeper phylogeny within Meloidogyne, and that both M. ichinohei and M. artiellia are credible outgroups for phylogenetic analysis of speciations among the major species. PMID:19265950
Blooming Trees: Substructures and Surrounding Groups of Galaxy Clusters

NASA Astrophysics Data System (ADS)

Yu, Heng; Diaferio, Antonaldo; Serra, Ana Laura; Baldi, Marco

2018-06-01

We develop the Blooming Tree Algorithm, a new technique that uses spectroscopic redshift data alone to identify the substructures and the surrounding groups of galaxy clusters, along with their member galaxies. Based on the estimated binding energy of galaxy pairs, the algorithm builds a binary tree that hierarchically arranges all of the galaxies in the field of view. The algorithm searches for buds, corresponding to gravitational potential minima on the binary tree branches; for each bud, the algorithm combines the number of galaxies, their velocity dispersion, and their average pairwise distance into a parameter that discriminates between the buds that do not correspond to any substructure or group, and thus eventually die, and the buds that correspond to substructures and groups, and thus bloom into the identified structures. We test our new algorithm with a sample of 300 mock redshift surveys of clusters in different dynamical states; the clusters are extracted from a large cosmological N-body simulation of a ΛCDM model. We limit our analysis to substructures and surrounding groups identified in the simulation with mass larger than 1013 h ‑1 M ⊙. With mock redshift surveys with 200 galaxies within 6 h ‑1 Mpc from the cluster center, the technique recovers 80% of the real substructures and 60% of the surrounding groups; in 57% of the identified structures, at least 60% of the member galaxies of the substructures and groups belong to the same real structure. These results improve by roughly a factor of two the performance of the best substructure identification algorithm currently available, the σ plateau algorithm, and suggest that our Blooming Tree Algorithm can be an invaluable tool for detecting substructures of galaxy clusters and investigating their complex dynamics.
Adversarial search by evolutionary computation.

PubMed

Hong, T P; Huang, K Y; Lin, W Y

2001-01-01

In this paper, we consider the problem of finding good next moves in two-player games. Traditional search algorithms, such as minimax and alpha-beta pruning, suffer great temporal and spatial expansion when exploring deeply into search trees to find better next moves. The evolution of genetic algorithms with the ability to find global or near global optima in limited time seems promising, but they are inept at finding compound optima, such as the minimax in a game-search tree. We thus propose a new genetic algorithm-based approach that can find a good next move by reserving the board evaluation values of new offspring in a partial game-search tree. Experiments show that solution accuracy and search speed are greatly improved by our algorithm.
Using laser altimetry-based segmentation to refine automated tree identification in managed forests of the Black Hills, South Dakota

Treesearch

Eric Rowell; Carl Selelstad; Lee Vierling; Lloyd Queen; Wayne Sheppard

2006-01-01

The success of a local maximum (LM) tree detection algorithm for detecting individual trees from lidar data depends on stand conditions that are often highly variable. A laser height variance and percent canopy cover (PCC) classification is used to segment the landscape by stand condition prior to stem detection. We test the performance of the LM algorithm using canopy...
3D Forest: An application for descriptions of three-dimensional forest structures using terrestrial LiDAR

PubMed Central

Krůček, Martin; Vrška, Tomáš; Král, Kamil

2017-01-01

Terrestrial laser scanning is a powerful technology for capturing the three-dimensional structure of forests with a high level of detail and accuracy. Over the last decade, many algorithms have been developed to extract various tree parameters from terrestrial laser scanning data. Here we present 3D Forest, an open-source non-platform-specific software application with an easy-to-use graphical user interface with the compilation of algorithms focused on the forest environment and extraction of tree parameters. The current version (0.42) extracts important parameters of forest structure from the terrestrial laser scanning data, such as stem positions (X, Y, Z), tree heights, diameters at breast height (DBH), as well as more advanced parameters such as tree planar projections, stem profiles or detailed crown parameters including convex and concave crown surface and volume. Moreover, 3D Forest provides quantitative measures of between-crown interactions and their real arrangement in 3D space. 3D Forest also includes an original algorithm of automatic tree segmentation and crown segmentation. Comparison with field data measurements showed no significant difference in measuring DBH or tree height using 3D Forest, although for DBH only the Randomized Hough Transform algorithm proved to be sufficiently resistant to noise and provided results comparable to traditional field measurements. PMID:28472167
EDNA: Expert fault digraph analysis using CLIPS

NASA Technical Reports Server (NTRS)

Dixit, Vishweshwar V.

1990-01-01

Traditionally fault models are represented by trees. Recently, digraph models have been proposed (Sack). Digraph models closely imitate the real system dependencies and hence are easy to develop, validate and maintain. However, they can also contain directed cycles and analysis algorithms are hard to find. Available algorithms tend to be complicated and slow. On the other hand, the tree analysis (VGRH, Tayl) is well understood and rooted in vast research effort and analytical techniques. The tree analysis algorithms are sophisticated and orders of magnitude faster. Transformation of a digraph (cyclic) into trees (CLP, LP) is a viable approach to blend the advantages of the representations. Neither the digraphs nor the trees provide the ability to handle heuristic knowledge. An expert system, to capture the engineering knowledge, is essential. We propose an approach here, namely, expert network analysis. We combine the digraph representation and tree algorithms. The models are augmented by probabilistic and heuristic knowledge. CLIPS, an expert system shell from NASA-JSC will be used to develop a tool. The technique provides the ability to handle probabilities and heuristic knowledge. Mixed analysis, some nodes with probabilities, is possible. The tool provides graphics interface for input, query, and update. With the combined approach it is expected to be a valuable tool in the design process as well in the capture of final design knowledge.
Applying Data Mining Techniques to Extract Hidden Patterns about Breast Cancer Survival in an Iranian Cohort Study.

PubMed

Khalkhali, Hamid Reza; Lotfnezhad Afshar, Hadi; Esnaashari, Omid; Jabbari, Nasrollah

2016-01-01

Breast cancer survival has been analyzed by many standard data mining algorithms. A group of these algorithms belonged to the decision tree category. Ability of the decision tree algorithms in terms of visualizing and formulating of hidden patterns among study variables were main reasons to apply an algorithm from the decision tree category in the current study that has not studied already. The classification and regression trees (CART) was applied to a breast cancer database contained information on 569 patients in 2007-2010. The measurement of Gini impurity used for categorical target variables was utilized. The classification error that is a function of tree size was measured by 10-fold cross-validation experiments. The performance of created model was evaluated by the criteria as accuracy, sensitivity and specificity. The CART model produced a decision tree with 17 nodes, 9 of which were associated with a set of rules. The rules were meaningful clinically. They showed in the if-then format that Stage was the most important variable for predicting breast cancer survival. The scores of accuracy, sensitivity and specificity were: 80.3%, 93.5% and 53%, respectively. The current study model as the first one created by the CART was able to extract useful hidden rules from a relatively small size dataset.
Creating ensembles of oblique decision trees with evolutionary algorithms and sampling

DOEpatents

Cantu-Paz, Erick [Oakland, CA; Kamath, Chandrika [Tracy, CA

2006-06-13

A decision tree system that is part of a parallel object-oriented pattern recognition system, which in turn is part of an object oriented data mining system. A decision tree process includes the step of reading the data. If necessary, the data is sorted. A potential split of the data is evaluated according to some criterion. An initial split of the data is determined. The final split of the data is determined using evolutionary algorithms and statistical sampling techniques. The data is split. Multiple decision trees are combined in ensembles.
[Introduction and advantage analysis of the stepwise method for the construction of vascular trees].

PubMed

Zhang, Yan; Xie, Haiwei; Zhu, Kai

2010-08-01

A new method for constructing the model of vascular trees was proposed in this paper. By use of this method, the arterial trees in good agreement with the actual structure could be grown. In this process, all vessels in the vascular tree were divided into two groups: the conveying vessels, and the delivering branches. And different branches could be built by different ways. Firstly, the distributing rules of conveying vessels were ascertained by use of measurement data, and then the conveying vessels were constructed in accordance to the statistical rule and optimization criterion. Lastly, delivering branches were modeled by constrained constructive optimization (CCO) on the conveying vessel-trees which had already been generated. In order to compare the CCO method and stepwise method proposed here, two 3D arterial trees of human tongue were grown with their vascular tree having a special structure. Based on the corrosion casts of real arterial tree of human tongue, the data about the two trees constructed by different methods were compared and analyzed, including the averaged segment diameters at respective levels, the distribution and the diameters of the branches of first level at respective directions. The results show that the vascular tree built by stepwise method is more similar to the true arterial of human tongue when compared against the tree built by CCO method.
State estimation of spatio-temporal phenomena

NASA Astrophysics Data System (ADS)

Yu, Dan

This dissertation addresses the state estimation problem of spatio-temporal phenomena which can be modeled by partial differential equations (PDEs), such as pollutant dispersion in the atmosphere. After discretizing the PDE, the dynamical system has a large number of degrees of freedom (DOF). State estimation using Kalman Filter (KF) is computationally intractable, and hence, a reduced order model (ROM) needs to be constructed first. Moreover, the nonlinear terms, external disturbances or unknown boundary conditions can be modeled as unknown inputs, which leads to an unknown input filtering problem. Furthermore, the performance of KF could be improved by placing sensors at feasible locations. Therefore, the sensor scheduling problem to place multiple mobile sensors is of interest. The first part of the dissertation focuses on model reduction for large scale systems with a large number of inputs/outputs. A commonly used model reduction algorithm, the balanced proper orthogonal decomposition (BPOD) algorithm, is not computationally tractable for large systems with a large number of inputs/outputs. Inspired by the BPOD and randomized algorithms, we propose a randomized proper orthogonal decomposition (RPOD) algorithm and a computationally optimal RPOD (RPOD*) algorithm, which construct an ROM to capture the input-output behaviour of the full order model, while reducing the computational cost of BPOD by orders of magnitude. It is demonstrated that the proposed RPOD* algorithm could construct the ROM in real-time, and the performance of the proposed algorithms on different advection-diffusion equations. Next, we consider the state estimation problem of linear discrete-time systems with unknown inputs which can be treated as a wide-sense stationary process with rational power spectral density, while no other prior information needs to be known. We propose an autoregressive (AR) model based unknown input realization technique which allows us to recover the input statistics from the output data by solving an appropriate least squares problem, then fit an AR model to the recovered input statistics and construct an innovations model of the unknown inputs using the eigensystem realization algorithm. The proposed algorithm outperforms the augmented two-stage Kalman Filter (ASKF) and the unbiased minimum-variance (UMV) algorithm are shown in several examples. Finally, we propose a framework to place multiple mobile sensors to optimize the long-term performance of KF in the estimation of the state of a PDE. The major challenges are that placing multiple sensors is an NP-hard problem, and the optimization problem is non-convex in general. In this dissertation, first, we construct an ROM using RPOD* algorithm, and then reduce the feasible sensor locations into a subset using the ROM. The Information Space Receding Horizon Control (I-RHC) approach and a modified Monte Carlo Tree Search (MCTS) approach are applied to solve the sensor scheduling problem using the subset. Various applications have been provided to demonstrate the performance of the proposed approach.
Traveling front solutions to directed diffusion-limited aggregation, digital search trees, and the Lempel-Ziv data compression algorithm.

PubMed

Majumdar, Satya N

2003-08-01

We use the traveling front approach to derive exact asymptotic results for the statistics of the number of particles in a class of directed diffusion-limited aggregation models on a Cayley tree. We point out that some aspects of these models are closely connected to two different problems in computer science, namely, the digital search tree problem in data structures and the Lempel-Ziv algorithm for data compression. The statistics of the number of particles studied here is related to the statistics of height in digital search trees which, in turn, is related to the statistics of the length of the longest word formed by the Lempel-Ziv algorithm. Implications of our results to these computer science problems are pointed out.
Traveling front solutions to directed diffusion-limited aggregation, digital search trees, and the Lempel-Ziv data compression algorithm

NASA Astrophysics Data System (ADS)

Majumdar, Satya N.

2003-08-01

We use the traveling front approach to derive exact asymptotic results for the statistics of the number of particles in a class of directed diffusion-limited aggregation models on a Cayley tree. We point out that some aspects of these models are closely connected to two different problems in computer science, namely, the digital search tree problem in data structures and the Lempel-Ziv algorithm for data compression. The statistics of the number of particles studied here is related to the statistics of height in digital search trees which, in turn, is related to the statistics of the length of the longest word formed by the Lempel-Ziv algorithm. Implications of our results to these computer science problems are pointed out.
A uniform energy consumption algorithm for wireless sensor and actuator networks based on dynamic polling point selection.

PubMed

Li, Shuo; Peng, Jun; Liu, Weirong; Zhu, Zhengfa; Lin, Kuo-Chi

2013-12-19

Recent research has indicated that using the mobility of the actuator in wireless sensor and actuator networks (WSANs) to achieve mobile data collection can greatly increase the sensor network lifetime. However, mobile data collection may result in unacceptable collection delays in the network if the path of the actuator is too long. Because real-time network applications require meeting data collection delay constraints, planning the path of the actuator is a very important issue to balance the prolongation of the network lifetime and the reduction of the data collection delay. In this paper, a multi-hop routing mobile data collection algorithm is proposed based on dynamic polling point selection with delay constraints to address this issue. The algorithm can actively update the selection of the actuator's polling points according to the sensor nodes' residual energies and their locations while also considering the collection delay constraint. It also dynamically constructs the multi-hop routing trees rooted by these polling points to balance the sensor node energy consumption and the extension of the network lifetime. The effectiveness of the algorithm is validated by simulation.

A Semi-Automated Machine Learning Algorithm for Tree Cover Delineation from 1-m Naip Imagery Using a High Performance Computing Architecture

NASA Astrophysics Data System (ADS)

Basu, S.; Ganguly, S.; Nemani, R. R.; Mukhopadhyay, S.; Milesi, C.; Votava, P.; Michaelis, A.; Zhang, G.; Cook, B. D.; Saatchi, S. S.; Boyda, E.

2014-12-01

Accurate tree cover delineation is a useful instrument in the derivation of Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) satellite imagery data. Numerous algorithms have been designed to perform tree cover delineation in high to coarse resolution satellite imagery, but most of them do not scale to terabytes of data, typical in these VHR datasets. In this paper, we present an automated probabilistic framework for the segmentation and classification of 1-m VHR data as obtained from the National Agriculture Imagery Program (NAIP) for deriving tree cover estimates for the whole of Continental United States, using a High Performance Computing Architecture. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field (CRF), which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by incorporating expert knowledge through the relabeling of misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the state of California, which covers a total of 11,095 NAIP tiles and spans a total geographical area of 163,696 sq. miles. Our framework produced correct detection rates of around 85% for fragmented forests and 70% for urban tree cover areas, with false positive rates lower than 3% for both regions. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR high-resolution canopy height model shows the effectiveness of our algorithm in generating accurate high-resolution tree cover maps.
A New Algorithm Using the Non-Dominated Tree to Improve Non-Dominated Sorting.

PubMed

Gustavsson, Patrik; Syberfeldt, Anna

2018-01-01

Non-dominated sorting is a technique often used in evolutionary algorithms to determine the quality of solutions in a population. The most common algorithm is the Fast Non-dominated Sort (FNS). This algorithm, however, has the drawback that its performance deteriorates when the population size grows. The same drawback applies also to other non-dominating sorting algorithms such as the Efficient Non-dominated Sort with Binary Strategy (ENS-BS). An algorithm suggested to overcome this drawback is the Divide-and-Conquer Non-dominated Sort (DCNS) which works well on a limited number of objectives but deteriorates when the number of objectives grows. This article presents a new, more efficient algorithm called the Efficient Non-dominated Sort with Non-Dominated Tree (ENS-NDT). ENS-NDT is an extension of the ENS-BS algorithm and uses a novel Non-Dominated Tree (NDTree) to speed up the non-dominated sorting. ENS-NDT is able to handle large population sizes and a large number of objectives more efficiently than existing algorithms for non-dominated sorting. In the article, it is shown that with ENS-NDT the runtime of multi-objective optimization algorithms such as the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) can be substantially reduced.
New Hybrid Algorithms for Estimating Tree Stem Diameters at Breast Height Using a Two Dimensional Terrestrial Laser Scanner

PubMed Central

Kong, Jianlei; Ding, Xiaokang; Liu, Jinhao; Yan, Lei; Wang, Jianli

2015-01-01

In this paper, a new algorithm to improve the accuracy of estimating diameter at breast height (DBH) for tree trunks in forest areas is proposed. First, the information is collected by a two-dimensional terrestrial laser scanner (2DTLS), which emits laser pulses to generate a point cloud. After extraction and filtration, the laser point clusters of the trunks are obtained, which are optimized by an arithmetic means method. Then, an algebraic circle fitting algorithm in polar form is non-linearly optimized by the Levenberg-Marquardt method to form a new hybrid algorithm, which is used to acquire the diameters and positions of the trees. Compared with previous works, this proposed method improves the accuracy of diameter estimation of trees significantly and effectively reduces the calculation time. Moreover, the experimental results indicate that this method is stable and suitable for the most challenging conditions, which has practical significance in improving the operating efficiency of forest harvester and reducing the risk of causing accidents. PMID:26147726
Efficient Delaunay Tessellation through K-D Tree Decomposition

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morozov, Dmitriy; Peterka, Tom

Delaunay tessellations are fundamental data structures in computational geometry. They are important in data analysis, where they can represent the geometry of a point set or approximate its density. The algorithms for computing these tessellations at scale perform poorly when the input data is unbalanced. We investigate the use of k-d trees to evenly distribute points among processes and compare two strategies for picking split points between domain regions. Because resulting point distributions no longer satisfy the assumptions of existing parallel Delaunay algorithms, we develop a new parallel algorithm that adapts to its input and prove its correctness. We evaluatemore » the new algorithm using two late-stage cosmology datasets. The new running times are up to 50 times faster using k-d tree compared with regular grid decomposition. Moreover, in the unbalanced data sets, decomposing the domain into a k-d tree is up to five times faster than decomposing it into a regular grid.« less
Recursive optimal pruning with applications to tree structured vector quantizers

NASA Technical Reports Server (NTRS)

Kiang, Shei-Zein; Baker, Richard L.; Sullivan, Gary J.; Chiu, Chung-Yen

1992-01-01

A pruning algorithm of Chou et al. (1989) for designing optimal tree structures identifies only those codebooks which lie on the convex hull of the original codebook's operational distortion rate function. The authors introduce a modified version of the original algorithm, which identifies a large number of codebooks having minimum average distortion, under the constraint that, in each step, only modes having no descendents are removed from the tree. All codebooks generated by the original algorithm are also generated by this algorithm. The new algorithm generates a much larger number of codebooks in the middle- and low-rate regions. The additional codebooks permit operation near the codebook's operational distortion rate function without time sharing by choosing from the increased number of available bit rates. Despite the statistical mismatch which occurs when coding data outside the training sequence, these pruned codebooks retain their performance advantage over full search vector quantizers (VQs) for a large range of rates.
A faster 1.375-approximation algorithm for sorting by transpositions.

PubMed

Cunha, Luís Felipe I; Kowada, Luis Antonio B; Hausen, Rodrigo de A; de Figueiredo, Celina M H

2015-11-01

Sorting by Transpositions is an NP-hard problem for which several polynomial-time approximation algorithms have been developed. Hartman and Shamir (2006) developed a 1.5-approximation [Formula: see text] algorithm, whose running time was improved to O(nlogn) by Feng and Zhu (2007) with a data structure they defined, the permutation tree. Elias and Hartman (2006) developed a 1.375-approximation O(n(2)) algorithm, and Firoz et al. (2011) claimed an improvement to the running time, from O(n(2)) to O(nlogn), by using the permutation tree. We provide counter-examples to the correctness of Firoz et al.'s strategy, showing that it is not possible to reach a component by sufficient extensions using the method proposed by them. In addition, we propose a 1.375-approximation algorithm, modifying Elias and Hartman's approach with the use of permutation trees and achieving O(nlogn) time.
Application of the pessimistic pruning to increase the accuracy of C4.5 algorithm in diagnosing chronic kidney disease

NASA Astrophysics Data System (ADS)

Muslim, M. A.; Herowati, A. J.; Sugiharti, E.; Prasetiyo, B.

2018-03-01

A technique to dig valuable information buried or hidden in data collection which is so big to be found an interesting patterns that was previously unknown is called data mining. Data mining has been applied in the healthcare industry. One technique used data mining is classification. The decision tree included in the classification of data mining and algorithm developed by decision tree is C4.5 algorithm. A classifier is designed using applying pessimistic pruning in C4.5 algorithm in diagnosing chronic kidney disease. Pessimistic pruning use to identify and remove branches that are not needed, this is done to avoid overfitting the decision tree generated by the C4.5 algorithm. In this paper, the result obtained using these classifiers are presented and discussed. Using pessimistic pruning shows increase accuracy of C4.5 algorithm of 1.5% from 95% to 96.5% in diagnosing of chronic kidney disease.
Minimum spanning tree filtering of correlations for varying time scales and size of fluctuations

NASA Astrophysics Data System (ADS)

Kwapień, Jarosław; Oświecimka, Paweł; Forczek, Marcin; DroŻdŻ, Stanisław

2017-05-01

Based on a recently proposed q -dependent detrended cross-correlation coefficient, ρq [J. Kwapień, P. Oświęcimka, and S. Drożdż, Phys. Rev. E 92, 052815 (2015), 10.1103/PhysRevE.92.052815], we generalize the concept of the minimum spanning tree (MST) by introducing a family of q -dependent minimum spanning trees (q MST s ) that are selective to cross-correlations between different fluctuation amplitudes and different time scales of multivariate data. They inherit this ability directly from the coefficients ρq, which are processed here to construct a distance matrix being the input to the MST-constructing Kruskal's algorithm. The conventional MST with detrending corresponds in this context to q =2 . In order to illustrate their performance, we apply the q MSTs to sample empirical data from the American stock market and discuss the results. We show that the q MST graphs can complement ρq in disentangling "hidden" correlations that cannot be observed in the MST graphs based on ρDCCA, and therefore, they can be useful in many areas where the multivariate cross-correlations are of interest. As an example, we apply this method to empirical data from the stock market and show that by constructing the q MSTs for a spectrum of q values we obtain more information about the correlation structure of the data than by using q =2 only. More specifically, we show that two sets of signals that differ from each other statistically can give comparable trees for q =2 , while only by using the trees for q ≠2 do we become able to distinguish between these sets. We also show that a family of q MSTs for a range of q expresses the diversity of correlations in a manner resembling the multifractal analysis, where one computes a spectrum of the generalized fractal dimensions, the generalized Hurst exponents, or the multifractal singularity spectra: the more diverse the correlations are, the more variable the tree topology is for different q 's. As regards the correlation structure of the stock market, our analysis exhibits that the stocks belonging to the same or similar industrial sectors are correlated via the fluctuations of moderate amplitudes, while the largest fluctuations often happen to synchronize in those stocks that do not necessarily belong to the same industry.
Hybrid method based on singular value decomposition and embedded zero tree wavelet technique for ECG signal compression.

PubMed

Kumar, Ranjeet; Kumar, A; Singh, G K

2016-06-01

In the field of biomedical, it becomes necessary to reduce data quantity due to the limitation of storage in real-time ambulatory system and telemedicine system. Research has been underway since very beginning for the development of an efficient and simple technique for longer term benefits. This paper, presents an algorithm based on singular value decomposition (SVD), and embedded zero tree wavelet (EZW) techniques for ECG signal compression which deals with the huge data of ambulatory system. The proposed method utilizes the low rank matrix for initial compression on two dimensional (2-D) ECG data array using SVD, and then EZW is initiated for final compression. Initially, 2-D array construction has key issue for the proposed technique in pre-processing. Here, three different beat segmentation approaches have been exploited for 2-D array construction using segmented beat alignment with exploitation of beat correlation. The proposed algorithm has been tested on MIT-BIH arrhythmia record, and it was found that it is very efficient in compression of different types of ECG signal with lower signal distortion based on different fidelity assessments. The evaluation results illustrate that the proposed algorithm has achieved the compression ratio of 24.25:1 with excellent quality of signal reconstruction in terms of percentage-root-mean square difference (PRD) as 1.89% for ECG signal Rec. 100 and consumes only 162bps data instead of 3960bps uncompressed data. The proposed method is efficient and flexible with different types of ECG signal for compression, and controls quality of reconstruction. Simulated results are clearly illustrate the proposed method can play a big role to save the memory space of health data centres as well as save the bandwidth in telemedicine based healthcare systems. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
C-semiring Frameworks for Minimum Spanning Tree Problems

NASA Astrophysics Data System (ADS)

Bistarelli, Stefano; Santini, Francesco

In this paper we define general algebraic frameworks for the Minimum Spanning Tree problem based on the structure of c-semirings. We propose general algorithms that can compute such trees by following different cost criteria, which must be all specific instantiation of c-semirings. Our algorithms are extensions of well-known procedures, as Prim or Kruskal, and show the expressivity of these algebraic structures. They can deal also with partially-ordered costs on the edges.
A comparison of common programming languages used in bioinformatics.

PubMed

Fourment, Mathieu; Gillings, Michael R

2008-02-05

The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and speed of execution for three standard bioinformatics methods, implemented in programs using one of six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST file outputs were implemented in C, C++, C#, Java, Perl and Python. Implementations in C and C++ were fastest and used the least memory. Programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change from Windows to Linux and no clear evidence of a faster operating system was found. Source code and additional information are available from http://www.bioinformatics.org/benchmark/. This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose an appropriate language carefully, taking into account the performance expected and the library availability for each language.
A comparison of common programming languages used in bioinformatics

PubMed Central

Fourment, Mathieu; Gillings, Michael R

2008-01-01

Background The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and speed of execution for three standard bioinformatics methods, implemented in programs using one of six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST file outputs were implemented in C, C++, C#, Java, Perl and Python. Results Implementations in C and C++ were fastest and used the least memory. Programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change from Windows to Linux and no clear evidence of a faster operating system was found. Source code and additional information are available from Conclusion This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose an appropriate language carefully, taking into account the performance expected and the library availability for each language. PMID:18251993
Merging tree ring chronologies and climate system model simulated temperature by optimal interpolation algorithm in North America

NASA Astrophysics Data System (ADS)

Chen, Xin; Xing, Pei; Luo, Yong; Zhao, Zongci; Nie, Suping; Huang, Jianbin; Wang, Shaowu; Tian, Qinhua

2015-04-01

A new dataset of annual mean surface temperature has been constructed over North America in recent 500 years by performing optimal interpolation (OI) algorithm. Totally, 149 series totally were screened out including 69 tree ring width (MXD) and 80 tree ring width (TRW) chronologies are screened from International Tree Ring Data Bank (ITRDB). The simulated annual mean surface temperature derives from the past1000 experiment results of Community Climate System Model version 4 (CCSM4). Different from existing research that applying data assimilation approach to (General Circulation Models) GCMs simulation, the errors of both the climate model simulation and tree ring reconstruction were considered, with a view to combining the two parts in an optimal way. Variance matching (VM) was employed to calibrate tree ring chronologies on CRUTEM4v, and corresponding errors were estimated through leave-one-out process. Background error covariance matrix was estimated from samples of simulation results in a running 30-year window in a statistical way. Actually, the background error covariance matrix was calculated locally within the scanning range (2000km in this research). Thus, the merging process continued with a time-varying local gain matrix. The merging method (MM) was tested by two kinds of experiments, and the results indicated standard deviation of errors can be reduced by about 0.3 degree centigrade lower than tree ring reconstructions and 0.5 degree centigrade lower than model simulation. During the recent Obvious decadal variability can be identified in MM results including the evident cooling (0.10 degree per decade) in 1940-60s, wherein the model simulation exhibit a weak increasing trend (0.05 degree per decade) instead. MM results revealed a compromised spatial pattern of the linear trend of surface temperature during a typical period (1601-1800 AD) in Little Ice Age, which basically accorded with the phase transitions of the Pacific decadal oscillation (PDO) and Atlantic multi-decadal oscillation (AMO). Through the empirical orthogonal functions and power spectrum analysis, it was demonstrated that, compared with the pure simulations of CCSM4, MM made significant improvement of decadal variability for the gridded temperature in North America by merging the temperature-sensitive tree ring records.
Parallel peak pruning for scalable SMP contour tree computation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carr, Hamish A.; Weber, Gunther H.; Sewell, Christopher M.

As data sets grow to exascale, automated data analysis and visualisation are increasingly important, to intermediate human understanding and to reduce demands on disk storage via in situ analysis. Trends in architecture of high performance computing systems necessitate analysis algorithms to make effective use of combinations of massively multicore and distributed systems. One of the principal analytic tools is the contour tree, which analyses relationships between contours to identify features of more than local importance. Unfortunately, the predominant algorithms for computing the contour tree are explicitly serial, and founded on serial metaphors, which has limited the scalability of this formmore » of analysis. While there is some work on distributed contour tree computation, and separately on hybrid GPU-CPU computation, there is no efficient algorithm with strong formal guarantees on performance allied with fast practical performance. Here in this paper, we report the first shared SMP algorithm for fully parallel contour tree computation, withfor-mal guarantees of O(lgnlgt) parallel steps and O(n lgn) work, and implementations with up to 10x parallel speed up in OpenMP and up to 50x speed up in NVIDIA Thrust.« less
[Prediction of regional soil quality based on mutual information theory integrated with decision tree algorithm].

PubMed

Lin, Fen-Fang; Wang, Ke; Yang, Ning; Yan, Shi-Guang; Zheng, Xin-Yu

2012-02-01

In this paper, some main factors such as soil type, land use pattern, lithology type, topography, road, and industry type that affect soil quality were used to precisely obtain the spatial distribution characteristics of regional soil quality, mutual information theory was adopted to select the main environmental factors, and decision tree algorithm See 5.0 was applied to predict the grade of regional soil quality. The main factors affecting regional soil quality were soil type, land use, lithology type, distance to town, distance to water area, altitude, distance to road, and distance to industrial land. The prediction accuracy of the decision tree model with the variables selected by mutual information was obviously higher than that of the model with all variables, and, for the former model, whether of decision tree or of decision rule, its prediction accuracy was all higher than 80%. Based on the continuous and categorical data, the method of mutual information theory integrated with decision tree could not only reduce the number of input parameters for decision tree algorithm, but also predict and assess regional soil quality effectively.
SNPhylo: a pipeline to construct a phylogenetic tree from huge SNP data.

PubMed

Lee, Tae-Ho; Guo, Hui; Wang, Xiyin; Kim, Changsoo; Paterson, Andrew H

2014-02-26

Phylogenetic trees are widely used for genetic and evolutionary studies in various organisms. Advanced sequencing technology has dramatically enriched data available for constructing phylogenetic trees based on single nucleotide polymorphisms (SNPs). However, massive SNP data makes it difficult to perform reliable analysis, and there has been no ready-to-use pipeline to generate phylogenetic trees from these data. We developed a new pipeline, SNPhylo, to construct phylogenetic trees based on large SNP datasets. The pipeline may enable users to construct a phylogenetic tree from three representative SNP data file formats. In addition, in order to increase reliability of a tree, the pipeline has steps such as removing low quality data and considering linkage disequilibrium. A maximum likelihood method for the inference of phylogeny is also adopted in generation of a tree in our pipeline. Using SNPhylo, users can easily produce a reliable phylogenetic tree from a large SNP data file. Thus, this pipeline can help a researcher focus more on interpretation of the results of analysis of voluminous data sets, rather than manipulations necessary to accomplish the analysis.
A template-finding algorithm and a comprehensive benchmark for homology modeling of proteins

PubMed Central

Vallat, Brinda Kizhakke; Pillardy, Jaroslaw; Elber, Ron

2010-01-01

The first step in homology modeling is to identify a template protein for the target sequence. The template structure is used in later phases of the calculation to construct an atomically detailed model for the target. We have built from the Protein Data Bank a large-scale learning set that includes tens of millions of pair matches that can be either a true template or a false one. Discriminatory learning (learning from positive and negative examples) is employed to train a decision tree. Each branch of the tree is a mathematical programming model. The decision tree is tested on an independent set from PDB entries and on the sequences of CASP7. It provides significant enrichment of true templates (between 50-100 percent) when compared to PSI-BLAST. The model is further verified by building atomically detailed structures for each of the tentative true templates with modeller. The probability that a true match does not yield an acceptable structural model (within 6Å RMSD from the native structure), decays linearly as a function of the TM structural-alignment score. PMID:18300226
Inferring species trees from incongruent multi-copy gene trees using the Robinson-Foulds distance

PubMed Central

2013-01-01

Background Constructing species trees from multi-copy gene trees remains a challenging problem in phylogenetics. One difficulty is that the underlying genes can be incongruent due to evolutionary processes such as gene duplication and loss, deep coalescence, or lateral gene transfer. Gene tree estimation errors may further exacerbate the difficulties of species tree estimation. Results We present a new approach for inferring species trees from incongruent multi-copy gene trees that is based on a generalization of the Robinson-Foulds (RF) distance measure to multi-labeled trees (mul-trees). We prove that it is NP-hard to compute the RF distance between two mul-trees; however, it is easy to calculate this distance between a mul-tree and a singly-labeled species tree. Motivated by this, we formulate the RF problem for mul-trees (MulRF) as follows: Given a collection of multi-copy gene trees, find a singly-labeled species tree that minimizes the total RF distance from the input mul-trees. We develop and implement a fast SPR-based heuristic algorithm for the NP-hard MulRF problem. We compare the performance of the MulRF method (available at http://genome.cs.iastate.edu/CBL/MulRF/) with several gene tree parsimony approaches using gene tree simulations that incorporate gene tree error, gene duplications and losses, and/or lateral transfer. The MulRF method produces more accurate species trees than gene tree parsimony approaches. We also demonstrate that the MulRF method infers in minutes a credible plant species tree from a collection of nearly 2,000 gene trees. Conclusions Our new phylogenetic inference method, based on a generalized RF distance, makes it possible to quickly estimate species trees from large genomic data sets. Since the MulRF method, unlike gene tree parsimony, is based on a generic tree distance measure, it is appealing for analyses of genomic data sets, in which many processes such as deep coalescence, recombination, gene duplication and losses as well as phylogenetic error may contribute to gene tree discord. In experiments, the MulRF method estimated species trees accurately and quickly, demonstrating MulRF as an efficient alternative approach for phylogenetic inference from large-scale genomic data sets. PMID:24180377
Identifying pollution sources and predicting urban air quality using ensemble learning methods

NASA Astrophysics Data System (ADS)

Singh, Kunwar P.; Gupta, Shikha; Rai, Premanjali

2013-12-01

In this study, principal components analysis (PCA) was performed to identify air pollution sources and tree based ensemble learning models were constructed to predict the urban air quality of Lucknow (India) using the air quality and meteorological databases pertaining to a period of five years. PCA identified vehicular emissions and fuel combustion as major air pollution sources. The air quality indices revealed the air quality unhealthy during the summer and winter. Ensemble models were constructed to discriminate between the seasonal air qualities, factors responsible for discrimination, and to predict the air quality indices. Accordingly, single decision tree (SDT), decision tree forest (DTF), and decision treeboost (DTB) were constructed and their generalization and predictive performance was evaluated in terms of several statistical parameters and compared with conventional machine learning benchmark, support vector machines (SVM). The DT and SVM models discriminated the seasonal air quality rendering misclassification rate (MR) of 8.32% (SDT); 4.12% (DTF); 5.62% (DTB), and 6.18% (SVM), respectively in complete data. The AQI and CAQI regression models yielded a correlation between measured and predicted values and root mean squared error of 0.901, 6.67 and 0.825, 9.45 (SDT); 0.951, 4.85 and 0.922, 6.56 (DTF); 0.959, 4.38 and 0.929, 6.30 (DTB); 0.890, 7.00 and 0.836, 9.16 (SVR) in complete data. The DTF and DTB models outperformed the SVM both in classification and regression which could be attributed to the incorporation of the bagging and boosting algorithms in these models. The proposed ensemble models successfully predicted the urban ambient air quality and can be used as effective tools for its management.
Study on Ecological Risk Assessment of Guangxi Coastal Zone Based on 3s Technology

NASA Astrophysics Data System (ADS)

Zhong, Z.; Luo, H.; Ling, Z. Y.; Huang, Y.; Ning, W. Y.; Tang, Y. B.; Shao, G. Z.

2018-05-01

This paper takes Guangxi coastal zone as the study area, following the standards of land use type, divides the coastal zone of ecological landscape into seven kinds of natural wetland landscape types such as woodland, farmland, grassland, water, urban land and wetlands. Using TM data of 2000-2015 such 15 years, with the CART decision tree algorithm, for analysis the characteristic of types of landscape's remote sensing image and build decision tree rules of landscape classification to extract information classification. Analyzing of the evolution process of the landscape pattern in Guangxi coastal zone in nearly 15 years, we may understand the distribution characteristics and change rules. Combined with the natural disaster data, we use of landscape index and the related risk interference degree and construct ecological risk evaluation model in Guangxi coastal zone for ecological risk assessment results of Guangxi coastal zone.

An approach for automated fault diagnosis based on a fuzzy decision tree and boundary analysis of a reconstructed phase space.

PubMed

Aydin, Ilhan; Karakose, Mehmet; Akin, Erhan

2014-03-01

Although reconstructed phase space is one of the most powerful methods for analyzing a time series, it can fail in fault diagnosis of an induction motor when the appropriate pre-processing is not performed. Therefore, boundary analysis based a new feature extraction method in phase space is proposed for diagnosis of induction motor faults. The proposed approach requires the measurement of one phase current signal to construct the phase space representation. Each phase space is converted into an image, and the boundary of each image is extracted by a boundary detection algorithm. A fuzzy decision tree has been designed to detect broken rotor bars and broken connector faults. The results indicate that the proposed approach has a higher recognition rate than other methods on the same dataset. © 2013 ISA Published by ISA All rights reserved.
Heterogeneous Compression of Large Collections of Evolutionary Trees.

PubMed

Matthews, Suzanne J

2015-01-01

Compressing heterogeneous collections of trees is an open problem in computational phylogenetics. In a heterogeneous tree collection, each tree can contain a unique set of taxa. An ideal compression method would allow for the efficient archival of large tree collections and enable scientists to identify common evolutionary relationships over disparate analyses. In this paper, we extend TreeZip to compress heterogeneous collections of trees. TreeZip is the most efficient algorithm for compressing homogeneous tree collections. To the best of our knowledge, no other domain-based compression algorithm exists for large heterogeneous tree collections or enable their rapid analysis. Our experimental results indicate that TreeZip averages 89.03 percent (72.69 percent) space savings on unweighted (weighted) collections of trees when the level of heterogeneity in a collection is moderate. The organization of the TRZ file allows for efficient computations over heterogeneous data. For example, consensus trees can be computed in mere seconds. Lastly, combining the TreeZip compressed (TRZ) file with general-purpose compression yields average space savings of 97.34 percent (81.43 percent) on unweighted (weighted) collections of trees. Our results lead us to believe that TreeZip will prove invaluable in the efficient archival of tree collections, and enables scientists to develop novel methods for relating heterogeneous collections of trees.
Exploiting the wavelet structure in compressed sensing MRI.

PubMed

Chen, Chen; Huang, Junzhou

2014-12-01

Sparsity has been widely utilized in magnetic resonance imaging (MRI) to reduce k-space sampling. According to structured sparsity theories, fewer measurements are required for tree sparse data than the data only with standard sparsity. Intuitively, more accurate image reconstruction can be achieved with the same number of measurements by exploiting the wavelet tree structure in MRI. A novel algorithm is proposed in this article to reconstruct MR images from undersampled k-space data. In contrast to conventional compressed sensing MRI (CS-MRI) that only relies on the sparsity of MR images in wavelet or gradient domain, we exploit the wavelet tree structure to improve CS-MRI. This tree-based CS-MRI problem is decomposed into three simpler subproblems then each of the subproblems can be efficiently solved by an iterative scheme. Simulations and in vivo experiments demonstrate the significant improvement of the proposed method compared to conventional CS-MRI algorithms, and the feasibleness on MR data compared to existing tree-based imaging algorithms. Copyright © 2014 Elsevier Inc. All rights reserved.
Data mining for multiagent rules, strategies, and fuzzy decision tree structure

NASA Astrophysics Data System (ADS)

Smith, James F., III; Rhyne, Robert D., II; Fisher, Kristin

2002-03-01

A fuzzy logic based resource manager (RM) has been developed that automatically allocates electronic attack resources in real-time over many dissimilar platforms. Two different data mining algorithms have been developed to determine rules, strategies, and fuzzy decision tree structure. The first data mining algorithm uses a genetic algorithm as a data mining function and is called from an electronic game. The game allows a human expert to play against the resource manager in a simulated battlespace with each of the defending platforms being exclusively directed by the fuzzy resource manager and the attacking platforms being controlled by the human expert or operating autonomously under their own logic. This approach automates the data mining problem. The game automatically creates a database reflecting the domain expert's knowledge. It calls a data mining function, a genetic algorithm, for data mining of the database as required and allows easy evaluation of the information mined in the second step. The criterion for re- optimization is discussed as well as experimental results. Then a second data mining algorithm that uses a genetic program as a data mining function is introduced to automatically discover fuzzy decision tree structures. Finally, a fuzzy decision tree generated through this process is discussed.
Determining Geometric Parameters of Agricultural Trees from Laser Scanning Data Obtained with Unmanned Aerial Vehicle

NASA Astrophysics Data System (ADS)

Hadas, E.; Jozkow, G.; Walicka, A.; Borkowski, A.

2018-05-01

The estimation of dendrometric parameters has become an important issue for agriculture planning and for the efficient management of orchards. Airborne Laser Scanning (ALS) data is widely used in forestry and many algorithms for automatic estimation of dendrometric parameters of individual forest trees were developed. Unfortunately, due to significant differences between forest and fruit trees, some contradictions exist against adopting the achievements of forestry science to agricultural studies indiscriminately. In this study we present the methodology to identify individual trees in apple orchard and estimate heights of individual trees, using high-density LiDAR data (3200 points/m2) obtained with Unmanned Aerial Vehicle (UAV) equipped with Velodyne HDL32-E sensor. The processing strategy combines the alpha-shape algorithm, principal component analysis (PCA) and detection of local minima. The alpha-shape algorithm is used to separate tree rows. In order to separate trees in a single row, we detect local minima on the canopy profile and slice polygons from alpha-shape results. We successfully separated 92 % of trees in the test area. 6 % of trees in orchard were not separated from each other and 2 % were sliced into two polygons. The RMSE of tree heights determined from the point clouds compared to field measurements was equal to 0.09 m, and the correlation coefficient was equal to 0.96. The results confirm the usefulness of LiDAR data from UAV platform in orchard inventory.
Constructing the tree-level Yang-Mills S-matrix using complex factorization

NASA Astrophysics Data System (ADS)

Schuster, Philip C.; Toro, Natalia

2009-06-01

A remarkable connection between BCFW recursion relations and constraints on the S-matrix was made by Benincasa and Cachazo in 0705.4305, who noted that mutual consistency of different BCFW constructions of four-particle amplitudes generates non-trivial (but familiar) constraints on three-particle coupling constants — these include gauge invariance, the equivalence principle, and the lack of non-trivial couplings for spins > 2. These constraints can also be derived with weaker assumptions, by demanding the existence of four-point amplitudes that factorize properly in all unitarity limits with complex momenta. From this starting point, we show that the BCFW prescription can be interpreted as an algorithm for fully constructing a tree-level S-matrix, and that complex factorization of general BCFW amplitudes follows from the factorization of four-particle amplitudes. The allowed set of BCFW deformations is identified, formulated entirely as a statement on the three-particle sector, and using only complex factorization as a guide. Consequently, our analysis based on the physical consistency of the S-matrix is entirely independent of field theory. We analyze the case of pure Yang-Mills, and outline a proof for gravity. For Yang-Mills, we also show that the well-known scaling behavior of BCFW-deformed amplitudes at large z is a simple consequence of factorization. For gravity, factorization in certain channels requires asymptotic behavior ~ 1/z2.
A combined M5P tree and hazard-based duration model for predicting urban freeway traffic accident durations.

PubMed

Lin, Lei; Wang, Qian; Sadek, Adel W

2016-06-01

The duration of freeway traffic accidents duration is an important factor, which affects traffic congestion, environmental pollution, and secondary accidents. Among previous studies, the M5P algorithm has been shown to be an effective tool for predicting incident duration. M5P builds a tree-based model, like the traditional classification and regression tree (CART) method, but with multiple linear regression models as its leaves. The problem with M5P for accident duration prediction, however, is that whereas linear regression assumes that the conditional distribution of accident durations is normally distributed, the distribution for a "time-to-an-event" is almost certainly nonsymmetrical. A hazard-based duration model (HBDM) is a better choice for this kind of a "time-to-event" modeling scenario, and given this, HBDMs have been previously applied to analyze and predict traffic accidents duration. Previous research, however, has not yet applied HBDMs for accident duration prediction, in association with clustering or classification of the dataset to minimize data heterogeneity. The current paper proposes a novel approach for accident duration prediction, which improves on the original M5P tree algorithm through the construction of a M5P-HBDM model, in which the leaves of the M5P tree model are HBDMs instead of linear regression models. Such a model offers the advantage of minimizing data heterogeneity through dataset classification, and avoids the need for the incorrect assumption of normality for traffic accident durations. The proposed model was then tested on two freeway accident datasets. For each dataset, the first 500 records were used to train the following three models: (1) an M5P tree; (2) a HBDM; and (3) the proposed M5P-HBDM, and the remainder of data were used for testing. The results show that the proposed M5P-HBDM managed to identify more significant and meaningful variables than either M5P or HBDMs. Moreover, the M5P-HBDM had the lowest overall mean absolute percentage error (MAPE). Copyright © 2016 Elsevier Ltd. All rights reserved.
Method for estimating potential tree-grade distributions for northeastern forest species

Treesearch

Daniel A. Yaussy; Daniel A. Yaussy

1993-01-01

Generalized logistic regression was used to distribute trees into four potential tree grades for 20 northeastern species groups. The potential tree grade is defined as the tree grade based on the length and amount of clear cuttings and defects only, disregarding minimum grading diameter. The algorithms described use site index and tree diameter as the predictive...
Integrated Approach To Design And Analysis Of Systems

NASA Technical Reports Server (NTRS)

Patterson-Hine, F. A.; Iverson, David L.

1993-01-01

Object-oriented fault-tree representation unifies evaluation of reliability and diagnosis of faults. Programming/fault tree described more fully in "Object-Oriented Algorithm For Evaluation Of Fault Trees" (ARC-12731). Augmented fault tree object contains more information than fault tree object used in quantitative analysis of reliability. Additional information needed to diagnose faults in system represented by fault tree.
A Model of Desired Performance in Phylogenetic Tree Construction for Teaching Evolution.

ERIC Educational Resources Information Center

Brewer, Steven D.

This research paper examines phylogenetic tree construction-a form of problem solving in biology-by studying the strategies and heuristics used by experts. One result of the research is the development of a model of desired performance for phylogenetic tree construction. A detailed description of the model and the sample problems which illustrate…
MDTS: automatic complex materials design using Monte Carlo tree search.

PubMed

M Dieb, Thaer; Ju, Shenghong; Yoshizoe, Kazuki; Hou, Zhufeng; Shiomi, Junichiro; Tsuda, Koji

2017-01-01

Complex materials design is often represented as a black-box combinatorial optimization problem. In this paper, we present a novel python library called MDTS (Materials Design using Tree Search). Our algorithm employs a Monte Carlo tree search approach, which has shown exceptional performance in computer Go game. Unlike evolutionary algorithms that require user intervention to set parameters appropriately, MDTS has no tuning parameters and works autonomously in various problems. In comparison to a Bayesian optimization package, our algorithm showed competitive search efficiency and superior scalability. We succeeded in designing large Silicon-Germanium (Si-Ge) alloy structures that Bayesian optimization could not deal with due to excessive computational cost. MDTS is available at https://github.com/tsudalab/MDTS.
MDTS: automatic complex materials design using Monte Carlo tree search

NASA Astrophysics Data System (ADS)

Dieb, Thaer M.; Ju, Shenghong; Yoshizoe, Kazuki; Hou, Zhufeng; Shiomi, Junichiro; Tsuda, Koji

2017-12-01

Complex materials design is often represented as a black-box combinatorial optimization problem. In this paper, we present a novel python library called MDTS (Materials Design using Tree Search). Our algorithm employs a Monte Carlo tree search approach, which has shown exceptional performance in computer Go game. Unlike evolutionary algorithms that require user intervention to set parameters appropriately, MDTS has no tuning parameters and works autonomously in various problems. In comparison to a Bayesian optimization package, our algorithm showed competitive search efficiency and superior scalability. We succeeded in designing large Silicon-Germanium (Si-Ge) alloy structures that Bayesian optimization could not deal with due to excessive computational cost. MDTS is available at https://github.com/tsudalab/MDTS.
Application of data mining techniques to explore predictors of HCC in Egyptian patients with HCV-related chronic liver disease.

PubMed

Omran, Dalia Abd El Hamid; Awad, AbuBakr Hussein; Mabrouk, Mahasen Abd El Rahman; Soliman, Ahmad Fouad; Aziz, Ashraf Omar Abdel

2015-01-01

Hepatocellular carcinoma (HCC) is the second most common malignancy in Egypt. Data mining is a method of predictive analysis which can explore tremendous volumes of information to discover hidden patterns and relationships. Our aim here was to develop a non-invasive algorithm for prediction of HCC. Such an algorithm should be economical, reliable, easy to apply and acceptable by domain experts. This cross-sectional study enrolled 315 patients with hepatitis C virus (HCV) related chronic liver disease (CLD); 135 HCC, 116 cirrhotic patients without HCC and 64 patients with chronic hepatitis C. Using data mining analysis, we constructed a decision tree learning algorithm to predict HCC. The decision tree algorithm was able to predict HCC with recall (sensitivity) of 83.5% and precession (specificity) of 83.3% using only routine data. The correctly classified instances were 259 (82.2%), and the incorrectly classified instances were 56 (17.8%). Out of 29 attributes, serum alpha fetoprotein (AFP), with an optimal cutoff value of ≥50.3 ng/ml was selected as the best predictor of HCC. To a lesser extent, male sex, presence of cirrhosis, AST>64U/L, and ascites were variables associated with HCC. Data mining analysis allows discovery of hidden patterns and enables the development of models to predict HCC, utilizing routine data as an alternative to CT and liver biopsy. This study has highlighted a new cutoff for AFP (≥50.3 ng/ml). Presence of a score of >2 risk variables (out of 5) can successfully predict HCC with a sensitivity of 96% and specificity of 82%.
Tree-based solvers for adaptive mesh refinement code FLASH - I: gravity and optical depths

NASA Astrophysics Data System (ADS)

Wünsch, R.; Walch, S.; Dinnbier, F.; Whitworth, A.

2018-04-01

We describe an OctTree algorithm for the MPI parallel, adaptive mesh refinement code FLASH, which can be used to calculate the gas self-gravity, and also the angle-averaged local optical depth, for treating ambient diffuse radiation. The algorithm communicates to the different processors only those parts of the tree that are needed to perform the tree-walk locally. The advantage of this approach is a relatively low memory requirement, important in particular for the optical depth calculation, which needs to process information from many different directions. This feature also enables a general tree-based radiation transport algorithm that will be described in a subsequent paper, and delivers excellent scaling up to at least 1500 cores. Boundary conditions for gravity can be either isolated or periodic, and they can be specified in each direction independently, using a newly developed generalization of the Ewald method. The gravity calculation can be accelerated with the adaptive block update technique by partially re-using the solution from the previous time-step. Comparison with the FLASH internal multigrid gravity solver shows that tree-based methods provide a competitive alternative, particularly for problems with isolated or mixed boundary conditions. We evaluate several multipole acceptance criteria (MACs) and identify a relatively simple approximate partial error MAC which provides high accuracy at low computational cost. The optical depth estimates are found to agree very well with those of the RADMC-3D radiation transport code, with the tree-solver being much faster. Our algorithm is available in the standard release of the FLASH code in version 4.0 and later.
Polynomial algorithms for the Maximal Pairing Problem: efficient phylogenetic targeting on arbitrary trees

PubMed Central

2010-01-01

Background The Maximal Pairing Problem (MPP) is the prototype of a class of combinatorial optimization problems that are of considerable interest in bioinformatics: Given an arbitrary phylogenetic tree T and weights ωxy for the paths between any two pairs of leaves (x, y), what is the collection of edge-disjoint paths between pairs of leaves that maximizes the total weight? Special cases of the MPP for binary trees and equal weights have been described previously; algorithms to solve the general MPP are still missing, however. Results We describe a relatively simple dynamic programming algorithm for the special case of binary trees. We then show that the general case of multifurcating trees can be treated by interleaving solutions to certain auxiliary Maximum Weighted Matching problems with an extension of this dynamic programming approach, resulting in an overall polynomial-time solution of complexity (n4 log n) w.r.t. the number n of leaves. The source code of a C implementation can be obtained under the GNU Public License from http://www.bioinf.uni-leipzig.de/Software/Targeting. For binary trees, we furthermore discuss several constrained variants of the MPP as well as a partition function approach to the probabilistic version of the MPP. Conclusions The algorithms introduced here make it possible to solve the MPP also for large trees with high-degree vertices. This has practical relevance in the field of comparative phylogenetics and, for example, in the context of phylogenetic targeting, i.e., data collection with resource limitations. PMID:20525185
Multi-hop path tracing of mobile robot with multi-range image

NASA Astrophysics Data System (ADS)

Choudhury, Ramakanta; Samal, Chandrakanta; Choudhury, Umakanta

2010-02-01

It is well known that image processing depends heavily upon image representation technique . This paper intends to find out the optimal path of mobile robots for a specified area where obstacles are predefined as well as modified. Here the optimal path is represented by using the Quad tree method. Since there has been rising interest in the use of quad tree, we have tried to use the successive subdivision of images into quadrants from which the quad tree is developed. In the quad tree, obstacles-free area and the partial filled area are represented with different notations. After development of quad tree the algorithm is used to find the optimal path by employing neighbor finding technique, with a view to move the robot from the source to destination. The algorithm, here , permeates through the entire tree, and tries to locate the common ancestor for computation. The computation and the algorithm, aim at easing the ability of the robot to trace the optimal path with the help of adjacencies between the neighboring nodes as well as determining such adjacencies in the horizontal, vertical and diagonal directions. In this paper efforts have been made to determine the movement of the adjacent block in the quad tree and to detect the transition between the blocks equal size and finally generate the result.
A hybrid 3D spatial access method based on quadtrees and R-trees for globe data

NASA Astrophysics Data System (ADS)

Gong, Jun; Ke, Shengnan; Li, Xiaomin; Qi, Shuhua

2009-10-01

3D spatial access method for globe data is very crucial technique for virtual earth. This paper presents a brand-new maintenance method to index 3d objects distributed on the whole surface of the earth, which integrates the 1:1,000,000- scale topographic map tiles, Quad-tree and R-tree. Furthermore, when traditional methods are extended into 3d space, the performance of spatial index deteriorates badly, for example 3D R-tree. In order to effectively solve this difficult problem, a new algorithm of dynamic R-tree is put forward, which includes two sub-procedures, namely node-choosing and node-split. In the node-choosing algorithm, a new strategy is adopted, not like the traditional mode which is from top to bottom, but firstly from bottom to top then from top to bottom. This strategy can effectively solve the negative influence of node overlap. In the node-split algorithm, 2-to-3 split mode substitutes the traditional 1-to-2 mode, which can better concern the shape and size of nodes. Because of the rational tree shape, this R-tree method can easily integrate the concept of LOD. Therefore, it will be later implemented in commercial DBMS and adopted in time-crucial 3d GIS system.
TreeNetViz: revealing patterns of networks over tree structures.

PubMed

Gou, Liang; Zhang, Xiaolong Luke

2011-12-01

Network data often contain important attributes from various dimensions such as social affiliations and areas of expertise in a social network. If such attributes exhibit a tree structure, visualizing a compound graph consisting of tree and network structures becomes complicated. How to visually reveal patterns of a network over a tree has not been fully studied. In this paper, we propose a compound graph model, TreeNet, to support visualization and analysis of a network at multiple levels of aggregation over a tree. We also present a visualization design, TreeNetViz, to offer the multiscale and cross-scale exploration and interaction of a TreeNet graph. TreeNetViz uses a Radial, Space-Filling (RSF) visualization to represent the tree structure, a circle layout with novel optimization to show aggregated networks derived from TreeNet, and an edge bundling technique to reduce visual complexity. Our circular layout algorithm reduces both total edge-crossings and edge length and also considers hierarchical structure constraints and edge weight in a TreeNet graph. These experiments illustrate that the algorithm can reduce visual cluttering in TreeNet graphs. Our case study also shows that TreeNetViz has the potential to support the analysis of a compound graph by revealing multiscale and cross-scale network patterns. © 2011 IEEE
Sequential decision tree using the analytic hierarchy process for decision support in rectal cancer.

PubMed

Suner, Aslı; Çelikoğlu, Can Cengiz; Dicle, Oğuz; Sökmen, Selman

2012-09-01

The aim of the study is to determine the most appropriate method for construction of a sequential decision tree in the management of rectal cancer, using various patient-specific criteria and treatments such as surgery, chemotherapy, and radiotherapy. An analytic hierarchy process (AHP) was used to determine the priorities of variables. Relevant criteria used in two decision steps and their relative priorities were established by a panel of five general surgeons. Data were collected via a web-based application and analyzed using the "Expert Choice" software specifically developed for the AHP. Consistency ratios in the AHP method were calculated for each set of judgments, and the priorities of sub-criteria were determined. A sequential decision tree was constructed for the best treatment decision process, using priorities determined by the AHP method. Consistency ratios in the AHP method were calculated for each decision step, and the judgments were considered consistent. The tumor-related criterion "presence of perforation" (0.331) and the patient-surgeon-related criterion "surgeon's experience" (0.630) had the highest priority in the first decision step. In the second decision step, the tumor-related criterion "the stage of the disease" (0.230) and the patient-surgeon-related criterion "surgeon's experience" (0.281) were the paramount criteria. The results showed some variation in the ranking of criteria between the decision steps. In the second decision step, for instance, the tumor-related criterion "presence of perforation" was just the fifth. The consistency of decision support systems largely depends on the quality of the underlying decision tree. When several choices and variables have to be considered in a decision, it is very important to determine priorities. The AHP method seems to be effective for this purpose. The decision algorithm developed by this method is more realistic and will improve the quality of the decision tree. Copyright © 2012 Elsevier B.V. All rights reserved.
Soft context clustering for F0 modeling in HMM-based speech synthesis

NASA Astrophysics Data System (ADS)

Khorram, Soheil; Sameti, Hossein; King, Simon

2015-12-01

This paper proposes the use of a new binary decision tree, which we call a soft decision tree, to improve generalization performance compared to the conventional `hard' decision tree method that is used to cluster context-dependent model parameters in statistical parametric speech synthesis. We apply the method to improve the modeling of fundamental frequency, which is an important factor in synthesizing natural-sounding high-quality speech. Conventionally, hard decision tree-clustered hidden Markov models (HMMs) are used, in which each model parameter is assigned to a single leaf node. However, this `divide-and-conquer' approach leads to data sparsity, with the consequence that it suffers from poor generalization, meaning that it is unable to accurately predict parameters for models of unseen contexts: the hard decision tree is a weak function approximator. To alleviate this, we propose the soft decision tree, which is a binary decision tree with soft decisions at the internal nodes. In this soft clustering method, internal nodes select both their children with certain membership degrees; therefore, each node can be viewed as a fuzzy set with a context-dependent membership function. The soft decision tree improves model generalization and provides a superior function approximator because it is able to assign each context to several overlapped leaves. In order to use such a soft decision tree to predict the parameters of the HMM output probability distribution, we derive the smoothest (maximum entropy) distribution which captures all partial first-order moments and a global second-order moment of the training samples. Employing such a soft decision tree architecture with maximum entropy distributions, a novel speech synthesis system is trained using maximum likelihood (ML) parameter re-estimation and synthesis is achieved via maximum output probability parameter generation. In addition, a soft decision tree construction algorithm optimizing a log-likelihood measure is developed. Both subjective and objective evaluations were conducted and indicate a considerable improvement over the conventional method.

Reconciliation of Decision-Making Heuristics Based on Decision Trees Topologies and Incomplete Fuzzy Probabilities Sets

PubMed Central

Doubravsky, Karel; Dohnal, Mirko

2015-01-01

Complex decision making tasks of different natures, e.g. economics, safety engineering, ecology and biology, are based on vague, sparse, partially inconsistent and subjective knowledge. Moreover, decision making economists / engineers are usually not willing to invest too much time into study of complex formal theories. They require such decisions which can be (re)checked by human like common sense reasoning. One important problem related to realistic decision making tasks are incomplete data sets required by the chosen decision making algorithm. This paper presents a relatively simple algorithm how some missing III (input information items) can be generated using mainly decision tree topologies and integrated into incomplete data sets. The algorithm is based on an easy to understand heuristics, e.g. a longer decision tree sub-path is less probable. This heuristic can solve decision problems under total ignorance, i.e. the decision tree topology is the only information available. But in a practice, isolated information items e.g. some vaguely known probabilities (e.g. fuzzy probabilities) are usually available. It means that a realistic problem is analysed under partial ignorance. The proposed algorithm reconciles topology related heuristics and additional fuzzy sets using fuzzy linear programming. The case study, represented by a tree with six lotteries and one fuzzy probability, is presented in details. PMID:26158662
Reconciliation of Decision-Making Heuristics Based on Decision Trees Topologies and Incomplete Fuzzy Probabilities Sets.

PubMed

Doubravsky, Karel; Dohnal, Mirko

2015-01-01

Complex decision making tasks of different natures, e.g. economics, safety engineering, ecology and biology, are based on vague, sparse, partially inconsistent and subjective knowledge. Moreover, decision making economists / engineers are usually not willing to invest too much time into study of complex formal theories. They require such decisions which can be (re)checked by human like common sense reasoning. One important problem related to realistic decision making tasks are incomplete data sets required by the chosen decision making algorithm. This paper presents a relatively simple algorithm how some missing III (input information items) can be generated using mainly decision tree topologies and integrated into incomplete data sets. The algorithm is based on an easy to understand heuristics, e.g. a longer decision tree sub-path is less probable. This heuristic can solve decision problems under total ignorance, i.e. the decision tree topology is the only information available. But in a practice, isolated information items e.g. some vaguely known probabilities (e.g. fuzzy probabilities) are usually available. It means that a realistic problem is analysed under partial ignorance. The proposed algorithm reconciles topology related heuristics and additional fuzzy sets using fuzzy linear programming. The case study, represented by a tree with six lotteries and one fuzzy probability, is presented in details.
Computing all hybridization networks for multiple binary phylogenetic input trees.

PubMed

Albrecht, Benjamin

2015-07-30

The computation of phylogenetic trees on the same set of species that are based on different orthologous genes can lead to incongruent trees. One possible explanation for this behavior are interspecific hybridization events recombining genes of different species. An important approach to analyze such events is the computation of hybridization networks. This work presents the first algorithm computing the hybridization number as well as a set of representative hybridization networks for multiple binary phylogenetic input trees on the same set of taxa. To improve its practical runtime, we show how this algorithm can be parallelized. Moreover, we demonstrate the efficiency of the software Hybroscale, containing an implementation of our algorithm, by comparing it to PIRNv2.0, which is so far the best available software computing the exact hybridization number for multiple binary phylogenetic trees on the same set of taxa. The algorithm is part of the software Hybroscale, which was developed specifically for the investigation of hybridization networks including their computation and visualization. Hybroscale is freely available(1) and runs on all three major operating systems. Our simulation study indicates that our approach is on average 100 times faster than PIRNv2.0. Moreover, we show how Hybroscale improves the interpretation of the reported hybridization networks by adding certain features to its graphical representation.
Continuous-time quantum search on balanced trees

NASA Astrophysics Data System (ADS)

Philipp, Pascal; Tarrataca, Luís; Boettcher, Stefan

2016-03-01

We examine the effect of network heterogeneity on the performance of quantum search algorithms. To this end, we study quantum search on a tree for the oracle Hamiltonian formulation employed by continuous-time quantum walks. We use analytical and numerical arguments to show that the exponent of the asymptotic running time ˜Nβ changes uniformly from β =0.5 to β =1 as the searched-for site is moved from the root of the tree towards the leaves. These results imply that the time complexity of the quantum search algorithm on a balanced tree is closely correlated with certain path-based centrality measures of the searched-for site.
Effect of symptom-based risk stratification on the costs of managing patients with chronic rhinosinusitis symptoms.

PubMed

Tan, Bruce K; Lu, Guanning; Kwasny, Mary J; Hsueh, Wayne D; Shintani-Smith, Stephanie; Conley, David B; Chandra, Rakesh K; Kern, Robert C; Leung, Randy

2013-11-01

Current symptom criteria poorly predict a diagnosis of chronic rhinosinusitis (CRS) resulting in excessive treatment of patients with presumed CRS. The objective of this study was analyze the positive predictive value of individual symptoms, or symptoms in combination, in patients with CRS symptoms and examine the costs of the subsequent diagnostic algorithm using a decision tree-based cost analysis. We analyzed previously collected patient-reported symptoms from a cross-sectional study of patients who had received a computed tomography (CT) scan of their sinuses at a tertiary care otolaryngology clinic for evaluation of CRS symptoms to calculate the positive predictive value of individual symptoms. Classification and regression tree (CART) analysis then optimized combinations of symptoms and thresholds to identify CRS patients. The calculated positive predictive values were applied to a previously developed decision tree that compared an upfront CT (uCT) algorithm against an empiric medical therapy (EMT) algorithm with further analysis that considered the availability of point of care (POC) imaging. The positive predictive value of individual symptoms ranged from 0.21 for patients reporting forehead pain and to 0.69 for patients reporting hyposmia. The CART model constructed a dichotomous model based on forehead pain, maxillary pain, hyposmia, nasal discharge, and facial pain (C-statistic 0.83). If POC CT were available, median costs ($64-$415) favored using the upfront CT for all individual symptoms. If POC CT was unavailable, median costs favored uCT for most symptoms except intercanthal pain (-$15), hyposmia (-$100), and discolored nasal discharge (-$24), although these symptoms became equivocal on cost sensitivity analysis. The three-tiered CART model could subcategorize patients into tiers where uCT was always favorable (median costs: $332-$504) and others for which EMT was always favorable (median costs -$121 to -$275). The uCT algorithm was always more costly if the nasal endoscopy was positive. Among patients with classic CRS symptoms, the frequency of individual symptoms varied the likelihood of a CRS diagnosis marginally. Only hyposmia, the absence of facial pain, and discolored discharge sufficiently increased the likelihood of diagnosis to potentially make EMT less costly. The development of an evidence-based, multisymptom-based risk stratification model could substantially affect the management costs of the subsequent diagnostic algorithm. © 2013 ARS-AAOA, LLC.
Hide and vanish: data sets where the most parsimonious tree is known but hard to find, and their implications for tree search methods.

PubMed

Goloboff, Pablo A

2014-10-01

Three different types of data sets, for which the uniquely most parsimonious tree can be known exactly but is hard to find with heuristic tree search methods, are studied. Tree searches are complicated more by the shape of the tree landscape (i.e. the distribution of homoplasy on different trees) than by the sheer abundance of homoplasy or character conflict. Data sets of Type 1 are those constructed by Radel et al. (2013). Data sets of Type 2 present a very rugged landscape, with narrow peaks and valleys, but relatively low amounts of homoplasy. For such a tree landscape, subjecting the trees to TBR and saving suboptimal trees produces much better results when the sequence of clipping for the tree branches is randomized instead of fixed. An unexpected finding for data sets of Types 1 and 2 is that starting a search from a random tree instead of a random addition sequence Wagner tree may increase the probability that the search finds the most parsimonious tree; a small artificial example where these probabilities can be calculated exactly is presented. Data sets of Type 3, the most difficult data sets studied here, comprise only congruent characters, and a single island with only one most parsimonious tree. Even if there is a single island, missing entries create a very flat landscape which is difficult to traverse with tree search algorithms because the number of equally parsimonious trees that need to be saved and swapped to effectively move around the plateaus is too large. Minor modifications of the parameters of tree drifting, ratchet, and sectorial searches allow travelling around these plateaus much more efficiently than saving and swapping large numbers of equally parsimonious trees with TBR. For these data sets, two new related criteria for selecting taxon addition sequences in Wagner trees (the "selected" and "informative" addition sequences) produce much better results than the standard random or closest addition sequences. These new methods for Wagner trees and for moving around plateaus can be useful when analyzing phylogenomic data sets formed by concatenation of genes with uneven taxon representation ("sparse" supermatrices), which are likely to present a tree landscape with extensive plateaus. Copyright © 2014 Elsevier Inc. All rights reserved.
Implementation of Tree and Butterfly Barriers with Optimistic Time Management Algorithms for Discrete Event Simulation

NASA Astrophysics Data System (ADS)

Rizvi, Syed S.; Shah, Dipali; Riasat, Aasia

The Time Wrap algorithm [3] offers a run time recovery mechanism that deals with the causality errors. These run time recovery mechanisms consists of rollback, anti-message, and Global Virtual Time (GVT) techniques. For rollback, there is a need to compute GVT which is used in discrete-event simulation to reclaim the memory, commit the output, detect the termination, and handle the errors. However, the computation of GVT requires dealing with transient message problem and the simultaneous reporting problem. These problems can be dealt in an efficient manner by the Samadi's algorithm [8] which works fine in the presence of causality errors. However, the performance of both Time Wrap and Samadi's algorithms depends on the latency involve in GVT computation. Both algorithms give poor latency for large simulation systems especially in the presence of causality errors. To improve the latency and reduce the processor ideal time, we implement tree and butterflies barriers with the optimistic algorithm. Our analysis shows that the use of synchronous barriers such as tree and butterfly with the optimistic algorithm not only minimizes the GVT latency but also minimizes the processor idle time.
Fish to meat intake ratio and cooking oils are associated with hepatitis C virus carriers with persistently normal alanine aminotransferase levels.

PubMed

Otsuka, Momoka; Uchida, Yuki; Kawaguchi, Takumi; Taniguchi, Eitaro; Kawaguchi, Atsushi; Kitani, Shingo; Itou, Minoru; Oriishi, Tetsuharu; Kakuma, Tatsuyuki; Tanaka, Suiko; Yagi, Minoru; Sata, Michio

2012-10-01

Dietary habits are involved in the development of chronic inflammation; however, the impact of dietary profiles of hepatitis C virus carriers with persistently normal alanine transaminase levels (HCV-PNALT) remains unclear. The decision-tree algorithm is a data-mining statistical technique, which uncovers meaningful profiles of factors from a data collection. We aimed to investigate dietary profiles associated with HCV-PNALT using a decision-tree algorithm. Twenty-seven HCV-PNALT and 41 patients with chronic hepatitis C were enrolled in this study. Dietary habit was assessed using a validated semiquantitative food frequency questionnaire. A decision-tree algorithm was created by dietary variables, and was evaluated by area under the receiver operating characteristic curve analysis (AUROC). In multivariate analysis, fish to meat ratio, dairy product and cooking oils were identified as independent variables associated with HCV-PNALT. The decision-tree algorithm was created with two variables: a fish to meat ratio and cooking oils/ideal bodyweight. When subjects showed a fish to meat ratio of 1.24 or more, 68.8% of the subjects were HCV-PNALT. On the other hand, 11.5% of the subjects were HCV-PNALT when subjects showed a fish to meat ratio of less than 1.24 and cooking oil/ideal bodyweight of less than 0.23 g/kg. The difference in the proportion of HCV-PNALT between these groups are significant (odds ratio 16.87, 95% CI 3.40-83.67, P = 0.0005). Fivefold cross-validation of the decision-tree algorithm showed an AUROC of 0.6947 (95% CI 0.5656-0.8238, P = 0.0067). The decision-tree algorithm disclosed that fish to meat ratio and cooking oil/ideal bodyweight were associated with HCV-PNALT. © 2012 The Japan Society of Hepatology.
Finding minimum spanning trees more efficiently for tile-based phase unwrapping

NASA Astrophysics Data System (ADS)

Sawaf, Firas; Tatam, Ralph P.

2006-06-01

The tile-based phase unwrapping method employs an algorithm for finding the minimum spanning tree (MST) in each tile. We first examine the properties of a tile's representation from a graph theory viewpoint, observing that it is possible to make use of a more efficient class of MST algorithms. We then describe a novel linear time algorithm which reduces the size of the MST problem by half at the least, and solves it completely at best. We also show how this algorithm can be applied to a tile using a sliding window technique. Finally, we show how the reduction algorithm can be combined with any other standard MST algorithm to achieve a more efficient hybrid, using Prim's algorithm for empirical comparison and noting that the reduction algorithm takes only 0.1% of the time taken by the overall hybrid.
Analyzing and synthesizing phylogenies using tree alignment graphs.

PubMed

Smith, Stephen A; Brown, Joseph W; Hinchliff, Cody E

2013-01-01

Phylogenetic trees are used to analyze and visualize evolution. However, trees can be imperfect datatypes when summarizing multiple trees. This is especially problematic when accommodating for biological phenomena such as horizontal gene transfer, incomplete lineage sorting, and hybridization, as well as topological conflict between datasets. Additionally, researchers may want to combine information from sets of trees that have partially overlapping taxon sets. To address the problem of analyzing sets of trees with conflicting relationships and partially overlapping taxon sets, we introduce methods for aligning, synthesizing and analyzing rooted phylogenetic trees within a graph, called a tree alignment graph (TAG). The TAG can be queried and analyzed to explore uncertainty and conflict. It can also be synthesized to construct trees, presenting an alternative to supertrees approaches. We demonstrate these methods with two empirical datasets. In order to explore uncertainty, we constructed a TAG of the bootstrap trees from the Angiosperm Tree of Life project. Analysis of the resulting graph demonstrates that areas of the dataset that are unresolved in majority-rule consensus tree analyses can be understood in more detail within the context of a graph structure, using measures incorporating node degree and adjacency support. As an exercise in synthesis (i.e., summarization of a TAG constructed from the alignment trees), we also construct a TAG consisting of the taxonomy and source trees from a recent comprehensive bird study. We synthesized this graph into a tree that can be reconstructed in a repeatable fashion and where the underlying source information can be updated. The methods presented here are tractable for large scale analyses and serve as a basis for an alternative to consensus tree and supertree methods. Furthermore, the exploration of these graphs can expose structures and patterns within the dataset that are otherwise difficult to observe.
Analyzing and Synthesizing Phylogenies Using Tree Alignment Graphs

PubMed Central

Smith, Stephen A.; Brown, Joseph W.; Hinchliff, Cody E.

2013-01-01

Phylogenetic trees are used to analyze and visualize evolution. However, trees can be imperfect datatypes when summarizing multiple trees. This is especially problematic when accommodating for biological phenomena such as horizontal gene transfer, incomplete lineage sorting, and hybridization, as well as topological conflict between datasets. Additionally, researchers may want to combine information from sets of trees that have partially overlapping taxon sets. To address the problem of analyzing sets of trees with conflicting relationships and partially overlapping taxon sets, we introduce methods for aligning, synthesizing and analyzing rooted phylogenetic trees within a graph, called a tree alignment graph (TAG). The TAG can be queried and analyzed to explore uncertainty and conflict. It can also be synthesized to construct trees, presenting an alternative to supertrees approaches. We demonstrate these methods with two empirical datasets. In order to explore uncertainty, we constructed a TAG of the bootstrap trees from the Angiosperm Tree of Life project. Analysis of the resulting graph demonstrates that areas of the dataset that are unresolved in majority-rule consensus tree analyses can be understood in more detail within the context of a graph structure, using measures incorporating node degree and adjacency support. As an exercise in synthesis (i.e., summarization of a TAG constructed from the alignment trees), we also construct a TAG consisting of the taxonomy and source trees from a recent comprehensive bird study. We synthesized this graph into a tree that can be reconstructed in a repeatable fashion and where the underlying source information can be updated. The methods presented here are tractable for large scale analyses and serve as a basis for an alternative to consensus tree and supertree methods. Furthermore, the exploration of these graphs can expose structures and patterns within the dataset that are otherwise difficult to observe. PMID:24086118
FTUC: A Flooding Tree Uneven Clustering Protocol for a Wireless Sensor Network.

PubMed

He, Wei; Pillement, Sebastien; Xu, Du

2017-11-23

Clustering is an efficient approach in a wireless sensor network (WSN) to reduce the energy consumption of nodes and to extend the lifetime of the network. Unfortunately, this approach requires that all cluster heads (CHs) transmit their data to the base station (BS), which gives rise to the long distance communications problem, and in multi-hop routing, the CHs near the BS have to forward data from other nodes that lead those CHs to die prematurely, creating the hot zones problem. Unequal clustering has been proposed to solve these problems. Most of the current algorithms elect CH only by considering their competition radius, leading to unevenly distributed cluster heads. Furthermore, global distances values are needed when calculating the competition radius, which is a tedious task in large networks. To face these problems, we propose a flooding tree uneven clustering protocol (FTUC) suited for large networks. Based on the construction of a tree type sub-network to calculate the minimum and maximum distances values of the network, we then apply the unequal cluster theory. We also introduce referenced position circles to evenly elect cluster heads. Therefore, cluster heads are elected depending on the node's residual energy and their distance to a referenced circle. FTUC builds the best inter-cluster communications route by evaluating a cluster head cost function to find the best next hop to the BS. The simulation results show that the FTUC algorithm decreases the energy consumption of the nodes and balances the global energy consumption effectively, thus extending the lifetime of the network.
ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography.

PubMed

Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

2016-07-07

Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method, according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on a 100 PET scans with known true contour. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contour, to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.
ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography

NASA Astrophysics Data System (ADS)

Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

2016-07-01

Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method, according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on a 100 PET scans with known true contour. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contour, to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.
The effect of different distance measures in detecting outliers using clustering-based algorithm for circular regression model

NASA Astrophysics Data System (ADS)

Di, Nur Faraidah Muhammad; Satari, Siti Zanariah

2017-05-01

Outlier detection in linear data sets has been done vigorously but only a small amount of work has been done for outlier detection in circular data. In this study, we proposed multiple outliers detection in circular regression models based on the clustering algorithm. Clustering technique basically utilizes distance measure to define distance between various data points. Here, we introduce the similarity distance based on Euclidean distance for circular model and obtain a cluster tree using the single linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height is proposed. We classify the cluster group that exceeds the stopping rule as potential outlier. Our aim is to demonstrate the effectiveness of proposed algorithms with the similarity distances in detecting the outliers. It is found that the proposed methods are performed well and applicable for circular regression model.
Automatic Classification of Trees from Laser Scanning Point Clouds

NASA Astrophysics Data System (ADS)

Sirmacek, B.; Lindenbergh, R.

2015-08-01

Development of laser scanning technologies has promoted tree monitoring studies to a new level, as the laser scanning point clouds enable accurate 3D measurements in a fast and environmental friendly manner. In this paper, we introduce a probability matrix computation based algorithm for automatically classifying laser scanning point clouds into 'tree' and 'non-tree' classes. Our method uses the 3D coordinates of the laser scanning points as input and generates a new point cloud which holds a label for each point indicating if it belongs to the 'tree' or 'non-tree' class. To do so, a grid surface is assigned to the lowest height level of the point cloud. The grids are filled with probability values which are calculated by checking the point density above the grid. Since the tree trunk locations appear with very high values in the probability matrix, selecting the local maxima of the grid surface help to detect the tree trunks. Further points are assigned to tree trunks if they appear in the close proximity of trunks. Since heavy mathematical computations (such as point cloud organization, detailed shape 3D detection methods, graph network generation) are not required, the proposed algorithm works very fast compared to the existing methods. The tree classification results are found reliable even on point clouds of cities containing many different objects. As the most significant weakness, false detection of light poles, traffic signs and other objects close to trees cannot be prevented. Nevertheless, the experimental results on mobile and airborne laser scanning point clouds indicate the possible usage of the algorithm as an important step for tree growth observation, tree counting and similar applications. While the laser scanning point cloud is giving opportunity to classify even very small trees, accuracy of the results is reduced in the low point density areas further away than the scanning location. These advantages and disadvantages of two laser scanning point cloud sources are discussed in detail.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Behroozi, Peter S.; Wechsler, Risa H.; Wu, Hao-Yi

We present a new algorithm for generating merger trees and halo catalogs which explicitly ensures consistency of halo properties (mass, position, and velocity) across time steps. Our algorithm has demonstrated the ability to improve both the completeness (through detecting and inserting otherwise missing halos) and purity (through detecting and removing spurious objects) of both merger trees and halo catalogs. In addition, our method is able to robustly measure the self-consistency of halo finders; it is the first to directly measure the uncertainties in halo positions, halo velocities, and the halo mass function for a given halo finder based on consistencymore » between snapshots in cosmological simulations. We use this algorithm to generate merger trees for two large simulations (Bolshoi and Consuelo) and evaluate two halo finders (ROCKSTAR and BDM). We find that both the ROCKSTAR and BDM halo finders track halos extremely well; in both, the number of halos which do not have physically consistent progenitors is at the 1%-2% level across all halo masses. Our code is publicly available at http://code.google.com/p/consistent-trees. Our trees and catalogs are publicly available at http://hipacc.ucsc.edu/Bolshoi/.« less
Effects of plot size on forest-type algorithm accuracy

Treesearch

James A. Westfall

2009-01-01

The Forest Inventory and Analysis (FIA) program utilizes an algorithm to consistently determine the forest type for forested conditions on sample plots. Forest type is determined from tree size and species information. Thus, the accuracy of results is often dependent on the number of trees present, which is highly correlated with plot area. This research examines the...
Darwin v. 2.0: an interpreted computer language for the biosciences.

PubMed

Gonnet, G H; Hallett, M T; Korostensky, C; Bernardin, L

2000-02-01

We announce the availability of the second release of Darwin v. 2.0, an interpreted computer language especially tailored to researchers in the biosciences. The system is a general tool applicable to a wide range of problems. This second release improves Darwin version 1.6 in several ways: it now contains (1) a larger set of libraries touching most of the classical problems from computational biology (pairwise alignment, all versus all alignments, tree construction, multiple sequence alignment), (2) an expanded set of general purpose algorithms (search algorithms for discrete problems, matrix decomposition routines, complex/long integer arithmetic operations), (3) an improved language with a cleaner syntax, (4) better on-line help, and (5) a number of fixes to user-reported bugs. Darwin is made available for most operating systems free of char ge from the Computational Biochemistry Research Group (CBRG), reachable at http://chrg.inf.ethz.ch. darwin@inf.ethz.ch
Fast Dating Using Least-Squares Criteria and Algorithms.

PubMed

To, Thu-Hien; Jung, Matthieu; Lycett, Samantha; Gascuel, Olivier

2016-01-01

Phylogenies provide a useful way to understand the evolutionary history of genetic samples, and data sets with more than a thousand taxa are becoming increasingly common, notably with viruses (e.g., human immunodeficiency virus (HIV)). Dating ancestral events is one of the first, essential goals with such data. However, current sophisticated probabilistic approaches struggle to handle data sets of this size. Here, we present very fast dating algorithms, based on a Gaussian model closely related to the Langley-Fitch molecular-clock model. We show that this model is robust to uncorrelated violations of the molecular clock. Our algorithms apply to serial data, where the tips of the tree have been sampled through times. They estimate the substitution rate and the dates of all ancestral nodes. When the input tree is unrooted, they can provide an estimate for the root position, thus representing a new, practical alternative to the standard rooting methods (e.g., midpoint). Our algorithms exploit the tree (recursive) structure of the problem at hand, and the close relationships between least-squares and linear algebra. We distinguish between an unconstrained setting and the case where the temporal precedence constraint (i.e., an ancestral node must be older that its daughter nodes) is accounted for. With rooted trees, the former is solved using linear algebra in linear computing time (i.e., proportional to the number of taxa), while the resolution of the latter, constrained setting, is based on an active-set method that runs in nearly linear time. With unrooted trees the computing time becomes (nearly) quadratic (i.e., proportional to the square of the number of taxa). In all cases, very large input trees (>10,000 taxa) can easily be processed and transformed into time-scaled trees. We compare these algorithms to standard methods (root-to-tip, r8s version of Langley-Fitch method, and BEAST). Using simulated data, we show that their estimation accuracy is similar to that of the most sophisticated methods, while their computing time is much faster. We apply these algorithms on a large data set comprising 1194 strains of Influenza virus from the pdm09 H1N1 Human pandemic. Again the results show that these algorithms provide a very fast alternative with results similar to those of other computer programs. These algorithms are implemented in the LSD software (least-squares dating), which can be downloaded from http://www.atgc-montpellier.fr/LSD/, along with all our data sets and detailed results. An Online Appendix, providing additional algorithm descriptions, tables, and figures can be found in the Supplementary Material available on Dryad at http://dx.doi.org/10.5061/dryad.968t3. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.

Fast Dating Using Least-Squares Criteria and Algorithms

PubMed Central

To, Thu-Hien; Jung, Matthieu; Lycett, Samantha; Gascuel, Olivier

2016-01-01

Phylogenies provide a useful way to understand the evolutionary history of genetic samples, and data sets with more than a thousand taxa are becoming increasingly common, notably with viruses (e.g., human immunodeficiency virus (HIV)). Dating ancestral events is one of the first, essential goals with such data. However, current sophisticated probabilistic approaches struggle to handle data sets of this size. Here, we present very fast dating algorithms, based on a Gaussian model closely related to the Langley–Fitch molecular-clock model. We show that this model is robust to uncorrelated violations of the molecular clock. Our algorithms apply to serial data, where the tips of the tree have been sampled through times. They estimate the substitution rate and the dates of all ancestral nodes. When the input tree is unrooted, they can provide an estimate for the root position, thus representing a new, practical alternative to the standard rooting methods (e.g., midpoint). Our algorithms exploit the tree (recursive) structure of the problem at hand, and the close relationships between least-squares and linear algebra. We distinguish between an unconstrained setting and the case where the temporal precedence constraint (i.e., an ancestral node must be older that its daughter nodes) is accounted for. With rooted trees, the former is solved using linear algebra in linear computing time (i.e., proportional to the number of taxa), while the resolution of the latter, constrained setting, is based on an active-set method that runs in nearly linear time. With unrooted trees the computing time becomes (nearly) quadratic (i.e., proportional to the square of the number of taxa). In all cases, very large input trees (>10,000 taxa) can easily be processed and transformed into time-scaled trees. We compare these algorithms to standard methods (root-to-tip, r8s version of Langley–Fitch method, and BEAST). Using simulated data, we show that their estimation accuracy is similar to that of the most sophisticated methods, while their computing time is much faster. We apply these algorithms on a large data set comprising 1194 strains of Influenza virus from the pdm09 H1N1 Human pandemic. Again the results show that these algorithms provide a very fast alternative with results similar to those of other computer programs. These algorithms are implemented in the LSD software (least-squares dating), which can be downloaded from http://www.atgc-montpellier.fr/LSD/, along with all our data sets and detailed results. An Online Appendix, providing additional algorithm descriptions, tables, and figures can be found in the Supplementary Material available on Dryad at http://dx.doi.org/10.5061/dryad.968t3. PMID:26424727
Labeled trees and the efficient computation of derivations

NASA Technical Reports Server (NTRS)

Grossman, Robert; Larson, Richard G.

1989-01-01

The effective parallel symbolic computation of operators under composition is discussed. Examples include differential operators under composition and vector fields under the Lie bracket. Data structures consisting of formal linear combinations of rooted labeled trees are discussed. A multiplication on rooted labeled trees is defined, thereby making the set of these data structures into an associative algebra. An algebra homomorphism is defined from the original algebra of operators into this algebra of trees. An algebra homomorphism from the algebra of trees into the algebra of differential operators is then described. The cancellation which occurs when noncommuting operators are expressed in terms of commuting ones occurs naturally when the operators are represented using this data structure. This leads to an algorithm which, for operators which are derivations, speeds up the computation exponentially in the degree of the operator. It is shown that the algebra of trees leads naturally to a parallel version of the algorithm.
Computational path planner for product assembly in complex environments

NASA Astrophysics Data System (ADS)

Shang, Wei; Liu, Jianhua; Ning, Ruxin; Liu, Mi

2013-03-01

Assembly path planning is a crucial problem in assembly related design and manufacturing processes. Sampling based motion planning algorithms are used for computational assembly path planning. However, the performance of such algorithms may degrade much in environments with complex product structure, narrow passages or other challenging scenarios. A computational path planner for automatic assembly path planning in complex 3D environments is presented. The global planning process is divided into three phases based on the environment and specific algorithms are proposed and utilized in each phase to solve the challenging issues. A novel ray test based stochastic collision detection method is proposed to evaluate the intersection between two polyhedral objects. This method avoids fake collisions in conventional methods and degrades the geometric constraint when a part has to be removed with surface contact with other parts. A refined history based rapidly-exploring random tree (RRT) algorithm which bias the growth of the tree based on its planning history is proposed and employed in the planning phase where the path is simple but the space is highly constrained. A novel adaptive RRT algorithm is developed for the path planning problem with challenging scenarios and uncertain environment. With extending values assigned on each tree node and extending schemes applied, the tree can adapts its growth to explore complex environments more efficiently. Experiments on the key algorithms are carried out and comparisons are made between the conventional path planning algorithms and the presented ones. The comparing results show that based on the proposed algorithms, the path planner can compute assembly path in challenging complex environments more efficiently and with higher success. This research provides the references to the study of computational assembly path planning under complex environments.
Aligning Biomolecular Networks Using Modular Graph Kernels

NASA Astrophysics Data System (ADS)

Towfic, Fadi; Greenlee, M. Heather West; Honavar, Vasant

Comparative analysis of biomolecular networks constructed using measurements from different conditions, tissues, and organisms offer a powerful approach to understanding the structure, function, dynamics, and evolution of complex biological systems. We explore a class of algorithms for aligning large biomolecular networks by breaking down such networks into subgraphs and computing the alignment of the networks based on the alignment of their subgraphs. The resulting subnetworks are compared using graph kernels as scoring functions. We provide implementations of the resulting algorithms as part of BiNA, an open source biomolecular network alignment toolkit. Our experiments using Drosophila melanogaster, Saccharomyces cerevisiae, Mus musculus and Homo sapiens protein-protein interaction networks extracted from the DIP repository of protein-protein interaction data demonstrate that the performance of the proposed algorithms (as measured by % GO term enrichment of subnetworks identified by the alignment) is competitive with some of the state-of-the-art algorithms for pair-wise alignment of large protein-protein interaction networks. Our results also show that the inter-species similarity scores computed based on graph kernels can be used to cluster the species into a species tree that is consistent with the known phylogenetic relationships among the species.
A Uniform Energy Consumption Algorithm for Wireless Sensor and Actuator Networks Based on Dynamic Polling Point Selection

PubMed Central

Li, Shuo; Peng, Jun; Liu, Weirong; Zhu, Zhengfa; Lin, Kuo-Chi

2014-01-01

Recent research has indicated that using the mobility of the actuator in wireless sensor and actuator networks (WSANs) to achieve mobile data collection can greatly increase the sensor network lifetime. However, mobile data collection may result in unacceptable collection delays in the network if the path of the actuator is too long. Because real-time network applications require meeting data collection delay constraints, planning the path of the actuator is a very important issue to balance the prolongation of the network lifetime and the reduction of the data collection delay. In this paper, a multi-hop routing mobile data collection algorithm is proposed based on dynamic polling point selection with delay constraints to address this issue. The algorithm can actively update the selection of the actuator's polling points according to the sensor nodes' residual energies and their locations while also considering the collection delay constraint. It also dynamically constructs the multi-hop routing trees rooted by these polling points to balance the sensor node energy consumption and the extension of the network lifetime. The effectiveness of the algorithm is validated by simulation. PMID:24451455
Constructing phylogenetic trees using interacting pathways.

PubMed

Wan, Peng; Che, Dongsheng

2013-01-01

Phylogenetic trees are used to represent evolutionary relationships among biological species or organisms. The construction of phylogenetic trees is based on the similarities or differences of their physical or genetic features. Traditional approaches of constructing phylogenetic trees mainly focus on physical features. The recent advancement of high-throughput technologies has led to accumulation of huge amounts of biological data, which in turn changed the way of biological studies in various aspects. In this paper, we report our approach of building phylogenetic trees using the information of interacting pathways. We have applied hierarchical clustering on two domains of organisms-eukaryotes and prokaryotes. Our preliminary results have shown the effectiveness of using the interacting pathways in revealing evolutionary relationships.
Construction of an integrated gene regulatory network link to stress-related immune system in cattle.

PubMed

Behdani, Elham; Bakhtiarizadeh, Mohammad Reza

2017-10-01

The immune system is an important biological system that is negatively impacted by stress. This study constructed an integrated regulatory network to enhance our understanding of the regulatory gene network used in the stress-related immune system. Module inference was used to construct modules of co-expressed genes with bovine leukocyte RNA-Seq data. Transcription factors (TFs) were then assigned to these modules using Lemon-Tree algorithms. In addition, the TFs assigned to each module were confirmed using the promoter analysis and protein-protein interactions data. Therefore, our integrated method identified three TFs which include one TF that is previously known to be involved in immune response (MYBL2) and two TFs (E2F8 and FOXS1) that had not been recognized previously and were identified for the first time in this study as novel regulatory candidates in immune response. This study provides valuable insights on the regulatory programs of genes involved in the stress-related immune system.
Analysis of data mining classification by comparison of C4.5 and ID algorithms

NASA Astrophysics Data System (ADS)

Sudrajat, R.; Irianingsih, I.; Krisnawan, D.

2017-01-01

The rapid development of information technology, triggered by the intensive use of information technology. For example, data mining widely used in investment. Many techniques that can be used assisting in investment, the method that used for classification is decision tree. Decision tree has a variety of algorithms, such as C4.5 and ID3. Both algorithms can generate different models for similar data sets and different accuracy. C4.5 and ID3 algorithms with discrete data provide accuracy are 87.16% and 99.83% and C4.5 algorithm with numerical data is 89.69%. C4.5 and ID3 algorithms with discrete data provides 520 and 598 customers and C4.5 algorithm with numerical data is 546 customers. From the analysis of the both algorithm it can classified quite well because error rate less than 15%.
Fault trees for decision making in systems analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lambert, Howard E.

1975-10-09

The application of fault tree analysis (FTA) to system safety and reliability is presented within the framework of system safety analysis. The concepts and techniques involved in manual and automated fault tree construction are described and their differences noted. The theory of mathematical reliability pertinent to FTA is presented with emphasis on engineering applications. An outline of the quantitative reliability techniques of the Reactor Safety Study is given. Concepts of probabilistic importance are presented within the fault tree framework and applied to the areas of system design, diagnosis and simulation. The computer code IMPORTANCE ranks basic events and cut setsmore » according to a sensitivity analysis. A useful feature of the IMPORTANCE code is that it can accept relative failure data as input. The output of the IMPORTANCE code can assist an analyst in finding weaknesses in system design and operation, suggest the most optimal course of system upgrade, and determine the optimal location of sensors within a system. A general simulation model of system failure in terms of fault tree logic is described. The model is intended for efficient diagnosis of the causes of system failure in the event of a system breakdown. It can also be used to assist an operator in making decisions under a time constraint regarding the future course of operations. The model is well suited for computer implementation. New results incorporated in the simulation model include an algorithm to generate repair checklists on the basis of fault tree logic and a one-step-ahead optimization procedure that minimizes the expected time to diagnose system failure.« less
Multi-test decision tree and its application to microarray data classification.

PubMed

Czajkowski, Marcin; Grześ, Marek; Kretowski, Marek

2014-05-01

The desirable property of tools used to investigate biological data is easy to understand models and predictive decisions. Decision trees are particularly promising in this regard due to their comprehensible nature that resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees have tendency to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity. We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions. Experimental validation was performed on several real-life gene expression datasets. Comparison results with eight classifiers show that MTDT has a statistically significantly higher accuracy than popular decision tree classifiers, and it was highly competitive with ensemble learning algorithms. The proposed solution managed to outperform its baseline algorithm on 14 datasets by an average 6%. A study performed on one of the datasets showed that the discovered genes used in the MTDT classification model are supported by biological evidence in the literature. This paper introduces a new type of decision tree which is more suitable for solving biological problems. MTDTs are relatively easy to analyze and much more powerful in modeling high dimensional microarray data than their popular counterparts. Copyright © 2014 Elsevier B.V. All rights reserved.
MASTtreedist: visualization of tree space based on maximum agreement subtree.

PubMed

Huang, Hong; Li, Yongji

2013-01-01

Phylogenetic tree construction process might produce many candidate trees as the "best estimates." As the number of constructed phylogenetic trees grows, the need to efficiently compare their topological or physical structures arises. One of the tree comparison's software tools, the Mesquite's Tree Set Viz module, allows the rapid and efficient visualization of the tree comparison distances using multidimensional scaling (MDS). Tree-distance measures, such as Robinson-Foulds (RF), for the topological distance among different trees have been implemented in Tree Set Viz. New and sophisticated measures such as Maximum Agreement Subtree (MAST) can be continuously built upon Tree Set Viz. MAST can detect the common substructures among trees and provide more precise information on the similarity of the trees, but it is NP-hard and difficult to implement. In this article, we present a practical tree-distance metric: MASTtreedist, a MAST-based comparison metric in Mesquite's Tree Set Viz module. In this metric, the efficient optimizations for the maximum weight clique problem are applied. The results suggest that the proposed method can efficiently compute the MAST distances among trees, and such tree topological differences can be translated as a scatter of points in two-dimensional (2D) space. We also provide statistical evaluation of provided measures with respect to RF-using experimental data sets. This new comparison module provides a new tree-tree pairwise comparison metric based on the differences of the number of MAST leaves among constructed phylogenetic trees. Such a new phylogenetic tree comparison metric improves the visualization of taxa differences by discriminating small divergences of subtree structures for phylogenetic tree reconstruction.
Encoding phylogenetic trees in terms of weighted quartets.

PubMed

Grünewald, Stefan; Huber, Katharina T; Moulton, Vincent; Semple, Charles

2008-04-01

One of the main problems in phylogenetics is to develop systematic methods for constructing evolutionary or phylogenetic trees. For a set of species X, an edge-weighted phylogenetic X-tree or phylogenetic tree is a (graph theoretical) tree with leaf set X and no degree 2 vertices, together with a map assigning a non-negative length to each edge of the tree. Within phylogenetics, several methods have been proposed for constructing such trees that work by trying to piece together quartet trees on X, i.e. phylogenetic trees each having four leaves in X. Hence, it is of interest to characterise when a collection of quartet trees corresponds to a (unique) phylogenetic tree. Recently, Dress and Erdös provided such a characterisation for binary phylogenetic trees, that is, phylogenetic trees all of whose internal vertices have degree 3. Here we provide a new characterisation for arbitrary phylogenetic trees.
Accuracy Assessment of Crown Delineation Methods for the Individual Trees Using LIDAR Data

NASA Astrophysics Data System (ADS)

Chang, K. T.; Lin, C.; Lin, Y. C.; Liu, J. K.

2016-06-01

Forest canopy density and height are used as variables in a number of environmental applications, including the estimation of biomass, forest extent and condition, and biodiversity. The airborne Light Detection and Ranging (LiDAR) is very useful to estimate forest canopy parameters according to the generated canopy height models (CHMs). The purpose of this work is to introduce an algorithm to delineate crown parameters, e.g. tree height and crown radii based on the generated rasterized CHMs. And accuracy assessment for the extraction of volumetric parameters of a single tree is also performed via manual measurement using corresponding aerial photo pairs. A LiDAR dataset of a golf course acquired by Leica ALS70-HP is used in this study. Two algorithms, i.e. a traditional one with the subtraction of a digital elevation model (DEM) from a digital surface model (DSM), and a pit-free approach are conducted to generate the CHMs firstly. Then two algorithms, a multilevel morphological active-contour (MMAC) and a variable window filter (VWF), are implemented and used in this study for individual tree delineation. Finally, experimental results of two automatic estimation methods for individual trees can be evaluated with manually measured stand-level parameters, i.e. tree height and crown diameter. The resulting CHM generated by a simple subtraction is full of empty pixels (called "pits") that will give vital impact on subsequent analysis for individual tree delineation. The experimental results indicated that if more individual trees can be extracted, tree crown shape will became more completely in the CHM data after the pit-free process.
Aneurysmal subarachnoid hemorrhage prognostic decision-making algorithm using classification and regression tree analysis.

PubMed

Lo, Benjamin W Y; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A H

2016-01-01

Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56-2.45, P < 0.01). A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.
Individual tree detection in intact forest and degraded forest areas in the north region of Mato Grosso State, Brazilian Amazon

NASA Astrophysics Data System (ADS)

Santos, E. G.; Jorge, A.; Shimabukuro, Y. E.; Gasparini, K.

2017-12-01

The State of Mato Grosso - MT has the second largest area with degraded forest among the states of the Brazilian Legal Amazon. Land use and land cover change processes that occur in this region cause the loss of forest biomass, releasing greenhouse gases that contribute to the increase of temperature on earth. These degraded forest areas lose biomass according to the intensity and magnitude of the degradation type. The estimate of forest biomass, commonly performed by forest inventory through sample plots, shows high variance in degraded forest areas. Due to this variance and complexity of tropical forests, the aim of this work was to estimate forest biomass using LiDAR point clouds in three distinct forest areas: one degraded by fire, another by selective logging and one area of intact forest. The approach applied in these areas was the Individual Tree Detection (ITD). To isolate the trees, we generated Canopy Height Models (CHM) images, which are obtained by subtracting the Digital Elevation Model (MDE) and the Digital Terrain Model (MDT), created by the cloud of LiDAR points. The trees in the CHM images are isolated by an algorithm provided by the Quantitative Ecology research group at the School of Forestry at Northern Arizona University (SILVA, 2015). With these points, metrics were calculated for some areas, which were used in the model of biomass estimation. The methodology used in this work was expected to reduce the error in biomass estimate in the study area. The cloud points of the most representative trees were analyzed, and thus field data was correlated with the individual trees found by the proposed algorithm. In a pilot study, the proposed methodology was applied generating the individual tree metrics: total height and area of the crown. When correlating 339 isolated trees, an unsatisfactory R² was obtained, as heights found by the algorithm were lower than those obtained in the field, with an average difference of 2.43 m. This shows that the algorithm used to isolate trees in temperate areas did not obtained satisfactory results in the tropical forest of Mato Grosso State. Due to this, in future works two algorithms, one developed by Dalponte et al. (2015) and another by Li et al. (2012) will be used.
The K tree score: quantification of differences in the relative branch length and topology of phylogenetic trees.

PubMed

Soria-Carrasco, Víctor; Talavera, Gerard; Igea, Javier; Castresana, Jose

2007-11-01

We introduce a new phylogenetic comparison method that measures overall differences in the relative branch length and topology of two phylogenetic trees. To do this, the algorithm first scales one of the trees to have a global divergence as similar as possible to the other tree. Then, the branch length distance, which takes differences in topology and branch lengths into account, is applied to the two trees. We thus obtain the minimum branch length distance or K tree score. Two trees with very different relative branch lengths get a high K score whereas two trees that follow a similar among-lineage rate variation get a low score, regardless of the overall rates in both trees. There are several applications of the K tree score, two of which are explained here in more detail. First, this score allows the evaluation of the performance of phylogenetic algorithms, not only with respect to their topological accuracy, but also with respect to the reproduction of a given branch length variation. In a second example, we show how the K score allows the selection of orthologous genes by choosing those that better follow the overall shape of a given reference tree. http://molevol.ibmb.csic.es/Ktreedist.html
Triplet supertree heuristics for the tree of life

PubMed Central

Lin, Harris T; Burleigh, J Gordon; Eulenstein, Oliver

2009-01-01

Background There is much interest in developing fast and accurate supertree methods to infer the tree of life. Supertree methods combine smaller input trees with overlapping sets of taxa to make a comprehensive phylogenetic tree that contains all of the taxa in the input trees. The intrinsically hard triplet supertree problem takes a collection of input species trees and seeks a species tree (supertree) that maximizes the number of triplet subtrees that it shares with the input trees. However, the utility of this supertree problem has been limited by a lack of efficient and effective heuristics. Results We introduce fast hill-climbing heuristics for the triplet supertree problem that perform a step-wise search of the tree space, where each step is guided by an exact solution to an instance of a local search problem. To realize time efficient heuristics we designed the first nontrivial algorithms for two standard search problems, which greatly improve on the time complexity to the best known (naïve) solutions by a factor of n and n2 (the number of taxa in the supertree). These algorithms enable large-scale supertree analyses based on the triplet supertree problem that were previously not possible. We implemented hill-climbing heuristics that are based on our new algorithms, and in analyses of two published supertree data sets, we demonstrate that our new heuristics outperform other standard supertree methods in maximizing the number of triplets shared with the input trees. Conclusion With our new heuristics, the triplet supertree problem is now computationally more tractable for large-scale supertree analyses, and it provides a potentially more accurate alternative to existing supertree methods. PMID:19208181
Simulating Urban Tree Effects on Air, Water, and Heat Pollution Mitigation: iTree-Hydro Model

NASA Astrophysics Data System (ADS)

Yang, Y.; Endreny, T. A.; Nowak, D.

2011-12-01

Urban and suburban development changes land surface thermal, radiative, porous, and roughness properties and pollutant loading rates, with the combined effect leading to increased air, water, and heat pollution (e.g., urban heat islands). In this research we present the USDA Forest Service urban forest ecosystem and hydrology model, iTree Eco and Hydro, used to analyze how tree cover can deliver valuable ecosystem services to mitigate air, water, and heat pollution. Air pollution mitigation is simulated by dry deposition processes based on detected pollutant levels for CO, NO2, SO2, O3 and atmospheric stability and leaf area indices. Water quality mitigation is simulated with event mean concentration loading algorithms for N, P, metals, and TSS, and by green infrastructure pollutant filtering algorithms that consider flow path dispersal areas. Urban cooling considers direct shading and indirect evapotranspiration. Spatially distributed estimates of hourly tree evapotranspiration during the growing season are used to estimate human thermal comfort. Two main factors regulating evapotranspiration are soil moisture and canopy radiation. Spatial variation of soil moisture is represented by a modified urban topographic index and radiation for each tree is modified by considering aspect, slope and shade from surrounding buildings or hills. We compare the urban cooling algorithms used in iTree-Hydro with the urban canopy and land surface physics schemes used in the Weather Research and Forecasting model. We conclude by identifying biophysical feedbacks between tree-modulated air and water quality environmental services and how these may respond to urban heating and cooling. Improvements to this iTree model are intended to assist managers identify valuable tree services for urban living.
Tools of the Future: How Decision Tree Analysis Will Impact Mission Planning

NASA Technical Reports Server (NTRS)

Otterstatter, Matthew R.

2005-01-01

The universe is infinitely complex; however, the human mind has a finite capacity. The multitude of possible variables, metrics, and procedures in mission planning are far too many to address exhaustively. This is unfortunate because, in general, considering more possibilities leads to more accurate and more powerful results. To compensate, we can get more insightful results by employing our greatest tool, the computer. The power of the computer will be utilized through a technology that considers every possibility, decision tree analysis. Although decision trees have been used in many other fields, this is innovative for space mission planning. Because this is a new strategy, no existing software is able to completely accommodate all of the requirements. This was determined through extensive research and testing of current technologies. It was necessary to create original software, for which a short-term model was finished this summer. The model was built into Microsoft Excel to take advantage of the familiar graphical interface for user input, computation, and viewing output. Macros were written to automate the process of tree construction, optimization, and presentation. The results are useful and promising. If this tool is successfully implemented in mission planning, our reliance on old-fashioned heuristics, an error-prone shortcut for handling complexity, will be reduced. The computer algorithms involved in decision trees will revolutionize mission planning. The planning will be faster and smarter, leading to optimized missions with the potential for more valuable data.
A voting-based statistical cylinder detection framework applied to fallen tree mapping in terrestrial laser scanning point clouds

NASA Astrophysics Data System (ADS)

Polewski, Przemyslaw; Yao, Wei; Heurich, Marco; Krzystek, Peter; Stilla, Uwe

2017-07-01

This paper introduces a statistical framework for detecting cylindrical shapes in dense point clouds. We target the application of mapping fallen trees in datasets obtained through terrestrial laser scanning. This is a challenging task due to the presence of ground vegetation, standing trees, DTM artifacts, as well as the fragmentation of dead trees into non-collinear segments. Our method shares the concept of voting in parameter space with the generalized Hough transform, however two of its significant drawbacks are improved upon. First, the need to generate samples on the shape's surface is eliminated. Instead, pairs of nearby input points lying on the surface cast a vote for the cylinder's parameters based on the intrinsic geometric properties of cylindrical shapes. Second, no discretization of the parameter space is required: the voting is carried out in continuous space by means of constructing a kernel density estimator and obtaining its local maxima, using automatic, data-driven kernel bandwidth selection. Furthermore, we show how the detected cylindrical primitives can be efficiently merged to obtain object-level (entire tree) semantic information using graph-cut segmentation and a tailored dynamic algorithm for eliminating cylinder redundancy. Experiments were performed on 3 plots from the Bavarian Forest National Park, with ground truth obtained through visual inspection of the point clouds. It was found that relative to sample consensus (SAC) cylinder fitting, the proposed voting framework can improve the detection completeness by up to 10 percentage points while maintaining the correctness rate.

Dendrometer bands made easy: using modified cable ties to measure incremental growth of trees

USGS Publications Warehouse

Anemaet, Evelyn R.; Middleton, Beth A.

2013-01-01

Dendrometer bands are a useful way to make sequential repeated measurements of tree growth, but traditional dendrometer bands can be expensive, time consuming, and difficult to construct in the field. An alternative to the traditional method of band construction is to adapt commercially available materials. This paper describes how to construct and install dendrometer bands using smooth-edged, stainless steel, cable tie banding and attachable rollerball heads. As a performance comparison, both traditional and cable tie dendrometer bands were installed on baldcypress trees at the National Wetlands Research Center in Lafayette, Louisiana, by both an experienced and a novice worker. Band installation times were recorded, and growth of the trees as estimated by the two band types was measured after approximately one year, demonstrating equivalence of the two methods. This efficient approach to dendrometer band construction can help advance the knowledge of long-term tree growth in ecological studies.
A combined NLP-differential evolution algorithm approach for the optimization of looped water distribution systems

NASA Astrophysics Data System (ADS)

Zheng, Feifei; Simpson, Angus R.; Zecchin, Aaron C.

2011-08-01

This paper proposes a novel optimization approach for the least cost design of looped water distribution systems (WDSs). Three distinct steps are involved in the proposed optimization approach. In the first step, the shortest-distance tree within the looped network is identified using the Dijkstra graph theory algorithm, for which an extension is proposed to find the shortest-distance tree for multisource WDSs. In the second step, a nonlinear programming (NLP) solver is employed to optimize the pipe diameters for the shortest-distance tree (chords of the shortest-distance tree are allocated the minimum allowable pipe sizes). Finally, in the third step, the original looped water network is optimized using a differential evolution (DE) algorithm seeded with diameters in the proximity of the continuous pipe sizes obtained in step two. As such, the proposed optimization approach combines the traditional deterministic optimization technique of NLP with the emerging evolutionary algorithm DE via the proposed network decomposition. The proposed methodology has been tested on four looped WDSs with the number of decision variables ranging from 21 to 454. Results obtained show the proposed approach is able to find optimal solutions with significantly less computational effort than other optimization techniques.
Sampling-based real-time motion planning under state uncertainty for autonomous micro-aerial vehicles in GPS-denied environments.

PubMed

Li, Dachuan; Li, Qing; Cheng, Nong; Song, Jingyan

2014-11-18

This paper presents a real-time motion planning approach for autonomous vehicles with complex dynamics and state uncertainty. The approach is motivated by the motion planning problem for autonomous vehicles navigating in GPS-denied dynamic environments, which involves non-linear and/or non-holonomic vehicle dynamics, incomplete state estimates, and constraints imposed by uncertain and cluttered environments. To address the above motion planning problem, we propose an extension of the closed-loop rapid belief trees, the closed-loop random belief trees (CL-RBT), which incorporates predictions of the position estimation uncertainty, using a factored form of the covariance provided by the Kalman filter-based estimator. The proposed motion planner operates by incrementally constructing a tree of dynamically feasible trajectories using the closed-loop prediction, while selecting candidate paths with low uncertainty using efficient covariance update and propagation. The algorithm can operate in real-time, continuously providing the controller with feasible paths for execution, enabling the vehicle to account for dynamic and uncertain environments. Simulation results demonstrate that the proposed approach can generate feasible trajectories that reduce the state estimation uncertainty, while handling complex vehicle dynamics and environment constraints.
Sampling-Based Real-Time Motion Planning under State Uncertainty for Autonomous Micro-Aerial Vehicles in GPS-Denied Environments

PubMed Central

Li, Dachuan; Li, Qing; Cheng, Nong; Song, Jingyan

2014-01-01

This paper presents a real-time motion planning approach for autonomous vehicles with complex dynamics and state uncertainty. The approach is motivated by the motion planning problem for autonomous vehicles navigating in GPS-denied dynamic environments, which involves non-linear and/or non-holonomic vehicle dynamics, incomplete state estimates, and constraints imposed by uncertain and cluttered environments. To address the above motion planning problem, we propose an extension of the closed-loop rapid belief trees, the closed-loop random belief trees (CL-RBT), which incorporates predictions of the position estimation uncertainty, using a factored form of the covariance provided by the Kalman filter-based estimator. The proposed motion planner operates by incrementally constructing a tree of dynamically feasible trajectories using the closed-loop prediction, while selecting candidate paths with low uncertainty using efficient covariance update and propagation. The algorithm can operate in real-time, continuously providing the controller with feasible paths for execution, enabling the vehicle to account for dynamic and uncertain environments. Simulation results demonstrate that the proposed approach can generate feasible trajectories that reduce the state estimation uncertainty, while handling complex vehicle dynamics and environment constraints. PMID:25412217
Mathematics and evolutionary biology make bioinformatics education comprehensible.

PubMed

Jungck, John R; Weisstein, Anton E

2013-09-01

The patterns of variation within a molecular sequence data set result from the interplay between population genetic, molecular evolutionary and macroevolutionary processes-the standard purview of evolutionary biologists. Elucidating these patterns, particularly for large data sets, requires an understanding of the structure, assumptions and limitations of the algorithms used by bioinformatics software-the domain of mathematicians and computer scientists. As a result, bioinformatics often suffers a 'two-culture' problem because of the lack of broad overlapping expertise between these two groups. Collaboration among specialists in different fields has greatly mitigated this problem among active bioinformaticians. However, science education researchers report that much of bioinformatics education does little to bridge the cultural divide, the curriculum too focused on solving narrow problems (e.g. interpreting pre-built phylogenetic trees) rather than on exploring broader ones (e.g. exploring alternative phylogenetic strategies for different kinds of data sets). Herein, we present an introduction to the mathematics of tree enumeration, tree construction, split decomposition and sequence alignment. We also introduce off-line downloadable software tools developed by the BioQUEST Curriculum Consortium to help students learn how to interpret and critically evaluate the results of standard bioinformatics analyses.
Mathematics and evolutionary biology make bioinformatics education comprehensible

PubMed Central

Weisstein, Anton E.

2013-01-01

The patterns of variation within a molecular sequence data set result from the interplay between population genetic, molecular evolutionary and macroevolutionary processes—the standard purview of evolutionary biologists. Elucidating these patterns, particularly for large data sets, requires an understanding of the structure, assumptions and limitations of the algorithms used by bioinformatics software—the domain of mathematicians and computer scientists. As a result, bioinformatics often suffers a ‘two-culture’ problem because of the lack of broad overlapping expertise between these two groups. Collaboration among specialists in different fields has greatly mitigated this problem among active bioinformaticians. However, science education researchers report that much of bioinformatics education does little to bridge the cultural divide, the curriculum too focused on solving narrow problems (e.g. interpreting pre-built phylogenetic trees) rather than on exploring broader ones (e.g. exploring alternative phylogenetic strategies for different kinds of data sets). Herein, we present an introduction to the mathematics of tree enumeration, tree construction, split decomposition and sequence alignment. We also introduce off-line downloadable software tools developed by the BioQUEST Curriculum Consortium to help students learn how to interpret and critically evaluate the results of standard bioinformatics analyses. PMID:23821621
Exploiting machine learning algorithms for tree species classification in a semiarid woodland using RapidEye image

NASA Astrophysics Data System (ADS)

Adelabu, Samuel; Mutanga, Onisimo; Adam, Elhadi; Cho, Moses Azong

2013-01-01

Classification of different tree species in semiarid areas can be challenging as a result of the change in leaf structure and orientation due to soil moisture constraints. Tree species mapping is, however, a key parameter for forest management in semiarid environments. In this study, we examined the suitability of 5-band RapidEye satellite data for the classification of five tree species in mopane woodland of Botswana using machine leaning algorithms with limited training samples.We performed classification using random forest (RF) and support vector machines (SVM) based on EnMap box. The overall accuracies for classifying the five tree species was 88.75 and 85% for both SVM and RF, respectively. We also demonstrated that the new red-edge band in the RapidEye sensor has the potential for classifying tree species in semiarid environments when integrated with other standard bands. Similarly, we observed that where there are limited training samples, SVM is preferred over RF. Finally, we demonstrated that the two accuracy measures of quantity and allocation disagreement are simpler and more helpful for the vast majority of remote sensing classification process than the kappa coefficient. Overall, high species classification can be achieved using strategically located RapidEye bands integrated with advanced processing algorithms.
Distributed-observer-based cooperative control for synchronization of linear discrete-time multi-agent systems.

PubMed

Liang, Hongjing; Zhang, Huaguang; Wang, Zhanshan

2015-11-01

This paper considers output synchronization of discrete-time multi-agent systems with directed communication topologies. The directed communication graph contains a spanning tree and the exosystem as its root. Distributed observer-based consensus protocols are proposed, based on the relative outputs of neighboring agents. A multi-step algorithm is presented to construct the observer-based protocols. In light of the discrete-time algebraic Riccati equation and internal model principle, synchronization problem is completed. At last, numerical simulation is provided to verify the effectiveness of the theoretical results. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
Rapid Calculation of Max-Min Fair Rates for Multi-Commodity Flows in Fat-Tree Networks

DOE PAGES

Mollah, Md Atiqul; Yuan, Xin; Pakin, Scott; ...

2017-08-29

Max-min fairness is often used in the performance modeling of interconnection networks. Existing methods to compute max-min fair rates for multi-commodity flows have high complexity and are computationally infeasible for large networks. In this paper, we show that by considering topological features, this problem can be solved efficiently for the fat-tree topology that is widely used in data centers and high performance compute clusters. Several efficient new algorithms are developed for this problem, including a parallel algorithm that can take advantage of multi-core and shared-memory architectures. Using these algorithms, we demonstrate that it is possible to find the max-min fairmore » rate allocation for multi-commodity flows in fat-tree networks that support tens of thousands of nodes. We evaluate the run-time performance of the proposed algorithms and show improvement in orders of magnitude over the previously best known method. Finally, we further demonstrate a new application of max-min fair rate allocation that is only computationally feasible using our new algorithms.« less
Rapid Calculation of Max-Min Fair Rates for Multi-Commodity Flows in Fat-Tree Networks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mollah, Md Atiqul; Yuan, Xin; Pakin, Scott

Max-min fairness is often used in the performance modeling of interconnection networks. Existing methods to compute max-min fair rates for multi-commodity flows have high complexity and are computationally infeasible for large networks. In this paper, we show that by considering topological features, this problem can be solved efficiently for the fat-tree topology that is widely used in data centers and high performance compute clusters. Several efficient new algorithms are developed for this problem, including a parallel algorithm that can take advantage of multi-core and shared-memory architectures. Using these algorithms, we demonstrate that it is possible to find the max-min fairmore » rate allocation for multi-commodity flows in fat-tree networks that support tens of thousands of nodes. We evaluate the run-time performance of the proposed algorithms and show improvement in orders of magnitude over the previously best known method. Finally, we further demonstrate a new application of max-min fair rate allocation that is only computationally feasible using our new algorithms.« less
MLViS: A Web Tool for Machine Learning-Based Virtual Screening in Early-Phase of Drug Discovery and Development

PubMed Central

Korkmaz, Selcuk; Zararsiz, Gokmen; Goksuluk, Dincer

2015-01-01

Virtual screening is an important step in early-phase of drug discovery process. Since there are thousands of compounds, this step should be both fast and effective in order to distinguish drug-like and nondrug-like molecules. Statistical machine learning methods are widely used in drug discovery studies for classification purpose. Here, we aim to develop a new tool, which can classify molecules as drug-like and nondrug-like based on various machine learning methods, including discriminant, tree-based, kernel-based, ensemble and other algorithms. To construct this tool, first, performances of twenty-three different machine learning algorithms are compared by ten different measures, then, ten best performing algorithms have been selected based on principal component and hierarchical cluster analysis results. Besides classification, this application has also ability to create heat map and dendrogram for visual inspection of the molecules through hierarchical cluster analysis. Moreover, users can connect the PubChem database to download molecular information and to create two-dimensional structures of compounds. This application is freely available through www.biosoft.hacettepe.edu.tr/MLViS/. PMID:25928885
Complexity of possibly gapped histogram and analysis of histogram.

PubMed

Fushing, Hsieh; Roy, Tania

2018-02-01

We demonstrate that gaps and distributional patterns embedded within real-valued measurements are inseparable biological and mechanistic information contents of the system. Such patterns are discovered through data-driven possibly gapped histogram, which further leads to the geometry-based analysis of histogram (ANOHT). Constructing a possibly gapped histogram is a complex problem of statistical mechanics due to the ensemble of candidate histograms being captured by a two-layer Ising model. This construction is also a distinctive problem of Information Theory from the perspective of data compression via uniformity. By defining a Hamiltonian (or energy) as a sum of total coding lengths of boundaries and total decoding errors within bins, this issue of computing the minimum energy macroscopic states is surprisingly resolved by applying the hierarchical clustering algorithm. Thus, a possibly gapped histogram corresponds to a macro-state. And then the first phase of ANOHT is developed for simultaneous comparison of multiple treatments, while the second phase of ANOHT is developed based on classical empirical process theory for a tree-geometry that can check the authenticity of branches of the treatment tree. The well-known Iris data are used to illustrate our technical developments. Also, a large baseball pitching dataset and a heavily right-censored divorce data are analysed to showcase the existential gaps and utilities of ANOHT.
Complexity of possibly gapped histogram and analysis of histogram

PubMed Central

Roy, Tania

2018-01-01

We demonstrate that gaps and distributional patterns embedded within real-valued measurements are inseparable biological and mechanistic information contents of the system. Such patterns are discovered through data-driven possibly gapped histogram, which further leads to the geometry-based analysis of histogram (ANOHT). Constructing a possibly gapped histogram is a complex problem of statistical mechanics due to the ensemble of candidate histograms being captured by a two-layer Ising model. This construction is also a distinctive problem of Information Theory from the perspective of data compression via uniformity. By defining a Hamiltonian (or energy) as a sum of total coding lengths of boundaries and total decoding errors within bins, this issue of computing the minimum energy macroscopic states is surprisingly resolved by applying the hierarchical clustering algorithm. Thus, a possibly gapped histogram corresponds to a macro-state. And then the first phase of ANOHT is developed for simultaneous comparison of multiple treatments, while the second phase of ANOHT is developed based on classical empirical process theory for a tree-geometry that can check the authenticity of branches of the treatment tree. The well-known Iris data are used to illustrate our technical developments. Also, a large baseball pitching dataset and a heavily right-censored divorce data are analysed to showcase the existential gaps and utilities of ANOHT. PMID:29515829
Complexity of possibly gapped histogram and analysis of histogram

NASA Astrophysics Data System (ADS)

Fushing, Hsieh; Roy, Tania

2018-02-01

We demonstrate that gaps and distributional patterns embedded within real-valued measurements are inseparable biological and mechanistic information contents of the system. Such patterns are discovered through data-driven possibly gapped histogram, which further leads to the geometry-based analysis of histogram (ANOHT). Constructing a possibly gapped histogram is a complex problem of statistical mechanics due to the ensemble of candidate histograms being captured by a two-layer Ising model. This construction is also a distinctive problem of Information Theory from the perspective of data compression via uniformity. By defining a Hamiltonian (or energy) as a sum of total coding lengths of boundaries and total decoding errors within bins, this issue of computing the minimum energy macroscopic states is surprisingly resolved by applying the hierarchical clustering algorithm. Thus, a possibly gapped histogram corresponds to a macro-state. And then the first phase of ANOHT is developed for simultaneous comparison of multiple treatments, while the second phase of ANOHT is developed based on classical empirical process theory for a tree-geometry that can check the authenticity of branches of the treatment tree. The well-known Iris data are used to illustrate our technical developments. Also, a large baseball pitching dataset and a heavily right-censored divorce data are analysed to showcase the existential gaps and utilities of ANOHT.
A new method of optimal capacitor switching based on minimum spanning tree theory in distribution systems

NASA Astrophysics Data System (ADS)

Li, H. W.; Pan, Z. Y.; Ren, Y. B.; Wang, J.; Gan, Y. L.; Zheng, Z. Z.; Wang, W.

2018-03-01

According to the radial operation characteristics in distribution systems, this paper proposes a new method based on minimum spanning trees method for optimal capacitor switching. Firstly, taking the minimal active power loss as objective function and not considering the capacity constraints of capacitors and source, this paper uses Prim algorithm among minimum spanning trees algorithms to get the power supply ranges of capacitors and source. Then with the capacity constraints of capacitors considered, capacitors are ranked by the method of breadth-first search. In term of the order from high to low of capacitor ranking, capacitor compensation capacity based on their power supply range is calculated. Finally, IEEE 69 bus system is adopted to test the accuracy and practicality of the proposed algorithm.
Thread Graphs, Linear Rank-Width and Their Algorithmic Applications

NASA Astrophysics Data System (ADS)

Ganian, Robert

The introduction of tree-width by Robertson and Seymour [7] was a breakthrough in the design of graph algorithms. A lot of research since then has focused on obtaining a width measure which would be more general and still allowed efficient algorithms for a wide range of NP-hard problems on graphs of bounded width. To this end, Oum and Seymour have proposed rank-width, which allows the solution of many such hard problems on a less restricted graph classes (see e.g. [3,4]). But what about problems which are NP-hard even on graphs of bounded tree-width or even on trees? The parameter used most often for these exceptionally hard problems is path-width, however it is extremely restrictive - for example the graphs of path-width 1 are exactly paths.
A decomposition theory for phylogenetic networks and incompatible characters.

PubMed

Gusfield, Dan; Bansal, Vikas; Bafna, Vineet; Song, Yun S

2007-12-01

Phylogenetic networks are models of evolution that go beyond trees, incorporating non-tree-like biological events such as recombination (or more generally reticulation), which occur either in a single species (meiotic recombination) or between species (reticulation due to lateral gene transfer and hybrid speciation). The central algorithmic problems are to reconstruct a plausible history of mutations and non-tree-like events, or to determine the minimum number of such events needed to derive a given set of binary sequences, allowing one mutation per site. Meiotic recombination, reticulation and recurrent mutation can cause conflict or incompatibility between pairs of sites (or characters) of the input. Previously, we used "conflict graphs" and "incompatibility graphs" to compute lower bounds on the minimum number of recombination nodes needed, and to efficiently solve constrained cases of the minimization problem. Those results exposed the structural and algorithmic importance of the non-trivial connected components of those two graphs. In this paper, we more fully develop the structural importance of non-trivial connected components of the incompatibility and conflict graphs, proving a general decomposition theorem (Gusfield and Bansal, 2005) for phylogenetic networks. The decomposition theorem depends only on the incompatibilities in the input sequences, and hence applies to many types of phylogenetic networks, and to any biological phenomena that causes pairwise incompatibilities. More generally, the proof of the decomposition theorem exposes a maximal embedded tree structure that exists in the network when the sequences cannot be derived on a perfect phylogenetic tree. This extends the theory of perfect phylogeny in a natural and important way. The proof is constructive and leads to a polynomial-time algorithm to find the unique underlying maximal tree structure. We next examine and fully solve the major open question from Gusfield and Bansal (2005): Is it true that for every input there must be a fully decomposed phylogenetic network that minimizes the number of recombination nodes used, over all phylogenetic networks for the input. We previously conjectured that the answer is yes. In this paper, we show that the answer in is no, both for the case that only single-crossover recombination is allowed, and also for the case that unbounded multiple-crossover recombination is allowed. The latter case also resolves a conjecture recently stated in (Huson and Klopper, 2007) in the context of reticulation networks. Although the conjecture from Gusfield and Bansal (2005) is disproved in general, we show that the answer to the conjecture is yes in several natural special cases, and establish necessary combinatorial structure that counterexamples to the conjecture must possess. We also show that counterexamples to the conjecture are rare (for the case of single-crossover recombination) in simulated data.
The Proposal of a Evolutionary Strategy Generating the Data Structures Based on a Horizontal Tree for the Tests

NASA Astrophysics Data System (ADS)

Żukowicz, Marek; Markiewicz, Michał

2016-09-01

The aim of the article is to present a mathematical definition of the object model, that is known in computer science as TreeList and to show application of this model for design evolutionary algorithm, that purpose is to generate structures based on this object. The first chapter introduces the reader to the problem of presenting data using the TreeList object. The second chapter describes the problem of testing data structures based on TreeList. The third one shows a mathematical model of the object TreeList and the parameters, used in determining the utility of structures created through this model and in evolutionary strategy, that generates these structures for testing purposes. The last chapter provides a brief summary and plans for future research related to the algorithm presented in the article.
Constraint Embedding Technique for Multibody System Dynamics

NASA Technical Reports Server (NTRS)

Woo, Simon S.; Cheng, Michael K.

2011-01-01

Multibody dynamics play a critical role in simulation testbeds for space missions. There has been a considerable interest in the development of efficient computational algorithms for solving the dynamics of multibody systems. Mass matrix factorization and inversion techniques and the O(N) class of forward dynamics algorithms developed using a spatial operator algebra stand out as important breakthrough on this front. Techniques such as these provide the efficient algorithms and methods for the application and implementation of such multibody dynamics models. However, these methods are limited only to tree-topology multibody systems. Closed-chain topology systems require different techniques that are not as efficient or as broad as those for tree-topology systems. The closed-chain forward dynamics approach consists of treating the closed-chain topology as a tree-topology system subject to additional closure constraints. The resulting forward dynamics solution consists of: (a) ignoring the closure constraints and using the O(N) algorithm to solve for the free unconstrained accelerations for the system; (b) using the tree-topology solution to compute a correction force to enforce the closure constraints; and (c) correcting the unconstrained accelerations with correction accelerations resulting from the correction forces. This constraint-embedding technique shows how to use direct embedding to eliminate local closure-loops in the system and effectively convert the system back to a tree-topology system. At this point, standard tree-topology techniques can be brought to bear on the problem. The approach uses a spatial operator algebra approach to formulating the equations of motion. The operators are block-partitioned around the local body subgroups to convert them into aggregate bodies. Mass matrix operator factorization and inversion techniques are applied to the reformulated tree-topology system. Thus in essence, the new technique allows conversion of a system with closure-constraints into an equivalent tree-topology system, and thus allows one to take advantage of the host of techniques available to the latter class of systems. This technology is highly suitable for the class of multibody systems where the closure-constraints are local, i.e., where they are confined to small groupings of bodies within the system. Important examples of such local closure-constraints are constraints associated with four-bar linkages, geared motors, differential suspensions, etc. One can eliminate these closure-constraints and convert the system into a tree-topology system by embedding the constraints directly into the system dynamics and effectively replacing the body groupings with virtual aggregate bodies. Once eliminated, one can apply the well-known results and algorithms for tree-topology systems to solve the dynamics of such closed-chain system.
A tree-like Bayesian structure learning algorithm for small-sample datasets from complex biological model systems.

PubMed

Yin, Weiwei; Garimalla, Swetha; Moreno, Alberto; Galinski, Mary R; Styczynski, Mark P

2015-08-28

There are increasing efforts to bring high-throughput systems biology techniques to bear on complex animal model systems, often with a goal of learning about underlying regulatory network structures (e.g., gene regulatory networks). However, complex animal model systems typically have significant limitations on cohort sizes, number of samples, and the ability to perform follow-up and validation experiments. These constraints are particularly problematic for many current network learning approaches, which require large numbers of samples and may predict many more regulatory relationships than actually exist. Here, we test the idea that by leveraging the accuracy and efficiency of classifiers, we can construct high-quality networks that capture important interactions between variables in datasets with few samples. We start from a previously-developed tree-like Bayesian classifier and generalize its network learning approach to allow for arbitrary depth and complexity of tree-like networks. Using four diverse sample networks, we demonstrate that this approach performs consistently better at low sample sizes than the Sparse Candidate Algorithm, a representative approach for comparison because it is known to generate Bayesian networks with high positive predictive value. We develop and demonstrate a resampling-based approach to enable the identification of a viable root for the learned tree-like network, important for cases where the root of a network is not known a priori. We also develop and demonstrate an integrated resampling-based approach to the reduction of variable space for the learning of the network. Finally, we demonstrate the utility of this approach via the analysis of a transcriptional dataset of a malaria challenge in a non-human primate model system, Macaca mulatta, suggesting the potential to capture indicators of the earliest stages of cellular differentiation during leukopoiesis. We demonstrate that by starting from effective and efficient approaches for creating classifiers, we can identify interesting tree-like network structures with significant ability to capture the relationships in the training data. This approach represents a promising strategy for inferring networks with high positive predictive value under the constraint of small numbers of samples, meeting a need that will only continue to grow as more high-throughput studies are applied to complex model systems.

Global interrupt and barrier networks

DOEpatents

Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E; Heidelberger, Philip; Kopcsay, Gerard V.; Steinmacher-Burow, Burkhard D.; Takken, Todd E.

2008-10-28

A system and method for generating global asynchronous signals in a computing structure. Particularly, a global interrupt and barrier network is implemented that implements logic for generating global interrupt and barrier signals for controlling global asynchronous operations performed by processing elements at selected processing nodes of a computing structure in accordance with a processing algorithm; and includes the physical interconnecting of the processing nodes for communicating the global interrupt and barrier signals to the elements via low-latency paths. The global asynchronous signals respectively initiate interrupt and barrier operations at the processing nodes at times selected for optimizing performance of the processing algorithms. In one embodiment, the global interrupt and barrier network is implemented in a scalable, massively parallel supercomputing device structure comprising a plurality of processing nodes interconnected by multiple independent networks, with each node including one or more processing elements for performing computation or communication activity as required when performing parallel algorithm operations. One multiple independent network includes a global tree network for enabling high-speed global tree communications among global tree network nodes or sub-trees thereof. The global interrupt and barrier network may operate in parallel with the global tree network for providing global asynchronous sideband signals.
Cost-effectiveness Analysis with Influence Diagrams.

PubMed

Arias, M; Díez, F J

2015-01-01

Cost-effectiveness analysis (CEA) is used increasingly in medicine to determine whether the health benefit of an intervention is worth the economic cost. Decision trees, the standard decision modeling technique for non-temporal domains, can only perform CEA for very small problems. To develop a method for CEA in problems involving several dozen variables. We explain how to build influence diagrams (IDs) that explicitly represent cost and effectiveness. We propose an algorithm for evaluating cost-effectiveness IDs directly, i.e., without expanding an equivalent decision tree. The evaluation of an ID returns a set of intervals for the willingness to pay - separated by cost-effectiveness thresholds - and, for each interval, the cost, the effectiveness, and the optimal intervention. The algorithm that evaluates the ID directly is in general much more efficient than the brute-force method, which is in turn more efficient than the expansion of an equivalent decision tree. Using OpenMarkov, an open-source software tool that implements this algorithm, we have been able to perform CEAs on several IDs whose equivalent decision trees contain millions of branches. IDs can perform CEA on large problems that cannot be analyzed with decision trees.
Tanglegrams for rooted phylogenetic trees and networks

PubMed Central

Scornavacca, Celine; Zickmann, Franziska; Huson, Daniel H.

2011-01-01

Motivation: In systematic biology, one is often faced with the task of comparing different phylogenetic trees, in particular in multi-gene analysis or cospeciation studies. One approach is to use a tanglegram in which two rooted phylogenetic trees are drawn opposite each other, using auxiliary lines to connect matching taxa. There is an increasing interest in using rooted phylogenetic networks to represent evolutionary history, so as to explicitly represent reticulate events, such as horizontal gene transfer, hybridization or reassortment. Thus, the question arises how to define and compute a tanglegram for such networks. Results: In this article, we present the first formal definition of a tanglegram for rooted phylogenetic networks and present a heuristic approach for computing one, called the NN-tanglegram method. We compare the performance of our method with existing tree tanglegram algorithms and also show a typical application to real biological datasets. For maximum usability, the algorithm does not require that the trees or networks are bifurcating or bicombining, or that they are on identical taxon sets. Availability: The algorithm is implemented in our program Dendroscope 3, which is freely available from www.dendroscope.org. Contact: scornava@informatik.uni-tuebingen.de; huson@informatik.uni-tuebingen.de PMID:21685078
Consensus properties and their large-scale applications for the gene duplication problem.

PubMed

Moon, Jucheol; Lin, Harris T; Eulenstein, Oliver

2016-06-01

Solving the gene duplication problem is a classical approach for species tree inference from gene trees that are confounded by gene duplications. This problem takes a collection of gene trees and seeks a species tree that implies the minimum number of gene duplications. Wilkinson et al. posed the conjecture that the gene duplication problem satisfies the desirable Pareto property for clusters. That is, for every instance of the problem, all clusters that are commonly present in the input gene trees of this instance, called strict consensus, will also be found in every solution to this instance. We prove that this conjecture does not generally hold. Despite this negative result we show that the gene duplication problem satisfies a weaker version of the Pareto property where the strict consensus is found in at least one solution (rather than all solutions). This weaker property contributes to our design of an efficient scalable algorithm for the gene duplication problem. We demonstrate the performance of our algorithm in analyzing large-scale empirical datasets. Finally, we utilize the algorithm to evaluate the accuracy of standard heuristics for the gene duplication problem using simulated datasets.
Node Deployment Algorithm Based on Connected Tree for Underwater Sensor Networks

PubMed Central

Jiang, Peng; Wang, Xingmin; Jiang, Lurong

2015-01-01

Designing an efficient deployment method to guarantee optimal monitoring quality is one of the key topics in underwater sensor networks. At present, a realistic approach of deployment involves adjusting the depths of nodes in water. One of the typical algorithms used in such process is the self-deployment depth adjustment algorithm (SDDA). This algorithm mainly focuses on maximizing network coverage by constantly adjusting node depths to reduce coverage overlaps between two neighboring nodes, and thus, achieves good performance. However, the connectivity performance of SDDA is irresolute. In this paper, we propose a depth adjustment algorithm based on connected tree (CTDA). In CTDA, the sink node is used as the first root node to start building a connected tree. Finally, the network can be organized as a forest to maintain network connectivity. Coverage overlaps between the parent node and the child node are then reduced within each sub-tree to optimize coverage. The hierarchical strategy is used to adjust the distance between the parent node and the child node to reduce node movement. Furthermore, the silent mode is adopted to reduce communication cost. Simulations show that compared with SDDA, CTDA can achieve high connectivity with various communication ranges and different numbers of nodes. Moreover, it can realize coverage as high as that of SDDA with various sensing ranges and numbers of nodes but with less energy consumption. Simulations under sparse environments show that the connectivity and energy consumption performances of CTDA are considerably better than those of SDDA. Meanwhile, the connectivity and coverage performances of CTDA are close to those depth adjustment algorithms base on connected dominating set (CDA), which is an algorithm similar to CTDA. However, the energy consumption of CTDA is less than that of CDA, particularly in sparse underwater environments. PMID:26184209
Amazon Forest Structure from IKONOS Satellite Data and the Automated Characterization of Forest Canopy Properties

Treesearch

Michael Palace; Michael Keller; Gregory P. Asner; Stephen Hagen; Bobby Braswell

2008-01-01

We developed an automated tree crown analysis algorithm using 1-m panchromatic IKONOS satellite images to examine forest canopy structure in the Brazilian Amazon. The algorithm was calibrated on the landscape level with tree geometry and forest stand data at the Fazenda Cauaxi (3.75◦ S, 48.37◦ W) in the eastern Amazon, and then compared with forest...
Obesity as a risk factor for developing functional limitation among older adults: A conditional inference tree analysis.

PubMed

Cheng, Feon W; Gao, Xiang; Bao, Le; Mitchell, Diane C; Wood, Craig; Sliwinski, Martin J; Smiciklas-Wright, Helen; Still, Christopher D; Rolston, David D K; Jensen, Gordon L

2017-07-01

To examine the risk factors of developing functional decline and make probabilistic predictions by using a tree-based method that allows higher order polynomials and interactions of the risk factors. The conditional inference tree analysis, a data mining approach, was used to construct a risk stratification algorithm for developing functional limitation based on BMI and other potential risk factors for disability in 1,951 older adults without functional limitations at baseline (baseline age 73.1 ± 4.2 y). We also analyzed the data with multivariate stepwise logistic regression and compared the two approaches (e.g., cross-validation). Over a mean of 9.2 ± 1.7 years of follow-up, 221 individuals developed functional limitation. Higher BMI, age, and comorbidity were consistently identified as significant risk factors for functional decline using both methods. Based on these factors, individuals were stratified into four risk groups via the conditional inference tree analysis. Compared to the low-risk group, all other groups had a significantly higher risk of developing functional limitation. The odds ratio comparing two extreme categories was 9.09 (95% confidence interval: 4.68, 17.6). Higher BMI, age, and comorbid disease were consistently identified as significant risk factors for functional decline among older individuals across all approaches and analyses. © 2017 The Obesity Society.
Research on the key technologies of 3D spatial data organization and management for virtual building environments

NASA Astrophysics Data System (ADS)

Gong, Jun; Zhu, Qing

2006-10-01

As the special case of VGE in the fields of AEC (architecture, engineering and construction), Virtual Building Environment (VBE) has been broadly concerned. Highly complex, large-scale 3d spatial data is main bottleneck of VBE applications, so 3d spatial data organization and management certainly becomes the core technology for VBE. This paper puts forward 3d spatial data model for VBE, and the performance to implement it is very high. Inherent storage method of CAD data makes data redundant, and doesn't concern efficient visualization, which is a practical bottleneck to integrate CAD model, so An Efficient Method to Integrate CAD Model Data is put forward. Moreover, Since the 3d spatial indices based on R-tree are usually limited by their weakness of low efficiency due to the severe overlap of sibling nodes and the uneven size of nodes, a new node-choosing algorithm of R-tree are proposed.
75 FR 5941 - Umatilla National Forest, Walla Walla Ranger District, Walla Walla, WA; Cobbler II Timber Sale...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-02-05

... construction (that will be decommissioned after project use), new road construction, danger tree removal along... increasing population. Late seral tree species have become dominant after long periods without disturbance... and vigor. Timber stands of seral tree species such as western larch and ponderosa pine are infilling...
Reinforcement Learning Trees

PubMed Central

Zhu, Ruoqing; Zeng, Donglin; Kosorok, Michael R.

2015-01-01

In this paper, we introduce a new type of tree-based method, reinforcement learning trees (RLT), which exhibits significantly improved performance over traditional methods such as random forests (Breiman, 2001) under high-dimensional settings. The innovations are three-fold. First, the new method implements reinforcement learning at each selection of a splitting variable during the tree construction processes. By splitting on the variable that brings the greatest future improvement in later splits, rather than choosing the one with largest marginal effect from the immediate split, the constructed tree utilizes the available samples in a more efficient way. Moreover, such an approach enables linear combination cuts at little extra computational cost. Second, we propose a variable muting procedure that progressively eliminates noise variables during the construction of each individual tree. The muting procedure also takes advantage of reinforcement learning and prevents noise variables from being considered in the search for splitting rules, so that towards terminal nodes, where the sample size is small, the splitting rules are still constructed from only strong variables. Last, we investigate asymptotic properties of the proposed method under basic assumptions and discuss rationale in general settings. PMID:26903687
Vlsi implementation of flexible architecture for decision tree classification in data mining

NASA Astrophysics Data System (ADS)

Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak

2017-07-01

The Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a terrific raise in the size of the data being collected and analyzed. Classification is the main difficulty faced in data mining. In a number of the solutions developed for this problem, most accepted one is Decision Tree Classification (DTC) that gives high precision while handling very large amount of data. This paper presents VLSI implementation of flexible architecture for Decision Tree classification in data mining using c4.5 algorithm.
Logistic regression trees for initial selection of interesting loci in case-control studies

PubMed Central

Nickolov, Radoslav Z; Milanov, Valentin B

2007-01-01

Modern genetic epidemiology faces the challenge of dealing with hundreds of thousands of genetic markers. The selection of a small initial subset of interesting markers for further investigation can greatly facilitate genetic studies. In this contribution we suggest the use of a logistic regression tree algorithm known as logistic tree with unbiased selection. Using the simulated data provided for Genetic Analysis Workshop 15, we show how this algorithm, with incorporation of multifactor dimensionality reduction method, can reduce an initial large pool of markers to a small set that includes the interesting markers with high probability. PMID:18466557
An IPv6 routing lookup algorithm using weight-balanced tree based on prefix value for virtual router

NASA Astrophysics Data System (ADS)

Chen, Lingjiang; Zhou, Shuguang; Zhang, Qiaoduo; Li, Fenghua

2016-10-01

Virtual router enables the coexistence of different networks on the same physical facility and has lately attracted a great deal of attention from researchers. As the number of IPv6 addresses is rapidly increasing in virtual routers, designing an efficient IPv6 routing lookup algorithm is of great importance. In this paper, we present an IPv6 lookup algorithm called weight-balanced tree (WBT). WBT merges Forwarding Information Bases (FIBs) of virtual routers into one spanning tree, and compresses the space cost. WBT's average time complexity and the worst case time complexity of lookup and update process are both O(logN) and space complexity is O(cN) where N is the size of routing table and c is a constant. Experiments show that WBT helps reduce more than 80% Static Random Access Memory (SRAM) cost in comparison to those separation schemes. WBT also achieves the least average search depth comparing with other homogeneous algorithms.
Combinatorics of least-squares trees.

PubMed

Mihaescu, Radu; Pachter, Lior

2008-09-09

A recurring theme in the least-squares approach to phylogenetics has been the discovery of elegant combinatorial formulas for the least-squares estimates of edge lengths. These formulas have proved useful for the development of efficient algorithms, and have also been important for understanding connections among popular phylogeny algorithms. For example, the selection criterion of the neighbor-joining algorithm is now understood in terms of the combinatorial formulas of Pauplin for estimating tree length. We highlight a phylogenetically desirable property that weighted least-squares methods should satisfy, and provide a complete characterization of methods that satisfy the property. The necessary and sufficient condition is a multiplicative four-point condition that the variance matrix needs to satisfy. The proof is based on the observation that the Lagrange multipliers in the proof of the Gauss-Markov theorem are tree-additive. Our results generalize and complete previous work on ordinary least squares, balanced minimum evolution, and the taxon-weighted variance model. They also provide a time-optimal algorithm for computation.
Alternative construction of graceful symmetric trees

NASA Astrophysics Data System (ADS)

Sandy, I. P.; Rizal, A.; Manurung, E. N.; Sugeng, K. A.

2018-04-01

Graceful labeling is one of the interesting topics in graph theory. Let G = (V, E) be a tree. The injective mapping f:V\\to \\{0,1,\\ldots,|E|\\} is called graceful if the weight of edge w(xy)=|f(x)-f(y)| are all different for every edge xy. The famous conjecture in this area is all trees are graceful. In this paper we give alternative construction of graceful labeling on symmetric tree using adjacency matrix.
Automated Reconstruction of Neural Trees Using Front Re-initialization

PubMed Central

Mukherjee, Amit; Stepanyants, Armen

2013-01-01

This paper proposes a greedy algorithm for automated reconstruction of neural arbors from light microscopy stacks of images. The algorithm is based on the minimum cost path method. While the minimum cost path, obtained using the Fast Marching Method, results in a trace with the least cumulative cost between the start and the end points, it is not sufficient for the reconstruction of neural trees. This is because sections of the minimum cost path can erroneously travel through the image background with undetectable detriment to the cumulative cost. To circumvent this problem we propose an algorithm that grows a neural tree from a specified root by iteratively re-initializing the Fast Marching fronts. The speed image used in the Fast Marching Method is generated by computing the average outward flux of the gradient vector flow field. Each iteration of the algorithm produces a candidate extension by allowing the front to travel a specified distance and then tracking from the farthest point of the front back to the tree. Robust likelihood ratio test is used to evaluate the quality of the candidate extension by comparing voxel intensities along the extension to those in the foreground and the background. The qualified extensions are appended to the current tree, the front is re-initialized, and Fast Marching is continued until the stopping criterion is met. To evaluate the performance of the algorithm we reconstructed 6 stacks of two-photon microscopy images and compared the results to the ground truth reconstructions by using the DIADEM metric. The average comparison score was 0.82 out of 1.0, which is on par with the performance achieved by expert manual tracers. PMID:24386539
Secure Multicast Tree Structure Generation Method for Directed Diffusion Using A* Algorithms

NASA Astrophysics Data System (ADS)

Kim, Jin Myoung; Lee, Hae Young; Cho, Tae Ho

The application of wireless sensor networks to areas such as combat field surveillance, terrorist tracking, and highway traffic monitoring requires secure communication among the sensor nodes within the networks. Logical key hierarchy (LKH) is a tree based key management model which provides secure group communication. When a sensor node is added or evicted from the communication group, LKH updates the group key in order to ensure the security of the communications. In order to efficiently update the group key in directed diffusion, we propose a method for secure multicast tree structure generation, an extension to LKH that reduces the number of re-keying messages by considering the addition and eviction ratios of the history data. For the generation of the proposed key tree structure the A* algorithm is applied, in which the branching factor at each level can take on different value. The experiment results demonstrate the efficiency of the proposed key tree structure against the existing key tree structures of fixed branching factors.
Comparison of rule induction, decision trees and formal concept analysis approaches for classification

NASA Astrophysics Data System (ADS)

Kotelnikov, E. V.; Milov, V. R.

2018-05-01

Rule-based learning algorithms have higher transparency and easiness to interpret in comparison with neural networks and deep learning algorithms. These properties make it possible to effectively use such algorithms to solve descriptive tasks of data mining. The choice of an algorithm depends also on its ability to solve predictive tasks. The article compares the quality of the solution of the problems with binary and multiclass classification based on the experiments with six datasets from the UCI Machine Learning Repository. The authors investigate three algorithms: Ripper (rule induction), C4.5 (decision trees), In-Close (formal concept analysis). The results of the experiments show that In-Close demonstrates the best quality of classification in comparison with Ripper and C4.5, however the latter two generate more compact rule sets.
A junction-tree based learning algorithm to optimize network wide traffic control: A coordinated multi-agent framework

DOE PAGES

Zhu, Feng; Aziz, H. M. Abdul; Qian, Xinwu; ...

2015-01-31

Our study develops a novel reinforcement learning algorithm for the challenging coordinated signal control problem. Traffic signals are modeled as intelligent agents interacting with the stochastic traffic environment. The model is built on the framework of coordinated reinforcement learning. The Junction Tree Algorithm (JTA) based reinforcement learning is proposed to obtain an exact inference of the best joint actions for all the coordinated intersections. Moreover, the algorithm is implemented and tested with a network containing 18 signalized intersections in VISSIM. Finally, our results show that the JTA based algorithm outperforms independent learning (Q-learning), real-time adaptive learning, and fixed timing plansmore » in terms of average delay, number of stops, and vehicular emissions at the network level.« less
Applied Swarm-based medicine: collecting decision trees for patterns of algorithms analysis.

PubMed

Panje, Cédric M; Glatzer, Markus; von Rappard, Joscha; Rothermundt, Christian; Hundsberger, Thomas; Zumstein, Valentin; Plasswilm, Ludwig; Putora, Paul Martin

2017-08-16

The objective consensus methodology has recently been applied in consensus finding in several studies on medical decision-making among clinical experts or guidelines. The main advantages of this method are an automated analysis and comparison of treatment algorithms of the participating centers which can be performed anonymously. Based on the experience from completed consensus analyses, the main steps for the successful implementation of the objective consensus methodology were identified and discussed among the main investigators. The following steps for the successful collection and conversion of decision trees were identified and defined in detail: problem definition, population selection, draft input collection, tree conversion, criteria adaptation, problem re-evaluation, results distribution and refinement, tree finalisation, and analysis. This manuscript provides information on the main steps for successful collection of decision trees and summarizes important aspects at each point of the analysis.

Prediction of Baseflow Index of Catchments using Machine Learning Algorithms

NASA Astrophysics Data System (ADS)

Yadav, B.; Hatfield, K.

2017-12-01

We present the results of eight machine learning techniques for predicting the baseflow index (BFI) of ungauged basins using a surrogate of catchment scale climate and physiographic data. The tested algorithms include ordinary least squares, ridge regression, least absolute shrinkage and selection operator (lasso), elasticnet, support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Our work seeks to identify the dominant controls of BFI that can be readily obtained from ancillary geospatial databases and remote sensing measurements, such that the developed techniques can be extended to ungauged catchments. More than 800 gauged catchments spanning the continental United States were selected to develop the general methodology. The BFI calculation was based on the baseflow separated from daily streamflow hydrograph using HYSEP filter. The surrogate catchment attributes were compiled from multiple sources including digital elevation model, soil, landuse, climate data, other publicly available ancillary and geospatial data. 80% catchments were used to train the ML algorithms, and the remaining 20% of the catchments were used as an independent test set to measure the generalization performance of fitted models. A k-fold cross-validation using exhaustive grid search was used to fit the hyperparameters of each model. Initial model development was based on 19 independent variables, but after variable selection and feature ranking, we generated revised sparse models of BFI prediction that are based on only six catchment attributes. These key predictive variables selected after the careful evaluation of bias-variance tradeoff include average catchment elevation, slope, fraction of sand, permeability, temperature, and precipitation. The most promising algorithms exceeding an accuracy score (r-square) of 0.7 on test data include support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Considering both the accuracy and the computational complexity of these algorithms, we identify the extremely randomized trees as the best performing algorithm for BFI prediction in ungauged basins.
A decision tree algorithm for investigation of model biases related to dynamical cores and physical parameterizations: CESM/CAM EVALUATION BY DECISION TREES

DOE Office of Scientific and Technical Information (OSTI.GOV)

Soner Yorgun, M.; Rood, Richard B.

An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center of Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smoothmore » topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena due to the spectral transform method of CAM Eulerian spectral dynamical core is prominent, and is an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, both in horizontal and vertical dimensions, have significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study about the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly mean) analysis over local scales.« less
A decision tree algorithm for investigation of model biases related to dynamical cores and physical parameterizations: CESM/CAM EVALUATION BY DECISION TREES

DOE PAGES

Soner Yorgun, M.; Rood, Richard B.

2016-11-11

An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center of Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smoothmore » topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena due to the spectral transform method of CAM Eulerian spectral dynamical core is prominent, and is an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, both in horizontal and vertical dimensions, have significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study about the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly mean) analysis over local scales.« less
Accurate airway segmentation based on intensity structure analysis and graph-cut

NASA Astrophysics Data System (ADS)

Meng, Qier; Kitsaka, Takayuki; Nimura, Yukitaka; Oda, Masahiro; Mori, Kensaku

2016-03-01

This paper presents a novel airway segmentation method based on intensity structure analysis and graph-cut. Airway segmentation is an important step in analyzing chest CT volumes for computerized lung cancer detection, emphysema diagnosis, asthma diagnosis, and pre- and intra-operative bronchoscope navigation. However, obtaining a complete 3-D airway tree structure from a CT volume is quite challenging. Several researchers have proposed automated algorithms basically based on region growing and machine learning techniques. However these methods failed to detect the peripheral bronchi branches. They caused a large amount of leakage. This paper presents a novel approach that permits more accurate extraction of complex bronchial airway region. Our method are composed of three steps. First, the Hessian analysis is utilized for enhancing the line-like structure in CT volumes, then a multiscale cavity-enhancement filter is employed to detect the cavity-like structure from the previous enhanced result. In the second step, we utilize the support vector machine (SVM) to construct a classifier for removing the FP regions generated. Finally, the graph-cut algorithm is utilized to connect all of the candidate voxels to form an integrated airway tree. We applied this method to sixteen cases of 3D chest CT volumes. The results showed that the branch detection rate of this method can reach about 77.7% without leaking into the lung parenchyma areas.
Efficient gaussian density formulation of volume and surface areas of macromolecules on graphical processing units.

PubMed

Zhang, Baofeng; Kilburg, Denise; Eastman, Peter; Pande, Vijay S; Gallicchio, Emilio

2017-04-15

We present an algorithm to efficiently compute accurate volumes and surface areas of macromolecules on graphical processing unit (GPU) devices using an analytic model which represents atomic volumes by continuous Gaussian densities. The volume of the molecule is expressed by means of the inclusion-exclusion formula, which is based on the summation of overlap integrals among multiple atomic densities. The surface area of the molecule is obtained by differentiation of the molecular volume with respect to atomic radii. The many-body nature of the model makes a port to GPU devices challenging. To our knowledge, this is the first reported full implementation of this model on GPU hardware. To accomplish this, we have used recursive strategies to construct the tree of overlaps and to accumulate volumes and their gradients on the tree data structures so as to minimize memory contention. The algorithm is used in the formulation of a surface area-based non-polar implicit solvent model implemented as an open source plug-in (named GaussVol) for the popular OpenMM library for molecular mechanics modeling. GaussVol is 50 to 100 times faster than our best optimized implementation for the CPUs, achieving speeds in excess of 100 ns/day with 1 fs time-step for protein-sized systems on commodity GPUs. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Towards a hybrid energy efficient multi-tree-based optimized routing protocol for wireless networks.

PubMed

Mitton, Nathalie; Razafindralambo, Tahiry; Simplot-Ryl, David; Stojmenovic, Ivan

2012-12-13

This paper considers the problem of designing power efficient routing with guaranteed delivery for sensor networks with unknown geographic locations. We propose HECTOR, a hybrid energy efficient tree-based optimized routing protocol, based on two sets of virtual coordinates. One set is based on rooted tree coordinates, and the other is based on hop distances toward several landmarks. In HECTOR, the node currently holding the packet forwards it to its neighbor that optimizes ratio of power cost over distance progress with landmark coordinates, among nodes that reduce landmark coordinates and do not increase distance in tree coordinates. If such a node does not exist, then forwarding is made to the neighbor that reduces tree-based distance only and optimizes power cost over tree distance progress ratio. We theoretically prove the packet delivery and propose an extension based on the use of multiple trees. Our simulations show the superiority of our algorithm over existing alternatives while guaranteeing delivery, and only up to 30% additional power compared to centralized shortest weighted path algorithm.
Towards a Hybrid Energy Efficient Multi-Tree-Based Optimized Routing Protocol for Wireless Networks

PubMed Central

Mitton, Nathalie; Razafindralambo, Tahiry; Simplot-Ryl, David; Stojmenovic, Ivan

2012-01-01

This paper considers the problem of designing power efficient routing with guaranteed delivery for sensor networks with unknown geographic locations. We propose HECTOR, a hybrid energy efficient tree-based optimized routing protocol, based on two sets of virtual coordinates. One set is based on rooted tree coordinates, and the other is based on hop distances toward several landmarks. In HECTOR, the node currently holding the packet forwards it to its neighbor that optimizes ratio of power cost over distance progress with landmark coordinates, among nodes that reduce landmark coordinates and do not increase distance in tree coordinates. If such a node does not exist, then forwarding is made to the neighbor that reduces tree-based distance only and optimizes power cost over tree distance progress ratio. We theoretically prove the packet delivery and propose an extension based on the use of multiple trees. Our simulations show the superiority of our algorithm over existing alternatives while guaranteeing delivery, and only up to 30% additional power compared to centralized shortest weighted path algorithm. PMID:23443398
Dendrometer bands made easy: Using modified cable ties to measure incremental growth of trees1

PubMed Central

Anemaet, Evelyn R.; Middleton, Beth A.

2013-01-01

• Premise of the study: Dendrometer bands are a useful way to make sequential repeated measurements of tree growth, but traditional dendrometer bands can be expensive, time consuming, and difficult to construct in the field. An alternative to the traditional method of band construction is to adapt commercially available materials. This paper describes how to construct and install dendrometer bands using smooth-edged, stainless steel, cable tie banding and attachable rollerball heads. • Methods and Results: As a performance comparison, both traditional and cable tie dendrometer bands were installed on baldcypress trees at the National Wetlands Research Center in Lafayette, Louisiana, by both an experienced and a novice worker. Band installation times were recorded, and growth of the trees as estimated by the two band types was measured after approximately one year, demonstrating equivalence of the two methods. • Conclusions: This efficient approach to dendrometer band construction can help advance the knowledge of long-term tree growth in ecological studies. PMID:25202589
A High Performance Computing Approach to Tree Cover Delineation in 1-m NAIP Imagery Using a Probabilistic Learning Framework

NASA Technical Reports Server (NTRS)

Basu, Saikat; Ganguly, Sangram; Michaelis, Andrew; Votava, Petr; Roy, Anshuman; Mukhopadhyay, Supratik; Nemani, Ramakrishna

2015-01-01

Tree cover delineation is a useful instrument in deriving Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) airborne imagery data. Numerous algorithms have been designed to address this problem, but most of them do not scale to these datasets, which are of the order of terabytes. In this paper, we present a semi-automated probabilistic framework for the segmentation and classification of 1-m National Agriculture Imagery Program (NAIP) for tree-cover delineation for the whole of Continental United States, using a High Performance Computing Architecture. Classification is performed using a multi-layer Feedforward Backpropagation Neural Network and segmentation is performed using a Statistical Region Merging algorithm. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field, which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by relabeling misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the whole state of California, spanning a total of 11,095 NAIP tiles covering a total geographical area of 163,696 sq. miles. The framework produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates lower than 2% for both landscapes. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR canopy height model (CHM) showed the effectiveness of our framework for generating accurate high-resolution tree-cover maps.
A High Performance Computing Approach to Tree Cover Delineation in 1-m NAIP Imagery using a Probabilistic Learning Framework

NASA Astrophysics Data System (ADS)

Basu, S.; Ganguly, S.; Michaelis, A.; Votava, P.; Roy, A.; Mukhopadhyay, S.; Nemani, R. R.

2015-12-01

Tree cover delineation is a useful instrument in deriving Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) airborne imagery data. Numerous algorithms have been designed to address this problem, but most of them do not scale to these datasets which are of the order of terabytes. In this paper, we present a semi-automated probabilistic framework for the segmentation and classification of 1-m National Agriculture Imagery Program (NAIP) for tree-cover delineation for the whole of Continental United States, using a High Performance Computing Architecture. Classification is performed using a multi-layer Feedforward Backpropagation Neural Network and segmentation is performed using a Statistical Region Merging algorithm. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field, which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by relabeling misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the whole state of California, spanning a total of 11,095 NAIP tiles covering a total geographical area of 163,696 sq. miles. The framework produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates lower than 2% for both landscapes. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR canopy height model (CHM) showed the effectiveness of our framework for generating accurate high-resolution tree-cover maps.
Constructing Student Problems in Phylogenetic Tree Construction.

ERIC Educational Resources Information Center

Brewer, Steven D.

Evolution is often equated with natural selection and is taught from a primarily functional perspective while comparative and historical approaches, which are critical for developing an appreciation of the power of evolutionary theory, are often neglected. This report describes a study of expert problem-solving in phylogenetic tree construction.…
hs-CRP is strongly associated with coronary heart disease (CHD): A data mining approach using decision tree algorithm.

PubMed

Tayefi, Maryam; Tajfard, Mohammad; Saffar, Sara; Hanachi, Parichehr; Amirabadizadeh, Ali Reza; Esmaeily, Habibollah; Taghipour, Ali; Ferns, Gordon A; Moohebati, Mohsen; Ghayour-Mobarhan, Majid

2017-04-01

Coronary heart disease (CHD) is an important public health problem globally. Algorithms incorporating the assessment of clinical biomarkers together with several established traditional risk factors can help clinicians to predict CHD and support clinical decision making with respect to interventions. Decision tree (DT) is a data mining model for extracting hidden knowledge from large databases. We aimed to establish a predictive model for coronary heart disease using a decision tree algorithm. Here we used a dataset of 2346 individuals including 1159 healthy participants and 1187 participant who had undergone coronary angiography (405 participants with negative angiography and 782 participants with positive angiography). We entered 10 variables of a total 12 variables into the DT algorithm (including age, sex, FBG, TG, hs-CRP, TC, HDL, LDL, SBP and DBP). Our model could identify the associated risk factors of CHD with sensitivity, specificity, accuracy of 96%, 87%, 94% and respectively. Serum hs-CRP levels was at top of the tree in our model, following by FBG, gender and age. Our model appears to be an accurate, specific and sensitive model for identifying the presence of CHD, but will require validation in prospective studies. Copyright © 2017 Elsevier B.V. All rights reserved.
A Depth-Adjustment Deployment Algorithm Based on Two-Dimensional Convex Hull and Spanning Tree for Underwater Wireless Sensor Networks.

PubMed

Jiang, Peng; Liu, Shuai; Liu, Jun; Wu, Feng; Zhang, Le

2016-07-14

Most of the existing node depth-adjustment deployment algorithms for underwater wireless sensor networks (UWSNs) just consider how to optimize network coverage and connectivity rate. However, these literatures don't discuss full network connectivity, while optimization of network energy efficiency and network reliability are vital topics for UWSN deployment. Therefore, in this study, a depth-adjustment deployment algorithm based on two-dimensional (2D) convex hull and spanning tree (NDACS) for UWSNs is proposed. First, the proposed algorithm uses the geometric characteristics of a 2D convex hull and empty circle to find the optimal location of a sleep node and activate it, minimizes the network coverage overlaps of the 2D plane, and then increases the coverage rate until the first layer coverage threshold is reached. Second, the sink node acts as a root node of all active nodes on the 2D convex hull and then forms a small spanning tree gradually. Finally, the depth-adjustment strategy based on time marker is used to achieve the three-dimensional overall network deployment. Compared with existing depth-adjustment deployment algorithms, the simulation results show that the NDACS algorithm can maintain full network connectivity with high network coverage rate, as well as improved network average node degree, thus increasing network reliability.
A Depth-Adjustment Deployment Algorithm Based on Two-Dimensional Convex Hull and Spanning Tree for Underwater Wireless Sensor Networks

PubMed Central

Jiang, Peng; Liu, Shuai; Liu, Jun; Wu, Feng; Zhang, Le

2016-01-01

Most of the existing node depth-adjustment deployment algorithms for underwater wireless sensor networks (UWSNs) just consider how to optimize network coverage and connectivity rate. However, these literatures don’t discuss full network connectivity, while optimization of network energy efficiency and network reliability are vital topics for UWSN deployment. Therefore, in this study, a depth-adjustment deployment algorithm based on two-dimensional (2D) convex hull and spanning tree (NDACS) for UWSNs is proposed. First, the proposed algorithm uses the geometric characteristics of a 2D convex hull and empty circle to find the optimal location of a sleep node and activate it, minimizes the network coverage overlaps of the 2D plane, and then increases the coverage rate until the first layer coverage threshold is reached. Second, the sink node acts as a root node of all active nodes on the 2D convex hull and then forms a small spanning tree gradually. Finally, the depth-adjustment strategy based on time marker is used to achieve the three-dimensional overall network deployment. Compared with existing depth-adjustment deployment algorithms, the simulation results show that the NDACS algorithm can maintain full network connectivity with high network coverage rate, as well as improved network average node degree, thus increasing network reliability. PMID:27428970
AntiClustal: Multiple Sequence Alignment by antipole clustering and linear approximate 1-median computation.

PubMed

Di Pietro, C; Di Pietro, V; Emmanuele, G; Ferro, A; Maugeri, T; Modica, E; Pigola, G; Pulvirenti, A; Purrello, M; Ragusa, M; Scalia, M; Shasha, D; Travali, S; Zimmitti, V

2003-01-01

In this paper we present a new Multiple Sequence Alignment (MSA) algorithm called AntiClusAl. The method makes use of the commonly use idea of aligning homologous sequences belonging to classes generated by some clustering algorithm, and then continue the alignment process ina bottom-up way along a suitable tree structure. The final result is then read at the root of the tree. Multiple sequence alignment in each cluster makes use of the progressive alignment with the 1-median (center) of the cluster. The 1-median of set S of sequences is the element of S which minimizes the average distance from any other sequence in S. Its exact computation requires quadratic time. The basic idea of our proposed algorithm is to make use of a simple and natural algorithmic technique based on randomized tournaments which has been successfully applied to large size search problems in general metric spaces. In particular a clustering algorithm called Antipole tree and an approximate linear 1-median computation are used. Our algorithm compared with Clustal W, a widely used tool to MSA, shows a better running time results with fully comparable alignment quality. A successful biological application showing high aminoacid conservation during evolution of Xenopus laevis SOD2 is also cited.
Disturbance legacies and climate jointly drive tree growth and mortality in an intensively studied boreal forest

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bond-Lamberty, Benjamin; Rocha, Adrian; Calvin, Katherine V.

2014-01-01

How will regional growth and mortality change with even relatively small climate shifts, even independent of catastrophic disturbances? This question is particularly acute for the North American boreal forest, which is carbon-dense and subject The goals of this study were to combine dendrochronological sampling, inventory records, and machine-learning algorithms to understand how tree growth and death have changed at one highly studied site (Northern Old Black Spruce, NOBS) in the central Canadian boreal forest. Over the 1999-2012 inventory period, mean DBH increased even as stand density and basal area declined significantly from 41.3 to 37.5 m2 ha-1. Tree mortality averagedmore » 1.4±0.6% yr-1, with most mortality occurring in medium-sized trees. A combined tree ring chronology constructed from 2001, 2004, and 2012 sampling showed several periods of extreme growth depression, with increased mortality lagging depressed growth by ~5 years. Minimum and maximum air temperatures exerted a negative influence on tree growth, while precipitation and climate moisture index had a positive effect; both current- and previous-year data exerted significant effects. Models based on these variables explained 23-44% of the ring-width variability. There have been at least one, and probably two, significant recruitment episodes since stand initiation, and we infer that past climate extremes led to significant NOBS mortality still visible in the current forest structure. These results imply that a combination of successional and demographic processes, along with mortality driven by abiotic factors, continue to affect the stand, with significant implications for our understanding of previous work at NOBS and the sustainable management of regional forests.« less
How Hierarchical Topics Evolve in Large Text Corpora.

PubMed

Cui, Weiwei; Liu, Shixia; Wu, Zhuofeng; Wei, Hao

2014-12-01

Using a sequence of topic trees to organize documents is a popular way to represent hierarchical and evolving topics in text corpora. However, following evolving topics in the context of topic trees remains difficult for users. To address this issue, we present an interactive visual text analysis approach to allow users to progressively explore and analyze the complex evolutionary patterns of hierarchical topics. The key idea behind our approach is to exploit a tree cut to approximate each tree and allow users to interactively modify the tree cuts based on their interests. In particular, we propose an incremental evolutionary tree cut algorithm with the goal of balancing 1) the fitness of each tree cut and the smoothness between adjacent tree cuts; 2) the historical and new information related to user interests. A time-based visualization is designed to illustrate the evolving topics over time. To preserve the mental map, we develop a stable layout algorithm. As a result, our approach can quickly guide users to progressively gain profound insights into evolving hierarchical topics. We evaluate the effectiveness of the proposed method on Amazon's Mechanical Turk and real-world news data. The results show that users are able to successfully analyze evolving topics in text data.
Detection of dead standing Eucalyptus camaldulensis without tree delineation for managing biodiversity in native Australian forest

NASA Astrophysics Data System (ADS)

Miltiadou, Milto; Campbell, Neil D. F.; Gonzalez Aracil, Susana; Brown, Tony; Grant, Michael G.

2018-05-01

In Australia, many birds and arboreal animals use hollows for shelters, but studies predict shortage of hollows in near future. Aged dead trees are more likely to contain hollows and therefore automated detection of them plays a substantial role in preserving biodiversity and consequently maintaining a resilient ecosystem. For this purpose full-waveform LiDAR data were acquired from a native Eucalypt forest in Southern Australia. The structure of the forest significantly varies in terms of tree density, age and height. Additionally, Eucalyptus camaldulensis have multiple trunk splits making tree delineation very challenging. For that reason, this paper investigates automated detection of dead standing Eucalyptus camaldulensis without tree delineation. It also presents the new feature of the open source software DASOS, which extracts features for 3D object detection in voxelised FW LiDAR. A random forest classifier, a weighted-distance KNN algorithm and a seed growth algorithm are used to create a 2D probabilistic field and to then predict potential positions of dead trees. It is shown that tree health assessment is possible without tree delineation but since it is a new research directions there are many improvements to be made.
Enhancement of Fast Face Detection Algorithm Based on a Cascade of Decision Trees

NASA Astrophysics Data System (ADS)

Khryashchev, V. V.; Lebedev, A. A.; Priorov, A. L.

2017-05-01

Face detection algorithm based on a cascade of ensembles of decision trees (CEDT) is presented. The new approach allows detecting faces other than the front position through the use of multiple classifiers. Each classifier is trained for a specific range of angles of the rotation head. The results showed a high rate of productivity for CEDT on images with standard size. The algorithm increases the area under the ROC-curve of 13% compared to a standard Viola-Jones face detection algorithm. Final realization of given algorithm consist of 5 different cascades for frontal/non-frontal faces. One more thing which we take from the simulation results is a low computational complexity of CEDT algorithm in comparison with standard Viola-Jones approach. This could prove important in the embedded system and mobile device industries because it can reduce the cost of hardware and make battery life longer.
Disturbance legacies and climate jointly drive tree growth and mortality in an intensively studied boreal forest.

PubMed

Bond-Lamberty, Ben; Rocha, Adrian V; Calvin, Katherine; Holmes, Bruce; Wang, Chuankuan; Goulden, Michael L

2014-01-01

Most North American forests are at some stage of post-disturbance regrowth, subject to a changing climate, and exhibit growth and mortality patterns that may not be closely coupled to annual environmental conditions. Distinguishing the possibly interacting effects of these processes is necessary to put short-term studies in a longer term context, and particularly important for the carbon-dense, fire-prone boreal forest. The goals of this study were to combine dendrochronological sampling, inventory records, and machine-learning algorithms to understand how tree growth and death have changed at one highly studied site (Northern Old Black Spruce, NOBS) in the central Canadian boreal forest. Over the 1999-2012 inventory period, mean tree diameter increased even as stand density and basal area declined significantly. Tree mortality averaged 1.4 ± 0.6% yr-(1), with most mortality occurring in medium-sized trees; new recruitment was minimal. There have been at least two, and probably three, significant influxes of new trees since stand initiation, but none in recent decades. A combined tree ring chronology constructed from sampling in 2001, 2004, and 2012 showed several periods of extreme growth depression, with increased mortality lagging depressed growth by ~5 years. Higher minimum and maximum air temperatures exerted a negative influence on tree growth, while precipitation and climate moisture index had a positive effect; both current- and previous-year data exerted significant effects. Models based on these variables explained 23-44% of the ring-width variability. We suggest that past climate extremes led to significant mortality still visible in the current forest structure, with decadal dynamics superimposed on slower patterns of fire and succession. These results have significant implications for our understanding of previous work at NOBS, the carbon sequestration capability of old-growth stands in a disturbance-prone landscape, and the sustainable management of regional forests in a changing climate.

DupTree: a program for large-scale phylogenetic analyses using gene tree parsimony.

PubMed

Wehe, André; Bansal, Mukul S; Burleigh, J Gordon; Eulenstein, Oliver

2008-07-01

DupTree is a new software program for inferring rooted species trees from collections of gene trees using the gene tree parsimony approach. The program implements a novel algorithm that significantly improves upon the run time of standard search heuristics for gene tree parsimony, and enables the first truly genome-scale phylogenetic analyses. In addition, DupTree allows users to examine alternate rootings and to weight the reconciliation costs for gene trees. DupTree is an open source project written in C++. DupTree for Mac OS X, Windows, and Linux along with a sample dataset and an on-line manual are available at http://genome.cs.iastate.edu/CBL/DupTree
Comparison of Naive Bayes and Decision Tree on Feature Selection Using Genetic Algorithm for Classification Problem

NASA Astrophysics Data System (ADS)

Rahmadani, S.; Dongoran, A.; Zarlis, M.; Zakarias

2018-03-01

This paper discusses the problem of feature selection using genetic algorithms on a dataset for classification problems. The classification model used is the decicion tree (DT), and Naive Bayes. In this paper we will discuss how the Naive Bayes and Decision Tree models to overcome the classification problem in the dataset, where the dataset feature is selectively selected using GA. Then both models compared their performance, whether there is an increase in accuracy or not. From the results obtained shows an increase in accuracy if the feature selection using GA. The proposed model is referred to as GADT (GA-Decision Tree) and GANB (GA-Naive Bayes). The data sets tested in this paper are taken from the UCI Machine Learning repository.
A stepwise regression tree for nonlinear approximation: applications to estimating subpixel land cover

USGS Publications Warehouse

Huang, C.; Townshend, J.R.G.

2003-01-01

A stepwise regression tree (SRT) algorithm was developed for approximating complex nonlinear relationships. Based on the regression tree of Breiman et al . (BRT) and a stepwise linear regression (SLR) method, this algorithm represents an improvement over SLR in that it can approximate nonlinear relationships and over BRT in that it gives more realistic predictions. The applicability of this method to estimating subpixel forest was demonstrated using three test data sets, on all of which it gave more accurate predictions than SLR and BRT. SRT also generated more compact trees and performed better than or at least as well as BRT at all 10 equal forest proportion interval ranging from 0 to 100%. This method is appealing to estimating subpixel land cover over large areas.
Improving generalized inverted index lock wait times

NASA Astrophysics Data System (ADS)

Borodin, A.; Mirvoda, S.; Porshnev, S.; Ponomareva, O.

2018-01-01

Concurrent operations on tree like data structures is a cornerstone of any database system. Concurrent operations intended for improving read\\write performance and usually implemented via some way of locking. Deadlock-free methods of concurrency control are known as tree locking protocols. These protocols provide basic operations(verbs) and algorithm (ways of operation invocations) for applying it to any tree-like data structure. These algorithms operate on data, managed by storage engine which are very different among RDBMS implementations. In this paper, we discuss tree locking protocol implementation for General inverted index (Gin) applied to multiversion concurrency control (MVCC) storage engine inside PostgreSQL RDBMS. After that we introduce improvements to locking protocol and provide usage statistics about evaluation of our improvement in very high load environment in one of the world’s largest IT company.
Integrated pipeline for inferring the evolutionary history of a gene family embedded in the species tree: a case study on the STIMATE gene family.

PubMed

Song, Jia; Zheng, Sisi; Nguyen, Nhung; Wang, Youjun; Zhou, Yubin; Lin, Kui

2017-10-03

Because phylogenetic inference is an important basis for answering many evolutionary problems, a large number of algorithms have been developed. Some of these algorithms have been improved by integrating gene evolution models with the expectation of accommodating the hierarchy of evolutionary processes. To the best of our knowledge, however, there still is no single unifying model or algorithm that can take all evolutionary processes into account through a stepwise or simultaneous method. On the basis of three existing phylogenetic inference algorithms, we built an integrated pipeline for inferring the evolutionary history of a given gene family; this pipeline can model gene sequence evolution, gene duplication-loss, gene transfer and multispecies coalescent processes. As a case study, we applied this pipeline to the STIMATE (TMEM110) gene family, which has recently been reported to play an important role in store-operated Ca 2+ entry (SOCE) mediated by ORAI and STIM proteins. We inferred their phylogenetic trees in 69 sequenced chordate genomes. By integrating three tree reconstruction algorithms with diverse evolutionary models, a pipeline for inferring the evolutionary history of a gene family was developed, and its application was demonstrated.
Isosurface Extraction in Time-Varying Fields Using a Temporal Hierarchical Index Tree

NASA Technical Reports Server (NTRS)

Shen, Han-Wei; Gerald-Yamasaki, Michael (Technical Monitor)

1998-01-01

Many high-performance isosurface extraction algorithms have been proposed in the past several years as a result of intensive research efforts. When applying these algorithms to large-scale time-varying fields, the storage overhead incurred from storing the search index often becomes overwhelming. this paper proposes an algorithm for locating isosurface cells in time-varying fields. We devise a new data structure, called Temporal Hierarchical Index Tree, which utilizes the temporal coherence that exists in a time-varying field and adoptively coalesces the cells' extreme values over time; the resulting extreme values are then used to create the isosurface cell search index. For a typical time-varying scalar data set, not only does this temporal hierarchical index tree require much less storage space, but also the amount of I/O required to access the indices from the disk at different time steps is substantially reduced. We illustrate the utility and speed of our algorithm with data from several large-scale time-varying CID simulations. Our algorithm can achieve more than 80% of disk-space savings when compared with the existing techniques, while the isosurface extraction time is nearly optimal.
A practical approximation algorithm for solving massive instances of hybridization number for binary and nonbinary trees.

PubMed

van Iersel, Leo; Kelk, Steven; Lekić, Nela; Scornavacca, Celine

2014-05-05

Reticulate events play an important role in determining evolutionary relationships. The problem of computing the minimum number of such events to explain discordance between two phylogenetic trees is a hard computational problem. Even for binary trees, exact solvers struggle to solve instances with reticulation number larger than 40-50. Here we present CycleKiller and NonbinaryCycleKiller, the first methods to produce solutions verifiably close to optimality for instances with hundreds or even thousands of reticulations. Using simulations, we demonstrate that these algorithms run quickly for large and difficult instances, producing solutions that are very close to optimality. As a spin-off from our simulations we also present TerminusEst, which is the fastest exact method currently available that can handle nonbinary trees: this is used to measure the accuracy of the NonbinaryCycleKiller algorithm. All three methods are based on extensions of previous theoretical work (SIDMA 26(4):1635-1656, TCBB 10(1):18-25, SIDMA 28(1):49-66) and are publicly available. We also apply our methods to real data.
Characterizing the phylogenetic tree-search problem.

PubMed

Money, Daniel; Whelan, Simon

2012-03-01

Phylogenetic trees are important in many areas of biological research, ranging from systematic studies to the methods used for genome annotation. Finding the best scoring tree under any optimality criterion is an NP-hard problem, which necessitates the use of heuristics for tree-search. Although tree-search plays a major role in obtaining a tree estimate, there remains a limited understanding of its characteristics and how the elements of the statistical inferential procedure interact with the algorithms used. This study begins to answer some of these questions through a detailed examination of maximum likelihood tree-search on a wide range of real genome-scale data sets. We examine all 10,395 trees for each of the 106 genes of an eight-taxa yeast phylogenomic data set, then apply different tree-search algorithms to investigate their performance. We extend our findings by examining two larger genome-scale data sets and a large disparate data set that has been previously used to benchmark the performance of tree-search programs. We identify several broad trends occurring during tree-search that provide an insight into the performance of heuristics and may, in the future, aid their development. These trends include a tendency for the true maximum likelihood (best) tree to also be the shortest tree in terms of branch lengths, a weak tendency for tree-search to recover the best tree, and a tendency for tree-search to encounter fewer local optima in genes that have a high information content. When examining current heuristics for tree-search, we find that nearest-neighbor-interchange performs poorly, and frequently finds trees that are significantly different from the best tree. In contrast, subtree-pruning-and-regrafting tends to perform well, nearly always finding trees that are not significantly different to the best tree. Finally, we demonstrate that the precise implementation of a tree-search strategy, including when and where parameters are optimized, can change the character of tree-search, and that good strategies for tree-search may combine existing tree-search programs.
An efficient and extensible approach for compressing phylogenetic trees

PubMed Central

2011-01-01

Background Biologists require new algorithms to efficiently compress and store their large collections of phylogenetic trees. Our previous work showed that TreeZip is a promising approach for compressing phylogenetic trees. In this paper, we extend our TreeZip algorithm by handling trees with weighted branches. Furthermore, by using the compressed TreeZip file as input, we have designed an extensible decompressor that can extract subcollections of trees, compute majority and strict consensus trees, and merge tree collections using set operations such as union, intersection, and set difference. Results On unweighted phylogenetic trees, TreeZip is able to compress Newick files in excess of 98%. On weighted phylogenetic trees, TreeZip is able to compress a Newick file by at least 73%. TreeZip can be combined with 7zip with little overhead, allowing space savings in excess of 99% (unweighted) and 92%(weighted). Unlike TreeZip, 7zip is not immune to branch rotations, and performs worse as the level of variability in the Newick string representation increases. Finally, since the TreeZip compressed text (TRZ) file contains all the semantic information in a collection of trees, we can easily filter and decompress a subset of trees of interest (such as the set of unique trees), or build the resulting consensus tree in a matter of seconds. We also show the ease of which set operations can be performed on TRZ files, at speeds quicker than those performed on Newick or 7zip compressed Newick files, and without loss of space savings. Conclusions TreeZip is an efficient approach for compressing large collections of phylogenetic trees. The semantic and compact nature of the TRZ file allow it to be operated upon directly and quickly, without a need to decompress the original Newick file. We believe that TreeZip will be vital for compressing and archiving trees in the biological community. PMID:22165819
An efficient and extensible approach for compressing phylogenetic trees.

PubMed

Matthews, Suzanne J; Williams, Tiffani L

2011-10-18

Biologists require new algorithms to efficiently compress and store their large collections of phylogenetic trees. Our previous work showed that TreeZip is a promising approach for compressing phylogenetic trees. In this paper, we extend our TreeZip algorithm by handling trees with weighted branches. Furthermore, by using the compressed TreeZip file as input, we have designed an extensible decompressor that can extract subcollections of trees, compute majority and strict consensus trees, and merge tree collections using set operations such as union, intersection, and set difference. On unweighted phylogenetic trees, TreeZip is able to compress Newick files in excess of 98%. On weighted phylogenetic trees, TreeZip is able to compress a Newick file by at least 73%. TreeZip can be combined with 7zip with little overhead, allowing space savings in excess of 99% (unweighted) and 92%(weighted). Unlike TreeZip, 7zip is not immune to branch rotations, and performs worse as the level of variability in the Newick string representation increases. Finally, since the TreeZip compressed text (TRZ) file contains all the semantic information in a collection of trees, we can easily filter and decompress a subset of trees of interest (such as the set of unique trees), or build the resulting consensus tree in a matter of seconds. We also show the ease of which set operations can be performed on TRZ files, at speeds quicker than those performed on Newick or 7zip compressed Newick files, and without loss of space savings. TreeZip is an efficient approach for compressing large collections of phylogenetic trees. The semantic and compact nature of the TRZ file allow it to be operated upon directly and quickly, without a need to decompress the original Newick file. We believe that TreeZip will be vital for compressing and archiving trees in the biological community.
Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine

NASA Technical Reports Server (NTRS)

Schwabacher, Mark A.; Aguilar, Robert; Figueroa, Fernando F.

2009-01-01

The goal of this work was to use data-driven methods to automatically detect and isolate faults in the J-2X rocket engine. It was decided to use decision trees, since they tend to be easier to interpret than other data-driven methods. The decision tree algorithm automatically "learns" a decision tree by performing a search through the space of possible decision trees to find one that fits the training data. The particular decision tree algorithm used is known as C4.5. Simulated J-2X data from a high-fidelity simulator developed at Pratt & Whitney Rocketdyne and known as the Detailed Real-Time Model (DRTM) was used to "train" and test the decision tree. Fifty-six DRTM simulations were performed for this purpose, with different leak sizes, different leak locations, and different times of leak onset. To make the simulations as realistic as possible, they included simulated sensor noise, and included a gradual degradation in both fuel and oxidizer turbine efficiency. A decision tree was trained using 11 of these simulations, and tested using the remaining 45 simulations. In the training phase, the C4.5 algorithm was provided with labeled examples of data from nominal operation and data including leaks in each leak location. From the data, it "learned" a decision tree that can classify unseen data as having no leak or having a leak in one of the five leak locations. In the test phase, the decision tree produced very low false alarm rates and low missed detection rates on the unseen data. It had very good fault isolation rates for three of the five simulated leak locations, but it tended to confuse the remaining two locations, perhaps because a large leak at one of these two locations can look very similar to a small leak at the other location.
Molecular Infectious Disease Epidemiology: Survival Analysis and Algorithms Linking Phylogenies to Transmission Trees

PubMed Central

Kenah, Eben; Britton, Tom; Halloran, M. Elizabeth; Longini, Ira M.

2016-01-01

Recent work has attempted to use whole-genome sequence data from pathogens to reconstruct the transmission trees linking infectors and infectees in outbreaks. However, transmission trees from one outbreak do not generalize to future outbreaks. Reconstruction of transmission trees is most useful to public health if it leads to generalizable scientific insights about disease transmission. In a survival analysis framework, estimation of transmission parameters is based on sums or averages over the possible transmission trees. A phylogeny can increase the precision of these estimates by providing partial information about who infected whom. The leaves of the phylogeny represent sampled pathogens, which have known hosts. The interior nodes represent common ancestors of sampled pathogens, which have unknown hosts. Starting from assumptions about disease biology and epidemiologic study design, we prove that there is a one-to-one correspondence between the possible assignments of interior node hosts and the transmission trees simultaneously consistent with the phylogeny and the epidemiologic data on person, place, and time. We develop algorithms to enumerate these transmission trees and show these can be used to calculate likelihoods that incorporate both epidemiologic data and a phylogeny. A simulation study confirms that this leads to more efficient estimates of hazard ratios for infectiousness and baseline hazards of infectious contact, and we use these methods to analyze data from a foot-and-mouth disease virus outbreak in the United Kingdom in 2001. These results demonstrate the importance of data on individuals who escape infection, which is often overlooked. The combination of survival analysis and algorithms linking phylogenies to transmission trees is a rigorous but flexible statistical foundation for molecular infectious disease epidemiology. PMID:27070316
Classifying Volcanic Activity Using an Empirical Decision Making Algorithm

NASA Astrophysics Data System (ADS)

Junek, W. N.; Jones, W. L.; Woods, M. T.

2012-12-01

Detection and classification of developing volcanic activity is vital to eruption forecasting. Timely information regarding an impending eruption would aid civil authorities in determining the proper response to a developing crisis. In this presentation, volcanic activity is characterized using an event tree classifier and a suite of empirical statistical models derived through logistic regression. Forecasts are reported in terms of the United States Geological Survey (USGS) volcano alert level system. The algorithm employs multidisciplinary data (e.g., seismic, GPS, InSAR) acquired by various volcano monitoring systems and source modeling information to forecast the likelihood that an eruption, with a volcanic explosivity index (VEI) > 1, will occur within a quantitatively constrained area. Logistic models are constructed from a sparse and geographically diverse dataset assembled from a collection of historic volcanic unrest episodes. Bootstrapping techniques are applied to the training data to allow for the estimation of robust logistic model coefficients. Cross validation produced a series of receiver operating characteristic (ROC) curves with areas ranging between 0.78-0.81, which indicates the algorithm has good predictive capabilities. The ROC curves also allowed for the determination of a false positive rate and optimum detection for each stage of the algorithm. Forecasts for historic volcanic unrest episodes in North America and Iceland were computed and are consistent with the actual outcome of the events.
Trees, bialgebras and intrinsic numerical algorithms

NASA Technical Reports Server (NTRS)

Crouch, Peter; Grossman, Robert; Larson, Richard

1990-01-01

Preliminary work about intrinsic numerical integrators evolving on groups is described. Fix a finite dimensional Lie group G; let g denote its Lie algebra, and let Y(sub 1),...,Y(sub N) denote a basis of g. A class of numerical algorithms is presented that approximate solutions to differential equations evolving on G of the form: dot-x(t) = F(x(t)), x(0) = p is an element of G. The algorithms depend upon constants c(sub i) and c(sub ij), for i = 1,...,k and j is less than i. The algorithms have the property that if the algorithm starts on the group, then it remains on the group. In addition, they also have the property that if G is the abelian group R(N), then the algorithm becomes the classical Runge-Kutta algorithm. The Cayley algebra generated by labeled, ordered trees is used to generate the equations that the coefficients c(sub i) and c(sub ij) must satisfy in order for the algorithm to yield an rth order numerical integrator and to analyze the resulting algorithms.
Graphical models for optimal power flow

DOE PAGES

Dvijotham, Krishnamurthy; Chertkov, Michael; Van Hentenryck, Pascal; ...

2016-09-13

Optimal power flow (OPF) is the central optimization problem in electric power grids. Although solved routinely in the course of power grid operations, it is known to be strongly NP-hard in general, and weakly NP-hard over tree networks. In this paper, we formulate the optimal power flow problem over tree networks as an inference problem over a tree-structured graphical model where the nodal variables are low-dimensional vectors. We adapt the standard dynamic programming algorithm for inference over a tree-structured graphical model to the OPF problem. Combining this with an interval discretization of the nodal variables, we develop an approximation algorithmmore » for the OPF problem. Further, we use techniques from constraint programming (CP) to perform interval computations and adaptive bound propagation to obtain practically efficient algorithms. Compared to previous algorithms that solve OPF with optimality guarantees using convex relaxations, our approach is able to work for arbitrary tree-structured distribution networks and handle mixed-integer optimization problems. Further, it can be implemented in a distributed message-passing fashion that is scalable and is suitable for “smart grid” applications like control of distributed energy resources. In conclusion, numerical evaluations on several benchmark networks show that practical OPF problems can be solved effectively using this approach.« less
The Refinement-Tree Partition for Parallel Solution of Partial Differential Equations

PubMed Central

Mitchell, William F.

1998-01-01

Dynamic load balancing is considered in the context of adaptive multilevel methods for partial differential equations on distributed memory multiprocessors. An approach that periodically repartitions the grid is taken. The important properties of a partitioning algorithm are presented and discussed in this context. A partitioning algorithm based on the refinement tree of the adaptive grid is presented and analyzed in terms of these properties. Theoretical and numerical results are given. PMID:28009355
The Refinement-Tree Partition for Parallel Solution of Partial Differential Equations.

PubMed

Mitchell, William F

1998-01-01

Dynamic load balancing is considered in the context of adaptive multilevel methods for partial differential equations on distributed memory multiprocessors. An approach that periodically repartitions the grid is taken. The important properties of a partitioning algorithm are presented and discussed in this context. A partitioning algorithm based on the refinement tree of the adaptive grid is presented and analyzed in terms of these properties. Theoretical and numerical results are given.
Multi-terminal pipe routing by Steiner minimal tree and particle swarm optimisation

NASA Astrophysics Data System (ADS)

Liu, Qiang; Wang, Chengen

2012-08-01

Computer-aided design of pipe routing is of fundamental importance for complex equipments' developments. In this article, non-rectilinear branch pipe routing with multiple terminals that can be formulated as a Euclidean Steiner Minimal Tree with Obstacles (ESMTO) problem is studied in the context of an aeroengine-integrated design engineering. Unlike the traditional methods that connect pipe terminals sequentially, this article presents a new branch pipe routing algorithm based on the Steiner tree theory. The article begins with a new algorithm for solving the ESMTO problem by using particle swarm optimisation (PSO), and then extends the method to the surface cases by using geodesics to meet the requirements of routing non-rectilinear pipes on the surfaces of aeroengines. Subsequently, the adaptive region strategy and the basic visibility graph method are adopted to increase the computation efficiency. Numeral computations show that the proposed routing algorithm can find satisfactory routing layouts while running in polynomial time.
The explicit computation of integration algorithms and first integrals for ordinary differential equations with polynomials coefficients using trees

NASA Technical Reports Server (NTRS)

Crouch, P. E.; Grossman, Robert

1992-01-01

This note is concerned with the explicit symbolic computation of expressions involving differential operators and their actions on functions. The derivation of specialized numerical algorithms, the explicit symbolic computation of integrals of motion, and the explicit computation of normal forms for nonlinear systems all require such computations. More precisely, if R = k(x(sub 1),...,x(sub N)), where k = R or C, F denotes a differential operator with coefficients from R, and g member of R, we describe data structures and algorithms for efficiently computing g. The basic idea is to impose a multiplicative structure on the vector space with basis the set of finite rooted trees and whose nodes are labeled with the coefficients of the differential operators. Cancellations of two trees with r + 1 nodes translates into cancellation of O(N(exp r)) expressions involving the coefficient functions and their derivatives.
Performance of a cavity-method-based algorithm for the prize-collecting Steiner tree problem on graphs

NASA Astrophysics Data System (ADS)

Biazzo, Indaco; Braunstein, Alfredo; Zecchina, Riccardo

2012-08-01

We study the behavior of an algorithm derived from the cavity method for the prize-collecting steiner tree (PCST) problem on graphs. The algorithm is based on the zero temperature limit of the cavity equations and as such is formally simple (a fixed point equation resolved by iteration) and distributed (parallelizable). We provide a detailed comparison with state-of-the-art algorithms on a wide range of existing benchmarks, networks, and random graphs. Specifically, we consider an enhanced derivative of the Goemans-Williamson heuristics and the dhea solver, a branch and cut integer linear programming based approach. The comparison shows that the cavity algorithm outperforms the two algorithms in most large instances both in running time and quality of the solution. Finally we prove a few optimality properties of the solutions provided by our algorithm, including optimality under the two postprocessing procedures defined in the Goemans-Williamson derivative and global optimality in some limit cases.

A physarum-inspired prize-collecting steiner tree approach to identify subnetworks for drug repositioning.

PubMed

Sun, Yahui; Hameed, Pathima Nusrath; Verspoor, Karin; Halgamuge, Saman

2016-12-05

Drug repositioning can reduce the time, costs and risks of drug development by identifying new therapeutic effects for known drugs. It is challenging to reposition drugs as pharmacological data is large and complex. Subnetwork identification has already been used to simplify the visualization and interpretation of biological data, but it has not been applied to drug repositioning so far. In this paper, we fill this gap by proposing a new Physarum-inspired Prize-Collecting Steiner Tree algorithm to identify subnetworks for drug repositioning. Drug Similarity Networks (DSN) are generated using the chemical, therapeutic, protein, and phenotype features of drugs. In DSNs, vertex prizes and edge costs represent the similarities and dissimilarities between drugs respectively, and terminals represent drugs in the cardiovascular class, as defined in the Anatomical Therapeutic Chemical classification system. A new Physarum-inspired Prize-Collecting Steiner Tree algorithm is proposed in this paper to identify subnetworks. We apply both the proposed algorithm and the widely-used GW algorithm to identify subnetworks in our 18 generated DSNs. In these DSNs, our proposed algorithm identifies subnetworks with an average Rand Index of 81.1%, while the GW algorithm can only identify subnetworks with an average Rand Index of 64.1%. We select 9 subnetworks with high Rand Index to find drug repositioning opportunities. 10 frequently occurring drugs in these subnetworks are identified as candidates to be repositioned for cardiovascular diseases. We find evidence to support previous discoveries that nitroglycerin, theophylline and acarbose may be able to be repositioned for cardiovascular diseases. Moreover, we identify seven previously unknown drug candidates that also may interact with the biological cardiovascular system. These discoveries show our proposed Prize-Collecting Steiner Tree approach as a promising strategy for drug repositioning.
Automatic Inference of Cryptographic Key Length Based on Analysis of Proof Tightness

DTIC Science & Technology

2016-06-01

within an attack tree structure, then expand attack tree methodology to include cryptographic reductions. We then provide the algorithms for...maintaining and automatically reasoning about these expanded attack trees . We provide a software tool that utilizes machine-readable proof and attack metadata...and the attack tree methodology to provide rapid and precise answers regarding security parameters and effective security. This eliminates the need
Layer stacking: A novel algorithm for individual forest tree segmentation from LiDAR point clouds

Treesearch

Elias Ayrey; Shawn Fraver; John A. Kershaw; Laura S. Kenefic; Daniel Hayes; Aaron R. Weiskittel; Brian E. Roth

2017-01-01

As light detection and ranging (LiDAR) technology advances, it has become common for datasets to be acquired at a point density high enough to capture structural information from individual trees. To process these data, an automatic method of isolating individual trees from a LiDAR point cloud is required. Traditional methods for segmenting trees attempt to isolate...
Effects of Phylogenetic Tree Style on Student Comprehension

NASA Astrophysics Data System (ADS)

Dees, Jonathan Andrew

Phylogenetic trees are powerful tools of evolutionary biology that have become prominent across the life sciences. Consequently, learning to interpret and reason from phylogenetic trees is now an essential component of biology education. However, students often struggle to understand these diagrams, even after explicit instruction. One factor that has been observed to affect student understanding of phylogenetic trees is style (i.e., diagonal or bracket). The goal of this dissertation research was to systematically explore effects of style on student interpretations and construction of phylogenetic trees in the context of an introductory biology course. Before instruction, students were significantly more accurate with bracket phylogenetic trees for a variety of interpretation and construction tasks. Explicit instruction that balanced the use of diagonal and bracket phylogenetic trees mitigated some, but not all, style effects. After instruction, students were significantly more accurate for interpretation tasks involving taxa relatedness and construction exercises when using the bracket style. Based on this dissertation research and prior studies on style effects, I advocate for introductory biology instructors to use only the bracket style. Future research should examine causes of style effects and variables other than style to inform the development of research-based instruction that best supports student understanding of phylogenetic trees.
IND - THE IND DECISION TREE PACKAGE

NASA Technical Reports Server (NTRS)

Buntine, W.

1994-01-01

A common approach to supervised classification and prediction in artificial intelligence and statistical pattern recognition is the use of decision trees. A tree is "grown" from data using a recursive partitioning algorithm to create a tree which has good prediction of classes on new data. Standard algorithms are CART (by Breiman Friedman, Olshen and Stone) and ID3 and its successor C4 (by Quinlan). As well as reimplementing parts of these algorithms and offering experimental control suites, IND also introduces Bayesian and MML methods and more sophisticated search in growing trees. These produce more accurate class probability estimates that are important in applications like diagnosis. IND is applicable to most data sets consisting of independent instances, each described by a fixed length vector of attribute values. An attribute value may be a number, one of a set of attribute specific symbols, or it may be omitted. One of the attributes is delegated the "target" and IND grows trees to predict the target. Prediction can then be done on new data or the decision tree printed out for inspection. IND provides a range of features and styles with convenience for the casual user as well as fine-tuning for the advanced user or those interested in research. IND can be operated in a CART-like mode (but without regression trees, surrogate splits or multivariate splits), and in a mode like the early version of C4. Advanced features allow more extensive search, interactive control and display of tree growing, and Bayesian and MML algorithms for tree pruning and smoothing. These often produce more accurate class probability estimates at the leaves. IND also comes with a comprehensive experimental control suite. IND consists of four basic kinds of routines: data manipulation routines, tree generation routines, tree testing routines, and tree display routines. The data manipulation routines are used to partition a single large data set into smaller training and test sets. The generation routines are used to build classifiers. The test routines are used to evaluate classifiers and to classify data using a classifier. And the display routines are used to display classifiers in various formats. IND is written in C-language for Sun4 series computers. It consists of several programs with controlling shell scripts. Extensive UNIX man entries are included. IND is designed to be used on any UNIX system, although it has only been thoroughly tested on SUN platforms. The standard distribution medium for IND is a .25 inch streaming magnetic tape cartridge in UNIX tar format. An electronic copy of the documentation in PostScript format is included on the distribution medium. IND was developed in 1992.
On Determining if Tree-based Networks Contain Fixed Trees.

PubMed

Anaya, Maria; Anipchenko-Ulaj, Olga; Ashfaq, Aisha; Chiu, Joyce; Kaiser, Mahedi; Ohsawa, Max Shoji; Owen, Megan; Pavlechko, Ella; St John, Katherine; Suleria, Shivam; Thompson, Keith; Yap, Corrine

2016-05-01

We address an open question of Francis and Steel about phylogenetic networks and trees. They give a polynomial time algorithm to decide if a phylogenetic network, N, is tree-based and pose the problem: given a fixed tree T and network N, is N based on T? We show that it is [Formula: see text]-hard to decide, by reduction from 3-Dimensional Matching (3DM) and further that the problem is fixed-parameter tractable.
Parallel Continuous Flow: A Parallel Suffix Tree Construction Tool for Whole Genomes

PubMed Central

Farreras, Montse

2014-01-01

Abstract The construction of suffix trees for very long sequences is essential for many applications, and it plays a central role in the bioinformatic domain. With the advent of modern sequencing technologies, biological sequence databases have grown dramatically. Also the methodologies required to analyze these data have become more complex everyday, requiring fast queries to multiple genomes. In this article, we present parallel continuous flow (PCF), a parallel suffix tree construction method that is suitable for very long genomes. We tested our method for the suffix tree construction of the entire human genome, about 3GB. We showed that PCF can scale gracefully as the size of the input genome grows. Our method can work with an efficiency of 90% with 36 processors and 55% with 172 processors. We can index the human genome in 7 minutes using 172 processes. PMID:24597675
Man-made objects cuing in satellite imagery

DOE Office of Scientific and Technical Information (OSTI.GOV)

Skurikhin, Alexei N

2009-01-01

We present a multi-scale framework for man-made structures cuing in satellite image regions. The approach is based on a hierarchical image segmentation followed by structural analysis. A hierarchical segmentation produces an image pyramid that contains a stack of irregular image partitions, represented as polygonized pixel patches, of successively reduced levels of detail (LOOs). We are jumping off from the over-segmented image represented by polygons attributed with spectral and texture information. The image is represented as a proximity graph with vertices corresponding to the polygons and edges reflecting polygon relations. This is followed by the iterative graph contraction based on Boruvka'smore » Minimum Spanning Tree (MST) construction algorithm. The graph contractions merge the patches based on their pairwise spectral and texture differences. Concurrently with the construction of the irregular image pyramid, structural analysis is done on the agglomerated patches. Man-made object cuing is based on the analysis of shape properties of the constructed patches and their spatial relations. The presented framework can be used as pre-scanning tool for wide area monitoring to quickly guide the further analysis to regions of interest.« less
Towards the prediction of essential genes by integration of network topology, cellular localization and biological process information

PubMed Central

2009-01-01

Background The identification of essential genes is important for the understanding of the minimal requirements for cellular life and for practical purposes, such as drug design. However, the experimental techniques for essential genes discovery are labor-intensive and time-consuming. Considering these experimental constraints, a computational approach capable of accurately predicting essential genes would be of great value. We therefore present here a machine learning-based computational approach relying on network topological features, cellular localization and biological process information for prediction of essential genes. Results We constructed a decision tree-based meta-classifier and trained it on datasets with individual and grouped attributes-network topological features, cellular compartments and biological processes-to generate various predictors of essential genes. We showed that the predictors with better performances are those generated by datasets with integrated attributes. Using the predictor with all attributes, i.e., network topological features, cellular compartments and biological processes, we obtained the best predictor of essential genes that was then used to classify yeast genes with unknown essentiality status. Finally, we generated decision trees by training the J48 algorithm on datasets with all network topological features, cellular localization and biological process information to discover cellular rules for essentiality. We found that the number of protein physical interactions, the nuclear localization of proteins and the number of regulating transcription factors are the most important factors determining gene essentiality. Conclusion We were able to demonstrate that network topological features, cellular localization and biological process information are reliable predictors of essential genes. Moreover, by constructing decision trees based on these data, we could discover cellular rules governing essentiality. PMID:19758426
Lung tumor diagnosis and subtype discovery by gene expression profiling.

PubMed

Wang, Lu-yong; Tu, Zhuowen

2006-01-01

The optimal treatment of patients with complex diseases, such as cancers, depends on the accurate diagnosis by using a combination of clinical and histopathological data. In many scenarios, it becomes tremendously difficult because of the limitations in clinical presentation and histopathology. To accurate diagnose complex diseases, the molecular classification based on gene or protein expression profiles are indispensable for modern medicine. Moreover, many heterogeneous diseases consist of various potential subtypes in molecular basis and differ remarkably in their response to therapies. It is critical to accurate predict subgroup on disease gene expression profiles. More fundamental knowledge of the molecular basis and classification of disease could aid in the prediction of patient outcome, the informed selection of therapies, and identification of novel molecular targets for therapy. In this paper, we propose a new disease diagnostic method, probabilistic boosting tree (PB tree) method, on gene expression profiles of lung tumors. It enables accurate disease classification and subtype discovery in disease. It automatically constructs a tree in which each node combines a number of weak classifiers into a strong classifier. Also, subtype discovery is naturally embedded in the learning process. Our algorithm achieves excellent diagnostic performance, and meanwhile it is capable of detecting the disease subtype based on gene expression profile.
A Knowledge Discovery Approach to Diagnosing Intracranial Hematomas on Brain CT: Recognition, Measurement and Classification

NASA Astrophysics Data System (ADS)

Liao, Chun-Chih; Xiao, Furen; Wong, Jau-Min; Chiang, I.-Jen

Computed tomography (CT) of the brain is preferred study on neurological emergencies. Physicians use CT to diagnose various types of intracranial hematomas, including epidural, subdural and intracerebral hematomas according to their locations and shapes. We propose a novel method that can automatically diagnose intracranial hematomas by combining machine vision and knowledge discovery techniques. The skull on the CT slice is located and the depth of each intracranial pixel is labeled. After normalization of the pixel intensities by their depth, the hyperdense area of intracranial hematoma is segmented with multi-resolution thresholding and region-growing. We then apply C4.5 algorithm to construct a decision tree using the features of the segmented hematoma and the diagnoses made by physicians. The algorithm was evaluated on 48 pathological images treated in a single institute. The two discovered rules closely resemble those used by human experts, and are able to make correct diagnoses in all cases.
A Petri net synthesis theory for modeling flexible manufacturing systems.

PubMed

Jeng, M D

1997-01-01

A theory that synthesizes Petri nets for modeling flexible manufacturing systems is presented. The theory adopts a bottom-up or modular-composition approach to construct net models. Each module is modeled as a resource control net (RCN), which represents a subsystem that controls a resource type in a flexible manufacturing system. Interactions among the modules are described as the common transition and transition subnets. The net obtained by merging the modules with two minimal restrictions is shown to be conservative and thus bounded. An algorithm is developed to detect two sufficient conditions for structural liveness of the net. The algorithm examines only the net's structure and the initial marking, and appears to be more efficient than state enumeration techniques such as the reachability tree method. In this paper, the sufficient conditions for liveness are shown to be related to some structural objects called siphons. To demonstrate the applicability of the theory, a flexible manufacturing system of a moderate size is modeled and analyzed using the proposed theory.
Graphical fault tree analysis for fatal falls in the construction industry.

PubMed

Chi, Chia-Fen; Lin, Syuan-Zih; Dewi, Ratna Sari

2014-11-01

The current study applied a fault tree analysis to represent the causal relationships among events and causes that contributed to fatal falls in the construction industry. Four hundred and eleven work-related fatalities in the Taiwanese construction industry were analyzed in terms of age, gender, experience, falling site, falling height, company size, and the causes for each fatality. Given that most fatal accidents involve multiple events, the current study coded up to a maximum of three causes for each fall fatality. After the Boolean algebra and minimal cut set analyses, accident causes associated with each falling site can be presented as a fault tree to provide an overview of the basic causes, which could trigger fall fatalities in the construction industry. Graphical icons were designed for each falling site along with the associated accident causes to illustrate the fault tree in a graphical manner. A graphical fault tree can improve inter-disciplinary discussion of risk management and the communication of accident causation to first line supervisors. Copyright © 2014 Elsevier Ltd. All rights reserved.
TREEGRAD: a grading program for eastern hardwoods

Treesearch

J.W. Stringer; D.W. Cremeans

1991-01-01

Assigning tree grades to eastern hardwoods is often a difficult task for neophyte graders. Recently several "dichotomous keys" have been developed for training graders in the USFS hardwood tree grading system. TREEGRAD uses the Tree Grading Algorithm (TGA) for determining grades from defect location data and is designed to be used as a teaching aid.
Key algorithms used in GR02: A computer simulation model for predicting tree and stand growth

Treesearch

Garrett A. Hughes; Paul E. Sendak; Paul E. Sendak

1985-01-01

GR02 is an individual tree, distance-independent simulation model for predicting tree and stand growth over time. It performs five major functions during each run: (1) updates diameter at breast height, (2) updates total height, (3) estimates mortality, (4) determines regeneration, and (5) updates crown class.
Using decision trees to characterize verbal communication during change and stuck episodes in the therapeutic process

PubMed Central

Masías, Víctor H.; Krause, Mariane; Valdés, Nelson; Pérez, J. C.; Laengle, Sigifredo

2015-01-01

Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBTree, and REPTree) are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice. PMID:25914657
Using decision trees to characterize verbal communication during change and stuck episodes in the therapeutic process.

PubMed

Masías, Víctor H; Krause, Mariane; Valdés, Nelson; Pérez, J C; Laengle, Sigifredo

2015-01-01

Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBTree, and REPTree) are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice.
Treelink: data integration, clustering and visualization of phylogenetic trees.

PubMed

Allende, Christian; Sohn, Erik; Little, Cedric

2015-12-29

Phylogenetic trees are central to a wide range of biological studies. In many of these studies, tree nodes need to be associated with a variety of attributes. For example, in studies concerned with viral relationships, tree nodes are associated with epidemiological information, such as location, age and subtype. Gene trees used in comparative genomics are usually linked with taxonomic information, such as functional annotations and events. A wide variety of tree visualization and annotation tools have been developed in the past, however none of them are intended for an integrative and comparative analysis. Treelink is a platform-independent software for linking datasets and sequence files to phylogenetic trees. The application allows an automated integration of datasets to trees for operations such as classifying a tree based on a field or showing the distribution of selected data attributes in branches and leafs. Genomic and proteonomic sequences can also be linked to the tree and extracted from internal and external nodes. A novel clustering algorithm to simplify trees and display the most divergent clades was also developed, where validation can be achieved using the data integration and classification function. Integrated geographical information allows ancestral character reconstruction for phylogeographic plotting based on parsimony and likelihood algorithms. Our software can successfully integrate phylogenetic trees with different data sources, and perform operations to differentiate and visualize those differences within a tree. File support includes the most popular formats such as newick and csv. Exporting visualizations as images, cluster outputs and genomic sequences is supported. Treelink is available as a web and desktop application at http://www.treelinkapp.com .
M-AMST: an automatic 3D neuron tracing method based on mean shift and adapted minimum spanning tree.

PubMed

Wan, Zhijiang; He, Yishan; Hao, Ming; Yang, Jian; Zhong, Ning

2017-03-29

Understanding the working mechanism of the brain is one of the grandest challenges for modern science. Toward this end, the BigNeuron project was launched to gather a worldwide community to establish a big data resource and a set of the state-of-the-art of single neuron reconstruction algorithms. Many groups contributed their own algorithms for the project, including our mean shift and minimum spanning tree (M-MST). Although M-MST is intuitive and easy to implement, the MST just considers spatial information of single neuron and ignores the shape information, which might lead to less precise connections between some neuron segments. In this paper, we propose an improved algorithm, namely M-AMST, in which a rotating sphere model based on coordinate transformation is used to improve the weight calculation method in M-MST. Two experiments are designed to illustrate the effect of adapted minimum spanning tree algorithm and the adoptability of M-AMST in reconstructing variety of neuron image datasets respectively. In the experiment 1, taking the reconstruction of APP2 as reference, we produce the four difference scores (entire structure average (ESA), different structure average (DSA), percentage of different structure (PDS) and max distance of neurons' nodes (MDNN)) by comparing the neuron reconstruction of the APP2 and the other 5 competing algorithm. The result shows that M-AMST gets lower difference scores than M-MST in ESA, PDS and MDNN. Meanwhile, M-AMST is better than N-MST in ESA and MDNN. It indicates that utilizing the adapted minimum spanning tree algorithm which took the shape information of neuron into account can achieve better neuron reconstructions. In the experiment 2, 7 neuron image datasets are reconstructed and the four difference scores are calculated by comparing the gold standard reconstruction and the reconstructions produced by 6 competing algorithms. Comparing the four difference scores of M-AMST and the other 5 algorithm, we can conclude that M-AMST is able to achieve the best difference score in 3 datasets and get the second-best difference score in the other 2 datasets. We develop a pathway extraction method using a rotating sphere model based on coordinate transformation to improve the weight calculation approach in MST. The experimental results show that M-AMST utilizes the adapted minimum spanning tree algorithm which takes the shape information of neuron into account can achieve better neuron reconstructions. Moreover, M-AMST is able to get good neuron reconstruction in variety of image datasets.
Evaluation of metabolites extraction strategies for identifying different brain regions and their relationship with alcohol preference and gender difference using NMR metabolomics.

PubMed

Wang, Jie; Zeng, Hao-Long; Du, Hongying; Liu, Zeyuan; Cheng, Ji; Liu, Taotao; Hu, Ting; Kamal, Ghulam Mustafa; Li, Xihai; Liu, Huili; Xu, Fuqiang

2018-03-01

Metabolomics generate a profile of small molecules from cellular/tissue metabolism, which could directly reflect the mechanisms of complex networks of biochemical reactions. Traditional metabolomics methods, such as OPLS-DA, PLS-DA are mainly used for binary class discrimination. Multiple groups are always involved in the biological system, especially for brain research. Multiple brain regions are involved in the neuronal study of brain metabolic dysfunctions such as alcoholism, Alzheimer's disease, etc. In the current study, 10 different brain regions were utilized for comparative studies between alcohol preferring and non-preferring rats, male and female rats respectively. As many classes are involved (ten different regions and four types of animals), traditional metabolomics methods are no longer efficient for showing differentiation. Here, a novel strategy based on the decision tree algorithm was employed for successfully constructing different classification models to screen out the major characteristics of ten brain regions at the same time. Subsequently, this method was also utilized to select the major effective brain regions related to alcohol preference and gender difference. Compared with the traditional multivariate statistical methods, the decision tree could construct acceptable and understandable classification models for multi-class data analysis. Therefore, the current technology could also be applied to other general metabolomics studies involving multi class data. Copyright © 2017 Elsevier B.V. All rights reserved.

Application of the Feynman-tree theorem together with BCFW recursion relations

NASA Astrophysics Data System (ADS)

Maniatis, M.

2018-03-01

Recently, it has been shown that on-shell scattering amplitudes can be constructed by the Feynman-tree theorem combined with the BCFW recursion relations. Since the BCFW relations are restricted to tree diagrams, the preceding application of the Feynman-tree theorem is essential. In this way, amplitudes can be constructed by on-shell and gauge-invariant tree amplitudes. Here, we want to apply this method to the electron-photon vertex correction. We present all the single, double, and triple phase-space tensor integrals explicitly and show that the sum of amplitudes coincides with the result of the conventional calculation of a virtual loop correction.
Efficient algorithms for dilated mappings of binary trees

NASA Technical Reports Server (NTRS)

Iqbal, M. Ashraf

1990-01-01

The problem is addressed to find a 1-1 mapping of the vertices of a binary tree onto those of a target binary tree such that the son of a node on the first binary tree is mapped onto a descendent of the image of that node in the second binary tree. There are two natural measures of the cost of this mapping, namely the dilation cost, i.e., the maximum distance in the target binary tree between the images of vertices that are adjacent in the original tree. The other measure, expansion cost, is defined as the number of extra nodes/edges to be added to the target binary tree in order to ensure a 1-1 mapping. An efficient algorithm to find a mapping of one binary tree onto another is described. It is shown that it is possible to minimize one cost of mapping at the expense of the other. This problem arises when designing pipelined arithmetic logic units (ALU) for special purpose computers. The pipeline is composed of ALU chips connected in the form of a binary tree. The operands to the pipeline can be supplied to the leaf nodes of the binary tree which then process and pass the results up to their parents. The final result is available at the root. As each new application may require a distinct nesting of operations, it is useful to be able to find a good mapping of a new binary tree over existing ALU tree. Another problem arises if every distinct required binary tree is known beforehand. Here it is useful to hardwire the pipeline in the form of a minimal supertree that contains all required binary trees.
Algorithmic Complexity. Volume II.

DTIC Science & Technology

1982-06-01

digital computers, this improvement will go unnoticed if only a few complex products are to be taken, however it can become increasingly important as...computed in the reverse order. If the products are formed moving from the top of the tree downward, and then the divisions are performed going from the...the reverse order, going up the tree. (r- a mod m means that r is the remainder when a is divided by M.) The overall running time of the algorithm is
Orthology and paralogy constraints: satisfiability and consistency.

PubMed

Lafond, Manuel; El-Mabrouk, Nadia

2014-01-01

A variety of methods based on sequence similarity, reconciliation, synteny or functional characteristics, can be used to infer orthology and paralogy relations between genes of a given gene family G. But is a given set C of orthology/paralogy constraints possible, i.e., can they simultaneously co-exist in an evolutionary history for G? While previous studies have focused on full sets of constraints, here we consider the general case where C does not necessarily involve a constraint for each pair of genes. The problem is subdivided in two parts: (1) Is C satisfiable, i.e. can we find an event-labeled gene tree G inducing C? (2) Is there such a G which is consistent, i.e., such that all displayed triplet phylogenies are included in a species tree? Previous results on the Graph sandwich problem can be used to answer to (1), and we provide polynomial-time algorithms for satisfiability and consistency with a given species tree. We also describe a new polynomial-time algorithm for the case of consistency with an unknown species tree and full knowledge of pairwise orthology/paralogy relationships, as well as a branch-and-bound algorithm in the case when unknown relations are present. We show that our algorithms can be used in combination with ProteinOrtho, a sequence similarity-based orthology detection tool, to extract a set of robust orthology/paralogy relationships.
Orthology and paralogy constraints: satisfiability and consistency

PubMed Central

2014-01-01

Background A variety of methods based on sequence similarity, reconciliation, synteny or functional characteristics, can be used to infer orthology and paralogy relations between genes of a given gene family G. But is a given set C of orthology/paralogy constraints possible, i.e., can they simultaneously co-exist in an evolutionary history for G? While previous studies have focused on full sets of constraints, here we consider the general case where C does not necessarily involve a constraint for each pair of genes. The problem is subdivided in two parts: (1) Is C satisfiable, i.e. can we find an event-labeled gene tree G inducing C? (2) Is there such a G which is consistent, i.e., such that all displayed triplet phylogenies are included in a species tree? Results Previous results on the Graph sandwich problem can be used to answer to (1), and we provide polynomial-time algorithms for satisfiability and consistency with a given species tree. We also describe a new polynomial-time algorithm for the case of consistency with an unknown species tree and full knowledge of pairwise orthology/paralogy relationships, as well as a branch-and-bound algorithm in the case when unknown relations are present. We show that our algorithms can be used in combination with ProteinOrtho, a sequence similarity-based orthology detection tool, to extract a set of robust orthology/paralogy relationships. PMID:25572629
Nine Hundred Years of Weekly Streamflows: Stochastic Downscaling of Ensemble Tree-Ring Reconstructions

NASA Astrophysics Data System (ADS)

Sauchyn, David; Ilich, Nesa

2017-11-01

We combined the methods and advantages of stochastic hydrology and paleohydrology to estimate 900 years of weekly flows for the North and South Saskatchewan Rivers at Edmonton and Medicine Hat, Alberta, respectively. Regression models of water-year streamflow were constructed using historical naturalized flow data and a pool of 196 tree-ring (earlywood, latewood, and annual) ring-width chronologies from 76 sites. The tree-ring models accounted for up to 80% of the interannual variability in historical naturalized flows. We developed a new algorithm for generating stochastic time series of weekly flows constrained by the statistical properties of both the historical record and proxy streamflow data, and by the necessary condition that weekly flows correlate between the end of a year and the start of the next. A second innovation, enabled by the density of our tree-ring network, is to derive the paleohydrology from an ensemble of 100 statistically significant reconstructions at each gauge. Using paleoclimatic data to generate long series of weekly flow estimates augments the short historical record with an expanded range of hydrologic variability, including sequences of wet and dry years of greater length and severity. This unique hydrometric time series will enable evaluation of the reliability of current water supply and management systems given the range of hydroclimatic variability and extremes contained in the stochastic paleohydrology. It also could inform evaluation of the uncertainty in climate model projections, given that internal hydroclimatic variability is the dominant source of uncertainty.
Mirroring co-evolving trees in the light of their topologies.

PubMed

Hajirasouliha, Iman; Schönhuth, Alexander; de Juan, David; Valencia, Alfonso; Sahinalp, S Cenk

2012-05-01

Determining the interaction partners among protein/domain families poses hard computational problems, in particular in the presence of paralogous proteins. Available approaches aim to identify interaction partners among protein/domain families through maximizing the similarity between trimmed versions of their phylogenetic trees. Since maximization of any natural similarity score is computationally difficult, many approaches employ heuristics to evaluate the distance matrices corresponding to the tree topologies in question. In this article, we devise an efficient deterministic algorithm which directly maximizes the similarity between two leaf labeled trees with edge lengths, obtaining a score-optimal alignment of the two trees in question. Our algorithm is significantly faster than those methods based on distance matrix comparison: 1 min on a single processor versus 730 h on a supercomputer. Furthermore, we outperform the current state-of-the-art exhaustive search approach in terms of precision, while incurring acceptable losses in recall. A C implementation of the method demonstrated in this article is available at http://compbio.cs.sfu.ca/mirrort.htm
An efficient 3D R-tree spatial index method for virtual geographic environments

NASA Astrophysics Data System (ADS)

Zhu, Qing; Gong, Jun; Zhang, Yeting

A three-dimensional (3D) spatial index is required for real time applications of integrated organization and management in virtual geographic environments of above ground, underground, indoor and outdoor objects. Being one of the most promising methods, the R-tree spatial index has been paid increasing attention in 3D geospatial database management. Since the existing R-tree methods are usually limited by their weakness of low efficiency, due to the critical overlap of sibling nodes and the uneven size of nodes, this paper introduces the k-means clustering method and employs the 3D overlap volume, 3D coverage volume and the minimum bounding box shape value of nodes as the integrative grouping criteria. A new spatial cluster grouping algorithm and R-tree insertion algorithm is then proposed. Experimental analysis on comparative performance of spatial indexing shows that by the new method the overlap of R-tree sibling nodes is minimized drastically and a balance in the volumes of the nodes is maintained.
Personalized Risk Prediction in Clinical Oncology Research: Applications and Practical Issues Using Survival Trees and Random Forests.

PubMed

Hu, Chen; Steingrimsson, Jon Arni

2018-01-01

A crucial component of making individualized treatment decisions is to accurately predict each patient's disease risk. In clinical oncology, disease risks are often measured through time-to-event data, such as overall survival and progression/recurrence-free survival, and are often subject to censoring. Risk prediction models based on recursive partitioning methods are becoming increasingly popular largely due to their ability to handle nonlinear relationships, higher-order interactions, and/or high-dimensional covariates. The most popular recursive partitioning methods are versions of the Classification and Regression Tree (CART) algorithm, which builds a simple interpretable tree structured model. With the aim of increasing prediction accuracy, the random forest algorithm averages multiple CART trees, creating a flexible risk prediction model. Risk prediction models used in clinical oncology commonly use both traditional demographic and tumor pathological factors as well as high-dimensional genetic markers and treatment parameters from multimodality treatments. In this article, we describe the most commonly used extensions of the CART and random forest algorithms to right-censored outcomes. We focus on how they differ from the methods for noncensored outcomes, and how the different splitting rules and methods for cost-complexity pruning impact these algorithms. We demonstrate these algorithms by analyzing a randomized Phase III clinical trial of breast cancer. We also conduct Monte Carlo simulations to compare the prediction accuracy of survival forests with more commonly used regression models under various scenarios. These simulation studies aim to evaluate how sensitive the prediction accuracy is to the underlying model specifications, the choice of tuning parameters, and the degrees of missing covariates.
On the Complexity of the Metric TSP under Stability Considerations

NASA Astrophysics Data System (ADS)

Mihalák, Matúš; Schöngens, Marcel; Šrámek, Rastislav; Widmayer, Peter

We consider the metric Traveling Salesman Problem (Δ-TSP for short) and study how stability (as defined by Bilu and Linial [3]) influences the complexity of the problem. On an intuitive level, an instance of Δ-TSP is γ-stable (γ> 1), if there is a unique optimum Hamiltonian tour and any perturbation of arbitrary edge weights by at most γ does not change the edge set of the optimal solution (i.e., there is a significant gap between the optimum tour and all other tours). We show that for γ ≥ 1.8 a simple greedy algorithm (resembling Prim's algorithm for constructing a minimum spanning tree) computes the optimum Hamiltonian tour for every γ-stable instance of the Δ-TSP, whereas a simple local search algorithm can fail to find the optimum even if γ is arbitrary. We further show that there are γ-stable instances of Δ-TSP for every 1 < γ< 2. These results provide a different view on the hardness of the Δ-TSP and give rise to a new class of problem instances which are substantially easier to solve than instances of the general Δ-TSP.
Initialization Method for Grammar-Guided Genetic Programming

NASA Astrophysics Data System (ADS)

García-Arnau, M.; Manrique, D.; Ríos, J.; Rodríguez-Patón, A.

This paper proposes a new tree-generation algorithm for grammarguided genetic programming that includes a parameter to control the maximum size of the trees to be generated. An important feature of this algorithm is that the initial populations generated are adequately distributed in terms of tree size and distribution within the search space. Consequently, genetic programming systems starting from the initial populations generated by the proposed method have a higher convergence speed. Two different problems have been chosen to carry out the experiments: a laboratory test involving searching for arithmetical equalities and the real-world task of breast cancer prognosis. In both problems, comparisons have been made to another five important initialization methods.
A flooding algorithm for multirobot exploration.

PubMed

Cabrera-Mora, Flavio; Xiao, Jizhong

2012-06-01

In this paper, we present a multirobot exploration algorithm that aims at reducing the exploration time and to minimize the overall traverse distance of the robots by coordinating the movement of the robots performing the exploration. Modeling the environment as a tree, we consider a coordination model that restricts the number of robots allowed to traverse an edge and to enter a vertex during each step. This coordination is achieved in a decentralized manner by the robots using a set of active landmarks that are dropped by them at explored vertices. We mathematically analyze the algorithm on trees, obtaining its main properties and specifying its bounds on the exploration time. We also define three metrics of performance for multirobot algorithms. We simulate and compare the performance of this new algorithm with those of our multirobot depth first search (MR-DFS) approach presented in our recent paper and classic single-robot DFS.
Degree-constrained multicast routing for multimedia communications

NASA Astrophysics Data System (ADS)

Wang, Yanlin; Sun, Yugeng; Li, Guidan

2005-02-01

Multicast services have been increasingly used by many multimedia applications. As one of the key techniques to support multimedia applications, the rational and effective multicast routing algorithms are very important to networks performance. When switch nodes in networks have different multicast capability, multicast routing problem is modeled as the degree-constrained Steiner problem. We presented two heuristic algorithms, named BMSTA and BSPTA, for the degree-constrained case in multimedia communications. Both algorithms are used to generate degree-constrained multicast trees with bandwidth and end to end delay bound. Simulations over random networks were carried out to compare the performance of the two proposed algorithms. Experimental results show that the proposed algorithms have advantages in traffic load balancing, which can avoid link blocking and enhance networks performance efficiently. BMSTA has better ability in finding unsaturated links and (or) unsaturated nodes to generate multicast trees than BSPTA. The performance of BMSTA is affected by the variation of degree constraints.
Multistage classification of multispectral Earth observational data: The design approach

NASA Technical Reports Server (NTRS)

Bauer, M. E. (Principal Investigator); Muasher, M. J.; Landgrebe, D. A.

1981-01-01

An algorithm is proposed which predicts the optimal features at every node in a binary tree procedure. The algorithm estimates the probability of error by approximating the area under the likelihood ratio function for two classes and taking into account the number of training samples used in estimating each of these two classes. Some results on feature selection techniques, particularly in the presence of a very limited set of training samples, are presented. Results comparing probabilities of error predicted by the proposed algorithm as a function of dimensionality as compared to experimental observations are shown for aircraft and LANDSAT data. Results are obtained for both real and simulated data. Finally, two binary tree examples which use the algorithm are presented to illustrate the usefulness of the procedure.
The scientific dating of standing buildings.

PubMed

Alcock, Nathaniel W

2017-11-17

The techniques of dendrochronology (tree-ring dating) and radiocarbon (14C) dating are described, as they are applied to historic buildings. Both rely on determining the felling dates of the trees used in their construction. For dendrochronology, the construction of master chronologies and the matching of individual ring-width sequences to them is described and, for radiocarbon dating, the use of tree-ring results in calibration. Results of dating are discussed, ranging from the cathedrals of Peterborough and Beauvais and the development of crown-post roof structures, to the dating and identification of standing medieval peasant houses, particularly those built using cruck construction.
A Deliberate Practice Approach to Teaching Phylogenetic Analysis

PubMed Central

Hobbs, F. Collin; Johnson, Daniel J.; Kearns, Katherine D.

2013-01-01

One goal of postsecondary education is to assist students in developing expert-level understanding. Previous attempts to encourage expert-level understanding of phylogenetic analysis in college science classrooms have largely focused on isolated, or “one-shot,” in-class activities. Using a deliberate practice instructional approach, we designed a set of five assignments for a 300-level plant systematics course that incrementally introduces the concepts and skills used in phylogenetic analysis. In our assignments, students learned the process of constructing phylogenetic trees through a series of increasingly difficult tasks; thus, skill development served as a framework for building content knowledge. We present results from 5 yr of final exam scores, pre- and postconcept assessments, and student surveys to assess the impact of our new pedagogical materials on student performance related to constructing and interpreting phylogenetic trees. Students improved in their ability to interpret relationships within trees and improved in several aspects related to between-tree comparisons and tree construction skills. Student feedback indicated that most students believed our approach prepared them to engage in tree construction and gave them confidence in their abilities. Overall, our data confirm that instructional approaches implementing deliberate practice address student misconceptions, improve student experiences, and foster deeper understanding of difficult scientific concepts. PMID:24297294
snpTree--a web-server to identify and construct SNP trees from whole genome sequence data.

PubMed

Leekitcharoenphon, Pimlapas; Kaas, Rolf S; Thomsen, Martin Christen Frølund; Friis, Carsten; Rasmussen, Simon; Aarestrup, Frank M

2012-01-01

The advances and decreasing economical cost of whole genome sequencing (WGS), will soon make this technology available for routine infectious disease epidemiology. In epidemiological studies, outbreak isolates have very little diversity and require extensive genomic analysis to differentiate and classify isolates. One of the successfully and broadly used methods is analysis of single nucletide polymorphisms (SNPs). Currently, there are different tools and methods to identify SNPs including various options and cut-off values. Furthermore, all current methods require bioinformatic skills. Thus, we lack a standard and simple automatic tool to determine SNPs and construct phylogenetic tree from WGS data. Here we introduce snpTree, a server for online-automatic SNPs analysis. This tool is composed of different SNPs analysis suites, perl and python scripts. snpTree can identify SNPs and construct phylogenetic trees from WGS as well as from assembled genomes or contigs. WGS data in fastq format are aligned to reference genomes by BWA while contigs in fasta format are processed by Nucmer. SNPs are concatenated based on position on reference genome and a tree is constructed from concatenated SNPs using FastTree and a perl script. The online server was implemented by HTML, Java and python script.The server was evaluated using four published bacterial WGS data sets (V. cholerae, S. aureus CC398, S. Typhimurium and M. tuberculosis). The evaluation results for the first three cases was consistent and concordant for both raw reads and assembled genomes. In the latter case the original publication involved extensive filtering of SNPs, which could not be repeated using snpTree. The snpTree server is an easy to use option for rapid standardised and automatic SNP analysis in epidemiological studies also for users with limited bioinformatic experience. The web server is freely accessible at http://www.cbs.dtu.dk/services/snpTree-1.0/.
Forest structures retrieval from LiDAR onboard ULA

NASA Astrophysics Data System (ADS)

Shang, Xiaoxia; Chazette, Patrick; Totems, Julien; Marnas, Fabien; Sanak, Joseph

2013-04-01

Following the United Nations Framework Convention on Climate Change, the assessment of forest carbon stock is one of the main elements for a better understanding of the carbon cycle and its evolution following the climate change. The forests sequester 80% of the continental biospheric carbon and this efficiency is a function of the tree species and the tree health. The airborne backscatter LiDAR onboard the ultra light aircraft (ULA) can provide the key information on the forest vertical structures and evolution in the time. The most important structural parameter is the tree top height, which is directly linked to the above-ground biomass using non-linear relationships. In order to test the LiDAR capability for retrieving the tree top height, the LiDAR ULICE (Ultraviolet LIdar for Canopy Experiment) has been used over different forest types, from coniferous (maritime pins) to deciduous (oaks, hornbeams ...) trees. ULICE works at the wavelength of 355 nm with a sampling along the line-of-sight between 15 and 75 cm. According to the LiDAR signal to noise ratio (SNR), two different algorithms have been used in our study. The first algorithm is a threshold method directly based on the comparison between the LiDAR signal and the noise distributions, while the second one used a low pass filter by fitting a Gaussian curve family. In this paper, we will present these two algorithms and their evolution as a function of the SNR. The main error sources will be also discussed and assessed for each algorithm. The results show that these algorithms have great potential for ground-segment of future space borne LiDAR missions dedicated to the forest survey at the global scale. Acknowledgements: the canopy LiDAR system ULICE has been developed by CEA (Commissariat à l'Energie Atomique). It has been deployed with the support of CNES (Centre National d'Etude Spariales) and ANR (Agence Nationale de la Recherche). We acknowledge the ULA pilots Franck Toussaint for logistical help during the ULA campaign.
Construction of a Species-Level Tree of Life for the Insects and Utility in Taxonomic Profiling

PubMed Central

Chesters, Douglas

2017-01-01

Abstract Although comprehensive phylogenies have proven an invaluable tool in ecology and evolution, their construction is made increasingly challenging both by the scale and structure of publically available sequences. The distinct partition between gene-rich (genomic) and species-rich (DNA barcode) data is a feature of data that has been largely overlooked, yet presents a key obstacle to scaling supermatrix analysis. I present a phyloinformatics framework for draft construction of a species-level phylogeny of insects (Class Insecta). Matrix-building requires separately optimized pipelines for nuclear transcriptomic, mitochondrial genomic, and species-rich markers, whereas tree-building requires hierarchical inference in order to capture species-breadth while retaining deep-level resolution. The phylogeny of insects contains 49,358 species, 13,865 genera, 760 families. Deep-level splits largely reflected previous findings for sections of the tree that are data rich or unambiguous, such as inter-ordinal Endopterygota and Dictyoptera, the recently evolved and relatively homogeneous Lepidoptera, Hymenoptera, Brachycera (Diptera), and Cucujiformia (Coleoptera). However, analysis of bias, matrix construction and gene-tree variation suggests confidence in some relationships (such as in Polyneoptera) is less than has been indicated by the matrix bootstrap method. To assess the utility of the insect tree as a tool in query profiling several tree-based taxonomic assignment methods are compared. Using test data sets with existing taxonomic annotations, a tendency is observed for greater accuracy of species-level assignments where using a fixed comprehensive tree of life in contrast to methods generating smaller de novo reference trees. Described herein is a solution to the discrepancy in the way data are fit into supermatrices. The resulting tree facilitates wider studies of insect diversification and application of advanced descriptions of diversity in community studies, among other presumed applications. PMID:27798407
Deriving Continuous Fields of Tree Cover at 1-m over the Continental United States From the National Agriculture Imagery Program (NAIP) Imagery to Reduce Uncertainties in Forest Carbon Stock Estimation

NASA Astrophysics Data System (ADS)

Ganguly, S.; Basu, S.; Mukhopadhyay, S.; Michaelis, A.; Milesi, C.; Votava, P.; Nemani, R. R.

2013-12-01

An unresolved issue with coarse-to-medium resolution satellite-based forest carbon mapping over regional to continental scales is the high level of uncertainty in above ground biomass (AGB) estimates caused by the absence of forest cover information at a high enough spatial resolution (current spatial resolution is limited to 30-m). To put confidence in existing satellite-derived AGB density estimates, it is imperative to create continuous fields of tree cover at a sufficiently high resolution (e.g. 1-m) such that large uncertainties in forested area are reduced. The proposed work will provide means to reduce uncertainty in present satellite-derived AGB maps and Forest Inventory and Analysis (FIA) based regional estimates. Our primary objective will be to create Very High Resolution (VHR) estimates of tree cover at a spatial resolution of 1-m for the Continental United States using all available National Agriculture Imaging Program (NAIP) color-infrared imagery from 2010 till 2012. We will leverage the existing capabilities of the NASA Earth Exchange (NEX) high performance computing and storage facilities. The proposed 1-m tree cover map can be further aggregated to provide percent tree cover at any medium-to-coarse resolution spatial grid, which will aid in reducing uncertainties in AGB density estimation at the respective grid and overcome current limitations imposed by medium-to-coarse resolution land cover maps. We have implemented a scalable and computationally-efficient parallelized framework for tree-cover delineation - the core components of the algorithm [that] include a feature extraction process, a Statistical Region Merging image segmentation algorithm and a classification algorithm based on Deep Belief Network and a Feedforward Backpropagation Neural Network algorithm. An initial pilot exercise has been performed over the state of California (~11,000 scenes) to create a wall-to-wall 1-m tree cover map and the classification accuracy has been assessed. Results show an improvement in accuracy of tree-cover delineation as compared to existing forest cover maps from NLCD, especially over fragmented, heterogeneous and urban landscapes. Estimates of VHR tree cover will complement and enhance the accuracy of present remote-sensing based AGB modeling approaches and forest inventory based estimates at both national and local scales. A requisite step will be to characterize the inherent uncertainties in tree cover estimates and propagate them to estimate AGB.

Binary space partitioning trees and their uses

NASA Technical Reports Server (NTRS)

Bell, Bradley N.

1989-01-01

Binary Space Partitioning (BSP) trees have some qualities that make them useful in solving many graphics related problems. The purpose is to describe what a BSP tree is, and how it can be used to solve the problem of hidden surface removal, and constructive solid geometry. The BSP tree is based on the idea that a plane acting as a divider subdivides space into two parts with one being on the positive side and the other on the negative. A polygonal solid is then represented as the volume defined by the collective interior half spaces of the solid's bounding surfaces. The nature of how the tree is organized lends itself well for sorting polygons relative to an arbitrary point in 3 space. The speed at which the tree can be traversed for depth sorting is fast enough to provide hidden surface removal at interactive speeds. The fact that a BSP tree actually represents a polygonal solid as a bounded volume also makes it quite useful in performing the boolean operations used in constructive solid geometry. Due to the nature of the BSP tree, polygons can be classified as they are subdivided. The ability to classify polygons as they are subdivided can enhance the simplicity of implementing constructive solid geometry.
ANTLR Tree Grammar Generator and Extensions

NASA Technical Reports Server (NTRS)

Craymer, Loring

2005-01-01

A computer program implements two extensions of ANTLR (Another Tool for Language Recognition), which is a set of software tools for translating source codes between different computing languages. ANTLR supports predicated- LL(k) lexer and parser grammars, a notation for annotating parser grammars to direct tree construction, and predicated tree grammars. [ LL(k) signifies left-right, leftmost derivation with k tokens of look-ahead, referring to certain characteristics of a grammar.] One of the extensions is a syntax for tree transformations. The other extension is the generation of tree grammars from annotated parser or input tree grammars. These extensions can simplify the process of generating source-to-source language translators and they make possible an approach, called "polyphase parsing," to translation between computing languages. The typical approach to translator development is to identify high-level semantic constructs such as "expressions," "declarations," and "definitions" as fundamental building blocks in the grammar specification used for language recognition. The polyphase approach is to lump ambiguous syntactic constructs during parsing and then disambiguate the alternatives in subsequent tree transformation passes. Polyphase parsing is believed to be useful for generating efficient recognizers for C++ and other languages that, like C++, have significant ambiguities.
Node Redeployment Algorithm Based on Stratified Connected Tree for Underwater Sensor Networks

PubMed Central

Liu, Jun; Jiang, Peng; Wu, Feng; Yu, Shanen; Song, Chunyue

2016-01-01

During the underwater sensor networks (UWSNs) operation, node drift with water environment causes network topology changes. Periodic node location examination and adjustment are needed to maintain good network monitoring quality as long as possible. In this paper, a node redeployment algorithm based on stratified connected tree for UWSNs is proposed. At every network adjustment moment, self-examination and adjustment on node locations are performed firstly. If a node is outside the monitored space, it returns to the last location recorded in its memory along straight line. Later, the network topology is stratified into a connected tree that takes the sink node as the root node by broadcasting ready information level by level, which can improve the network connectivity rate. Finally, with synthetically considering network coverage and connectivity rates, and node movement distance, the sink node performs centralized optimization on locations of leaf nodes in the stratified connected tree. Simulation results show that the proposed redeployment algorithm can not only keep the number of nodes in the monitored space as much as possible and maintain good network coverage and connectivity rates during network operation, but also reduce node movement distance during node redeployment and prolong the network lifetime. PMID:28029124
Intelligent Diagnostic Assistant for Complicated Skin Diseases through C5's Algorithm.

PubMed

Jeddi, Fatemeh Rangraz; Arabfard, Masoud; Kermany, Zahra Arab

2017-09-01

Intelligent Diagnostic Assistant can be used for complicated diagnosis of skin diseases, which are among the most common causes of disability. The aim of this study was to design and implement a computerized intelligent diagnostic assistant for complicated skin diseases through C5's Algorithm. An applied-developmental study was done in 2015. Knowledge base was developed based on interviews with dermatologists through questionnaires and checklists. Knowledge representation was obtained from the train data in the database using Excel Microsoft Office. Clementine Software and C5's Algorithms were applied to draw the decision tree. Analysis of test accuracy was performed based on rules extracted using inference chains. The rules extracted from the decision tree were entered into the CLIPS programming environment and the intelligent diagnostic assistant was designed then. The rules were defined using forward chaining inference technique and were entered into Clips programming environment as RULE. The accuracy and error rates obtained in the training phase from the decision tree were 99.56% and 0.44%, respectively. The accuracy of the decision tree was 98% and the error was 2% in the test phase. Intelligent diagnostic assistant can be used as a reliable system with high accuracy, sensitivity, specificity, and agreement.
An empirical evaluation of two-stage species tree inference strategies using a multilocus dataset from North American pines

PubMed Central

2014-01-01

Background As it becomes increasingly possible to obtain DNA sequences of orthologous genes from diverse sets of taxa, species trees are frequently being inferred from multilocus data. However, the behavior of many methods for performing this inference has remained largely unexplored. Some methods have been proven to be consistent given certain evolutionary models, whereas others rely on criteria that, although appropriate for many parameter values, have peculiar zones of the parameter space in which they fail to converge on the correct estimate as data sets increase in size. Results Here, using North American pines, we empirically evaluate the behavior of 24 strategies for species tree inference using three alternative outgroups (72 strategies total). The data consist of 120 individuals sampled in eight ingroup species from subsection Strobus and three outgroup species from subsection Gerardianae, spanning ∼47 kilobases of sequence at 121 loci. Each “strategy” for inferring species trees consists of three features: a species tree construction method, a gene tree inference method, and a choice of outgroup. We use multivariate analysis techniques such as principal components analysis and hierarchical clustering to identify tree characteristics that are robustly observed across strategies, as well as to identify groups of strategies that produce trees with similar features. We find that strategies that construct species trees using only topological information cluster together and that strategies that use additional non-topological information (e.g., branch lengths) also cluster together. Strategies that utilize more than one individual within a species to infer gene trees tend to produce estimates of species trees that contain clades present in trees estimated by other strategies. Strategies that use the minimize-deep-coalescences criterion to construct species trees tend to produce species tree estimates that contain clades that are not present in trees estimated by the Concatenation, RTC, SMRT, STAR, and STEAC methods, and that in general are more balanced than those inferred by these other strategies. Conclusions When constructing a species tree from a multilocus set of sequences, our observations provide a basis for interpreting differences in species tree estimates obtained via different approaches that have a two-stage structure in common, one step for gene tree estimation and a second step for species tree estimation. The methods explored here employ a number of distinct features of the data, and our analysis suggests that recovery of the same results from multiple methods that tend to differ in their patterns of inference can be a valuable tool for obtaining reliable estimates. PMID:24678701
An empirical evaluation of two-stage species tree inference strategies using a multilocus dataset from North American pines.

PubMed

DeGiorgio, Michael; Syring, John; Eckert, Andrew J; Liston, Aaron; Cronn, Richard; Neale, David B; Rosenberg, Noah A

2014-03-29

As it becomes increasingly possible to obtain DNA sequences of orthologous genes from diverse sets of taxa, species trees are frequently being inferred from multilocus data. However, the behavior of many methods for performing this inference has remained largely unexplored. Some methods have been proven to be consistent given certain evolutionary models, whereas others rely on criteria that, although appropriate for many parameter values, have peculiar zones of the parameter space in which they fail to converge on the correct estimate as data sets increase in size. Here, using North American pines, we empirically evaluate the behavior of 24 strategies for species tree inference using three alternative outgroups (72 strategies total). The data consist of 120 individuals sampled in eight ingroup species from subsection Strobus and three outgroup species from subsection Gerardianae, spanning ∼47 kilobases of sequence at 121 loci. Each "strategy" for inferring species trees consists of three features: a species tree construction method, a gene tree inference method, and a choice of outgroup. We use multivariate analysis techniques such as principal components analysis and hierarchical clustering to identify tree characteristics that are robustly observed across strategies, as well as to identify groups of strategies that produce trees with similar features. We find that strategies that construct species trees using only topological information cluster together and that strategies that use additional non-topological information (e.g., branch lengths) also cluster together. Strategies that utilize more than one individual within a species to infer gene trees tend to produce estimates of species trees that contain clades present in trees estimated by other strategies. Strategies that use the minimize-deep-coalescences criterion to construct species trees tend to produce species tree estimates that contain clades that are not present in trees estimated by the Concatenation, RTC, SMRT, STAR, and STEAC methods, and that in general are more balanced than those inferred by these other strategies. When constructing a species tree from a multilocus set of sequences, our observations provide a basis for interpreting differences in species tree estimates obtained via different approaches that have a two-stage structure in common, one step for gene tree estimation and a second step for species tree estimation. The methods explored here employ a number of distinct features of the data, and our analysis suggests that recovery of the same results from multiple methods that tend to differ in their patterns of inference can be a valuable tool for obtaining reliable estimates.
Optimal tree-stem bucking of northeastern species of China

Treesearch

Jingxin Wang; Chris B. LeDoux; Joseph McNeel

2004-01-01

An application of optimal tree-stem bucking to the northeastern tree species of China is reported. The bucking procedures used in this region are summarized, which are the basic guidelines for the optimal bucking design. The directed graph approach was adopted to generate the bucking patterns by using the network analysis labeling algorithm. A computer-based bucking...
Big cat phylogenies, consensus trees, and computational thinking.

PubMed

Sul, Seung-Jin; Williams, Tiffani L

2011-07-01

Phylogenetics seeks to deduce the pattern of relatedness between organisms by using a phylogeny or evolutionary tree. For a given set of organisms or taxa, there may be many evolutionary trees depicting how these organisms evolved from a common ancestor. As a result, consensus trees are a popular approach for summarizing the shared evolutionary relationships in a group of trees. We examine these consensus techniques by studying how the pantherine lineage of cats (clouded leopard, jaguar, leopard, lion, snow leopard, and tiger) evolved, which is hotly debated. While there are many phylogenetic resources that describe consensus trees, there is very little information, written for biologists, regarding the underlying computational techniques for building them. The pantherine cats provide us with a small, relevant example to explore the computational techniques (such as sorting numbers, hashing functions, and traversing trees) for constructing consensus trees. Our hope is that life scientists enjoy peeking under the computational hood of consensus tree construction and share their positive experiences with others in their community.
Shrub Abundance Mapping in Arctic Tundra with Misr

NASA Astrophysics Data System (ADS)

Duchesne, R.; Chopping, M. J.; Wang, Z.; Schaaf, C.; Tape, K. D.

2013-12-01

Over the last 60 years an increase in shrub abundance has been observed in the Arctic tundra in connection with a rapid surface warming trend. Rapid shrub expansion may have consequences in terms of ecosystem structure and function, albedo, and feedbacks to climate; however, its rate is not yet known. The goal of this research effort is thus to map large scale changes in Arctic tundra vegetation by exploiting the structural signal in moderate resolution satellite remote sensing images from NASA's Multiangle Imaging SpectroRadiometer (MISR), mapped onto a 250m Albers Conic Equal Area grid. We present here large area shrub mapping supported by reference data collated using extensive field inventory data and high resolution panchromatic imagery. MISR Level 1B2 Terrain radiance scenes from the Terra satellite from 15 June-31 July, 2000 - 2010 were converted to surface bidirectional reflectance factors (BRF) using MISR Toolkit routines and the MISR 1 km LAND product BRFs. The red band data in all available cameras were used to invert the RossThick-LiSparse-Reciprocal BRDF model to retrieve kernel weights, model-fitting RMSE, and Weights of Determination. The reference database was constructed using aerial survey, three field campaigns (field inventory for shrub count, cover, mean radius and height), and high resolution imagery. Tall shrub number, mean crown radius, cover, and mean height estimates were obtained from QuickBird and GeoEye panchromatic image chips using the CANAPI algorithm, and calibrated using field-based estimates, thus extending the database to over eight hundred locations. Tall shrub fractional cover maps for the North Slope of Alaska were constructed using the bootstrap forest machine learning algorithm that exploits the surface information provided by MISR. The reference database was divided into two datasets for training and validation. The model derived used a set of 19 independent variables(the three kernel weights, ratios and interaction terms; white and black sky albedos; and blue, green, red, and NIR nadir camera BRFs), to grow a forest of decision trees. The final estimate is the average of the predicted values from each tree. Observations not used in constructing the trees were used in validation. The model was applied with a large volume of MISR data and the resulting fractional cover estimates were combined into annual maps using a compositing algorithm that flags results affected by cloud, cloud shadow, surface water, extreme outliers, topographic shading, and burned areas. The maps show that shrub cover is lower on the north slope in comparison to southern part, as expected, however, a preliminary assessment of the fractional cover change over the last decade, achieved by averaging fractional cover values for 2000-2002 and 2008-2010 and then calculating the change between the two periods, revealed that there are large areas for which we cannot determine the sign of the change with high confidence, as the precision of our estimate is close to the magnitude of the cover values. Additional research is thus required to reliably map shrub cover in this environment at annual intervals.
Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees.

PubMed

Nye, Tom M W; Tang, Xiaoxian; Weyenberg, Grady; Yoshida, Ruriko

2017-12-01

Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi-dimensionality of the space of possible trees. In Euclidean spaces, principal component analysis is a popular method of reducing high-dimensional data to a low-dimensional representation that preserves much of the sample's structure. However, the space of all phylogenetic trees on a fixed set of species does not form a Euclidean vector space, and methods adapted to tree space are needed. Previous work introduced the notion of a principal geodesic in this space, analogous to the first principal component. Here we propose a geometric object for tree space similar to the [Formula: see text]th principal component in Euclidean space: the locus of the weighted Fréchet mean of [Formula: see text] vertex trees when the weights vary over the [Formula: see text]-simplex. We establish some basic properties of these objects, in particular showing that they have dimension [Formula: see text], and propose algorithms for projection onto these surfaces and for finding the principal locus associated with a sample of trees. Simulation studies demonstrate that these algorithms perform well, and analyses of two datasets, containing Apicomplexa and African coelacanth genomes respectively, reveal important structure from the second principal components.
Efficient structure from motion for oblique UAV images based on maximal spanning tree expansion

NASA Astrophysics Data System (ADS)

Jiang, San; Jiang, Wanshou

2017-10-01

The primary contribution of this paper is an efficient Structure from Motion (SfM) solution for oblique unmanned aerial vehicle (UAV) images. First, an algorithm, considering spatial relationship constraints between image footprints, is designed for match pair selection with the assistance of UAV flight control data and oblique camera mounting angles. Second, a topological connection network (TCN), represented by an undirected weighted graph, is constructed from initial match pairs, which encodes the overlap areas and intersection angles into edge weights. Then, an algorithm, termed MST-Expansion, is proposed to extract the match graph from the TCN, where the TCN is first simplified by a maximum spanning tree (MST). By further analysis of the local structure in the MST, expansion operations are performed on the vertices of the MST for match graph enhancement, which is achieved by introducing critical connections in the expansion directions. Finally, guided by the match graph, an efficient SfM is proposed. Under extensive analysis and comparison, its performance is verified by using three oblique UAV datasets captured with different multi-camera systems. Experimental results demonstrate that the efficiency of image matching is improved, with speedup ratios ranging from 19 to 35, and competitive orientation accuracy is achieved from both relative bundle adjustment (BA) without GCPs (Ground Control Points) and absolute BA with GCPs. At the same time, images in the three datasets are successfully oriented. For the orientation of oblique UAV images, the proposed method can be a more efficient solution.
An efficient non-dominated sorting method for evolutionary algorithms.

PubMed

Fang, Hongbing; Wang, Qian; Tu, Yi-Cheng; Horstemeyer, Mark F

2008-01-01

We present a new non-dominated sorting algorithm to generate the non-dominated fronts in multi-objective optimization with evolutionary algorithms, particularly the NSGA-II. The non-dominated sorting algorithm used by NSGA-II has a time complexity of O(MN(2)) in generating non-dominated fronts in one generation (iteration) for a population size N and M objective functions. Since generating non-dominated fronts takes the majority of total computational time (excluding the cost of fitness evaluations) of NSGA-II, making this algorithm faster will significantly improve the overall efficiency of NSGA-II and other genetic algorithms using non-dominated sorting. The new non-dominated sorting algorithm proposed in this study reduces the number of redundant comparisons existing in the algorithm of NSGA-II by recording the dominance information among solutions from their first comparisons. By utilizing a new data structure called the dominance tree and the divide-and-conquer mechanism, the new algorithm is faster than NSGA-II for different numbers of objective functions. Although the number of solution comparisons by the proposed algorithm is close to that of NSGA-II when the number of objectives becomes large, the total computational time shows that the proposed algorithm still has better efficiency because of the adoption of the dominance tree structure and the divide-and-conquer mechanism.
Land cover and land use mapping of the iSimangaliso Wetland Park, South Africa: comparison of oblique and orthogonal random forest algorithms

NASA Astrophysics Data System (ADS)

Bassa, Zaakirah; Bob, Urmilla; Szantoi, Zoltan; Ismail, Riyad

2016-01-01

In recent years, the popularity of tree-based ensemble methods for land cover classification has increased significantly. Using WorldView-2 image data, we evaluate the potential of the oblique random forest algorithm (oRF) to classify a highly heterogeneous protected area. In contrast to the random forest (RF) algorithm, the oRF algorithm builds multivariate trees by learning the optimal split using a supervised model. The oRF binary algorithm is adapted to a multiclass land cover and land use application using both the "one-against-one" and "one-against-all" combination approaches. Results show that the oRF algorithms are capable of achieving high classification accuracies (>80%). However, there was no statistical difference in classification accuracies obtained by the oRF algorithms and the more popular RF algorithm. For all the algorithms, user accuracies (UAs) and producer accuracies (PAs) >80% were recorded for most of the classes. Both the RF and oRF algorithms poorly classified the indigenous forest class as indicated by the low UAs and PAs. Finally, the results from this study advocate and support the utility of the oRF algorithm for land cover and land use mapping of protected areas using WorldView-2 image data.
Rapid Leaf Deployment Strategies in a Deciduous Savanna

PubMed Central

2016-01-01

Deciduous plants avoid the costs of maintaining leaves in the unfavourable season, but carry the costs of constructing new leaves every year. Deciduousness is therefore expected in ecological situations with pronounced seasonality and low costs of leaf construction. In our study system, a seasonally dry tropical savanna, many trees are deciduous, suggesting that leaf construction costs must be low. Previous studies have, however, shown that nitrogen is limiting in this system, suggesting that leaf construction costs are high. Here we examine this conundrum using a time series of soil moisture availability, leaf phenology and nitrogen distribution in the tree canopy to illustrate how trees resorb nitrogen before leaf abscission and use stored reserves of nitrogen and carbon to construct new leaves at the onset of the growing season. Our results show that trees deployed leaves shortly before and in anticipation of the first rains with its associated pulse of nitrogen mineralisation. Our results also show that trees rapidly constructed a full canopy of leaves within two weeks of the first rains. We detected an increase in leaf nitrogen content that corresponded with the first rains and with the movement of nitrogen to more distal branches, suggesting that stored nitrogen reserves are used to construct leaves. Furthermore the stable carbon isotope ratios (δ13C) of these leaves suggest the use of stored carbon for leaf construction. Our findings suggest that the early deployment of leaves using stored nitrogen and carbon reserves is a strategy that is integrally linked with the onset of the first rains. This strategy may confer a competitive advantage over species that deploy leaves at or after the onset of the rains. PMID:27310398
A simple and robust classification tree for differentiation between benign and malignant lesions in MR-mammography.

PubMed

Baltzer, Pascal A T; Dietzel, Matthias; Kaiser, Werner A

2013-08-01

In the face of multiple available diagnostic criteria in MR-mammography (MRM), a practical algorithm for lesion classification is needed. Such an algorithm should be as simple as possible and include only important independent lesion features to differentiate benign from malignant lesions. This investigation aimed to develop a simple classification tree for differential diagnosis in MRM. A total of 1,084 lesions in standardised MRM with subsequent histological verification (648 malignant, 436 benign) were investigated. Seventeen lesion criteria were assessed by 2 readers in consensus. Classification analysis was performed using the chi-squared automatic interaction detection (CHAID) method. Results include the probability for malignancy for every descriptor combination in the classification tree. A classification tree incorporating 5 lesion descriptors with a depth of 3 ramifications (1, root sign; 2, delayed enhancement pattern; 3, border, internal enhancement and oedema) was calculated. Of all 1,084 lesions, 262 (40.4 %) and 106 (24.3 %) could be classified as malignant and benign with an accuracy above 95 %, respectively. Overall diagnostic accuracy was 88.4 %. The classification algorithm reduced the number of categorical descriptors from 17 to 5 (29.4 %), resulting in a high classification accuracy. More than one third of all lesions could be classified with accuracy above 95 %. • A practical algorithm has been developed to classify lesions found in MR-mammography. • A simple decision tree consisting of five criteria reaches high accuracy of 88.4 %. • Unique to this approach, each classification is associated with a diagnostic certainty. • Diagnostic certainty of greater than 95 % is achieved in 34 % of all cases.
Statistical Methods in Ai: Rare Event Learning Using Associative Rules and Higher-Order Statistics

NASA Astrophysics Data System (ADS)

Iyer, V.; Shetty, S.; Iyengar, S. S.

2015-07-01

Rare event learning has not been actively researched since lately due to the unavailability of algorithms which deal with big samples. The research addresses spatio-temporal streams from multi-resolution sensors to find actionable items from a perspective of real-time algorithms. This computing framework is independent of the number of input samples, application domain, labelled or label-less streams. A sampling overlap algorithm such as Brooks-Iyengar is used for dealing with noisy sensor streams. We extend the existing noise pre-processing algorithms using Data-Cleaning trees. Pre-processing using ensemble of trees using bagging and multi-target regression showed robustness to random noise and missing data. As spatio-temporal streams are highly statistically correlated, we prove that a temporal window based sampling from sensor data streams converges after n samples using Hoeffding bounds. Which can be used for fast prediction of new samples in real-time. The Data-cleaning tree model uses a nonparametric node splitting technique, which can be learned in an iterative way which scales linearly in memory consumption for any size input stream. The improved task based ensemble extraction is compared with non-linear computation models using various SVM kernels for speed and accuracy. We show using empirical datasets the explicit rule learning computation is linear in time and is only dependent on the number of leafs present in the tree ensemble. The use of unpruned trees (t) in our proposed ensemble always yields minimum number (m) of leafs keeping pre-processing computation to n × t log m compared to N2 for Gram Matrix. We also show that the task based feature induction yields higher Qualify of Data (QoD) in the feature space compared to kernel methods using Gram Matrix.
Representations of the language recognition problem for a theorem prover

NASA Technical Reports Server (NTRS)

Minker, J.; Vanderbrug, G. J.

1972-01-01

Two representations of the language recognition problem for a theorem prover in first order logic are presented and contrasted. One of the representations is based on the familiar method of generating sentential forms of the language, and the other is based on the Cocke parsing algorithm. An augmented theorem prover is described which permits recognition of recursive languages. The state-transformation method developed by Cordell Green to construct problem solutions in resolution-based systems can be used to obtain the parse tree. In particular, the end-order traversal of the parse tree is derived in one of the representations. An inference system, termed the cycle inference system, is defined which makes it possible for the theorem prover to model the method on which the representation is based. The general applicability of the cycle inference system to state space problems is discussed. Given an unsatisfiable set S, where each clause has at most one positive literal, it is shown that there exists an input proof. The clauses for the two representations satisfy these conditions, as do many state space problems.
Populations of Erythrina velutina Willd. at risk of extinction.

PubMed

Melo, M F V; Gonçalves, L O; Rabbani, A R C; Álvares-Carvalho, S V; Pinheiro, J B; Zucchi, M I; Silva-Mann, R

2015-08-28

The goal of this study was to characterize the structure of two natural populations of the coral tree using RAPD and ISSR markers. The study evaluated all individuals in two different areas in the northeastern region of Brazil: the first was in the riparian area, 10 km x 100 m along the edge of the lower São Francisco River, and the second was in the municipality of Pinhão, in a semiarid region between the municipalities of Neópolis and Santana do São Francisco. We used all the coral trees present in those two areas (37 individuals). The results of the RAPD and ISSR markers were highly congruent, supporting the reliability of the techniques used. Similarity was estimated using the Jaccard arithmetic complement index. A dendrogram was constructed using the unweighted pair group method with arithmetic mean cluster algorithm, and the robustness of the data was bootstrapped with 5000 replicates. A principal coordinate analysis was performed on the basis of Jaccard coefficients. The total genetic variation observed was 21%, corresponding to the variation between the populations, and 79% of the variation was observed within the populations.
Optimal parallel solution of sparse triangular systems

NASA Technical Reports Server (NTRS)

Alvarado, Fernando L.; Schreiber, Robert

1990-01-01

A method for the parallel solution of triangular sets of equations is described that is appropriate when there are many right-handed sides. By preprocessing, the method can reduce the number of parallel steps required to solve Lx = b compared to parallel forward or backsolve. Applications are to iterative solvers with triangular preconditioners, to structural analysis, or to power systems applications, where there may be many right-handed sides (not all available a priori). The inverse of L is represented as a product of sparse triangular factors. The problem is to find a factored representation of this inverse of L with the smallest number of factors (or partitions), subject to the requirement that no new nonzero elements be created in the formation of these inverse factors. A method from an earlier reference is shown to solve this problem. This method is improved upon by constructing a permutation of the rows and columns of L that preserves triangularity and allow for the best possible such partition. A number of practical examples and algorithmic details are presented. The parallelism attainable is illustrated by means of elimination trees and clique trees.
Analytical and CASE study on Limited Search, ID3, CHAID, C4.5, Improved C4.5 and OVA Decision Tree Algorithms to design Decision Support System

NASA Astrophysics Data System (ADS)

Kaur, Parneet; Singh, Sukhwinder; Garg, Sushil; Harmanpreet

2010-11-01

In this paper we study about classification algorithms for farm DSS. By applying classification algorithms i.e. Limited search, ID3, CHAID, C4.5, Improved C4.5 and One VS all Decision Tree on common data set of crop with specified class, results are obtained. The tool used to derive results is SPINA. The graphical results obtained from tool are compared to suggest best technique to develop farm Decision Support System. This analysis would help to researchers to design effective and fast DSS for farmer to take decision for enhancing their yield.

Inferring patterns in mitochondrial DNA sequences through hypercube independent spanning trees.

PubMed

Silva, Eduardo Sant Ana da; Pedrini, Helio

2016-03-01

Given a graph G, a set of spanning trees rooted at a vertex r of G is said vertex/edge independent if, for each vertex v of G, v≠r, the paths of r to v in any pair of trees are vertex/edge disjoint. Independent spanning trees (ISTs) provide a number of advantages in data broadcasting due to their fault tolerant properties. For this reason, some studies have addressed the issue by providing mechanisms for constructing independent spanning trees efficiently. In this work, we investigate how to construct independent spanning trees on hypercubes, which are generated based upon spanning binomial trees, and how to use them to predict mitochondrial DNA sequence parts through paths on the hypercube. The prediction works both for inferring mitochondrial DNA sequences comprised of six bases as well as infer anomalies that probably should not belong to the mitochondrial DNA standard. Copyright © 2016 Elsevier Ltd. All rights reserved.
Construction of a Species-Level Tree of Life for the Insects and Utility in Taxonomic Profiling.

PubMed

Chesters, Douglas

2017-05-01

Although comprehensive phylogenies have proven an invaluable tool in ecology and evolution, their construction is made increasingly challenging both by the scale and structure of publically available sequences. The distinct partition between gene-rich (genomic) and species-rich (DNA barcode) data is a feature of data that has been largely overlooked, yet presents a key obstacle to scaling supermatrix analysis. I present a phyloinformatics framework for draft construction of a species-level phylogeny of insects (Class Insecta). Matrix-building requires separately optimized pipelines for nuclear transcriptomic, mitochondrial genomic, and species-rich markers, whereas tree-building requires hierarchical inference in order to capture species-breadth while retaining deep-level resolution. The phylogeny of insects contains 49,358 species, 13,865 genera, 760 families. Deep-level splits largely reflected previous findings for sections of the tree that are data rich or unambiguous, such as inter-ordinal Endopterygota and Dictyoptera, the recently evolved and relatively homogeneous Lepidoptera, Hymenoptera, Brachycera (Diptera), and Cucujiformia (Coleoptera). However, analysis of bias, matrix construction and gene-tree variation suggests confidence in some relationships (such as in Polyneoptera) is less than has been indicated by the matrix bootstrap method. To assess the utility of the insect tree as a tool in query profiling several tree-based taxonomic assignment methods are compared. Using test data sets with existing taxonomic annotations, a tendency is observed for greater accuracy of species-level assignments where using a fixed comprehensive tree of life in contrast to methods generating smaller de novo reference trees. Described herein is a solution to the discrepancy in the way data are fit into supermatrices. The resulting tree facilitates wider studies of insect diversification and application of advanced descriptions of diversity in community studies, among other presumed applications. [Data integration; data mining; insects; phylogenomics; phyloinformatics; tree of life.]. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Searching Dynamic Agents with a Team of Mobile Robots

PubMed Central

Juliá, Miguel; Gil, Arturo; Reinoso, Oscar

2012-01-01

This paper presents a new algorithm that allows a team of robots to cooperatively search for a set of moving targets. An estimation of the areas of the environment that are more likely to hold a target agent is obtained using a grid-based Bayesian filter. The robot sensor readings and the maximum speed of the moving targets are used in order to update the grid. This representation is used in a search algorithm that commands the robots to those areas that are more likely to present target agents. This algorithm splits the environment in a tree of connected regions using dynamic programming. This tree is used in order to decide the destination for each robot in a coordinated manner. The algorithm has been successfully tested in known and unknown environments showing the validity of the approach. PMID:23012519
Searching dynamic agents with a team of mobile robots.

PubMed

Juliá, Miguel; Gil, Arturo; Reinoso, Oscar

2012-01-01

This paper presents a new algorithm that allows a team of robots to cooperatively search for a set of moving targets. An estimation of the areas of the environment that are more likely to hold a target agent is obtained using a grid-based Bayesian filter. The robot sensor readings and the maximum speed of the moving targets are used in order to update the grid. This representation is used in a search algorithm that commands the robots to those areas that are more likely to present target agents. This algorithm splits the environment in a tree of connected regions using dynamic programming. This tree is used in order to decide the destination for each robot in a coordinated manner. The algorithm has been successfully tested in known and unknown environments showing the validity of the approach.
Scheduling and control strategies for the departure problem in air traffic control

NASA Astrophysics Data System (ADS)

Bolender, Michael Alan

Two problems relating to the departure problem in air traffic control automation are examined. The first problem that is addressed is the scheduling of aircraft for departure. The departure operations at a major US hub airport are analyzed, and a discrete event simulation of the departure operations is constructed. Specifically, the case where there is a single departure runway is considered. The runway is fed by two queues of aircraft. Each queue, in turn, is fed by a single taxiway. Two salient areas regarding scheduling are addressed. The first is the construction of optimal departure sequences for the aircraft that are queued. Several greedy search algorithms are designed to minimize the total time to depart a set of queued aircraft. Each algorithm has a different set of heuristic rules to resolve situations within the search space whenever two branches of the search tree with equal edge costs are encountered. These algorithms are then compared and contrasted with a genetic search algorithm in order to assess the performance of the heuristics. This is done in the context of a static departure problem where the length of the departure queue is fixed. A greedy algorithm which deepens the search whenever two branches of the search tree with non-unique costs are encountered is shown to outperform the other heuristic algorithms. This search strategy is then implemented in the discrete event simulation. A baseline performance level is established, and a sensitivity analysis is performed by implementing changes in traffic mix, routing, and miles-in-trail restrictions for comparison. It is concluded that to minimize the average time spent in the queue for different traffic conditions, a queue assignment algorithm is needed to maintain an even balance of aircraft in the queues. A necessary consideration is to base queue assignment upon traffic management restrictions such as miles-in-trail constraints. The second problem addresses the technical challenges associated with merging departure aircraft onto their filed routes in a congested airspace environment. Conflicts between departures and en route aircraft within the Center airspace are analyzed. Speed control, holding the aircraft; at an intermediate altitude, re-routing, and vectoring are posed as possible deconfliction maneuvers. A cost assessment of these merge strategies, which are based upon 4D fight management and conflict detection and resolution principles, is given. Several merge conflicts are studied and a cost for each resolution is computed. It is shown that vectoring tends to be the most expensive resolution technique. Altitude hold is simple, costs less than vectoring, but may require a long time for the aircraft to achieve separation. Re-routing is the simplest, and provides the most cost benefit since the aircraft flies a shorter distance than if it had followed its filed route. Speed control is shown to be ineffective as a means of increasing separation, but is effective for maintaining separation between aircraft. In addition, the affects of uncertainties on the cost are assessed. The analysis shows that cost is invariant with the decision time.
New methods, algorithms, and software for rapid mapping of tree positions in coordinate forest plots

Treesearch

A. Dan Wilson

2000-01-01

The theories and methodologies for two new tree mapping methods, the Sequential-target method and the Plot-origin radial method, are described. The methods accommodate the use of any conventional distance measuring device and compass to collect horizontal distance and azimuth data between source or reference positions (origins) and target trees. Conversion equations...
Optimal Path Planning Program for Autonomous Speed Sprayer in Orchard Using Order-Picking Algorithm

NASA Astrophysics Data System (ADS)

Park, T. S.; Park, S. J.; Hwang, K. Y.; Cho, S. I.

This study was conducted to develop a software program which computes optimal path for autonomous navigation in orchard, especially for speed sprayer. Possibilities of autonomous navigation in orchard were shown by other researches which have minimized distance error between planned path and performed path. But, research of planning an optimal path for speed sprayer in orchard is hardly founded. In this study, a digital map and a database for orchard which contains GPS coordinate information (coordinates of trees and boundary of orchard) and entity information (heights and widths of trees, radius of main stem of trees, disease of trees) was designed. An orderpicking algorithm which has been used for management of warehouse was used to calculate optimum path based on the digital map. Database for digital map was created by using Microsoft Access and graphic interface for database was made by using Microsoft Visual C++ 6.0. It was possible to search and display information about boundary of an orchard, locations of trees, daily plan for scattering chemicals and plan optimal path on different orchard based on digital map, on each circumstance (starting speed sprayer in different location, scattering chemicals for only selected trees).
Real-Time Interactive Tree Animation.

PubMed

Quigley, Ed; Yu, Yue; Huang, Jingwei; Lin, Winnie; Fedkiw, Ronald

2018-05-01

We present a novel method for posing and animating botanical tree models interactively in real time. Unlike other state of the art methods which tend to produce trees that are overly flexible, bending and deforming as if they were underwater plants, our approach allows for arbitrarily high stiffness while still maintaining real-time frame rates without spurious artifacts, even on quite large trees with over ten thousand branches. This is accomplished by using an articulated rigid body model with as-stiff-as-desired rotational springs in conjunction with our newly proposed simulation technique, which is motivated both by position based dynamics and the typical algorithms for articulated rigid bodies. The efficiency of our algorithm allows us to pose and animate trees with millions of branches or alternatively simulate a small forest comprised of many highly detailed trees. Even using only a single CPU core, we can simulate ten thousand branches in real time while still maintaining quite crisp user interactivity. This has allowed us to incorporate our framework into a commodity game engine to run interactively even on a low-budget tablet. We show that our method is amenable to the incorporation of a large variety of desirable effects such as wind, leaves, fictitious forces, collisions, fracture, etc.
At-Least Version of the Generalized Minimum Spanning Tree Problem: Optimization Through Ant Colony System and Genetic Algorithms

NASA Technical Reports Server (NTRS)

Janich, Karl W.

2005-01-01

The At-Least version of the Generalized Minimum Spanning Tree Problem (L-GMST) is a problem in which the optimal solution connects all defined clusters of nodes in a given network at a minimum cost. The L-GMST is NPHard; therefore, metaheuristic algorithms have been used to find reasonable solutions to the problem as opposed to computationally feasible exact algorithms, which many believe do not exist for such a problem. One such metaheuristic uses a swarm-intelligent Ant Colony System (ACS) algorithm, in which agents converge on a solution through the weighing of local heuristics, such as the shortest available path and the number of agents that recently used a given path. However, in a network using a solution derived from the ACS algorithm, some nodes may move around to different clusters and cause small changes in the network makeup. Rerunning the algorithm from the start would be somewhat inefficient due to the significance of the changes, so a genetic algorithm based on the top few solutions found in the ACS algorithm is proposed to quickly and efficiently adapt the network to these small changes.
Computer-automated evolution of an X-band antenna for NASA's Space Technology 5 mission.

PubMed

Hornby, Gregory S; Lohn, Jason D; Linden, Derek S

2011-01-01

Whereas the current practice of designing antennas by hand is severely limited because it is both time and labor intensive and requires a significant amount of domain knowledge, evolutionary algorithms can be used to search the design space and automatically find novel antenna designs that are more effective than would otherwise be developed. Here we present our work in using evolutionary algorithms to automatically design an X-band antenna for NASA's Space Technology 5 (ST5) spacecraft. Two evolutionary algorithms were used: the first uses a vector of real-valued parameters and the second uses a tree-structured generative representation for constructing the antenna. The highest-performance antennas from both algorithms were fabricated and tested and both outperformed a hand-designed antenna produced by the antenna contractor for the mission. Subsequent changes to the spacecraft orbit resulted in a change in requirements for the spacecraft antenna. By adjusting our fitness function we were able to rapidly evolve a new set of antennas for this mission in less than a month. One of these new antenna designs was built, tested, and approved for deployment on the three ST5 spacecraft, which were successfully launched into space on March 22, 2006. This evolved antenna design is the first computer-evolved antenna to be deployed for any application and is the first computer-evolved hardware in space.
Environmental impacts of forest road construction on mountainous terrain.

PubMed

Caliskan, Erhan

2013-03-15

Forest roads are the base infrastructure foundation of forestry operations. These roads entail a complex engineering effort because they can cause substantial environmental damage to forests and include a high-cost construction. This study was carried out in four sample sites of Giresun, Trabzon(2) and Artvin Forest Directorate, which is in the Black Sea region of Turkey. The areas have both steep terrain (30-50% gradient) and very steep terrain (51-80% gradient). Bulldozers and hydraulic excavators were determined to be the main machines for forest road construction, causing environmental damage and cross sections in mountainous areas.As a result of this study, the percent damage to forests was determined as follows: on steep terrain, 21% of trees were damaged by excavators and 33% of trees were damaged by bulldozers during forest road construction, and on very steep terrain, 27% of trees were damaged by excavators and 44% of trees were damaged by bulldozers during forest road construction. It was also determined that on steep terrain, when excavators were used, 12.23% less forest area was destroyed compared with when bulldozers were used and 16.13% less area was destroyed by excavators on very steep terrain. In order to reduce the environmental damage on the forest ecosystem, especially in steep terrains, hydraulic excavators should replace bulldozers in forest road construction activities.
Decision-Tree Analysis for Predicting First-Time Pass/Fail Rates for the NCLEX-RN® in Associate Degree Nursing Students.

PubMed

Chen, Hsiu-Chin; Bennett, Sean

2016-08-01

Little evidence shows the use of decision-tree algorithms in identifying predictors and analyzing their associations with pass rates for the NCLEX-RN(®) in associate degree nursing students. This longitudinal and retrospective cohort study investigated whether a decision-tree algorithm could be used to develop an accurate prediction model for the students' passing or failing the NCLEX-RN. This study used archived data from 453 associate degree nursing students in a selected program. The chi-squared automatic interaction detection analysis of the decision trees module was used to examine the effect of the collected predictors on passing/failing the NCLEX-RN. The actual percentage scores of Assessment Technologies Institute®'s RN Comprehensive Predictor(®) accurately identified students at risk of failing. The classification model correctly classified 92.7% of the students for passing. This study applied the decision-tree model to analyze a sequence database for developing a prediction model for early remediation in preparation for the NCLEXRN. [J Nurs Educ. 2016;55(8):454-457.]. Copyright 2016, SLACK Incorporated.
Polynomial Conjoint Analysis of Similarities: A Model for Constructing Polynomial Conjoint Measurement Algorithms.

ERIC Educational Resources Information Center

Young, Forrest W.

A model permitting construction of algorithms for the polynomial conjoint analysis of similarities is presented. This model, which is based on concepts used in nonmetric scaling, permits one to obtain the best approximate solution. The concepts used to construct nonmetric scaling algorithms are reviewed. Finally, examples of algorithmic models for…
Calculating Higher-Order Moments of Phylogenetic Stochastic Mapping Summaries in Linear Time.

PubMed

Dhar, Amrit; Minin, Vladimir N

2017-05-01

Stochastic mapping is a simulation-based method for probabilistically mapping substitution histories onto phylogenies according to continuous-time Markov models of evolution. This technique can be used to infer properties of the evolutionary process on the phylogeny and, unlike parsimony-based mapping, conditions on the observed data to randomly draw substitution mappings that do not necessarily require the minimum number of events on a tree. Most stochastic mapping applications simulate substitution mappings only to estimate the mean and/or variance of two commonly used mapping summaries: the number of particular types of substitutions (labeled substitution counts) and the time spent in a particular group of states (labeled dwelling times) on the tree. Fast, simulation-free algorithms for calculating the mean of stochastic mapping summaries exist. Importantly, these algorithms scale linearly in the number of tips/leaves of the phylogenetic tree. However, to our knowledge, no such algorithm exists for calculating higher-order moments of stochastic mapping summaries. We present one such simulation-free dynamic programming algorithm that calculates prior and posterior mapping variances and scales linearly in the number of phylogeny tips. Our procedure suggests a general framework that can be used to efficiently compute higher-order moments of stochastic mapping summaries without simulations. We demonstrate the usefulness of our algorithm by extending previously developed statistical tests for rate variation across sites and for detecting evolutionarily conserved regions in genomic sequences.
Calculating Higher-Order Moments of Phylogenetic Stochastic Mapping Summaries in Linear Time

PubMed Central

Dhar, Amrit

2017-01-01

Abstract Stochastic mapping is a simulation-based method for probabilistically mapping substitution histories onto phylogenies according to continuous-time Markov models of evolution. This technique can be used to infer properties of the evolutionary process on the phylogeny and, unlike parsimony-based mapping, conditions on the observed data to randomly draw substitution mappings that do not necessarily require the minimum number of events on a tree. Most stochastic mapping applications simulate substitution mappings only to estimate the mean and/or variance of two commonly used mapping summaries: the number of particular types of substitutions (labeled substitution counts) and the time spent in a particular group of states (labeled dwelling times) on the tree. Fast, simulation-free algorithms for calculating the mean of stochastic mapping summaries exist. Importantly, these algorithms scale linearly in the number of tips/leaves of the phylogenetic tree. However, to our knowledge, no such algorithm exists for calculating higher-order moments of stochastic mapping summaries. We present one such simulation-free dynamic programming algorithm that calculates prior and posterior mapping variances and scales linearly in the number of phylogeny tips. Our procedure suggests a general framework that can be used to efficiently compute higher-order moments of stochastic mapping summaries without simulations. We demonstrate the usefulness of our algorithm by extending previously developed statistical tests for rate variation across sites and for detecting evolutionarily conserved regions in genomic sequences. PMID:28177780
Species Tree Inference Using a Mixture Model.

PubMed

Ullah, Ikram; Parviainen, Pekka; Lagergren, Jens

2015-09-01

Species tree reconstruction has been a subject of substantial research due to its central role across biology and medicine. A species tree is often reconstructed using a set of gene trees or by directly using sequence data. In either of these cases, one of the main confounding phenomena is the discordance between a species tree and a gene tree due to evolutionary events such as duplications and losses. Probabilistic methods can resolve the discordance by coestimating gene trees and the species tree but this approach poses a scalability problem for larger data sets. We present MixTreEM-DLRS: A two-phase approach for reconstructing a species tree in the presence of gene duplications and losses. In the first phase, MixTreEM, a novel structural expectation maximization algorithm based on a mixture model is used to reconstruct a set of candidate species trees, given sequence data for monocopy gene families from the genomes under study. In the second phase, PrIME-DLRS, a method based on the DLRS model (Åkerborg O, Sennblad B, Arvestad L, Lagergren J. 2009. Simultaneous Bayesian gene tree reconstruction and reconciliation analysis. Proc Natl Acad Sci U S A. 106(14):5714-5719), is used for selecting the best species tree. PrIME-DLRS can handle multicopy gene families since DLRS, apart from modeling sequence evolution, models gene duplication and loss using a gene evolution model (Arvestad L, Lagergren J, Sennblad B. 2009. The gene evolution model and computing its associated probabilities. J ACM. 56(2):1-44). We evaluate MixTreEM-DLRS using synthetic and biological data, and compare its performance with a recent genome-scale species tree reconstruction method PHYLDOG (Boussau B, Szöllősi GJ, Duret L, Gouy M, Tannier E, Daubin V. 2013. Genome-scale coestimation of species and gene trees. Genome Res. 23(2):323-330) as well as with a fast parsimony-based algorithm Duptree (Wehe A, Bansal MS, Burleigh JG, Eulenstein O. 2008. Duptree: a program for large-scale phylogenetic analyses using gene tree parsimony. Bioinformatics 24(13):1540-1541). Our method is competitive with PHYLDOG in terms of accuracy and runs significantly faster and our method outperforms Duptree in accuracy. The analysis constituted by MixTreEM without DLRS may also be used for selecting the target species tree, yielding a fast and yet accurate algorithm for larger data sets. MixTreEM is freely available at http://prime.scilifelab.se/mixtreem/. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Joint Power Charging and Routing in Wireless Rechargeable Sensor Networks.

PubMed

Jia, Jie; Chen, Jian; Deng, Yansha; Wang, Xingwei; Aghvami, Abdol-Hamid

2017-10-09

The development of wireless power transfer (WPT) technology has inspired the transition from traditional battery-based wireless sensor networks (WSNs) towards wireless rechargeable sensor networks (WRSNs). While extensive efforts have been made to improve charging efficiency, little has been done for routing optimization. In this work, we present a joint optimization model to maximize both charging efficiency and routing structure. By analyzing the structure of the optimization model, we first decompose the problem and propose a heuristic algorithm to find the optimal charging efficiency for the predefined routing tree. Furthermore, by coding the many-to-one communication topology as an individual, we further propose to apply a genetic algorithm (GA) for the joint optimization of both routing and charging. The genetic operations, including tree-based recombination and mutation, are proposed to obtain a fast convergence. Our simulation results show that the heuristic algorithm reduces the number of resident locations and the total moving distance. We also show that our proposed algorithm achieves a higher charging efficiency compared with existing algorithms.
Joint Power Charging and Routing in Wireless Rechargeable Sensor Networks

PubMed Central

Jia, Jie; Chen, Jian; Deng, Yansha; Wang, Xingwei; Aghvami, Abdol-Hamid

2017-01-01

The development of wireless power transfer (WPT) technology has inspired the transition from traditional battery-based wireless sensor networks (WSNs) towards wireless rechargeable sensor networks (WRSNs). While extensive efforts have been made to improve charging efficiency, little has been done for routing optimization. In this work, we present a joint optimization model to maximize both charging efficiency and routing structure. By analyzing the structure of the optimization model, we first decompose the problem and propose a heuristic algorithm to find the optimal charging efficiency for the predefined routing tree. Furthermore, by coding the many-to-one communication topology as an individual, we further propose to apply a genetic algorithm (GA) for the joint optimization of both routing and charging. The genetic operations, including tree-based recombination and mutation, are proposed to obtain a fast convergence. Our simulation results show that the heuristic algorithm reduces the number of resident locations and the total moving distance. We also show that our proposed algorithm achieves a higher charging efficiency compared with existing algorithms. PMID:28991200
Simulation of land use change in the three gorges reservoir area based on CART-CA

NASA Astrophysics Data System (ADS)

Yuan, Min

2018-05-01

This study proposes a new method to simulate spatiotemporal complex multiple land uses by using classification and regression tree algorithm (CART) based CA model. In this model, we use classification and regression tree algorithm to calculate land class conversion probability, and combine neighborhood factor, random factor to extract cellular transformation rules. The overall Kappa coefficient is 0.8014 and the overall accuracy is 0.8821 in the land dynamic simulation results of the three gorges reservoir area from 2000 to 2010, and the simulation results are satisfactory.
Exploration on Construction of Hospital "Talent Tree" Project.

PubMed

Yi, Lihua; Wei, Lei; Hao, Aimin; Hu, Minmin; Xu, Xinzhou

2015-05-01

Talent is the core competitive force of a hospital's development. Wuxi No. 2 People's Hospital followed the characteristics that medical talents mature slowly and their growth requires a long period. The innovated "talent tree" project, trained classified talents corresponding to "base-trunk-crown" of a tree, formed an individualized professional training plan with different levels and at different periods. We carried out a relay of the "talent tree" to bring their initiative into play. In practice, we gradually found this as a unique way of the talent construction, which conforms to our hospital's condition. This guarantees sustained development and innovative force of the hospital.

Virtual Network Embedding via Monte Carlo Tree Search.

PubMed

Haeri, Soroush; Trajkovic, Ljiljana

2018-02-01

Network virtualization helps overcome shortcomings of the current Internet architecture. The virtualized network architecture enables coexistence of multiple virtual networks (VNs) on an existing physical infrastructure. VN embedding (VNE) problem, which deals with the embedding of VN components onto a physical network, is known to be -hard. In this paper, we propose two VNE algorithms: MaVEn-M and MaVEn-S. MaVEn-M employs the multicommodity flow algorithm for virtual link mapping while MaVEn-S uses the shortest-path algorithm. They formalize the virtual node mapping problem by using the Markov decision process (MDP) framework and devise action policies (node mappings) for the proposed MDP using the Monte Carlo tree search algorithm. Service providers may adjust the execution time of the MaVEn algorithms based on the traffic load of VN requests. The objective of the algorithms is to maximize the profit of infrastructure providers. We develop a discrete event VNE simulator to implement and evaluate performance of MaVEn-M, MaVEn-S, and several recently proposed VNE algorithms. We introduce profitability as a new performance metric that captures both acceptance and revenue to cost ratios. Simulation results show that the proposed algorithms find more profitable solutions than the existing algorithms. Given additional computation time, they further improve embedding solutions.
Stress wave velocity patterns in the longitudinal-radial plane of trees for defect diagnosis

Treesearch

Guanghui Li; Xiang Weng; Xiaocheng Du; Xiping Wang; Hailin Feng

2016-01-01

Acoustic tomography for urban tree inspection typically uses stress wave data to reconstruct tomographic images for the trunk cross section using interpolation algorithm. This traditional technique does not take into account the stress wave velocity patterns along tree height. In this study, we proposed an analytical model for the wave velocity in the longitudinalâ...
Portable Language-Independent Adaptive Translation from OCR. Phase 1

DTIC Science & Technology

2009-04-01

including brute-force k-Nearest Neighbors ( kNN ), fast approximate kNN using hashed k-d trees, classification and regression trees, and locality...achieved by refinements in ground-truthing protocols. Recent algorithmic improvements to our approximate kNN classifier using hashed k-D trees allows...recent years discriminative training has been shown to outperform phonetic HMMs estimated using ML for speech recognition. Standard ML estimation
A scale-based connected coherence tree algorithm for image segmentation.

PubMed

Ding, Jundi; Ma, Runing; Chen, Songcan

2008-02-01

This paper presents a connected coherence tree algorithm (CCTA) for image segmentation with no prior knowledge. It aims to find regions of semantic coherence based on the proposed epsilon-neighbor coherence segmentation criterion. More specifically, with an adaptive spatial scale and an appropriate intensity-difference scale, CCTA often achieves several sets of coherent neighboring pixels which maximize the probability of being a single image content (including kinds of complex backgrounds). In practice, each set of coherent neighboring pixels corresponds to a coherence class (CC). The fact that each CC just contains a single equivalence class (EC) ensures the separability of an arbitrary image theoretically. In addition, the resultant CCs are represented by tree-based data structures, named connected coherence tree (CCT)s. In this sense, CCTA is a graph-based image analysis algorithm, which expresses three advantages: 1) its fundamental idea, epsilon-neighbor coherence segmentation criterion, is easy to interpret and comprehend; 2) it is efficient due to a linear computational complexity in the number of image pixels; 3) both subjective comparisons and objective evaluation have shown that it is effective for the tasks of semantic object segmentation and figure-ground separation in a wide variety of images. Those images either contain tiny, long and thin objects or are severely degraded by noise, uneven lighting, occlusion, poor illumination, and shadow.
MODIS Snow Cover Mapping Decision Tree Technique: Snow and Cloud Discrimination

NASA Technical Reports Server (NTRS)

Riggs, George A.; Hall, Dorothy K.

2010-01-01

Accurate mapping of snow cover continues to challenge cryospheric scientists and modelers. The Moderate-Resolution Imaging Spectroradiometer (MODIS) snow data products have been used since 2000 by many investigators to map and monitor snow cover extent for various applications. Users have reported on the utility of the products and also on problems encountered. Three problems or hindrances in the use of the MODIS snow data products that have been reported in the literature are: cloud obscuration, snow/cloud confusion, and snow omission errors in thin or sparse snow cover conditions. Implementation of the MODIS snow algorithm in a decision tree technique using surface reflectance input to mitigate those problems is being investigated. The objective of this work is to use a decision tree structure for the snow algorithm. This should alleviate snow/cloud confusion and omission errors and provide a snow map with classes that convey information on how snow was detected, e.g. snow under clear sky, snow tinder cloud, to enable users' flexibility in interpreting and deriving a snow map. Results of a snow cover decision tree algorithm are compared to the standard MODIS snow map and found to exhibit improved ability to alleviate snow/cloud confusion in some situations allowing up to about 5% increase in mapped snow cover extent, thus accuracy, in some scenes.
The Optimization of Automatically Generated Compilers.

DTIC Science & Technology

1987-01-01

than their procedural counterparts, and are also easier to analyze for storage optimizations; (2) AGs can be algorithmically checked to be non-circular...Providing algorithms to move the storage for many attributes from the For structure tree into global stacks and variables. -Dd(2) Creating AEs which build and...54 3.5.2. Partitioning algorithm
Pattern pluralism and the Tree of Life hypothesis

PubMed Central

Doolittle, W. Ford; Bapteste, Eric

2007-01-01

Darwin claimed that a unique inclusively hierarchical pattern of relationships between all organisms based on their similarities and differences [the Tree of Life (TOL)] was a fact of nature, for which evolution, and in particular a branching process of descent with modification, was the explanation. However, there is no independent evidence that the natural order is an inclusive hierarchy, and incorporation of prokaryotes into the TOL is especially problematic. The only data sets from which we might construct a universal hierarchy including prokaryotes, the sequences of genes, often disagree and can seldom be proven to agree. Hierarchical structure can always be imposed on or extracted from such data sets by algorithms designed to do so, but at its base the universal TOL rests on an unproven assumption about pattern that, given what we know about process, is unlikely to be broadly true. This is not to say that similarities and differences between organisms are not to be accounted for by evolutionary mechanisms, but descent with modification is only one of these mechanisms, and a single tree-like pattern is not the necessary (or expected) result of their collective operation. Pattern pluralism (the recognition that different evolutionary models and representations of relationships will be appropriate, and true, for different taxa or at different scales or for different purposes) is an attractive alternative to the quixotic pursuit of a single true TOL. PMID:17261804
Application of decision tree for prediction of cutaneous leishmaniasis incidence based on environmental and topographic factors in Isfahan Province, Iran.

PubMed

Ramezankhani, Roghieh; Sajjadi, Nooshin; Nezakati Esmaeilzadeh, Roya; Jozi, Seyed Ali; Shirzadi, Mohammad Reza

2018-05-08

Cutaneous Leishmaniasis (CL) is a neglected tropical disease that continues to be a health problem in Iran. Nearly 350 million people are thought to be at risk. We investigated the impact of the environmental factors on CL incidence during the period 2007- 2015 in a known endemic area for this disease in Isfahan Province, Iran. After collecting data with regard to the climatic, topographic, vegetation coverage and CL cases in the study area, a decision tree model was built using the classification and regression tree algorithm. CL data for the years 2007 until 2012 were used for model construction and the data for the years 2013 until 2015 were used for testing the model. The Root Mean Square error and the correlation factor were used to evaluate the predictive performance of the decision tree model. We found that wind speeds less than 14 m/s, altitudes between 1234 and 1810 m above the mean sea level, vegetation coverage according to the normalized difference vegetation index (NDVI) less than 0.12, rainfall less than 1.6 mm and air temperatures higher than 30°C would correspond to a seasonal incidence of 163.28 per 100,000 persons, while if wind speed is less than 14 m/s, altitude less than 1,810 m and NDVI higher than 0.12, then the mean seasonal incidence of the disease would be 2.27 per 100,000 persons. Environmental factors were found to be important predictive variables for CL incidence and should be considered in surveillance and prevention programmes for CL control.
Optimization of matrix tablets controlled drug release using Elman dynamic neural networks and decision trees.

PubMed

Petrović, Jelena; Ibrić, Svetlana; Betz, Gabriele; Đurić, Zorica

2012-05-30

The main objective of the study was to develop artificial intelligence methods for optimization of drug release from matrix tablets regardless of the matrix type. Static and dynamic artificial neural networks of the same topology were developed to model dissolution profiles of different matrix tablets types (hydrophilic/lipid) using formulation composition, compression force used for tableting and tablets porosity and tensile strength as input data. Potential application of decision trees in discovering knowledge from experimental data was also investigated. Polyethylene oxide polymer and glyceryl palmitostearate were used as matrix forming materials for hydrophilic and lipid matrix tablets, respectively whereas selected model drugs were diclofenac sodium and caffeine. Matrix tablets were prepared by direct compression method and tested for in vitro dissolution profiles. Optimization of static and dynamic neural networks used for modeling of drug release was performed using Monte Carlo simulations or genetic algorithms optimizer. Decision trees were constructed following discretization of data. Calculated difference (f(1)) and similarity (f(2)) factors for predicted and experimentally obtained dissolution profiles of test matrix tablets formulations indicate that Elman dynamic neural networks as well as decision trees are capable of accurate predictions of both hydrophilic and lipid matrix tablets dissolution profiles. Elman neural networks were compared to most frequently used static network, Multi-layered perceptron, and superiority of Elman networks have been demonstrated. Developed methods allow simple, yet very precise way of drug release predictions for both hydrophilic and lipid matrix tablets having controlled drug release. Copyright © 2012 Elsevier B.V. All rights reserved.
TREAT (TREe-based Association Test)

Cancer.gov

TREAT is an R package for detecting complex joint effects in case-control studies. The test statistic is derived from a tree-structure model by recursive partitioning the data. Ultra-fast algorithm is designed to evaluate the significance of association between candidate gene and disease outcome
An application of locally linear model tree algorithm with combination of feature selection in credit scoring

NASA Astrophysics Data System (ADS)

Siami, Mohammad; Gholamian, Mohammad Reza; Basiri, Javad

2014-10-01

Nowadays, credit scoring is one of the most important topics in the banking sector. Credit scoring models have been widely used to facilitate the process of credit assessing. In this paper, an application of the locally linear model tree algorithm (LOLIMOT) was experimented to evaluate the superiority of its performance to predict the customer's credit status. The algorithm is improved with an aim of adjustment by credit scoring domain by means of data fusion and feature selection techniques. Two real world credit data sets - Australian and German - from UCI machine learning database were selected to demonstrate the performance of our new classifier. The analytical results indicate that the improved LOLIMOT significantly increase the prediction accuracy.
A method to study response of large trees to different amounts of available soil water

Treesearch

D.H. Marx; Shi-Jean S. Sung; J.S. Cunningham; M.D. Thompson; L.M. White

1995-01-01

A method was developed to manipulate available soil water on large trees by intercepting thrufall with gutters placed under tree canopies and irrigating the intercepted thrufall onto other trees. With this design, trees were exposed for 2 years to either 25% less thrufall, normal thrufall, or 25% additional thrufall.Undercanopy construction in these plots moderately...
A Method to Study Response of Large Trees to Different Amounts of Available Soil Water

Treesearch

Donald H. Marx; Shi-jean S. Sung; James S. Cunningham; Michael D. Thompson; Linda M. White

1995-01-01

A method was developed to manipulate available soil water on large trees by intercepting thrufall with gutters placed under tree canopies and irrigating the intercepted thrufall onto other trees. With this design, trees were exposed for 2 years to either 25 percent less thrufall, normal tbrufall,or 25 percent additional thrufall. Undercanopy construction in these plots...
Bioinformatics in proteomics: application, terminology, and pitfalls.

PubMed

Wiemer, Jan C; Prokudin, Alexander

2004-01-01

Bioinformatics applies data mining, i.e., modern computer-based statistics, to biomedical data. It leverages on machine learning approaches, such as artificial neural networks, decision trees and clustering algorithms, and is ideally suited for handling huge data amounts. In this article, we review the analysis of mass spectrometry data in proteomics, starting with common pre-processing steps and using single decision trees and decision tree ensembles for classification. Special emphasis is put on the pitfall of overfitting, i.e., of generating too complex single decision trees. Finally, we discuss the pros and cons of the two different decision tree usages.
Phylogenetic trees and Euclidean embeddings.

PubMed

Layer, Mark; Rhodes, John A

2017-01-01

It was recently observed by de Vienne et al. (Syst Biol 60(6):826-832, 2011) that a simple square root transformation of distances between taxa on a phylogenetic tree allowed for an embedding of the taxa into Euclidean space. While the justification for this was based on a diffusion model of continuous character evolution along the tree, here we give a direct and elementary explanation for it that provides substantial additional insight. We use this embedding to reinterpret the differences between the NJ and BIONJ tree building algorithms, providing one illustration of how this embedding reflects tree structures in data.
The Efficacy of Consensus Tree Methods for Summarizing Phylogenetic Relationships from a Posterior Sample of Trees Estimated from Morphological Data.

PubMed

O'Reilly, Joseph E; Donoghue, Philip C J

2018-03-01

Consensus trees are required to summarize trees obtained through MCMC sampling of a posterior distribution, providing an overview of the distribution of estimated parameters such as topology, branch lengths, and divergence times. Numerous consensus tree construction methods are available, each presenting a different interpretation of the tree sample. The rise of morphological clock and sampled-ancestor methods of divergence time estimation, in which times and topology are coestimated, has increased the popularity of the maximum clade credibility (MCC) consensus tree method. The MCC method assumes that the sampled, fully resolved topology with the highest clade credibility is an adequate summary of the most probable clades, with parameter estimates from compatible sampled trees used to obtain the marginal distributions of parameters such as clade ages and branch lengths. Using both simulated and empirical data, we demonstrate that MCC trees, and trees constructed using the similar maximum a posteriori (MAP) method, often include poorly supported and incorrect clades when summarizing diffuse posterior samples of trees. We demonstrate that the paucity of information in morphological data sets contributes to the inability of MCC and MAP trees to accurately summarise of the posterior distribution. Conversely, majority-rule consensus (MRC) trees represent a lower proportion of incorrect nodes when summarizing the same posterior samples of trees. Thus, we advocate the use of MRC trees, in place of MCC or MAP trees, in attempts to summarize the results of Bayesian phylogenetic analyses of morphological data.
The Efficacy of Consensus Tree Methods for Summarizing Phylogenetic Relationships from a Posterior Sample of Trees Estimated from Morphological Data

PubMed Central

O’Reilly, Joseph E; Donoghue, Philip C J

2018-01-01

Abstract Consensus trees are required to summarize trees obtained through MCMC sampling of a posterior distribution, providing an overview of the distribution of estimated parameters such as topology, branch lengths, and divergence times. Numerous consensus tree construction methods are available, each presenting a different interpretation of the tree sample. The rise of morphological clock and sampled-ancestor methods of divergence time estimation, in which times and topology are coestimated, has increased the popularity of the maximum clade credibility (MCC) consensus tree method. The MCC method assumes that the sampled, fully resolved topology with the highest clade credibility is an adequate summary of the most probable clades, with parameter estimates from compatible sampled trees used to obtain the marginal distributions of parameters such as clade ages and branch lengths. Using both simulated and empirical data, we demonstrate that MCC trees, and trees constructed using the similar maximum a posteriori (MAP) method, often include poorly supported and incorrect clades when summarizing diffuse posterior samples of trees. We demonstrate that the paucity of information in morphological data sets contributes to the inability of MCC and MAP trees to accurately summarise of the posterior distribution. Conversely, majority-rule consensus (MRC) trees represent a lower proportion of incorrect nodes when summarizing the same posterior samples of trees. Thus, we advocate the use of MRC trees, in place of MCC or MAP trees, in attempts to summarize the results of Bayesian phylogenetic analyses of morphological data. PMID:29106675
A supertree pipeline for summarizing phylogenetic and taxonomic information for millions of species

PubMed Central

Redelings, Benjamin D.

2017-01-01

We present a new supertree method that enables rapid estimation of a summary tree on the scale of millions of leaves. This supertree method summarizes a collection of input phylogenies and an input taxonomy. We introduce formal goals and criteria for such a supertree to satisfy in order to transparently and justifiably represent the input trees. In addition to producing a supertree, our method computes annotations that describe which grouping in the input trees support and conflict with each group in the supertree. We compare our supertree construction method to a previously published supertree construction method by assessing their performance on input trees used to construct the Open Tree of Life version 4, and find that our method increases the number of displayed input splits from 35,518 to 39,639 and decreases the number of conflicting input splits from 2,760 to 1,357. The new supertree method also improves on the previous supertree construction method in that it produces no unsupported branches and avoids unnecessary polytomies. This pipeline is currently used by the Open Tree of Life project to produce all of the versions of project’s “synthetic tree” starting at version 5. This software pipeline is called “propinquity”. It relies heavily on “otcetera”—a set of C++ tools to perform most of the steps of the pipeline. All of the components are free software and are available on GitHub. PMID:28265520
On the Suitability of Suffix Arrays for Lempel-Ziv Data Compression

NASA Astrophysics Data System (ADS)

Ferreira, Artur J.; Oliveira, Arlindo L.; Figueiredo, Mário A. T.

Lossless compression algorithms of the Lempel-Ziv (LZ) family are widely used nowadays. Regarding time and memory requirements, LZ encoding is much more demanding than decoding. In order to speed up the encoding process, efficient data structures, like suffix trees, have been used. In this paper, we explore the use of suffix arrays to hold the dictionary of the LZ encoder, and propose an algorithm to search over it. We show that the resulting encoder attains roughly the same compression ratios as those based on suffix trees. However, the amount of memory required by the suffix array is fixed, and much lower than the variable amount of memory used by encoders based on suffix trees (which depends on the text to encode). We conclude that suffix arrays, when compared to suffix trees in terms of the trade-off among time, memory, and compression ratio, may be preferable in scenarios (e.g., embedded systems) where memory is at a premium and high speed is not critical.
K-Means Algorithm Performance Analysis With Determining The Value Of Starting Centroid With Random And KD-Tree Method

NASA Astrophysics Data System (ADS)

Sirait, Kamson; Tulus; Budhiarti Nababan, Erna

2017-12-01

Clustering methods that have high accuracy and time efficiency are necessary for the filtering process. One method that has been known and applied in clustering is K-Means Clustering. In its application, the determination of the begining value of the cluster center greatly affects the results of the K-Means algorithm. This research discusses the results of K-Means Clustering with starting centroid determination with a random and KD-Tree method. The initial determination of random centroid on the data set of 1000 student academic data to classify the potentially dropout has a sse value of 952972 for the quality variable and 232.48 for the GPA, whereas the initial centroid determination by KD-Tree has a sse value of 504302 for the quality variable and 214,37 for the GPA variable. The smaller sse values indicate that the result of K-Means Clustering with initial KD-Tree centroid selection have better accuracy than K-Means Clustering method with random initial centorid selection.

Scalable Regression Tree Learning on Hadoop using OpenPlanet

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yin, Wei; Simmhan, Yogesh; Prasanna, Viktor

As scientific and engineering domains attempt to effectively analyze the deluge of data arriving from sensors and instruments, machine learning is becoming a key data mining tool to build prediction models. Regression tree is a popular learning model that combines decision trees and linear regression to forecast numerical target variables based on a set of input features. Map Reduce is well suited for addressing such data intensive learning applications, and a proprietary regression tree algorithm, PLANET, using MapReduce has been proposed earlier. In this paper, we describe an open source implement of this algorithm, OpenPlanet, on the Hadoop framework usingmore » a hybrid approach. Further, we evaluate the performance of OpenPlanet using realworld datasets from the Smart Power Grid domain to perform energy use forecasting, and propose tuning strategies of Hadoop parameters to improve the performance of the default configuration by 75% for a training dataset of 17 million tuples on a 64-core Hadoop cluster on FutureGrid.« less
Artificial intelligence techniques applied to the development of a decision–support system for diagnosing celiac disease

PubMed Central

Tenório, Josceli Maria; Hummel, Anderson Diniz; Cohrs, Frederico Molina; Sdepanian, Vera Lucia; Pisa, Ivan Torres; de Fátima Marin, Heimar

2013-01-01

Background Celiac disease (CD) is a difficult-to-diagnose condition because of its multiple clinical presentations and symptoms shared with other diseases. Gold-standard diagnostic confirmation of suspected CD is achieved by biopsying the small intestine. Objective To develop a clinical decision–support system (CDSS) integrated with an automated classifier to recognize CD cases, by selecting from experimental models developed using intelligence artificial techniques. Methods A web-based system was designed for constructing a retrospective database that included 178 clinical cases for training. Tests were run on 270 automated classifiers available in Weka 3.6.1 using five artificial intelligence techniques, namely decision trees, Bayesian inference, k-nearest neighbor algorithm, support vector machines and artificial neural networks. The parameters evaluated were accuracy, sensitivity, specificity and area under the ROC curve (AUC). AUC was used as a criterion for selecting the CDSS algorithm. A testing database was constructed including 38 clinical CD cases for CDSS evaluation. The diagnoses suggested by CDSS were compared with those made by physicians during patient consultations. Results The most accurate method during the training phase was the averaged one-dependence estimator (AODE) algorithm (a Bayesian classifier), which showed accuracy 80.0%, sensitivity 0.78, specificity 0.80 and AUC 0.84. This classifier was integrated into the web-based decision–support system. The gold-standard validation of CDSS achieved accuracy of 84.2% and k = 0.68 (p < 0.0001) with good agreement. The same accuracy was achieved in the comparison between the physician’s diagnostic impression and the gold standard k = 0. 64 (p < 0.0001). There was moderate agreement between the physician’s diagnostic impression and CDSS k = 0.46 (p = 0.0008). Conclusions The study results suggest that CDSS could be used to help in diagnosing CD, since the algorithm tested achieved excellent accuracy in differentiating possible positive from negative CD diagnoses. This study may contribute towards developing of a computer-assisted environment to support CD diagnosis. PMID:21917512
Artificial intelligence techniques applied to the development of a decision-support system for diagnosing celiac disease.

PubMed

Tenório, Josceli Maria; Hummel, Anderson Diniz; Cohrs, Frederico Molina; Sdepanian, Vera Lucia; Pisa, Ivan Torres; de Fátima Marin, Heimar

2011-11-01

Celiac disease (CD) is a difficult-to-diagnose condition because of its multiple clinical presentations and symptoms shared with other diseases. Gold-standard diagnostic confirmation of suspected CD is achieved by biopsying the small intestine. To develop a clinical decision-support system (CDSS) integrated with an automated classifier to recognize CD cases, by selecting from experimental models developed using intelligence artificial techniques. A web-based system was designed for constructing a retrospective database that included 178 clinical cases for training. Tests were run on 270 automated classifiers available in Weka 3.6.1 using five artificial intelligence techniques, namely decision trees, Bayesian inference, k-nearest neighbor algorithm, support vector machines and artificial neural networks. The parameters evaluated were accuracy, sensitivity, specificity and area under the ROC curve (AUC). AUC was used as a criterion for selecting the CDSS algorithm. A testing database was constructed including 38 clinical CD cases for CDSS evaluation. The diagnoses suggested by CDSS were compared with those made by physicians during patient consultations. The most accurate method during the training phase was the averaged one-dependence estimator (AODE) algorithm (a Bayesian classifier), which showed accuracy 80.0%, sensitivity 0.78, specificity 0.80 and AUC 0.84. This classifier was integrated into the web-based decision-support system. The gold-standard validation of CDSS achieved accuracy of 84.2% and k=0.68 (p<0.0001) with good agreement. The same accuracy was achieved in the comparison between the physician's diagnostic impression and the gold standard k=0. 64 (p<0.0001). There was moderate agreement between the physician's diagnostic impression and CDSS k=0.46 (p=0.0008). The study results suggest that CDSS could be used to help in diagnosing CD, since the algorithm tested achieved excellent accuracy in differentiating possible positive from negative CD diagnoses. This study may contribute towards developing of a computer-assisted environment to support CD diagnosis. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Niche construction within riparian corridors. Part II: The unexplored role of positive intraspecific interactions in Salicaceae species

NASA Astrophysics Data System (ADS)

Corenblit, Dov; Garófano-Gómez, Virginia; González, Eduardo; Hortobágyi, Borbála; Julien, Frédéric; Lambs, Luc; Otto, Thierry; Roussel, Erwan; Steiger, Johannes; Tabacchi, Eric; Till-Bottraud, Irène

2018-03-01

Within riparian corridors, Salicaceae trees and shrubs affect hydrogeomorphic processes and lead to the formation of wooded fluvial landforms. These trees form dense stands and enhance plant anchorage, as grouped plants are less prone to be uprooted than free-standing individuals. This also enhances their role as ecosystem engineers through the trapping of sediment, organic matter, and nutrients. The landform formation caused by these wooded biogeomorphic landforms probably represents a positive niche construction, which ultimately leads, through facilitative processes, to an improved capacity of the individual trees to survive, exploit resources, and reach sexual maturity in the interval between destructive floods. The facilitative effects of riparian vegetation are well established; however, the nature and intensity of biotic interactions among trees of the same species forming dense woody stands and constructing the niche remain unclear. Our hypothesis is that the niche construction process also comprises more direct intraspecific interactions, such as cooperation or altruism. Our aim in this paper is to propose an original theoretical framework for positive intraspecific interactions among riparian Salicaceae species operating from establishment to sexual maturity. Within this framework, we speculate that (i) positive intraspecific interactions among trees are maximized in dynamic river reaches; (ii) during establishment, intraspecific facilitation (or helping) occurs among trees and this leads to the maintenance of a dense stand that improves survival and growth because saplings protect each other from shear stress and scour; (iii) in addition to the improved capacity to trap mineral and organic matter, individuals that constitute the dense stand can cooperate to mutually support a mycorrhizal network that will connect plants, soil, and groundwater and influence nutrient transfer, cycling, and storage within the shared constructed niche; (iv) during post-establishment, roots form functional grafts between neighbouring trees to increase biomechanical and physiological anchorage as well as nutrient acquisition and exchange; and (v) these stands remain dense on alluvial bars until a threshold of landform construction and hydrogeomorphic disconnection is reached. At this last stage, intraspecific competition for resources (light and nutrients) increases, inducing a density reduction in the aerial stand (i.e., self-thinning), but root systems of altruistic individuals could remain functional via root grafting. Finally, we suggest new methodological perspectives for testing our hypotheses related to the occurrence of positive intraspecific interactions among Salicaceae trees in fluvial landform and niche construction through in situ and ex situ experiments.
Automatic red eye correction and its quality metric

NASA Astrophysics Data System (ADS)

Safonov, Ilia V.; Rychagov, Michael N.; Kang, KiMin; Kim, Sang Ho

2008-01-01

The red eye artifacts are troublesome defect of amateur photos. Correction of red eyes during printing without user intervention and making photos more pleasant for an observer are important tasks. The novel efficient technique of automatic correction of red eyes aimed for photo printers is proposed. This algorithm is independent from face orientation and capable to detect paired red eyes as well as single red eyes. The approach is based on application of 3D tables with typicalness levels for red eyes and human skin tones and directional edge detection filters for processing of redness image. Machine learning is applied for feature selection. For classification of red eye regions a cascade of classifiers including Gentle AdaBoost committee from Classification and Regression Trees (CART) is applied. Retouching stage includes desaturation, darkening and blending with initial image. Several versions of approach implementation using trade-off between detection and correction quality, processing time, memory volume are possible. The numeric quality criterion of automatic red eye correction is proposed. This quality metric is constructed by applying Analytic Hierarchy Process (AHP) for consumer opinions about correction outcomes. Proposed numeric metric helped to choose algorithm parameters via optimization procedure. Experimental results demonstrate high accuracy and efficiency of the proposed algorithm in comparison with existing solutions.
Improvement of the cost-benefit analysis algorithm for high-rise construction projects

NASA Astrophysics Data System (ADS)

Gafurov, Andrey; Skotarenko, Oksana; Plotnikov, Vladimir

2018-03-01

The specific nature of high-rise investment projects entailing long-term construction, high risks, etc. implies a need to improve the standard algorithm of cost-benefit analysis. An improved algorithm is described in the article. For development of the improved algorithm of cost-benefit analysis for high-rise construction projects, the following methods were used: weighted average cost of capital, dynamic cost-benefit analysis of investment projects, risk mapping, scenario analysis, sensitivity analysis of critical ratios, etc. This comprehensive approach helped to adapt the original algorithm to feasibility objectives in high-rise construction. The authors put together the algorithm of cost-benefit analysis for high-rise construction projects on the basis of risk mapping and sensitivity analysis of critical ratios. The suggested project risk management algorithms greatly expand the standard algorithm of cost-benefit analysis in investment projects, namely: the "Project analysis scenario" flowchart, improving quality and reliability of forecasting reports in investment projects; the main stages of cash flow adjustment based on risk mapping for better cost-benefit project analysis provided the broad range of risks in high-rise construction; analysis of dynamic cost-benefit values considering project sensitivity to crucial variables, improving flexibility in implementation of high-rise projects.
Evaluation of supervised machine-learning algorithms to distinguish between inflammatory bowel disease and alimentary lymphoma in cats.

PubMed

Awaysheh, Abdullah; Wilcke, Jeffrey; Elvinger, François; Rees, Loren; Fan, Weiguo; Zimmerman, Kurt L

2016-11-01

Inflammatory bowel disease (IBD) and alimentary lymphoma (ALA) are common gastrointestinal diseases in cats. The very similar clinical signs and histopathologic features of these diseases make the distinction between them diagnostically challenging. We tested the use of supervised machine-learning algorithms to differentiate between the 2 diseases using data generated from noninvasive diagnostic tests. Three prediction models were developed using 3 machine-learning algorithms: naive Bayes, decision trees, and artificial neural networks. The models were trained and tested on data from complete blood count (CBC) and serum chemistry (SC) results for the following 3 groups of client-owned cats: normal, inflammatory bowel disease (IBD), or alimentary lymphoma (ALA). Naive Bayes and artificial neural networks achieved higher classification accuracy (sensitivities of 70.8% and 69.2%, respectively) than the decision tree algorithm (63%, p < 0.0001). The areas under the receiver-operating characteristic curve for classifying cases into the 3 categories was 83% by naive Bayes, 79% by decision tree, and 82% by artificial neural networks. Prediction models using machine learning provided a method for distinguishing between ALA-IBD, ALA-normal, and IBD-normal. The naive Bayes and artificial neural networks classifiers used 10 and 4 of the CBC and SC variables, respectively, to outperform the C4.5 decision tree, which used 5 CBC and SC variables in classifying cats into the 3 classes. These models can provide another noninvasive diagnostic tool to assist clinicians with differentiating between IBD and ALA, and between diseased and nondiseased cats. © 2016 The Author(s).
Semiautomated landscape feature extraction and modeling

NASA Astrophysics Data System (ADS)

Wasilewski, Anthony A.; Faust, Nickolas L.; Ribarsky, William

2001-08-01

We have developed a semi-automated procedure for generating correctly located 3D tree objects form overhead imagery. Cross-platform software partitions arbitrarily large, geocorrected and geolocated imagery into management sub- images. The user manually selected tree areas from one or more of these sub-images. Tree group blobs are then narrowed to lines using a special thinning algorithm which retains the topology of the blobs, and also stores the thickness of the parent blob. Maxima along these thinned tree grous are found, and used as individual tree locations within the tree group. Magnitudes of the local maxima are used to scale the radii of the tree objects. Grossly overlapping trees are culled based on a comparison of tree-tree distance to combined radii. Tree color is randomly selected based on the distribution of sample tree pixels, and height is estimated form tree radius. The final tree objects are then inserted into a terrain database which can be navigated by VGIS, a high-resolution global terrain visualization system developed at Georgia Tech.
Scalable Nearest Neighbor Algorithms for High Dimensional Data.

PubMed

Muja, Marius; Lowe, David G

2014-11-01

For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.
Not seeing the forest for the trees: size of the minimum spanning trees (MSTs) forest and branch significance in MST-based phylogenetic analysis.

PubMed

Teixeira, Andreia Sofia; Monteiro, Pedro T; Carriço, João A; Ramirez, Mário; Francisco, Alexandre P

2015-01-01

Trees, including minimum spanning trees (MSTs), are commonly used in phylogenetic studies. But, for the research community, it may be unclear that the presented tree is just a hypothesis, chosen from among many possible alternatives. In this scenario, it is important to quantify our confidence in both the trees and the branches/edges included in such trees. In this paper, we address this problem for MSTs by introducing a new edge betweenness metric for undirected and weighted graphs. This spanning edge betweenness metric is defined as the fraction of equivalent MSTs where a given edge is present. The metric provides a per edge statistic that is similar to that of the bootstrap approach frequently used in phylogenetics to support the grouping of taxa. We provide methods for the exact computation of this metric based on the well known Kirchhoff's matrix tree theorem. Moreover, we implement and make available a module for the PHYLOViZ software and evaluate the proposed metric concerning both effectiveness and computational performance. Analysis of trees generated using multilocus sequence typing data (MLST) and the goeBURST algorithm revealed that the space of possible MSTs in real data sets is extremely large. Selection of the edge to be represented using bootstrap could lead to unreliable results since alternative edges are present in the same fraction of equivalent MSTs. The choice of the MST to be presented, results from criteria implemented in the algorithm that must be based in biologically plausible models.
A voxel-based technique to estimate the volume of trees from terrestrial laser scanner data

NASA Astrophysics Data System (ADS)

Bienert, A.; Hess, C.; Maas, H.-G.; von Oheimb, G.

2014-06-01

The precise determination of the volume of standing trees is very important for ecological and economical considerations in forestry. If terrestrial laser scanner data are available, a simple approach for volume determination is given by allocating points into a voxel structure and subsequently counting the filled voxels. Generally, this method will overestimate the volume. The paper presents an improved algorithm to estimate the wood volume of trees using a voxel-based method which will correct for the overestimation. After voxel space transformation, each voxel which contains points is reduced to the volume of its surrounding bounding box. In a next step, occluded (inner stem) voxels are identified by a neighbourhood analysis sweeping in the X and Y direction of each filled voxel. Finally, the wood volume of the tree is composed by the sum of the bounding box volumes of the outer voxels and the volume of all occluded inner voxels. Scan data sets from several young Norway maple trees (Acer platanoides) were used to analyse the algorithm. Therefore, the scanned trees as well as their representing point clouds were separated in different components (stem, branches) to make a meaningful comparison. Two reference measurements were performed for validation: A direct wood volume measurement by placing the tree components into a water tank, and a frustum calculation of small trunk segments by measuring the radii along the trunk. Overall, the results show slightly underestimated volumes (-0.3% for a probe of 13 trees) with a RMSE of 11.6% for the individual tree volume calculated with the new approach.
A wideband fast multipole boundary element method for half-space/plane-symmetric acoustic wave problems

NASA Astrophysics Data System (ADS)

Zheng, Chang-Jun; Chen, Hai-Bo; Chen, Lei-Lei

2013-04-01

This paper presents a novel wideband fast multipole boundary element approach to 3D half-space/plane-symmetric acoustic wave problems. The half-space fundamental solution is employed in the boundary integral equations so that the tree structure required in the fast multipole algorithm is constructed for the boundary elements in the real domain only. Moreover, a set of symmetric relations between the multipole expansion coefficients of the real and image domains are derived, and the half-space fundamental solution is modified for the purpose of applying such relations to avoid calculating, translating and saving the multipole/local expansion coefficients of the image domain. The wideband adaptive multilevel fast multipole algorithm associated with the iterative solver GMRES is employed so that the present method is accurate and efficient for both lowand high-frequency acoustic wave problems. As for exterior acoustic problems, the Burton-Miller method is adopted to tackle the fictitious eigenfrequency problem involved in the conventional boundary integral equation method. Details on the implementation of the present method are described, and numerical examples are given to demonstrate its accuracy and efficiency.
Sequential Test Strategies for Multiple Fault Isolation

NASA Technical Reports Server (NTRS)

Shakeri, M.; Pattipati, Krishna R.; Raghavan, V.; Patterson-Hine, Ann; Kell, T.

1997-01-01

In this paper, we consider the problem of constructing near optimal test sequencing algorithms for diagnosing multiple faults in redundant (fault-tolerant) systems. The computational complexity of solving the optimal multiple-fault isolation problem is super-exponential, that is, it is much more difficult than the single-fault isolation problem, which, by itself, is NP-hard. By employing concepts from information theory and Lagrangian relaxation, we present several static and dynamic (on-line or interactive) test sequencing algorithms for the multiple fault isolation problem that provide a trade-off between the degree of suboptimality and computational complexity. Furthermore, we present novel diagnostic strategies that generate a static diagnostic directed graph (digraph), instead of a static diagnostic tree, for multiple fault diagnosis. Using this approach, the storage complexity of the overall diagnostic strategy reduces substantially. Computational results based on real-world systems indicate that the size of a static multiple fault strategy is strictly related to the structure of the system, and that the use of an on-line multiple fault strategy can diagnose faults in systems with as many as 10,000 failure sources.
A parallel algorithm for computing the eigenvalues of a symmetric tridiagonal matrix

NASA Technical Reports Server (NTRS)

Swarztrauber, Paul N.

1993-01-01

A parallel algorithm, called polysection, is presented for computing the eigenvalues of a symmetric tridiagonal matrix. The method is based on a quadratic recurrence in which the characteristic polynomial is constructed on a binary tree from polynomials whose degree doubles at each level. Intervals that contain exactly one zero are determined by the zeros of polynomials at the previous level which ensures that different processors compute different zeros. The signs of the polynomials at the interval endpoints are determined a priori and used to guarantee that all zeros are found. The use of finite-precision arithmetic may result in multiple zeros; however, in this case, the intervals coalesce and their number determines exactly the multiplicity of the zero. For an N x N matrix the eigenvalues can be determined in O(log-squared N) time with N-squared processors and O(N) time with N processors. The method is compared with a parallel variant of bisection that requires O(N-squared) time on a single processor, O(N) time with N processors, and O(log N) time with N-squared processors.
Combining Benford's Law and machine learning to detect money laundering. An actual Spanish court case.

PubMed

Badal-Valero, Elena; Alvarez-Jareño, José A; Pavía, Jose M

2018-01-01

This paper is based on the analysis of the database of operations from a macro-case on money laundering orchestrated between a core company and a group of its suppliers, 26 of which had already been identified by the police as fraudulent companies. In the face of a well-founded suspicion that more companies have perpetrated criminal acts and in order to make better use of what are very limited police resources, we aim to construct a tool to detect money laundering criminals. We combine Benford's Law and machine learning algorithms (logistic regression, decision trees, neural networks, and random forests) to find patterns of money laundering criminals in the context of a real Spanish court case. After mapping each supplier's set of accounting data into a 21-dimensional space using Benford's Law and applying machine learning algorithms, additional companies that could merit further scrutiny are flagged up. A new tool to detect money laundering criminals is proposed in this paper. The tool is tested in the context of a real case. Copyright © 2017 Elsevier B.V. All rights reserved.
The estimation of tree posterior probabilities using conditional clade probability distributions.

PubMed

Larget, Bret

2013-07-01

In this article I introduce the idea of conditional independence of separated subtrees as a principle by which to estimate the posterior probability of trees using conditional clade probability distributions rather than simple sample relative frequencies. I describe an algorithm for these calculations and software which implements these ideas. I show that these alternative calculations are very similar to simple sample relative frequencies for high probability trees but are substantially more accurate for relatively low probability trees. The method allows the posterior probability of unsampled trees to be calculated when these trees contain only clades that are in other sampled trees. Furthermore, the method can be used to estimate the total probability of the set of sampled trees which provides a measure of the thoroughness of a posterior sample.
Estimating babassu palm density using automatic palm tree detection with very high spatial resolution satellite images.

PubMed

Dos Santos, Alessio Moreira; Mitja, Danielle; Delaître, Eric; Demagistri, Laurent; de Souza Miranda, Izildinha; Libourel, Thérèse; Petit, Michel

2017-05-15

High spatial resolution images as well as image processing and object detection algorithms are recent technologies that aid the study of biodiversity and commercial plantations of forest species. This paper seeks to contribute knowledge regarding the use of these technologies by studying randomly dispersed native palm tree. Here, we analyze the automatic detection of large circular crown (LCC) palm tree using a high spatial resolution panchromatic GeoEye image (0.50 m) taken on the area of a community of small agricultural farms in the Brazilian Amazon. We also propose auxiliary methods to estimate the density of the LCC palm tree Attalea speciosa (babassu) based on the detection results. We used the "Compt-palm" algorithm based on the detection of palm tree shadows in open areas via mathematical morphology techniques and the spatial information was validated using field methods (i.e. structural census and georeferencing). The algorithm recognized individuals in life stages 5 and 6, and the extraction percentage, branching factor and quality percentage factors were used to evaluate its performance. A principal components analysis showed that the structure of the studied species differs from other species. Approximately 96% of the babassu individuals in stage 6 were detected. These individuals had significantly smaller stipes than the undetected ones. In turn, 60% of the stage 5 babassu individuals were detected, showing significantly a different total height and a different number of leaves from the undetected ones. Our calculations regarding resource availability indicate that 6870 ha contained 25,015 adult babassu palm tree, with an annual potential productivity of 27.4 t of almond oil. The detection of LCC palm tree and the implementation of auxiliary field methods to estimate babassu density is an important first step to monitor this industry resource that is extremely important to the Brazilian economy and thousands of families over a large scale. Copyright © 2017 Elsevier Ltd. All rights reserved.
Category of trees in representation theory of quantum algebras

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moskaliuk, N. M.; Moskaliuk, S. S., E-mail: mss@bitp.kiev.ua

2013-10-15

New applications of categorical methods are connected with new additional structures on categories. One of such structures in representation theory of quantum algebras, the category of Kuznetsov-Smorodinsky-Vilenkin-Smirnov (KSVS) trees, is constructed, whose objects are finite rooted KSVS trees and morphisms generated by the transition from a KSVS tree to another one.
Supersonic air jets preserve tree roots in underground pipeline installation

Treesearch

Rob Gross; Michelle Julene

2002-01-01

Tree roots are often damaged during construction projects, particularly during trenching operations for pipeline installation. Although mechanical soil excavation using heavy equipment, such as an excavator or backhoe is considered the fastest the most economical method, it damages and destroys tree roots and can lead to unintentional tree loss, poor public relations,...
A review of tree root conflicts with sidewalks, curbs, and roads

Treesearch

T.B. Randrup; E.G. McPherson; L.R. Costello

2003-01-01

Literature relevant to tree root and urban infrastructure conflicts is reviewed. Although tree roots can conflict with many infrastructure elements, sidewalk and curb conflicts are the focus of this review. Construction protocols, urban soils, root growth, and causal factors (soil conditions, limited planting space, tree size, variation in root architecture, management...

Some links on this page may take you to non-federal websites. Their policies may differ from this site.