tree induction algorithm: Topics by Science.gov

Sample records for tree induction algorithm

Uncertain decision tree inductive inference

NASA Astrophysics Data System (ADS)

Zarban, L.; Jafari, S.; Fakhrahmad, S. M.

2011-10-01

Induction is the process of reasoning in which general rules are formulated based on limited observations of recurring phenomenal patterns. Decision tree learning is one of the most widely used and practical inductive methods, which represents the results in a tree scheme. Various decision tree algorithms have already been proposed such as CLS, ID3, Assistant C4.5, REPTree and Random Tree. These algorithms suffer from some major shortcomings. In this article, after discussing the main limitations of the existing methods, we introduce a new decision tree induction algorithm, which overcomes all the problems existing in its counterparts. The new method uses bit strings and maintains important information on them. This use of bit strings and logical operation on them causes high speed during the induction process. Therefore, it has several important features: it deals with inconsistencies in data, avoids overfitting and handles uncertainty. We also illustrate more advantages and the new features of the proposed method. The experimental results show the effectiveness of the method in comparison with other methods existing in the literature.
Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data.

PubMed

Barros, Rodrigo C; Winck, Ana T; Machado, Karina S; Basgalupp, Márcio P; de Carvalho, André C P L F; Ruiz, Duncan D; de Souza, Osmar Norberto

2012-11-21

This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, specially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general-use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible-receptor.
Automatic design of decision-tree induction algorithms tailored to flexible-receptor docking data

PubMed Central

2012-01-01

Background This paper addresses the prediction of the free energy of binding of a drug candidate with enzyme InhA associated with Mycobacterium tuberculosis. This problem is found within rational drug design, where interactions between drug candidates and target proteins are verified through molecular docking simulations. In this application, it is important not only to correctly predict the free energy of binding, but also to provide a comprehensible model that could be validated by a domain specialist. Decision-tree induction algorithms have been successfully used in drug-design related applications, specially considering that decision trees are simple to understand, interpret, and validate. There are several decision-tree induction algorithms available for general-use, but each one has a bias that makes it more suitable for a particular data distribution. In this article, we propose and investigate the automatic design of decision-tree induction algorithms tailored to particular drug-enzyme binding data sets. We investigate the performance of our new method for evaluating binding conformations of different drug candidates to InhA, and we analyze our findings with respect to decision tree accuracy, comprehensibility, and biological relevance. Results The empirical analysis indicates that our method is capable of automatically generating decision-tree induction algorithms that significantly outperform the traditional C4.5 algorithm with respect to both accuracy and comprehensibility. In addition, we provide the biological interpretation of the rules generated by our approach, reinforcing the importance of comprehensible predictive models in this particular bioinformatics application. Conclusions We conclude that automatically designing a decision-tree algorithm tailored to molecular docking data is a promising alternative for the prediction of the free energy from the binding of a drug candidate with a flexible-receptor. PMID:23171000
Comparison of rule induction, decision trees and formal concept analysis approaches for classification

NASA Astrophysics Data System (ADS)

Kotelnikov, E. V.; Milov, V. R.

2018-05-01

Rule-based learning algorithms have higher transparency and easiness to interpret in comparison with neural networks and deep learning algorithms. These properties make it possible to effectively use such algorithms to solve descriptive tasks of data mining. The choice of an algorithm depends also on its ability to solve predictive tasks. The article compares the quality of the solution of the problems with binary and multiclass classification based on the experiments with six datasets from the UCI Machine Learning Repository. The authors investigate three algorithms: Ripper (rule induction), C4.5 (decision trees), In-Close (formal concept analysis). The results of the experiments show that In-Close demonstrates the best quality of classification in comparison with Ripper and C4.5, however the latter two generate more compact rule sets.
FDT 2.0: Improving scalability of the fuzzy decision tree induction tool - integrating database storage.

PubMed

Durham, Erin-Elizabeth A; Yu, Xiaxia; Harrison, Robert W

2014-12-01

Effective machine-learning handles large datasets efficiently. One key feature of handling large data is the use of databases such as MySQL. The freeware fuzzy decision tree induction tool, FDT, is a scalable supervised-classification software tool implementing fuzzy decision trees. It is based on an optimized fuzzy ID3 (FID3) algorithm. FDT 2.0 improves upon FDT 1.0 by bridging the gap between data science and data engineering: it combines a robust decisioning tool with data retention for future decisions, so that the tool does not need to be recalibrated from scratch every time a new decision is required. In this paper we briefly review the analytical capabilities of the freeware FDT tool and its major features and functionalities; examples of large biological datasets from HIV, microRNAs and sRNAs are included. This work shows how to integrate fuzzy decision algorithms with modern database technology. In addition, we show that integrating the fuzzy decision tree induction tool with database storage allows for optimal user satisfaction in today's Data Analytics world.
Automated rule-base creation via CLIPS-Induce

NASA Technical Reports Server (NTRS)

Murphy, Patrick M.

1994-01-01

Many CLIPS rule-bases contain one or more rule groups that perform classification. In this paper we describe CLIPS-Induce, an automated system for the creation of a CLIPS classification rule-base from a set of test cases. CLIPS-Induce consists of two components, a decision tree induction component and a CLIPS production extraction component. ID3, a popular decision tree induction algorithm, is used to induce a decision tree from the test cases. CLIPS production extraction is accomplished through a top-down traversal of the decision tree. Nodes of the tree are used to construct query rules, and branches of the tree are used to construct classification rules. The learned CLIPS productions may easily be incorporated into a large CLIPS system that perform tasks such as accessing a database or displaying information.
Using Induction to Refine Information Retrieval Strategies

NASA Technical Reports Server (NTRS)

Baudin, Catherine; Pell, Barney; Kedar, Smadar

1994-01-01

Conceptual information retrieval systems use structured document indices, domain knowledge and a set of heuristic retrieval strategies to match user queries with a set of indices describing the document's content. Such retrieval strategies increase the set of relevant documents retrieved (increase recall), but at the expense of returning additional irrelevant documents (decrease precision). Usually in conceptual information retrieval systems this tradeoff is managed by hand and with difficulty. This paper discusses ways of managing this tradeoff by the application of standard induction algorithms to refine the retrieval strategies in an engineering design domain. We gathered examples of query/retrieval pairs during the system's operation using feedback from a user on the retrieved information. We then fed these examples to the induction algorithm and generated decision trees that refine the existing set of retrieval strategies. We found that (1) induction improved the precision on a set of queries generated by another user, without a significant loss in recall, and (2) in an interactive mode, the decision trees pointed out flaws in the retrieval and indexing knowledge and suggested ways to refine the retrieval strategies.
The use of decision trees and naïve Bayes algorithms and trace element patterns for controlling the authenticity of free-range-pastured hens' eggs.

PubMed

Barbosa, Rommel Melgaço; Nacano, Letícia Ramos; Freitas, Rodolfo; Batista, Bruno Lemos; Barbosa, Fernando

2014-09-01

This article aims to evaluate 2 machine learning algorithms, decision trees and naïve Bayes (NB), for egg classification (free-range eggs compared with battery eggs). The database used for the study consisted of 15 chemical elements (As, Ba, Cd, Co, Cs, Cu, Fe, Mg, Mn, Mo, Pb, Se, Sr, V, and Zn) determined in 52 eggs samples (20 free-range and 32 battery eggs) by inductively coupled plasma mass spectrometry. Our results demonstrated that decision trees and NB associated with the mineral contents of eggs provide a high level of accuracy (above 80% and 90%, respectively) for classification between free-range and battery eggs and can be used as an alternative method for adulteration evaluation. © 2014 Institute of Food Technologists®
An Isometric Mapping Based Co-Location Decision Tree Algorithm

NASA Astrophysics Data System (ADS)

Zhou, G.; Wei, J.; Zhou, X.; Zhang, R.; Huang, W.; Sha, H.; Chen, J.

2018-05-01

Decision tree (DT) induction has been widely used in different pattern classification. However, most traditional DTs have the disadvantage that they consider only non-spatial attributes (ie, spectral information) as a result of classifying pixels, which can result in objects being misclassified. Therefore, some researchers have proposed a co-location decision tree (Cl-DT) method, which combines co-location and decision tree to solve the above the above-mentioned traditional decision tree problems. Cl-DT overcomes the shortcomings of the existing DT algorithms, which create a node for each value of a given attribute, which has a higher accuracy than the existing decision tree approach. However, for non-linearly distributed data instances, the euclidean distance between instances does not reflect the true positional relationship between them. In order to overcome these shortcomings, this paper proposes an isometric mapping method based on Cl-DT (called, (Isomap-based Cl-DT), which is a method that combines heterogeneous and Cl-DT together. Because isometric mapping methods use geodetic distances instead of Euclidean distances between non-linearly distributed instances, the true distance between instances can be reflected. The experimental results and several comparative analyzes show that: (1) The extraction method of exposed carbonate rocks is of high accuracy. (2) The proposed method has many advantages, because the total number of nodes, the number of leaf nodes and the number of nodes are greatly reduced compared to Cl-DT. Therefore, the Isomap -based Cl-DT algorithm can construct a more accurate and faster decision tree.
Technology transfer by means of fault tree synthesis

NASA Astrophysics Data System (ADS)

Batzias, Dimitris F.

2012-12-01

Since Fault Tree Analysis (FTA) attempts to model and analyze failure processes of engineering, it forms a common technique for good industrial practice. On the contrary, fault tree synthesis (FTS) refers to the methodology of constructing complex trees either from dentritic modules built ad hoc or from fault tress already used and stored in a Knowledge Base. In both cases, technology transfer takes place in a quasi-inductive mode, from partial to holistic knowledge. In this work, an algorithmic procedure, including 9 activity steps and 3 decision nodes is developed for performing effectively this transfer when the fault under investigation occurs within one of the latter stages of an industrial procedure with several stages in series. The main parts of the algorithmic procedure are: (i) the construction of a local fault tree within the corresponding production stage, where the fault has been detected, (ii) the formation of an interface made of input faults that might occur upstream, (iii) the fuzzy (to count for uncertainty) multicriteria ranking of these faults according to their significance, and (iv) the synthesis of an extended fault tree based on the construction of part (i) and on the local fault tree of the first-ranked fault in part (iii). An implementation is presented, referring to 'uneven sealing of Al anodic film', thus proving the functionality of the developed methodology.
An approach for automated fault diagnosis based on a fuzzy decision tree and boundary analysis of a reconstructed phase space.

PubMed

Aydin, Ilhan; Karakose, Mehmet; Akin, Erhan

2014-03-01

Although reconstructed phase space is one of the most powerful methods for analyzing a time series, it can fail in fault diagnosis of an induction motor when the appropriate pre-processing is not performed. Therefore, boundary analysis based a new feature extraction method in phase space is proposed for diagnosis of induction motor faults. The proposed approach requires the measurement of one phase current signal to construct the phase space representation. Each phase space is converted into an image, and the boundary of each image is extracted by a boundary detection algorithm. A fuzzy decision tree has been designed to detect broken rotor bars and broken connector faults. The results indicate that the proposed approach has a higher recognition rate than other methods on the same dataset. © 2013 ISA Published by ISA All rights reserved.
Statistical Methods in Ai: Rare Event Learning Using Associative Rules and Higher-Order Statistics

NASA Astrophysics Data System (ADS)

Iyer, V.; Shetty, S.; Iyengar, S. S.

2015-07-01

Rare event learning has not been actively researched since lately due to the unavailability of algorithms which deal with big samples. The research addresses spatio-temporal streams from multi-resolution sensors to find actionable items from a perspective of real-time algorithms. This computing framework is independent of the number of input samples, application domain, labelled or label-less streams. A sampling overlap algorithm such as Brooks-Iyengar is used for dealing with noisy sensor streams. We extend the existing noise pre-processing algorithms using Data-Cleaning trees. Pre-processing using ensemble of trees using bagging and multi-target regression showed robustness to random noise and missing data. As spatio-temporal streams are highly statistically correlated, we prove that a temporal window based sampling from sensor data streams converges after n samples using Hoeffding bounds. Which can be used for fast prediction of new samples in real-time. The Data-cleaning tree model uses a nonparametric node splitting technique, which can be learned in an iterative way which scales linearly in memory consumption for any size input stream. The improved task based ensemble extraction is compared with non-linear computation models using various SVM kernels for speed and accuracy. We show using empirical datasets the explicit rule learning computation is linear in time and is only dependent on the number of leafs present in the tree ensemble. The use of unpruned trees (t) in our proposed ensemble always yields minimum number (m) of leafs keeping pre-processing computation to n × t log m compared to N2 for Gram Matrix. We also show that the task based feature induction yields higher Qualify of Data (QoD) in the feature space compared to kernel methods using Gram Matrix.
Concurrent approach for evolving compact decision rule sets

NASA Astrophysics Data System (ADS)

Marmelstein, Robert E.; Hammack, Lonnie P.; Lamont, Gary B.

1999-02-01

The induction of decision rules from data is important to many disciplines, including artificial intelligence and pattern recognition. To improve the state of the art in this area, we introduced the genetic rule and classifier construction environment (GRaCCE). It was previously shown that GRaCCE consistently evolved decision rule sets from data, which were significantly more compact than those produced by other methods (such as decision tree algorithms). The primary disadvantage of GRaCCe, however, is its relatively poor run-time execution performance. In this paper, a concurrent version of the GRaCCE architecture is introduced, which improves the efficiency of the original algorithm. A prototype of the algorithm is tested on an in- house parallel processor configuration and the results are discussed.
Toward a Principled Sampling Theory for Quasi-Orders

PubMed Central

Ünlü, Ali; Schrepp, Martin

2016-01-01

Quasi-orders, that is, reflexive and transitive binary relations, have numerous applications. In educational theories, the dependencies of mastery among the problems of a test can be modeled by quasi-orders. Methods such as item tree or Boolean analysis that mine for quasi-orders in empirical data are sensitive to the underlying quasi-order structure. These data mining techniques have to be compared based on extensive simulation studies, with unbiased samples of randomly generated quasi-orders at their basis. In this paper, we develop techniques that can provide the required quasi-order samples. We introduce a discrete doubly inductive procedure for incrementally constructing the set of all quasi-orders on a finite item set. A randomization of this deterministic procedure allows us to generate representative samples of random quasi-orders. With an outer level inductive algorithm, we consider the uniform random extensions of the trace quasi-orders to higher dimension. This is combined with an inner level inductive algorithm to correct the extensions that violate the transitivity property. The inner level correction step entails sampling biases. We propose three algorithms for bias correction and investigate them in simulation. It is evident that, on even up to 50 items, the new algorithms create close to representative quasi-order samples within acceptable computing time. Hence, the principled approach is a significant improvement to existing methods that are used to draw quasi-orders uniformly at random but cannot cope with reasonably large item sets. PMID:27965601
Toward a Principled Sampling Theory for Quasi-Orders.

PubMed

Ünlü, Ali; Schrepp, Martin

2016-01-01

Quasi-orders, that is, reflexive and transitive binary relations, have numerous applications. In educational theories, the dependencies of mastery among the problems of a test can be modeled by quasi-orders. Methods such as item tree or Boolean analysis that mine for quasi-orders in empirical data are sensitive to the underlying quasi-order structure. These data mining techniques have to be compared based on extensive simulation studies, with unbiased samples of randomly generated quasi-orders at their basis. In this paper, we develop techniques that can provide the required quasi-order samples. We introduce a discrete doubly inductive procedure for incrementally constructing the set of all quasi-orders on a finite item set. A randomization of this deterministic procedure allows us to generate representative samples of random quasi-orders. With an outer level inductive algorithm, we consider the uniform random extensions of the trace quasi-orders to higher dimension. This is combined with an inner level inductive algorithm to correct the extensions that violate the transitivity property. The inner level correction step entails sampling biases. We propose three algorithms for bias correction and investigate them in simulation. It is evident that, on even up to 50 items, the new algorithms create close to representative quasi-order samples within acceptable computing time. Hence, the principled approach is a significant improvement to existing methods that are used to draw quasi-orders uniformly at random but cannot cope with reasonably large item sets.
PRIM versus CART in subgroup discovery: when patience is harmful.

PubMed

Abu-Hanna, Ameen; Nannings, Barry; Dongelmans, Dave; Hasman, Arie

2010-10-01

We systematically compare the established algorithms CART (Classification and Regression Trees) and PRIM (Patient Rule Induction Method) in a subgroup discovery task on a large real-world high-dimensional clinical database. Contrary to current conjectures, PRIM's performance was generally inferior to CART's. PRIM often considered "peeling of" a large chunk of data at a value of a relevant discrete ordinal variable unattractive, ultimately missing an important subgroup. This finding has considerable significance in clinical medicine where ordinal scores are ubiquitous. PRIM's utility in clinical databases would increase when global information about (ordinal) variables is better put to use and when the search algorithm keeps track of alternative solutions.
Knowledge discovery with classification rules in a cardiovascular dataset.

PubMed

Podgorelec, Vili; Kokol, Peter; Stiglic, Milojka Molan; Hericko, Marjan; Rozman, Ivan

2005-12-01

In this paper we study an evolutionary machine learning approach to data mining and knowledge discovery based on the induction of classification rules. A method for automatic rules induction called AREX using evolutionary induction of decision trees and automatic programming is introduced. The proposed algorithm is applied to a cardiovascular dataset consisting of different groups of attributes which should possibly reveal the presence of some specific cardiovascular problems in young patients. A case study is presented that shows the use of AREX for the classification of patients and for discovering possible new medical knowledge from the dataset. The defined knowledge discovery loop comprises a medical expert's assessment of induced rules to drive the evolution of rule sets towards more appropriate solutions. The final result is the discovery of a possible new medical knowledge in the field of pediatric cardiology.
Precision assessment of some supervised and unsupervised algorithms for genotype discrimination in the genus Pisum using SSR molecular data.

PubMed

Nasiri, Jaber; Naghavi, Mohammad Reza; Kayvanjoo, Amir Hossein; Nasiri, Mojtaba; Ebrahimi, Mansour

2015-03-07

For the first time, prediction accuracies of some supervised and unsupervised algorithms were evaluated in an SSR-based DNA fingerprinting study of a pea collection containing 20 cultivars and 57 wild samples. In general, according to the 10 attribute weighting models, the SSR alleles of PEAPHTAP-2 and PSBLOX13.2-1 were the two most important attributes to generate discrimination among eight different species and subspecies of genus Pisum. In addition, K-Medoids unsupervised clustering run on Chi squared dataset exhibited the best prediction accuracy (83.12%), while the lowest accuracy (25.97%) gained as K-Means model ran on FCdb database. Irrespective of some fluctuations, the overall accuracies of tree induction models were significantly high for many algorithms, and the attributes PSBLOX13.2-3 and PEAPHTAP could successfully detach Pisum fulvum accessions and cultivars from the others when two selected decision trees were taken into account. Meanwhile, the other used supervised algorithms exhibited overall reliable accuracies, even though in some rare cases, they gave us low amounts of accuracies. Our results, altogether, demonstrate promising applications of both supervised and unsupervised algorithms to provide suitable data mining tools regarding accurate fingerprinting of different species and subspecies of genus Pisum, as a fundamental priority task in breeding programs of the crop. Copyright © 2015 Elsevier Ltd. All rights reserved.
Protein attributes contribute to halo-stability, bioinformatics approach

PubMed Central

2011-01-01

Halophile proteins can tolerate high salt concentrations. Understanding halophilicity features is the first step toward engineering halostable crops. To this end, we examined protein features contributing to the halo-toleration of halophilic organisms. We compared more than 850 features for halophilic and non-halophilic proteins with various screening, clustering, decision tree, and generalized rule induction models to search for patterns that code for halo-toleration. Up to 251 protein attributes selected by various attribute weighting algorithms as important features contribute to halo-stability; from them 14 attributes selected by 90% of models and the count of hydrogen gained the highest value (1.0) in 70% of attribute weighting models, showing the importance of this attribute in feature selection modeling. The other attributes mostly were the frequencies of di-peptides. No changes were found in the numbers of groups when K-Means and TwoStep clustering modeling were performed on datasets with or without feature selection filtering. Although the depths of induced trees were not high, the accuracies of trees were higher than 94% and the frequency of hydrophobic residues pointed as the most important feature to build trees. The performance evaluation of decision tree models had the same values and the best correctness percentage recorded with the Exhaustive CHAID and CHAID models. We did not find any significant difference in the percent of correctness, performance evaluation, and mean correctness of various decision tree models with or without feature selection. For the first time, we analyzed the performance of different screening, clustering, and decision tree algorithms for discriminating halophilic and non-halophilic proteins and the results showed that amino acid composition can be used to discriminate between halo-tolerant and halo-sensitive proteins. PMID:21592393
PhySIC: a veto supertree method with desirable properties.

PubMed

Ranwez, Vincent; Berry, Vincent; Criscuolo, Alexis; Fabre, Pierre-Henri; Guillemot, Sylvain; Scornavacca, Celine; Douzery, Emmanuel J P

2007-10-01

This paper focuses on veto supertree methods; i.e., methods that aim at producing a conservative synthesis of the relationships agreed upon by all source trees. We propose desirable properties that a supertree should satisfy in this framework, namely the non-contradiction property (PC) and the induction property (PI). The former requires that the supertree does not contain relationships that contradict one or a combination of the source topologies, whereas the latter requires that all topological information contained in the supertree is present in a source tree or collectively induced by several source trees. We provide simple examples to illustrate their relevance and that allow a comparison with previously advocated properties. We show that these properties can be checked in polynomial time for any given rooted supertree. Moreover, we introduce the PhySIC method (PHYlogenetic Signal with Induction and non-Contradiction). For k input trees spanning a set of n taxa, this method produces a supertree that satisfies the above-mentioned properties in O(kn(3) + n(4)) computing time. The polytomies of the produced supertree are also tagged by labels indicating areas of conflict as well as those with insufficient overlap. As a whole, PhySIC enables the user to quickly summarize consensual information of a set of trees and localize groups of taxa for which the data require consolidation. Lastly, we illustrate the behaviour of PhySIC on primate data sets of various sizes, and propose a supertree covering 95% of all primate extant genera. The PhySIC algorithm is available at http://atgc.lirmm.fr/cgi-bin/PhySIC.

Understanding the undelaying mechanism of HA-subtyping in the level of physic-chemical characteristics of protein.

PubMed

Ebrahimi, Mansour; Aghagolzadeh, Parisa; Shamabadi, Narges; Tahmasebi, Ahmad; Alsharifi, Mohammed; Adelson, David L; Hemmatzadeh, Farhid; Ebrahimie, Esmaeil

2014-01-01

The evolution of the influenza A virus to increase its host range is a major concern worldwide. Molecular mechanisms of increasing host range are largely unknown. Influenza surface proteins play determining roles in reorganization of host-sialic acid receptors and host range. In an attempt to uncover the physic-chemical attributes which govern HA subtyping, we performed a large scale functional analysis of over 7000 sequences of 16 different HA subtypes. Large number (896) of physic-chemical protein characteristics were calculated for each HA sequence. Then, 10 different attribute weighting algorithms were used to find the key characteristics distinguishing HA subtypes. Furthermore, to discover machine leaning models which can predict HA subtypes, various Decision Tree, Support Vector Machine, Naïve Bayes, and Neural Network models were trained on calculated protein characteristics dataset as well as 10 trimmed datasets generated by attribute weighting algorithms. The prediction accuracies of the machine learning methods were evaluated by 10-fold cross validation. The results highlighted the frequency of Gln (selected by 80% of attribute weighting algorithms), percentage/frequency of Tyr, percentage of Cys, and frequencies of Try and Glu (selected by 70% of attribute weighting algorithms) as the key features that are associated with HA subtyping. Random Forest tree induction algorithm and RBF kernel function of SVM (scaled by grid search) showed high accuracy of 98% in clustering and predicting HA subtypes based on protein attributes. Decision tree models were successful in monitoring the short mutation/reassortment paths by which influenza virus can gain the key protein structure of another HA subtype and increase its host range in a short period of time with less energy consumption. Extracting and mining a large number of amino acid attributes of HA subtypes of influenza A virus through supervised algorithms represent a new avenue for understanding and predicting possible future structure of influenza pandemics.
Understanding the Underlying Mechanism of HA-Subtyping in the Level of Physic-Chemical Characteristics of Protein

PubMed Central

Ebrahimi, Mansour; Aghagolzadeh, Parisa; Shamabadi, Narges; Tahmasebi, Ahmad; Alsharifi, Mohammed; Adelson, David L.

2014-01-01

The evolution of the influenza A virus to increase its host range is a major concern worldwide. Molecular mechanisms of increasing host range are largely unknown. Influenza surface proteins play determining roles in reorganization of host-sialic acid receptors and host range. In an attempt to uncover the physic-chemical attributes which govern HA subtyping, we performed a large scale functional analysis of over 7000 sequences of 16 different HA subtypes. Large number (896) of physic-chemical protein characteristics were calculated for each HA sequence. Then, 10 different attribute weighting algorithms were used to find the key characteristics distinguishing HA subtypes. Furthermore, to discover machine leaning models which can predict HA subtypes, various Decision Tree, Support Vector Machine, Naïve Bayes, and Neural Network models were trained on calculated protein characteristics dataset as well as 10 trimmed datasets generated by attribute weighting algorithms. The prediction accuracies of the machine learning methods were evaluated by 10-fold cross validation. The results highlighted the frequency of Gln (selected by 80% of attribute weighting algorithms), percentage/frequency of Tyr, percentage of Cys, and frequencies of Try and Glu (selected by 70% of attribute weighting algorithms) as the key features that are associated with HA subtyping. Random Forest tree induction algorithm and RBF kernel function of SVM (scaled by grid search) showed high accuracy of 98% in clustering and predicting HA subtypes based on protein attributes. Decision tree models were successful in monitoring the short mutation/reassortment paths by which influenza virus can gain the key protein structure of another HA subtype and increase its host range in a short period of time with less energy consumption. Extracting and mining a large number of amino acid attributes of HA subtypes of influenza A virus through supervised algorithms represent a new avenue for understanding and predicting possible future structure of influenza pandemics. PMID:24809455
Exact Algorithms for Duplication-Transfer-Loss Reconciliation with Non-Binary Gene Trees.

PubMed

Kordi, Misagh; Bansal, Mukul S

2017-06-01

Duplication-Transfer-Loss (DTL) reconciliation is a powerful method for studying gene family evolution in the presence of horizontal gene transfer. DTL reconciliation seeks to reconcile gene trees with species trees by postulating speciation, duplication, transfer, and loss events. Efficient algorithms exist for finding optimal DTL reconciliations when the gene tree is binary. In practice, however, gene trees are often non-binary due to uncertainty in the gene tree topologies, and DTL reconciliation with non-binary gene trees is known to be NP-hard. In this paper, we present the first exact algorithms for DTL reconciliation with non-binary gene trees. Specifically, we (i) show that the DTL reconciliation problem for non-binary gene trees is fixed-parameter tractable in the maximum degree of the gene tree, (ii) present an exponential-time, but in-practice efficient, algorithm to track and enumerate all optimal binary resolutions of a non-binary input gene tree, and (iii) apply our algorithms to a large empirical data set of over 4700 gene trees from 100 species to study the impact of gene tree uncertainty on DTL-reconciliation and to demonstrate the applicability and utility of our algorithms. The new techniques and algorithms introduced in this paper will help biologists avoid incorrect evolutionary inferences caused by gene tree uncertainty.
Recursive algorithms for phylogenetic tree counting.

PubMed

Gavryushkina, Alexandra; Welch, David; Drummond, Alexei J

2013-10-28

In Bayesian phylogenetic inference we are interested in distributions over a space of trees. The number of trees in a tree space is an important characteristic of the space and is useful for specifying prior distributions. When all samples come from the same time point and no prior information available on divergence times, the tree counting problem is easy. However, when fossil evidence is used in the inference to constrain the tree or data are sampled serially, new tree spaces arise and counting the number of trees is more difficult. We describe an algorithm that is polynomial in the number of sampled individuals for counting of resolutions of a constraint tree assuming that the number of constraints is fixed. We generalise this algorithm to counting resolutions of a fully ranked constraint tree. We describe a quadratic algorithm for counting the number of possible fully ranked trees on n sampled individuals. We introduce a new type of tree, called a fully ranked tree with sampled ancestors, and describe a cubic time algorithm for counting the number of such trees on n sampled individuals. These algorithms should be employed for Bayesian Markov chain Monte Carlo inference when fossil data are included or data are serially sampled.
SDIA: A dynamic situation driven information fusion algorithm for cloud environment

NASA Astrophysics Data System (ADS)

Guo, Shuhang; Wang, Tong; Wang, Jian

2017-09-01

Information fusion is an important issue in information integration domain. In order to form an extensive information fusion technology under the complex and diverse situations, a new information fusion algorithm is proposed. Firstly, a fuzzy evaluation model of tag utility was proposed that can be used to count the tag entropy. Secondly, a ubiquitous situation tag tree model is proposed to define multidimensional structure of information situation. Thirdly, the similarity matching between the situation models is classified into three types: the tree inclusion, the tree embedding, and the tree compatibility. Next, in order to reduce the time complexity of the tree compatible matching algorithm, a fast and ordered tree matching algorithm is proposed based on the node entropy, which is used to support the information fusion by ubiquitous situation. Since the algorithm revolve from the graph theory of disordered tree matching algorithm, it can improve the information fusion present recall rate and precision rate in the situation. The information fusion algorithm is compared with the star and the random tree matching algorithm, and the difference between the three algorithms is analyzed in the view of isomorphism, which proves the innovation and applicability of the algorithm.
Coalescent-based species tree inference from gene tree topologies under incomplete lineage sorting by maximum likelihood.

PubMed

Wu, Yufeng

2012-03-01

Incomplete lineage sorting can cause incongruence between the phylogenetic history of genes (the gene tree) and that of the species (the species tree), which can complicate the inference of phylogenies. In this article, I present a new coalescent-based algorithm for species tree inference with maximum likelihood. I first describe an improved method for computing the probability of a gene tree topology given a species tree, which is much faster than an existing algorithm by Degnan and Salter (2005). Based on this method, I develop a practical algorithm that takes a set of gene tree topologies and infers species trees with maximum likelihood. This algorithm searches for the best species tree by starting from initial species trees and performing heuristic search to obtain better trees with higher likelihood. This algorithm, called STELLS (which stands for Species Tree InfErence with Likelihood for Lineage Sorting), has been implemented in a program that is downloadable from the author's web page. The simulation results show that the STELLS algorithm is more accurate than an existing maximum likelihood method for many datasets, especially when there is noise in gene trees. I also show that the STELLS algorithm is efficient and can be applied to real biological datasets. © 2011 The Author. Evolution© 2011 The Society for the Study of Evolution.
Data Clustering and Evolving Fuzzy Decision Tree for Data Base Classification Problems

NASA Astrophysics Data System (ADS)

Chang, Pei-Chann; Fan, Chin-Yuan; Wang, Yen-Wen

Data base classification suffers from two well known difficulties, i.e., the high dimensionality and non-stationary variations within the large historic data. This paper presents a hybrid classification model by integrating a case based reasoning technique, a Fuzzy Decision Tree (FDT), and Genetic Algorithms (GA) to construct a decision-making system for data classification in various data base applications. The model is major based on the idea that the historic data base can be transformed into a smaller case-base together with a group of fuzzy decision rules. As a result, the model can be more accurately respond to the current data under classifying from the inductions by these smaller cases based fuzzy decision trees. Hit rate is applied as a performance measure and the effectiveness of our proposed model is demonstrated by experimentally compared with other approaches on different data base classification applications. The average hit rate of our proposed model is the highest among others.
2005 AG20/20 Annual Review

NASA Technical Reports Server (NTRS)

Ross, Kenton W.; McKellip, Rodney D.

2005-01-01

Topics covered include: Implementation and Validation of Sensor-Based Site-Specific Crop Management; Enhanced Management of Agricultural Perennial Systems (EMAPS) Using GIS and Remote Sensing; Validation and Application of Geospatial Information for Early Identification of Stress in Wheat; Adapting and Validating Precision Technologies for Cotton Production in the Mid-Southern United States - 2004 Progress Report; Development of a System to Automatically Geo-Rectify Images; Economics of Precision Agriculture Technologies in Cotton Production-AG 2020 Prescription Farming Automation Algorithms; Field Testing a Sensor-Based Applicator for Nitrogen and Phosphorus Application; Early Detection of Citrus Diseases Using Machine Vision and DGPS; Remote Sensing of Citrus Tree Stress Levels and Factors; Spectral-based Nitrogen Sensing for Citrus; Characterization of Tree Canopies; In-field Sensing of Shallow Water Tables and Hydromorphic Soils with an Electromagnetic Induction Profiler; Maintaining the Competitiveness of Tree Fruit Production Through Precision Agriculture; Modeling and Visualizing Terrain and Remote Sensing Data for Research and Education in Precision Agriculture; Thematic Soil Mapping and Crop-Based Strategies for Site-Specific Management; and Crop-Based Strategies for Site-Specific Management.
Finding Frequent Closed Itemsets in Sliding Window in Linear Time

NASA Astrophysics Data System (ADS)

Chen, Junbo; Zhou, Bo; Chen, Lu; Wang, Xinyu; Ding, Yiqun

One of the most well-studied problems in data mining is computing the collection of frequent itemsets in large transactional databases. Since the introduction of the famous Apriori algorithm [14], many others have been proposed to find the frequent itemsets. Among such algorithms, the approach of mining closed itemsets has raised much interest in data mining community. The algorithms taking this approach include TITANIC [8], CLOSET+[6], DCI-Closed [4], FCI-Stream [3], GC-Tree [15], TGC-Tree [16] etc. Among these algorithms, FCI-Stream, GC-Tree and TGC-Tree are online algorithms work under sliding window environments. By the performance evaluation in [16], GC-Tree [15] is the fastest one. In this paper, an improved algorithm based on GC-Tree is proposed, the computational complexity of which is proved to be a linear combination of the average transaction size and the average closed itemset size. The algorithm is based on the essential theorem presented in Sect. 4.2. Empirically, the new algorithm is several orders of magnitude faster than the state of art algorithm, GC-Tree.
An algorithm for computing the gene tree probability under the multispecies coalescent and its application in the inference of population tree

PubMed Central

2016-01-01

Motivation: Gene tree represents the evolutionary history of gene lineages that originate from multiple related populations. Under the multispecies coalescent model, lineages may coalesce outside the species (population) boundary. Given a species tree (with branch lengths), the gene tree probability is the probability of observing a specific gene tree topology under the multispecies coalescent model. There are two existing algorithms for computing the exact gene tree probability. The first algorithm is due to Degnan and Salter, where they enumerate all the so-called coalescent histories for the given species tree and the gene tree topology. Their algorithm runs in exponential time in the number of gene lineages in general. The second algorithm is the STELLS algorithm (2012), which is usually faster but also runs in exponential time in almost all the cases. Results: In this article, we present a new algorithm, called CompactCH, for computing the exact gene tree probability. This new algorithm is based on the notion of compact coalescent histories: multiple coalescent histories are represented by a single compact coalescent history. The key advantage of our new algorithm is that it runs in polynomial time in the number of gene lineages if the number of populations is fixed to be a constant. The new algorithm is more efficient than the STELLS algorithm both in theory and in practice when the number of populations is small and there are multiple gene lineages from each population. As an application, we show that CompactCH can be applied in the inference of population tree (i.e. the population divergence history) from population haplotypes. Simulation results show that the CompactCH algorithm enables efficient and accurate inference of population trees with much more haplotypes than a previous approach. Availability: The CompactCH algorithm is implemented in the STELLS software package, which is available for download at http://www.engr.uconn.edu/ywu/STELLS.html. Contact: ywu@engr.uconn.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307621
Finding Minimum-Power Broadcast Trees for Wireless Networks

NASA Technical Reports Server (NTRS)

Arabshahi, Payman; Gray, Andrew; Das, Arindam; El-Sharkawi, Mohamed; Marks, Robert, II

2004-01-01

Some algorithms have been devised for use in a method of constructing tree graphs that represent connections among the nodes of a wireless communication network. These algorithms provide for determining the viability of any given candidate connection tree and for generating an initial set of viable trees that can be used in any of a variety of search algorithms (e.g., a genetic algorithm) to find a tree that enables the network to broadcast from a source node to all other nodes while consuming the minimum amount of total power. The method yields solutions better than those of a prior algorithm known as the broadcast incremental power algorithm, albeit at a slightly greater computational cost.
Polynomial-Time Algorithms for Building a Consensus MUL-Tree

PubMed Central

Cui, Yun; Jansson, Jesper

2012-01-01

Abstract A multi-labeled phylogenetic tree, or MUL-tree, is a generalization of a phylogenetic tree that allows each leaf label to be used many times. MUL-trees have applications in biogeography, the study of host–parasite cospeciation, gene evolution studies, and computer science. Here, we consider the problem of inferring a consensus MUL-tree that summarizes a given set of conflicting MUL-trees, and present the first polynomial-time algorithms for solving it. In particular, we give a straightforward, fast algorithm for building a strict consensus MUL-tree for any input set of MUL-trees with identical leaf label multisets, as well as a polynomial-time algorithm for building a majority rule consensus MUL-tree for the special case where every leaf label occurs at most twice. We also show that, although it is NP-hard to find a majority rule consensus MUL-tree in general, the variant that we call the singular majority rule consensus MUL-tree can be constructed efficiently whenever it exists. PMID:22963134
Polynomial-time algorithms for building a consensus MUL-tree.

PubMed

Cui, Yun; Jansson, Jesper; Sung, Wing-Kin

2012-09-01

A multi-labeled phylogenetic tree, or MUL-tree, is a generalization of a phylogenetic tree that allows each leaf label to be used many times. MUL-trees have applications in biogeography, the study of host-parasite cospeciation, gene evolution studies, and computer science. Here, we consider the problem of inferring a consensus MUL-tree that summarizes a given set of conflicting MUL-trees, and present the first polynomial-time algorithms for solving it. In particular, we give a straightforward, fast algorithm for building a strict consensus MUL-tree for any input set of MUL-trees with identical leaf label multisets, as well as a polynomial-time algorithm for building a majority rule consensus MUL-tree for the special case where every leaf label occurs at most twice. We also show that, although it is NP-hard to find a majority rule consensus MUL-tree in general, the variant that we call the singular majority rule consensus MUL-tree can be constructed efficiently whenever it exists.
The effect of high temperature interruptions during inductive period on the extent of flowering and on metabolic responses in olives (Olea europaea L.)

USDA-ARS?s Scientific Manuscript database

The effect of the duration of high temperature interruption and the timing of it’s occurrence during inductive period on the extent of inhibition of inflorescence production in ‘Arbequina’ olive trees was investigated. Trees kept under inductive conditions in different growth chambers were subjected...
Autumn Algorithm-Computation of Hybridization Networks for Realistic Phylogenetic Trees.

PubMed

Huson, Daniel H; Linz, Simone

2018-01-01

A minimum hybridization network is a rooted phylogenetic network that displays two given rooted phylogenetic trees using a minimum number of reticulations. Previous mathematical work on their calculation has usually assumed the input trees to be bifurcating, correctly rooted, or that they both contain the same taxa. These assumptions do not hold in biological studies and "realistic" trees have multifurcations, are difficult to root, and rarely contain the same taxa. We present a new algorithm for computing minimum hybridization networks for a given pair of "realistic" rooted phylogenetic trees. We also describe how the algorithm might be used to improve the rooting of the input trees. We introduce the concept of "autumn trees", a nice framework for the formulation of algorithms based on the mathematics of "maximum acyclic agreement forests". While the main computational problem is hard, the run-time depends mainly on how different the given input trees are. In biological studies, where the trees are reasonably similar, our parallel implementation performs well in practice. The algorithm is available in our open source program Dendroscope 3, providing a platform for biologists to explore rooted phylogenetic networks. We demonstrate the utility of the algorithm using several previously studied data sets.
Direct evaluation of fault trees using object-oriented programming techniques

NASA Technical Reports Server (NTRS)

Patterson-Hine, F. A.; Koen, B. V.

1989-01-01

Object-oriented programming techniques are used in an algorithm for the direct evaluation of fault trees. The algorithm combines a simple bottom-up procedure for trees without repeated events with a top-down recursive procedure for trees with repeated events. The object-oriented approach results in a dynamic modularization of the tree at each step in the reduction process. The algorithm reduces the number of recursive calls required to solve trees with repeated events and calculates intermediate results as well as the solution of the top event. The intermediate results can be reused if part of the tree is modified. An example is presented in which the results of the algorithm implemented with conventional techniques are compared to those of the object-oriented approach.
An inductive method for automatic generation of referring physician prefetch rules for PACS.

PubMed

Okura, Yasuhiko; Matsumura, Yasushi; Harauchi, Hajime; Sukenobu, Yoshiharu; Kou, Hiroko; Kohyama, Syunsuke; Yasuda, Norihiro; Yamamoto, Yuichiro; Inamura, Kiyonari

2002-12-01

To prefetch images in a hospital-wide picture archiving and communication system (PACS), a rule must be devised to permit accurate selection of examinations in which a patient's images are stored. We developed an inductive method to compose prefetch rules from practical data which were obtained in a hospital using a decision tree algorithm. Our methods were evaluated on data acquired in Osaka University Hospital for one month. The data collected consisted of 58,617 cases of consultation reservations, 643,797 examination histories of patients, and 323,993 records of image requests in PACS. Four parameters indicating whether the images of the patient were requested or not for each consultation reservation were derived from the database. As a result, the successful selection sensitivity for consultations in which images were requested was approximately 0.8, and the specificity for excluding consultations accurately where images were not requested was approximately 0.7.
Two frameworks for integrating knowledge in induction

NASA Technical Reports Server (NTRS)

Rosenbloom, Paul S.; Hirsh, Haym; Cohen, William W.; Smith, Benjamin D.

1994-01-01

The use of knowledge in inductive learning is critical for improving the quality of the concept definitions generated, reducing the number of examples required in order to learn effective concept definitions, and reducing the computation needed to find good concept definitions. Relevant knowledge may come in many forms (such as examples, descriptions, advice, and constraints) and from many sources (such as books, teachers, databases, and scientific instruments). How to extract the relevant knowledge from this plethora of possibilities, and then to integrate it together so as to appropriately affect the induction process is perhaps the key issue at this point in inductive learning. Here the focus is on the integration part of this problem; that is, how induction algorithms can, and do, utilize a range of extracted knowledge. Preliminary work on a transformational framework for defining knowledge-intensive inductive algorithms out of relatively knowledge-free algorithms is described, as is a more tentative problems-space framework that attempts to cover all induction algorithms within a single general approach. These frameworks help to organize what is known about current knowledge-intensive induction algorithms, and to point towards new algorithms.
The use of decision tree induction and artificial neural networks for recognizing the geochemical distribution patterns of LREE in the Choghart deposit, Central Iran

NASA Astrophysics Data System (ADS)

Zaremotlagh, S.; Hezarkhani, A.

2017-04-01

Some evidences of rare earth elements (REE) concentrations are found in iron oxide-apatite (IOA) deposits which are located in Central Iranian microcontinent. There are many unsolved problems about the origin and metallogenesis of IOA deposits in this district. Although it is considered that felsic magmatism and mineralization were simultaneous in the district, interaction of multi-stage hydrothermal-magmatic processes within the Early Cambrian volcano-sedimentary sequence probably caused some epigenetic mineralizations. Secondary geological processes (e.g., multi-stage mineralization, alteration, and weathering) have affected on variations of major elements and possible redistribution of REE in IOA deposits. Hence, the geochemical behaviors and distribution patterns of REE are expected to be complicated in different zones of these deposits. The aim of this paper is recognizing LREE distribution patterns based on whole-rock chemical compositions and automatic discovery of their geochemical rules. For this purpose, the pattern recognition techniques including decision tree and neural network were applied on a high-dimensional geochemical dataset from Choghart IOA deposit. Because some data features were irrelevant or redundant in recognizing the distribution patterns of each LREE, a greedy attribute subset selection technique was employed to select the best subset of predictors used in classification tasks. The decision trees (CART algorithm) were pruned optimally to more accurately categorize independent test data than unpruned ones. The most effective classification rules were extracted from the pruned tree to describe the meaningful relationships between the predictors and different concentrations of LREE. A feed-forward artificial neural network was also applied to reliably predict the influence of various rock compositions on the spatial distribution patterns of LREE with a better performance than the decision tree induction. The findings of this study could be effectively used to visualize the LREE distribution patterns as geochemical maps.
Induction as Knowledge Integration

NASA Technical Reports Server (NTRS)

Smith, Benjamin D.; Rosenbloom, Paul S.

1996-01-01

Two key issues for induction algorithms are the accuracy of the learned hypothesis and the computational resources consumed in inducing that hypothesis. One of the most promising ways to improve performance along both dimensions is to make use of additional knowledge. Multi-strategy learning algorithms tackle this problem by employing several strategies for handling different kinds of knowledge in different ways. However, integrating knowledge into an induction algorithm can be difficult when the new knowledge differs significantly from the knowledge the algorithm already uses. In many cases the algorithm must be rewritten. This paper presents Knowledge Integration framework for Induction (KII), a KII, that provides a uniform mechanism for integrating knowledge into induction. In theory, arbitrary knowledge can be integrated with this mechanism, but in practice the knowledge representation language determines both the knowledge that can be integrated, and the costs of integration and induction. By instantiating KII with various set representations, algorithms can be generated at different trade-off points along these dimensions. One instantiation of KII, called RS-KII, is presented that can implement hybrid induction algorithms, depending on which knowledge it utilizes. RS-KII is demonstrated to implement AQ-11, as well as a hybrid algorithm that utilizes a domain theory and noisy examples. Other algorithms are also possible.

A Fast Exact k-Nearest Neighbors Algorithm for High Dimensional Search Using k-Means Clustering and Triangle Inequality.

PubMed

Wang, Xueyi

2012-02-08

The k-nearest neighbors (k-NN) algorithm is a widely used machine learning method that finds nearest neighbors of a test object in a feature space. We present a new exact k-NN algorithm called kMkNN (k-Means for k-Nearest Neighbors) that uses the k-means clustering and the triangle inequality to accelerate the searching for nearest neighbors in a high dimensional space. The kMkNN algorithm has two stages. In the buildup stage, instead of using complex tree structures such as metric trees, kd-trees, or ball-tree, kMkNN uses a simple k-means clustering method to preprocess the training dataset. In the searching stage, given a query object, kMkNN finds nearest training objects starting from the nearest cluster to the query object and uses the triangle inequality to reduce the distance calculations. Experiments show that the performance of kMkNN is surprisingly good compared to the traditional k-NN algorithm and tree-based k-NN algorithms such as kd-trees and ball-trees. On a collection of 20 datasets with up to 10(6) records and 10(4) dimensions, kMkNN shows a 2-to 80-fold reduction of distance calculations and a 2- to 60-fold speedup over the traditional k-NN algorithm for 16 datasets. Furthermore, kMkNN performs significant better than a kd-tree based k-NN algorithm for all datasets and performs better than a ball-tree based k-NN algorithm for most datasets. The results show that kMkNN is effective for searching nearest neighbors in high dimensional spaces.
Decision tree and ensemble learning algorithms with their applications in bioinformatics.

PubMed

Che, Dongsheng; Liu, Qi; Rasheed, Khaled; Tao, Xiuping

2011-01-01

Machine learning approaches have wide applications in bioinformatics, and decision tree is one of the successful approaches applied in this field. In this chapter, we briefly review decision tree and related ensemble algorithms and show the successful applications of such approaches on solving biological problems. We hope that by learning the algorithms of decision trees and ensemble classifiers, biologists can get the basic ideas of how machine learning algorithms work. On the other hand, by being exposed to the applications of decision trees and ensemble algorithms in bioinformatics, computer scientists can get better ideas of which bioinformatics topics they may work on in their future research directions. We aim to provide a platform to bridge the gap between biologists and computer scientists.
A Hybrid Shared-Memory Parallel Max-Tree Algorithm for Extreme Dynamic-Range Images.

PubMed

Moschini, Ugo; Meijster, Arnold; Wilkinson, Michael H F

2018-03-01

Max-trees, or component trees, are graph structures that represent the connected components of an image in a hierarchical way. Nowadays, many application fields rely on images with high-dynamic range or floating point values. Efficient sequential algorithms exist to build trees and compute attributes for images of any bit depth. However, we show that the current parallel algorithms perform poorly already with integers at bit depths higher than 16 bits per pixel. We propose a parallel method combining the two worlds of flooding and merging max-tree algorithms. First, a pilot max-tree of a quantized version of the image is built in parallel using a flooding method. Later, this structure is used in a parallel leaf-to-root approach to compute efficiently the final max-tree and to drive the merging of the sub-trees computed by the threads. We present an analysis of the performance both on simulated and actual 2D images and 3D volumes. Execution times are about better than the fastest sequential algorithm and speed-up goes up to on 64 threads.
Efficient Grammar Induction Algorithm with Parse Forests from Real Corpora

NASA Astrophysics Data System (ADS)

Kurihara, Kenichi; Kameya, Yoshitaka; Sato, Taisuke

The task of inducing grammar structures has received a great deal of attention. The reasons why researchers have studied are different; to use grammar induction as the first stage in building large treebanks or to make up better language models. However, grammar induction has inherent computational complexity. To overcome it, some grammar induction algorithms add new production rules incrementally. They refine the grammar while keeping their computational complexity low. In this paper, we propose a new efficient grammar induction algorithm. Although our algorithm is similar to algorithms which learn a grammar incrementally, our algorithm uses the graphical EM algorithm instead of the Inside-Outside algorithm. We report results of learning experiments in terms of learning speeds. The results show that our algorithm learns a grammar in constant time regardless of the size of the grammar. Since our algorithm decreases syntactic ambiguities in each step, our algorithm reduces required time for learning. This constant-time learning considerably affects learning time for larger grammars. We also reports results of evaluation of criteria to choose nonterminals. Our algorithm refines a grammar based on a nonterminal in each step. Since there can be several criteria to decide which nonterminal is the best, we evaluate them by learning experiments.
Bayes Forest: a data-intensive generator of morphological tree clones

PubMed Central

Järvenpää, Marko; Åkerblom, Markku; Raumonen, Pasi; Kaasalainen, Mikko

2017-01-01

Abstract Detailed and realistic tree form generators have numerous applications in ecology and forestry. For example, the varying morphology of trees contributes differently to formation of landscapes, natural habitats of species, and eco-physiological characteristics of the biosphere. Here, we present an algorithm for generating morphological tree “clones” based on the detailed reconstruction of the laser scanning data, statistical measure of similarity, and a plant growth model with simple stochastic rules. The algorithm is designed to produce tree forms, i.e., morphological clones, similar (and not identical) in respect to tree-level structure, but varying in fine-scale structural detail. Although we opted for certain choices in our algorithm, individual parts may vary depending on the application, making it a general adaptable pipeline. Namely, we showed that a specific multipurpose procedural stochastic growth model can be algorithmically adjusted to produce the morphological clones replicated from the target experimentally measured tree. For this, we developed a statistical measure of similarity (structural distance) between any given pair of trees, which allows for the comprehensive comparing of the tree morphologies by means of empirical distributions describing the geometrical and topological features of a tree. Finally, we developed a programmable interface to manipulate data required by the algorithm. Our algorithm can be used in a variety of applications for exploration of the morphological potential of the growth models (both theoretical and experimental), arising in all sectors of plant science research. PMID:29020742
Detection and Counting of Orchard Trees from Vhr Images Using a Geometrical-Optical Model and Marked Template Matching

NASA Astrophysics Data System (ADS)

Maillard, Philippe; Gomes, Marília F.

2016-06-01

This article presents an original algorithm created to detect and count trees in orchards using very high resolution images. The algorithm is based on an adaptation of the "template matching" image processing approach, in which the template is based on a "geometricaloptical" model created from a series of parameters, such as illumination angles, maximum and ambient radiance, and tree size specifications. The algorithm is tested on four images from different regions of the world and different crop types. These images all have < 1 meter spatial resolution and were downloaded from the GoogleEarth application. Results show that the algorithm is very efficient at detecting and counting trees as long as their spectral and spatial characteristics are relatively constant. For walnut, mango and orange trees, the overall accuracy was clearly above 90%. However, the overall success rate for apple trees fell under 75%. It appears that the openness of the apple tree crown is most probably responsible for this poorer result. The algorithm is fully explained with a step-by-step description. At this stage, the algorithm still requires quite a bit of user interaction. The automatic determination of most of the required parameters is under development.
A Modified Decision Tree Algorithm Based on Genetic Algorithm for Mobile User Classification Problem

PubMed Central

Liu, Dong-sheng; Fan, Shu-jiang

2014-01-01

In order to offer mobile customers better service, we should classify the mobile user firstly. Aimed at the limitations of previous classification methods, this paper puts forward a modified decision tree algorithm for mobile user classification, which introduced genetic algorithm to optimize the results of the decision tree algorithm. We also take the context information as a classification attributes for the mobile user and we classify the context into public context and private context classes. Then we analyze the processes and operators of the algorithm. At last, we make an experiment on the mobile user with the algorithm, we can classify the mobile user into Basic service user, E-service user, Plus service user, and Total service user classes and we can also get some rules about the mobile user. Compared to C4.5 decision tree algorithm and SVM algorithm, the algorithm we proposed in this paper has higher accuracy and more simplicity. PMID:24688389
Pruning Rogue Taxa Improves Phylogenetic Accuracy: An Efficient Algorithm and Webservice

PubMed Central

Aberer, Andre J.; Krompass, Denis; Stamatakis, Alexandros

2013-01-01

Abstract The presence of rogue taxa (rogues) in a set of trees can frequently have a negative impact on the results of a bootstrap analysis (e.g., the overall support in consensus trees). We introduce an efficient graph-based algorithm for rogue taxon identification as well as an interactive webservice implementing this algorithm. Compared with our previous method, the new algorithm is up to 4 orders of magnitude faster, while returning qualitatively identical results. Because of this significant improvement in scalability, the new algorithm can now identify substantially more complex and compute-intensive rogue taxon constellations. On a large and diverse collection of real-world data sets, we show that our method yields better supported reduced/pruned consensus trees than any competing rogue taxon identification method. Using the parallel version of our open-source code, we successfully identified rogue taxa in a set of 100 trees with 116 334 taxa each. For simulated data sets, we show that when removing/pruning rogue taxa with our method from a tree set, we consistently obtain bootstrap consensus trees as well as maximum-likelihood trees that are topologically closer to the respective true trees. PMID:22962004
Pruning rogue taxa improves phylogenetic accuracy: an efficient algorithm and webservice.

PubMed

Aberer, Andre J; Krompass, Denis; Stamatakis, Alexandros

2013-01-01

The presence of rogue taxa (rogues) in a set of trees can frequently have a negative impact on the results of a bootstrap analysis (e.g., the overall support in consensus trees). We introduce an efficient graph-based algorithm for rogue taxon identification as well as an interactive webservice implementing this algorithm. Compared with our previous method, the new algorithm is up to 4 orders of magnitude faster, while returning qualitatively identical results. Because of this significant improvement in scalability, the new algorithm can now identify substantially more complex and compute-intensive rogue taxon constellations. On a large and diverse collection of real-world data sets, we show that our method yields better supported reduced/pruned consensus trees than any competing rogue taxon identification method. Using the parallel version of our open-source code, we successfully identified rogue taxa in a set of 100 trees with 116 334 taxa each. For simulated data sets, we show that when removing/pruning rogue taxa with our method from a tree set, we consistently obtain bootstrap consensus trees as well as maximum-likelihood trees that are topologically closer to the respective true trees.
Concurrent computation of attribute filters on shared memory parallel machines.

PubMed

Wilkinson, Michael H F; Gao, Hui; Hesselink, Wim H; Jonker, Jan-Eppo; Meijster, Arnold

2008-10-01

Morphological attribute filters have not previously been parallelized, mainly because they are both global and non-separable. We propose a parallel algorithm that achieves efficient parallelism for a large class of attribute filters, including attribute openings, closings, thinnings and thickenings, based on Salembier's Max-Trees and Min-trees. The image or volume is first partitioned in multiple slices. We then compute the Max-trees of each slice using any sequential Max-Tree algorithm. Subsequently, the Max-trees of the slices can be merged to obtain the Max-tree of the image. A C-implementation yielded good speed-ups on both a 16-processor MIPS 14000 parallel machine, and a dual-core Opteron-based machine. It is shown that the speed-up of the parallel algorithm is a direct measure of the gain with respect to the sequential algorithm used. Furthermore, the concurrent algorithm shows a speed gain of up to 72 percent on a single-core processor, due to reduced cache thrashing.
Phylogenetic search through partial tree mixing

PubMed Central

2012-01-01

Background Recent advances in sequencing technology have created large data sets upon which phylogenetic inference can be performed. Current research is limited by the prohibitive time necessary to perform tree search on a reasonable number of individuals. This research develops new phylogenetic algorithms that can operate on tens of thousands of species in a reasonable amount of time through several innovative search techniques. Results When compared to popular phylogenetic search algorithms, better trees are found much more quickly for large data sets. These algorithms are incorporated in the PSODA application available at http://dna.cs.byu.edu/psoda Conclusions The use of Partial Tree Mixing in a partition based tree space allows the algorithm to quickly converge on near optimal tree regions. These regions can then be searched in a methodical way to determine the overall optimal phylogenetic solution. PMID:23320449
Expertise and category-based induction.

PubMed

Proffitt, J B; Coley, J D; Medin, D L

2000-07-01

The authors examined inductive reasoning among experts in a domain. Three types of tree experts (landscapers, taxonomists, and parks maintenance personnel) completed 3 reasoning tasks. In Experiment 1, participants inferred which of 2 novel diseases would affect "more other kinds of trees" and provided justifications for their choices. In Experiment 2, the authors used modified instructions and asked which disease would be more likely to affect "all trees." In Experiment 3, the conclusion category was eliminated altogether, and participants were asked to generate a list of other affected trees. Among these populations, typicality and diversity effects were weak to nonexistent. Instead, experts' reasoning was influenced by "local" coverage (extension of the property to members of the same folk family) and causal-ecological factors. The authors concluded that domain knowledge leads to the use of a variety of reasoning strategies not captured by current models of category-based induction.
Fuzzy α-minimum spanning tree problem: definition and solutions

NASA Astrophysics Data System (ADS)

Zhou, Jian; Chen, Lu; Wang, Ke; Yang, Fan

2016-04-01

In this paper, the minimum spanning tree problem is investigated on the graph with fuzzy edge weights. The notion of fuzzy ? -minimum spanning tree is presented based on the credibility measure, and then the solutions of the fuzzy ? -minimum spanning tree problem are discussed under different assumptions. First, we respectively, assume that all the edge weights are triangular fuzzy numbers and trapezoidal fuzzy numbers and prove that the fuzzy ? -minimum spanning tree problem can be transformed to a classical problem on a crisp graph in these two cases, which can be solved by classical algorithms such as the Kruskal algorithm and the Prim algorithm in polynomial time. Subsequently, as for the case that the edge weights are general fuzzy numbers, a fuzzy simulation-based genetic algorithm using Prüfer number representation is designed for solving the fuzzy ? -minimum spanning tree problem. Some numerical examples are also provided for illustrating the effectiveness of the proposed solutions.
Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees.

PubMed

Stolzer, Maureen; Lai, Han; Xu, Minli; Sathaye, Deepa; Vernot, Benjamin; Durand, Dannie

2012-09-15

Gene duplication (D), transfer (T), loss (L) and incomplete lineage sorting (I) are crucial to the evolution of gene families and the emergence of novel functions. The history of these events can be inferred via comparison of gene and species trees, a process called reconciliation, yet current reconciliation algorithms model only a subset of these evolutionary processes. We present an algorithm to reconcile a binary gene tree with a nonbinary species tree under a DTLI parsimony criterion. This is the first reconciliation algorithm to capture all four evolutionary processes driving tree incongruence and the first to reconcile non-binary species trees with a transfer model. Our algorithm infers all optimal solutions and reports complete, temporally feasible event histories, giving the gene and species lineages in which each event occurred. It is fixed-parameter tractable, with polytime complexity when the maximum species outdegree is fixed. Application of our algorithms to prokaryotic and eukaryotic data show that use of an incomplete event model has substantial impact on the events inferred and resulting biological conclusions. Our algorithms have been implemented in Notung, a freely available phylogenetic reconciliation software package, available at http://www.cs.cmu.edu/~durand/Notung. mstolzer@andrew.cmu.edu.
A stochastic multiple imputation algorithm for missing covariate data in tree-structured survival analysis.

PubMed

Wallace, Meredith L; Anderson, Stewart J; Mazumdar, Sati

2010-12-20

Missing covariate data present a challenge to tree-structured methodology due to the fact that a single tree model, as opposed to an estimated parameter value, may be desired for use in a clinical setting. To address this problem, we suggest a multiple imputation algorithm that adds draws of stochastic error to a tree-based single imputation method presented by Conversano and Siciliano (Technical Report, University of Naples, 2003). Unlike previously proposed techniques for accommodating missing covariate data in tree-structured analyses, our methodology allows the modeling of complex and nonlinear covariate structures while still resulting in a single tree model. We perform a simulation study to evaluate our stochastic multiple imputation algorithm when covariate data are missing at random and compare it to other currently used methods. Our algorithm is advantageous for identifying the true underlying covariate structure when complex data and larger percentages of missing covariate observations are present. It is competitive with other current methods with respect to prediction accuracy. To illustrate our algorithm, we create a tree-structured survival model for predicting time to treatment response in older, depressed adults. Copyright © 2010 John Wiley & Sons, Ltd.
Routing Algorithm based on Minimum Spanning Tree and Minimum Cost Flow for Hybrid Wireless-optical Broadband Access Network

NASA Astrophysics Data System (ADS)

Le, Zichun; Suo, Kaihua; Fu, Minglei; Jiang, Ling; Dong, Wen

2012-03-01

In order to minimize the average end to end delay for data transporting in hybrid wireless optical broadband access network, a novel routing algorithm named MSTMCF (minimum spanning tree and minimum cost flow) is devised. The routing problem is described as a minimum spanning tree and minimum cost flow model and corresponding algorithm procedures are given. To verify the effectiveness of MSTMCF algorithm, extensively simulations based on OWNS have been done under different types of traffic source.
Exact solutions for species tree inference from discordant gene trees.

PubMed

Chang, Wen-Chieh; Górecki, Paweł; Eulenstein, Oliver

2013-10-01

Phylogenetic analysis has to overcome the grant challenge of inferring accurate species trees from evolutionary histories of gene families (gene trees) that are discordant with the species tree along whose branches they have evolved. Two well studied approaches to cope with this challenge are to solve either biologically informed gene tree parsimony (GTP) problems under gene duplication, gene loss, and deep coalescence, or the classic RF supertree problem that does not rely on any biological model. Despite the potential of these problems to infer credible species trees, they are NP-hard. Therefore, these problems are addressed by heuristics that typically lack any provable accuracy and precision. We describe fast dynamic programming algorithms that solve the GTP problems and the RF supertree problem exactly, and demonstrate that our algorithms can solve instances with data sets consisting of as many as 22 taxa. Extensions of our algorithms can also report the number of all optimal species trees, as well as the trees themselves. To better asses the quality of the resulting species trees that best fit the given gene trees, we also compute the worst case species trees, their numbers, and optimization score for each of the computational problems. Finally, we demonstrate the performance of our exact algorithms using empirical and simulated data sets, and analyze the quality of heuristic solutions for the studied problems by contrasting them with our exact solutions.
Algorithm for protecting light-trees in survivable mesh wavelength-division-multiplexing networks

NASA Astrophysics Data System (ADS)

Luo, Hongbin; Li, Lemin; Yu, Hongfang

2006-12-01

Wavelength-division-multiplexing (WDM) technology is expected to facilitate bandwidth-intensive multicast applications such as high-definition television. A single fiber cut in a WDM mesh network, however, can disrupt the dissemination of information to several destinations on a light-tree based multicast session. Thus it is imperative to protect multicast sessions by reserving redundant resources. We propose a novel and efficient algorithm for protecting light-trees in survivable WDM mesh networks. The algorithm is called segment-based protection with sister node first (SSNF), whose basic idea is to protect a light-tree using a set of backup segments with a higher priority to protect the segments from a branch point to its children (sister nodes). The SSNF algorithm differs from the segment protection scheme proposed in the literature in how the segments are identified and protected. Our objective is to minimize the network resources used for protecting each primary light-tree such that the blocking probability can be minimized. To verify the effectiveness of the SSNF algorithm, we conduct extensive simulation experiments. The simulation results demonstrate that the SSNF algorithm outperforms existing algorithms for the same problem.
Learning classification trees

NASA Technical Reports Server (NTRS)

Buntine, Wray

1991-01-01

Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. How a tree learning algorithm can be derived from Bayesian decision theory is outlined. This introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule turns out to be similar to Quinlan's information gain splitting rule, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, Quinlan's C4 and Breiman et al. Cart show the full Bayesian algorithm is consistently as good, or more accurate than these other approaches though at a computational price.
A new approach to enhance the performance of decision tree for classifying gene expression data.

PubMed

Hassan, Md; Kotagiri, Ramamohanarao

2013-12-20

Gene expression data classification is a challenging task due to the large dimensionality and very small number of samples. Decision tree is one of the popular machine learning approaches to address such classification problems. However, the existing decision tree algorithms use a single gene feature at each node to split the data into its child nodes and hence might suffer from poor performance specially when classifying gene expression dataset. By using a new decision tree algorithm where, each node of the tree consists of more than one gene, we enhance the classification performance of traditional decision tree classifiers. Our method selects suitable genes that are combined using a linear function to form a derived composite feature. To determine the structure of the tree we use the area under the Receiver Operating Characteristics curve (AUC). Experimental analysis demonstrates higher classification accuracy using the new decision tree compared to the other existing decision trees in literature. We experimentally compare the effect of our scheme against other well known decision tree techniques. Experiments show that our algorithm can substantially boost the classification performance of the decision tree.

Automated Decision Tree Classification of Corneal Shape

PubMed Central

Twa, Michael D.; Parthasarathy, Srinivasan; Roberts, Cynthia; Mahmoud, Ashraf M.; Raasch, Thomas W.; Bullimore, Mark A.

2011-01-01

Purpose The volume and complexity of data produced during videokeratography examinations present a challenge of interpretation. As a consequence, results are often analyzed qualitatively by subjective pattern recognition or reduced to comparisons of summary indices. We describe the application of decision tree induction, an automated machine learning classification method, to discriminate between normal and keratoconic corneal shapes in an objective and quantitative way. We then compared this method with other known classification methods. Methods The corneal surface was modeled with a seventh-order Zernike polynomial for 132 normal eyes of 92 subjects and 112 eyes of 71 subjects diagnosed with keratoconus. A decision tree classifier was induced using the C4.5 algorithm, and its classification performance was compared with the modified Rabinowitz–McDonnell index, Schwiegerling’s Z3 index (Z3), Keratoconus Prediction Index (KPI), KISA%, and Cone Location and Magnitude Index using recommended classification thresholds for each method. We also evaluated the area under the receiver operator characteristic (ROC) curve for each classification method. Results Our decision tree classifier performed equal to or better than the other classifiers tested: accuracy was 92% and the area under the ROC curve was 0.97. Our decision tree classifier reduced the information needed to distinguish between normal and keratoconus eyes using four of 36 Zernike polynomial coefficients. The four surface features selected as classification attributes by the decision tree method were inferior elevation, greater sagittal depth, oblique toricity, and trefoil. Conclusions Automated decision tree classification of corneal shape through Zernike polynomials is an accurate quantitative method of classification that is interpretable and can be generated from any instrument platform capable of raw elevation data output. This method of pattern classification is extendable to other classification problems. PMID:16357645
Decision Trees Predicting Tumor Shrinkage for Head and Neck Cancer: Implications for Adaptive Radiotherapy.

PubMed

Surucu, Murat; Shah, Karan K; Mescioglu, Ibrahim; Roeske, John C; Small, William; Choi, Mehee; Emami, Bahman

2016-02-01

To develop decision trees predicting for tumor volume reduction in patients with head and neck (H&N) cancer using pretreatment clinical and pathological parameters. Forty-eight patients treated with definitive concurrent chemoradiotherapy for squamous cell carcinoma of the nasopharynx, oropharynx, oral cavity, or hypopharynx were retrospectively analyzed. These patients were rescanned at a median dose of 37.8 Gy and replanned to account for anatomical changes. The percentages of gross tumor volume (GTV) change from initial to rescan computed tomography (CT; %GTVΔ) were calculated. Two decision trees were generated to correlate %GTVΔ in primary and nodal volumes with 14 characteristics including age, gender, Karnofsky performance status (KPS), site, human papilloma virus (HPV) status, tumor grade, primary tumor growth pattern (endophytic/exophytic), tumor/nodal/group stages, chemotherapy regimen, and primary, nodal, and total GTV volumes in the initial CT scan. The C4.5 Decision Tree induction algorithm was implemented. The median %GTVΔ for primary, nodal, and total GTVs was 26.8%, 43.0%, and 31.2%, respectively. Type of chemotherapy, age, primary tumor growth pattern, site, KPS, and HPV status were the most predictive parameters for primary %GTVΔ decision tree, whereas for nodal %GTVΔ, KPS, site, age, primary tumor growth pattern, initial primary GTV, and total GTV volumes were predictive. Both decision trees had an accuracy of 88%. There can be significant changes in primary and nodal tumor volumes during the course of H&N chemoradiotherapy. Considering the proposed decision trees, radiation oncologists can select patients predicted to have high %GTVΔ, who would theoretically gain the most benefit from adaptive radiotherapy, in order to better use limited clinical resources. © The Author(s) 2015.
Enumerating Substituted Benzene Isomers of Tree-Like Chemical Graphs.

PubMed

Li, Jinghui; Nagamochi, Hiroshi; Akutsu, Tatsuya

2018-01-01

Enumeration of chemical structures is useful for drug design, which is one of the main targets of computational biology and bioinformatics. A chemical graph with no other cycles than benzene rings is called tree-like, and becomes a tree possibly with multiple edges if we contract each benzene ring into a single virtual atom of valence 6. All tree-like chemical graphs with a given tree representation are called the substituted benzene isomers of . When we replace each virtual atom in with a benzene ring to obtain a substituted benzene isomer, distinct isomers of are caused by the difference in arrangements of atom groups around a benzene ring. In this paper, we propose an efficient algorithm that enumerates all substituted benzene isomers of a given tree representation . Our algorithm first counts the number of all the isomers of the tree representation by a dynamic programming method. To enumerate all the isomers, for each , our algorithm then generates the th isomer by backtracking the counting phase of the dynamic programming. We also implemented our algorithm for computational experiments.
Efficient tree tensor network states (TTNS) for quantum chemistry: Generalizations of the density matrix renormalization group algorithm

NASA Astrophysics Data System (ADS)

Nakatani, Naoki; Chan, Garnet Kin-Lic

2013-04-01

We investigate tree tensor network states for quantum chemistry. Tree tensor network states represent one of the simplest generalizations of matrix product states and the density matrix renormalization group. While matrix product states encode a one-dimensional entanglement structure, tree tensor network states encode a tree entanglement structure, allowing for a more flexible description of general molecules. We describe an optimal tree tensor network state algorithm for quantum chemistry. We introduce the concept of half-renormalization which greatly improves the efficiency of the calculations. Using our efficient formulation we demonstrate the strengths and weaknesses of tree tensor network states versus matrix product states. We carry out benchmark calculations both on tree systems (hydrogen trees and π-conjugated dendrimers) as well as non-tree molecules (hydrogen chains, nitrogen dimer, and chromium dimer). In general, tree tensor network states require much fewer renormalized states to achieve the same accuracy as matrix product states. In non-tree molecules, whether this translates into a computational savings is system dependent, due to the higher prefactor and computational scaling associated with tree algorithms. In tree like molecules, tree network states are easily superior to matrix product states. As an illustration, our largest dendrimer calculation with tree tensor network states correlates 110 electrons in 110 active orbitals.
The influence of conifer forest canopy cover on the accuracy of two individual tree measurement algorithms using lidar data

Treesearch

Michael J. Falkowski; Alistair M.S. Smith; Paul E. Gessler; Andrew T. Hudak; Lee A. Vierling; Jeffrey S. Evans

2008-01-01

Individual tree detection algorithms can provide accurate measurements of individual tree locations, crown diameters (from aerial photography and light detection and ranging (lidar) data), and tree heights (from lidar data). However, to be useful for forest management goals relating to timber harvest, carbon accounting, and ecological processes, there is a need to...
Distance-Based Phylogenetic Methods Around a Polytomy.

PubMed

Davidson, Ruth; Sullivant, Seth

2014-01-01

Distance-based phylogenetic algorithms attempt to solve the NP-hard least-squares phylogeny problem by mapping an arbitrary dissimilarity map representing biological data to a tree metric. The set of all dissimilarity maps is a Euclidean space properly containing the space of all tree metrics as a polyhedral fan. Outputs of distance-based tree reconstruction algorithms such as UPGMA and neighbor-joining are points in the maximal cones in the fan. Tree metrics with polytomies lie at the intersections of maximal cones. A phylogenetic algorithm divides the space of all dissimilarity maps into regions based upon which combinatorial tree is reconstructed by the algorithm. Comparison of phylogenetic methods can be done by comparing the geometry of these regions. We use polyhedral geometry to compare the local nature of the subdivisions induced by least-squares phylogeny, UPGMA, and neighbor-joining when the true tree has a single polytomy with exactly four neighbors. Our results suggest that in some circumstances, UPGMA and neighbor-joining poorly match least-squares phylogeny.
Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees.

PubMed

Fokkema, M; Smits, N; Zeileis, A; Hothorn, T; Kelderman, H

2017-10-25

Identification of subgroups of patients for whom treatment A is more effective than treatment B, and vice versa, is of key importance to the development of personalized medicine. Tree-based algorithms are helpful tools for the detection of such interactions, but none of the available algorithms allow for taking into account clustered or nested dataset structures, which are particularly common in psychological research. Therefore, we propose the generalized linear mixed-effects model tree (GLMM tree) algorithm, which allows for the detection of treatment-subgroup interactions, while accounting for the clustered structure of a dataset. The algorithm uses model-based recursive partitioning to detect treatment-subgroup interactions, and a GLMM to estimate the random-effects parameters. In a simulation study, GLMM trees show higher accuracy in recovering treatment-subgroup interactions, higher predictive accuracy, and lower type II error rates than linear-model-based recursive partitioning and mixed-effects regression trees. Also, GLMM trees show somewhat higher predictive accuracy than linear mixed-effects models with pre-specified interaction effects, on average. We illustrate the application of GLMM trees on an individual patient-level data meta-analysis on treatments for depression. We conclude that GLMM trees are a promising exploratory tool for the detection of treatment-subgroup interactions in clustered datasets.
Image compression using quad-tree coding with morphological dilation

NASA Astrophysics Data System (ADS)

Wu, Jiaji; Jiang, Weiwei; Jiao, Licheng; Wang, Lei

2007-11-01

In this paper, we propose a new algorithm which integrates morphological dilation operation to quad-tree coding, the purpose of doing this is to compensate each other's drawback by using quad-tree coding and morphological dilation operation respectively. New algorithm can not only quickly find the seed significant coefficient of dilation but also break the limit of block boundary of quad-tree coding. We also make a full use of both within-subband and cross-subband correlation to avoid the expensive cost of representing insignificant coefficients. Experimental results show that our algorithm outperforms SPECK and SPIHT. Without using any arithmetic coding, our algorithm can achieve good performance with low computational cost and it's more suitable to mobile devices or scenarios with a strict real-time requirement.
Decision tree methods: applications for classification and prediction.

PubMed

Song, Yan-Yan; Lu, Ying

2015-04-25

Decision tree methodology is a commonly used data mining method for establishing classification systems based on multiple covariates or for developing prediction algorithms for a target variable. This method classifies a population into branch-like segments that construct an inverted tree with a root node, internal nodes, and leaf nodes. The algorithm is non-parametric and can efficiently deal with large, complicated datasets without imposing a complicated parametric structure. When the sample size is large enough, study data can be divided into training and validation datasets. Using the training dataset to build a decision tree model and a validation dataset to decide on the appropriate tree size needed to achieve the optimal final model. This paper introduces frequently used algorithms used to develop decision trees (including CART, C4.5, CHAID, and QUEST) and describes the SPSS and SAS programs that can be used to visualize tree structure.
Greedy algorithms in disordered systems

NASA Astrophysics Data System (ADS)

Duxbury, P. M.; Dobrin, R.

1999-08-01

We discuss search, minimal path and minimal spanning tree algorithms and their applications to disordered systems. Greedy algorithms solve these problems exactly, and are related to extremal dynamics in physics. Minimal cost path (Dijkstra) and minimal cost spanning tree (Prim) algorithms provide extremal dynamics for a polymer in a random medium (the KPZ universality class) and invasion percolation (without trapping) respectively.
A data driven approach for condition monitoring of wind turbine blade using vibration signals through best-first tree algorithm and functional trees algorithm: A comparative study.

PubMed

Joshuva, A; Sugumaran, V

2017-03-01

Wind energy is one of the important renewable energy resources available in nature. It is one of the major resources for production of energy because of its dependability due to the development of the technology and relatively low cost. Wind energy is converted into electrical energy using rotating blades. Due to environmental conditions and large structure, the blades are subjected to various vibration forces that may cause damage to the blades. This leads to a liability in energy production and turbine shutdown. The downtime can be reduced when the blades are diagnosed continuously using structural health condition monitoring. These are considered as a pattern recognition problem which consists of three phases namely, feature extraction, feature selection, and feature classification. In this study, statistical features were extracted from vibration signals, feature selection was carried out using a J48 decision tree algorithm and feature classification was performed using best-first tree algorithm and functional trees algorithm. The better algorithm is suggested for fault diagnosis of wind turbine blade. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
Object-Oriented Algorithm For Evaluation Of Fault Trees

NASA Technical Reports Server (NTRS)

Patterson-Hine, F. A.; Koen, B. V.

1992-01-01

Algorithm for direct evaluation of fault trees incorporates techniques of object-oriented programming. Reduces number of calls needed to solve trees with repeated events. Provides significantly improved software environment for such computations as quantitative analyses of safety and reliability of complicated systems of equipment (e.g., spacecraft or factories).
Applications and Benefits for Big Data Sets Using Tree Distances and The T-SNE Algorithm

DTIC Science & Technology

2016-03-01

BENEFITS FOR BIG DATA SETS USING TREE DISTANCES AND THE T-SNE ALGORITHM by Suyoung Lee March 2016 Thesis Advisor: Samuel E. Buttrey...REPORT TYPE AND DATES COVERED Master’s thesis 4. TITLE AND SUBTITLE APPLICATIONS AND BENEFITS FOR BIG DATA SETS USING TREE DISTANCES AND THE T-SNE...this work we use tree distance computed using Buttrey’s treeClust package in R, as discussed by Buttrey and Whitaker in 2015, to process mixed data
Wavelet compression of multichannel ECG data by enhanced set partitioning in hierarchical trees algorithm.

PubMed

Sharifahmadian, Ershad

2006-01-01

The set partitioning in hierarchical trees (SPIHT) algorithm is very effective and computationally simple technique for image and signal compression. Here the author modified the algorithm which provides even better performance than the SPIHT algorithm. The enhanced set partitioning in hierarchical trees (ESPIHT) algorithm has performance faster than the SPIHT algorithm. In addition, the proposed algorithm reduces the number of bits in a bit stream which is stored or transmitted. I applied it to compression of multichannel ECG data. Also, I presented a specific procedure based on the modified algorithm for more efficient compression of multichannel ECG data. This method employed on selected records from the MIT-BIH arrhythmia database. According to experiments, the proposed method attained the significant results regarding compression of multichannel ECG data. Furthermore, in order to compress one signal which is stored for a long time, the proposed multichannel compression method can be utilized efficiently.
Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications.

PubMed

Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi

2017-11-02

Various kinds of data mining algorithms are continuously raised with the development of related disciplines. The applicable scopes and their performances of these algorithms are different. Hence, finding a suitable algorithm for a dataset is becoming an important emphasis for biomedical researchers to solve practical problems promptly. In this paper, seven kinds of sophisticated active algorithms, namely, C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 top-click UCI public datasets with the task of classification, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, and the sample size of each class, correlation coefficients between variables, class entropy of task variable, and the ratio of the sample size of the largest class to the least class were calculated to character the 12 research datasets. The two ensemble algorithms reach high accuracy of classification on most datasets. Moreover, random forest performs better than AdaBoost on the unbalanced dataset of the multi-class task. Simple algorithms, such as the naïve Bayes and logistic regression model are suitable for a small dataset with high correlation between the task and other non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is more adept on the balanced small dataset of the binary-class task. No algorithm can maintain the best performance in all datasets. The applicability of the seven data mining algorithms on the datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.
On the Complexity of Duplication-Transfer-Loss Reconciliation with Non-Binary Gene Trees.

PubMed

Kordi, Misagh; Bansal, Mukul S

2017-01-01

Duplication-Transfer-Loss (DTL) reconciliation has emerged as a powerful technique for studying gene family evolution in the presence of horizontal gene transfer. DTL reconciliation takes as input a gene family phylogeny and the corresponding species phylogeny, and reconciles the two by postulating speciation, gene duplication, horizontal gene transfer, and gene loss events. Efficient algorithms exist for finding optimal DTL reconciliations when the gene tree is binary. However, gene trees are frequently non-binary. With such non-binary gene trees, the reconciliation problem seeks to find a binary resolution of the gene tree that minimizes the reconciliation cost. Given the prevalence of non-binary gene trees, many efficient algorithms have been developed for this problem in the context of the simpler Duplication-Loss (DL) reconciliation model. Yet, no efficient algorithms exist for DTL reconciliation with non-binary gene trees and the complexity of the problem remains unknown. In this work, we resolve this open question by showing that the problem is, in fact, NP-hard. Our reduction applies to both the dated and undated formulations of DTL reconciliation. By resolving this long-standing open problem, this work will spur the development of both exact and heuristic algorithms for this important problem.
Rare itemsets mining algorithm based on RP-Tree and spark framework

NASA Astrophysics Data System (ADS)

Liu, Sainan; Pan, Haoan

2018-05-01

For the issues of the rare itemsets mining in big data, this paper proposed a rare itemsets mining algorithm based on RP-Tree and Spark framework. Firstly, it arranged the data vertically according to the transaction identifier, in order to solve the defects of scan the entire data set, the vertical datasets are divided into frequent vertical datasets and rare vertical datasets. Then, it adopted the RP-Tree algorithm to construct the frequent pattern tree that contains rare items and generate rare 1-itemsets. After that, it calculated the support of the itemsets by scanning the two vertical data sets, finally, it used the iterative process to generate rare itemsets. The experimental show that the algorithm can effectively excavate rare itemsets and have great superiority in execution time.
On-line experimental validation of a model-based diagnostic algorithm dedicated to a solid oxide fuel cell system

NASA Astrophysics Data System (ADS)

Polverino, Pierpaolo; Esposito, Angelo; Pianese, Cesare; Ludwig, Bastian; Iwanschitz, Boris; Mai, Andreas

2016-02-01

In the current energetic scenario, Solid Oxide Fuel Cells (SOFCs) exhibit appealing features which make them suitable for environmental-friendly power production, especially for stationary applications. An example is represented by micro-combined heat and power (μ-CHP) generation units based on SOFC stacks, which are able to produce electric and thermal power with high efficiency and low pollutant and greenhouse gases emissions. However, the main limitations to their diffusion into the mass market consist in high maintenance and production costs and short lifetime. To improve these aspects, the current research activity focuses on the development of robust and generalizable diagnostic techniques, aimed at detecting and isolating faults within the entire system (i.e. SOFC stack and balance of plant). Coupled with appropriate recovery strategies, diagnosis can prevent undesired system shutdowns during faulty conditions, with consequent lifetime increase and maintenance costs reduction. This paper deals with the on-line experimental validation of a model-based diagnostic algorithm applied to a pre-commercial SOFC system. The proposed algorithm exploits a Fault Signature Matrix based on a Fault Tree Analysis and improved through fault simulations. The algorithm is characterized on the considered system and it is validated by means of experimental induction of faulty states in controlled conditions.
Privacy-preserving heterogeneous health data sharing.

PubMed

Mohammed, Noman; Jiang, Xiaoqian; Chen, Rui; Fung, Benjamin C M; Ohno-Machado, Lucila

2013-05-01

Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among existing privacy models, ε-differential privacy provides one of the strongest privacy guarantees and makes no assumptions about an adversary's background knowledge. All existing solutions that ensure ε-differential privacy handle the problem of disclosing relational and set-valued data in a privacy-preserving manner separately. In this paper, we propose an algorithm that considers both relational and set-valued data in differentially private disclosure of healthcare data. The proposed approach makes a simple yet fundamental switch in differentially private algorithm design: instead of listing all possible records (ie, a contingency table) for noise addition, records are generalized before noise addition. The algorithm first generalizes the raw data in a probabilistic way, and then adds noise to guarantee ε-differential privacy. We showed that the disclosed data could be used effectively to build a decision tree induction classifier. Experimental results demonstrated that the proposed algorithm is scalable and performs better than existing solutions for classification analysis. The resulting utility may degrade when the output domain size is very large, making it potentially inappropriate to generate synthetic data for large health databases. Unlike existing techniques, the proposed algorithm allows the disclosure of health data containing both relational and set-valued data in a differentially private manner, and can retain essential information for discriminative analysis.
Privacy-preserving heterogeneous health data sharing

PubMed Central

Mohammed, Noman; Jiang, Xiaoqian; Chen, Rui; Fung, Benjamin C M; Ohno-Machado, Lucila

2013-01-01

Objective Privacy-preserving data publishing addresses the problem of disclosing sensitive data when mining for useful information. Among existing privacy models, ε-differential privacy provides one of the strongest privacy guarantees and makes no assumptions about an adversary's background knowledge. All existing solutions that ensure ε-differential privacy handle the problem of disclosing relational and set-valued data in a privacy-preserving manner separately. In this paper, we propose an algorithm that considers both relational and set-valued data in differentially private disclosure of healthcare data. Methods The proposed approach makes a simple yet fundamental switch in differentially private algorithm design: instead of listing all possible records (ie, a contingency table) for noise addition, records are generalized before noise addition. The algorithm first generalizes the raw data in a probabilistic way, and then adds noise to guarantee ε-differential privacy. Results We showed that the disclosed data could be used effectively to build a decision tree induction classifier. Experimental results demonstrated that the proposed algorithm is scalable and performs better than existing solutions for classification analysis. Limitation The resulting utility may degrade when the output domain size is very large, making it potentially inappropriate to generate synthetic data for large health databases. Conclusions Unlike existing techniques, the proposed algorithm allows the disclosure of health data containing both relational and set-valued data in a differentially private manner, and can retain essential information for discriminative analysis. PMID:23242630

Automated Mobile System for Accurate Outdoor Tree Crop Enumeration Using an Uncalibrated Camera.

PubMed

Nguyen, Thuy Tuong; Slaughter, David C; Hanson, Bradley D; Barber, Andrew; Freitas, Amy; Robles, Daniel; Whelan, Erin

2015-07-28

This paper demonstrates an automated computer vision system for outdoor tree crop enumeration in a seedling nursery. The complete system incorporates both hardware components (including an embedded microcontroller, an odometry encoder, and an uncalibrated digital color camera) and software algorithms (including microcontroller algorithms and the proposed algorithm for tree crop enumeration) required to obtain robust performance in a natural outdoor environment. The enumeration system uses a three-step image analysis process based upon: (1) an orthographic plant projection method integrating a perspective transform with automatic parameter estimation; (2) a plant counting method based on projection histograms; and (3) a double-counting avoidance method based on a homography transform. Experimental results demonstrate the ability to count large numbers of plants automatically with no human effort. Results show that, for tree seedlings having a height up to 40 cm and a within-row tree spacing of approximately 10 cm, the algorithms successfully estimated the number of plants with an average accuracy of 95.2% for trees within a single image and 98% for counting of the whole plant population in a large sequence of images.
Automated Mobile System for Accurate Outdoor Tree Crop Enumeration Using an Uncalibrated Camera

PubMed Central

Nguyen, Thuy Tuong; Slaughter, David C.; Hanson, Bradley D.; Barber, Andrew; Freitas, Amy; Robles, Daniel; Whelan, Erin

2015-01-01

This paper demonstrates an automated computer vision system for outdoor tree crop enumeration in a seedling nursery. The complete system incorporates both hardware components (including an embedded microcontroller, an odometry encoder, and an uncalibrated digital color camera) and software algorithms (including microcontroller algorithms and the proposed algorithm for tree crop enumeration) required to obtain robust performance in a natural outdoor environment. The enumeration system uses a three-step image analysis process based upon: (1) an orthographic plant projection method integrating a perspective transform with automatic parameter estimation; (2) a plant counting method based on projection histograms; and (3) a double-counting avoidance method based on a homography transform. Experimental results demonstrate the ability to count large numbers of plants automatically with no human effort. Results show that, for tree seedlings having a height up to 40 cm and a within-row tree spacing of approximately 10 cm, the algorithms successfully estimated the number of plants with an average accuracy of 95.2% for trees within a single image and 98% for counting of the whole plant population in a large sequence of images. PMID:26225982
Multivariate analysis of flow cytometric data using decision trees.

PubMed

Simon, Svenja; Guthke, Reinhard; Kamradt, Thomas; Frey, Oliver

2012-01-01

Characterization of the response of the host immune system is important in understanding the bidirectional interactions between the host and microbial pathogens. For research on the host site, flow cytometry has become one of the major tools in immunology. Advances in technology and reagents allow now the simultaneous assessment of multiple markers on a single cell level generating multidimensional data sets that require multivariate statistical analysis. We explored the explanatory power of the supervised machine learning method called "induction of decision trees" in flow cytometric data. In order to examine whether the production of a certain cytokine is depended on other cytokines, datasets from intracellular staining for six cytokines with complex patterns of co-expression were analyzed by induction of decision trees. After weighting the data according to their class probabilities, we created a total of 13,392 different decision trees for each given cytokine with different parameter settings. For a more realistic estimation of the decision trees' quality, we used stratified fivefold cross validation and chose the "best" tree according to a combination of different quality criteria. While some of the decision trees reflected previously known co-expression patterns, we found that the expression of some cytokines was not only dependent on the co-expression of others per se, but was also dependent on the intensity of expression. Thus, for the first time we successfully used induction of decision trees for the analysis of high dimensional flow cytometric data and demonstrated the feasibility of this method to reveal structural patterns in such data sets.
Block-Based Connected-Component Labeling Algorithm Using Binary Decision Trees

PubMed Central

Chang, Wan-Yu; Chiu, Chung-Cheng; Yang, Jia-Horng

2015-01-01

In this paper, we propose a fast labeling algorithm based on block-based concepts. Because the number of memory access points directly affects the time consumption of the labeling algorithms, the aim of the proposed algorithm is to minimize neighborhood operations. Our algorithm utilizes a block-based view and correlates a raster scan to select the necessary pixels generated by a block-based scan mask. We analyze the advantages of a sequential raster scan for the block-based scan mask, and integrate the block-connected relationships using two different procedures with binary decision trees to reduce unnecessary memory access. This greatly simplifies the pixel locations of the block-based scan mask. Furthermore, our algorithm significantly reduces the number of leaf nodes and depth levels required in the binary decision tree. We analyze the labeling performance of the proposed algorithm alongside that of other labeling algorithms using high-resolution images and foreground images. The experimental results from synthetic and real image datasets demonstrate that the proposed algorithm is faster than other methods. PMID:26393597
Determinants and development of a web-based child mortality prediction model in resource-limited settings: A data mining approach.

PubMed

Tesfaye, Brook; Atique, Suleman; Elias, Noah; Dibaba, Legesse; Shabbir, Syed-Abdul; Kebede, Mihiretu

2017-03-01

Improving child health and reducing child mortality rate are key health priorities in developing countries. This study aimed to identify determinant sand develop, a web-based child mortality prediction model in Ethiopian local language using classification data mining algorithm. Decision tree (using J48 algorithm) and rule induction (using PART algorithm) techniques were applied on 11,654 records of Ethiopian demographic and health survey data. Waikato Environment for Knowledge Analysis (WEKA) for windows version 3.6.8 was used to develop optimal models. 8157 (70%) records were randomly allocated to training group for model building while; the remaining 3496 (30%) records were allocated as the test group for model validation. The validation of the model was assessed using accuracy, sensitivity, specificity and area under Receiver Operating Characteristics (ROC) curve. Using Statistical Package for Social Sciences (SPSS) version 20.0; logistic regressions and Odds Ratio (OR) with 95% Confidence Interval (CI) was used to identify determinants of child mortality. The child mortality rate was 72 deaths per 1000 live births. Breast-feeding (AOR= 1.46, (95% CI [1.22. 1.75]), maternal education (AOR= 1.40, 95% CI [1.11, 1.81]), family planning (AOR= 1.21, [1.08, 1.43]), preceding birth interval (AOR= 4.90, [2.94, 8.15]), presence of diarrhea (AOR= 1.54, 95% CI [1.32, 1.66]), father's education (AOR= 1.4, 95% CI [1.04, 1.78]), low birth weight (AOR= 1.2, 95% CI [0.98, 1.51]) and, age of the mother at first birth (AOR= 1.42, [1.01-1.89]) were found to be determinants for child mortality. The J48 model had better performance, accuracy (94.3%), sensitivity (93.8%), specificity (94.3%), Positive Predictive Value (PPV) (92.2%), Negative Predictive Value (NPV) (94.5%) and, the area under ROC (94.8%). Subsequent to developing an optimal prediction model, we relied on this model to develop a web-based application system for child mortality prediction. In this study, nearly accurate results were obtained by employing decision tree and rule induction techniques. Determinants are identified and a web-based child mortality prediction model in Ethiopian local language is developed. Thus, the result obtained could support child health intervention programs in Ethiopia where trained human resource for health is limited. Advanced classification algorithms need to be tested to come up with optimal models. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Faster Bit-Parallel Algorithms for Unordered Pseudo-tree Matching and Tree Homeomorphism

NASA Astrophysics Data System (ADS)

Kaneta, Yusaku; Arimura, Hiroki

In this paper, we consider the unordered pseudo-tree matching problem, which is a problem of, given two unordered labeled trees P and T, finding all occurrences of P in T via such many-one embeddings that preserve node labels and parent-child relationship. This problem is closely related to tree pattern matching problem for XPath queries with child axis only. If m > w , we present an efficient algorithm that solves the problem in O(nm log(w)/w) time using O(hm/w + mlog(w)/w) space and O(m log(w)) preprocessing on a unit-cost arithmetic RAM model with addition, where m is the number of nodes in P, n is the number of nodes in T, h is the height of T, and w is the word length. We also discuss a modification of our algorithm for the unordered tree homeomorphism problem, which corresponds to a tree pattern matching problem for XPath queries with descendant axis only.
Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting

DOE Office of Scientific and Technical Information (OSTI.GOV)

Azad, Ariful; Buluc, Aydn; Pothen, Alex

It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. This algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single source algorithms. Algorithms that employ multiple-source searches cannot discard a search tree once no augmenting pathmore » is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. Here, we provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.« less
Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting

DOE PAGES

Azad, Ariful; Buluc, Aydn; Pothen, Alex

2016-03-24

It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. This algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single source algorithms. Algorithms that employ multiple-source searches cannot discard a search tree once no augmenting pathmore » is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. Here, we provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.« less
A new algorithm to construct phylogenetic networks from trees.

PubMed

Wang, J

2014-03-06

Developing appropriate methods for constructing phylogenetic networks from tree sets is an important problem, and much research is currently being undertaken in this area. BIMLR is an algorithm that constructs phylogenetic networks from tree sets. The algorithm can construct a much simpler network than other available methods. Here, we introduce an improved version of the BIMLR algorithm, QuickCass. QuickCass changes the selection strategy of the labels of leaves below the reticulate nodes, i.e., the nodes with an indegree of at least 2 in BIMLR. We show that QuickCass can construct simpler phylogenetic networks than BIMLR. Furthermore, we show that QuickCass is a polynomial-time algorithm when the output network that is constructed by QuickCass is binary.
Efficient algorithms for a class of partitioning problems

NASA Technical Reports Server (NTRS)

Iqbal, M. Ashraf; Bokhari, Shahid H.

1990-01-01

The problem of optimally partitioning the modules of chain- or tree-like tasks over chain-structured or host-satellite multiple computer systems is addressed. This important class of problems includes many signal processing and industrial control applications. Prior research has resulted in a succession of faster exact and approximate algorithms for these problems. Polynomial exact and approximate algorithms are described for this class that are better than any of the previously reported algorithms. The approach is based on a preprocessing step that condenses the given chain or tree structured task into a monotonic chain or tree. The partitioning of this monotonic take can then be carried out using fast search techniques.
Reconciliation of Gene and Species Trees

PubMed Central

Rusin, L. Y.; Lyubetskaya, E. V.; Gorbunov, K. Y.; Lyubetsky, V. A.

2014-01-01

The first part of the paper briefly overviews the problem of gene and species trees reconciliation with the focus on defining and algorithmic construction of the evolutionary scenario. Basic ideas are discussed for the aspects of mapping definitions, costs of the mapping and evolutionary scenario, imposing time scales on a scenario, incorporating horizontal gene transfers, binarization and reconciliation of polytomous trees, and construction of species trees and scenarios. The review does not intend to cover the vast diversity of literature published on these subjects. Instead, the authors strived to overview the problem of the evolutionary scenario as a central concept in many areas of evolutionary research. The second part provides detailed mathematical proofs for the solutions of two problems: (i) inferring a gene evolution along a species tree accounting for various types of evolutionary events and (ii) trees reconciliation into a single species tree when only gene duplications and losses are allowed. All proposed algorithms have a cubic time complexity and are mathematically proved to find exact solutions. Solving algorithms for problem (ii) can be naturally extended to incorporate horizontal transfers, other evolutionary events, and time scales on the species tree. PMID:24800245
GIGA: a simple, efficient algorithm for gene tree inference in the genomic age

PubMed Central

2010-01-01

Background Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. Results We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. Conclusions GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. We compared trees produced by GIGA to those in the TreeFam database, and they were very similar in general, with most differences likely due to poor alignment quality. However, some remaining differences are algorithmic, and can be explained by the fact that GIGA tends to put a larger emphasis on minimizing gene duplication and deletion events. PMID:20534164
GIGA: a simple, efficient algorithm for gene tree inference in the genomic age.

PubMed

Thomas, Paul D

2010-06-09

Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. We compared trees produced by GIGA to those in the TreeFam database, and they were very similar in general, with most differences likely due to poor alignment quality. However, some remaining differences are algorithmic, and can be explained by the fact that GIGA tends to put a larger emphasis on minimizing gene duplication and deletion events.
iNJclust: Iterative Neighbor-Joining Tree Clustering Framework for Inferring Population Structure.

PubMed

Limpiti, Tulaya; Amornbunchornvej, Chainarong; Intarapanich, Apichart; Assawamakin, Anunchai; Tongsima, Sissades

2014-01-01

Understanding genetic differences among populations is one of the most important issues in population genetics. Genetic variations, e.g., single nucleotide polymorphisms, are used to characterize commonality and difference of individuals from various populations. This paper presents an efficient graph-based clustering framework which operates iteratively on the Neighbor-Joining (NJ) tree called the iNJclust algorithm. The framework uses well-known genetic measurements, namely the allele-sharing distance, the neighbor-joining tree, and the fixation index. The behavior of the fixation index is utilized in the algorithm's stopping criterion. The algorithm provides an estimated number of populations, individual assignments, and relationships between populations as outputs. The clustering result is reported in the form of a binary tree, whose terminal nodes represent the final inferred populations and the tree structure preserves the genetic relationships among them. The clustering performance and the robustness of the proposed algorithm are tested extensively using simulated and real data sets from bovine, sheep, and human populations. The result indicates that the number of populations within each data set is reasonably estimated, the individual assignment is robust, and the structure of the inferred population tree corresponds to the intrinsic relationships among populations within the data.
Performance analysis of a dual-tree algorithm for computing spatial distance histograms

PubMed Central

Chen, Shaoping; Tu, Yi-Cheng; Xia, Yuni

2011-01-01

Many scientific and engineering fields produce large volume of spatiotemporal data. The storage, retrieval, and analysis of such data impose great challenges to database systems design. Analysis of scientific spatiotemporal data often involves computing functions of all point-to-point interactions. One such analytics, the Spatial Distance Histogram (SDH), is of vital importance to scientific discovery. Recently, algorithms for efficient SDH processing in large-scale scientific databases have been proposed. These algorithms adopt a recursive tree-traversing strategy to process point-to-point distances in the visited tree nodes in batches, thus require less time when compared to the brute-force approach where all pairwise distances have to be computed. Despite the promising experimental results, the complexity of such algorithms has not been thoroughly studied. In this paper, we present an analysis of such algorithms based on a geometric modeling approach. The main technique is to transform the analysis of point counts into a problem of quantifying the area of regions where pairwise distances can be processed in batches by the algorithm. From the analysis, we conclude that the number of pairwise distances that are left to be processed decreases exponentially with more levels of the tree visited. This leads to the proof of a time complexity lower than the quadratic time needed for a brute-force algorithm and builds the foundation for a constant-time approximate algorithm. Our model is also general in that it works for a wide range of point spatial distributions, histogram types, and space-partitioning options in building the tree. PMID:21804753
Efficient FPT Algorithms for (Strict) Compatibility of Unrooted Phylogenetic Trees.

PubMed

Baste, Julien; Paul, Christophe; Sau, Ignasi; Scornavacca, Celine

2017-04-01

In phylogenetics, a central problem is to infer the evolutionary relationships between a set of species X; these relationships are often depicted via a phylogenetic tree-a tree having its leaves labeled bijectively by elements of X and without degree-2 nodes-called the "species tree." One common approach for reconstructing a species tree consists in first constructing several phylogenetic trees from primary data (e.g., DNA sequences originating from some species in X), and then constructing a single phylogenetic tree maximizing the "concordance" with the input trees. The obtained tree is our estimation of the species tree and, when the input trees are defined on overlapping-but not identical-sets of labels, is called "supertree." In this paper, we focus on two problems that are central when combining phylogenetic trees into a supertree: the compatibility and the strict compatibility problems for unrooted phylogenetic trees. These problems are strongly related, respectively, to the notions of "containing as a minor" and "containing as a topological minor" in the graph community. Both problems are known to be fixed parameter tractable in the number of input trees k, by using their expressibility in monadic second-order logic and a reduction to graphs of bounded treewidth. Motivated by the fact that the dependency on k of these algorithms is prohibitively large, we give the first explicit dynamic programming algorithms for solving these problems, both running in time [Formula: see text], where n is the total size of the input.
An algorithm to count the number of repeated patient data entries with B tree.

PubMed

Okada, M; Okada, M

1985-04-01

An algorithm to obtain the number of different values that appear a specified number of times in a given data field of a given data file is presented. Basically, a well-known B-tree structure is employed in this study. Some modifications were made to the basic B-tree algorithm. The first step of the modifications is to allow a data item whose values are not necessary distinct from one record to another to be used as a primary key. When a key value is inserted, the number of previous appearances is counted. At the end of all the insertions, the number of key values which are unique in the tree, the number of key values which appear twice, three times, and so forth are obtained. This algorithm is especially powerful for a large size file in disk storage.
Wavelet tree structure based speckle noise removal for optical coherence tomography

NASA Astrophysics Data System (ADS)

Yuan, Xin; Liu, Xuan; Liu, Yang

2018-02-01

We report a new speckle noise removal algorithm in optical coherence tomography (OCT). Though wavelet domain thresholding algorithms have demonstrated superior advantages in suppressing noise magnitude and preserving image sharpness in OCT, the wavelet tree structure has not been investigated in previous applications. In this work, we propose an adaptive wavelet thresholding algorithm via exploiting the tree structure in wavelet coefficients to remove the speckle noise in OCT images. The threshold for each wavelet band is adaptively selected following a special rule to retain the structure of the image across different wavelet layers. Our results demonstrate that the proposed algorithm outperforms conventional wavelet thresholding, with significant advantages in preserving image features.
Using trees to compute approximate solutions to ordinary differential equations exactly

NASA Technical Reports Server (NTRS)

Grossman, Robert

1991-01-01

Some recent work is reviewed which relates families of trees to symbolic algorithms for the exact computation of series which approximate solutions of ordinary differential equations. It turns out that the vector space whose basis is the set of finite, rooted trees carries a natural multiplication related to the composition of differential operators, making the space of trees an algebra. This algebraic structure can be exploited to yield a variety of algorithms for manipulating vector fields and the series and algebras they generate.
Functional grouping of similar genes using eigenanalysis on minimum spanning tree based neighborhood graph.

PubMed

Jothi, R; Mohanty, Sraban Kumar; Ojha, Aparajita

2016-04-01

Gene expression data clustering is an important biological process in DNA microarray analysis. Although there have been many clustering algorithms for gene expression analysis, finding a suitable and effective clustering algorithm is always a challenging problem due to the heterogeneous nature of gene profiles. Minimum Spanning Tree (MST) based clustering algorithms have been successfully employed to detect clusters of varying shapes and sizes. This paper proposes a novel clustering algorithm using Eigenanalysis on Minimum Spanning Tree based neighborhood graph (E-MST). As MST of a set of points reflects the similarity of the points with their neighborhood, the proposed algorithm employs a similarity graph obtained from k(') rounds of MST (k(')-MST neighborhood graph). By studying the spectral properties of the similarity matrix obtained from k(')-MST graph, the proposed algorithm achieves improved clustering results. We demonstrate the efficacy of the proposed algorithm on 12 gene expression datasets. Experimental results show that the proposed algorithm performs better than the standard clustering algorithms. Copyright © 2016 Elsevier Ltd. All rights reserved.

Improved quantum backtracking algorithms using effective resistance estimates

NASA Astrophysics Data System (ADS)

Jarret, Michael; Wan, Kianna

2018-02-01

We investigate quantum backtracking algorithms of the type introduced by Montanaro (Montanaro, arXiv:1509.02374). These algorithms explore trees of unknown structure and in certain settings exponentially outperform their classical counterparts. Some of the previous work focused on obtaining a quantum advantage for trees in which a unique marked vertex is promised to exist. We remove this restriction by recharacterizing the problem in terms of the effective resistance of the search space. In this paper, we present a generalization of one of Montanaro's algorithms to trees containing k marked vertices, where k is not necessarily known a priori. Our approach involves using amplitude estimation to determine a near-optimal weighting of a diffusion operator, which can then be applied to prepare a superposition state with support only on marked vertices and ancestors thereof. By repeatedly sampling this state and updating the input vertex, a marked vertex is reached in a logarithmic number of steps. The algorithm thereby achieves the conjectured bound of O ˜(√{T Rmax }) for finding a single marked vertex and O ˜(k √{T Rmax }) for finding all k marked vertices, where T is an upper bound on the tree size and Rmax is the maximum effective resistance encountered by the algorithm. This constitutes a speedup over Montanaro's original procedure in both the case of finding one and the case of finding multiple marked vertices in an arbitrary tree.
Hierarchical Learning of Tree Classifiers for Large-Scale Plant Species Identification.

PubMed

Fan, Jianping; Zhou, Ning; Peng, Jinye; Gao, Ling

2015-11-01

In this paper, a hierarchical multi-task structural learning algorithm is developed to support large-scale plant species identification, where a visual tree is constructed for organizing large numbers of plant species in a coarse-to-fine fashion and determining the inter-related learning tasks automatically. For a given parent node on the visual tree, it contains a set of sibling coarse-grained categories of plant species or sibling fine-grained plant species, and a multi-task structural learning algorithm is developed to train their inter-related classifiers jointly for enhancing their discrimination power. The inter-level relationship constraint, e.g., a plant image must first be assigned to a parent node (high-level non-leaf node) correctly if it can further be assigned to the most relevant child node (low-level non-leaf node or leaf node) on the visual tree, is formally defined and leveraged to learn more discriminative tree classifiers over the visual tree. Our experimental results have demonstrated the effectiveness of our hierarchical multi-task structural learning algorithm on training more discriminative tree classifiers for large-scale plant species identification.
A fast bottom-up algorithm for computing the cut sets of noncoherent fault trees

DOE Office of Scientific and Technical Information (OSTI.GOV)

Corynen, G.C.

1987-11-01

An efficient procedure for finding the cut sets of large fault trees has been developed. Designed to address coherent or noncoherent systems, dependent events, shared or common-cause events, the method - called SHORTCUT - is based on a fast algorithm for transforming a noncoherent tree into a quasi-coherent tree (COHERE), and on a new algorithm for reducing cut sets (SUBSET). To assure sufficient clarity and precision, the procedure is discussed in the language of simple sets, which is also developed in this report. Although the new method has not yet been fully implemented on the computer, we report theoretical worst-casemore » estimates of its computational complexity. 12 refs., 10 figs.« less
Learning Extended Finite State Machines

NASA Technical Reports Server (NTRS)

Cassel, Sofia; Howar, Falk; Jonsson, Bengt; Steffen, Bernhard

2014-01-01

We present an active learning algorithm for inferring extended finite state machines (EFSM)s, combining data flow and control behavior. Key to our learning technique is a novel learning model based on so-called tree queries. The learning algorithm uses the tree queries to infer symbolic data constraints on parameters, e.g., sequence numbers, time stamps, identifiers, or even simple arithmetic. We describe sufficient conditions for the properties that the symbolic constraints provided by a tree query in general must have to be usable in our learning model. We have evaluated our algorithm in a black-box scenario, where tree queries are realized through (black-box) testing. Our case studies include connection establishment in TCP and a priority queue from the Java Class Library.
RS-Forest: A Rapid Density Estimator for Streaming Anomaly Detection.

PubMed

Wu, Ke; Zhang, Kun; Fan, Wei; Edwards, Andrea; Yu, Philip S

Anomaly detection in streaming data is of high interest in numerous application domains. In this paper, we propose a novel one-class semi-supervised algorithm to detect anomalies in streaming data. Underlying the algorithm is a fast and accurate density estimator implemented by multiple fully randomized space trees (RS-Trees), named RS-Forest. The piecewise constant density estimate of each RS-tree is defined on the tree node into which an instance falls. Each incoming instance in a data stream is scored by the density estimates averaged over all trees in the forest. Two strategies, statistical attribute range estimation of high probability guarantee and dual node profiles for rapid model update, are seamlessly integrated into RS-Forest to systematically address the ever-evolving nature of data streams. We derive the theoretical upper bound for the proposed algorithm and analyze its asymptotic properties via bias-variance decomposition. Empirical comparisons to the state-of-the-art methods on multiple benchmark datasets demonstrate that the proposed method features high detection rate, fast response, and insensitivity to most of the parameter settings. Algorithm implementations and datasets are available upon request.
RS-Forest: A Rapid Density Estimator for Streaming Anomaly Detection

PubMed Central

Wu, Ke; Zhang, Kun; Fan, Wei; Edwards, Andrea; Yu, Philip S.

2015-01-01

Anomaly detection in streaming data is of high interest in numerous application domains. In this paper, we propose a novel one-class semi-supervised algorithm to detect anomalies in streaming data. Underlying the algorithm is a fast and accurate density estimator implemented by multiple fully randomized space trees (RS-Trees), named RS-Forest. The piecewise constant density estimate of each RS-tree is defined on the tree node into which an instance falls. Each incoming instance in a data stream is scored by the density estimates averaged over all trees in the forest. Two strategies, statistical attribute range estimation of high probability guarantee and dual node profiles for rapid model update, are seamlessly integrated into RS-Forest to systematically address the ever-evolving nature of data streams. We derive the theoretical upper bound for the proposed algorithm and analyze its asymptotic properties via bias-variance decomposition. Empirical comparisons to the state-of-the-art methods on multiple benchmark datasets demonstrate that the proposed method features high detection rate, fast response, and insensitivity to most of the parameter settings. Algorithm implementations and datasets are available upon request. PMID:25685112
GRAPE-6A: A Single-Card GRAPE-6 for Parallel PC-GRAPE Cluster Systems

NASA Astrophysics Data System (ADS)

Fukushige, Toshiyuki; Makino, Junichiro; Kawai, Atsushi

2005-12-01

In this paper, we describe the design and performance of GRAPE-6A, a special-purpose computer for gravitational many-body simulations. It was designed to be used with a PC cluster, in which each node has one GRAPE-6A. Such a configuration is particularly cost-effective in running parallel tree algorithms. Though the use of parallel tree algorithms was possible with the original GRAPE-6 hardware, it was not very cost-effective since a single GRAPE-6 board was still too fast and too expensive. Therefore, we designed GRAPE-6A as a single PCI card to minimize the reproduction cost and to optimize the computing speed. The peak performance is 130 Gflops for one GRAPE-6A board and 3.1 Tflops for our 24 node cluster. We describe the implementation of the tree, TreePM and individual timestep algorithms on both a single GRAPE-6A system and GRAPE-6A cluster. Using the tree algorithm on our 16-node GRAPE-6A system, we can complete a collisionless simulation with 100 million particles (8000 steps) within 10 days.
An Extension of CART's Pruning Algorithm. Program Statistics Research Technical Report No. 91-11.

ERIC Educational Resources Information Center

Kim, Sung-Ho

Among the computer-based methods used for the construction of trees such as AID, THAID, CART, and FACT, the only one that uses an algorithm that first grows a tree and then prunes the tree is CART. The pruning component of CART is analogous in spirit to the backward elimination approach in regression analysis. This idea provides a tool in…
Automatic creation of object hierarchies for ray tracing

NASA Technical Reports Server (NTRS)

Goldsmith, Jeffrey; Salmon, John

1987-01-01

Various methods for evaluating generated trees are proposed. The use of the hierarchical extent method of Rubin and Whitted (1980) to find the objects that will be hit by a ray is examined. This method employs tree searching; the construction of a tree of bounding volumes in order to determine the number of objects that will be hit by a ray is discussed. A tree generation algorithm, which uses a heuristic tree search strategy, is described. The effects of shuffling and sorting on the input data are investigated. The cost of inserting an object into the hierarchy during the construction of a tree algorithm is estimated. The steps involved in estimating the number of intersection calculations are presented.
Integrated Network Decompositions and Dynamic Programming for Graph Optimization (INDDGO)

DOE Office of Scientific and Technical Information (OSTI.GOV)

The INDDGO software package offers a set of tools for finding exact solutions to graph optimization problems via tree decompositions and dynamic programming algorithms. Currently the framework offers serial and parallel (distributed memory) algorithms for finding tree decompositions and solving the maximum weighted independent set problem. The parallel dynamic programming algorithm is implemented on top of the MADNESS task-based runtime.
Learning accurate very fast decision trees from uncertain data streams

NASA Astrophysics Data System (ADS)

Liang, Chunquan; Zhang, Yang; Shi, Peng; Hu, Zhengguo

2015-12-01

Most existing works on data stream classification assume the streaming data is precise and definite. Such assumption, however, does not always hold in practice, since data uncertainty is ubiquitous in data stream applications due to imprecise measurement, missing values, privacy protection, etc. The goal of this paper is to learn accurate decision tree models from uncertain data streams for classification analysis. On the basis of very fast decision tree (VFDT) algorithms, we proposed an algorithm for constructing an uncertain VFDT tree with classifiers at tree leaves (uVFDTc). The uVFDTc algorithm can exploit uncertain information effectively and efficiently in both the learning and the classification phases. In the learning phase, it uses Hoeffding bound theory to learn from uncertain data streams and yield fast and reasonable decision trees. In the classification phase, at tree leaves it uses uncertain naive Bayes (UNB) classifiers to improve the classification performance. Experimental results on both synthetic and real-life datasets demonstrate the strong ability of uVFDTc to classify uncertain data streams. The use of UNB at tree leaves has improved the performance of uVFDTc, especially the any-time property, the benefit of exploiting uncertain information, and the robustness against uncertainty.
An Improved Binary Differential Evolution Algorithm to Infer Tumor Phylogenetic Trees.

PubMed

Liang, Ying; Liao, Bo; Zhu, Wen

2017-01-01

Tumourigenesis is a mutation accumulation process, which is likely to start with a mutated founder cell. The evolutionary nature of tumor development makes phylogenetic models suitable for inferring tumor evolution through genetic variation data. Copy number variation (CNV) is the major genetic marker of the genome with more genes, disease loci, and functional elements involved. Fluorescence in situ hybridization (FISH) accurately measures multiple gene copy number of hundreds of single cells. We propose an improved binary differential evolution algorithm, BDEP, to infer tumor phylogenetic tree based on FISH platform. The topology analysis of tumor progression tree shows that the pathway of tumor subcell expansion varies greatly during different stages of tumor formation. And the classification experiment shows that tree-based features are better than data-based features in distinguishing tumor. The constructed phylogenetic trees have great performance in characterizing tumor development process, which outperforms other similar algorithms.
Enumeration of spanning trees in planar unclustered networks

NASA Astrophysics Data System (ADS)

Xiao, Yuzhi; Zhao, Haixing; Hu, Guona; Ma, Xiujuan

2014-07-01

Among a variety of subgraphs, spanning trees are one of the most important and fundamental categories. They are relevant to diverse aspects of networks, including reliability, transport, self-organized criticality, loop-erased random walks and so on. In this paper, we introduce a family of modular, self-similar planar networks with zero clustering. Relevant properties of this family are comparable to those networks associated with technological systems having low clustering, like power grids, some electronic circuits, the Internet and some biological systems. So, it is very significant to research on spanning trees of planar networks. However, for a large network, evaluating the relevant determinant is intractable. In this paper, we propose a fairly generic linear algorithm for counting the number of spanning trees of a planar network. Using the algorithm, we derive analytically the exact numbers of spanning trees in planar networks. Our result shows that the computational complexity is O(t) , which is better than that of the matrix tree theorem with O(m2t2) , where t is the number of steps and m is the girth of the planar network. We also obtain the entropy for the spanning trees of a given planar network. We find that the entropy of spanning trees in the studied network is small, which is in sharp contrast to the previous result for planar networks with the same average degree. We also determine an upper bound and a lower bound for the numbers of spanning trees in the family of planar networks by the algorithm. As another application of the algorithm, we give a formula for the number of spanning trees in an outerplanar network with small-world features.
Implementation of Data Mining to Analyze Drug Cases Using C4.5 Decision Tree

NASA Astrophysics Data System (ADS)

Wahyuni, Sri

2018-03-01

Data mining was the process of finding useful information from a large set of databases. One of the existing techniques in data mining was classification. The method used was decision tree method and algorithm used was C4.5 algorithm. The decision tree method was a method that transformed a very large fact into a decision tree which was presenting the rules. Decision tree method was useful for exploring data, as well as finding a hidden relationship between a number of potential input variables with a target variable. The decision tree of the C4.5 algorithm was constructed with several stages including the selection of attributes as roots, created a branch for each value and divided the case into the branch. These stages would be repeated for each branch until all the cases on the branch had the same class. From the solution of the decision tree there would be some rules of a case. In this case the researcher classified the data of prisoners at Labuhan Deli prison to know the factors of detainees committing criminal acts of drugs. By applying this C4.5 algorithm, then the knowledge was obtained as information to minimize the criminal acts of drugs. From the findings of the research, it was found that the most influential factor of the detainee committed the criminal act of drugs was from the address variable.
Live phylogeny with polytomies: Finding the most compact parsimonious trees.

PubMed

Papamichail, D; Huang, A; Kennedy, E; Ott, J-L; Miller, A; Papamichail, G

2017-08-01

Construction of phylogenetic trees has traditionally focused on binary trees where all species appear on leaves, a problem for which numerous efficient solutions have been developed. Certain application domains though, such as viral evolution and transmission, paleontology, linguistics, and phylogenetic stemmatics, often require phylogeny inference that involves placing input species on ancestral tree nodes (live phylogeny), and polytomies. These requirements, despite their prevalence, lead to computationally harder algorithmic solutions and have been sparsely examined in the literature to date. In this article we prove some unique properties of most parsimonious live phylogenetic trees with polytomies, and their mapping to traditional binary phylogenetic trees. We show that our problem reduces to finding the most compact parsimonious tree for n species, and describe a novel efficient algorithm to find such trees without resorting to exhaustive enumeration of all possible tree topologies. Copyright © 2017 Elsevier Ltd. All rights reserved.
Blooming Trees: Substructures and Surrounding Groups of Galaxy Clusters

NASA Astrophysics Data System (ADS)

Yu, Heng; Diaferio, Antonaldo; Serra, Ana Laura; Baldi, Marco

2018-06-01

We develop the Blooming Tree Algorithm, a new technique that uses spectroscopic redshift data alone to identify the substructures and the surrounding groups of galaxy clusters, along with their member galaxies. Based on the estimated binding energy of galaxy pairs, the algorithm builds a binary tree that hierarchically arranges all of the galaxies in the field of view. The algorithm searches for buds, corresponding to gravitational potential minima on the binary tree branches; for each bud, the algorithm combines the number of galaxies, their velocity dispersion, and their average pairwise distance into a parameter that discriminates between the buds that do not correspond to any substructure or group, and thus eventually die, and the buds that correspond to substructures and groups, and thus bloom into the identified structures. We test our new algorithm with a sample of 300 mock redshift surveys of clusters in different dynamical states; the clusters are extracted from a large cosmological N-body simulation of a ΛCDM model. We limit our analysis to substructures and surrounding groups identified in the simulation with mass larger than 1013 h ‑1 M ⊙. With mock redshift surveys with 200 galaxies within 6 h ‑1 Mpc from the cluster center, the technique recovers 80% of the real substructures and 60% of the surrounding groups; in 57% of the identified structures, at least 60% of the member galaxies of the substructures and groups belong to the same real structure. These results improve by roughly a factor of two the performance of the best substructure identification algorithm currently available, the σ plateau algorithm, and suggest that our Blooming Tree Algorithm can be an invaluable tool for detecting substructures of galaxy clusters and investigating their complex dynamics.
Adversarial search by evolutionary computation.

PubMed

Hong, T P; Huang, K Y; Lin, W Y

2001-01-01

In this paper, we consider the problem of finding good next moves in two-player games. Traditional search algorithms, such as minimax and alpha-beta pruning, suffer great temporal and spatial expansion when exploring deeply into search trees to find better next moves. The evolution of genetic algorithms with the ability to find global or near global optima in limited time seems promising, but they are inept at finding compound optima, such as the minimax in a game-search tree. We thus propose a new genetic algorithm-based approach that can find a good next move by reserving the board evaluation values of new offspring in a partial game-search tree. Experiments show that solution accuracy and search speed are greatly improved by our algorithm.
Using laser altimetry-based segmentation to refine automated tree identification in managed forests of the Black Hills, South Dakota

Treesearch

Eric Rowell; Carl Selelstad; Lee Vierling; Lloyd Queen; Wayne Sheppard

2006-01-01

The success of a local maximum (LM) tree detection algorithm for detecting individual trees from lidar data depends on stand conditions that are often highly variable. A laser height variance and percent canopy cover (PCC) classification is used to segment the landscape by stand condition prior to stem detection. We test the performance of the LM algorithm using canopy...
3D Forest: An application for descriptions of three-dimensional forest structures using terrestrial LiDAR

PubMed Central

Krůček, Martin; Vrška, Tomáš; Král, Kamil

2017-01-01

Terrestrial laser scanning is a powerful technology for capturing the three-dimensional structure of forests with a high level of detail and accuracy. Over the last decade, many algorithms have been developed to extract various tree parameters from terrestrial laser scanning data. Here we present 3D Forest, an open-source non-platform-specific software application with an easy-to-use graphical user interface with the compilation of algorithms focused on the forest environment and extraction of tree parameters. The current version (0.42) extracts important parameters of forest structure from the terrestrial laser scanning data, such as stem positions (X, Y, Z), tree heights, diameters at breast height (DBH), as well as more advanced parameters such as tree planar projections, stem profiles or detailed crown parameters including convex and concave crown surface and volume. Moreover, 3D Forest provides quantitative measures of between-crown interactions and their real arrangement in 3D space. 3D Forest also includes an original algorithm of automatic tree segmentation and crown segmentation. Comparison with field data measurements showed no significant difference in measuring DBH or tree height using 3D Forest, although for DBH only the Randomized Hough Transform algorithm proved to be sufficiently resistant to noise and provided results comparable to traditional field measurements. PMID:28472167
EDNA: Expert fault digraph analysis using CLIPS

NASA Technical Reports Server (NTRS)

Dixit, Vishweshwar V.

1990-01-01

Traditionally fault models are represented by trees. Recently, digraph models have been proposed (Sack). Digraph models closely imitate the real system dependencies and hence are easy to develop, validate and maintain. However, they can also contain directed cycles and analysis algorithms are hard to find. Available algorithms tend to be complicated and slow. On the other hand, the tree analysis (VGRH, Tayl) is well understood and rooted in vast research effort and analytical techniques. The tree analysis algorithms are sophisticated and orders of magnitude faster. Transformation of a digraph (cyclic) into trees (CLP, LP) is a viable approach to blend the advantages of the representations. Neither the digraphs nor the trees provide the ability to handle heuristic knowledge. An expert system, to capture the engineering knowledge, is essential. We propose an approach here, namely, expert network analysis. We combine the digraph representation and tree algorithms. The models are augmented by probabilistic and heuristic knowledge. CLIPS, an expert system shell from NASA-JSC will be used to develop a tool. The technique provides the ability to handle probabilities and heuristic knowledge. Mixed analysis, some nodes with probabilities, is possible. The tool provides graphics interface for input, query, and update. With the combined approach it is expected to be a valuable tool in the design process as well in the capture of final design knowledge.

Applying Data Mining Techniques to Extract Hidden Patterns about Breast Cancer Survival in an Iranian Cohort Study.

PubMed

Khalkhali, Hamid Reza; Lotfnezhad Afshar, Hadi; Esnaashari, Omid; Jabbari, Nasrollah

2016-01-01

Breast cancer survival has been analyzed by many standard data mining algorithms. A group of these algorithms belonged to the decision tree category. Ability of the decision tree algorithms in terms of visualizing and formulating of hidden patterns among study variables were main reasons to apply an algorithm from the decision tree category in the current study that has not studied already. The classification and regression trees (CART) was applied to a breast cancer database contained information on 569 patients in 2007-2010. The measurement of Gini impurity used for categorical target variables was utilized. The classification error that is a function of tree size was measured by 10-fold cross-validation experiments. The performance of created model was evaluated by the criteria as accuracy, sensitivity and specificity. The CART model produced a decision tree with 17 nodes, 9 of which were associated with a set of rules. The rules were meaningful clinically. They showed in the if-then format that Stage was the most important variable for predicting breast cancer survival. The scores of accuracy, sensitivity and specificity were: 80.3%, 93.5% and 53%, respectively. The current study model as the first one created by the CART was able to extract useful hidden rules from a relatively small size dataset.
Creating ensembles of oblique decision trees with evolutionary algorithms and sampling

DOEpatents

Cantu-Paz, Erick [Oakland, CA; Kamath, Chandrika [Tracy, CA

2006-06-13

A decision tree system that is part of a parallel object-oriented pattern recognition system, which in turn is part of an object oriented data mining system. A decision tree process includes the step of reading the data. If necessary, the data is sorted. A potential split of the data is evaluated according to some criterion. An initial split of the data is determined. The final split of the data is determined using evolutionary algorithms and statistical sampling techniques. The data is split. Multiple decision trees are combined in ensembles.
An Adaptive Clustering Approach Based on Minimum Travel Route Planning for Wireless Sensor Networks with a Mobile Sink.

PubMed

Tang, Jiqiang; Yang, Wu; Zhu, Lingyun; Wang, Dong; Feng, Xin

2017-04-26

In recent years, Wireless Sensor Networks with a Mobile Sink (WSN-MS) have been an active research topic due to the widespread use of mobile devices. However, how to get the balance between data delivery latency and energy consumption becomes a key issue of WSN-MS. In this paper, we study the clustering approach by jointly considering the Route planning for mobile sink and Clustering Problem (RCP) for static sensor nodes. We solve the RCP problem by using the minimum travel route clustering approach, which applies the minimum travel route of the mobile sink to guide the clustering process. We formulate the RCP problem as an Integer Non-Linear Programming (INLP) problem to shorten the travel route of the mobile sink under three constraints: the communication hops constraint, the travel route constraint and the loop avoidance constraint. We then propose an Imprecise Induction Algorithm (IIA) based on the property that the solution with a small hop count is more feasible than that with a large hop count. The IIA algorithm includes three processes: initializing travel route planning with a Traveling Salesman Problem (TSP) algorithm, transforming the cluster head to a cluster member and transforming the cluster member to a cluster head. Extensive experimental results show that the IIA algorithm could automatically adjust cluster heads according to the maximum hops parameter and plan a shorter travel route for the mobile sink. Compared with the Shortest Path Tree-based Data-Gathering Algorithm (SPT-DGA), the IIA algorithm has the characteristics of shorter route length, smaller cluster head count and faster convergence rate.
Traveling front solutions to directed diffusion-limited aggregation, digital search trees, and the Lempel-Ziv data compression algorithm.

PubMed

Majumdar, Satya N

2003-08-01

We use the traveling front approach to derive exact asymptotic results for the statistics of the number of particles in a class of directed diffusion-limited aggregation models on a Cayley tree. We point out that some aspects of these models are closely connected to two different problems in computer science, namely, the digital search tree problem in data structures and the Lempel-Ziv algorithm for data compression. The statistics of the number of particles studied here is related to the statistics of height in digital search trees which, in turn, is related to the statistics of the length of the longest word formed by the Lempel-Ziv algorithm. Implications of our results to these computer science problems are pointed out.
Traveling front solutions to directed diffusion-limited aggregation, digital search trees, and the Lempel-Ziv data compression algorithm

NASA Astrophysics Data System (ADS)

Majumdar, Satya N.

2003-08-01

We use the traveling front approach to derive exact asymptotic results for the statistics of the number of particles in a class of directed diffusion-limited aggregation models on a Cayley tree. We point out that some aspects of these models are closely connected to two different problems in computer science, namely, the digital search tree problem in data structures and the Lempel-Ziv algorithm for data compression. The statistics of the number of particles studied here is related to the statistics of height in digital search trees which, in turn, is related to the statistics of the length of the longest word formed by the Lempel-Ziv algorithm. Implications of our results to these computer science problems are pointed out.
A Semi-Automated Machine Learning Algorithm for Tree Cover Delineation from 1-m Naip Imagery Using a High Performance Computing Architecture

NASA Astrophysics Data System (ADS)

Basu, S.; Ganguly, S.; Nemani, R. R.; Mukhopadhyay, S.; Milesi, C.; Votava, P.; Michaelis, A.; Zhang, G.; Cook, B. D.; Saatchi, S. S.; Boyda, E.

2014-12-01

Accurate tree cover delineation is a useful instrument in the derivation of Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) satellite imagery data. Numerous algorithms have been designed to perform tree cover delineation in high to coarse resolution satellite imagery, but most of them do not scale to terabytes of data, typical in these VHR datasets. In this paper, we present an automated probabilistic framework for the segmentation and classification of 1-m VHR data as obtained from the National Agriculture Imagery Program (NAIP) for deriving tree cover estimates for the whole of Continental United States, using a High Performance Computing Architecture. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field (CRF), which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by incorporating expert knowledge through the relabeling of misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the state of California, which covers a total of 11,095 NAIP tiles and spans a total geographical area of 163,696 sq. miles. Our framework produced correct detection rates of around 85% for fragmented forests and 70% for urban tree cover areas, with false positive rates lower than 3% for both regions. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR high-resolution canopy height model shows the effectiveness of our algorithm in generating accurate high-resolution tree cover maps.
A New Algorithm Using the Non-Dominated Tree to Improve Non-Dominated Sorting.

PubMed

Gustavsson, Patrik; Syberfeldt, Anna

2018-01-01

Non-dominated sorting is a technique often used in evolutionary algorithms to determine the quality of solutions in a population. The most common algorithm is the Fast Non-dominated Sort (FNS). This algorithm, however, has the drawback that its performance deteriorates when the population size grows. The same drawback applies also to other non-dominating sorting algorithms such as the Efficient Non-dominated Sort with Binary Strategy (ENS-BS). An algorithm suggested to overcome this drawback is the Divide-and-Conquer Non-dominated Sort (DCNS) which works well on a limited number of objectives but deteriorates when the number of objectives grows. This article presents a new, more efficient algorithm called the Efficient Non-dominated Sort with Non-Dominated Tree (ENS-NDT). ENS-NDT is an extension of the ENS-BS algorithm and uses a novel Non-Dominated Tree (NDTree) to speed up the non-dominated sorting. ENS-NDT is able to handle large population sizes and a large number of objectives more efficiently than existing algorithms for non-dominated sorting. In the article, it is shown that with ENS-NDT the runtime of multi-objective optimization algorithms such as the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) can be substantially reduced.
New Hybrid Algorithms for Estimating Tree Stem Diameters at Breast Height Using a Two Dimensional Terrestrial Laser Scanner

PubMed Central

Kong, Jianlei; Ding, Xiaokang; Liu, Jinhao; Yan, Lei; Wang, Jianli

2015-01-01

In this paper, a new algorithm to improve the accuracy of estimating diameter at breast height (DBH) for tree trunks in forest areas is proposed. First, the information is collected by a two-dimensional terrestrial laser scanner (2DTLS), which emits laser pulses to generate a point cloud. After extraction and filtration, the laser point clusters of the trunks are obtained, which are optimized by an arithmetic means method. Then, an algebraic circle fitting algorithm in polar form is non-linearly optimized by the Levenberg-Marquardt method to form a new hybrid algorithm, which is used to acquire the diameters and positions of the trees. Compared with previous works, this proposed method improves the accuracy of diameter estimation of trees significantly and effectively reduces the calculation time. Moreover, the experimental results indicate that this method is stable and suitable for the most challenging conditions, which has practical significance in improving the operating efficiency of forest harvester and reducing the risk of causing accidents. PMID:26147726
Efficient Delaunay Tessellation through K-D Tree Decomposition

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morozov, Dmitriy; Peterka, Tom

Delaunay tessellations are fundamental data structures in computational geometry. They are important in data analysis, where they can represent the geometry of a point set or approximate its density. The algorithms for computing these tessellations at scale perform poorly when the input data is unbalanced. We investigate the use of k-d trees to evenly distribute points among processes and compare two strategies for picking split points between domain regions. Because resulting point distributions no longer satisfy the assumptions of existing parallel Delaunay algorithms, we develop a new parallel algorithm that adapts to its input and prove its correctness. We evaluatemore » the new algorithm using two late-stage cosmology datasets. The new running times are up to 50 times faster using k-d tree compared with regular grid decomposition. Moreover, in the unbalanced data sets, decomposing the domain into a k-d tree is up to five times faster than decomposing it into a regular grid.« less
Recursive optimal pruning with applications to tree structured vector quantizers

NASA Technical Reports Server (NTRS)

Kiang, Shei-Zein; Baker, Richard L.; Sullivan, Gary J.; Chiu, Chung-Yen

1992-01-01

A pruning algorithm of Chou et al. (1989) for designing optimal tree structures identifies only those codebooks which lie on the convex hull of the original codebook's operational distortion rate function. The authors introduce a modified version of the original algorithm, which identifies a large number of codebooks having minimum average distortion, under the constraint that, in each step, only modes having no descendents are removed from the tree. All codebooks generated by the original algorithm are also generated by this algorithm. The new algorithm generates a much larger number of codebooks in the middle- and low-rate regions. The additional codebooks permit operation near the codebook's operational distortion rate function without time sharing by choosing from the increased number of available bit rates. Despite the statistical mismatch which occurs when coding data outside the training sequence, these pruned codebooks retain their performance advantage over full search vector quantizers (VQs) for a large range of rates.
A faster 1.375-approximation algorithm for sorting by transpositions.

PubMed

Cunha, Luís Felipe I; Kowada, Luis Antonio B; Hausen, Rodrigo de A; de Figueiredo, Celina M H

2015-11-01

Sorting by Transpositions is an NP-hard problem for which several polynomial-time approximation algorithms have been developed. Hartman and Shamir (2006) developed a 1.5-approximation [Formula: see text] algorithm, whose running time was improved to O(nlogn) by Feng and Zhu (2007) with a data structure they defined, the permutation tree. Elias and Hartman (2006) developed a 1.375-approximation O(n(2)) algorithm, and Firoz et al. (2011) claimed an improvement to the running time, from O(n(2)) to O(nlogn), by using the permutation tree. We provide counter-examples to the correctness of Firoz et al.'s strategy, showing that it is not possible to reach a component by sufficient extensions using the method proposed by them. In addition, we propose a 1.375-approximation algorithm, modifying Elias and Hartman's approach with the use of permutation trees and achieving O(nlogn) time.
Application of the pessimistic pruning to increase the accuracy of C4.5 algorithm in diagnosing chronic kidney disease

NASA Astrophysics Data System (ADS)

Muslim, M. A.; Herowati, A. J.; Sugiharti, E.; Prasetiyo, B.

2018-03-01

A technique to dig valuable information buried or hidden in data collection which is so big to be found an interesting patterns that was previously unknown is called data mining. Data mining has been applied in the healthcare industry. One technique used data mining is classification. The decision tree included in the classification of data mining and algorithm developed by decision tree is C4.5 algorithm. A classifier is designed using applying pessimistic pruning in C4.5 algorithm in diagnosing chronic kidney disease. Pessimistic pruning use to identify and remove branches that are not needed, this is done to avoid overfitting the decision tree generated by the C4.5 algorithm. In this paper, the result obtained using these classifiers are presented and discussed. Using pessimistic pruning shows increase accuracy of C4.5 algorithm of 1.5% from 95% to 96.5% in diagnosing of chronic kidney disease.
C-semiring Frameworks for Minimum Spanning Tree Problems

NASA Astrophysics Data System (ADS)

Bistarelli, Stefano; Santini, Francesco

In this paper we define general algebraic frameworks for the Minimum Spanning Tree problem based on the structure of c-semirings. We propose general algorithms that can compute such trees by following different cost criteria, which must be all specific instantiation of c-semirings. Our algorithms are extensions of well-known procedures, as Prim or Kruskal, and show the expressivity of these algebraic structures. They can deal also with partially-ordered costs on the edges.
Parallel peak pruning for scalable SMP contour tree computation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carr, Hamish A.; Weber, Gunther H.; Sewell, Christopher M.

As data sets grow to exascale, automated data analysis and visualisation are increasingly important, to intermediate human understanding and to reduce demands on disk storage via in situ analysis. Trends in architecture of high performance computing systems necessitate analysis algorithms to make effective use of combinations of massively multicore and distributed systems. One of the principal analytic tools is the contour tree, which analyses relationships between contours to identify features of more than local importance. Unfortunately, the predominant algorithms for computing the contour tree are explicitly serial, and founded on serial metaphors, which has limited the scalability of this formmore » of analysis. While there is some work on distributed contour tree computation, and separately on hybrid GPU-CPU computation, there is no efficient algorithm with strong formal guarantees on performance allied with fast practical performance. Here in this paper, we report the first shared SMP algorithm for fully parallel contour tree computation, withfor-mal guarantees of O(lgnlgt) parallel steps and O(n lgn) work, and implementations with up to 10x parallel speed up in OpenMP and up to 50x speed up in NVIDIA Thrust.« less
[Prediction of regional soil quality based on mutual information theory integrated with decision tree algorithm].

PubMed

Lin, Fen-Fang; Wang, Ke; Yang, Ning; Yan, Shi-Guang; Zheng, Xin-Yu

2012-02-01

In this paper, some main factors such as soil type, land use pattern, lithology type, topography, road, and industry type that affect soil quality were used to precisely obtain the spatial distribution characteristics of regional soil quality, mutual information theory was adopted to select the main environmental factors, and decision tree algorithm See 5.0 was applied to predict the grade of regional soil quality. The main factors affecting regional soil quality were soil type, land use, lithology type, distance to town, distance to water area, altitude, distance to road, and distance to industrial land. The prediction accuracy of the decision tree model with the variables selected by mutual information was obviously higher than that of the model with all variables, and, for the former model, whether of decision tree or of decision rule, its prediction accuracy was all higher than 80%. Based on the continuous and categorical data, the method of mutual information theory integrated with decision tree could not only reduce the number of input parameters for decision tree algorithm, but also predict and assess regional soil quality effectively.
Heterogeneous Compression of Large Collections of Evolutionary Trees.

PubMed

Matthews, Suzanne J

2015-01-01

Compressing heterogeneous collections of trees is an open problem in computational phylogenetics. In a heterogeneous tree collection, each tree can contain a unique set of taxa. An ideal compression method would allow for the efficient archival of large tree collections and enable scientists to identify common evolutionary relationships over disparate analyses. In this paper, we extend TreeZip to compress heterogeneous collections of trees. TreeZip is the most efficient algorithm for compressing homogeneous tree collections. To the best of our knowledge, no other domain-based compression algorithm exists for large heterogeneous tree collections or enable their rapid analysis. Our experimental results indicate that TreeZip averages 89.03 percent (72.69 percent) space savings on unweighted (weighted) collections of trees when the level of heterogeneity in a collection is moderate. The organization of the TRZ file allows for efficient computations over heterogeneous data. For example, consensus trees can be computed in mere seconds. Lastly, combining the TreeZip compressed (TRZ) file with general-purpose compression yields average space savings of 97.34 percent (81.43 percent) on unweighted (weighted) collections of trees. Our results lead us to believe that TreeZip will prove invaluable in the efficient archival of tree collections, and enables scientists to develop novel methods for relating heterogeneous collections of trees.
Exploiting the wavelet structure in compressed sensing MRI.

PubMed

Chen, Chen; Huang, Junzhou

2014-12-01

Sparsity has been widely utilized in magnetic resonance imaging (MRI) to reduce k-space sampling. According to structured sparsity theories, fewer measurements are required for tree sparse data than the data only with standard sparsity. Intuitively, more accurate image reconstruction can be achieved with the same number of measurements by exploiting the wavelet tree structure in MRI. A novel algorithm is proposed in this article to reconstruct MR images from undersampled k-space data. In contrast to conventional compressed sensing MRI (CS-MRI) that only relies on the sparsity of MR images in wavelet or gradient domain, we exploit the wavelet tree structure to improve CS-MRI. This tree-based CS-MRI problem is decomposed into three simpler subproblems then each of the subproblems can be efficiently solved by an iterative scheme. Simulations and in vivo experiments demonstrate the significant improvement of the proposed method compared to conventional CS-MRI algorithms, and the feasibleness on MR data compared to existing tree-based imaging algorithms. Copyright © 2014 Elsevier Inc. All rights reserved.
Data mining for multiagent rules, strategies, and fuzzy decision tree structure

NASA Astrophysics Data System (ADS)

Smith, James F., III; Rhyne, Robert D., II; Fisher, Kristin

2002-03-01

A fuzzy logic based resource manager (RM) has been developed that automatically allocates electronic attack resources in real-time over many dissimilar platforms. Two different data mining algorithms have been developed to determine rules, strategies, and fuzzy decision tree structure. The first data mining algorithm uses a genetic algorithm as a data mining function and is called from an electronic game. The game allows a human expert to play against the resource manager in a simulated battlespace with each of the defending platforms being exclusively directed by the fuzzy resource manager and the attacking platforms being controlled by the human expert or operating autonomously under their own logic. This approach automates the data mining problem. The game automatically creates a database reflecting the domain expert's knowledge. It calls a data mining function, a genetic algorithm, for data mining of the database as required and allows easy evaluation of the information mined in the second step. The criterion for re- optimization is discussed as well as experimental results. Then a second data mining algorithm that uses a genetic program as a data mining function is introduced to automatically discover fuzzy decision tree structures. Finally, a fuzzy decision tree generated through this process is discussed.
Determining Geometric Parameters of Agricultural Trees from Laser Scanning Data Obtained with Unmanned Aerial Vehicle

NASA Astrophysics Data System (ADS)

Hadas, E.; Jozkow, G.; Walicka, A.; Borkowski, A.

2018-05-01

The estimation of dendrometric parameters has become an important issue for agriculture planning and for the efficient management of orchards. Airborne Laser Scanning (ALS) data is widely used in forestry and many algorithms for automatic estimation of dendrometric parameters of individual forest trees were developed. Unfortunately, due to significant differences between forest and fruit trees, some contradictions exist against adopting the achievements of forestry science to agricultural studies indiscriminately. In this study we present the methodology to identify individual trees in apple orchard and estimate heights of individual trees, using high-density LiDAR data (3200 points/m2) obtained with Unmanned Aerial Vehicle (UAV) equipped with Velodyne HDL32-E sensor. The processing strategy combines the alpha-shape algorithm, principal component analysis (PCA) and detection of local minima. The alpha-shape algorithm is used to separate tree rows. In order to separate trees in a single row, we detect local minima on the canopy profile and slice polygons from alpha-shape results. We successfully separated 92 % of trees in the test area. 6 % of trees in orchard were not separated from each other and 2 % were sliced into two polygons. The RMSE of tree heights determined from the point clouds compared to field measurements was equal to 0.09 m, and the correlation coefficient was equal to 0.96. The results confirm the usefulness of LiDAR data from UAV platform in orchard inventory.
Method for estimating potential tree-grade distributions for northeastern forest species

Treesearch

Daniel A. Yaussy; Daniel A. Yaussy

1993-01-01

Generalized logistic regression was used to distribute trees into four potential tree grades for 20 northeastern species groups. The potential tree grade is defined as the tree grade based on the length and amount of clear cuttings and defects only, disregarding minimum grading diameter. The algorithms described use site index and tree diameter as the predictive...

Integrated Approach To Design And Analysis Of Systems

NASA Technical Reports Server (NTRS)

Patterson-Hine, F. A.; Iverson, David L.

1993-01-01

Object-oriented fault-tree representation unifies evaluation of reliability and diagnosis of faults. Programming/fault tree described more fully in "Object-Oriented Algorithm For Evaluation Of Fault Trees" (ARC-12731). Augmented fault tree object contains more information than fault tree object used in quantitative analysis of reliability. Additional information needed to diagnose faults in system represented by fault tree.
MDTS: automatic complex materials design using Monte Carlo tree search.

PubMed

M Dieb, Thaer; Ju, Shenghong; Yoshizoe, Kazuki; Hou, Zhufeng; Shiomi, Junichiro; Tsuda, Koji

2017-01-01

Complex materials design is often represented as a black-box combinatorial optimization problem. In this paper, we present a novel python library called MDTS (Materials Design using Tree Search). Our algorithm employs a Monte Carlo tree search approach, which has shown exceptional performance in computer Go game. Unlike evolutionary algorithms that require user intervention to set parameters appropriately, MDTS has no tuning parameters and works autonomously in various problems. In comparison to a Bayesian optimization package, our algorithm showed competitive search efficiency and superior scalability. We succeeded in designing large Silicon-Germanium (Si-Ge) alloy structures that Bayesian optimization could not deal with due to excessive computational cost. MDTS is available at https://github.com/tsudalab/MDTS.
MDTS: automatic complex materials design using Monte Carlo tree search

NASA Astrophysics Data System (ADS)

Dieb, Thaer M.; Ju, Shenghong; Yoshizoe, Kazuki; Hou, Zhufeng; Shiomi, Junichiro; Tsuda, Koji

2017-12-01

Complex materials design is often represented as a black-box combinatorial optimization problem. In this paper, we present a novel python library called MDTS (Materials Design using Tree Search). Our algorithm employs a Monte Carlo tree search approach, which has shown exceptional performance in computer Go game. Unlike evolutionary algorithms that require user intervention to set parameters appropriately, MDTS has no tuning parameters and works autonomously in various problems. In comparison to a Bayesian optimization package, our algorithm showed competitive search efficiency and superior scalability. We succeeded in designing large Silicon-Germanium (Si-Ge) alloy structures that Bayesian optimization could not deal with due to excessive computational cost. MDTS is available at https://github.com/tsudalab/MDTS.
A fast algorithm for identifying friends-of-friends halos

NASA Astrophysics Data System (ADS)

Feng, Y.; Modi, C.

2017-07-01

We describe a simple and fast algorithm for identifying friends-of-friends features and prove its correctness. The algorithm avoids unnecessary expensive neighbor queries, uses minimal memory overhead, and rejects slowdown in high over-density regions. We define our algorithm formally based on pair enumeration, a problem that has been heavily studied in fast 2-point correlation codes and our reference implementation employs a dual KD-tree correlation function code. We construct features in a hierarchical tree structure, and use a splay operation to reduce the average cost of identifying the root of a feature from O [ log L ] to O [ 1 ] (L is the size of a feature) without additional memory costs. This reduces the overall time complexity of merging trees from O [ L log L ] to O [ L ] , reducing the number of operations per splay by orders of magnitude. We next introduce a pruning operation that skips merge operations between two fully self-connected KD-tree nodes. This improves the robustness of the algorithm, reducing the number of merge operations in high density peaks from O [δ2 ] to O [ δ ] . We show that for cosmological data set the algorithm eliminates more than half of merge operations for typically used linking lengths b ∼ 0 . 2 (relative to mean separation). Furthermore, our algorithm is extremely simple and easy to implement on top of an existing pair enumeration code, reusing the optimization effort that has been invested in fast correlation function codes.
Tree-based solvers for adaptive mesh refinement code FLASH - I: gravity and optical depths

NASA Astrophysics Data System (ADS)

Wünsch, R.; Walch, S.; Dinnbier, F.; Whitworth, A.

2018-04-01

We describe an OctTree algorithm for the MPI parallel, adaptive mesh refinement code FLASH, which can be used to calculate the gas self-gravity, and also the angle-averaged local optical depth, for treating ambient diffuse radiation. The algorithm communicates to the different processors only those parts of the tree that are needed to perform the tree-walk locally. The advantage of this approach is a relatively low memory requirement, important in particular for the optical depth calculation, which needs to process information from many different directions. This feature also enables a general tree-based radiation transport algorithm that will be described in a subsequent paper, and delivers excellent scaling up to at least 1500 cores. Boundary conditions for gravity can be either isolated or periodic, and they can be specified in each direction independently, using a newly developed generalization of the Ewald method. The gravity calculation can be accelerated with the adaptive block update technique by partially re-using the solution from the previous time-step. Comparison with the FLASH internal multigrid gravity solver shows that tree-based methods provide a competitive alternative, particularly for problems with isolated or mixed boundary conditions. We evaluate several multipole acceptance criteria (MACs) and identify a relatively simple approximate partial error MAC which provides high accuracy at low computational cost. The optical depth estimates are found to agree very well with those of the RADMC-3D radiation transport code, with the tree-solver being much faster. Our algorithm is available in the standard release of the FLASH code in version 4.0 and later.
Polynomial algorithms for the Maximal Pairing Problem: efficient phylogenetic targeting on arbitrary trees

PubMed Central

2010-01-01

Background The Maximal Pairing Problem (MPP) is the prototype of a class of combinatorial optimization problems that are of considerable interest in bioinformatics: Given an arbitrary phylogenetic tree T and weights ωxy for the paths between any two pairs of leaves (x, y), what is the collection of edge-disjoint paths between pairs of leaves that maximizes the total weight? Special cases of the MPP for binary trees and equal weights have been described previously; algorithms to solve the general MPP are still missing, however. Results We describe a relatively simple dynamic programming algorithm for the special case of binary trees. We then show that the general case of multifurcating trees can be treated by interleaving solutions to certain auxiliary Maximum Weighted Matching problems with an extension of this dynamic programming approach, resulting in an overall polynomial-time solution of complexity (n4 log n) w.r.t. the number n of leaves. The source code of a C implementation can be obtained under the GNU Public License from http://www.bioinf.uni-leipzig.de/Software/Targeting. For binary trees, we furthermore discuss several constrained variants of the MPP as well as a partition function approach to the probabilistic version of the MPP. Conclusions The algorithms introduced here make it possible to solve the MPP also for large trees with high-degree vertices. This has practical relevance in the field of comparative phylogenetics and, for example, in the context of phylogenetic targeting, i.e., data collection with resource limitations. PMID:20525185
OCTGRAV: Sparse Octree Gravitational N-body Code on Graphics Processing Units

NASA Astrophysics Data System (ADS)

Gaburov, Evghenii; Bédorf, Jeroen; Portegies Zwart, Simon

2010-10-01

Octgrav is a very fast tree-code which runs on massively parallel Graphical Processing Units (GPU) with NVIDIA CUDA architecture. The algorithms are based on parallel-scan and sort methods. The tree-construction and calculation of multipole moments is carried out on the host CPU, while the force calculation which consists of tree walks and evaluation of interaction list is carried out on the GPU. In this way, a sustained performance of about 100GFLOP/s and data transfer rates of about 50GB/s is achieved. It takes about a second to compute forces on a million particles with an opening angle of heta approx 0.5. To test the performance and feasibility, we implemented the algorithms in CUDA in the form of a gravitational tree-code which completely runs on the GPU. The tree construction and traverse algorithms are portable to many-core devices which have support for CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second. The code has a convenient user interface and is freely available for use.
Phylogenetic Copy-Number Factorization of Multiple Tumor Samples.

PubMed

Zaccaria, Simone; El-Kebir, Mohammed; Klau, Gunnar W; Raphael, Benjamin J

2018-04-16

Cancer is an evolutionary process driven by somatic mutations. This process can be represented as a phylogenetic tree. Constructing such a phylogenetic tree from genome sequencing data is a challenging task due to the many types of mutations in cancer and the fact that nearly all cancer sequencing is of a bulk tumor, measuring a superposition of somatic mutations present in different cells. We study the problem of reconstructing tumor phylogenies from copy-number aberrations (CNAs) measured in bulk-sequencing data. We introduce the Copy-Number Tree Mixture Deconvolution (CNTMD) problem, which aims to find the phylogenetic tree with the fewest number of CNAs that explain the copy-number data from multiple samples of a tumor. We design an algorithm for solving the CNTMD problem and apply the algorithm to both simulated and real data. On simulated data, we find that our algorithm outperforms existing approaches that either perform deconvolution/factorization of mixed tumor samples or build phylogenetic trees assuming homogeneous tumor samples. On real data, we analyze multiple samples from a prostate cancer patient, identifying clones within these samples and a phylogenetic tree that relates these clones and their differing proportions across samples. This phylogenetic tree provides a higher resolution view of copy-number evolution of this cancer than published analyses.
Multi-hop path tracing of mobile robot with multi-range image

NASA Astrophysics Data System (ADS)

Choudhury, Ramakanta; Samal, Chandrakanta; Choudhury, Umakanta

2010-02-01

It is well known that image processing depends heavily upon image representation technique . This paper intends to find out the optimal path of mobile robots for a specified area where obstacles are predefined as well as modified. Here the optimal path is represented by using the Quad tree method. Since there has been rising interest in the use of quad tree, we have tried to use the successive subdivision of images into quadrants from which the quad tree is developed. In the quad tree, obstacles-free area and the partial filled area are represented with different notations. After development of quad tree the algorithm is used to find the optimal path by employing neighbor finding technique, with a view to move the robot from the source to destination. The algorithm, here , permeates through the entire tree, and tries to locate the common ancestor for computation. The computation and the algorithm, aim at easing the ability of the robot to trace the optimal path with the help of adjacencies between the neighboring nodes as well as determining such adjacencies in the horizontal, vertical and diagonal directions. In this paper efforts have been made to determine the movement of the adjacent block in the quad tree and to detect the transition between the blocks equal size and finally generate the result.
A hybrid 3D spatial access method based on quadtrees and R-trees for globe data

NASA Astrophysics Data System (ADS)

Gong, Jun; Ke, Shengnan; Li, Xiaomin; Qi, Shuhua

2009-10-01

3D spatial access method for globe data is very crucial technique for virtual earth. This paper presents a brand-new maintenance method to index 3d objects distributed on the whole surface of the earth, which integrates the 1:1,000,000- scale topographic map tiles, Quad-tree and R-tree. Furthermore, when traditional methods are extended into 3d space, the performance of spatial index deteriorates badly, for example 3D R-tree. In order to effectively solve this difficult problem, a new algorithm of dynamic R-tree is put forward, which includes two sub-procedures, namely node-choosing and node-split. In the node-choosing algorithm, a new strategy is adopted, not like the traditional mode which is from top to bottom, but firstly from bottom to top then from top to bottom. This strategy can effectively solve the negative influence of node overlap. In the node-split algorithm, 2-to-3 split mode substitutes the traditional 1-to-2 mode, which can better concern the shape and size of nodes. Because of the rational tree shape, this R-tree method can easily integrate the concept of LOD. Therefore, it will be later implemented in commercial DBMS and adopted in time-crucial 3d GIS system.
TreeNetViz: revealing patterns of networks over tree structures.

PubMed

Gou, Liang; Zhang, Xiaolong Luke

2011-12-01

Network data often contain important attributes from various dimensions such as social affiliations and areas of expertise in a social network. If such attributes exhibit a tree structure, visualizing a compound graph consisting of tree and network structures becomes complicated. How to visually reveal patterns of a network over a tree has not been fully studied. In this paper, we propose a compound graph model, TreeNet, to support visualization and analysis of a network at multiple levels of aggregation over a tree. We also present a visualization design, TreeNetViz, to offer the multiscale and cross-scale exploration and interaction of a TreeNet graph. TreeNetViz uses a Radial, Space-Filling (RSF) visualization to represent the tree structure, a circle layout with novel optimization to show aggregated networks derived from TreeNet, and an edge bundling technique to reduce visual complexity. Our circular layout algorithm reduces both total edge-crossings and edge length and also considers hierarchical structure constraints and edge weight in a TreeNet graph. These experiments illustrate that the algorithm can reduce visual cluttering in TreeNet graphs. Our case study also shows that TreeNetViz has the potential to support the analysis of a compound graph by revealing multiscale and cross-scale network patterns. © 2011 IEEE
Reconciliation of Decision-Making Heuristics Based on Decision Trees Topologies and Incomplete Fuzzy Probabilities Sets

PubMed Central

Doubravsky, Karel; Dohnal, Mirko

2015-01-01

Complex decision making tasks of different natures, e.g. economics, safety engineering, ecology and biology, are based on vague, sparse, partially inconsistent and subjective knowledge. Moreover, decision making economists / engineers are usually not willing to invest too much time into study of complex formal theories. They require such decisions which can be (re)checked by human like common sense reasoning. One important problem related to realistic decision making tasks are incomplete data sets required by the chosen decision making algorithm. This paper presents a relatively simple algorithm how some missing III (input information items) can be generated using mainly decision tree topologies and integrated into incomplete data sets. The algorithm is based on an easy to understand heuristics, e.g. a longer decision tree sub-path is less probable. This heuristic can solve decision problems under total ignorance, i.e. the decision tree topology is the only information available. But in a practice, isolated information items e.g. some vaguely known probabilities (e.g. fuzzy probabilities) are usually available. It means that a realistic problem is analysed under partial ignorance. The proposed algorithm reconciles topology related heuristics and additional fuzzy sets using fuzzy linear programming. The case study, represented by a tree with six lotteries and one fuzzy probability, is presented in details. PMID:26158662
Reconciliation of Decision-Making Heuristics Based on Decision Trees Topologies and Incomplete Fuzzy Probabilities Sets.

PubMed

Doubravsky, Karel; Dohnal, Mirko

2015-01-01

Complex decision making tasks of different natures, e.g. economics, safety engineering, ecology and biology, are based on vague, sparse, partially inconsistent and subjective knowledge. Moreover, decision making economists / engineers are usually not willing to invest too much time into study of complex formal theories. They require such decisions which can be (re)checked by human like common sense reasoning. One important problem related to realistic decision making tasks are incomplete data sets required by the chosen decision making algorithm. This paper presents a relatively simple algorithm how some missing III (input information items) can be generated using mainly decision tree topologies and integrated into incomplete data sets. The algorithm is based on an easy to understand heuristics, e.g. a longer decision tree sub-path is less probable. This heuristic can solve decision problems under total ignorance, i.e. the decision tree topology is the only information available. But in a practice, isolated information items e.g. some vaguely known probabilities (e.g. fuzzy probabilities) are usually available. It means that a realistic problem is analysed under partial ignorance. The proposed algorithm reconciles topology related heuristics and additional fuzzy sets using fuzzy linear programming. The case study, represented by a tree with six lotteries and one fuzzy probability, is presented in details.
Computing all hybridization networks for multiple binary phylogenetic input trees.

PubMed

Albrecht, Benjamin

2015-07-30

The computation of phylogenetic trees on the same set of species that are based on different orthologous genes can lead to incongruent trees. One possible explanation for this behavior are interspecific hybridization events recombining genes of different species. An important approach to analyze such events is the computation of hybridization networks. This work presents the first algorithm computing the hybridization number as well as a set of representative hybridization networks for multiple binary phylogenetic input trees on the same set of taxa. To improve its practical runtime, we show how this algorithm can be parallelized. Moreover, we demonstrate the efficiency of the software Hybroscale, containing an implementation of our algorithm, by comparing it to PIRNv2.0, which is so far the best available software computing the exact hybridization number for multiple binary phylogenetic trees on the same set of taxa. The algorithm is part of the software Hybroscale, which was developed specifically for the investigation of hybridization networks including their computation and visualization. Hybroscale is freely available(1) and runs on all three major operating systems. Our simulation study indicates that our approach is on average 100 times faster than PIRNv2.0. Moreover, we show how Hybroscale improves the interpretation of the reported hybridization networks by adding certain features to its graphical representation.
Continuous-time quantum search on balanced trees

NASA Astrophysics Data System (ADS)

Philipp, Pascal; Tarrataca, Luís; Boettcher, Stefan

2016-03-01

We examine the effect of network heterogeneity on the performance of quantum search algorithms. To this end, we study quantum search on a tree for the oracle Hamiltonian formulation employed by continuous-time quantum walks. We use analytical and numerical arguments to show that the exponent of the asymptotic running time ˜Nβ changes uniformly from β =0.5 to β =1 as the searched-for site is moved from the root of the tree towards the leaves. These results imply that the time complexity of the quantum search algorithm on a balanced tree is closely correlated with certain path-based centrality measures of the searched-for site.
Application of a fast skyline computation algorithm for serendipitous searching problems

NASA Astrophysics Data System (ADS)

Koizumi, Kenichi; Hiraki, Kei; Inaba, Mary

2018-02-01

Skyline computation is a method of extracting interesting entries from a large population with multiple attributes. These entries, called skyline or Pareto optimal entries, are known to have extreme characteristics that cannot be found by outlier detection methods. Skyline computation is an important task for characterizing large amounts of data and selecting interesting entries with extreme features. When the population changes dynamically, the task of calculating a sequence of skyline sets is called continuous skyline computation. This task is known to be difficult to perform for the following reasons: (1) information of non-skyline entries must be stored since they may join the skyline in the future; (2) the appearance or disappearance of even a single entry can change the skyline drastically; (3) it is difficult to adopt a geometric acceleration algorithm for skyline computation tasks with high-dimensional datasets. Our new algorithm called jointed rooted-tree (JR-tree) manages entries using a rooted tree structure. JR-tree delays extend the tree to deep levels to accelerate tree construction and traversal. In this study, we presented the difficulties in extracting entries tagged with a rare label in high-dimensional space and the potential of fast skyline computation in low-latency cell identification technology.
Identification of TPS family members in apple (Malus x domestica Borkh.) and the effect of sucrose sprays on TPS expression and floral induction.

PubMed

Du, Lisha; Qi, Siyan; Ma, Juanjuan; Xing, Libo; Fan, Sheng; Zhang, Songwen; Li, Youmei; Shen, Yawen; Zhang, Dong; Han, Mingyu

2017-11-01

Trehalose (α-D-glucopyranosyl α-D-glucopyranoside) is a non-reducing disaccharide that serves as a carbon source and stress protectant in apple trees. Trehalose-6-phosphate (T6P) is the biosynthetic precursor of trehalose. It functions as a crucial signaling molecule involved in the regulation of floral induction, and is closely related to sucrose. Trehalose-6-phosphate synthase (TPS) family members are pivotal components of the T6P biosynthetic pathway. The present study identified 13 apple TPS family members and characterized their expression patterns in different tissues and in response to exogenous application of sucrose during floral induction. 'Fuji' apple trees were sprayed with sucrose prior to the onset of floral induction. Bud growth, flowering rate, and endogenous sugar levels were then monitored. The expression of genes associated with sucrose metabolism and flowering were also characterized by RT-quantitative PCR. Results revealed that sucrose applications significantly improved flower production and increased bud size and fresh weight, as well as the sucrose content in buds and leaves. Furthermore, the expression of MdTPS1, 2, 4, 10, and 11 was rapidly and significantly up-regulated in response to the sucrose treatments. In addition, the expression levels of flowering-related genes (e.g., SPL genes, FT1, and AP1) also increased in response to the sucrose sprays. In summary, apple TPS family members were identified that may influence the regulation of floral induction and other responses to sucrose. The relationship between sucrose and T6P or TPS during the regulation of floral induction in apple trees is discussed. Copyright © 2017 Elsevier Masson SAS. All rights reserved.
Implementation of Tree and Butterfly Barriers with Optimistic Time Management Algorithms for Discrete Event Simulation

NASA Astrophysics Data System (ADS)

Rizvi, Syed S.; Shah, Dipali; Riasat, Aasia

The Time Wrap algorithm [3] offers a run time recovery mechanism that deals with the causality errors. These run time recovery mechanisms consists of rollback, anti-message, and Global Virtual Time (GVT) techniques. For rollback, there is a need to compute GVT which is used in discrete-event simulation to reclaim the memory, commit the output, detect the termination, and handle the errors. However, the computation of GVT requires dealing with transient message problem and the simultaneous reporting problem. These problems can be dealt in an efficient manner by the Samadi's algorithm [8] which works fine in the presence of causality errors. However, the performance of both Time Wrap and Samadi's algorithms depends on the latency involve in GVT computation. Both algorithms give poor latency for large simulation systems especially in the presence of causality errors. To improve the latency and reduce the processor ideal time, we implement tree and butterflies barriers with the optimistic algorithm. Our analysis shows that the use of synchronous barriers such as tree and butterfly with the optimistic algorithm not only minimizes the GVT latency but also minimizes the processor idle time.
Fish to meat intake ratio and cooking oils are associated with hepatitis C virus carriers with persistently normal alanine aminotransferase levels.

PubMed

Otsuka, Momoka; Uchida, Yuki; Kawaguchi, Takumi; Taniguchi, Eitaro; Kawaguchi, Atsushi; Kitani, Shingo; Itou, Minoru; Oriishi, Tetsuharu; Kakuma, Tatsuyuki; Tanaka, Suiko; Yagi, Minoru; Sata, Michio

2012-10-01

Dietary habits are involved in the development of chronic inflammation; however, the impact of dietary profiles of hepatitis C virus carriers with persistently normal alanine transaminase levels (HCV-PNALT) remains unclear. The decision-tree algorithm is a data-mining statistical technique, which uncovers meaningful profiles of factors from a data collection. We aimed to investigate dietary profiles associated with HCV-PNALT using a decision-tree algorithm. Twenty-seven HCV-PNALT and 41 patients with chronic hepatitis C were enrolled in this study. Dietary habit was assessed using a validated semiquantitative food frequency questionnaire. A decision-tree algorithm was created by dietary variables, and was evaluated by area under the receiver operating characteristic curve analysis (AUROC). In multivariate analysis, fish to meat ratio, dairy product and cooking oils were identified as independent variables associated with HCV-PNALT. The decision-tree algorithm was created with two variables: a fish to meat ratio and cooking oils/ideal bodyweight. When subjects showed a fish to meat ratio of 1.24 or more, 68.8% of the subjects were HCV-PNALT. On the other hand, 11.5% of the subjects were HCV-PNALT when subjects showed a fish to meat ratio of less than 1.24 and cooking oil/ideal bodyweight of less than 0.23 g/kg. The difference in the proportion of HCV-PNALT between these groups are significant (odds ratio 16.87, 95% CI 3.40-83.67, P = 0.0005). Fivefold cross-validation of the decision-tree algorithm showed an AUROC of 0.6947 (95% CI 0.5656-0.8238, P = 0.0067). The decision-tree algorithm disclosed that fish to meat ratio and cooking oil/ideal bodyweight were associated with HCV-PNALT. © 2012 The Japan Society of Hepatology.
An Adaptive Clustering Approach Based on Minimum Travel Route Planning for Wireless Sensor Networks with a Mobile Sink

PubMed Central

Tang, Jiqiang; Yang, Wu; Zhu, Lingyun; Wang, Dong; Feng, Xin

2017-01-01

In recent years, Wireless Sensor Networks with a Mobile Sink (WSN-MS) have been an active research topic due to the widespread use of mobile devices. However, how to get the balance between data delivery latency and energy consumption becomes a key issue of WSN-MS. In this paper, we study the clustering approach by jointly considering the Route planning for mobile sink and Clustering Problem (RCP) for static sensor nodes. We solve the RCP problem by using the minimum travel route clustering approach, which applies the minimum travel route of the mobile sink to guide the clustering process. We formulate the RCP problem as an Integer Non-Linear Programming (INLP) problem to shorten the travel route of the mobile sink under three constraints: the communication hops constraint, the travel route constraint and the loop avoidance constraint. We then propose an Imprecise Induction Algorithm (IIA) based on the property that the solution with a small hop count is more feasible than that with a large hop count. The IIA algorithm includes three processes: initializing travel route planning with a Traveling Salesman Problem (TSP) algorithm, transforming the cluster head to a cluster member and transforming the cluster member to a cluster head. Extensive experimental results show that the IIA algorithm could automatically adjust cluster heads according to the maximum hops parameter and plan a shorter travel route for the mobile sink. Compared with the Shortest Path Tree-based Data-Gathering Algorithm (SPT-DGA), the IIA algorithm has the characteristics of shorter route length, smaller cluster head count and faster convergence rate. PMID:28445434

Finding minimum spanning trees more efficiently for tile-based phase unwrapping

NASA Astrophysics Data System (ADS)

Sawaf, Firas; Tatam, Ralph P.

2006-06-01

The tile-based phase unwrapping method employs an algorithm for finding the minimum spanning tree (MST) in each tile. We first examine the properties of a tile's representation from a graph theory viewpoint, observing that it is possible to make use of a more efficient class of MST algorithms. We then describe a novel linear time algorithm which reduces the size of the MST problem by half at the least, and solves it completely at best. We also show how this algorithm can be applied to a tile using a sliding window technique. Finally, we show how the reduction algorithm can be combined with any other standard MST algorithm to achieve a more efficient hybrid, using Prim's algorithm for empirical comparison and noting that the reduction algorithm takes only 0.1% of the time taken by the overall hybrid.
Association between split selection instability and predictive error in survival trees.

PubMed

Radespiel-Tröger, M; Gefeller, O; Rabenstein, T; Hothorn, T

2006-01-01

To evaluate split selection instability in six survival tree algorithms and its relationship with predictive error by means of a bootstrap study. We study the following algorithms: logrank statistic with multivariate p-value adjustment without pruning (LR), Kaplan-Meier distance of survival curves (KM), martingale residuals (MR), Poisson regression for censored data (PR), within-node impurity (WI), and exponential log-likelihood loss (XL). With the exception of LR, initial trees are pruned by using split-complexity, and final trees are selected by means of cross-validation. We employ a real dataset from a clinical study of patients with gallbladder stones. The predictive error is evaluated using the integrated Brier score for censored data. The relationship between split selection instability and predictive error is evaluated by means of box-percentile plots, covariate and cutpoint selection entropy, and cutpoint selection coefficients of variation, respectively, in the root node. We found a positive association between covariate selection instability and predictive error in the root node. LR yields the lowest predictive error, while KM and MR yield the highest predictive error. The predictive error of survival trees is related to split selection instability. Based on the low predictive error of LR, we recommend the use of this algorithm for the construction of survival trees. Unpruned survival trees with multivariate p-value adjustment can perform equally well compared to pruned trees. The analysis of split selection instability can be used to communicate the results of tree-based analyses to clinicians and to support the application of survival trees.
Decision Tree based Prediction and Rule Induction for Groundwater Trichloroethene (TCE) Pollution Vulnerability

NASA Astrophysics Data System (ADS)

Park, J.; Yoo, K.

2013-12-01

For groundwater resource conservation, it is important to accurately assess groundwater pollution sensitivity or vulnerability. In this work, we attempted to use data mining approach to assess groundwater pollution vulnerability in a TCE (trichloroethylene) contaminated Korean industrial site. The conventional DRASTIC method failed to describe TCE sensitivity data with a poor correlation with hydrogeological properties. Among the different data mining methods such as Artificial Neural Network (ANN), Multiple Logistic Regression (MLR), Case Base Reasoning (CBR), and Decision Tree (DT), the accuracy and consistency of Decision Tree (DT) was the best. According to the following tree analyses with the optimal DT model, the failure of the conventional DRASTIC method in fitting with TCE sensitivity data may be due to the use of inaccurate weight values of hydrogeological parameters for the study site. These findings provide a proof of concept that DT based data mining approach can be used in predicting and rule induction of groundwater TCE sensitivity without pre-existing information on weights of hydrogeological properties.
Extensions and applications of ensemble-of-trees methods in machine learning

NASA Astrophysics Data System (ADS)

Bleich, Justin

Ensemble-of-trees algorithms have emerged to the forefront of machine learning due to their ability to generate high forecasting accuracy for a wide array of regression and classification problems. Classic ensemble methodologies such as random forests (RF) and stochastic gradient boosting (SGB) rely on algorithmic procedures to generate fits to data. In contrast, more recent ensemble techniques such as Bayesian Additive Regression Trees (BART) and Dynamic Trees (DT) focus on an underlying Bayesian probability model to generate the fits. These new probability model-based approaches show much promise versus their algorithmic counterparts, but also offer substantial room for improvement. The first part of this thesis focuses on methodological advances for ensemble-of-trees techniques with an emphasis on the more recent Bayesian approaches. In particular, we focus on extensions of BART in four distinct ways. First, we develop a more robust implementation of BART for both research and application. We then develop a principled approach to variable selection for BART as well as the ability to naturally incorporate prior information on important covariates into the algorithm. Next, we propose a method for handling missing data that relies on the recursive structure of decision trees and does not require imputation. Last, we relax the assumption of homoskedasticity in the BART model to allow for parametric modeling of heteroskedasticity. The second part of this thesis returns to the classic algorithmic approaches in the context of classification problems with asymmetric costs of forecasting errors. First we consider the performance of RF and SGB more broadly and demonstrate its superiority to logistic regression for applications in criminology with asymmetric costs. Next, we use RF to forecast unplanned hospital readmissions upon patient discharge with asymmetric costs taken into account. Finally, we explore the construction of stable decision trees for forecasts of violence during probation hearings in court systems.
ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography.

PubMed

Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

2016-07-07

Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method, according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on a 100 PET scans with known true contour. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contour, to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.
ATLAAS: an automatic decision tree-based learning algorithm for advanced image segmentation in positron emission tomography

NASA Astrophysics Data System (ADS)

Berthon, Beatrice; Marshall, Christopher; Evans, Mererid; Spezi, Emiliano

2016-07-01

Accurate and reliable tumour delineation on positron emission tomography (PET) is crucial for radiotherapy treatment planning. PET automatic segmentation (PET-AS) eliminates intra- and interobserver variability, but there is currently no consensus on the optimal method to use, as different algorithms appear to perform better for different types of tumours. This work aimed to develop a predictive segmentation model, trained to automatically select and apply the best PET-AS method, according to the tumour characteristics. ATLAAS, the automatic decision tree-based learning algorithm for advanced segmentation is based on supervised machine learning using decision trees. The model includes nine PET-AS methods and was trained on a 100 PET scans with known true contour. A decision tree was built for each PET-AS algorithm to predict its accuracy, quantified using the Dice similarity coefficient (DSC), according to the tumour volume, tumour peak to background SUV ratio and a regional texture metric. The performance of ATLAAS was evaluated for 85 PET scans obtained from fillable and printed subresolution sandwich phantoms. ATLAAS showed excellent accuracy across a wide range of phantom data and predicted the best or near-best segmentation algorithm in 93% of cases. ATLAAS outperformed all single PET-AS methods on fillable phantom data with a DSC of 0.881, while the DSC for H&N phantom data was 0.819. DSCs higher than 0.650 were achieved in all cases. ATLAAS is an advanced automatic image segmentation algorithm based on decision tree predictive modelling, which can be trained on images with known true contour, to predict the best PET-AS method when the true contour is unknown. ATLAAS provides robust and accurate image segmentation with potential applications to radiation oncology.
The effect of different distance measures in detecting outliers using clustering-based algorithm for circular regression model

NASA Astrophysics Data System (ADS)

Di, Nur Faraidah Muhammad; Satari, Siti Zanariah

2017-05-01

Outlier detection in linear data sets has been done vigorously but only a small amount of work has been done for outlier detection in circular data. In this study, we proposed multiple outliers detection in circular regression models based on the clustering algorithm. Clustering technique basically utilizes distance measure to define distance between various data points. Here, we introduce the similarity distance based on Euclidean distance for circular model and obtain a cluster tree using the single linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height is proposed. We classify the cluster group that exceeds the stopping rule as potential outlier. Our aim is to demonstrate the effectiveness of proposed algorithms with the similarity distances in detecting the outliers. It is found that the proposed methods are performed well and applicable for circular regression model.
Automatic Classification of Trees from Laser Scanning Point Clouds

NASA Astrophysics Data System (ADS)

Sirmacek, B.; Lindenbergh, R.

2015-08-01

Development of laser scanning technologies has promoted tree monitoring studies to a new level, as the laser scanning point clouds enable accurate 3D measurements in a fast and environmental friendly manner. In this paper, we introduce a probability matrix computation based algorithm for automatically classifying laser scanning point clouds into 'tree' and 'non-tree' classes. Our method uses the 3D coordinates of the laser scanning points as input and generates a new point cloud which holds a label for each point indicating if it belongs to the 'tree' or 'non-tree' class. To do so, a grid surface is assigned to the lowest height level of the point cloud. The grids are filled with probability values which are calculated by checking the point density above the grid. Since the tree trunk locations appear with very high values in the probability matrix, selecting the local maxima of the grid surface help to detect the tree trunks. Further points are assigned to tree trunks if they appear in the close proximity of trunks. Since heavy mathematical computations (such as point cloud organization, detailed shape 3D detection methods, graph network generation) are not required, the proposed algorithm works very fast compared to the existing methods. The tree classification results are found reliable even on point clouds of cities containing many different objects. As the most significant weakness, false detection of light poles, traffic signs and other objects close to trees cannot be prevented. Nevertheless, the experimental results on mobile and airborne laser scanning point clouds indicate the possible usage of the algorithm as an important step for tree growth observation, tree counting and similar applications. While the laser scanning point cloud is giving opportunity to classify even very small trees, accuracy of the results is reduced in the low point density areas further away than the scanning location. These advantages and disadvantages of two laser scanning point cloud sources are discussed in detail.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Behroozi, Peter S.; Wechsler, Risa H.; Wu, Hao-Yi

We present a new algorithm for generating merger trees and halo catalogs which explicitly ensures consistency of halo properties (mass, position, and velocity) across time steps. Our algorithm has demonstrated the ability to improve both the completeness (through detecting and inserting otherwise missing halos) and purity (through detecting and removing spurious objects) of both merger trees and halo catalogs. In addition, our method is able to robustly measure the self-consistency of halo finders; it is the first to directly measure the uncertainties in halo positions, halo velocities, and the halo mass function for a given halo finder based on consistencymore » between snapshots in cosmological simulations. We use this algorithm to generate merger trees for two large simulations (Bolshoi and Consuelo) and evaluate two halo finders (ROCKSTAR and BDM). We find that both the ROCKSTAR and BDM halo finders track halos extremely well; in both, the number of halos which do not have physically consistent progenitors is at the 1%-2% level across all halo masses. Our code is publicly available at http://code.google.com/p/consistent-trees. Our trees and catalogs are publicly available at http://hipacc.ucsc.edu/Bolshoi/.« less
Effects of plot size on forest-type algorithm accuracy

Treesearch

James A. Westfall

2009-01-01

The Forest Inventory and Analysis (FIA) program utilizes an algorithm to consistently determine the forest type for forested conditions on sample plots. Forest type is determined from tree size and species information. Thus, the accuracy of results is often dependent on the number of trees present, which is highly correlated with plot area. This research examines the...
Can forest trees compensate for stress-generated growth losses by induced production of volatile compounds?

PubMed

Holopainen, Jarmo K

2011-12-01

Plants produce a variety of volatile organic compounds (VOCs). Under abiotic and biotic stresses, the number and amount of produced compounds can increase. Due to their long life span and large size, trees can produce biogenic VOCs (BVOCs) in much higher amounts than many other plants. It has been suggested that at cellular and tree physiological levels, induced production of VOCs is aimed at improving plant resistance to damage by reactive oxygen species generated by multiple abiotic stresses. In the few reported cases when biosynthesis of plant volatiles is inhibited or enhanced, the observed response to stress can be attributed to plant volatiles. Reported increase, e.g., in photosynthesis has mostly ranged between 5 and 50%. A comprehensive model to explain similar induction of VOCs under multiple biotic stresses is not yet available. As a result of pathogen or herbivore attack on forest trees, the induced production of VOCs is localized to the damage site but systemic induction of emissions has also been detected. These volatiles can affect fungal pathogens and the arrival rate of herbivorous insects on damaged trees, but also act as signalling compounds to maintain the trophic cascades that may improve tree fitness by improved efficiency of herbivore natural enemies. On the forest scale, biotic induction of VOC synthesis and release leads to an amplified flow of BVOCs in atmospheric reactions, which in atmospheres rich in oxides of nitrogen (NOx) results in ozone formation, and in low NOx atmospheres results in oxidation of VOCs, removal in ozone from the troposphere and the resulting formation of biogenic secondary organic aerosol (SOA) particles. I will summarize recent advances in the understanding of stress-induced VOC emissions from trees, with special focus on Populus spp. Particular importance is given to the ecological and atmospheric feedback systems based on BVOCs and biogenic SOA formation.
Fast Dating Using Least-Squares Criteria and Algorithms.

PubMed

To, Thu-Hien; Jung, Matthieu; Lycett, Samantha; Gascuel, Olivier

2016-01-01

Phylogenies provide a useful way to understand the evolutionary history of genetic samples, and data sets with more than a thousand taxa are becoming increasingly common, notably with viruses (e.g., human immunodeficiency virus (HIV)). Dating ancestral events is one of the first, essential goals with such data. However, current sophisticated probabilistic approaches struggle to handle data sets of this size. Here, we present very fast dating algorithms, based on a Gaussian model closely related to the Langley-Fitch molecular-clock model. We show that this model is robust to uncorrelated violations of the molecular clock. Our algorithms apply to serial data, where the tips of the tree have been sampled through times. They estimate the substitution rate and the dates of all ancestral nodes. When the input tree is unrooted, they can provide an estimate for the root position, thus representing a new, practical alternative to the standard rooting methods (e.g., midpoint). Our algorithms exploit the tree (recursive) structure of the problem at hand, and the close relationships between least-squares and linear algebra. We distinguish between an unconstrained setting and the case where the temporal precedence constraint (i.e., an ancestral node must be older that its daughter nodes) is accounted for. With rooted trees, the former is solved using linear algebra in linear computing time (i.e., proportional to the number of taxa), while the resolution of the latter, constrained setting, is based on an active-set method that runs in nearly linear time. With unrooted trees the computing time becomes (nearly) quadratic (i.e., proportional to the square of the number of taxa). In all cases, very large input trees (>10,000 taxa) can easily be processed and transformed into time-scaled trees. We compare these algorithms to standard methods (root-to-tip, r8s version of Langley-Fitch method, and BEAST). Using simulated data, we show that their estimation accuracy is similar to that of the most sophisticated methods, while their computing time is much faster. We apply these algorithms on a large data set comprising 1194 strains of Influenza virus from the pdm09 H1N1 Human pandemic. Again the results show that these algorithms provide a very fast alternative with results similar to those of other computer programs. These algorithms are implemented in the LSD software (least-squares dating), which can be downloaded from http://www.atgc-montpellier.fr/LSD/, along with all our data sets and detailed results. An Online Appendix, providing additional algorithm descriptions, tables, and figures can be found in the Supplementary Material available on Dryad at http://dx.doi.org/10.5061/dryad.968t3. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
Fast Dating Using Least-Squares Criteria and Algorithms

PubMed Central

To, Thu-Hien; Jung, Matthieu; Lycett, Samantha; Gascuel, Olivier

2016-01-01

Phylogenies provide a useful way to understand the evolutionary history of genetic samples, and data sets with more than a thousand taxa are becoming increasingly common, notably with viruses (e.g., human immunodeficiency virus (HIV)). Dating ancestral events is one of the first, essential goals with such data. However, current sophisticated probabilistic approaches struggle to handle data sets of this size. Here, we present very fast dating algorithms, based on a Gaussian model closely related to the Langley–Fitch molecular-clock model. We show that this model is robust to uncorrelated violations of the molecular clock. Our algorithms apply to serial data, where the tips of the tree have been sampled through times. They estimate the substitution rate and the dates of all ancestral nodes. When the input tree is unrooted, they can provide an estimate for the root position, thus representing a new, practical alternative to the standard rooting methods (e.g., midpoint). Our algorithms exploit the tree (recursive) structure of the problem at hand, and the close relationships between least-squares and linear algebra. We distinguish between an unconstrained setting and the case where the temporal precedence constraint (i.e., an ancestral node must be older that its daughter nodes) is accounted for. With rooted trees, the former is solved using linear algebra in linear computing time (i.e., proportional to the number of taxa), while the resolution of the latter, constrained setting, is based on an active-set method that runs in nearly linear time. With unrooted trees the computing time becomes (nearly) quadratic (i.e., proportional to the square of the number of taxa). In all cases, very large input trees (>10,000 taxa) can easily be processed and transformed into time-scaled trees. We compare these algorithms to standard methods (root-to-tip, r8s version of Langley–Fitch method, and BEAST). Using simulated data, we show that their estimation accuracy is similar to that of the most sophisticated methods, while their computing time is much faster. We apply these algorithms on a large data set comprising 1194 strains of Influenza virus from the pdm09 H1N1 Human pandemic. Again the results show that these algorithms provide a very fast alternative with results similar to those of other computer programs. These algorithms are implemented in the LSD software (least-squares dating), which can be downloaded from http://www.atgc-montpellier.fr/LSD/, along with all our data sets and detailed results. An Online Appendix, providing additional algorithm descriptions, tables, and figures can be found in the Supplementary Material available on Dryad at http://dx.doi.org/10.5061/dryad.968t3. PMID:26424727
Labeled trees and the efficient computation of derivations

NASA Technical Reports Server (NTRS)

Grossman, Robert; Larson, Richard G.

1989-01-01

The effective parallel symbolic computation of operators under composition is discussed. Examples include differential operators under composition and vector fields under the Lie bracket. Data structures consisting of formal linear combinations of rooted labeled trees are discussed. A multiplication on rooted labeled trees is defined, thereby making the set of these data structures into an associative algebra. An algebra homomorphism is defined from the original algebra of operators into this algebra of trees. An algebra homomorphism from the algebra of trees into the algebra of differential operators is then described. The cancellation which occurs when noncommuting operators are expressed in terms of commuting ones occurs naturally when the operators are represented using this data structure. This leads to an algorithm which, for operators which are derivations, speeds up the computation exponentially in the degree of the operator. It is shown that the algebra of trees leads naturally to a parallel version of the algorithm.
Quantitative Comparison of Minimum Inductance and Minimum Power Algorithms for the Design of Shim Coils for Small Animal Imaging

PubMed Central

HUDSON, PARISA; HUDSON, STEPHEN D.; HANDLER, WILLIAM B.; SCHOLL, TIMOTHY J.; CHRONIK, BLAINE A.

2010-01-01

High-performance shim coils are required for high-field magnetic resonance imaging and spectroscopy. Complete sets of high-power and high-performance shim coils were designed using two different methods: the minimum inductance and the minimum power target field methods. A quantitative comparison of shim performance in terms of merit of inductance (ML) and merit of resistance (MR) was made for shim coils designed using the minimum inductance and the minimum power design algorithms. In each design case, the difference in ML and the difference in MR given by the two design methods was <15%. Comparison of wire patterns obtained using the two design algorithms show that minimum inductance designs tend to feature oscillations within the current density; while minimum power designs tend to feature less rapidly varying current densities and lower power dissipation. Overall, the differences in coil performance obtained by the two methods are relatively small. For the specific case of shim systems customized for small animal imaging, the reduced power dissipation obtained when using the minimum power method is judged to be more significant than the improvements in switching speed obtained from the minimum inductance method. PMID:20411157
Computational path planner for product assembly in complex environments

NASA Astrophysics Data System (ADS)

Shang, Wei; Liu, Jianhua; Ning, Ruxin; Liu, Mi

2013-03-01

Assembly path planning is a crucial problem in assembly related design and manufacturing processes. Sampling based motion planning algorithms are used for computational assembly path planning. However, the performance of such algorithms may degrade much in environments with complex product structure, narrow passages or other challenging scenarios. A computational path planner for automatic assembly path planning in complex 3D environments is presented. The global planning process is divided into three phases based on the environment and specific algorithms are proposed and utilized in each phase to solve the challenging issues. A novel ray test based stochastic collision detection method is proposed to evaluate the intersection between two polyhedral objects. This method avoids fake collisions in conventional methods and degrades the geometric constraint when a part has to be removed with surface contact with other parts. A refined history based rapidly-exploring random tree (RRT) algorithm which bias the growth of the tree based on its planning history is proposed and employed in the planning phase where the path is simple but the space is highly constrained. A novel adaptive RRT algorithm is developed for the path planning problem with challenging scenarios and uncertain environment. With extending values assigned on each tree node and extending schemes applied, the tree can adapts its growth to explore complex environments more efficiently. Experiments on the key algorithms are carried out and comparisons are made between the conventional path planning algorithms and the presented ones. The comparing results show that based on the proposed algorithms, the path planner can compute assembly path in challenging complex environments more efficiently and with higher success. This research provides the references to the study of computational assembly path planning under complex environments.
Control of fire blight (Erwinia amylovora) on apple trees with trunk-injected plant resistance inducers and antibiotics and assessment of induction of pathogenesis-related protein genes

PubMed Central

Aćimović, Srđan G.; Zeng, Quan; McGhee, Gayle C.; Sundin, George W.; Wise, John C.

2015-01-01

Management of fire blight is complicated by limitations on use of antibiotics in agriculture, antibiotic resistance development, and limited efficacy of alternative control agents. Even though successful in control, preventive antibiotic sprays also affect non-target bacteria, aiding the selection for resistance which could ultimately be transferred to the pathogen Erwinia amylovora. Trunk injection is a target-precise pesticide delivery method that utilizes tree xylem to distribute injected compounds. Trunk injection could decrease antibiotic usage in the open environment and increase the effectiveness of compounds in fire blight control. In field experiments, after 1–2 apple tree injections of either streptomycin, potassium phosphites (PH), or acibenzolar-S-methyl (ASM), significant reduction of blossom and shoot blight symptoms was observed compared to water injected control trees. Overall disease suppression with streptomycin was lower than typically observed following spray applications to flowers. Trunk injection of oxytetracycline resulted in excellent control of shoot blight severity, suggesting that injection is a superior delivery method for this antibiotic. Injection of both ASM and PH resulted in the significant induction of PR-1, PR-2, and PR-8 protein genes in apple leaves indicating induction of systemic acquired resistance (SAR) under field conditions. The time separating SAR induction and fire blight symptom suppression indicated that various defensive compounds within the SAR response were synthesized and accumulated in the canopy. ASM and PH suppressed fire blight even after cessation of induced gene expression. With the development of injectable formulations and optimization of doses and injection schedules, the injection of protective compounds could serve as an effective option for fire blight control. PMID:25717330
High autumn temperature delays spring bud burst in boreal trees, counterbalancing the effect of climatic warming.

PubMed

Heide, O M

2003-09-01

The effect of temperature during short-day (SD) dormancy induction was examined in three boreal tree species in a controlled environment. Saplings of Betula pendula Roth, B. pubescens Ehrh. and Alnus glutinosa (L.) Moench. were exposed to 5 weeks of 10-h SD induction at 9, 15 and 21 degrees C followed by chilling at 5 degrees C for 40, 70, 100 and 130 days and subsequent forcing at 15 degrees C in a 24-h photoperiod for 60 days. In all species and with all chilling periods, high temperature during SD dormancy induction significantly delayed bud burst during subsequent flushing at 15 degrees C. In A. glutinosa, high temperature during SD dormancy induction also significantly increased the chilling requirement for dormancy release. Field experiments at 60 degrees N with a range of latitudinal birch populations revealed a highly significant correlation between autumn temperature and days to bud burst in the subsequent spring. September temperature alone explained 20% of the variation between years in time of bud burst. In birch populations from 69 and 71 degrees N, which ceased growing and shed their leaves in August when the mean temperature was 15 degrees C, bud burst occurred later than expected compared with lower latitude populations (56 degrees N) in which dormancy induction took place more than 2 months later at a mean temperature of about 6 degrees C. It is concluded that this autumn temperature response may be important for counterbalancing the potentially adverse effects of higher winter temperatures on dormancy stability of boreal trees during climate warming.
Analysis of data mining classification by comparison of C4.5 and ID algorithms

NASA Astrophysics Data System (ADS)

Sudrajat, R.; Irianingsih, I.; Krisnawan, D.

2017-01-01

The rapid development of information technology, triggered by the intensive use of information technology. For example, data mining widely used in investment. Many techniques that can be used assisting in investment, the method that used for classification is decision tree. Decision tree has a variety of algorithms, such as C4.5 and ID3. Both algorithms can generate different models for similar data sets and different accuracy. C4.5 and ID3 algorithms with discrete data provide accuracy are 87.16% and 99.83% and C4.5 algorithm with numerical data is 89.69%. C4.5 and ID3 algorithms with discrete data provides 520 and 598 customers and C4.5 algorithm with numerical data is 546 customers. From the analysis of the both algorithm it can classified quite well because error rate less than 15%.
Multi-test decision tree and its application to microarray data classification.

PubMed

Czajkowski, Marcin; Grześ, Marek; Kretowski, Marek

2014-05-01

The desirable property of tools used to investigate biological data is easy to understand models and predictive decisions. Decision trees are particularly promising in this regard due to their comprehensible nature that resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees have tendency to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity. We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions. Experimental validation was performed on several real-life gene expression datasets. Comparison results with eight classifiers show that MTDT has a statistically significantly higher accuracy than popular decision tree classifiers, and it was highly competitive with ensemble learning algorithms. The proposed solution managed to outperform its baseline algorithm on 14 datasets by an average 6%. A study performed on one of the datasets showed that the discovered genes used in the MTDT classification model are supported by biological evidence in the literature. This paper introduces a new type of decision tree which is more suitable for solving biological problems. MTDTs are relatively easy to analyze and much more powerful in modeling high dimensional microarray data than their popular counterparts. Copyright © 2014 Elsevier B.V. All rights reserved.

Knowledge, expectations, and inductive reasoning within conceptual hierarchies.

PubMed

Coley, John D; Hayes, Brett; Lawson, Christopher; Moloney, Michelle

2004-01-01

Previous research (e.g. Cognition 64 (1997) 73) suggests that the privileged level for inductive inference in a folk biological conceptual hierarchy does not correspond to the "basic" level (i.e. the level at which concepts are both informative and distinct). To further explore inductive inference within conceptual hierarchies, we examine relations between knowledge of concepts at different hierarchical levels, expectations about conceptual coherence, and inductive inference. In Experiments 1 and 2, 5- and 8-year-olds and adults listed features of living kind (Experiments 1 and 2) and artifact (Experiment 2) concepts at different hierarchical levels (e.g. plant, tree, oak, desert oak), and also rated the strength of generalizations to the same concepts. For living kinds, the level that showed a relative advantage on these two tasks differed; the greatest increase in features listed tended to occur at the life-form level (e.g. tree), whereas the greatest increase in inductive strength tended to occur at the folk-generic level (e.g. oak). Knowledge and induction also showed different developmental trajectories. For artifact concepts, the levels at which the greatest gains in knowledge and induction occurred were more varied, and corresponded more closely across tasks. In Experiment 3, adults reported beliefs about within-category similarity for concepts at different levels of animal, plant and artifact hierarchies, and rated inductive strength as before. For living kind concepts, expectations about category coherence predicted patterns of inductions; knowledge did not. For artifact concepts, both knowledge and expectations predicted patterns of induction. Results suggest that beliefs about conceptual coherence play an important role in guiding inductive inference, that this role may be largely independent of specific knowledge of concepts, and that such beliefs are especially important in reasoning about living kinds.
Accuracy Assessment of Crown Delineation Methods for the Individual Trees Using LIDAR Data

NASA Astrophysics Data System (ADS)

Chang, K. T.; Lin, C.; Lin, Y. C.; Liu, J. K.

2016-06-01

Forest canopy density and height are used as variables in a number of environmental applications, including the estimation of biomass, forest extent and condition, and biodiversity. The airborne Light Detection and Ranging (LiDAR) is very useful to estimate forest canopy parameters according to the generated canopy height models (CHMs). The purpose of this work is to introduce an algorithm to delineate crown parameters, e.g. tree height and crown radii based on the generated rasterized CHMs. And accuracy assessment for the extraction of volumetric parameters of a single tree is also performed via manual measurement using corresponding aerial photo pairs. A LiDAR dataset of a golf course acquired by Leica ALS70-HP is used in this study. Two algorithms, i.e. a traditional one with the subtraction of a digital elevation model (DEM) from a digital surface model (DSM), and a pit-free approach are conducted to generate the CHMs firstly. Then two algorithms, a multilevel morphological active-contour (MMAC) and a variable window filter (VWF), are implemented and used in this study for individual tree delineation. Finally, experimental results of two automatic estimation methods for individual trees can be evaluated with manually measured stand-level parameters, i.e. tree height and crown diameter. The resulting CHM generated by a simple subtraction is full of empty pixels (called "pits") that will give vital impact on subsequent analysis for individual tree delineation. The experimental results indicated that if more individual trees can be extracted, tree crown shape will became more completely in the CHM data after the pit-free process.
Aneurysmal subarachnoid hemorrhage prognostic decision-making algorithm using classification and regression tree analysis.

PubMed

Lo, Benjamin W Y; Fukuda, Hitoshi; Angle, Mark; Teitelbaum, Jeanne; Macdonald, R Loch; Farrokhyar, Forough; Thabane, Lehana; Levine, Mitchell A H

2016-01-01

Classification and regression tree analysis involves the creation of a decision tree by recursive partitioning of a dataset into more homogeneous subgroups. Thus far, there is scarce literature on using this technique to create clinical prediction tools for aneurysmal subarachnoid hemorrhage (SAH). The classification and regression tree analysis technique was applied to the multicenter Tirilazad database (3551 patients) in order to create the decision-making algorithm. In order to elucidate prognostic subgroups in aneurysmal SAH, neurologic, systemic, and demographic factors were taken into account. The dependent variable used for analysis was the dichotomized Glasgow Outcome Score at 3 months. Classification and regression tree analysis revealed seven prognostic subgroups. Neurological grade, occurrence of post-admission stroke, occurrence of post-admission fever, and age represented the explanatory nodes of this decision tree. Split sample validation revealed classification accuracy of 79% for the training dataset and 77% for the testing dataset. In addition, the occurrence of fever at 1-week post-aneurysmal SAH is associated with increased odds of post-admission stroke (odds ratio: 1.83, 95% confidence interval: 1.56-2.45, P < 0.01). A clinically useful classification tree was generated, which serves as a prediction tool to guide bedside prognostication and clinical treatment decision making. This prognostic decision-making algorithm also shed light on the complex interactions between a number of risk factors in determining outcome after aneurysmal SAH.
Machine Learning

NASA Astrophysics Data System (ADS)

Hoffmann, Achim; Mahidadia, Ashesh

The purpose of this chapter is to present fundamental ideas and techniques of machine learning suitable for the field of this book, i.e., for automated scientific discovery. The chapter focuses on those symbolic machine learning methods, which produce results that are suitable to be interpreted and understood by humans. This is particularly important in the context of automated scientific discovery as the scientific theories to be produced by machines are usually meant to be interpreted by humans. This chapter contains some of the most influential ideas and concepts in machine learning research to give the reader a basic insight into the field. After the introduction in Sect. 1, general ideas of how learning problems can be framed are given in Sect. 2. The section provides useful perspectives to better understand what learning algorithms actually do. Section 3 presents the Version space model which is an early learning algorithm as well as a conceptual framework, that provides important insight into the general mechanisms behind most learning algorithms. In section 4, a family of learning algorithms, the AQ family for learning classification rules is presented. The AQ family belongs to the early approaches in machine learning. The next, Sect. 5 presents the basic principles of decision tree learners. Decision tree learners belong to the most influential class of inductive learning algorithms today. Finally, a more recent group of learning systems are presented in Sect. 6, which learn relational concepts within the framework of logic programming. This is a particularly interesting group of learning systems since the framework allows also to incorporate background knowledge which may assist in generalisation. Section 7 discusses Association Rules - a technique that comes from the related field of Data mining. Section 8 presents the basic idea of the Naive Bayesian Classifier. While this is a very popular learning technique, the learning result is not well suited for human comprehension as it is essentially a large collection of probability values. In Sect. 9, we present a generic method for improving accuracy of a given learner by generatingmultiple classifiers using variations of the training data. While this works well in most cases, the resulting classifiers have significantly increased complexity and, hence, tend to destroy the human readability of the learning result that a single learner may produce. Section 10 contains a summary, mentions briefly other techniques not discussed in this chapter and presents outlook on the potential of machine learning in the future.
Individual tree detection in intact forest and degraded forest areas in the north region of Mato Grosso State, Brazilian Amazon

NASA Astrophysics Data System (ADS)

Santos, E. G.; Jorge, A.; Shimabukuro, Y. E.; Gasparini, K.

2017-12-01

The State of Mato Grosso - MT has the second largest area with degraded forest among the states of the Brazilian Legal Amazon. Land use and land cover change processes that occur in this region cause the loss of forest biomass, releasing greenhouse gases that contribute to the increase of temperature on earth. These degraded forest areas lose biomass according to the intensity and magnitude of the degradation type. The estimate of forest biomass, commonly performed by forest inventory through sample plots, shows high variance in degraded forest areas. Due to this variance and complexity of tropical forests, the aim of this work was to estimate forest biomass using LiDAR point clouds in three distinct forest areas: one degraded by fire, another by selective logging and one area of intact forest. The approach applied in these areas was the Individual Tree Detection (ITD). To isolate the trees, we generated Canopy Height Models (CHM) images, which are obtained by subtracting the Digital Elevation Model (MDE) and the Digital Terrain Model (MDT), created by the cloud of LiDAR points. The trees in the CHM images are isolated by an algorithm provided by the Quantitative Ecology research group at the School of Forestry at Northern Arizona University (SILVA, 2015). With these points, metrics were calculated for some areas, which were used in the model of biomass estimation. The methodology used in this work was expected to reduce the error in biomass estimate in the study area. The cloud points of the most representative trees were analyzed, and thus field data was correlated with the individual trees found by the proposed algorithm. In a pilot study, the proposed methodology was applied generating the individual tree metrics: total height and area of the crown. When correlating 339 isolated trees, an unsatisfactory R² was obtained, as heights found by the algorithm were lower than those obtained in the field, with an average difference of 2.43 m. This shows that the algorithm used to isolate trees in temperate areas did not obtained satisfactory results in the tropical forest of Mato Grosso State. Due to this, in future works two algorithms, one developed by Dalponte et al. (2015) and another by Li et al. (2012) will be used.
The K tree score: quantification of differences in the relative branch length and topology of phylogenetic trees.

PubMed

Soria-Carrasco, Víctor; Talavera, Gerard; Igea, Javier; Castresana, Jose

2007-11-01

We introduce a new phylogenetic comparison method that measures overall differences in the relative branch length and topology of two phylogenetic trees. To do this, the algorithm first scales one of the trees to have a global divergence as similar as possible to the other tree. Then, the branch length distance, which takes differences in topology and branch lengths into account, is applied to the two trees. We thus obtain the minimum branch length distance or K tree score. Two trees with very different relative branch lengths get a high K score whereas two trees that follow a similar among-lineage rate variation get a low score, regardless of the overall rates in both trees. There are several applications of the K tree score, two of which are explained here in more detail. First, this score allows the evaluation of the performance of phylogenetic algorithms, not only with respect to their topological accuracy, but also with respect to the reproduction of a given branch length variation. In a second example, we show how the K score allows the selection of orthologous genes by choosing those that better follow the overall shape of a given reference tree. http://molevol.ibmb.csic.es/Ktreedist.html
Changes in Tree Quality in Response to Defoliation

Treesearch

Jack C. Schultz; Ian T. Baldwin

1983-01-01

Plant chemistry alone fails to explain why most trees escape defoliation most of the time. Chemical variation in space and time, acting to enhance the effectiveness of natural enemies, may be the key. Changes and increasing variation in direct response to insect attack ("induction") may be particularly important for irruptive pests.
Triplet supertree heuristics for the tree of life

PubMed Central

Lin, Harris T; Burleigh, J Gordon; Eulenstein, Oliver

2009-01-01

Background There is much interest in developing fast and accurate supertree methods to infer the tree of life. Supertree methods combine smaller input trees with overlapping sets of taxa to make a comprehensive phylogenetic tree that contains all of the taxa in the input trees. The intrinsically hard triplet supertree problem takes a collection of input species trees and seeks a species tree (supertree) that maximizes the number of triplet subtrees that it shares with the input trees. However, the utility of this supertree problem has been limited by a lack of efficient and effective heuristics. Results We introduce fast hill-climbing heuristics for the triplet supertree problem that perform a step-wise search of the tree space, where each step is guided by an exact solution to an instance of a local search problem. To realize time efficient heuristics we designed the first nontrivial algorithms for two standard search problems, which greatly improve on the time complexity to the best known (naïve) solutions by a factor of n and n2 (the number of taxa in the supertree). These algorithms enable large-scale supertree analyses based on the triplet supertree problem that were previously not possible. We implemented hill-climbing heuristics that are based on our new algorithms, and in analyses of two published supertree data sets, we demonstrate that our new heuristics outperform other standard supertree methods in maximizing the number of triplets shared with the input trees. Conclusion With our new heuristics, the triplet supertree problem is now computationally more tractable for large-scale supertree analyses, and it provides a potentially more accurate alternative to existing supertree methods. PMID:19208181
Simulating Urban Tree Effects on Air, Water, and Heat Pollution Mitigation: iTree-Hydro Model

NASA Astrophysics Data System (ADS)

Yang, Y.; Endreny, T. A.; Nowak, D.

2011-12-01

Urban and suburban development changes land surface thermal, radiative, porous, and roughness properties and pollutant loading rates, with the combined effect leading to increased air, water, and heat pollution (e.g., urban heat islands). In this research we present the USDA Forest Service urban forest ecosystem and hydrology model, iTree Eco and Hydro, used to analyze how tree cover can deliver valuable ecosystem services to mitigate air, water, and heat pollution. Air pollution mitigation is simulated by dry deposition processes based on detected pollutant levels for CO, NO2, SO2, O3 and atmospheric stability and leaf area indices. Water quality mitigation is simulated with event mean concentration loading algorithms for N, P, metals, and TSS, and by green infrastructure pollutant filtering algorithms that consider flow path dispersal areas. Urban cooling considers direct shading and indirect evapotranspiration. Spatially distributed estimates of hourly tree evapotranspiration during the growing season are used to estimate human thermal comfort. Two main factors regulating evapotranspiration are soil moisture and canopy radiation. Spatial variation of soil moisture is represented by a modified urban topographic index and radiation for each tree is modified by considering aspect, slope and shade from surrounding buildings or hills. We compare the urban cooling algorithms used in iTree-Hydro with the urban canopy and land surface physics schemes used in the Weather Research and Forecasting model. We conclude by identifying biophysical feedbacks between tree-modulated air and water quality environmental services and how these may respond to urban heating and cooling. Improvements to this iTree model are intended to assist managers identify valuable tree services for urban living.
A combined NLP-differential evolution algorithm approach for the optimization of looped water distribution systems

NASA Astrophysics Data System (ADS)

Zheng, Feifei; Simpson, Angus R.; Zecchin, Aaron C.

2011-08-01

This paper proposes a novel optimization approach for the least cost design of looped water distribution systems (WDSs). Three distinct steps are involved in the proposed optimization approach. In the first step, the shortest-distance tree within the looped network is identified using the Dijkstra graph theory algorithm, for which an extension is proposed to find the shortest-distance tree for multisource WDSs. In the second step, a nonlinear programming (NLP) solver is employed to optimize the pipe diameters for the shortest-distance tree (chords of the shortest-distance tree are allocated the minimum allowable pipe sizes). Finally, in the third step, the original looped water network is optimized using a differential evolution (DE) algorithm seeded with diameters in the proximity of the continuous pipe sizes obtained in step two. As such, the proposed optimization approach combines the traditional deterministic optimization technique of NLP with the emerging evolutionary algorithm DE via the proposed network decomposition. The proposed methodology has been tested on four looped WDSs with the number of decision variables ranging from 21 to 454. Results obtained show the proposed approach is able to find optimal solutions with significantly less computational effort than other optimization techniques.
Exploiting machine learning algorithms for tree species classification in a semiarid woodland using RapidEye image

NASA Astrophysics Data System (ADS)

Adelabu, Samuel; Mutanga, Onisimo; Adam, Elhadi; Cho, Moses Azong

2013-01-01

Classification of different tree species in semiarid areas can be challenging as a result of the change in leaf structure and orientation due to soil moisture constraints. Tree species mapping is, however, a key parameter for forest management in semiarid environments. In this study, we examined the suitability of 5-band RapidEye satellite data for the classification of five tree species in mopane woodland of Botswana using machine leaning algorithms with limited training samples.We performed classification using random forest (RF) and support vector machines (SVM) based on EnMap box. The overall accuracies for classifying the five tree species was 88.75 and 85% for both SVM and RF, respectively. We also demonstrated that the new red-edge band in the RapidEye sensor has the potential for classifying tree species in semiarid environments when integrated with other standard bands. Similarly, we observed that where there are limited training samples, SVM is preferred over RF. Finally, we demonstrated that the two accuracy measures of quantity and allocation disagreement are simpler and more helpful for the vast majority of remote sensing classification process than the kappa coefficient. Overall, high species classification can be achieved using strategically located RapidEye bands integrated with advanced processing algorithms.
Binary tree eigen solver in finite element analysis

NASA Technical Reports Server (NTRS)

Akl, F. A.; Janetzke, D. C.; Kiraly, L. J.

1993-01-01

This paper presents a transputer-based binary tree eigensolver for the solution of the generalized eigenproblem in linear elastic finite element analysis. The algorithm is based on the method of recursive doubling, which parallel implementation of a number of associative operations on an arbitrary set having N elements is of the order of o(log2N), compared to (N-1) steps if implemented sequentially. The hardware used in the implementation of the binary tree consists of 32 transputers. The algorithm is written in OCCAM which is a high-level language developed with the transputers to address parallel programming constructs and to provide the communications between processors. The algorithm can be replicated to match the size of the binary tree transputer network. Parallel and sequential finite element analysis programs have been developed to solve for the set of the least-order eigenpairs using the modified subspace method. The speed-up obtained for a typical analysis problem indicates close agreement with the theoretical prediction given by the method of recursive doubling.
Rapid Calculation of Max-Min Fair Rates for Multi-Commodity Flows in Fat-Tree Networks

DOE PAGES

Mollah, Md Atiqul; Yuan, Xin; Pakin, Scott; ...

2017-08-29

Max-min fairness is often used in the performance modeling of interconnection networks. Existing methods to compute max-min fair rates for multi-commodity flows have high complexity and are computationally infeasible for large networks. In this paper, we show that by considering topological features, this problem can be solved efficiently for the fat-tree topology that is widely used in data centers and high performance compute clusters. Several efficient new algorithms are developed for this problem, including a parallel algorithm that can take advantage of multi-core and shared-memory architectures. Using these algorithms, we demonstrate that it is possible to find the max-min fairmore » rate allocation for multi-commodity flows in fat-tree networks that support tens of thousands of nodes. We evaluate the run-time performance of the proposed algorithms and show improvement in orders of magnitude over the previously best known method. Finally, we further demonstrate a new application of max-min fair rate allocation that is only computationally feasible using our new algorithms.« less
Rapid Calculation of Max-Min Fair Rates for Multi-Commodity Flows in Fat-Tree Networks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mollah, Md Atiqul; Yuan, Xin; Pakin, Scott

Max-min fairness is often used in the performance modeling of interconnection networks. Existing methods to compute max-min fair rates for multi-commodity flows have high complexity and are computationally infeasible for large networks. In this paper, we show that by considering topological features, this problem can be solved efficiently for the fat-tree topology that is widely used in data centers and high performance compute clusters. Several efficient new algorithms are developed for this problem, including a parallel algorithm that can take advantage of multi-core and shared-memory architectures. Using these algorithms, we demonstrate that it is possible to find the max-min fairmore » rate allocation for multi-commodity flows in fat-tree networks that support tens of thousands of nodes. We evaluate the run-time performance of the proposed algorithms and show improvement in orders of magnitude over the previously best known method. Finally, we further demonstrate a new application of max-min fair rate allocation that is only computationally feasible using our new algorithms.« less
A new method of optimal capacitor switching based on minimum spanning tree theory in distribution systems

NASA Astrophysics Data System (ADS)

Li, H. W.; Pan, Z. Y.; Ren, Y. B.; Wang, J.; Gan, Y. L.; Zheng, Z. Z.; Wang, W.

2018-03-01

According to the radial operation characteristics in distribution systems, this paper proposes a new method based on minimum spanning trees method for optimal capacitor switching. Firstly, taking the minimal active power loss as objective function and not considering the capacity constraints of capacitors and source, this paper uses Prim algorithm among minimum spanning trees algorithms to get the power supply ranges of capacitors and source. Then with the capacity constraints of capacitors considered, capacitors are ranked by the method of breadth-first search. In term of the order from high to low of capacitor ranking, capacitor compensation capacity based on their power supply range is calculated. Finally, IEEE 69 bus system is adopted to test the accuracy and practicality of the proposed algorithm.
Thread Graphs, Linear Rank-Width and Their Algorithmic Applications

NASA Astrophysics Data System (ADS)

Ganian, Robert

The introduction of tree-width by Robertson and Seymour [7] was a breakthrough in the design of graph algorithms. A lot of research since then has focused on obtaining a width measure which would be more general and still allowed efficient algorithms for a wide range of NP-hard problems on graphs of bounded width. To this end, Oum and Seymour have proposed rank-width, which allows the solution of many such hard problems on a less restricted graph classes (see e.g. [3,4]). But what about problems which are NP-hard even on graphs of bounded tree-width or even on trees? The parameter used most often for these exceptionally hard problems is path-width, however it is extremely restrictive - for example the graphs of path-width 1 are exactly paths.
The Proposal of a Evolutionary Strategy Generating the Data Structures Based on a Horizontal Tree for the Tests

NASA Astrophysics Data System (ADS)

Żukowicz, Marek; Markiewicz, Michał

2016-09-01

The aim of the article is to present a mathematical definition of the object model, that is known in computer science as TreeList and to show application of this model for design evolutionary algorithm, that purpose is to generate structures based on this object. The first chapter introduces the reader to the problem of presenting data using the TreeList object. The second chapter describes the problem of testing data structures based on TreeList. The third one shows a mathematical model of the object TreeList and the parameters, used in determining the utility of structures created through this model and in evolutionary strategy, that generates these structures for testing purposes. The last chapter provides a brief summary and plans for future research related to the algorithm presented in the article.
Constraint Embedding Technique for Multibody System Dynamics

NASA Technical Reports Server (NTRS)

Woo, Simon S.; Cheng, Michael K.

2011-01-01

Multibody dynamics play a critical role in simulation testbeds for space missions. There has been a considerable interest in the development of efficient computational algorithms for solving the dynamics of multibody systems. Mass matrix factorization and inversion techniques and the O(N) class of forward dynamics algorithms developed using a spatial operator algebra stand out as important breakthrough on this front. Techniques such as these provide the efficient algorithms and methods for the application and implementation of such multibody dynamics models. However, these methods are limited only to tree-topology multibody systems. Closed-chain topology systems require different techniques that are not as efficient or as broad as those for tree-topology systems. The closed-chain forward dynamics approach consists of treating the closed-chain topology as a tree-topology system subject to additional closure constraints. The resulting forward dynamics solution consists of: (a) ignoring the closure constraints and using the O(N) algorithm to solve for the free unconstrained accelerations for the system; (b) using the tree-topology solution to compute a correction force to enforce the closure constraints; and (c) correcting the unconstrained accelerations with correction accelerations resulting from the correction forces. This constraint-embedding technique shows how to use direct embedding to eliminate local closure-loops in the system and effectively convert the system back to a tree-topology system. At this point, standard tree-topology techniques can be brought to bear on the problem. The approach uses a spatial operator algebra approach to formulating the equations of motion. The operators are block-partitioned around the local body subgroups to convert them into aggregate bodies. Mass matrix operator factorization and inversion techniques are applied to the reformulated tree-topology system. Thus in essence, the new technique allows conversion of a system with closure-constraints into an equivalent tree-topology system, and thus allows one to take advantage of the host of techniques available to the latter class of systems. This technology is highly suitable for the class of multibody systems where the closure-constraints are local, i.e., where they are confined to small groupings of bodies within the system. Important examples of such local closure-constraints are constraints associated with four-bar linkages, geared motors, differential suspensions, etc. One can eliminate these closure-constraints and convert the system into a tree-topology system by embedding the constraints directly into the system dynamics and effectively replacing the body groupings with virtual aggregate bodies. Once eliminated, one can apply the well-known results and algorithms for tree-topology systems to solve the dynamics of such closed-chain system.
Global interrupt and barrier networks

DOEpatents

Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E; Heidelberger, Philip; Kopcsay, Gerard V.; Steinmacher-Burow, Burkhard D.; Takken, Todd E.

2008-10-28

A system and method for generating global asynchronous signals in a computing structure. Particularly, a global interrupt and barrier network is implemented that implements logic for generating global interrupt and barrier signals for controlling global asynchronous operations performed by processing elements at selected processing nodes of a computing structure in accordance with a processing algorithm; and includes the physical interconnecting of the processing nodes for communicating the global interrupt and barrier signals to the elements via low-latency paths. The global asynchronous signals respectively initiate interrupt and barrier operations at the processing nodes at times selected for optimizing performance of the processing algorithms. In one embodiment, the global interrupt and barrier network is implemented in a scalable, massively parallel supercomputing device structure comprising a plurality of processing nodes interconnected by multiple independent networks, with each node including one or more processing elements for performing computation or communication activity as required when performing parallel algorithm operations. One multiple independent network includes a global tree network for enabling high-speed global tree communications among global tree network nodes or sub-trees thereof. The global interrupt and barrier network may operate in parallel with the global tree network for providing global asynchronous sideband signals.
Cost-effectiveness Analysis with Influence Diagrams.

PubMed

Arias, M; Díez, F J

2015-01-01

Cost-effectiveness analysis (CEA) is used increasingly in medicine to determine whether the health benefit of an intervention is worth the economic cost. Decision trees, the standard decision modeling technique for non-temporal domains, can only perform CEA for very small problems. To develop a method for CEA in problems involving several dozen variables. We explain how to build influence diagrams (IDs) that explicitly represent cost and effectiveness. We propose an algorithm for evaluating cost-effectiveness IDs directly, i.e., without expanding an equivalent decision tree. The evaluation of an ID returns a set of intervals for the willingness to pay - separated by cost-effectiveness thresholds - and, for each interval, the cost, the effectiveness, and the optimal intervention. The algorithm that evaluates the ID directly is in general much more efficient than the brute-force method, which is in turn more efficient than the expansion of an equivalent decision tree. Using OpenMarkov, an open-source software tool that implements this algorithm, we have been able to perform CEAs on several IDs whose equivalent decision trees contain millions of branches. IDs can perform CEA on large problems that cannot be analyzed with decision trees.

Tanglegrams for rooted phylogenetic trees and networks

PubMed Central

Scornavacca, Celine; Zickmann, Franziska; Huson, Daniel H.

2011-01-01

Motivation: In systematic biology, one is often faced with the task of comparing different phylogenetic trees, in particular in multi-gene analysis or cospeciation studies. One approach is to use a tanglegram in which two rooted phylogenetic trees are drawn opposite each other, using auxiliary lines to connect matching taxa. There is an increasing interest in using rooted phylogenetic networks to represent evolutionary history, so as to explicitly represent reticulate events, such as horizontal gene transfer, hybridization or reassortment. Thus, the question arises how to define and compute a tanglegram for such networks. Results: In this article, we present the first formal definition of a tanglegram for rooted phylogenetic networks and present a heuristic approach for computing one, called the NN-tanglegram method. We compare the performance of our method with existing tree tanglegram algorithms and also show a typical application to real biological datasets. For maximum usability, the algorithm does not require that the trees or networks are bifurcating or bicombining, or that they are on identical taxon sets. Availability: The algorithm is implemented in our program Dendroscope 3, which is freely available from www.dendroscope.org. Contact: scornava@informatik.uni-tuebingen.de; huson@informatik.uni-tuebingen.de PMID:21685078
Consensus properties and their large-scale applications for the gene duplication problem.

PubMed

Moon, Jucheol; Lin, Harris T; Eulenstein, Oliver

2016-06-01

Solving the gene duplication problem is a classical approach for species tree inference from gene trees that are confounded by gene duplications. This problem takes a collection of gene trees and seeks a species tree that implies the minimum number of gene duplications. Wilkinson et al. posed the conjecture that the gene duplication problem satisfies the desirable Pareto property for clusters. That is, for every instance of the problem, all clusters that are commonly present in the input gene trees of this instance, called strict consensus, will also be found in every solution to this instance. We prove that this conjecture does not generally hold. Despite this negative result we show that the gene duplication problem satisfies a weaker version of the Pareto property where the strict consensus is found in at least one solution (rather than all solutions). This weaker property contributes to our design of an efficient scalable algorithm for the gene duplication problem. We demonstrate the performance of our algorithm in analyzing large-scale empirical datasets. Finally, we utilize the algorithm to evaluate the accuracy of standard heuristics for the gene duplication problem using simulated datasets.
Node Deployment Algorithm Based on Connected Tree for Underwater Sensor Networks

PubMed Central

Jiang, Peng; Wang, Xingmin; Jiang, Lurong

2015-01-01

Designing an efficient deployment method to guarantee optimal monitoring quality is one of the key topics in underwater sensor networks. At present, a realistic approach of deployment involves adjusting the depths of nodes in water. One of the typical algorithms used in such process is the self-deployment depth adjustment algorithm (SDDA). This algorithm mainly focuses on maximizing network coverage by constantly adjusting node depths to reduce coverage overlaps between two neighboring nodes, and thus, achieves good performance. However, the connectivity performance of SDDA is irresolute. In this paper, we propose a depth adjustment algorithm based on connected tree (CTDA). In CTDA, the sink node is used as the first root node to start building a connected tree. Finally, the network can be organized as a forest to maintain network connectivity. Coverage overlaps between the parent node and the child node are then reduced within each sub-tree to optimize coverage. The hierarchical strategy is used to adjust the distance between the parent node and the child node to reduce node movement. Furthermore, the silent mode is adopted to reduce communication cost. Simulations show that compared with SDDA, CTDA can achieve high connectivity with various communication ranges and different numbers of nodes. Moreover, it can realize coverage as high as that of SDDA with various sensing ranges and numbers of nodes but with less energy consumption. Simulations under sparse environments show that the connectivity and energy consumption performances of CTDA are considerably better than those of SDDA. Meanwhile, the connectivity and coverage performances of CTDA are close to those depth adjustment algorithms base on connected dominating set (CDA), which is an algorithm similar to CTDA. However, the energy consumption of CTDA is less than that of CDA, particularly in sparse underwater environments. PMID:26184209
Amazon Forest Structure from IKONOS Satellite Data and the Automated Characterization of Forest Canopy Properties

Treesearch

Michael Palace; Michael Keller; Gregory P. Asner; Stephen Hagen; Bobby Braswell

2008-01-01

We developed an automated tree crown analysis algorithm using 1-m panchromatic IKONOS satellite images to examine forest canopy structure in the Brazilian Amazon. The algorithm was calibrated on the landscape level with tree geometry and forest stand data at the Fazenda Cauaxi (3.75◦ S, 48.37◦ W) in the eastern Amazon, and then compared with forest...
Vlsi implementation of flexible architecture for decision tree classification in data mining

NASA Astrophysics Data System (ADS)

Sharma, K. Venkatesh; Shewandagn, Behailu; Bhukya, Shankar Nayak

2017-07-01

The Data mining algorithms have become vital to researchers in science, engineering, medicine, business, search and security domains. In recent years, there has been a terrific raise in the size of the data being collected and analyzed. Classification is the main difficulty faced in data mining. In a number of the solutions developed for this problem, most accepted one is Decision Tree Classification (DTC) that gives high precision while handling very large amount of data. This paper presents VLSI implementation of flexible architecture for Decision Tree classification in data mining using c4.5 algorithm.
Logistic regression trees for initial selection of interesting loci in case-control studies

PubMed Central

Nickolov, Radoslav Z; Milanov, Valentin B

2007-01-01

Modern genetic epidemiology faces the challenge of dealing with hundreds of thousands of genetic markers. The selection of a small initial subset of interesting markers for further investigation can greatly facilitate genetic studies. In this contribution we suggest the use of a logistic regression tree algorithm known as logistic tree with unbiased selection. Using the simulated data provided for Genetic Analysis Workshop 15, we show how this algorithm, with incorporation of multifactor dimensionality reduction method, can reduce an initial large pool of markers to a small set that includes the interesting markers with high probability. PMID:18466557
An IPv6 routing lookup algorithm using weight-balanced tree based on prefix value for virtual router

NASA Astrophysics Data System (ADS)

Chen, Lingjiang; Zhou, Shuguang; Zhang, Qiaoduo; Li, Fenghua

2016-10-01

Virtual router enables the coexistence of different networks on the same physical facility and has lately attracted a great deal of attention from researchers. As the number of IPv6 addresses is rapidly increasing in virtual routers, designing an efficient IPv6 routing lookup algorithm is of great importance. In this paper, we present an IPv6 lookup algorithm called weight-balanced tree (WBT). WBT merges Forwarding Information Bases (FIBs) of virtual routers into one spanning tree, and compresses the space cost. WBT's average time complexity and the worst case time complexity of lookup and update process are both O(logN) and space complexity is O(cN) where N is the size of routing table and c is a constant. Experiments show that WBT helps reduce more than 80% Static Random Access Memory (SRAM) cost in comparison to those separation schemes. WBT also achieves the least average search depth comparing with other homogeneous algorithms.
Multipoint to multipoint routing and wavelength assignment in multi-domain optical networks

NASA Astrophysics Data System (ADS)

Qin, Panke; Wu, Jingru; Li, Xudong; Tang, Yongli

2018-01-01

In multi-point to multi-point (MP2MP) routing and wavelength assignment (RWA) problems, researchers usually assume the optical networks to be a single domain. However, the optical networks develop toward to multi-domain and larger scale in practice. In this context, multi-core shared tree (MST)-based MP2MP RWA are introduced problems including optimal multicast domain sequence selection, core nodes belonging in which domains and so on. In this letter, we focus on MST-based MP2MP RWA problems in multi-domain optical networks, mixed integer linear programming (MILP) formulations to optimally construct MP2MP multicast trees is presented. A heuristic algorithm base on network virtualization and weighted clustering algorithm (NV-WCA) is proposed. Simulation results show that, under different traffic patterns, the proposed algorithm achieves significant improvement on network resources occupation and multicast trees setup latency in contrast with the conventional algorithms which were proposed base on a single domain network environment.
Combinatorics of least-squares trees.

PubMed

Mihaescu, Radu; Pachter, Lior

2008-09-09

A recurring theme in the least-squares approach to phylogenetics has been the discovery of elegant combinatorial formulas for the least-squares estimates of edge lengths. These formulas have proved useful for the development of efficient algorithms, and have also been important for understanding connections among popular phylogeny algorithms. For example, the selection criterion of the neighbor-joining algorithm is now understood in terms of the combinatorial formulas of Pauplin for estimating tree length. We highlight a phylogenetically desirable property that weighted least-squares methods should satisfy, and provide a complete characterization of methods that satisfy the property. The necessary and sufficient condition is a multiplicative four-point condition that the variance matrix needs to satisfy. The proof is based on the observation that the Lagrange multipliers in the proof of the Gauss-Markov theorem are tree-additive. Our results generalize and complete previous work on ordinary least squares, balanced minimum evolution, and the taxon-weighted variance model. They also provide a time-optimal algorithm for computation.
Expression patterns of flowering genes in leaves of 'Pineapple' sweet orange [Citrus sinensis (L.) Osbeck] and pummelo (Citrus grandis Osbeck).

PubMed

Pajon, Melanie; Febres, Vicente J; Moore, Gloria A

2017-08-30

In citrus the transition from juvenility to mature phase is marked by the capability of a tree to flower and fruit consistently. The long period of juvenility in citrus severely impedes the use of genetic based strategies to improve fruit quality, disease resistance, and responses to abiotic environmental factors. One of the genes whose expression signals flower development in many plant species is FLOWERING LOCUS T (FT). In this study, gene expression levels of flowering genes CiFT1, CiFT2 and CiFT3 were determined using reverse-transcription quantitative real-time PCR in citrus trees over a 1 year period in Florida. Distinct genotypes of citrus trees of different ages were used. In mature trees of pummelo (Citrus grandis Osbeck) and 'Pineapple' sweet orange (Citrus sinensis (L.) Osbeck) the expression of all three CiFT genes was coordinated and significantly higher in April, after flowering was over, regardless of whether they were in the greenhouse or in the field. Interestingly, immature 'Pineapple' seedlings showed significantly high levels of CiFT3 expression in April and June, while CiFT1 and CiFT2 were highest in June, and hence their expression induction was not simultaneous as in mature plants. In mature citrus trees the induction of CiFTs expression in leaves occurs at the end of spring and after flowering has taken place suggesting it is not associated with dormancy interruption and further flower bud development but is probably involved with shoot apex differentiation and flower bud determination. CiFTs were also seasonally induced in immature seedlings, indicating that additional factors must be suppressing flowering induction and their expression has other functions.
Generalizing Atoms in Constraint Logic

NASA Technical Reports Server (NTRS)

Page, C. David, Jr.; Frisch, Alan M.

1991-01-01

This paper studies the generalization of atomic formulas, or atoms, that are augmented with constraints on or among their terms. The atoms may also be viewed as definite clauses whose antecedents express the constraints. Atoms are generalized relative to a body of background information about the constraints. This paper first examines generalization of atoms with only monadic constraints. The paper develops an algorithm for the generalization task and discusses algorithm complexity. It then extends the algorithm to apply to atoms with constraints of arbitrary arity. The paper also presents semantic properties of the generalizations computed by the algorithms, making the algorithms applicable to such problems as abduction, induction, and knowledge base verification. The paper emphasizes the application to induction and presents a pac-learning result for constrained atoms.
Determination of Heavy Metals in Almonds and Mistletoe as a Parasite Growing on the Almond Tree Using ICP-OES or ICP-MS.

PubMed

Kamar, Veysi; Dağalp, Rukiye; Taştekin, Mustafa

2017-12-28

In this study, the elements of Al, As, B, Ba, Ca, Cd, Co, Cr, Cu, Fe, K, Mg, Mn, Mo, Ni, Sr, Pb, Ti, and Zn were determined in the leaves, fruits, and branches of mistletoe, (Viscum albüm L.), used as a medicinal plant, and in the leaves, branches and barks of almond tree which mistletoe grows on. The aim of the study is to investigate whether the mistletoe are more absorbent than the almond tree in terms of the heavy metal contents and the determination of the amount of the elements penetrated into the mistletoe from the almond tree. ICP-MS (inductively coupled plasma-mass spectrometry) was used for the analysis of As, Cd, Mo, and Pb, whereas ICP-OES (inductively coupled plasma optical emission spectrometry) was used for the other elements. The results obtained were statistically evaluated at 95% confidence level. Within the results obtained in this study, it was determined whether there is a significant difference between metal elements in almond tree and mistletoe, or not. As a result, it was observed that there were higher contents of B, Ba, K, Mg, and Zn in the mistletoe than in the almond tree. K was found much higher than other elements in the mistletoe. On the other hand, Al, As, Ca, Cd, Cr, Cu, Fe, Mo, Ni, Sr, Pb, and Ti contents were determined to be more in almond tree than mistletoe.
Visual management of large scale data mining projects.

PubMed

Shah, I; Hunter, L

2000-01-01

This paper describes a unified framework for visualizing the preparations for, and results of, hundreds of machine learning experiments. These experiments were designed to improve the accuracy of enzyme functional predictions from sequence, and in many cases were successful. Our system provides graphical user interfaces for defining and exploring training datasets and various representational alternatives, for inspecting the hypotheses induced by various types of learning algorithms, for visualizing the global results, and for inspecting in detail results for specific training sets (functions) and examples (proteins). The visualization tools serve as a navigational aid through a large amount of sequence data and induced knowledge. They provided significant help in understanding both the significance and the underlying biological explanations of our successes and failures. Using these visualizations it was possible to efficiently identify weaknesses of the modular sequence representations and induction algorithms which suggest better learning strategies. The context in which our data mining visualization toolkit was developed was the problem of accurately predicting enzyme function from protein sequence data. Previous work demonstrated that approximately 6% of enzyme protein sequences are likely to be assigned incorrect functions on the basis of sequence similarity alone. In order to test the hypothesis that more detailed sequence analysis using machine learning techniques and modular domain representations could address many of these failures, we designed a series of more than 250 experiments using information-theoretic decision tree induction and naive Bayesian learning on local sequence domain representations of problematic enzyme function classes. In more than half of these cases, our methods were able to perfectly discriminate among various possible functions of similar sequences. We developed and tested our visualization techniques on this application.
Fault detection and diagnosis of induction motors using motor current signature analysis and a hybrid FMM-CART model.

PubMed

Seera, Manjeevan; Lim, Chee Peng; Ishak, Dahaman; Singh, Harapajan

2012-01-01

In this paper, a novel approach to detect and classify comprehensive fault conditions of induction motors using a hybrid fuzzy min-max (FMM) neural network and classification and regression tree (CART) is proposed. The hybrid model, known as FMM-CART, exploits the advantages of both FMM and CART for undertaking data classification and rule extraction problems. A series of real experiments is conducted, whereby the motor current signature analysis method is applied to form a database comprising stator current signatures under different motor conditions. The signal harmonics from the power spectral density are extracted as discriminative input features for fault detection and classification with FMM-CART. A comprehensive list of induction motor fault conditions, viz., broken rotor bars, unbalanced voltages, stator winding faults, and eccentricity problems, has been successfully classified using FMM-CART with good accuracy rates. The results are comparable, if not better, than those reported in the literature. Useful explanatory rules in the form of a decision tree are also elicited from FMM-CART to analyze and understand different fault conditions of induction motors.
Application of decision tree model for the ground subsidence hazard mapping near abandoned underground coal mines.

PubMed

Lee, Saro; Park, Inhye

2013-09-30

Subsidence of ground caused by underground mines poses hazards to human life and property. This study analyzed the hazard to ground subsidence using factors that can affect ground subsidence and a decision tree approach in a geographic information system (GIS). The study area was Taebaek, Gangwon-do, Korea, where many abandoned underground coal mines exist. Spatial data, topography, geology, and various ground-engineering data for the subsidence area were collected and compiled in a database for mapping ground-subsidence hazard (GSH). The subsidence area was randomly split 50/50 for training and validation of the models. A data-mining classification technique was applied to the GSH mapping, and decision trees were constructed using the chi-squared automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. The frequency ratio model was also applied to the GSH mapping for comparing with probabilistic model. The resulting GSH maps were validated using area-under-the-curve (AUC) analysis with the subsidence area data that had not been used for training the model. The highest accuracy was achieved by the decision tree model using CHAID algorithm (94.01%) comparing with QUEST algorithms (90.37%) and frequency ratio model (86.70%). These accuracies are higher than previously reported results for decision tree. Decision tree methods can therefore be used efficiently for GSH analysis and might be widely used for prediction of various spatial events. Copyright © 2013. Published by Elsevier Ltd.
Automated Reconstruction of Neural Trees Using Front Re-initialization

PubMed Central

Mukherjee, Amit; Stepanyants, Armen

2013-01-01

This paper proposes a greedy algorithm for automated reconstruction of neural arbors from light microscopy stacks of images. The algorithm is based on the minimum cost path method. While the minimum cost path, obtained using the Fast Marching Method, results in a trace with the least cumulative cost between the start and the end points, it is not sufficient for the reconstruction of neural trees. This is because sections of the minimum cost path can erroneously travel through the image background with undetectable detriment to the cumulative cost. To circumvent this problem we propose an algorithm that grows a neural tree from a specified root by iteratively re-initializing the Fast Marching fronts. The speed image used in the Fast Marching Method is generated by computing the average outward flux of the gradient vector flow field. Each iteration of the algorithm produces a candidate extension by allowing the front to travel a specified distance and then tracking from the farthest point of the front back to the tree. Robust likelihood ratio test is used to evaluate the quality of the candidate extension by comparing voxel intensities along the extension to those in the foreground and the background. The qualified extensions are appended to the current tree, the front is re-initialized, and Fast Marching is continued until the stopping criterion is met. To evaluate the performance of the algorithm we reconstructed 6 stacks of two-photon microscopy images and compared the results to the ground truth reconstructions by using the DIADEM metric. The average comparison score was 0.82 out of 1.0, which is on par with the performance achieved by expert manual tracers. PMID:24386539
Occupancy schedules learning process through a data mining framework

DOE Office of Scientific and Technical Information (OSTI.GOV)

D'Oca, Simona; Hong, Tianzhen

Building occupancy is a paramount factor in building energy simulations. Specifically, lighting, plug loads, HVAC equipment utilization, fresh air requirements and internal heat gain or loss greatly depends on the level of occupancy within a building. Developing the appropriate methodologies to describe and reproduce the intricate network responsible for human-building interactions are needed. Extrapolation of patterns from big data streams is a powerful analysis technique which will allow for a better understanding of energy usage in buildings. A three-step data mining framework is applied to discover occupancy patterns in office spaces. First, a data set of 16 offices with 10more » minute interval occupancy data, over a two year period is mined through a decision tree model which predicts the occupancy presence. Then a rule induction algorithm is used to learn a pruned set of rules on the results from the decision tree model. Finally, a cluster analysis is employed in order to obtain consistent patterns of occupancy schedules. Furthermore, the identified occupancy rules and schedules are representative as four archetypal working profiles that can be used as input to current building energy modeling programs, such as EnergyPlus or IDA-ICE, to investigate impact of occupant presence on design, operation and energy use in office buildings.« less
Occupancy schedules learning process through a data mining framework

DOE PAGES

D'Oca, Simona; Hong, Tianzhen

2014-12-17

Building occupancy is a paramount factor in building energy simulations. Specifically, lighting, plug loads, HVAC equipment utilization, fresh air requirements and internal heat gain or loss greatly depends on the level of occupancy within a building. Developing the appropriate methodologies to describe and reproduce the intricate network responsible for human-building interactions are needed. Extrapolation of patterns from big data streams is a powerful analysis technique which will allow for a better understanding of energy usage in buildings. A three-step data mining framework is applied to discover occupancy patterns in office spaces. First, a data set of 16 offices with 10more » minute interval occupancy data, over a two year period is mined through a decision tree model which predicts the occupancy presence. Then a rule induction algorithm is used to learn a pruned set of rules on the results from the decision tree model. Finally, a cluster analysis is employed in order to obtain consistent patterns of occupancy schedules. Furthermore, the identified occupancy rules and schedules are representative as four archetypal working profiles that can be used as input to current building energy modeling programs, such as EnergyPlus or IDA-ICE, to investigate impact of occupant presence on design, operation and energy use in office buildings.« less
Secure Multicast Tree Structure Generation Method for Directed Diffusion Using A* Algorithms

NASA Astrophysics Data System (ADS)

Kim, Jin Myoung; Lee, Hae Young; Cho, Tae Ho

The application of wireless sensor networks to areas such as combat field surveillance, terrorist tracking, and highway traffic monitoring requires secure communication among the sensor nodes within the networks. Logical key hierarchy (LKH) is a tree based key management model which provides secure group communication. When a sensor node is added or evicted from the communication group, LKH updates the group key in order to ensure the security of the communications. In order to efficiently update the group key in directed diffusion, we propose a method for secure multicast tree structure generation, an extension to LKH that reduces the number of re-keying messages by considering the addition and eviction ratios of the history data. For the generation of the proposed key tree structure the A* algorithm is applied, in which the branching factor at each level can take on different value. The experiment results demonstrate the efficiency of the proposed key tree structure against the existing key tree structures of fixed branching factors.
A junction-tree based learning algorithm to optimize network wide traffic control: A coordinated multi-agent framework

DOE PAGES

Zhu, Feng; Aziz, H. M. Abdul; Qian, Xinwu; ...

2015-01-31

Our study develops a novel reinforcement learning algorithm for the challenging coordinated signal control problem. Traffic signals are modeled as intelligent agents interacting with the stochastic traffic environment. The model is built on the framework of coordinated reinforcement learning. The Junction Tree Algorithm (JTA) based reinforcement learning is proposed to obtain an exact inference of the best joint actions for all the coordinated intersections. Moreover, the algorithm is implemented and tested with a network containing 18 signalized intersections in VISSIM. Finally, our results show that the JTA based algorithm outperforms independent learning (Q-learning), real-time adaptive learning, and fixed timing plansmore » in terms of average delay, number of stops, and vehicular emissions at the network level.« less

Applied Swarm-based medicine: collecting decision trees for patterns of algorithms analysis.

PubMed

Panje, Cédric M; Glatzer, Markus; von Rappard, Joscha; Rothermundt, Christian; Hundsberger, Thomas; Zumstein, Valentin; Plasswilm, Ludwig; Putora, Paul Martin

2017-08-16

The objective consensus methodology has recently been applied in consensus finding in several studies on medical decision-making among clinical experts or guidelines. The main advantages of this method are an automated analysis and comparison of treatment algorithms of the participating centers which can be performed anonymously. Based on the experience from completed consensus analyses, the main steps for the successful implementation of the objective consensus methodology were identified and discussed among the main investigators. The following steps for the successful collection and conversion of decision trees were identified and defined in detail: problem definition, population selection, draft input collection, tree conversion, criteria adaptation, problem re-evaluation, results distribution and refinement, tree finalisation, and analysis. This manuscript provides information on the main steps for successful collection of decision trees and summarizes important aspects at each point of the analysis.
Prediction of Baseflow Index of Catchments using Machine Learning Algorithms

NASA Astrophysics Data System (ADS)

Yadav, B.; Hatfield, K.

2017-12-01

We present the results of eight machine learning techniques for predicting the baseflow index (BFI) of ungauged basins using a surrogate of catchment scale climate and physiographic data. The tested algorithms include ordinary least squares, ridge regression, least absolute shrinkage and selection operator (lasso), elasticnet, support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Our work seeks to identify the dominant controls of BFI that can be readily obtained from ancillary geospatial databases and remote sensing measurements, such that the developed techniques can be extended to ungauged catchments. More than 800 gauged catchments spanning the continental United States were selected to develop the general methodology. The BFI calculation was based on the baseflow separated from daily streamflow hydrograph using HYSEP filter. The surrogate catchment attributes were compiled from multiple sources including digital elevation model, soil, landuse, climate data, other publicly available ancillary and geospatial data. 80% catchments were used to train the ML algorithms, and the remaining 20% of the catchments were used as an independent test set to measure the generalization performance of fitted models. A k-fold cross-validation using exhaustive grid search was used to fit the hyperparameters of each model. Initial model development was based on 19 independent variables, but after variable selection and feature ranking, we generated revised sparse models of BFI prediction that are based on only six catchment attributes. These key predictive variables selected after the careful evaluation of bias-variance tradeoff include average catchment elevation, slope, fraction of sand, permeability, temperature, and precipitation. The most promising algorithms exceeding an accuracy score (r-square) of 0.7 on test data include support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Considering both the accuracy and the computational complexity of these algorithms, we identify the extremely randomized trees as the best performing algorithm for BFI prediction in ungauged basins.
A decision tree algorithm for investigation of model biases related to dynamical cores and physical parameterizations: CESM/CAM EVALUATION BY DECISION TREES

DOE Office of Scientific and Technical Information (OSTI.GOV)

Soner Yorgun, M.; Rood, Richard B.

An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center of Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smoothmore » topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena due to the spectral transform method of CAM Eulerian spectral dynamical core is prominent, and is an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, both in horizontal and vertical dimensions, have significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study about the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly mean) analysis over local scales.« less
A decision tree algorithm for investigation of model biases related to dynamical cores and physical parameterizations: CESM/CAM EVALUATION BY DECISION TREES

DOE PAGES

Soner Yorgun, M.; Rood, Richard B.

2016-11-11

An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center of Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smoothmore » topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena due to the spectral transform method of CAM Eulerian spectral dynamical core is prominent, and is an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, both in horizontal and vertical dimensions, have significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study about the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly mean) analysis over local scales.« less
Towards a hybrid energy efficient multi-tree-based optimized routing protocol for wireless networks.

PubMed

Mitton, Nathalie; Razafindralambo, Tahiry; Simplot-Ryl, David; Stojmenovic, Ivan

2012-12-13

This paper considers the problem of designing power efficient routing with guaranteed delivery for sensor networks with unknown geographic locations. We propose HECTOR, a hybrid energy efficient tree-based optimized routing protocol, based on two sets of virtual coordinates. One set is based on rooted tree coordinates, and the other is based on hop distances toward several landmarks. In HECTOR, the node currently holding the packet forwards it to its neighbor that optimizes ratio of power cost over distance progress with landmark coordinates, among nodes that reduce landmark coordinates and do not increase distance in tree coordinates. If such a node does not exist, then forwarding is made to the neighbor that reduces tree-based distance only and optimizes power cost over tree distance progress ratio. We theoretically prove the packet delivery and propose an extension based on the use of multiple trees. Our simulations show the superiority of our algorithm over existing alternatives while guaranteeing delivery, and only up to 30% additional power compared to centralized shortest weighted path algorithm.
Towards a Hybrid Energy Efficient Multi-Tree-Based Optimized Routing Protocol for Wireless Networks

PubMed Central

Mitton, Nathalie; Razafindralambo, Tahiry; Simplot-Ryl, David; Stojmenovic, Ivan

2012-01-01

This paper considers the problem of designing power efficient routing with guaranteed delivery for sensor networks with unknown geographic locations. We propose HECTOR, a hybrid energy efficient tree-based optimized routing protocol, based on two sets of virtual coordinates. One set is based on rooted tree coordinates, and the other is based on hop distances toward several landmarks. In HECTOR, the node currently holding the packet forwards it to its neighbor that optimizes ratio of power cost over distance progress with landmark coordinates, among nodes that reduce landmark coordinates and do not increase distance in tree coordinates. If such a node does not exist, then forwarding is made to the neighbor that reduces tree-based distance only and optimizes power cost over tree distance progress ratio. We theoretically prove the packet delivery and propose an extension based on the use of multiple trees. Our simulations show the superiority of our algorithm over existing alternatives while guaranteeing delivery, and only up to 30% additional power compared to centralized shortest weighted path algorithm. PMID:23443398
Resolution of the 1D regularized Burgers equation using a spatial wavelet approximation

NASA Technical Reports Server (NTRS)

Liandrat, J.; Tchamitchian, PH.

1990-01-01

The Burgers equation with a small viscosity term, initial and periodic boundary conditions is resolved using a spatial approximation constructed from an orthonormal basis of wavelets. The algorithm is directly derived from the notions of multiresolution analysis and tree algorithms. Before the numerical algorithm is described these notions are first recalled. The method uses extensively the localization properties of the wavelets in the physical and Fourier spaces. Moreover, the authors take advantage of the fact that the involved linear operators have constant coefficients. Finally, the algorithm can be considered as a time marching version of the tree algorithm. The most important point is that an adaptive version of the algorithm exists: it allows one to reduce in a significant way the number of degrees of freedom required for a good computation of the solution. Numerical results and description of the different elements of the algorithm are provided in combination with different mathematical comments on the method and some comparison with more classical numerical algorithms.
A High Performance Computing Approach to Tree Cover Delineation in 1-m NAIP Imagery Using a Probabilistic Learning Framework

NASA Technical Reports Server (NTRS)

Basu, Saikat; Ganguly, Sangram; Michaelis, Andrew; Votava, Petr; Roy, Anshuman; Mukhopadhyay, Supratik; Nemani, Ramakrishna

2015-01-01

Tree cover delineation is a useful instrument in deriving Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) airborne imagery data. Numerous algorithms have been designed to address this problem, but most of them do not scale to these datasets, which are of the order of terabytes. In this paper, we present a semi-automated probabilistic framework for the segmentation and classification of 1-m National Agriculture Imagery Program (NAIP) for tree-cover delineation for the whole of Continental United States, using a High Performance Computing Architecture. Classification is performed using a multi-layer Feedforward Backpropagation Neural Network and segmentation is performed using a Statistical Region Merging algorithm. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field, which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by relabeling misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the whole state of California, spanning a total of 11,095 NAIP tiles covering a total geographical area of 163,696 sq. miles. The framework produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates lower than 2% for both landscapes. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR canopy height model (CHM) showed the effectiveness of our framework for generating accurate high-resolution tree-cover maps.
A High Performance Computing Approach to Tree Cover Delineation in 1-m NAIP Imagery using a Probabilistic Learning Framework

NASA Astrophysics Data System (ADS)

Basu, S.; Ganguly, S.; Michaelis, A.; Votava, P.; Roy, A.; Mukhopadhyay, S.; Nemani, R. R.

2015-12-01

Tree cover delineation is a useful instrument in deriving Above Ground Biomass (AGB) density estimates from Very High Resolution (VHR) airborne imagery data. Numerous algorithms have been designed to address this problem, but most of them do not scale to these datasets which are of the order of terabytes. In this paper, we present a semi-automated probabilistic framework for the segmentation and classification of 1-m National Agriculture Imagery Program (NAIP) for tree-cover delineation for the whole of Continental United States, using a High Performance Computing Architecture. Classification is performed using a multi-layer Feedforward Backpropagation Neural Network and segmentation is performed using a Statistical Region Merging algorithm. The results from the classification and segmentation algorithms are then consolidated into a structured prediction framework using a discriminative undirected probabilistic graphical model based on Conditional Random Field, which helps in capturing the higher order contextual dependencies between neighboring pixels. Once the final probability maps are generated, the framework is updated and re-trained by relabeling misclassified image patches. This leads to a significant improvement in the true positive rates and reduction in false positive rates. The tree cover maps were generated for the whole state of California, spanning a total of 11,095 NAIP tiles covering a total geographical area of 163,696 sq. miles. The framework produced true positive rates of around 88% for fragmented forests and 74% for urban tree cover areas, with false positive rates lower than 2% for both landscapes. Comparative studies with the National Land Cover Data (NLCD) algorithm and the LiDAR canopy height model (CHM) showed the effectiveness of our framework for generating accurate high-resolution tree-cover maps.
hs-CRP is strongly associated with coronary heart disease (CHD): A data mining approach using decision tree algorithm.

PubMed

Tayefi, Maryam; Tajfard, Mohammad; Saffar, Sara; Hanachi, Parichehr; Amirabadizadeh, Ali Reza; Esmaeily, Habibollah; Taghipour, Ali; Ferns, Gordon A; Moohebati, Mohsen; Ghayour-Mobarhan, Majid

2017-04-01

Coronary heart disease (CHD) is an important public health problem globally. Algorithms incorporating the assessment of clinical biomarkers together with several established traditional risk factors can help clinicians to predict CHD and support clinical decision making with respect to interventions. Decision tree (DT) is a data mining model for extracting hidden knowledge from large databases. We aimed to establish a predictive model for coronary heart disease using a decision tree algorithm. Here we used a dataset of 2346 individuals including 1159 healthy participants and 1187 participant who had undergone coronary angiography (405 participants with negative angiography and 782 participants with positive angiography). We entered 10 variables of a total 12 variables into the DT algorithm (including age, sex, FBG, TG, hs-CRP, TC, HDL, LDL, SBP and DBP). Our model could identify the associated risk factors of CHD with sensitivity, specificity, accuracy of 96%, 87%, 94% and respectively. Serum hs-CRP levels was at top of the tree in our model, following by FBG, gender and age. Our model appears to be an accurate, specific and sensitive model for identifying the presence of CHD, but will require validation in prospective studies. Copyright © 2017 Elsevier B.V. All rights reserved.
A Depth-Adjustment Deployment Algorithm Based on Two-Dimensional Convex Hull and Spanning Tree for Underwater Wireless Sensor Networks.

PubMed

Jiang, Peng; Liu, Shuai; Liu, Jun; Wu, Feng; Zhang, Le

2016-07-14

Most of the existing node depth-adjustment deployment algorithms for underwater wireless sensor networks (UWSNs) just consider how to optimize network coverage and connectivity rate. However, these literatures don't discuss full network connectivity, while optimization of network energy efficiency and network reliability are vital topics for UWSN deployment. Therefore, in this study, a depth-adjustment deployment algorithm based on two-dimensional (2D) convex hull and spanning tree (NDACS) for UWSNs is proposed. First, the proposed algorithm uses the geometric characteristics of a 2D convex hull and empty circle to find the optimal location of a sleep node and activate it, minimizes the network coverage overlaps of the 2D plane, and then increases the coverage rate until the first layer coverage threshold is reached. Second, the sink node acts as a root node of all active nodes on the 2D convex hull and then forms a small spanning tree gradually. Finally, the depth-adjustment strategy based on time marker is used to achieve the three-dimensional overall network deployment. Compared with existing depth-adjustment deployment algorithms, the simulation results show that the NDACS algorithm can maintain full network connectivity with high network coverage rate, as well as improved network average node degree, thus increasing network reliability.
A Depth-Adjustment Deployment Algorithm Based on Two-Dimensional Convex Hull and Spanning Tree for Underwater Wireless Sensor Networks

PubMed Central

Jiang, Peng; Liu, Shuai; Liu, Jun; Wu, Feng; Zhang, Le

2016-01-01

Most of the existing node depth-adjustment deployment algorithms for underwater wireless sensor networks (UWSNs) just consider how to optimize network coverage and connectivity rate. However, these literatures don’t discuss full network connectivity, while optimization of network energy efficiency and network reliability are vital topics for UWSN deployment. Therefore, in this study, a depth-adjustment deployment algorithm based on two-dimensional (2D) convex hull and spanning tree (NDACS) for UWSNs is proposed. First, the proposed algorithm uses the geometric characteristics of a 2D convex hull and empty circle to find the optimal location of a sleep node and activate it, minimizes the network coverage overlaps of the 2D plane, and then increases the coverage rate until the first layer coverage threshold is reached. Second, the sink node acts as a root node of all active nodes on the 2D convex hull and then forms a small spanning tree gradually. Finally, the depth-adjustment strategy based on time marker is used to achieve the three-dimensional overall network deployment. Compared with existing depth-adjustment deployment algorithms, the simulation results show that the NDACS algorithm can maintain full network connectivity with high network coverage rate, as well as improved network average node degree, thus increasing network reliability. PMID:27428970
AntiClustal: Multiple Sequence Alignment by antipole clustering and linear approximate 1-median computation.

PubMed

Di Pietro, C; Di Pietro, V; Emmanuele, G; Ferro, A; Maugeri, T; Modica, E; Pigola, G; Pulvirenti, A; Purrello, M; Ragusa, M; Scalia, M; Shasha, D; Travali, S; Zimmitti, V

2003-01-01

In this paper we present a new Multiple Sequence Alignment (MSA) algorithm called AntiClusAl. The method makes use of the commonly use idea of aligning homologous sequences belonging to classes generated by some clustering algorithm, and then continue the alignment process ina bottom-up way along a suitable tree structure. The final result is then read at the root of the tree. Multiple sequence alignment in each cluster makes use of the progressive alignment with the 1-median (center) of the cluster. The 1-median of set S of sequences is the element of S which minimizes the average distance from any other sequence in S. Its exact computation requires quadratic time. The basic idea of our proposed algorithm is to make use of a simple and natural algorithmic technique based on randomized tournaments which has been successfully applied to large size search problems in general metric spaces. In particular a clustering algorithm called Antipole tree and an approximate linear 1-median computation are used. Our algorithm compared with Clustal W, a widely used tool to MSA, shows a better running time results with fully comparable alignment quality. A successful biological application showing high aminoacid conservation during evolution of Xenopus laevis SOD2 is also cited.
How Hierarchical Topics Evolve in Large Text Corpora.

PubMed

Cui, Weiwei; Liu, Shixia; Wu, Zhuofeng; Wei, Hao

2014-12-01

Using a sequence of topic trees to organize documents is a popular way to represent hierarchical and evolving topics in text corpora. However, following evolving topics in the context of topic trees remains difficult for users. To address this issue, we present an interactive visual text analysis approach to allow users to progressively explore and analyze the complex evolutionary patterns of hierarchical topics. The key idea behind our approach is to exploit a tree cut to approximate each tree and allow users to interactively modify the tree cuts based on their interests. In particular, we propose an incremental evolutionary tree cut algorithm with the goal of balancing 1) the fitness of each tree cut and the smoothness between adjacent tree cuts; 2) the historical and new information related to user interests. A time-based visualization is designed to illustrate the evolving topics over time. To preserve the mental map, we develop a stable layout algorithm. As a result, our approach can quickly guide users to progressively gain profound insights into evolving hierarchical topics. We evaluate the effectiveness of the proposed method on Amazon's Mechanical Turk and real-world news data. The results show that users are able to successfully analyze evolving topics in text data.
Detection of dead standing Eucalyptus camaldulensis without tree delineation for managing biodiversity in native Australian forest

NASA Astrophysics Data System (ADS)

Miltiadou, Milto; Campbell, Neil D. F.; Gonzalez Aracil, Susana; Brown, Tony; Grant, Michael G.

2018-05-01

In Australia, many birds and arboreal animals use hollows for shelters, but studies predict shortage of hollows in near future. Aged dead trees are more likely to contain hollows and therefore automated detection of them plays a substantial role in preserving biodiversity and consequently maintaining a resilient ecosystem. For this purpose full-waveform LiDAR data were acquired from a native Eucalypt forest in Southern Australia. The structure of the forest significantly varies in terms of tree density, age and height. Additionally, Eucalyptus camaldulensis have multiple trunk splits making tree delineation very challenging. For that reason, this paper investigates automated detection of dead standing Eucalyptus camaldulensis without tree delineation. It also presents the new feature of the open source software DASOS, which extracts features for 3D object detection in voxelised FW LiDAR. A random forest classifier, a weighted-distance KNN algorithm and a seed growth algorithm are used to create a 2D probabilistic field and to then predict potential positions of dead trees. It is shown that tree health assessment is possible without tree delineation but since it is a new research directions there are many improvements to be made.
An efficient group multicast routing for multimedia communication

NASA Astrophysics Data System (ADS)

Wang, Yanlin; Sun, Yugen; Yan, Xinfang

2004-04-01

Group multicasting is a kind of communication mechanism whereby each member of a group sends messages to all the other members of the same group. Group multicast routing algorithms capable of satisfying quality of service (QoS) requirements of multimedia applications are essential for high-speed networks. We present a heuristic algorithm for group multicast routing with end to end delay constraint. Source-specific routing trees for each member are generated in our algorithm, which satisfy member"s bandwidth and end to end delay requirements. Simulations over random network were carried out to compare proposed algorithm performance with Low and Song"s. The experimental results show that our proposed algorithm performs better in terms of network cost and ability in constructing feasible multicast trees for group members. Moreover, our algorithm achieves good performance in balancing traffic, which can avoid link blocking and enhance the network behavior efficiently.
Enhancement of Fast Face Detection Algorithm Based on a Cascade of Decision Trees

NASA Astrophysics Data System (ADS)

Khryashchev, V. V.; Lebedev, A. A.; Priorov, A. L.

2017-05-01

Face detection algorithm based on a cascade of ensembles of decision trees (CEDT) is presented. The new approach allows detecting faces other than the front position through the use of multiple classifiers. Each classifier is trained for a specific range of angles of the rotation head. The results showed a high rate of productivity for CEDT on images with standard size. The algorithm increases the area under the ROC-curve of 13% compared to a standard Viola-Jones face detection algorithm. Final realization of given algorithm consist of 5 different cascades for frontal/non-frontal faces. One more thing which we take from the simulation results is a low computational complexity of CEDT algorithm in comparison with standard Viola-Jones approach. This could prove important in the embedded system and mobile device industries because it can reduce the cost of hardware and make battery life longer.
Using traveling salesman problem algorithms for evolutionary tree construction.

PubMed

Korostensky, C; Gonnet, G H

2000-07-01

The construction of evolutionary trees is one of the major problems in computational biology, mainly due to its complexity. We present a new tree construction method that constructs a tree with minimum score for a given set of sequences, where the score is the amount of evolution measured in PAM distances. To do this, the problem of tree construction is reduced to the Traveling Salesman Problem (TSP). The input for the TSP algorithm are the pairwise distances of the sequences and the output is a circular tour through the optimal, unknown tree plus the minimum score of the tree. The circular order and the score can be used to construct the topology of the optimal tree. Our method can be used for any scoring function that correlates to the amount of changes along the branches of an evolutionary tree, for instance it could also be used for parsimony scores, but it cannot be used for least squares fit of distances. A TSP solution reduces the space of all possible trees to 2n. Using this order, we can guarantee that we reconstruct a correct evolutionary tree if the absolute value of the error for each distance measurement is smaller than f2.gif" BORDER="0">, where f3.gif" BORDER="0">is the length of the shortest edge in the tree. For data sets with large errors, a dynamic programming approach is used to reconstruct the tree. Finally simulations and experiments with real data are shown.
DupTree: a program for large-scale phylogenetic analyses using gene tree parsimony.

PubMed

Wehe, André; Bansal, Mukul S; Burleigh, J Gordon; Eulenstein, Oliver

2008-07-01

DupTree is a new software program for inferring rooted species trees from collections of gene trees using the gene tree parsimony approach. The program implements a novel algorithm that significantly improves upon the run time of standard search heuristics for gene tree parsimony, and enables the first truly genome-scale phylogenetic analyses. In addition, DupTree allows users to examine alternate rootings and to weight the reconciliation costs for gene trees. DupTree is an open source project written in C++. DupTree for Mac OS X, Windows, and Linux along with a sample dataset and an on-line manual are available at http://genome.cs.iastate.edu/CBL/DupTree
Comparison of Naive Bayes and Decision Tree on Feature Selection Using Genetic Algorithm for Classification Problem

NASA Astrophysics Data System (ADS)

Rahmadani, S.; Dongoran, A.; Zarlis, M.; Zakarias

2018-03-01

This paper discusses the problem of feature selection using genetic algorithms on a dataset for classification problems. The classification model used is the decicion tree (DT), and Naive Bayes. In this paper we will discuss how the Naive Bayes and Decision Tree models to overcome the classification problem in the dataset, where the dataset feature is selectively selected using GA. Then both models compared their performance, whether there is an increase in accuracy or not. From the results obtained shows an increase in accuracy if the feature selection using GA. The proposed model is referred to as GADT (GA-Decision Tree) and GANB (GA-Naive Bayes). The data sets tested in this paper are taken from the UCI Machine Learning repository.

A stepwise regression tree for nonlinear approximation: applications to estimating subpixel land cover

USGS Publications Warehouse

Huang, C.; Townshend, J.R.G.

2003-01-01

A stepwise regression tree (SRT) algorithm was developed for approximating complex nonlinear relationships. Based on the regression tree of Breiman et al . (BRT) and a stepwise linear regression (SLR) method, this algorithm represents an improvement over SLR in that it can approximate nonlinear relationships and over BRT in that it gives more realistic predictions. The applicability of this method to estimating subpixel forest was demonstrated using three test data sets, on all of which it gave more accurate predictions than SLR and BRT. SRT also generated more compact trees and performed better than or at least as well as BRT at all 10 equal forest proportion interval ranging from 0 to 100%. This method is appealing to estimating subpixel land cover over large areas.
Improving generalized inverted index lock wait times

NASA Astrophysics Data System (ADS)

Borodin, A.; Mirvoda, S.; Porshnev, S.; Ponomareva, O.

2018-01-01

Concurrent operations on tree like data structures is a cornerstone of any database system. Concurrent operations intended for improving read\\write performance and usually implemented via some way of locking. Deadlock-free methods of concurrency control are known as tree locking protocols. These protocols provide basic operations(verbs) and algorithm (ways of operation invocations) for applying it to any tree-like data structure. These algorithms operate on data, managed by storage engine which are very different among RDBMS implementations. In this paper, we discuss tree locking protocol implementation for General inverted index (Gin) applied to multiversion concurrency control (MVCC) storage engine inside PostgreSQL RDBMS. After that we introduce improvements to locking protocol and provide usage statistics about evaluation of our improvement in very high load environment in one of the world’s largest IT company.
Integrated pipeline for inferring the evolutionary history of a gene family embedded in the species tree: a case study on the STIMATE gene family.

PubMed

Song, Jia; Zheng, Sisi; Nguyen, Nhung; Wang, Youjun; Zhou, Yubin; Lin, Kui

2017-10-03

Because phylogenetic inference is an important basis for answering many evolutionary problems, a large number of algorithms have been developed. Some of these algorithms have been improved by integrating gene evolution models with the expectation of accommodating the hierarchy of evolutionary processes. To the best of our knowledge, however, there still is no single unifying model or algorithm that can take all evolutionary processes into account through a stepwise or simultaneous method. On the basis of three existing phylogenetic inference algorithms, we built an integrated pipeline for inferring the evolutionary history of a given gene family; this pipeline can model gene sequence evolution, gene duplication-loss, gene transfer and multispecies coalescent processes. As a case study, we applied this pipeline to the STIMATE (TMEM110) gene family, which has recently been reported to play an important role in store-operated Ca 2+ entry (SOCE) mediated by ORAI and STIM proteins. We inferred their phylogenetic trees in 69 sequenced chordate genomes. By integrating three tree reconstruction algorithms with diverse evolutionary models, a pipeline for inferring the evolutionary history of a gene family was developed, and its application was demonstrated.
Isosurface Extraction in Time-Varying Fields Using a Temporal Hierarchical Index Tree

NASA Technical Reports Server (NTRS)

Shen, Han-Wei; Gerald-Yamasaki, Michael (Technical Monitor)

1998-01-01

Many high-performance isosurface extraction algorithms have been proposed in the past several years as a result of intensive research efforts. When applying these algorithms to large-scale time-varying fields, the storage overhead incurred from storing the search index often becomes overwhelming. this paper proposes an algorithm for locating isosurface cells in time-varying fields. We devise a new data structure, called Temporal Hierarchical Index Tree, which utilizes the temporal coherence that exists in a time-varying field and adoptively coalesces the cells' extreme values over time; the resulting extreme values are then used to create the isosurface cell search index. For a typical time-varying scalar data set, not only does this temporal hierarchical index tree require much less storage space, but also the amount of I/O required to access the indices from the disk at different time steps is substantially reduced. We illustrate the utility and speed of our algorithm with data from several large-scale time-varying CID simulations. Our algorithm can achieve more than 80% of disk-space savings when compared with the existing techniques, while the isosurface extraction time is nearly optimal.
A practical approximation algorithm for solving massive instances of hybridization number for binary and nonbinary trees.

PubMed

van Iersel, Leo; Kelk, Steven; Lekić, Nela; Scornavacca, Celine

2014-05-05

Reticulate events play an important role in determining evolutionary relationships. The problem of computing the minimum number of such events to explain discordance between two phylogenetic trees is a hard computational problem. Even for binary trees, exact solvers struggle to solve instances with reticulation number larger than 40-50. Here we present CycleKiller and NonbinaryCycleKiller, the first methods to produce solutions verifiably close to optimality for instances with hundreds or even thousands of reticulations. Using simulations, we demonstrate that these algorithms run quickly for large and difficult instances, producing solutions that are very close to optimality. As a spin-off from our simulations we also present TerminusEst, which is the fastest exact method currently available that can handle nonbinary trees: this is used to measure the accuracy of the NonbinaryCycleKiller algorithm. All three methods are based on extensions of previous theoretical work (SIDMA 26(4):1635-1656, TCBB 10(1):18-25, SIDMA 28(1):49-66) and are publicly available. We also apply our methods to real data.
Repetitive Reaction and Restitution (R3) induction of drought hardiness in conifer container seedlings

Treesearch

Jol Hodgson

2011-01-01

Planting failures are often attributed to unexpectedly harsh conditions after planting. Characterization of soil water at the planting site, along with associated influences of site preparation and soil texture, is recommended. Additionally, tree planting technique and seedling biology should be targeted to site conditions. A nursery regime for induction of drought...
Characterizing the phylogenetic tree-search problem.

PubMed

Money, Daniel; Whelan, Simon

2012-03-01

Phylogenetic trees are important in many areas of biological research, ranging from systematic studies to the methods used for genome annotation. Finding the best scoring tree under any optimality criterion is an NP-hard problem, which necessitates the use of heuristics for tree-search. Although tree-search plays a major role in obtaining a tree estimate, there remains a limited understanding of its characteristics and how the elements of the statistical inferential procedure interact with the algorithms used. This study begins to answer some of these questions through a detailed examination of maximum likelihood tree-search on a wide range of real genome-scale data sets. We examine all 10,395 trees for each of the 106 genes of an eight-taxa yeast phylogenomic data set, then apply different tree-search algorithms to investigate their performance. We extend our findings by examining two larger genome-scale data sets and a large disparate data set that has been previously used to benchmark the performance of tree-search programs. We identify several broad trends occurring during tree-search that provide an insight into the performance of heuristics and may, in the future, aid their development. These trends include a tendency for the true maximum likelihood (best) tree to also be the shortest tree in terms of branch lengths, a weak tendency for tree-search to recover the best tree, and a tendency for tree-search to encounter fewer local optima in genes that have a high information content. When examining current heuristics for tree-search, we find that nearest-neighbor-interchange performs poorly, and frequently finds trees that are significantly different from the best tree. In contrast, subtree-pruning-and-regrafting tends to perform well, nearly always finding trees that are not significantly different to the best tree. Finally, we demonstrate that the precise implementation of a tree-search strategy, including when and where parameters are optimized, can change the character of tree-search, and that good strategies for tree-search may combine existing tree-search programs.
An efficient and extensible approach for compressing phylogenetic trees

PubMed Central

2011-01-01

Background Biologists require new algorithms to efficiently compress and store their large collections of phylogenetic trees. Our previous work showed that TreeZip is a promising approach for compressing phylogenetic trees. In this paper, we extend our TreeZip algorithm by handling trees with weighted branches. Furthermore, by using the compressed TreeZip file as input, we have designed an extensible decompressor that can extract subcollections of trees, compute majority and strict consensus trees, and merge tree collections using set operations such as union, intersection, and set difference. Results On unweighted phylogenetic trees, TreeZip is able to compress Newick files in excess of 98%. On weighted phylogenetic trees, TreeZip is able to compress a Newick file by at least 73%. TreeZip can be combined with 7zip with little overhead, allowing space savings in excess of 99% (unweighted) and 92%(weighted). Unlike TreeZip, 7zip is not immune to branch rotations, and performs worse as the level of variability in the Newick string representation increases. Finally, since the TreeZip compressed text (TRZ) file contains all the semantic information in a collection of trees, we can easily filter and decompress a subset of trees of interest (such as the set of unique trees), or build the resulting consensus tree in a matter of seconds. We also show the ease of which set operations can be performed on TRZ files, at speeds quicker than those performed on Newick or 7zip compressed Newick files, and without loss of space savings. Conclusions TreeZip is an efficient approach for compressing large collections of phylogenetic trees. The semantic and compact nature of the TRZ file allow it to be operated upon directly and quickly, without a need to decompress the original Newick file. We believe that TreeZip will be vital for compressing and archiving trees in the biological community. PMID:22165819
An efficient and extensible approach for compressing phylogenetic trees.

PubMed

Matthews, Suzanne J; Williams, Tiffani L

2011-10-18

Biologists require new algorithms to efficiently compress and store their large collections of phylogenetic trees. Our previous work showed that TreeZip is a promising approach for compressing phylogenetic trees. In this paper, we extend our TreeZip algorithm by handling trees with weighted branches. Furthermore, by using the compressed TreeZip file as input, we have designed an extensible decompressor that can extract subcollections of trees, compute majority and strict consensus trees, and merge tree collections using set operations such as union, intersection, and set difference. On unweighted phylogenetic trees, TreeZip is able to compress Newick files in excess of 98%. On weighted phylogenetic trees, TreeZip is able to compress a Newick file by at least 73%. TreeZip can be combined with 7zip with little overhead, allowing space savings in excess of 99% (unweighted) and 92%(weighted). Unlike TreeZip, 7zip is not immune to branch rotations, and performs worse as the level of variability in the Newick string representation increases. Finally, since the TreeZip compressed text (TRZ) file contains all the semantic information in a collection of trees, we can easily filter and decompress a subset of trees of interest (such as the set of unique trees), or build the resulting consensus tree in a matter of seconds. We also show the ease of which set operations can be performed on TRZ files, at speeds quicker than those performed on Newick or 7zip compressed Newick files, and without loss of space savings. TreeZip is an efficient approach for compressing large collections of phylogenetic trees. The semantic and compact nature of the TRZ file allow it to be operated upon directly and quickly, without a need to decompress the original Newick file. We believe that TreeZip will be vital for compressing and archiving trees in the biological community.
Using Decision Trees to Detect and Isolate Simulated Leaks in the J-2X Rocket Engine

NASA Technical Reports Server (NTRS)

Schwabacher, Mark A.; Aguilar, Robert; Figueroa, Fernando F.

2009-01-01

The goal of this work was to use data-driven methods to automatically detect and isolate faults in the J-2X rocket engine. It was decided to use decision trees, since they tend to be easier to interpret than other data-driven methods. The decision tree algorithm automatically "learns" a decision tree by performing a search through the space of possible decision trees to find one that fits the training data. The particular decision tree algorithm used is known as C4.5. Simulated J-2X data from a high-fidelity simulator developed at Pratt & Whitney Rocketdyne and known as the Detailed Real-Time Model (DRTM) was used to "train" and test the decision tree. Fifty-six DRTM simulations were performed for this purpose, with different leak sizes, different leak locations, and different times of leak onset. To make the simulations as realistic as possible, they included simulated sensor noise, and included a gradual degradation in both fuel and oxidizer turbine efficiency. A decision tree was trained using 11 of these simulations, and tested using the remaining 45 simulations. In the training phase, the C4.5 algorithm was provided with labeled examples of data from nominal operation and data including leaks in each leak location. From the data, it "learned" a decision tree that can classify unseen data as having no leak or having a leak in one of the five leak locations. In the test phase, the decision tree produced very low false alarm rates and low missed detection rates on the unseen data. It had very good fault isolation rates for three of the five simulated leak locations, but it tended to confuse the remaining two locations, perhaps because a large leak at one of these two locations can look very similar to a small leak at the other location.
Molecular Infectious Disease Epidemiology: Survival Analysis and Algorithms Linking Phylogenies to Transmission Trees

PubMed Central

Kenah, Eben; Britton, Tom; Halloran, M. Elizabeth; Longini, Ira M.

2016-01-01

Recent work has attempted to use whole-genome sequence data from pathogens to reconstruct the transmission trees linking infectors and infectees in outbreaks. However, transmission trees from one outbreak do not generalize to future outbreaks. Reconstruction of transmission trees is most useful to public health if it leads to generalizable scientific insights about disease transmission. In a survival analysis framework, estimation of transmission parameters is based on sums or averages over the possible transmission trees. A phylogeny can increase the precision of these estimates by providing partial information about who infected whom. The leaves of the phylogeny represent sampled pathogens, which have known hosts. The interior nodes represent common ancestors of sampled pathogens, which have unknown hosts. Starting from assumptions about disease biology and epidemiologic study design, we prove that there is a one-to-one correspondence between the possible assignments of interior node hosts and the transmission trees simultaneously consistent with the phylogeny and the epidemiologic data on person, place, and time. We develop algorithms to enumerate these transmission trees and show these can be used to calculate likelihoods that incorporate both epidemiologic data and a phylogeny. A simulation study confirms that this leads to more efficient estimates of hazard ratios for infectiousness and baseline hazards of infectious contact, and we use these methods to analyze data from a foot-and-mouth disease virus outbreak in the United Kingdom in 2001. These results demonstrate the importance of data on individuals who escape infection, which is often overlooked. The combination of survival analysis and algorithms linking phylogenies to transmission trees is a rigorous but flexible statistical foundation for molecular infectious disease epidemiology. PMID:27070316
Trees, bialgebras and intrinsic numerical algorithms

NASA Technical Reports Server (NTRS)

Crouch, Peter; Grossman, Robert; Larson, Richard

1990-01-01

Preliminary work about intrinsic numerical integrators evolving on groups is described. Fix a finite dimensional Lie group G; let g denote its Lie algebra, and let Y(sub 1),...,Y(sub N) denote a basis of g. A class of numerical algorithms is presented that approximate solutions to differential equations evolving on G of the form: dot-x(t) = F(x(t)), x(0) = p is an element of G. The algorithms depend upon constants c(sub i) and c(sub ij), for i = 1,...,k and j is less than i. The algorithms have the property that if the algorithm starts on the group, then it remains on the group. In addition, they also have the property that if G is the abelian group R(N), then the algorithm becomes the classical Runge-Kutta algorithm. The Cayley algebra generated by labeled, ordered trees is used to generate the equations that the coefficients c(sub i) and c(sub ij) must satisfy in order for the algorithm to yield an rth order numerical integrator and to analyze the resulting algorithms.
Induction of belief decision trees from data

NASA Astrophysics Data System (ADS)

AbuDahab, Khalil; Xu, Dong-ling; Keane, John

2012-09-01

In this paper, a method for acquiring belief rule-bases by inductive inference from data is described and evaluated. Existing methods extract traditional rules inductively from data, with consequents that are believed to be either 100% true or 100% false. Belief rules can capture uncertain or incomplete knowledge using uncertain belief degrees in consequents. Instead of using singled-value consequents, each belief rule deals with a set of collectively exhaustive and mutually exclusive consequents. The proposed method extracts belief rules from data which contain uncertain or incomplete knowledge.
Reducing process delays for real-time earthquake parameter estimation - An application of KD tree to large databases for Earthquake Early Warning

NASA Astrophysics Data System (ADS)

Yin, Lucy; Andrews, Jennifer; Heaton, Thomas

2018-05-01

Earthquake parameter estimations using nearest neighbor searching among a large database of observations can lead to reliable prediction results. However, in the real-time application of Earthquake Early Warning (EEW) systems, the accurate prediction using a large database is penalized by a significant delay in the processing time. We propose to use a multidimensional binary search tree (KD tree) data structure to organize large seismic databases to reduce the processing time in nearest neighbor search for predictions. We evaluated the performance of KD tree on the Gutenberg Algorithm, a database-searching algorithm for EEW. We constructed an offline test to predict peak ground motions using a database with feature sets of waveform filter-bank characteristics, and compare the results with the observed seismic parameters. We concluded that large database provides more accurate predictions of the ground motion information, such as peak ground acceleration, velocity, and displacement (PGA, PGV, PGD), than source parameters, such as hypocenter distance. Application of the KD tree search to organize the database reduced the average searching process by 85% time cost of the exhaustive method, allowing the method to be feasible for real-time implementation. The algorithm is straightforward and the results will reduce the overall time of warning delivery for EEW.
Graphical models for optimal power flow

DOE PAGES

Dvijotham, Krishnamurthy; Chertkov, Michael; Van Hentenryck, Pascal; ...

2016-09-13

Optimal power flow (OPF) is the central optimization problem in electric power grids. Although solved routinely in the course of power grid operations, it is known to be strongly NP-hard in general, and weakly NP-hard over tree networks. In this paper, we formulate the optimal power flow problem over tree networks as an inference problem over a tree-structured graphical model where the nodal variables are low-dimensional vectors. We adapt the standard dynamic programming algorithm for inference over a tree-structured graphical model to the OPF problem. Combining this with an interval discretization of the nodal variables, we develop an approximation algorithmmore » for the OPF problem. Further, we use techniques from constraint programming (CP) to perform interval computations and adaptive bound propagation to obtain practically efficient algorithms. Compared to previous algorithms that solve OPF with optimality guarantees using convex relaxations, our approach is able to work for arbitrary tree-structured distribution networks and handle mixed-integer optimization problems. Further, it can be implemented in a distributed message-passing fashion that is scalable and is suitable for “smart grid” applications like control of distributed energy resources. In conclusion, numerical evaluations on several benchmark networks show that practical OPF problems can be solved effectively using this approach.« less
The Refinement-Tree Partition for Parallel Solution of Partial Differential Equations

PubMed Central

Mitchell, William F.

1998-01-01

Dynamic load balancing is considered in the context of adaptive multilevel methods for partial differential equations on distributed memory multiprocessors. An approach that periodically repartitions the grid is taken. The important properties of a partitioning algorithm are presented and discussed in this context. A partitioning algorithm based on the refinement tree of the adaptive grid is presented and analyzed in terms of these properties. Theoretical and numerical results are given. PMID:28009355
The Refinement-Tree Partition for Parallel Solution of Partial Differential Equations.

PubMed

Mitchell, William F

1998-01-01

Dynamic load balancing is considered in the context of adaptive multilevel methods for partial differential equations on distributed memory multiprocessors. An approach that periodically repartitions the grid is taken. The important properties of a partitioning algorithm are presented and discussed in this context. A partitioning algorithm based on the refinement tree of the adaptive grid is presented and analyzed in terms of these properties. Theoretical and numerical results are given.
Multi-terminal pipe routing by Steiner minimal tree and particle swarm optimisation

NASA Astrophysics Data System (ADS)

Liu, Qiang; Wang, Chengen

2012-08-01

Computer-aided design of pipe routing is of fundamental importance for complex equipments' developments. In this article, non-rectilinear branch pipe routing with multiple terminals that can be formulated as a Euclidean Steiner Minimal Tree with Obstacles (ESMTO) problem is studied in the context of an aeroengine-integrated design engineering. Unlike the traditional methods that connect pipe terminals sequentially, this article presents a new branch pipe routing algorithm based on the Steiner tree theory. The article begins with a new algorithm for solving the ESMTO problem by using particle swarm optimisation (PSO), and then extends the method to the surface cases by using geodesics to meet the requirements of routing non-rectilinear pipes on the surfaces of aeroengines. Subsequently, the adaptive region strategy and the basic visibility graph method are adopted to increase the computation efficiency. Numeral computations show that the proposed routing algorithm can find satisfactory routing layouts while running in polynomial time.
The explicit computation of integration algorithms and first integrals for ordinary differential equations with polynomials coefficients using trees

NASA Technical Reports Server (NTRS)

Crouch, P. E.; Grossman, Robert

1992-01-01

This note is concerned with the explicit symbolic computation of expressions involving differential operators and their actions on functions. The derivation of specialized numerical algorithms, the explicit symbolic computation of integrals of motion, and the explicit computation of normal forms for nonlinear systems all require such computations. More precisely, if R = k(x(sub 1),...,x(sub N)), where k = R or C, F denotes a differential operator with coefficients from R, and g member of R, we describe data structures and algorithms for efficiently computing g. The basic idea is to impose a multiplicative structure on the vector space with basis the set of finite rooted trees and whose nodes are labeled with the coefficients of the differential operators. Cancellations of two trees with r + 1 nodes translates into cancellation of O(N(exp r)) expressions involving the coefficient functions and their derivatives.
Performance of a cavity-method-based algorithm for the prize-collecting Steiner tree problem on graphs

NASA Astrophysics Data System (ADS)

Biazzo, Indaco; Braunstein, Alfredo; Zecchina, Riccardo

2012-08-01

We study the behavior of an algorithm derived from the cavity method for the prize-collecting steiner tree (PCST) problem on graphs. The algorithm is based on the zero temperature limit of the cavity equations and as such is formally simple (a fixed point equation resolved by iteration) and distributed (parallelizable). We provide a detailed comparison with state-of-the-art algorithms on a wide range of existing benchmarks, networks, and random graphs. Specifically, we consider an enhanced derivative of the Goemans-Williamson heuristics and the dhea solver, a branch and cut integer linear programming based approach. The comparison shows that the cavity algorithm outperforms the two algorithms in most large instances both in running time and quality of the solution. Finally we prove a few optimality properties of the solutions provided by our algorithm, including optimality under the two postprocessing procedures defined in the Goemans-Williamson derivative and global optimality in some limit cases.

A physarum-inspired prize-collecting steiner tree approach to identify subnetworks for drug repositioning.

PubMed

Sun, Yahui; Hameed, Pathima Nusrath; Verspoor, Karin; Halgamuge, Saman

2016-12-05

Drug repositioning can reduce the time, costs and risks of drug development by identifying new therapeutic effects for known drugs. It is challenging to reposition drugs as pharmacological data is large and complex. Subnetwork identification has already been used to simplify the visualization and interpretation of biological data, but it has not been applied to drug repositioning so far. In this paper, we fill this gap by proposing a new Physarum-inspired Prize-Collecting Steiner Tree algorithm to identify subnetworks for drug repositioning. Drug Similarity Networks (DSN) are generated using the chemical, therapeutic, protein, and phenotype features of drugs. In DSNs, vertex prizes and edge costs represent the similarities and dissimilarities between drugs respectively, and terminals represent drugs in the cardiovascular class, as defined in the Anatomical Therapeutic Chemical classification system. A new Physarum-inspired Prize-Collecting Steiner Tree algorithm is proposed in this paper to identify subnetworks. We apply both the proposed algorithm and the widely-used GW algorithm to identify subnetworks in our 18 generated DSNs. In these DSNs, our proposed algorithm identifies subnetworks with an average Rand Index of 81.1%, while the GW algorithm can only identify subnetworks with an average Rand Index of 64.1%. We select 9 subnetworks with high Rand Index to find drug repositioning opportunities. 10 frequently occurring drugs in these subnetworks are identified as candidates to be repositioned for cardiovascular diseases. We find evidence to support previous discoveries that nitroglycerin, theophylline and acarbose may be able to be repositioned for cardiovascular diseases. Moreover, we identify seven previously unknown drug candidates that also may interact with the biological cardiovascular system. These discoveries show our proposed Prize-Collecting Steiner Tree approach as a promising strategy for drug repositioning.
Automatic Inference of Cryptographic Key Length Based on Analysis of Proof Tightness

DTIC Science & Technology

2016-06-01

within an attack tree structure, then expand attack tree methodology to include cryptographic reductions. We then provide the algorithms for...maintaining and automatically reasoning about these expanded attack trees . We provide a software tool that utilizes machine-readable proof and attack metadata...and the attack tree methodology to provide rapid and precise answers regarding security parameters and effective security. This eliminates the need
Layer stacking: A novel algorithm for individual forest tree segmentation from LiDAR point clouds

Treesearch

Elias Ayrey; Shawn Fraver; John A. Kershaw; Laura S. Kenefic; Daniel Hayes; Aaron R. Weiskittel; Brian E. Roth

2017-01-01

As light detection and ranging (LiDAR) technology advances, it has become common for datasets to be acquired at a point density high enough to capture structural information from individual trees. To process these data, an automatic method of isolating individual trees from a LiDAR point cloud is required. Traditional methods for segmenting trees attempt to isolate...
Memory-Scalable GPU Spatial Hierarchy Construction.

PubMed

Qiming Hou; Xin Sun; Kun Zhou; Lauterbach, C; Manocha, D

2011-04-01

Recent GPU algorithms for constructing spatial hierarchies have achieved promising performance for moderately complex models by using the breadth-first search (BFS) construction order. While being able to exploit the massive parallelism on the GPU, the BFS order also consumes excessive GPU memory, which becomes a serious issue for interactive applications involving very complex models with more than a few million triangles. In this paper, we propose to use the partial breadth-first search (PBFS) construction order to control memory consumption while maximizing performance. We apply the PBFS order to two hierarchy construction algorithms. The first algorithm is for kd-trees that automatically balances between the level of parallelism and intermediate memory usage. With PBFS, peak memory consumption during construction can be efficiently controlled without costly CPU-GPU data transfer. We also develop memory allocation strategies to effectively limit memory fragmentation. The resulting algorithm scales well with GPU memory and constructs kd-trees of models with millions of triangles at interactive rates on GPUs with 1 GB memory. Compared with existing algorithms, our algorithm is an order of magnitude more scalable for a given GPU memory bound. The second algorithm is for out-of-core bounding volume hierarchy (BVH) construction for very large scenes based on the PBFS construction order. At each iteration, all constructed nodes are dumped to the CPU memory, and the GPU memory is freed for the next iteration's use. In this way, the algorithm is able to build trees that are too large to be stored in the GPU memory. Experiments show that our algorithm can construct BVHs for scenes with up to 20 M triangles, several times larger than previous GPU algorithms.
Analysis of transcripts differentially expressed between fruited and deflowered 'Gala' adult trees: a contribution to biennial bearing understanding in apple.

PubMed

Guitton, B; Kelner, J J; Celton, J M; Sabau, X; Renou, J P; Chagné, D; Costes, E

2016-02-29

The transition from vegetative to floral state in shoot apical meristems (SAM) is a key event in plant development and is of crucial importance for reproductive success. In perennial plants, this event is recurrent during tree life and subject to both within-tree and between-years heterogeneity. In the present study, our goal was to identify candidate processes involved in the repression or induction of flowering in apical buds of adult apple trees. Genes differentially expressed (GDE) were examined between trees artificially set in either 'ON' or 'OFF' situation, and in which floral induction (FI) was shown to be inhibited or induced in most buds, respectively, using qRT-PCR and microarray analysis. From the period of FI through to flower differentiation, GDE belonged to four main biological processes (i) response to stimuli, including response to oxidative stress; (ii) cellular processes, (iii) cell wall biogenesis, and (iv) metabolic processes including carbohydrate biosynthesis and lipid metabolic process. Several key regulator genes, especially TEMPRANILLO (TEM), FLORAL TRANSITION AT MERISTEM (FTM1) and SQUAMOSA PROMOTER BINDING PROTEIN-LIKE (SPL) were found differentially expressed. Moreover, homologs of SPL and Leucine-Rich Repeat proteins were present under QTL zones previously detected for biennial bearing. This data set suggests that apical buds of 'ON' and 'OFF' trees were in different physiological states, resulting from different metabolic, hormonal and redox status which are likely to contribute to FI control in adult apple trees. Investigations on carbohydrate and hormonal fluxes from sources to SAM and on cell detoxification process are expected to further contribute to the identification of the underlying physiological mechanisms of FI in adult apple trees.
IND - THE IND DECISION TREE PACKAGE

NASA Technical Reports Server (NTRS)

Buntine, W.

1994-01-01

A common approach to supervised classification and prediction in artificial intelligence and statistical pattern recognition is the use of decision trees. A tree is "grown" from data using a recursive partitioning algorithm to create a tree which has good prediction of classes on new data. Standard algorithms are CART (by Breiman Friedman, Olshen and Stone) and ID3 and its successor C4 (by Quinlan). As well as reimplementing parts of these algorithms and offering experimental control suites, IND also introduces Bayesian and MML methods and more sophisticated search in growing trees. These produce more accurate class probability estimates that are important in applications like diagnosis. IND is applicable to most data sets consisting of independent instances, each described by a fixed length vector of attribute values. An attribute value may be a number, one of a set of attribute specific symbols, or it may be omitted. One of the attributes is delegated the "target" and IND grows trees to predict the target. Prediction can then be done on new data or the decision tree printed out for inspection. IND provides a range of features and styles with convenience for the casual user as well as fine-tuning for the advanced user or those interested in research. IND can be operated in a CART-like mode (but without regression trees, surrogate splits or multivariate splits), and in a mode like the early version of C4. Advanced features allow more extensive search, interactive control and display of tree growing, and Bayesian and MML algorithms for tree pruning and smoothing. These often produce more accurate class probability estimates at the leaves. IND also comes with a comprehensive experimental control suite. IND consists of four basic kinds of routines: data manipulation routines, tree generation routines, tree testing routines, and tree display routines. The data manipulation routines are used to partition a single large data set into smaller training and test sets. The generation routines are used to build classifiers. The test routines are used to evaluate classifiers and to classify data using a classifier. And the display routines are used to display classifiers in various formats. IND is written in C-language for Sun4 series computers. It consists of several programs with controlling shell scripts. Extensive UNIX man entries are included. IND is designed to be used on any UNIX system, although it has only been thoroughly tested on SUN platforms. The standard distribution medium for IND is a .25 inch streaming magnetic tape cartridge in UNIX tar format. An electronic copy of the documentation in PostScript format is included on the distribution medium. IND was developed in 1992.
On Determining if Tree-based Networks Contain Fixed Trees.

PubMed

Anaya, Maria; Anipchenko-Ulaj, Olga; Ashfaq, Aisha; Chiu, Joyce; Kaiser, Mahedi; Ohsawa, Max Shoji; Owen, Megan; Pavlechko, Ella; St John, Katherine; Suleria, Shivam; Thompson, Keith; Yap, Corrine

2016-05-01

We address an open question of Francis and Steel about phylogenetic networks and trees. They give a polynomial time algorithm to decide if a phylogenetic network, N, is tree-based and pose the problem: given a fixed tree T and network N, is N based on T? We show that it is [Formula: see text]-hard to decide, by reduction from 3-Dimensional Matching (3DM) and further that the problem is fixed-parameter tractable.
Potential link between biotic defense activation and recalcitrance to induction of somatic embryogenesis in shoot primordia from adult trees of white spruce (Picea glauca)

PubMed Central

2013-01-01

Background Among the many commercial opportunities afforded by somatic embryogenesis (SE), it is the ability to clonally propagate individual plants with rare or elite traits that has some of the most significant implications. This is particularly true for many long-lived species, such as conifers, but whose long generation times pose substantive challenges, including increased recalcitrance for SE as plants age. Identification of a clonal line of somatic embryo-derived trees whose shoot primordia have remained responsive to SE induction for over a decade, provided a unique opportunity to examine the molecular aspects underpinning SE within shoot tissues of adult white spruce trees. Results Microarray analysis was used to conduct transcriptome-wide expression profiling of shoot explants taken from this responsive genotype following one week of SE induction, which when compared with that of a nonresponsive genotype, led to the identification of four of the most differentially expressed genes within each genotype. Using absolute qPCR to expand the analysis to three weeks of induction revealed that differential expression of all eight candidate genes was maintained to the end of the induction treatment, albeit to differing degrees. Most striking was that both the magnitude and duration of candidate gene expression within the nonresponsive genotype was indicative of an intense physiological response. Examining their putative identities further revealed that all four encoded for proteins with similarity to angiosperm proteins known to play prominent roles in biotic defense, and that their high-level induction over an extended period is consistent with activation of a biotic defense response. In contrast, the more temperate response within the responsive genotype, including induction of a conifer-specific dehydrin, is more consistent with elicitation of an adaptive stress response. Conclusions While additional evidence is required to definitively establish an association between SE responsiveness and a specific physiological response, these results suggest that biotic defense activation may be antagonistic, likely related to the massive transcriptional and metabolic reprogramming that it elicits. A major issue for future work will be to determine how and if suppressing biotic defense activation could be used to promote a physiological state more conducive to SE induction. PMID:23937238
TREEGRAD: a grading program for eastern hardwoods

Treesearch

J.W. Stringer; D.W. Cremeans

1991-01-01

Assigning tree grades to eastern hardwoods is often a difficult task for neophyte graders. Recently several "dichotomous keys" have been developed for training graders in the USFS hardwood tree grading system. TREEGRAD uses the Tree Grading Algorithm (TGA) for determining grades from defect location data and is designed to be used as a teaching aid.
Key algorithms used in GR02: A computer simulation model for predicting tree and stand growth

Treesearch

Garrett A. Hughes; Paul E. Sendak; Paul E. Sendak

1985-01-01

GR02 is an individual tree, distance-independent simulation model for predicting tree and stand growth over time. It performs five major functions during each run: (1) updates diameter at breast height, (2) updates total height, (3) estimates mortality, (4) determines regeneration, and (5) updates crown class.
Using decision trees to characterize verbal communication during change and stuck episodes in the therapeutic process

PubMed Central

Masías, Víctor H.; Krause, Mariane; Valdés, Nelson; Pérez, J. C.; Laengle, Sigifredo

2015-01-01

Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBTree, and REPTree) are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice. PMID:25914657
Using decision trees to characterize verbal communication during change and stuck episodes in the therapeutic process.

PubMed

Masías, Víctor H; Krause, Mariane; Valdés, Nelson; Pérez, J C; Laengle, Sigifredo

2015-01-01

Methods are needed for creating models to characterize verbal communication between therapists and their patients that are suitable for teaching purposes without losing analytical potential. A technique meeting these twin requirements is proposed that uses decision trees to identify both change and stuck episodes in therapist-patient communication. Three decision tree algorithms (C4.5, NBTree, and REPTree) are applied to the problem of characterizing verbal responses into change and stuck episodes in the therapeutic process. The data for the problem is derived from a corpus of 8 successful individual therapy sessions with 1760 speaking turns in a psychodynamic context. The decision tree model that performed best was generated by the C4.5 algorithm. It delivered 15 rules characterizing the verbal communication in the two types of episodes. Decision trees are a promising technique for analyzing verbal communication during significant therapy events and have much potential for use in teaching practice on changes in therapeutic communication. The development of pedagogical methods using decision trees can support the transmission of academic knowledge to therapeutic practice.
Treelink: data integration, clustering and visualization of phylogenetic trees.

PubMed

Allende, Christian; Sohn, Erik; Little, Cedric

2015-12-29

Phylogenetic trees are central to a wide range of biological studies. In many of these studies, tree nodes need to be associated with a variety of attributes. For example, in studies concerned with viral relationships, tree nodes are associated with epidemiological information, such as location, age and subtype. Gene trees used in comparative genomics are usually linked with taxonomic information, such as functional annotations and events. A wide variety of tree visualization and annotation tools have been developed in the past, however none of them are intended for an integrative and comparative analysis. Treelink is a platform-independent software for linking datasets and sequence files to phylogenetic trees. The application allows an automated integration of datasets to trees for operations such as classifying a tree based on a field or showing the distribution of selected data attributes in branches and leafs. Genomic and proteonomic sequences can also be linked to the tree and extracted from internal and external nodes. A novel clustering algorithm to simplify trees and display the most divergent clades was also developed, where validation can be achieved using the data integration and classification function. Integrated geographical information allows ancestral character reconstruction for phylogeographic plotting based on parsimony and likelihood algorithms. Our software can successfully integrate phylogenetic trees with different data sources, and perform operations to differentiate and visualize those differences within a tree. File support includes the most popular formats such as newick and csv. Exporting visualizations as images, cluster outputs and genomic sequences is supported. Treelink is available as a web and desktop application at http://www.treelinkapp.com .
M-AMST: an automatic 3D neuron tracing method based on mean shift and adapted minimum spanning tree.

PubMed

Wan, Zhijiang; He, Yishan; Hao, Ming; Yang, Jian; Zhong, Ning

2017-03-29

Understanding the working mechanism of the brain is one of the grandest challenges for modern science. Toward this end, the BigNeuron project was launched to gather a worldwide community to establish a big data resource and a set of the state-of-the-art of single neuron reconstruction algorithms. Many groups contributed their own algorithms for the project, including our mean shift and minimum spanning tree (M-MST). Although M-MST is intuitive and easy to implement, the MST just considers spatial information of single neuron and ignores the shape information, which might lead to less precise connections between some neuron segments. In this paper, we propose an improved algorithm, namely M-AMST, in which a rotating sphere model based on coordinate transformation is used to improve the weight calculation method in M-MST. Two experiments are designed to illustrate the effect of adapted minimum spanning tree algorithm and the adoptability of M-AMST in reconstructing variety of neuron image datasets respectively. In the experiment 1, taking the reconstruction of APP2 as reference, we produce the four difference scores (entire structure average (ESA), different structure average (DSA), percentage of different structure (PDS) and max distance of neurons' nodes (MDNN)) by comparing the neuron reconstruction of the APP2 and the other 5 competing algorithm. The result shows that M-AMST gets lower difference scores than M-MST in ESA, PDS and MDNN. Meanwhile, M-AMST is better than N-MST in ESA and MDNN. It indicates that utilizing the adapted minimum spanning tree algorithm which took the shape information of neuron into account can achieve better neuron reconstructions. In the experiment 2, 7 neuron image datasets are reconstructed and the four difference scores are calculated by comparing the gold standard reconstruction and the reconstructions produced by 6 competing algorithms. Comparing the four difference scores of M-AMST and the other 5 algorithm, we can conclude that M-AMST is able to achieve the best difference score in 3 datasets and get the second-best difference score in the other 2 datasets. We develop a pathway extraction method using a rotating sphere model based on coordinate transformation to improve the weight calculation approach in MST. The experimental results show that M-AMST utilizes the adapted minimum spanning tree algorithm which takes the shape information of neuron into account can achieve better neuron reconstructions. Moreover, M-AMST is able to get good neuron reconstruction in variety of image datasets.
Efficient algorithms for dilated mappings of binary trees

NASA Technical Reports Server (NTRS)

Iqbal, M. Ashraf

1990-01-01

The problem is addressed to find a 1-1 mapping of the vertices of a binary tree onto those of a target binary tree such that the son of a node on the first binary tree is mapped onto a descendent of the image of that node in the second binary tree. There are two natural measures of the cost of this mapping, namely the dilation cost, i.e., the maximum distance in the target binary tree between the images of vertices that are adjacent in the original tree. The other measure, expansion cost, is defined as the number of extra nodes/edges to be added to the target binary tree in order to ensure a 1-1 mapping. An efficient algorithm to find a mapping of one binary tree onto another is described. It is shown that it is possible to minimize one cost of mapping at the expense of the other. This problem arises when designing pipelined arithmetic logic units (ALU) for special purpose computers. The pipeline is composed of ALU chips connected in the form of a binary tree. The operands to the pipeline can be supplied to the leaf nodes of the binary tree which then process and pass the results up to their parents. The final result is available at the root. As each new application may require a distinct nesting of operations, it is useful to be able to find a good mapping of a new binary tree over existing ALU tree. Another problem arises if every distinct required binary tree is known beforehand. Here it is useful to hardwire the pipeline in the form of a minimal supertree that contains all required binary trees.
Multi-level tree analysis of pulmonary artery/vein trees in non-contrast CT images

NASA Astrophysics Data System (ADS)

Gao, Zhiyun; Grout, Randall W.; Hoffman, Eric A.; Saha, Punam K.

2012-02-01

Diseases like pulmonary embolism and pulmonary hypertension are associated with vascular dystrophy. Identifying such pulmonary artery/vein (A/V) tree dystrophy in terms of quantitative measures via CT imaging significantly facilitates early detection of disease or a treatment monitoring process. A tree structure, consisting of nodes and connected arcs, linked to the volumetric representation allows multi-level geometric and volumetric analysis of A/V trees. Here, a new theory and method is presented to generate multi-level A/V tree representation of volumetric data and to compute quantitative measures of A/V tree geometry and topology at various tree hierarchies. The new method is primarily designed on arc skeleton computation followed by a tree construction based topologic and geometric analysis of the skeleton. The method starts with a volumetric A/V representation as input and generates its topologic and multi-level volumetric tree representations long with different multi-level morphometric measures. A new recursive merging and pruning algorithms are introduced to detect bad junctions and noisy branches often associated with digital geometric and topologic analysis. Also, a new notion of shortest axial path is introduced to improve the skeletal arc joining two junctions. The accuracy of the multi-level tree analysis algorithm has been evaluated using computer generated phantoms and pulmonary CT images of a pig vessel cast phantom while the reproducibility of method is evaluated using multi-user A/V separation of in vivo contrast-enhanced CT images of a pig lung at different respiratory volumes.
Algorithmic Complexity. Volume II.

DTIC Science & Technology

1982-06-01

digital computers, this improvement will go unnoticed if only a few complex products are to be taken, however it can become increasingly important as...computed in the reverse order. If the products are formed moving from the top of the tree downward, and then the divisions are performed going from the...the reverse order, going up the tree. (r- a mod m means that r is the remainder when a is divided by M.) The overall running time of the algorithm is
Orthology and paralogy constraints: satisfiability and consistency.

PubMed

Lafond, Manuel; El-Mabrouk, Nadia

2014-01-01

A variety of methods based on sequence similarity, reconciliation, synteny or functional characteristics, can be used to infer orthology and paralogy relations between genes of a given gene family G. But is a given set C of orthology/paralogy constraints possible, i.e., can they simultaneously co-exist in an evolutionary history for G? While previous studies have focused on full sets of constraints, here we consider the general case where C does not necessarily involve a constraint for each pair of genes. The problem is subdivided in two parts: (1) Is C satisfiable, i.e. can we find an event-labeled gene tree G inducing C? (2) Is there such a G which is consistent, i.e., such that all displayed triplet phylogenies are included in a species tree? Previous results on the Graph sandwich problem can be used to answer to (1), and we provide polynomial-time algorithms for satisfiability and consistency with a given species tree. We also describe a new polynomial-time algorithm for the case of consistency with an unknown species tree and full knowledge of pairwise orthology/paralogy relationships, as well as a branch-and-bound algorithm in the case when unknown relations are present. We show that our algorithms can be used in combination with ProteinOrtho, a sequence similarity-based orthology detection tool, to extract a set of robust orthology/paralogy relationships.
Orthology and paralogy constraints: satisfiability and consistency

PubMed Central

2014-01-01

Background A variety of methods based on sequence similarity, reconciliation, synteny or functional characteristics, can be used to infer orthology and paralogy relations between genes of a given gene family G. But is a given set C of orthology/paralogy constraints possible, i.e., can they simultaneously co-exist in an evolutionary history for G? While previous studies have focused on full sets of constraints, here we consider the general case where C does not necessarily involve a constraint for each pair of genes. The problem is subdivided in two parts: (1) Is C satisfiable, i.e. can we find an event-labeled gene tree G inducing C? (2) Is there such a G which is consistent, i.e., such that all displayed triplet phylogenies are included in a species tree? Results Previous results on the Graph sandwich problem can be used to answer to (1), and we provide polynomial-time algorithms for satisfiability and consistency with a given species tree. We also describe a new polynomial-time algorithm for the case of consistency with an unknown species tree and full knowledge of pairwise orthology/paralogy relationships, as well as a branch-and-bound algorithm in the case when unknown relations are present. We show that our algorithms can be used in combination with ProteinOrtho, a sequence similarity-based orthology detection tool, to extract a set of robust orthology/paralogy relationships. PMID:25572629
Development of Gis Tool for the Solution of Minimum Spanning Tree Problem using Prim's Algorithm

NASA Astrophysics Data System (ADS)

Dutta, S.; Patra, D.; Shankar, H.; Alok Verma, P.

2014-11-01

minimum spanning tree (MST) of a connected, undirected and weighted network is a tree of that network consisting of all its nodes and the sum of weights of all its edges is minimum among all such possible spanning trees of the same network. In this study, we have developed a new GIS tool using most commonly known rudimentary algorithm called Prim's algorithm to construct the minimum spanning tree of a connected, undirected and weighted road network. This algorithm is based on the weight (adjacency) matrix of a weighted network and helps to solve complex network MST problem easily, efficiently and effectively. The selection of the appropriate algorithm is very essential otherwise it will be very hard to get an optimal result. In case of Road Transportation Network, it is very essential to find the optimal results by considering all the necessary points based on cost factor (time or distance). This paper is based on solving the Minimum Spanning Tree (MST) problem of a road network by finding it's minimum span by considering all the important network junction point. GIS technology is usually used to solve the network related problems like the optimal path problem, travelling salesman problem, vehicle routing problems, location-allocation problems etc. Therefore, in this study we have developed a customized GIS tool using Python script in ArcGIS software for the solution of MST problem for a Road Transportation Network of Dehradun city by considering distance and time as the impedance (cost) factors. It has a number of advantages like the users do not need a greater knowledge of the subject as the tool is user-friendly and that allows to access information varied and adapted the needs of the users. This GIS tool for MST can be applied for a nationwide plan called Prime Minister Gram Sadak Yojana in India to provide optimal all weather road connectivity to unconnected villages (points). This tool is also useful for constructing highways or railways spanning several cities optimally or connecting all cities with minimum total road length.

Mirroring co-evolving trees in the light of their topologies.

PubMed

Hajirasouliha, Iman; Schönhuth, Alexander; de Juan, David; Valencia, Alfonso; Sahinalp, S Cenk

2012-05-01

Determining the interaction partners among protein/domain families poses hard computational problems, in particular in the presence of paralogous proteins. Available approaches aim to identify interaction partners among protein/domain families through maximizing the similarity between trimmed versions of their phylogenetic trees. Since maximization of any natural similarity score is computationally difficult, many approaches employ heuristics to evaluate the distance matrices corresponding to the tree topologies in question. In this article, we devise an efficient deterministic algorithm which directly maximizes the similarity between two leaf labeled trees with edge lengths, obtaining a score-optimal alignment of the two trees in question. Our algorithm is significantly faster than those methods based on distance matrix comparison: 1 min on a single processor versus 730 h on a supercomputer. Furthermore, we outperform the current state-of-the-art exhaustive search approach in terms of precision, while incurring acceptable losses in recall. A C implementation of the method demonstrated in this article is available at http://compbio.cs.sfu.ca/mirrort.htm
Evolving optimised decision rules for intrusion detection using particle swarm paradigm

NASA Astrophysics Data System (ADS)

Sivatha Sindhu, Siva S.; Geetha, S.; Kannan, A.

2012-12-01

The aim of this article is to construct a practical intrusion detection system (IDS) that properly analyses the statistics of network traffic pattern and classify them as normal or anomalous class. The objective of this article is to prove that the choice of effective network traffic features and a proficient machine-learning paradigm enhances the detection accuracy of IDS. In this article, a rule-based approach with a family of six decision tree classifiers, namely Decision Stump, C4.5, Naive Baye's Tree, Random Forest, Random Tree and Representative Tree model to perform the detection of anomalous network pattern is introduced. In particular, the proposed swarm optimisation-based approach selects instances that compose training set and optimised decision tree operate over this trained set producing classification rules with improved coverage, classification capability and generalisation ability. Experiment with the Knowledge Discovery and Data mining (KDD) data set which have information on traffic pattern, during normal and intrusive behaviour shows that the proposed algorithm produces optimised decision rules and outperforms other machine-learning algorithm.
An efficient 3D R-tree spatial index method for virtual geographic environments

NASA Astrophysics Data System (ADS)

Zhu, Qing; Gong, Jun; Zhang, Yeting

A three-dimensional (3D) spatial index is required for real time applications of integrated organization and management in virtual geographic environments of above ground, underground, indoor and outdoor objects. Being one of the most promising methods, the R-tree spatial index has been paid increasing attention in 3D geospatial database management. Since the existing R-tree methods are usually limited by their weakness of low efficiency, due to the critical overlap of sibling nodes and the uneven size of nodes, this paper introduces the k-means clustering method and employs the 3D overlap volume, 3D coverage volume and the minimum bounding box shape value of nodes as the integrative grouping criteria. A new spatial cluster grouping algorithm and R-tree insertion algorithm is then proposed. Experimental analysis on comparative performance of spatial indexing shows that by the new method the overlap of R-tree sibling nodes is minimized drastically and a balance in the volumes of the nodes is maintained.
BIMLR: a method for constructing rooted phylogenetic networks from rooted phylogenetic trees.

PubMed

Wang, Juan; Guo, Maozu; Xing, Linlin; Che, Kai; Liu, Xiaoyan; Wang, Chunyu

2013-09-15

Rooted phylogenetic trees constructed from different datasets (e.g. from different genes) are often conflicting with one another, i.e. they cannot be integrated into a single phylogenetic tree. Phylogenetic networks have become an important tool in molecular evolution, and rooted phylogenetic networks are able to represent conflicting rooted phylogenetic trees. Hence, the development of appropriate methods to compute rooted phylogenetic networks from rooted phylogenetic trees has attracted considerable research interest of late. The CASS algorithm proposed by van Iersel et al. is able to construct much simpler networks than other available methods, but it is extremely slow, and the networks it constructs are dependent on the order of the input data. Here, we introduce an improved CASS algorithm, BIMLR. We show that BIMLR is faster than CASS and less dependent on the input data order. Moreover, BIMLR is able to construct much simpler networks than almost all other methods. BIMLR is available at http://nclab.hit.edu.cn/wangjuan/BIMLR/. © 2013 Elsevier B.V. All rights reserved.
Personalized Risk Prediction in Clinical Oncology Research: Applications and Practical Issues Using Survival Trees and Random Forests.

PubMed

Hu, Chen; Steingrimsson, Jon Arni

2018-01-01

A crucial component of making individualized treatment decisions is to accurately predict each patient's disease risk. In clinical oncology, disease risks are often measured through time-to-event data, such as overall survival and progression/recurrence-free survival, and are often subject to censoring. Risk prediction models based on recursive partitioning methods are becoming increasingly popular largely due to their ability to handle nonlinear relationships, higher-order interactions, and/or high-dimensional covariates. The most popular recursive partitioning methods are versions of the Classification and Regression Tree (CART) algorithm, which builds a simple interpretable tree structured model. With the aim of increasing prediction accuracy, the random forest algorithm averages multiple CART trees, creating a flexible risk prediction model. Risk prediction models used in clinical oncology commonly use both traditional demographic and tumor pathological factors as well as high-dimensional genetic markers and treatment parameters from multimodality treatments. In this article, we describe the most commonly used extensions of the CART and random forest algorithms to right-censored outcomes. We focus on how they differ from the methods for noncensored outcomes, and how the different splitting rules and methods for cost-complexity pruning impact these algorithms. We demonstrate these algorithms by analyzing a randomized Phase III clinical trial of breast cancer. We also conduct Monte Carlo simulations to compare the prediction accuracy of survival forests with more commonly used regression models under various scenarios. These simulation studies aim to evaluate how sensitive the prediction accuracy is to the underlying model specifications, the choice of tuning parameters, and the degrees of missing covariates.
Initialization Method for Grammar-Guided Genetic Programming

NASA Astrophysics Data System (ADS)

García-Arnau, M.; Manrique, D.; Ríos, J.; Rodríguez-Patón, A.

This paper proposes a new tree-generation algorithm for grammarguided genetic programming that includes a parameter to control the maximum size of the trees to be generated. An important feature of this algorithm is that the initial populations generated are adequately distributed in terms of tree size and distribution within the search space. Consequently, genetic programming systems starting from the initial populations generated by the proposed method have a higher convergence speed. Two different problems have been chosen to carry out the experiments: a laboratory test involving searching for arithmetical equalities and the real-world task of breast cancer prognosis. In both problems, comparisons have been made to another five important initialization methods.
Automatic determination of trunk diameter, crown base and height of scots pine (Pinus Sylvestris L.) Based on analysis of 3D point clouds gathered from multi-station terrestrial laser scanning. (Polish Title: Automatyczne okreslanie srednicy pnia, podstawy korony oraz wysokosci sosny zwyczajnej (Pinus Silvestris L.) Na podstawie analiz chmur punktow 3D pochodzacych z wielostanowiskowego naziemnego skanowania laserowego)

NASA Astrophysics Data System (ADS)

Ratajczak, M.; Wężyk, P.

2015-12-01

Rapid development of terrestrial laser scanning (TLS) in recent years resulted in its recognition and implementation in many industries, including forestry and nature conservation. The use of the 3D TLS point clouds in the process of inventory of trees and stands, as well as in the determination of their biometric features (trunk diameter, tree height, crown base, number of trunk shapes), trees and lumber size (volume of trees) is slowly becoming a practice. In addition to the measurement precision, the primary added value of TLS is the ability to automate the processing of the clouds of points 3D in the direction of the extraction of selected features of trees and stands. The paper presents the original software (GNOM) for the automatic measurement of selected features of trees, based on the cloud of points obtained by the ground laser scanner FARO. With the developed algorithms (GNOM), the location of tree trunks on the circular research surface was specified and the measurement was performed; the measurement covered the DBH (l: 1.3m), further diameters of tree trunks at different heights of the tree trunk, base of the tree crown and volume of the tree trunk (the selection measurement method), as well as the tree crown. Research works were performed in the territory of the Niepolomice Forest in an unmixed pine stand (Pinussylvestris L.) on the circular surface with a radius of 18 m, within which there were 16 pine trees (14 of them were cut down). It was characterized by a two-storey and even-aged construction (147 years old) and was devoid of undergrowth. Ground scanning was performed just before harvesting. The DBH of 16 pine trees was specified in a fully automatic way, using the algorithm GNOM with an accuracy of +2.1%, as compared to the reference measurement by the DBH measurement device. The medium, absolute measurement error in the cloud of points - using semi-automatic methods "PIXEL" (between points) and PIPE (fitting the cylinder) in the FARO Scene 5.x., showed the error, 3.5% and 5.0%,.respectively The reference height was assumed as the measurement performed by the tape on the cut tree. The average error of automatic determination of the tree height by the algorithm GNOM based on the TLS point clouds amounted to 6.3% and was slightly higher than when using the manual method of measurements on profiles in the TerraScan (Terrasolid; the error of 5.6%). The relatively high value of the error may be mainly related to the small number of points TLS in the upper parts of crowns. The crown height measurement showed the error of +9.5%. The reference in this case was the tape measurement performed already on the trunks of cut pine trees. Processing the clouds of points by the algorithms GNOM for 16 analyzed trees took no longer than 10 min. (37 sec. /tree). The paper mainly showed the TLS measurement innovation and its high precision in acquiring biometric data in forestry, and at the same time also the further need to increase the degree of automation of processing the clouds of points 3D from terrestrial laser scanning.
A flooding algorithm for multirobot exploration.

PubMed

Cabrera-Mora, Flavio; Xiao, Jizhong

2012-06-01

In this paper, we present a multirobot exploration algorithm that aims at reducing the exploration time and to minimize the overall traverse distance of the robots by coordinating the movement of the robots performing the exploration. Modeling the environment as a tree, we consider a coordination model that restricts the number of robots allowed to traverse an edge and to enter a vertex during each step. This coordination is achieved in a decentralized manner by the robots using a set of active landmarks that are dropped by them at explored vertices. We mathematically analyze the algorithm on trees, obtaining its main properties and specifying its bounds on the exploration time. We also define three metrics of performance for multirobot algorithms. We simulate and compare the performance of this new algorithm with those of our multirobot depth first search (MR-DFS) approach presented in our recent paper and classic single-robot DFS.
Degree-constrained multicast routing for multimedia communications

NASA Astrophysics Data System (ADS)

Wang, Yanlin; Sun, Yugeng; Li, Guidan

2005-02-01

Multicast services have been increasingly used by many multimedia applications. As one of the key techniques to support multimedia applications, the rational and effective multicast routing algorithms are very important to networks performance. When switch nodes in networks have different multicast capability, multicast routing problem is modeled as the degree-constrained Steiner problem. We presented two heuristic algorithms, named BMSTA and BSPTA, for the degree-constrained case in multimedia communications. Both algorithms are used to generate degree-constrained multicast trees with bandwidth and end to end delay bound. Simulations over random networks were carried out to compare the performance of the two proposed algorithms. Experimental results show that the proposed algorithms have advantages in traffic load balancing, which can avoid link blocking and enhance networks performance efficiently. BMSTA has better ability in finding unsaturated links and (or) unsaturated nodes to generate multicast trees than BSPTA. The performance of BMSTA is affected by the variation of degree constraints.
Multistage classification of multispectral Earth observational data: The design approach

NASA Technical Reports Server (NTRS)

Bauer, M. E. (Principal Investigator); Muasher, M. J.; Landgrebe, D. A.

1981-01-01

An algorithm is proposed which predicts the optimal features at every node in a binary tree procedure. The algorithm estimates the probability of error by approximating the area under the likelihood ratio function for two classes and taking into account the number of training samples used in estimating each of these two classes. Some results on feature selection techniques, particularly in the presence of a very limited set of training samples, are presented. Results comparing probabilities of error predicted by the proposed algorithm as a function of dimensionality as compared to experimental observations are shown for aircraft and LANDSAT data. Results are obtained for both real and simulated data. Finally, two binary tree examples which use the algorithm are presented to illustrate the usefulness of the procedure.
Efficient enumeration of monocyclic chemical graphs with given path frequencies

PubMed Central

2014-01-01

Background The enumeration of chemical graphs (molecular graphs) satisfying given constraints is one of the fundamental problems in chemoinformatics and bioinformatics because it leads to a variety of useful applications including structure determination and development of novel chemical compounds. Results We consider the problem of enumerating chemical graphs with monocyclic structure (a graph structure that contains exactly one cycle) from a given set of feature vectors, where a feature vector represents the frequency of the prescribed paths in a chemical compound to be constructed and the set is specified by a pair of upper and lower feature vectors. To enumerate all tree-like (acyclic) chemical graphs from a given set of feature vectors, Shimizu et al. and Suzuki et al. proposed efficient branch-and-bound algorithms based on a fast tree enumeration algorithm. In this study, we devise a novel method for extending these algorithms to enumeration of chemical graphs with monocyclic structure by designing a fast algorithm for testing uniqueness. The results of computational experiments reveal that the computational efficiency of the new algorithm is as good as those for enumeration of tree-like chemical compounds. Conclusions We succeed in expanding the class of chemical graphs that are able to be enumerated efficiently. PMID:24955135
Treecode with a Special-Purpose Processor

NASA Astrophysics Data System (ADS)

Makino, Junichiro

1991-08-01

We describe an implementation of the modified Barnes-Hut tree algorithm for a gravitational N-body calculation on a GRAPE (GRAvity PipE) backend processor. GRAPE is a special-purpose computer for N-body calculations. It receives the positions and masses of particles from a host computer and then calculates the gravitational force at each coordinate specified by the host. To use this GRAPE processor with the hierarchical tree algorithm, the host computer must maintain a list of all nodes that exert force on a particle. If we create this list for each particle of the system at each timestep, the number of floating-point operations on the host and that on GRAPE would become comparable, and the increased speed obtained by using GRAPE would be small. In our modified algorithm, we create a list of nodes for many particles. Thus, the amount of the work required of the host is significantly reduced. This algorithm was originally developed by Barnes in order to vectorize the force calculation on a Cyber 205. With this algorithm, the computing time of the force calculation becomes comparable to that of the tree construction, if the GRAPE backend processor is sufficiently fast. The obtained speed-up factor is 30 to 50 for a RISC-based host computer and GRAPE-1A with a peak speed of 240 Mflops.
Forest structures retrieval from LiDAR onboard ULA

NASA Astrophysics Data System (ADS)

Shang, Xiaoxia; Chazette, Patrick; Totems, Julien; Marnas, Fabien; Sanak, Joseph

2013-04-01

Following the United Nations Framework Convention on Climate Change, the assessment of forest carbon stock is one of the main elements for a better understanding of the carbon cycle and its evolution following the climate change. The forests sequester 80% of the continental biospheric carbon and this efficiency is a function of the tree species and the tree health. The airborne backscatter LiDAR onboard the ultra light aircraft (ULA) can provide the key information on the forest vertical structures and evolution in the time. The most important structural parameter is the tree top height, which is directly linked to the above-ground biomass using non-linear relationships. In order to test the LiDAR capability for retrieving the tree top height, the LiDAR ULICE (Ultraviolet LIdar for Canopy Experiment) has been used over different forest types, from coniferous (maritime pins) to deciduous (oaks, hornbeams ...) trees. ULICE works at the wavelength of 355 nm with a sampling along the line-of-sight between 15 and 75 cm. According to the LiDAR signal to noise ratio (SNR), two different algorithms have been used in our study. The first algorithm is a threshold method directly based on the comparison between the LiDAR signal and the noise distributions, while the second one used a low pass filter by fitting a Gaussian curve family. In this paper, we will present these two algorithms and their evolution as a function of the SNR. The main error sources will be also discussed and assessed for each algorithm. The results show that these algorithms have great potential for ground-segment of future space borne LiDAR missions dedicated to the forest survey at the global scale. Acknowledgements: the canopy LiDAR system ULICE has been developed by CEA (Commissariat à l'Energie Atomique). It has been deployed with the support of CNES (Centre National d'Etude Spariales) and ANR (Agence Nationale de la Recherche). We acknowledge the ULA pilots Franck Toussaint for logistical help during the ULA campaign.
Deriving Continuous Fields of Tree Cover at 1-m over the Continental United States From the National Agriculture Imagery Program (NAIP) Imagery to Reduce Uncertainties in Forest Carbon Stock Estimation

NASA Astrophysics Data System (ADS)

Ganguly, S.; Basu, S.; Mukhopadhyay, S.; Michaelis, A.; Milesi, C.; Votava, P.; Nemani, R. R.

2013-12-01

An unresolved issue with coarse-to-medium resolution satellite-based forest carbon mapping over regional to continental scales is the high level of uncertainty in above ground biomass (AGB) estimates caused by the absence of forest cover information at a high enough spatial resolution (current spatial resolution is limited to 30-m). To put confidence in existing satellite-derived AGB density estimates, it is imperative to create continuous fields of tree cover at a sufficiently high resolution (e.g. 1-m) such that large uncertainties in forested area are reduced. The proposed work will provide means to reduce uncertainty in present satellite-derived AGB maps and Forest Inventory and Analysis (FIA) based regional estimates. Our primary objective will be to create Very High Resolution (VHR) estimates of tree cover at a spatial resolution of 1-m for the Continental United States using all available National Agriculture Imaging Program (NAIP) color-infrared imagery from 2010 till 2012. We will leverage the existing capabilities of the NASA Earth Exchange (NEX) high performance computing and storage facilities. The proposed 1-m tree cover map can be further aggregated to provide percent tree cover at any medium-to-coarse resolution spatial grid, which will aid in reducing uncertainties in AGB density estimation at the respective grid and overcome current limitations imposed by medium-to-coarse resolution land cover maps. We have implemented a scalable and computationally-efficient parallelized framework for tree-cover delineation - the core components of the algorithm [that] include a feature extraction process, a Statistical Region Merging image segmentation algorithm and a classification algorithm based on Deep Belief Network and a Feedforward Backpropagation Neural Network algorithm. An initial pilot exercise has been performed over the state of California (~11,000 scenes) to create a wall-to-wall 1-m tree cover map and the classification accuracy has been assessed. Results show an improvement in accuracy of tree-cover delineation as compared to existing forest cover maps from NLCD, especially over fragmented, heterogeneous and urban landscapes. Estimates of VHR tree cover will complement and enhance the accuracy of present remote-sensing based AGB modeling approaches and forest inventory based estimates at both national and local scales. A requisite step will be to characterize the inherent uncertainties in tree cover estimates and propagate them to estimate AGB.
Node Redeployment Algorithm Based on Stratified Connected Tree for Underwater Sensor Networks

PubMed Central

Liu, Jun; Jiang, Peng; Wu, Feng; Yu, Shanen; Song, Chunyue

2016-01-01

During the underwater sensor networks (UWSNs) operation, node drift with water environment causes network topology changes. Periodic node location examination and adjustment are needed to maintain good network monitoring quality as long as possible. In this paper, a node redeployment algorithm based on stratified connected tree for UWSNs is proposed. At every network adjustment moment, self-examination and adjustment on node locations are performed firstly. If a node is outside the monitored space, it returns to the last location recorded in its memory along straight line. Later, the network topology is stratified into a connected tree that takes the sink node as the root node by broadcasting ready information level by level, which can improve the network connectivity rate. Finally, with synthetically considering network coverage and connectivity rates, and node movement distance, the sink node performs centralized optimization on locations of leaf nodes in the stratified connected tree. Simulation results show that the proposed redeployment algorithm can not only keep the number of nodes in the monitored space as much as possible and maintain good network coverage and connectivity rates during network operation, but also reduce node movement distance during node redeployment and prolong the network lifetime. PMID:28029124
Intelligent Diagnostic Assistant for Complicated Skin Diseases through C5's Algorithm.

PubMed

Jeddi, Fatemeh Rangraz; Arabfard, Masoud; Kermany, Zahra Arab

2017-09-01

Intelligent Diagnostic Assistant can be used for complicated diagnosis of skin diseases, which are among the most common causes of disability. The aim of this study was to design and implement a computerized intelligent diagnostic assistant for complicated skin diseases through C5's Algorithm. An applied-developmental study was done in 2015. Knowledge base was developed based on interviews with dermatologists through questionnaires and checklists. Knowledge representation was obtained from the train data in the database using Excel Microsoft Office. Clementine Software and C5's Algorithms were applied to draw the decision tree. Analysis of test accuracy was performed based on rules extracted using inference chains. The rules extracted from the decision tree were entered into the CLIPS programming environment and the intelligent diagnostic assistant was designed then. The rules were defined using forward chaining inference technique and were entered into Clips programming environment as RULE. The accuracy and error rates obtained in the training phase from the decision tree were 99.56% and 0.44%, respectively. The accuracy of the decision tree was 98% and the error was 2% in the test phase. Intelligent diagnostic assistant can be used as a reliable system with high accuracy, sensitivity, specificity, and agreement.
Optimal tree-stem bucking of northeastern species of China

Treesearch

Jingxin Wang; Chris B. LeDoux; Joseph McNeel

2004-01-01

An application of optimal tree-stem bucking to the northeastern tree species of China is reported. The bucking procedures used in this region are summarized, which are the basic guidelines for the optimal bucking design. The directed graph approach was adopted to generate the bucking patterns by using the network analysis labeling algorithm. A computer-based bucking...
High-efficiency induction motor drives using type-2 fuzzy logic

NASA Astrophysics Data System (ADS)

Khemis, A.; Benlaloui, I.; Drid, S.; Chrifi-Alaoui, L.; Khamari, D.; Menacer, A.

2018-03-01

In this work we propose to develop an algorithm for improving the efficiency of an induction motor using type-2 fuzzy logic. Vector control is used to control this motor due to the high performances of this strategy. The type-2 fuzzy logic regulators are developed to obtain the optimal rotor flux for each torque load by minimizing the copper losses. We have compared the performances of our fuzzy type-2 algorithm with the type-1 fuzzy one proposed in the literature. The proposed algorithm is tested with success on the dSPACE DS1104 system even if there is parameters variance.
Principal component analysis and the locus of the Fréchet mean in the space of phylogenetic trees.

PubMed

Nye, Tom M W; Tang, Xiaoxian; Weyenberg, Grady; Yoshida, Ruriko

2017-12-01

Evolutionary relationships are represented by phylogenetic trees, and a phylogenetic analysis of gene sequences typically produces a collection of these trees, one for each gene in the analysis. Analysis of samples of trees is difficult due to the multi-dimensionality of the space of possible trees. In Euclidean spaces, principal component analysis is a popular method of reducing high-dimensional data to a low-dimensional representation that preserves much of the sample's structure. However, the space of all phylogenetic trees on a fixed set of species does not form a Euclidean vector space, and methods adapted to tree space are needed. Previous work introduced the notion of a principal geodesic in this space, analogous to the first principal component. Here we propose a geometric object for tree space similar to the [Formula: see text]th principal component in Euclidean space: the locus of the weighted Fréchet mean of [Formula: see text] vertex trees when the weights vary over the [Formula: see text]-simplex. We establish some basic properties of these objects, in particular showing that they have dimension [Formula: see text], and propose algorithms for projection onto these surfaces and for finding the principal locus associated with a sample of trees. Simulation studies demonstrate that these algorithms perform well, and analyses of two datasets, containing Apicomplexa and African coelacanth genomes respectively, reveal important structure from the second principal components.
Improving visibility of rear surface cracks during inductive thermography of metal plates using Autoencoder

NASA Astrophysics Data System (ADS)

Xie, Jing; Xu, Changhang; Chen, Guoming; Huang, Weiping

2018-06-01

Inductive thermography is one kind of infrared thermography (IRT) technique, which is effective in detection of front surface cracks in metal plates. However, rear surface cracks are usually missed due to their weak indications during inductive thermography. Here we propose a novel approach (AET: AE Thermography) to improve the visibility of rear surface cracks during inductive thermography by employing the Autoencoder (AE) algorithm, which is an important block to construct deep learning architectures. We construct an integrated framework for processing the raw inspection data of inductive thermography using the AE algorithm. Through this framework, underlying features of rear surface cracks are efficiently extracted and new clearer images are constructed. Experiments of inductive thermography were conducted on steel specimens to verify the efficacy of the proposed approach. We visually compare the raw thermograms, the empirical orthogonal functions (EOFs) of the prominent component thermography (PCT) technique and the results of AET. We further quantitatively evaluated AET by calculating crack contrast and signal-to-noise ratio (SNR). The results demonstrate that the proposed AET approach can remarkably improve the visibility of rear surface cracks and then improve the capability of inductive thermography in detecting rear surface cracks in metal plates.

Empirical Analysis and Refinement of Expert System Knowledge Bases

DTIC Science & Technology

1990-03-31

the number of hidden units and the error rates is listed in Figure 6. 3.3. Cancer Data A data qet for eva!ukting th.- Frognosis of breast cancer ...Alternative Rule Induction Methods A data set for evaluating the prognosis of breast cancer recurrence was analyzed by Michalski’s AQI5 rule induction program...AQ15 7 2 32% PVM 2 1 23% Figure 6-3: Comparative Summa-y for AQI5 and PVM on Breast Cancer Data 6.2.2. Alternative Decision Tree Induction Methods
An efficient non-dominated sorting method for evolutionary algorithms.

PubMed

Fang, Hongbing; Wang, Qian; Tu, Yi-Cheng; Horstemeyer, Mark F

2008-01-01

We present a new non-dominated sorting algorithm to generate the non-dominated fronts in multi-objective optimization with evolutionary algorithms, particularly the NSGA-II. The non-dominated sorting algorithm used by NSGA-II has a time complexity of O(MN(2)) in generating non-dominated fronts in one generation (iteration) for a population size N and M objective functions. Since generating non-dominated fronts takes the majority of total computational time (excluding the cost of fitness evaluations) of NSGA-II, making this algorithm faster will significantly improve the overall efficiency of NSGA-II and other genetic algorithms using non-dominated sorting. The new non-dominated sorting algorithm proposed in this study reduces the number of redundant comparisons existing in the algorithm of NSGA-II by recording the dominance information among solutions from their first comparisons. By utilizing a new data structure called the dominance tree and the divide-and-conquer mechanism, the new algorithm is faster than NSGA-II for different numbers of objective functions. Although the number of solution comparisons by the proposed algorithm is close to that of NSGA-II when the number of objectives becomes large, the total computational time shows that the proposed algorithm still has better efficiency because of the adoption of the dominance tree structure and the divide-and-conquer mechanism.
Land cover and land use mapping of the iSimangaliso Wetland Park, South Africa: comparison of oblique and orthogonal random forest algorithms

NASA Astrophysics Data System (ADS)

Bassa, Zaakirah; Bob, Urmilla; Szantoi, Zoltan; Ismail, Riyad

2016-01-01

In recent years, the popularity of tree-based ensemble methods for land cover classification has increased significantly. Using WorldView-2 image data, we evaluate the potential of the oblique random forest algorithm (oRF) to classify a highly heterogeneous protected area. In contrast to the random forest (RF) algorithm, the oRF algorithm builds multivariate trees by learning the optimal split using a supervised model. The oRF binary algorithm is adapted to a multiclass land cover and land use application using both the "one-against-one" and "one-against-all" combination approaches. Results show that the oRF algorithms are capable of achieving high classification accuracies (>80%). However, there was no statistical difference in classification accuracies obtained by the oRF algorithms and the more popular RF algorithm. For all the algorithms, user accuracies (UAs) and producer accuracies (PAs) >80% were recorded for most of the classes. Both the RF and oRF algorithms poorly classified the indigenous forest class as indicated by the low UAs and PAs. Finally, the results from this study advocate and support the utility of the oRF algorithm for land cover and land use mapping of protected areas using WorldView-2 image data.
Shoot bending promotes flower bud formation by miRNA-mediated regulation in apple (Malus domestica Borkh.).

PubMed

Xing, Libo; Zhang, Dong; Zhao, Caiping; Li, Youmei; Ma, Juanjuan; An, Na; Han, Mingyu

2016-02-01

Flower induction in apple (Malus domestica Borkh.) trees plays an important life cycle role, but young trees produce fewer and inferior quality flower buds. Therefore, shoot bending has become an important cultural practice, significantly promoting the capacity to develop more flower buds during the growing seasons. Additionally, microRNAs (miRNAs) play essential roles in plant growth, flower induction and stress responses. In this study, we identified miRNAs potentially involved in the regulation of bud growth, and flower induction and development, as well as in the response to shoot bending. Of the 195 miRNAs identified, 137 were novel miRNAs. The miRNA expression profiles revealed that the expression levels of 68 and 27 known miRNAs were down-regulated and up-regulated, respectively, in response to shoot bending, and that the 31 differentially expressed novel miRNAs between them formed five major clusters. Additionally, a complex regulatory network associated with auxin, cytokinin, abscisic acid (ABA) and gibberellic acid (GA) plays important roles in cell division, bud growth and flower induction, in which related miRNAs and targets mediated regulation. Among them, miR396, 160, 393, and their targets associated with AUX, miR159, 319, 164, and their targets associated with ABA and GA, and flowering-related miRNAs and genes, regulate bud growth and flower bud formation in response to shoot bending. Meanwhile, the flowering genes had significantly higher expression levels during shoot bending, suggesting that they are involved in this regulatory process. This study provides a framework for the future analysis of miRNAs associated with multiple hormones and their roles in the regulation of bud growth, and flower induction and formation in response to shoot bending in apple trees. © 2015 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
A simple and robust classification tree for differentiation between benign and malignant lesions in MR-mammography.

PubMed

Baltzer, Pascal A T; Dietzel, Matthias; Kaiser, Werner A

2013-08-01

In the face of multiple available diagnostic criteria in MR-mammography (MRM), a practical algorithm for lesion classification is needed. Such an algorithm should be as simple as possible and include only important independent lesion features to differentiate benign from malignant lesions. This investigation aimed to develop a simple classification tree for differential diagnosis in MRM. A total of 1,084 lesions in standardised MRM with subsequent histological verification (648 malignant, 436 benign) were investigated. Seventeen lesion criteria were assessed by 2 readers in consensus. Classification analysis was performed using the chi-squared automatic interaction detection (CHAID) method. Results include the probability for malignancy for every descriptor combination in the classification tree. A classification tree incorporating 5 lesion descriptors with a depth of 3 ramifications (1, root sign; 2, delayed enhancement pattern; 3, border, internal enhancement and oedema) was calculated. Of all 1,084 lesions, 262 (40.4 %) and 106 (24.3 %) could be classified as malignant and benign with an accuracy above 95 %, respectively. Overall diagnostic accuracy was 88.4 %. The classification algorithm reduced the number of categorical descriptors from 17 to 5 (29.4 %), resulting in a high classification accuracy. More than one third of all lesions could be classified with accuracy above 95 %. • A practical algorithm has been developed to classify lesions found in MR-mammography. • A simple decision tree consisting of five criteria reaches high accuracy of 88.4 %. • Unique to this approach, each classification is associated with a diagnostic certainty. • Diagnostic certainty of greater than 95 % is achieved in 34 % of all cases.
A possible role for flowering locus T-encoding genes in interpreting environmental and internal cues affecting olive (Olea europaea L.) flower induction.

PubMed

Haberman, Amnon; Bakhshian, Ortal; Cerezo-Medina, Sergio; Paltiel, Judith; Adler, Chen; Ben-Ari, Giora; Mercado, Jose Angel; Pliego-Alfaro, Fernando; Lavee, Shimon; Samach, Alon

2017-08-01

Olive (Olea europaea L.) inflorescences, formed in lateral buds, flower in spring. However, there is some debate regarding time of flower induction and inflorescence initiation. Olive juvenility and seasonality of flowering were altered by overexpressing genes encoding flowering locus T (FT). OeFT1 and OeFT2 caused early flowering under short days when expressed in Arabidopsis. Expression of OeFT1/2 in olive leaves and OeFT2 in buds increased in winter, while initiation of inflorescences occurred i n late winter. Trees exposed to an artificial warm winter expressed low levels of OeFT1/2 in leaves and did not flower. Olive flower induction thus seems to be mediated by an increase in FT levels in response to cold winters. Olive flowering is dependent on additional internal factors. It was severely reduced in trees that carried a heavy fruit load the previous season (harvested in November) and in trees without fruit to which cold temperatures were artificially applied in summer. Expression analysis suggested that these internal factors work either by reducing the increase in OeFT1/2 expression or through putative flowering repressors such as TFL1. With expected warmer winters, future consumption of olive oil, as part of a healthy Mediterranean diet, should benefit from better understanding these factors. © 2017 John Wiley & Sons Ltd.
Application and Exploration of Big Data Mining in Clinical Medicine.

PubMed

Zhang, Yue; Guo, Shu-Li; Han, Li-Na; Li, Tie-Ling

2016-03-20

To review theories and technologies of big data mining and their application in clinical medicine. Literatures published in English or Chinese regarding theories and technologies of big data mining and the concrete applications of data mining technology in clinical medicine were obtained from PubMed and Chinese Hospital Knowledge Database from 1975 to 2015. Original articles regarding big data mining theory/technology and big data mining's application in the medical field were selected. This review characterized the basic theories and technologies of big data mining including fuzzy theory, rough set theory, cloud theory, Dempster-Shafer theory, artificial neural network, genetic algorithm, inductive learning theory, Bayesian network, decision tree, pattern recognition, high-performance computing, and statistical analysis. The application of big data mining in clinical medicine was analyzed in the fields of disease risk assessment, clinical decision support, prediction of disease development, guidance of rational use of drugs, medical management, and evidence-based medicine. Big data mining has the potential to play an important role in clinical medicine.
Analytical and CASE study on Limited Search, ID3, CHAID, C4.5, Improved C4.5 and OVA Decision Tree Algorithms to design Decision Support System

NASA Astrophysics Data System (ADS)

Kaur, Parneet; Singh, Sukhwinder; Garg, Sushil; Harmanpreet

2010-11-01

In this paper we study about classification algorithms for farm DSS. By applying classification algorithms i.e. Limited search, ID3, CHAID, C4.5, Improved C4.5 and One VS all Decision Tree on common data set of crop with specified class, results are obtained. The tool used to derive results is SPINA. The graphical results obtained from tool are compared to suggest best technique to develop farm Decision Support System. This analysis would help to researchers to design effective and fast DSS for farmer to take decision for enhancing their yield.
Incremental Learning of Context Free Grammars by Parsing-Based Rule Generation and Rule Set Search

NASA Astrophysics Data System (ADS)

Nakamura, Katsuhiko; Hoshina, Akemi

This paper discusses recent improvements and extensions in Synapse system for inductive inference of context free grammars (CFGs) from sample strings. Synapse uses incremental learning, rule generation based on bottom-up parsing, and the search for rule sets. The form of production rules in the previous system is extended from Revised Chomsky Normal Form A→βγ to Extended Chomsky Normal Form, which also includes A→B, where each of β and γ is either a terminal or nonterminal symbol. From the result of bottom-up parsing, a rule generation mechanism synthesizes minimum production rules required for parsing positive samples. Instead of inductive CYK algorithm in the previous version of Synapse, the improved version uses a novel rule generation method, called ``bridging,'' which bridges the lacked part of the derivation tree for the positive string. The improved version also employs a novel search strategy, called serial search in addition to minimum rule set search. The synthesis of grammars by the serial search is faster than the minimum set search in most cases. On the other hand, the size of the generated CFGs is generally larger than that by the minimum set search, and the system can find no appropriate grammar for some CFL by the serial search. The paper shows experimental results of incremental learning of several fundamental CFGs and compares the methods of rule generation and search strategies.
Toward a Better Compression for DNA Sequences Using Huffman Encoding

PubMed Central

Almarri, Badar; Al Yami, Sultan; Huang, Chun-Hsi

2017-01-01

Abstract Due to the significant amount of DNA data that are being generated by next-generation sequencing machines for genomes of lengths ranging from megabases to gigabases, there is an increasing need to compress such data to a less space and a faster transmission. Different implementations of Huffman encoding incorporating the characteristics of DNA sequences prove to better compress DNA data. These implementations center on the concepts of selecting frequent repeats so as to force a skewed Huffman tree, as well as the construction of multiple Huffman trees when encoding. The implementations demonstrate improvements on the compression ratios for five genomes with lengths ranging from 5 to 50 Mbp, compared with the standard Huffman tree algorithm. The research hence suggests an improvement on all such DNA sequence compression algorithms that use the conventional Huffman encoding. The research suggests an improvement on all DNA sequence compression algorithms that use the conventional Huffman encoding. Accompanying software is publicly available (AL-Okaily, 2016). PMID:27960065
Toward a Better Compression for DNA Sequences Using Huffman Encoding.

PubMed

Al-Okaily, Anas; Almarri, Badar; Al Yami, Sultan; Huang, Chun-Hsi

2017-04-01

Due to the significant amount of DNA data that are being generated by next-generation sequencing machines for genomes of lengths ranging from megabases to gigabases, there is an increasing need to compress such data to a less space and a faster transmission. Different implementations of Huffman encoding incorporating the characteristics of DNA sequences prove to better compress DNA data. These implementations center on the concepts of selecting frequent repeats so as to force a skewed Huffman tree, as well as the construction of multiple Huffman trees when encoding. The implementations demonstrate improvements on the compression ratios for five genomes with lengths ranging from 5 to 50 Mbp, compared with the standard Huffman tree algorithm. The research hence suggests an improvement on all such DNA sequence compression algorithms that use the conventional Huffman encoding. The research suggests an improvement on all DNA sequence compression algorithms that use the conventional Huffman encoding. Accompanying software is publicly available (AL-Okaily, 2016 ).
Expression Profiling of FLOWERING LOCUS T-Like Gene in Alternate Bearing ‘Hass' Avocado Trees Suggests a Role for PaFT in Avocado Flower Induction

PubMed Central

Ziv, Dafna; Zviran, Tali; Zezak, Oshrat; Samach, Alon; Irihimovitch, Vered

2014-01-01

In many perennials, heavy fruit load on a shoot decreases the ability of the plant to undergo floral induction in the following spring, resulting in a pattern of crop production known as alternate bearing. Here, we studied the effects of fruit load on floral determination in ‘Hass' avocado (Persea americana). De-fruiting experiments initially confirmed the negative effects of fruit load on return to flowering. Next, we isolated a FLOWERING LOCUS T-like gene, PaFT, hypothesized to act as a phloem-mobile florigen signal and examined its expression profile in shoot tissues of on (fully loaded) and off (fruit-lacking) trees. Expression analyses revealed a strong peak in PaFT transcript levels in leaves of off trees from the end of October through November, followed by a return to starting levels. Moreover and concomitant with inflorescence development, only off buds displayed up-regulation of the floral identity transcripts PaAP1 and PaLFY, with significant variation being detected from October and November, respectively. Furthermore, a parallel microscopic study of off apical buds revealed the presence of secondary inflorescence axis structures that only appeared towards the end of November. Finally, ectopic expression of PaFT in Arabidopsis resulted in early flowering transition. Together, our data suggests a link between increased PaFT expression observed during late autumn and avocado flower induction. Furthermore, our results also imply that, as in the case of other crop trees, fruit-load might affect flowering by repressing the expression of PaFT in the leaves. Possible mechanism(s) by which fruit crop might repress PaFT expression, are discussed. PMID:25330324
Expression profiling of FLOWERING LOCUS T-like gene in alternate bearing 'Hass' avocado trees suggests a role for PaFT in avocado flower induction.

PubMed

Ziv, Dafna; Zviran, Tali; Zezak, Oshrat; Samach, Alon; Irihimovitch, Vered

2014-01-01

In many perennials, heavy fruit load on a shoot decreases the ability of the plant to undergo floral induction in the following spring, resulting in a pattern of crop production known as alternate bearing. Here, we studied the effects of fruit load on floral determination in 'Hass' avocado (Persea americana). De-fruiting experiments initially confirmed the negative effects of fruit load on return to flowering. Next, we isolated a FLOWERING LOCUS T-like gene, PaFT, hypothesized to act as a phloem-mobile florigen signal and examined its expression profile in shoot tissues of on (fully loaded) and off (fruit-lacking) trees. Expression analyses revealed a strong peak in PaFT transcript levels in leaves of off trees from the end of October through November, followed by a return to starting levels. Moreover and concomitant with inflorescence development, only off buds displayed up-regulation of the floral identity transcripts PaAP1 and PaLFY, with significant variation being detected from October and November, respectively. Furthermore, a parallel microscopic study of off apical buds revealed the presence of secondary inflorescence axis structures that only appeared towards the end of November. Finally, ectopic expression of PaFT in Arabidopsis resulted in early flowering transition. Together, our data suggests a link between increased PaFT expression observed during late autumn and avocado flower induction. Furthermore, our results also imply that, as in the case of other crop trees, fruit-load might affect flowering by repressing the expression of PaFT in the leaves. Possible mechanism(s) by which fruit crop might repress PaFT expression, are discussed.
Adaptive Broadcasting Mechanism for Bandwidth Allocation in Mobile Services

PubMed Central

Horng, Gwo-Jiun; Wang, Chi-Hsuan; Chou, Chih-Lun

2014-01-01

This paper proposes a tree-based adaptive broadcasting (TAB) algorithm for data dissemination to improve data access efficiency. The proposed TAB algorithm first constructs a broadcast tree to determine the broadcast frequency of each data and splits the broadcast tree into some broadcast wood to generate the broadcast program. In addition, this paper develops an analytical model to derive the mean access latency of the generated broadcast program. In light of the derived results, both the index channel's bandwidth and the data channel's bandwidth can be optimally allocated to maximize bandwidth utilization. This paper presents experiments to help evaluate the effectiveness of the proposed strategy. From the experimental results, it can be seen that the proposed mechanism is feasible in practice. PMID:25057509
Data-Parallel Algorithm for Contour Tree Construction

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sewell, Christopher Meyer; Ahrens, James Paul; Carr, Hamish

2017-01-19

The goal of this project is to develop algorithms for additional visualization and analysis filters in order to expand the functionality of the VTK-m toolkit to support less critical but commonly used operators.
Searching Dynamic Agents with a Team of Mobile Robots

PubMed Central

Juliá, Miguel; Gil, Arturo; Reinoso, Oscar

2012-01-01

This paper presents a new algorithm that allows a team of robots to cooperatively search for a set of moving targets. An estimation of the areas of the environment that are more likely to hold a target agent is obtained using a grid-based Bayesian filter. The robot sensor readings and the maximum speed of the moving targets are used in order to update the grid. This representation is used in a search algorithm that commands the robots to those areas that are more likely to present target agents. This algorithm splits the environment in a tree of connected regions using dynamic programming. This tree is used in order to decide the destination for each robot in a coordinated manner. The algorithm has been successfully tested in known and unknown environments showing the validity of the approach. PMID:23012519
Searching dynamic agents with a team of mobile robots.

PubMed

Juliá, Miguel; Gil, Arturo; Reinoso, Oscar

2012-01-01

This paper presents a new algorithm that allows a team of robots to cooperatively search for a set of moving targets. An estimation of the areas of the environment that are more likely to hold a target agent is obtained using a grid-based Bayesian filter. The robot sensor readings and the maximum speed of the moving targets are used in order to update the grid. This representation is used in a search algorithm that commands the robots to those areas that are more likely to present target agents. This algorithm splits the environment in a tree of connected regions using dynamic programming. This tree is used in order to decide the destination for each robot in a coordinated manner. The algorithm has been successfully tested in known and unknown environments showing the validity of the approach.
New methods, algorithms, and software for rapid mapping of tree positions in coordinate forest plots

Treesearch

A. Dan Wilson

2000-01-01

The theories and methodologies for two new tree mapping methods, the Sequential-target method and the Plot-origin radial method, are described. The methods accommodate the use of any conventional distance measuring device and compass to collect horizontal distance and azimuth data between source or reference positions (origins) and target trees. Conversion equations...
Optimal Path Planning Program for Autonomous Speed Sprayer in Orchard Using Order-Picking Algorithm

NASA Astrophysics Data System (ADS)

Park, T. S.; Park, S. J.; Hwang, K. Y.; Cho, S. I.

This study was conducted to develop a software program which computes optimal path for autonomous navigation in orchard, especially for speed sprayer. Possibilities of autonomous navigation in orchard were shown by other researches which have minimized distance error between planned path and performed path. But, research of planning an optimal path for speed sprayer in orchard is hardly founded. In this study, a digital map and a database for orchard which contains GPS coordinate information (coordinates of trees and boundary of orchard) and entity information (heights and widths of trees, radius of main stem of trees, disease of trees) was designed. An orderpicking algorithm which has been used for management of warehouse was used to calculate optimum path based on the digital map. Database for digital map was created by using Microsoft Access and graphic interface for database was made by using Microsoft Visual C++ 6.0. It was possible to search and display information about boundary of an orchard, locations of trees, daily plan for scattering chemicals and plan optimal path on different orchard based on digital map, on each circumstance (starting speed sprayer in different location, scattering chemicals for only selected trees).
Real-Time Interactive Tree Animation.

PubMed

Quigley, Ed; Yu, Yue; Huang, Jingwei; Lin, Winnie; Fedkiw, Ronald

2018-05-01

We present a novel method for posing and animating botanical tree models interactively in real time. Unlike other state of the art methods which tend to produce trees that are overly flexible, bending and deforming as if they were underwater plants, our approach allows for arbitrarily high stiffness while still maintaining real-time frame rates without spurious artifacts, even on quite large trees with over ten thousand branches. This is accomplished by using an articulated rigid body model with as-stiff-as-desired rotational springs in conjunction with our newly proposed simulation technique, which is motivated both by position based dynamics and the typical algorithms for articulated rigid bodies. The efficiency of our algorithm allows us to pose and animate trees with millions of branches or alternatively simulate a small forest comprised of many highly detailed trees. Even using only a single CPU core, we can simulate ten thousand branches in real time while still maintaining quite crisp user interactivity. This has allowed us to incorporate our framework into a commodity game engine to run interactively even on a low-budget tablet. We show that our method is amenable to the incorporation of a large variety of desirable effects such as wind, leaves, fictitious forces, collisions, fracture, etc.

Rotary transformer design with fixed magnetizing and/or leakage inductances

NASA Technical Reports Server (NTRS)

Stuart, T. A.; King, R. J.; Shamseddin, H.

1985-01-01

A design algorithm is considered for transformers that must transfer electric power across a rotating interface. Among other features, this procedure allows the designer to minimize either weight or losses for either a fixed magnetizing inductance or a fixed leakage inductance. Numerical results are included to indicate the design trade-offs between various parameters.
At-Least Version of the Generalized Minimum Spanning Tree Problem: Optimization Through Ant Colony System and Genetic Algorithms

NASA Technical Reports Server (NTRS)

Janich, Karl W.

2005-01-01

The At-Least version of the Generalized Minimum Spanning Tree Problem (L-GMST) is a problem in which the optimal solution connects all defined clusters of nodes in a given network at a minimum cost. The L-GMST is NPHard; therefore, metaheuristic algorithms have been used to find reasonable solutions to the problem as opposed to computationally feasible exact algorithms, which many believe do not exist for such a problem. One such metaheuristic uses a swarm-intelligent Ant Colony System (ACS) algorithm, in which agents converge on a solution through the weighing of local heuristics, such as the shortest available path and the number of agents that recently used a given path. However, in a network using a solution derived from the ACS algorithm, some nodes may move around to different clusters and cause small changes in the network makeup. Rerunning the algorithm from the start would be somewhat inefficient due to the significance of the changes, so a genetic algorithm based on the top few solutions found in the ACS algorithm is proposed to quickly and efficiently adapt the network to these small changes.
Decision-Tree Analysis for Predicting First-Time Pass/Fail Rates for the NCLEX-RN® in Associate Degree Nursing Students.

PubMed

Chen, Hsiu-Chin; Bennett, Sean

2016-08-01

Little evidence shows the use of decision-tree algorithms in identifying predictors and analyzing their associations with pass rates for the NCLEX-RN(®) in associate degree nursing students. This longitudinal and retrospective cohort study investigated whether a decision-tree algorithm could be used to develop an accurate prediction model for the students' passing or failing the NCLEX-RN. This study used archived data from 453 associate degree nursing students in a selected program. The chi-squared automatic interaction detection analysis of the decision trees module was used to examine the effect of the collected predictors on passing/failing the NCLEX-RN. The actual percentage scores of Assessment Technologies Institute®'s RN Comprehensive Predictor(®) accurately identified students at risk of failing. The classification model correctly classified 92.7% of the students for passing. This study applied the decision-tree model to analyze a sequence database for developing a prediction model for early remediation in preparation for the NCLEXRN. [J Nurs Educ. 2016;55(8):454-457.]. Copyright 2016, SLACK Incorporated.
SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction.

PubMed

Hagopian, Raffi; Davidson, John R; Datta, Ruchira S; Samad, Bushra; Jarvis, Glen R; Sjölander, Kimmen

2010-07-01

We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on a benchmark dataset of 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/.
Calculating Higher-Order Moments of Phylogenetic Stochastic Mapping Summaries in Linear Time.

PubMed

Dhar, Amrit; Minin, Vladimir N

2017-05-01

Stochastic mapping is a simulation-based method for probabilistically mapping substitution histories onto phylogenies according to continuous-time Markov models of evolution. This technique can be used to infer properties of the evolutionary process on the phylogeny and, unlike parsimony-based mapping, conditions on the observed data to randomly draw substitution mappings that do not necessarily require the minimum number of events on a tree. Most stochastic mapping applications simulate substitution mappings only to estimate the mean and/or variance of two commonly used mapping summaries: the number of particular types of substitutions (labeled substitution counts) and the time spent in a particular group of states (labeled dwelling times) on the tree. Fast, simulation-free algorithms for calculating the mean of stochastic mapping summaries exist. Importantly, these algorithms scale linearly in the number of tips/leaves of the phylogenetic tree. However, to our knowledge, no such algorithm exists for calculating higher-order moments of stochastic mapping summaries. We present one such simulation-free dynamic programming algorithm that calculates prior and posterior mapping variances and scales linearly in the number of phylogeny tips. Our procedure suggests a general framework that can be used to efficiently compute higher-order moments of stochastic mapping summaries without simulations. We demonstrate the usefulness of our algorithm by extending previously developed statistical tests for rate variation across sites and for detecting evolutionarily conserved regions in genomic sequences.
Calculating Higher-Order Moments of Phylogenetic Stochastic Mapping Summaries in Linear Time

PubMed Central

Dhar, Amrit

2017-01-01

Abstract Stochastic mapping is a simulation-based method for probabilistically mapping substitution histories onto phylogenies according to continuous-time Markov models of evolution. This technique can be used to infer properties of the evolutionary process on the phylogeny and, unlike parsimony-based mapping, conditions on the observed data to randomly draw substitution mappings that do not necessarily require the minimum number of events on a tree. Most stochastic mapping applications simulate substitution mappings only to estimate the mean and/or variance of two commonly used mapping summaries: the number of particular types of substitutions (labeled substitution counts) and the time spent in a particular group of states (labeled dwelling times) on the tree. Fast, simulation-free algorithms for calculating the mean of stochastic mapping summaries exist. Importantly, these algorithms scale linearly in the number of tips/leaves of the phylogenetic tree. However, to our knowledge, no such algorithm exists for calculating higher-order moments of stochastic mapping summaries. We present one such simulation-free dynamic programming algorithm that calculates prior and posterior mapping variances and scales linearly in the number of phylogeny tips. Our procedure suggests a general framework that can be used to efficiently compute higher-order moments of stochastic mapping summaries without simulations. We demonstrate the usefulness of our algorithm by extending previously developed statistical tests for rate variation across sites and for detecting evolutionarily conserved regions in genomic sequences. PMID:28177780
Species Tree Inference Using a Mixture Model.

PubMed

Ullah, Ikram; Parviainen, Pekka; Lagergren, Jens

2015-09-01

Species tree reconstruction has been a subject of substantial research due to its central role across biology and medicine. A species tree is often reconstructed using a set of gene trees or by directly using sequence data. In either of these cases, one of the main confounding phenomena is the discordance between a species tree and a gene tree due to evolutionary events such as duplications and losses. Probabilistic methods can resolve the discordance by coestimating gene trees and the species tree but this approach poses a scalability problem for larger data sets. We present MixTreEM-DLRS: A two-phase approach for reconstructing a species tree in the presence of gene duplications and losses. In the first phase, MixTreEM, a novel structural expectation maximization algorithm based on a mixture model is used to reconstruct a set of candidate species trees, given sequence data for monocopy gene families from the genomes under study. In the second phase, PrIME-DLRS, a method based on the DLRS model (Åkerborg O, Sennblad B, Arvestad L, Lagergren J. 2009. Simultaneous Bayesian gene tree reconstruction and reconciliation analysis. Proc Natl Acad Sci U S A. 106(14):5714-5719), is used for selecting the best species tree. PrIME-DLRS can handle multicopy gene families since DLRS, apart from modeling sequence evolution, models gene duplication and loss using a gene evolution model (Arvestad L, Lagergren J, Sennblad B. 2009. The gene evolution model and computing its associated probabilities. J ACM. 56(2):1-44). We evaluate MixTreEM-DLRS using synthetic and biological data, and compare its performance with a recent genome-scale species tree reconstruction method PHYLDOG (Boussau B, Szöllősi GJ, Duret L, Gouy M, Tannier E, Daubin V. 2013. Genome-scale coestimation of species and gene trees. Genome Res. 23(2):323-330) as well as with a fast parsimony-based algorithm Duptree (Wehe A, Bansal MS, Burleigh JG, Eulenstein O. 2008. Duptree: a program for large-scale phylogenetic analyses using gene tree parsimony. Bioinformatics 24(13):1540-1541). Our method is competitive with PHYLDOG in terms of accuracy and runs significantly faster and our method outperforms Duptree in accuracy. The analysis constituted by MixTreEM without DLRS may also be used for selecting the target species tree, yielding a fast and yet accurate algorithm for larger data sets. MixTreEM is freely available at http://prime.scilifelab.se/mixtreem/. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Extrafloral Nectaries in Aspen (Populus tremuloides): Heritable Genetic Variation and Herbivore-induced Expression

PubMed Central

Wooley, Stuart C.; Donaldson, Jack R.; Gusse, Adam C.; Lindroth, Richard L.; Stevens, Michael T.

2007-01-01

Background and Aims A wide variety of plants produce extrafloral nectaries (EFNs) that are visited by predatory arthropods. But very few studies have investigated the relationship between plant genetic variation and EFNs. The presence of foliar EFNs is highly variable among different aspen (Populus tremuloides) genotypes and the EFNs are visited by parasitic wasps and predatory flies. The aim here was to determine the heritability of EFNs among aspen genotypes and age classes, possible trade-offs between direct and indirect defences, EFN induction following herbivory, and the relationship between EFNs and predatory insects. Methods EFN density was quantified among aspen genotypes in Wisconsin on trees of different ages and broad-sense heritability from common garden trees was calculated. EFNs were also quantified in natural aspen stands in Utah. From the common garden trees foliar defensive chemical levels were quantified to evaluate their relationship with EFN density. A defoliation experiment was performed to determine if EFNs can be induced in response to herbivory. Finally, predatory arthropod abundance among aspen trees was quantified to determine the relationship between arthropod abundance and EFNs. Key Results Broad-sense heritability for expression (0·74–0·82) and induction (0·85) of EFNs was high. One-year-old trees had 20% greater EFN density than 4-year-old trees and more than 50% greater EFN density than ≥10-year-old trees. No trade-offs were found between foliar chemical concentrations and EFN density. Predatory fly abundance varied among aspen genotypes, but predatory arthropod abundance and average EFN density were not related. Conclusions Aspen extrafloral nectaries are strongly genetically determined and have the potential to respond rapidly to evolutionary forces. The pattern of EFN expression among different age classes of trees appears to follow predictions of optimal defence theory. The relationship between EFNs and predators likely varies in relation to multiple temporal and environmental factors. PMID:17951361
The Emergence of Organizing Structure in Conceptual Representation.

PubMed

Lake, Brenden M; Lawrence, Neil D; Tenenbaum, Joshua B

2018-06-01

Both scientists and children make important structural discoveries, yet their computational underpinnings are not well understood. Structure discovery has previously been formalized as probabilistic inference about the right structural form-where form could be a tree, ring, chain, grid, etc. (Kemp & Tenenbaum, 2008). Although this approach can learn intuitive organizations, including a tree for animals and a ring for the color circle, it assumes a strong inductive bias that considers only these particular forms, and each form is explicitly provided as initial knowledge. Here we introduce a new computational model of how organizing structure can be discovered, utilizing a broad hypothesis space with a preference for sparse connectivity. Given that the inductive bias is more general, the model's initial knowledge shows little qualitative resemblance to some of the discoveries it supports. As a consequence, the model can also learn complex structures for domains that lack intuitive description, as well as predict human property induction judgments without explicit structural forms. By allowing form to emerge from sparsity, our approach clarifies how both the richness and flexibility of human conceptual organization can coexist. Copyright © 2018 Cognitive Science Society, Inc.
Repeated intensified infliximab induction - results from an 11-year prospective study of ulcerative colitis using a novel treatment algorithm.

PubMed

Johnsen, Kay-Martin; Goll, Rasmus; Hansen, Vegard; Olsen, Trine; Rismo, Renathe; Heitmann, Richard; Gundersen, Mona D; Kvamme, Jan M; Paulssen, Eyvind J; Kileng, Hege; Johnsen, Knut; Florholmen, Jon

2017-01-01

Anti-tumour necrosis factor (TNF) agents play a pivotal role in the treatment of moderate to severe ulcerative colitis (UC), and yet, no international consensus on when to discontinue therapy exists. The aim of this study is to study the long-term performance of a treatment algorithm of repeated intensified induction therapy with infliximab (IFX) to remission, followed by discontinuation in patients with UC. Patients with moderate to severe UC were enroled in an open prospective study design. The following algorithm was implemented: (a) intensified induction treatment to remission (Ulcerative Colitis Disease Activity Index score 0-2); (b) discontinuation of IFX; and (c) reinduction treatment if relapse. Mucosal gene expression for TNF was measured with qPCR. A total of 116 patients were included. The median observation time was 47 and 51 months in intention to treat and per protocol. Remission rates of the first three inductions were 95, 93 and 91% per protocol and 83, 56 and 59% by intention to treat. The median time in remission was 40 months per protocol and 34 months by intention to treat. Long-term remission without further anti-TNF treatment during the observation period was obtained for 41%, with a median observation time of 48 months (range: 18-129 months). The median time to relapse was 33 and 11 months with/without normalization of mucosal TNF, respectively. The 5-year success rate for maintaining the effect of IFX in the algorithm was 66%. The treatment algorithm is highly effective for achieving long-term clinical remission in UC. Normalization of mucosal TNF gene expression predicts long-term remission upon discontinuation of IFX.
Joint Power Charging and Routing in Wireless Rechargeable Sensor Networks.

PubMed

Jia, Jie; Chen, Jian; Deng, Yansha; Wang, Xingwei; Aghvami, Abdol-Hamid

2017-10-09

The development of wireless power transfer (WPT) technology has inspired the transition from traditional battery-based wireless sensor networks (WSNs) towards wireless rechargeable sensor networks (WRSNs). While extensive efforts have been made to improve charging efficiency, little has been done for routing optimization. In this work, we present a joint optimization model to maximize both charging efficiency and routing structure. By analyzing the structure of the optimization model, we first decompose the problem and propose a heuristic algorithm to find the optimal charging efficiency for the predefined routing tree. Furthermore, by coding the many-to-one communication topology as an individual, we further propose to apply a genetic algorithm (GA) for the joint optimization of both routing and charging. The genetic operations, including tree-based recombination and mutation, are proposed to obtain a fast convergence. Our simulation results show that the heuristic algorithm reduces the number of resident locations and the total moving distance. We also show that our proposed algorithm achieves a higher charging efficiency compared with existing algorithms.
Joint Power Charging and Routing in Wireless Rechargeable Sensor Networks

PubMed Central

Jia, Jie; Chen, Jian; Deng, Yansha; Wang, Xingwei; Aghvami, Abdol-Hamid

2017-01-01

The development of wireless power transfer (WPT) technology has inspired the transition from traditional battery-based wireless sensor networks (WSNs) towards wireless rechargeable sensor networks (WRSNs). While extensive efforts have been made to improve charging efficiency, little has been done for routing optimization. In this work, we present a joint optimization model to maximize both charging efficiency and routing structure. By analyzing the structure of the optimization model, we first decompose the problem and propose a heuristic algorithm to find the optimal charging efficiency for the predefined routing tree. Furthermore, by coding the many-to-one communication topology as an individual, we further propose to apply a genetic algorithm (GA) for the joint optimization of both routing and charging. The genetic operations, including tree-based recombination and mutation, are proposed to obtain a fast convergence. Our simulation results show that the heuristic algorithm reduces the number of resident locations and the total moving distance. We also show that our proposed algorithm achieves a higher charging efficiency compared with existing algorithms. PMID:28991200
Simulation of land use change in the three gorges reservoir area based on CART-CA

NASA Astrophysics Data System (ADS)

Yuan, Min

2018-05-01

This study proposes a new method to simulate spatiotemporal complex multiple land uses by using classification and regression tree algorithm (CART) based CA model. In this model, we use classification and regression tree algorithm to calculate land class conversion probability, and combine neighborhood factor, random factor to extract cellular transformation rules. The overall Kappa coefficient is 0.8014 and the overall accuracy is 0.8821 in the land dynamic simulation results of the three gorges reservoir area from 2000 to 2010, and the simulation results are satisfactory.
Virtual Network Embedding via Monte Carlo Tree Search.

PubMed

Haeri, Soroush; Trajkovic, Ljiljana

2018-02-01

Network virtualization helps overcome shortcomings of the current Internet architecture. The virtualized network architecture enables coexistence of multiple virtual networks (VNs) on an existing physical infrastructure. VN embedding (VNE) problem, which deals with the embedding of VN components onto a physical network, is known to be -hard. In this paper, we propose two VNE algorithms: MaVEn-M and MaVEn-S. MaVEn-M employs the multicommodity flow algorithm for virtual link mapping while MaVEn-S uses the shortest-path algorithm. They formalize the virtual node mapping problem by using the Markov decision process (MDP) framework and devise action policies (node mappings) for the proposed MDP using the Monte Carlo tree search algorithm. Service providers may adjust the execution time of the MaVEn algorithms based on the traffic load of VN requests. The objective of the algorithms is to maximize the profit of infrastructure providers. We develop a discrete event VNE simulator to implement and evaluate performance of MaVEn-M, MaVEn-S, and several recently proposed VNE algorithms. We introduce profitability as a new performance metric that captures both acceptance and revenue to cost ratios. Simulation results show that the proposed algorithms find more profitable solutions than the existing algorithms. Given additional computation time, they further improve embedding solutions.
Stress wave velocity patterns in the longitudinal-radial plane of trees for defect diagnosis

Treesearch

Guanghui Li; Xiang Weng; Xiaocheng Du; Xiping Wang; Hailin Feng

2016-01-01

Acoustic tomography for urban tree inspection typically uses stress wave data to reconstruct tomographic images for the trunk cross section using interpolation algorithm. This traditional technique does not take into account the stress wave velocity patterns along tree height. In this study, we proposed an analytical model for the wave velocity in the longitudinalâ...
Portable Language-Independent Adaptive Translation from OCR. Phase 1

DTIC Science & Technology

2009-04-01

including brute-force k-Nearest Neighbors ( kNN ), fast approximate kNN using hashed k-d trees, classification and regression trees, and locality...achieved by refinements in ground-truthing protocols. Recent algorithmic improvements to our approximate kNN classifier using hashed k-D trees allows...recent years discriminative training has been shown to outperform phonetic HMMs estimated using ML for speech recognition. Standard ML estimation
A scale-based connected coherence tree algorithm for image segmentation.

PubMed

Ding, Jundi; Ma, Runing; Chen, Songcan

2008-02-01

This paper presents a connected coherence tree algorithm (CCTA) for image segmentation with no prior knowledge. It aims to find regions of semantic coherence based on the proposed epsilon-neighbor coherence segmentation criterion. More specifically, with an adaptive spatial scale and an appropriate intensity-difference scale, CCTA often achieves several sets of coherent neighboring pixels which maximize the probability of being a single image content (including kinds of complex backgrounds). In practice, each set of coherent neighboring pixels corresponds to a coherence class (CC). The fact that each CC just contains a single equivalence class (EC) ensures the separability of an arbitrary image theoretically. In addition, the resultant CCs are represented by tree-based data structures, named connected coherence tree (CCT)s. In this sense, CCTA is a graph-based image analysis algorithm, which expresses three advantages: 1) its fundamental idea, epsilon-neighbor coherence segmentation criterion, is easy to interpret and comprehend; 2) it is efficient due to a linear computational complexity in the number of image pixels; 3) both subjective comparisons and objective evaluation have shown that it is effective for the tasks of semantic object segmentation and figure-ground separation in a wide variety of images. Those images either contain tiny, long and thin objects or are severely degraded by noise, uneven lighting, occlusion, poor illumination, and shadow.
MODIS Snow Cover Mapping Decision Tree Technique: Snow and Cloud Discrimination

NASA Technical Reports Server (NTRS)

Riggs, George A.; Hall, Dorothy K.

2010-01-01

Accurate mapping of snow cover continues to challenge cryospheric scientists and modelers. The Moderate-Resolution Imaging Spectroradiometer (MODIS) snow data products have been used since 2000 by many investigators to map and monitor snow cover extent for various applications. Users have reported on the utility of the products and also on problems encountered. Three problems or hindrances in the use of the MODIS snow data products that have been reported in the literature are: cloud obscuration, snow/cloud confusion, and snow omission errors in thin or sparse snow cover conditions. Implementation of the MODIS snow algorithm in a decision tree technique using surface reflectance input to mitigate those problems is being investigated. The objective of this work is to use a decision tree structure for the snow algorithm. This should alleviate snow/cloud confusion and omission errors and provide a snow map with classes that convey information on how snow was detected, e.g. snow under clear sky, snow tinder cloud, to enable users' flexibility in interpreting and deriving a snow map. Results of a snow cover decision tree algorithm are compared to the standard MODIS snow map and found to exhibit improved ability to alleviate snow/cloud confusion in some situations allowing up to about 5% increase in mapped snow cover extent, thus accuracy, in some scenes.
Performance Analysis of Evolutionary Algorithms for Steiner Tree Problems.

PubMed

Lai, Xinsheng; Zhou, Yuren; Xia, Xiaoyun; Zhang, Qingfu

2017-01-01

The Steiner tree problem (STP) aims to determine some Steiner nodes such that the minimum spanning tree over these Steiner nodes and a given set of special nodes has the minimum weight, which is NP-hard. STP includes several important cases. The Steiner tree problem in graphs (GSTP) is one of them. Many heuristics have been proposed for STP, and some of them have proved to be performance guarantee approximation algorithms for this problem. Since evolutionary algorithms (EAs) are general and popular randomized heuristics, it is significant to investigate the performance of EAs for STP. Several empirical investigations have shown that EAs are efficient for STP. However, up to now, there is no theoretical work on the performance of EAs for STP. In this article, we reveal that the (1+1) EA achieves 3/2-approximation ratio for STP in a special class of quasi-bipartite graphs in expected runtime [Formula: see text], where [Formula: see text], [Formula: see text], and [Formula: see text] are, respectively, the number of Steiner nodes, the number of special nodes, and the largest weight among all edges in the input graph. We also show that the (1+1) EA is better than two other heuristics on two GSTP instances, and the (1+1) EA may be inefficient on a constructed GSTP instance.
The Optimization of Automatically Generated Compilers.

DTIC Science & Technology

1987-01-01

than their procedural counterparts, and are also easier to analyze for storage optimizations; (2) AGs can be algorithmically checked to be non-circular...Providing algorithms to move the storage for many attributes from the For structure tree into global stacks and variables. -Dd(2) Creating AEs which build and...54 3.5.2. Partitioning algorithm

TREAT (TREe-based Association Test)

Cancer.gov

TREAT is an R package for detecting complex joint effects in case-control studies. The test statistic is derived from a tree-structure model by recursive partitioning the data. Ultra-fast algorithm is designed to evaluate the significance of association between candidate gene and disease outcome
a Gross Error Elimination Method for Point Cloud Data Based on Kd-Tree

NASA Astrophysics Data System (ADS)

Kang, Q.; Huang, G.; Yang, S.

2018-04-01

Point cloud data has been one type of widely used data sources in the field of remote sensing. Key steps of point cloud data's pro-processing focus on gross error elimination and quality control. Owing to the volume feature of point could data, existed gross error elimination methods need spend massive memory both in space and time. This paper employed a new method which based on Kd-tree algorithm to construct, k-nearest neighbor algorithm to search, settled appropriate threshold to determine with result turns out a judgement that whether target point is or not an outlier. Experimental results show that, our proposed algorithm will help to delete gross error in point cloud data and facilitate to decrease memory consumption, improve efficiency.
An application of locally linear model tree algorithm with combination of feature selection in credit scoring

NASA Astrophysics Data System (ADS)

Siami, Mohammad; Gholamian, Mohammad Reza; Basiri, Javad

2014-10-01

Nowadays, credit scoring is one of the most important topics in the banking sector. Credit scoring models have been widely used to facilitate the process of credit assessing. In this paper, an application of the locally linear model tree algorithm (LOLIMOT) was experimented to evaluate the superiority of its performance to predict the customer's credit status. The algorithm is improved with an aim of adjustment by credit scoring domain by means of data fusion and feature selection techniques. Two real world credit data sets - Australian and German - from UCI machine learning database were selected to demonstrate the performance of our new classifier. The analytical results indicate that the improved LOLIMOT significantly increase the prediction accuracy.
Detecting non-orthology in the COGs database and other approaches grouping orthologs using genome-specific best hits.

PubMed

Dessimoz, Christophe; Boeckmann, Brigitte; Roth, Alexander C J; Gonnet, Gaston H

2006-01-01

Correct orthology assignment is a critical prerequisite of numerous comparative genomics procedures, such as function prediction, construction of phylogenetic species trees and genome rearrangement analysis. We present an algorithm for the detection of non-orthologs that arise by mistake in current orthology classification methods based on genome-specific best hits, such as the COGs database. The algorithm works with pairwise distance estimates, rather than computationally expensive and error-prone tree-building methods. The accuracy of the algorithm is evaluated through verification of the distribution of predicted cases, case-by-case phylogenetic analysis and comparisons with predictions from other projects using independent methods. Our results show that a very significant fraction of the COG groups include non-orthologs: using conservative parameters, the algorithm detects non-orthology in a third of all COG groups. Consequently, sequence analysis sensitive to correct orthology assignments will greatly benefit from these findings.
Chemically inducing lightwood formation in southern pines

DOE Office of Scientific and Technical Information (OSTI.GOV)

Roberts, D.R.; Peters, W.J.

1977-06-01

Chemical induction of lightwood formation promises to be a new method of naval stores production. A broad range of paraquat concentrations and many methods of application induced lightwood formation. Loblolly, slash, and longleaf pines were found to produce increased amounts of turpentine and tall oil in response to paraquat treatment. In one experiment, loblolly pines treated with 8 percent paraquat on a single bark streak yielded, 9 months after treatment, an average of 10 pounds more extractives per tree than did untreated trees. Most of the yield increase was in the lower portion of the tree near the wound, butmore » some increase was noted for heights as great as 27 ft. In another experiment, 8 to 10 inch DBH loblolly, slash, and longleaf pine trees were treated with 0.5, 1.0, and 2.5 percent paraquat applied in ax chops spanning one-third the circumference of the trees. All treated trees yielded more resin acids than did control trees.« less
Tree tensor network approach to simulating Shor's algorithm

NASA Astrophysics Data System (ADS)

Dumitrescu, Eugene

2017-12-01

Constructively simulating quantum systems furthers our understanding of qualitative and quantitative features which may be analytically intractable. In this paper, we directly simulate and explore the entanglement structure present in the paradigmatic example for exponential quantum speedups: Shor's algorithm. To perform our simulation, we construct a dynamic tree tensor network which manifestly captures two salient circuit features for modular exponentiation. These are the natural two-register bipartition and the invariance of entanglement with respect to permutations of the top-register qubits. Our construction help identify the entanglement entropy properties, which we summarize by a scaling relation. Further, the tree network is efficiently projected onto a matrix product state from which we efficiently execute the quantum Fourier transform. Future simulation of quantum information states with tensor networks exploiting circuit symmetries is discussed.
Bioinformatics in proteomics: application, terminology, and pitfalls.

PubMed

Wiemer, Jan C; Prokudin, Alexander

2004-01-01

Bioinformatics applies data mining, i.e., modern computer-based statistics, to biomedical data. It leverages on machine learning approaches, such as artificial neural networks, decision trees and clustering algorithms, and is ideally suited for handling huge data amounts. In this article, we review the analysis of mass spectrometry data in proteomics, starting with common pre-processing steps and using single decision trees and decision tree ensembles for classification. Special emphasis is put on the pitfall of overfitting, i.e., of generating too complex single decision trees. Finally, we discuss the pros and cons of the two different decision tree usages.
Phylogenetic trees and Euclidean embeddings.

PubMed

Layer, Mark; Rhodes, John A

2017-01-01

It was recently observed by de Vienne et al. (Syst Biol 60(6):826-832, 2011) that a simple square root transformation of distances between taxa on a phylogenetic tree allowed for an embedding of the taxa into Euclidean space. While the justification for this was based on a diffusion model of continuous character evolution along the tree, here we give a direct and elementary explanation for it that provides substantial additional insight. We use this embedding to reinterpret the differences between the NJ and BIONJ tree building algorithms, providing one illustration of how this embedding reflects tree structures in data.
Chemical induction of resinosis in southern pines. Final report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stubbs, J.; Outcalt, K.W.

1982-01-01

Research studies in the use of paraquat, a contact herbicide, for lightwood induction; i.e., the accumulation of oleoresin in the boles of living southern pine trees are summarized. The objectives of this research were to determine oleoresin yields as influenced by treatment factors, to estimate treatment costs, to uncover technical problems, both in the field and processng plants, and to assess the problem of bark beetle attacks on treated trees. Two pulp and paper mill trials using paraquat-treated wood were conducted, one with loblolly and one with slash pine. The loblolly trial yielded 40 percent more tall oil, or anmore » extra 27.2 pounds/cord of wood. Due to limited turpentine condenser capacity, no additional turpentine was recovered. However, there is evidence to believe that an additional one gal./cord could have been realized.« less
On the Suitability of Suffix Arrays for Lempel-Ziv Data Compression

NASA Astrophysics Data System (ADS)

Ferreira, Artur J.; Oliveira, Arlindo L.; Figueiredo, Mário A. T.

Lossless compression algorithms of the Lempel-Ziv (LZ) family are widely used nowadays. Regarding time and memory requirements, LZ encoding is much more demanding than decoding. In order to speed up the encoding process, efficient data structures, like suffix trees, have been used. In this paper, we explore the use of suffix arrays to hold the dictionary of the LZ encoder, and propose an algorithm to search over it. We show that the resulting encoder attains roughly the same compression ratios as those based on suffix trees. However, the amount of memory required by the suffix array is fixed, and much lower than the variable amount of memory used by encoders based on suffix trees (which depends on the text to encode). We conclude that suffix arrays, when compared to suffix trees in terms of the trade-off among time, memory, and compression ratio, may be preferable in scenarios (e.g., embedded systems) where memory is at a premium and high speed is not critical.
K-Means Algorithm Performance Analysis With Determining The Value Of Starting Centroid With Random And KD-Tree Method

NASA Astrophysics Data System (ADS)

Sirait, Kamson; Tulus; Budhiarti Nababan, Erna

2017-12-01

Clustering methods that have high accuracy and time efficiency are necessary for the filtering process. One method that has been known and applied in clustering is K-Means Clustering. In its application, the determination of the begining value of the cluster center greatly affects the results of the K-Means algorithm. This research discusses the results of K-Means Clustering with starting centroid determination with a random and KD-Tree method. The initial determination of random centroid on the data set of 1000 student academic data to classify the potentially dropout has a sse value of 952972 for the quality variable and 232.48 for the GPA, whereas the initial centroid determination by KD-Tree has a sse value of 504302 for the quality variable and 214,37 for the GPA variable. The smaller sse values indicate that the result of K-Means Clustering with initial KD-Tree centroid selection have better accuracy than K-Means Clustering method with random initial centorid selection.
Scalable Regression Tree Learning on Hadoop using OpenPlanet

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yin, Wei; Simmhan, Yogesh; Prasanna, Viktor

As scientific and engineering domains attempt to effectively analyze the deluge of data arriving from sensors and instruments, machine learning is becoming a key data mining tool to build prediction models. Regression tree is a popular learning model that combines decision trees and linear regression to forecast numerical target variables based on a set of input features. Map Reduce is well suited for addressing such data intensive learning applications, and a proprietary regression tree algorithm, PLANET, using MapReduce has been proposed earlier. In this paper, we describe an open source implement of this algorithm, OpenPlanet, on the Hadoop framework usingmore » a hybrid approach. Further, we evaluate the performance of OpenPlanet using realworld datasets from the Smart Power Grid domain to perform energy use forecasting, and propose tuning strategies of Hadoop parameters to improve the performance of the default configuration by 75% for a training dataset of 17 million tuples on a 64-core Hadoop cluster on FutureGrid.« less
Phylogenomic analyses data of the avian phylogenomics project.

PubMed

Jarvis, Erich D; Mirarab, Siavash; Aberer, Andre J; Li, Bo; Houde, Peter; Li, Cai; Ho, Simon Y W; Faircloth, Brant C; Nabholz, Benoit; Howard, Jason T; Suh, Alexander; Weber, Claudia C; da Fonseca, Rute R; Alfaro-Núñez, Alonzo; Narula, Nitish; Liu, Liang; Burt, Dave; Ellegren, Hans; Edwards, Scott V; Stamatakis, Alexandros; Mindell, David P; Cracraft, Joel; Braun, Edward L; Warnow, Tandy; Jun, Wang; Gilbert, M Thomas Pius; Zhang, Guojie

2015-01-01

Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses. Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) A well annotated data set across species based on genome synteny; 2) Alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) Diverse data sets, including genes and their inferred trees, indels, and transposable elements. Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence. The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.
Evaluation of supervised machine-learning algorithms to distinguish between inflammatory bowel disease and alimentary lymphoma in cats.

PubMed

Awaysheh, Abdullah; Wilcke, Jeffrey; Elvinger, François; Rees, Loren; Fan, Weiguo; Zimmerman, Kurt L

2016-11-01

Inflammatory bowel disease (IBD) and alimentary lymphoma (ALA) are common gastrointestinal diseases in cats. The very similar clinical signs and histopathologic features of these diseases make the distinction between them diagnostically challenging. We tested the use of supervised machine-learning algorithms to differentiate between the 2 diseases using data generated from noninvasive diagnostic tests. Three prediction models were developed using 3 machine-learning algorithms: naive Bayes, decision trees, and artificial neural networks. The models were trained and tested on data from complete blood count (CBC) and serum chemistry (SC) results for the following 3 groups of client-owned cats: normal, inflammatory bowel disease (IBD), or alimentary lymphoma (ALA). Naive Bayes and artificial neural networks achieved higher classification accuracy (sensitivities of 70.8% and 69.2%, respectively) than the decision tree algorithm (63%, p < 0.0001). The areas under the receiver-operating characteristic curve for classifying cases into the 3 categories was 83% by naive Bayes, 79% by decision tree, and 82% by artificial neural networks. Prediction models using machine learning provided a method for distinguishing between ALA-IBD, ALA-normal, and IBD-normal. The naive Bayes and artificial neural networks classifiers used 10 and 4 of the CBC and SC variables, respectively, to outperform the C4.5 decision tree, which used 5 CBC and SC variables in classifying cats into the 3 classes. These models can provide another noninvasive diagnostic tool to assist clinicians with differentiating between IBD and ALA, and between diseased and nondiseased cats. © 2016 The Author(s).
Semiautomated landscape feature extraction and modeling

NASA Astrophysics Data System (ADS)

Wasilewski, Anthony A.; Faust, Nickolas L.; Ribarsky, William

2001-08-01

We have developed a semi-automated procedure for generating correctly located 3D tree objects form overhead imagery. Cross-platform software partitions arbitrarily large, geocorrected and geolocated imagery into management sub- images. The user manually selected tree areas from one or more of these sub-images. Tree group blobs are then narrowed to lines using a special thinning algorithm which retains the topology of the blobs, and also stores the thickness of the parent blob. Maxima along these thinned tree grous are found, and used as individual tree locations within the tree group. Magnitudes of the local maxima are used to scale the radii of the tree objects. Grossly overlapping trees are culled based on a comparison of tree-tree distance to combined radii. Tree color is randomly selected based on the distribution of sample tree pixels, and height is estimated form tree radius. The final tree objects are then inserted into a terrain database which can be navigated by VGIS, a high-resolution global terrain visualization system developed at Georgia Tech.
Scalable Nearest Neighbor Algorithms for High Dimensional Data.

PubMed

Muja, Marius; Lowe, David G

2014-11-01

For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.
Not seeing the forest for the trees: size of the minimum spanning trees (MSTs) forest and branch significance in MST-based phylogenetic analysis.

PubMed

Teixeira, Andreia Sofia; Monteiro, Pedro T; Carriço, João A; Ramirez, Mário; Francisco, Alexandre P

2015-01-01

Trees, including minimum spanning trees (MSTs), are commonly used in phylogenetic studies. But, for the research community, it may be unclear that the presented tree is just a hypothesis, chosen from among many possible alternatives. In this scenario, it is important to quantify our confidence in both the trees and the branches/edges included in such trees. In this paper, we address this problem for MSTs by introducing a new edge betweenness metric for undirected and weighted graphs. This spanning edge betweenness metric is defined as the fraction of equivalent MSTs where a given edge is present. The metric provides a per edge statistic that is similar to that of the bootstrap approach frequently used in phylogenetics to support the grouping of taxa. We provide methods for the exact computation of this metric based on the well known Kirchhoff's matrix tree theorem. Moreover, we implement and make available a module for the PHYLOViZ software and evaluate the proposed metric concerning both effectiveness and computational performance. Analysis of trees generated using multilocus sequence typing data (MLST) and the goeBURST algorithm revealed that the space of possible MSTs in real data sets is extremely large. Selection of the edge to be represented using bootstrap could lead to unreliable results since alternative edges are present in the same fraction of equivalent MSTs. The choice of the MST to be presented, results from criteria implemented in the algorithm that must be based in biologically plausible models.
An implementation of a tree code on a SIMD, parallel computer

NASA Technical Reports Server (NTRS)

Olson, Kevin M.; Dorband, John E.

1994-01-01

We describe a fast tree algorithm for gravitational N-body simulation on SIMD parallel computers. The tree construction uses fast, parallel sorts. The sorted lists are recursively divided along their x, y and z coordinates. This data structure is a completely balanced tree (i.e., each particle is paired with exactly one other particle) and maintains good spatial locality. An implementation of this tree-building algorithm on a 16k processor Maspar MP-1 performs well and constitutes only a small fraction (approximately 15%) of the entire cycle of finding the accelerations. Each node in the tree is treated as a monopole. The tree search and the summation of accelerations also perform well. During the tree search, node data that is needed from another processor is simply fetched. Roughly 55% of the tree search time is spent in communications between processors. We apply the code to two problems of astrophysical interest. The first is a simulation of the close passage of two gravitationally, interacting, disk galaxies using 65,636 particles. We also simulate the formation of structure in an expanding, model universe using 1,048,576 particles. Our code attains speeds comparable to one head of a Cray Y-MP, so single instruction, multiple data (SIMD) type computers can be used for these simulations. The cost/performance ratio for SIMD machines like the Maspar MP-1 make them an extremely attractive alternative to either vector processors or large multiple instruction, multiple data (MIMD) type parallel computers. With further optimizations (e.g., more careful load balancing), speeds in excess of today's vector processing computers should be possible.
A voxel-based technique to estimate the volume of trees from terrestrial laser scanner data

NASA Astrophysics Data System (ADS)

Bienert, A.; Hess, C.; Maas, H.-G.; von Oheimb, G.

2014-06-01

The precise determination of the volume of standing trees is very important for ecological and economical considerations in forestry. If terrestrial laser scanner data are available, a simple approach for volume determination is given by allocating points into a voxel structure and subsequently counting the filled voxels. Generally, this method will overestimate the volume. The paper presents an improved algorithm to estimate the wood volume of trees using a voxel-based method which will correct for the overestimation. After voxel space transformation, each voxel which contains points is reduced to the volume of its surrounding bounding box. In a next step, occluded (inner stem) voxels are identified by a neighbourhood analysis sweeping in the X and Y direction of each filled voxel. Finally, the wood volume of the tree is composed by the sum of the bounding box volumes of the outer voxels and the volume of all occluded inner voxels. Scan data sets from several young Norway maple trees (Acer platanoides) were used to analyse the algorithm. Therefore, the scanned trees as well as their representing point clouds were separated in different components (stem, branches) to make a meaningful comparison. Two reference measurements were performed for validation: A direct wood volume measurement by placing the tree components into a water tank, and a frustum calculation of small trunk segments by measuring the radii along the trunk. Overall, the results show slightly underestimated volumes (-0.3% for a probe of 13 trees) with a RMSE of 11.6% for the individual tree volume calculated with the new approach.
The estimation of tree posterior probabilities using conditional clade probability distributions.

PubMed

Larget, Bret

2013-07-01

In this article I introduce the idea of conditional independence of separated subtrees as a principle by which to estimate the posterior probability of trees using conditional clade probability distributions rather than simple sample relative frequencies. I describe an algorithm for these calculations and software which implements these ideas. I show that these alternative calculations are very similar to simple sample relative frequencies for high probability trees but are substantially more accurate for relatively low probability trees. The method allows the posterior probability of unsampled trees to be calculated when these trees contain only clades that are in other sampled trees. Furthermore, the method can be used to estimate the total probability of the set of sampled trees which provides a measure of the thoroughness of a posterior sample.

Estimating babassu palm density using automatic palm tree detection with very high spatial resolution satellite images.

PubMed

Dos Santos, Alessio Moreira; Mitja, Danielle; Delaître, Eric; Demagistri, Laurent; de Souza Miranda, Izildinha; Libourel, Thérèse; Petit, Michel

2017-05-15

High spatial resolution images as well as image processing and object detection algorithms are recent technologies that aid the study of biodiversity and commercial plantations of forest species. This paper seeks to contribute knowledge regarding the use of these technologies by studying randomly dispersed native palm tree. Here, we analyze the automatic detection of large circular crown (LCC) palm tree using a high spatial resolution panchromatic GeoEye image (0.50 m) taken on the area of a community of small agricultural farms in the Brazilian Amazon. We also propose auxiliary methods to estimate the density of the LCC palm tree Attalea speciosa (babassu) based on the detection results. We used the "Compt-palm" algorithm based on the detection of palm tree shadows in open areas via mathematical morphology techniques and the spatial information was validated using field methods (i.e. structural census and georeferencing). The algorithm recognized individuals in life stages 5 and 6, and the extraction percentage, branching factor and quality percentage factors were used to evaluate its performance. A principal components analysis showed that the structure of the studied species differs from other species. Approximately 96% of the babassu individuals in stage 6 were detected. These individuals had significantly smaller stipes than the undetected ones. In turn, 60% of the stage 5 babassu individuals were detected, showing significantly a different total height and a different number of leaves from the undetected ones. Our calculations regarding resource availability indicate that 6870 ha contained 25,015 adult babassu palm tree, with an annual potential productivity of 27.4 t of almond oil. The detection of LCC palm tree and the implementation of auxiliary field methods to estimate babassu density is an important first step to monitor this industry resource that is extremely important to the Brazilian economy and thousands of families over a large scale. Copyright © 2017 Elsevier Ltd. All rights reserved.
VC-dimension of univariate decision trees.

PubMed

Yildiz, Olcay Taner

2015-02-01

In this paper, we give and prove the lower bounds of the Vapnik-Chervonenkis (VC)-dimension of the univariate decision tree hypothesis class. The VC-dimension of the univariate decision tree depends on the VC-dimension values of its subtrees and the number of inputs. Via a search algorithm that calculates the VC-dimension of univariate decision trees exhaustively, we show that our VC-dimension bounds are tight for simple trees. To verify that the VC-dimension bounds are useful, we also use them to get VC-generalization bounds for complexity control using structural risk minimization in decision trees, i.e., pruning. Our simulation results show that structural risk minimization pruning using the VC-dimension bounds finds trees that are more accurate as those pruned using cross validation.
PCA based feature reduction to improve the accuracy of decision tree c4.5 classification

NASA Astrophysics Data System (ADS)

Nasution, M. Z. F.; Sitompul, O. S.; Ramli, M.

2018-03-01

Splitting attribute is a major process in Decision Tree C4.5 classification. However, this process does not give a significant impact on the establishment of the decision tree in terms of removing irrelevant features. It is a major problem in decision tree classification process called over-fitting resulting from noisy data and irrelevant features. In turns, over-fitting creates misclassification and data imbalance. Many algorithms have been proposed to overcome misclassification and overfitting on classifications Decision Tree C4.5. Feature reduction is one of important issues in classification model which is intended to remove irrelevant data in order to improve accuracy. The feature reduction framework is used to simplify high dimensional data to low dimensional data with non-correlated attributes. In this research, we proposed a framework for selecting relevant and non-correlated feature subsets. We consider principal component analysis (PCA) for feature reduction to perform non-correlated feature selection and Decision Tree C4.5 algorithm for the classification. From the experiments conducted using available data sets from UCI Cervical cancer data set repository with 858 instances and 36 attributes, we evaluated the performance of our framework based on accuracy, specificity and precision. Experimental results show that our proposed framework is robust to enhance classification accuracy with 90.70% accuracy rates.
Energy aware path planning in complex four dimensional environments

NASA Astrophysics Data System (ADS)

Chakrabarty, Anjan

This dissertation addresses the problem of energy-aware path planning for small autonomous vehicles. While small autonomous vehicles can perform missions that are too risky (or infeasible) for larger vehicles, the missions are limited by the amount of energy that can be carried on board the vehicle. Path planning techniques that either minimize energy consumption or exploit energy available in the environment can thus increase range and endurance. Path planning is complicated by significant spatial (and potentially temporal) variations in the environment. While the main focus is on autonomous aircraft, this research also addresses autonomous ground vehicles. Range and endurance of small unmanned aerial vehicles (UAVs) can be greatly improved by utilizing energy from the atmosphere. Wind can be exploited to minimize energy consumption of a small UAV. But wind, like any other atmospheric component , is a space and time varying phenomenon. To effectively use wind for long range missions, both exploration and exploitation of wind is critical. This research presents a kinematics based tree algorithm which efficiently handles the four dimensional (three spatial and time) path planning problem. The Kinematic Tree algorithm provides a sequence of waypoints, airspeeds, heading and bank angle commands for each segment of the path. The planner is shown to be resolution complete and computationally efficient. Global optimality of the cost function cannot be claimed, as energy is gained from the atmosphere, making the cost function inadmissible. However the Kinematic Tree is shown to be optimal up to resolution if the cost function is admissible. Simulation results show the efficacy of this planning method for a glider in complex real wind data. Simulation results verify that the planner is able to extract energy from the atmosphere enabling long range missions. The Kinematic Tree planning framework, developed to minimize energy consumption of UAVs, is applied for path planning in ground robots. In traditional path planning problem the focus is on obstacle avoidance and navigation. The optimal Kinematic Tree algorithm named Kinematic Tree* is shown to find optimal paths to reach the destination while avoiding obstacles. A more challenging path planning scenario arises for planning in complex terrain. This research shows how the Kinematic Tree* algorithm can be extended to find minimum energy paths for a ground vehicle in difficult mountainous terrain.
Efficient Construction of Mesostate Networks from Molecular Dynamics Trajectories.

PubMed

Vitalis, Andreas; Caflisch, Amedeo

2012-03-13

The coarse-graining of data from molecular simulations yields conformational space networks that may be used for predicting the system's long time scale behavior, to discover structural pathways connecting free energy basins in the system, or simply to represent accessible phase space regions of interest and their connectivities in a two-dimensional plot. In this contribution, we present a tree-based algorithm to partition conformations of biomolecules into sets of similar microstates, i.e., to coarse-grain trajectory data into mesostates. On account of utilizing an architecture similar to that of established tree-based algorithms, the proposed scheme operates in near-linear time with data set size. We derive expressions needed for the fast evaluation of mesostate properties and distances when employing typical choices for measures of similarity between microstates. Using both a pedagogically useful and a real-word application, the algorithm is shown to be robust with respect to tree height, which in addition to mesostate threshold size is the main adjustable parameter. It is demonstrated that the derived mesostate networks can preserve information regarding the free energy basins and barriers by which the system is characterized.
A splay tree-based approach for efficient resource location in P2P networks.

PubMed

Zhou, Wei; Tan, Zilong; Yao, Shaowen; Wang, Shipu

2014-01-01

Resource location in structured P2P system has a critical influence on the system performance. Existing analytical studies of Chord protocol have shown some potential improvements in performance. In this paper a splay tree-based new Chord structure called SChord is proposed to improve the efficiency of locating resources. We consider a novel implementation of the Chord finger table (routing table) based on the splay tree. This approach extends the Chord finger table with additional routing entries. Adaptive routing algorithm is proposed for implementation, and it can be shown that hop count is significantly minimized without introducing any other protocol overheads. We analyze the hop count of the adaptive routing algorithm, as compared to Chord variants, and demonstrate sharp upper and lower bounds for both worst-case and average case settings. In addition, we theoretically analyze the hop reducing in SChord and derive the fact that SChord can significantly reduce the routing hops as compared to Chord. Several simulations are presented to evaluate the performance of the algorithm and support our analytical findings. The simulation results show the efficiency of SChord.
Searching Information Sources in Networks

DTIC Science & Technology

2017-06-14

SECURITY CLASSIFICATION OF: During the course of this project, we made significant progresses in multiple directions of the information detection...result on information source detection on non-tree networks; (2) The development of information source localization algorithms to detect multiple... information sources. The algorithms have provable performance guarantees and outperform existing algorithms in 1. REPORT DATE (DD-MM-YYYY) 4. TITLE AND
Genetic algorithms for multicriteria shape optimization of induction furnace

NASA Astrophysics Data System (ADS)

Kůs, Pavel; Mach, František; Karban, Pavel; Doležel, Ivo

2012-09-01

In this contribution we deal with a multi-criteria shape optimization of an induction furnace. We want to find shape parameters of the furnace in such a way, that two different criteria are optimized. Since they cannot be optimized simultaneously, instead of one optimum we find set of partially optimal designs, so called Pareto front. We compare two different approaches to the optimization, one using nonlinear conjugate gradient method and second using variation of genetic algorithm. As can be seen from the numerical results, genetic algorithm seems to be the right choice for this problem. Solution of direct problem (coupled problem consisting of magnetic and heat field) is done using our own code Agros2D. It uses finite elements of higher order leading to fast and accurate solution of relatively complicated coupled problem. It also provides advanced scripting support, allowing us to prepare parametric model of the furnace and simply incorporate various types of optimization algorithms.
The effect of sample size and disease prevalence on supervised machine learning of narrative data.

PubMed Central

McKnight, Lawrence K.; Wilcox, Adam; Hripcsak, George

2002-01-01

This paper examines the independent effects of outcome prevalence and training sample sizes on inductive learning performance. We trained 3 inductive learning algorithms (MC4, IB, and Naïve-Bayes) on 60 simulated datasets of parsed radiology text reports labeled with 6 disease states. Data sets were constructed to define positive outcome states at 4 prevalence rates (1, 5, 10, 25, and 50%) in training set sizes of 200 and 2,000 cases. We found that the effect of outcome prevalence is significant when outcome classes drop below 10% of cases. The effect appeared independent of sample size, induction algorithm used, or class label. Work is needed to identify methods of improving classifier performance when output classes are rare. PMID:12463878
Evaluating the ecosystem water use efficiency and gross primary productivity in boreal forest based on tree ring data

NASA Astrophysics Data System (ADS)

Liu, S.; Zhuang, Q.

2016-12-01

Climatic change affects the plant physiological and biogeochemistry processes, and therefore on the ecosystem water use efficiency (WUE). Therefore, a comprehensive understanding of WUE would help us understand the adaptability of ecosystem to variable climate conditions. Tree ring data have great potential in addressing the forest response to climatic changes compared with mechanistic model simulations, eddy flux measurement and manipulative experiments. Here, we collected the tree ring isotopic carbon data in 12 boreal forest sites to develop a multiple linear regression model, and the model was extrapolated to the whole boreal region to obtain the WUE spatial and temporal variation from 1948 to 2010. Two algorithms were also used to estimate the inter-annual gross primary productivity (GPP) based on our derived WUE. Our results demonstrated that most of boreal regions showed significant increasing WUE trend during the period except parts of Alaska. The spatial averaged annual mean WUE was predicted to increase by 13%, from 2.3±0.4 g C kg-1 H2O at 1948 to 2.6±0.7 g C kg-1 H2O at 2012, which was much higher than other land surface models. Our predicted GPP by the WUE definition algorithm was comparable with site observation, while for the revised light use efficiency algorithm, GPP estimation was higher than site observation as well as than land surface models. In addition, the increasing GPP trends by two algorithms were similar with land surface model simulations. This is the first study to evaluate regional WUE and GPP in forest ecosystem based on tree ring data and future work should consider other variables (elevation, nitrogen deposition) that influence tree ring isotopic signals and the dual-isotope approach may help improve predicting the inter-annual WUE variation.
The Dynamics of Germinal Centre Selection as Measured by Graph-Theoretical Analysis of Mutational Lineage Trees

PubMed Central

Dunn-Walters, Deborah K.; Belelovsky, Alex; Edelman, Hanna; Banerjee, Monica; Mehr, Ramit

2002-01-01

We have developed a rigorous graph-theoretical algorithm for quantifying the shape properties of mutational lineage trees. We show that information about the dynamics of hypermutation and antigen-driven clonal selection during the humoral immune response is contained in the shape of mutational lineage trees deduced from the responding clones. Age and tissue related differences in the selection process can be studied using this method. Thus, tree shape analysis can be used as a means of elucidating humoral immune response dynamics in various situations. PMID:15144020
A Mixtures-of-Trees Framework for Multi-Label Classification

PubMed Central

Hong, Charmgil; Batal, Iyad; Hauskrecht, Milos

2015-01-01

We propose a new probabilistic approach for multi-label classification that aims to represent the class posterior distribution P(Y|X). Our approach uses a mixture of tree-structured Bayesian networks, which can leverage the computational advantages of conditional tree-structured models and the abilities of mixtures to compensate for tree-structured restrictions. We develop algorithms for learning the model from data and for performing multi-label predictions using the learned model. Experiments on multiple datasets demonstrate that our approach outperforms several state-of-the-art multi-label classification methods. PMID:25927011
Binary Classification using Decision Tree based Genetic Programming and Its Application to Analysis of Bio-mass Data

NASA Astrophysics Data System (ADS)

To, Cuong; Pham, Tuan D.

2010-01-01

In machine learning, pattern recognition may be the most popular task. "Similar" patterns identification is also very important in biology because first, it is useful for prediction of patterns associated with disease, for example cancer tissue (normal or tumor); second, similarity or dissimilarity of the kinetic patterns is used to identify coordinately controlled genes or proteins involved in the same regulatory process. Third, similar genes (proteins) share similar functions. In this paper, we present an algorithm which uses genetic programming to create decision tree for binary classification problem. The application of the algorithm was implemented on five real biological databases. Base on the results of comparisons with well-known methods, we see that the algorithm is outstanding in most of cases.
Genetic Algorithms and Classification Trees in Feature Discovery: Diabetes and the NHANES database

DOE Office of Scientific and Technical Information (OSTI.GOV)

Heredia-Langner, Alejandro; Jarman, Kristin H.; Amidan, Brett G.

2013-09-01

This paper presents a feature selection methodology that can be applied to datasets containing a mixture of continuous and categorical variables. Using a Genetic Algorithm (GA), this method explores a dataset and selects a small set of features relevant for the prediction of a binary (1/0) response. Binary classification trees and an objective function based on conditional probabilities are used to measure the fitness of a given subset of features. The method is applied to health data in order to find factors useful for the prediction of diabetes. Results show that our algorithm is capable of narrowing down the setmore » of predictors to around 8 factors that can be validated using reputable medical and public health resources.« less
Photoperiodic growth control in perennial trees.

PubMed

Azeez, Abdul; Sane, Aniruddha P

2015-01-01

Plants have to cope with changing seasons and adverse environmental conditions. Being sessile, plants have developed elaborate mechanisms for their survival that allow them to sense and adapt to the environment and reproduce successfully. A major adaptive trait for the survival of trees of temperate and boreal forests is the induction of growth cessation in anticipation of winters. In the last few years enormous progress has been made to elucidate the molecular mechanisms underlying SDs induced growth cessation in model perennial tree hybrid aspen (Populus tremula × P. tremuloides). In this review we discuss the molecular mechanism underlying photoperiodic control of growth cessation and adaptive responses.
Content addressable memory project

NASA Technical Reports Server (NTRS)

Hall, J. Storrs; Levy, Saul; Smith, Donald E.; Miyake, Keith M.

1992-01-01

A parameterized version of the tree processor was designed and tested (by simulation). The leaf processor design is 90 percent complete. We expect to complete and test a combination of tree and leaf cell designs in the next period. Work is proceeding on algorithms for the computer aided manufacturing (CAM), and once the design is complete we will begin simulating algorithms for large problems. The following topics are covered: (1) the practical implementation of content addressable memory; (2) design of a LEAF cell for the Rutgers CAM architecture; (3) a circuit design tool user's manual; and (4) design and analysis of efficient hierarchical interconnection networks.
Numerical taxonomy on data: Experimental results

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cohen, J.; Farach, M.

1997-12-01

The numerical taxonomy problems associated with most of the optimization criteria described above are NP - hard [3, 5, 1, 4]. In, the first positive result for numerical taxonomy was presented. They showed that if e is the distance to the closest tree metric under the L{sub {infinity}} norm. i.e., e = min{sub T} [L{sub {infinity}} (T-D)], then it is possible to construct a tree T such that L{sub {infinity}} (T-D) {le} 3e, that is, they gave a 3-approximation algorithm for this problem. We will refer to this algorithm as the Single Pivot (SP) heuristic.
Instruction-matrix-based genetic programming.

PubMed

Li, Gang; Wang, Jin Feng; Lee, Kin Hong; Leung, Kwong-Sak

2008-08-01

In genetic programming (GP), evolving tree nodes separately would reduce the huge solution space. However, tree nodes are highly interdependent with respect to their fitness. In this paper, we propose a new GP framework, namely, instruction-matrix (IM)-based GP (IMGP), to handle their interactions. IMGP maintains an IM to evolve tree nodes and subtrees separately. IMGP extracts program trees from an IM and updates the IM with the information of the extracted program trees. As the IM actually keeps most of the information of the schemata of GP and evolves the schemata directly, IMGP is effective and efficient. Our experimental results on benchmark problems have verified that IMGP is not only better than those of canonical GP in terms of the qualities of the solutions and the number of program evaluations, but they are also better than some of the related GP algorithms. IMGP can also be used to evolve programs for classification problems. The classifiers obtained have higher classification accuracies than four other GP classification algorithms on four benchmark classification problems. The testing errors are also comparable to or better than those obtained with well-known classifiers. Furthermore, an extended version, called condition matrix for rule learning, has been used successfully to handle multiclass classification problems.
Quantifying Standing Dead Tree Volume and Structural Loss with Voxelized Terrestrial Lidar Data

NASA Astrophysics Data System (ADS)

Popescu, S. C.; Putman, E.

2017-12-01

Standing dead trees (SDTs) are an important forest component and impact a variety of ecosystem processes, yet the carbon pool dynamics of SDTs are poorly constrained in terrestrial carbon cycling models. The ability to model wood decay and carbon cycling in relation to detectable changes in tree structure and volume over time would greatly improve such models. The overall objective of this study was to provide automated aboveground volume estimates of SDTs and automated procedures to detect, quantify, and characterize structural losses over time with terrestrial lidar data. The specific objectives of this study were: 1) develop an automated SDT volume estimation algorithm providing accurate volume estimates for trees scanned in dense forests; 2) develop an automated change detection methodology to accurately detect and quantify SDT structural loss between subsequent terrestrial lidar observations; and 3) characterize the structural loss rates of pine and oak SDTs in southeastern Texas. A voxel-based volume estimation algorithm, "TreeVolX", was developed and incorporates several methods designed to robustly process point clouds of varying quality levels. The algorithm operates on horizontal voxel slices by segmenting the slice into distinct branch or stem sections then applying an adaptive contour interpolation and interior filling process to create solid reconstructed tree models (RTMs). TreeVolX estimated large and small branch volume with an RMSE of 7.3% and 13.8%, respectively. A voxel-based change detection methodology was developed to accurately detect and quantify structural losses and incorporated several methods to mitigate the challenges presented by shifting tree and branch positions as SDT decay progresses. The volume and structural loss of 29 SDTs, composed of Pinus taeda and Quercus stellata, were successfully estimated using multitemporal terrestrial lidar observations over elapsed times ranging from 71 - 753 days. Pine and oak structural loss rates were characterized by estimating the amount of volumetric loss occurring in 20 equal-interval height bins of each SDT. Results showed that large pine snags exhibited more rapid structural loss in comparison to medium-sized oak snags in this study.
Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models

NASA Technical Reports Server (NTRS)

Mjoisness, Eric; Castano, Rebecca; Gray, Alexander

1999-01-01

We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.

Coherent multiscale image processing using dual-tree quaternion wavelets.

PubMed

Chan, Wai Lam; Choi, Hyeokho; Baraniuk, Richard G

2008-07-01

The dual-tree quaternion wavelet transform (QWT) is a new multiscale analysis tool for geometric image features. The QWT is a near shift-invariant tight frame representation whose coefficients sport a magnitude and three phases: two phases encode local image shifts while the third contains image texture information. The QWT is based on an alternative theory for the 2-D Hilbert transform and can be computed using a dual-tree filter bank with linear computational complexity. To demonstrate the properties of the QWT's coherent magnitude/phase representation, we develop an efficient and accurate procedure for estimating the local geometrical structure of an image. We also develop a new multiscale algorithm for estimating the disparity between a pair of images that is promising for image registration and flow estimation applications. The algorithm features multiscale phase unwrapping, linear complexity, and sub-pixel estimation accuracy.
Fruit load induces changes in global gene expression and in abscisic acid (ABA) and indole acetic acid (IAA) homeostasis in citrus buds

PubMed Central

Shalom, Liron; Samuels, Sivan; Zur, Naftali; Shlizerman, Lyudmila; Doron-Faigenboim, Adi; Blumwald, Eduardo; Sadka, Avi

2014-01-01

Many fruit trees undergo cycles of heavy fruit load (ON-Crop) in one year, followed by low fruit load (OFF-Crop) the following year, a phenomenon known as alternate bearing (AB). The mechanism by which fruit load affects flowering induction during the following year (return bloom) is still unclear. Although not proven, it is commonly accepted that the fruit or an organ which senses fruit presence generates an inhibitory signal that moves into the bud and inhibits apical meristem transition. Indeed, fruit removal from ON-Crop trees (de-fruiting) induces return bloom. Identification of regulatory or metabolic processes modified in the bud in association with altered fruit load might shed light on the nature of the AB signalling process. The bud transcriptome of de-fruited citrus trees was compared with those of ON- and OFF-Crop trees. Fruit removal resulted in relatively rapid changes in global gene expression, including induction of photosynthetic genes and proteins. Altered regulatory mechanisms included abscisic acid (ABA) metabolism and auxin polar transport. Genes of ABA biosynthesis were induced; however, hormone analyses showed that the ABA level was reduced in OFF-Crop buds and in buds shortly following fruit removal. Additionally, genes associated with Ca2+-dependent auxin polar transport were remarkably induced in buds of OFF-Crop and de-fruited trees. Hormone analyses showed that auxin levels were reduced in these buds as compared with ON-Crop buds. In view of the auxin transport autoinhibition theory, the possibility that auxin distribution plays a role in determining bud fate is discussed. PMID:24706719
Dynamic Shortest Path Algorithms for Hypergraphs

DTIC Science & Technology

2014-01-01

the concept of relationship tree to indicate the parent –child relationship along shortest hy- perpaths. The concept can be easily explained in the...four possible relationship trees to indicate the parent –child relationship in these shortest hyperpaths. We will show in Section III that the choice of...distance of a vertex to the source on the shortest hyperpath, the parent of in the chosen relationship tree associated with the shortest hyperpaths, This
Evolutionary Algorithm Based Automated Reverse Engineering and Defect Discovery

DTIC Science & Technology

2007-09-21

a previous application of a GP as a data mining function to evolve fuzzy decision trees symbolically [3-5], the terminal set consisted of fuzzy...of input and output information is required. In the case of fuzzy decision trees, the database represented a collection of scenarios about which the...fuzzy decision tree to be evolved would make decisions . The database also had entries created by experts representing decisions about the scenarios
Interpretable Categorization of Heterogeneous Time Series Data

NASA Technical Reports Server (NTRS)

Lee, Ritchie; Kochenderfer, Mykel J.; Mengshoel, Ole J.; Silbermann, Joshua

2017-01-01

We analyze data from simulated aircraft encounters to validate and inform the development of a prototype aircraft collision avoidance system. The high-dimensional and heterogeneous time series dataset is analyzed to discover properties of near mid-air collisions (NMACs) and categorize the NMAC encounters. Domain experts use these properties to better organize and understand NMAC occurrences. Existing solutions either are not capable of handling high-dimensional and heterogeneous time series datasets or do not provide explanations that are interpretable by a domain expert. The latter is critical to the acceptance and deployment of safety-critical systems. To address this gap, we propose grammar-based decision trees along with a learning algorithm. Our approach extends decision trees with a grammar framework for classifying heterogeneous time series data. A context-free grammar is used to derive decision expressions that are interpretable, application-specific, and support heterogeneous data types. In addition to classification, we show how grammar-based decision trees can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply grammar-based decision trees to a simulated aircraft encounter dataset and evaluate the performance of four variants of our learning algorithm. The best algorithm is used to analyze and categorize near mid-air collisions in the aircraft encounter dataset. We describe each discovered category in detail and discuss its relevance to aircraft collision avoidance.
Stacked Denoising Autoencoders Applied to Star/Galaxy Classification

NASA Astrophysics Data System (ADS)

Qin, Hao-ran; Lin, Ji-ming; Wang, Jun-yi

2017-04-01

In recent years, the deep learning algorithm, with the characteristics of strong adaptability, high accuracy, and structural complexity, has become more and more popular, but it has not yet been used in astronomy. In order to solve the problem that the star/galaxy classification accuracy is high for the bright source set, but low for the faint source set of the Sloan Digital Sky Survey (SDSS) data, we introduced the new deep learning algorithm, namely the SDA (stacked denoising autoencoder) neural network and the dropout fine-tuning technique, which can greatly improve the robustness and antinoise performance. We randomly selected respectively the bright source sets and faint source sets from the SDSS DR12 and DR7 data with spectroscopic measurements, and made preprocessing on them. Then, we randomly selected respectively the training sets and testing sets without replacement from the bright source sets and faint source sets. At last, using these training sets we made the training to obtain the SDA models of the bright sources and faint sources in the SDSS DR7 and DR12, respectively. We compared the test result of the SDA model on the DR12 testing set with the test results of the Library for Support Vector Machines (LibSVM), J48 decision tree, Logistic Model Tree (LMT), Support Vector Machine (SVM), Logistic Regression, and Decision Stump algorithm, and compared the test result of the SDA model on the DR7 testing set with the test results of six kinds of decision trees. The experiments show that the SDA has a better classification accuracy than other machine learning algorithms for the faint source sets of DR7 and DR12. Especially, when the completeness function is used as the evaluation index, compared with the decision tree algorithms, the correctness rate of SDA has improved about 15% for the faint source set of SDSS-DR7.
The Estimation of Tree Posterior Probabilities Using Conditional Clade Probability Distributions

PubMed Central

Larget, Bret

2013-01-01

In this article I introduce the idea of conditional independence of separated subtrees as a principle by which to estimate the posterior probability of trees using conditional clade probability distributions rather than simple sample relative frequencies. I describe an algorithm for these calculations and software which implements these ideas. I show that these alternative calculations are very similar to simple sample relative frequencies for high probability trees but are substantially more accurate for relatively low probability trees. The method allows the posterior probability of unsampled trees to be calculated when these trees contain only clades that are in other sampled trees. Furthermore, the method can be used to estimate the total probability of the set of sampled trees which provides a measure of the thoroughness of a posterior sample. [Bayesian phylogenetics; conditional clade distributions; improved accuracy; posterior probabilities of trees.] PMID:23479066
Tree Classification Software

NASA Technical Reports Server (NTRS)

Buntine, Wray

1993-01-01

This paper introduces the IND Tree Package to prospective users. IND does supervised learning using classification trees. This learning task is a basic tool used in the development of diagnosis, monitoring and expert systems. The IND Tree Package was developed as part of a NASA project to semi-automate the development of data analysis and modelling algorithms using artificial intelligence techniques. The IND Tree Package integrates features from CART and C4 with newer Bayesian and minimum encoding methods for growing classification trees and graphs. The IND Tree Package also provides an experimental control suite on top. The newer features give improved probability estimates often required in diagnostic and screening tasks. The package comes with a manual, Unix 'man' entries, and a guide to tree methods and research. The IND Tree Package is implemented in C under Unix and was beta-tested at university and commercial research laboratories in the United States.
Expert-guided evolutionary algorithm for layout design of complex space stations

NASA Astrophysics Data System (ADS)

Qian, Zhiqin; Bi, Zhuming; Cao, Qun; Ju, Weiguo; Teng, Hongfei; Zheng, Yang; Zheng, Siyu

2017-08-01

The layout of a space station should be designed in such a way that different equipment and instruments are placed for the station as a whole to achieve the best overall performance. The station layout design is a typical nondeterministic polynomial problem. In particular, how to manage the design complexity to achieve an acceptable solution within a reasonable timeframe poses a great challenge. In this article, a new evolutionary algorithm has been proposed to meet such a challenge. It is called as the expert-guided evolutionary algorithm with a tree-like structure decomposition (EGEA-TSD). Two innovations in EGEA-TSD are (i) to deal with the design complexity, the entire design space is divided into subspaces with a tree-like structure; it reduces the computation and facilitates experts' involvement in the solving process. (ii) A human-intervention interface is developed to allow experts' involvement in avoiding local optimums and accelerating convergence. To validate the proposed algorithm, the layout design of one-space station is formulated as a multi-disciplinary design problem, the developed algorithm is programmed and executed, and the result is compared with those from other two algorithms; it has illustrated the superior performance of the proposed EGEA-TSD.
Reconstructing Unrooted Phylogenetic Trees from Symbolic Ternary Metrics.

PubMed

Grünewald, Stefan; Long, Yangjing; Wu, Yaokun

2018-03-09

Böcker and Dress (Adv Math 138:105-125, 1998) presented a 1-to-1 correspondence between symbolically dated rooted trees and symbolic ultrametrics. We consider the corresponding problem for unrooted trees. More precisely, given a tree T with leaf set X and a proper vertex coloring of its interior vertices, we can map every triple of three different leaves to the color of its median vertex. We characterize all ternary maps that can be obtained in this way in terms of 4- and 5-point conditions, and we show that the corresponding tree and its coloring can be reconstructed from a ternary map that satisfies those conditions. Further, we give an additional condition that characterizes whether the tree is binary, and we describe an algorithm that reconstructs general trees in a bottom-up fashion.
An efficient parallel termination detection algorithm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baker, A. H.; Crivelli, S.; Jessup, E. R.

2004-05-27

Information local to any one processor is insufficient to monitor the overall progress of most distributed computations. Typically, a second distributed computation for detecting termination of the main computation is necessary. In order to be a useful computational tool, the termination detection routine must operate concurrently with the main computation, adding minimal overhead, and it must promptly and correctly detect termination when it occurs. In this paper, we present a new algorithm for detecting the termination of a parallel computation on distributed-memory MIMD computers that satisfies all of those criteria. A variety of termination detection algorithms have been devised. Ofmore » these, the algorithm presented by Sinha, Kale, and Ramkumar (henceforth, the SKR algorithm) is unique in its ability to adapt to the load conditions of the system on which it runs, thereby minimizing the impact of termination detection on performance. Because their algorithm also detects termination quickly, we consider it to be the most efficient practical algorithm presently available. The termination detection algorithm presented here was developed for use in the PMESC programming library for distributed-memory MIMD computers. Like the SKR algorithm, our algorithm adapts to system loads and imposes little overhead. Also like the SKR algorithm, ours is tree-based, and it does not depend on any assumptions about the physical interconnection topology of the processors or the specifics of the distributed computation. In addition, our algorithm is easier to implement and requires only half as many tree traverses as does the SKR algorithm. This paper is organized as follows. In section 2, we define our computational model. In section 3, we review the SKR algorithm. We introduce our new algorithm in section 4, and prove its correctness in section 5. We discuss its efficiency and present experimental results in section 6.« less
Joint amalgamation of most parsimonious reconciled gene trees

PubMed Central

Scornavacca, Celine; Jacox, Edwin; Szöllősi, Gergely J.

2015-01-01

Motivation: Traditionally, gene phylogenies have been reconstructed solely on the basis of molecular sequences; this, however, often does not provide enough information to distinguish between statistically equivalent relationships. To address this problem, several recent methods have incorporated information on the species phylogeny in gene tree reconstruction, leading to dramatic improvements in accuracy. Although probabilistic methods are able to estimate all model parameters but are computationally expensive, parsimony methods—generally computationally more efficient—require a prior estimate of parameters and of the statistical support. Results: Here, we present the Tree Estimation using Reconciliation (TERA) algorithm, a parsimony based, species tree aware method for gene tree reconstruction based on a scoring scheme combining duplication, transfer and loss costs with an estimate of the sequence likelihood. TERA explores all reconciled gene trees that can be amalgamated from a sample of gene trees. Using a large scale simulated dataset, we demonstrate that TERA achieves the same accuracy as the corresponding probabilistic method while being faster, and outperforms other parsimony-based methods in both accuracy and speed. Running TERA on a set of 1099 homologous gene families from complete cyanobacterial genomes, we find that incorporating knowledge of the species tree results in a two thirds reduction in the number of apparent transfer events. Availability and implementation: The algorithm is implemented in our program TERA, which is freely available from http://mbb.univ-montp2.fr/MBB/download_sources/16__TERA. Contact: celine.scornavacca@univ-montp2.fr, ssolo@angel.elte.hu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25380957
Understanding the Scalability of Bayesian Network Inference Using Clique Tree Growth Curves

NASA Technical Reports Server (NTRS)

Mengshoel, Ole J.

2010-01-01

One of the main approaches to performing computation in Bayesian networks (BNs) is clique tree clustering and propagation. The clique tree approach consists of propagation in a clique tree compiled from a Bayesian network, and while it was introduced in the 1980s, there is still a lack of understanding of how clique tree computation time depends on variations in BN size and structure. In this article, we improve this understanding by developing an approach to characterizing clique tree growth as a function of parameters that can be computed in polynomial time from BNs, specifically: (i) the ratio of the number of a BN s non-root nodes to the number of root nodes, and (ii) the expected number of moral edges in their moral graphs. Analytically, we partition the set of cliques in a clique tree into different sets, and introduce a growth curve for the total size of each set. For the special case of bipartite BNs, there are two sets and two growth curves, a mixed clique growth curve and a root clique growth curve. In experiments, where random bipartite BNs generated using the BPART algorithm are studied, we systematically increase the out-degree of the root nodes in bipartite Bayesian networks, by increasing the number of leaf nodes. Surprisingly, root clique growth is well-approximated by Gompertz growth curves, an S-shaped family of curves that has previously been used to describe growth processes in biology, medicine, and neuroscience. We believe that this research improves the understanding of the scaling behavior of clique tree clustering for a certain class of Bayesian networks; presents an aid for trade-off studies of clique tree clustering using growth curves; and ultimately provides a foundation for benchmarking and developing improved BN inference and machine learning algorithms.
Minimizing the average distance to a closest leaf in a phylogenetic tree.

PubMed

Matsen, Frederick A; Gallagher, Aaron; McCoy, Connor O

2013-11-01

When performing an analysis on a collection of molecular sequences, it can be convenient to reduce the number of sequences under consideration while maintaining some characteristic of a larger collection of sequences. For example, one may wish to select a subset of high-quality sequences that represent the diversity of a larger collection of sequences. One may also wish to specialize a large database of characterized "reference sequences" to a smaller subset that is as close as possible on average to a collection of "query sequences" of interest. Such a representative subset can be useful whenever one wishes to find a set of reference sequences that is appropriate to use for comparative analysis of environmentally derived sequences, such as for selecting "reference tree" sequences for phylogenetic placement of metagenomic reads. In this article, we formalize these problems in terms of the minimization of the Average Distance to the Closest Leaf (ADCL) and investigate algorithms to perform the relevant minimization. We show that the greedy algorithm is not effective, show that a variant of the Partitioning Around Medoids (PAM) heuristic gets stuck in local minima, and develop an exact dynamic programming approach. Using this exact program we note that the performance of PAM appears to be good for simulated trees, and is faster than the exact algorithm for small trees. On the other hand, the exact program gives solutions for all numbers of leaves less than or equal to the given desired number of leaves, whereas PAM only gives a solution for the prespecified number of leaves. Via application to real data, we show that the ADCL criterion chooses chimeric sequences less often than random subsets, whereas the maximization of phylogenetic diversity chooses them more often than random. These algorithms have been implemented in publicly available software.
Tests with VHR images for the identification of olive trees and other fruit trees in the European Union

NASA Astrophysics Data System (ADS)

Masson, Josiane; Soille, Pierre; Mueller, Rick

2004-10-01

In the context of the Common Agricultural Policy (CAP) there is a strong interest of the European Commission for counting and individually locating fruit trees. An automatic counting algorithm developed by the JRC (OLICOUNT) was used in the past for olive trees only, on 1m black and white orthophotos but with limits in case of young trees or irregular groves. This study investigates the improvement of fruit tree identification using VHR images on a large set of data in three test sites, one in Creta (Greece; one in the south-east of France with a majority of olive trees and associated fruit trees, and the last one in Florida on citrus trees. OLICOUNT was compared with two other automatic tree counting, applications, one using the CRISP software on citrus trees and the other completely automatic based on regional minima (morphological image analysis). Additional investigation was undertaken to refine the methods. This paper describes the automatic methods and presents the results derived from the tests.
Sensitivity cycling in physically dormant seeds of the Neotropical tree Senna multijuga (Fabaceae).

PubMed

Rodrigues-Junior, A G; Baskin, C C; Baskin, J M; Garcia, Q S

2018-03-23

Cycling of sensitivity to physical dormancy (PY) break has been documented in herbaceous species. However, it has not been reported in tree seeds, nor has the effect of seed size on sensitivity to PY-breaking been evaluated in any species. Thus, the aims of this study were to investigate how PY is broken in seeds of the tropical legume tree Senna multijuga, if seeds exhibit sensitivity cycling and if seed size affects induction into sensitivity. Dormancy and germination were evaluated in intact and scarified seeds from two collections of S. multijuga. The effects of temperature, moisture and seed size on induction of sensitivity to dormancy-breaking were assessed, and seasonal changes in germination and persistence of buried seeds were determined. Reversal of sensitivity was also investigated. Fresh seeds were insensitive to dormancy break at wet-high temperatures, and an increase in sensitivity occurred in buried seeds after they experienced low temperatures during winter (dry season). Temperatures ≤20 °C increased sensitivity, whereas temperatures ≥30 °C decreased it regardless of moisture conditions. Dormancy was broken in sensitive seeds by incubating them at 35 °C. Sensitivity could be reversed, and large seeds were more sensitive than small seeds to sensitivity induction. Seeds of S. multijuga exhibit sensitivity cycling to PY-breaking. Seeds become sensitive during winter and can germinate with the onset of the spring-summer rainy season in Brazil. Small seeds are slower to become sensitive than large ones, and this may be a mechanism by which germination is spread over time. Sensitive seeds that fail to germinate become insensitive during exposure to drought during summer. This is the first report of sensitivity cycling in a tree species. © 2018 German Society for Plant Sciences and The Royal Botanical Society of the Netherlands.
Induction motor speed control using varied duty cycle terminal voltage via PI controller

NASA Astrophysics Data System (ADS)

Azwin, A.; Ahmed, S.

2018-03-01

This paper deals with the PI speed controller for the three-phase induction motor using PWM technique. The PWM generated signal is utilized for voltage source inverter with an optimal duty cycle on a simplified induction motor model. A control algorithm for generating PWM control signal is developed. Obtained results shows that the steady state error and overshoot of the developed system is in the limit under different speed and load condition. The robustness of the control performance would be potential for induction motor performance improvement.
Point-in-convex polygon and point-in-convex polyhedron algorithms with O(1) complexity using space subdivision

NASA Astrophysics Data System (ADS)

Skala, Vaclav

2016-06-01

There are many space subdivision and space partitioning techniques used in many algorithms to speed up computations. They mostly rely on orthogonal space subdivision, resp. using hierarchical data structures, e.g. BSP trees, quadtrees, octrees, kd-trees, bounding volume hierarchies etc. However in some applications a non-orthogonal space subdivision can offer new ways for actual speed up. In the case of convex polygon in E2 a simple Point-in-Polygon test is of the O(N) complexity and the optimal algorithm is of O(log N) computational complexity. In the E3 case, the complexity is O(N) even for the convex polyhedron as no ordering is defined. New Point-in-Convex Polygon and Point-in-Convex Polyhedron algorithms are presented based on space subdivision in the preprocessing stage resulting to O(1) run-time complexity. The presented approach is simple to implement. Due to the principle of duality, dual problems, e.g. line-convex polygon, line clipping, can be solved in a similarly.
An Asymptotically-Optimal Sampling-Based Algorithm for Bi-directional Motion Planning

PubMed Central

Starek, Joseph A.; Gomez, Javier V.; Schmerling, Edward; Janson, Lucas; Moreno, Luis; Pavone, Marco

2015-01-01

Bi-directional search is a widely used strategy to increase the success and convergence rates of sampling-based motion planning algorithms. Yet, few results are available that merge both bi-directional search and asymptotic optimality into existing optimal planners, such as PRM*, RRT*, and FMT*. The objective of this paper is to fill this gap. Specifically, this paper presents a bi-directional, sampling-based, asymptotically-optimal algorithm named Bi-directional FMT* (BFMT*) that extends the Fast Marching Tree (FMT*) algorithm to bidirectional search while preserving its key properties, chiefly lazy search and asymptotic optimality through convergence in probability. BFMT* performs a two-source, lazy dynamic programming recursion over a set of randomly-drawn samples, correspondingly generating two search trees: one in cost-to-come space from the initial configuration and another in cost-to-go space from the goal configuration. Numerical experiments illustrate the advantages of BFMT* over its unidirectional counterpart, as well as a number of other state-of-the-art planners. PMID:27004130
Simulation-Based Model Checking for Nondeterministic Systems and Rare Events

DTIC Science & Technology

2016-03-24

year, we have investigated AO* search and Monte Carlo Tree Search algorithms to complement and enhance CMU’s SMCMDP. 1 Final Report, March 14... tree , so we can use it to find the probability of reachability for a property in PRISM’s Probabilistic LTL. By finding the maximum probability of...savings, particularly when handling very large models. 2.3 Monte Carlo Tree Search The Monte Carlo sampling process in SMCMDP can take a long time to

Growth and carbon balance are differently regulated by tree and shoot fruiting contexts: an integrative study on apple genotypes with contrasted bearing patterns.

PubMed

Pallas, Benoît; Bluy, Sylvie; Ngao, Jérôme; Martinez, Sébastien; Clément-Vidal, Anne; Kelner, Jean-Jacques; Costes, Evelyne

2018-01-09

In plants, carbon source-sink relationships are assumed to affect their reproductive effort. In fruit trees, carbon source-sink relationships are likely to be involved in their fruiting behavior. In apple, a large variability in fruiting behaviors exists, from regular to biennial, which has been related to the within-tree synchronization vs desynchronization of floral induction in buds. In this study, we analyzed if carbon assimilation, availability and fluxes as well as shoot growth differ in apple genotypes with contrasted behaviors. Another aim was to determine the scale of plant organization at which growth and carbon balance are regulated. The study was carried out on 16 genotypes belonging to three classes: (i) biennial, (ii) regular with a high production of floral buds every year and (iii) regular, displaying desynchronized bud fates in each year. Three shoot categories, vegetative and reproductive shoots with or without fruits, were included. This study shows that shoot growth and carbon balance are differentially regulated by tree and shoot fruiting contexts. Shoot growth was determined by the shoot fruiting context, or by the type of shoot itself, since vegetative shoots were always longer than reproductive shoots whatever the tree crop load. Leaf photosynthesis depended on the tree crop load only, irrespective of the shoot category or the genotypic class. Starch content was also strongly affected by the tree crop load with some adjustments of the carbon balance among shoots since starch content was lower, at least at some dates, in shoots with fruits compared with the shoots without fruits within the same trees. Finally, the genotypic differences in terms of shoot carbon balance partly matched with genotypic bearing patterns. Nevertheless, carbon content in buds and the role of gibberellins produced by seeds as well as the distances at which they could affect floral induction should be further analyzed. © The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Nearest Neighbor Searching in Binary Search Trees: Simulation of a Multiprocessor System.

ERIC Educational Resources Information Center

Stewart, Mark; Willett, Peter

1987-01-01

Describes the simulation of a nearest neighbor searching algorithm for document retrieval using a pool of microprocessors. Three techniques are described which allow parallel searching of a binary search tree as well as a PASCAL-based system, PASSIM, which can simulate these techniques. Fifty-six references are provided. (Author/LRW)
Invasion Percolation and Global Optimization

NASA Astrophysics Data System (ADS)

Barabási, Albert-László

1996-05-01

Invasion bond percolation (IBP) is mapped exactly into Prim's algorithm for finding the shortest spanning tree of a weighted random graph. Exploring this mapping, which is valid for arbitrary dimensions and lattices, we introduce a new IBP model that belongs to the same universality class as IBP and generates the minimal energy tree spanning the IBP cluster.
What Satisfies Students?: Mining Student-Opinion Data with Regression and Decision Tree Analysis

ERIC Educational Resources Information Center

Thomas, Emily H.; Galambos, Nora

2004-01-01

To investigate how students' characteristics and experiences affect satisfaction, this study uses regression and decision tree analysis with the CHAID algorithm to analyze student-opinion data. A data mining approach identifies the specific aspects of students' university experience that most influence three measures of general satisfaction. The…
A diagnosis system using object-oriented fault tree models

NASA Technical Reports Server (NTRS)

Iverson, David L.; Patterson-Hine, F. A.

1990-01-01

Spaceborne computing systems must provide reliable, continuous operation for extended periods. Due to weight, power, and volume constraints, these systems must manage resources very effectively. A fault diagnosis algorithm is described which enables fast and flexible diagnoses in the dynamic distributed computing environments planned for future space missions. The algorithm uses a knowledge base that is easily changed and updated to reflect current system status. Augmented fault trees represented in an object-oriented form provide deep system knowledge that is easy to access and revise as a system changes. Given such a fault tree, a set of failure events that have occurred, and a set of failure events that have not occurred, this diagnosis system uses forward and backward chaining to propagate causal and temporal information about other failure events in the system being diagnosed. Once the system has established temporal and causal constraints, it reasons backward from heuristically selected failure events to find a set of basic failure events which are a likely cause of the occurrence of the top failure event in the fault tree. The diagnosis system has been implemented in common LISP using Flavors.
Seasonal variations of isoprene emissions from deciduous trees

NASA Astrophysics Data System (ADS)

Xiaoshan, Zhang; Yujing, Mu; Wenzhi, Song; Yahui, Zhuang

Isoprene emission fluxes were investigated for 12 tree species in and around Beijing city. Bag-enclosure method was used to collect the air sample and GC-PID was used to directly analyze isoprene. Ginkgo and Magnolia denudata had negligible isoprene emissions, while significant emissions were observed for Platanus orientalis, Pendula loud, Populus simonii, and Salix matsudana koidz, and other remaining trees showed no sign of isoprene emission. Variations in isoprene emission with changes in light, temperature and season were investigated for Platanus orientalis and Pendula loud. Isoprene emission rates strongly depended on light, temperature and leaf age. The maximum emission rates for the two trees were observed in summer with values of about 232 and 213 μg g -1 dw h -1, respectively. The measured emission fluxes were used to evaluate "Guenther" emission algorithm. The emission fluxes predicted by the algorithm were in relatively good agreement with field measurements. However, there were large differences for the calculated median emission factors during spring, summer and fall. The 25-75 percentiles span of the emission factor data sets ranged from -33 to +15% of the median values.
Application and Exploration of Big Data Mining in Clinical Medicine

PubMed Central

Zhang, Yue; Guo, Shu-Li; Han, Li-Na; Li, Tie-Ling

2016-01-01

Objective: To review theories and technologies of big data mining and their application in clinical medicine. Data Sources: Literatures published in English or Chinese regarding theories and technologies of big data mining and the concrete applications of data mining technology in clinical medicine were obtained from PubMed and Chinese Hospital Knowledge Database from 1975 to 2015. Study Selection: Original articles regarding big data mining theory/technology and big data mining's application in the medical field were selected. Results: This review characterized the basic theories and technologies of big data mining including fuzzy theory, rough set theory, cloud theory, Dempster–Shafer theory, artificial neural network, genetic algorithm, inductive learning theory, Bayesian network, decision tree, pattern recognition, high-performance computing, and statistical analysis. The application of big data mining in clinical medicine was analyzed in the fields of disease risk assessment, clinical decision support, prediction of disease development, guidance of rational use of drugs, medical management, and evidence-based medicine. Conclusion: Big data mining has the potential to play an important role in clinical medicine. PMID:26960378
Compact 0-complete trees: A new method for searching large files

DOE Office of Scientific and Technical Information (OSTI.GOV)

Orlandic, R.; Pfaltz, J.L.

1988-01-26

In this report, a novel approach to ordered retrieval in very large files is developed. The method employs a B-tree like search algorithm that is independent of key type or key length because all keys in index blocks are encoded by a 1 byte surrogate. The replacement of actual key sequences by the 1 byte surrogate ensures a maximal possible fan out and greatly reduces the storage overhead of maintaining access indices. Initially, retrieval in binary trie structure is developed. With the aid of a fairly complex recurrence relation, the rather scraggly binary trie is transformed into compact multi-way searchmore » tree. Then the recurrence relation itself is replaced by an unusually simple search algorithm. Then implementation details and empirical performance results are presented. Reduction of index size by 50%--75% opens up the possibility of replicating system-wide indices for parallel access in distributed databases. 23 figs.« less
Retinal optical coherence tomography image enhancement via shrinkage denoising using double-density dual-tree complex wavelet transform

PubMed Central

Mayer, Markus A.; Boretsky, Adam R.; van Kuijk, Frederik J.; Motamedi, Massoud

2012-01-01

Abstract. Image enhancement of retinal structures, in optical coherence tomography (OCT) scans through denoising, has the potential to aid in the diagnosis of several eye diseases. In this paper, a locally adaptive denoising algorithm using double-density dual-tree complex wavelet transform, a combination of the double-density wavelet transform and the dual-tree complex wavelet transform, is applied to reduce speckle noise in OCT images of the retina. The algorithm overcomes the limitations of commonly used multiple frame averaging technique, namely the limited number of frames that can be recorded due to eye movements, by providing a comparable image quality in significantly less acquisition time equal to an order of magnitude less time compared to the averaging method. In addition, improvements of image quality metrics and 5 dB increase in the signal-to-noise ratio are attained. PMID:23117804
Retinal optical coherence tomography image enhancement via shrinkage denoising using double-density dual-tree complex wavelet transform.

PubMed

Chitchian, Shahab; Mayer, Markus A; Boretsky, Adam R; van Kuijk, Frederik J; Motamedi, Massoud

2012-11-01

ABSTRACT. Image enhancement of retinal structures, in optical coherence tomography (OCT) scans through denoising, has the potential to aid in the diagnosis of several eye diseases. In this paper, a locally adaptive denoising algorithm using double-density dual-tree complex wavelet transform, a combination of the double-density wavelet transform and the dual-tree complex wavelet transform, is applied to reduce speckle noise in OCT images of the retina. The algorithm overcomes the limitations of commonly used multiple frame averaging technique, namely the limited number of frames that can be recorded due to eye movements, by providing a comparable image quality in significantly less acquisition time equal to an order of magnitude less time compared to the averaging method. In addition, improvements of image quality metrics and 5 dB increase in the signal-to-noise ratio are attained.
Evolutionary tree reconstruction

NASA Technical Reports Server (NTRS)

Cheeseman, Peter; Kanefsky, Bob

1990-01-01

It is described how Minimum Description Length (MDL) can be applied to the problem of DNA and protein evolutionary tree reconstruction. If there is a set of mutations that transform a common ancestor into a set of the known sequences, and this description is shorter than the information to encode the known sequences directly, then strong evidence for an evolutionary relationship has been found. A heuristic algorithm is described that searches for the simplest tree (smallest MDL) that finds close to optimal trees on the test data. Various ways of extending the MDL theory to more complex evolutionary relationships are discussed.
Tiling a figure using a height in a tree

DOE Office of Scientific and Technical Information (OSTI.GOV)

Remila, E.

1996-12-31

We first give a new presentation of an algorithm from Thurston of tiling with lozenges formed from two cells of the triangular lattice A. Secondly we extend the method to get a linear algorithm of tiling with leaning dominoes (parallelograms formed from four cells of {Lambda}) and triangles (formed from four cells of {Lambda}). Thirdly, we produce a quadratic algorithm of tiling with leaning dominoes.
Observation Uncertainty in Gaussian Sensor Networks

DTIC Science & Technology

2006-01-23

Ziv , J., and Lempel , A. A universal algorithm for sequential data compression . IEEE Transactions on Information Theory 23, 3 (1977), 337–343. 73 ...using the Lempel - Ziv algorithm [42], context-tree weighting [41], or the Burrows-Wheeler Trans- form [4], [15], for example. These source codes will...and Computation (Monticello, IL, September 2004). [4] Burrows, M., and Wheeler, D. A block sorting lossless data compression algorithm . Tech.
Fast Marching Tree: a Fast Marching Sampling-Based Method for Optimal Motion Planning in Many Dimensions*

PubMed Central

Janson, Lucas; Schmerling, Edward; Clark, Ashley; Pavone, Marco

2015-01-01

In this paper we present a novel probabilistic sampling-based motion planning algorithm called the Fast Marching Tree algorithm (FMT*). The algorithm is specifically aimed at solving complex motion planning problems in high-dimensional configuration spaces. This algorithm is proven to be asymptotically optimal and is shown to converge to an optimal solution faster than its state-of-the-art counterparts, chiefly PRM* and RRT*. The FMT* algorithm performs a “lazy” dynamic programming recursion on a predetermined number of probabilistically-drawn samples to grow a tree of paths, which moves steadily outward in cost-to-arrive space. As such, this algorithm combines features of both single-query algorithms (chiefly RRT) and multiple-query algorithms (chiefly PRM), and is reminiscent of the Fast Marching Method for the solution of Eikonal equations. As a departure from previous analysis approaches that are based on the notion of almost sure convergence, the FMT* algorithm is analyzed under the notion of convergence in probability: the extra mathematical flexibility of this approach allows for convergence rate bounds—the first in the field of optimal sampling-based motion planning. Specifically, for a certain selection of tuning parameters and configuration spaces, we obtain a convergence rate bound of order O(n−1/d+ρ), where n is the number of sampled points, d is the dimension of the configuration space, and ρ is an arbitrarily small constant. We go on to demonstrate asymptotic optimality for a number of variations on FMT*, namely when the configuration space is sampled non-uniformly, when the cost is not arc length, and when connections are made based on the number of nearest neighbors instead of a fixed connection radius. Numerical experiments over a range of dimensions and obstacle configurations confirm our the-oretical and heuristic arguments by showing that FMT*, for a given execution time, returns substantially better solutions than either PRM* or RRT*, especially in high-dimensional configuration spaces and in scenarios where collision-checking is expensive. PMID:27003958
The stopping rules for winsorized tree

NASA Astrophysics Data System (ADS)

Ch'ng, Chee Keong; Mahat, Nor Idayu

2017-11-01

Winsorized tree is a modified tree-based classifier that is able to investigate and to handle all outliers in all nodes along the process of constructing the tree. It overcomes the tedious process of constructing a classical tree where the splitting of branches and pruning go concurrently so that the constructed tree would not grow bushy. This mechanism is controlled by the proposed algorithm. In winsorized tree, data are screened for identifying outlier. If outlier is detected, the value is neutralized using winsorize approach. Both outlier identification and value neutralization are executed recursively in every node until predetermined stopping criterion is met. The aim of this paper is to search for significant stopping criterion to stop the tree from further splitting before overfitting. The result obtained from the conducted experiment on pima indian dataset proved that the node could produce the final successor nodes (leaves) when it has achieved the range of 70% in information gain.
HD-MTL: Hierarchical Deep Multi-Task Learning for Large-Scale Visual Recognition.

PubMed

Fan, Jianping; Zhao, Tianyi; Kuang, Zhenzhong; Zheng, Yu; Zhang, Ji; Yu, Jun; Peng, Jinye

2017-02-09

In this paper, a hierarchical deep multi-task learning (HD-MTL) algorithm is developed to support large-scale visual recognition (e.g., recognizing thousands or even tens of thousands of atomic object classes automatically). First, multiple sets of multi-level deep features are extracted from different layers of deep convolutional neural networks (deep CNNs), and they are used to achieve more effective accomplishment of the coarseto- fine tasks for hierarchical visual recognition. A visual tree is then learned by assigning the visually-similar atomic object classes with similar learning complexities into the same group, which can provide a good environment for determining the interrelated learning tasks automatically. By leveraging the inter-task relatedness (inter-class similarities) to learn more discriminative group-specific deep representations, our deep multi-task learning algorithm can train more discriminative node classifiers for distinguishing the visually-similar atomic object classes effectively. Our hierarchical deep multi-task learning (HD-MTL) algorithm can integrate two discriminative regularization terms to control the inter-level error propagation effectively, and it can provide an end-to-end approach for jointly learning more representative deep CNNs (for image representation) and more discriminative tree classifier (for large-scale visual recognition) and updating them simultaneously. Our incremental deep learning algorithms can effectively adapt both the deep CNNs and the tree classifier to the new training images and the new object classes. Our experimental results have demonstrated that our HD-MTL algorithm can achieve very competitive results on improving the accuracy rates for large-scale visual recognition.
An Improved Heuristic Method for Subgraph Isomorphism Problem

NASA Astrophysics Data System (ADS)

Xiang, Yingzhuo; Han, Jiesi; Xu, Haijiang; Guo, Xin

2017-09-01

This paper focus on the subgraph isomorphism (SI) problem. We present an improved genetic algorithm, a heuristic method to search the optimal solution. The contribution of this paper is that we design a dedicated crossover algorithm and a new fitness function to measure the evolution process. Experiments show our improved genetic algorithm performs better than other heuristic methods. For a large graph, such as a subgraph of 40 nodes, our algorithm outperforms the traditional tree search algorithms. We find that the performance of our improved genetic algorithm does not decrease as the number of nodes in prototype graphs.
Global tree network for computing structures enabling global processing operations

DOEpatents

Blumrich; Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Hoenicke, Dirk; Steinmacher-Burow, Burkhard D.; Takken, Todd E.; Vranas, Pavlos M.

2010-01-19

A system and method for enabling high-speed, low-latency global tree network communications among processing nodes interconnected according to a tree network structure. The global tree network enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the tree via links to facilitate performance of low-latency global processing operations at nodes of the virtual tree and sub-tree structures. The global operations performed include one or more of: broadcast operations downstream from a root node to leaf nodes of a virtual tree, reduction operations upstream from leaf nodes to the root node in the virtual tree, and point-to-point message passing from any node to the root node. The global tree network is configurable to provide global barrier and interrupt functionality in asynchronous or synchronized manner, and, is physically and logically partitionable.
Constructing high-quality bounding volume hierarchies for N-body computation using the acceptance volume heuristic

NASA Astrophysics Data System (ADS)

Olsson, O.

2018-01-01

We present a novel heuristic derived from a probabilistic cost model for approximate N-body simulations. We show that this new heuristic can be used to guide tree construction towards higher quality trees with improved performance over current N-body codes. This represents an important step beyond the current practice of using spatial partitioning for N-body simulations, and enables adoption of a range of state-of-the-art algorithms developed for computer graphics applications to yield further improvements in N-body simulation performance. We outline directions for further developments and review the most promising such algorithms.
Novel ID-based anti-collision approach for RFID

NASA Astrophysics Data System (ADS)

Zhang, De-Gan; Li, Wen-Bin

2016-09-01

Novel correlation ID-based (CID) anti-collision approach for RFID under the banner of the Internet of Things (IOT) has been presented in this paper. The key insights are as follows: according to the deterministic algorithms which are based on the binary search tree, we propose a method to increase the association between tags so that tags can initiatively send their own ID under certain trigger conditions, at the same time, we present a multi-tree search method for querying. When the number of tags is small, by replacing the actual ID with the temporary ID, it can greatly reduce the number of times that the reader reads and writes to tag's ID. Active tags send data to the reader by the way of modulation binary pulses. When applying this method to the uncertain ALOHA algorithms, the reader can determine the locations of the empty slots according to the position of the binary pulse, so it can avoid the decrease in efficiency which is caused by reading empty slots when reading slots. Theory and experiment show that this method can greatly improve the recognition efficiency of the system when applied to either the search tree or the ALOHA anti-collision algorithms.

INDDGO: Integrated Network Decomposition & Dynamic programming for Graph Optimization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Groer, Christopher S; Sullivan, Blair D; Weerapurage, Dinesh P

2012-10-01

It is well-known that dynamic programming algorithms can utilize tree decompositions to provide a way to solve some \\emph{NP}-hard problems on graphs where the complexity is polynomial in the number of nodes and edges in the graph, but exponential in the width of the underlying tree decomposition. However, there has been relatively little computational work done to determine the practical utility of such dynamic programming algorithms. We have developed software to construct tree decompositions using various heuristics and have created a fast, memory-efficient dynamic programming implementation for solving maximum weighted independent set. We describe our software and the algorithms wemore » have implemented, focusing on memory saving techniques for the dynamic programming. We compare the running time and memory usage of our implementation with other techniques for solving maximum weighted independent set, including a commercial integer programming solver and a semi-definite programming solver. Our results indicate that it is possible to solve some instances where the underlying decomposition has width much larger than suggested by the literature. For certain types of problems, our dynamic programming code runs several times faster than these other methods.« less
Holoentropy enabled-decision tree for automatic classification of diabetic retinopathy using retinal fundus images.

PubMed

Mane, Vijay Mahadeo; Jadhav, D V

2017-05-24

Diabetic retinopathy (DR) is the most common diabetic eye disease. Doctors are using various test methods to detect DR. But, the availability of test methods and requirements of domain experts pose a new challenge in the automatic detection of DR. In order to fulfill this objective, a variety of algorithms has been developed in the literature. In this paper, we propose a system consisting of a novel sparking process and a holoentropy-based decision tree for automatic classification of DR images to further improve the effectiveness. The sparking process algorithm is developed for automatic segmentation of blood vessels through the estimation of optimal threshold. The holoentropy enabled decision tree is newly developed for automatic classification of retinal images into normal or abnormal using hybrid features which preserve the disease-level patterns even more than the signal level of the feature. The effectiveness of the proposed system is analyzed using standard fundus image databases DIARETDB0 and DIARETDB1 for sensitivity, specificity and accuracy. The proposed system yields sensitivity, specificity and accuracy values of 96.72%, 97.01% and 96.45%, respectively. The experimental result reveals that the proposed technique outperforms the existing algorithms.
Prediction of microRNA target genes using an efficient genetic algorithm-based decision tree.

PubMed

Rabiee-Ghahfarrokhi, Behzad; Rafiei, Fariba; Niknafs, Ali Akbar; Zamani, Behzad

2015-01-01

MicroRNAs (miRNAs) are small, non-coding RNA molecules that regulate gene expression in almost all plants and animals. They play an important role in key processes, such as proliferation, apoptosis, and pathogen-host interactions. Nevertheless, the mechanisms by which miRNAs act are not fully understood. The first step toward unraveling the function of a particular miRNA is the identification of its direct targets. This step has shown to be quite challenging in animals primarily because of incomplete complementarities between miRNA and target mRNAs. In recent years, the use of machine-learning techniques has greatly increased the prediction of miRNA targets, avoiding the need for costly and time-consuming experiments to achieve miRNA targets experimentally. Among the most important machine-learning algorithms are decision trees, which classify data based on extracted rules. In the present work, we used a genetic algorithm in combination with C4.5 decision tree for prediction of miRNA targets. We applied our proposed method to a validated human datasets. We nearly achieved 93.9% accuracy of classification, which could be related to the selection of best rules.
Application of Machine-Learning Models to Predict Tacrolimus Stable Dose in Renal Transplant Recipients

NASA Astrophysics Data System (ADS)

Tang, Jie; Liu, Rong; Zhang, Yue-Li; Liu, Mou-Ze; Hu, Yong-Fang; Shao, Ming-Jie; Zhu, Li-Jun; Xin, Hua-Wen; Feng, Gui-Wen; Shang, Wen-Jun; Meng, Xiang-Guang; Zhang, Li-Rong; Ming, Ying-Zi; Zhang, Wei

2017-02-01

Tacrolimus has a narrow therapeutic window and considerable variability in clinical use. Our goal was to compare the performance of multiple linear regression (MLR) and eight machine learning techniques in pharmacogenetic algorithm-based prediction of tacrolimus stable dose (TSD) in a large Chinese cohort. A total of 1,045 renal transplant patients were recruited, 80% of which were randomly selected as the “derivation cohort” to develop dose-prediction algorithm, while the remaining 20% constituted the “validation cohort” to test the final selected algorithm. MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied and their performances were compared in this work. Among all the machine learning models, RT performed best in both derivation [0.71 (0.67-0.76)] and validation cohorts [0.73 (0.63-0.82)]. In addition, the ideal rate of RT was 4% higher than that of MLR. To our knowledge, this is the first study to use machine learning models to predict TSD, which will further facilitate personalized medicine in tacrolimus administration in the future.
Prediction of microRNA target genes using an efficient genetic algorithm-based decision tree

PubMed Central

Rabiee-Ghahfarrokhi, Behzad; Rafiei, Fariba; Niknafs, Ali Akbar; Zamani, Behzad

2015-01-01

MicroRNAs (miRNAs) are small, non-coding RNA molecules that regulate gene expression in almost all plants and animals. They play an important role in key processes, such as proliferation, apoptosis, and pathogen–host interactions. Nevertheless, the mechanisms by which miRNAs act are not fully understood. The first step toward unraveling the function of a particular miRNA is the identification of its direct targets. This step has shown to be quite challenging in animals primarily because of incomplete complementarities between miRNA and target mRNAs. In recent years, the use of machine-learning techniques has greatly increased the prediction of miRNA targets, avoiding the need for costly and time-consuming experiments to achieve miRNA targets experimentally. Among the most important machine-learning algorithms are decision trees, which classify data based on extracted rules. In the present work, we used a genetic algorithm in combination with C4.5 decision tree for prediction of miRNA targets. We applied our proposed method to a validated human datasets. We nearly achieved 93.9% accuracy of classification, which could be related to the selection of best rules. PMID:26649272
Evaluation of Algorithms for a Miles-in-Trail Decision Support Tool

NASA Technical Reports Server (NTRS)

Bloem, Michael; Hattaway, David; Bambos, Nicholas

2012-01-01

Four machine learning algorithms were prototyped and evaluated for use in a proposed decision support tool that would assist air traffic managers as they set Miles-in-Trail restrictions. The tool would display probabilities that each possible Miles-in-Trail value should be used in a given situation. The algorithms were evaluated with an expected Miles-in-Trail cost that assumes traffic managers set restrictions based on the tool-suggested probabilities. Basic Support Vector Machine, random forest, and decision tree algorithms were evaluated, as was a softmax regression algorithm that was modified to explicitly reduce the expected Miles-in-Trail cost. The algorithms were evaluated with data from the summer of 2011 for air traffic flows bound to the Newark Liberty International Airport (EWR) over the ARD, PENNS, and SHAFF fixes. The algorithms were provided with 18 input features that describe the weather at EWR, the runway configuration at EWR, the scheduled traffic demand at EWR and the fixes, and other traffic management initiatives in place at EWR. Features describing other traffic management initiatives at EWR and the weather at EWR achieved relatively high information gain scores, indicating that they are the most useful for estimating Miles-in-Trail. In spite of a high variance or over-fitting problem, the decision tree algorithm achieved the lowest expected Miles-in-Trail costs when the algorithms were evaluated using 10-fold cross validation with the summer 2011 data for these air traffic flows.
Crown-Level Tree Species Classification Using Integrated Airborne Hyperspectral and LIDAR Remote Sensing Data

NASA Astrophysics Data System (ADS)

Wang, Z.; Wu, J.; Wang, Y.; Kong, X.; Bao, H.; Ni, Y.; Ma, L.; Jin, J.

2018-05-01

Mapping tree species is essential for sustainable planning as well as to improve our understanding of the role of different trees as different ecological service. However, crown-level tree species automatic classification is a challenging task due to the spectral similarity among diversified tree species, fine-scale spatial variation, shadow, and underlying objects within a crown. Advanced remote sensing data such as airborne Light Detection and Ranging (LiDAR) and hyperspectral imagery offer a great potential opportunity to derive crown spectral, structure and canopy physiological information at the individual crown scale, which can be useful for mapping tree species. In this paper, an innovative approach was developed for tree species classification at the crown level. The method utilized LiDAR data for individual tree crown delineation and morphological structure extraction, and Compact Airborne Spectrographic Imager (CASI) hyperspectral imagery for pure crown-scale spectral extraction. Specifically, four steps were include: 1) A weighted mean filtering method was developed to improve the accuracy of the smoothed Canopy Height Model (CHM) derived from LiDAR data; 2) The marker-controlled watershed segmentation algorithm was, therefore, also employed to delineate the tree-level canopy from the CHM image in this study, and then individual tree height and tree crown were calculated according to the delineated crown; 3) Spectral features within 3 × 3 neighborhood regions centered on the treetops detected by the treetop detection algorithm were derived from the spectrally normalized CASI imagery; 4) The shape characteristics related to their crown diameters and heights were established, and different crown-level tree species were classified using the combination of spectral and shape characteristics. Analysis of results suggests that the developed classification strategy in this paper (OA = 85.12 %, Kc = 0.90) performed better than LiDAR-metrics method (OA = 79.86 %, Kc = 0.81) and spectral-metircs method (OA = 71.26, Kc = 0.69) in terms of classification accuracy, which indicated that the advanced method of data processing and sensitive feature selection are critical for improving the accuracy of crown-level tree species classification.
Model-based conifer crown surface reconstruction from multi-ocular high-resolution aerial imagery

NASA Astrophysics Data System (ADS)

Sheng, Yongwei

2000-12-01

Tree crown parameters such as width, height, shape and crown closure are desirable in forestry and ecological studies, but they are time-consuming and labor intensive to measure in the field. The stereoscopic capability of high-resolution aerial imagery provides a way to crown surface reconstruction. Existing photogrammetric algorithms designed to map terrain surfaces, however, cannot adequately extract crown surfaces, especially for steep conifer crowns. Considering crown surface reconstruction in a broader context of tree characterization from aerial images, we develop a rigorous perspective tree image formation model to bridge image-based tree extraction and crown surface reconstruction, and an integrated model-based approach to conifer crown surface reconstruction. Based on the fact that most conifer crowns are in a solid geometric form, conifer crowns are modeled as a generalized hemi-ellipsoid. Both the automatic and semi-automatic approaches are investigated to optimal tree model development from multi-ocular images. The semi-automatic 3D tree interpreter developed in this thesis is able to efficiently extract reliable tree parameters and tree models in complicated tree stands. This thesis starts with a sophisticated stereo matching algorithm, and incorporates tree models to guide stereo matching. The following critical problems are addressed in the model-based surface reconstruction process: (1) the problem of surface model composition from tree models, (2) the occlusion problem in disparity prediction from tree models, (3) the problem of integrating the predicted disparities into image matching, (4) the tree model edge effect reduction on the disparity map, (5) the occlusion problem in orthophoto production, and (6) the foreshortening problem in image matching, which is very serious for conifer crown surfaces. Solutions to the above problems are necessary for successful crown surface reconstruction. The model-based approach was applied to recover the canopy surface of a dense redwood stand using tri-ocular high-resolution images scanned from 1:2,400 aerial photographs. The results demonstrate the approach's ability to reconstruct complicated stands. The model-based approach proposed in this thesis is potentially applicable to other surfaces recovering problems with a priori knowledge about objects.
Double-bosonization and Majid's conjecture, (I): Rank-inductions of ABCD

NASA Astrophysics Data System (ADS)

Hu, Hongmei; Hu, Naihong

2015-11-01

Majid developed in [S. Majid, Math. Proc. Cambridge Philos. Soc. 125, 151-192 (1999)] the double-bosonization theory to construct Uq(𝔤) and expected to generate inductively not just a line but a tree of quantum groups starting from a node. In this paper, the authors confirm Majid's first expectation (see p. 178 [S. Majid, Math. Proc. Cambridge Philos. Soc. 125, 151-192 (1999)]) through giving and verifying the full details of the inductive constructions of Uq(𝔤) for the classical types, i.e., the ABCD series. Some examples in low ranks are given to elucidate that any quantum group of classical type can be constructed from the node corresponding to Uq(𝔰𝔩2).
Hybrid Parallel Contour Trees, Version 1.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sewell, Christopher; Fasel, Patricia; Carr, Hamish

A common operation in scientific visualization is to compute and render a contour of a data set. Given a function of the form f : R^d -> R, a level set is defined as an inverse image f^-1(h) for an isovalue h, and a contour is a single connected component of a level set. The Reeb graph can then be defined to be the result of contracting each contour to a single point, and is well defined for Euclidean spaces or for general manifolds. For simple domains, the graph is guaranteed to be a tree, and is called the contourmore » tree. Analysis can then be performed on the contour tree in order to identify isovalues of particular interest, based on various metrics, and render the corresponding contours, without having to know such isovalues a priori. This code is intended to be the first data-parallel algorithm for computing contour trees. Our implementation will use the portable data-parallel primitives provided by Nvidia’s Thrust library, allowing us to compile our same code for both GPUs and multi-core CPUs. Native OpenMP and purely serial versions of the code will likely also be included. It will also be extended to provide a hybrid data-parallel / distributed algorithm, allowing scaling beyond a single GPU or CPU.« less
Decision-Tree Formulation With Order-1 Lateral Execution

NASA Technical Reports Server (NTRS)

James, Mark

2007-01-01

A compact symbolic formulation enables mapping of an arbitrarily complex decision tree of a certain type into a highly computationally efficient multidimensional software object. The type of decision trees to which this formulation applies is that known in the art as the Boolean class of balanced decision trees. Parallel lateral slices of an object created by means of this formulation can be executed in constant time considerably less time than would otherwise be required. Decision trees of various forms are incorporated into almost all large software systems. A decision tree is a way of hierarchically solving a problem, proceeding through a set of true/false responses to a conclusion. By definition, a decision tree has a tree-like structure, wherein each internal node denotes a test on an attribute, each branch from an internal node represents an outcome of a test, and leaf nodes represent classes or class distributions that, in turn represent possible conclusions. The drawback of decision trees is that execution of them can be computationally expensive (and, hence, time-consuming) because each non-leaf node must be examined to determine whether to progress deeper into a tree structure or to examine an alternative. The present formulation was conceived as an efficient means of representing a decision tree and executing it in as little time as possible. The formulation involves the use of a set of symbolic algorithms to transform a decision tree into a multi-dimensional object, the rank of which equals the number of lateral non-leaf nodes. The tree can then be executed in constant time by means of an order-one table lookup. The sequence of operations performed by the algorithms is summarized as follows: 1. Determination of whether the tree under consideration can be encoded by means of this formulation. 2. Extraction of decision variables. 3. Symbolic optimization of the decision tree to minimize its form. 4. Expansion and transformation of all nested conjunctive-disjunctive paths to a flattened conjunctive form composed only of equality checks when possible. If each reduced conjunctive form contains only equality checks and all of these forms use the same variables, then the decision tree can be reduced to an order-one operation through a table lookup. The speedup to order one is accomplished by distributing each decision variable over a surface of a multidimensional object by mapping the equality constant to an index
Airborne laser scanning for forest health status assessment and radiative transfer modelling

NASA Astrophysics Data System (ADS)

Novotny, Jan; Zemek, Frantisek; Pikl, Miroslav; Janoutova, Ruzena

2013-04-01

Structural parameters of forest stands/ecosystems are an important complementary source of information to spectral signatures obtained from airborne imaging spectroscopy when quantitative assessment of forest stands are in the focus, such as estimation of forest biomass, biochemical properties (e.g. chlorophyll /water content), etc. The parameterization of radiative transfer (RT) models used in latter case requires three-dimensional spatial distribution of green foliage and woody biomass. Airborne LiDAR data acquired over forest sites bears these kinds of 3D information. The main objective of the study was to compare the results from several approaches to interpolation of digital elevation model (DEM) and digital surface model (DSM). We worked with airborne LiDAR data with different density (TopEye Mk II 1,064nm instrument, 1-5 points/m2) acquired over the Norway spruce forests situated in the Beskydy Mountains, the Czech Republic. Three different interpolation algorithms with increasing complexity were tested: i/Nearest neighbour approach implemented in the BCAL software package (Idaho Univ.); ii/Averaging and linear interpolation techniques used in the OPALS software (Vienna Univ. of Technology); iii/Active contour technique implemented in the TreeVis software (Univ. of Freiburg). We defined two spatial resolutions for the resulting coupled raster DEMs and DSMs outputs: 0.4 m and 1 m, calculated by each algorithm. The grids correspond to the same spatial resolutions of hyperspectral imagery data for which the DEMs were used in a/geometrical correction and b/building a complex tree models for radiative transfer modelling. We applied two types of analyses when comparing between results from the different interpolations/raster resolution: 1/calculated DEM or DSM between themselves; 2/comparison with field data: DEM with measurements from referential GPS, DSM - field tree alometric measurements, where tree height was calculated as DSM-DEM. The results of the analyses show that: 1/averaging techniques tend to underestimate the tree height and the generated surface does not follow the first LiDAR echoes both for 1 m and 0.4 m pixel size; 2/we did not find any significant difference between tree heights calculated by nearest neighbour algorithm and the active contour technique for 1 m pixel output but the difference increased with finer resolution (0.4 m); 3/the accuracy of the DEMs calculated by tested algorithms is similar.
Two Dimensional Path Planning with Obstacles and Shadows.

DTIC Science & Technology

1987-01-01

22060College Park, MD 20742 8la NAME OF FUNDING/SPONSORING Bb. OFFICE SYMBOL 9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER •" " .N!ZATION f (If...quadtree is a tip node (if the tree . It represents a tinifirmly c(olored -Iq tiare region of the picture. A gray n()de ()f the (tii tree is a nd((e...Sight Algorithm traversal of the quadtree, they can be sorted using a binary tree by their relative location on the line of sight, given by the x or y
Species-richness of the Anopheles annulipes Complex (Diptera: Culicidae) Revealed by Tree and Model-Based Allozyme Clustering Analyses

DTIC Science & Technology

2007-01-01

including tree- based methods such as the unweighted pair group method of analysis ( UPGMA ) and Neighbour-joining (NJ) (Saitou & Nei, 1987). By...based Bayesian approach and the tree-based UPGMA and NJ cluster- ing methods. The results obtained suggest that far more species occur in the An...unlikely that groups that differ by more than these levels are conspecific. Genetic distances were clustered using the UPGMA and NJ algorithms in MEGA
Fruit load induces changes in global gene expression and in abscisic acid (ABA) and indole acetic acid (IAA) homeostasis in citrus buds.

PubMed

Shalom, Liron; Samuels, Sivan; Zur, Naftali; Shlizerman, Lyudmila; Doron-Faigenboim, Adi; Blumwald, Eduardo; Sadka, Avi

2014-07-01

Many fruit trees undergo cycles of heavy fruit load (ON-Crop) in one year, followed by low fruit load (OFF-Crop) the following year, a phenomenon known as alternate bearing (AB). The mechanism by which fruit load affects flowering induction during the following year (return bloom) is still unclear. Although not proven, it is commonly accepted that the fruit or an organ which senses fruit presence generates an inhibitory signal that moves into the bud and inhibits apical meristem transition. Indeed, fruit removal from ON-Crop trees (de-fruiting) induces return bloom. Identification of regulatory or metabolic processes modified in the bud in association with altered fruit load might shed light on the nature of the AB signalling process. The bud transcriptome of de-fruited citrus trees was compared with those of ON- and OFF-Crop trees. Fruit removal resulted in relatively rapid changes in global gene expression, including induction of photosynthetic genes and proteins. Altered regulatory mechanisms included abscisic acid (ABA) metabolism and auxin polar transport. Genes of ABA biosynthesis were induced; however, hormone analyses showed that the ABA level was reduced in OFF-Crop buds and in buds shortly following fruit removal. Additionally, genes associated with Ca(2+)-dependent auxin polar transport were remarkably induced in buds of OFF-Crop and de-fruited trees. Hormone analyses showed that auxin levels were reduced in these buds as compared with ON-Crop buds. In view of the auxin transport autoinhibition theory, the possibility that auxin distribution plays a role in determining bud fate is discussed. © The Author 2014. Published by Oxford University Press on behalf of the Society for Experimental Biology.
Microwave Soil Moisture Retrieval Under Trees

NASA Technical Reports Server (NTRS)

O'Neill, P.; Lang, R.; Kurum, M.; Joseph, A.; Jackson, T.; Cosh, M.

2008-01-01

Soil moisture is recognized as an important component of the water, energy, and carbon cycles at the interface between the Earth's surface and atmosphere. Current baseline soil moisture retrieval algorithms for microwave space missions have been developed and validated only over grasslands, agricultural crops, and generally light to moderate vegetation. Tree areas have commonly been excluded from operational soil moisture retrieval plans due to the large expected impact of trees on masking the microwave response to the underlying soil moisture. Our understanding of the microwave properties of trees of various sizes and their effect on soil moisture retrieval algorithms at L band is presently limited, although research efforts are ongoing in Europe, the United States, and elsewhere to remedy this situation. As part of this research, a coordinated sequence of field measurements involving the ComRAD (for Combined Radar/Radiometer) active/passive microwave truck instrument system has been undertaken. Jointly developed and operated by NASA Goddard Space Flight Center and George Washington University, ComRAD consists of dual-polarized 1.4 GHz total-power radiometers (LH, LV) and a quad-polarized 1.25 GHz L band radar sharing a single parabolic dish antenna with a novel broadband stacked patch dual-polarized feed, a quad-polarized 4.75 GHz C band radar, and a single channel 10 GHz XHH radar. The instruments are deployed on a mobile truck with an 19-m hydraulic boom and share common control software; real-time calibrated signals, and the capability for automated data collection for unattended operation. Most microwave soil moisture retrieval algorithms developed for use at L band frequencies are based on the tau-omega model, a simplified zero-order radiative transfer approach where scattering is largely ignored and vegetation canopies are generally treated as a bulk attenuating layer. In this approach, vegetation effects are parameterized by tau and omega, the microwave vegetation opacity and single scattering albedo. One goal of our current research is to determine whether the tau-omega model can work for tree canopies given the increased scatter from trees compared to grasses and crops, and. if so, what are effective values for tau and omega for trees.
TreePOD: Sensitivity-Aware Selection of Pareto-Optimal Decision Trees.

PubMed

Muhlbacher, Thomas; Linhardt, Lorenz; Moller, Torsten; Piringer, Harald

2018-01-01

Balancing accuracy gains with other objectives such as interpretability is a key challenge when building decision trees. However, this process is difficult to automate because it involves know-how about the domain as well as the purpose of the model. This paper presents TreePOD, a new approach for sensitivity-aware model selection along trade-offs. TreePOD is based on exploring a large set of candidate trees generated by sampling the parameters of tree construction algorithms. Based on this set, visualizations of quantitative and qualitative tree aspects provide a comprehensive overview of possible tree characteristics. Along trade-offs between two objectives, TreePOD provides efficient selection guidance by focusing on Pareto-optimal tree candidates. TreePOD also conveys the sensitivities of tree characteristics on variations of selected parameters by extending the tree generation process with a full-factorial sampling. We demonstrate how TreePOD supports a variety of tasks involved in decision tree selection and describe its integration in a holistic workflow for building and selecting decision trees. For evaluation, we illustrate a case study for predicting critical power grid states, and we report qualitative feedback from domain experts in the energy sector. This feedback suggests that TreePOD enables users with and without statistical background a confident and efficient identification of suitable decision trees.
Decision Tree Algorithm-Generated Single-Nucleotide Polymorphism Barcodes of rbcL Genes for 38 Brassicaceae Species Tagging.

PubMed

Yang, Cheng-Hong; Wu, Kuo-Chuan; Chuang, Li-Yeh; Chang, Hsueh-Wei

2018-01-01

DNA barcode sequences are accumulating in large data sets. A barcode is generally a sequence larger than 1000 base pairs and generates a computational burden. Although the DNA barcode was originally envisioned as straightforward species tags, the identification usage of barcode sequences is rarely emphasized currently. Single-nucleotide polymorphism (SNP) association studies provide us an idea that the SNPs may be the ideal target of feature selection to discriminate between different species. We hypothesize that SNP-based barcodes may be more effective than the full length of DNA barcode sequences for species discrimination. To address this issue, we tested a r ibulose diphosphate carboxylase ( rbcL ) S NP b arcoding (RSB) strategy using a decision tree algorithm. After alignment and trimming, 31 SNPs were discovered in the rbcL sequences from 38 Brassicaceae plant species. In the decision tree construction, these SNPs were computed to set up the decision rule to assign the sequences into 2 groups level by level. After algorithm processing, 37 nodes and 31 loci were required for discriminating 38 species. Finally, the sequence tags consisting of 31 rbcL SNP barcodes were identified for discriminating 38 Brassicaceae species based on the decision tree-selected SNP pattern using RSB method. Taken together, this study provides the rational that the SNP aspect of DNA barcode for rbcL gene is a useful and effective sequence for tagging 38 Brassicaceae species.
Volumetric Medical Image Coding: An Object-based, Lossy-to-lossless and Fully Scalable Approach

PubMed Central

Danyali, Habibiollah; Mertins, Alfred

2011-01-01

In this article, an object-based, highly scalable, lossy-to-lossless 3D wavelet coding approach for volumetric medical image data (e.g., magnetic resonance (MR) and computed tomography (CT)) is proposed. The new method, called 3DOBHS-SPIHT, is based on the well-known set partitioning in the hierarchical trees (SPIHT) algorithm and supports both quality and resolution scalability. The 3D input data is grouped into groups of slices (GOS) and each GOS is encoded and decoded as a separate unit. The symmetric tree definition of the original 3DSPIHT is improved by introducing a new asymmetric tree structure. While preserving the compression efficiency, the new tree structure allows for a small size of each GOS, which not only reduces memory consumption during the encoding and decoding processes, but also facilitates more efficient random access to certain segments of slices. To achieve more compression efficiency, the algorithm only encodes the main object of interest in each 3D data set, which can have any arbitrary shape, and ignores the unnecessary background. The experimental results on some MR data sets show the good performance of the 3DOBHS-SPIHT algorithm for multi-resolution lossy-to-lossless coding. The compression efficiency, full scalability, and object-based features of the proposed approach, beside its lossy-to-lossless coding support, make it a very attractive candidate for volumetric medical image information archiving and transmission applications. PMID:22606653
Gene-Tree Reconciliation with MUL-Trees to Resolve Polyploidy Events.

PubMed

Gregg, W C Thomas; Ather, S Hussain; Hahn, Matthew W

2017-11-01

Polyploidy can have a huge impact on the evolution of species, and it is a common occurrence, especially in plants. The two types of polyploids-autopolyploids and allopolyploids-differ in the level of divergence between the genes that are brought together in the new polyploid lineage. Because allopolyploids are formed via hybridization, the homoeologous copies of genes within them are at least as divergent as orthologs in the parental species that came together to form them. This means that common methods for estimating the parental lineages of allopolyploidy events are not accurate, and can lead to incorrect inferences about the number of gene duplications and losses. Here, we have adapted an algorithm for topology-based gene-tree reconciliation to work with multi-labeled trees (MUL-trees). By definition, MUL-trees have some tips with identical labels, which makes them a natural representation of the genomes of polyploids. Using this new reconciliation algorithm we can: accurately place allopolyploidy events on a phylogeny, identify the parental lineages that hybridized to form allopolyploids, distinguish between allo-, auto-, and (in most cases) no polyploidy, and correctly count the number of duplications and losses in a set of gene trees. We validate our method using gene trees simulated with and without polyploidy, and revisit the history of polyploidy in data from the clades including both baker's yeast and bread wheat. Our re-analysis of the yeast data confirms the allopolyploid origin and parental lineages previously identified for this group. The method presented here should find wide use in the growing number of genomes from species with a history of polyploidy. [Polyploidy; reconciliation; whole-genome duplication.]. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Grid Computing: Topology-Aware, Peer-to-Peer, Power-Aware, and Embedded Web Services

DTIC Science & Technology

2003-09-22

Dist Simulation • Time Management enables temporal causality to be enforced in Distributed Simulations • Typically enforced via a Lower Bound Time...algorithm • Distinguished Root Node Algorithm developed as a topology-aware time management service – Relies on a tree from end-hosts to a
Knowledge Guided Evolutionary Algorithms in Financial Investing

ERIC Educational Resources Information Center

Wimmer, Hayden

2013-01-01

A large body of literature exists on evolutionary computing, genetic algorithms, decision trees, codified knowledge, and knowledge management systems; however, the intersection of these computing topics has not been widely researched. Moving through the set of all possible solutions--or traversing the search space--at random exhibits no control…
Inductive reasoning and implicit memory: evidence from intact and impaired memory systems.

PubMed

Girelli, Luisa; Semenza, Carlo; Delazer, Margarete

2004-01-01

In this study, we modified a classic problem solving task, number series completion, in order to explore the contribution of implicit memory to inductive reasoning. Participants were required to complete number series sharing the same underlying algorithm (e.g., +2), differing in both constituent elements (e.g., 2468 versus 57911) and correct answers (e.g., 10 versus 13). In Experiment 1, reliable priming effects emerged, whether primes and targets were separated by four or ten fillers. Experiment 2 provided direct evidence that the observed facilitation arises at central stages of problem solving, namely the identification of the algorithm and its subsequent extrapolation. The observation of analogous priming effects in a severely amnesic patient strongly supports the hypothesis that the facilitation in number series completion was largely determined by implicit memory processes. These findings demonstrate that the influence of implicit processes extends to higher level cognitive domain such as induction reasoning.
SUNPLIN: Simulation with Uncertainty for Phylogenetic Investigations

PubMed Central

2013-01-01

Background Phylogenetic comparative analyses usually rely on a single consensus phylogenetic tree in order to study evolutionary processes. However, most phylogenetic trees are incomplete with regard to species sampling, which may critically compromise analyses. Some approaches have been proposed to integrate non-molecular phylogenetic information into incomplete molecular phylogenies. An expanded tree approach consists of adding missing species to random locations within their clade. The information contained in the topology of the resulting expanded trees can be captured by the pairwise phylogenetic distance between species and stored in a matrix for further statistical analysis. Thus, the random expansion and processing of multiple phylogenetic trees can be used to estimate the phylogenetic uncertainty through a simulation procedure. Because of the computational burden required, unless this procedure is efficiently implemented, the analyses are of limited applicability. Results In this paper, we present efficient algorithms and implementations for randomly expanding and processing phylogenetic trees so that simulations involved in comparative phylogenetic analysis with uncertainty can be conducted in a reasonable time. We propose algorithms for both randomly expanding trees and calculating distance matrices. We made available the source code, which was written in the C++ language. The code may be used as a standalone program or as a shared object in the R system. The software can also be used as a web service through the link: http://purl.oclc.org/NET/sunplin/. Conclusion We compare our implementations to similar solutions and show that significant performance gains can be obtained. Our results open up the possibility of accounting for phylogenetic uncertainty in evolutionary and ecological analyses of large datasets. PMID:24229408
SUNPLIN: simulation with uncertainty for phylogenetic investigations.

PubMed

Martins, Wellington S; Carmo, Welton C; Longo, Humberto J; Rosa, Thierson C; Rangel, Thiago F

2013-11-15

Phylogenetic comparative analyses usually rely on a single consensus phylogenetic tree in order to study evolutionary processes. However, most phylogenetic trees are incomplete with regard to species sampling, which may critically compromise analyses. Some approaches have been proposed to integrate non-molecular phylogenetic information into incomplete molecular phylogenies. An expanded tree approach consists of adding missing species to random locations within their clade. The information contained in the topology of the resulting expanded trees can be captured by the pairwise phylogenetic distance between species and stored in a matrix for further statistical analysis. Thus, the random expansion and processing of multiple phylogenetic trees can be used to estimate the phylogenetic uncertainty through a simulation procedure. Because of the computational burden required, unless this procedure is efficiently implemented, the analyses are of limited applicability. In this paper, we present efficient algorithms and implementations for randomly expanding and processing phylogenetic trees so that simulations involved in comparative phylogenetic analysis with uncertainty can be conducted in a reasonable time. We propose algorithms for both randomly expanding trees and calculating distance matrices. We made available the source code, which was written in the C++ language. The code may be used as a standalone program or as a shared object in the R system. The software can also be used as a web service through the link: http://purl.oclc.org/NET/sunplin/. We compare our implementations to similar solutions and show that significant performance gains can be obtained. Our results open up the possibility of accounting for phylogenetic uncertainty in evolutionary and ecological analyses of large datasets.
Comparison of different tree sap flow up-scaling procedures using Monte-Carlo simulations

NASA Astrophysics Data System (ADS)

Tatarinov, Fyodor; Preisler, Yakir; Roahtyn, Shani; Yakir, Dan

2015-04-01

An important task in determining forest ecosystem water balance is the estimation of stand transpiration, allowing separating evapotranspiration into transpiration and soil evaporation. This can be based on up-scaling measurements of sap flow in representative trees (SF), which can be done by different mathematical algorithms. The aim of the present study was to evaluate the error associated with different up-scaling algorithms under different conditions. Other types of errors (such as, measurement error, within tree SF variability, choice of sample plot etc.) were not considered here. A set of simulation experiments using Monte-Carlo technique was carried out and three up-scaling procedures were tested. (1) Multiplying mean stand sap flux density based on unit sapwood cross-section area (SFD) by total sapwood area (Klein et al, 2014); (2) deriving of linear dependence of tree sap flow on tree DBH and calculating SFstand using predicted SF by DBH classes and stand DBH distribution (Cermak et al., 2004); (3) same as method 2 but using non-linear dependency. Simulations were performed under different SFD(DBH) slope (bs, positive, negative, zero); different DBH and SFD standard deviations (Δd and Δs, respectively) and DBH class size. It was assumed that all trees in a unit area are measured and the total SF of all trees in the experimental plot was taken as the reference SFstand value. Under negative bs all models tend to overestimate SFstand and the error increases exponentially with decreasing bs. Under bs >0 all models tend to underestimate SFstand, but the error is much smaller than for bs
What Satisfies Students? Mining Student-Opinion Data with Regression and Decision-Tree Analysis. AIR 2002 Forum Paper.

ERIC Educational Resources Information Center

Thomas, Emily H.; Galambos, Nora

To investigate how students' characteristics and experiences affect satisfaction, this study used regression and decision-tree analysis with the CHAID algorithm to analyze student opinion data from a sample of 1,783 college students. A data-mining approach identifies the specific aspects of students' university experience that most influence three…
Study on Privacy Protection Algorithm Based on K-Anonymity

NASA Astrophysics Data System (ADS)

FeiFei, Zhao; LiFeng, Dong; Kun, Wang; Yang, Li

Basing on the study of K-Anonymity algorithm in privacy protection issue, this paper proposed a "Degree Priority" method of visiting Lattice nodes on the generalization tree to improve the performance of K-Anonymity algorithm. This paper also proposed a "Two Times K-anonymity" methods to reduce the information loss in the process of K-Anonymity. Finally, we used experimental results to demonstrate the effectiveness of these methods.
Online Performance-Improvement Algorithms

DTIC Science & Technology

1994-08-01

fault rate as the request sequence length approaches infinity. Their algorithms are based on an innovative use of the classical Ziv - Lempel [85] data ...Report CS-TR-348-91. [85] J. Ziv and A. Lempel . Compression of individual sequences via variable-rate coding. IEEE Trans. Inf. Theory, 24:530-53`, 1978. 94...Deferred Data Structuring Recall that our incremental multi-trip algorithm spreads the building of the fence-tree over several trips in order to
Initialization and Restart in Stochastic Local Search: Computing a Most Probable Explanation in Bayesian Networks

NASA Technical Reports Server (NTRS)

Mengshoel, Ole J.; Wilkins, David C.; Roth, Dan

2010-01-01

For hard computational problems, stochastic local search has proven to be a competitive approach to finding optimal or approximately optimal problem solutions. Two key research questions for stochastic local search algorithms are: Which algorithms are effective for initialization? When should the search process be restarted? In the present work we investigate these research questions in the context of approximate computation of most probable explanations (MPEs) in Bayesian networks (BNs). We introduce a novel approach, based on the Viterbi algorithm, to explanation initialization in BNs. While the Viterbi algorithm works on sequences and trees, our approach works on BNs with arbitrary topologies. We also give a novel formalization of stochastic local search, with focus on initialization and restart, using probability theory and mixture models. Experimentally, we apply our methods to the problem of MPE computation, using a stochastic local search algorithm known as Stochastic Greedy Search. By carefully optimizing both initialization and restart, we reduce the MPE search time for application BNs by several orders of magnitude compared to using uniform at random initialization without restart. On several BNs from applications, the performance of Stochastic Greedy Search is competitive with clique tree clustering, a state-of-the-art exact algorithm used for MPE computation in BNs.
Synchronization algorithm for three-phase voltages of an inverter and a grid

NASA Astrophysics Data System (ADS)

Nos, O. V.

2017-07-01

This paper presents the results of designing a joint phase-locked loop for adjusting the phase shifts (speed) and Euclidean norm of three-phase voltages of an inverter to the same grid parameters. The design can be used, in particular, to match the potentials of two parallel-connected power sources for the fundamental harmonic at the moments of switching the stator windings of an induction AC motor from a converter to a centralized power-supply system and back. Technical implementation of the developed synchronization algorithm will significantly reduce the inductance of the current-balancing reactor and exclude emergency operation modes in the electric motor power circuit.
Models of performance of evolutionary program induction algorithms based on indicators of problem difficulty.

PubMed

Graff, Mario; Poli, Riccardo; Flores, Juan J

2013-01-01

Modeling the behavior of algorithms is the realm of evolutionary algorithm theory. From a practitioner's point of view, theory must provide some guidelines regarding which algorithm/parameters to use in order to solve a particular problem. Unfortunately, most theoretical models of evolutionary algorithms are difficult to apply to realistic situations. However, in recent work (Graff and Poli, 2008, 2010), where we developed a method to practically estimate the performance of evolutionary program-induction algorithms (EPAs), we started addressing this issue. The method was quite general; however, it suffered from some limitations: it required the identification of a set of reference problems, it required hand picking a distance measure in each particular domain, and the resulting models were opaque, typically being linear combinations of 100 features or more. In this paper, we propose a significant improvement of this technique that overcomes the three limitations of our previous method. We achieve this through the use of a novel set of features for assessing problem difficulty for EPAs which are very general, essentially based on the notion of finite difference. To show the capabilities or our technique and to compare it with our previous performance models, we create models for the same two important classes of problems-symbolic regression on rational functions and Boolean function induction-used in our previous work. We model a variety of EPAs. The comparison showed that for the majority of the algorithms and problem classes, the new method produced much simpler and more accurate models than before. To further illustrate the practicality of the technique and its generality (beyond EPAs), we have also used it to predict the performance of both autoregressive models and EPAs on the problem of wind speed forecasting, obtaining simpler and more accurate models that outperform in all cases our previous performance models.
RNA inverse folding using Monte Carlo tree search.

PubMed

Yang, Xiufeng; Yoshizoe, Kazuki; Taneda, Akito; Tsuda, Koji

2017-11-06

Artificially synthesized RNA molecules provide important ways for creating a variety of novel functional molecules. State-of-the-art RNA inverse folding algorithms can design simple and short RNA sequences of specific GC content, that fold into the target RNA structure. However, their performance is not satisfactory in complicated cases. We present a new inverse folding algorithm called MCTS-RNA, which uses Monte Carlo tree search (MCTS), a technique that has shown exceptional performance in Computer Go recently, to represent and discover the essential part of the sequence space. To obtain high accuracy, initial sequences generated by MCTS are further improved by a series of local updates. Our algorithm has an ability to control the GC content precisely and can deal with pseudoknot structures. Using common benchmark datasets for evaluation, MCTS-RNA showed a lot of promise as a standard method of RNA inverse folding. MCTS-RNA is available at https://github.com/tsudalab/MCTS-RNA .
Effects of rooting via out-groups on in-group topology in phylogeny.

PubMed

Ackerman, Margareta; Brown, Daniel G; Loker, David

2014-01-01

Users of phylogenetic methods require rooted trees, because the direction of time depends on the placement of the root. While phylogenetic trees are typically rooted by using an out-group, this mechanism is inappropriate when the addition of an out-group changes the in-group topology. We perform a formal analysis of phylogenetic algorithms under the inclusion of distant out-groups. It turns out that linkage-based algorithms (including UPGMA) and a class of bisecting methods do not modify the topology of the in-group when an out-group is included. By contrast, the popular neighbour joining algorithm fails this property in a strong sense: every data set can have its structure destroyed by some arbitrarily distant outlier. Furthermore, including multiple outliers can lead to an arbitrary topology on the in-group. The standard rooting approach that uses out-groups may be fundamentally unsuited for neighbour joining.
Bilayer segmentation of webcam videos using tree-based classifiers.

PubMed

Yin, Pei; Criminisi, Antonio; Winn, John; Essa, Irfan

2011-01-01

This paper presents an automatic segmentation algorithm for video frames captured by a (monocular) webcam that closely approximates depth segmentation from a stereo camera. The frames are segmented into foreground and background layers that comprise a subject (participant) and other objects and individuals. The algorithm produces correct segmentations even in the presence of large background motion with a nearly stationary foreground. This research makes three key contributions: First, we introduce a novel motion representation, referred to as "motons," inspired by research in object recognition. Second, we propose estimating the segmentation likelihood from the spatial context of motion. The estimation is efficiently learned by random forests. Third, we introduce a general taxonomy of tree-based classifiers that facilitates both theoretical and experimental comparisons of several known classification algorithms and generates new ones. In our bilayer segmentation algorithm, diverse visual cues such as motion, motion context, color, contrast, and spatial priors are fused by means of a conditional random field (CRF) model. Segmentation is then achieved by binary min-cut. Experiments on many sequences of our videochat application demonstrate that our algorithm, which requires no initialization, is effective in a variety of scenes, and the segmentation results are comparable to those obtained by stereo systems.
A Multilevel Gamma-Clustering Layout Algorithm for Visualization of Biological Networks

PubMed Central

Hruz, Tomas; Lucas, Christoph; Laule, Oliver; Zimmermann, Philip

2013-01-01

Visualization of large complex networks has become an indispensable part of systems biology, where organisms need to be considered as one complex system. The visualization of the corresponding network is challenging due to the size and density of edges. In many cases, the use of standard visualization algorithms can lead to high running times and poorly readable visualizations due to many edge crossings. We suggest an approach that analyzes the structure of the graph first and then generates a new graph which contains specific semantic symbols for regular substructures like dense clusters. We propose a multilevel gamma-clustering layout visualization algorithm (MLGA) which proceeds in three subsequent steps: (i) a multilevel γ-clustering is used to identify the structure of the underlying network, (ii) the network is transformed to a tree, and (iii) finally, the resulting tree which shows the network structure is drawn using a variation of a force-directed algorithm. The algorithm has a potential to visualize very large networks because it uses modern clustering heuristics which are optimized for large graphs. Moreover, most of the edges are removed from the visual representation which allows keeping the overview over complex graphs with dense subgraphs. PMID:23864855
Clustering Tree-structured Data on Manifold

PubMed Central

Lu, Na; Miao, Hongyu

2016-01-01

Tree-structured data usually contain both topological and geometrical information, and are necessarily considered on manifold instead of Euclidean space for appropriate data parameterization and analysis. In this study, we propose a novel tree-structured data parameterization, called Topology-Attribute matrix (T-A matrix), so the data clustering task can be conducted on matrix manifold. We incorporate the structure constraints embedded in data into the non-negative matrix factorization method to determine meta-trees from the T-A matrix, and the signature vector of each single tree can then be extracted by meta-tree decomposition. The meta-tree space turns out to be a cone space, in which we explore the distance metric and implement the clustering algorithm based on the concepts like Fréchet mean. Finally, the T-A matrix based clustering (TAMBAC) framework is evaluated and compared using both simulated data and real retinal images to illus trate its efficiency and accuracy. PMID:26660696
RecPhyloXML - a format for reconciled gene trees.

PubMed

Duchemin, Wandrille; Gence, Guillaume; Arigon Chifolleau, Anne-Muriel; Arvestad, Lars; Bansal, Mukul S; Berry, Vincent; Boussau, Bastien; Chevenet, François; Comte, Nicolas; Davín, Adrián A; Dessimoz, Christophe; Dylus, David; Hasic, Damir; Mallo, Diego; Planel, Rémi; Posada, David; Scornavacca, Celine; Szöllosi, Gergely; Zhang, Louxin; Tannier, Éric; Daubin, Vincent

2018-05-14

A reconciliation is an annotation of the nodes of a gene tree with evolutionary events-for example, speciation, gene duplication, transfer, loss, etc-along with a mapping onto a species tree. Many algorithms and software produce or use reconciliations but often using different reconciliation formats, regarding the type of events considered or whether the species tree is dated or not. This complicates the comparison and communication between different programs. Here, we gather a consortium of software developers in gene tree species tree reconciliation to propose and endorse a format that aims to promote an integrative-albeit flexible-specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities. http://phylariane.univ-lyon1.fr/recphyloxml/. wandrille.duchemin@univ-lyon1.fr. There is no supplementary data associated with this publication.
Hydraulic Function in Australian Tree Species during Drought-Induced Mortality

NASA Astrophysics Data System (ADS)

Tissue, D.; Maier, C.; Creek, D.; Choat, B.

2016-12-01

Drought induced tree mortality and decline are key issues facing forest ecology and management. Here, we primarily investigated the hydraulic limitations underpinning drought-induced mortality in three Australian tree species. Using field-based large rainout shelters, three angiosperm species (Casuarina cunninghamiana, Eucalyptus sideroxylon, Eucalyptus tereticornis) were subjected to two successive drought and recovery cycles, prior to a subsequent long and extreme drought to mortality; total duration of experiment was 2.5 years. Leaf gas exchange, leaf and stem hydraulics, and carbon reserves were monitored during the experiment. Trees died as a result of failure in the hydraulic transport system, primarily related to water stress induced embolism. Stomatal closure occurred prior to the induction of significant embolism in the stem xylem of all species. Nonetheless, trees suffered a rapid decline in xylem water potential and increase in embolism during the severe drought treatment. Trees died at water potentials causing greater than 90% loss of hydraulic conductivity in the stem, providing support for the theory that lethal water potential is correlated with complete loss of hydraulic function in the stem xylem of angiosperms.
MetaPIGA v2.0: maximum likelihood large phylogeny estimation using the metapopulation genetic algorithm and other stochastic heuristics.

PubMed

Helaers, Raphaël; Milinkovitch, Michel C

2010-07-15

The development, in the last decade, of stochastic heuristics implemented in robust application softwares has made large phylogeny inference a key step in most comparative studies involving molecular sequences. Still, the choice of a phylogeny inference software is often dictated by a combination of parameters not related to the raw performance of the implemented algorithm(s) but rather by practical issues such as ergonomics and/or the availability of specific functionalities. Here, we present MetaPIGA v2.0, a robust implementation of several stochastic heuristics for large phylogeny inference (under maximum likelihood), including a Simulated Annealing algorithm, a classical Genetic Algorithm, and the Metapopulation Genetic Algorithm (metaGA) together with complex substitution models, discrete Gamma rate heterogeneity, and the possibility to partition data. MetaPIGA v2.0 also implements the Likelihood Ratio Test, the Akaike Information Criterion, and the Bayesian Information Criterion for automated selection of substitution models that best fit the data. Heuristics and substitution models are highly customizable through manual batch files and command line processing. However, MetaPIGA v2.0 also offers an extensive graphical user interface for parameters setting, generating and running batch files, following run progress, and manipulating result trees. MetaPIGA v2.0 uses standard formats for data sets and trees, is platform independent, runs in 32 and 64-bits systems, and takes advantage of multiprocessor and multicore computers. The metaGA resolves the major problem inherent to classical Genetic Algorithms by maintaining high inter-population variation even under strong intra-population selection. Implementation of the metaGA together with additional stochastic heuristics into a single software will allow rigorous optimization of each heuristic as well as a meaningful comparison of performances among these algorithms. MetaPIGA v2.0 gives access both to high customization for the phylogeneticist, as well as to an ergonomic interface and functionalities assisting the non-specialist for sound inference of large phylogenetic trees using nucleotide sequences. MetaPIGA v2.0 and its extensive user-manual are freely available to academics at http://www.metapiga.org.

MetaPIGA v2.0: maximum likelihood large phylogeny estimation using the metapopulation genetic algorithm and other stochastic heuristics

PubMed Central

2010-01-01

Background The development, in the last decade, of stochastic heuristics implemented in robust application softwares has made large phylogeny inference a key step in most comparative studies involving molecular sequences. Still, the choice of a phylogeny inference software is often dictated by a combination of parameters not related to the raw performance of the implemented algorithm(s) but rather by practical issues such as ergonomics and/or the availability of specific functionalities. Results Here, we present MetaPIGA v2.0, a robust implementation of several stochastic heuristics for large phylogeny inference (under maximum likelihood), including a Simulated Annealing algorithm, a classical Genetic Algorithm, and the Metapopulation Genetic Algorithm (metaGA) together with complex substitution models, discrete Gamma rate heterogeneity, and the possibility to partition data. MetaPIGA v2.0 also implements the Likelihood Ratio Test, the Akaike Information Criterion, and the Bayesian Information Criterion for automated selection of substitution models that best fit the data. Heuristics and substitution models are highly customizable through manual batch files and command line processing. However, MetaPIGA v2.0 also offers an extensive graphical user interface for parameters setting, generating and running batch files, following run progress, and manipulating result trees. MetaPIGA v2.0 uses standard formats for data sets and trees, is platform independent, runs in 32 and 64-bits systems, and takes advantage of multiprocessor and multicore computers. Conclusions The metaGA resolves the major problem inherent to classical Genetic Algorithms by maintaining high inter-population variation even under strong intra-population selection. Implementation of the metaGA together with additional stochastic heuristics into a single software will allow rigorous optimization of each heuristic as well as a meaningful comparison of performances among these algorithms. MetaPIGA v2.0 gives access both to high customization for the phylogeneticist, as well as to an ergonomic interface and functionalities assisting the non-specialist for sound inference of large phylogenetic trees using nucleotide sequences. MetaPIGA v2.0 and its extensive user-manual are freely available to academics at http://www.metapiga.org. PMID:20633263
Teaching a Machine to Feel Postoperative Pain: Combining High-Dimensional Clinical Data with Machine Learning Algorithms to Forecast Acute Postoperative Pain

PubMed Central

Tighe, Patrick J.; Harle, Christopher A.; Hurley, Robert W.; Aytug, Haldun; Boezaart, Andre P.; Fillingim, Roger B.

2015-01-01

Background Given their ability to process highly dimensional datasets with hundreds of variables, machine learning algorithms may offer one solution to the vexing challenge of predicting postoperative pain. Methods Here, we report on the application of machine learning algorithms to predict postoperative pain outcomes in a retrospective cohort of 8071 surgical patients using 796 clinical variables. Five algorithms were compared in terms of their ability to forecast moderate to severe postoperative pain: Least Absolute Shrinkage and Selection Operator (LASSO), gradient-boosted decision tree, support vector machine, neural network, and k-nearest neighbor, with logistic regression included for baseline comparison. Results In forecasting moderate to severe postoperative pain for postoperative day (POD) 1, the LASSO algorithm, using all 796 variables, had the highest accuracy with an area under the receiver-operating curve (ROC) of 0.704. Next, the gradient-boosted decision tree had an ROC of 0.665 and the k-nearest neighbor algorithm had an ROC of 0.643. For POD 3, the LASSO algorithm, using all variables, again had the highest accuracy, with an ROC of 0.727. Logistic regression had a lower ROC of 0.5 for predicting pain outcomes on POD 1 and 3. Conclusions Machine learning algorithms, when combined with complex and heterogeneous data from electronic medical record systems, can forecast acute postoperative pain outcomes with accuracies similar to methods that rely only on variables specifically collected for pain outcome prediction. PMID:26031220
An automated approach to the design of decision tree classifiers

NASA Technical Reports Server (NTRS)

Argentiero, P.; Chin, R.; Beaudet, P.

1982-01-01

An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.
Decision tree modeling using R.

PubMed

Zhang, Zhongheng

2016-08-01

In machine learning field, decision tree learner is powerful and easy to interpret. It employs recursive binary partitioning algorithm that splits the sample in partitioning variable with the strongest association with the response variable. The process continues until some stopping criteria are met. In the example I focus on conditional inference tree, which incorporates tree-structured regression models into conditional inference procedures. While growing a single tree is subject to small changes in the training data, random forests procedure is introduced to address this problem. The sources of diversity for random forests come from the random sampling and restricted set of input variables to be selected. Finally, I introduce R functions to perform model based recursive partitioning. This method incorporates recursive partitioning into conventional parametric model building.
Fast Image Texture Classification Using Decision Trees

NASA Technical Reports Server (NTRS)

Thompson, David R.

2011-01-01

Texture analysis would permit improved autonomous, onboard science data interpretation for adaptive navigation, sampling, and downlink decisions. These analyses would assist with terrain analysis and instrument placement in both macroscopic and microscopic image data products. Unfortunately, most state-of-the-art texture analysis demands computationally expensive convolutions of filters involving many floating-point operations. This makes them infeasible for radiation- hardened computers and spaceflight hardware. A new method approximates traditional texture classification of each image pixel with a fast decision-tree classifier. The classifier uses image features derived from simple filtering operations involving integer arithmetic. The texture analysis method is therefore amenable to implementation on FPGA (field-programmable gate array) hardware. Image features based on the "integral image" transform produce descriptive and efficient texture descriptors. Training the decision tree on a set of training data yields a classification scheme that produces reasonable approximations of optimal "texton" analysis at a fraction of the computational cost. A decision-tree learning algorithm employing the traditional k-means criterion of inter-cluster variance is used to learn tree structure from training data. The result is an efficient and accurate summary of surface morphology in images. This work is an evolutionary advance that unites several previous algorithms (k-means clustering, integral images, decision trees) and applies them to a new problem domain (morphology analysis for autonomous science during remote exploration). Advantages include order-of-magnitude improvements in runtime, feasibility for FPGA hardware, and significant improvements in texture classification accuracy.
Simultaneous phylogeny reconstruction and multiple sequence alignment

PubMed Central

Yue, Feng; Shi, Jian; Tang, Jijun

2009-01-01

Background A phylogeny is the evolutionary history of a group of organisms. To date, sequence data is still the most used data type for phylogenetic reconstruction. Before any sequences can be used for phylogeny reconstruction, they must be aligned, and the quality of the multiple sequence alignment has been shown to affect the quality of the inferred phylogeny. At the same time, all the current multiple sequence alignment programs use a guide tree to produce the alignment and experiments showed that good guide trees can significantly improve the multiple alignment quality. Results We devise a new algorithm to simultaneously align multiple sequences and search for the phylogenetic tree that leads to the best alignment. We also implemented the algorithm as a C program package, which can handle both DNA and protein data and can take simple cost model as well as complex substitution matrices, such as PAM250 or BLOSUM62. The performance of the new method are compared with those from other popular multiple sequence alignment tools, including the widely used programs such as ClustalW and T-Coffee. Experimental results suggest that this method has good performance in terms of both phylogeny accuracy and alignment quality. Conclusion We present an algorithm to align multiple sequences and reconstruct the phylogenies that minimize the alignment score, which is based on an efficient algorithm to solve the median problems for three sequences. Our extensive experiments suggest that this method is very promising and can produce high quality phylogenies and alignments. PMID:19208110
Magnetoacoustic tomographic imaging of electrical impedance with magnetic induction

PubMed Central

Xia, Rongmin; Li, Xu; He, Bin

2008-01-01

Magnetoacoustic tomography with magnetic induction (MAT-MI) is a recently introduced method for imaging tissue electrical impedance properties by integrating magnetic induction and ultrasound measurements. In the present study, we have developed a focused cylindrical scanning mode MAT-MI system and the corresponding reconstruction algorithms. Using this system, we demonstrated 3-dimensional MAT-MI imaging in a physical phantom, with cylindrical scanning combined with ultrasound focusing, and the ability of MAT-MI in imaging electrical conductivity properties of biological tissue. PMID:19169372
Magnetoacoustic tomographic imaging of electrical impedance with magnetic induction

NASA Astrophysics Data System (ADS)

Xia, Rongmin; Li, Xu; He, Bin

2007-08-01

Magnetoacoustic tomography with magnetic induction (MAT-MI) is a recently introduced method for imaging tissue electrical impedance properties by integrating magnetic induction and ultrasound measurements. In the present study, the authors have developed a focused cylindrical scanning mode MAT-MI system and the corresponding reconstruction algorithms. Using this system, they demonstrated a three-dimensional MAT-MI imaging approach in a physical phantom, with cylindrical scanning combined with ultrasound focusing, and the ability of MAT-MI in imaging electrical conductivity properties of biological tissue.
Magnetoacoustic tomographic imaging of electrical impedance with magnetic induction.

PubMed

Xia, Rongmin; Li, Xu; He, Bin

2007-08-22

Magnetoacoustic tomography with magnetic induction (MAT-MI) is a recently introduced method for imaging tissue electrical impedance properties by integrating magnetic induction and ultrasound measurements. In the present study, we have developed a focused cylindrical scanning mode MAT-MI system and the corresponding reconstruction algorithms. Using this system, we demonstrated 3-dimensional MAT-MI imaging in a physical phantom, with cylindrical scanning combined with ultrasound focusing, and the ability of MAT-MI in imaging electrical conductivity properties of biological tissue.
Theoretic derivation of directed acyclic subgraph algorithm and comparisons with message passing algorithm

NASA Astrophysics Data System (ADS)

Ha, Jeongmok; Jeong, Hong

2016-07-01

This study investigates the directed acyclic subgraph (DAS) algorithm, which is used to solve discrete labeling problems much more rapidly than other Markov-random-field-based inference methods but at a competitive accuracy. However, the mechanism by which the DAS algorithm simultaneously achieves competitive accuracy and fast execution speed, has not been elucidated by a theoretical derivation. We analyze the DAS algorithm by comparing it with a message passing algorithm. Graphical models, inference methods, and energy-minimization frameworks are compared between DAS and message passing algorithms. Moreover, the performances of DAS and other message passing methods [sum-product belief propagation (BP), max-product BP, and tree-reweighted message passing] are experimentally compared.
Novel E-Field Sensor for Projectile Detection

DTIC Science & Technology

2012-10-22

aircrafts. They used an array of three plate induction sensors and a simple algorithm to deter mine the direction of the planes [9]. In more recent...publications [10, 11, 12] researchers present increasingly more advanced algorithms and sensors. The techniques developed thus far have not received...the electric field pulse is being detected by a group of sensors in array with known distances between the sensors, so triangulation algorithms could
Power efficient control algorithm of electromechanical unbalance vibration exciter with induction motor

NASA Astrophysics Data System (ADS)

Topovskiy, V. V.; Simakov, G. M.

2017-10-01

A control algorithm of an electromechanical unbalance vibration exciter that provides a free rotational movement is offered in the paper. The unbalance vibration exciter control system realizing a free rotational movement has been synthesized. The structured modeling of the synthesized system has been carried out and its transients are presented. The advantages and disadvantages of the proposed control algorithm applied to the unbalance vibration exciter are shown.
Vascular system modeling in parallel environment - distributed and shared memory approaches

PubMed Central

Jurczuk, Krzysztof; Kretowski, Marek; Bezy-Wendling, Johanne

2011-01-01

The paper presents two approaches in parallel modeling of vascular system development in internal organs. In the first approach, new parts of tissue are distributed among processors and each processor is responsible for perfusing its assigned parts of tissue to all vascular trees. Communication between processors is accomplished by passing messages and therefore this algorithm is perfectly suited for distributed memory architectures. The second approach is designed for shared memory machines. It parallelizes the perfusion process during which individual processing units perform calculations concerning different vascular trees. The experimental results, performed on a computing cluster and multi-core machines, show that both algorithms provide a significant speedup. PMID:21550891
Adaptive segmentation of cerebrovascular tree in time-of-flight magnetic resonance angiography.

PubMed

Hao, J T; Li, M L; Tang, F L

2008-01-01

Accurate segmentation of the human vasculature is an important prerequisite for a number of clinical procedures, such as diagnosis, image-guided neurosurgery and pre-surgical planning. In this paper, an improved statistical approach to extracting whole cerebrovascular tree in time-of-flight magnetic resonance angiography is proposed. Firstly, in order to get a more accurate segmentation result, a localized observation model is proposed instead of defining the observation model over the entire dataset. Secondly, for the binary segmentation, an improved Iterative Conditional Model (ICM) algorithm is presented to accelerate the segmentation process. The experimental results showed that the proposed algorithm can obtain more satisfactory segmentation results and save more processing time than conventional approaches, simultaneously.
Applications of rule-induction in the derivation of quantitative structure-activity relationships.

PubMed

A-Razzak, M; Glen, R C

1992-08-01

Recently, methods have been developed in the field of Artificial Intelligence (AI), specifically in the expert systems area using rule-induction, designed to extract rules from data. We have applied these methods to the analysis of molecular series with the objective of generating rules which are predictive and reliable. The input to rule-induction consists of a number of examples with known outcomes (a training set) and the output is a tree-structured series of rules. Unlike most other analysis methods, the results of the analysis are in the form of simple statements which can be easily interpreted. These are readily applied to new data giving both a classification and a probability of correctness. Rule-induction has been applied to in-house generated and published QSAR datasets and the methodology, application and results of these analyses are discussed. The results imply that in some cases it would be advantageous to use rule-induction as a complementary technique in addition to conventional statistical and pattern-recognition methods.
Applications of rule-induction in the derivation of quantitative structure-activity relationships

NASA Astrophysics Data System (ADS)

A-Razzak, Mohammed; Glen, Robert C.

1992-08-01

Recently, methods have been developed in the field of Artificial Intelligence (AI), specifically in the expert systems area using rule-induction, designed to extract rules from data. We have applied these methods to the analysis of molecular series with the objective of generating rules which are predictive and reliable. The input to rule-induction consists of a number of examples with known outcomes (a training set) and the output is a tree-structured series of rules. Unlike most other analysis methods, the results of the analysis are in the form of simple statements which can be easily interpreted. These are readily applied to new data giving both a classification and a probability of correctness. Rule-induction has been applied to in-house generated and published QSAR datasets and the methodology, application and results of these analyses are discussed. The results imply that in some cases it would be advantageous to use rule-induction as a complementary technique in addition to conventional statistical and pattern-recognition methods.
GSHR-Tree: a spatial index tree based on dynamic spatial slot and hash table in grid environments

NASA Astrophysics Data System (ADS)

Chen, Zhanlong; Wu, Xin-cai; Wu, Liang

2008-12-01

Computation Grids enable the coordinated sharing of large-scale distributed heterogeneous computing resources that can be used to solve computationally intensive problems in science, engineering, and commerce. Grid spatial applications are made possible by high-speed networks and a new generation of Grid middleware that resides between networks and traditional GIS applications. The integration of the multi-sources and heterogeneous spatial information and the management of the distributed spatial resources and the sharing and cooperative of the spatial data and Grid services are the key problems to resolve in the development of the Grid GIS. The performance of the spatial index mechanism is the key technology of the Grid GIS and spatial database affects the holistic performance of the GIS in Grid Environments. In order to improve the efficiency of parallel processing of a spatial mass data under the distributed parallel computing grid environment, this paper presents a new grid slot hash parallel spatial index GSHR-Tree structure established in the parallel spatial indexing mechanism. Based on the hash table and dynamic spatial slot, this paper has improved the structure of the classical parallel R tree index. The GSHR-Tree index makes full use of the good qualities of R-Tree and hash data structure. This paper has constructed a new parallel spatial index that can meet the needs of parallel grid computing about the magnanimous spatial data in the distributed network. This arithmetic splits space in to multi-slots by multiplying and reverting and maps these slots to sites in distributed and parallel system. Each sites constructs the spatial objects in its spatial slot into an R tree. On the basis of this tree structure, the index data was distributed among multiple nodes in the grid networks by using large node R-tree method. The unbalance during process can be quickly adjusted by means of a dynamical adjusting algorithm. This tree structure has considered the distributed operation, reduplication operation transfer operation of spatial index in the grid environment. The design of GSHR-Tree has ensured the performance of the load balance in the parallel computation. This tree structure is fit for the parallel process of the spatial information in the distributed network environments. Instead of spatial object's recursive comparison where original R tree has been used, the algorithm builds the spatial index by applying binary code operation in which computer runs more efficiently, and extended dynamic hash code for bit comparison. In GSHR-Tree, a new server is assigned to the network whenever a split of a full node is required. We describe a more flexible allocation protocol which copes with a temporary shortage of storage resources. It uses a distributed balanced binary spatial tree that scales with insertions to potentially any number of storage servers through splits of the overloaded ones. The application manipulates the GSHR-Tree structure from a node in the grid environment. The node addresses the tree through its image that the splits can make outdated. This may generate addressing errors, solved by the forwarding among the servers. In this paper, a spatial index data distribution algorithm that limits the number of servers has been proposed. We improve the storage utilization at the cost of additional messages. The structure of GSHR-Tree is believed that the scheme of this grid spatial index should fit the needs of new applications using endlessly larger sets of spatial data. Our proposal constitutes a flexible storage allocation method for a distributed spatial index. The insertion policy can be tuned dynamically to cope with periods of storage shortage. In such cases storage balancing should be favored for better space utilization, at the price of extra message exchanges between servers. This structure makes a compromise in the updating of the duplicated index and the transformation of the spatial index data. Meeting the needs of the grid computing, GSHRTree has a flexible structure in order to satisfy new needs in the future. The GSHR-Tree provides the R-tree capabilities for large spatial datasets stored over interconnected servers. The analysis, including the experiments, confirmed the efficiency of our design choices. The scheme should fit the needs of new applications of spatial data, using endlessly larger datasets. Using the system response time of the parallel processing of spatial scope query algorithm as the performance evaluation factor, According to the result of the simulated the experiments, GSHR-Tree is performed to prove the reasonable design and the high performance of the indexing structure that the paper presented.
Analysis of algorithms for predicting canopy fuel

Treesearch

Katharine L. Gray; Elizabeth Reinhardt

2003-01-01

We compared observed canopy fuel characteristics with those predicted by existing biomass algorithms. We specifically examined the accuracy of the biomass equations developed by Brown (1978. We used destructively sampled data obtained at 5 different study areas. We compared predicted and observed quantities of foliage and crown biomass for individual trees in our study...
Design of long induction linacs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Caporaso, G.J.; Cole, A.G.

1990-09-06

A self-consistent design strategy for induction linacs is presented which addresses the issues of brightness preservation against space charge induced emittance growth, minimization of the beam breakup instability and the suppression of beam centroid motion due to chromatic effects (corkscrew) and misaligned focusing elements. A simple steering algorithm is described that widens the effective energy bandwidth of the transport system.
A Reconstruction Algorithm of Magnetoacoustic Tomography with Magnetic Induction for Acoustically Inhomogeneous Tissue

PubMed Central

Zhou, Lian; Zhu, Shanan

2014-01-01

Magnetoacoustic tomography with Magnetic Induction (MAT-MI) is a noninvasive electrical conductivity imaging approach that measures ultrasound wave induced by magnetic stimulation, for reconstructing the distribution of electrical impedance in biological tissue. Existing reconstruction algorithms for MAT-MI are based on the assumption that the acoustic properties in the tissue are homogeneous. However, the tissue in most parts of human body, has heterogeneous acoustic properties, which leads to potential distortion and blurring of small buried objects in the impedance images. In the present study, we proposed a new algorithm for MAT-MI to image the impedance distribution in tissues with inhomogeneous acoustic speed distributions. With a computer head model constructed from MR images of a human subject, a series of numerical simulation experiments were conducted. The present results indicate that the inhomogeneous acoustic properties of tissues in terms of speed variation can be incorporated in MAT-MI imaging. PMID:24845284

INDUCTION OF PLANT ALLERGENS BY ENVIRONMENTAL AGENTS

EPA Science Inventory

The specific objectives of the project and the progress toward meeting these objectives are listed below:

Identify up to Three Environmental Exposures That Induce the Production of an Allergenic Protein in the Mountain Cedar Tree by Examining the Effects of Exposu...

Tree ring wood analysis after hydrogen peroxide pressure decomposition with inductively coupled plasma atomic emission spectrometry and electrothermal vaporization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Matusiewicz, H.; Barnes, R.M.

1985-02-01

A method utilizing pressure decomposition to minimize sample pretreatment is described for the inductively coupled plasma atomic emission spectrometric analysis of red spruce and sugar maple. Cores collected from trees growing on Camels Hump Mountain, Vermont, were divided into decade increments in order to monitor the temporal changes in concentrations of 21 elements. Dried wood samples were decomposed in a bomb made of Teflon with 50% hydrogen peroxide heated in an oven at 125/sup 0/C for 4 h. The digestion permitted use of aqueous standards and minimized any potential matrix effects. The element concentrations were obtained sequentially by electrothermal vaporizationmore » ICP-AES using 5 ..mu..L sample aliquots. The method precision varied between 3 and 12%. Elements forming oxyanions (Al, As, Fe, Ge, Mn, Si, V) were found at elevated concentrations during the most recent three decades, while other metal (e.g., Mg, Zn) concentrations were unchanged or decreased. 45 references, 6 tables, 1 figure.« less

Productivity and cost of marking activities for single-tree selection and thinning treatments in Arkansas

Treesearch

Tymur Sydor; Richard A. Kluender; Rodney L. Busby; Matthew Pelkki

2004-01-01

An activity algorithm was developed for standard marking methods for natural pine stands in Arkansas. For the two types of marking methods examined, thinning (selection from below) and single-tree selection (selection from above), cycle time and cost models were developed. Basal area (BA) removed was the major influencing factor in both models. Marking method was...

On the Complexity of the Asymmetric VPN Problem

NASA Astrophysics Data System (ADS)

Rothvoß, Thomas; Sanità, Laura

We give the first constant factor approximation algorithm for the asymmetric Virtual Private Network (textsc{Vpn}) problem with arbitrary concave costs. We even show the stronger result, that there is always a tree solution of cost at most 2·OPT and that a tree solution of (expected) cost at most 49.84·OPT can be determined in polynomial time.

Abstracting GIS Layers from Hyperspectral Imagery

DTIC Science & Technology

2009-03-01

Difference Vegetative Index ( NDVI ) 2-20 2.2.10 Separating Trees from Grass . . . . . . . . . . . 2-22 2.3 Spatial Analysis...2-18 2.10. Example of the Normalized Difference Vegetation Index ( NDVI ) applied to a hyperspectral image. . . . . . . . . . . . . . . . . . 2-20...3.5. Example of applying NDVI to a SOM. . . . . . . . . . . . . . . 3-8 3.6. Visualization of the NIR scatter tree ID algorithm. . . . . . . . 3-9 ix

Topological leakage detection and freeze-and-grow propagation for improved CT-based airway segmentation

NASA Astrophysics Data System (ADS)

Nadeem, Syed Ahmed; Hoffman, Eric A.; Sieren, Jered P.; Saha, Punam K.

2018-03-01

Numerous large multi-center studies are incorporating the use of computed tomography (CT)-based characterization of the lung parenchyma and bronchial tree to understand chronic obstructive pulmonary disease status and progression. To the best of our knowledge, there are no fully automated airway tree segmentation methods, free of the need for user review. A failure in even a fraction of segmentation results necessitates manual revision of all segmentation masks which is laborious considering the thousands of image data sets evaluated in large studies. In this paper, we present a novel CT-based airway tree segmentation algorithm using topological leakage detection and freeze-and-grow propagation. The method is fully automated requiring no manual inputs or post-segmentation editing. It uses simple intensity-based connectivity and a freeze-and-grow propagation algorithm to iteratively grow the airway tree starting from an initial seed inside the trachea. It begins with a conservative parameter and then, gradually shifts toward more generous parameter values. The method was applied on chest CT scans of fifteen subjects at total lung capacity. Airway segmentation results were qualitatively assessed and performed comparably to established airway segmentation method with no major visual leakages.

Testing for Polytomies in Phylogenetic Species Trees Using Quartet Frequencies.

PubMed

Sayyari, Erfan; Mirarab, Siavash

2018-02-28

Phylogenetic species trees typically represent the speciation history as a bifurcating tree. Speciation events that simultaneously create more than two descendants, thereby creating polytomies in the phylogeny, are possible. Moreover, the inability to resolve relationships is often shown as a (soft) polytomy. Both types of polytomies have been traditionally studied in the context of gene tree reconstruction from sequence data. However, polytomies in the species tree cannot be detected or ruled out without considering gene tree discordance. In this paper, we describe a statistical test based on properties of the multi-species coalescent model to test the null hypothesis that a branch in an estimated species tree should be replaced by a polytomy. On both simulated and biological datasets, we show that the null hypothesis is rejected for all but the shortest branches, and in most cases, it is retained for true polytomies. The test, available as part of the Accurate Species TRee ALgorithm (ASTRAL) package, can help systematists decide whether their datasets are sufficient to resolve specific relationships of interest.

Testing for Polytomies in Phylogenetic Species Trees Using Quartet Frequencies

PubMed Central

Sayyari, Erfan

2018-01-01

Deriving pathway maps from automated text analysis using a grammar-based approach.

PubMed

Olsson, Björn; Gawronska, Barbara; Erlendsson, Björn

2006-04-01

We demonstrate how automated text analysis can be used to support the large-scale analysis of metabolic and regulatory pathways by deriving pathway maps from textual descriptions found in the scientific literature. The main assumption is that correct syntactic analysis combined with domain-specific heuristics provides a good basis for relation extraction. Our method uses an algorithm that searches through the syntactic trees produced by a parser based on a Referent Grammar formalism, identifies relations mentioned in the sentence, and classifies them with respect to their semantic class and epistemic status (facts, counterfactuals, hypotheses). The semantic categories used in the classification are based on the relation set used in KEGG (Kyoto Encyclopedia of Genes and Genomes), so that pathway maps using KEGG notation can be automatically generated. We present the current version of the relation extraction algorithm and an evaluation based on a corpus of abstracts obtained from PubMed. The results indicate that the method is able to combine a reasonable coverage with high accuracy. We found that 61% of all sentences were parsed, and 97% of the parse trees were judged to be correct. The extraction algorithm was tested on a sample of 300 parse trees and was found to produce correct extractions in 90.5% of the cases.

Indexing Volumetric Shapes with Matching and Packing

PubMed Central

Koes, David Ryan; Camacho, Carlos J.

2014-01-01

We describe a novel algorithm for bulk-loading an index with high-dimensional data and apply it to the problem of volumetric shape matching. Our matching and packing algorithm is a general approach for packing data according to a similarity metric. First an approximate k-nearest neighbor graph is constructed using vantage-point initialization, an improvement to previous work that decreases construction time while improving the quality of approximation. Then graph matching is iteratively performed to pack related items closely together. The end result is a dense index with good performance. We define a new query specification for shape matching that uses minimum and maximum shape constraints to explicitly specify the spatial requirements of the desired shape. This specification provides a natural language for performing volumetric shape matching and is readily supported by the geometry-based similarity search (GSS) tree, an indexing structure that maintains explicit representations of volumetric shape. We describe our implementation of a GSS tree for volumetric shape matching and provide a comprehensive evaluation of parameter sensitivity, performance, and scalability. Compared to previous bulk-loading algorithms, we find that matching and packing can construct a GSS-tree index in the same amount of time that is denser, flatter, and better performing, with an observed average performance improvement of 2X. PMID:26085707

A new fast algorithm for solving the minimum spanning tree problem based on DNA molecules computation.

PubMed

Wang, Zhaocai; Huang, Dongmei; Meng, Huajun; Tang, Chengpei

2013-10-01

The minimum spanning tree (MST) problem is to find minimum edge connected subsets containing all the vertex of a given undirected graph. It is a vitally important NP-complete problem in graph theory and applied mathematics, having numerous real life applications. Moreover in previous studies, DNA molecular operations usually were used to solve NP-complete head-to-tail path search problems, rarely for NP-hard problems with multi-lateral path solutions result, such as the minimum spanning tree problem. In this paper, we present a new fast DNA algorithm for solving the MST problem using DNA molecular operations. For an undirected graph with n vertex and m edges, we reasonably design flexible length DNA strands representing the vertex and edges, take appropriate steps and get the solutions of the MST problem in proper length range and O(3m+n) time complexity. We extend the application of DNA molecular operations and simultaneity simplify the complexity of the computation. Results of computer simulative experiments show that the proposed method updates some of the best known values with very short time and that the proposed method provides a better performance with solution accuracy over existing algorithms. Copyright © 2013 The Authors. Published by Elsevier Ireland Ltd.. All rights reserved.

NMR structure calculation for all small molecule ligands and non-standard residues from the PDB Chemical Component Dictionary.

PubMed

Yilmaz, Emel Maden; Güntert, Peter

2015-09-01

An algorithm, CYLIB, is presented for converting molecular topology descriptions from the PDB Chemical Component Dictionary into CYANA residue library entries. The CYANA structure calculation algorithm uses torsion angle molecular dynamics for the efficient computation of three-dimensional structures from NMR-derived restraints. For this, the molecules have to be represented in torsion angle space with rotations around covalent single bonds as the only degrees of freedom. The molecule must be given a tree structure of torsion angles connecting rigid units composed of one or several atoms with fixed relative positions. Setting up CYANA residue library entries therefore involves, besides straightforward format conversion, the non-trivial step of defining a suitable tree structure of torsion angles, and to re-order the atoms in a way that is compatible with this tree structure. This can be done manually for small numbers of ligands but the process is time-consuming and error-prone. An automated method is necessary in order to handle the large number of different potential ligand molecules to be studied in drug design projects. Here, we present an algorithm for this purpose, and show that CYANA structure calculations can be performed with almost all small molecule ligands and non-standard amino acid residues in the PDB Chemical Component Dictionary.

A P2P Botnet detection scheme based on decision tree and adaptive multilayer neural networks.

PubMed

Alauthaman, Mohammad; Aslam, Nauman; Zhang, Li; Alasem, Rafe; Hossain, M A

2018-01-01

In recent years, Botnets have been adopted as a popular method to carry and spread many malicious codes on the Internet. These malicious codes pave the way to execute many fraudulent activities including spam mail, distributed denial-of-service attacks and click fraud. While many Botnets are set up using centralized communication architecture, the peer-to-peer (P2P) Botnets can adopt a decentralized architecture using an overlay network for exchanging command and control data making their detection even more difficult. This work presents a method of P2P Bot detection based on an adaptive multilayer feed-forward neural network in cooperation with decision trees. A classification and regression tree is applied as a feature selection technique to select relevant features. With these features, a multilayer feed-forward neural network training model is created using a resilient back-propagation learning algorithm. A comparison of feature set selection based on the decision tree, principal component analysis and the ReliefF algorithm indicated that the neural network model with features selection based on decision tree has a better identification accuracy along with lower rates of false positives. The usefulness of the proposed approach is demonstrated by conducting experiments on real network traffic datasets. In these experiments, an average detection rate of 99.08 % with false positive rate of 0.75 % was observed.

New Splitting Criteria for Decision Trees in Stationary Data Streams.

PubMed

Jaworski, Maciej; Duda, Piotr; Rutkowski, Leszek; Jaworski, Maciej; Duda, Piotr; Rutkowski, Leszek; Rutkowski, Leszek; Duda, Piotr; Jaworski, Maciej

2018-06-01

The most popular tools for stream data mining are based on decision trees. In previous 15 years, all designed methods, headed by the very fast decision tree algorithm, relayed on Hoeffding's inequality and hundreds of researchers followed this scheme. Recently, we have demonstrated that although the Hoeffding decision trees are an effective tool for dealing with stream data, they are a purely heuristic procedure; for example, classical decision trees such as ID3 or CART cannot be adopted to data stream mining using Hoeffding's inequality. Therefore, there is an urgent need to develop new algorithms, which are both mathematically justified and characterized by good performance. In this paper, we address this problem by developing a family of new splitting criteria for classification in stationary data streams and investigating their probabilistic properties. The new criteria, derived using appropriate statistical tools, are based on the misclassification error and the Gini index impurity measures. The general division of splitting criteria into two types is proposed. Attributes chosen based on type- splitting criteria guarantee, with high probability, the highest expected value of split measure. Type- criteria ensure that the chosen attribute is the same, with high probability, as it would be chosen based on the whole infinite data stream. Moreover, in this paper, two hybrid splitting criteria are proposed, which are the combinations of single criteria based on the misclassification error and Gini index.

Loops in hierarchical channel networks

NASA Astrophysics Data System (ADS)

Katifori, Eleni; Magnasco, Marcelo

2012-02-01

Nature provides us with many examples of planar distribution and structural networks having dense sets of closed loops. An archetype of this form of network organization is the vasculature of dicotyledonous leaves, which showcases a hierarchically-nested architecture. Although a number of methods have been proposed to measure aspects of the structure of such networks, a robust metric to quantify their hierarchical organization is still lacking. We present an algorithmic framework that allows mapping loopy networks to binary trees, preserving in the connectivity of the trees the architecture of the original graph. We apply this framework to investigate computer generated and natural graphs extracted from digitized images of dicotyledonous leaves and animal vasculature. We calculate various metrics on the corresponding trees and discuss the relationship of these quantities to the architectural organization of the original graphs. This algorithmic framework decouples the geometric information from the metric topology (connectivity and edge weight) and it ultimately allows us to perform a quantitative statistical comparison between predictions of theoretical models and naturally occurring loopy graphs.

An improved spanning tree approach for the reliability analysis of supply chain collaborative network

NASA Astrophysics Data System (ADS)

Lam, C. Y.; Ip, W. H.

2012-11-01

A higher degree of reliability in the collaborative network can increase the competitiveness and performance of an entire supply chain. As supply chain networks grow more complex, the consequences of unreliable behaviour become increasingly severe in terms of cost, effort and time. Moreover, it is computationally difficult to calculate the network reliability of a Non-deterministic Polynomial-time hard (NP-hard) all-terminal network using state enumeration, as this may require a huge number of iterations for topology optimisation. Therefore, this paper proposes an alternative approach of an improved spanning tree for reliability analysis to help effectively evaluate and analyse the reliability of collaborative networks in supply chains and reduce the comparative computational complexity of algorithms. Set theory is employed to evaluate and model the all-terminal reliability of the improved spanning tree algorithm and present a case study of a supply chain used in lamp production to illustrate the application of the proposed approach.

Augmenting computer networks

NASA Technical Reports Server (NTRS)

Bokhari, S. H.; Raza, A. D.

1984-01-01

Three methods of augmenting computer networks by adding at most one link per processor are discussed: (1) A tree of N nodes may be augmented such that the resulting graph has diameter no greater than 4log sub 2((N+2)/3)-2. Thi O(N(3)) algorithm can be applied to any spanning tree of a connected graph to reduce the diameter of that graph to O(log N); (2) Given a binary tree T and a chain C of N nodes each, C may be augmented to produce C so that T is a subgraph of C. This algorithm is O(N) and may be used to produce augmented chains or rings that have diameter no greater than 2log sub 2((N+2)/3) and are planar; (3) Any rectangular two-dimensional 4 (8) nearest neighbor array of size N = 2(k) may be augmented so that it can emulate a single step shuffle-exchange network of size N/2 in 3(t) time steps.

Parallel VLSI architecture emulation and the organization of APSA/MPP

NASA Technical Reports Server (NTRS)

Odonnell, John T.

1987-01-01

The Applicative Programming System Architecture (APSA) combines an applicative language interpreter with a novel parallel computer architecture that is well suited for Very Large Scale Integration (VLSI) implementation. The Massively Parallel Processor (MPP) can simulate VLSI circuits by allocating one processing element in its square array to an area on a square VLSI chip. As long as there are not too many long data paths, the MPP can simulate a VLSI clock cycle very rapidly. The APSA circuit contains a binary tree with a few long paths and many short ones. A skewed H-tree layout allows every processing element to simulate a leaf cell and up to four tree nodes, with no loss in parallelism. Emulation of a key APSA algorithm on the MPP resulted in performance 16,000 times faster than a Vax. This speed will make it possible for the APSA language interpreter to run fast enough to support research in parallel list processing algorithms.

Image matching as a data source for forest inventory - Comparison of Semi-Global Matching and Next-Generation Automatic Terrain Extraction algorithms in a typical managed boreal forest environment

NASA Astrophysics Data System (ADS)

Kukkonen, M.; Maltamo, M.; Packalen, P.

2017-08-01

Image matching is emerging as a compelling alternative to airborne laser scanning (ALS) as a data source for forest inventory and management. There is currently an open discussion in the forest inventory community about whether, and to what extent, the new method can be applied to practical inventory campaigns. This paper aims to contribute to this discussion by comparing two different image matching algorithms (Semi-Global Matching [SGM] and Next-Generation Automatic Terrain Extraction [NGATE]) and ALS in a typical managed boreal forest environment in southern Finland. Spectral features from unrectified aerial images were included in the modeling and the potential of image matching in areas without a high resolution digital terrain model (DTM) was also explored. Plot level predictions for total volume, stem number, basal area, height of basal area median tree and diameter of basal area median tree were modeled using an area-based approach. Plot level dominant tree species were predicted using a random forest algorithm, also using an area-based approach. The statistical difference between the error rates from different datasets was evaluated using a bootstrap method. Results showed that ALS outperformed image matching with every forest attribute, even when a high resolution DTM was used for height normalization and spectral information from images was included. Dominant tree species classification with image matching achieved accuracy levels similar to ALS regardless of the resolution of the DTM when spectral metrics were used. Neither of the image matching algorithms consistently outperformed the other, but there were noticeably different error rates depending on the parameter configuration, spectral band, resolution of DTM, or response variable. This study showed that image matching provides reasonable point cloud data for forest inventory purposes, especially when a high resolution DTM is available and information from the understory is redundant.

Rooting gene trees without outgroups: EP rooting.

PubMed

Sinsheimer, Janet S; Little, Roderick J A; Lake, James A

2012-01-01

Gene sequences are routinely used to determine the topologies of unrooted phylogenetic trees, but many of the most important questions in evolution require knowing both the topologies and the roots of trees. However, general algorithms for calculating rooted trees from gene and genomic sequences in the absence of gene paralogs are few. Using the principles of evolutionary parsimony (EP) (Lake JA. 1987a. A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol. 4:167-181) and its extensions (Cavender, J. 1989. Mechanized derivation of linear invariants. Mol Biol Evol. 6:301-316; Nguyen T, Speed TP. 1992. A derivation of all linear invariants for a nonbalanced transversion model. J Mol Evol. 35:60-76), we explicitly enumerate all linear invariants that solely contain rooting information and derive algorithms for rooting gene trees directly from gene and genomic sequences. These new EP linear rooting invariants allow one to determine rooted trees, even in the complete absence of outgroups and gene paralogs. EP rooting invariants are explicitly derived for three taxon trees, and rules for their extension to four or more taxa are provided. The method is demonstrated using 18S ribosomal DNA to illustrate how the new animal phylogeny (Aguinaldo AMA et al. 1997. Evidence for a clade of nematodes, arthropods, and other moulting animals. Nature 387:489-493; Lake JA. 1990. Origin of the metazoa. Proc Natl Acad Sci USA 87:763-766) may be rooted directly from sequences, even when they are short and paralogs are unavailable. These results are consistent with the current root (Philippe H et al. 2011. Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 470:255-260).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.